Re: test10-pre7

2000-10-31 Thread Linus Torvalds



Ok, how about this approach? It only works for the case where we do not
have the kind of multiple stuff that drivers/net has, but hey, we don't
actually need to handle all the cases right now.

We can leave that for the future, as the configuration process is likely
to change anyway during 2.5.x, and the multiple object case may go away
entirely (ie the case of slhc and 8390 will become just a normal
configuration dependency: you'd have a "CONFIG_SLHC" entry that is
computed by the dependency graph at configuration time, rather than by the
Makefile at build time).

This is the simplest rule base that I could come up with that should work
for both SCSI and USB:

# Translate to Rules.make lists.
multi-used  := $(filter $(list-multi), $(obj-y) $(obj-m))
multi-objs  := $(foreach m, $(multi-used), $($(basename $(m))-objs))
active-objs := $(sort $(multi-objs) $(obj-y) $(obj-m))

O_OBJS  := $(obj-y)
M_OBJS  := $(obj-m)
MIX_OBJS:= $(filter $(export-objs), $(active-objs))

Does anybody see any problems with it? Basically, we're sidestepping the
sorting, because neither SCSI nor USB need it. Making the problem simpler
is always good.

Now, the above won't work for drivers/net, but I think it will work for
just about anything else. So let's just leave drivers/net alone for now.
Simplicity is good.

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Linux-2.4.0-test10

2000-10-31 Thread Linus Torvalds


Ok, test10-final is out there now. This has no _known_ bugs that I
consider show-stoppers, for what it's worth.

And when I don't know of a bug, it doesn't exist. Let us rejoice. In
traditional kernel naming tradition, this kernel hereby gets anointed as
one of the "greased weasel" kernel series, one of the final steps in a
stable release.

We're still waiting for the Vatican to officially canonize this kernel,
but trust me, that's only a matter of time. It's a little known fact, but
the Pope likes penguins too.

Linus




Re: Linux-2.4.0-test10

2000-10-31 Thread Linus Torvalds



On Tue, 31 Oct 2000, Rik van Riel wrote:
> On Tue, 31 Oct 2000, Linus Torvalds wrote:
> >
> > Ok, test10-final is out there now. This has no _known_ bugs that
> > I consider show-stoppers, for what it's worth.
> >
> > And when I don't know of a bug, it doesn't exist. Let us
> > rejoice. In traditional kernel naming tradition, this kernel
> > hereby gets anointed as one of the "greased weasel" kernel
> > series, one of the final steps in a stable release.
>
> Well, there's the thing with RAW IO being done into a
> process' address space and the data arriving only after
> the page gets unmapped from the process.

Yes. But that doesn't count as a "show-stopper" for me, simply because
it's one of those small details that are known, and never materialize
under normal load.

Yes, it will have to be fixed before anybody starts doing RAW IO in a
major way. And I bet it will be fixed. But it's not on my list of "I
cannot release a 2.4.0 before this is done" - even if I think it will
actually be fixed for the common case before that anyway.

(Note: I suspect that we may just have to accept the fact that due to NFS
etc issues, RAW IO into a shared mapping might not really be supported at
all. I don't think any raw IO user uses it that way anyway, so I think the
big and worrisome case is actually only the swap-out case).

> > We're still waiting for the Vatican to officially canonize this
> > kernel, but trust me, that's only a matter of time. It's a
> > little known fact, but the Pope likes penguins too.
>
> Let's just hope he doesn't need RAW IO ;)

Naah, he mainly just does some browsing with netscape, and (don't tell a
soul) plays QuakeIII with the door locked.

Linus




Re: test10-pre7

2000-10-31 Thread Linus Torvalds



On Tue, 31 Oct 2000, Russell King wrote:

> Linus Torvalds writes:
> > On Wed, 1 Nov 2000, Keith Owens wrote:
> > > LINK_FIRST is processed in the order it is specified, so a.o will be
> > > linked before z.o when both are present.  See the patch.
> >
> > So why don't you do the same thing for obj-y, then?
> >
> > Why can't you do
> >
> >     LINK_FIRST=$(obj-y)
> >
> > and be done with it?
>
> Hmm, so why don't we just call it obj-y and be done with it? ;)

That was going to be my next question if somebody actually said "sure".

The question was rhetorical, since the way LINK_FIRST is implemented means
that it has all the same problems that $(obj-y) has, and is hard to get
right in the generic case (but you can get it trivially right for the
subset case, like for USB).

Linus




Re: Linux-2.4.0-test10

2000-10-31 Thread Linus Torvalds



On Tue, 31 Oct 2000, Miles Lane wrote:
 
> Were there no changes between test10-pre7 and test10?
> I notice you didn't send out a Changelist.
>
> The Changelists help me focus my testing.

Sorry. Here it is..

Linus
-
 - final:
- Jeff Garzik: ISA network driver cleanup, wrapper.h fixes, 8139too
  update, etc
- Mike Coleman: fix TracerPid in /proc/n/status
- Thomas Molina: mark NAT packet drop message KERN_DEBUG
- Marcelo Tosatti: nbd should use GFP_BUFFER, not GFP_ATOMIC
- Steve Pratt: TLB flush order fix
- David Miller: network and sparc updates
- Alan Cox: various details (NULL ptr checks in SCSI etc)
- Daniel Roesen: pretty up microcode revision printouts
- Mike Coleman: fix ptrace ambiguity issues
- Paul Mackerras: make yenta work even in the absence of ISA irqs
- me: make USB Makefile do the right thing for export-objs.
- Randy Dunlap, USB: fix race conditions, usb enumeration etc.

 - pre7:
- Niels Jensen: remove no-longer-needed workarounds for old gcc versions
- Ingo Molnar & Rik v Riel: VM inactive list maintenance correction
- Randy Dunlap, USB: printer.c, usb-storage, usb identification and
  memory leak fixes
- David Miller: networking updates
- David Mosberger: add AT_CLKTCK to elf information. And make AT_PAGESZ work
  for static binaries too.
- oops. pcmcia broke by mistake
- Me: truncate vs page access race fix.

 - pre6:
- Jeremy Fitzhardinge: autofs4 expiry fix
- David Miller: sparc driver updates, networking updates
- Mathieu Chouquet-Stringer: buffer overflow in sg_proc_dressz_write
- Ingo Molnar: wakeup race fix (admittedly the window was basically
  non-existent, but still..)
- Rasmus Andersen: notice that "this_slice" is no longer used for
  scheduling - delete the code that calculates it.
- ALI pirq routing update. It's even uglier than we initially thought..
- Dimitrios Michailidis: fix ipip locking bugs
- Various: face it - gcc-2.7.2.3 miscompiles structure initializers.
- Paul Cassella: locking comments on dev_base
- Trond Myklebust: NFS locking atomicity. refresh inode properly.
- Andre Hedrick: Serverworks Chipset driver, IDE-tape fix
- Paul Gortmaker: kill unused code from 8390 support.
- Andrea Arcangeli: fix nfsv3d wrong truncates over 4G
- Maciej W. Rozycki: PIIX4 needs the same USB quirk handling as PIIX3.
- me: if we cannot figure out the PCI bridge windows, just "inherit"
  the window from the parent. Better than not booting.
- Ching-Ling Lee: ALI 5451 Audio core support update

 - pre5:
- Mikael Pettersson: more Pentium IV cleanup.
- David Miller: non-x86 platforms missed "pte_same()".
- Russell King: NFS invalidate_inode_pages() can do bad things!
- Randy Dunlap: usb-core.c is gone - module fix
- Ben LaHaise: swapcache fixups for the new atomic pte update code
- Oleg Drokin: fix nm256_audio memory region confusion
- Randy Dunlap: USB printer fixes
- David Miller: sparc updates
- David Miller: off-by-one error in /proc socket dumper
- David Miller: restore non-local bind() behaviour.
- David Miller: wakeups on socket shutdown()
- Jeff Garzik: DEPCA net drvr fixes and CodingStyle
- Jeff Garzik: netsemi net drvr fix
- Jeff Garzik & Andrea Arcangeli: keyboard cleanup
- Jeff Garzik: VIA audio update
- Andrea Arcangeli: mxcsr initialization cleanup and fix
- Gabriel Paubert: better twd_i387_to_fxsr() emulation
- Andries Brouwer: proper error return in ext2 mkdir()

 - pre4:
- disable writing to /proc/xxx/mem. Sure, it works now, but it's still
  a security risk.
- IDE driver update (Victory66 SouthBridge support)
- i810 rng driver cleanup
- fix sbus Makefile
- named initializers in module..
- pppoe: remove explicit initializer - it's done with initcalls.
- x86 WP bit detection: do it cleanly with exception handling
- Arnaldo Carvalho de Melo: memory leaks in drivers/media/video
- Bartlomiej Zolnierkiewicz: video init functions get __init
- David Miller: get rid of net/protocols.c - they get to initialize themselves
- David Miller: get rid of dev_mc_lock - we hold dev->xmit_lock anyway.
- Geert Uytterhoeven: Zorro (Amiga) bus support update
- David Miller: work around gcc-2.7.2 bug
- Geert Uytterhoeven: mark struct consw's "const".
- Jeff Garzik: network driver cleanups, ns558 joystick driver oops fix
- Tigran Aivazian: clean up __alloc_pages(), kill_super() and
  notify_change()
- Tigran Aivazian: move stuff from .data to .bss
- Jeff Garzik: divert.h typename cleanups
- James Simmons: mdacon using spinlocks
- Tigran Aivazian: fix BFS free block calculation
- David Miller: sparc32 works again
- Bernd Schmidt: fix undefined C code (set/use without a sequence point)
- Mikael Pettersson: nicer Pentium IV setup handling.
- Georg 

Re: Poll and OSS API

2000-11-02 Thread Linus Torvalds



On Thu, 2 Nov 2000, Thomas Sailer wrote:

> The OSS API (http://www.opensound.com/pguide/oss.pdf, page 102ff)
> specifies that a select _with the sounddriver's filedescriptor
> set in the read mask_ should start the recording.

So fix the stupid API.

The above is just idiocy.

Linus




Re: [PATCH] Re: Negative scalability by removal of lock_kernel()?(Was:Strange performance behavior of 2.4.0-test9)

2000-11-03 Thread Linus Torvalds

In article [EMAIL PROTECTED],
Andrew Morton  [EMAIL PROTECTED] wrote:

> neither flock() nor fcntl() serialisation are effective
> on linux 2.2 or linux 2.4.  This is because the file
> locking code still wakes up _all_ waiters.  In my testing
> with fcntl serialisation I have seen a single Apache
> instance get woken and put back to sleep 1,500 times
> before the poor thing actually got to service a request.

Indeed.

flock() is the absolute worst case, and always has been.  I guess nobody
ever actually bothered to benchmark it.

> For kernel 2.2 I recommend that Apache consider using
> sysv semaphores for serialisation. They use wake-one.
>
> For kernel 2.4 I recommend that Apache use unserialised
> accept.

No.

Please use unserialized accept() _always_, because we can fix that. 

Even 2.2.x can be fixed to do the wake-one for accept(), if required. 
It's not going to be any worse than the current apache config, and
basically the less games apache plays, the better the kernel can try to
accommodate what apache _really_ wants done.  When playing games, you
hide what you really want done, and suddenly kernel profiles etc end up
being completely useless, because they no longer give the data we needed
to fix the problem. 

Basically, the whole serialization crap is all about the Apache people
saying the equivalent of "the OS does a bad job on something we consider
to be incredibly important, so we do something else instead to hide it".

And regardless of _what_ workaround Apache does, whether it is the sucky
fcntl() thing or using SysV semaphores, it's going to hide the real
issue and mean that it never gets fixed properly.

And in the end it will result in really really bad performance. 

Instead, if apache had just done the thing it wanted to do in the first
place, the wake-one accept() semantics would have happened a hell of a
lot earlier. 

Now it's there in 2.4.x. Please use it. PLEASE PLEASE PLEASE don't play
games trying to outsmart the OS, it will just hurt Apache in the long run.

Linus
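
A minimal sketch of what "unserialized accept()" could look like in a
pre-forked server, assuming a plain TCP listener on loopback. The helper
names (make_listener, worker_loop, spawn_workers, reap_workers) are
hypothetical and are not Apache's code. The only point is that every
worker blocks directly in accept() on the shared socket, with no flock(),
fcntl() or SysV semaphore wrapped around the call, so the kernel is free
to wake just one waiter per connection.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Create a TCP listener on 127.0.0.1. Port 0 lets the kernel pick a
 * free port, which is reported back through port_out.
 * Returns the listening fd, or -1 on failure. */
int make_listener(unsigned short *port_out)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 16) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }
    *port_out = ntohs(addr.sin_port);
    return fd;
}

/* Worker body: block straight in accept(). There is deliberately no
 * user-space lock around the call; the kernel picks who gets woken. */
void worker_loop(int listen_fd, int max_requests)
{
    static const char reply[] = "ok\n";
    int i;

    assert(listen_fd >= 0);
    for (i = 0; i < max_requests; i++) {
        int conn = accept(listen_fd, NULL, NULL);
        if (conn < 0)
            break;
        if (write(conn, reply, sizeof(reply) - 1) < 0) {
            close(conn);
            break;
        }
        close(conn);
    }
}

/* Fork nworkers children that all share the same listening socket. */
int spawn_workers(int listen_fd, int nworkers, int max_requests, pid_t *pids)
{
    int i;

    for (i = 0; i < nworkers; i++) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0) {
            worker_loop(listen_fd, max_requests);
            _exit(0);
        }
        pids[i] = pid;
    }
    return 0;
}

/* Wait for all workers; returns 0 only if every one exited cleanly. */
int reap_workers(const pid_t *pids, int nworkers)
{
    int i, status, result = 0;

    for (i = 0; i < nworkers; i++)
        if (waitpid(pids[i], &status, 0) < 0 ||
            !WIFEXITED(status) || WEXITSTATUS(status) != 0)
            result = -1;
    return result;
}
```

Whether a given kernel wakes one sleeper or all of them on each incoming
connection is exactly the kernel-side behaviour discussed above; the
user-space code is the same either way, which is the argument for not
hiding it behind a lock.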



Re: Poll and OSS API

2000-11-03 Thread Linus Torvalds



On Sat, 4 Nov 2000, Jeff Garzik wrote:
> > So fix the stupid API.
> >
> > The above is just idiocy.
>
> We're pretty much stuck with the API, until we look at merging ALSA in
> 2.5.x.  Broken API or not, OSS is a mature API, and there are
> spec-correct apps that depend on this behavior.

Considering that about 100% of the sound drivers do not follow that
particular API damage anyway (they can't, as has been pointed out: the
driver doesn't even receive enough information to be _able_ to follow the 
documented API), I doubt that there are all that many programs that depend
on it.

Yes, some drivers apparently _try_ to follow the spec to some degree, but
we should just change the documentation asap.

Linus




Re: [PATCH] Re: Negative scalability by removal of

2000-11-04 Thread Linus Torvalds



On Sat, 4 Nov 2000, Alan Cox wrote:

> > Even 2.2.x can be fixed to do the wake-one for accept(), if required.
>
> Do we really want to retrofit wake_one to 2.2? I know I'm not terribly keen to
> try and backport all the mechanism. I think for 2.2 using the semaphore is a
> good approach. It's a hack to fix an old OS kernel. For 2.4 it's not needed.

We don't need to backport the full exclusive wait queues: we could do
the equivalent of the semaphore inside the kernel around just accept(). It
wouldn't be a generic thing, but it would fix the specific case of
accept().

Otherwise we're going to have old binaries of apache lying around forever
that do the wrong thing..

Linus




Re: [PATCH] Re: Negative scalability by removal of

2000-11-06 Thread Linus Torvalds



On Tue, 7 Nov 2000, Andrew Morton wrote:

> Alan Cox wrote:
> >
> > > Even 2.2.x can be fixed to do the wake-one for accept(), if required.
> >
> > Do we really want to retrofit wake_one to 2.2? I know I'm not terribly keen to
> > try and backport all the mechanism. I think for 2.2 using the semaphore is a
> > good approach. It's a hack to fix an old OS kernel. For 2.4 it's not needed.
>
> It's a 16-liner!  I'll cheerfully admit that this patch
> may be completely broken, but hey, it's free.  I suggest
> that _something_ has to be done for 2.2 now, because
> Apache has switched to unserialised accept().

This is why I'd love to _not_ see silly work-arounds in apache: we
obviously _can_ fix the places where our performance sucks, but only if we
don't have other band-aids hiding the true issues.

For example, with a file-locking apache, we'd have to fix the (noticeably
harder) file locking thing to be wake-one instead, and even then we'd
never be able to do as well as something that gets the same wake-one thing
without the two extra system calls.

The patch looks superficially fine to me, although it does seem to add
another cache-line to the wakeup setup - it might be worth-while to have
the exclusive state closer. But maybe I just didn't count right.

Linus





test11-pre1

2000-11-07 Thread Linus Torvalds


Mostly driver updates.

With a few notable exceptions: two rather subtle MM race conditions that
happened with SMP and highmem respectively. And the FXCSR and file locking
that was already discussed on the list.

Linus

-

 - pre1:
- me: make PCMCIA work even in the absence of PCI irq's
- me: add irq mapping capabilities for Cyrix southbridges
- me: make IBMMCA compile right as a module
- me: uhhuh. Major atomic-PTE SMP race boo-boo. Fixed.
- Andrea Arcangeli: don't allow people to set security-conscious
  bits in mxcsr through ptrace SETFPXREGS.
- Jürgen Fischer: aha152x update
- Andrew Morton, Trond Myklebust: file locking fixes
- me: TLB invalidate race with highmem
- Paul Fulghum: synclink/n_hdlc driver updates
- David Miller: export sysctl_jiffies, and have the proper no-sysctl
  version handy
- Neil Brown: RAID driver deadlock and nfsd read access to
  execute-only files fix
- Keith Owens: clean up module information passing, remove
  "get_module_symbol()".
- Jeff Garzik: network (and other) driver fixes and cleanups
- Andrea Arcangeli: scheduler cleanup.
- Ching-Ling Li: fix ALi sound driver memory leak
- Anton Altaparmakov: upcase fix for NTFS
- Thomas Woller: CS4281 audio update




Re: [PATCH] deadlock fix

2000-11-07 Thread Linus Torvalds



On Tue, 7 Nov 2000, Gary E. Miller wrote:
 
> I see this patch did not make it into test11-pre1.  Without it
> raid1 and SMP do not work together.  Please consider for test11-pre2.

You must have a different test11-pre1 than the one I have.

It's already there in -pre1, as far as I can see.

Linus





Re: [RANT] Linux-IrDA status

2000-11-07 Thread Linus Torvalds



On Wed, 8 Nov 2000, Michael Rothwell wrote:
> Linus Torvalds wrote:
> >
> > Also, I've never seen much in the form of explanation, and at least the
> > last patch I saw just the first screenful was so off-putting that I just
> > went "Ok, I have real bugs to fix, I don't need this crap".
>
> Like what? I'm not sure what you're saying here. It seems that the people
> writing the IrDA code have gotten no feedback from you as to why their
> patch is never accepted -- could you clarify?

There's one _major_ reason why things never get accepted:

 CVS trees

I'm not fed patches. I'm force-fed big changes every once in a while. I
don't like it.

I like it even less when the very first screen of a patch is basically a
stupid change that implies that somebody calls ioctl's from interrupts.

When I get a big patch like that, where the very first screen is
bletcherous, what the hell am I supposed to do? I'm not going to waste my
time on people who cannot send multiple small and well-defined patches,
and who send me big, ugly, "non-maintained" (as far as I'm concerned)
patches.

I'm surprised Alan rants about this. He knows VERY well how I work, and is
(along with Jeff Garzik and Randy Dunlap) one of the people who are very
good at sending me 25 separate patches with explanations of what they do.

Basically, if you send me a big patch with tons of changes, how the hell
DO you expect me to answer them? Does anybody really expect me to go
through ten thousand lines of code that I do not know, and comment on it?
Obviously not, as anybody with an ounce of sense would see.

So what choice do I have? Apply them blindly?

Quite frankly, I'd rather have a few people hate me deeply than apply
stuff I don't like. If I just start blindly applying big patches, I can
avoid nasty discussions. But I'd rather have people flame me. Maybe some
day people will instead start sending me smaller commented patches.

I'm NOT going to do other peoples work for them. If people can't be
bothered to send me well-specified patches ESPECIALLY now that we're close
to 2.4.x, then I can't be bothered to apply them.

Live with it. Hate me all you like. I do not care. The thing I care about
is not letting too much crap through unchecked.

Linus




Re: [RANT] Linux-IrDA status

2000-11-07 Thread Linus Torvalds



On Wed, 8 Nov 2000, Michael Rothwell wrote:
 
> Like what? I'm not sure what you're saying here. It seems that the people
> writing the IrDA code have gotten no feedback from you as to why their
> patch is never accepted -- could you clarify?

Just to clarify.

The ONLY message from the IrDA people I've gotten during the last few
weeks has been a SINGLE email from Dag Brattli, with a 330kB patch.

The whole, full, unabridged explanation for those 330kB of patches:

> Hello Linus,
>
> Here is the latest IrDA patch for Linux-2.4.0-test10.
>
> Short summary:
>
> o Fixes IrDA in 2.4
> o Touches _no_ other files.
>
> Please apply!
>
> Best regards
>
> Dag Brattli

That's it.

ONE message during the last month. ONE huge patch. From people who should
have known about 2.4.x being pending for some time. 

10,000+ lines of diff, with _no_ effort to split it up, or explain it with
anything but

"o Fixes IrDA in 2.4"

and these people expect me to reply, sending long explanations of why I
don't like them? After they did nothing of the sort for the code they
claim should have been applied? Nada.

Get a grip. 

Linus




Re: Pentium 4 and 2.4/2.5

2000-11-08 Thread Linus Torvalds

In article [EMAIL PROTECTED],
Alan Cox  [EMAIL PROTECTED] wrote:

> Be careful with the intel patches. The ones I've seen so far tried to call the
> cpu 'if86' breaking several tools that do cpu model checking off uname. They
> didnt fix the 2GHz CPU limit, they use 'rep nop' in the locks which is
> explicitly 'undefined behaviour' for non intel processors and they use the
> TSC without checking it had one.

"rep nop" is definitely not undefined behaviour except in some older
Intel manuals. 

Do you actually know of a CPU where it doesn't work? Every single
intel-compatible CPU I know of has the rep prefixes as no-ops if they
aren't used (lock -> ILL being a later, documented, addition), and the
way the prefixes work it almost has to be that way.

As prefixes they can't be part of the instruction, because you can
legally have other prefixes in between the rep and the real instruction,
which means that any sane implementation will just set a flag when it
sees the prefix, and an instruction that doesn't care will just ignore
the flag.  So you'd almost have to do _extra_ work to make "rep nop"
fail, even if it used to be specified as "undefined". 

Standard 2.4.x will definitely be using "rep nop" unless somebody can
show me a CPU where it doesn't work (and even then I probably won't care
unless that CPU is also SMP-capable).  It's documented by intel these
days, and it works on all CPU's I've ever heard of, and it even makes
sense to me (*).

(*) Well..  More sense than _some_ instruction set extensions I've seen. 
After all, "repeat no-op" for a longer delay sounds almost logical. 
Certainly better than that IV == 15 thing, ugh ;)

Also, at least part of the reason Intel removed the TSC check was that
Linux actually seems to get the extended CPU capability flags wrong,
overwriting the _real_ capability flags which in turn caused the TSC
check on Linux to simply not work.  Peter Anvin is working on fixing
this. I suspect that Linux-2.2 has the same problem.

There's a few other minor details that need to be fixed for Pentium 4
features (aka "not very well documented errata"), and I think I have
them all except for waiting for Peter to get the capabilities flag
handling right.

So I suspect that we'll have good support for Pentium IV soon enough.. 

Linus
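
A minimal sketch of where a "rep nop" typically sits: inside the busy-wait
loop of a spinlock. The names cpu_relax, demo_spinlock_t and
demo_spin_lock/trylock/unlock are hypothetical and chosen for illustration;
this is not the kernel's spinlock implementation. It assumes a GCC-compatible
compiler (inline asm and the __sync builtins), and on non-x86 targets it
falls back to a plain compiler barrier. "rep; nop" is the same encoding
(F3 90) that Intel documents as the PAUSE instruction.

```c
#include <assert.h>

/* Hint to the CPU that we are spinning. On x86 this is "rep; nop";
 * elsewhere this sketch only emits a compiler barrier. */
static inline void cpu_relax(void)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__("rep; nop" ::: "memory");
#else
    __asm__ __volatile__("" ::: "memory");
#endif
}

/* Hypothetical lock type for this sketch: 0 = free, 1 = held. */
typedef struct {
    volatile int locked;
} demo_spinlock_t;

/* Try once; returns 1 if we took the lock, 0 if it was already held. */
int demo_spin_trylock(demo_spinlock_t *l)
{
    assert(l);
    /* __sync_lock_test_and_set returns the previous value. */
    return __sync_lock_test_and_set(&l->locked, 1) == 0;
}

void demo_spin_lock(demo_spinlock_t *l)
{
    assert(l);
    while (!demo_spin_trylock(l)) {
        /* Spin read-only until the lock looks free, relaxing the
         * CPU on every pass instead of hammering the atomic op. */
        while (l->locked)
            cpu_relax();
    }
}

void demo_spin_unlock(demo_spinlock_t *l)
{
    assert(l);
    __sync_lock_release(&l->locked);
}
```

On a CPU that treats the rep prefix as a no-op the loop behaves like an
ordinary spin; on one that implements the hint, the inner loop is where the
power and heat savings mentioned above would come from.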



Re: Pentium 4 and 2.4/2.5

2000-11-08 Thread Linus Torvalds

In article [EMAIL PROTECTED],
Alan Cox  [EMAIL PROTECTED] wrote:

> rep;nop is a magic instruction on the PIV and possibly some PIII series CPUs
> [not sure]. As far as I can make out it naps momentarily or until bus
> activity thus saving power on spinlocks.

From what I've heard, the reason Intel _really_ wants "rep nop" is that
without it the CPU will heat up quite efficiently (that's what you do
when you want to run at an eventual 2GHz with all cylinders firing all
the time), causing thermal meltdown on non-thermally protected CPU's and
CPU speed throttling on the ones that _are_ thermally protected (which
will obviously have to be all the shipping ones). 

And the thermal throttling will severely cripple performance.

> The problem is 'rep nop' is not defined on other cpus so we can only really use
> it on the PIII/PIV kernel builds

Intel retroactively defined it for all their CPU's. And I very strongly
suspect that every single other x86 CPU vendor does the same. Why not?
They get a new instruction for free, by just documenting it. Maybe they
can sell the same old chip with a new name ("The X Wonderchip. Now
with documented 'rep nop' support! Get one today!").

Linus



Re: Pentium 4 and 2.4/2.5

2000-11-08 Thread Linus Torvalds



On Wed, 8 Nov 2000, Alan Cox wrote:
> > unless that CPU is also SMP-capable).  It's documented by intel these
> > days, and it works on all CPU's I've ever heard of, and it even makes
> > sense to me (*).
>
> Do the intel docs guarantee it works on i486 and higher, if so SMP athlon
> will be the only check needed for the SMP users. You work for an x86 chip
> cloning company so if you say it works I trust you 8)

Well, we don't make low-power SMP laptops, so as such Transmeta doesn't
much care. It will work, though. And yes, as far as I know Intel made it
an "architecture feature", meaning that they claim it works on all their
ia32 chips.

Now, I could imagine that Intel would select an instruction that didn't
work on Athlon on purpose, but I really don't think they did.  I don't
have an athlon to test.

It's easy enough to generate a test-program. If the following works,
you're pretty much guaranteed that it's ok

#include <stdio.h>

int main(void)
{
	printf("Testing 'rep nop' ... ");
	asm volatile("rep ; nop");
	printf("okey-dokey\n");
	return 0;
}

(there's not much a "rep nop" _can_ do, after all - the most likely CPU
extension would be to raise an "Illegal Opcode" fault).

> > Also, at least part of the reason Intel removed the TSC check was that
> > Linux actually seems to get the extended CPU capability flags wrong,
> > overwriting the _real_ capability flags which in turn caused the TSC
> > check on Linux to simply not work.  Peter Anvin is working on fixing
> > this. I suspect that Linux-2.2 has the same problem.
>
> I've not seen incorrect TSC detection in 2.2, do you know the precise
> circumstances this occurs and I'll check over them. I've also got no
> bug reports of this failing.

It won't fail on other CPU's. The bug is, as far as I can tell, in
get_model_name(),

	cpuid(0x80000001, &dummy, &dummy, &dummy, &(c->x86_capability));

Notice how we overwrite the x86_capability state with whatever we read
from the extended register 0x80000001. So we overwrite the _real_
capabilities that we got the right way in head.S.

This is wrong. It just happens to work on other, non-Pentium IV,
processors. The extended capabilities are an _extension_, not replacement,
for the regular capabilities.

> check_config would also panic with the 'Kernel compiled for ..' message
> if it occurred.

Which is what it apparently does, if you compile for TSC. Even though very
obviously a Pentium IV _does_ have a TSC.

NOTE! I don't actually have access to a Pentium IV myself yet, although
I'm promised one soon enough. So I've only got second-hand reports on the
cpuid thing so far.

Linus
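
A minimal sketch of the fix direction described above: keep the standard
and the extended capability words in separate fields, so that reading the
extended leaf can never clobber the standard bits. All names here
(demo_cpuinfo, demo_read_capabilities, demo_fake_cpuid, DEMO_FEATURE_TSC)
are hypothetical, and this is not the kernel's actual code. The cpuid
access is passed in as a function pointer so the logic can be exercised
with a made-up CPU; demo_fake_cpuid is such a stand-in, not a description
of any real processor.

```c
#include <assert.h>
#include <stdint.h>

/* TSC is bit 4 of EDX in the standard feature leaf (leaf 1). */
#define DEMO_FEATURE_TSC (1u << 4)

typedef void (*demo_cpuid_fn)(uint32_t leaf, uint32_t *eax, uint32_t *ebx,
                              uint32_t *ecx, uint32_t *edx);

struct demo_cpuinfo {
    uint32_t capability;     /* EDX of standard leaf 0x00000001 */
    uint32_t ext_capability; /* EDX of extended leaf 0x80000001, or 0 */
};

void demo_read_capabilities(struct demo_cpuinfo *c, demo_cpuid_fn cpuid)
{
    uint32_t eax, ebx, ecx, edx;

    assert(c && cpuid);
    cpuid(0x00000001u, &eax, &ebx, &ecx, &edx);
    c->capability = edx;
    c->ext_capability = 0;

    /* Leaf 0x80000000 reports the highest extended leaf in EAX. */
    cpuid(0x80000000u, &eax, &ebx, &ecx, &edx);
    if (eax >= 0x80000001u) {
        cpuid(0x80000001u, &eax, &ebx, &ecx, &edx);
        c->ext_capability = edx; /* stored separately, never overwrites */
    }
}

int demo_has_tsc(const struct demo_cpuinfo *c)
{
    return (c->capability & DEMO_FEATURE_TSC) != 0;
}

/* Made-up CPU for illustration: it has a TSC, and its extended leaf
 * does not repeat the standard feature bits. */
void demo_fake_cpuid(uint32_t leaf, uint32_t *eax, uint32_t *ebx,
                     uint32_t *ecx, uint32_t *edx)
{
    *eax = *ebx = *ecx = *edx = 0;
    if (leaf == 0x00000001u)
        *edx = DEMO_FEATURE_TSC;
    else if (leaf == 0x80000000u)
        *eax = 0x80000001u;
    /* leaf 0x80000001: all registers stay zero */
}
```

With the overwrite pattern quoted above, the same made-up CPU would end up
with a zero capability word and fail a TSC check even though the bit was
reported in the standard leaf.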




Re: PATCH: rd - deadlock removal

2000-11-09 Thread Linus Torvalds



On Thu, 9 Nov 2000, Jens Axboe wrote:
 
> > > The second is more elegant in that it side steps the problem by
> > > giving rd.c a make_request function instead of using the default
> > > _make_request.   This means that io_request_lock is simply never
> > > claimed by rd.
>
> And this solution is much better, even given the freeze I think that
> is the way to go.

I agree, I already applied it. The second approach just makes the problem
go away, and also avoids needlessly merging the request etc. I suspect
that the lack of request-merging could also eventually be used to simplify
the driver a bit, as it now wouldn't need to worry about that issue any
more at all.

Linus




Re: [BUG] /proc/pid/stat access stalls badly for swapping process,2.4.0-test10

2000-11-09 Thread Linus Torvalds



As to the real reason for stalls on /proc/pid/stat, I bet it has nothing
to do with IO except indirectly (the IO is necessary to trigger the
problem, but the _reason_ for the problem lies elsewhere).

And it has everything to do with the fact that the way Linux semaphores
are implemented, a non-blocking process has a HUGE advantage over a
blocking one. Linux kernel semaphores are extremely unfair in that way.

What happens is that some process is getting a lot of VM faults and gets
its VM semaphore. No contention yet. It holds the semaphore over the
IO, and now another process does a "ps".

The "ps" process goes to sleep on the semaphore. So far so good.

The original process releases the semaphore, which increments the count,
and wakes up the process waiting for it. Note that it _wakes_ it, it does
not give the semaphore to it. Big difference.

The process that got woken up will run eventually. Probably not all that
immediately, because the process that woke it (and held the semaphore)
just slept on a page fault too, so it's not likely to immediately
relinquish the CPU.

The original running process comes back faulting again, finds the
semaphore still unlocked (the "ps" process is awake but has not gotten to
run yet), gets the semaphore, and falls asleep on the IO for the next
page.

The "ps" process actually gets to run now, but it's a bit late. The
semaphore is locked again. 

Repeat until luck breaks the bad circle.

(This scenario, btw, is much harder to trigger on SMP than on UP. And
it's completely separate from the issue of simple disk bandwidth issues
which can obviously cause no end of stalls on anything that needs the
disk, and which can also happen on SMP).

NOTE! If somebody wants to fix this, the fix should be reasonably simple
but needs to be quite exhaustively checked and double-checked. It's just
too easy to break the semaphores by mistake.

The way to make semaphores more fair is to NOT allow a new process to just
come in immediately and steal the semaphore in __down() if there are other
sleepers. This is most easily accomplished by something along the lines of
the following in __down() in arch/i386/kernel/semaphore.c 

	spin_lock_irq(&semaphore_lock);
	sem->sleepers++;
+
+	/*
+	 * Are there other people waiting for this?
+	 * They get to go first.
+	 */
+	if (sem->sleepers > 1)
+		goto inside;
	for (;;) {
		int sleepers = sem->sleepers;

		/*
		 * Add "everybody else" into it. They aren't
		 * playing, because we own the spinlock.
		 */
		if (!atomic_add_negative(sleepers - 1, &sem->count)) {
			sem->sleepers = 0;
			break;
		}
		sem->sleepers = 1;	/* us - see -1 above */
+inside:
		spin_unlock_irq(&semaphore_lock);
		schedule();
		tsk->state = TASK_UNINTERRUPTIBLE|TASK_EXCLUSIVE;
		spin_lock_irq(&semaphore_lock);
	}
	spin_unlock_irq(&semaphore_lock);

But note that the above is UNTESTED and also note that from a throughput
(as opposed to latency) standpoint being unfair tends to be nice.

Anybody want to try out something like the above? (And no, I'm not
applying it to my tree yet. It needs about a hundred pairs of eyes to
verify that there isn't some subtle "lost wakeup" race somewhere).

Linus




Re: [bug] usb-uhci locks up on boot half the time

2000-11-09 Thread Linus Torvalds

In article [EMAIL PROTECTED], David Ford  [EMAIL PROTECTED] wrote:

>The oddity is that kdb shows the machine to lock up on the popf in
>pci_conf_write_word()+0x2c.  I never did get around to digging up this
>routine and looking at the code, but I suspect this is a final return
>from the routine.  I'm rather confused however, I have no idea why a
>flags pop would hang the hardware.

Educated guess: it enables interrupts, after it has done something to
the hardware that causes an infinite stream of them.

Linus



test11-pre2

2000-11-09 Thread Linus Torvalds


Nothing stands out as affecting most people here. Security fix for /proc,
and various cleanups. Alpha and sparc fixes. If you use RAID or ramdisk,
upgrade. 

Linus

-

 - pre2:
- Stephen Rothwell: directory notify could return with the lock held
- Richard Henderson: CLOCKS_PER_SEC on alpha.
- Jeff Garzik: ramfs and highmem: kmap() the page to clear it
- Asit Mallick: enable the APIC in the official order
- Neil Brown: avoid rd deadlock on io_request_lock by using a
  private rd-request function. This also avoids unnecessary
  request merging at this level.
- Ben LaHaise: vmalloc threading and overflow fix
- Randy Dunlap: USB updates (plusb driver). PCI cacheline size.
- Neil Brown: fix a raid1 on top of lvm bug that crept in in pre1
- Alan Cox: various (Athlon mmx copy, NULL ptr checks for
  scsi_register etc). 
- Al Viro: fix /proc permission check security hole.
- Can-Ru Yeou: SiS301 fbcon driver
- Andrew Morton: NMI oopser and kernel page fault punch through
  both console_lock and timerlist_lock to make sure it prints out..
- Jeff Garzik: clean up "kmap()" return type (it returns a kernel
  virtual address, ie a "void *").
- Jeff Garzik: network driver docs, various one-liners.
- David Miller: add generic "special" flag to page flags, to be
  used by architectures as they see fit. Like keeping track of
  cache coherency issues.
- David Miller: sparc64 updates, make sparc32 boot again
- Davdi Millner: spel "synchronous" correctly
- David Miller: networking - fix some bridge issues, and correct
  IPv6 sysctl entries.
- Dan Aloni: make fork.c use proper macro rather than doing
  get_exec_domain() by hand. 

 - pre1:
- me: make PCMCIA work even in the absence of PCI irq's
- me: add irq mapping capabilities for Cyrix southbridges
- me: make IBMMCA compile right as a module
- me: uhhuh. Major atomic-PTE SMP race boo-boo. Fixed.
- Andrea Arcangeli: don't allow people to set security-conscious
  bits in mxcsr through ptrace SETFPXREGS.
- Jürgen Fischer: aha152x update
- Andrew Morton, Trond Myklebust: file locking fixes
- me: TLB invalidate race with highmem
- Paul Fulghum: synclink/n_hdlc driver updates
- David Miller: export sysctl_jiffies, and have the proper no-sysctl
  version handy
- Neil Brown: RAID driver deadlock and nfsd read access to
  execute-only files fix
- Keith Owens: clean up module information passing, remove
  "get_module_symbol()".
- Jeff Garzik: network (and other) driver fixes and cleanups
- Andrea Arcangeli: scheduler cleanup.
- Ching-Ling Li: fix ALi sound driver memory leak
- Anton Altaparmakov: upcase fix for NTFS
- Thomas Woller: CS4281 audio update




Re: [BUG] /proc/<pid>/stat access stalls badly for swapping process, 2.4.0-test10

2000-11-10 Thread Linus Torvalds

In article [EMAIL PROTECTED],
Mike Galbraith  [EMAIL PROTECTED] wrote:
 
>> (This scenario, btw, is much harder to trigger on SMP than on UP. And
>> it's completely separate from the issue of simple disk bandwidth issues
>> which can obviously cause no end of stalls on anything that needs the
>> disk, and which can also happen on SMP).
>
>Unfortunately, it didn't help in the scenario I'm running.
>
>time make -j30 bzImage:
>
>real    14m19.987s  (within stock variance)
>user    6m24.480s
>sys     1m12.970s

Note that the above kind of "throughput performance" should not have been
affected, and was not what I was worried about. 

>procs  memory  swap  io  system  cpu
> r  b  w   swpd   free   buff  cache  si  so  bi  bo   in   cs  us  sy  id
>31  2  1 12   1432   4440  12660   0  1227   151  202   848  89  11   0
>34  4  1   1908   2584536   5376 248 1904   602   763  785  4094  63  32  5
>13 19  1  64140  67728604  33784 106500 84612 43625 21683 19080 52168  28  22  50

Looks like there was a big delay in vmstat there - that could easily be
due to simple disk throughput issues..

Does it feel any different under the original load that got the original
complaint? The patch may have just been buggy and ineffective, for all I
know. 

Linus



Re: [BUG] /proc/<pid>/stat access stalls badly for swapping process, 2.4.0-test10

2000-11-10 Thread Linus Torvalds

In article [EMAIL PROTECTED],
David Mansfield  [EMAIL PROTECTED] wrote:
>Linus Torvalds wrote:
>...
>> 
>> And it has everything to do with the fact that the way Linux semaphores
>> are implemented, a non-blocking process has a HUGE advantage over a
>> blocking one. Linux kernel semaphores are extremely unfair in that way.
>
>...
>> The original running process comes back faulting again, finds the
>> semaphore still unlocked (the "ps" process is awake but has not gotten to
>> run yet), gets the semaphore, and falls asleep on the IO for the next
>> page.
>> 
>> The "ps" process actually gets to run now, but it's a bit late. The
>> semaphore is locked again.
>> 
>> Repeat until luck breaks the bad circle.
>> 
>
>But doesn't __down have a fast path coded in assembly?  In other words,
>it only hits your patched code if there is already contention, which
>there isn't in this case, and therefore the bug...?

The __down() case should be hit if there's a waiter, even if that waiter
has not yet been able to pick up the lock (the waiter _will_ have
decremented the count to negative in order to trigger the proper logic
at release time).

But as I mentioned, the pseudo-patch was certainly untested, so
somebody should probably walk through the cases to check that I didn't
miss something.

Linus



Re: sendfile(2) fails for devices?

2000-11-11 Thread Linus Torvalds

In article [EMAIL PROTECTED],
Jeff Garzik  [EMAIL PROTECTED] wrote:
>sendfile(2) fails with -EINVAL every time I try to read from a device
>file.
>
>This sounds like a bug... is it?  (the man page doesn't mention such a
>restriction)

sendfile() on purpose only works on things that use the page cache. 
EINVAL is basically sendfile's way of saying "I would fall back on doing
a read+write, so you might as well do it yourself in user space because
it might actually be more efficient that way". 

>I am using kernel 2.4.0-test11-pre2.  All other tests with sendfile(2)
>succeed:  file->file, file->STDOUT, STDIN->file...

Yes, as long as STDIN is a file ;)

sendfile() wants the source to be in the page cache, because the whole
point of sendfile() was to avoid a copy. 
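
For what it's worth, the "do it yourself in user space" fallback could
look roughly like the sketch below. It assumes a Linux system with
<sys/sendfile.h>; copy_fd() and write_all() are made-up helper names,
and treating EINVAL and ENOSYS as the "fall back" signals is a choice
made for this example.

```c
#include <errno.h>
#include <sys/sendfile.h>
#include <sys/types.h>
#include <unistd.h>

/* Write all of buf, retrying on short writes. Returns 0 or -1. */
static int write_all(int fd, const char *buf, size_t len)
{
	while (len > 0) {
		ssize_t w = write(fd, buf, len);

		if (w < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		buf += w;
		len -= (size_t)w;
	}
	return 0;
}

/*
 * Copy up to count bytes from in_fd to out_fd. Tries sendfile() first and
 * switches to read()+write() if the kernel refuses with EINVAL or ENOSYS.
 * Returns the number of bytes copied, or -1 on error.
 */
static ssize_t copy_fd(int in_fd, int out_fd, size_t count)
{
	char buf[4096];
	size_t done = 0;
	int use_sendfile = 1;

	while (done < count) {
		size_t want = count - done;
		ssize_t n;

		if (use_sendfile) {
			n = sendfile(out_fd, in_fd, NULL, want);
			if (n < 0 && (errno == EINVAL || errno == ENOSYS)) {
				use_sendfile = 0;	/* source not supported: copy by hand */
				continue;
			}
		} else {
			if (want > sizeof(buf))
				want = sizeof(buf);
			n = read(in_fd, buf, want);
			if (n > 0 && write_all(out_fd, buf, (size_t)n) < 0)
				return -1;
		}
		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (n == 0)
			break;	/* end of input */
		done += (size_t)n;
	}
	return (ssize_t)done;
}
```

The caller does not need to know in advance whether the source is
backed by the page cache; the first refused sendfile() call decides.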

The current device model does _not_ use the page cache. Now, arguably
that's a bug - it also means that you cannot mmap() a block device - but
as it could be easily documented (maybe it is, somewhere), I'll call it
a bad feature for now.

Now, if you want to add the code to do address spaces for block devices,
I wouldn't be all that unhappy.  I've wanted to see it for a while.  I'm
not likely to apply it for 2.4.x any more, but I'd love to have it early
for 2.5.x. 

Linus



Re: [patch] patch-2.4.0-test10-irda24 (resend)

2000-11-11 Thread Linus Torvalds



On Sun, 12 Nov 2000, Dag Brattli wrote:

 (resending in case it got lost, didn't show up on linux-kernel)

Didn't get lost, but I think the linux-kernel size filter killed it from
the kernel list.

Everything applied. Thanks,

Linus




Re: [patch] wakeup_bdflush related fixes and nfsd optimizations for test10

2000-11-11 Thread Linus Torvalds



On Sat, 11 Nov 2000, Ying Chen/Almaden/IBM wrote:
 
 This patch includes two sets of things against test10:
 First, there are several places where schedule() is called after
 wakeup_bdflush(1) is called. This is completely unnecessary

Fair enough.

 Second, (I have posted this to the kernel mailing list, but I forgot to cc
 to Linus.) I made some optimizations on racache in nfsd in test10.

..but this would need a lot more testing/feedback, especially from the nfs
client maintainers (I see that Neil Brown did some querying already, I
think more is in order). 

Also, I'd _really_ like those lists to be real linux/list.h lists
instead of duplicating code.

Linus




Re: The IrDA patches !!! (was Re: [RANT] Linux-IrDA status)

2000-11-11 Thread Linus Torvalds



Ok, thanks to the work of Jean, everything seems to be applied now.

I'll make a test3 one of these days (probably tomorrow), please verify
that everything looks happy.

Linus




Re: [PATCH] show_task() and thread_saved_pc() fix for x86

2000-11-11 Thread Linus Torvalds



On Fri, 10 Nov 2000, Alexander Viro wrote:
 diff -urN rc11-2/include/asm-i386/processor.h rc11-2-show_task/include/asm-i386/processor.h
 --- rc11-2/include/asm-i386/processor.h   Fri Nov 10 09:14:04 2000
 +++ rc11-2-show_task/include/asm-i386/processor.h Fri Nov 10 16:08:15 2000
 @@ -412,7 +412,7 @@
    */
   extern inline unsigned long thread_saved_pc(struct thread_struct *t)
   {
 - return ((unsigned long *)t->esp)[3];
 + return ((unsigned long **)t->esp)[0][1];
   }

The above needs to get verified: it should be something like

	unsigned long *ebp = *((unsigned long **)t->esp);

	if ((void *) ebp < (void *) t)
		return 0;
	if ((void *) ebp >= (void *) t + 2*PAGE_SIZE)
		return 0;
	if (3 & (unsigned long)ebp)
		return 0;
	return *ebp;

because otherwise I guarantee that we'll eventually have a bug with an
invalid pointer reference in the debugging code and that would be bad.
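
The same three checks (lower bound, upper bound, alignment) can be tried
out in user space. The sketch below is not kernel code: the function
name is invented, the "stack" is any caller-supplied buffer, and the
upper bound is slightly stricter than the one above because it requires
the whole word to fit inside the region.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Return the word stored at fp, or 0 if fp is not a plausibly valid,
 * word-aligned pointer inside [base, base + size).
 */
static unsigned long checked_frame_word(const void *base, size_t size,
					const void *fp)
{
	uintptr_t lo = (uintptr_t)base;
	uintptr_t p = (uintptr_t)fp;

	if (p < lo)
		return 0;	/* below the region */
	if (size < sizeof(unsigned long) ||
	    p - lo > size - sizeof(unsigned long))
		return 0;	/* word would run past the end */
	if (p & (sizeof(unsigned long) - 1))
		return 0;	/* misaligned */
	return *(const unsigned long *)fp;
}
```

Returning 0 for anything suspicious means the debugging output may show
a bogus PC, but it can never fault on a garbage pointer.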

Linus




2.4.0-test11-pre3

2000-11-11 Thread Linus Torvalds


Drivers, drivers, drivers. IrDA and ISDN. PPC.

The most interesting part is probably the exclusive wait-queue patch.
David Miller noticed that exclusivity doesn't nest correctly the way we
used to do it: being on multiple wait-queues would potentially cause lost
wake-up events if a non-exclusive waiter got mistaken for an exclusive one
because the exclusive bit was a per-process thing.

Moving the exclusivity bit from the process into the wait-queue cleaned up
the interfaces and also made it nest properly.

No known uses were actually buggy, but at least one case was apparently ok
only by pure luck. 
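
As a rough illustration of why the bit belongs in the wait-queue entry,
here is a small user-space model. It is only a sketch under assumptions:
the struct layout, the flag value and wake_up_queue() are written for
this example and are not copied from the kernel sources.

```c
#include <stddef.h>

#define WQ_FLAG_EXCLUSIVE 0x01

struct task {
	int woken;
};

struct wait_entry {
	unsigned int flags;	/* exclusivity is per queue entry, not per task */
	struct task *task;
};

/*
 * Wake every waiter in order, stopping once nr_exclusive exclusive
 * waiters have been woken. Returns the number of wakeups.
 */
static int wake_up_queue(struct wait_entry *q, size_t n, int nr_exclusive)
{
	int woken = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		q[i].task->woken = 1;
		woken++;
		if ((q[i].flags & WQ_FLAG_EXCLUSIVE) && --nr_exclusive == 0)
			break;
	}
	return woken;
}
```

A task that waits exclusively on one queue and non-exclusively on
another simply has two entries with different flags, so a wakeup on the
second queue does not stop early and the waiters behind it are not lost.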

Linus

-
 - pre3:
- James Simmons: vgacon "printk()" deadlock with global irq lock.
- don't poke blanked console on console output
- Ching-Ling: get channels right on ALI audio driver
- Dag Brattli and Jean Tourrilhes: big IrDA update
- Paul Mackerras: PPC updates
- Randy Dunlap: USB ID table support, LEDs with usbkbd, belkin
  serial converter. 
- Jeff Garzik: pcnet32 and lance net driver fix/cleanup
- Mikael Pettersson: clean up x86 ELF_PLATFORM
- Bartlomiej Zolnierkiewicz: sound and drm driver init fixes and
  cleanups
- Al Viro: Jeff missed some kmap()'s. sysctl cleanup
- Kai Germaschewski: ISDN updates
- Alan Cox: SCSI driver NULL ptr checks
- David Miller: networking updates, exclusive waitqueues nest properly,
  SMP i_shared_lock/page_table_lock lock order fix.

 - pre2:
- Stephen Rothwell: directory notify could return with the lock held
- Richard Henderson: CLOCKS_PER_SEC on alpha.
- Jeff Garzik: ramfs and highmem: kmap() the page to clear it
- Asit Mallick: enable the APIC in the official order
- Neil Brown: avoid rd deadlock on io_request_lock by using a
  private rd-request function. This also avoids unnecessary
  request merging at this level.
- Ben LaHaise: vmalloc threading and overflow fix
- Randy Dunlap: USB updates (plusb driver). PCI cacheline size.
- Neil Brown: fix a raid1 on top of lvm bug that crept in in pre1
- Alan Cox: various (Athlon mmx copy, NULL ptr checks for
  scsi_register etc). 
- Al Viro: fix /proc permission check security hole.
- Can-Ru Yeou: SiS301 fbcon driver
- Andrew Morton: NMI oopser and kernel page fault punch through
  both console_lock and timerlist_lock to make sure it prints out..
- Jeff Garzik: clean up "kmap()" return type (it returns a kernel
  virtual address, ie a "void *").
- Jeff Garzik: network driver docs, various one-liners.
- David Miller: add generic "special" flag to page flags, to be
  used by architectures as they see fit. Like keeping track of
  cache coherency issues.
- David Miller: sparc64 updates, make sparc32 boot again
- Davdi Millner: spel "synchronous" correctly
- David Miller: networking - fix some bridge issues, and correct
  IPv6 sysctl entries.
- Dan Aloni: make fork.c use proper macro rather than doing
  get_exec_domain() by hand. 

 - pre1:
- me: make PCMCIA work even in the absence of PCI irq's
- me: add irq mapping capabilities for Cyrix southbridges
- me: make IBMMCA compile right as a module
- me: uhhuh. Major atomic-PTE SMP race boo-boo. Fixed.
- Andrea Arcangeli: don't allow people to set security-conscious
  bits in mxcsr through ptrace SETFPXREGS.
- Jürgen Fischer: aha152x update
- Andrew Morton, Trond Myklebust: file locking fixes
- me: TLB invalidate race with highmem
- Paul Fulghum: synclink/n_hdlc driver updates
- David Miller: export sysctl_jiffies, and have the proper no-sysctl
  version handy
- Neil Brown: RAID driver deadlock and nfsd read access to
  execute-only files fix
- Keith Owens: clean up module information passing, remove
  "get_module_symbol()".
- Jeff Garzik: network (and other) driver fixes and cleanups
- Andrea Arcangeli: scheduler cleanup.
- Ching-Ling Li: fix ALi sound driver memory leak
- Anton Altaparmakov: upcase fix for NTFS
- Thomas Woller: CS4281 audio update




test11-pre5

2000-11-14 Thread Linus Torvalds


More drivers.

The x86 capabilities cleanup is here.

Linus



 - pre5:
- Rasmus Andersen: add proper "linux/init.h" for sound drivers
- David Miller: sparc64 and networking updates
- David Trcka: MOXA numbering starts from 0, not 1.
- Jeff Garzik: sysctl.h standalone
- Dag Brattli: IrDA finishing touches
- Randy Dunlap: USB fixes
- Gerd Knorr: big bttv update
- Peter Anvin: x86 capabilities cleanup
- Stephen Rothwell: apm initcall fix - smp poweroff should work
- Andrew Morton: setscheduler() spinlock ordering fix
- Stephen Rothwell: directory notification documentation
- Petr Vandrovec: ncpfs capabilities check cleanup
- David Woodhouse: fix jffs to use generic is() library
- Chris Swiedler: oom_kill selection fix
- Jens Axboe: re-merge after sleeping in ll_rw_block.
- Randy Dunlap: USB updates (pegasus and ftdi_sio)
- Kai Germaschewski: ISDN ppp header compression fixed

 - pre4:
- Andrea Arcangeli: SMP scheduler memory barrier fixup
- Richard Henderson: fix alpha semaphores and spinlock bugs.
- Richard Henderson: clean up the file from hell: "xor.c" 

 - pre3:
- James Simmons: vgacon "printk()" deadlock with global irq lock.
- don't poke blanked console on console output
- Ching-Ling: get channels right on ALI audio driver
- Dag Brattli and Jean Tourrilhes: big IrDA update
- Paul Mackerras: PPC updates
- Randy Dunlap: USB ID table support, LEDs with usbkbd, belkin
  serial converter. 
- Jeff Garzik: pcnet32 and lance net driver fix/cleanup
- Mikael Pettersson: clean up x86 ELF_PLATFORM
- Bartlomiej Zolnierkiewicz: sound and drm driver init fixes and
  cleanups
- Al Viro: Jeff missed some kmap()'s. sysctl cleanup
- Kai Germaschewski: ISDN updates
- Alan Cox: SCSI driver NULL ptr checks
- David Miller: networking updates, exclusive waitqueues nest properly,
  SMP i_shared_lock/page_table_lock lock order fix.

 - pre2:
- Stephen Rothwell: directory notify could return with the lock held
- Richard Henderson: CLOCKS_PER_SEC on alpha.
- Jeff Garzik: ramfs and highmem: kmap() the page to clear it
- Asit Mallick: enable the APIC in the official order
- Neil Brown: avoid rd deadlock on io_request_lock by using a
  private rd-request function. This also avoids unnecessary
  request merging at this level.
- Ben LaHaise: vmalloc threading and overflow fix
- Randy Dunlap: USB updates (plusb driver). PCI cacheline size.
- Neil Brown: fix a raid1 on top of lvm bug that crept in in pre1
- Alan Cox: various (Athlon mmx copy, NULL ptr checks for
  scsi_register etc). 
- Al Viro: fix /proc permission check security hole.
- Can-Ru Yeou: SiS301 fbcon driver
- Andrew Morton: NMI oopser and kernel page fault punch through
  both console_lock and timerlist_lock to make sure it prints out..
- Jeff Garzik: clean up "kmap()" return type (it returns a kernel
  virtual address, ie a "void *").
- Jeff Garzik: network driver docs, various one-liners.
- David Miller: add generic "special" flag to page flags, to be
  used by architectures as they see fit. Like keeping track of
  cache coherency issues.
- David Miller: sparc64 updates, make sparc32 boot again
- Davdi Millner: spel "synchronous" correctly
- David Miller: networking - fix some bridge issues, and correct
  IPv6 sysctl entries.
- Dan Aloni: make fork.c use proper macro rather than doing
  get_exec_domain() by hand. 

 - pre1:
- me: make PCMCIA work even in the absence of PCI irq's
- me: add irq mapping capabilities for Cyrix southbridges
- me: make IBMMCA compile right as a module
- me: uhhuh. Major atomic-PTE SMP race boo-boo. Fixed.
- Andrea Arcangeli: don't allow people to set security-conscious
  bits in mxcsr through ptrace SETFPXREGS.
- Jürgen Fischer: aha152x update
- Andrew Morton, Trond Myklebust: file locking fixes
- me: TLB invalidate race with highmem
- Paul Fulghum: synclink/n_hdlc driver updates
- David Miller: export sysctl_jiffies, and have the proper no-sysctl
  version handy
- Neil Brown: RAID driver deadlock and nfsd read access to
  execute-only files fix
- Keith Owens: clean up module information passing, remove
  "get_module_symbol()".
- Jeff Garzik: network (and other) driver fixes and cleanups
- Andrea Arcangeli: scheduler cleanup.
- Ching-Ling Li: fix ALi sound driver memory leak
- Anton Altaparmakov: upcase fix for NTFS
- Thomas Woller: CS4281 audio update




Re: [PATCH] Re: test11-pre5

2000-11-14 Thread Linus Torvalds



On Wed, 15 Nov 2000, Dan Aloni wrote:

 summary: dev_3c501.name shouldn't be NULL, or we get oops

Note that these days "name" is not a pointer at all, but an array, and as
such cannot be NULL any more. Not initializing it will just cause it to be
empty (ie is the same as initializing it to "").
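
A small stand-alone example of the array case, as a sketch only: struct
demo_device and its fields are invented for illustration and are not the
real network device structure.

```c
#include <string.h>

#define DEMO_NAMSIZ 16

struct demo_device {
	char name[DEMO_NAMSIZ];	/* array member: storage lives inside the struct */
	int irq;
};

/* No initializer for name: a static object gets it zero-filled, i.e. "". */
static struct demo_device demo_dev = { .irq = 5 };

/* Safe even for an unnamed device: dev->name is always a valid string. */
static int name_has_format(const struct demo_device *dev)
{
	return strchr(dev->name, '%') != NULL;
}
```

With a pointer member the same strchr() call would dereference NULL for
an uninitialized name; with the array it just scans an empty string.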

Linus




Re: [PATCH] Re: test11-pre5

2000-11-14 Thread Linus Torvalds

In article [EMAIL PROTECTED],
Dan Aloni  [EMAIL PROTECTED] wrote:
>On Tue, 14 Nov 2000, Jeff Garzik wrote:
>
>> Dan Aloni wrote:
>> > 
>> > reason: Correct me if I'm wrong, but 3c501.c:init_module() calls
>> > net_init.c:register_netdev(dev_3c501), which calls strchr(),
>> > {and might also,which might} dereference dev_3c501.name.
>> 
>> There is no dereferencing involved, and therefore no problem.
>
>Well, at least I was alertive. Almost a bug fix ;-)
>Is there a special reason why dev->name is not a pointer?

It used to be.

And we used to have an incredible number of bugs with initialization and
with creating these things dynamically. A lot of Space.c was due to
horrible hackery with getting the static allocation right for these
things. Turning it into a plain array got rid of all the hackery, and
saved memory anyway.

Linus



Re: Memory management bug

2000-11-15 Thread Linus Torvalds

In article [EMAIL PROTECTED],
>After some trickery with some special hardware feature (storage
>keys) I found out that empty_bad_pmd_table and empty_bad_pte_table have
>been put to the page table quicklists multiple(!) times.

This is definitely bad, and means that something else really bad is
going on.

In fact, I have this fairly strong suspicion that we should just get rid
of the "bad" page tables altogether, and make the stuff that now uses
them BUG() instead. 

The whole concept of "bad" page tables comes from very early on in
Linux, when the way the page fault handler worked was that if it ran out
of memory or something else really bad happened, it would insert a dummy
page table entry that was guaranteed to let the CPU continue.  That way
the page fault handler was always "successful" from a hardware
standpoint, even if it ended up trying to kill the process. 

This used to be required simply because a page fault in kernel space
originally needed to let the process unwind sanely and cleanly.

These days, the requirement that page faults always "succeed" is long
long gone. The exception handling mechanism handles the cases where we
validly can take a page fault, and in other cases we will just kill the
process outright. As such, the bad page tables should no longer be
needed, and are apparently just hiding some nasty bugs.

What happens if you just replace all places that would use a bad page
table with a BUG()? (Ie do _not_ add the bug to the place where you
added the test: by that time it's too late.  I'm talking about the
places where the bad page tables are used, like in the error cases of
"get_pte_kernel_slow()" etc.)

Linus



Re: BUG: isofs broken (2.2 and 2.4)

2000-11-15 Thread Linus Torvalds



On Thu, 16 Nov 2000, Andries Brouwer wrote:
 
 Has there been a kernel version that could read these?
 It looks like it proclaims blocksize 512 and uses blocksize 2048 or so.

The (de_len == 0) check in do_isofs_readdir() seems to imply that the
blocksize is always 2048. So at the very least something is inconsistent.
We use ISOFS_BUFFER_SIZE(inode) (512 in this case) for some sector sizes,
and then ISOFS_BLOCK_SIZE (2048) for others. 

But the way isofs_bmap() works, we need to work with
ISOFS_BUFFER_SIZE(inode). And I don't know if directories are always
_aligned_ at 2048 bytes even if they should be blocked at 2k.

Looking at the isofs lookup() logic, it will actually handle split
entries, instead of complaining about them. And I suspect readdir() did
too at some point, and the code was just removed (probably due to
excessive confusion) when one of the many readdir() reorganizations was
done. 

readdir() probably worked a long time ago.

Is the thing documented somewhere? It looks like we should just allow
entries that are split and not complain about them. We have the temporary
buffer for it already..

Linus




Re: BUG: isofs broken (2.2 and 2.4)

2000-11-15 Thread Linus Torvalds



Does this patch fix it for you?

Warning: TOTALLY UNTESTED!!! Please test carefully.

Also, I'd be interested to know whether somebody really knows if the zero
length handling is correct. Should we really round up to 2048, or should
we perhaps round up only to the next bufsize?
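
To make the two alternatives concrete, here is the rounding on its own.
The helper name is invented for this sketch; it assumes the block size
is a power of two, as both 2048 and a 512-byte bufsize are.

```c
/* Advance pos to the start of the next block; blksize must be a power of two. */
static unsigned long next_block_start(unsigned long pos, unsigned long blksize)
{
	return (pos & ~(blksize - 1)) + blksize;
}
```

For a zero-length entry at offset 600, rounding to 2048 skips to 2048,
while rounding to a 512-byte bufsize would only skip to 1024 and look at
three more buffers before reaching the same place.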

Linus

-
--- v2.4.0-test10/linux/fs/isofs/dir.c	Fri Aug 11 14:29:01 2000
+++ linux/fs/isofs/dir.c	Wed Nov 15 17:14:26 2000
@@ -94,6 +94,14 @@
 	return retnamlen;
 }
 
+static struct buffer_head *isofs_bread(struct inode *inode, unsigned int bufsize, unsigned int block)
+{
+	unsigned int blknr = isofs_bmap(inode, block);
+	if (!blknr)
+		return NULL;
+	return bread(inode->i_dev, blknr, bufsize);
+}
+
 /*
  * This should _really_ be cleaned up some day..
  */
@@ -105,7 +113,7 @@
 	unsigned char bufbits = ISOFS_BUFFER_BITS(inode);
 	unsigned int block, offset;
 	int inode_number = 0;	/* Quiet GCC */
-	struct buffer_head *bh;
+	struct buffer_head *bh = NULL;
 	int len;
 	int map;
 	int high_sierra;
@@ -117,46 +125,25 @@
 		return 0;
 
 	offset = filp->f_pos & (bufsize - 1);
-	block = isofs_bmap(inode, filp->f_pos >> bufbits);
+	block = filp->f_pos >> bufbits;
 	high_sierra = inode->i_sb->u.isofs_sb.s_high_sierra;
 
-	if (!block)
-		return 0;
-
-	if (!(bh = breada(inode->i_dev, block, bufsize, filp->f_pos, inode->i_size)))
-		return 0;
-
 	while (filp->f_pos < inode->i_size) {
 		int de_len;
-#ifdef DEBUG
-		printk("Block, offset, f_pos: %x %x %x\n",
-		       block, offset, filp->f_pos);
-		printk("inode->i_size = %x\n",inode->i_size);
-#endif
-		/* Next directory_record on next CDROM sector */
-		if (offset >= bufsize) {
-#ifdef DEBUG
-			printk("offset >= bufsize\n");
-#endif
-			brelse(bh);
-			offset = 0;
-			block = isofs_bmap(inode, (filp->f_pos) >> bufbits);
-			if (!block)
-				return 0;
-			bh = breada(inode->i_dev, block, bufsize, filp->f_pos, inode->i_size);
+
+		if (!bh) {
+			bh = isofs_bread(inode, bufsize, block);
 			if (!bh)
 				return 0;
-			continue;
 		}
 
 		de = (struct iso_directory_record *) (bh->b_data + offset);
-		if(first_de) inode_number = (block << bufbits) + (offset & (bufsize - 1));
+		if (first_de) inode_number = (block << bufbits) + (offset & (bufsize - 1));
 
 		de_len = *(unsigned char *) de;
 #ifdef DEBUG
 		printk("de_len = %d\n", de_len);
-#endif
-
+#endif
 
 		/* If the length byte is zero, we should move on to the next
 		   CDROM sector.  If we are at the end of the directory, we
@@ -164,36 +151,36 @@
 
 		if (de_len == 0) {
 			brelse(bh);
-			filp->f_pos = ((filp->f_pos & ~(ISOFS_BLOCK_SIZE - 1))
-				       + ISOFS_BLOCK_SIZE);
+			bh = NULL;
+			filp->f_pos = ((filp->f_pos & ~(ISOFS_BLOCK_SIZE - 1)) + ISOFS_BLOCK_SIZE);
+			block = filp->f_pos >> bufbits;
 			offset = 0;
-
-			if (filp->f_pos >= inode->i_size)
-				return 0;
-
-			block = isofs_bmap(inode, (filp->f_pos) >> bufbits);
-			if (!block)
-				return 0;
-			bh = breada(inode->i_dev, block, bufsize, filp->f_pos, inode->i_size);
-			if (!bh)
-				return 0;
 			continue;
 		}
 
-		offset +=  de_len;
+		offset += de_len;
+		if (offset == bufsize) {
+			offset = 0;
+			block++;
+			brelse(bh);
+			bh = NULL;
+		}
+
+		/* Make sure we have a full directory entry */
 		if (offset > bufsize) {
-			/*
-			 * This would only normally happen if we had
-			 * a buggy cdrom image.  All directory
-			 * entries should terminate with a null size
-			 * or end exactly at the end of the sector.
-			 */
-			printk("next_offset (%x) > bufsize (%lx)\n",
-			       offset,bufsize);
-			break;
+			int slop = bufsize - offset + de_len;
+			memcpy(tmpde, de, slop);
+			offset &= bufsize - 1;
+

Re: BUG: isofs broken (2.2 and 2.4)

2000-11-15 Thread Linus Torvalds



On Wed, 15 Nov 2000, Linus Torvalds wrote:
 
 Does this patch fix it for you?
 
 Warning: TOTALLY UNTESTED!!! Please test carefully.

Ok, I tested it with the broken image.

It looks like "readdir()" is ok now (but not really knowing what the right
output should be I cannot guarantee that). HOWEVER, doing an "ls -l" on
some of the files gets ENOENT, implying that "lookup()" still has some
problems with the image.

I suspect the code to handle split entries in isofs_find_entry() has some
simple bug, but I'm too lazy to check it out right now. Anybody else
willing to finish this one off?

Linus




Re: BUG: isofs broken (2.2 and 2.4)

2000-11-15 Thread Linus Torvalds



On Thu, 16 Nov 2000 [EMAIL PROTECTED] wrote:
 
 If noone else does, I suppose I can.

Thanks.

 
 ( .. gets ENOENT ..
 and that is not because it only is a partial image?)

I don't think so, but I obviously have no way of actually confirming my
suspicion.

If the stat information was wrong due to the partial image, the lookup
should still have succeeded (the directory entries certainly were there -
otherwise they'd not have shown up in readdir), and we would just have
gotten garbage inode information etc. I think.

Linus




Re: [BUG] Inconsistent behaviour of rmdir

2000-11-16 Thread Linus Torvalds



On Thu, 16 Nov 2000, Jean-Marc Saffroy wrote:
 
 As you see, it looks like the rmdir fails simply because the dir name ends
 with a dot !! This is confirmed by sys_rmdir in fs/namei.c, around line
 1384 :
 
 switch(nd.last_type) {
 case LAST_DOTDOT:
 error = -ENOTEMPTY;
 goto exit1;
 case LAST_ROOT: case LAST_DOT:
 error = -EBUSY;
 goto exit1;
 }
 
 Should we rip off the offending "case LAST_DOT" ? Or do we need a smarter
 patch ? Is it really a problem that a process has its current directory
 deleted ? How about the root ?

The cwd is not the problem. The '.' is.

The reason for that check is that allowing "rmdir(".")" confuses a lot of
UNIX programs, because it wasn't traditionally allowed.

 The man page for rmdir(2) should be updated as well, the current one
 states :
EBUSY  pathname is the current working directory  or  root
   directory of some process.

That's definitely wrong. You can do 

rmdir `pwd`

and that's fine (not all filesystems will let you do that, but that's a
low-level filesystem issue). It's really only the special names "." and
".." that cannot be removed.

Linus




Re: Memory management bug

2000-11-16 Thread Linus Torvalds



On Thu, 16 Nov 2000 [EMAIL PROTECTED] wrote:
 
> Ok, the BUG() hit in get_pmd_slow:
> 
> pmd_t *
> get_pmd_slow(pgd_t *pgd, unsigned long offset)
> {
>         pmd_t *pmd;
>         int i;
> 
>         pmd = (pmd_t *) __get_free_pages(GFP_KERNEL,2);

You really need 4 pages?

There's no way to reliably get 4 consecutive pages when you're even close
to being low on memory. I would suggest just failing with a NULL return
here.

What is the architecture setup for this machine? I have no clue about
S/390 memory management. Maybe you can modify the pmd layout?

One potential fix for this is to just make the page size bigger. Make
"Linux pages" be _two_ hardware pages, and make a Linux pte contain two
"hardware pte's". That way the pmd would be an order-1 allocation instead
of an order-2 one. Which is statistically _much_ more likely to be around
(exponential distribution).
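
To put rough numbers on the order-1 versus order-2 argument, here is a small sketch. The independence model in it is an assumption made purely for illustration (real free-page patterns are not independent), and both function names are made up.

```c
#include <assert.h>

/* Smallest order such that (1 << order) pages cover `pages`. */
static unsigned int order_for_pages(unsigned long pages)
{
	unsigned int order = 0;

	assert(pages > 0);
	while ((1UL << order) < pages)
		order++;
	return order;
}

/*
 * Toy model (an assumption, not a measurement): every page is free
 * independently with probability p_free.  One particular naturally
 * aligned block of 2^order pages is then entirely free with
 * probability p_free^(2^order).
 */
static double aligned_block_free_prob(double p_free, unsigned int order)
{
	double prob = 1.0;
	unsigned long i;

	for (i = 0; i < (1UL << order); i++)
		prob *= p_free;
	return prob;
}
```

Under this model, with one page in ten free, a given aligned pair is free with probability 0.1^2 and a given aligned group of four with probability 0.1^4, which is the kind of gap the order-1 suggestion is trying to exploit.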

Linus





Re: Memory management bug

2000-11-16 Thread Linus Torvalds



On Thu, 16 Nov 2000, Andrea Arcangeli wrote:
 
> If they absolutely need 4 pages for pmd pagetables due to hardware constraints
> I'd recommend to use _four_ hardware pages for each softpage, not two.

Yes.

However, it definitely is an issue of making trade-offs. Most 64-bit MMU
models tend to have some flexibility in how you set up the page tables,
and it may be possible to just move bits around too (ie making both the
pmd and the pgd twice as large, and getting the expansion of 4 by doing
two expand-by-two's, for example, if the hardware has support for doing
things like that).

Linus




Re: PATCH: 8139too kernel thread

2000-11-16 Thread Linus Torvalds



On Thu, 16 Nov 2000, Alexander Viro wrote:
 
> On Thu, 16 Nov 2000, Alan Cox wrote:
> 
> > > The only disadvantage to this scheme is the added cost of a kernel
> > > thread over a kernel timer.  I think this is an ok cost, because this
> > > is a low-impact thread that sleeps a lot..
> > 
> > 8K of memory, two tlb flushes, cache misses on the scheduler. The price is
>                  ^^^
> > actually extremely high.
> 
> <confused>
> Does it really need non-lazy TLB?

If Alan wants to back-port it into 2.2.x, the lazy tlb won't work.

But yes, on 2.4.x the cost of threads is fairly low. The biggest cost by
far is probably the locking needed for the scheduler etc, and there the
best rule of thumb is probably to see whether the driver really ends up
being noticeably simpler.

The event stuff that we are discussing for pcmcia may make all of this
moot: media selection may be the perfect example of how to do the very
same thing. I'll forward Jeff the emails on that.

Linus




Re: [PATCH (2.4)] atomic use count for proc_dir_entry

2000-11-16 Thread Linus Torvalds



On Thu, 16 Nov 2000, Dan Aloni wrote:
 
> Makes procfs use an atomic use count for dir entries, to avoid using 
> the Big kernel lock. Axboe says it looks ok.

There's a race there. Look at what happens if de_put() races with
remove_proc_entry(): we'd do free_proc_entry() twice. Not good.
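
The kind of interleaving meant here can be replayed deterministically in user space. This is a deliberately simplified, hypothetical model: struct entry, the step functions and the exact checks are assumptions for illustration, not the procfs code and not the patch under discussion. It only shows why an atomic counter alone does not make the "drop the last reference, then decide whether to free" sequence safe.

```c
#include <assert.h>

/* Hypothetical stand-in for a proc_dir_entry; not the kernel structure. */
struct entry {
	int count;   /* use count                        */
	int deleted; /* set once removal was requested   */
	int frees;   /* how many times "free" was called */
};

static void free_entry(struct entry *e)
{
	e->frees++;
}

/* A put operation split into the two steps that can be interleaved. */
static int put_step1_drop_ref(struct entry *e)
{
	assert(e->count > 0);
	return --e->count == 0;
}

static void put_step2_maybe_free(struct entry *e, int was_last)
{
	if (was_last && e->deleted)
		free_entry(e);
}

/* Removal: mark deleted, free right away if nobody holds a reference. */
static void remove_entry(struct entry *e)
{
	e->deleted = 1;
	if (e->count == 0)
		free_entry(e);
}

/* Replay the bad interleaving; returns how often the entry was freed. */
static int racy_interleaving(void)
{
	struct entry e = { 1, 0, 0 };
	int was_last = put_step1_drop_ref(&e); /* CPU A: count 1 -> 0        */

	remove_entry(&e);                      /* CPU B: sees 0, frees       */
	put_step2_maybe_free(&e, was_last);    /* CPU A: sees deleted, frees */
	return e.frees;
}

/* Same calls, each run to completion, as they would be under one lock. */
static int serialized(void)
{
	struct entry e = { 1, 0, 0 };
	int was_last = put_step1_drop_ref(&e);

	put_step2_maybe_free(&e, was_last);
	remove_entry(&e);
	return e.frees;
}
```

In this model racy_interleaving() frees the entry twice, while serialized() frees it once; holding a single lock across both steps is what rules the first ordering out.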

Leave the kernel lock for now.

Linus




test11-pre6

2000-11-16 Thread Linus Torvalds


The log-file says it all..

Linus

-

 - pre6:
- Intel: start to add Pentium IV specific stuff (128-byte cacheline
  etc)
- David Miller: search-and-destroy places that forget to mark us
  running after removing us from a wait-queue.
- me: NFS client write-back ref-counting SMP instability.
- me: fix up non-exclusive waiters
- Trond Myklebust: Be more careful about SMP in NFS and RPC code
- Trond Myklebust: inode attribute update race fix
- Charles White: don't do unaligned accesses in cpqarray driver.
- Jeff Garzik: continued driver cleanup and fixes
- Peter Anvin: integrate more of the Intel patches.
- Robert Love: add i815 signature to the intel AGP support
- Rik Faith: DRM update to make it easier to sync up 2.2.x
- David Woodhouse: make old 16-bit pcmcia controllers work
  again (ie i82365 and TCIC)

 - pre5:
- Rasmus Andersen: add proper "linux/init.h" for sound drivers
- David Miller: sparc64 and networking updates
- David Trcka: MOXA numbering starts from 0, not 1.
- Jeff Garzik: sysctl.h standalone
- Dag Brattli: IrDA finishing touches
- Randy Dunlap: USB fixes
- Gerd Knorr: big bttv update
- Peter Anvin: x86 capabilities cleanup
- Stephen Rothwell: apm initcall fix - smp poweroff should work
- Andrew Morton: setscheduler() spinlock ordering fix
- Stephen Rothwell: directory notification documentation
- Petr Vandrovec: ncpfs capabilities check cleanup
- David Woodhouse: fix jffs to use generic is() library
- Chris Swiedler: oom_kill selection fix
- Jens Axboe: re-merge after sleeping in ll_rw_block.
- Randy Dunlap: USB updates (pegasus and ftdi_sio)
- Kai Germaschewski: ISDN ppp header compression fixed

 - pre4:
- Andrea Arcangeli: SMP scheduler memory barrier fixup
- Richard Henderson: fix alpha semaphores and spinlock bugs.
- Richard Henderson: clean up the file from hell: "xor.c" 

 - pre3:
- James Simmons: vgacon "printk()" deadlock with global irq lock.
- don't poke blanked console on console output
- Ching-Ling: get channels right on ALI audio driver
- Dag Brattli and Jean Tourrilhes: big IrDA update
- Paul Mackerras: PPC updates
- Randy Dunlap: USB ID table support, LEDs with usbkbd, belkin
  serial converter. 
- Jeff Garzik: pcnet32 and lance net driver fix/cleanup
- Mikael Pettersson: clean up x86 ELF_PLATFORM
- Bartlomiej Zolnierkiewicz: sound and drm driver init fixes and
  cleanups
- Al Viro: Jeff missed some kmap()'s. sysctl cleanup
- Kai Germaschewski: ISDN updates
- Alan Cox: SCSI driver NULL ptr checks
- David Miller: networking updates, exclusive waitqueues nest properly,
  SMP i_shared_lock/page_table_lock lock order fix.

 - pre2:
- Stephen Rothwell: directory notify could return with the lock held
- Richard Henderson: CLOCKS_PER_SEC on alpha.
- Jeff Garzik: ramfs and highmem: kmap() the page to clear it
- Asit Mallick: enable the APIC in the official order
- Neil Brown: avoid rd deadlock on io_request_lock by using a
  private rd-request function. This also avoids unnecessary
  request merging at this level.
- Ben LaHaise: vmalloc threading and overflow fix
- Randy Dunlap: USB updates (plusb driver). PCI cacheline size.
- Neil Brown: fix a raid1 on top of lvm bug that crept in in pre1
- Alan Cox: various (Athlon mmx copy, NULL ptr checks for
  scsi_register etc). 
- Al Viro: fix /proc permission check security hole.
- Can-Ru Yeou: SiS301 fbcon driver
- Andrew Morton: NMI oopser and kernel page fault punch through
  both console_lock and timerlist_lock to make sure it prints out..
- Jeff Garzik: clean up "kmap()" return type (it returns a kernel
  virtual address, ie a "void *").
- Jeff Garzik: network driver docs, various one-liners.
- David Miller: add generic "special" flag to page flags, to be
  used by architectures as they see fit. Like keeping track of
  cache coherency issues.
- David Miller: sparc64 updates, make sparc32 boot again
- Davdi Millner: spel "synchronous" correctly
- David Miller: networking - fix some bridge issues, and correct
  IPv6 sysctl entries.
- Dan Aloni: make fork.c use proper macro rather than doing
  get_exec_domain() by hand. 

 - pre1:
- me: make PCMCIA work even in the absence of PCI irq's
- me: add irq mapping capabilities for Cyrix southbridges
- me: make IBMMCA compile right as a module
- me: uhhuh. Major atomic-PTE SMP race boo-boo. Fixed.
- Andrea Arcangeli: don't allow people to set security-conscious
  bits in mxcsr through ptrace SETFPXREGS.
- Jürgen Fischer: aha152x update
- Andrew Morton, Trond Myklebust: file locking fixes
- me: TLB invalidate race with highmem
- Paul Fulghum: synclink/n_hdlc driver updates
- David Miller: export sysctl_jiffies, and 

Re: [PATCH] pcmcia event thread. (fwd)

2000-11-17 Thread Linus Torvalds



On Fri, 17 Nov 2000, Russell King wrote:

> Alan Cox writes:
> > From a practical point of view that currently means 'delete Linus tree pcmcia
> > regardless of what you are doing' since the modules from David Hinds and Linus
> > pcmcia are not 100% binary compatible for all cases.
> 
> However, deleting that code would render a significant number of ARM platforms
> without PCMCIA support, which would be real bad.

Right now, I suspect that the in-kernel pcmcia code is actually at the
point where it _is_ possible to use it. David Hinds has been keeping the
cs layer in synch with the external versions, and tons of people have
helped make the low-level drivers stable again.

If somebody still has a problem with the in-kernel stuff, speak up. 

Linus




Re: [PATCH] pcmcia event thread. (fwd)

2000-11-17 Thread Linus Torvalds



On Fri, 17 Nov 2000, Alan Cox wrote:

> > > regardless of what you are doing' since the modules from David Hinds and Linus
> > > pcmcia are not 100% binary compatible for all cases.
> > 
> > However, deleting that code would render a significant number of ARM platforms
> > without PCMCIA support, which would be real bad.
> 
> It would actually have made no difference as said code didn't actually work
> anyway. Dwmw2 seems to have solved that

Alan, Russell is talking about CardBus controllers (it's also PCMCIA, in
fact, these days it's the _only_ pcmcia in any machine made less than five
years ago).

The patches to get i82365 and TCIC up and running again are interesting
mainly for laptops with i486 CPUs and for desktops with pcmcia add-in
cards (which are basically always ISA i82365-clones). They aren't
interesting to ARM, I suspect.

Linus




Re: [PATCH] pcmcia event thread. (fwd)

2000-11-17 Thread Linus Torvalds



On Fri, 17 Nov 2000, Alan Cox wrote:

> > Alan, Russell is talking about CardBus controllers (it's also PCMCIA, in
> > fact, these days it's the _only_ pcmcia in any machine made less than five
> > years ago).
> 
> I have at least two machines here that are < 2 years old but disagree
> with you. One is only months old. 

Who makes those pieces of crap? And who _buys_ them? I can understand it
in embedded stuff simply because the chips are simpler and smaller, but in
a laptop you should definitely try to avoid it.

Linus




Re: Memory management bug

2000-11-17 Thread Linus Torvalds



On Fri, 17 Nov 2000 [EMAIL PROTECTED] wrote:
 
> > Whats the reasoning behind these ifs ?
> 
> To catch memory corruption or things running out of control in the kernel.
> I was referring to the "if (!order) goto try_again" ifs in alloc_pages, not
> the "if (something) BUG()" ifs.

Basically, if you try to wait for orders > 0, you may have to wait for a
LOONG time.

It actually works reasonably well on machines with big memories, because a
buddy allocator _will_ try to coalesce memory allocations as much as
possible. But it has nasty cases where you can be really unlucky. Feel
free to run simulations to see, but basically if you have reasonably
random allocation and free patterns and you want to get an order-X
contiguous allocation, you may have to free up a noticeable portion of
your memory before it succeeds.

Sure, you could do "directed freeing", where you actually try to look at
which pages would be worth freeing to find a large free area, but the
complexity is not insignificant, and quite frankly the proper approach has
always been "don't do that then". Don't rely on big contiguous chunks of
memory. Having an mm that can guarantee contiguous chunks of physical
memory would be cool, but I suspect strongly that it would have some
serious downsides.
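
As a starting point for the kind of simulation suggested above, here is a minimal sketch of the check that matters: whether a page map still contains a naturally aligned run of 2^order free pages. The one-byte-per-page map and both function names are assumptions for illustration; a real buddy allocator keeps per-order free lists instead of scanning.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return 1 if `used` (one byte per page, non-zero = allocated) contains
 * a naturally aligned run of 2^order free pages, which is what an
 * order-`order` request needs.
 */
static int has_free_block(const unsigned char *used, size_t npages,
			  unsigned int order)
{
	size_t block = (size_t)1 << order;
	size_t start, i;

	assert(used != NULL);
	for (start = 0; start + block <= npages; start += block) {
		for (i = 0; i < block; i++)
			if (used[start + i])
				break;
		if (i == block)
			return 1;
	}
	return 0;
}

/* Count free pages, to compare against what has_free_block() reports. */
static size_t free_pages(const unsigned char *used, size_t npages)
{
	size_t i, n = 0;

	for (i = 0; i < npages; i++)
		n += !used[i];
	return n;
}
```

Two 8-page maps with the same amount of free memory behave very differently: with every other page allocated, half the memory is free and yet not even an order-1 request can be satisfied, while with the free half contiguous an order-2 request succeeds.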

Linus




Re: [PATCH] pcmcia event thread. (fwd)

2000-11-17 Thread Linus Torvalds



On Fri, 17 Nov 2000, Jeff Garzik wrote:
  
> > 2. Even when I specify cs_irq=27, it resorts to polling:
> > 
> > Intel PCIC probe:
> >   Intel i82365sl DF ISA-to-PCMCIA at port 0x8400 ofs 0x00, 2 sockets
> >     host opts [0]: none
> >     host opts [1]: none
> >     ISA irqs (default) = none! polling interval = 1000 ms
> >   Intel i82365sl DF ISA-to-PCMCIA at port 0x8400 ofs 0x80, 2 sockets
> >     host opts [2]: none
> >     host opts [3]: none
> >     ISA irqs (default) = none! polling interval = 1000 ms
> 
> For these two, it sounds to me like you need to be doing a PCI probe,
> and getting the irq and I/O port info from pci_dev.  And calling
> pci_enable_device, which may or may not be a showstopper here...

The i82365 stuff actually used to do much of this, but it was so
intimately intertwined with the cardbus handling that I pruned it out for
my sanity.

It should be possible to do the same thing with a nice simple concentrated
PCI probe, instead of having stuff quite as spread out as it used to be.

As to why it doesn't show any ISA interrupts, who knows... Some of the PCI
PCMCIA bridges need to be initialized.

Linus




Re: BUG: isofs broken (2.2 and 2.4)

2000-11-17 Thread Linus Torvalds



On Fri, 17 Nov 2000, Harald Koenig wrote:
 
> this seems to make things much worse:  starting with ~90M free memory
> "du" again started leaking (or maybe just using memory?) down to ~80M free
> memory when the system suddenly locked up completely, no console switch
> was possible anymore (but Sysrq-B did reboot).

How about this version (full patch against test10 - it includes a
slightly corrected version of my earlier dir.c patch)?

It's entirely untested, but it looks good and compiles. Ship it!

Linus

-
diff -u --recursive --new-file v2.4.0-test10/linux/fs/isofs/dir.c linux/fs/isofs/dir.c
--- v2.4.0-test10/linux/fs/isofs/dir.c	Fri Aug 11 14:29:01 2000
+++ linux/fs/isofs/dir.c	Fri Nov 17 13:38:01 2000
@@ -94,6 +94,14 @@
 	return retnamlen;
 }
 
+static struct buffer_head *isofs_bread(struct inode *inode, unsigned int bufsize, unsigned int block)
+{
+	unsigned int blknr = isofs_bmap(inode, block);
+	if (!blknr)
+		return NULL;
+	return bread(inode->i_dev, blknr, bufsize);
+}
+
 /*
  * This should _really_ be cleaned up some day..
  */
@@ -105,7 +113,7 @@
 	unsigned char bufbits = ISOFS_BUFFER_BITS(inode);
 	unsigned int block, offset;
 	int inode_number = 0;	/* Quiet GCC */
-	struct buffer_head *bh;
+	struct buffer_head *bh = NULL;
 	int len;
 	int map;
 	int high_sierra;
@@ -117,46 +125,25 @@
 		return 0;
   
 	offset = filp->f_pos & (bufsize - 1);
-	block = isofs_bmap(inode, filp->f_pos >> bufbits);
+	block = filp->f_pos >> bufbits;
 	high_sierra = inode->i_sb->u.isofs_sb.s_high_sierra;
 
-	if (!block)
-		return 0;
-
-	if (!(bh = breada(inode->i_dev, block, bufsize, filp->f_pos, inode->i_size)))
-		return 0;
-
 	while (filp->f_pos < inode->i_size) {
 		int de_len;
-#ifdef DEBUG
-		printk("Block, offset, f_pos: %x %x %x\n",
-		       block, offset, filp->f_pos);
-		printk("inode->i_size = %x\n",inode->i_size);
-#endif
-		/* Next directory_record on next CDROM sector */
-		if (offset >= bufsize) {
-#ifdef DEBUG
-			printk("offset >= bufsize\n");
-#endif
-			brelse(bh);
-			offset = 0;
-			block = isofs_bmap(inode, (filp->f_pos) >> bufbits);
-			if (!block)
-				return 0;
-			bh = breada(inode->i_dev, block, bufsize, filp->f_pos, inode->i_size);
+
+		if (!bh) {
+			bh = isofs_bread(inode, bufsize, block);
 			if (!bh)
 				return 0;
-			continue;
 		}
 
 		de = (struct iso_directory_record *) (bh->b_data + offset);
-		if(first_de) inode_number = (block << bufbits) + (offset & (bufsize - 1));
+		if (first_de) inode_number = (bh->b_blocknr << bufbits) + offset;
 
 		de_len = *(unsigned char *) de;
 #ifdef DEBUG
 		printk("de_len = %d\n", de_len);
-#endif
-	    
+#endif	
 
 		/* If the length byte is zero, we should move on to the next
 		   CDROM sector.  If we are at the end of the directory, we
@@ -164,36 +151,33 @@
 
 		if (de_len == 0) {
 			brelse(bh);
-			filp->f_pos = ((filp->f_pos & ~(ISOFS_BLOCK_SIZE - 1))
-				       + ISOFS_BLOCK_SIZE);
+			bh = NULL;
+			filp->f_pos = ((filp->f_pos & ~(ISOFS_BLOCK_SIZE - 1)) + ISOFS_BLOCK_SIZE);
+			block = filp->f_pos >> bufbits;
 			offset = 0;
-
-			if (filp->f_pos >= inode->i_size)
-				return 0;
-
-			block = isofs_bmap(inode, (filp->f_pos) >> bufbits);
-			if (!block)
-				return 0;
-			bh = breada(inode->i_dev, block, bufsize, filp->f_pos, inode->i_size);
-			if (!bh)
-				return 0;
 			continue;
 		}
 
-		offset +=  de_len;
-		if (offset > bufsize) {
-			/*
-			 * This would only normally happen if we had
-			 * a buggy cdrom image.  All directory
-			 * entries should terminate with a null size
-			 * or end exactly at the end of the sector.
-			 */
-			printk("next_offset (%x) > bufsize (%lx)\n",
-			       offset,bufsize);
-			break;
+		offset += de_len;
+
+		/* Make sure we have a full directory entry */
+		if (offset >= bufsize) {
+			int slop = 

Re: BUG: isofs broken (2.2 and 2.4)

2000-11-17 Thread Linus Torvalds



On Fri, 17 Nov 2000, Harald Koenig wrote:
 
> Linus:    0.380u 76.850s 1:19.12 97.6%  0+0k 0+0io 113pf+0w
> Andries:  0.470u 97.220s 1:40.29 97.4%  0+0k 0+0io 112pf+0w

The biggest difference is just the system times and the fact that it's
more efficient coding. 

> BUT: there are some obvious bugs in the output of "du" and "find".
> some samples (all file names (should) match the format "xe%03d/xe%03d.%c%c"
> with both %03d being the _same_ number and both %c are in [a-z0-9]).

Yes. There's a silly bug there, now that I've tested it a bit. Basically
the test for stuff that traversed a boundary was wrong.
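
For readers following along, here is a sketch of the arithmetic involved when a directory entry reaches or crosses the end of its block. It is an illustration only: split_entry() and its struct are made-up names, the block size is assumed to be a power of two, and the code is not taken from either version of the patch.

```c
#include <assert.h>

struct entry_split {
	int reaches_end;   /* entry touches or crosses the end of its block */
	unsigned int slop; /* bytes of the entry inside the current block  */
	unsigned int rest; /* bytes that spill into the next block         */
};

/*
 * Split a directory entry of de_len bytes starting at `offset` within a
 * block of `bufsize` bytes.  Assumes de_len <= bufsize, so an entry can
 * cross at most one boundary (the length field is a single byte).
 */
static struct entry_split split_entry(unsigned int offset,
				      unsigned int de_len,
				      unsigned int bufsize)
{
	struct entry_split s = { 0, de_len, 0 };
	unsigned int end = offset + de_len;

	/* bufsize must be a power of two; the entry must start inside it */
	assert(bufsize && (bufsize & (bufsize - 1)) == 0);
	assert(offset < bufsize);
	assert(de_len <= bufsize);

	if (end >= bufsize) {
		s.reaches_end = 1;
		s.slop = bufsize - offset;
		s.rest = end & (bufsize - 1);
	}
	return s;
}
```

An entry that ends exactly on the boundary has rest == 0, so nothing needs to be fetched from the following block; only a non-zero rest means the tail of the entry lives in the next block.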

The whole name conversion code is pretty horrible. It's been written over
the years, and it was doing the same thing with small modifications in
both readdir() and lookup(). I've got a cleaned up version that also
should have the above bug fixed.
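
The part of the name conversion that is visible in the patch below (stop at a NUL byte, fold ASCII upper case to lower case) can be tried in user space. This is a sketch under assumptions: name_to_lower() is a made-up name with a simplified signature, and it covers only the steps shown in the diff context, not whatever else the kernel function does with the name.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Copy at most `len` bytes of `old` into `out`, lower-casing ASCII
 * letters and stopping early at a NUL byte.  Returns the number of
 * bytes written; `out` is not NUL-terminated.
 */
static int name_to_lower(const char *old, int len, char *out)
{
	int i;

	assert(old != NULL && out != NULL);
	for (i = 0; i < len; i++) {
		unsigned char c = (unsigned char)old[i];

		if (!c)
			break;		/* stop at an embedded NUL */
		if (c >= 'A' && c <= 'Z')
			c |= 0x20;	/* ASCII upper -> lower case */
		out[i] = (char)c;
	}
	return i;
}
```

Declaring c as unsigned char, as the patch does, keeps bytes above 0x7f from being treated as negative values in the range comparison.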

Still ready to test? This time I went over the files rather carefully, and
while I've not tested the fixed version I'm getting pretty happy with it.

I'll merge some more of the name translation logic, but before I do that
here's the newest patch..

Linus

-
diff -u --recursive --new-file v2.4.0-test10/linux/fs/isofs/dir.c linux/fs/isofs/dir.c
--- v2.4.0-test10/linux/fs/isofs/dir.c	Fri Aug 11 14:29:01 2000
+++ linux/fs/isofs/dir.c	Fri Nov 17 15:43:36 2000
@@ -40,14 +40,17 @@
 	lookup:		isofs_lookup,
 };
 
-static int isofs_name_translate(char * old, int len, char * new)
+int isofs_name_translate(struct iso_directory_record *de, char *new, struct inode *inode)
 {
-	int i, c;
+	char * old = de->name;
+	int len = de->name_len[0];
+	int i;
 
 	for (i = 0; i < len; i++) {
-		c = old[i];
+		unsigned char c = old[i];
 		if (!c)
 			break;
+
 		if (c >= 'A' && c <= 'Z')
 			c |= 0x20;	/* lower case */
 
@@ -74,8 +77,7 @@
 {
 	int std;
 	unsigned char * chr;
-	int retnamlen = isofs_name_translate(de->name,
-				de->name_len[0], retname);
+	int retnamlen = isofs_name_translate(de, retname, inode);
 	if (retnamlen == 0) return 0;
 	std = sizeof(struct iso_directory_record) + de->name_len[0];
 	if (std & 1) std++;
@@ -105,7 +107,7 @@
 	unsigned char bufbits = ISOFS_BUFFER_BITS(inode);
 	unsigned int block, offset;
 	int inode_number = 0;	/* Quiet GCC */
-	struct buffer_head *bh;
+	struct buffer_head *bh = NULL;
 	int len;
 	int map;
 	int high_sierra;
@@ -117,46 +119,22 @@
 		return 0;
   
 	offset = filp->f_pos & (bufsize - 1);
-	block = isofs_bmap(inode, filp->f_pos >> bufbits);
+	block = filp->f_pos >> bufbits;
 	high_sierra = inode->i_sb->u.isofs_sb.s_high_sierra;
 
-	if (!block)
-		return 0;
-
-	if (!(bh = breada(inode->i_dev, block, bufsize, filp->f_pos, inode->i_size)))
-		return 0;
-
 	while (filp->f_pos < inode->i_size) {
 		int de_len;
-#ifdef DEBUG
-		printk("Block, offset, f_pos: %x %x %x\n",
-		       block, offset, filp->f_pos);
-		printk("inode->i_size = %x\n",inode->i_size);
-#endif
-		/* Next directory_record on next CDROM sector */
-		if (offset >= bufsize) {
-#ifdef DEBUG
-			printk("offset >= bufsize\n");
-#endif
-			brelse(bh);
-			offset = 0;
-			block = isofs_bmap(inode, (filp->f_pos) >> bufbits);
-			if (!block)
-				return 0;
-			bh = breada(inode->i_dev, block, bufsize, filp->f_pos, inode->i_size);
+
+		if (!bh) {
+			bh = isofs_bread(inode, bufsize, block);
 			if (!bh)
 				return 0;
-			continue;
 		}
 
 		de = (struct iso_directory_record *) (bh->b_data + offset);
-		if(first_de) inode_number = (block << bufbits) + (offset & (bufsize - 1));
+		if (first_de) inode_number = (bh->b_blocknr << bufbits) + offset;
 
 		de_len = *(unsigned char *) de;
-#ifdef DEBUG
-		printk("de_len = %d\n", de_len);
-#endif
-	    
 
 		/* If the length byte is zero, we should move on to the next
 		   CDROM sector.  If we are at the end of the directory, we
@@ -164,36 +142,33 @@
 
 		if (de_len == 0) {
 			brelse(bh);
-			filp->f_pos = ((filp->f_pos & ~(ISOFS_BLOCK_SIZE - 1))
-				       + ISOFS_BLOCK_SIZE);
+			bh = NULL;
+			filp->f_pos = ((filp->f_pos & ~(ISOFS_BLOCK_SIZE - 1)) + ISOFS_BLOCK_SIZE);
+			block = filp->f_pos >> bufbits;

Re: BUG: isofs broken (2.2 and 2.4)

2000-11-17 Thread Linus Torvalds



On Sat, 18 Nov 2000 [EMAIL PROTECTED] wrote:
 
> But now that you did two-thirds of the job I take it you'll
> also do the third part? It is again precisely the same stuff.

Are you talking about isofs_lookup_grandparent()?

The code is now dead, and has been for a long time actually (as the VFS
layer keeps track of ".." for us these days). Removed.

I'll look at the isofs_read_level3_size() thing. At least that one doesn't
have the name translation crap in it.

Linus




Re: BUG: isofs broken (2.2 and 2.4)

2000-11-17 Thread Linus Torvalds



Oh, and sorry - the last patch doesn't contain the (obvious) fixes to the
header files to take some of the calling convention changes into account.


Linus

---
--- v2.4.0-test10/linux/include/linux/iso_fs.h	Fri Sep  8 12:52:56 2000
+++ linux/include/linux/iso_fs.h	Fri Nov 17 15:52:03 2000
@@ -177,16 +177,17 @@
 
 extern int parse_rock_ridge_inode(struct iso_directory_record *, struct inode *);
 extern int get_rock_ridge_filename(struct iso_directory_record *, char *, struct inode *);
+extern int isofs_name_translate(struct iso_directory_record *, char *, struct inode *);
 
 extern int find_rock_ridge_relocation(struct iso_directory_record *, struct inode *);
 
-int get_joliet_filename(struct iso_directory_record *, struct inode *, unsigned char *);
+int get_joliet_filename(struct iso_directory_record *, unsigned char *, struct inode *);
 int get_acorn_filename(struct iso_directory_record *, char *, struct inode *);
 
 extern struct dentry *isofs_lookup(struct inode *, struct dentry *);
 extern int isofs_get_block(struct inode *, long, struct buffer_head *, int);
 extern int isofs_bmap(struct inode *, int);
-extern int isofs_lookup_grandparent(struct inode *, int);
+extern struct buffer_head *isofs_bread(struct inode *, unsigned int, unsigned int);
 
 extern struct inode_operations isofs_dir_inode_operations;
 extern struct file_operations isofs_dir_operations;




Re: BUG: isofs broken (2.2 and 2.4)

2000-11-17 Thread Linus Torvalds



There's a test11-pre7 there now, and I'd really ask people to check out
the isofs changes because slight worry about those is what held me up from
just calling it test11 outright.

It's almost guaranteed to be better than what we had before, but anyway..

Linus




Re: test11-pre7 compile failure

2000-11-17 Thread Linus Torvalds

In article [EMAIL PROTECTED], J Sloan  [EMAIL PROTECTED] wrote:

>looks like the md fixes broke something -
>
>In file included from /usr/src/linux/include/linux/pagemap.h:17,
>                 from /usr/src/linux/include/linux/locks.h:9,
>                 from /usr/src/linux/include/linux/raid/md.h:37,
>                 from init/main.c:25:
>/usr/src/linux/include/linux/highmem.h: In function `bh_kmap':
>/usr/src/linux/include/linux/highmem.h:23: structure has no member named `p_page'

The "p_page" should be a "b_page". Duh.

Linus



Re: Freeze on FPU exception with Athlon

2000-11-17 Thread Linus Torvalds

In article [EMAIL PROTECTED],
=?iso-8859-1?q?Markus=20Schoder?=  [EMAIL PROTECTED] wrote:
>The following small program (linked against glibc 2.1.3) reliably
>freezes my system (Athlon Thunderbird CPU) with at least kernels
>2.4.0-test10 and 2.4.0-test11-pre5.  Even the SysRq keys do not work
>after the freeze.

Are you sure sysrq doesn't work? Many distributions will disable the
kernel printing to the console, or move it to console 7 or similar. 

It would be really good to get the EIP trace of RightAlt+ScrollLock
pressed a few times if you can try to see if you can use klogd to enable
proper printk's.

>Older kernels (e.g. 2.3.40) seem to work.  Any Ideas?

The FP exception handling has certainly changed, but the changes should
all have affected mainly just PIII kernels with XMM support enabled. An
Athlon system should have been pretty unaffected. But I'll take a look
if I see something obvious.

One thing to try: if interrupts really don't work for you (and if SysRq
doesn't work, that may be the case), please test out a kernel that
simply ignores irq13 by just commenting out the line

setup_irq(13, irq13);

in arch/i386/kernel/i8259.c.  Does that make any difference? (irq13
shouldn't be used any more, it's horrible legacy crap, but we do want to
support even horrible legacy systems). 

Linus



Re: Freeze on FPU exception with Athlon

2000-11-17 Thread Linus Torvalds

In article [EMAIL PROTECTED],
=?iso-8859-1?q?Markus=20Schoder?=  [EMAIL PROTECTED] wrote:
>The following small program (linked against glibc 2.1.3) reliably
>freezes my system (Athlon Thunderbird CPU) with at least kernels
>2.4.0-test10 and 2.4.0-test11-pre5.  Even the SysRq keys do not work
>after the freeze.
>
>Older kernels (e.g. 2.3.40) seem to work.  Any Ideas?

It certainly doesn't happen for me on any of the machines I work with,
but it wouldn't compile as-is for me, so I exchanged the FPU setting
with a simpler

asm("fldcw %0": :"m" (0));

which should do the equivalent (ie unmask divide by zero errors). Does
that make a difference for you?
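
For reference, the relevant bits of the x87 control word can be worked out without touching the FPU. The sketch below is plain arithmetic on the documented layout (exception mask bits 0-5, of which bit 2 masks divide-by-zero; 0x037F is the value after FINIT); the macro and function names are made up for illustration.

```c
#include <assert.h>

#define X87_CW_DEFAULT 0x037Fu /* control word after FINIT: all masked    */
#define X87_CW_ZM      0x0004u /* bit 2: zero-divide exception mask       */
#define X87_CW_MASKS   0x003Fu /* bits 0-5: IM, DM, ZM, OM, UM, PM        */

/* Clear only the zero-divide mask bit, leaving everything else alone. */
static unsigned int x87_unmask_zero_divide(unsigned int cw)
{
	assert(cw <= 0xFFFFu);
	return cw & ~X87_CW_ZM;
}

/* Which exceptions are unmasked (will trap) for a given control word. */
static unsigned int x87_unmasked(unsigned int cw)
{
	return ~cw & X87_CW_MASKS;
}
```

Starting from the default, clearing only ZM gives 0x037B. A control word of 0, as loaded by the one-liner above, clears all six mask bits, so it unmasks divide-by-zero along with the other five exceptions.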

Can you try to figure out where it started happening? Ie try test9 and
back too, to figure out what might be bringing it on... 

I sure as hell hope this isn't an Athlon issue.  Can other people try
the test-program and see if we have a pattern (ie "it happens only on
Athlons", or "Linus is on drugs and it happens for everybody else").

Thanks,

Linus



Re: BUG: isofs broken (2.2 and 2.4)

2000-11-17 Thread Linus Torvalds



On Sat, 18 Nov 2000, Keith Owens wrote:

> On Fri, 17 Nov 2000 17:21:53 -0800 (PST), 
> Linus Torvalds [EMAIL PROTECTED] wrote:
> >There's a test11-pre7 there now, and I'd really ask people to check out
> >the isofs changes because slight worry about those is what held me up from
> >just calling it test11 outright.
> >
> >It's almost guaranteed to be better than what we had before, but anyway..
> >
> >		Linus
> 
> namei.c: In function `isofs_find_entry':
> namei.c:130: warning: passing arg 2 of `get_joliet_filename' from incompatible pointer type
> namei.c:130: warning: passing arg 3 of `get_joliet_filename' from incompatible pointer type

Thanks. The second and third arguments were switched around to match all
the other filename conversion stuff, and because I don't have joliet
enabled I didn't notice this. Just switch them around where the warning
occurs, and you should be golden.

Linus




Re: [patch] potential death in disassociate_ctty()

2000-11-17 Thread Linus Torvalds

In article [EMAIL PROTECTED],
Andrew Morton  [EMAIL PROTECTED] wrote:

>Also, somewhere on the path from kernel 2.2 to 2.4 the call to
>do_notify_parent() was moved inside the tasklist lock.  Why was this?

Ehh.. Because that is also what protects our "parent" pointer.

Linus



Re: Please send Changelog info and patch notices for the test and-pre releases.

2000-11-18 Thread Linus Torvalds



On Fri, 17 Nov 2000, Miles Lane wrote:
 
> I haven't seen any announcements of recent test and test-pre releases.
> Can you begin sending those again, please?

You can actually get them off kernel.org these days: Peter Anvin set up a
system whereby when I upload a changelog it automatically gets added to
the web-site (main page, bottom).

Linus

-
 - pre7:
- Kai Germaschewski: more ISDN cleanups and small fixes.
- Al Viro: fix ntfs_new_inode() that he broke. Cleanups.
- various: handle !CONFIG_HOTPLUG properly
- David Miller: sparc and networking
- me: more iso9660 fixes. 
- Neil Brown: fix rd and RAID on highmem machines
- Vojtech Pavlik: input driver fixes
- David Woodhouse: module unload races - up_and_exit()

 - pre6:
- Intel: start to add Pentium IV specific stuff (128-byte cacheline
  etc)
- David Miller: search-and-destroy places that forget to mark us
  running after removing us from a wait-queue.
- me: NFS client write-back ref-counting SMP instability.
- me: fix up non-exclusive waiters
- Trond Myklebust: Be more careful about SMP in NFS and RPC code
- Trond Myklebust: inode attribute update race fix
- Charles White: don't do unaligned accesses in cpqarray driver.
- Jeff Garzik: continued driver cleanup and fixes
- Peter Anvin: integrate more of the Intel patches.
- Robert Love: add i815 signature to the intel AGP support
- Rik Faith: DRM update to make it easier to sync up 2.2.x
- David Woodhouse: make old 16-bit pcmcia controllers work
  again (ie i82365 and TCIC)

 - pre5:
- Rasmus Andersen: add proper "linux/init.h" for sound drivers
- David Miller: sparc64 and networking updates
- David Trcka: MOXA numbering starts from 0, not 1.
- Jeff Garzik: sysctl.h standalone
- Dag Brattli: IrDA finishing touches
- Randy Dunlap: USB fixes
- Gerd Knorr: big bttv update
- Peter Anvin: x86 capabilities cleanup
- Stephen Rothwell: apm initcall fix - smp poweroff should work
- Andrew Morton: setscheduler() spinlock ordering fix
- Stephen Rothwell: directory notification documentation
- Petr Vandrovec: ncpfs capabilities check cleanup
- David Woodhouse: fix jffs to use generic is() library
- Chris Swiedler: oom_kill selection fix
- Jens Axboe: re-merge after sleeping in ll_rw_block.
- Randy Dunlap: USB updates (pegasus and ftdi_sio)
- Kai Germaschewski: ISDN ppp header compression fixed

 - pre4:
- Andrea Arcangeli: SMP scheduler memory barrier fixup
- Richard Henderson: fix alpha semaphores and spinlock bugs.
- Richard Henderson: clean up the file from hell: "xor.c" 

 - pre3:
- James Simmons: vgacon "printk()" deadlock with global irq lock.
- don't poke blanked console on console output
- Ching-Ling: get channels right on ALI audio driver
- Dag Brattli and Jean Tourrilhes: big IrDA update
- Paul Mackerras: PPC updates
- Randy Dunlap: USB ID table support, LEDs with usbkbd, belkin
  serial converter. 
- Jeff Garzik: pcnet32 and lance net driver fix/cleanup
- Mikael Pettersson: clean up x86 ELF_PLATFORM
- Bartlomiej Zolnierkiewicz: sound and drm driver init fixes and
  cleanups
- Al Viro: Jeff missed some kmap()'s. sysctl cleanup
- Kai Germaschewski: ISDN updates
- Alan Cox: SCSI driver NULL ptr checks
- David Miller: networking updates, exclusive waitqueues nest properly,
  SMP i_shared_lock/page_table_lock lock order fix.

 - pre2:
- Stephen Rothwell: directory notify could return with the lock held
- Richard Henderson: CLOCKS_PER_SEC on alpha.
- Jeff Garzik: ramfs and highmem: kmap() the page to clear it
- Asit Mallick: enable the APIC in the official order
- Neil Brown: avoid rd deadlock on io_request_lock by using a
  private rd-request function. This also avoids unnecessary
  request merging at this level.
- Ben LaHaise: vmalloc threading and overflow fix
- Randy Dunlap: USB updates (plusb driver). PCI cacheline size.
- Neil Brown: fix a raid1 on top of lvm bug that crept in in pre1
- Alan Cox: various (Athlon mmx copy, NULL ptr checks for
  scsi_register etc). 
- Al Viro: fix /proc permission check security hole.
- Can-Ru Yeou: SiS301 fbcon driver
- Andrew Morton: NMI oopser and kernel page fault punch through
  both console_lock and timerlist_lock to make sure it prints out..
- Jeff Garzik: clean up "kmap()" return type (it returns a kernel
  virtual address, ie a "void *").
- Jeff Garzik: network driver docs, various one-liners.
- David Miller: add generic "special" flag to page flags, to be
  used by architectures as they see fit. Like keeping track of
  cache coherency issues.
- David Miller: sparc64 updates, make sparc32 boot again
- Davdi Millner: spel "synchronous" correctly
- David Miller: networking - fix some bridge issues, and correct
  

Re: [PATCH] pcmcia event thread. (fwd)

2000-11-18 Thread Linus Torvalds



On Sat, 18 Nov 2000, David Ford wrote:

> Linus Torvalds wrote:
> [...]
>
> > If somebody still has a problem with the in-kernel stuff, speak up.
>
> The kernel's irq detection for the card sockets doesn't work for me.  It's the NEC
> Versa LX story.  The DH code also reports no IRQ found but still figures out a
> working IRQ (normally 3) and assigns it for the tulip card.  I use the i82365 module
> w/ the DH code.  The below is the output of the kernel pcmcia code.

> PCI: No IRQ known for interrupt pin B of device 00:03.1. Please try using
> pci=biosirq.
> PCI: No IRQ known for interrupt pin A of device 00:03.0. Please try using
> pci=biosirq.

Strange. Your interrupt router is a bog-standard PIIX4, we know how to
route the thing, AND your device shows up:

> # dump_pirq
> Interrupt routing table found at address 0xf5a80:
>   Version 1.0, size 0x0080
>   Interrupt router is device 00:07.0
>   PCI exclusive interrupt mask: 0x
>   Compatible router: vendor 0x8086 device 0x1234
>
> Device 00:03.0 (slot 0):
>   INTA: link 0x60, irq mask 0x0420
>   INTB: link 0x61, irq mask 0x0420

> Interrupt router: Intel 82371AB PIIX4/PIIX4E PCI-to-ISA bridge
>   PIRQ1 (link 0x60): irq 10
>   PIRQ2 (link 0x61): irq 5
>   PIRQ3 (link 0x62): unrouted
>   PIRQ4 (link 0x63): irq 9
>   Serial IRQ: [enabled] [continuous] [frame=21] [pulse=4]

Can you (you've probably done this before, but anyway) enable DEBUG in
arch/i386/kernel/pci-i386.h? I wonder if the kernel for some strange
reason doesn't find your router, even though "dump_pirq" obviously does..
If there's something wrong with the checksumming for example..

Linus




Re: Freeze on FPU exception with Athlon

2000-11-18 Thread Linus Torvalds



On Sat, 18 Nov 2000, Brian Gerst wrote:
 
> I get Floating Point Exception (core dumped), but I needed to use the
> modified program below to keep GCC from optimizing the division away as
> a constant.  This is on test11-pre5.

I'm starting to suspect that it's really a combination of three things: 
 - 3dnow optimization (ie you have to compile the kernel with Athlon
   support)
 - pending, but not yet noticed, FPU exceptions.
 - a bug/feature in the kernel, where a process exit does not bother to
   clear the FPU, only marks it as "unused".

If I'm right, the proper test-program should be something like

	int main(int argc, char **argv)
	{
		asm("fldcw %0": :"m" (0));
		asm("fldz ; fld1 ; fdiv");
		sleep(1);
		return 0;
	}

where it's important that we do not wait for the result of the fdiv, we
just exit after having caused a pending exception (and you cannot do this 
reliably from C code - depending on compiler version and optimizations
gcc may try to write the bad value back to memory etc). 

Now, with the pending exception, do a 3dnow MMX memcpy() - which will
clear the TS bit (because it decides that the FP state can be thrown
away and doesn't need to do a full save/restore) and start using the FPU.
Boom. Instant FP exception. With the exception handler deciding that
nobody owns the FP state, and thus doing nothing sane.

If I'm right (and I'm _always_ right), the following patch would make a
difference.

Markus?

Linus


--- v2.4.0-test10/linux/arch/i386/kernel/traps.c	Tue Oct 31 12:42:26 2000
+++ linux/arch/i386/kernel/traps.c	Fri Nov 17 21:52:55 2000
@@ -643,6 +640,12 @@
 asmlinkage void do_coprocessor_error(struct pt_regs * regs, long error_code)
 {
 	ignore_irq13 = 1;
+
+	/* Due to lazy error handling, we might have false pending errors! */
+	if (!current->used_math) {
+		init_fpu();
+		return;
+	}
 	math_error((void *)regs->eip);
 }
 
@@ -700,6 +703,12 @@
 	if (cpu_has_xmm) {
 		/* Handle SIMD FPU exceptions on PIII+ processors. */
 		ignore_irq13 = 1;
+
+		/* Due to lazy error handling, we might have false pending errors! */
+		if (!current->used_math) {
+			init_fpu();
+			return;
+		}
 		simd_math_error((void *)regs->eip);
 	} else {
 		/*




Re: Freeze on FPU exception with Athlon

2000-11-18 Thread Linus Torvalds



On Sat, 18 Nov 2000, Alan Cox wrote:

> > Linus Torvalds wrote:
> > >
> > > I sure as hell hope this isn't an Athlon issue.  Can other people try
> > > the test-program and see if we have a pattern (ie "it happens only on
> > > Athlons", or "Linus is on drugs and it happens for everybody else").
> >
> > I've tried both variants (fesetenv and inline-asm) with glibc-2.1.3,
> > 2.4.0-test11pre7 and an AMD Thunderbird. Neither does freeze, but
> > both yield:
> >
> > Floating point exception (core dumped)
>
> Compiler specific ?

There's almost certainly more than that. I'd love to have a report on my
asm-only version, but even so I suspect it also requires the 3dnow stuff,
because I'm not able to trigger anything like this on any machines I have
access to (none of them are AMD, though)

Linus




Re: test11-pre6 still very broken

2000-11-18 Thread Linus Torvalds

In article [EMAIL PROTECTED], Greg KH  [EMAIL PROTECTED] wrote:
> On Fri, Nov 17, 2000 at 11:25:50PM -0800, Ben Ford wrote:
> > Here is lspci output from the laptop in question.  Is this not UHCI?

> Yes it is.  Just a bit funny if you think about it, but with Intel and
> Via putting the UHCI core into their chipsets I guess it makes sense.

> One note for the archives, if you are presented a choice between a OHCI
> or a UHCI controller, go for the OHCI.  It has a "cleaner" interface,
> handles more of the logic in the silicon, and due to this provides
> faster transfers.

I'd disagree.  UHCI has tons of advantages, not the least of which is
that it was there first and is widely available.  If OHCI hadn't been
done we'd have _one_ nice good USB controller implementation instead of
fighting stupid and unnecessary battles that shouldn't have existed in
the first place. 

For example, the UHCI root hub can be controlled without DMA, which
makes it a lot cheaper on the system. When a UHCI system is unconnected
and idle, it doesn't waste cycles on extra memory traffic the way OHCI
does.

UHCI also requires fewer transistors, and is the more common by far
simply because Intel is good at getting their chipsets out.

Basically, the advantages of OHCI are not worth the differentiation, and
are not always advantages at all.  Many people think that it is "good"
that the root hub looks more like a regular hub, but that's just wrong. 

Especially with faster speeds, the memory pressure of the USB controller
is going to be noticeable, and it would be much preferable if the root
directory of the USB tree would be separated out (and cached in the
controller) by the root hub.  The UHCI approach of making the root a bit
special should be taken _further_, and not seen as a mistake. 

I hope EHCI makes it all moot. Some way or another.

Linus



Re: Freeze on FPU exception with Athlon

2000-11-18 Thread Linus Torvalds



On Sat, 18 Nov 2000, Markus Schoder wrote:
 
> Your test program is indeed sufficient to trigger the
> freeze.  Unfortunately the patch does not make a
> difference :(

Ok.

This may in fact be an Athlon CPU bug. But before we contact anybody from
AMD, I'd really need to know what the result from the irq13 disabling and
the non-3dnow thing is.

Considering that Udo reports no lockup at all with the same test-program
even with an Athlon and 3dnow, it looks like it's either irq13 (and a
motherboard routing issue: sane modern motherboards shouldn't even route
the external FERR at _all_ any more), or something stepping-specific on
your Athlon. It doesn't sound kernel-related per se.

Let's hope it's irq13. If so, it will be easy to fix (tentative fix: any
CPU that reports a built-in FPU just doesn't get irq13 enabled at all).

Current working theory:

 - Athlons do FERR wrong. They drive FERR externally when the
   unmasked exception happens, rather than when the next FP instruction
   actually detects the exception. This means that the external FERR irq13
   actually happens _before_ the internal exception 16, which is wrong.

 - Linux has seen exception 16 working, so it ignores irq13 and assumes
   that it's some real external device (which does happen - sometimes
   SCI is wired to irq13).

 - irq13 is not only wired on the motherboard (which was right in 1989,
   but is not right in 2000), but is marked level-triggered (which
   probably wasn't right even in 1989). So when the irq13 happens, it
   _keeps_ on happening, and we never get an exception 16 at all.

The reason 2.2.x works on your machine might be that the early bootup test
for FP exceptions will have done something to mask the fpu exception just
by luck. I forget the exact details of the test - it got removed in later
kernels because it made it really nasty to handle XMM faults correctly.

Does anybody have any better ideas? 

Linus






Re: Freeze on FPU exception with Athlon

2000-11-18 Thread Linus Torvalds



On Sat, 18 Nov 2000, adrian wrote:

 
 
> On Sat, 18 Nov 2000, Linus Torvalds wrote:
>
> > There's almost certainly more than that. I'd love to have a report on my
> > asm-only version, but even so I suspect it also requires the 3dnow stuff,
>
> I tried all three versions, and no freezes.  I forgot to mention the tests
> were run on a model 2 Athlon (original slot K7, .18 micron).  The kernel
> is compiled with 3dnow support.

Apparently it isn't the stepping, as we have Athlon model 4's both showing
it and not showing it. The motherboard seems to be the only real
difference here, which is why I like the irq13 explanation more and more.

I've been wanting to get rid of irq13 anyway (some boards wire up USB
and/or ACPI to irq13 and the fact that the FPU has claimed it makes those
machines unhappy), so if the solution is to only check for irq13 on old
i386 and i486sx machines and just leave it alone for newer CPU's, I won't
complain.
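
The policy described above can be sketched as a single predicate. This is only an illustration under stated assumptions: `struct cpu_desc`, its field, the two sample descriptions and `should_claim_irq13()` are hypothetical names, not kernel code, and the real decision would sit in the i386 setup path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical CPU description; not a kernel structure. */
struct cpu_desc {
	bool has_builtin_fpu;	/* FPU on the CPU itself, errors arrive as exception 16 */
};

/* Illustrative samples: an i386 with an external 387, and a modern CPU. */
static const struct cpu_desc sample_i386_with_387 = { .has_builtin_fpu = false };
static const struct cpu_desc sample_athlon = { .has_builtin_fpu = true };

/*
 * Sketch of the tentative fix: only CPUs without a built-in FPU need the
 * external FERR -> irq13 path; everything newer leaves irq13 alone so that
 * other devices wired to it can use it.
 */
static bool should_claim_irq13(const struct cpu_desc *cpu)
{
	assert(cpu != NULL);
	return !cpu->has_builtin_fpu;
}
```

With this rule a board that wires USB or ACPI to irq13 would no longer find the line already claimed by the FPU on a CPU that never needs it.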

Markus, can you make the irq13 test the first thing - don't worry about
3dnow as that seems to not be a deciding factor..

Linus




Re: [PATCH] semaphore fairness patch against test11-pre6

2000-11-18 Thread Linus Torvalds



On Sun, 19 Nov 2000, Andrew Morton wrote:
 
> Has anyone tried it on SMP?  I get fairly repeatable instances of immortal
> `D'-state processes with this patch.

Too bad. I really thought it should be safe to do.

> The patch isn't right - it allows `sleepers' to increase without bound.
> But it's a boolean!

It's not a boolean. It's really a "bias count". It happens to get only the
values 0 and 1 simply because the logic is that we always account for all
the other people when any process goes to sleep, so "sleepers" only ever
counts the one process that went to sleep last. 
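
A single-threaded model can make the "bias count" reading concrete. This is a sketch under assumptions: the loop shape (fold `sleepers - 1` back into the count, then reset `sleepers` to 0 or 1) is assumed to match the 2.4-era `__down()` slow path, and `model_sem`, `model_down_fast()`, `model_slow_enter()`, `model_slow_pass()`, `model_up()` and `model_demo()` are illustrative names, not kernel functions. No locking, wait queues or scheduling are modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: just the two counters. */
struct model_sem {
	int count;	/* 1 = free, 0 = held, negative = held with waiters */
	int sleepers;	/* bias: slow-path decrements not yet folded back */
};

/* Fast path of down(): true if the semaphore was taken uncontended. */
static bool model_down_fast(struct model_sem *s)
{
	return --s->count >= 0;
}

/* First entry into the slow path: account for ourselves. */
static void model_slow_enter(struct model_sem *s)
{
	s->sleepers++;
}

/* One pass of the slow-path loop: true if the caller now owns the semaphore. */
static bool model_slow_pass(struct model_sem *s)
{
	s->count += s->sleepers - 1;	/* fold "everybody else" back in */
	if (s->count >= 0) {
		s->sleepers = 0;
		return true;
	}
	s->sleepers = 1;		/* only the last sleeper stays accounted for */
	return false;
}

/* up(): true if a sleeper needs to be woken. */
static bool model_up(struct model_sem *s)
{
	return ++s->count <= 0;
}

/* Tasks A, B and C contend for a mutex-style semaphore; returns the final count. */
static int model_demo(void)
{
	struct model_sem s = { .count = 1, .sleepers = 0 };

	model_down_fast(&s);		/* A takes it */
	assert(s.count == 0);

	model_down_fast(&s);		/* B contends and goes to sleep */
	model_slow_enter(&s);
	model_slow_pass(&s);
	assert(s.count == -1 && s.sleepers == 1);

	model_down_fast(&s);		/* C contends: sleepers is briefly 2 */
	model_slow_enter(&s);
	model_slow_pass(&s);
	assert(s.count == -1 && s.sleepers == 1);

	model_up(&s);			/* A releases */
	model_slow_pass(&s);		/* B wakes up and takes it */
	assert(s.count == 0 && s.sleepers == 0);
	model_slow_pass(&s);		/* C, woken in turn, has to sleep again */
	assert(s.count == -1 && s.sleepers == 1);

	model_up(&s);			/* B releases */
	model_slow_pass(&s);		/* C takes it */
	model_up(&s);			/* C releases, nobody is waiting */
	return s.count;
}
```

In this model `sleepers` is 2 only between `model_slow_enter()` and the next `model_slow_pass()`; at rest it is 0 or 1, which is why it can look like a boolean even though the arithmetic treats it as a count.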

But the algorithm itself should allow for other values. In fact, I think
that you'll find that it works fine if you switch to non-exclusive
wait-queues, and the only reason you see the repeatable D states is
exactly the case where we didn't "take" the semaphore even though we were
awake, and that basically makes us an exclusive process that didn't react
to an exclusive wakeup.

(Think of it this way: with the "inside" patch, the process does

	tsk->state = TASK_INTERRUPTIBLE;

twice, even though there was only one semaphore that woke it up: we
basically "lost" a wakeup event, not because "sleepers" cannot be 2, but
because we didn't pick up the wakeup that we might have gotten.)

Instead of the "goto inside", how about just doing it without the "double
sleep", and doing something like

	tsk->state = TASK_INTERRUPTIBLE;
	add_wait_queue_exclusive(&sem->wait, &wait);

	spin_lock_irq(&semaphore_lock);
	sem->sleepers ++;
+	if (sem->sleepers > 1) {
+		spin_unlock_irq(&semaphore_lock);
+		schedule();
+		spin_lock_irq(&semaphore_lock);
+	}
	for (;;) {

The only difference between the above and the "goto inside" variant is
really that the above sets "tsk->state = TASK_INTERRUPTIBLE;" just once
per loop, not twice as the "inside" case did. So if we happened to get an
exclusive wakeup at just the right point, we won't go to sleep again and
miss it.

But these things are very subtle. The current semaphore algorithm was
basically perfected over a week of some serious thinking. The fairness
change should similarly get a _lot_ of attention. It's way too easy to
miss things.

Does the above work for you even in SMP?

Linus




Re: [PATCH] pcmcia event thread. (fwd)

2000-11-18 Thread Linus Torvalds



On Sat, 18 Nov 2000, David Ford wrote:
> Linus Torvalds wrote:
>
> > Can you (you've probably done this before, but anyway) enable DEBUG in
> > arch/i386/kernel/pci-i386.h? I wonder if the kernel for some strange
> > reason doesn't find your router, even though "dump_pirq" obviously does..
> > If there's something wrong with the checksumming for example..
>
> ..building now.

Actually, try this patch first. It adds the PCI_DEVICE_ID_INTEL_82371MX
router type, and also makes the PCI router search fall back more
gracefully on the device it actually found if there is not an exact match
on the "compatible router" entry...

It should make Linux find and accept the chip you have. Knock wood.

Linus

--- v2.4.0-test10/linux/arch/i386/kernel/pci-irq.c	Tue Oct 31 12:42:26 2000
+++ linux/arch/i386/kernel/pci-irq.c	Sat Nov 18 21:11:19 2000
@@ -283,12 +297,19 @@
 	{ "PIIX", PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82371FB_0, pirq_piix_get, pirq_piix_set },
 	{ "PIIX", PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82371SB_0, pirq_piix_get, pirq_piix_set },
 	{ "PIIX", PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82371AB_0, pirq_piix_get, pirq_piix_set },
+	{ "PIIX", PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82371MX,   pirq_piix_get, pirq_piix_set },
 	{ "PIIX", PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82443MX_0, pirq_piix_get, pirq_piix_set },
+
 	{ "ALI", PCI_VENDOR_ID_AL, PCI_DEVICE_ID_AL_M1533, pirq_ali_get, pirq_ali_set },
+
 	{ "VIA", PCI_VENDOR_ID_VIA, PCI_DEVICE_ID_VIA_82C586_0, pirq_via_get, pirq_via_set },
 	{ "VIA", PCI_VENDOR_ID_VIA, PCI_DEVICE_ID_VIA_82C596, pirq_via_get, pirq_via_set },
 	{ "VIA", PCI_VENDOR_ID_VIA, PCI_DEVICE_ID_VIA_82C686, pirq_via_get, pirq_via_set },
+
 	{ "OPTI", PCI_VENDOR_ID_OPTI, PCI_DEVICE_ID_OPTI_82C700, pirq_opti_get, pirq_opti_set },
+
+	{ "NatSemi", PCI_VENDOR_ID_CYRIX, PCI_DEVICE_ID_CYRIX_5520, pirq_cyrix_get, pirq_cyrix_set },
+
 	{ "default", 0, 0, NULL, NULL }
 };
 
@@ -298,7 +319,6 @@
 static void __init pirq_find_router(void)
 {
 	struct irq_routing_table *rt = pirq_table;
-	u16 rvendor, rdevice;
 	struct irq_router *r;
 
 #ifdef CONFIG_PCI_BIOS
@@ -308,32 +328,31 @@
 		return;
 	}
 #endif
-	if (!(pirq_router_dev = pci_find_slot(rt->rtr_bus, rt->rtr_devfn))) {
+	/* fall back to default router if nothing else found */
+	pirq_router = pirq_routers + sizeof(pirq_routers) / sizeof(pirq_routers[0]) - 1;
+
+	pirq_router_dev = pci_find_slot(rt->rtr_bus, rt->rtr_devfn);
+	if (!pirq_router_dev) {
 		DBG("PCI: Interrupt router not found at %02x:%02x\n", rt->rtr_bus, rt->rtr_devfn);
-		/* fall back to default router */
-		pirq_router = pirq_routers + sizeof(pirq_routers) / sizeof(pirq_routers[0]) - 1;
 		return;
 	}
-	if (rt->rtr_vendor) {
-		rvendor = rt->rtr_vendor;
-		rdevice = rt->rtr_device;
-	} else {
-		/*
-		 * Several BIOSes forget to set the router type. In such cases, we
-		 * use chip vendor/device. This doesn't guarantee us semantics of
-		 * PIRQ values, but was found to work in practice and it's still
-		 * better than not trying.
-		 */
-		DBG("PCI: Guessed interrupt router ID from %s\n", pirq_router_dev->slot_name);
-		rvendor = pirq_router_dev->vendor;
-		rdevice = pirq_router_dev->device;
-	}
-	for(r=pirq_routers; r->vendor; r++)
-		if (r->vendor == rvendor && r->device == rdevice)
+
+	for(r=pirq_routers; r->vendor; r++) {
+		/* Exact match against router table entry? Use it! */
+		if (r->vendor == rt->rtr_vendor && r->device == rt->rtr_device) {
+			pirq_router = r;
 			break;
-	pirq_router = r;
-	printk("PCI: Using IRQ router %s [%04x/%04x] at %s\n", r->name,
-	       rvendor, rdevice, pirq_router_dev->slot_name);
+		}
+		/* Match against router device entry? Use it as a fallback */
+		if (r->vendor == pirq_router_dev->vendor && r->device == pirq_router_dev->device) {
+			pirq_router = r;
+		}
+	}
+	printk("PCI: Using IRQ router %s [%04x/%04x] at %s\n",
+		pirq_router->name,
+		pirq_router_dev->vendor,
+		pirq_router_dev->device,
+		pirq_router_dev->slot_name);
 }
 
 static struct irq_info *pirq_get_info(struct pci_dev *dev, int pin)
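
The lookup order the patch introduces (exact match on the BIOS "compatible router" IDs, then the IDs of the device actually found, then the default entry) can be tried outside the kernel. The following is a minimal sketch, assuming a trimmed-down table: `struct router_entry`, `sample_routers` and `pick_router()` are hypothetical stand-ins for the kernel structures, and the numeric IDs are there only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for a router table entry. */
struct router_entry {
	const char *name;
	unsigned short vendor, device;
};

/* Illustrative table; the last entry (vendor 0) is the default. */
static const struct router_entry sample_routers[] = {
	{ "PIIX",    0x8086, 0x7110 },
	{ "VIA",     0x1106, 0x0586 },
	{ "default", 0,      0      },
};
#define SAMPLE_COUNT (sizeof(sample_routers) / sizeof(sample_routers[0]))

/*
 * An exact match on the BIOS-reported IDs wins immediately. A match on the
 * IDs of the device that was actually found is remembered as a fallback
 * while the scan continues. If neither matches, the default entry is used.
 */
static const struct router_entry *
pick_router(const struct router_entry *table, size_t n,
	    unsigned short bios_vendor, unsigned short bios_device,
	    unsigned short dev_vendor, unsigned short dev_device)
{
	const struct router_entry *chosen;
	const struct router_entry *r;

	assert(table != NULL && n > 0);
	chosen = &table[n - 1];			/* default entry */
	for (r = table; r->vendor; r++) {
		if (r->vendor == bios_vendor && r->device == bios_device)
			return r;		/* exact match */
		if (r->vendor == dev_vendor && r->device == dev_device)
			chosen = r;		/* fallback, keep looking */
	}
	return chosen;
}
```

With a BIOS table that names a "compatible router" missing from the table (0x8086/0x1234 in the `dump_pirq` output quoted earlier in the thread) but a device that is a known PIIX, the fallback branch selects the PIIX entry instead of giving up.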




Re: [PATCH] semaphore fairness patch against test11-pre6

2000-11-19 Thread Linus Torvalds



On Sun, 19 Nov 2000, Andrew Morton wrote:
 
> I don't see a path where David's patch can cause a lost wakeup in the
> way you describe.

Basically, if there are two up() calls, they might end up waking up only
one process, because the same process goes to sleep twice. That's wrong.
It should wake up two processes.

However, thinking about it more, that's obviously possible only for
semaphores that are used for more than just mutual exclusion, and
basically nobody does that anyway. 

> Next step is to move the waitqueue and wakeup operations so they're
> inside the spinlock.  Nope.  That doesn't work either.
>
> Next step is to throw away the semaphore_lock and use the sem->wait
> lock instead.  That _does_ work.  This is probably just a
> fluke - it synchronises the waker with the sleepers and we get lucky.

Yes, especially on a two-cpu machine that kind of synchronization can
basically end up hiding real bugs.

I'll think about this some more. One thing I noticed is that the
"wake_up(&sem->wait);" at the end of __down() is kind of bogus: we don't
actually want to wake anybody up at that point at all, it's just that if
we don't wake anybody up we'll end up having "sem = 0, sleeper = 0", and
when we unlock the semaphore the "__up()" logic won't trigger, and we
won't ever wake anybody up. That's just incredibly bogus.

Instead of the "wake_up()" at the end of __down(), we should have
something like this at the end of __down() instead:

		... for-loop ...
		}
		tsk->state = TASK_RUNNING;
		remove_wait_queue(&sem->wait, &wait);

		/* If there are others, mark the semaphore active */
		if (wait_queue_active(&sem->wait)) {
			atomic_dec(&sem->count);
			sem->sleepers = 1;
		}
		spin_unlock_irq(&semaphore_lock);
	}

which would avoid an unnecessary reschedule, and cause the wakeup to
happen at the proper point, namely "__up()" when we release the
semaphore.

I suspect this may be part of the trouble with the "sleepers" count
playing: we had these magic rules that I know I thought about when the
code was written, but that aren't really part of the "real" rules.

Linus




Re: Linux 2.4.0-test11

2000-11-19 Thread Linus Torvalds



On Sun, 19 Nov 2000, Rich Baum wrote:

> The patch is in the v2.3 directory.  You may want to move it to the
> v2.4 directory so people can find it easier.

Oops. Thanks. Done.

Linus




Re: i386 cleanups

2001-04-18 Thread Linus Torvalds



On Tue, 17 Apr 2001, Pavel Machek wrote:

> These are tiny cleanups you might like. sizes are "logically"
> long.

No. Sizes are not "logical". They are whatever you decide they are, ie
it's purely a compiler convention.

At least earlier, size_t was defined as "unsigned int" in user mode, and
doing anything else would make gcc complain about clashes with its
compiled-in __builtin_size_t that it uses for the builtin prototypes (ie
if you had a declaration for "void *memcpy(void *dest, const void *src,
size_t n);" and your size_t didn't match the gcc builtin_size_t, you'd get
a "redefined with different arguments" warning or something).
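
The point about the compiler's built-in idea of `size_t` can be checked directly. The sketch below assumes GCC or Clang, which expose that type as the predefined macro `__SIZE_TYPE__`; `builtin_size_t` and `size_t_matches_builtin()` are names made up for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* GCC and Clang predefine __SIZE_TYPE__ as the type their builtins use. */
typedef __SIZE_TYPE__ builtin_size_t;

static_assert(sizeof(builtin_size_t) == sizeof(size_t),
	      "library size_t and compiler size_t differ in width");

/*
 * This declaration agrees with the compiler's builtin prototype, so it is
 * accepted quietly. Declaring the last parameter with a different integer
 * type is what would provoke the kind of complaint described above.
 */
void *memcpy(void *dest, const void *src, builtin_size_t n);

/* Returns 1 when size_t and the compiler's type are the very same type. */
static int size_t_matches_builtin(void)
{
	return _Generic((size_t)0, builtin_size_t: 1, default: 0);
}
```

On a platform where "unsigned int" and "unsigned long" have the same width, the `sizeof` check passes for either choice while the `_Generic` check still tells them apart, which is the sense in which the choice is a convention rather than something "logical".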

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-18 Thread Linus Torvalds


[ Cc'd to linux-kernel, to get feedback etc. I've already talked this over
  with some people a long time ago, but more people might get interested ]

On Tue, 17 Apr 2001, Mike Kravetz wrote:

> In the near future, I should have some time to begin
> working on a prototype implementation.  One thing that
> I don't remember too clearly is a reference you made to
> the System V semaphore implementation.  I'm pretty sure
> you indicated any new light weight implementation should
> not be based on the System V APIs.  Is this correct, or
> did I remember incorrectly?

It's correct. I don't see any way the kernel can do the SysV semantics for
"cleanup" for a semaphore when a process dies in an uncontrolled manner
(or do it fast enough even when it can use at_exit() etc). The whole point
of fast semaphores would be to avoid the kernel entry entirely for the
non-contention case, which basically means that the kernel doesn't even
_know_ who holds the semaphore at any given moment. So the kernel cannot
do the cleanups on process exit that are part of the SysV semantics.

My personal absolute favourite "fast semaphore" implementation is as
follows. First the user interface, just to make it clear that the
implementation is very far from the interface:

/*
 * a fast semaphore is a 128-byte opaque thing,
 * aligned on a 128-byte boundary. This is partly
 * to minimize false sharing in the L1 (we assume
 * that 128-byte cache-lines are going to be fairly
 * common), but also to allow the kernel to hide
 * data there
 */
struct fast_semaphore {
	unsigned int opaque[32];
} __attribute__((aligned(128)));

struct fast_semaphore *FS_create(char *ID);
int FS_down(struct fast_semaphore *, unsigned long timeout);
void FS_up(struct fast_semaphore *);

would basically be the interface. People would not need to know what the
implementation is like. Add to taste (ie make rw-semaphores, etc), but the
above is a kind of "fairly minimal thing". So "trydown()" would just be a
FS_down() with a zero timeout, for example.

Anyway, the implementation would be roughly:

 - FS_create is responsible for allocating a shared memory region
   at "FS_create()" time. This is what the ID is there for: an "anonymous"
   semaphore would have an ID of NULL, and could only be used by threads
   or across a fork(): it would basically be done with a MAP_ANON |
   MAP_SHARED, and the pointer returned would just be a pointer to that
   memory.

   So FS_create() starts out by allocating the backing store for the
   semaphore. This can basically be done in user space, although the
   kernel does need to get involved for the second part of it, which
   is to (a) allocate a kernel "backing store" thing that contains the
   waiters and the wait-queues for other processes and (b) fill in the
   opaque 128-byte area with the initial count AND the magic to make it
   fly. More on the magic later.

   So the second part of FS_create needs a new system call.

 - FS_down() and FS_up() would be two parts: the fast case (no
   contention), very similar to what the Linux kernel itself uses. And the
   slow case (contention), which ends up being a system call. You'd have
   something like this on x86 in user space:

	extern void FS_down(struct fast_semaphore *fs,
		unsigned long timeout) __attribute__((regparm(3)));

	/* Four-instruction fast-path: the call plus these ones */
	FS_down:
		lock ; decl (%edx)
		js FS_down_contention
		ret
	FS_down_contention:
		movl $FS_down_contention_syscall,%eax
		int $0x80
		ret

   (Note: the regparm(3) thing makes the arguments be passed in %edx and
   %ecx - check me on details in which order, and realize that they will
   show up as arguments to the system call too because the x86 system call
   interface is already register-based)

   FS_up() does the same - see how the kernel already knows to avoid doing
   the wakeup if there has been no contention, and has a fast-path that
   never goes out-of-line (ie the kernel semaphore out-of-line case is the
   user-level system call case).
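
The two pieces described in the list above (an anonymous semaphore backed by a shared mapping, and a fast path that only leaves user space on contention) can be sketched in portable C. This is an illustration under stated assumptions, not the proposed implementation: it uses C11 atomics instead of hand-written x86 assembly, keeps the count in the first word of the opaque area, and replaces the contention system calls with counting stubs. Every `fs_*` name below is hypothetical.

```c
#define _DEFAULT_SOURCE		/* for MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/mman.h>

/* 128 bytes, 128-byte aligned; this sketch only uses the first word. */
struct fast_semaphore {
	atomic_int count;
	unsigned int opaque[31];
} __attribute__((aligned(128)));

static_assert(sizeof(struct fast_semaphore) == 128, "unexpected layout");

/* Stand-ins for the contention system calls: they only count invocations. */
static int fs_slow_down_calls, fs_slow_up_calls;
static void fs_down_contention(struct fast_semaphore *fs) { (void)fs; fs_slow_down_calls++; }
static void fs_up_contention(struct fast_semaphore *fs) { (void)fs; fs_slow_up_calls++; }

/* Anonymous semaphore: usable by threads or across fork(). */
static struct fast_semaphore *fs_create_anon(int initial)
{
	void *p = mmap(NULL, sizeof(struct fast_semaphore),
		       PROT_READ | PROT_WRITE,
		       MAP_ANONYMOUS | MAP_SHARED, -1, 0);
	if (p == MAP_FAILED)
		return NULL;
	struct fast_semaphore *fs = p;
	atomic_init(&fs->count, initial);
	return fs;
}

static void fs_down(struct fast_semaphore *fs)
{
	/* fetch_sub returns the old value: old <= 0 means we went negative. */
	if (atomic_fetch_sub(&fs->count, 1) <= 0)
		fs_down_contention(fs);
}

static void fs_up(struct fast_semaphore *fs)
{
	/* old < 0 means somebody went negative and may be waiting. */
	if (atomic_fetch_add(&fs->count, 1) < 0)
		fs_up_contention(fs);
}

/* Two downs and two ups on a count of 1: one contended down, one wakeup. */
static int fs_demo(void)
{
	struct fast_semaphore *fs = fs_create_anon(1);

	assert(fs != NULL);
	fs_slow_down_calls = fs_slow_up_calls = 0;
	fs_down(fs);	/* uncontended */
	fs_down(fs);	/* contended */
	fs_up(fs);	/* must wake the contended waiter */
	fs_up(fs);	/* nobody left to wake */
	munmap(fs, sizeof(*fs));
	return fs_slow_down_calls * 10 + fs_slow_up_calls;
}
```

The uncontended `fs_down()`/`fs_up()` pair never reaches the stubs, which is the property the design is after. How the kernel would get from the untrusted user pointer to its own wait-queue state is not modeled here.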

So now we get to the "subtle" part. Getting contention right. The above
causes us to get to the kernel when we have contention, and the kernel
gets only a pointer to user space. In particular, it gets a pointer to
memory that it cannot trust, and from that _untrusted_ pointer it needs to
quickly get to the _trusted_ part, ie the part that only the kernel itself
controls (the stuff with the wait-queues etc). This is where subtlety is
needed.

The speed concerns are paramount: I am convinced that the non-contention
case is the important one, but at the same time we can't allow contention
to be _too_ costly either. The system call is fairly cheap (and already
acts as a first-level back-off, so that's ok), but we can't afford to

Re: light weight user level semaphores

2001-04-19 Thread Linus Torvalds



On Thu, 19 Apr 2001, Alon Ziv wrote:

> * the userspace struct was just a signed count and a file handle.

The main reason I wanted to avoid a filehandle is just because it's
another name space that people already use, and that people know what the
semantics are for (ie "open()" is _defined_ to return the "lowest
available file descriptor", and people depend on that).

So if you use a file handle, you'd need to do magic - open it, and then
use dup2() to move it up high, or something. Which has its own set of
problems: just _how_ high would you move it? Would it potentially disturb
an application that opens thousands of files, and knows that they get
consecutive file descriptors? Which is _legal_ and well-defined in UNIX.
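
Both halves of that argument can be demonstrated in a few lines. This is a sketch with assumptions spelled out: it opens "/dev/null" purely as a convenient file, it is single-threaded, and it uses fcntl() with F_DUPFD rather than the dup2() mentioned above, because F_DUPFD picks the lowest free descriptor at or above a minimum instead of overwriting a fixed slot. The function names are made up for the example.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if open() hands back the descriptor number that was just closed. */
static int lowest_fd_is_reused(void)
{
	int a = open("/dev/null", O_RDONLY);
	int b = open("/dev/null", O_RDONLY);

	assert(a >= 0 && b > a);
	close(a);			/* leaves a hole below b */
	int c = open("/dev/null", O_RDONLY);
	int reused = (c == a);
	close(b);
	close(c);
	return reused;
}

/* Move fd to the lowest free slot >= min_fd; returns the new fd or -1. */
static int move_fd_high(int fd, int min_fd)
{
	int high = fcntl(fd, F_DUPFD, min_fd);

	if (high >= 0)
		close(fd);
	return high;
}

/* Returns 1 if a freshly opened descriptor could be moved to 64 or above. */
static int moved_fd_is_high(void)
{
	int fd = open("/dev/null", O_RDONLY);

	assert(fd >= 0);
	int high = move_fd_high(fd, 64);
	int ok = (high >= 64);
	if (high >= 0)
		close(high);
	return ok;
}
```

The constant 64 is arbitrary, which is exactly the problem raised above: whatever threshold a library picks, some application may legitimately open enough files to reach it.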

However, I'm not married to the secure hash version - you could certainly
use another name-space, and something more akin to file descriptors. You
should be aware of issues like the above, though. Maybe it would be ok to
say "if you use fast semaphores, they use file descriptors and you should
no longer depend on consecutive fd's".

But note how that might make it really nasty for things like libraries:
can libraries use fast semaphores behind the back of the user? They might
well want to use the semaphores exactly for things like memory allocator
locking etc. But libc certainly can't use fd's behind people's backs.

So personally, I actually think that you must _not_ use file descriptors.
But that doesn't mean that you couldn't have a more "file-descriptor-like"
approach.

Side note: the design _should_ allow for "lazy initialization". In
particular, it should be ok for FS_create() to not actually do a system
call at all, but just initialize the count and set a "uninitialized" flag.
And then the actual initialization would be done at "FS_down()" time, and
only if contention happens.

Why? Note that there are many cases where contention simply _cannot_
happen. The classic one is a thread-safe library that is used both by
threaded applications and by single-threaded ones, where the
single-threaded one would never actually trigger contention.
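
A minimal sketch of that lazy scheme, assuming a user-space structure with a count and an "initialized" flag. All `lazy_*` names are hypothetical, the kernel registration is simulated by a counter, and the flag update is not made thread-safe here, which a real version would have to do.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct lazy_sem {
	atomic_int count;
	bool kernel_ready;	/* hypothetical "initialized" flag */
};

static int lazy_init_calls;	/* counts simulated kernel registrations */

/* Creation touches only user memory: no system call is made. */
static void lazy_sem_create(struct lazy_sem *s, int initial)
{
	assert(s != NULL);
	atomic_init(&s->count, initial);
	s->kernel_ready = false;
}

/* Slow path: register with the kernel the first time contention happens. */
static void lazy_sem_contended(struct lazy_sem *s)
{
	if (!s->kernel_ready) {
		lazy_init_calls++;	/* stands in for the real system call */
		s->kernel_ready = true;
	}
	/* a real implementation would now block in the kernel */
}

static void lazy_sem_down(struct lazy_sem *s)
{
	if (atomic_fetch_sub(&s->count, 1) <= 0)
		lazy_sem_contended(s);
}

static void lazy_sem_up(struct lazy_sem *s)
{
	atomic_fetch_add(&s->count, 1);	/* wakeup path omitted in this sketch */
}

/* A single-threaded user: many lock/unlock pairs, never any contention. */
static int lazy_demo_uncontended(void)
{
	struct lazy_sem s;

	lazy_init_calls = 0;
	lazy_sem_create(&s, 1);
	for (int i = 0; i < 1000; i++) {
		lazy_sem_down(&s);
		lazy_sem_up(&s);
	}
	return lazy_init_calls;
}
```

The single-threaded caller pays only for the atomic operations; the simulated kernel registration never runs.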

For these kinds of reasons it would actually be best to try to
abstract the interfaces (notably the system call interface) as much as
possible, so that you can change the implementation inside the kernel
without having to recompile applications that use it. So the sanest
implementation might be one where

 - FS_create is a system call that just gets a 128-byte area and an ID.
 - the contention cases are plain system calls with no user-mode part to
   them at all.

This allows people to modify the behaviour of the semaphores later,
_without_ having any real coupling between user-mode expectations and
kernel implementation.

For example, if the user-mode library actually does a physical "open()" or
plays games with file descriptors itself, we will -always- be stuck with
the fd approach, and we can never fix it. But if you have opaque system
calls, you might start out with a system call that internally just does the
equivalent of the "open a file descriptor and hide it in the semaphore",
and later on the thing can be changed to do whatever else without the user
program ever even realizing..

Linus




Re: light weight user level semaphores

2001-04-19 Thread Linus Torvalds



On Thu, 19 Apr 2001, Abramo Bagnara wrote:

  [ Using file descriptors ]

 This would also permit:
 - to have poll()
 - to use mmap() to obtain the userspace area

 It would become something very near to sacred Unix dogmas ;-)

No, this is NOT what the UNIX dogmas are all about.

When UNIX says "everything is a file", it really means that "everything is
a stream of bytes". Things like magic operations on file descriptors are
_anathema_ to UNIX. ioctl() is the worst wart of UNIX. Having magic
semantics of file descriptors is NOT Unix dogma at all, it is a horrible
corruption of the original UNIX cleanliness.

Please don't excuse "semaphore file descriptors" with the "everything is a
file" mantra. It is not at ALL applicable.

The "everything is a file" mantra is to make pipe etc meaningful -
processes don't have to worry about whether the fd they have is from a
file open, a pipe() system call, opening a special block device, or a
socket()+connect() thing. They can just read and write. THAT is what UNIX
is all about.
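
As an illustration of "they can just read and write", here is a small sketch using only POSIX read() and write(). The helper name `copy_fd()` is made up for this example; the point is that it never asks what kind of object is behind either descriptor:

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Copy bytes from one descriptor to another until end-of-file.
 * Works the same for files, pipes, sockets or character devices.
 * Returns the number of bytes copied, or -1 on error. */
static ssize_t copy_fd(int in, int out)
{
	char buf[4096];
	ssize_t total = 0;

	assert(in >= 0 && out >= 0);
	for (;;) {
		ssize_t n = read(in, buf, sizeof(buf));
		if (n == 0)
			return total;           /* end of stream */
		if (n < 0) {
			if (errno == EINTR)
				continue;       /* interrupted, retry */
			return -1;
		}
		/* write() may be partial, so loop until the chunk is out */
		for (ssize_t off = 0; off < n; ) {
			ssize_t w = write(out, buf + off, (size_t)(n - off));
			if (w < 0) {
				if (errno == EINTR)
					continue;
				return -1;
			}
			off += w;
		}
		total += n;
	}
}
```

A semaphore descriptor handed to a helper like this would have nothing sensible to deliver, which is the distinction being drawn.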

And this is obviously NOT true of "magic file descriptors for
semaphores". You can't pass it off as stdin to another process and expect
anything useful from it unless the other process _knows_ it is a special
semaphore thing and does mmap magic or something.

The greatness of UNIX comes from "everything is a stream of bytes". That's
something that almost nobody got right before UNIX. Remember VMS
structured files? Did anybody ever realize what an absolutely _idiotic_
crock the NT "CopyFile()" thing is for the same reason?

Don't confuse that with "everything should be a file descriptor". The two
have nothing to do with each other.

Linus




Re: light weight user level semaphores

2001-04-19 Thread Linus Torvalds



On Thu, 19 Apr 2001, Alan Cox wrote:
  can libraries use fast semaphores behind the back of the user? They might
  well want to use the semaphores exactly for things like memory allocator
  locking etc. But libc certainly cant use fd's behind peoples backs.

 libc is entitled to, and most definitely does exactly that. Take a look at
 things like gethostent, getpwent etc etc.

Ehh.. I will bet you $10 USD that if libc allocates the next file
descriptor on the first "malloc()" in user space (in order to use the
semaphores for mm protection), programs _will_ break.

You want to take the bet?

Linus




Re: light weight user level semaphores

2001-04-19 Thread Linus Torvalds



On Thu, 19 Apr 2001, Alexander Viro wrote:

 Ehh... Non-lazy variant is just read() and write() as down_failed() and
 up_wakeup() Lazy... How about

Looks good to me. Anybody want to try this out and test some benchmarks?

There may be problems with large numbers of semaphores, but hopefully that
won't be an issue. And the ability to select/poll on these things might
come in handy for various implementation issues (ie locks with timeouts
etc).

Linus




Re: Children first in fork

2001-04-19 Thread Linus Torvalds

In article 9bn3sr$fer$[EMAIL PROTECTED],
Wichert Akkerman [EMAIL PROTECTED] wrote:

 What you can do is what strace does: insert a loop instruction after
 the fork or clone call and remove that when the call returns.

You're probably even better off just intercepting the fork, turning it
into a clone, and setting the CLONE_PTRACE option. Which (together with
tracing the parent, which you will obviously be doing already in order
to do all this in the first place) will nicely cause the child to get an
automatic SIGSTOP _and_ be already traced.

Not that I've tested it myself.

Linus



Re: light weight user level semaphores

2001-04-19 Thread Linus Torvalds



On 19 Apr 2001, Ulrich Drepper wrote:

 Linus Torvalds [EMAIL PROTECTED] writes:

  Looks good to me. Anybody want to try this out and test some benchmarks?

 I fail to see how this works across processes.

It's up to FS_create() to create whatever shared mapping is needed.

For threads, you don't need anything special.

For fork()'d helper stuff, you'd use MAP_ANON | MAP_SHARED.

For execve(), you need shm shared memory or MAP_SHARED on a file.

It all depends on your needs.
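
For the fork() case, a minimal sketch of a `MAP_ANON | MAP_SHARED` region that stays shared between parent and child (on Linux `MAP_ANON` is the older spelling of `MAP_ANONYMOUS`). The helper name `value_written_by_child()` is hypothetical, and a plain int stands in for whatever semaphore state would live in the mapping:

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Map one shared anonymous int, let a forked child store into it, and
 * return what the parent then sees. Returns -1 on any error. */
static int value_written_by_child(int value)
{
	int *shared = mmap(NULL, sizeof(*shared), PROT_READ | PROT_WRITE,
			   MAP_ANONYMOUS | MAP_SHARED, -1, 0);
	if (shared == MAP_FAILED)
		return -1;
	*shared = 0;

	pid_t pid = fork();
	if (pid < 0) {
		munmap(shared, sizeof(*shared));
		return -1;
	}
	if (pid == 0) {
		*shared = value;        /* the store lands in the shared page */
		_exit(0);
	}

	int status;
	int seen = -1;
	if (waitpid(pid, &status, 0) == pid)
		seen = *shared;         /* parent observes the child's store */
	munmap(shared, sizeof(*shared));
	return seen;
}
```

With `MAP_PRIVATE` instead, the child's store would go to its own copy and the parent would still read 0. The mapping does not survive execve(), which is why that case needs shm or a file-backed `MAP_SHARED` mapping.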

Linus




Re: light weight user level semaphores

2001-04-19 Thread Linus Torvalds



On Thu, 19 Apr 2001, Ingo Oeser wrote:

 Are you sure, you can implement SMP-safe, atomic operations (which you need
 for all up()/down() in user space) WITHOUT using privileged
 instructions on ALL archs Linux supports?

Why do you care?

Sure, there are broken architectures out there. They'd need system calls.
They'd be slow. That's THEIR problem.

No sane architecture has this limitation.

Linus




Re: light weight user level semaphores

2001-04-19 Thread Linus Torvalds



On Thu, 19 Apr 2001, Ingo Oeser wrote:

 On Thu, Apr 19, 2001 at 09:11:56AM -0700, Linus Torvalds wrote:
  No, this is NOT what the UNIX dogmas are all about.
 
  When UNIX says "everything is a file", it really means that "everything is
  a stream of bytes". Things like magic operations on file descriptors are
  _anathema_ to UNIX. ioctl() is the worst wart of UNIX. Having magic
  semantics of file descriptors is NOT Unix dogma at all, it is a horrible
  corruption of the original UNIX cleanliness.

 Right. And on semaphores, this stream is exactly 0 bytes long.
 This is perfectly normal and can be handled by all applications
 I'm aware of.

It's perfectly normal, but it does NOT conform to the idea "everything is
a file".

The fact that there are other ugly examples (ioctls and special files)
does not mean that adding a new one is a good idea.

When people say "everything is a file", they mean that it can be _used_ as
a file, not that it can passably return a valid error code.

Linus




Re: light weight user level semaphores

2001-04-19 Thread Linus Torvalds



On 19 Apr 2001, Ulrich Drepper wrote:
 Linus Torvalds [EMAIL PROTECTED] writes:

   I fail to see how this works across processes.
 
  It's up to FS_create() to create whatever shared mapping is needed.

 No, the point is that FS_create is *not* the one creating the shared
 mapping.  The user is explicitly doing this her/himself.

No.

Who creates the shared mapping is _irrelevant_, because it ends up being
entirely a function of what the chosen interface is.

For example, quite often you want semaphores for threading purposes only,
and then you don't need a shared mapping at all. So you'd use the proper
interfaces for that, and for that, your "thread_semaphore()" function
would just do a malloc() and initialize the memory to zero. Doing a mmap
or something like that would just be stupid, because you're protecting
only one VM space anyway.

In other cases, you may need to have process-wide semaphores, and you'd
use "process_semaphore(char *ID)" or something, which actually does a
mmap() on a shared file. Or you'd have "fork_semaphore()" that creates a
semaphore that is valid across forks, but not valid across execve's and
cannot be passed around.

So normally the user does NOT create the shared mapping himself. Normally
you'd just use the "proper interface" for your needs, nothing more.

Sure, you can have the option of saying "I've created this shared memory
region, please make it use the generic semaphore engine code", but quite
frankly I think that is a BAD IDEA. Why? Because it won't work portably
across architectures anyway. You don't know what the requirements of the
architecture are, so it should be done by a nice "semaphore library". NOT
by the user.

Remember: these semaphores are NOT a new SysV bogosity. These semaphores
are a new interface, with sane performance and sane design. And you can
have multiple external interfaces to the same "semaphore engine".

I'm not interested in re-creating the idiocies of SysV IPC.

Linus




Re: Children first in fork

2001-04-20 Thread Linus Torvalds



On Fri, 20 Apr 2001, Mark Kettenis wrote:
I believe the 2.2.x behaviour was pretty much
 useless,

No. 2.2.x is not useless, it is apparently _buggy_ in this regard. Some of
the fixes in the 2.3.x timeframe seem to not have made it into 2.2.x.

Linus




Re: light weight user level semaphores

2001-04-20 Thread Linus Torvalds

In article [EMAIL PROTECTED],
Olaf Titz  [EMAIL PROTECTED] wrote:
  Ehh.. I will bet you $10 USD that if libc allocates the next file
  descriptor on the first "malloc()" in user space (in order to use the
  semaphores for mm protection), programs _will_ break.

 Of course, but this is a result from sloppy coding.

ABSOLUTELY NOT!

This is guaranteed behaviour of UNIX. You get file handles in order, or
you don't get them at all.

Sure, some library functions are allowed to use up file handles. But
most sure as hell are NOT.

 In general, open()
 can just return anything and about the only case where you can even
 think of ignoring its result is this:
   close(0); close(1); close(2);
   open("/dev/null", O_RDWR); dup(0); dup(0);

 Which is quite common to do.

Imagine a server that starts up another process, which does exactly
something like the above: the _usual_ execve() case looks something like

	pid = fork();
	if (!pid) {
		close(0);
		close(1);
		dup(pipe[0]);	/* input pipe */
		dup(pipe[1]);	/* output pipe */
		execve("child");
		exit(1);
	}

The above is absolutely _standard_ behaviour. It's required to work.

And btw, it's _still_ required to work even if there happens to be a
"malloc()" in between the close() and the dup() calls.

Trust me. You're arguing for clearly broken behaviour. malloc() and
friends MUST NOT open file descriptors. It _will_ break programs that
rely on traditional and documented features.
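
The guarantee being relied on is that open() and dup() return the lowest-numbered descriptor that is not currently open. A small sketch that checks this without touching stdin or stdout (the function name is made up, and it assumes no other thread is opening descriptors at the same time):

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if dup() handed back the descriptor number that was just
 * closed, 0 if it did not, and -1 if the setup itself failed. */
static int dup_reuses_lowest_free_fd(void)
{
	int a = open("/dev/null", O_RDONLY);
	if (a < 0)
		return -1;
	int b = open("/dev/null", O_RDONLY);
	if (b < 0) {
		close(a);
		return -1;
	}
	assert(b > a);          /* a was the lowest free slot when opened */

	close(a);               /* a is now the lowest free slot again */
	int c = dup(b);         /* must land exactly in that slot */
	int reused = (c == a);

	if (c >= 0)
		close(c);
	close(b);
	return reused;
}
```

If a library call sitting between the close() and the dup() quietly opened a descriptor of its own, `c` would come back as some other number, which is the breakage described above.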

Linus




Re: x86 rwsem in 2.4.4pre[234] are still buggy [was Re: rwsem benchmarks [Re: generic rwsem [Re: Alpha process table hang]]]

2001-04-20 Thread Linus Torvalds



On Fri, 20 Apr 2001, Andrea Arcangeli wrote:

 While dropping the list_empty check to speed up the fast path I faced the same
 complexity of the 2.4.4pre4 lib/rwsem.c and so before reinventing the wheel I
 read how the problem was solved in 2.4.4pre4.

I would suggest the following:

 - the generic semaphores should use the lock that already exists in the
   wait-queue as the semaphore spinlock.

 - the generic semaphores should _not_ drop the lock. Right now it drops
   the semaphore lock when it goes into the slow path, only to re-acquire
   it. This is due to bad interfacing with the generic slow-path routines.

   I suspect that this lock-drop is why Andrea sees problems with the
   generic semaphores. The changes to "count" and "sleeper" aren't
   actually atomic, because we don't hold the lock over them all. And
   re-using the lock means that we don't need the two levels of
   spinlocking for adding ourselves to the wait queue. Easily done by just
   moving the locking _out_ of the wait-queue helper functions, no?

 - the generic semaphores are entirely out-of-line, and are just declared
   universally as regular FASTCALL() functions.

The fast-path x86 code looks ok to me. The debugging stuff makes it less
readable than it should be, I suspect, and is probably not worth it at
this stage. The users of rw-semaphores are so well-defined (and so well
debugged) that the debugging code only makes the code harder to follow
right now.

Comments?  Andrea? Your patches have looked ok, but I absolutely refuse to
see the non-inlined fast-path for reasonable x86 hardware..

Linus




Re: Fix for SMP deadlock in autofs4

2001-04-20 Thread Linus Torvalds



On Fri, 20 Apr 2001, Jeremy Fitzhardinge wrote:

 This is a fix for a potential deadlock in autofs4's expire routine.

It's wrong.

I don't think we should be able to do a mntput() _either_ inside the
spinlock. The filesystem should not "know" that mntput is safe.

For this reason I don't think "dput_locked()" is the right answer either.

Why are we doing the mntget/dget at all? We hold the spinlock, so we know
they are not going away. Not doing the mntget/dget means that we (a) run
faster and (b) don't have the bug, because we don't need to put the damn
things.

Comments?

Linus




Re: Fix for SMP deadlock in autofs4

2001-04-20 Thread Linus Torvalds



On Fri, 20 Apr 2001, Jeremy Fitzhardinge wrote:

 I kept the dget/put out of caution and ignorance, but they're clearly
 problematic.  I'm happy to drop them if holding dcache_lock is enough
 to keep the tree stable while I traverse it.

How does this patch look to you people?

It's untested, but looks fairly obvious. It removes the increment, and
changes autofs4_expire() to properly bump the count of the returned dentry
(and callers will dput() it when done). This may be unnecessarily careful,
but it's the RightThing(tm) to do.

Jeremy, would you mind verifying that this WorksForYou(tm)?

Linus

-
diff -u --recursive --new-file pre5/linux/fs/autofs4/expire.c linux/fs/autofs4/expire.c
--- pre5/linux/fs/autofs4/expire.c  Mon Oct 23 21:57:38 2000
+++ linux/fs/autofs4/expire.c   Fri Apr 20 22:57:51 2001
@@ -98,8 +98,6 @@
 		 top, count));
 	this_parent = top;

-	count--;	/* top is passed in after being dgot */
-
 	if (is_autofs4_dentry(top)) {
 		count--;
 		DPRINTK(("is_tree_busy: autofs; count=%d\n", count));
@@ -168,8 +166,6 @@
 	unsigned long timeout;
 	struct dentry *root = sb->s_root;
 	struct list_head *tmp;
-	struct dentry *d;
-	struct vfsmount *p;

 	if (!sbi->exp_timeout || !root)
 		return NULL;
@@ -208,23 +204,17 @@
 			   attempts if expire fails the first time */
 			ino->last_used = now;
 		}
-		p = mntget(mnt);
-		d = dget_locked(dentry);
-
-		if (!is_tree_busy(p, d)) {
+		if (!is_tree_busy(mnt, dentry)) {
 			DPRINTK(("autofs_expire: returning %p %.*s\n",
 				 dentry, (int)dentry->d_name.len, dentry->d_name.name));
 			/* Start from here next time */
 			list_del(&root->d_subdirs);
 			list_add(&root->d_subdirs, &dentry->d_child);
+			dget(dentry);
 			spin_unlock(&dcache_lock);

-			dput(d);
-			mntput(p);
 			return dentry;
 		}
-		dput(d);
-		mntput(p);
 	}
 	spin_unlock(&dcache_lock);

@@ -251,6 +241,7 @@
 	pkt.len = dentry->d_name.len;
 	memcpy(pkt.name, dentry->d_name.name, pkt.len);
 	pkt.name[pkt.len] = '\0';
+	dput(dentry);

 	if ( copy_to_user(pkt_p, &pkt, sizeof(struct autofs_packet_expire)) )
 		return -EFAULT;
@@ -278,6 +269,7 @@
 		de_info->flags |= AUTOFS_INF_EXPIRING;
 		ret = autofs4_wait(sbi, &dentry->d_name, NFY_EXPIRE);
 		de_info->flags &= ~AUTOFS_INF_EXPIRING;
+		dput(dentry);
 	}

 	return ret;




Re: [andrea@suse.de: Re: generic rwsem [Re: Alpha process table hang]]

2001-04-20 Thread Linus Torvalds



On Fri, 20 Apr 2001, David Howells wrote:

 The file should only be used for the 80386 and maybe early 80486's where
 CMPXCHG doesn't work properly, everything above that can use the XADD
 implementation.

Why are those not using the generic files? The generic code is obviously
more maintainable.

 But if you want it totally non-inline, then that can be done. However, whilst
 developing it, I did notice that that slowed things down, hence why I wanted
 it kept in line.

I want to keep the _fast_ case in-line.

I do not care at ALL about the stupid spinlock version. That should be the
_fallback_, and it should be out-of-line. It is always going to be the
slowest implementation, modulo bugs in architecture-specific code.

For i386 and i486, there is no reason to try to maintain a complex fast
case. The machines are unquestionably going away - we should strive to not
burden them unnecessarily, but we should _not_ try to save two cycles.

In short:
 - the only case that _really_ matters for performance is the uncontended
   read-lock for "reasonable" machines. An i386 no longer counts as
   reasonable, and designing for it would be silly. And the write-lock
   case is much less compelling.
 - We should avoid any inlines where the inline code is 2* the
   out-of-line code. Icache issues can overcome any cycle gains, and do
   not show up well in benchmarks (benchmarks tend to have very hot
   icaches). Note that this is less important for the out-of-line code in
   another segment that doesn't get brought into the icache at all for the
   non-contention case, but that should still be taken _somewhat_ into
   account if only because of kernel size issues.

Both of the above rules imply that the generic spin-lock implementation
should be out-of-line.
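
A rough sketch of the split being argued for, assuming C11 atomics and the GCC/Clang `noinline` attribute. This is not the kernel's rwsem code: the `WRITER_BIAS` value, the struct and every function name are invented for illustration, and the slow path just counts the call instead of queueing and sleeping:

```c
#include <assert.h>
#include <stdatomic.h>

#define WRITER_BIAS (-0x10000)  /* hypothetical: a writer drives count negative */

struct rwsem_sketch {
	atomic_int count;       /* each reader adds 1, a writer adds WRITER_BIAS */
	atomic_int slow_calls;  /* how often the slow path ran, for illustration */
};

/* Contended case: kept out of line so its size never lands in the
 * caller's instruction stream. */
__attribute__((noinline))
static int down_read_slow(struct rwsem_sketch *sem)
{
	atomic_fetch_add(&sem->slow_calls, 1);
	atomic_fetch_sub(&sem->count, 1);   /* back out our reader increment */
	return -1;                          /* sketch: "would sleep" */
}

/* Uncontended read-lock: one atomic add and one well-predicted branch. */
static inline int down_read_sketch(struct rwsem_sketch *sem)
{
	assert(sem != 0);
	if (atomic_fetch_add(&sem->count, 1) + 1 > 0)
		return 0;
	return down_read_slow(sem);
}

static inline void up_read_sketch(struct rwsem_sketch *sem)
{
	atomic_fetch_sub(&sem->count, 1);
}
```

Only down_read_sketch() is small enough to be worth expanding at every call site; everything that touches a wait queue belongs behind the function call.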

   (1) asm-i386/rwsem-spin.h is wrong, and can probably be replaced with the
   generic spinlock implementation without inconveniencing people much.
   (though someone has commented that they'd want this to be inline as
cycles are precious on the slow 80386).

Icache is also precious on the 386, which has no L2 in 99% of all cases.
Make it out-of-line.

   (2) "fix up linux/rwsem-spinlock.h": do you want the whole generic spinlock
   implementation made non-inline then?

Yes. People who care about performance _will_ have architecture-specific
inlines on architectures where they make sense (ie 99% of them).

Linus




Re: x86 rwsem in 2.4.4pre[234] are still buggy [was Re: rwsem benchmarks [Re: generic rwsem [Re: Alpha process table hang]]]

2001-04-21 Thread Linus Torvalds



On Sat, 21 Apr 2001, Russell King wrote:

 Erm, spin_lock()?  What if up_read or up_write gets called from interrupt
 context (is this allowed)?

Currently that is not allowed.

We allow it for regular semaphores, but not for rw-semaphores.

We may some day have to revisit that issue, but I suspect we won't have
much reason to.

Linus




Re: try_to_swap_out() deactivating pages w. count 2

2001-04-21 Thread Linus Torvalds

In article [EMAIL PROTECTED],
Rik van Riel  [EMAIL PROTECTED] wrote:

 What I _am_ worried about is the fact that we do this to pages with
 a really high page age. These things are in active use and cannot
 be swapped out any time soon, yet we do claim swap space for it ...

Ehh... And if we didn't do that, then how could they ever become less
active?

We should _absolutely_ do the swap space reclaiming without looking at
the page count. If we don't, you will never free those pages, and I have
a trivial exploit for you that will basically mlock all pages in memory.

try_to_swap_out() _absolutely_ does the right thing.  Also note how it
will need to allocate the swap space backing store only once. 

Linus



Re: try_to_swap_out() deactivating pages w. count 2

2001-04-21 Thread Linus Torvalds


On Sat, 21 Apr 2001, Rik van Riel wrote:
  
  We should _absolutely_ do the swap space reclaiming without looking at
  the page count.
 
 page-age != page-count

It's all the same thing.

The page age and count are used to decide when the page actually gets
thrown _out_ of memory. That's a decision that is based on the _physical_
page attributes.

But try_to_swap_out() is based on the attribute on this particular virtual
mapping of the page. If this particular virtual mapping does not have the
"accessed" bit set, then try_to_swap_out() should get rid of that virtual
mapping. It should absolutely not use the global page characteristics
(either global usage count or global age) in making that decision. Because
those do not matter - they have absolutely no meaning for this virtual
mapping of the page.

Put another way: if process A is a heavy user of a page, and process B
just touched it once and will never touch it again, what do you think
should happen?

Answer: the page should be dropped from process B. It's a cheap thing to
do (we can get it back if necessary without any IO), and it means that if
we end up having to actually swap out the page eventually, we will not be
confused by "noise" in the page count from a mapping that hasn't been
active for a long time.
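
A toy model of that rule, with made-up types (`page_sketch`, `mapping_sketch`) that only loosely mirror the kernel structures. The thing to notice is that the decision reads the per-mapping accessed bit and never looks at the page's global count or age:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Physical page: global attributes, deliberately ignored below. */
struct page_sketch {
	int count;      /* how many mappings reference the page */
	int age;        /* global "hotness" of the page */
};

/* One process's virtual mapping of that page. */
struct mapping_sketch {
	struct page_sketch *page;
	bool present;
	bool accessed;  /* set by hardware when this mapping is touched */
};

/* Returns 1 if this mapping was dropped, 0 if it was left alone. */
static int try_to_swap_out_sketch(struct mapping_sketch *m)
{
	assert(m != NULL && m->page != NULL);
	if (!m->present)
		return 0;
	if (m->accessed) {
		m->accessed = false;    /* recently used here: spare it this pass */
		return 0;
	}
	m->present = false;             /* idle in this address space: unmap */
	m->page->count--;
	return 1;
}
```

With process A touching the page constantly and process B having touched it once, a scan drops B's mapping and leaves A's alone, however high the page's global age is.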

Linus




Re: [patch] swap-speedup-2.4.3-A1, massive swapping speedup

2001-04-23 Thread Linus Torvalds


On Mon, 23 Apr 2001, Jonathan Morton wrote:
 There seems to be one more reason, take a look at the function
 read_swap_cache_async() in swap_state.c, around line 240:
 
 /*
  * Add it to the swap cache and read its contents.
  */
 lock_page(new_page);
 add_to_swap_cache(new_page, entry);
 rw_swap_page(READ, new_page, wait);
 return new_page;
 
 Here we add an empty page to the swap cache and use the
 page lock to protect people from reading this non-up-to-date
 page.
 
 How about reversing the order of the calls - ie. add the page to the cache
 only when it's been filled?  That would fix the race.

No. The page cache is used as the IO synchronization point, both for
swapping and for regular IO. You have to add the page in _first_, because
otherwise you may end up doing multiple IO's from different pages.

The proper fix is to do the equivalent of this on all the lookup paths
that want a valid page:

	if (!PageUptodate(page)) {
		lock_page(page);
		if (PageUptodate(page)) {
			unlock_page(page);
			return 0;
		}
		rw_swap_page(page, 0);
		wait_on_page(page);
		if (!PageUptodate(page))
			return -EIO;
	}
	return 0;

This is the whole point of the uptodate flag, and for all I know we may
already do all of this (it's certainly the normal setup).

Note how we do NOT block on write-backs in the above: the page will be
up-to-date from the very beginning (it had _better_ be, it's hard to write
back a swap page that isn't up-to-date ;).

The above is how all the file paths work. 
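
The same check / lock / re-check / read pattern as a user-space sketch, assuming C11 atomics. Everything here is a stand-in: `struct io_page`, the spinning lock and the synchronous read_page_sketch() are invented to make the control flow runnable, and `io_fails` exists only to simulate an I/O error:

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

struct io_page {
	atomic_flag locked;     /* stands in for the page lock */
	atomic_bool uptodate;   /* set once the contents are valid */
	int reads_started;      /* reads issued, only touched under the lock */
	bool io_fails;          /* makes the simulated read fail */
};

static void io_page_init(struct io_page *p, bool io_fails)
{
	atomic_flag_clear(&p->locked);
	atomic_init(&p->uptodate, false);
	p->reads_started = 0;
	p->io_fails = io_fails;
}

static void lock_page_sketch(struct io_page *p)
{
	while (atomic_flag_test_and_set(&p->locked))
		;               /* spin; the kernel would sleep instead */
}

static void unlock_page_sketch(struct io_page *p)
{
	atomic_flag_clear(&p->locked);
}

/* Synchronous stand-in for "start the read, then wait on the page".
 * As in the kernel, completion of the I/O is what unlocks the page. */
static void read_page_sketch(struct io_page *p)
{
	p->reads_started++;
	if (!p->io_fails)
		atomic_store(&p->uptodate, true);
	unlock_page_sketch(p);
}

static int make_uptodate(struct io_page *p)
{
	assert(p != 0);
	if (atomic_load(&p->uptodate))
		return 0;                       /* fast path, no locking */
	lock_page_sketch(p);
	if (atomic_load(&p->uptodate)) {        /* someone else filled it */
		unlock_page_sketch(p);
		return 0;
	}
	read_page_sketch(p);
	return atomic_load(&p->uptodate) ? 0 : -EIO;
}
```

Because the page is locked before the second check, at most one reader issues the I/O; a failed read leaves the page not up-to-date, so a later caller simply tries again.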

Linus




Re: [patch] swap-speedup-2.4.3-A2

2001-04-23 Thread Linus Torvalds


On Mon, 23 Apr 2001, Ingo Molnar wrote:
 
 you are right - i thought about that issue too and assumed it works like
 the pagecache (which first reads the page without hashing it, then tries
 to add the result to the pagecache and throws away the page if anyone else
 finished it already), but that was incorrect.

The above is NOT how the page cache works. Or if some part of the page
cache works that way, then it is a BUG. You must NEVER allow multiple
outstanding reads from the same location - that implies that you're doing
something wrong, and the system is doing too much IO.

The way _all_ parts of the page cache should work is:

Create new page:
 - look up page. If found, return it
 - allocate new page.
 - look up page again, in case somebody else added it while we allocated
   it.
 - add the page atomically with the lookup if the lookup failed, otherwise
   just free the page without doing anything.
 - return the looked-up / allocated page.

return up-to-date page:
 - call the above to get a page cache page.
 - if uptodate, return
 - lock_page()
 - if now uptodate (ie somebody else filled it and held the lock), unlock
   and return.
 - start the IO
 - wait on IO by waiting on the page (modulo other work that you could do
   in the background).
 - if the page is still not up-to-date after we tried to read it, we got
   an IO error. Return error.

The above is how it is always meant to work. The above works for both new
allocations and for old. It works even if an earlier read had failed (due
to wrong permissions for example - think about NFS page caches where some
people may be unable to actually fill a page, so that you need to re-try
on failure). The above is how the regular read/write paths work (modulo
bugs). And it's also how the swap cache should work.
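
The "create new page" half can be sketched like this, assuming C11 atomics and a fixed-size table in place of the real page-cache hash. `struct cache_sketch`, `struct cpage` and find_or_create() are hypothetical names; the part that matters is that the second lookup and the insert happen under one hold of the lock:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

#define CACHE_SLOTS 16

struct cpage {
	unsigned long index;
};

struct cache_sketch {
	atomic_flag lock;                       /* protects slot[] */
	struct cpage *slot[CACHE_SLOTS];        /* slot[i] caches page i */
};

static void cache_init(struct cache_sketch *c)
{
	atomic_flag_clear(&c->lock);
	for (int i = 0; i < CACHE_SLOTS; i++)
		c->slot[i] = NULL;
}

static void cache_lock(struct cache_sketch *c)
{
	while (atomic_flag_test_and_set(&c->lock))
		;
}

static void cache_unlock(struct cache_sketch *c)
{
	atomic_flag_clear(&c->lock);
}

/* Returns the one cached page for index, creating it if needed. */
static struct cpage *find_or_create(struct cache_sketch *c, unsigned long index)
{
	assert(c != NULL);
	if (index >= CACHE_SLOTS)
		return NULL;

	cache_lock(c);                          /* 1. look up the page */
	struct cpage *page = c->slot[index];
	cache_unlock(c);
	if (page)
		return page;

	struct cpage *fresh = malloc(sizeof(*fresh));   /* 2. allocate, unlocked */
	if (!fresh)
		return NULL;
	fresh->index = index;

	cache_lock(c);                          /* 3. look up again and ...   */
	page = c->slot[index];
	if (!page) {                            /* 4. ... add atomically with it */
		c->slot[index] = fresh;
		page = fresh;
		fresh = NULL;
	}
	cache_unlock(c);

	free(fresh);            /* lost the race: drop ours (free(NULL) is a no-op) */
	return page;            /* 5. the looked-up or newly added page */
}
```

Every caller for a given index ends up with the same page object, so there is only ever one place to lock and one read to wait on.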

Linus




Re: [PATCH] Longstanding elf fix (2.4.3 fix)

2001-04-23 Thread Linus Torvalds


On 23 Apr 2001, Eric W. Biederman wrote:
 
 ptrace is protected by the big kernel lock, but exec isn't so that
 doesn't help.  Hmm.  ptrace does require that the process be stopped
 in all cases

Right. Ptrace definitely cannot access a process at arbitrary times. In
fact, it is very serialized indeed, in that it can only access a process
at signal points, ie effectively when it is returning to user space.

With threads, of course, that doesn't help us. But with threads, the other
threads could have caused the same page faults, so ptrace() isn't actually
adding any new cases in that sense.

I'd be a lot more worried about /proc accesses.

execve() doesn't really need the mm semaphore, but on the other hand it
would be cleaner to get it, and it won't really hurt (there can not be any
real contention on it anyway - the only contention might come through
/proc, and I haven't looked at what that might imply).

load-library should definitely get it. I thought it did already, but..

Did you have a patch? Maybe I missed it.

Linus




Re: [PATCH] rw_semaphores, optimisations try #3

2001-04-23 Thread Linus Torvalds



On Mon, 23 Apr 2001, D.W.Howells wrote:

 Linus, you suggested that the generic list handling stuff would be faster (2
 unconditional stores) than mine (1 unconditional store and 1 conditional
 store and branch to jump round it). You are both right and wrong. The generic
 code does two stores per _process_ woken up (list_del) mine does the 1 or 2
 stores per _batch_ of processes woken up. So the generic way is better when
 the queue is an even mixture of readers or writers and my way is better when
 there are far greater numbers of waiting readers. However, that said, there
 is not much in it either way, so I've reverted it to the generic list stuff.

Note that the generic list structure already has support for batching.
It only does it for multiple adds right now (see the list_splice
merging code), but there is nothing to stop people from doing it for
multiple deletions too. The code is something like

static inline void list_remove_between(struct list_head *x, struct list_head *y)
{
	/* unlink everything strictly between x and y */
	x->next = y;
	y->prev = x;
}

and notice how it's still just two unconditional stores for _any_ number
of deleted entries.
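
A self-contained version of the idea on a circular doubly-linked list. `struct list_node` and the helpers around list_remove_between() are written for this example and are not the kernel's list.h:

```c
#include <assert.h>
#include <stddef.h>

struct list_node {
	struct list_node *next, *prev;
};

static void list_init(struct list_node *head)
{
	head->next = head;
	head->prev = head;
}

static void list_add_tail_sketch(struct list_node *n, struct list_node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Unlink every entry strictly between x and y. Two stores, no matter how
 * many entries sit in between; the removed entries are not touched. */
static inline void list_remove_between(struct list_node *x, struct list_node *y)
{
	assert(x != NULL && y != NULL);
	x->next = y;
	y->prev = x;
}

static int list_length(const struct list_node *head)
{
	int n = 0;
	for (const struct list_node *p = head->next; p != head; p = p->next)
		n++;
	return n;
}
```

Since the removed entries keep their old pointers, nothing here depends on them staying intact afterwards; only x and y, which remain on the list, are read or written.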

Anyway, I've already applied your #2, how about a patch relative to that?

Linus




Re: [PATCH] rw_semaphores, optimisations try #3

2001-04-24 Thread Linus Torvalds


On Tue, 24 Apr 2001, David Howells wrote:
 
 Yes but the struct rwsem_waiter batch would have to be entirely deleted from
 the list before any of them are woken, otherwise the waking processes may
 destroy their rwsem_waiter blocks before they are dequeued (this destruction
 is not guarded by a spinlock).

Look again.

Yes, they may destroy the list, but nobody cares.

Why?

 - nobody will look up the list because we do have the spinlock at this
   point, so a destroyed list doesn't actually _matter_ to anybody

   You were actually depending on this earlier, although maybe not on
   purpose.

 - list_remove_between() doesn't care about the integrity of the entries
   it destroys. It only uses, and only changes, the entries that are still
   on the list.

Subtlety is fine. It might warrant a comment, though.

Linus




Re: rwsem benchmark [was Re: [PATCH] rw_semaphores, optimisations try #3]

2001-04-24 Thread Linus Torvalds


On Tue, 24 Apr 2001, Andrea Arcangeli wrote:
 
   Again it's not a performance issue, the +a (sem) is a correctness issue
   because the slow path will clobber it.
  
  There must be a performance issue too, otherwise our read up/down fastpaths
  are the same. Which clearly they're not.
 
 I guess I'm faster because I avoid the pipeline stall using +m (sem-count)
 that is written as a constant, that was definitely intentional idea.

Guys.

You're arguing over stalls that are (a) compiler-dependent and (b) in code
that doesn't happen _anywhere_ except in the specific benchmark you're
using.

Get over it.

 - The benchmark may use constant addresses. None of the kernel does. The
   benchmark is fairly meaningless in this regard.

 - the stalls will almost certainly depend on the code around the thing,
   and will also depend on the compiler version. If you're down to
   haggling about issues like that, then there is no real difference
   between the code.

So calm down guys. And improving the benchmark might not be a bad idea.

Linus




Re: BUG: Global FPU corruption in 2.2

2001-04-24 Thread Linus Torvalds

In article [EMAIL PROTECTED],
Christian Ehrhardt [EMAIL PROTECTED] wrote:

1.) If I'm not mistaken switch_to changes current->flags without
atomic operations and without any locks and sys_ptrace changes
child->flags only protected by the big kernel lock.

ptrace only operates on processes that are stopped. So there are no
locking issues - we've synchronized on a much higher level than a
spinlock or semaphore.

That said, it does look like 2.2.x has a real bug, and maybe the ptrace
task stopping synchronization is broken..

Linus



Re: BUG: Global FPU corruption in 2.2

2001-04-24 Thread Linus Torvalds

[ Alan, I'm lazy and only have 2.2.14 sources on-line. Maybe this has
  been fixed already and there's something else going on. Worth a look ]

In article [EMAIL PROTECTED],
Victor Zandy  [EMAIL PROTECTED] wrote:

Someone else here traced the process flags of a FP-intensive program
on a machine before and after it is put in the faulty FPU state.  He
periodically sampled /proc/pid/stat while the program was running.

He found that PF_USEDFPU was always set before the machine was broken.
After he found that it was set about 70% of the time.

[ Looks closer at the ptrace synchronization ]

Ahh.. This actually _does_ look like a race on current->flags:
PTRACE_ATTACH will do a

	child->flags |= PF_PTRACED;

without waiting for the child to have stopped.

(Aside: thinking more about the stopping logic - I'm not actually sure
the ptrace synchronization is complete wrt scheduling, as there will be
a window when the process has set the task state to TASK_STOPPED but
hasn't actually yet scheduled away. Oh, well).

All other ptrace operations (not counting killing the child) will check
that the child is quiescent.  But PTRACE_ATTACH will not, as we're just
setting up the stopping.

In 2.4.x, this bug doesn't happen because flags was split up into
current->ptrace and current->flags.  Exactly because of locking
concerns.
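
The lost update is easy to see if one possible interleaving of the two
read-modify-write sequences is replayed by hand. The sketch below is a
single-threaded simulation, not kernel code: the struct names, the bit
values, and the replay helpers are all hypothetical, and only the field
names flags and ptrace follow the mail.

```c
/* Illustrative bit values; the real kernel constants differ. */
#define PF_USEDFPU 0x1u
#define PF_PTRACED 0x2u
#define PT_PTRACED 0x1u

/* 2.2-style layout: one word written by both the task and its tracer. */
struct task_shared {
    unsigned int flags;
};

/* 2.4-style layout: the tracer's bit lives in its own word. */
struct task_split {
    unsigned int flags;   /* written by the task itself */
    unsigned int ptrace;  /* written on behalf of the tracer */
};

/*
 * Replay a bad interleaving on the shared word: the tracer loads flags
 * for its "|=", the task clears PF_USEDFPU in between, and the tracer
 * then stores its stale copy. Returns the final flags value.
 */
static unsigned int replay_shared(struct task_shared *t)
{
    unsigned int stale = t->flags;   /* tracer: load */
    t->flags &= ~PF_USEDFPU;         /* task: clears the bit */
    t->flags = stale | PF_PTRACED;   /* tracer: store, PF_USEDFPU is back */
    return t->flags;
}

/* Same interleaving with split words: the stale store hits only ptrace. */
static void replay_split(struct task_split *t)
{
    unsigned int stale = t->ptrace;  /* tracer: load */
    t->flags &= ~PF_USEDFPU;         /* task: clears the bit */
    t->ptrace = stale | PT_PTRACED;  /* tracer: store, flags untouched */
}
```

With the shared word the task's update is silently undone; with the
split layout each word has only one kind of writer, so nothing is lost.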

Linus



Re: [patch] swap-speedup-2.4.3-B3

2001-04-24 Thread Linus Torvalds


On Tue, 24 Apr 2001, Ingo Molnar wrote:
 
> the latest swap-speedup patch can be found at:

Please don't add more of those horrible "wait" arguments.

Make two different versions of a function instead. It's going to clean up
and simplify the code, and there really isn't any reason to do what you're
doing.

You should split up the logic differently: if you want to wait for the
page, then DO so:

	page = lookup_swap_cache(..);
	if (page) {
		wait_for_swap_cache_valid(page);
		.. use page ..
	}

Note how much more readable and UNDERSTANDABLE the above is, compared to

	page = lookup_swap_cache(..., 1);
	if (page) {
		...

and note also how splitting up the waiting will

 - simplify the swap cache lookup function, making it faster for people
   who do _NOT_ want to wait.

 - make it easier to statically check the correctness of programs by just
   eye-balling them ("Hey, he's calling 'wait' with the spinlock held").

 - make it easier to move the wait around, allowing for more concurrency.

Basically, I don't want to mix synchronous and asynchronous
interfaces. Everything should be asynchronous by default, and waiting
should be explicit.
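
A sketch of that split, with everything stubbed out so it stands alone.
lookup_swap_cache() is named in the mail and the wait helper is modeled
on the one there; struct page, its valid field, the fixed-size table, and
get_valid_swap_page() are hypothetical stand-ins, and the "wait" is only
simulated.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified page descriptor. */
struct page {
    int valid;   /* nonzero once the swap-in I/O has completed */
};

static struct page swap_cache[4];

/*
 * Pure lookup: never sleeps, so it is safe to call with a spinlock
 * held. Returns NULL when the slot is out of range.
 */
static struct page *lookup_swap_cache(size_t slot)
{
    if (slot >= sizeof(swap_cache) / sizeof(swap_cache[0]))
        return NULL;
    return &swap_cache[slot];
}

/*
 * Explicit wait, kept out of the lookup. A real version would sleep
 * until the I/O finishes; here completion is simply simulated.
 */
static void wait_for_swap_cache_valid(struct page *page)
{
    if (!page->valid)
        page->valid = 1;
}

/* A caller that wants synchronous behavior says so at the call site. */
static struct page *get_valid_swap_page(size_t slot)
{
    struct page *page = lookup_swap_cache(slot);

    if (page)
        wait_for_swap_cache_valid(page);
    return page;
}
```

Callers that only need the lookup never pay for the wait, and any wait
under a lock is visible as its own call.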

Linus




Re: orinoco_cs IrDA

2001-04-25 Thread Linus Torvalds



On Tue, 24 Apr 2001, Jean Tourrilhes wrote:

>    I've got a question... I would like to know where to send my driver
> patches...

Probably both me and Alan.

[ General rules follow. Too few people seem to have seen them before ]

Most importantly, when sending patches to me:

 - specify clearly that you really want to see them in the standard
   kernel, and why. I occasionally get patches that just say "this is a
   good idea". I don't apply them. Especially if they are cc'd to somebody
   else too, in which case I pretty much assume that it's a RFC, not a
   real patch.

 - do NOT send patches in attachments. Send one patch per mail, in
   clear-text under your message, so that I can easily see the patch and
   decide then-and-there whether it looks ok. And if it doesn't look ok,
   and I do a reply, the patch gets included in the reply so that I can
   point out which part of the patch I dislike.

   Don't worry about sending me five emails. That's FINE. I much prefer
   seeing five consecutive emails from the same person with five distinct
   subject lines and five distinct patches, than seeing one email with
   five attachments to it.

 - if your email system is broken, and you want to send patches as
   attachments to avoid whitespace damage, then please FIX YOUR EMAIL
   SYSTEM INSTEAD.

 - Don't point to web-sites. If I have to move the mouse outside my email
   xterm to work on the email, your email just got ignored.

 - Make your patches one sub-directory under the source tree you're
   working on. In short, your patches should look something like

--- clean/fs/inode.c ...
+++ linux/fs/inode.c ..
@@ -179,7 +179,7 @@
...

   so that I can (regardless of where my source tree is) apply them
   with "patch -p1" from my linux top directory. Then I can just do a

	cd v2.4/linux
	patch -p1 < ~/multiple-emails-with-multiple-accepted-patches

   and not have to worry about three patches being based on
   /usr/src/linux, while two others not having a path at all and being
   individual filenames in linux/drivers/net.

 - and finally: re-send. If I had laser-eye surgery the day you sent the
   patches, I won't have applied them. If I took a day off and spent it
   with the kids at the pool instead, I won't have applied them. If I
   decided that this weekend I'm not going to read email for a change, I
   won't have applied them.

   And when I come back to work a day or two later, I will have several
   hundred other emails to work through. I never go backwards in my
   emails.
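
For anyone unsure why the one-sub-directory rule in the list above works:
patch -pN drops the leading N slash-separated components from each file
name in the diff, so with -p1 it does not matter whether the top
directory was called "clean" or "linux". A rough sketch of that stripping
step follows. strip_components() is a hypothetical helper written for
illustration, not code from patch itself, and returning NULL for names
that are too short is this sketch's own choice.

```c
#include <stddef.h>

/*
 * Return a pointer into 'path' just past its first n components, or
 * NULL if the name has fewer than n slashes.
 */
static const char *strip_components(const char *path, int n)
{
    const char *p = path;

    while (n > 0) {
        /* Find the next slash; give up if the name is too short. */
        while (*p != '\0' && *p != '/')
            p++;
        if (*p == '\0')
            return NULL;
        /* Treat a run of adjacent slashes as one separator. */
        while (*p == '/')
            p++;
        n--;
    }
    return p;
}
```

Both clean/fs/inode.c and linux/fs/inode.c come out as fs/inode.c, which
is the name looked up relative to the top of the tree. A bare inode.c has
nothing to strip, which is the case the rule is meant to avoid.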





Re: [patch] swap-speedup-2.4.3-B3 (fwd)

2001-04-26 Thread Linus Torvalds


On Thu, 26 Apr 2001, Mike Galbraith wrote:
 
> 2.4.4.pre7.virgin
> real    11m33.589s
> user    7m57.790s
> sys     0m38.730s
>
> 2.4.4.pre7.sillyness
> real    9m30.336s
> user    7m55.270s
> sys     0m38.510s

Well, I actually like parts of this. The "always swap out current mm" one
looks rather dangerous, and the lastscan jiffy thing is too ugly for
words, but refill_inactive() looks much nicer. There is beauty in
simplicity. 

The page aging in drop_pte feels pretty harsh, though.

Have you looked at free_pte()? I don't like that function, and it might
make a difference. There are several small nits with it:

 - it should probably try to deactivate the page. If drop_pte does that
   when it deactivates a page involuntarily, why not do it for a real "we
   just free'd the page voluntarily"?

 - swap-cache pages should probably not just be de-activated, but actively
   aged down. Right now, they are neither, so we have to work all the 
   way through refill_inactive() and then page_launder() to clear them
   out. Even though the page may be entirely useless by now (we had a
   complex special case that caught and short-circuited some of the pages,
   and maybe it was worth it. But maybe the right thing is to just age
   them down and naturally deactivate them?)

   After all, we aged them up for references to this virtual
   mapping, and free_pte() just made it go away. Unlike normal page cache
   pages, we don't get any advantage from trying to cache the things
   across multiple VM's.

 - we're dropping the accessed bit on the floor. In the vmscan case the
   accessed bit would have aged the page up. 

On the other hand, to offset some of these, we actually count the page
accessed _twice_ sometimes: we count it on lookup, and we count it when we
see the accessed bit in vmscan.c. Which results in some pages getting aged
up twice for just one access if we go through the vmscan logic, while if
we just map and unmap them they get counted just once.

Obviously the page aging logic seems to be making a noticeable difference
to you. So looking at page aging logic issues in the bigger picture might
be worthwhile - not just staring at the actual swap-out code. The fact is,
the swap-out-code cannot get the aging right if the rest of the system
ignores it or does it only for some cases.

I _think_ the logic should be something along the lines of: freeing the
page amounts to an implied down-aging of the page, but the 'accessed' bit
would have aged it up, so the two take each other out. But if so, the
free_pte() logic should have something like

	if (page->mapping) {
		if (!pte_young(pte) || PageSwapCache(page))
			age_page_down_ageonly(page);
		if (!page->age)
			deactivate_page(page);
	}

instead of just ignoring these issues completely.
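
To make the proposal concrete, here is the same fragment filled out into
something that compiles on its own. Everything around the if block is a
hypothetical stand-in: the struct page fields, the halving done by
age_page_down_ageonly(), the body of deactivate_page(), and the young
argument used in place of pte_young(pte) are assumptions for
illustration, not the 2.4 definitions.

```c
#include <stddef.h>

/* Hypothetical, simplified page descriptor. */
struct page {
    void *mapping;        /* non-NULL if the page is in a cache */
    unsigned int age;     /* higher means more recently useful */
    int swap_cache;       /* stand-in for PageSwapCache() */
    int active;           /* 1 while on the active list */
};

/* Assumed aging policy: halve the age without moving the page. */
static void age_page_down_ageonly(struct page *page)
{
    page->age /= 2;
}

/* Stand-in: a real version would move the page between LRU lists. */
static void deactivate_page(struct page *page)
{
    page->active = 0;
}

/*
 * Aging step proposed for free_pte(). 'young' plays the role of
 * pte_young(pte): nonzero if the hardware accessed bit was set.
 */
static void free_pte_age(struct page *page, int young)
{
    if (page->mapping) {
        /* An access cancels the implied down-aging, except for swap cache. */
        if (!young || page->swap_cache)
            age_page_down_ageonly(page);
        if (!page->age)
            deactivate_page(page);
    }
}
```

An accessed page-cache page keeps its age, a swap-cache page is aged down
regardless, and anything that reaches age zero is deactivated at once
instead of waiting for refill_inactive() to find it.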

Comments? 

Linus



