Re: max sizes for files and file systems

2001-07-05 Thread Albert D. Cahalan

Derek Vadala writes:

> It's clear that under 2.4, the kernel imposes a limit of 2TB as the
> maximum file size and that some portion of the kernels before 2.4 had a
> limit of 2GB.
>
> However, it's not clear to me when the file size limit was increased, or
> what the maximum file system sizes under 2.0, 2.2 and 2.4 are. I realize
> that both of these values are also contingent on the filesystem used, but
> I'm wondering about what limits the kernel itself imposes. 
> 
> I'm also a bit unclear as to where the 2GB limit in kernels < 2.4 comes
> from. It appears to be a kernel imposed limit, but there also seems to be
> a lot of conflicting information out there, blaming the problem on
> EXT2. However, from what I can tell, 2.0.39, 2.2.19 and 2.4.5 all use the
> same version (0.5b-95/08/09) of ext2-- either that or EXT2FS_VERSION and
> EXT2FS_DATE in .../include/linux/ext2_fs.h simply haven't been updated.

No 32-bit Linux system could exceed 1 TB on anything until this week.
This is caused by signed 32-bit math on units of 512 bytes.
Now there are experimental patches for larger devices.
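
The 1 TB figure can be checked with a few lines of arithmetic. This is
only a sketch: the names below are made up for illustration and are not
kernel identifiers. It assumes sector numbers are held in a signed
32-bit integer, as described above.

```c
#include <stdint.h>

/* Illustrative name only; not a kernel identifier. */
#define SECTOR_BYTES 512ULL

/* Bytes addressable when sector numbers are signed 32-bit values. */
static uint64_t max_device_bytes(void)
{
    uint64_t sectors = (uint64_t)INT32_MAX + 1;  /* 2^31 sectors */
    return sectors * SECTOR_BYTES;               /* 2^31 * 2^9 = 2^40 */
}
```

2^40 bytes is 1 TB in the binary sense, which is where the limit sits.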

The file access API was limited to signed 32-bit byte values.
Officially, this was fixed for the 2.4 series. Most distributions
shipped 2.2 series kernels with patches to allow large files.
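
A sketch of the byte-offset side, assuming a glibc-based userland (the
_FILE_OFFSET_BITS macro is glibc's large-file switch, not something
mentioned above, and the function names are made up for illustration):

```c
#define _FILE_OFFSET_BITS 64   /* must come before any #include */
#include <stdint.h>
#include <sys/types.h>

/* Largest offset a signed 32-bit byte count can express: 2 GB - 1. */
static int64_t max_offset_32bit(void)
{
    return INT32_MAX;
}

/* Returns 1 if off_t is 64 bits wide, i.e. large files are usable. */
static int have_large_file_api(void)
{
    return sizeof(off_t) == 8;
}
```

With a 32-bit off_t, any read, write, or seek past max_offset_32bit()
cannot even be expressed, whatever the filesystem allows.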

The ext2, FAT, and NFSv2 filesystems all had a 32-bit file
size limit. For ext2 this was lifted just as the 2.2 series
came out, but only Alpha systems could use the large files.
FAT has not been fixed. NFSv2 has been replaced by NFSv3.

EXT2FS_VERSION has not been updated because feature flag bits
are being used instead.

I have a graph of ext2 limits:
http://www.cs.uml.edu/~acahalan/linux/ext2.gif

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH] more SAK stuff

2001-07-05 Thread Albert D. Cahalan

Rob Landley writes:

> Off the top of my head, fun things you can't do suid root:
...
> ps  (What the...?  Worked in Red Hat 7, but not in suse 7.1.
> Huh?  "suid-to  apache ps ax" works fine, though...)

The ps command used to require setuid root. People would set the
bit by habit.

> I keep bumping into more of these all the time.  Often it's fun
> little warnings "you shouldn't have the suid bit on this
> executable", which is frustrating 'cause I haven't GOT the suid bit
> on that executable, it inherited it from its parent process, which
> DOES explicitly set the $PATH and blank most of the environment
> variables and other fun stuff...)

Oh, cry me a river. You can set the RUID, EUID, SUID, and FUID
in that same parent process or after you fork().

Since you didn't set all the UID values, I have to wonder what
else you forgot to do. Maybe you shouldn't be messing with
setuid programming.
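
For reference, a minimal Linux-only sketch of setting all four IDs.
The function name is made up; setresuid() and setfsuid() are the real
calls. setfsuid() does not report errors, so the sketch reads the
value back with a second call that is expected to change nothing.

```c
#define _GNU_SOURCE            /* for setresuid() */
#include <sys/fsuid.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Switch the real, effective, saved, and filesystem UIDs to `uid`.
 * Returns 0 on success, -1 on failure.
 */
static int set_all_uids(uid_t uid)
{
    if (setresuid(uid, uid, uid) != 0)   /* RUID, EUID, SUID at once */
        return -1;

    setfsuid(uid);                       /* FUID; usually follows EUID */

    /* setfsuid() returns the previous FUID, so ask again to verify. */
    if ((uid_t)setfsuid((uid_t)-1) != uid)
        return -1;
    return 0;
}
```

Call it in the parent before exec, or in the child right after fork().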



Re: [OT] Re: LILO calling modprobe?

2001-07-05 Thread Albert D. Cahalan

Wakko Warner writes:

> I believe there is.  It wants to find what drive is bios drive 80h.  Really
> annoying since there's no way (correct me if I'm wrong) to read bios from
> linux.  If there is, lilo should do that.  But since it's an old copy, this
> probably was fixed.
>
> I had a machine at work with both ide and scsi.  ide hdd was hdc and ide
> cdrom was hda just to keep lilo from thinking hdc is the first bios drive
> which in fact sda was

The easy way to handle this is to md5 checksum the disks at boot.
Read the first and last track of the first and last cylinder of
every BIOS drive. Then match up the disks when partition tables
get scanned.

The hard way involves running the BIOS in virtual-8086 mode to
trap IO accesses, then mapping to drivers by IO region later.

Neither way is 100% reliable, but the current guess is worse.



Re: [Re: gcc: internal compiler error: program cc1 got fatal signal 11]

2001-06-29 Thread Albert D. Cahalan

> Almost always ?
> It seems like gcc is THE ONLY program which gets
> signal 11
> Why the X server doesn't get signal 11 ?
> Why others programs don't get signal 11 ?
...
> Some time ago I installed Linux (Redhat 6.0) on my 
> pc (Cx486 8M RAM) and gcc had a lot of signal 11 (a
> couple every hour) I was upgrading
> the kernel every time there was a new kernel and
> from 2.2.12(or 14) no more signal 11 (very rare)
> Is this still a hardware problem ?

It could be. One possible way:

1. your system is clogged with dust
2. gcc runs the CPU hard, generating lots of heat
3. the heat causes crashes
4. you install a new Linux version that sets a Cyrix-specific power-saving mode
5. your heat problems go away, and so do the crashes

Another possible way:

1. you have buggy motherboard or disk hardware
2. when you swap, gcc gets corrupted by the hardware
3. you get a new Linux kernel that has a bug work-around
4. your problems go away

Yet another way:

1. your room is hot, your computer is near a huge motor...
2. you upgrade to Linux 2.2.12 and move your computer
3. soon you realize that the crashes are gone
4. you credit the kernel, but location was the problem



Re: [PATCH] User chroot

2001-06-28 Thread Albert D. Cahalan

Sean Hunter writes:
> On Wed, Jun 27, 2001 at 04:55:56PM -0400, Albert D. Cahalan wrote:

>> ln /dev/zero /tmp/zero
>> ln /dev/hda ~/hda
>> ln /dev/mem /var/tmp/README
>
> None of these (of course) work if you use mount options to
> restrict device nodes on those filesystems.

In which case, you can't boot. Think about it.

Never mind the method. One way or another, it is very often
possible for normal users to set up a chroot environment
with the device files that are needed. Maybe they do something
obscene with the admin. :-) So chroot() is useful for users.

In my case, I _am_ the admin and I just don't want to run
every damn little test program and hack as root.




Re: [PATCH] User chroot

2001-06-27 Thread Albert D. Cahalan

H. Peter Anvin writes:
> "Albert D. Cahalan" wrote:

>> BTW, it is way wrong that /dev/zero should be needed at all.
>> Such use is undocumented ("man zero", "man mmap") anyway, and
>> AFAIK one should use mmap() with MAP_ANON instead. Not that
>> the documentation on MAP_ANON is any good either, but at least
>> the mere existence of the flag is mentioned.
>
> RTFM(POSIX).

No manual entry for RTFM in section POSIX

Seriously:

1. both features ought to be documented in the man pages
   (I did submit a man page too, back in 1996)

2. it is slow and nasty to open /dev/zero for getting memory
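
A minimal sketch of the anonymous-mapping route, assuming Linux with
glibc. MAP_ANONYMOUS is the current spelling of MAP_ANON; the helper
names are made up for illustration.

```c
#define _DEFAULT_SOURCE        /* exposes MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <sys/mman.h>

/* Get `len` bytes of zero-filled memory without opening /dev/zero. */
static void *alloc_zeroed(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Release memory from alloc_zeroed(). Returns 0 on success. */
static int free_zeroed(void *p, size_t len)
{
    return munmap(p, len);
}
```

No file descriptor is opened, so this also works inside a chroot that
has no device nodes at all.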



Re: [PATCH] User chroot

2001-06-27 Thread Albert D. Cahalan

H. Peter Anvin writes:
> Albert D. Cahalan wrote:

>> Normal users can use an environment provided for them.
>>
>> While trying to figure out why the "heyu" program would not
>> work on a Red Hat box, I did just this. As root I set up all
>> the device files needed, along with Debian libraries and the heyu
>> executable itself. It was annoying that I couldn't try out
>> my chroot environment as a regular user.
>>
>> Creating the device files isn't a big deal. It wouldn't be
>> hard to write a setuid app to make the few needed devices.
>> If we had per-user limits, "mount --bind /dev/zero /foo/zero"
>> could be allowed. One way or another, devices can be provided.
>
> Hell no!  This would give the user a way to subvert root or other
> system-provided things by having device nodes or such appear where
> they aren't expected.  NOT GOOD.

On every normal (default Red Hat or Debian at least) system
this is already trivial:

ln /dev/zero /tmp/zero
ln /dev/hda ~/hda
ln /dev/mem /var/tmp/README

So the user often can provide device nodes. The above is _worse_
than allowing "mount --bind ..." because the admin has to search
the whole filesystem to find such links.

Never mind that though; it doesn't matter how the devices are
created. Social engineering can work. Once the device problem
is taken care of, chroot() becomes useful for normal users.

BTW, it is way wrong that /dev/zero should be needed at all.
Such use is undocumented ("man zero", "man mmap") anyway, and
AFAIK one should use mmap() with MAP_ANON instead. Not that
the documentation on MAP_ANON is any good either, but at least
the mere existence of the flag is mentioned.





Re: [PATCH] User chroot

2001-06-26 Thread Albert D. Cahalan

H. Peter Anvin writes:
> [somebody]

>> Have you ever wondered why normal users are not allowed to chroot?
>>
>> I have. The reasons I can figure out are:
>>
>> * Changing root makes it trivial to trick suid/sgid binaries to do
>>   nasty things.
>>
>> * If root calls chroot and changes uid, he expects that the process
>>   can not escape to the old root by calling chroot again.
>>
>> If we only allow user chroots for processes that have never been
>> chrooted before, and if the suid/sgid bits won't have any effect under
>> the new root, it should be perfectly safe to allow any user to chroot.
>
> Safe, perhaps, but also completely useless: there is no way the user
> can set up a functional environment inside the chroot.  In other
> words, it's all pain, no gain.

Normal users can use an environment provided for them.

While trying to figure out why the "heyu" program would not
work on a Red Hat box, I did just this. As root I set up all
the device files needed, along with Debian libraries and the heyu
executable itself. It was annoying that I couldn't try out
my chroot environment as a regular user.

Creating the device files isn't a big deal. It wouldn't be
hard to write a setuid app to make the few needed devices.
If we had per-user limits, "mount --bind /dev/zero /foo/zero"
could be allowed. One way or another, devices can be provided.





Re: EXT2 Filesystem permissions (bug)?

2001-06-26 Thread Albert D. Cahalan

Kenneth Johansson writes:

> Does Linux even support the sticky bit (t)? I can't see a reason
> to use it; why would I want the file to be stored in swap?

It is not currently supported. Swapping out executables would
be very nice when using an NFS or CD-ROM filesystem, because
swap space is much faster.

> Also I think S (setuid but no execute bit) has something to
> do with file locking, but I'm not sure exactly how it works.

Yeah, mandatory locking. Strictly it is the setgid bit with group
execute cleared, and it only does anything if you mount with mandatory
locking enabled. It's a UNIX feature.



Re: FAT32 superiority over ext2 :-)

2001-06-24 Thread Albert D. Cahalan

Daniel Phillips writes:
> On Monday 25 June 2001 00:54, Albert D. Cahalan wrote:

>> By dumb luck (?), FAT32 is compatible with the phase-tree algorithm
>> as seen in Tux2. This means it offers full data integrity.
>> Yep, it whips your typical journalling filesystem. Look at what
>> we have in the superblock (boot sector):
>>
>> __u32  fat32_length;  /* sectors/FAT */
>> __u16  flags;         /* bit 8: fat mirroring, low 4: active fat */
>> __u8   version[2];    /* major, minor filesystem version */
>> __u32  root_cluster;  /* first cluster in root directory */
>> __u16  info_sector;   /* filesystem info sector */
>>
>> All in one atomic write, one can...
>>
>> 1. change the active FAT
>> 2. change the root directory
>> 3. change the free space count
>>
>> That's enough to atomically move from one phase to the next.
>> You create new directories in the free space, and make FAT
>> changes to an inactive FAT copy. Then you write the superblock
>> to atomically transition to the next phase.
>
> Yes, FAT is what inspired me to go develop the algorithm.  However, two
> words: 'lost clusters'.  Now that may just be an implementation detail ;-)

What lost clusters?

Set bit 8 of "flags" (A_BF_BPBExtFlags to Microsoft) to disable
FAT mirroring. Then the low 4 bits are a 0-based value that
indicates which copy of the FAT should be used.

Assume we have 2 copies of the FAT, as is (was?) common. I'll call
them X and Y. When we mount the filesystem, we disable FAT mirroring
and mark FAT X active.

Now we can make changes to FAT Y without affecting filesystem
integrity. Windows will not use FAT Y. As is usual with the
phase-tree algorithm, we use free space to create a new structure
beside the old one.

Time for a phase change:

We have FAT Y, currently inactive, updated on disk.
FAT X is active; it describes the current on-disk state.
We have a new root directory on disk, sitting in free space.
We have a new filesystem info sector on disk, sitting in free space.

We write one single sector, then:

FAT X becomes inactive, and will not be used by Windows.
FAT Y becomes active; it describes the new on-disk state.
The old root directory is marked free in FAT Y. Good!
The old filesystem info sector is marked free in FAT Y. Good!

Once the superblock goes to disk, FAT X may be written to.
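
The in-memory half of that phase change might look like the sketch
below. Assumptions, stated loudly: the "bit 8" above is taken to count
from one, which matches the 0x0080 mask in Microsoft's published
layout of the extended-flags word; the struct and function names are
made up; and the real on-disk structure is packed and little-endian,
which this sketch ignores. Writing the sector to disk, the actual
atomic step, is not shown.

```c
#include <stdint.h>

#define FAT_MIRROR_DISABLED 0x0080u  /* assumed mask, see note above */
#define FAT_ACTIVE_MASK     0x000Fu  /* low 4 bits: active FAT index */

/* In-memory copy of the five fields quoted earlier (layout simplified). */
struct fat32_fields {
    uint32_t fat32_length;
    uint16_t flags;
    uint8_t  version[2];
    uint32_t root_cluster;
    uint16_t info_sector;
};

/* Which FAT copy do these flags designate as active? */
static unsigned active_fat(uint16_t flags)
{
    return flags & FAT_ACTIVE_MASK;
}

/*
 * Point the superblock copy at the new phase: FAT `new_fat` becomes
 * the only active copy, and the root directory and info sector move
 * to the structures already written into free space.
 */
static void prepare_phase_change(struct fat32_fields *sb, unsigned new_fat,
                                 uint32_t new_root, uint16_t new_info)
{
    uint16_t f = sb->flags;

    f |= FAT_MIRROR_DISABLED;
    f = (uint16_t)((f & ~FAT_ACTIVE_MASK) | (new_fat & FAT_ACTIVE_MASK));

    sb->flags = f;
    sb->root_cluster = new_root;
    sb->info_sector = new_info;
}
```

All three changes live in the same sector, so a single sector write
commits them together or not at all.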



FAT32 superiority over ext2 :-)

2001-06-24 Thread Albert D. Cahalan


By dumb luck (?), FAT32 is compatible with the phase-tree algorithm
as seen in Tux2. This means it offers full data integrity.
Yep, it whips your typical journalling filesystem. Look at what
we have in the superblock (boot sector):

__u32  fat32_length;  /* sectors/FAT */
__u16  flags;         /* bit 8: fat mirroring, low 4: active fat */
__u8   version[2];    /* major, minor filesystem version */
__u32  root_cluster;  /* first cluster in root directory */
__u16  info_sector;   /* filesystem info sector */

All in one atomic write, one can...

1. change the active FAT
2. change the root directory
3. change the free space count

That's enough to atomically move from one phase to the next.
You create new directories in the free space, and make FAT
changes to an inactive FAT copy. Then you write the superblock
to atomically transition to the next phase.




Re: Shared memory quantity not being reflected by /proc/meminfo

2001-06-23 Thread Albert D. Cahalan

Allan Duncan writes:

> Since the 2.4.x advent of shm as tmpfs or thereabouts,
> /proc/meminfo shows shared memory as 0.  It is in
> reality not zero, and is being allocated, and shows
> up in /proc/sysvipc/shm and /proc/sys/kernel/shmall
> etc..
> Neither 2.4.6-pre5 nor 2.4.5-ac17 have the correct
> display.

You misunderstood what 2.2.xx kernels were reporting.
The "shared" memory in /proc/meminfo refers to something
completely unrelated to SysV shared memory. This is no
longer calculated because the computation was too costly.




Re: For comment: draft BIOS use document for the kernel

2001-06-22 Thread Albert D. Cahalan

Alan Cox writes:
> [somebody]

>> I could not find any reference to BIOS int 0x15, function 0x87,
>> block-move, used to copy the kernel to above the 1 megabyte
>> real-mode boundary. I think this is still used.
>
> I dont think the kernel has ever used it. The path has always been to
> enter 32bit mode then relocate/uncompress the kernel, then run it

There are several non-kernel BIOS users:

lilo
grub
syslinux
XFree86 (using virtual-8086 to run a video BIOS for a second card?)
dosemu?
loadlin?
the boot block that reads ext2 (in 1 kB -- damn what a hack)




Re: [RFC][PATCH] cutting up struct kernel_stat into cpu_stat

2001-06-21 Thread Albert D. Cahalan

Zach Brown writes:

> The attached patch-in-progress removes the per-cpu statistics from
> struct kernel_stat and puts them in a cpu_stat structure, one per cpu,
> cacheline padded.  The data is still collated and presented through
> /proc/stat, but another file /proc/cpustat is also added.  The locking
> is as nonexistent as it was with kernel_stat, but who cares, they're
> just fuzzy stats to be eyeballed by system tuners :).

Hey! The lack of atomicity causes "top" to do one of 3 things
for the idle time report, depending on the version:

1. negative numbers
2. wrap-around (4200.00% idle)
3. truncate to zero (the numbers don't add up)

This is because top sees the idle time run backwards for a moment.
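
The three behaviors fall out of how the difference between two
samples is computed. This is a sketch of the arithmetic only, not
top's actual code; the function names are made up and the counters
are assumed to be 32 bits wide.

```c
#include <stdint.h>

/* 1. Signed math: a counter that ran backwards gives a negative delta. */
static int64_t delta_signed(uint32_t prev, uint32_t now)
{
    return (int64_t)now - (int64_t)prev;
}

/* 2. Unsigned math: the same case wraps around to a huge value. */
static uint32_t delta_wrapped(uint32_t prev, uint32_t now)
{
    return now - prev;
}

/* 3. Clamp to zero: no nonsense shown, but the totals stop adding up. */
static uint32_t delta_clamped(uint32_t prev, uint32_t now)
{
    return now < prev ? 0 : now - prev;
}
```

None of the three is right; the fix belongs where the counters are
read, not in the display code.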



Re: [OT] Threads, inelegance, and Java

2001-06-21 Thread Albert D. Cahalan

Rob Landley writes:
> On Wednesday 20 June 2001 15:53, Martin Dalecki wrote:
>> Mike Harrold wrote:

>> super computing, hmm what about some PowerPC CPU variant - they are very
>> competitive in terms of cost and FPU performance! Transmeta isn't the
>> adequate choice here.
>
> You honestly think you can fit 142 PowerPC processors in a single 1U,
> air cooled?

That 142 would be what, a SHARC DSP system? It sure doesn't look
like Transmeta's Crusoe. The best I found was 6 and 8 per 1U:

"RLX has managed to tuck 24 servers into a 3U enclosure" --> 8/U
"WebBunker units can hold 12 processors [in 2U]" --> 6/U

For PowerPC I found 32/U to 40/U, in increments of 9U.
See www.mc.com for an example. The processor gets you 4 (four!)
floating-point fused multiply-add operations per cycle, typically
at 400 MHz. Being optimistic, that's a teraflop in 9U.
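
The teraflop estimate can be reproduced as follows. Assumptions: 32
to 40 processors per U across a full 9U chassis (288 to 360
processors), every processor sustaining its peak rate, and each fused
multiply-add counted as two floating-point operations. The function
name is made up for illustration.

```c
/*
 * Peak GFLOPS for `cpus` processors, each retiring `fma_per_cycle`
 * fused multiply-adds per cycle at `mhz` MHz.  One FMA = 2 flops.
 */
static double peak_gflops(int cpus, int fma_per_cycle, double mhz)
{
    return cpus * fma_per_cycle * 2.0 * mhz / 1000.0;
}
```

peak_gflops(288, 4, 400.0) comes to about 922 GFLOPS and
peak_gflops(360, 4, 400.0) to about 1152 GFLOPS, so the 9U figure
lands on either side of one teraflop depending on density.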

> Liquid air cooled, maybe...

Nope, plain old air or conduction.

If you're going to rant about off-topic junk, at least try to
throw in a few useful references so people can check facts and
maybe take advantage of whatever it is you're ranting about.
(yeah, yeah, sorry about the VGA console thing)



Re: Alan Cox quote? (was: Re: accounting for threads)

2001-06-20 Thread Albert D. Cahalan

Rob Landley writes:

> My only real gripe with Linux's threads right now [...] is
> that ps and top and such aren't thread aware and don't group them
> right.
>
> I'm told they added some kind of "threadgroup" field to processes
> that allows top and ps and such to get the display right.  I haven't
> noticed any upgrades, and haven't had time to go hunting myself.

There was a "threadgroup" added just before the 2.4 release.
Linus said he'd remove it if he didn't get comments on how
useful it was, examples of usage, etc. So I figured I'd look at
the code that weekend, but the patch was removed before then!

There is nothing that ps and top can do about this problem.
I've certainly looked into the matter; much of the code is mine.
BTW, the version in debian-unstable is the most stable. :-)

These options might help a little bit: --forest -H f

> (Ever tried to submit a patch to the FSF?  They want you to sign
> legal documents.  That's annoying.  I usually just send the bug
> reports to red hat and let THEM deal with it...)

Submit patches to me, under the LGPL please. The FSF isn't likely
to care. What, did you think this was the GNU system or something?

> Linus's job is to keep code OUT of the kernel.  He has veto power,
> nothing else.  I suspect he's pre-emptively vetoing some stuff to
> keep the flood down to a level he can deal with.  Maybe someday
> we'll convince him to use some variant of source control (not
> necessarily CVS, how about just a separate mailing list of the
> individual patches as he applies them?  One Linus can post to and
> that is read-only to everybody else?  HE always wants patches
> separated down nicely into individual messages with explanations,
> but WE have to get pre2-pre3 as one big patch lump.  With a
> patches-from-linus mailing list that he forwarded posts to, we'd
> know exactly when a patch went in and who it was from without
> bothering Linus. :)

How about a filesystem filter to spit out patches, or a filesystem
interface to version control?



Re: very strange (semi-)lockups in 2.4.5

2001-06-18 Thread Albert D. Cahalan

Pozsar Balazs writes:

> I'm having ~2 lockups a day. The following happens:
>  If I was under X, i only can use the magic-key, but no other keyboard (eg
> numlock) or mouse response, the screen freezes, processes stop.
>  If i was using textmode:
>   numlock still works
>   cursor blinks
>   processess stop (eg, gpm doesn't work, outputs freeze)
>   i can still switch vt's.
>   BUT, i can only type into a few vt's, last time into 3,5,6,7,8, but not
> into 1,2 or 4!
> 
> I cannot give you any traces, as i dont have any.
> 
> Also note that magic-key works, and it says that it umounts filesystems if
> i press magic-u, but next time at mount i see that reiserfs is replaying
> transactions.
> 
> 
> Any ideas?
> 
> The machine is a P3-750, 512M ram, abit vp6 mb. No overclocking, and it
> passes memtest86.

I think I'm getting the same thing, but I don't have the magic-key
compiled in. I'm going to hook up a VT510 to the serial port, in case
this is just XFree86 crashing. For anyone collecting statistics:

kernels 2.4.4-pre6 (?) and now 2.4.6-pre3
plain Pentium MMX @ 200 MHz
Intel motherboard -- see below
stable since 1996, on a UPS, dust-free, and the fan works
one lockup per day with desktop usage

In case the serial console doesn't work, could someone post plans
for a safe NMI board? (both ISA and PCI) The best I found:
http://www.sandelman.ottawa.on.ca/linux-ipsec/html/2000/02/msg00425.html
http://www.sandelman.ottawa.on.ca/linux-ipsec/html/2000/02/msg00391.html
(for PCI you're supposed to assert SERR# on the clock -- how?)

00:00.0 Host bridge: Intel Corporation 430TX - 82439TX MTXC (rev 01)
00:07.0 ISA bridge: Intel Corporation 82371AB PIIX4 ISA (rev 01)
00:07.1 IDE interface: Intel Corporation 82371AB PIIX4 IDE (rev 01)
00:07.2 USB Controller: Intel Corporation 82371AB PIIX4 USB (rev 01)
00:07.3 Bridge: Intel Corporation 82371AB PIIX4 ACPI (rev 01)
00:11.0 Ethernet controller: Digital Equipment Corporation DECchip 21040 [Tulip] (rev 23)
00:13.0 Ethernet controller: Lite-On Communications Inc LNE100TX Fast Ethernet Adapter (rev 25)
00:14.0 VGA compatible controller: ATI Technologies Inc 3D Rage Pro 215GP (rev 5c)



Re: [patch] nonblinking VGA block cursor

2001-06-15 Thread Albert D. Cahalan

Daniel Phillips writes:
> On Friday 15 June 2001 21:21, Albert D. Cahalan wrote:

>> Non-blinking cursors are just wrong. You need to patch your brain.
>> You really fucked up, because now apps can't restore your cursor
>> to proper behavior as defined by IBM.
>
> Just one question Albert: why doesn't my mouse cursor blink? ;-)

1. confusion with the text cursor, which should blink
2. need for continuous pixel-to-pixel accuracy with the mouse
3. you can wiggle your mouse as needed to find the mouse cursor

Apps do funny things when you try to wiggle the text cursor
with the arrow keys, and movement tends to be harshly constrained.
So the blinking is important.



Re: [patch] nonblinking VGA block cursor

2001-06-15 Thread Albert D. Cahalan

Leon Breedt writes:

> Attached is a patch to enforce a non-blinking, FreeBSD-syscons like
> block cursor in console mode.
> 
> This is useful for laptop types, or people like me who really really
> detest a blinking cursor.
> 
> NOTE: It disables the softcursor escape codes 
>   (/usr/src/linux/Documentation/VGA-softcursor.txt), since I don't 
>   ever want anything to change my cursor shape/style :)

I've seen this 666 times too often.

Non-blinking cursors are just wrong. You need to patch your brain.
You really fucked up, because now apps can't restore your cursor
to proper behavior as defined by IBM.

The blinking cursor is implemented in your video hardware.
IBM knew what was right for you. Millions of people know that
the blinking cursor is good. It is so right that a proper GUI
will implement the blinking cursor even without hardware support.

Of course FreeBSD has a block cursor. It was easy to program,
and it seems nice to the pot-smoking hippies out in Berkeley.
FreeBSD doesn't define standards. FreeBSD breaks standards.
(zombie creation, "ps -ef", partition tables, pty allocation...)
Gee, kind of like Microsoft, except Microsoft got the cursor right!

Ever wonder why IBM supports Linux instead of FreeBSD? Hmmm?
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Client receives TCP packets but does not ACK

2001-06-15 Thread Albert D. Cahalan

Mike Black writes:

> I'm concerned that you're probably just overruning your IP stack:
...
> TCP is NOT a guaranteed protocol -- you can't just blast data from one port
> to another and expect it to work.

Yes you can. This is why we have TCP in fact.

> a tcp-write is NOT guaranteed -- and as you've seen -- a recv() isn't either
> (that's why you need timeouts).
> You're probably overrunning the tcp buffer on your "print" statement and
> truncating a block.
> I don't see where you're checking for EAGAIN or EWOULDBLOCK (see man
> send).

You do have to check for partial writes due to the UNIX API.
Then check for EAGAIN and EINTR at least.
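
A sketch of such a write loop, using only the standard write() call.
The function name and the return convention are choices made for this
example: it returns the number of bytes actually written, which is
less than `len` only when a non-blocking descriptor reports EAGAIN, or
-1 on a real error.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

static ssize_t write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    size_t done = 0;

    while (done < len) {
        ssize_t n = write(fd, p + done, len - done);
        if (n > 0) {
            done += (size_t)n;     /* maybe a partial write: go around */
            continue;
        }
        if (n == 0)
            break;                 /* no progress: let the caller decide */
        if (errno == EINTR)
            continue;              /* interrupted by a signal: retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            break;                 /* non-blocking fd is full for now */
        return -1;                 /* real error, errno is set */
    }
    return (ssize_t)done;
}
```

If the result is short, wait for the descriptor to become writable
(select() or poll()) and call again with the remaining data. TCP takes
care of delivery; this loop only deals with the API.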

> You need a layer-7 protocol that will guarantee your transactions -- once
> your client acks/naks your server I'll bet everything works hunky-dory.
> If you're not familiar with the OSI model
> http://www.csihq.com/~mike/students/networking/iso/isomodel.html

You don't need that crap. TCP/IP doesn't even fit the OSI model,
and we're missing much of the OSI stack AFAIK. (Do we have that
thing with 10-byte addresses? I think not.)



Re: Going beyond 256 PCI buses

2001-06-14 Thread Albert D. Cahalan

David S. Miller writes:
> Jeff Garzik writes:

>> According to the PCI spec it is -impossible- to have more than 256
>> buses on a single "hose", so you simply have to implement multiple
>> hoses, just like Alpha (and Sparc64?) already do.  That's how the
>> hardware is forced to implement it...
>
> Right, what userspace had to become aware of are "PCI domains" which
> is just another fancy term for a "hose" or "controller".
>
> All you have to do is (right now, the kernel supports this fully)
> open up a /proc/bus/pci/${BUS}/${DEVICE} node and then go:
> 
>   domain = ioctl(fd, PCIIOC_CONTROLLER, 0);
>
> Viola.
>
> There are only two real issues:

No, three.

0) The API needs to be taken out and shot.

   You've added an ioctl. This isn't just any ioctl. It's a
   wicked nasty ioctl. It's an OH MY GOD YOU CAN'T BE SERIOUS
   ioctl by any standard.

   Consider the logical tree:
   hose -> bus -> slot -> function -> bar

   Well, the hose and bar are missing. You specify the middle
   three parts in the filename (with slot and function merged),
   then use an ioctl to specify the hose and bar.

   Doing the whole thing by filename would be better. Else
   why not just say "screw it", open /proc/pci, and do the
   whole thing by ioctl? Using ioctl for both the most and
   least significant parts of the path while using a path
   for the middle part is Wrong, Bad, Evil, and Broken.

   Fix:

   /proc/bus/PCI/0/0/3/0/config   config space
   /proc/bus/PCI/0/0/3/0/0        the first bar
   /proc/bus/PCI/0/0/3/0/1        the second bar
   /proc/bus/PCI/0/0/3/0/driver   info about the driver, if any
   /proc/bus/PCI/0/0/3/0/event    hot-plug, messages from driver...

   Then we have arch-specific MMU cruft. For example the PowerPC
   defines bits that affect caching, ordering, and merging policy.
   The chips from IBM also define an endianness bit. I don't think
   this ought to be an ioctl either. Maybe mmap() flags would be
   reasonable. This isn't just for PCI; one might do an anon mmap
   with pages locked and cache-incoherent for better performance.
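
   With the proposed naming, userspace needs nothing but string
   formatting to reach any node. A sketch follows; the layout is the
   proposal above and does not exist in any kernel, the helper name is
   made up, and decimal path components are an assumption.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Build /proc/bus/PCI/<hose>/<bus>/<slot>/<function>/<leaf> in `out`.
 * `leaf` is "config", "driver", "event", or a bar number as a string.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int pci_node_path(char *out, size_t outlen,
                         unsigned hose, unsigned bus,
                         unsigned slot, unsigned func,
                         const char *leaf)
{
    int n = snprintf(out, outlen, "/proc/bus/PCI/%u/%u/%u/%u/%s",
                     hose, bus, slot, func, leaf);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

   The hose and the bar are then ordinary path components, and plain
   open() does the rest with no ioctl involved.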

> 1) Extending the type bus numbers use inside the kernel.
...
> 2) Figure out what to do wrt. sys_pciconfig_{read,write}()
...



Re: Going beyond 256 PCI buses

2001-06-13 Thread Albert D. Cahalan

Tom Gall writes:

>   I was wondering if there are any other folks out there like me who
> have the 256 PCI bus limit looking at them straight in the face?

I might. The need to reserve bus numbers for hot-plug looks like
a quick way to waste all 256 bus numbers.

> each PHB has an
> additional id, then each PHB can have up to 256 buses.

Try not to think of him as a PHB with an extra id. Lots of people
have weird collections. If your boss wants to collect buses, well,
that's his business. Mine likes boats. It's not a big deal, really.

(Did you not mean your pointy-haired boss has mental problems?)





Re: IBM PPC 405 series little endian?

2001-06-11 Thread Albert D. Cahalan

Zehetbauer Thomas writes:

> Has someone experimented with running linux in little-endian mode on IBM
> PowerPC 405 (Walnut) yet?

I doubt it. You are at least the 3rd person to want little-endian.
Somebody at Matrox posted a patch for little-endian on the 74xx.
You need a bit more than that though; you need to change the way
page table bits get set and modify head_4xx.S IIRC.



Re: temperature standard - global config option?

2001-06-09 Thread Albert D. Cahalan

Michael H. Warfiel writes:
> On Fri, Jun 08, 2001 at 05:16:39PM -0400, Albert D. Cahalan wrote:

>> The bits are free; the API is hard to change.
>> Sensors might get better, at least on high-end systems.
>> Rounding gives a constant 0.15 degree error.
>> Only the truly stupid would assume accuracy from decimal places.
>> Again, the bits are free; the API is hard to change.
...
>   No...  The average person, NO, the vast majority of people,
> DO assume accuracy from decimal places and honestly do not know the
> difference between precision and accuracy.  I've had comments on this
> thread in private E-Mail that reinforce this impression.

I hope you don't think people would assume that a "float" always
has useful data in all 23 fraction bits. It is a similar case.

So here you go, a kernel-safe conversion from C to K. It works
from 0 to 238 degrees C. Print as hex, so user code can toss it
into a union or maybe abuse scanf. Adjust as needed for F to K
or for hardware with greater resolution.

/* unsigned int degrees C --> float degrees K */
unsigned ic_to_fk(unsigned c){
  unsigned exponent;
  unsigned tmp;

  tmp = (c<<23) + 0x88933333; /* 273.15 Kelvin shifted 23 left */
  exponent = 127; /* IEEE floating-point bias */
  while(tmp&0xff000000){ /* normalize: move the leading 1 down to bit 23 */
    tmp >>= 1;
    exponent++;
  }
  tmp &= 0x007fffff; /* keep only the fraction */
  tmp |= exponent<<23;
  return tmp;
}
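
The user side of "print as hex, toss it into a union" might look
like the sketch below. It assumes a 32-bit IEEE 754 float and uses
memcpy() instead of a union; hex_to_kelvin() is a hypothetical name.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Turn the hex text printed by the kernel side (e.g. "43889333")
 * back into a float. Assumes float is 32-bit IEEE 754.
 */
float hex_to_kelvin(const char *hex)
{
    uint32_t bits = (uint32_t)strtoul(hex, NULL, 16);
    float k;

    memcpy(&k, &bits, sizeof k);   /* reinterpret the bits, no conversion */
    return k;
}
```

The bit pattern 0x43889333 is the single-precision value nearest
273.15, so that is what 0 degrees C should read back as.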



checker suggestion

2001-06-09 Thread Albert D. Cahalan

Struct padding is a problem. Really, there shouldn't be any
implicit padding. This causes:

1. security leaks when such structs are copied to userspace
   (the implicit padding is uninitialized, and so may contain
   a chunk of somebody's private key or password)

2. bloat, when struct members could be reordered to eliminate
   the need for padding
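
Both problems show up in a small sketch like the one below. The
struct and function names are made up for illustration, and the
layout comments describe common ABIs with a 4-byte int, not every
platform.

```c
#include <stddef.h>
#include <string.h>

/* Typical layout: 1 byte, 3 bytes padding, 4 bytes, 1 byte, 3 bytes padding. */
struct loose {
    char tag;
    int  value;
    char flag;
};

/* Same members, widest first: the two chars share one padded tail. */
struct tight {
    int  value;
    char tag;
    char flag;
};

/* Bytes in struct loose that belong to no member (implicit padding). */
size_t loose_padding(void)
{
    return sizeof(struct loose) - (sizeof(char) + sizeof(int) + sizeof(char));
}

/*
 * Zero the whole object, padding included, before filling it in, so a
 * later byte-for-byte copy to userspace does not carry stale data.
 */
void loose_init(struct loose *s, char tag, int value, char flag)
{
    memset(s, 0, sizeof *s);
    s->tag = tag;
    s->value = value;
    s->flag = flag;
}
```

A checker could flag any struct where the sum of the member sizes
is smaller than sizeof the struct, which is what loose_padding()
computes by hand here.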



Re: temperature standard - global config option?

2001-06-08 Thread Albert D. Cahalan

Michael H. Warfiel writes:
> On Fri, Jun 08, 2001 at 05:16:39PM -0400, Albert D. Cahalan wrote:

>> The bits are free; the API is hard to change.
>> Sensors might get better, at least on high-end systems.
>> Rounding gives a constant 0.15 degree error.
>> Only the truly stupid would assume accuracy from decimal places.
>> Again, the bits are free; the API is hard to change.
...
>   No...  The average person, NO, the vast majority of people,
> DO assume accuracy from decimal places and honestly do not know the
> difference between precision and accuracy.  I've had comments on this
> thread in private E-Mail that reinforce this impression.

Fine. Most user apps can round to the nearest degree, or even
display the values "cool", "warm", "hot", and "BURNING!".
The kernel API should not be so limiting.

>   Even the rounding error vis-a-vis the .15 is silly and irrelevant!
> If the sensor is +- 1 degree, you can't even measure the rounding error,
> even if you HAVE two decimal places.  With that degree of accuracy, you
> are no better off than 273 with no decimal places.  Worrying about rounding
> error on .15 when the accuracy is in the units is exactly the kind of
> misinformed false precision that I worry about.  You actually thought that
> the .15 was significant enough to worry about round error when, in fact,
> it will be impossible to measure with the equipment available in the
> environment of discourse.

The 0.15 may mean the difference between:

a.  less than 0.005 chance of exceeding 370 degrees
b.  less than 0.01 chance of exceeding 370 degrees

for a measurement that might be 365 degrees.

>> One might provide other numbers to specify accuracy and precision.
>
>   Now...  That I can agree with and it would make absolute sense.
> Especially if we were discussing lab grade or scientific grade measure
> equipment and measurements.  In fact, that would be a requirement for
> any validity to be attached to measurements of that level of precision.

No, at any level of precision. I'd sure want to know if the device
is specified as "resolution 8 degrees, standard deviation 23".

This information is fairly important. The user is responsible for
defining acceptable risk, and the app should be able to provide a
warning or shutdown based on this.

For typical PC hardware, one might assume that the device is a
cheap piece of junk 2 mm below the CPU. (with quite a bit of lag!)
The lag ought to be specified too of course.
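
As a rough sketch of what "other numbers" could mean, assuming
everything is reported in hundredths of a kelvin: the struct,
its field names, and temp_alarm() are hypothetical, not an
existing kernel interface.

```c
/* Hypothetical reading plus the sensor's own error figures, all in
 * hundredths of a kelvin except the lag. */
struct temp_reading {
    unsigned long value_ck;      /* measured temperature */
    unsigned long resolution_ck; /* smallest step the sensor reports */
    unsigned long stddev_ck;     /* standard deviation of the error */
    unsigned long lag_ms;        /* how stale the reading may be */
};

/*
 * Return 1 if the true temperature could plausibly be at or above
 * limit_ck, allowing for one resolution step and k standard deviations.
 * k encodes the risk the user accepts: larger k, earlier warning.
 */
int temp_alarm(const struct temp_reading *r, unsigned long limit_ck,
               unsigned long k)
{
    unsigned long worst = r->value_ck + r->resolution_ck + k * r->stddev_ck;
    return worst >= limit_ck;
}
```

The kernel only reports the figures; the choice of k, and so the
acceptable risk, stays with the user.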



Re: temperature standard - global config option?

2001-06-08 Thread Albert D. Cahalan

John Chris Wren writes:

> coupling to the CPU that is about as bad as it can get.  You've got an epoxy
> housing of an inconsistent shape in contact with ceramic.  The actual
> contact point is minuscule.  There's no thermal paste, and often, I've seen
> the sensors not quite raised high enough to contact the chip (you should be
> able to rack a business card across the empty socket and feel a slight
> "bump" as you touch the sensor.  If not, you need to bend it up slightly, to
> give better physical contact to the CPU).
> 
> But in spite of all this, you're not really measuring the critical
> temperature, which is junction temperature.  Yes, case temperature has *some*

There are processors with temperature measurement built right
into the silicon.

> For the record, in the course of a normal day, I see my temperatures
> fluctuate from 48C with the house A/C set to 73, to 56C when I open the
> doors, and let it get up to 76 in the house.  That's 8C (14.4F) over a 3F
> change in ambient.

This makes sense. Heat increases resistance, which generates heat.
At some point, a tiny increase will cause thermal run-away.




Re: [PATCH] sockreg2.4.5-05 inet[6]_create() register/unregister table

2001-06-08 Thread Albert D. Cahalan

Henning P. Schmied writes:
> Alan Cox <[EMAIL PROTECTED]> writes:

>> So it comes down to the question of whether the module is linking
>> (which is about dependencies and requirements) and what the legal
>> scope is. Which is a matter for lawyers.
>
> And this would void DaveMs' argument, that only the "official in
> Linus' kernel published interface is allowed for binary modules". This
> would mean, that putting the posted, unofficial patch under GPL into
> the kernel and then using this interface for a binary module is just
> the same as using only the official ABI from a lawyers' point of
> view! 
>
> This would make DaveMs' position even less understandable, because
> there would be no difference for a proprietary vendor but keeping the
> patch out of the kernel makes life harder for people like the original
> poster that want to test new (open sourced) protocols like SCTP.

Yep.

Consider a chunk of x86 instructions using a home-grown OS
abstraction layer, and drivers that implement that layer for
both Linux and any non-GPL operating system. The binary blob
is obviously not derived from Linux, and may in fact run
without modification in a BSD or Solaris/x86 kernel.

There is in fact just such a layer. It might not currently
have the features needed to implement TCP, but it could be
extended as needed.






Re: temperature standard - global config option?

2001-06-08 Thread Albert D. Cahalan

L. K. writes:
> On Fri, 8 Jun 2001, Albert D. Cahalan wrote:

>> The bits are free; the API is hard to change.
>> Sensors might get better, at least on high-end systems.
>> Rounding gives a constant 0.15 degree error.
>> Only the truly stupid would assume accuracy from decimal places.
>> Again, the bits are free; the API is hard to change.
>>
>> One might provide other numbers to specify accuracy and precision.
>
> I really do not believe that for a CPU or a motherboard +- 1 degree would
> make any difference.
>
> If a CPU runs fine at, say, 37 degrees C, I do not believe it will have any
> problems running at 38 or 36 degrees. I support the idea of having very
> good sensors for temperature monitoring, but CPU and motherboard
> temperature do not depend on the rise of the temperature of 1 degree, but
> when the temperature rises 10 or more degrees. I hope you understand what
> I want to say.

Of course I understand. Motorola offers 4-degree resolution,
with a random offset of up to 12 degrees. (calibration is possible)
You seem to need another reminder that THE BITS ARE FREE.

Why would you even consider trying to squeeze out a few bits?
You can't be absolutely sure that they will never be useful.



Re: temperature standard - global config option?

2001-06-08 Thread Albert D. Cahalan

Michael H. Warfiel writes:

> We don't have sensors that are accurate to 1/10 of a K and certainly not
> to 1/100 of a K.  Knowing the CPU temperature "precise" to .01 K when
> the accuracy of the best sensor we are likely to see is no better than
> +- 1 K is just about as relevant as negative absolute temperatures.
...
>   Even if we had or could, anticiplate, sensors with a +- .01 K,
> the relevance of knowing the CPU temperature to that precision is
> lost on me.  I see no sense in stuffing a field with meaningless
> bits just because the field will hold them.  In fact, this "false precision"
> quickly leads to the false impression of accuracy.  Based on several
> messages I have seen on this thread and in private E-Mail, there are a
> number of people who don't seem to grasp the fundamental difference
> between precision and accuracy and truly don't understand that adding
> meaningless precision like this adds nothing to the accuracy.
>
>   I can see maybe making it precise to .1 K.  But stuffing the bits
> in there to be precise to .01 K just because we have the bits and not
> because we have any realistic information to fill the bits in with, is
> just silly to me.  Just as silly as allowing for negative numbers in an
> absolute temperature field.  We have the bits to support it, but why?

The bits are free; the API is hard to change.
Sensors might get better, at least on high-end systems.
Rounding gives a constant 0.15 degree error.
Only the truly stupid would assume accuracy from decimal places.
Again, the bits are free; the API is hard to change.

One might provide other numbers to specify accuracy and precision.




Re: temperature standard - global config option?

2001-06-07 Thread Albert D. Cahalan

Chris Boot writes:

>> Kelvins good idea in general - it is always positive ;-)
>
>> 0.01*K fits in 16 bits and gives reasonable range.
...
> OK, I think by now we've all agreed the following:
>  - The issue is NOT displaying temperatures to the user, but a userspace
>    program reading them from the kernel.  The userspace program itself can
>    do temperature conversions for the user if he/she wants.
>  - The most preferable units would be decikelvins, as the value can give a
>    relatively precise as well as wide range of numbers ranging from absolute
>    zero to about 6280 degrees Celsius ((65535 / 10) - 273) which is well
>    within anything that a computer can operate.  It also gives us a good
>    base for all sorts of other temperature sensing devices.
>
> Do we all agree on those now?

I nearly do.

There isn't any need to cram the data into 16 bits.
The offset to Celsius is 273.15 degrees.
So hundredths of a degree, in Kelvin, is a better choice.
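
A small sketch of why the offset matters, assuming the value is
an unsigned count of hundredths of a kelvin. The macro and function
names are made up for illustration.

```c
/* 0 degrees Celsius is 273.15 K, i.e. exactly 27315 hundredths of a kelvin. */
#define CK_ZERO_CELSIUS 27315L

/* Hundredths of a kelvin -> hundredths of a degree Celsius (may be negative). */
long ck_to_centi_celsius(unsigned long ck)
{
    return (long)ck - CK_ZERO_CELSIUS;
}

/*
 * Tenths of a kelvin cannot hold the offset exactly. This is the leftover
 * error, in hundredths of a degree, when 273.15 is rounded to 273.2.
 */
long decikelvin_offset_error(void)
{
    return 2732L * 10 - CK_ZERO_CELSIUS;   /* 27320 - 27315 */
}
```

In hundredths the conversion is a plain integer subtraction with
no rounding; in tenths every converted value is off by a constant.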



Re: CacheFS

2001-06-07 Thread Albert D. Cahalan

Jan Kasprzak writes:

> Another goal is to use the Linux filesystem
> as a backing store (as opposed to the block device or single large file
> used by CODA).
...
> - kernel module, implementing the filesystem of the type "cachefs"
>   and a character device /dev/cachefs
> - user-space daemon, which would communicate with the kernel
>   over /dev/cachefs and which would manage the backing store
>   in a given directory.
>
>   Every file on the front filesystem (NFS or so) volume will be cached
> in two local files by cachefsd: The first one would contain the (parts of)
...
> * Should the cachefsd be in user space (as it is in the prototype
> implementation) or should it be moved to the kernel space? The
> former allows probably better configuration (maybe a deeper
> directory structure in the backing store), but the latter is
> faster as it avoids copying data between the user and kernel spaces.

I think that, if speed is your goal, you should have the kernel
code use swap space for the cache. Look at what tmpfs does, but
running over top of tmpfs leaves you with the overhead of running
two filesystems and a daemon. It is better to be direct.

Maybe this shouldn't even be a filesystem. You could have a general
way to flag a filesystem as being significantly slower than swap.




Re: temperature standard - global config option?

2001-06-07 Thread Albert D. Cahalan

L. K. writes:

> Why not make it in Celsius ? Is more easy to read it this way.

No, because then the software must handle negative numbers for
cooled computers. CentiKelvin is fine. Do C=cK/100-273.15 if you
really must... but you still have a number that is useless to
a human. Humans need a seconds-to-destruction value or an alarm.

Negative temperatures do not really exist.








Re: Missing cache flush.

2001-06-06 Thread Albert D. Cahalan

David S. Miller writes:
> David Woodhouse writes:

>>> Call it flush_ecache_full() or something.
>>
>> Strange name. Why? How about __flush_cache_range()?
>
> How about flush_cache_range_force() instead?
>
> I want something in the name that tells the reader "this flushes
> the caches, even though under every other ordinary circumstance
> you would not need to".

"flush" means what to you?

write-back
write-back-and-invalidate
discard-and-invalidate

All 3 behaviors are useful to me, and a few more. I've been
using chunks of PowerPC assembly. Using PowerPC mnemonics...

dcba -- allocate a cache block with undefined content
dcbf -- write to RAM, then invalidate ("data cache block flush")
dcbi -- invalidate, discarding any data
dcbst -- initiate write if dirty
dcbt -- prefetch, hinting about future load instructions
dcbtst -- prefetch, hinting about future store instructions
dcbz -- allocate and zero a cache block (cacheable mem only!)

So dcbf_range() and dcbi_range() sound good to me. :-)
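
A rough sketch of what dcbf_range() could look like, assuming
32-byte cache blocks and GCC-style inline assembly. On anything
other than PowerPC the instruction is compiled out, so only the
range arithmetic remains. This is an illustration, not code from
any kernel tree.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 32u  /* assumption: 32-byte data cache blocks */

/* Write back and invalidate one cache block (no-op when not on PowerPC). */
static inline void dcbf_line(const void *p)
{
#if defined(__powerpc__) || defined(__PPC__)
    __asm__ __volatile__("dcbf 0,%0" : : "r"(p) : "memory");
#else
    (void)p;
#endif
}

/*
 * Apply dcbf to every cache block overlapping [start, start+len).
 * Returns the number of blocks touched.
 */
size_t dcbf_range(const void *start, size_t len)
{
    uintptr_t p   = (uintptr_t)start & ~(uintptr_t)(CACHE_LINE - 1);
    uintptr_t end = (uintptr_t)start + len;
    size_t n = 0;

    if (len == 0)
        return 0;
    for (; p < end; p += CACHE_LINE, n++)
        dcbf_line((const void *)p);
#if defined(__powerpc__) || defined(__PPC__)
    __asm__ __volatile__("sync" : : : "memory");  /* wait for completion */
#endif
    return n;
}
```

A dcbi_range() would have the same shape with dcbi in place of
dcbf. Naming the function after the operation leaves no doubt
about which of the three meanings of "flush" the caller gets.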



Re: Inconsistent "#ifdef __KERNEL__" on different architectures

2001-06-05 Thread Albert D. Cahalan

Paul Mackerras writes:

> The only valid reason for userspace programs to be including kernel
> headers is to get definitions that are part of the kernel API.  (And
> in fact others here will go further and assert that there are *no*
> valid reasons for userspace programs to include kernel headers.)
>
> If you want some atomic functions or whatever for your userspace
> program and the ones in the kernel look like they would be useful,
> then take a copy of the relevant kernel code if you like, but don't
> include the kernel headers directly.

Sure. That copy belongs in /usr/include/asm for all programs
to use, and it should match the libc that will be linked against.
(note: "copy", not a symlink)

Red Hat 7 gets this right:

$ ls -ldog /usr/include/asm /usr/include/linux
drwxr-xr-x    2 root         2048 Sep 28  2000 /usr/include/asm
drwxr-xr-x   10 root        10240 Sep 28  2000 /usr/include/linux

Debian's "unstable" is correct too:

$ ls -ldog /usr/include/asm /usr/include/linux
drwxr-xr-x    2 root         6144 Mar 12 15:57 /usr/include/asm
drwxr-xr-x   10 root        23552 Mar 12 15:57 /usr/include/linux

> This is why I added #ifdef __KERNEL__ around most of the contents
> of include/asm-ppc/*.h.  It was done deliberately to flush out those
> programs which are depending on kernel headers when they shouldn't.

What, is  being used? I doubt it.

If /usr/include/asm is a link into /usr/src/linux, then you
have a problem with your Linux distribution. Don't blame the
apps for this problem.

Adding "#ifdef __KERNEL__" causes extra busywork for someone
trying to adapt kernel headers for userspace use. At least do
something easy to rip out. Three lines, all together at the top:

#ifndef __KERNEL__
#error Raw kernel headers may not be compatible with user code.
#endif



Re: symlink_prefix

2001-06-04 Thread Albert D. Cahalan

Alexander Viro writes:

> leaves ncp with its ioctls ugliness.

Authentication will be ugly. Joe mounts a filesystem, and does
not bother to authenticate. He gets world-accessible files.
Then Kevin authenticates as himself, and later as db_adm too.
Along comes Sue, who can authenticate the whole box as trusted.

The /fs/ext2 stuff is one of the nastiest hacks I've seen in
a long time, and it doesn't solve the authentication problem.

GUI users might like to see a dialog box pop up whenever they
hit restricted filesystem space. (example: an authentication tool
blocked on /dev/auth-notify or getting signals with info)




Re: Highmem Bigmem question

2001-06-01 Thread Albert D. Cahalan

[EMAIL PROTECTED] writes:

> This is probably an FAQ, but I read the FAQ and it's not in there.

Odd.

> I have a machine with 2G of memory.  I compiled the kernel with the
> 4G memory option.  How much address space should each process be
> able to address?

3 GB for user stuff, or 3.5 GB with a patch

> Does this change if I use the 64G option?

No. Don't do that.

> I'm after 2.4 information.  Right now I am running on a 2.2 kernel
> and it looks like the user processes are limited to ~1G.

This is not a kernel problem. Try a libc upgrade, or use some
other way to allocate memory. At least sbrk() and mmap() can
be used.



Re: How to know HZ from userspace?

2001-05-30 Thread Albert D. Cahalan

Harald Welte writes:

> Is there any way to read out the compile-time HZ value of the kernel?
> 
> I had a brief look at /proc/* and didn't find anything.

Look again, this time with a sick mind. Got your barf bag?
Kubys made me do it.

//
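/***********************************************************************\
*   Copyright (C) 1992-1998 by Michael K. Johnson, [EMAIL PROTECTED]    *
*                                                                       *
*      This file is placed under the conditions of the GNU Library      *
*      General Public License, version 2, or any later version.         *
*      See file COPYING for information on distribution conditions.     *
\***********************************************************************/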

/* ...but Albert Cahalan wrote the really evil parts.
MKJ is only guilty for the macro */

/* Sets Hertz equal to the kernel's HZ, as seen in /proc. */

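#include <stdio.h>    /* fprintf, sscanf, perror, fflush */
#include <fcntl.h>    /* open, O_RDONLY */
#include <unistd.h>   /* lseek, read, _exit, sysconf */

#ifndef HZ
#include <netinet/in.h>  /* htons */
#endif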

long smp_num_cpus; /* number of CPUs */

#define BAD_OPEN_MESSAGE                                        \
"Error: /proc must be mounted\n"                                \
"  To mount /proc at boot you need an /etc/fstab line like:\n"  \
"      /proc   /proc   proc    defaults\n"                      \
"  In the meantime, mount /proc /proc -t proc\n"

#define STAT_FILE    "/proc/stat"
static int stat_fd = -1;
#define UPTIME_FILE  "/proc/uptime"
static int uptime_fd = -1;
#define LOADAVG_FILE "/proc/loadavg"
static int loadavg_fd = -1;
#define MEMINFO_FILE "/proc/meminfo"
static int meminfo_fd = -1;

static char buf[1024];

/* This macro opens filename only if necessary and seeks to 0 so
 * that successive calls to the functions are more efficient.
 * It also reads the current contents of the file into the global buf.
 */
#define FILE_TO_BUF(filename, fd) do{                           \
    static int local_n;                                         \
    if (fd == -1 && (fd = open(filename, O_RDONLY)) == -1) {    \
        fprintf(stderr, BAD_OPEN_MESSAGE);                      \
        fflush(NULL);                                           \
        _exit(102);                                             \
    }                                                           \
    lseek(fd, 0L, SEEK_SET);                                    \
    if ((local_n = read(fd, buf, sizeof buf - 1)) < 0) {        \
        perror(filename);                                       \
        fflush(NULL);                                           \
        _exit(103);                                             \
    }                                                           \
    buf[local_n] = '\0';                                        \
}while(0)

unsigned long Hertz;
static void init_Hertz_value(void) __attribute__((constructor));
static void init_Hertz_value(void){
  unsigned long user_j, nice_j, sys_j, other_j;  /* jiffies (clock ticks) */
  double up_1, up_2, seconds;
  unsigned long jiffies, h;
  smp_num_cpus = sysconf(_SC_NPROCESSORS_CONF);
  if(smp_num_cpus==-1) smp_num_cpus=1;
  do{
    FILE_TO_BUF(UPTIME_FILE,uptime_fd);  sscanf(buf, "%lf", &up_1);
    /* uptime(&up_1, NULL); */
    FILE_TO_BUF(STAT_FILE,stat_fd);
    sscanf(buf, "cpu %lu %lu %lu %lu", &user_j, &nice_j, &sys_j, &other_j);
    FILE_TO_BUF(UPTIME_FILE,uptime_fd);  sscanf(buf, "%lf", &up_2);
    /* uptime(&up_2, NULL); */
  } while((long)( (up_2-up_1)*1000.0/up_1 )); /* want under 0.1% error */
  jiffies = user_j + nice_j + sys_j + other_j;
  seconds = (up_1 + up_2) / 2;
  h = (unsigned long)( (double)jiffies/seconds/smp_num_cpus );
  /* actual values used by 2.4 kernels: 32 64 100 128 1000 1024 1200 */
  switch(h){
  case   30 ...   34 :  Hertz =   32; break; /* ia64 emulator */
  case   48 ...   52 :  Hertz =   50; break;
  case   58 ...   62 :  Hertz =   60; break;
  case   63 ...   65 :  Hertz =   64; break; /* StrongARM /Shark */
  case   95 ...  105 :  Hertz =  100; break; /* normal Linux */
  case  124 ...  132 :  Hertz =  128; break; /* MIPS, ARM */
  case  195 ...  204 :  Hertz =  200; break; /* normal << 1 */
  case  253 ...  260 :  Hertz =  256; break;
  case  393 ...  408 :  Hertz =  400; break; /* normal << 2 */
  case  790 ...  808 :  Hertz =  800; break; /* normal << 3 */
  case  990 ... 1010 :  Hertz = 1000; break; /* ARM */
  case 1015 ... 1035 :  Hertz = 1024; break; /* Alpha, ia64 */
  case 1180 ... 1220 :  Hertz = 1200; break; /* Alpha */
  default:
#ifdef HZ
    Hertz = (unsigned long)HZ;    /* compile-time value, if the headers define HZ */
#else
    /* If 32-bit or big-endian (not Alpha or ia64), assume HZ is 100. */
    Hertz = (sizeof(long)==sizeof(int) || htons(999)==999) ? 100UL : 1024UL;
#endif
    fprintf(stderr, "Unknown HZ value! (%lu) Assume %lu.\n", h, Hertz);
  }
}
//






Re: How to know HZ from userspace?

2001-05-30 Thread Albert D. Cahalan

Jonathan Lundell writes:
> At 5:07 PM -0700 2001-05-30, H. Peter Anvin wrote:

>>> If you now want to set those values from a userspace program / script in
>>>  a portable manner, you need to be able to find out of HZ of the currently
>>>  running kernel.
>>
>> Yes, but that's because the interfaces are broken.  The decision has
>> been that these values should be exported using the default HZ for the
>> architecture, and that it is the kernel's responsibility to scale them
>> when HZ != USER_HZ.  I don't know if any work has been done in this
>> area.

Nope.

HZ-derived values are not scaled in the /proc code.
The real value is not available to apps. (Linus said so)
People often change the HZ value.

Thus we have problems.

Maybe I'll post my disgusting hack. You _can_ get HZ out
of /proc if you know where to look. >:-)

> FWIW (perhaps not much in this context), the POSIX way is
> sysconf(_SC_CLK_TCK) POSIX sysconf is pretty useful for this
> kind of thing (not just HZ, either).

That does not report the real value. It reports the default.
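
For reference, the POSIX call under discussion is a one-liner.
The wrapper name below is made up. Per the point above, treat the
result as the tick unit user space is told about, not as proof of
the HZ the running kernel was built with.

```c
#include <unistd.h>

/*
 * Clock ticks per second as reported to user space by POSIX sysconf.
 * Returns -1 if the value is unavailable.
 */
long user_clock_ticks(void)
{
    return sysconf(_SC_CLK_TCK);
}
```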



Re: [patch] severe softirq handling performance bug, fix, 2.4.5

2001-05-26 Thread Albert D. Cahalan

David S. Miller writes:
> Ingo Molnar writes:

>> (unlike bottom halves, soft-IRQs do not preempt kernel code.)
> ...
>
> Since when do we have this rule? :-)
...
> You should check Softirqs on return from every single IRQ.
> In do_softirq() it will make sure that we won't run softirqs
> while already doing so or being already nested in a hard-IRQ.
> 
> Every port works this way, I don't know where you got this "soft-IRQs
> cannot run when returning to kernel code" rule, it simply doesn't
> exist.

After you two argue this out, please toss a note in Documentation.



Re: 2.4 freezes on VIA KT133

2001-05-24 Thread Albert D. Cahalan

Mark Hahn writes:

> contrary to the implication here, I don't believe there is any *general*
> problem with Linux/VIA/AMD stability.  there are well-known issues
> with specific items (VIA 686b, for instance), but VIA/AMD hardware
> is quite suitable for servers.

VIA hardware is not suitable for anything until we _know_ the
truth about what is wrong. VIA is hiding something big.

Simple fix:

0. get lawyer
1. start class-action lawsuit
2. do discovery
3. unseal court records
4. done -- you may drop the case if not settled already

Well, something like that... not a lawyer, etc.
If you have the time and money, go for it. Have fun.

Creative Labs ought to toast VIA over blaming the sound card. :-)




Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device

2001-05-24 Thread Albert D. Cahalan

Oliver Xymoron writes:

> The /dev dir should not be special. At least not to the kernel. I have
> device files in places other than /dev, and you probably do too (hint:
> anonymous FTP).

This is a horribly broken FTP server.



Re: alpha iommu fixes

2001-05-22 Thread Albert D. Cahalan

David S. Miller writes:

> What are these "devices", and what drivers "just program the cards to
> start the dma on those hundred mbyte of ram"?

Hmmm, I have a few cards that are used that way. They are used
for communication between nodes of a cluster.

One might put 16 cards in a system. The cards are quite happy to
do a 2 GB DMA transfer. Scatter-gather is possible, but it cuts
performance. Typically the driver would provide a huge chunk
of memory for an app to use, mapped using large pages on x86 or
using BAT registers on ppc. (reserved during boot of course)
The app would crunch numbers using the CPU (with AltiVec, VIS,
3dnow, etc.) and instruct the device to transfer data to/from
the memory region.

Remote nodes initiate DMA too, even supplying the PCI bus address
on both sides of the interconnect. :-) No IOMMU problems with
that one, eh? The other node may transfer data at will.









Re: LANANA: To Pending Device Number Registrants

2001-05-20 Thread Albert D. Cahalan

Guest section DW writes:
> On Thu, May 17, 2001 at 02:35:55AM -0400, Albert D. Cahalan wrote:

>> The PC partition table has such an ID. The LILO change log
>> mentions it. I think it's 6 random bytes, with some restriction
>> about being non-zero.
>
> You are confused. The partition table contains IDs, but these are
> the numbers like 83 for a Linux partition. No disk-identifying numbers.

Care to explain "duplicate MBR signature handling" in the GPT FAQ?
While describing the new-style partitions, Microsoft mentions that
Windows 2000 has a way to mark old-style ("MBR") partitions:

: 58. What happens if a duplicate Disk or Partition GUID is detected? 
: Windows Whistler will generate new GUIDs for any duplicate Disk GUID,
: MSR Partition GUID, or MSR basic data GUID upon detection. This is
: similar to the duplicate MBR signature handling in Windows 2000.
: Duplicate GUIDs on a dynamic container or database partition
: cause unpredictable results.

Well, the way to test this would be with Windows 2000, two disks,
and a Linux rescue disk that has "dd" on it. See what gets changed
when the "duplicate MBR signature handling in Windows 2000" runs.
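
To compare two "dd" dumps of sector 0, something like this sketch
would pull out the field in question. The offset 0x1B8 and the
4-byte width are the commonly documented location of the NT-family
disk signature, not something stated in this thread, so treat them
as assumptions. The function name is made up.

```c
#include <stdint.h>

#define MBR_SIZE        512
#define MBR_DISK_SIG    0x1B8  /* assumed offset of the 4-byte disk signature */
#define MBR_BOOT_SIG    0x1FE  /* 0x55 0xAA marks a valid MBR */

/*
 * Extract the little-endian disk signature from a copy of sector 0.
 * Returns 0 and stores the signature on success, -1 if the sector
 * does not end in the 0x55 0xAA boot signature.
 */
int mbr_disk_signature(const unsigned char sector[MBR_SIZE], uint32_t *sig)
{
    if (sector[MBR_BOOT_SIG] != 0x55 || sector[MBR_BOOT_SIG + 1] != 0xAA)
        return -1;
    *sig = (uint32_t)sector[MBR_DISK_SIG]
         | (uint32_t)sector[MBR_DISK_SIG + 1] << 8
         | (uint32_t)sector[MBR_DISK_SIG + 2] << 16
         | (uint32_t)sector[MBR_DISK_SIG + 3] << 24;
    return 0;
}
```

If Windows 2000 rewrites one of two identical signatures, the
value returned for that disk should differ before and after.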




Re: [PATCH] 2.4.5pre3 warning fixes

2001-05-17 Thread Albert D. Cahalan

Bingner Sam J. Con writes:

> Looks to me like it's adding { and } on each side of the
> "c->devices->prev=d;" statement... so changing from:
> 
> if (c->devices != NULL)
>   c->devices->prev=d;
> 
> to 
> 
> if (c->devices != NULL){
>   c->devices->prev=d;
> }
> 
> I assume the new compiler likes the if to have explicit
> brackets instead of using the next statement...

Maybe one of these will make it happy:

(void)(c->devices && (c->devices->prev=d));

!c->devices ?: (c->devices->prev=d);
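
For comparison, here is a minimal sketch of the plain "if" next to the
short-circuit form. The struct names are hypothetical stand-ins, not
the real driver types. The only point is that "&&" skips the
assignment whenever c->devices is NULL, just as the "if" does.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the driver's structures. */
struct dev  { struct dev *prev; };
struct ctrl { struct dev *devices; };

/* The form from the patch: explicit test, explicit braces. */
static void link_if(struct ctrl *c, struct dev *d)
{
    if (c->devices != NULL) {
        c->devices->prev = d;
    }
}

/* Short-circuit form: the right operand only runs when the left is non-NULL.
 * The (void) cast tells the compiler the result is deliberately unused. */
static void link_and(struct ctrl *c, struct dev *d)
{
    (void)(c->devices && (c->devices->prev = d));
}
```

Both functions leave an empty list untouched and set prev on the head
of a non-empty one.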



Re: LANANA: To Pending Device Number Registrants

2001-05-16 Thread Albert D. Cahalan

Heinz J. Mauelshag writes:

> LVM does a similar thing storing UUIDs in its private metadata
> area on every device used by it.
>
> Problem is: neither MD nor LVM define a standard in Linux
> which *needs* to be used on every device!
>
> It is just up to the user to configure devices with them or not.
>
> BTW: in case we had a Linux standard it wouldn't solve the
> "different OS" situation mentioned in this thread either.
>
>
> Generally speaking:
> 
> It is not the problem to reserve some space to store a uuid or
> something at such and such location on a device.
>
> The problem is the lack of a standard which eventually
> could be implemented in all OSes at some point in time.

The PC partition table has such an ID. The LILO change log
mentions it. I think it's 6 random bytes, with some restriction
about being non-zero.




Re: ((struct pci_dev*)dev)->resource[...].start

2001-05-16 Thread Albert D. Cahalan

Jeff Garzik writes:
> "Khachaturov, Vassilii" wrote:

>> Can someone please confirm if my assumptions below are correct:
>> 1) Unless someone specifically tampered with my driver's device
>> since the OS bootup, the mapping of the PCI base address registers
>> to virtual memory will remain the same (just as seen in /proc/pci,
>> and as reflected in )? If not, is there a way to freeze it
>> for the time I want to access it?
>
> This is not a safe assumption, because the OS may reprogram the
> PCI BARs at certain times.  The rule is:  ALWAYS read from
> dev->resource[] unless you are a bus driver (PCI bridges, for
> example, need to assign resources).

Well, I have a bus driver. Just how do I get a bus number?
My hardware comes up as a regular device, then mutates into
a bridge when I flip a bit in a config register. The header
even changes from type 1 to type 2. The class code is always
the same, a bridge device, but not PCI-to-PCI. It's kind of
like hot-plug PCI over a network, with all sorts of extra
alignment restrictions on address space allocation.

So maybe this card is on bus 42. I need a secondary bus number,
plus a few more in case there are more bridges downstream.
I can't just grab 42..44 because they might be used elsewhere,
and I can't just grab 253..255 either because that upsets the
whole system of bus number assignment being done by carving up
the space granted to upstream bridges.

BTW, is there any reason why the primary bus register of a
bridge would have to be set correctly? I have to set mine equal
to the secondary bus register to keep the hardware happy.

> Further, access to PCI BARs -and- dev->resource[] in a driver is
> wrong until you have called pci_enable_device.  Resource and IRQ
> assignment potentially occurs at pci_enable_device time, so BAR
> is [potentially] undefined before then.

Hmmm. I can use device-specific config space registers to change
the size of a BAR. (or limit & base, whatever) Say I want to have
512 MB, but the bridge upstream only has 128 MB allotted to it.
How do I fix this?





Re: Getting FS access events

2001-05-15 Thread Albert D. Cahalan

H. Peter Anvin writes:

> This would leave no way (without introducing new interfaces) to write,
> for example, the boot block on an ext2 filesystem.  Note that the
> bootblock (defined as the first 1024 bytes) is not actually used by
> the filesystem, although depending on the block size it may share a
> block with the superblock (if blocksize > 1024).

The lack of coherency would screw this up anyway, wouldn't it?
You have a block device, soon to be in the page cache, and
a superblock, also soon to be in the page cache. LILO writes to
the block device, while the ext2 driver updates the superblock.
Whatever gets written out last wins, and the other is lost.




Re: How VFS interacts with a file driver

2001-05-14 Thread Albert D. Cahalan

Daniel Phillips writes:
> On Monday 14 May 2001 07:29, Blesson Paul wrote:

>>I am trying to implement a distributed file system.

Me too!  :-)

>> For that I write a file driver. I want to know the following things
>>
>> 1 . If I am writing a new file system, is it necessary to modify the
>> existing structs including inode struct.

Nope. There is a generic pointer you can use. You just need to
figure out when to free it, assuming you don't want to leak
lots of memory. Student projects can leak -- lucky you!

>> 2 . If it is not needed, will a simple registration of the file
>> system is needed to mount the file system
>> More over I am new to this area. I am doing as my
>> graduate project. I need someones help to crack the working of VFS
>> Thanks in advance
>
> 1. In .config, change CONFIG_EXT2_FS to 'm'
> 2. change "ext2" to "newfs" at DECLARE_FSTYPE_DEV in super.c
> 3. make modules SUBDIRS=fs/ext2
> 4. insmod fs/ext2/ext2.o
> 
> Poof!  New filesystem.  (cat /proc/filesystems) Don't forget to change 
> ext2 in .config back to "y" before you build your next kernel.  You'll 
> need to study the kernel *hard* before you can expect to have half a 
> chance of having your filesystem work properly.

Gotta love d_delete the function and d_delete the function pointer
in a struct. Discovery was cause for inventing new curses.

Along the way I stumbled across a "retval" that is only set once
and an "if(de &&" or "if(!de ||" (I forget which) that is redundant.
Maybe in the proc or tmpfs code, just in case someone cares enough.



Re: Inodes

2001-05-14 Thread Albert D. Cahalan

Blesson Paul writes:

> This is another doubt related to VFS. I want to know
> whether all files are assigned their inode numbers at
> mount time, or whether inodes are assigned to files only
> when they are accessed.

That would depend on what type of filesystem you use.
For ext2, inode numbers are assigned at file creation.
For vfat, inode numbers are assigned as needed, and
forgotten when not needed.
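
From user space, the number the filesystem hands out is visible as
st_ino. A minimal sketch using stat(); the helper name inode_of is
made up for illustration:

```c
#include <sys/stat.h>

/* Returns 0 and stores the inode number of path, or -1 if stat() fails. */
static int inode_of(const char *path, unsigned long long *ino)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    *ino = (unsigned long long)st.st_ino;
    return 0;
}
```

On ext2 the value for a given file stays the same across mounts; on
vfat one should not count on that, for the reason given above.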



Re: [reiserfs-dev] Re: reiserfs, xfs, ext2, ext3

2001-05-11 Thread Albert D. Cahalan

Hans Reiser writes:

> Tell us what to code for, and so long as it doesn't involve looking
> up files by their 32 bit inode numbers we'll probably be happy to
> code to it.  The Neil Brown stuff is already coded for though.

Next time around, when you update the on-disk format, how about
allowing for such a thing?

You could have a tree that maps from inode number to whatever
you need to find a file. This shouldn't affect much more than
file creation and deletion. Maybe it will allow for a more
robust fsck as well, helping to justify the cost.

It would be really nice to be able to find all filenames that
refer to a given inode number.



Re: Patch to make ymfpci legacy address 16 bits

2001-05-09 Thread Albert D. Cahalan

Jeff Garzik writes:
> Pavel Roskin wrote:

>> You may need to save some data in memory when the system goes
>> to suspend and restore them afterwards. I believe that the PCI
>> config space should be saved by BIOS. Everything else is the
>> responsibility of the driver.
>
> In ACPI land the kernel should save and restore the PCI device
> config space and the PCI bus config space.  Something similar is
> probably necessary under APM.

When you write "the kernel", do you mean the driver or generic
code? I hope you mean the driver, because I have this:

1. the device looks normal at power on
2. the driver pokes a device-specific config register
3. the config space header changes from type 0 to type 1

(The class code does NOT indicate PCI-to-PCI bridge.
You could say this is like CardBus but much weirder)

If the kernel saves type 1 header data, cuts power using
motherboard features, restores power, and then tries to
restore type 1 header data into a type 0 header... the
system will be well and truly screwed IMHO.




Re: pci_pool_free from IRQ

2001-05-08 Thread Albert D. Cahalan

Pete Zaitcev writes:

> Russel King complained that you might be calling pci_consistent_free
> from an interrupt, which is unsafe on ARM.

This sure makes life difficult. Device removal events can be called
from interrupt context according to Documentation/pci.txt. This is
certainly a place where one might want to call pci_consistent_free.



Re: OT: ps source?

2001-05-08 Thread Albert D. Cahalan

Pierre Rousselet writes:
> James Bourne wrote:

>> From the procps man page:
>>Albert Cahalan <[EMAIL PROTECTED]> rewrote ps  for  full
>>Unix98  and  BSD  support,  along with some ugly hacks for
>>obsolete and foreign syntax.
>> 
>>Michael K. Johnson <[EMAIL PROTECTED]>  is  the  current
>>maintainer.

There has been a bit of a fork actually... sorry.

> Right. For international support procps-2.0.7 is the one to choose with
> the patch procps-2.0.7-intl.patch.

That one is quite buggy. The parser is broken ("ps -o %p" fails),
you can get a core dump if you get unlucky with the System.map file,
the BSD-style process selection is incorrect... I've fixed about 100
bugs and introduced only a few.

What you really ought to use is the Debian package. That gives you
my source plus a few fixes that I don't have yet. Head over to
www.debian.org and drill down to the "unstable" package. There you
will find a source tarball and a patch file for it.




Re: curedump configuration additions

2001-05-05 Thread Albert D. Cahalan

=?iso-8859-1?Q?Jak writes:

> Hi, just wanted to recommend that this goes in, in one form or
> another  -  it would help a lot around here.

Yes, it looks very nice. The codes match those used by ps even.

> Today we have to manually "fix" the kernel
> source to get proper core.[executable] naming of core dumps.

That is just wrong. What about when "tar" dumps core?
It will overwrite my /usr/src/linux/net/core backup.



Re: [PATCH] SMP race in ext2 - metadata corruption.

2001-05-05 Thread Albert D. Cahalan

Alexander Viro writes:
>> On Fri, 4 May 2001, Alexander Viro wrote:

>>> Ehh... There _is_ a way to deal with that, but it's deeply Albertesque:
>  ^^^

Ah, you learn from the master.

> ObProcfs: I don't think that walking the page tables is a good way to
> compute RSS, especially since VM maintains the thing. Mind if I rip

Handling of mapped device memory should not change. For example
there is the X server with mapped video memory. There is another
RSS value provided elsewhere in case one does not want to include
mapped device memory.

Currently top uses the statm file in the following manner:

  case P_SIZE:
    sprintf(tmp, "%5.5s ", scale_k((task->size << CL_pg_shift), 5, 1));
    break;
  case P_TRS:
    sprintf(tmp, "%4.4s ", scale_k((task->trs << CL_pg_shift), 4, 1));
    break;
  case P_SWAP:
    sprintf(tmp, "%4.4s ",
            scale_k(((task->size - task->resident) << CL_pg_shift), 4, 1));
    break;
  case P_SHARE:
    sprintf(tmp, "%5.5s ", scale_k((task->share << CL_pg_shift), 5, 1));
    break;
  case P_DT:
    sprintf(tmp, "%3.3s ", scale_k(task->dt, 3, 0));
    break;
  case P_RSS:   /* rss, not resident (which includes IO memory) */
    sprintf(tmp, "%4.4s ",
            scale_k((task->rss << CL_pg_shift), 4, 1));
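
For anyone who wants to poke at those numbers outside of top, here is
a rough sketch of reading one statm line and doing the same page-to-kB
conversion. It is not the procps code. The field order (size, resident,
share, trs, lrs, drs, dt) and the meaning of the shift (log2 of the
page size in kB, so 2 for 4 kB pages) are my assumptions, and the
helper names are invented.

```c
#include <stdio.h>

/* One statm line; every value is a page count. Field order is assumed. */
struct statm {
    long size, resident, share, trs, lrs, drs, dt;
};

/* Parses a statm line; returns 0 on success, -1 on a short or bad line. */
static int parse_statm(const char *line, struct statm *m)
{
    int n = sscanf(line, "%ld %ld %ld %ld %ld %ld %ld",
                   &m->size, &m->resident, &m->share,
                   &m->trs, &m->lrs, &m->drs, &m->dt);
    return n == 7 ? 0 : -1;
}

/* Pages to kB. pg_shift is log2(page size in kB), e.g. 2 for 4 kB pages.
 * Multiplying avoids shifting a possibly negative value. */
static long pages_to_kb(long pages, int pg_shift)
{
    return pages * (1L << pg_shift);
}

/* The quantity the P_SWAP case prints: mapped but not resident. */
static long swap_kb(const struct statm *m, int pg_shift)
{
    return pages_to_kb(m->size - m->resident, pg_shift);
}
```

With a line of "1000 400 300 50 0 600 0" and 4 kB pages, the size
comes out as 4000 kB and the "swap" column as 2400 kB.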


> it out? In effect, implementation of /prc//statm
>   * produces extremely bogus values (VMA is from library if it goes
> beyond 0x6000? Might be even true 7 years ago...) and nobody
> had cared about them for 6-7 years

One could count pages that are mapped executable and do not come
from the main executable... but this is pretty worthless and does
not consider non-executable library sections.

The latest "top" does not bother to display this value.

>   * makes stuff like top(1) _walk_ _whole_ _page_ _tables_ _of_ _all_
> _processes_ each 5 seconds. No wonder it's slow like hell and eats
> tons of CPU time.

On my system, "statm" takes 50% longer than "stat" or "status".
Maybe there is a significant difference with Oracle on a 32 GB box?

I'd rather top didn't have to read the file at all.



Re: Possible README patch

2001-05-05 Thread Albert D. Cahalan

Duncan Gauld writes:

> Information in the README file says that when patching, the -p0 option is 
> used with patch (eg tar xvzf .tar.gz | patch -p0). However I have 
> never got this to work as I always get something like "can't find file to 
> patch at line 5". However, replacing -p0 with -p1 seems to work perfectly.
> Maybe the penguin doesn't like me, but still, whenever I've downloaded 
> patches I had to say -p1, not -p0...
...
> --- README Sat May  5 09:51:36 2001
> +++ README Sat May  5 09:52:24 2001
> @@ -66,10 +66,10 @@
> install by patching, get all the newer patch files, enter the
> directory in which you unpacked the kernel source and execute:

This is ambiguous:
"the directory in which you unpacked the kernel source"

If I do "cd /usr/src" then "tar Ixf linux-2.4.4.tar.bz2",
then where did I unpack the kernel source? I think you could
argue for /usr/src or /usr/src/linux equally well.




Re: iso9660 endianness cleanup patch

2001-05-03 Thread Albert D. Cahalan

Pavel Machek writes:

> It should not break anything. gcc decides it's bad to inline it, so it
> does not inline it. Small code growth at worst. Compiler has right to
> make your code bigger or slower, if it decides to do so.

Oh come on. The logical way:

inline  Compiler must inline (only!) or report an error.
extern inline   This is a contradiction. Report an error.
static inline   This is a contradiction. Report an error.

Anything else is obvious crap. It isn't OK for the compiler
to ever ignore me; inline recursive functions are just wrong.
Taking the address of an inline function is just wrong too.

Of course the above is not what we are given. We get crap.
The old gcc behavior was crap, and I guess the C99 behavior
is too. So the only sane thing is a #define that is set to
whatever makes the compiler behave as nicely as possible.
Then we use _INLINE everywhere, and get decent behavior out
of both old and new compilers.






Re: Unknown HZ value! (2000) Assume 1024.

2001-05-03 Thread Albert D. Cahalan

Tom Holroyd writes:
> On Wed, 2 May 2001, Albert D. Cahalan wrote:

>> For 32-bit systems, we use 32-bit values to reduce overhead.
>> This causes problems at 495/smp_num_cpus days of uptime.
>
> You mean for HZ == 100.

Well, OK. No unmodified 32-bit system runs HZ == 1024.

> And I guess the overhead in question is the cost
> of a 64 bit add vs. a 32 bit add HZ times per second?  On a 64 bit
> machine, that overhead is likely to be exactly zero.  It is zero on my
> machine.  For integer math on an Alpha, changing the ints to longs can
> even make a program run faster.

Yes.

>> Proposed hack: set a very-long-duration timer (several days)
>> to check for the high bit changing. Count bit flips.
>
> What about the interval between when it flips and when you notice it?

Not a problem. Note that I count bit flips, not roll overs.
Here are the two variables, with "flips" lagging a bit:

flips  jiffies
0  0x7f26
0  0x8003   (not noticed yet)
1  0x8000b01a
1  0xffe7
1  0x0666   (not noticed yet)
2  0xee15

Calculate 64-bit (well, 63-bit) jiffies as:

long long total;
unsigned f = flips;
unsigned j = jiffies;
f += (f ^ (j>>31)) & 1;
total = ((long long)f<<31) | j;

Now print the total.
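
Here is the same calculation as a small self-contained sketch with
fixed-width types, so it can be tried in user space. The function name
is made up; the arithmetic is the three lines above.

```c
#include <stdint.h>

/* Combine the lagging flip counter with the live 32-bit tick counter.
 * If the low bit of flips disagrees with bit 31 of jiffies, a flip has
 * happened that the slow timer has not counted yet, so add it here. */
static uint64_t jiffies_total(uint32_t flips, uint32_t jiffies)
{
    uint32_t f = flips;

    f += (f ^ (jiffies >> 31)) & 1;
    /* Each flip is worth 2^31 ticks. When f is odd, bit 31 of the
     * shifted value coincides with bit 31 of jiffies, so OR is safe. */
    return ((uint64_t)f << 31) | jiffies;
}
```

Some sample inputs (mine, picked to hit each case): flips 0 with
jiffies 0x80000003 gives 0x80000003, flips 1 with jiffies 0x00000666
gives 0x100000666, and flips 2 with jiffies 0x0000ee15 gives
0x10000ee15. The lag never shows up in the result.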

Well, there it is. Like it? The /proc reader does 64-bit operations
and a timer goes off every few days, saving the clock tick from
doing any 64-bit operations. The fast path stays fast, while procps
can get useful data even after years of uptime.





Re: Unknown HZ value! (2000) Assume 1024.

2001-05-01 Thread Albert D. Cahalan

> /proc/uptime:
> 4400586.27 150439.36
> 
> /proc/stat:
> cpu  371049158 3972370867 8752820 4448994822
>  (user,nice,  system, idle)
> 
> In .../fs/proc/proc_misc.c:kstat_read_proc(), the cpu line is being
> computed by:
> 
> len = sprintf(page, "cpu  %u %u %u %lu\n", user, nice, system,
>   jif * smp_num_cpus - (user + nice + system));

This is pretty bogus. The idle time can run _backwards_ on an SMP
system. What is "top" supposed to do with that, print a negative
number for %idle time? (some versions do, while others truncate
at zero or wrap around to 4 billion -- pick your poison)

> The user, nice, and system values add up to 4352172845 > 2^32, and jif is
> 4400586.27 * 1024 = 4506200340, leading to the incorrect idle time (1
> cpu).  It should be calculated this way:
>
>len = sprintf(page, "cpu  %u %u %u %lu\n", user, nice, system,
> jif * smp_num_cpus - ((unsigned long)user + nice + system));
>
> or just declare those as unsigned longs instead of ints.  I notice also
> that since kstat.per_cpu_nice is an int, it's going to overflow in another
> 3.6 days anyhow.  I'll let you know what blows up then.  Any chance of
> making those guys longs?

That is good for the Alpha.

For 32-bit systems, we use 32-bit values to reduce overhead.
This causes problems at 495/smp_num_cpus days of uptime.

Proposed hack: set a very-long-duration timer (several days)
to check for the high bit changing. Count bit flips.
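
To make the quoted overflow concrete, here is a small sketch that
models the Alpha case: a 64-bit jif and 32-bit user/nice/system
counters. The function names are invented; the numbers in the note
below are the ones from the report.

```c
#include <stdint.h>

/* What the current code effectively does: the sum of the three 32-bit
 * counters wraps modulo 2^32 before it is subtracted from the 64-bit jif. */
static uint64_t idle_wrapped(uint64_t jif, uint32_t ncpus,
                             uint32_t user, uint32_t nice, uint32_t sys)
{
    uint32_t busy = user + nice + sys;      /* wraps */

    return jif * ncpus - busy;
}

/* The suggested fix: widen before adding, so the sum cannot wrap. */
static uint64_t idle_widened(uint64_t jif, uint32_t ncpus,
                             uint32_t user, uint32_t nice, uint32_t sys)
{
    uint64_t busy = (uint64_t)user + nice + sys;

    return jif * ncpus - busy;
}
```

With jif = 4506200340, one CPU, and the user/nice/system values
371049158, 3972370867, 8752820, the wrapped version yields 4448994791
and the widened version 154027495. The first is within a few dozen
ticks of the bogus idle value in /proc/stat, presumably because jif
here is only estimated from /proc/uptime.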



Re: iso9660 endianness cleanup patch

2001-04-30 Thread Albert D. Cahalan

Linus Torvalds writes:

> Btw, please use "static inline" instead of "extern inline", as gcc may
> decide not to inline the latter at all, leading to confusing link-time
> errors. (Gcc may also decide not to inline "static inline", but then gcc
> will output the actual body of the function out-of-line if it gets used,
> so you don't get the link-time failure).
> 
> Right now only certain broken versions of gcc will actually show this
> behaviour, I think, but it's at least in theory going to be an issue.

Since the best choice depends on compiler version:

#if(GCC_VERSION_FOO)
#define __inline extern inline
#else
#define __inline static inline
#endif

(that, or _INLINE if you prefer)
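
A sketch of how that would look in use. OLD_GCC_WANTS_EXTERN_INLINE is
a made-up switch standing in for the unspecified GCC_VERSION_FOO test,
and DEMO_INLINE and demo_swab32 are illustrative names only.

```c
#include <stdint.h>

/* Pick the spelling once, in one header, based on the compiler. */
#ifdef OLD_GCC_WANTS_EXTERN_INLINE      /* hypothetical switch */
#define DEMO_INLINE extern inline
#else
#define DEMO_INLINE static inline
#endif

/* Byte-swap a 32-bit value, the sort of small helper an endianness
 * cleanup wants inlined everywhere. */
DEMO_INLINE uint32_t demo_swab32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
           ((x << 8) & 0x00ff0000u) | (x << 24);
}
```

Every helper then says DEMO_INLINE and nothing else, so a compiler
change means editing one #if instead of every header.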
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



best zero-copy example?

2001-04-29 Thread Albert D. Cahalan


What would be the cleanest driver that does everything right?



Re: 2.4 and 2GB swap partition limit

2001-04-28 Thread Albert D. Cahalan

Rogier Wolff writes:
> Wakko Warner wrote:

>>> So you've spent almost $200 for RAM, and refuse to spend
>>> $4 for 1Gb of swap space. Fine with me. 

So that is a factor of 50 in price. It's what, a factor of 100
in access time?

> That disk space is just sitting there. Never to be used. I spent $400
> on the RAM, and I'm now reserving about $8 worth of disk space for
> swap. I think that the $8 is well worth it. It keeps my machine
> functional a while longer should something go haywire... As I said:
> If you don't want to see it that way: Fine with me. 

It is a disaster waiting to happen. Instead of having the offending
process get killed, your machine could suffer extreme thrashing.

Have enough swap for idle processes and no more.



Re: [PATCH] SMP race in ext2 - metadata corruption.

2001-04-27 Thread Albert D. Cahalan

Linus Torvalds writes:

> The buffer cache is "virtual" in the sense that /dev/hda is a
> completely separate name-space from /dev/hda1, even if there
> is some physical overlap.

So the aliasing problems and elevator algorithm confusion remain?
Is this ever likely to change, and what is with the 1 kB assumptions?
(Hmmm, cruft left over from the 1 kB Minix filesystem blocks?)



Re: [PATCH] Single user linux

2001-04-26 Thread Albert D. Cahalan

[EMAIL PROTECTED] writes:

> i wrote somewhere that it was my mistake to call it single-user when i
> mean all user has the same root cap, and reduce "user" (account) to
> "profile".

Seen this way it makes a tad more sense:

1. you and your spouse share the computer
2. you have different shells, mail folders, etc.
3. both of you are too lazy to use su or sudo

It isn't really bright having UID 0 have properties that can't
sanely be granted to other UIDs. Sure, we have the capability
bits, but just try using them. On the "would be nice" list goes
the ability to grant capabilities to a user, and the Novell-like
ability to grant one user complete access to the files of
another user without mucking with the permission bits on disk.




Re: [PATCH] Single user linux

2001-04-25 Thread Albert D. Cahalan

[EMAIL PROTECTED] writes:

> i didn't change all uid/gid to 0!
> 
> why? so with that radical patch, users will still have
> uid/gid so programs know the user's profile.

So you:

1. broke security (OK, fine...)
2. didn't remove all the support for security

It would be far more interesting to rip out all trace of security.
That would include the kernel memory access checking, parts of the
task struct, filesystem and VFS code, and surely much more.

Then you can try to show a measurable performance difference.



Re: hundreds of mount --bind mountpoints?

2001-04-23 Thread Albert D. Cahalan

Richard Gooch writes:

> We want to take out that union because it sucks for virtual
> filesystems. Besides, it's ugly.

I hope you won't mind if people trash this with benchmarks.




Re: Problem with "su -" and kernels 2.4.3-ac11 and higher

2001-04-22 Thread Albert D. Cahalan

Wayne writes:
> In mailing-lists.linux-kernel, Manuel A. McLure wrote:

>> Did you try nesting more than one "su -"? The first one after a boot
>> works for me - every other one fails.
>
> Same here: the first "su -" works OK, but a second nested one hangs:
>
>  8825 pts/2S  0:00 /bin/su -
>  8826 pts/2S  0:00 -bash
>  8854 pts/2T  0:00 stty erase ?
>  8855 pts/0R  0:00 ps ax

Try this:

ps -t pts/2 -o pid,ppid,pgid,sess,f,stat,ruid,euid,fname,nwchan,wchan
ps -t pts/2 s

(replace "pts/2" as needed to select the right tty, and split that
first one into two commands if it is too long)



Re: Request for comment -- a better attribution system

2001-04-21 Thread Albert D. Cahalan

Eric S. Raymond writes:

> This is a proposal for an attribution metadata system in the Linux
> kernel sources.  The goal of the system is to make it easy for
> people reading any given piece of code to identify the responsible
> maintainer.  The motivation for this proposal is that the present
> system, a single top-level MAINTAINERS file, doesn't seem to be
> scaling well.

It is nice to have a single file for grep. With the proposed
changes one would sometimes need to grep every file.




Re: OK, let's try cleaning up another nit. Is anyone paying attention?

2001-04-19 Thread Albert D. Cahalan

Matthew Wilcox writes:
> On Thu, Apr 19, 2001 at 10:07:22PM -0600, james rich wrote:

>> Doesn't this seem a little like the problems occurring with lvm right now?
>> A separate tree maintained with the maintainers not wanting others
>> submitting patches that conflict with their particular tree?  It seems
>> that any project should be able to submit any patch against The One True
>> Tree: Linus' tree.
>
> every single architecture has their own development tree.

This sucks for users of that architecture. Also, though not
applicable to PA-RISC, it sucks for sub-architecture porters.
(by sub-architecture I mean: Mac, PReP, PowerCore, BeBox, etc.)

It's hard enough deciding between Linus and Alan. I'm not at all
happy trying to pick through obscure CVS and BitKeeper trees that
might not be up-to-date with the latest mainstream bug fixes.

> the pa project
> has not been running as long as the other ports, and has a large amount of
> development going on.  i count 28 commits for april (so far), 75 commits
> for march, 187 for february and 112 for january (to the kernel tree, other
> parts of the port also have commit messages).  linus would go insane if
> we sent him every single one of those patches individually.  and we'd
> go insane trying to keep up with what he'd taken and what he'd dropped.
> 
> until you've actually tried doing this, please don't attempt to criticise.

Have _you_ tried? If I recall correctly, Linus spoke out against the
PowerPC people doing the exact same thing. So unless you get told to
quit annoying him with patches, sending them is the safe bet.

Well here we go. It's about IrDA though, not PowerPC. Read it!
http://lwn.net/2000/1109/a/lt-IrDA.php3




Re: Kernel 2.5 Workshop RealVideo streams -- next time, please get better audio.

2001-04-17 Thread Albert D. Cahalan

Theodore Tso writes:
> On Mon, Apr 16, 2001 at 05:53:19PM -0700, David S. Miller wrote:

>> It does not work in a relaxed "people sit at tables and comment
>> at arbitrary points in time during a talk" setting such as the
>> kernel summit.  Besides putting a microphone at every table (which
>> isn't all that practical honestly) I can't come up with a solution.
>
> I suspect that if we're going to do this again, having a microphone at
> each table is what we'd have to do, assuming that we can keep the
> numbers of people at the workshop down to 60-70 (which will be a *lot*
> harder next time, since everyone and his brother will want to show up,
> and will therefore pester, whine, and otherwise beg the workshop
> organizers to be included onto the invite list).

Nah, my brother does Java.

Being an outsider, I'm still trying to find out WTF happened
on Friday evening when NUMA was discussed. I can't find any
video, audio, or even technical notes. This sucks; I'm writing
support for NUMA hardware (it's not cache coherent) right now
and I don't have any idea where things will be going.

> If we have a lot more people, we'll probably have to go to the two
> microphones in the aisle approach.  But at that point a large part of
> the workshop will be destroyed; so hopefully we'll just be able to
> keep the numbers of people in the workshop to manageable number.

You can have 90% of the people invited to one day only.



Re: [PATCH] Process pinning

2001-04-17 Thread Albert D. Cahalan

Nick Pollitt writes:

> Changes to array.c expose cpus_allowed in /proc/pid/stat.
...
> -%lu %lu %lu %lu %lu %lu %lu %lu %d %d\n",
> +%lu %lu %lu %lu %lu %lu %lu %lu %d %d %lu\n",
...
> - task->processor);
> + task->processor,
> + task->cpus_allowed);

This isn't good. While it might be reasonable to have
an 8*sizeof(long) processor limit in the kernel, it is
not OK to expose this in the API. The API's limit should
be insanely high, like 256 or more.
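
One way to keep the word size out of the interface is to print the mask
as fixed-width hex groups. This is only a sketch: format_cpu_mask,
MAX_CPUS and the comma-separated layout are made up for illustration
and are not what the patch or the kernel does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical API limit, deliberately far above 8*sizeof(long). */
#define MAX_CPUS   256
#define MASK_WORDS (MAX_CPUS / 32)

/*
 * Format a CPU mask as MASK_WORDS comma-separated 32-bit hex groups,
 * most significant group first; words[0] holds CPUs 0..31.
 * The text is the same on 32-bit and 64-bit machines.
 * Returns the number of characters written, or -1 if buf is too small.
 */
static int format_cpu_mask(const uint32_t words[MASK_WORDS],
                           char *buf, size_t len)
{
    size_t used = 0;

    assert(words != NULL && buf != NULL);
    for (int i = MASK_WORDS - 1; i >= 0; i--) {
        int n = snprintf(buf + used, len - used, "%08x%s",
                         (unsigned int)words[i], i ? "," : "");
        if (n < 0 || (size_t)n >= len - used)
            return -1;
        used += (size_t)n;
    }
    return (int)used;
}
```

A reader of such a field parses groups until the line ends, so raising
MAX_CPUS later changes the length of the text but not its format.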



Re: RFC: pageable kernel-segments

2001-04-17 Thread Albert D. Cahalan

H. Peter Anvin writes:
> By author:"Heusden, Folkert van" <[EMAIL PROTECTED]>

>> Would anyone be interested (besides me) in a kernel which can page
...
>> Certain parts of drivers could get the __pageable prefix or so

> VMS does this.  It at least used to have a great tendency to crash
> itself, because it swapped out something that was called from a driver
> that was called by the swapper -- resulting in deadlock.  You need
> iron discipline for this to work right in all circumstances.
> 
> Second, it makes it quite hard to know what operations can cause a
> task to sleep, since any reference to paged-out memory can require a
> page-in and the associated schedule.  You almost need pointer
> annotation in order for this to be safe.

It wouldn't be nearly so dangerous to page from compressed
data in memory. The memory could be ROM.




Re: Kernel 2.5 Workshop RealVideo streams -- next time, please get

2001-04-16 Thread Albert D. Cahalan

Miles Lane writes:
>> Randolph Bentson wrote:

>>> I've heard of conferences where a wireless audience
>>> microphone was put inside a Nerf ball.  It could
>>> then be tossed to the audience member who wished
>>> to speak.
>
> Seriously though, this would probably still be an
> impediment to the sort of stream-of-consciousness
> dialog that we'd like to have.  Sometimes, there
> is a quick series of one or two sentence comments
> from several participants.  With a "mike-in-a-ball"
> your discussion might turn into a sports event.

No, you just need a half dozen microphones. They get
tossed back to assistants on a least-recently-used basis.

> Plus, personally, I am a crappy ball thrower.
> If many of you have my level of athletic prowess,
> there'd be a lot of time spent scrambling under
> tables and chairs.

This is a reason to have athletic assistants, and another
reason to have a half dozen microphones instead of just one.

Still, the post-conference mixing from dozens of overhead
microphones looks best. It adds cost, setup time, and
post-processing time, but is totally reliable and does not
interfere with the conference at all.

If you wanted to get fancy, multiple overhead microphones
ought to let you cancel any sort of background noise with
a bit of 3d audio processing. Sneezes, coughs, footsteps,
people falling out of their chairs... all processed out.




Re: No 100 HZ timer!

2001-04-16 Thread Albert D. Cahalan

> CLOCK_10MS a wall clock supporting timers with 10 ms resolution (same as
> linux today). 

Except on the Alpha, and on some ARM systems, etc.
The HZ constant varies from 10 to 1200.

> At the same time we will NOT support the following clocks:
> 
> CLOCK_VIRTUAL a clock measuring the elapsed execution time (real or
> wall) of a given task.  
...
> For tick less systems we will need to provide code to collect execution
> times.  For the ticked system the current method of collecting these
> times will be used.  This project will NOT attempt to improve the
> resolution of these timers, however, the high speed, high resolution
> access to the current time will allow others to augment the system in
> this area.
...
> This project will NOT provide higher resolution accounting (i.e. user
> and system execution times).

It is nice to have accurate per-process user/system accounting.
Since you'd be touching the code anyway...

> The POSIX interface provides for "absolute" timers relative to a given
> clock.  When these timers are related to a "wall" clock they will need
> adjusting when the wall clock time is adjusted.  These adjustments are
> done for "leap seconds" and the date command.

This is a BIG can of worms. You have UTC, TAI, GMT, and a loosely
defined POSIX time that is none of the above. This is a horrid mess,
even ignoring gravity and speed. :-)

Can a second be 2 billion nanoseconds?
Can a nanosecond be twice as long as normal?
Can a second appear twice, with the nanoseconds getting reset?
Can a second never appear at all?
Can you compute times more than 6 months into the future?
How far does time deviate from solar time? Is this constrained?

If you deal with leap seconds, you have to have a table of them.
This table grows with time, with adjustments being made with only
about 6 months notice. So the user upgrades after a year or two,
and the installer discovers that the user has been running a
system that is unaware of the most recent leap second. Arrrgh.

Sure you want to touch this? The Austin group argued over it for
a very long time and never did find a really good solution.
Maybe you should just keep the code simple and fast, without any
concern for clock adjustments.
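
To make the table problem concrete, here is a sketch of a leap second
lookup. The struct and function names are invented for illustration and
only the three most recent entries are shown; a real table starts in
1972.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One row per leap second: from `since` onward, TAI-UTC is `offset`. */
struct leap_entry {
    int64_t since;   /* POSIX-style seconds since the epoch */
    int     offset;  /* TAI - UTC, in seconds */
};

static const struct leap_entry leap_table[] = {
    { 820454400, 30 },  /* 1996-01-01 */
    { 867715200, 31 },  /* 1997-07-01 */
    { 915148800, 32 },  /* 1999-01-01 */
};
#define LEAP_ENTRIES (sizeof leap_table / sizeof leap_table[0])

/*
 * Return TAI-UTC for time t, or -1 if t predates the table.
 * Any t past the last row gets the last known offset, which is
 * exactly how a stale table goes wrong without anyone noticing.
 */
static int tai_minus_utc(int64_t t)
{
    int offset = -1;

    for (size_t i = 0; i < LEAP_ENTRIES; i++) {
        /* rows must be sorted by time */
        assert(i == 0 || leap_table[i - 1].since < leap_table[i].since);
        if (t >= leap_table[i].since)
            offset = leap_table[i].offset;
    }
    return offset;
}
```

Nothing in the lookup can tell "no leap second has happened since" from
"this table was never updated".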

> In either a ticked or tick less system, it is expected that resolutions
> higher than 1/HZ will come with some additional overhead.  For this
> reason, the CLOCK resolution will be used to round up times for each
> timer.  When the CLOCK provides 1/HZ (or coarser) resolution, the
> project will attempt to meet or exceed the current systems timer
> performance.

Within the kernel at least, it would be good to let drivers specify
desired resolution. Then a near-by value could be selected, perhaps
with some consideration for event type. (for cache reasons)
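
A rough sketch of the simplest version of that idea: round each expiry
up to a multiple of the resolution the caller asked for, so timers with
the same resolution land on shared instants. round_expiry is a
hypothetical helper, not an existing kernel interface.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Round an expiry time up to the next multiple of the requested
 * resolution. Both values are in nanoseconds.
 */
static uint64_t round_expiry(uint64_t expires_ns, uint64_t resolution_ns)
{
    assert(resolution_ns > 0);
    uint64_t rem = expires_ns % resolution_ns;
    return rem ? expires_ns + (resolution_ns - rem) : expires_ns;
}
```

A driver that only needs 10 ms accuracy would pass 10000000 and never
fire early; one that needs better passes a smaller value and pays for it.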




Re: [PATCH] Unisys pc keyboard new keys patch, kernel 2.4.3

2001-04-16 Thread Albert D. Cahalan

Guest section DW writes:
> On Mon, Apr 16, 2001 at 12:29:11AM -0600, Eric W. Biederman wrote:

>> If we can try to keep keycodes in 8-bits it would be nice.  The difficulty
>> is that X cannot handle more than 8-bits without telling it you have
>> multiple keyboards.  The keycode (at least in X) is exported to
>> X applications.  This is certainly something to coordinate with the
>> XFree folks about.  If you really need more than 8-bits.
>
> X keycodes are unrelated to Linux keycodes.

Yes, but they could be. Changing the Linux keycodes is a major
break with compatibility. If the Linux keycodes are to be changed,
then they ought to become something that would allow XFree86
to become keyboard-independent. Why invent yet another encoding?




Re: Bug in EZ-Drive remapping code (ide.c)

2001-04-16 Thread Albert D. Cahalan

Andries.Brouwer writes:
>From [EMAIL PROTECTED] Mon Apr 16 08:35:09 2001
>>Andries.Brouwer writes:

>>> What one wants is to remap access to sector 0 to sector 1,
>>> and leave all other sectors alone. Thus, if someone asks
>>> for sectors 0 1 2 3 4, she should get sectors 1 1 2 3 4.
>>
>> No, because then you can't write to the real first sector.
>> Assuming translation is good, 1 0 2 3 4 is a better order.
>> Then "dd if=/dev/zero of=/dev/hda bs=1k count=999" will get
>> rid of all this crap. Otherwise, killing it is difficult.
>
> If you use EZdrive and damage its code, then probably you
> cannot boot anymore, or lose access to your data.
> Killing it must be difficult.

The above dd command wipes out the partition table anyway,
with or without EZdrive. I think it also kills the EZdrive code.

EZdrive tends to come installed by default, to support DOS and
similar crufty Microsoft bits. For a pure Linux system it should
be removed.

What you are arguing for is protecting root from himself.
You want to limit the rope, but this is silly as the partitions
themselves are still completely unprotected.

The "1 0 2 3 4" order is nicely 1-to-1, unlike the other orders.
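
The two mappings side by side, as a sketch. The function names are
invented here and this is not the code in ide.c.

```c
#include <assert.h>
#include <stdint.h>

/* The "1 1 2 3 4" order: sector 0 is redirected, sector 1 is shared. */
static uint64_t remap_many_to_one(uint64_t sector)
{
    return sector == 0 ? 1 : sector;
}

/* The "1 0 2 3 4" order: swap sectors 0 and 1, leave the rest alone. */
static uint64_t remap_swap(uint64_t sector)
{
    return sector < 2 ? sector ^ 1 : sector;
}
```

With remap_swap every physical sector stays reachable and applying the
mapping twice gives back the original number. With remap_many_to_one
two requests collide on physical sector 1 and physical sector 0 cannot
be reached at all.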

> EZdrive provides uninstall code itself, but if you really want,
> boot with "hda=noremap", and then your dd command will erase
> both EZdrive and your precious data.

This is a pain. The fdisk program ought to have a "wipe EZdisk"
option. More generally, it ought to let the user wipe everything
of a similar nature, by both brute force and by copying the second
sector over the first.




Re: [PATCH] Unisys pc keyboard new keys patch, kernel 2.4.3

2001-04-15 Thread Albert D. Cahalan

H. Peter Anvin writes:

> This means you don't have to configure two levels (scancodes ->
> keycodes and keycodes -> keymap); since currently the keycodes are
> keyboard-specific anyway there is no benefit to the two levels.

The medium-raw level ought to be what the X11R6 protocol uses.
Then the keyboard-specific stuff can be removed from XFree86,
and there would be one less mapping to configure.



Re: Bug in EZ-Drive remapping code (ide.c)

2001-04-15 Thread Albert D. Cahalan

Andries.Brouwer writes:

> What one wants is to remap access to sector 0 to sector 1,
> and leave all other sectors alone. Thus, if someone asks
> for sectors 0 1 2 3 4, she should get sectors 1 1 2 3 4.

No, because then you can't write to the real first sector.
Assuming translation is good, 1 0 2 3 4 is a better order.
Then "dd if=/dev/zero of=/dev/hda bs=1k count=999" will get
rid of all this crap. Otherwise, killing it is difficult.

> So yes, the problem is known, but I do not see a clean solution,
> unless the solution is to rip out all this EZ drive nonsense.

Linux should still be able to read the partition table.
The translation can go.





Re: CML2 1.1.1, with experimental fast mode

2001-04-15 Thread Albert D. Cahalan

>   * Added fast-mode command to suppress side-effect computation 
> on slow machines.

You could put the computation in a low-priority thread, so that it
still gets done but doesn't mess up responsiveness.



Re: CML2 1.0.0 release announcement

2001-04-11 Thread Albert D. Cahalan

> * All three interfaces do progressive disclosure -- the user only sees
>   questions he/she needs to answer (no more hundreds of greyed-out menu
>   entries for irrelevant drivers!).

Well, that sucks. The greyed-out menu entries were the only good
thing about xconfig. Such entries provide a clue that you need
to enable something else to get the feature you desire. Otherwise
you might figure that the feature is missing, or that you have
overlooked it.



Re: No 100 HZ timer !

2001-04-10 Thread Albert D. Cahalan

Martin Mares writes:
> [lost]

>> Just how would you do kernel/user CPU time accounting then ?
>> It's currently done on every timer tick, and doing it less
>> often would make it useless.
>
> Except for machines with very slow timers we really should account time
> to processes during context switch instead of sampling on timer ticks.
> The current values are in many situations (i.e., lots of processes
> or a process frequently waiting for events bound to timer) a pile
> of random numbers.

Linux should maintain some sort of per-process decaying average.
This data is required for a Unix98-compliant ps program. (for %CPU)
Currently ps is using total CPU usage over the life of the process.
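
A minimal sketch of such a decaying average in fixed point. The names,
the 11-bit scale and the per-interval update are assumptions made for
illustration; nothing here is existing kernel code.

```c
#include <assert.h>
#include <stdint.h>

#define FIXED_SHIFT 11                  /* fixed point: 1.0 == 2048 */
#define FIXED_ONE   (1u << FIXED_SHIFT)

/*
 * Fold one sampling interval into a decaying average.
 *   avg:   previous average, fraction of one CPU (0..FIXED_ONE)
 *   busy:  ticks the process ran during the interval
 *   total: ticks in the interval
 *   decay: weight given to the old average (0..FIXED_ONE)
 */
static uint32_t decay_cpu(uint32_t avg, uint32_t busy, uint32_t total,
                          uint32_t decay)
{
    assert(total > 0 && busy <= total && decay <= FIXED_ONE);
    uint32_t sample = (uint32_t)(((uint64_t)busy << FIXED_SHIFT) / total);
    uint64_t mixed = (uint64_t)avg * decay
                   + (uint64_t)sample * (FIXED_ONE - decay);
    return (uint32_t)(mixed >> FIXED_SHIFT);
}
```

With decay set to 1536 (0.75), a process that was idle and then runs
flat out reads 25% after one interval, and ps could report
avg * 100 / FIXED_ONE instead of lifetime CPU over lifetime wall time.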



Re: [RFC] Ext2 Directory Index - File Structure

2001-04-10 Thread Albert D. Cahalan

Daniel Phillips writes:

> The zeroth block of an indexed directory is the index root.  Initially
> the index has only one block.  The following blocks are normal ext2
> directory entry blocks.  When the directory grows large enough to fill
> all the available entries in the root index block (around 80-90,000
> entries on a 4K blocksize filesystem) a second level is added to the
> index tree in the form of an internal index block appended to the
> directory.  As the directory expands, new index blocks are appended as
> needed so that the directory consists of normal directory blocks with
> index nodes interspersed every 200 blocks or so.

It looks like you end up jumping back and forth to read the
index blocks. (but maybe I need more sleep) It might be better
to allocate 1, 2, 4, 8, ... index blocks at once, instead of
always allocating just one.

> The high four bits of the block pointer field are reserved for use by
> a coalesce-on-delete feature which may be implemented in the future. 
> The remaining 28 bits are capable of indexing up to 1TB per directory
> file.  (Perhaps I should coopt 8 bits instead of 4.)

Doing a 1 TB directory means you must give up on i_size, which is
too small. You may instead calculate what you need from the block count.
If you don't give up on i_size, you might as well coopt 11 bits.

Oh, just grab 12 or 16 bits. It isn't at all OK to make directories
that are pretty much impossible to read on a 32-bit system. Think
about what /bin/ls must do to sort a 1 TB directory.
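
The arithmetic behind these numbers, as a small sketch with
hypothetical helper names. The 28-bit, 4K-block case is from the
proposal; pairing the 11-bit figure with a 2 GB size limit and 1K
blocks is only one reading, assumed here for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Largest directory, in bytes, addressable with `bits` bits of block index. */
static uint64_t max_dir_bytes(unsigned bits, uint32_t block_size)
{
    assert(bits < 32);
    return ((uint64_t)1 << bits) * block_size;
}

/* Bits of block index needed to cover size_limit bytes at this block size. */
static unsigned index_bits_needed(uint64_t size_limit, uint32_t block_size)
{
    unsigned bits = 0;

    assert(block_size > 0);
    while (max_dir_bytes(bits, block_size) < size_limit)
        bits++;
    return bits;
}
```

max_dir_bytes(28, 4096) is 2^40 bytes, the 1 TB in the proposal. A
4 GB limit at 4K blocks needs only 20 bits, leaving 12 of the 32 free;
a 2 GB limit at 1K blocks needs 21, leaving 11.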

> The first kind of forward compatibility is addressed by hiding all the
> new index structures inside what appears to earlier versions of Ext2 to
> be free space.  This is accomplished by placing an empty Ext2 dirent
> structure at the beginning of each index node which marks the entire
> block as empty, from the point of view of non-index-aware versions of
> Ext2.

Well, it looks better than vfat. Next you will be wanting to
increase the inode size and switch to 64-bit block numbers.
You could write such a wonderful NEW filesystem.



Re: [PATCH] Re: softirq buggy

2001-04-09 Thread Albert D. Cahalan

> I'd prefer to inline cpu_is_idle(), but optimizing the idle
> code path is probably not that important ;-)

Sure it is, in one way: how fast can you get back to work?
(not OK to take a millisecond getting out of the idle loop)



Re: bug database braindump from the kernel summit

2001-04-01 Thread Albert D. Cahalan

Gregory Maxwell writes:
> On Sun, Apr 01, 2001 at 03:43:52PM -0400, Albert D. Cahalan wrote:

>> I'm really sick of being buried in useless information. The signal
>> gets lost in the noise. It is easy to discard automatically generated
>> bug reports, and way too annoying to wade through the crud.
>>
>> When network connections hang, the console-tools package version
>> isn't likely to be of any use. When ramfs leaks memory, nobody needs
>> the content of /proc/pci.
>>
>> Sometimes the bits of crud are HUGE. Imagine the hardware info
>> for a 64-way SGI or Sun box with plenty of devices attached.
>
> Disk space is 'free'.

Disk space isn't the issue. Just a few days ago I tried to help
somebody who posted one of the bloated fill-in-the-form bug reports.
I gave him a useless answer, because I didn't see amid all the junk
that he had no problems with a 2.2.xx kernel. The good information
had been buried in fluff.




Re: bug database braindump from the kernel summit

2001-04-01 Thread Albert D. Cahalan

Manfred Spraul writes:
> [Larry McVoy]

>> There was a lot of discussion about possible tools
>> that would dig out the /proc/pci info
>
> I think the tools should not dig too much information out of the system.
> I remember some Microsoft (win98 beta?) bugtracking software that
> insisted on sending a several hundred kB long compressed blob with every
> bug report.
> IMHO it must be possible to file bugreports without the complete hw info
> if I know that the bug isn't hw related.

Yep. The two hardware-related items that usually matter:

Little-endian or broken-endian?
32-bit or 64-bit?

The CPU type is not necessary or sufficient, since one can often
run a 32-bit kernel on 64-bit hardware and at least MIPS has both
little-endian and broken-endian supported.



Re: bug database braindump from the kernel summit

2001-04-01 Thread Albert D. Cahalan

> Problem details
> Bug report quality
>   There was lots of discussion on this.  The main agreement was that we
>   wanted the bug reporting system to dig out as much info as possible
>   and prefill that.  There was a lot of discussion about possible tools
>   that would dig out the /proc/pci info; there was discussion about
>   Andre's tools which can tell you if you can write your disk; someone
>   else had something similar.
> 
>   But the main thing was to extract all the info we could
>   automatically.  One thing was the machine config (hardware and
>   at least kernel version).  The other thing was extract any oops
>   messages and get a stack traceback.

I'm really sick of being buried in useless information. The signal
gets lost in the noise. It is easy to discard automatically generated
bug reports, and way too annoying to wade through the crud.

When network connections hang, the console-tools package version
isn't likely to be of any use. When ramfs leaks memory, nobody needs
the content of /proc/pci.

Sometimes the bits of crud are HUGE. Imagine the hardware info
for a 64-way SGI or Sun box with plenty of devices attached.




Re: [patch] pae-2.4.3-C3

2001-03-28 Thread Albert D. Cahalan

Ingo Molnar writes:

> the attached pae-2.4.3-C3 patch fixes the PAE code to work with SLAB
> FORCED_DEBUG (which enables redzoning) too.
> 
> the problem is that redzoning is enabled unconditionally, and SLAB has no
> information about how crucial alignment is in the case of any particular
> SLAB cache. The CPU generates a general protection fault if in PAE mode a
> non-16-byte aligned pgd is loaded into %cr3.

How about just fixing the debug code to align things? Sure it wastes
a bit of memory, but debug code is like that.

Sane alignment might be: largest power-of-two factor of the size,
or 4 bytes, whichever is larger. (adjust "4" per-arch as needed)
For an SMP config, set a minimum alignment equal to the cache line size.
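
That rule fits in a few lines. A sketch with made-up names; min_align
stands for the "4" above, or for the cache line size on SMP.

```c
#include <assert.h>
#include <stddef.h>

/* Largest power of two that divides size: 24 -> 8, 6 -> 2, 64 -> 64. */
static size_t pow2_factor(size_t size)
{
    assert(size > 0);
    return size & (~size + 1);   /* isolate the lowest set bit */
}

/* Alignment for a debug-mode object: the larger of the two candidates. */
static size_t debug_align(size_t size, size_t min_align)
{
    size_t a = pow2_factor(size);
    return a > min_align ? a : min_align;
}
```

Any object whose size is a power of two comes out naturally aligned,
with no per-cache hint needed.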



Re: Larger dev_t

2001-03-27 Thread Albert D. Cahalan

Andrew Pimlott writes:
> On Tue, Mar 27, 2001 at 02:13:47PM -0800, H. Peter Anvin wrote:

>> The problems with devfs (other than kernel memory bloat, which is pretty
>> much guaranteed to be much worse than the bloat a larger dev_t would
>> entail) is that it needs complex auxiliary mechanisms to make
>> "chmod /dev/foo" work as expected (the change to /dev/foo is to be
>> permanent, without having to edit some silly config file)
>
> The elegant solution seems obvious to me.  What we have today is two
> namespaces--device major/minor, and filesystem--that are bridged by
> special files.  Special files live in the filesystem namespace and
> point into the major/minor namespace.  Objects in the major/minor
> namespace are directly accessible only by root (ie, only root can
> mknod(2)); but when accessed through special files, access control
> comes from the special file.
> 
> The concept that makes this work is that the special file is a
> "pointer with permissions".  To make devfs work, you want the same
> thing--except a pointer into filesystem space, not major/minor
> space.  Unix doesn't have this, but it would be a simple cross of
> symlinks (pointer living in the filesystem and pointing into the
> filesystem) and special files (pointers with permissions).
> 
> To be concrete:  You'd have a root-only (or perhaps the directories
> could be a+rx--but minimal policy) hierarchy under /devices, and the
> admin would populate /dev with "special symlinks" that point into
> /devices, and give the appropriate permissions to users.

This can be done with an lchmod() and support for setuid symlinks.

Read     can see where the link points
Write    ignored, or XOR the on-disk data with 0222 and...?
Execute  can follow the link
Setuid   link followed as for the owner
Setgid   link followed as for the owner's group
Sticky   reserved for future use

Then you get:

lr-sr-xr-x 1 root root 17 Mar 21 2000 /dev/null -> /devices/mem/null
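
For reference, the mode behind that listing is 04555. A sketch of how
the bits map to the string, using the usual ls conventions; the
function name is invented, and the meaning of the bits on a symlink is
the proposal above, not anything current kernels implement.

```c
#include <assert.h>
#include <string.h>

/* Render permission bits ls-style with a leading 'l'. buf holds 11 bytes. */
static void symlink_mode_string(unsigned mode, char buf[11])
{
    static const char rwx[] = "rwxrwxrwx";

    assert(buf != NULL);
    buf[0] = 'l';
    for (int i = 0; i < 9; i++)
        buf[i + 1] = (mode & (0400u >> i)) ? rwx[i] : '-';
    if (mode & 04000)                       /* setuid */
        buf[3] = (mode & 0100) ? 's' : 'S';
    if (mode & 02000)                       /* setgid */
        buf[6] = (mode & 0010) ? 's' : 'S';
    if (mode & 01000)                       /* sticky */
        buf[9] = (mode & 0001) ? 't' : 'T';
    buf[10] = '\0';
}
```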



Re: Larger dev_t

2001-03-27 Thread Albert D. Cahalan

[EMAIL PROTECTED] writes:
> [Linus Torvalds]

>> You've now forced every piece of code that needs a dev_t
>> to carry along the overhead of having a 64-bit field
>
> Let me repeat: there is no such code. In user space dev_t already is
> 64 bits, whether you like it or not. We cannot go back to libc5.
...
> In other words, inside the kernel the normal obvious coding will
> give us ints major, minor. Outside the kernel we have a 64-bit dev_t.
...
> But while dev_t already is 64-bits in user space, the same does not

In your dreams

int c_has_loose_type_checking(char *name){
  struct stat sbuf;
  /* ... */
  return sbuf.st_rdev;
}

Then we have NFSv2, archive file formats, and zillions of
little tools.

I enjoy truncating dev_t to a reasonable size. Sometimes I check
my input arguments for illogically huge values, and other times I
just relish the opportunity to inflict data loss on you personally.
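
What the truncation looks like, as a sketch. wide_dev_t and its layout
(major in the high 32 bits, minor in the low 32) are hypothetical
stand-ins chosen for illustration, not the encoding glibc uses.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a 64-bit dev_t; the layout is an assumption. */
typedef uint64_t wide_dev_t;

static wide_dev_t make_wide_dev(uint32_t major, uint32_t minor)
{
    return ((wide_dev_t)major << 32) | minor;
}

/* What a tool gets when it stores the value in 32 bits. */
static uint32_t narrow_dev(wide_dev_t dev)
{
    return (uint32_t)dev;    /* high half silently dropped */
}

/* Nonzero if the value survives narrowing unchanged. */
static int dev_fits_32(wide_dev_t dev)
{
    return (dev >> 32) == 0;
}
```

With this layout narrow_dev(make_wide_dev(8, 1)) is just 1: the major
number is gone and the compiler never said a word.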




Re: [PATCH] Improved version reporting

2001-03-23 Thread Albert D. Cahalan

Riley Williams writes:
> Hi Albert.

>>>> The rule should be like this:
>>>>
>>>> List the lowest version number required to get
>>>> 2.2.xx-level features while running a 2.4.xx kernel.

>>> Replace that "a 2.2.xx" with "my current" and remove all
>>> restrictions on what the current kernel is, and that becomes
>>> an important question.
>>
>> No, not "my current" but "the previous stable". I say "2.2.xx"
>> because that is the previous stable kernel.
>
> Again, saying either "2.2.xx" or "the previous stable" is meaningless.
> Saying "The 2.2 kernel series" might have some meaning if it was not
> for the fact that the requirements differ for different members of
> that (or any other) kernel series.

Oh please. List whatever the hell is needed for an upgrade from any
of 2.2, 2.2.1, 2.2.2, 2.2.3, 2.2.4, 2.2.5, ..., 2.2.255 of course.
Also include previous 2.4.xx kernels, in case some bastard decided to
break stuff within a stable kernel series... like PPP for example.

> On Saturday night, I had the pleasure of upgrading a system from the
> 2.2.4 kernel to the 2.4.2 one, and another system from the 2.2.14
> kernel to the 2.4.2 one. Although the target kernel version was the
> same, several subsystems needed upgrading on the former that did NOT
> need upgrading on the latter, and that was just to compile the thing!

So what? Your point is??? Obviously one system was partly upgraded.

> According to you, both of these upgrades would have required EXACTLY
> THE SAME upgrades to be made, but that isn't the case.

I never claimed that.

>> If you upgrade from 2.0.xx, you should read the 2.2.xx changes file.
> 
> Fairy Nuff.
> 
> However, your argument still fails, simply because of its reliance on
> the assumption that an entire kernel series has static requirements
> when such just isn't the case.

There is no such assumption.

If 2.2.4 needs foo-1.7 while 2.2.5 and 2.4.4 need foo-1.8, then
foo gets listed. If 2.1.99 needs bar-0.6 while 2.2.0 and 2.4.4
need bar 0.7, then there is no need to list bar. Never mind that
both foo and bar are up to version 666, since that isn't needed.
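
The same rule as a sketch. Versions are encoded as plain integers (17
for 1.7) to keep it short, and must_list is a made-up name.

```c
#include <assert.h>
#include <stddef.h>

/*
 * prev_required[]: the tool version needed by each kernel of the
 *                  previous stable series (and earlier releases of
 *                  the current one).
 * new_required:    the version the new kernel needs.
 * The tool is listed if anyone could be running a working system
 * with something older than the new kernel needs.
 */
static int must_list(const int *prev_required, size_t n, int new_required)
{
    assert(prev_required != NULL && n > 0);
    for (size_t i = 0; i < n; i++) {
        if (prev_required[i] < new_required)
            return 1;
    }
    return 0;
}
```

foo: {17, 18} against 18 gives 1, so it is listed. bar: 2.1.99 is not
in a stable series and is left out, so {7} against 7 gives 0.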

>> The important thing is to avoid version number inflation. I don't
>> even bother reading the changes file, because I know it is bogus.
>> Nearly all of my old software works great with a 2.4.xx kernel.
>
> The fact that you said "Nearly all" shows that your argument is false,
> since your argument has been that NO software needs upgrading.
>
> Can I suggest that you re-read my previous missive, and actually look
> at the points raised. If you do, we might just get a sensible
> discussion on this subject...

Try it yourself, w/o alcohol. I didn't argue "NO software...".

> Incidentally, your argument to date has assumed that everybody always
> installs every single kernel version. In my opinion, that is a very
> dangerous assumption to make.

Nope. Most people go from one stable series to the next, often skipping
the first and last few revisions. (2.2.6, 2.2.9, 2.2.17, 2.4.3, 2.4.8...)

> A more responsible assumption would be
> that the person wishing to upgrade to the version in this particular
> kernel source tree has a random system installed, and wishes to know:

That random system should be capable of working with at least
one kernel from the previous stable series.

>  1. What is the absolute minimum upgrades required to compile the
> kernel in the source tree I have just downloaded?
> 
>  2. What is the absolute minimum upgrades required to install the
> kernel in the source tree I have just downloaded and compiled?
> 
>  3. What is the absolute minimum upgrades required to enable me
> to run the kernel I have just compiled from this source tree,
> assuming that I wish to retain the use of the shell scripts
> that I developed under my previous kernel?
> 
>  4. What other upgrades are recommended for reasons of system
> security or stability?

Good, assuming "reasons of system security or stability" relates
to problems that a kernel upgrade might cause.

>  5. What further upgrades are required to enable me to make use
> of the advertised new facilities in this kernel?

This is noise. Such upgrades are not required.

>  6. What additional subsystems could be upgraded if desired?

This is worse noise: "...and The GIMP is now at version..."

>  7. I note that some upgrades are only required if certain of the
> subsystems are installed. Which upgrades are these, and which
> subsystems are they dependent on?

This is getting too fancy.



Re: [PATCH] Improved version reporting

2001-03-18 Thread Albert D. Cahalan

Riley Williams writes:

>> The rule should be like this:
>>
>>  List the lowest version number required to get
>>  2.2.xx-level features while running a 2.4.xx kernel.
>
> That's a meaningless definition, and can only be taken as such. What
> use would such a list be to somebody wishing (like I recently was) to
> upgrade a system running the 2.0.12 kernel so it runs the 2.4.2
> kernel instead?
...
>> Basically I ask: would existing scripts for a 2.2.xx kernel
>> break? If the old mount can still do what it used to do, then
>> "mount" need not be listed at all.
>
> Replace that "a 2.2.xx" with "my current" and remove all restrictions
> on what the current kernel is, and that becomes an important question.

No, not "my current" but "the previous stable". I say "2.2.xx" because
that is the previous stable kernel.

If you upgrade from 2.0.xx, you should read the 2.2.xx changes file.

The important thing is to avoid version number inflation. I don't
even bother reading the changes file, because I know it is bogus.
Nearly all of my old software works great with a 2.4.xx kernel.



Re: [PATCH] Improved version reporting

2001-03-16 Thread Albert D. Cahalan

Andries.Brouwer writes:
>> From: "Albert D. Cahalan" <[EMAIL PROTECTED]>
>>> On Wed, 14 Mar 2001 [EMAIL PROTECTED] wrote:

>>>>> +o  Console Tools  #   0.3.3  # loadkeys -V
>>>>> +o  Mount          #   2.10e  # mount --version
>>>>
>>>> Concerning mount: (i) the version mentioned is too old,
>>
>> Exactly why? Mere missing features don't make for a required
>> upgrade. Version number inflation should be resisted.
...
> These days you can mount several filesystems at the same mount point.
> The old mount does not understand this at all.
> Recent versions of mount act better in this respect,
> even though it is still easy to confuse them.

The rule should be like this:

   List the lowest version number required to get
   2.2.xx-level features while running a 2.4.xx kernel.

Remember what the purpose of the table is. It is a list of REQUIRED
upgrades. Failure to upgrade should result in a broken system. So pppd
must be listed, since somebody changed the kernel API for 2.4.1.

If I run the mount command from Red Hat 6.2, using it as intended
for a 2.2.xx kernel, doesn't everything work? There won't be any
multi-mount confusion because 2.2.xx can't do that anyway. There
isn't any problem with NFSv3 either, since 2.2.xx lacks NFSv3.

Basically I ask: would existing scripts for a 2.2.xx kernel break?
If the old mount can still do what it used to do, then "mount" need
not be listed at all.




Re: [PATCH] Improved version reporting

2001-03-14 Thread Albert D. Cahalan

Alexander Viro writes:
> On Wed, 14 Mar 2001 [EMAIL PROTECTED] wrote:

>>> +o  Console Tools  #   0.3.3  # loadkeys -V
>>> +o  Mount          #   2.10e  # mount --version
>>
>> Concerning mount: (i) the version mentioned is too old,

Exactly why? Mere missing features don't make for a required
upgrade. Version number inflation should be resisted.

>> Concerning Console Tools: maybe kbd-1.05 is uniformly better.
>> I am not aware of any reason to recommend the use of console-tools.
>
> Debian has console-tools with priority:required and kbd
> with priority:extra. Take it with Yann Dirson...

Both should be "extra". They can be removed from the version script.
I'm even one of the few remaining console users.



Re: system call for process information?

2001-03-13 Thread Albert D. Cahalan

Nathan Paul Simons writes:
> On Tue, Mar 13, 2001 at 04:05:13PM -0500, Albert D. Cahalan wrote:

>> Bloat removal: being able to run without /proc mounted.
>>
>> We don't have "kernel speed". We have kernel-mode screwing around
>> with text formatting.
>
>   Or calculating things that really should be taken care of in
> user space, such as CPU utilization.

That can not be done reliably in user space. I know this; the "top"
program used to try.

>> This isn't just for him. Many people have wanted it.
>
>   Yes, but how many people would actually *use* it?  How many
> programs out of the thousands out there would benefit from this?
> If it's more than 50 widely used packages, I'd be more than happy
> to see something that speeds them all up added to the kernel.

Oh please. How many programs use the mount() system call? One?
Most system calls are rarely used. This is OK.

>> 1. variable-length ASCII strings with undefined ad-hoc syntax
>
>   Use enumerated string functions, always.
>
>> 2. array of fixed-size (64-bit) values
>
>   It's an array?  That can still be overflowed by sloppy
> programming.

No it can't. You fill it like this:

tmp[0] = p->pid;
tmp[1] = p->uid;
/* ... */

Throw in some pretty symbolic names if you like. It's effectively
a struct, but a real struct would tempt people to use non-64-bit
values. Using an array enforces uniform 64-bit usage.

Good design involves NOT tempting people to write irregular hacks.

>  When it comes right down to it, I'd rather have
> something that could potentially die badly be run on the user
> side, rather than the kernel side.

Good. Thus you'd like the new system call in place of our
current pile of crud. Unfortunately the crud will need to
remain for at least a decade of transition time.

>> Parsing costs programmer time.
>
>   But it's fairly easy to do in any number of programming
> languages besides C which can't be easily used in the kernel.
> Not to mention parsing libraries for C that fit much better on
> the user side because they would make the kernel huge and slow
> if compiled into it.

Huh? The kernel need not parse its own ASCII output. The kernel
natively maintains information in a binary format. The proposed
system call would not parse /proc output!!!

>   Last but not least, I don't want to waste time in kernel
> scanning through a list of syscalls a mile long, half of them
> I don't ever use.

Well, tough luck. Learn to use an editor with search ability.
Even "less" and Netscape can search.

>  Or having a kernel that's so big that you
> can't fit it on embedded systems anymore.

The proposed system call was implemented for an embedded system.
This allowed operation without the /proc filesystem, which is
some serious bloat.

> And once you start
> adding every "nifty" syscall that comes along, that's what
> will happen.  So again, I say give us all a really good reason
> for this syscall, or just hack it into your own kernels and
> let us have our speedy, small vanilla kernels.

If you think /proc is speedy and small...






Re: system call for process information?

2001-03-13 Thread Albert D. Cahalan

Nathan Paul Simons writes:
> On Mon, Mar 12, 2001 at 09:21:37PM +, Guennadi Liakhovetski wrote:

>> CPU utilisation. Each new application has to calculate it (ps, top, qps,
>> kps, various sysmons, procmons, etc.). Wouldn't it be worth it having a
>> syscall for that? Wouldn't it be more optimal?
>
>   No, it wouldn't be worth it because you're talking about 
> sacrificing simplicity and kernel speed in favor of functionality.
> This has been known to lead to "bloat-ware".  Every new syscall you

Bloat removal: being able to run without /proc mounted.

We don't have "kernel speed". We have kernel-mode screwing around
with text formatting.

> add takes just a little bit more time and space in the kernel, and
> when only a small number of programs will be using it, it's really 
> not worth it.  This time and space may not be large, but once you
> get _your_ syscall added, why can't everyone else get theirs added 
> as well?  And so, after making about a thousand specialized syscalls
> standard in the kernel, you end up with IRIX (from what I've heard).

This isn't just for him. Many people have wanted it.

>   Don't even get me started about opening security holes, and
> increasing code complexity.  Please do a search for every other

I'll get you started. Compare:

1. variable-length ASCII strings with undefined ad-hoc syntax
2. array of fixed-size (64-bit) values

> ps - CPU time is cheap, that's why they don't charge for it anymore.
> Programmer time is _not_.

Parsing costs programmer time.



Re: [PATCH] Penguin logos

2001-03-11 Thread Albert D. Cahalan

Geert Uytterhoeven writes:

>   - The colors for the 16 color logo are wrong. We used a hack to
> give the logo its own color palette, but this no longer works
> as a side effect of a console color map bug being fixed a while
> ago. The solution is to replace the logo with a new one that
> uses the standard VGA console palette.

Good idea, but the feet don't look too good. Either dither a bit,
or pick a single color for the feet. Maybe a checkerboard-dither
would get close to the right color without looking grainy.

>   - There are still some politically-incorrect (PI) logos of a penguin
> holding a glass of beer or wine (or perhaps even worse? :-).

Those also just look bad. The drink sort of floats above the penguin's
foot. It really looks like it was just pasted onto the image.

The arch-specific logos look bad in general, and the swirly gray
background isn't so great either. Why not use the original image?

> Changes:
>  1. Update the frame buffer console code to no longer change the
> palette when displaying the 16 color logo. Remove the tricks
> to load the logo palette in unused palette entries on displays
> with >= 32 colors.

I used to have only 256 colors on my display. I upgraded because
there still isn't a global system palette. I'd have been happy
enough with 256 colors allocated in a sane way, for kernel & X:

1. the 16 VGA colors and extra 4 Windows colors (so Wine can work)
2. the 216 Netscape colors
3. gray: 0x00, 0x11, 0x22... 0xff, plus both 0x7f and 0x80
4. everything else reserved for future global allocation

The current situation is way too painful to use.




Re: Status of posix-ACL's

2001-03-11 Thread Albert D. Cahalan

Jochen Dolze writes:

> I found at http://acl.bestbits.at the ACL-linux-project. Now I want to know,
> if there is a plan to integrate posix-ACLs into the fs-part of the kernel,
> e.g. into the VFS-Layer? Is there a general discussion about this anywhere?
> What are the biggest problems? (I know that many userland-tools must be
> changed for this).

I hope not. POSIX ACLs are crap. NFSv4 mostly follows NT.
Compatibility with NFSv4 and SMB (Samba's protocol) is important.




Re: Process vs. Threads

2001-03-09 Thread Albert D. Cahalan

Hank Leininger writes:
> On 2001-03-07, "Albert D. Cahalan" <[EMAIL PROTECTED]> wrote:

>> Then for proper ps and top output, you need a reasonably efficient
>> way to grab all threads as a group. This could be as simple as
>> ensuring that /proc directory reads return related tasks together.
>> This works too:   /proc/42/threads/98 -> ../../98
>
> For this (but not for other "proper thread support" things
> you mention) would it be enough to have /proc publish some token
> that represent unique ->fs, ->mm, etc pointers?  (The kernel-space
> address of each would work, though that might be leaking too much
> info; at the least userspace must treat such values as opaque canary
> tokens.)  This does not give you the most efficient "ps --threads 231"
> but it does let ps, top, (fuser?), etc group processes with the
> same vm, files, etc, no?

You've identified the problem yourself.

When I wrote the new ps, I made a rule for myself: the default
output would not require sorting of any kind. Output would be
produced as soon as possible. This is for performance, and to
help tolerate kernel bugs that cause a hang.

So far I've resisted using threads myself to work around the
hang problem.

It won't be "ps --threads 231". It will be one of the following
options if you do want to see individual threads: m -m -T -L
(the options are in use on Tru64, AIX, IRIX, Solaris, UnixWare...)

So think of a way to wrap tasks together, preferably in a
way that is impossible for a non-POSIX thread to escape from.
Taking a guess at it:

Have an inherited task-group-ID that gets set equal to the task ID
whenever a task breaks away from other tasks. This includes fork(),
execve(), and any unshare() call that might be implemented. Note that
this does not, and indeed _must_ not, enforce POSIX signal behavior.
(the leader of a new and empty group might be able to request the
POSIX behavior for his group, but it can not be the default)



Re: Process vs. Threads

2001-03-07 Thread Albert D. Cahalan

Helge Hafting writes:
> Gregory Maxwell wrote:

>> There are no threads in Linux.
>> All tasks are processes.
>> Processes can share any or none of a vast set of resources.
>
> Is there a way a user program can find out what resources 
> are shared among which processes? 
>
> That would allow enhancing ps, top, etc to
> report memory usage correctly.

I already looked into this. Sorry, it can not be done.

Linux briefly had the code needed to support threads properly.
Linus added it with the warning that it would be removed if he
didn't get enough feedback. Well I have a real job, and the code
was gone before the weekend! Look around near 2.4.0-test8 maybe.

For proper thread support:

First you need the concept of a thread group. This groups tasks
together similar to the way they form process groups and sessions.

Then for proper POSIX thread support, you need a flag to indicate
some awkward POSIX-mandated signal behavior within a thread group.

Then for proper ps and top output, you need a reasonably efficient
way to grab all threads as a group. This could be as simple as
ensuring that /proc directory reads return related tasks together.
This works too:   /proc/42/threads/98 -> ../../98

Severely non-POSIX threads are just not going to do anything sane,
unless thread groups get automatically wrapped around any threads
that share resources. So if 50 shares memory with 67, and 50 shares
the filesystem with 82, then 67 and 82 are non-POSIX threads of the
same non-POSIX process even if they share nothing with each other.

Automatic wrapping works much better, assuming it doesn't also cause
the awkward POSIX signal behavior by default. Tasks should need to
explicitly request the extra suffering.


