Re: user limits for 'security'?

2001-06-25 Thread LA Walsh

I suppose another question, related to the first: is 'limit' checking
part of the 'standard linux security' that embedded Linux users might
find to be a waste of precious code-space?

-l

--
The above thoughts and| I know I don't know the opinions
writings are my own.  | of every part of my company. :-)
L A Walsh, law at sgi.com | Sr Eng, Trust Technology
01-650-933-5338   | Core Linux, SGI



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



user limits for 'security'?

2001-06-25 Thread LA Walsh

I've seen some people saying that user-limits are an essential part of a
secure system to prevent local DoS attacks.  Given that, should
a system call like 'fork' return -EPERM if the user has reached their
limit?

My local manpage (SuSE 7.2 system) says this under fork:

ERRORS
       EAGAIN fork cannot allocate sufficient memory to copy the
              parent's page tables and allocate a task structure
              for the child.
-
Should the man page be updated to reflect that EAGAIN is also returned
when the user has reached their limit?  From a user-monitoring point
of view, it might be security-relevant to know whether an EAGAIN is being
returned because the system really is low on resources or because it
is a user hitting their limit.
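
A minimal sketch of the ambiguity (the limit of 5 is illustrative, and
cleanup is elided):

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
    /* tiny per-user process limit so fork() hits it quickly */
    struct rlimit rl = { 5, 5 };
    if (setrlimit(RLIMIT_NPROC, &rl) < 0)
        perror("setrlimit");

    for (;;) {
        pid_t pid = fork();
        if (pid < 0) {
            /* EAGAIN here means "over your limit" -- identical, by
             * errno alone, to "out of memory" */
            printf("fork failed: %s (errno=%d)\n", strerror(errno), errno);
            return 0;
        }
        if (pid == 0)
            pause();    /* children park until the demo is killed */
    }
}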

--
The above thoughts and| I know I don't know the opinions
writings are my own.  | of every part of my company. :-)
L A Walsh, law at sgi.com | Sr Eng, Trust Technology
01-650-933-5338   | Core Linux, SGI



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Break 2.4 VM in five easy steps

2001-06-07 Thread LA Walsh

"Eric W. Biederman" wrote:

> LA Walsh <[EMAIL PROTECTED]> writes:
>
> > Now for whatever reason, since 2.4, I consistently use at least
> > a few Mb of swap -- stands at 5Meg now.  Weird -- but I notice things
> > like nscd running 7 copies that take 72M.  Seems like overkill for
> > a laptop.
>
> So the question becomes why you are seeing an increased swap usage.
> Currently there are two candidates in the 2.4.x code path.
>
> 1) Delayed swap deallocation, when a program exits after it
>    has gone into swap its swap usage is not freed. Ouch.

---
Double ouch.  Swap is backing a non-existent program?

>
>
> 2) Increased tenacity of swap caching.  In particular in 2.2.x if a page
>    that was in the swap cache was written to, the page in the swap
>    space would be removed.  In 2.4.x the location in swap space is
>    retained with the goal of getting more efficient swap-ins.


But if the page in memory is 'dirty', you can't be efficient with swapping
*in* the page.  The page on disk is invalid and should be released, or am I
missing something?

> Neither of the known candidates for increasing the swap load applies
> when you aren't swapping in the first place.  They may aggravate the
> usage of swap when you are already swapping but they do not cause
> swapping themselves.  This is why the initial recommendation for
> increased swap space size was made.  If you are swapping we will use
> more swap.
>
> However what pushes your laptop over the edge into swapping is an
> entirely different question.  And probably what should be solved.


On my laptop, it is insignificant and to my knowledge has no measurable
impact.  It seems like there is always 3-5 Meg used in swap no matter what's
running (or not) on the system.

> > I think that is the point -- it was supported in 2.2, it is, IMO,
> > a serious regression that it is not supported in 2.4.
>
> The problem with this general line of arguing is that it lumps a whole
> bunch of real issues/regressions into one over all perception.  Since
> there are multiple reasons people are seeing problems, they need to be
> tracked down with specifics.

---
Uhhh, yeah, sorta -- it's addressing the statement that a "new requirement of
2.4 is to have double the swap space".  If everyone agrees that's a problem, then
yes, we can go into specifics of what is causing or contributing to the problem.
It's getting past the attitude of some people that 2xMem for swap is somehow
"normal and acceptable -- deal with it".  In my case, it seems like 10Mb of swap would
be all that would generally be used (I don't think I've ever seen swap usage over 7Mb)
on a 512M system.  To be told "oh, you're wrong, you *should* have 1Gig or you are
operating in an 'unsupported' or non-standard configuration" -- I find that very
user-unfriendly.


>
> The swapoff case comes down to dead swap pages in the swap cache.
> Which greatly increases the number of swap pages and slows the system
> down, but since these pages are trivial to free we don't generate any
> I/O, so we don't wait for I/O and thus never enter the scheduler, making
> nothing else in the system runnable.

---
I haven't ever *noticed* this on my machine but that could be
because there isn't much in swap to begin with?  Could be I was
just blissfully ignorant of the time it took to do a swapoff.
Hmmm... let's see.  Just tried it.  I didn't get a total lock up,
but cursor movement was definitely jerky:
> time sudo swapoff -a

real    0m10.577s
user    0m0.000s
sys     0m9.430s

Looking at vmstat, the needed space was taken mostly out of the
page cache (86M->81.8M) and about 700K each out of free and buff.


> Your case is significantly different.  I don't know if you are seeing
> any issues with swapping at all.  With a 5M usage it may simply be
> totally unused pages being pushed out to the swap space.

---
Probably -- I guess the page cache and disk buffers put enough pressure to
push some things off to swap.

-linda
--
The above thoughts and   | They may have nothing to do with
writings are my own. | the opinions of my employer. :-)
L A Walsh| Senior MTS, Trust Tech, Core Linux, SGI
[EMAIL PROTECTED]  | Voice: (650) 933-5338


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Break 2.4 VM in five easy steps

2001-06-07 Thread LA Walsh

"Eric W. Biederman" wrote:

> There are certain scenarios where you can't avoid virtual mem =
> min(RAM,swap).  Which is what I was trying to say, (bad formula).  What
> happens is that pages get referenced evenly enough and quickly enough
> that you simply cannot reuse the on-disk pages.  Basically in the
> worst case all of RAM is pretty much in flight doing I/O.  This is
> true of all paging systems.


So, if I understand, you are talking about thrashing behavior
where your active set is larger than physical ram.  If that
is the case then requiring 2X+ swap for "better" performance
is reasonable.  However, if your active set is truly larger
than your physical memory on a consistent basis, in this day,
the solution is usually "add more RAM".  I may be wrong, but
my belief is that with today's computers people are used to having
enough memory to do their normal tasks and that swap is for
"peak loads" that don't occur on a sustained basis.  Of course
I imagine that this is my belief as it is my own practice/view.
I want to have considerably more memory than my normal working
set.  Swap on my laptop disk is *slow*.  It's low-power, low-RPM,
slow seek rate, all to conserve power (difference between spinning/off
= 1W).  So I have 50% of my phys mem on swap -- because I want to
'feel' it when I go to swap and start looking for memory hogs.
For me, the pathological case is touching swap *at all*.  So the
idea of the entire active set being >= phys mem is already broken
on my setup.  Thus my expectation of swap only as a 'warning'/'buffer'
zone.

Now for whatever reason, since 2.4, I consistently use at least
a few Mb of swap -- stands at 5Meg now.  Weird -- but I notice things
like nscd running 7 copies that take 72M.  Seems like overkill for
a laptop.

> However just because in the worst case virtual mem = min(RAM,swap), is
> no reason other cases should use that much swap.  If you are doing a
> lot of swapping it is more efficient to plan on mem = min(RAM,swap) as
> well, because frequently you can save on I/O operations by simply
> reusing the existing swap page.

---
Agreed.  But planning your swap space for a worst
case scenario that you never hit is wasteful.  My worst
case is using any swap.  The system should be able to live
with swap=1/2*phys in my situation.  I don't think I'm
unique in this respect.

> It's a theoretical worst case and they all have it.  In practice it is
> very hard to find a work load where practically every page in the
> system is close to the I/O point, however.

---
Well exactly the point.  It was in such situations in some older
systems that some programs were swapped out and temporarily made
unavailable for running (they showed up in the 'w' space in vmstat).

> Except for removing pages that aren't used paging with swap < RAM is
> not useful.  Simply removing pages that aren't in active use but might
> possibly be used someday is a common case, so it is worth supporting.

---
I think that is the point -- it was supported in 2.2, it is, IMO,
a serious regression that it is not supported in 2.4.

-linda

--
The above thoughts and   | They may have nothing to do with
writings are my own. | the opinions of my employer. :-)
L A Walsh| Senior MTS, Trust Tech., Core Linux, SGI
[EMAIL PROTECTED]  | Voice: (650) 933-5338



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Break 2.4 VM in five easy steps

2001-06-06 Thread LA Walsh

"Eric W. Biederman" wrote:

> The hard rule will always be that to cover all pathological cases swap
> must be greater than RAM.  Because in the worst case all RAM will be
> in the swap cache.  That this is more than just the worst case in 2.4
> is problematic.  I.e. In the worst case:
> Virtual Memory = RAM + (swap - RAM).

Hmmm... so my 512M laptop only really has 256M?  Um... I regularly run
more than 256M of programs.  I don't want it to swap -- it's a special, weird
condition if I do start swapping.  I don't want to waste 1G of HD (5%) for
something I never want to use.  IRIX runs just fine with swap < RAM.

> You can't improve the worst case.  We can improve the worst case that
> many people are facing.

---
Other OS's don't have this pathological 'worst case' scenario.  Even
my Windows [vm]box seems to operate fine with swap < RAM.

> It's worth complaining about.  It is also worth digging into and finding
> out what the real problem is.  I have a hunch that this whole
> conversation on swap sizes being irritating is hiding the real
> problem.

---
Okay, admission of ignorance.  When we speak of "swap space",
is this term inclusive of both demand-paging space and
swap-out-entire-programs space, or one or the other?
-linda

--
The above thoughts and   | They may have nothing to do with
writings are my own. | the opinions of my employer. :-)
L A Walsh| Trust Technology, Core Linux, SGI
[EMAIL PROTECTED]  | Voice: (650) 933-5338



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: ln -s broken on 2.4.5

2001-05-30 Thread LA Walsh

Marcus Meissner wrote:

> $ ln -s fupp/bar bar
> $ ls -la bar

---
Is it peculiar to a specific architecture?
What does strace show for args to the symlink cmd?
-l
--
The above thoughts and   | They may have nothing to do with
writings are my own. | the opinions of my employer. :-)
L A Walsh| Trust Technology, Core Linux, SGI
[EMAIL PROTECTED]  | Voice: (650) 933-5338


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



[i386 arch] MTRR messages significant?

2001-05-08 Thread LA Walsh

I've been seeing these for a while now (since at least 2.4.2, up through 2.4.4),
coinciding with a change to XFree86 4.0.3 from "MetroX" in the same time frame.
I'm not sure exactly when they started but was wondering if they were significant.
It seems some app is trying to delete or modify something.  On console and in syslog:

mtrr: no MTRR for fd00,80 found
mtrr: MTRR 1 not used
mtrr: reg 1 not used

while /proc/mtrr currently contains:

reg00: base=0x00000000 (   0MB), size= 512MB: write-back, count=1
reg01: base=0xfd000000 (4048MB), size=   8MB: write-combining, count=1

Could it be the X server trying to delete a segment when it starts up or
shuts down?  Is it an error in the X server to try to delete a non-existent
segment?  Does the kernel 'care'?  I.e. -- why is it printing out messages --
are they debug messages that perhaps should be off by default?

Concurrent with these messages and perhaps unrelated is a new, unwelcome,
behavior of X dying on display of some Netscape-rendered websites (cf. it
doesn't die under konqueror).

thanks,
-linda
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: 2.4.4 code breaks compile of VMWare network bridging

2001-05-02 Thread LA Walsh

"Mohammad A. Haque" wrote:

> This was answered several hours ago. Check the list archives.

---
Many thanks -- it was in my never-ending backlog

-l

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



2.4.4 code breaks compile of VMWare network bridging

2001-05-02 Thread LA Walsh

In 2.4.4, the define "skb_datarefp" in
include/linux/skbuff.h
and the corresponding structure use in
net/core/skbuff.c
disappeared.

I'm not reporting this as a 'bug' as kernel internal interfaces are subject
to change, but more as an "FYI".  I haven't had a chance to try to
debug or figure out the offending bit of code to see exactly what it
was trying to do, but the offending code snippet follows.  I haven't yet
reported it to the folks at VMware, but their response to problem reports
against 2.4.x is "can you duplicate it against 2.2.x, we don't support
2.4.x yet".  Perhaps someone expert in the 'net/core' area could explain
what changed and what they shouldn't be doing anymore?

It appears the references:
#  define KFREE_SKB(skb, type)  kfree_skb(skb)
#  define DEV_KFREE_SKB(skb, type)  dev_kfree_skb(skb)
^^
are the offending culprits.

Thanks for any insights...
-linda

/*
 *--
 * VNetBridgeReceiveFromDev --
 *  Receive a packet from a bridged peer device
 *  This is called from the bottom half.  Must be careful.
 * Results:
 *  errno.
 * Side effects:
 *  A packet may be sent to the vnet.
 *--
 */
int
VNetBridgeReceiveFromDev(struct sk_buff *skb,
 struct device *dev,
 struct packet_type *pt)
{
   VNetBridge *bridge = *(VNetBridge**)&((struct sock *)pt->data)->protinfo;
   int i;

   if (bridge->dev == NULL) {
  LOG(3, (KERN_DEBUG "bridge-%s: received %d closed\n",
  bridge->name, (int) skb->len));
  DEV_KFREE_SKB(skb, FREE_READ);
  return -EIO;  // value is ignored anyway
   }

   // XXX need to lock history
   for (i = 0; i < VNET_BRIDGE_HISTORY; i++) {
  struct sk_buff *s = bridge->history[i];
  if (s != NULL &&
  (s == skb || SKB_IS_CLONE_OF(skb, s))) {
 bridge->history[i] = NULL;
 KFREE_SKB(s, FREE_WRITE);
 LOG(3, (KERN_DEBUG "bridge-%s: receive %d self %d\n",
 bridge->name, (int) skb->len, i));
 // FREE_WRITE because we did the allocation, it's not used anyway
 DEV_KFREE_SKB(skb, FREE_WRITE);
 return 0;
  }
   }
   skb_push(skb, skb->data - skb->mac.raw);
   VNetSend(&bridge->port.jack, skb);

   return 0;
}

--
The above thoughts and   | They may have nothing to do with
writings are my own. | the opinions of my employer. :-)
L A Walsh| Trust Technology, Core Linux, SGI
[EMAIL PROTECTED]  | Voice: (650) 933-5338


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: 2.4 and 2GB swap partition limit

2001-04-27 Thread LA Walsh

Rik van Riel wrote:

> On Fri, 27 Apr 2001, LA Walsh wrote:
>
> > An interesting option (though with less-than-stellar performance
> > characteristics) would be a dynamically expanding swapfile.  If you're
> > going to be hit with swap penalties, it may be useful to not have to
> > pre-reserve something you only hit once in a great while.
>
> This makes amazingly little sense since you'd still need to
> pre-reserve the disk space the swapfile grows into.

---
Why?  Why not have a zero-length file that you grow only if you spill?
If you can't spill, you are out of memory -- or reserve a 'safety'
margin ahead -- like reserve 32k at a time and grow it.  It may make
little sense, but I believe it is what is used on pseudo-OS's
like Windows -- you *can* preallocate, but the normal case has
Windows managing the swap file and growing it as needed up to
available disk space.  If it is doable in Windows, you'd think there'd
be some way of doing it in Linux, but perhaps Linux's complexity
doesn't allow for that type of feature.
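
Purely as a sketch of how the user-space half might look (the file name,
threshold, and interval are all invented, and the spill file is assumed
to have been built ahead of time with dd + mkswap; needs root):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define SPILL_FILE  "/var/swap/spill.0"   /* pre-built swap file */
#define MIN_FREE_KB (8 * 1024)            /* grow when below this */

/* parse the SwapFree: line out of /proc/meminfo */
static long swap_free_kb(void)
{
    char line[128];
    long kb = -1;
    FILE *f = fopen("/proc/meminfo", "r");
    if (!f)
        return -1;
    while (fgets(line, sizeof(line), f))
        if (sscanf(line, "SwapFree: %ld kB", &kb) == 1)
            break;
    fclose(f);
    return kb;
}

int main(void)
{
    int added = 0;   /* one spill file only, to keep the sketch simple */
    for (;;) {
        long kb = swap_free_kb();
        if (!added && kb >= 0 && kb < MIN_FREE_KB)
            added = (system("swapon " SPILL_FILE) == 0);
        sleep(30);
    }
}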

As for disk-space reserves, if you have 5% reserved for
'root' on a 20G ext disk, that still amounts to 1G reserved for root.
Seems an automatically sizing swap file might be just fine for some people
(not me, I don't even like to use swap, but I'm not my mom using Windows ME either).

But, conversely, if it's coming out of space I wouldn't normally
use anyway -- say the "5%" -- i.e. the 5% is something I'd likely only
use under *rare* conditions.  I might have enough memory and the
right system load that I also 'rarely' use swap -- so reserving
1G/1G (2xMEM) on my laptop, both of which will rarely get used, seems like
a waste of 2G.  I suppose if I put it that way I might convince myself
to use it...

--
The above thoughts and   | They may have nothing to do with
writings are my own. | the opinions of my employer. :-)
L A Walsh| Trust Technology, Core Linux, SGI
[EMAIL PROTECTED]  | Voice: (650) 933-5338



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: 2.4 and 2GB swap partition limit

2001-04-27 Thread LA Walsh

Rogier Wolff wrote:

> > > On Linux any swap adds to the memory pool, so 1xRAM would be
> > > equivalent to 2xRAM with the old old OS's.
> >
> > no more true AFAIK
>
> I've always been trying to convince people that 2x RAM remains a good
> rule-of-thumb.

---
Ug.  I like to view swap as "low grade memory" -- i.e. I really
should spend 99.9% of my time in RAM -- if I spill, then it means
I'm running too much/too big for my computer and should get more RAM --
meanwhile, I suffer with performance degradation to remind me I'm really
exceeding my machine's physical memory capacity.

An interesting option (though with less-than-stellar performance
characteristics) would be a dynamically expanding swapfile.  If you're
going to be hit with swap penalties, it may be useful to not have to
pre-reserve something you only hit once in a great while.

Definitely only for systems where you don't expect to use swap (but
it could be there for "emergencies" up to some predefined limit or
available disk space).

--
The above thoughts and   | They may have nothing to do with
writings are my own. | the opinions of my employer. :-)
L A Walsh| Trust Technology, Core Linux, SGI
[EMAIL PROTECTED]  | Voice: (650) 933-5338



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH] SMP race in ext2 - metadata corruption.

2001-04-27 Thread LA Walsh

Andrzej Krzysztofowicz wrote:

> I know a few people that often do:
>
> dd if=/dev/hda1 of=/dev/hdc1
> e2fsck /dev/hdc1
>
> to make an "exact" copy of a currently working system.

---
Presumably this isn't a problem if the source disks are either unmounted or
mounted 'read-only'?


--
The above thoughts and   | They may have nothing to do with
writings are my own. | the opinions of my employer. :-)
L A Walsh| Trust Technology, Core Linux, SGI
[EMAIL PROTECTED]  | Voice: (650) 933-5338



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [QUESTION] 2.4.x nice level

2001-04-02 Thread LA Walsh

Quim K Holland wrote:
> 
> > "BS" == BERECZ Szabolcs <[EMAIL PROTECTED]> writes:
> 
> BS> ... a setiathome running at nice level 19, and a bladeenc at
> BS> nice level 0. setiathome uses 14 percent, and bladeenc uses
> BS> 84 percent of the processor. I think, setiathome should use
> BS> max 2-3 percent.  the 14 percent is way too much for me.
> BS> ...
> BS> with kernel 2.2.16 it worked for me.
> BS> now I use 2.4.2-ac20
---
I was running 2 copies of setiathome on a 4 CPU server
@ work.  The two processes ran nice'd -19.  The builds we were 
running still took 20-30% longer as opposed to when setiathome wasn't
running (went from 45 minutes up to about an hour).  This machine
has 1G, so I don't think it was hurting from swapping.

I finally wrote a script that checked every 30 seconds -- if the
load on the machine climbed over '4', the script would SIGSTOP the
seti jobs.  Once the load on the machine fell below 2, it would 
send a SIGCONT to them.  
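
A sketch of such a babysitter (not the original script; only the
thresholds, the 30-second interval, and the 'setiathome' process name
come from the description above -- the rest is illustrative):

#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    double load;
    int stopped = 0;

    for (;;) {
        /* getloadavg(3) reads the 1-minute load average */
        if (getloadavg(&load, 1) == 1) {
            if (!stopped && load > 4.0) {
                system("killall -STOP setiathome");  /* park the jobs */
                stopped = 1;
            } else if (stopped && load < 2.0) {
                system("killall -CONT setiathome");  /* resume them */
                stopped = 0;
            }
        }
        sleep(30);
    }
}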

I was also running setiathome on my laptop for a short while --
but the fan kept coming on and the computer would get really hot.
So I stopped that.  Linux @ idle doesn't seem to ever kick on
the fan, but turn on a CPU-crunching program and it sure seemed
to heat up the machine.  I still wonder how many kilo- or megawatts
go to running dispersed computation programs.  Just one of those
things I may never know...

-l

-- 
The above thoughts and   | They may have nothing to do with
writings are my own. | the opinions of my employer. :-)
L A Walsh| Trust Technology, Core Linux, SGI
[EMAIL PROTECTED]  | Voice: (650) 933-5338
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: unistd.h and 'extern's and 'syscall' "standard(?)"

2001-04-01 Thread LA Walsh

Andreas Schwab wrote:
> Don't use kernel headers in user programs.  Just use syscall(3).
> 
> Andreas.
---
I'm on a SuSE 7.1 system and have all the manpages installed:
law> man syscall
No manual entry for syscall

The problem is not so much for user programs as for library
writers that write support libraries for kernel calls.  For
example, there is libcap to implement POSIX capabilities on top
of the kernel call.  We have a libaudit to implement POSIX auditing
on top of a few kernel calls.  It's the "system" library to system-call
interface that's the problem, mainly.  On ia64, it doesn't seem
like there is a reliable, cross-distro, cross-architecture way of
interfacing to the kernel.

In saying "use syscall(3)" (which is undocumented on
my SuSE system, and on a RH61 sytem), implies it is in some
library.  I've heard rumors that the call isn't present in RH
distros and they claim its because it's not exported from glibc.
Then I heard glibc said it wasn't their intention to export it.
(This is all 2nd hand, so forgive me if I have parties or details
confused or mis-stated). It seems like kernel source points to an 
external source, Vender points at glibc, glibc says not their intention.
Meanwhile, an important bit of kernel functionality --
being able to use syscall0, syscall1, syscall2...etc, ends up
missing for those wanting to construct libraries on top of the
kernel.

I end up being rather perplexed about the correct course
of action to take.  Seeing as you work for SuSE, would you know
where this 'syscall(3)' interface should be documented?  Is it
supposed to be present in all distros?
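
For reference, the usage I'd expect, assuming glibc really does export
syscall() and <sys/syscall.h> carries the SYS_* numbers:

#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>

int main(void)
{
    /* same result as getpid(), but routed through the generic stub */
    long pid = syscall(SYS_getpid);
    printf("pid via syscall(): %ld\n", pid);
    return 0;
}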


Thanks,
-linda
-- 
The above thoughts and   | They may have nothing to do with
writings are my own. | the opinions of my employer. :-)
L A Walsh| Trust Technology, Core Linux, SGI
[EMAIL PROTECTED]  | Voice: (650) 933-5338
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



unistd.h and 'extern's and 'syscall' "standard(?)"

2001-04-01 Thread LA Walsh


I have a question.  Some architectures have "system calls"
implemented as library calls (calls that are "system calls" on ia32).
For example, the expectation on 'arm' seems to be that sys_sync
is in a library.  On alpha, sys_open appears to be in a library.
Is this correct?

Is it the expectation that the library that handles this
is the 'glibc' for that platform or is there a special "kernel.lib"
that goes with each platform?

Is there 1 library that I need to link my apps with to
get the 'externs' referenced in "unistd.h"?

The reason I ask is that in ia64 the 'syscall' call
isn't done with inline assembler but is itself an 'extern' call.
This implies that you can't do system calls directly w/o some 
support library.

The implication of this is that developers working on
platform-independent system calls and library functions, for
example, extended attributes, audit or MAC, can't provide
platform-independent patches w/o also providing their own
syscall implementation for ia64.

This came up as a problem when we wanted to provide
a new piece of code but found it wouldn't link on some distributions.
On inquiry there seems to be some confusion regarding who is responsible
for providing the code/library to satisfy this 'unistd.h' extern.

Should something so basic as the 'syscall' interface be provided
in the kernel sources, perhaps as a kernel-provided 'lib', or is
it expected it will be provided by someone else or is it expected
that each developer should provide their own syscall implementation for
ia64?
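
For concreteness, the per-developer fallback on ia32 looks roughly like
the sketch below: expanding the kernel's own _syscall macros so the
program carries its own trap stub.  It only builds on architectures
whose <asm/unistd.h> provides the macros -- which is exactly the
problem on ia64:

#include <errno.h>           /* the expanded stub reports errors via errno */
#include <sys/types.h>
#include <linux/unistd.h>

_syscall0(pid_t, getpid)     /* defines pid_t getpid(void) around the trap */

int main(void)
{
    return getpid() > 0 ? 0 : 1;
}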

Thanks,
-linda
-- 
The above thoughts and   | They may have nothing to do with
writings are my own. | the opinions of my employer. :-)
L A Walsh| Trust Technology, Core Linux, SGI
[EMAIL PROTECTED]  | Voice: (650) 933-5338
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: 64-bit block sizes on 32-bit systems

2001-03-27 Thread LA Walsh

Jan Harkes wrote:
> 
> On Tue, Mar 27, 2001 at 01:57:42PM -0600, Jesse Pollard wrote:
> > > Using similar numbers as presented. If we are working our way through
> > > every single block in a petabyte filesystem, and the blocksize is 512
> > > bytes. Then the 1us in extra CPU cycles because of 64-bit operations
> > > would add, according to my back-of-the-envelope calculation, 2199023
> > > seconds of CPU time, a bit more than 25 days.
> >
> > Ummm... I don't think it adds that much. You seem to be leaving out the
> > overlap disk/IO and computation for read-ahead. This should eliminate the
> > majority of the delay effect.
> 
> 1024 TB should be around 2*10^12 512-byte blocks, divide by 10^6 (1us)
> of "assumed" overhead per block operation is 2*10^6 seconds, no I
> believe I'm pretty close there. I am considering everything being
> "available in the cache", i.e. no waiting for disk access.
---
If everything being used is only used from the cache, then
the application probably doesn't need 64-bit block support.

I submit that your argument may be flawed in the assumption that
if an application needs multi-terabyte files and devices, most
of the data will be in the in-memory cache.

> The time to update the pagetables is identical to the time to update a
> 4KB page when the OS is using a 2MB pagesize. Ofcourse it will take more
> time to load the data into the page, however it should be a consecutive
> stretch of data on disk, which should give a more efficient transfer
> than small blocks scattered around the disk.
---
Not if you were doing a lot of random reads where you only
need 1-2K of data.  The read time of the extra 2M-1K would seem
to eat into any performance boost gained by the large pagesize.

> 
> > Granted, 512 bytes could be considered too small for some things, but
> > once you pass 32K you start adding a lot of rotational delay problems.
> > I've used file systems with 256K blocks - they are slow when compaired
> > to the throughput using 32K. I wasn't the one running the benchmarks,
> > but with a MaxStrat 400GB raid with 256K sized data transfer was much
> > slower (around 3 times slower) than 32K. (The target application was
> > a GIS server using Oracle).
> 
> But your subsystem (the disk) was probably still using 512 byte blocks,
> possibly scattered. And the OS was still using 4KB pages, it takes more
> time to reclaim and gather 64 pages per IO operation than one, that's
> why I'm saying that the pagesize needs to scale along with the blocksize.
> 
> The application might have been assuming a small block size as well, and
> the OS was told to do several read/modify/write cycles, perhaps even 512
> times as much as necessary.
> 
> I'm not saying that the current system will perform well when working
> with large blocks, but compared to increasing the size of block_t, a
> larger blocksize has more potential to give improvements in the long
> term without adding an unrecoverable performance hit.
---
That's totally application-dependent.  Database applications
might tend to skip around in the data and do short reads/writes over
a very large file.  Large block sizes will degrade their performance.

This was the idea of making it a *configurable* option.  If
you need it, configure it.  Same with block size -- that should
likely have a wider range for configuration as well.  But
configuration (and ideally auto-configuration where possible)
seems the ultimate win-win situation.

-l
-- 
The above thoughts are my own and do not necessarily represent those
of my employer.
L A Walsh| Trust Technology, Core Linux, SGI
[EMAIL PROTECTED]  | Voice: (650) 933-5338
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: 64-bit block sizes on 32-bit systems

2001-03-27 Thread LA Walsh

Ion Badulescu wrote:
> Are you being deliberately insulting, "L", or are you one of those users
> who bitch and scream for features they *need* at *any cost*, and who
> have never even opened up the book for Computer Architecture 101?
---
Sorry, I was borderline insulting.  I'm getting pressure on
personal fronts other than just here.  But my degree is in computer
science and I've had almost 20 years experience programming things
as small as 8080's w/ 4K ram on up.  I'm familiar with 'cost' of
emulation.

> Let's try to keep the discussion civilized, shall we?
---
Certainly.
> 
> Compile option or not, 64-bit arithmetic is unacceptable on IA32. The
> introduction of LFS was bad enough, we don't need yet another proof that
> IA32 sucks. Especially when there *are* better alternatives.
===
So if it is a compile option -- the majority of people
wouldn't be affected, is that in agreement?  Since the default would
be to use the same arithmetic as we use  now.

In fact, I posit that if anything, the majority of the people
might be helped as the block_nr becomes a a 'typed' value -- and
perhaps the sector_nr as well.  They remain the same size, but as
a typed value the kernel gains increased integrity from the increased
type checking.  At worst, it finds no new bugs and there is no impact
in speed.  Are we in agreement so far?

Now let's look at the sites that want to process terabytes of
data -- perhaps file systems up into the petabyte range.  Often I
can see these being large multi-node systems (think 16-1024 node
clusters as are in use today for large super-clusters).  If I was to
characterize the performance of them, I'd likely see the CPU pegged
at 100% with 99% usage in user space.  Let's assume that increasing the
block size decreases disk accesses by as much as 10% (you'll have
to admit -- using a 64-bit quantity vs. a 32-bit quantity isn't going
to even come close to increasing disk access times by 1 millisecond,
really, so it really is going to be a much smaller fraction when
compared to the actual disk latency).

Ok... but for the sake of
argument using 10% -- that's still only 10% of the 1% spent in the
system, or a slowdown of .1%.  Now that's using a really liberal figure
of 10%.  If you look at the actual speed of 64-bit arithmetic vs.
32, we're likely talking -- upper bound, 10x the clocks for
disk block arithmetic.  Disk block arithmetic is a small fraction
of time spent in the kernel.  We have to be looking at *maximum*
slowdowns in the range of a few hundred, maybe a few thousand, extra
clocks.  A 1000 extra clocks on a 1G machine is 1 microsecond, or approx
1/5000th your average seek latency on a *fast* hard disk.  So
instead of a 10% slowdown we are talking slowdowns in the 1/1000 range
or less.  Now that's a slowdown in the 1% that was being spent in
the kernel, so now we've slowed down the total program speed by .001%
at the increased benefit (to that site) of being able to process
those mega-gigs (petabytes) of information.  For a hit that is
not noticeable to human perception, they go from not being able to
use super-clusters of IA32 machines (for which HW and SW are cheap)
to being able to use them.  That's quite a cost savings for them.

Is there some logical flaw in the above reasoning?

-linda
-- 
L A Walsh| Trust Technology, Core Linux, SGI
[EMAIL PROTECTED]  | Voice: (650) 933-5338
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: 64-bit block sizes on 32-bit systems

2001-03-26 Thread LA Walsh

Manfred Spraul wrote:
> Which field do you access? bh->b_blocknr instead of bh->r_sector?
---
Yes.
> 
> There were plans to split the buffer_head into 2 structures: buffer
> cache data and the block io data.
> b_blocknr is buffer cache only, no driver should access them.
---
My 'device' only lives in the buffer cache.  I write
to the device 95% of the time from kernel space, and then it is read
out in large 256K reads by a user-land daemon to copy to a file.
The user-land daemon may also use 'sendfile' to pull the
data out of the device and copy it to a file, which should, as I
understand it, result in a kernel-only copy from the device
to the output file buffers -- meaning no copy of the data
to user space would be needed.  My primary 'dig' in all this is the
32-bit block_nr's in the buffer cache.

-l

-- 
L A Walsh| Trust Technology, Core Linux, SGI
[EMAIL PROTECTED]  | Voice: (650) 933-5338
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: 64-bit block sizes on 32-bit systems

2001-03-26 Thread LA Walsh

Manfred Spraul wrote:
> 
> >4k page size * 2GB = 8TB.
> 
> Try it.
> If your drive (array) is larger than 512byte*4G (4TB) linux will eat
> your data.
---
I have a block device that doesn't use 'sectors'.  It
only uses the logical block size (which is currently set to
1K).  Seems I could up that to the max blocksize (4k?) and
get 8TB... No?

I don't use the generic block make request (I have my
own).

-- 
L A Walsh| Trust Technology, Core Linux, SGI
[EMAIL PROTECTED]  | Voice: (650) 933-5338
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: 64-bit block sizes on 32-bit systems

2001-03-26 Thread LA Walsh


Matthew Wilcox wrote:
> 
> On Mon, Mar 26, 2001 at 08:39:21AM -0800, LA Walsh wrote:
> > I vaguely remember a discussion about this a few months back.
> > If I remember, the reasoning was it would unnecessarily slow
> > down smaller systems that would never have block devices in
> > the 4-28T range attached.
> 
> 4k page size * 2GB = 8TB.
---
Drat... I was being more optimistic -- you're right,
the block_nr can be negative.  Somehow I thought page size could
be 8K... living in future land.  That just makes the limitations
even closer at hand... :-(

> you keep on trying to increase the size of types without looking at
> what gcc outputs in the way of code that manipulates 64-bit types.
---
Maybe someone will backport some of the features of the
IA-64 code generator into 'gcc'.  I've been told that in some 
cases it's a 2.5x performance difference.  If 'gcc' is generating
bad code, then maybe the 'gcc' people will increase the quality
of their code -- I'm sure they are just as eagerly working on
gcc improvements as we are kernel improvements.  When I worked
on the PL/M compiler project at Intel, I know our code-optimization
guy would spend endless cycles trying to get better optimization
out of the code.  He got great joy out of doing so. -- and
that was almost 20 years ago -- and code generation has come
a *long* way since then.

> seriously, why don't you just try it?  see what the performance is.
> see what the code size is.  then come back with some numbers.  and i mean
> numbers, not `it doesn't feel any slower'.
---
As for 'trying' it -- would anyone care if we virtualized
the block_nr into a typedef?  That seems like it would provide
for cleaner (type-checked) code at no performance penalty and
more easily allow such comparisons.
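
In that spirit, the comparison could start with something as dumb as
this user-space micro-benchmark (purely illustrative; a real answer
would also look at generated code size and actual kernel paths):

#include <stdio.h>
#include <time.h>

#define N 50000000UL

typedef unsigned long       block32_t;   /* today's block_nr width */
typedef unsigned long long  block64_t;   /* proposed opt-in width */

int main(void)
{
    volatile block32_t acc32 = 0;        /* volatile defeats dead-code opts */
    volatile block64_t acc64 = 0;
    unsigned long i;
    clock_t t0, t1, t2;

    t0 = clock();
    for (i = 0; i < N; i++)
        acc32 += (block32_t)i / 1024 + (block32_t)i % 1024;
    t1 = clock();
    for (i = 0; i < N; i++)
        acc64 += (block64_t)i / 1024 + (block64_t)i % 1024;
    t2 = clock();

    printf("32-bit: %.2fs   64-bit: %.2fs\n",
           (double)(t1 - t0) / CLOCKS_PER_SEC,
           (double)(t2 - t1) / CLOCKS_PER_SEC);
    return 0;
}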

Well, this is my point: if I have disks > 8T, wouldn't
it be at *all* beneficial to be able to *choose* some slight
performance impact and access those large disks vs. having no
choice?  Having it as a configurable would allow a given
installation to make that choice rather than having no
choice.  BTW, are block_nr's on RAID arrays subject to this
limitation?
> 
> personally, i'm going to see what the situation looks like in 5 years time
> and try to solve the problem then.
---
It's not the same, but SGI has had customers for over
3 years using >2T *files*.  The point I'm looking at is if
the P-X series gets developed enough, and someone is using a
4-16P system, a corp user might be approaching that limit
today or tomorrow.  Joe User, might not for 5 years, but that's
what the configurability is about.  Keep linux usable for both
ends of the scale -- "I love scalability"

-l

-- 
L A Walsh| Trust Technology, Core Linux, SGI
[EMAIL PROTECTED]  | Voice: (650) 933-5338
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



64-bit block sizes on 32-bit systems

2001-03-26 Thread LA Walsh

I vaguely remember a discussion about this a few months back.
If I remember, the reasoning was it would unnecessarily slow
down smaller systems that would never have block devices in
the 4-28T range attached.  

However, isn't it possible there will continue to be a series
of P-IV,V,VI,VII ...etc, addons that will be used for sometime
to come.  I've even heard it suggested that we might see
2 or more CPU's on a single chip as a way to increase cpu
capacity w/o driving up clock speed.  Given the cheapness of
.25T drives now, seeing the possibility of 4T drives doesn't seem
that remote (maybe 5 years?).  

Side question: does the 32-bit block size limit also apply to 
RAID disks or does it use a different block-nr type?

So... is it the plan, or has it been thought about -- 'abstracting'
block numbers as a typedef 'block_nr', then at compile time
having it be selectable as to whether this was to
be a 32-bit or 64-bit quantity -- that way older systems would
lose no efficiency.  Drivers that couldn't be or hadn't been
ported to use 'block_nr' could default to being disabled if
64-bit blocks were selected, etc.
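
Concretely, I'm picturing something like the following sketch (the
config symbol and usage comment are mine, invented for illustration):

#include <linux/types.h>

#ifdef CONFIG_BLK_64BIT_BLOCKNR
typedef u64 block_nr;             /* opt-in: block devices > 2^32 blocks */
#else
typedef unsigned long block_nr;   /* default: identical to today's code */
#endif

/* drivers and filesystems would then traffic only in block_nr, e.g.:
 *     block_nr blk = bh->b_blocknr;
 * and the compiler flags any place still assuming a bare int/long. */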

So has this idea been tossed about and/or previously thrashed?

-l

-- 
L A Walsh| Trust Technology, Core Linux, SGI
[EMAIL PROTECTED]  | Voice: (650) 933-5338
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: NCR53c8xx driver and multiple controllers...(not new prob)

2001-03-25 Thread LA Walsh

Here is the 'alternate' output when the ncr53c8xx driver is
compiled in:

SCSI subsystem driver Revision: 1.00
scsi-ncr53c7,8xx : at PCI bus 0, device 8, function 0
scsi-ncr53c7,8xx : warning : revision of 35 is greater than 2.
scsi-ncr53c7,8xx : NCR53c810 at memory 0xfa101000, io 0x2000, irq 58
scsi0 : burst length 16
scsi0 : NCR code relocated to 0x37d6c610 (virt 0xf7d6c610)
scsi0 : test 1 started
scsi0 : NCR53c{7,8}xx (rel 17)
request_module[block-major-8]: Root fs not mounted
VFS: Cannot open root device "807" or 08:07
Please append a correct "root=" boot option
Kernel panic: VFS: Unable to mount root fs on 08:07
-
Note how this compares to the case where the driver is a module:

(note scsi0 was an IDE emulation in this setup -- something also removed in
the above setup)
ncr53c8xx: at PCI bus 0, device 8, function 0
ncr53c8xx: 53c810a detected
ncr53c8xx: at PCI bus 1, device 3, function 0
ncr53c8xx: 53c896 detected
ncr53c8xx: at PCI bus 1, device 3, function 1
ncr53c8xx: 53c896 detected
ncr53c810a-0: rev=0x23, base=0xfa101000, io_port=0x2000, irq=58
ncr53c810a-0: ID 7, Fast-10, Parity Checking
ncr53c810a-0: restart (scsi reset).
ncr53c896-1: rev=0x01, base=0xfe004000, io_port=0x3000, irq=57
ncr53c896-1: ID 7, Fast-40, Parity Checking
ncr53c896-1: on-chip RAM at 0xfe00
ncr53c896-1: restart (scsi reset).
ncr53c896-1: Downloading SCSI SCRIPTS.
ncr53c896-2: rev=0x01, base=0xfe004400, io_port=0x3400, irq=56
ncr53c896-2: ID 7, Fast-40, Parity Checking
ncr53c896-2: on-chip RAM at 0xfe002000
ncr53c896-2: restart (scsi reset).
ncr53c896-2: Downloading SCSI SCRIPTS.
scsi1 : ncr53c8xx - version 3.2a-2
scsi2 : ncr53c8xx - version 3.2a-2
scsi3 : ncr53c8xx - version 3.2a-2
scsi : 4 hosts.
  Vendor: SEAGATE   Model: ST318203LCRev: 0002
  Type:   Direct-Access  ANSI SCSI revision: 02
Detected scsi disk sda at scsi2, channel 0, id 1, lun 0
  Vendor: SGI   Model: SEAGATE ST318203  Rev: 2710
  Type:   Direct-Access  ANSI SCSI revision: 02
Detected scsi disk sdb at scsi2, channel 0, id 2, lun 0
  Vendor: SGI   Model: SEAGATE ST336704  Rev: 2742


This is on a 4x550 PIII (Xeon) system.  The 2nd two
controllers are on PCI bus 1.  The boot disk is sda, which is off of
scsi2 in the working example, or scsi1 in the non-working example.

It seems that compiling it in somehow causes controllers
1 and 2 (which are off of the 2nd PCI bus, "1") to get missed during
SCSI initialization.  Is there a parameter I need to pass to the
ncr53c8xx driver to get it to scan the 2nd bus?
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



NCR53c8xx driver and multiple controllers...(not new prob)

2001-03-24 Thread LA Walsh

I have a machine with 3 of these controllers (a 4 CPU server).  The
3 controllers are:
ncr53c810a-0: rev=0x23, base=0xfa101000, io_port=0x2000, irq=58
ncr53c810a-0: ID 7, Fast-10, Parity Checking
ncr53c896-1: rev=0x01, base=0xfe004000, io_port=0x3000, irq=57
ncr53c896-1: ID 7, Fast-40, Parity Checking
ncr53c896-2: rev=0x01, base=0xfe004400, io_port=0x3400, irq=56
ncr53c896-2: ID 7, Fast-40, Parity Checking
ncr53c896-2: on-chip RAM at 0xfe002000

I'd like to be able to make a kernel with the driver compiled in and
no loadable module support.  I don't see how to do this from the
documentation -- it seems to require a separate module loaded for
each controller.  When I compile it in, it only sees the 1st controller,
and the boot partition I think is on the 3rd.  Any ideas?

This problem is present in the 2.2.x series as well as 2.4.x (x up to 2).

Thanks,
-linda
-- 
L A Walsh| Trust Technology, Core Linux, SGI
[EMAIL PROTECTED]  | Voice: (650) 933-5338
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Is swap == 2 * RAM a permanent thing?

2001-03-15 Thread LA Walsh

The not reclaiming of swap space is flawed in more than one instance.
Suppose my P1 and P2 have their swap reserved -- now both grow.
P3 is idle but can't fit in swap.  This is going to result in fragmentation,
no?  How is this fragmentation any better than just freeing swap?

Ever since RAM sizes got to about 256M, I've tended toward using swap spaces
about half my RAM size -- thinking of swap as an 'overflow' place that
really shouldn't get used much if at all.  As you mention, not reclaiming
swap space, but having 'double-reservations' for previously swapped
programs, becomes a problem fast in this situation.  It makes the swap
much less flexible.

-- 
L A Walsh| Trust Technology, Core Linux, SGI
[EMAIL PROTECTED]  | Voice: (650) 933-5338
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: (struct dentry *)->vfsmnt;

2001-03-09 Thread LA Walsh

Alexander Viro wrote:
> No such thing. The same fs may be present in many places. Please,
> describe the situation - where do you get that dentry from?
> Cheers,
> Al
---

Al,
I'm getting it from various places: 1) if I want to know the
path relative to the root of the dentry at the end of 'path_walk'
or __user_path_walk (as used in truncate), and
2) if I've gotten a dentry as in sys_fchdir/fchown/fstat/newfstat
from a file descriptor and I want the absolute path or, if multiple
(such as multiple mounts of the same fs in different locations), the
one that the user used to access the dentry.

In 2.2 there was a way to get the path only from the
dentry (d_path) -- I'm looking for similar functionality for the
above cases.
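
For the fd cases, the best I can construct so far is something like
this sketch (assuming the 2.4 d_path() signature; error handling and
buffer allocation elided):

#include <linux/fs.h>
#include <linux/file.h>
#include <linux/dcache.h>

/* Resolve an fd to the path the user actually used to reach it:
 * the mount travels on the struct file, not on the dentry. */
static char *path_of_fd(unsigned int fd, char *buf, int buflen)
{
    struct file *file = fget(fd);
    char *path = NULL;

    if (file) {
        path = d_path(file->f_dentry, file->f_vfsmnt, buf, buflen);
        fput(file);
    }
    return path;
}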

Is it such that in 2.2 dentries were only relative to root,
where in 2.4 they are relative to their mount point, and instead of
duplicate dcache entries for each possible mount point, they get stored
as one?

If that's the case, then while I might get a path for user-path
walk, if I just have an 'fd', it may not be possible to backtrace into
the path the user used to access the file?

Just some wild speculations on my part :-/ ... did
I refine the question enough?

thanks,
-linda


-- 
L A Walsh| Trust Technology, Core Linux, SGI
[EMAIL PROTECTED]  | Voice: (650) 933-5338
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



(struct dentry *)->vfsmnt;

2001-03-09 Thread LA Walsh

Could someone enlighten me as to the purpose of this field in the
dentry struct?  There is no elucidating comment in the header for this
particular field and the name/type only indicate it is pointing to
a list of vfsmounts.  Can a dentry belong to more than one vfsmount?

If I have a 'dentry' and simply want to determine what the absolute
path from root is, in the 'd_path' macro, would I use 'rootmnt' of my
current->fs as the 'vfsmount' as well?

Thanks, in advance...
-linda


-- 
L A Walsh| Trust Technology, Core Linux, SGI
[EMAIL PROTECTED]  | Voice: (650) 933-5338
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Elevator algorithm parameters

2001-03-08 Thread LA Walsh

I hate when that happens...

LA Walsh wrote:
> If you ask for code from me, it'll be a while -- My read and write
...Q's are rather full right now with some higher priority I/O...:-)
-l
-- 
L A Walsh| Trust Technology, Core Linux, SGI
[EMAIL PROTECTED]  | Voice: (650) 933-5338
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Elevator algorithm parameters

2001-03-08 Thread LA Walsh

I have a few comments/questions on the elv. alg. as it is now.  Some
of them may be based on a flawed understanding, but please be patient
anyway :-).

1) read-ahead is given the same 'latency' [max-wait priority] as 'read'
   I can see r-a as being less important than 'read' -- 'read' means
   some app is blocked waiting for input *now*.  'ra' -- means the
   kernel is being clever in hopes it is predicting a usage pattern where
   reading ahead will be useful.  I'd be tempted to give read-ahead
   a higher acceptable latency than reads and possibly higher than
   writes.  By definition, 'ra' i/o is i/o that no one currently has
   requested be done.
   a) the code may be there, but if a read request comes in for a
  sector marked for ra, then the latency should be set to 
  min(r-latency,remaining ra latency)

2) I seem to notice a performance boost for my laptop setting the
   read latency down to 1/8th of the write (2048/16384) instead of
   the current 1:2 ratio.  

   I am running my machine as a nfs server as well as doing local tasks
   and compiles.  I got better overall performance because nfs requests
   got serviced more quickly to feed a data-hungry dual-processor
   "compiler-server".  Also, my interactive processes which need
   lots of random reads perform better because they got 'fed' faster
   while some background data transfers (read and writes) of large
   streams of data were going on.

3) It seems that the balance of optimal latency figures would vary
   based on how many cpu-processes are blocked on data-reads, how many
   cpu's are reading from the same disk, the disk speed, the cpu speed
   and available memory for buffering.  Maybe there is a neat whiz-bang
   self-adjusting algorithm that can adapt dynamically to different
   loads (like say it detects -- hmmm, we have 100 non-mergeable read
   requests plugged, should I wait for more?... well, only 1 active write
   request is running... maybe I should lower the read latency... etc.).
   However, in the interim, it seems having the values at least be
   tunable via /proc (rather than the current ioctl -- sketched below)
   would be useful -- just able to echo some values into there @ runtime.
   I couldn't seem to find such a beast in /proc.
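
For reference, the ioctl route looks roughly like this sketch (I'm
assuming the blkelv_ioctl_arg_t layout that elvtune(8) drives via
BLKELVGET/BLKELVSET in the 2.4 headers -- check your tree; the device
name and values are illustrative):

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/elevator.h>   /* header location may vary by tree */

int main(void)
{
    blkelv_ioctl_arg_t elv;
    int fd = open("/dev/hda", O_RDONLY);    /* illustrative device */

    if (fd < 0 || ioctl(fd, BLKELVGET, &elv) < 0)
        return 1;
    printf("read %d  write %d\n", elv.read_latency, elv.write_latency);

    elv.read_latency  = 2048;    /* the 1:8 ratio from item 2 above */
    elv.write_latency = 16384;
    if (ioctl(fd, BLKELVSET, &elv) < 0)
        return 1;
    close(fd);
    return 0;
}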

Comments/cares?

If you ask for code from me, it'll be a while -- My read and write 
-- 
L A Walsh| Trust Technology, Core Linux, SGI
[EMAIL PROTECTED]  | Voice: (650) 933-5338
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



setfsuid

2001-03-07 Thread LA Walsh

Why doesn't setfsuid return -EPERM when it can't perform the operation?
File: kernel/sys.c, 'sys_setfsuid', around line 779 depending on your
source version.

There is a check of capable(CAP_SETUID) that, if it fails, doesn't
return an error.  This seems inconsistent.  In fact, the manpage
I have on it states:

RETURN VALUE
   On success, the previous value of fsuid is  returned.   On
   error, the current value of fsuid is returned.
BUGS
   No error messages of any kind are returned to the  caller.
   At  the very least, EPERM should be returned when the call
   fails.
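
Until that's fixed, the only reliable detection from user space seems
to be comparing the "previous value" returns -- e.g. this sketch:

#include <stdio.h>
#include <sys/types.h>
#include <sys/fsuid.h>

/* setfsuid() never errors, so call it twice: the second call's
 * "previous value" return shows what is actually in effect now. */
static int set_fsuid_checked(uid_t uid)
{
    setfsuid(uid);
    if ((uid_t)setfsuid(uid) != uid)
        return -1;      /* silently refused (e.g. no CAP_SETUID) */
    return 0;
}

int main(void)
{
    if (set_fsuid_checked(1000) < 0)    /* 1000 is an arbitrary uid */
        fprintf(stderr, "setfsuid silently failed\n");
    return 0;
}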

-l
-- 
L A Walsh| Trust Technology, Core Linux, SGI
[EMAIL PROTECTED]  | Voice: (650) 933-5338
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Annoying CD-rom driver error messages

2001-03-06 Thread LA Walsh

Alan Cox wrote:
> 
> > support to function efficiently -- perhaps that technology needs to be
> > further developed on Linux so app writers don't also have to be kernel
> > experts and experts in all the various bus and device types out there?
>
> You mean someone should write a libcdrom that handles stuff like that - quite
> possibly
---
More generally -- what if I want to know if a DVD has been inserted and of what
type, and/or a floppy has been inserted, or removable media of type "X" -- or
perhaps more generally, not just if a 'device' has changed, but a file or directory?

I think that is what famd is supposed to do, but apparently it does so (I'm
guessing from the external description) by polling, and it says it needs kernel
support to be more efficient.  Famd was apparently ported to Linux from IRIX,
where it had the kernel ability to be notified of changed file-space items
(file-space = anything accessible w/a pathname).


Now if I can just remember where I saw this mythical port of the 'file-access
monitoring daemon'

-l

-- 
L A Walsh| Trust Technology, Core Linux, SGI
[EMAIL PROTECTED]  | Voice: (650) 933-5338
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Annoying CD-rom driver error messages

2001-03-06 Thread LA Walsh

Alan Cox wrote:
> 
> > Then it seems the less ideal question is what is the "approved and
> > recommended" way for a program to "poll" such devices to check for
> > 'changes' and 'media type' without the kernel generating spurious
> > WARNINGS/ERRORS?
>
> The answer to that could probably fill a book unfortunately. You need to use
> the various mtfuji and other ata or scsi query commands intended to notify you
> politely of media and other status changes
---
Taking myself out of the role of someone who knows anything about the kernel --
and only knows application writing in the fields of GUI's and audio -- what do
you think I'm going to use to check if there has been a playable CD inserted
into the CD drive?

There is an application called 'famd' -- which says it needs some kernel
support to function efficiently -- perhaps that technology needs to be further
developed on Linux so app writers don't also have to be kernel experts and
experts in all the various bus and device types out there?

Just an idea...?
-linda 
-- 
L A Walsh| Trust Technology, Core Linux, SGI
[EMAIL PROTECTED]  | Voice: (650) 933-5338
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Annoying CD-rom driver error messages

2001-03-06 Thread LA Walsh

God wrote:
> 
> On Mon, 5 Mar 2001, Alan Cox wrote:
> 
> > > > this isnt a kernel problem, its a _very_ stupid app
> > > ---
> > > Must be more than one stupid app...
> >
> > Could well be. You have something continually trying to open your cdrom and
> > see if there is media in it
> 
> Gnome / KDE? does exactly that... (rather annoying too) ..  what app
> specificaly I don't know...
---
So I'm still wondering what the "approved and recommended" way is for a program
to be "automatically" informed of a CD or floppy change/insertion, and to be
informed of media 'type', w/o kernel warnings/error messages.  It sounds like
there is no kernel support for this so far?

Then it seems the less ideal question is: what is the "approved and recommended"
way for a program to "poll" such devices to check for 'changes' and 'media type'
without the kernel generating spurious WARNINGS/ERRORS?


-- 
L A Walsh| Trust Technology, Core Linux, SGI
[EMAIL PROTECTED]  | Voice: (650) 933-5338
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Annoying CD-rom driver error messages

2001-03-05 Thread LA Walsh

Alan Cox wrote:
> 
> > > this isnt a kernel problem, its a _very_ stupid app
> > ---
> >   Must be more than one stupid app...
> 
> Could well be. You have something continually trying to open your cdrom and
> see if there is media in it
---
Is there some feature they *should* be using instead to check for media
presence so I can forward it to their dev-team?
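
The closest thing I've turned up myself is the uniform cdrom layer's status
ioctls -- a minimal sketch, assuming <linux/cdrom.h> does what its names
suggest (the O_NONBLOCK is what lets the open succeed with no medium
present):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/cdrom.h>

int main(void)
{
    int fd = open("/dev/cdrom", O_RDONLY | O_NONBLOCK);

    if (fd < 0)
        return 1;

    switch (ioctl(fd, CDROM_DRIVE_STATUS, CDSL_CURRENT)) {
    case CDS_DISC_OK:
        /* medium present -- now ask what kind it is */
        switch (ioctl(fd, CDROM_DISC_STATUS, 0)) {
        case CDS_AUDIO:   puts("audio CD");   break;
        case CDS_DATA_1:
        case CDS_DATA_2:  puts("data disc");  break;
        case CDS_MIXED:   puts("mixed mode"); break;
        default:          puts("unknown disc type");
        }
        break;
    case CDS_NO_DISC:     puts("no disc");    break;
    case CDS_TRAY_OPEN:   puts("tray open");  break;
    default:              puts("status unavailable");
    }
    close(fd);
    return 0;
}

But that's still polling, of course -- it just shouldn't make the kernel spew.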

Thanks!
-l

-- 
L A Walsh| Trust Technology, Core Linux, SGI
[EMAIL PROTECTED]  | Voice: (650) 933-5338
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Annoying CD-rom driver error messages

2001-03-05 Thread LA Walsh

LA Walsh wrote:
> 
> > this isnt a kernel problem, its a _very_ stupid app
> ---
> Must be more than one stupid app...
> 
> xena:/var/log# rpm -q magicdev
> package magicdev is not installed
> xena:/var/log# locate magicdev
> xena:/var/log#
> xena:/var/log# rpm -qa |grep -i magic
> ImageMagick-5.2.6-4
---

Maybe the stupid app is 'freeamp'?  It only happens when I run it...:-(


-- 
L A Walsh| Trust Technology, Core Linux, SGI
[EMAIL PROTECTED]  | Voice: (650) 933-5338
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Annoying CD-rom driver error messages

2001-03-05 Thread LA Walsh

> this isnt a kernel problem, its a _very_ stupid app
---
Must be more than one stupid app...

xena:/var/log# rpm -q magicdev
package magicdev is not installed
xena:/var/log# locate magicdev
xena:/var/log#
xena:/var/log# rpm -qa |grep -i magic
ImageMagick-5.2.6-4



-- 
L A Walsh| Trust Technology, Core Linux, SGI
[EMAIL PROTECTED]  | Voice: (650) 933-5338
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Annoying CD-rom driver error messages

2001-03-05 Thread LA Walsh


Slightly less annoying -- when no CD is in the drive, I'm getting:

Mar  5 09:30:42 xena kernel: VFS: Disk change detected on device ide1(22,0)
Mar  5 09:31:17 xena last message repeated 7 times
Mar  5 09:32:18 xena last message repeated 12 times
Mar  5 09:33:23 xena last message repeated 13 times
Mar  5 09:34:24 xena last message repeated 12 times

(22,0 = /dev/hdc,cdrom)

Perturbing.

-l
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Annoying CD-rom driver error messages

2001-03-05 Thread LA Walsh

I have a music player program (freeamp) running, playing MP3's.  It has a
feature where it scans to see if a CD is in the drive and tries to look it up
in CDDB.  Well, I don't have a CD in the drive -- I have a DVD-ROM with a UDF file
system on it.  Freeamp doesn't complain, but in my syslog/warnings file, every 5
seconds I get:

Mar  5 09:17:00 xena kernel: hdc: packet command error: status=0x51 { DriveReady SeekComplete Error }
Mar  5 09:17:00 xena kernel: hdc: packet command error: error=0x50
Mar  5 09:17:00 xena kernel: ATAPI device hdc:
Mar  5 09:17:00 xena kernel:   Error: Illegal request -- (Sense key=0x05)
Mar  5 09:17:00 xena kernel:   Cannot read medium - incompatible format -- (asc=0x30, ascq=0x02)
Mar  5 09:17:00 xena kernel:   The failed "Read Subchannel" packet command was:
Mar  5 09:17:00 xena kernel:   "42 02 40 01 00 00 00 00 10 00 00 00 "

Needless to say, this fills up messages/warnings fairly quickly.  If there's no
DVD in the drive or if there is a CD in the drive, I don't notice this problem.

Seems like an undesirable feature for the kernel to write out 7-line error messages
every time a program polls for a CD and fails.  Is there a way to disable this when I
have a DVD-ROM disk in the drive? (vanilla 2.4.2 kernel).

Thanks...
-l


-- 
L A Walsh| Trust Technology, Core Linux, SGI
[EMAIL PROTECTED]  | Voice: (650) 933-5338
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



odd memory corruption problem

2001-02-22 Thread LA Walsh

I have a kernel driver that has a variable (surprise) 'audit_state'.  It's statically
initialized to 0 in the C code.  The only way it can get set on is if the audit modules
are loaded and one makes a system call to enable it.

There is no 'driver' initialization performed.

This code seemed to work in 2.2.17, but not in the 2.4.x series.

Somehow the 'audit_state' variable is being mysteriously set to '1' (which, with the
driver not loaded, causes less than perfect behavior).

So I started sprinkling "if (audit_state) BUG();" in various places in the code.
It fails during the pcnet32 driver initialization (compiled in vs. module).  That
in turn calls pci init code which calls net driver code.  That calls 'core/'
register_netdevice, which finally ends up calling run_sbin_hotplug in net/core/dev.c.
That tries to load the program /sbin/hotplug via call_usermodehelper in kmod.c.
That 'schedules' the task and things are still ok; then it goes down on the process
sem to wait until it has started.  The program it is trying to execute, "hotplug",
doesn't exist on my machine...ok, fine (the network interface seems to function just
fine).  The program doesn't exist, but when it gets back from the down(&sem), the
value of "audit_state" has changed to 1.

Any ideas why?  Not that I'm whining, but a good debugger with a 'watch' capability
would do wonders at this point.  I'm trying to figure out code that has nothing to
do with my driver -- just happens to be randomly stomping on a key variable.  

I suppose something could be stomping on the checks to see if the module is loaded
and something is randomly calling the system call to turn it on, but that seems like
a less likely path.  Note that the system hasn't even gotten up to the point of calling
the 'boot' script yet.

I get the same behavior in 2.4.0, 2.4.1 and 2.4.2 (was hoping some memory corruption
bug got fixed along the way).  

Meanwhile, guess it's on to more debugging, Linux style -- insert printk's.  How
quaint.
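
For anyone playing along at home, the "sprinkling" above is just a trivial
check macro, not a real watchpoint -- roughly this shape (printk flavor shown
so it doesn't kill the box; audit_state is my driver's variable):

#include <linux/kernel.h>

extern int audit_state;

/* scream with a location the moment the variable is seen non-zero */
#define CHECK_AUDIT_STATE()                                      \
    do {                                                         \
        if (audit_state)                                         \
            printk(KERN_ERR "audit_state clobbered at %s:%d\n",  \
                   __FILE__, __LINE__);                          \
    } while (0)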

Linda
-- 
L A Walsh| Trust Technology, Core Linux, SGI
[EMAIL PROTECTED]  | Voice: (650) 933-5338
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



interactive disk performance

2001-02-22 Thread LA Walsh

A problem that I seem to have noticed to some extent or another in the 2.4 series
is that while the elevator algorithm may achieve best disk bandwidth utilization,
it seems to be heavily at the expense of interactive use.

I was running a disk intensive program over nfs, so the nfsd's were quite busy --
usually 3/4 were in 'D' wait.

During this time, I tried to bring up the compose window for the email I am
writing.  It took over 2 minutes to come up.  The CPU was 66% idle, 31% in idled --
meaning it was fairly inactive -- everything was waiting on the disk.

I'm sure that the file the nfsd's were writing out was one long contiguous stream --
most of which could be coalesced into large multi-block writes.  Somehow it seems
that the multi-block writer was getting 1 block in, then more blocks kept coming
in so fast that the Q would only unplug every once in a while -- and maybe 1
block of an interactive request would go through.

I don't remember the exact timeout or max wait/sector while blocks are being
coalesced, but it seems it heavily favors the heavy disk user.

In Unix design, the CPU algorithm was designed to lower the priority of CPU
intensive tasks such that interactive use got higher priority for short bursts.

Maybe a process should have a disk (and maybe net, while we are at it) priority that
adjusts based on usage in the way the CPU algorithm adjusts -- then the block structure
could have an added 'priority' field recording what the process's priority was when it
wrote the block.  Thus even if a process goes away, the blocks still retain priority.

Then the elevator algorithm would sort not just by locality but also by weighting it
with the block's priority.  Perhaps it would be make-time or run-time configurable
whether to optimize for disk throughput or interactive usage.  Perhaps there could even
be a 'nice' value that allows the user to subjectively prioritize processes.
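
A sketch of the shape I have in mind, with made-up names -- not real 2.4
elevator code:

/* hypothetical: a request tagged with the submitter's priority */
struct prio_request {
    unsigned long sector;   /* target sector */
    int prio;               /* e.g. -20..19, inherited from the process */
};

#define PRIO_WEIGHT 1024    /* tunable: 0 gives today's pure-locality sort */

/* lower score = served earlier: seek distance, biased by priority */
static long elevator_score(const struct prio_request *rq,
                           unsigned long head_sector)
{
    long dist = (long)(rq->sector - head_sector);

    if (dist < 0)
        dist = -dist;
    return dist + (long)rq->prio * PRIO_WEIGHT;
}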

Possible?  Usefulness?

-l


-- 
L A Walsh| Trust Technology, Core Linux, SGI
[EMAIL PROTECTED]  | Voice: (650) 933-5338
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Linux stifles innovation...

2001-02-16 Thread LA Walsh

"David D.W. Downey" wrote:
> 
> Seriously though folks, look at who's doing this!
> 
> They've already tried once to sue 'Linux', were told they couldn't because
> Linux is a non-entity (or at least one that they can not effectively sue
> due to the classification Linux holds), ...
---
Not having a long memory on these things, do you have an article
or reference on this -- I'd love to read about that one.  Sue Linux?  For
what?  Competing?  

Perhaps by saying Open Source is a threat to the "American Way", they
mean they can't effectively 'sue', buy up or destroy it?  

-l

-- 
L A Walsh| Trust Technology, Core Linux, SGI
[EMAIL PROTECTED]  | Voice: (650) 933-5338
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



To Linus: kdb in 2.4?

2001-02-13 Thread LA Walsh

I'm wondering about the possibility of re-examining the idea of a kernel debugger
option distributed with 2.4.  

I'm thinking that it could be a great teaching tool to break and examine structures,
variables, process states, as well as an aid to people who may not have a grasp
of the entire kernel but need to write device drivers.

It's easy for someone who's "grown up" with Linux to know it all so thoroughly
that such a tool seems fluff.  But even the best mechanics on new cars use complex
diagnostic tools to do car repair.  Sure, there may be experts who designed the engine
who wouldn't need them, but large numbers of people need to repair cars or modify them
for their purposes.  Having tools to aid in that isn't so much a crutch as it is
a learning tool.  It's like being able to look at the characters of the alphabet
individually before one learns to comprehend the entirety of the writings of Buddha.

Certainly Buddha doesn't need to know how to read to know his own writings -- and
certainly, if everyone meditates and 'evolves' to their Buddha nature, they wouldn't
need to read the texts or recognize the letters either.  

But not everyone is at the same place on the mountain (or even the same mountain, for
that matter).

In wisdom, one would, I posit, understand others are in different places and may
find it useful to have tools to learn to read before they comprehend.  

Just my 2-4 cents on the matter...
-- 
L A Walsh| Trust Technology, Core Linux, SGI
[EMAIL PROTECTED]  | Voice: (650) 933-5338
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Block driver design issue

2001-02-12 Thread LA Walsh

I have a block driver I inherited and am working on that has a problem, and I
was wondering about cleaner solutions.

The driver can accept written characters from either userspace programs or from
the kernel.  From userspace it uses sys_write.  That in turn calls block_write.
There's almost 100 lines of duplicated code in a copy of the block_write
code in the driver, "block_writek", as well as duplicate code in audit_write vs.
audit_writek.  The only difference is down in block_write at the
"copy_from_user(p,buf,chars);", which becomes a "memcpy(p,buf,chars)" in the
"block_writek" version.

I find this duplication of code to be inefficient.  Is there a way to dummy up
the 'buf' address so that the "copy_from_user" will copy the buffer from kernel
space?  My assumption is that it wouldn't "just work" (which may also be an
invalid assumption).
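
The one trick I've seen mentioned for this kind of thing -- a sketch, under
the assumption it applies to block_write too; audit_writek here is my
hypothetical kernel-side entry point -- is to widen the "user" segment around
the call so copy_from_user() will accept a kernel pointer:

#include <linux/fs.h>
#include <asm/uaccess.h>

ssize_t audit_writek(struct file *file, const char *kbuf,
                     size_t count, loff_t *ppos)
{
    mm_segment_t old_fs = get_fs();
    ssize_t ret;

    set_fs(KERNEL_DS);              /* let copy_from_user() take kbuf */
    ret = block_write(file, kbuf, count, ppos);
    set_fs(old_fs);                 /* always restore the old limit */
    return ret;
}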

Suggestions?  Abuse?

Thanks!
-linda

-- 
L A Walsh| Trust Technology, Core Linux, SGI
[EMAIL PROTECTED]  | Voice: (650) 933-5338
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://vger.kernel.org/lkml/



question on comment in fs.h

2001-02-10 Thread LA Walsh

Excuse my ignorance, but in file include/linux/fs.h, 2.4.x source
in the struct buffer_head, there is a member:
unsigned short b_size;  /* block size */
later there is a member:
char * b_data;  /* pointer to data block (512 byte) */ 

Is the "(512 byte)" part of the comment in error, or do I misunderstand
the nature of 'b_size'?

-l

-- 
Linda A Walsh| Trust Technology, Core Linux, SGI
[EMAIL PROTECTED]  | Voice: (650) 933-5338
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



2.4.x Shared memory question

2001-02-04 Thread LA Walsh


Another oddity -- I notice things taking a lot more memory
in 2.4.  This coincides with 'top' consistently showing I have 0 shared
memory.  These two observations have me wondering if I
have somehow misconfigured my system to disallow sharing.  Note
that /proc/meminfo also shows 0 shared memory:

        total:     used:     free:  shared:  buffers:   cached:
Mem:  525897728 465264640  60633088        0  82145280 287862784
Swap: 270909440         0 270909440
MemTotal:   513572 kB
MemFree: 59212 kB
MemShared:   0 kB
Buffers: 80220 kB
Cached: 281116 kB
Active:  22340 kB
Inact_dirty:338996 kB
Inact_clean: 0 kB
Inact_target:0 kB
HighTotal:   0 kB
HighFree:0 kB
LowTotal:   513572 kB
LowFree: 59212 kB
SwapTotal:  264560 kB
SwapFree:   264560 kB 

Not that it necessarily seems related, but I do have a filesystem of type shm
mounted on /dev/shm, as suggested for POSIX shared memory.


-- 
Linda A Walsh| Trust Technology, Core Linux, SGI
[EMAIL PROTECTED]  | Voice: (650) 933-5338
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



2.4.2-test1 better on disk lock/freezups

2001-02-04 Thread LA Walsh

In trying to apply Jens's patch I upgraded to 2.4.2-pre1.  The figures on it (242-p1)
look better at this point: a vmstat dump, same data...notice this time it only took
maybe 45 seconds to write out the data.  I also got better interactive performance.
So write speed is up to about 3.5MB/s.  Fastest reads using 'hdparm' are in the
12-14MB/s range.  Sooo...IDE hdparm block dev read vs. file writes...a 3-4:1 ratio?

I honestly have little clue as to what would be considered 'good' numbers.

Note the maximum 'system freeze' seems under 10 seconds now -- a lot more
tolerable.

Note also, this was without my applying Jens's patch -- as I could not figure out how
to get it to apply cleanly  :-(.


 0  0  0  0  77564  80220 280164   0   0 0   348  287  1367  10   7  83
 0  0  1  0  77560  80220 280164   0   0 0   304  193   225   0   1  99
 0  1  1  0  77572  80220 280156   0   0 0   162  241   354   4   2  95
 0  1  1  0  77572  80220 280156   0   0 0   156  218   182   0   1  99
 1  1  1  0  77560  80220 280164   0   0 0   165  217   218   0   1  99
 0  1  1  0  77328  80220 280164   0   0 0   134  213   215   1   1  97
 0  1  1  0  77328  80220 280164   0   0 0   138  217   177   0   1  98
 0  1  1  0  77328  80220 280164   0   0 0   206  215   178   0   1  99
 0  1  1  0  77332  80220 280164   0   0 0   166  219   206   1   1  98
 0  0  0  0  85632  80220 280172   0   01412  192   360   1   1  98
 
-- 
Linda A Walsh| Trust Technology, Core Linux, SGI
[EMAIL PROTECTED]  | Voice: (650) 933-5338
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: System unresponsitive when copying HD/HD

2001-02-04 Thread LA Walsh

Alan Cox wrote:
> But try 2.4.1 before worrying too much. That fixed a lot of the block
> performance problems I was seeing (2.4.1 ruins the VM performance under paging
> loads but the I/O speed is fixed ;))

---
Seems to have gotten a bit worse.  Vmstat output after 'vmware' had completed its
write -- but the system was unresponsive, still writing out a 155M file...

 1  0  0  0 113960  47528 277152   0   0 0 0  397   861   1  24  75
 1  0  0  0 114060  47560 277152   0   0 4   350  432  1435   4  17  79
 0  0  1  0 127380  47560 266196   0   0 0   516  216   435   7   3  90
 1  0  1  0 127380  47560 266196   0   0 0   240  203   173   0   1  99
 0  0  1  0 127380  47560 266196   0   0 0   434  275   180   0   2  98
 1  0  1  0 127376  47560 266196   0   0 0   218  204   173   0   2  98
 0  0  1  0 127376  47560 266196   0   0 0   288  203   174   0   0 100
 0  0  1  0 127376  47560 266196   0   0 0   337  230   176   0   1  99
 0  0  1  0 127376  47560 266196   0   0 0   267  241   177   0   1  99
 0  0  1  0 127376  47560 266196   0   0 0   210  204   173   0   1  99
 0  0  1  0 127376  47560 266196   0   0 0   204  203   173   0   1  99
 0  0  1  0 127376  47560 266196   0   0 0   216  212   250   0   1  99
 0  0  1  0 127376  47560 266196   0   0 0   208  205   172   0   2  98
 0  0  1  0 127372  47560 266196   0   0 0   225  203   160   0   2  98
 0  0  1  0 127372  47560 266196   0   0 0   316  214   212   0   1  99
 1  0  1  0 127144  47560 266196   0   0 0   281  218   304   1   2  96
 0  0  0  0 127144  47560 266196   0   0 0 1  161   240   1   0  99
 0  0  0  0 127144  47560 266196   0   0 0 0  101   232   0   1  99 
---
What is the meaning of having a process in the 'w' column?  On other
systems, I was used to that meaning an executable had been *swapped* out completely
(as opposed to no pages mapped in) and that it meant your system vm was 'thrashing'.
But that obviously isn't the case here.

Those columns are output from a 'vmstat 5'.  Meaning it took about 70 seconds
to write out 158M.  Or about 2.2M/s.  That's probably not bad.  It still locks
up the system for over a minute though -- which is really undesirable performance
for interactive use.  I'm guessing the vmstat output numbers are showing 4K? 8K? 
blocks?  8K would about make sense for the 2.2M average.

-- 
Linda A Walsh| Trust Technology, Core Linux, SGI
[EMAIL PROTECTED]  | Voice: (650) 933-5338
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: System unresponsitive when copying HD/HD

2001-02-03 Thread LA Walsh

I've noticed slower disk response on 2.4.0 vs. 2.2.17.  For example --
I run vmware and suspend it frequently when I'm not using it.  One of them requires
a 158Mb save file.  Before, I could suspend that one, then start another which
reads in a smaller 50M save file.  The smaller one would come up while the other
was still saving.  As of 2.4, the smaller one doesn't come up -- I can't even do
an 'ls' until the big save finishes.  

Now the big image program has actually exited and I can close the window -- the disk
writes are going on from the disk cache, with 'kupdate' taking some minor fraction (<1%)
of the CPU and the rest of the system being mostly idle.

If I have vmstat running, I notice blocks trickling out to the disk, 5sec averages
495,142,151,155,136,257,15,0.  Note that the maximum read rate (hdparm -t) of this
disk is in the 12-14M/s range.  I'm getting about 1-5% of that on output with the
system's disk subsystem being apparently unable to do anything else.

This is with IDE hard disk with DMA enabled.

a) is this expected performance on a large linear write?  
b) should I expect other disk operations to be denied service as long as
the write is 'flushing'?

-l
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: Power usage Q and parallel make question (separate issues)

2001-02-01 Thread LA Walsh

Keith Owens wrote:
> 
> On Wed, 31 Jan 2001 19:02:03 -0800,
> LA Walsh <[EMAIL PROTECTED]> wrote:
> >This seems to serialize the delete, run the mod-installs in parallel, then run the
> >depmod when they are done.
> 
> It works, until somebody does this
> 
>  make -j 4 modules modules_install
---
But that doesn't work now.  

> There is not, and never has been, any interlock between make modules
> and make modules_install.  If you let modules_install run in parallel
> then people will be tempted to issue the incorrect command above
> instead of the required separate commands.
---

> 
>  make -j 4 modules
>  make -j 4 modules_install
> 
> You gain a few seconds on module_install but leave more room for user
> error.
---
A bit of documentation at the beginning of the Makefile would do wonders
for kernel-developer (not end user, please!) clarity.  I've oft asked the question
as to what really is supported.  I've tried things like 'make dep bzImage modules' --
I noticed fairly quickly that it didn't work.  Same with modules/modules_install --
people would probably figure that one out, but just a bit of documentation would
help even that.



-- 
Linda A Walsh| Trust Technology, Core Linux, SGI
[EMAIL PROTECTED]  | Voice: (650) 933-5338
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: Power usage Q and parallel make question (separate issues)

2001-01-31 Thread LA Walsh

Keith Owens wrote:
>
> The only bit that could run in parallel is this one.
> 
> .PHONY: $(patsubst %, _modinst_%, $(SUBDIRS))
> $(patsubst %, _modinst_%, $(SUBDIRS)) :
> $(MAKE) -C $(patsubst _modinst_%, %, $@) modules_install
> 
> The erase must be done first (serial), then make modules_install in
> every subdir (parallel), then depmod (serial).
---
Right...Wouldn't something like this work?  (Seems to)
--- Makefile.oldWed Jan 31 18:57:21 2001
+++ MakefileWed Jan 31 18:54:53 2001
@@ -351,8 +351,12 @@
 $(patsubst %, _mod_%, $(SUBDIRS)) : include/linux/version.h include/config/MARKER
$(MAKE) -C $(patsubst _mod_%, %, $@) CFLAGS="$(CFLAGS) $(MODFLAGS)" MAKING_MODULES=1 modules
 
+modules_inst_subdirs: _modinst_
+   $(MAKE) $(patsubst %, _modinst_%, $(SUBDIRS))
+
+
 .PHONY: modules_install
-modules_install: _modinst_ $(patsubst %, _modinst_%, $(SUBDIRS)) _modinst_post
+modules_install: _modinst_post
 
 .PHONY: _modinst_
 _modinst_:
@@ -372,7 +376,7 @@
 depmod_opts:= -b $(INSTALL_MOD_PATH) -r
 endif
 .PHONY: _modinst_post
-_modinst_post: _modinst_post_pcmcia
+_modinst_post: _modinst_post_pcmcia modules_inst_subdirs
if [ -r System.map ]; then $(DEPMOD) -ae -F System.map $(depmod_opts) $(KERNELRELEASE); fi
 
 # Backwards compatibilty symlinks for people still using old versions  
---
This seems to serialize the delete, run the mod-installs in parallel, then run the
depmod when they are done.  
-- 
Linda A Walsh| Trust Technology, Core Linux, SGI
[EMAIL PROTECTED]  | Voice: (650) 933-5338
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Power usage Q and parallel make question (separate issues)

2001-01-31 Thread LA Walsh

I remember reading some time back that on a pentium the difference between a
pentium in HLT vs. running was about 2-3 watts vs. 15-20 watts.  Does anyone
know the difference for today's CPU's?  P-III/P-IV or other archs?

How about the difference when calling the BIOS power-save feature?  With
the threat of rolling blackouts here in CA, I was wondering what the power
consumption might be of a 100,000 or 1,000,000 CPU's in HLT vs. doing complex
mathematical computation?

Separately -- Parallel Make's
--===
So, just about anyone I know uses 'make -j X [-l Y] bzImage modules', but I noticed that
make modules_install isn't parallel safe in 2.4 -- since it takes much longer than the
old one, it would make sense to want to run it in parallel as well, but it has a
delete-old, install-new, index-new-for-deps sequence.  Those "3" steps can't be done
in parallel safely.  Was this intentional, or would a 'fix' be desired?

Is it the intention of the Makefile maintainers to allow a parallel or distributed
make?  I know for me it makes a noticeable difference even on a 1 CPU machine
(CPU overlap with disk I/O), and with multi-CPU machines, it's even more noticeable.

Is a make of the kernel and/or the modules designed to be parallel safe?  Is it 
something I should 'rely' on?  If it isn't, should it be?

-l

-- 
Linda A Walsh| Trust Technology, Core Linux, SGI
[EMAIL PROTECTED]  | Voice: (650) 933-5338
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: seti@home and es1371

2001-01-31 Thread LA Walsh

Try "freeamp".  It uses darn close to 0 CPU and may not be affected by setiathome.
2nd -- renice setiathome to '19' -- you only want it to use up 'background' CPU time
anyway.



Rainer Wiener wrote:
> 
> Hi,
> 
> I hope you can help me. I have a problem with my on board soundcard and
> seti. I have a Gigabyte GA-7ZX Creative 5880 sound chip. I use the kernel
> driver es1371 and it works goot. But when I run seti@home I got some noise
> in my sound when I play mp3 and other sound. But it is not every time 10s
> play good than for 2 s bad and than 10s good 2s bad and so on. When I kill
> seti@home every thing is ok. So what can I do?
> 
> I have a Athlon 800 Mhz and 128 MB RAM

-- 
Linda A Walsh| Trust Technology, Core Linux, SGI
[EMAIL PROTECTED]  | Voice: (650) 933-5338
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



2.4 IDE slowdown (misconfigure)

2001-01-24 Thread LA Walsh

This seems to have fixed the 66% slowdown -- disk speeds w/hdparm.  They are
reading in the same range.

For others -- my problem was that I upgraded from a 2.2.x config -- I
thought 'make xconfig' would add additional new params as needed as
'make config' does.  Guess I thought wrong.  

Thanks, Andre, for the quick help/fix!

-linda


> -Original Message-
> From: Andre Hedrick [mailto:[EMAIL PROTECTED]]
> Sent: Tuesday, January 23, 2001 11:40 PM
> To: Linda Walsh
> Subject: Forwarded mail
> 
> 
> 
> CONFIG_BLK_DEV_IDEDMA_PCI=y
> was
> CONFIG_BLK_DEV_IDEDMA=y
> 
> Added a few missing
> 
> 
> Andre Hedrick
> Linux ATA Development
> 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



2.4 cpu usage...

2001-01-23 Thread LA Walsh

I decided to put 2.4 on my laptop.  After getting config issues seemingly
sorted out, I still have some things I can't explain.  VMware seems to run
about 30% slower.  X was even sluggish at times.  When I'm doing 'nothing',
top shows about 67% IDLE and 30% in 'system time'.  I notice that
the process "kapm-idled" is being counted as receiving a lot of CPU time.
Now it could make some sense that kapm-idled is getting 30% of the time,
but then there's the remaining 67% that came up idle.

I shut down X -- then top showed 5% idle and 95% in "kapm-idled"  (and
95% system time) which could still make sense but is probably not the output
you want to see when your computer is really idle.

So the kapm thing could be a "display"/accounting problem, but the
slowdown in vmware/X was real.  I ran a WIN Norton "Benchmark" -- it comes
up reliably over 300 -- usually around 320-350 -- under 2.2.17.  Under
2.4, it came up reliably *under* 300, with typical being about 265.

So...I'm bummed.  I'm assuming a 30% degradation in an app is probably
not expected behavior?  Swap usage is '0' in both OS's (i.e. it's not
a run out of memory issue).

-l

--
L A Walsh| Trust Technology, Core Linux, SGI
[EMAIL PROTECTED]  | Voice/Vmail: (650) 933-5338

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



RE: Is sendfile all that sexy?

2001-01-21 Thread LA Walsh

FYI -
Another use sendfile(2) might be put to.  Suppose you were to generate
large amounts of data in the kernel -- maybe kernel profiling data, audit
data, whatever.

You want to pull that data out as fast as possible and write it to
a disk or network socket.  Normally, I think you'd do a "read/write" that
would xfer the data into user space, then write it back to the target
in system space.  With sendfile, it seems, one could write a dump-daemon
that used sendfile to dump the data directly out to a target file descriptor
w/o it going through user space.

Just make sure the internal 'raw' data is massaged into the format
of a block device and voila!  A side benefit would be that data in the
kernel that is written to the block device would be 'queued' in the
block buffers, marked 'dirty', and in need of being written out.
The device driver marks the buffers as clean once they are pushed out
of a fd by doing a 'seek' to a new (later) position in the file -- whole
buffers before that point are marked 'clean' and freed.

Seems like this would have the benefit of reusing an existing
buffer management system for buffering while also using a single-copy
to get data to the target.
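
A userland sketch of the dump-daemon half, with a made-up /dev/audit-dump
standing in for the data source (and glossing over the requirement that the
input fd look enough like a regular file for sendfile to accept it):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/sendfile.h>

#define CHUNK (256 * 1024)

int main(void)
{
    int in  = open("/dev/audit-dump", O_RDONLY);        /* hypothetical */
    int out = open("/var/log/audit.bin",
                   O_WRONLY | O_CREAT | O_APPEND, 0600);
    ssize_t n;

    if (in < 0 || out < 0) {
        perror("open");
        return 1;
    }
    /* NULL offset: the kernel advances the input position itself */
    while ((n = sendfile(out, in, NULL, CHUNK)) > 0)
        ;                           /* data never visits a user buffer */
    if (n < 0)
        perror("sendfile");
    return 0;
}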

???
-l
--
L A Walsh| Trust Technology, Core Linux, SGI
[EMAIL PROTECTED]  | Voice/Vmail: (650) 933-5338


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



RE: Linus's include file strategy redux

2000-12-15 Thread LA Walsh

> From: Werner Almesberger [mailto:[EMAIL PROTECTED]]
> Sent: Friday, December 15, 2000 1:21 PM
> I don't think restructuring the headers in this way would cause
> a long period of instability. The main problem seems to be to
> decide what is officially private and what isn't.
---
If someone wants to restructure headers, that's fine.  I was only
trying to understand the confusingly stated intentions of Linus.  I 
was attempting to fit into those intentions, not change the world.  

> > Any other solution, as I see it, would break existing module code.
> 
> Hmm, I think what I've outlined above wouldn't break more code than
> your approach. Obviously, modiles currently using "private" interfaces
> are in trouble either way.
---
You've misunderstood.  My approach would break *nothing*.

If a module-public include file includes a private one, it would still
work, since 'sys' would be a directory under 'include/linux'.  No new
links need be added or referenced.  Thus nothing breaks.

-l
 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



RE: Linus's include file strategy redux

2000-12-15 Thread LA Walsh


> From: Werner Almesberger [mailto:[EMAIL PROTECTED]]
>
> I think there are three possible directions wrt visibility of kernel
> headers:
>
>  - non at all - anything that needs kernel headers needs to provide them
>itself
>  - kernel-specific extentions only; libc is self-contained, but user
>space can get items from .../include/linux (the current glibc
>approach)
>  - share as much as possible; libc relies on kernel for "standard"
>definitions (the libc5 approach, and also reasonably feasible
>today)
>
> So we get at least the following levels of visibility:
>
>  0) kernel-internal interfaces; should only be visible to "base" kernel
>  1) public kernel interfaces; should be visible to modules (exposing
> type 0 interfaces to modules may create ways to undermine the GPL)
>  2) interfaces to kernel-specific user space tools (modutils, mount,
> etc.); should be visible to user space that really wants them
>  3) interface to common non-POSIX extensions (BSD system calls, etc.);
> should be visible to user space on request, or on an opt-out basis
>  4) interfaces to POSIX elements (e.g. struct stat, mode_t); should be
> visible unconditionally (**)
---
The problem came up in a case where I had a kernel module that included
the standard memory allocation header <linux/malloc.h>.  That file, in turn,
included <linux/slab.h>, and that pulled in still more headers.  From
there more and more files were included until it got down to files in
a kernel/kernel-module-only directory.  It was at that
point the externally compiled module "barfed", because it expected, like many
externally compiled modules, that it could simply
access all of its needed files through /usr/include/linux, which it gets
by putting /usr/include in its path.  I've seen commercial modules like
vmware's kernel modules use a similar system, where they expect
/usr/include/linux to contain or point to headers for the currently running
kernel.

So I'm doing my compile in a 'chrooted' environment where the headers
for the new kernel are installed.  However, now, with the new include/kernel
dir in the linux kernel, modules compiled separately out of the kernel
tree have no way of finding hidden kernel include files -- even though
those files may be needed for modules.  Precisely -- in this case, "memory
allocation" for the kernel (not userland) was needed.  Arguably, this
belongs(ed)
in a kernel-only directory.  If that directory is not /usr/include/linux or
*under* /usr/include/linux, then modules need a separate way to find it --
namely a new link in /usr/include() to point to the new location,
or we move the internal kernel interfaces to something under the current
 so while the intent of "kernel-only" is made clear, they
are still accessible in the way they already are, thus not requiring
rewrites
of all the existing makefiles.


I think in my specific case, perhaps, linux/malloc.h *is* a public
interface that is to be included by module writers and belongs in the
'public' interface dir -- and that's great.  But it includes files like
'slab.h' which are kernel-mm-specific and may change in the future.  Those
files should be in the private interface dir.  But that dir may still need
to be included by the public interface (malloc) file.

So the user should/needs to be blind to how that is handled.  They
shouldn't have to change their makefiles or add new links just because
how 'malloc' implements its functionality changes.  This would imply that
kernel-only interfaces need to be include-able within the current
model -- just moved out of the existing "public-for-module" interface
directory (/usr/include/linux).  For that to happen transparently, that
directory needs to exist under the current hierarchy (under
/usr/include/linux), not parallel to it.

So at that point it becomes what we should name it under
/usr/include/linux.  Should it be:

1) "/usr/include/linux/sys" (my preference)
2) "/usr/include/linux/kernel"
3) "/usr/include/linux/private"
4) "/usr/include/linux/kernel-only"
5) 

???

Any other solution, as I see it, would break existing module code.

Comments??  Any preferences from /dev/linus?

Any flaws in my logic chain?

tnx,
-linda

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



RE: Linus's include file strategy redux

2000-12-14 Thread LA Walsh

> Huh?
> % ls -ld /usr/include/linux
> drwxr-xr-x6 root root18432 Sep  2 22:35
> /usr/include/linux/
>
> > So if we create a separate /usr/src/linux/include/kernel dir, does that
> > imply that we'll have a 2nd link:
>
> What 2nd link? There should be _no_ links from /usr/include to the
> kernel tree. Period. Case closed.
---

> ll -d /usr/include/linux
lrwxrwxrwx   1 root root   26 Dec 25  1999 /usr/include/linux ->
../src/linux/include/linux/
---

I've seen this setup on RH, SuSE and Mandrake systems.  I thought
this was somehow normal practice?


> Stuff in /usr/include is private libc copy extracted from some kernel
> version. Which may have _nothing_ to the kernel you are developing for.
> In the situation above they should have
> -I/include
> in CFLAGS. Always had to. No links, no pain in ass, no interference with
> userland compiles.
>
> IOW, let them fix their Makefiles.
---

Why would Linus want two separate directories -- one for 'kernel-only'
include files and one for kernel files that may be included in user
land?  It seems to me, if /usr/include/linux was normally a separate
directory there would be no need for him to mention a desire to create
a separate kernel-only include directory, so my assumption was the
linked behavior was somehow 'normal'.

I think many source packages only use "-I /usr/include" and
make no provision for compiling against kernel header files in
different locations that need to be entered by hand. It is difficult
to create an automatic package regeneration mechanism like RPM if such
details need to be entered for each package.

So what you seem to be saying, if I may rephrase, is that
the idea of automatic package generation for some given kernel is
impractical because users should be expected to edit each package
makefile for their own setup with no expectation from the packages
designers of a standard kernel include location?

I'm not convinced this is a desirable goal.

:-/
-linda



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Linus's include file strategy redux

2000-12-14 Thread LA Walsh

So, I brought up the idea of a linux/sys for kernel level include files.

A few other people expressed a desire for a 'kernel' dir under
include, parallel w/linux.


So I ran into a snag with that scenario.  Let's suppose we have
a module developer or a company developing a driver in their own
/home/nvidia/video/drivers/newcard directory.  Now they need to include
kernel development files and are used to just doing the usual:
#include <linux/...>

Which works because in a normal compile environment they have /usr/include
in their include path and /usr/include/linux points to the directory
under /usr/src/linux/include.

So if we create a separate /usr/src/linux/include/kernel dir, does that
imply that we'll have a 2nd link:

/usr/include/kernel ==> /usr/src/linux/include/kernel  ?

If the idea was to 'hide' kernel interfaces and make them not 'easy'
to include, doesn't providing a 2nd link defeat that?

If we don't provide a 2nd link, how do module writers access kernel
includes?

If the kernel directory is under 'linux' (as in linux/sys), then the
link is already there and we can just say 'don't use sys in apps'.  If
we create 'kernel' under 'include', it seems we'll still end up having to
tell users "don't include files under directory "x"' (either kernel/ or
linux/sys/)

Note that putting kernel as a new directory parallel to linux requires
adding another symlink -- so is that solving anything or adding more
administrative "gotcha's"?

-linda

--
L A Walsh| Trust Technology, Core Linux, SGI
[EMAIL PROTECTED]  | Voice/Vmail: (650) 933-5338

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



include conventions /usr/include/linux/sys ?

2000-11-22 Thread LA Walsh

Linus has mentioned a desire to move kernel internal interfaces into
a separate kernel include directory.  In creating some code, I'm wondering
what the name of this should/will be.  Does it follow that convention
would point toward a linux/sys directory?
-l

--
L A Walsh| Trust Technology, Core Linux, SGI
[EMAIL PROTECTED]  | Voice/Vmail: (650) 933-5338

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



RE: IDE0 /dev/hda performance hit in 2217 on my HW - more info - maybe extended partitions

2000-11-14 Thread LA Walsh

It seems to be the output of vmstat that isn't matching things.  First it
says it's getting near 10M/s, but if you divide 128M by 27 seconds, it's more
like 4.7M/s.  So where is the time being wasted?  It's not in CPU either.

Now let's look at hda7, where vmstat reported 2-3MB/sec.  Again, the math
says it's a rate near 5.  So it still doesn't make sense.



> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED]]On Behalf Of Andries Brouwer
> Sent: Monday, November 13, 2000 4:59 PM
> To: LA Walsh
> Cc: lkml
> Subject: Re: IDE0 /dev/hda performance hit in 2217 on my HW - more info
> - maybe extended partitions
>
>
> On Mon, Nov 13, 2000 at 03:47:27PM -0800, LA Walsh wrote:
>
> > Some further information in response to a private email, I did
> hdparm -ti
> > under both
> > 2216 and 2217 -- they are identical -- this may be something weird
> > w/extended partitions...
>
> What nonsense. There is nothing special with extended partitions.
> Partitions influence the logical view on the disk, but not I/O.
>
> (But the outer rim of a disk is faster than the inner side.)
>
> Moreover, you report elapsed times
> 0:27, 0:22, 0:24, 0:28, 0:21, 0:24, 0:27
> where is this performance hit?
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [EMAIL PROTECTED]
> Please read the FAQ at http://www.tux.org/lkml/
>

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



RE: IDE0 /dev/hda performance hit in 2217 on my HW

2000-11-14 Thread LA Walsh

According to hdparm, dma was already on.  It was also suggested I try setting
32-bit mode and multcount (which I had tried before without noticing much
difference).  Here are the current settings and results.  Note that the timings
still don't make a lot of sense when compared to the vmstat numbers.  All
transfers were 256M (bs=256k, count=1k).

/dev/hda:
 multcount= 16 (on)
 I/O support  =  1 (32-bit)
 unmaskirq=  0 (off)
 using_dma=  1 (on)
 keepsettings =  0 (off)
 nowerr   =  0 (off)
 readonly =  0 (off)
 readahead=  8 (on)
 geometry = 3278/240/63, sectors = 49577472, start = 0

   procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 0  0  0   1004   3028 436452  11372   0   1  133118  338   757   3  17  80
 0  0  0   1004   3020 436456  11372   0   0 0 1  103   166   0   1  99
/dev/hda
 1  0  0   1004   2932 436464  11420   0   0 2 1  103   166   0   1  99
 1  0  0   1004   2276 432752  11488   0   0 13751 1  319   594   0  12  88
 0  2  0   1004   2704 428192  11456   0   0 11751 2  286   529   0  14  86
 1  0  0   1004   2764 423784  11456   0   0 12685 4  303   557   0  13  87
 1  0  0   1004   3124 418472  11456   0   0 14144 0  323   597   0  18  82
1024+0 records in
1024+0 records out
0.01user 2.60system 0:20.13elapsed 12%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (105major+76minor)pagefaults 0swaps
/dev/hda1
 3  0  0   1004   2772 414760  11456   0   0 11699 1  285   530   0  11  89
 0  1  0   1004   2828 411688  11328   0   0  9037 0  242   439   0  11  89
 1  0  0   1004   2528 411016  11296   0   0  2854 0  146   253   0   2  98
 1  0  0   1004   2208 409680  10840   0   0 11366 0  279   511   0  13  87
 2  0  0   1004   2344 409584  10808   0   0 13542 0  313   588   0  17  83
1024+0 records in
1024+0 records out
0.01user 2.55system 0:26.65elapsed 9%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (104major+76minor)pagefaults 0swaps
/dev/hda3
 2  0  0   1004   2560 409160  11024   0   0 12850 1  308   568   0  16  84
 0  1  0   1004   2832 408904  11024   0   0  8346 1  232   424   0  11  89
 1  0  0   1004   2560 409160  11024   0   0 13568 0  313   583   0  10  90
 2  0  0   1004   2440 409288  11024   0   0 13952 0  320   597   0  22  78
1024+0 records in
1024+0 records out
0.00user 2.81system 0:21.34elapsed 13%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (105major+76minor)pagefaults 0swaps
/dev/hda4
 1  0  0   1004   2308 410064  11132   0   0  8524 1  275   508   0  12  88
 2  0  0   1004   2096 412124  11124   0   0  2317 1  246   454   0  10  90
 1  0  0   1004   2684 413788  11124   0   0  2406 0  252   456   0   9  91
 2  0  0   1004   2564 416376  11096   0   0  2496 0  257   476   0  10  90
 1  0  1   1004   3104 418168  11096   0   0  2470 0  255   464   0   8  92
   procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 1  0  0   1004   2884 420344  11096   0   0  2304 1  246   455   0   7  93
1024+0 records in
1024+0 records out
0.00user 2.06system 0:27.79elapsed 7%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (104major+76minor)pagefaults 0swaps
/dev/hda5
 2  0  0   1004   2576 423288  11096   0   0  2880 1  282   521   0  10  89
 1  0  0   1004   2900 425976  11096   0   0  3123 1  297   555   0  11  89
 2  0  0   1004   2164 430124  10916   0   0  3174 0  300   549   0  15  85
 1  0  0   1004   2048 431724  10856   0   0  3072 0  294   548   0  11  89
1024+0 records in
1024+0 records out
0.00user 2.19system 0:21.32elapsed 10%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (104major+76minor)pagefaults 0swaps
/dev/hda6
 2  0  0   1004   2556 432488  10944   0   0  2781 1  278   511   1  10  89
 2  0  0   1004   2104 434284  10944   0   0  3098 1  296   542   0  11  88
 2  0  0   1004   2572 435432  10944   0   0  3174 0  300   564   0  11  89
 1  0  0   1004   3144 435048  10944   0   0  3046 0  292   536   0  12  88
1024+0 records in
1024+0 records out
0.02user 2.15system 0:21.50elapsed 10%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (105major+76minor)pagefaults 0swaps
/dev/hda7
 2  0  0   1004   2556 435672  10944   0   0  3020 1  290   549   0  12  88
 1  0  0   1004   3108 435316  10916   0   0  2278 1  244   441   0   7  93
 2  0  0   1004   2588 436088  10912   0   0  2906 0  283   528   0  10  90
 0  1  0   1004   2324 436596  10908   0   0  2316 0  247   444   0   8  92
 2  0  0   1004   2140 437248  10904   0   0  2893 1  283   527   0  10  90
1024+0 records in
1024+0 records out
0.01user 1.94system 0:24.62elapsed 7%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (104major+76minor)pagefaults 0swaps
 0  0  0   1004   2416 437724  10812   0   0  1920 1  221   399   0   5

RE: IDE0 /dev/hda performance hit in 2217 on my HW - more info - maybe extended partitions

2000-11-13 Thread LA Walsh

It seems to be the output of vmstat that isn't matching things.  First it
says it's getting near 10M/s, but if you divide 128M by 27 seconds, it's more
like 4.7M/s.  So where is the time being wasted?  It's not in CPU either.

Now I look at hda7, where vmstat reported 2000-3000 blocks/sec.  Again, the
math says it's a rate near 5M/s.  So it still doesn't make sense.



> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED]]On Behalf Of Andries Brouwer
> Sent: Monday, November 13, 2000 4:59 PM
> To: LA Walsh
> Cc: lkml
> Subject: Re: IDE0 /dev/hda performance hit in 2217 on my HW - more info
> - maybe extended partitions
>
>
> On Mon, Nov 13, 2000 at 03:47:27PM -0800, LA Walsh wrote:
>
> > Some further information in response to a private email, I did
> hdparm -ti
> > under both
> > 2216 and 2217 -- they are identical -- this may be something weird
> > w/extended partitions...
>
> What nonsense. There is nothing special with extended partitions.
> Partitions influence the logical view on the disk, but not I/O.
>
> (But the outer rim of a disk is faster than the inner side.)
>
> Moreover, you report elapsed times
> 0:27, 0:22, 0:24, 0:28, 0:21, 0:24, 0:27
> where is this performance hit?
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [EMAIL PROTECTED]
> Please read the FAQ at http://www.tux.org/lkml/
>

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



RE: IDE0 /dev/hda performance hit in 2217 on my HW - more info - maybe extended partitions

2000-11-13 Thread LA Walsh

Some further information in response to a private email: I did hdparm -ti
under both 2216 and 2217 -- they are identical -- so this may be something
weird w/extended partitions...

/dev/hda:
 multcount=  0 (off)
 I/O support  =  0 (default 16-bit)
 unmaskirq=  0 (off)
 using_dma=  1 (on)
 keepsettings =  0 (off)
 nowerr   =  0 (off)
 readonly =  0 (off)
 readahead=  8 (on)
 geometry = 3278/240/63, sectors = 49577472, start = 0

 Model=IBM-DARA-225000, FwRev=SHAOA50A, SerialNo=SQASQ023976
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4
 BuffType=3(DualPortCache), BuffSize=418kB, MaxMultSect=16, MultSect=off
 DblWordIO=no, OldPIO=2, DMA=yes, OldDMA=2
 CurCHS=17475/15/63, CurSects=16513875, LBA=yes, LBAsects=49577472
 tDMA={min:120,rec:120}, DMA modes: mword0 mword1 mword2
 IORDY=on/off, tPIO={min:240,w/IORDY:120}, PIO modes: mode3 mode4
 UDMA modes: mode0 mode1 *mode2 mode3 mode4
 Drive Supports : ATA/ATAPI-4 T13 1153D revision 17 : ATA-1 ATA-2 ATA-3
ATA-4
---
Speed comparisons, 2216:
 Timing buffered disk reads:  64 MB in  4.61 seconds = 13.88 MB/sec
 Timing buffered disk reads:  64 MB in  4.65 seconds = 13.76 MB/sec
 Timing buffered disk reads:  64 MB in  4.69 seconds = 13.65 MB/sec
2217:
 Timing buffered disk reads:  64 MB in  4.59 seconds = 13.94 MB/sec
 Timing buffered disk reads:  64 MB in  4.63 seconds = 13.82 MB/sec
 Timing buffered disk reads:  64 MB in  4.56 seconds = 14.04 MB/sec

-

After rebooting several times, I can get equally bad performance on both.  :-(

Here's the key.  I read from /dev/hda, hda1, {hda4, hda5, hda6, hda7}, hda3.

The performance in reading from a, a1 and a3 is near or above 10M/s -- but
in the "Extended" partition, rates from 4-7 are all under 3M/s.  So what's
the deal?  Why do extended partitions drop performance?  Here's the log.  I did
dd's of if=device of=/dev/null, bs=128k count=1k.  Timings are interwoven with
vmstat output:
   procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 1  0  0   1928   3188 432352  10976   1   3  3111 3  183   424   3   6  91
 0  0  0   1928   3448 432352  10984   0   0 1 0  125   352   1   1  98
 0  0  0   1928   3356 432352  11016   0   0 1 3  107   180   0   0  99
/dev/hda
 1  0  0   1928   2068 433716  10984   0   0 12597 3  302   598   0  11  89
 1  0  0   1928   2196 433600  10972   0   0  6810 0  208   388   0   6  94
 0  1  0   1928   2132 433668  10968   0   0  8806 0  239   454   0  12  88
 0  1  0   1928   2132 433668  10968   0   0  5914 0  193   357   0   4  96
 2  0  0   1928   2100 430184  10484   0   0 12365 0  295   558   0  12  88
1024+0 records in
1024+0 records out
0.01user 2.31system 0:27.43elapsed 8%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (104major+76minor)pagefaults 0swaps
/dev/hda1
 0  2  0   2572   2120 426948  10268   0 129 1180533  292   544   0  14  86
 0  1  0   2572   2940 422320  10268   0   0 10972 0  275   511   0  11  89
 1  0  0   2572   2660 419024  10268   0   0 10266 2  264   485   0   9  91
 0  1  0   2572   2052 418192  10268   0   0 11789 0  285   554   0  13  87
 2  0  0   2572   2176 418044  10296   0   0 13045 0  307   608   0  17  83
1024+0 records in
1024+0 records out
0.01user 2.83system 0:22.71elapsed 12%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (104major+76minor)pagefaults 0swaps
/dev/hda3
 1  0  0   2572   2048 418168  10296   0   0 14220 0  324   655   0  11  89
 0  1  0   2572   2180 418040  10296   0   0  7027 3  213   398   0   7  93
 0  1  0   2700   2116 418104  10424   0  26  8858 7  240   460   0  10  90
 1  0  0   2956   2112 418464  10288   0  51  965113  253   488   0  17  83
1024+0 records in
1024+0 records out
0.03user 2.65system 0:24.70elapsed 10%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (104major+76minor)pagefaults 0swaps
/dev/hda4
 2  1  0   2952   2736 417752  10424  26   0 13216 0  310   577   0  14  86
 1  0  0   2952   2192 419716  10544  26   0  2159 0  237   428   0   9  91
 1  0  0   2952   2808 419488  10484   0   0  2304 2  247   456   0   9  91
 1  0  0   2948   3092 420260  10476   0   0  2406 1  252   461   0   9  91
   procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 1  0  0   2948   2304 421540  10476   0   0  2355 0  249   459   0   7  93
 2  0  0   2948   2588 421604  10476   0   0  2496 0  257   480   0   9  91
1024+0 records in
1024+0 records out
0.01user 2.12system 0:28.64elapsed 7%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (104major+76minor)pagefaults 0swaps
/dev/hda5
 1  0  0   2948   2340 423172  10476   0   0  2394 1  251   471   0   8  92
 1  0  0   3460   2596 425420   9988   0 102  275228  282   512   0  1

writing out disk cache

2000-11-13 Thread LA Walsh

Another question that's been bugging me -- this is behavior that seems
identical in 2216/2217 and not related to my earlier performance degradation
post.

I run VMware.  It runs w/144M and writes out a 153M suspend file when I
suspend it to disk.  My system has a total of 512M, so the entire
suspend file gets written to the disk buffers pronto (often under 1 second).

But a 'sync' done afterwards can take anywhere from 20-40 seconds.
vmstat shows a maximum b/o rate of 700, with 200-500 being typical.

So, I know that the maximum write rate through the disk cache is
close to 10,000 blocks/second.  So why when the disk cache of a
large file is 'sync'ed out, do I get such low b/o rates?

Two sample 'vmstat 5' outputs during a sync were:
 1  0  0   6292  13500 254572 165712   0   0 1 0  119   282   1   1  98
 2  0  0   6292  13444 254572 165716   0   0 0   702  279   534   0   2  98
 1  1  0   6292  13444 254572 165716   0   0 0   501  352   669   0   1  99
 0  1  0   6292  13444 254572 165716   0   0 0   520  372   697   0   2  97
 1  0  0   6292  13444 254572 165716   0   0 0   510  367   694   0   2  98
 0  1  0   6292  13444 254572 165716   0   0 0   694  379   715   0   2  98
 1  0  1   6292  13444 254572 165716   0   0 0   618  391   964   0   2  98
 0  1  1   6292  13444 254572 165716   0   0 0   441  302   765   0   1  98
 0  0  0   6292  13496 254572 165716   0   0 063  180   355   1   1  99
 0  0  0   6292  13496 254572 165716   0   0 0 0  103   195   0   1  99
and
 0  0  0   6228  18836 246036 167824   0   0 0 0  232   563   6  13  82
 0  1  0   6228  18784 246036 167824   0   0 0   506  175   489   2   1  97
 1  0  0   6228  18780 246036 167824   0   0 0   292  305   647   0   1  99
 0  1  0   6228  18780 246036 167824   0   0 0   253  285   602   0   1  99
 0  1  0   6228  18780 246036 167824   0   0 0   226  289   612   0   1  99
 1  0  0   6228  18832 246036 167824   0   0 0   157  199   406   0   1  99
 0  0  0   6228  18832 246036 167824   0   0 0 0  101   240   1   1  99
---
Another oddity -- if you add up the b/o rates in the 2nd example and
multiply the average rate by the 5-second sample interval, you get only
around 5,200 blocks written out (for a 152MB file).  Note that a 'du' on
the file shows it using 155352 blocks, so it isn't sparse.

Is vmstat an unreliable measure?  The above tests were on a 2216 system.

-l

--
L A Walsh| Trust Technology, Core Linux, SGI
[EMAIL PROTECTED]  | Voice/Vmail: (650) 933-5338

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



IDE0 /dev/hda performance hit in 2217 on my HW

2000-11-13 Thread LA Walsh

I skimmed over the archives and didn't find a mention of this.  I thought
I'd noticed this when I first installed 2217, but I was too busy to verify
it at the time.

Simple case:
Under 2216, I can do a 'badblocks /dev/hda1 X'.  Vmstat shows about
10,000K/s average.  This is consistent with 'dd' operations I use to copy
partitions for disk mirroring/backup.

Under 2217, the transfer speed drops to near 1,000K/s.  This holds for
both 'badblocks' and a 'dd if=/dev/hda of=/dev/hdb bs=256k'.  In both
cases I see nearly a 90% performance degradation.

I haven't tried any of the latest 2.2.18 prepatches -- has there been any
work that might have fixed this problem in 2218?  Am I the only person who
has noticed this?  I.e., maybe it's something peculiar to my HW (Inspiron
7500, IBM DARA-22.5G HD).
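
If anyone wants to compare, a small C sketch of the raw-read test
(hypothetical, roughly equivalent to the dd above):

#include <fcntl.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* Read 64MB from a device in 256KB chunks and report K/s -- roughly
 * 'dd if=/dev/hda of=/dev/null bs=256k count=256' with timing. */
int main(int argc, char **argv)
{
    const char *dev = argc > 1 ? argv[1] : "/dev/hda";
    static char buf[256 * 1024];
    int fd = open(dev, O_RDONLY);
    if (fd < 0) { perror(dev); return 1; }

    time_t t0 = time(NULL);
    long total = 0;
    for (int i = 0; i < 256; i++) {      /* 256 x 256KB = 64MB */
        ssize_t n = read(fd, buf, sizeof buf);
        if (n <= 0) break;
        total += n;
    }
    long secs = (long)(time(NULL) - t0);
    if (secs == 0) secs = 1;             /* second-granularity floor */
    printf("%ld K/s\n", (total / 1024) / secs);
    return 0;
}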



--
L A Walsh| Trust Technology, Core Linux, SGI
[EMAIL PROTECTED]  | Voice/Vmail: (650) 933-5338

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



RE: Weightless process class

2000-10-04 Thread LA Walsh




> One problem here is that you might end up with a weightless
> process having grabbed a superblock lock, after which a
> normal priority CPU hog kicks in and starves the weightless
> process.
---
One way would be to set an "I'm holding a lock" flag, let the process keep
running at normal standing until it releases the lock(s), and then
deschedule it?
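
In toy C (purely illustrative -- 'task_info' and the helpers are made up,
this is not kernel code), the idea is just a per-task counter:

#include <stdio.h>

/* A weightless task that holds a lock is treated as a normal task
 * until it drops its last lock, so it can't stall normal-priority
 * tasks that are waiting on that lock. */
struct task_info {
    int weightless;   /* belongs to the weightless class */
    int lock_depth;   /* locks currently held */
};

static int runs_as_weightless(const struct task_info *t)
{
    return t->weightless && t->lock_depth == 0;
}

static void lock_acquired(struct task_info *t) { t->lock_depth++; }

static void lock_released(struct task_info *t)
{
    if (--t->lock_depth == 0 && t->weightless)
        printf("last lock dropped: safe to park in the idle class\n");
}

int main(void)
{
    struct task_info t = { .weightless = 1, .lock_depth = 0 };
    lock_acquired(&t);
    printf("weightless? %d\n", runs_as_weightless(&t));  /* 0: holds a lock */
    lock_released(&t);
    printf("weightless? %d\n", runs_as_weightless(&t));  /* 1: parked again */
    return 0;
}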

> This makes little sense. If the system doesn't page out
> the least used page in the system, the disks will be more
> busy on page faults than they need to be, taking away IO
> bandwidth from more important processes ;)
---
Strictly speaking, true -- probably nothing worth making an exception for.
-l
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Weightless process class

2000-10-04 Thread LA Walsh

I had another thought regarding resource scheduling -- has the idea
of a "weightless" process been brought up?  Weightless means it doesn't
count toward 'load', the class has strictly the lowest priority in the
system, and it gets *no* CPU unless there are "idle" cycles -- so even a
process niced to +19 could CPU-starve a weightless process.
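
(Much later kernels, from 2.6.23 on, grew a policy with roughly these
semantics, SCHED_IDLE.  A sketch of opting in, assuming such a policy is
available -- nothing like it exists in the kernels under discussion:)

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

/* Opt the current process into an idle-only class.  Uses SCHED_IDLE,
 * which later kernels (2.6.23+) provide -- shown only to illustrate
 * the proposed semantics, not available on 2.4. */
int main(void)
{
#ifdef SCHED_IDLE
    struct sched_param p = { .sched_priority = 0 };  /* must be 0 */

    if (sched_setscheduler(0, SCHED_IDLE, &p) != 0) {
        perror("sched_setscheduler");
        return 1;
    }
    puts("now 'weightless': this process runs only on idle cycles");
    for (volatile long i = 0; i < 1000000000L; i++)
        ;  /* CPU burned here comes out of otherwise-idle time */
#else
    puts("SCHED_IDLE not available on this system");
#endif
    return 0;
}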

Perhaps if memory were needed, the paging code would page out weightless
processes first... etc.?

??
-linda

--
L A Walsh| Trust Technology, Core Linux, SGI
[EMAIL PROTECTED]  | Voice/Vmail: (650) 933-5338

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



RE: Disk priorities...

2000-10-01 Thread LA Walsh

I wasn't so worried about a 'trick' in this situation -- I'm running all
the processes.  Three of them were cleanup and book-keeping processes
whose completion times I didn't care much about.  The foreground process
was also mine, so I'm not worried about cheating at this point.

Specifically, I'm talking about processes niced 'down' -- things I want
to take lower priority than what I am doing in the foreground -- and I'd
like that to apply to disk, CPU, and network usage.  CPUs are getting so
fast these days that the bottleneck is increasingly how fast you can get
data to them, and disk drives with millisecond seek times are a big part
of that problem.

I don't think the disk ops have been tuned to minimize seeking,
have they?  That would be a good algorithm to use for same-disk-priority
requests.

We'd have to start thinking of disks as 'processors' and keep prioritized
per-disk queues, with the CPU constantly feeding the disk one request
ahead of the one it is currently processing; that way the CPU can reorder
pending disk operations as needed.
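
A toy C sketch of that 'sort to minimize seeking' pass over one disk's
queue (illustrative only -- not the kernel's actual elevator; the block
numbers are made up):

#include <stdio.h>
#include <stdlib.h>

/* One-direction elevator: service same-priority requests in ascending
 * block order, so the head makes a single sweep instead of seeking
 * back and forth once per request. */
static int cmp_block(const void *a, const void *b)
{
    long x = *(const long *)a, y = *(const long *)b;
    return (x > y) - (x < y);
}

int main(void)
{
    long pending[] = { 9040, 12, 7333, 500, 8191, 44, 6022, 1003 };
    size_t n = sizeof pending / sizeof pending[0];

    qsort(pending, n, sizeof pending[0], cmp_block);
    for (size_t i = 0; i < n; i++)
        printf("service block %ld\n", pending[i]);
    return 0;
}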

-l

> -Original Message-
> From: Alexander Viro [mailto:[EMAIL PROTECTED]]
> Sent: Sunday, October 01, 2000 1:52 PM
> To: Rik van Riel
> Cc: LA Walsh; lkml
> Subject: Re: Disk priorities...
> 
> 
> 
> 
> On Sun, 1 Oct 2000, Rik van Riel wrote:
> 
> > > And if you mean reads... Good luck propagating the originator
> > > information.
> > 
> > Isn't it the case that for most of the filesystem
> > reads the current process is the one that is the
> > originator of the request ?
> 
> Not true for metadata (consider the access to indirect blocks done by
> pageout, for one thing). Besides, even for data it's wide open to abuse -
> it doesn't take much to trick another process into populating the caches
> for you.
> 
> 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Disk priorities...

2000-10-01 Thread LA Walsh

Forgive me if this has been asked before, but has there ever been any
thought of having a 'nice' value for disk accesses?  I was on a server
with 4 CPUs but only 2 SCSI disks.  Many times I'd see 4 processes in
disk wait, 3 of them at a CPU nice of 19, while the foreground processes
got bogged down by the lower-priority processes due to disk contention.

I've also thought a simple 'netnice' would be good as well -- real nice
and easy to use; let's see:
netnice
disknice
cpunice
nice | -p  , -d , -n 
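
For the disk one, a toy sketch of what the userspace side might look like
(entirely hypothetical -- 'disknice', the classes, and the stub are
illustrations, not an existing syscall):

#include <stdio.h>
#include <stdlib.h>

/* Hypothetical 'disknice' knob, mirroring the nice(2) convention.
 * The kernel side is stubbed out; in a real implementation, requests
 * from an IO_CLASS_IDLE process would be serviced only when the
 * disk's queue is otherwise empty. */
enum io_class { IO_CLASS_REALTIME, IO_CLASS_BEST_EFFORT, IO_CLASS_IDLE };

static int disknice(int pid, enum io_class cls, int prio)
{
    printf("pid %d: I/O class %d, I/O priority %d\n", pid, cls, prio);
    return 0;   /* stub for the kernel call */
}

int main(int argc, char **argv)
{
    int prio = argc > 1 ? atoi(argv[1]) : 10;
    /* Lower this process's disk priority, analogous to 'nice -n 10'. */
    return disknice(0, IO_CLASS_BEST_EFFORT, prio);
}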

Just wondering...
-linda

--
L A Walsh| Trust Technology, Core Linux, SGI
[EMAIL PROTECTED]  | Voice/Vmail: (650) 933-5338

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/