Re: user limits for 'security'?
I suppose another question, related to the first, is whether 'limit' checking is part of the 'standard Linux security' that embedded Linux users might find to be a waste of precious code space?

-l
--
The above thoughts and writings are my own. | I know I don't know the opinions of every part of my company. :-)
L A Walsh, law at sgi.com | Sr Eng, Trust Technology
01-650-933-5338 | Core Linux, SGI
user limits for 'security'?
I've seen some people saying that user limits are an essential part of a secure system, to prevent local DoS attacks.  Given that, should a system call like 'fork' return -EPERM if the user has reached their limit?

My local manpage (SuSE 7.2 system) says this under fork:

ERRORS
       EAGAIN fork cannot allocate sufficient memory to copy the
              parent's page tables and allocate a task structure
              for the child.

Should the man page be updated to reflect that EAGAIN is also returned when the user has reached their limit?  From a user-monitoring point of view, it might be security relevant to know whether an EAGAIN is being returned because the system really is low on resources or because a user is hitting their limit.

--
The above thoughts and writings are my own. | I know I don't know the opinions of every part of my company. :-)
L A Walsh, law at sgi.com | Sr Eng, Trust Technology
01-650-933-5338 | Core Linux, SGI
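For reference, a minimal userspace sketch of the monitoring problem described above: from outside the kernel the two causes of EAGAIN look identical, and the RLIMIT_NPROC value printed here is the only hint available.  This is illustrative only, not a proposed fix.

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
    pid_t pid = fork();

    if (pid == 0) {
        _exit(0);                      /* child: nothing to do */
    } else if (pid < 0 && errno == EAGAIN) {
        struct rlimit rl;

        /* EAGAIN may mean "out of memory/tasks" OR "per-user limit hit";
         * the kernel gives us no way to tell which.  The best a monitor
         * can do is report the configured limit alongside the failure. */
        if (getrlimit(RLIMIT_NPROC, &rl) == 0)
            fprintf(stderr, "fork: EAGAIN (RLIMIT_NPROC cur=%ld max=%ld)\n",
                    (long)rl.rlim_cur, (long)rl.rlim_max);
        else
            fprintf(stderr, "fork: EAGAIN (%s)\n", strerror(errno));
        return 1;
    }
    return 0;
}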
Re: Break 2.4 VM in five easy steps
"Eric W. Biederman" wrote: > LA Walsh <[EMAIL PROTECTED]> writes: > > > Now for whatever reason, since 2.4, I consistently use at least > > a few Mb of swap -- stands at 5Meg now. Weird -- but I notice things > > like nscd running 7 copies that take 72M. Seems like overkill for > > a laptop. > > So the question becomes why you are seeing an increased swap usage. > Currently there are two canidates in the 2.4.x code path. > > 1) Delayed swap deallocation, when a program exits after it >has gone into swap it's swap usage is not freed. Ouch. --- Double ouch. Swap is backing a non-existent program? > > > 2) Increased tenacity of swap caching. In particular in 2.2.x if a page >that was in the swap cache was written to the the page in the swap >space would be removed. In 2.4.x the location in swap space is >retained with the goal of getting more efficient swap-ins. But if the page in memory is 'dirty', you can't be efficient with swapping *in* the page. The page on disk is invalid and should be released, or am I missing something? > Neither of the known canidates from increasing the swap load applies > when you aren't swapping in the first place. They may aggrevate the > usage of swap when you are already swapping but they do not cause > swapping themselves. This is why the intial recommendation for > increased swap space size was made. If you are swapping we will use > more swap. > > However what pushes your laptop over the edge into swapping is an > entirely different question. And probably what should be solved. On my laptop, it is insignificant and to my knowledge has no measurable impact. It seems like there is always 3-5 Meg used in swap no matter what's running (or not) on the system. > > I think that is the point -- it was supported in 2.2, it is, IMO, > > a serious regression that it is not supported in 2.4. > > The problem with this general line of arguing is that it lumps a whole > bunch of real issues/regressions into one over all perception. Since > there are multiple reasons people are seeing problems, they need to be > tracked down with specifics. --- Uhhh, yeah, sorta -- it's addressing the statement that a "new requirement of 2.4 is to have double the swap space". If everyone agrees that's a problem, then yes, we can go into specifics of what is causing or contributing to the problem. It's getting past the attitude of some people that 2xMem for swap is somehow 'normal and acceptable -- deal with it". In my case, seems like 10Mb of swap would be all that would generally be used (I don't think I've ever seen swap usage over 7Mb) on a 512M system. To be told "oh, your wrong, you *should* have 1Gig or you are operating in an 'unsupported' or non-standard configuration". I find that very user-unfriendly. > > The swapoff case comes down to dead swap pages in the swap cache. > Which greatly increases the number of swap pages slows the system > down, but since these pages are trivial to free we don't generate any > I/O so don't wait for I/O and thus never enter the scheduler. Making > nothing else in the system runnable. --- I haven't ever *noticed* this on my machine but that could be because there isn't much in swap to begin with? Could be I was just blissfully ignorant of the time it took to do a swapoff. Hmmmlet's see... Just tried it. 
I didn't get a total lock up, but cursor movement was definitely jerky: > time sudo swapoff -a real0m10.577s user0m0.000s sys 0m9.430s Looking at vmstat, the needed space was taken mostly out of the page cache (86M->81.8M) and about 700K each out of free and buff. > Your case is significantly different. I don't know if you are seeing > any issues with swapping at all. With a 5M usage it may simply be > totally unused pages being pushed out to the swap space. --- Probably -- I guess the page cache and disk buffers put enough pressure to push some things off to swap. -linda -- The above thoughts and | They may have nothing to do with writings are my own. | the opinions of my employer. :-) L A Walsh| Senior MTS, Trust Tech, Core Linux, SGI [EMAIL PROTECTED] | Voice: (650) 933-5338 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Break 2.4 VM in five easy steps
"Eric W. Biederman" wrote: > There are cetain scenario's where you can't avoid virtual mem = > min(RAM,swap). Which is what I was trying to say, (bad formula). What > happens is that pages get referenced evenly enough and quickly enough > that you simply cannot reuse the on disk pages. Basically in the > worst case all of RAM is pretty much in flight doing I/O. This is > true of all paging systems. So, if I understand, you are talking about thrashing behavior where your active set is larger than physical ram. If that is the case then requiring 2X+ swap for "better" performance is reasonable. However, if your active set is truely larger than your physical memory on a consistant basis, in this day, the solution is usually "add more RAM". I may be wrong, but my belief is that with today's computers people are used to having enough memory to do their normal tasks and that swap is for "peak loads" that don't occur on a sustained basis. Of course I imagine that this is my belief as it is my own practice/view. I want to have considerably more memory than my normal working set. Swap on my laptop disk is *slow*. It's a low-power, low-RPM, slow seek rate all to conserve power (difference between spinning/off = 1W). So I have 50% of my phys mem on swap -- because I want to 'feel' it when I goto swap and start looking for memory hogs. For me, the pathological case is touching swap *at all*. So the idea of the entire active set being >=phys mem is already broken on my setup. Thus my expectation of swap only as 'warning'/'buffer' zone. Now for whatever reason, since 2.4, I consistently use at least a few Mb of swap -- stands at 5Meg now. Weird -- but I notice things like nscd running 7 copies that take 72M. Seems like overkill for a laptop. > However just because in the worst case virtual mem = min(RAM,swap), is > no reason other cases should use that much swap. If you are doing a > lot of swapping it is more efficient to plan on mem = min(RAM,swap) as > well, because frequently you can save on I/O operations by simply > reusing the existing swap page. --- Agreed. But planning your swap space for a worst case scenario that you never hit is wasteful. My worst case is using any swap. The system should be able to live with swap=1/2*phys in my situation. I don't think I'm unique in this respect. > It's a theoretical worst case and they all have it. In practice it is > very hard to find a work load where practically every page in the > system is close to the I/O point howerver. --- Well exactly the point. It was in such situations in some older systems that some programs were swapped out and temporarily made unavailable for running (they showed up in the 'w' space in vmstat). > Except for removing pages that aren't used paging with swap < RAM is > not useful. Simply removing pages that aren't in active use but might > possibly be used someday is a common case, so it is worth supporting. --- I think that is the point -- it was supported in 2.2, it is, IMO, a serious regression that it is not supported in 2.4. -linda -- The above thoughts and | They may have nothing to do with writings are my own. | the opinions of my employer. :-) L A Walsh| Senior MTS, Trust Tech., Core Linux, SGI [EMAIL PROTECTED] | Voice: (650) 933-5338 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Break 2.4 VM in five easy steps
"Eric W. Biederman" wrote: > The hard rule will always be that to cover all pathological cases swap > must be greater than RAM. Because in the worse case all RAM will be > in thes swap cache. That this is more than just the worse case in 2.4 > is problematic. I.e. In the worst case: > Virtual Memory = RAM + (swap - RAM). Hmmmso my 512M laptop only really has 256M? Um...I regularlly run more than 256M of programs. I don't want it to swap -- its a special, weird condition if I do start swapping. I don't want to waste 1G of HD (5%) for something I never want to use. IRIX runs just fine with swap You can't improve the worst case. We can improve the worst case that > many people are facing. --- Other OS's don't have this pathological 'worst case' scenario. Even my Windows [vm]box seems to operate fine with swap It's worth complaining about. It is also worth digging into and find > out what the real problem is. I have a hunch that this hole > conversation on swap sizes being irritating is hiding the real > problem. --- Okay, admission of ignorance. When we speak of "swap space", is this term inclusive of both demand paging space and swap-out-entire-programs space or one or another? -linda -- The above thoughts and | They may have nothing to do with writings are my own. | the opinions of my employer. :-) L A Walsh| Trust Technology, Core Linux, SGI [EMAIL PROTECTED] | Voice: (650) 933-5338 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: ln -s broken on 2.4.5
Marcus Meissner wrote:
> $ ln -s fupp/bar bar
> $ ls -la bar
---
Is it peculiar to a specific architecture?  What does strace show for the args to the symlink call?

-l
--
The above thoughts and writings are my own. | They may have nothing to do with the opinions of my employer. :-)
L A Walsh | Trust Technology, Core Linux, SGI
[EMAIL PROTECTED] | Voice: (650) 933-5338
[i386 arch] MTRR messages significant?
I've been seeing these for a while now (2.4.4 - <=2.4.2), also coincidental with a change to XFree86 4.0.3 from "MetroX" in the same time frame.  I am not sure exactly when they started, but I was wondering if they are significant.  It seems some app is trying to delete or modify something.  On the console and in syslog:

mtrr: no MTRR for fd00,80 found
mtrr: MTRR 1 not used
mtrr: reg 1 not used

while /proc/mtrr currently contains:

reg00: base=0x ( 0MB), size= 512MB: write-back, count=1
reg01: base=0xfd00 (4048MB), size= 8MB: write-combining, count=1

Could it be the X server trying to delete a segment when it starts up or shuts down?  Is it an error in the X server to try to delete a non-existent segment?  Does the kernel 'care'?  I.e. -- why is it printing out messages -- are they debug messages that perhaps should be off by default?

Concurrent with these messages, and perhaps unrelated, is a new, unwelcome behavior of X dying on display of some Netscape-rendered websites (cf. it doesn't die under konqueror).

thanks,
-linda
Re: 2.4.4 code breaks compile of VMWare network bridging
"Mohammad A. Haque" wrote: > This was answered several hours ago. Check the list archives. --- Many thanks -- it was in my neverending backlog -l - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
2.4.4 code breaks compile of VMWare network bridging
In 2.4.4, the define in include/linux/skbuff.h and the corresponding structure in net/core/skbuff.c, "skb_datarefp", disappeared.  I'm not reporting this as a 'bug', as kernel internal interfaces are subject to change, but more as an "FYI".  I haven't had a chance to try to debug or figure out the offending bit of code to see exactly what it was trying to do, but the offending code snippet follows.  I haven't yet reported it to the folks at VMware, but their response to problem reports against 2.4.x is "can you duplicate it against 2.2.x, we don't support 2.4.x yet".

Perhaps someone expert in the 'net/core' area could explain what changed and what they shouldn't be doing anymore?  It appears the references:

# define KFREE_SKB(skb, type)     kfree_skb(skb)
# define DEV_KFREE_SKB(skb, type) dev_kfree_skb(skb)

are the offending culprits.  Thanks for any insights...

-linda

/*
 *----------------------------------------------------------------------
 * VNetBridgeReceiveFromDev --
 *      Receive a packet from a bridged peer device.
 *      This is called from the bottom half.  Must be careful.
 *
 * Results:
 *      errno.
 *
 * Side effects:
 *      A packet may be sent to the vnet.
 *----------------------------------------------------------------------
 */
int
VNetBridgeReceiveFromDev(struct sk_buff *skb, struct device *dev,
                         struct packet_type *pt)
{
   VNetBridge *bridge = *(VNetBridge**)&((struct sock *)pt->data)->protinfo;
   int i;

   if (bridge->dev == NULL) {
      LOG(3, (KERN_DEBUG "bridge-%s: received %d closed\n",
              bridge->name, (int) skb->len));
      DEV_KFREE_SKB(skb, FREE_READ);
      return -EIO;   // value is ignored anyway
   }

   // XXX need to lock history
   for (i = 0; i < VNET_BRIDGE_HISTORY; i++) {
      struct sk_buff *s = bridge->history[i];
      if (s != NULL && (s == skb || SKB_IS_CLONE_OF(skb, s))) {
         bridge->history[i] = NULL;
         KFREE_SKB(s, FREE_WRITE);
         LOG(3, (KERN_DEBUG "bridge-%s: receive %d self %d\n",
                 bridge->name, (int) skb->len, i));
         // FREE_WRITE because we did the allocation, it's not used anyway
         DEV_KFREE_SKB(skb, FREE_WRITE);
         return 0;
      }
   }

   skb_push(skb, skb->data - skb->mac.raw);
   VNetSend(&bridge->port.jack, skb);
   return 0;
}

--
The above thoughts and writings are my own. | They may have nothing to do with the opinions of my employer. :-)
L A Walsh | Trust Technology, Core Linux, SGI
[EMAIL PROTECTED] | Voice: (650) 933-5338
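As a point of comparison only (not VMware's code, and assuming I am reading the 2.4.4 skbuff layout correctly): the reference count that skb_datarefp used to expose now appears to live in the shared-info area at the end of the data buffer, reachable through skb_shinfo() and tested by skb_cloned().  A hedged sketch of how the clone check above might be expressed against the new layout -- skb_is_dup() is a hypothetical helper name, not a kernel API:

#include <linux/skbuff.h>

/* Sketch, assuming the 2.4.4 layout: the count formerly behind
 * skb_datarefp() is now atomic_read(&skb_shinfo(skb)->dataref),
 * and skb_cloned(skb) tests it for us.                           */
static inline int skb_is_dup(struct sk_buff *skb, struct sk_buff *s)
{
        /* skb_clone()d buffers share the same data area, so two skbs
         * with the same head are clones of one another               */
        return skb == s || (skb_cloned(skb) && skb->head == s->head);
}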
Re: 2.4 and 2GB swap partition limit
Rik van Riel wrote:
> On Fri, 27 Apr 2001, LA Walsh wrote:
>
> > An interesting option (though with less-than-stellar performance
> > characteristics) would be a dynamically expanding swapfile.  If you're
> > going to be hit with swap penalties, it may be useful to not have to
> > pre-reserve something you only hit once in a great while.
>
> This makes amazingly little sense since you'd still need to
> pre-reserve the disk space the swapfile grows into.
---
Why?  Why not have a zero-length file that you grow only if you spill?  If you can't spill, you are out of memory -- or reserve a 'safety' margin ahead, like reserving 32k at a time and growing it.

It may make little sense, but I believe it is what is used on pseudo-OS's like Windows -- you *can* preallocate, but in the normal case Windows manages the swap file and grows it as needed, up to available disk space.  If it is doable in Windows, you'd think there'd be some way of doing it in Linux, but perhaps Linux's complexity doesn't allow for that type of feature.

As for disk-space reserves, if you have 5% reserved for 'root' on a 20G ext2 disk, that still amounts to 1G reserved for root.  Seems an automatically sizing swap file might be just fine for some people (not me -- I don't even like to use swap, but I'm not my mom using Windows ME either).

But, conversely, it's coming out of space I wouldn't normally use anyway -- say the "5%" -- i.e. the 5% is something I'd likely only use under *rare* conditions.  I might have enough memory and the right system load that I also 'rarely' use swap -- so not reserving 1G/1G (2xMEM) on my laptop, both of which will rarely get used, seems like a waste of 2G.  I suppose if I put it that way, I might convince myself to use it.

--
The above thoughts and writings are my own. | They may have nothing to do with the opinions of my employer. :-)
L A Walsh | Trust Technology, Core Linux, SGI
[EMAIL PROTECTED] | Voice: (650) 933-5338
Re: 2.4 and 2GB swap partition limit
Rogier Wolff wrote:
> > > On Linux any swap adds to the memory pool, so 1xRAM would be
> > > equivalent to 2xRAM with the old old OS's.
> >
> > no more true AFAIK
>
> I've always been trying to convince people that 2x RAM remains a good
> rule-of-thumb.
---
Ugh.  I like to view swap as "low-grade memory" -- i.e. I really should spend 99.9% of my time in RAM -- if I spill, then it means I'm running too much/too big for my computer and should get more RAM -- meanwhile, I suffer with performance degradation to remind me I'm really exceeding my machine's physical memory capacity.

An interesting option (though with less-than-stellar performance characteristics) would be a dynamically expanding swapfile.  If you're going to be hit with swap penalties, it may be useful to not have to pre-reserve something you only hit once in a great while.  Definitely only for systems where you don't expect to use swap (but it could be there for "emergencies", up to some predefined limit or available disk space).

--
The above thoughts and writings are my own. | They may have nothing to do with the opinions of my employer. :-)
L A Walsh | Trust Technology, Core Linux, SGI
[EMAIL PROTECTED] | Voice: (650) 933-5338
Re: [PATCH] SMP race in ext2 - metadata corruption.
Andrzej Krzysztofowicz wrote:
> I know a few people that often do:
>
>   dd if=/dev/hda1 of=/dev/hdc1
>   e2fsck /dev/hdc1
>
> to make an "exact" copy of a currently working system.
---
Presumably this isn't a problem if the source disks are either unmounted or mounted read-only?

--
The above thoughts and writings are my own. | They may have nothing to do with the opinions of my employer. :-)
L A Walsh | Trust Technology, Core Linux, SGI
[EMAIL PROTECTED] | Voice: (650) 933-5338
Re: [QUESTION] 2.4.x nice level
Quim K Holland wrote:
> > "BS" == BERECZ Szabolcs <[EMAIL PROTECTED]> writes:
>
> BS> ... a setiathome running at nice level 19, and a bladeenc at
> BS> nice level 0.  setiathome uses 14 percent, and bladeenc uses
> BS> 84 percent of the processor.  I think setiathome should use
> BS> max 2-3 percent.  the 14 percent is way too much for me.
> BS> ...
> BS> with kernel 2.2.16 it worked for me.
> BS> now I use 2.4.2-ac20
---
I was running 2 copies of setiathome on a 4-CPU server @ work.  The two processes ran nice'd -19.  The builds we were running still took 20-30% longer than when setiathome wasn't running (went from 45 minutes up to about an hour).  This machine has 1G, so I don't think it was hurting from swapping.  I finally wrote a script that checked every 30 seconds -- if the load on the machine climbed over 4, the script would SIGSTOP the seti jobs.  Once the load on the machine fell below 2, it would send a SIGCONT to them.

I was also running setiathome on my laptop for a short while -- but the fan kept coming on and the computer would get really hot.  So I stopped that.  Linux @ idle doesn't seem to ever kick on the fan, but turn on a CPU-crunching program and it sure seemed to heat up the machine.  I still wonder how many kilo- or megawatts go to running dispersed computation programs.  Just one of those things I may never know...

-l
--
The above thoughts and writings are my own. | They may have nothing to do with the opinions of my employer. :-)
L A Walsh | Trust Technology, Core Linux, SGI
[EMAIL PROTECTED] | Voice: (650) 933-5338
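The watcher mentioned above was just a shell script; a rough C equivalent of the same policy (thresholds from the description above; the seti PIDs are assumed to be passed on the command line -- illustration only):

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Stop the given PIDs when the 1-minute load average climbs over 4,
 * resume them once it falls back below 2 -- same policy as the script
 * described above.  PIDs are taken from the command line.            */
int main(int argc, char **argv)
{
    int stopped = 0;
    int i;

    for (;;) {
        double load = 0.0;
        FILE *f = fopen("/proc/loadavg", "r");

        if (f) {
            fscanf(f, "%lf", &load);
            fclose(f);
        }

        if (!stopped && load > 4.0) {
            for (i = 1; i < argc; i++)
                kill(atoi(argv[i]), SIGSTOP);
            stopped = 1;
        } else if (stopped && load < 2.0) {
            for (i = 1; i < argc; i++)
                kill(atoi(argv[i]), SIGCONT);
            stopped = 0;
        }
        sleep(30);
    }
}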
Re: unistd.h and 'extern's and 'syscall' "standard(?)"
Andreas Schwab wrote:
> Don't use kernel headers in user programs.  Just use syscall(3).
>
> Andreas.
---
I'm on a SuSE 7.1 system and have all the manpages installed:

law> man syscall
No manual entry for syscall

The problem is not so much for user programs as for library writers who write support libraries for kernel calls.  For example, there is libcap to implement POSIX capabilities on top of the kernel call.  We have a libaudit to implement POSIX auditing on top of a few kernel calls.  It's the "system" library to system-call interface that's the problem, mainly.  On ia64, it doesn't seem like there is a reliable, cross-distro, cross-architecture way of interfacing to the kernel.

Saying "use syscall(3)" (which is undocumented on my SuSE system, and on a RH 6.1 system) implies it is in some library.  I've heard rumors that the call isn't present in RH distros, and they claim it's because it's not exported from glibc.  Then I heard glibc said it wasn't their intention to export it.  (This is all 2nd hand, so forgive me if I have parties or details confused or mis-stated.)

It seems like the kernel source points to an external source, the vendor points at glibc, and glibc says it's not their intention.  Meanwhile, an important bit of kernel functionality -- being able to use syscall0, syscall1, syscall2... etc. -- ends up missing for those wanting to construct libraries on top of the kernel.  I end up being rather perplexed about the correct course of action to take.

Seeing as you work for SuSE, would you know where this 'syscall(3)' interface should be documented?  Is it supposed to be present in all distros?

Thanks,
-linda
--
The above thoughts and writings are my own. | They may have nothing to do with the opinions of my employer. :-)
L A Walsh | Trust Technology, Core Linux, SGI
[EMAIL PROTECTED] | Voice: (650) 933-5338
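Where syscall(3) *is* exported by libc, the usage in question looks roughly like this (a sketch; SYS_getpid is just a stand-in for whichever new call a support library would wrap):

#include <unistd.h>
#include <sys/syscall.h>   /* SYS_* aliases for the __NR_* numbers */
#include <stdio.h>

int main(void)
{
    /* syscall(3), when libc exports it, issues an arbitrary system call
     * by number -- handy for wrapping calls libc has no stub for yet.  */
    long pid = syscall(SYS_getpid);

    printf("getpid via syscall(3): %ld\n", pid);
    return 0;
}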
unistd.h and 'extern's and 'syscall' "standard(?)"
I have a question.  Some architectures have "system calls" implemented as library calls (calls that are "system calls" on ia32).  For example, the expectation on 'arm' seems to be that sys_sync is in a library.  On alpha, sys_open appears to be in a library.  Is this correct?  Is it the expectation that the library that handles this is the 'glibc' for that platform, or is there a special "kernel.lib" that goes with each platform?  Is there one library that I need to link my apps with to get the 'externs' referenced in "unistd.h"?

The reason I ask is that on ia64 the 'syscall' call isn't done with inline assembler but is itself an 'extern' call.  This implies that you can't do system calls directly w/o some support library.  The implication of this is that developers working on platform-independent system calls and library functions -- for example, extended attributes, audit or MAC -- can't provide platform-independent patches w/o also providing their own syscall implementation for ia64.

This came up as a problem when we wanted to provide a new piece of code but found it wouldn't link on some distributions.  On inquiry, there seems to be some confusion regarding who is responsible for providing the code/library to satisfy this 'unistd.h' extern.  Should something so basic as the 'syscall' interface be provided in the kernel sources, perhaps as a kernel-provided 'lib'?  Or is it expected that it will be provided by someone else, or that each developer should provide their own syscall implementation for ia64?

Thanks,
-linda
--
The above thoughts and writings are my own. | They may have nothing to do with the opinions of my employer. :-)
L A Walsh | Trust Technology, Core Linux, SGI
[EMAIL PROTECTED] | Voice: (650) 933-5338
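For contrast, this is the ia32-style pattern that works without any support library: the _syscallN macros in the kernel's unistd.h expand to inline int 0x80 stubs.  Sketch only -- 'mynewcall' and its syscall number are hypothetical placeholders for whatever new call a patch adds, and this is exactly the kernel-headers-in-userspace habit being debated above:

/* ia32 only: the kernel header supplies _syscallN() macros that emit the
 * int 0x80 stub inline, so no libc or support library is involved.      */
#include <linux/unistd.h>
#include <errno.h>

#define __NR_mynewcall 240              /* hypothetical syscall number */

/* expands to:  int mynewcall(int flags) { ... int 0x80 ...; }  */
_syscall1(int, mynewcall, int, flags)

int call_it(void)
{
    return mynewcall(0);   /* on ia64 there is no equivalent macro,
                              hence the dependence on syscall(3)     */
}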
Re: 64-bit block sizes on 32-bit systems
Jan Harkes wrote:
> On Tue, Mar 27, 2001 at 01:57:42PM -0600, Jesse Pollard wrote:
> > > Using similar numbers as presented.  If we are working our way through
> > > every single block in a petabyte filesystem, and the blocksize is 512
> > > bytes, then the 1us in extra CPU cycles because of 64-bit operations
> > > would add, according to my back-of-the-envelope calculation, 2199023
> > > seconds of CPU time -- a bit more than 25 days.
> >
> > Ummm... I don't think it adds that much.  You seem to be leaving out the
> > overlap of disk I/O and computation for read-ahead.  This should eliminate
> > the majority of the delay effect.
>
> 1024 TB should be around 2*10^12 512-byte blocks; divide by 10^6 (1us)
> of "assumed" overhead per block operation is 2*10^6 seconds, so I
> believe I'm pretty close there.  I am considering everything being
> "available in the cache", i.e. no waiting for disk access.
---
If everything being used is only used from the cache, then the application probably doesn't need 64-bit block support.  I submit that your argument may be flawed in the assumption that if an application needs multi-terabyte files and devices, most of the data will be in the in-memory cache.

> The time to update the pagetables is identical to the time to update a
> 4KB page when the OS is using a 2MB pagesize.  Of course it will take more
> time to load the data into the page; however, it should be a consecutive
> stretch of data on disk, which should give a more efficient transfer
> than small blocks scattered around the disk.
---
Not if you were doing a lot of random reads where you only needed 1-2K of data.  The read time of the extra 2M-1K would seem to eat into any performance boost gained by the large pagesize.

> > Granted, 512 bytes could be considered too small for some things, but
> > once you pass 32K you start adding a lot of rotational delay problems.
> > I've used file systems with 256K blocks - they are slow when compared
> > to the throughput using 32K.  I wasn't the one running the benchmarks,
> > but with a MaxStrat 400GB raid, 256K-sized data transfers were much
> > slower (around 3 times slower) than 32K.  (The target application was
> > a GIS server using Oracle.)
>
> But your subsystem (the disk) was probably still using 512 byte blocks,
> possibly scattered.  And the OS was still using 4KB pages; it takes more
> time to reclaim and gather 64 pages per IO operation than one, that's
> why I'm saying that the pagesize needs to scale along with the blocksize.
>
> The application might have been assuming a small block size as well, and
> the OS was told to do several read/modify/write cycles, perhaps even 512
> times as much as necessary.
>
> I'm not saying that the current system will perform well when working
> with large blocks, but compared to increasing the size of block_t, a
> larger blocksize has more potential to give improvements in the long
> term without adding an unrecoverable performance hit.
---
That's totally application dependent.  Database applications might tend to skip around in the data and do short reads/writes over a very large file.  Large block sizes will degrade their performance.

This was the idea of making it a *configurable* option.  If you need it, configure it.  Same with block size -- that should likely have a wider range for configuration as well.  But configuration (and ideally auto-configuration where possible) seems the ultimate win-win situation.

-l
--
The above thoughts are my own and do not necessarily represent those of my employer.
L A Walsh | Trust Technology, Core Linux, SGI
[EMAIL PROTECTED] | Voice: (650) 933-5338
Re: 64-bit block sizes on 32-bit systems
Ion Badulescu wrote:
> Are you being deliberately insulting, "L", or are you one of those users
> who bitch and scream for features they *need* at *any cost*, and who
> have never even opened up the book for Computer Architecture 101?
---
Sorry, I was borderline insulting.  I'm getting pressure on personal fronts other than just here.  But my degree is in computer science and I've had almost 20 years of experience programming things as small as 8080's w/ 4K RAM on up.  I'm familiar with the 'cost' of emulation.

> Let's try to keep the discussion civilized, shall we?
---
Certainly.

> Compile option or not, 64-bit arithmetic is unacceptable on IA32.  The
> introduction of LFS was bad enough, we don't need yet another proof that
> IA32 sucks.  Especially when there *are* better alternatives.
===
So if it is a compile option, the majority of people wouldn't be affected -- is that in agreement?  The default would be to use the same arithmetic as we use now.  In fact, I posit that if anything, the majority of people might be helped, as the block_nr becomes a 'typed' value -- and perhaps the sector_nr as well.  They remain the same size, but as typed values the kernel gains increased integrity from the increased type checking.  At worst, it finds no new bugs and there is no impact on speed.  Are we in agreement so far?

Now let's look at the sites that want to process terabytes of data -- perhaps file systems up into the petabyte range.  Often I can see these being large multi-node systems (think 16-1024 node clusters, as are in use today for large super-clusters).  If I were to characterize their performance, I'd likely see the CPU pegged at 100%, with 99% usage in user space.  Let's assume that increasing the block-number size slows disk accesses by as much as 10% (you'll have to admit -- using a 64-bit quantity vs. a 32-bit quantity isn't going to come even close to increasing disk access times by 1 millisecond, so it really is going to be a much smaller fraction when compared to the actual disk latency).  Ok... but for the sake of argument, using 10% -- that's still only 10% of the 1% spent in the system, or a slowdown of .1%.

Now that's using a really liberal figure of 10%.  If you look at the actual speed of 64-bit arithmetic vs. 32, we're likely talking -- upper bound -- 10x the clocks for disk block arithmetic.  Disk block arithmetic is a small fraction of time spent in the kernel.  We have to be looking at *maximum* slowdowns in the range of a few hundred, maybe a few thousand, extra clocks.  A 1000 extra clocks on a 1GHz machine is 1 microsecond, or approx 1/5000th of your average seek latency on a *fast* hard disk.  So instead of a 10% slowdown, we are talking slowdowns in the 1/1000 range or less.  Now that's a slowdown in the 1% that was being spent in the kernel, so we've slowed the total program speed by .001%, at the increased benefit (to that site) of being able to process those mega-gigs (petabytes) of information.

For a hit that is not noticeable to human perception, they go from not being able to use super-clusters of IA32 machines (for which HW and SW are cheap) to being able to use them.  That's quite a cost savings for them.  Is there some logical flaw in the above reasoning?

-linda
--
L A Walsh | Trust Technology, Core Linux, SGI
[EMAIL PROTECTED] | Voice: (650) 933-5338
Re: 64-bit block sizes on 32-bit systems
Manfred Spraul wrote:
> Which field do you access?  bh->b_blocknr instead of bh->b_rsector?
---
Yes.

> There were plans to split the buffer_head into 2 structures: buffer
> cache data and the block io data.
> b_blocknr is buffer cache only, no driver should access them.
---
My 'device' only lives in the buffer cache.  I write to the device 95% only from kernel space, and then it is read out in large 256K reads by a user-land daemon to copy to a file.  The user-land daemon may also use 'sendfile' to pull the data out of the device and copy it to a file, which should, as I understand it, result in a kernel-only copy from the device to the output file buffers -- meaning no copy of the data to user space would be needed.

My primary 'dig' in all this is the 32-bit block_nr's in the buffer cache.

-l
--
L A Walsh | Trust Technology, Core Linux, SGI
[EMAIL PROTECTED] | Voice: (650) 933-5338
Re: 64-bit block sizes on 32-bit systems
Manfred Spraul wrote:
> > 4k page size * 2GB = 8TB.
>
> Try it.
> If your drive (array) is larger than 512byte*4G (4TB) linux will eat
> your data.
---
I have a block device that doesn't use 'sectors'.  It only uses the logical block size (which is currently set to 1K).  Seems I could up that to the max blocksize (4k?) and get 8TB... no?  I don't use the generic block make request (I have my own).

--
L A Walsh | Trust Technology, Core Linux, SGI
[EMAIL PROTECTED] | Voice: (650) 933-5338
Re: 64-bit block sizes on 32-bit systems
Matthew Wilcox wrote:
> On Mon, Mar 26, 2001 at 08:39:21AM -0800, LA Walsh wrote:
> > I vaguely remember a discussion about this a few months back.
> > If I remember, the reasoning was it would unnecessarily slow
> > down smaller systems that would never have block devices in
> > the 4-28T range attached.
>
> 4k page size * 2GB = 8TB.
---
Drat... I was being more optimistic -- you're right, the block_nr can be negative.  Somehow I thought the page size could be 8K... living in future land.  That just makes the limitations even closer at hand... :-(

> you keep on trying to increase the size of types without looking at
> what gcc outputs in the way of code that manipulates 64-bit types.
---
Maybe someone will backport some of the features of the IA-64 code generator into 'gcc'.  I've been told that in some cases it's a 2.5x performance difference.  If 'gcc' is generating bad code, then maybe the 'gcc' people will increase the quality of their code -- I'm sure they are just as eagerly working on gcc improvements as we are on kernel improvements.  When I worked on the PL/M compiler project at Intel, I know our code-optimization guy would spend endless cycles trying to get better optimization out of the code.  He got great joy out of doing so -- and that was almost 20 years ago, and code generation has come a *long* way since then.

> seriously, why don't you just try it?  see what the performance is.
> see what the code size is.  then come back with some numbers.  and i mean
> numbers, not `it doesn't feel any slower'.
---
As for 'trying' it -- would anyone care if we virtualized the block_nr into a typedef?  That seems like it would provide for cleaner (type-checked) code at no performance penalty, and it would more easily allow such comparisons.

Well, this is my point: if I have disks > 8T, wouldn't it be at *all* beneficial to be able to *choose* some slight performance impact and access those large disks, vs. having no choice?  Having it as a configurable would allow a given installation to make that choice rather than having no choice.  BTW, are block_nr's on RAID arrays subject to this limitation?

> personally, i'm going to see what the situation looks like in 5 years time
> and try to solve the problem then.
---
It's not the same, but SGI has had customers for over 3 years using >2T *files*.  The point I'm looking at is that if the P-X series gets developed enough, and someone is using a 4-16P system, a corporate user might be approaching that limit today or tomorrow.  Joe User might not for 5 years, but that's what the configurability is about.  Keep Linux usable for both ends of the scale -- "I love scalability."

-l
--
L A Walsh | Trust Technology, Core Linux, SGI
[EMAIL PROTECTED] | Voice: (650) 933-5338
64-bit block sizes on 32-bit systems
I vaguely remember a discussion about this a few months back.  If I remember, the reasoning was it would unnecessarily slow down smaller systems that would never have block devices in the 4-28T range attached.  However, isn't it possible there will continue to be a series of P-IV, V, VI, VII... etc. add-ons that will be used for some time to come?  I've even heard it suggested that we might see 2 or more CPUs on a single chip as a way to increase CPU capacity w/o driving up clock speed.  Given the cheapness of .25T drives now, seeing the possibility of 4T drives doesn't seem that remote (maybe 5 years?).

Side question: does the 32-bit block size limit also apply to RAID disks, or does it use a different block-nr type?

So... is it the plan, or has it been thought about -- 'abstracting' block numbers as a typedef 'block_nr', then at compile time having it be selectable as to whether this was to be a 32-bit or 64-bit quantity?  That way older systems would lose no efficiency.  Drivers that couldn't be or hadn't been ported to use 'block_nr' could default to being disabled if 64-bit blocks were selected, etc.

So has this idea been tossed about and/or previously thrashed?

-l
--
L A Walsh | Trust Technology, Core Linux, SGI
[EMAIL PROTECTED] | Voice: (650) 933-5338
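A minimal sketch of the compile-time selectable typedef being proposed -- the config symbol CONFIG_64BIT_BLOCK_NR and the name block_nr_t are hypothetical, chosen purely for illustration; nothing like them exists in the kernel today:

/* Hypothetical sketch only: the shape of the proposal, not existing code. */
#include <linux/types.h>

#ifdef CONFIG_64BIT_BLOCK_NR
typedef u64 block_nr_t;          /* sites with >8TB devices opt in        */
#else
typedef u32 block_nr_t;          /* default: identical size and code to
                                    today's int block numbers             */
#endif

/* With a real typedef, interfaces get type-checked either way, e.g.: */
static inline int block_nr_after(block_nr_t a, block_nr_t b)
{
        return a > b;            /* arithmetic width follows the config choice */
}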
Re: NCR53c8xx driver and multiple controllers...(not new prob)
Here is the 'alternate' output when the ncr53c8xx driver is compiled in:

SCSI subsystem driver Revision: 1.00
scsi-ncr53c7,8xx : at PCI bus 0, device 8, function 0
scsi-ncr53c7,8xx : warning : revision of 35 is greater than 2.
scsi-ncr53c7,8xx : NCR53c810 at memory 0xfa101000, io 0x2000, irq 58
scsi0 : burst length 16
scsi0 : NCR code relocated to 0x37d6c610 (virt 0xf7d6c610)
scsi0 : test 1 started
scsi0 : NCR53c{7,8}xx (rel 17)
request_module[block-major-8]: Root fs not mounted
VFS: Cannot open root device "807" or 08:07
Please append a correct "root=" boot option
Kernel panic: VFS: Unable to mount root fs on 08:07

Note how this compares to the case where the driver is a module (note: scsi0 was an IDE emulation in this setup -- something also removed in the above setup):

ncr53c8xx: at PCI bus 0, device 8, function 0
ncr53c8xx: 53c810a detected
ncr53c8xx: at PCI bus 1, device 3, function 0
ncr53c8xx: 53c896 detected
ncr53c8xx: at PCI bus 1, device 3, function 1
ncr53c8xx: 53c896 detected
ncr53c810a-0: rev=0x23, base=0xfa101000, io_port=0x2000, irq=58
ncr53c810a-0: ID 7, Fast-10, Parity Checking
ncr53c810a-0: restart (scsi reset).
ncr53c896-1: rev=0x01, base=0xfe004000, io_port=0x3000, irq=57
ncr53c896-1: ID 7, Fast-40, Parity Checking
ncr53c896-1: on-chip RAM at 0xfe00
ncr53c896-1: restart (scsi reset).
ncr53c896-1: Downloading SCSI SCRIPTS.
ncr53c896-2: rev=0x01, base=0xfe004400, io_port=0x3400, irq=56
ncr53c896-2: ID 7, Fast-40, Parity Checking
ncr53c896-2: on-chip RAM at 0xfe002000
ncr53c896-2: restart (scsi reset).
ncr53c896-2: Downloading SCSI SCRIPTS.
scsi1 : ncr53c8xx - version 3.2a-2
scsi2 : ncr53c8xx - version 3.2a-2
scsi3 : ncr53c8xx - version 3.2a-2
scsi : 4 hosts.
  Vendor: SEAGATE   Model: ST318203LC        Rev: 0002
  Type:   Direct-Access                      ANSI SCSI revision: 02
Detected scsi disk sda at scsi2, channel 0, id 1, lun 0
  Vendor: SGI       Model: SEAGATE ST318203  Rev: 2710
  Type:   Direct-Access                      ANSI SCSI revision: 02
Detected scsi disk sdb at scsi2, channel 0, id 2, lun 0
  Vendor: SGI       Model: SEAGATE ST336704  Rev: 2742

This is on a 4x550 PIII (Xeon) system.  The 2nd two controllers are on PCI bus 1.  The boot disk is sda, which is off of scsi2 in the working example, or scsi1 in the non-working example.  It seems that compiling it in somehow causes controllers 1 and 2 (which are off of the 2nd PCI bus, "1") to get missed during SCSI initialization.  Is there a parameter I need to pass to the ncr53c8xx driver to get it to scan the 2nd bus?
NCR53c8xx driver and multiple controllers...(not new prob)
I have a machine with 3 of these controllers (a 4-CPU server).  The 3 controllers are:

ncr53c810a-0: rev=0x23, base=0xfa101000, io_port=0x2000, irq=58
ncr53c810a-0: ID 7, Fast-10, Parity Checking
ncr53c896-1: rev=0x01, base=0xfe004000, io_port=0x3000, irq=57
ncr53c896-1: ID 7, Fast-40, Parity Checking
ncr53c896-2: rev=0x01, base=0xfe004400, io_port=0x3400, irq=56
ncr53c896-2: ID 7, Fast-40, Parity Checking
ncr53c896-2: on-chip RAM at 0xfe002000

I'd like to be able to make a kernel with the driver compiled in and no loadable module support.  I don't see how to do this from the documentation -- it seems to require a separate module loaded for each controller.  When I compile it in, it only sees the 1st controller, and the boot partition, I think, is on the 3rd.  Any ideas?  This problem is present in the 2.2.x series as well as 2.4.x (x up to 2).

Thanks,
-linda
--
L A Walsh | Trust Technology, Core Linux, SGI
[EMAIL PROTECTED] | Voice: (650) 933-5338
Re: Is swap == 2 * RAM a permanent thing?
The not-reclaiming-swap-space behavior is flawed in more than one instance.  Suppose my P1 and P2 have their swap reserved -- now both grow.  P3 is idle but can't fit in swap.  This is going to result in fragmentation, no?  How is this fragmentation any better than just freeing the swap?

Ever since RAM sizes got to about 256M, I've tended toward using swap spaces about half my RAM size -- thinking of swap as an 'overflow' place that really shouldn't get used much, if at all.  As you mention, not reclaiming swap space, but keeping 'double reservations' for previously swapped programs, becomes a problem fast in this situation.  It makes the swap much less flexible.

--
L A Walsh | Trust Technology, Core Linux, SGI
[EMAIL PROTECTED] | Voice: (650) 933-5338
Re: (struct dentry *)->vfsmnt;
Alexander Viro wrote:
> No such thing.  The same fs may be present in many places.  Please,
> describe the situation - where do you get that dentry from?
> Cheers,
> Al
---
Al,

I'm getting it from various places: 1) if I want to know the path relative to root of the dentry at the end of 'path_walk' or __user_path_walk (as used in truncate), and 2) if I've gotten a dentry, as in sys_fchdir/fchown/fstat/newfstat, from a file descriptor and I want the absolute path or, if there are multiple (such as multiple mounts of the same fs in different locations), the one that the user used to access the dentry.

In 2.2 there was a way to get the path from only the dentry (d_path) -- I'm looking for similar functionality for the above cases.  Is it the case that in 2.2 dentries were only relative to root, whereas in 2.4 they are relative to their mount point, and instead of duplicate dcache entries for each possible mount point they get stored as one?  If that's the case, then while I might get a path for user_path_walk, if I just have an 'fd' it may not be possible to backtrack to the path the user used to access the file?

Just some wild speculations on my part :-/ ... did I refine the question enough?

thanks,
-linda
--
L A Walsh | Trust Technology, Core Linux, SGI
[EMAIL PROTECTED] | Voice: (650) 933-5338
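For the fd case, the pieces do seem to be there in 2.4: a struct file carries both the dentry and the vfsmount it was opened through, so something like the following sketch (error handling omitted, and assuming the 2.4 d_path() signature) should give the path the user actually used:

#include <linux/fs.h>
#include <linux/dcache.h>

/* Sketch only: assumes the 2.4 prototype
 *   char *d_path(struct dentry *, struct vfsmount *, char *, int);
 * and that 'file' came from fget(fd).                              */
static char *path_of_open_file(struct file *file, char *buf, int buflen)
{
        /* the vfsmount the file was opened through disambiguates
         * multiple mounts of the same filesystem                   */
        return d_path(file->f_dentry, file->f_vfsmnt, buf, buflen);
}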
(struct dentry *)->vfsmnt;
Could someone enlighten me as to the purpose of this field in the dentry struct?  There is no elucidating comment in the header for this particular field, and the name/type only indicate it is pointing to a list of vfsmounts.  Can a dentry belong to more than one vfsmount?

If I have a 'dentry' and simply want to determine the absolute path from root, in the 'd_path' macro would I use the 'rootmnt' of my current->fs as the 'vfsmount' as well?

Thanks in advance...
-linda
--
L A Walsh | Trust Technology, Core Linux, SGI
[EMAIL PROTECTED] | Voice: (650) 933-5338
Re: Elevator algorithm parameters
I hate when that happens...

LA Walsh wrote:
> If you ask for code from me, it'll be a while -- My read and write

...Q's are rather full right now with some higher-priority I/O... :-)

-l
--
L A Walsh | Trust Technology, Core Linux, SGI
[EMAIL PROTECTED] | Voice: (650) 933-5338
Elevator algorithm parameters
I have a few comments/questions on the elevator algorithm as it is now.  Some of them may be based on a flawed understanding, but please be patient anyway :-).

1) Read-ahead is given the same 'latency' [max-wait priority] as 'read'.  I can see r-a as being less important than 'read' -- 'read' means some app is blocked waiting for input *now*; 'ra' means the kernel is being clever in hopes it is predicting a usage pattern where reading ahead will be useful.  I'd be tempted to give read-ahead a higher acceptable latency than reads, and possibly higher than writes.  By definition, 'ra' I/O is I/O that no one has currently requested be done.

   a) The code may be there, but if a read request comes in for a sector marked for ra, then the latency should be set to min(r-latency, remaining ra latency).

2) I seem to notice a performance boost on my laptop from setting the read latency down to 1/8th of the write latency (2048/16384) instead of the current 1:2 ratio.  I am running my machine as an NFS server as well as doing local tasks and compiles.  I got better overall performance because NFS requests got serviced more quickly to feed a data-hungry dual-processor "compiler server".  Also, my interactive processes, which need lots of random reads, performed better because they got 'fed' faster while some background data transfers (reads and writes) of large streams of data were going on.

3) It seems that the balance of optimal latency figures would vary based on how many CPU processes are blocked on data reads, how many CPUs are reading from the same disk, the disk speed, the CPU speed and the available memory for buffering.  Maybe there is a neat whiz-bang self-adjusting algorithm that can adapt dynamically to different loads (like, say, it detects "hmmm, we have 100 non-mergeable read requests plugged, should I wait for more? ... well, only 1 active write request is running... maybe I should lower the read latency", etc.).  However, in the interim, it seems having the values at least be tunable via /proc (rather than the current ioctl) would be useful -- just being able to echo some values into there at runtime.  I couldn't seem to find such a beast in /proc.

Comments/cares?  If you ask for code from me, it'll be a while -- my read and write ...Q's are rather full right now with some higher-priority I/O... :-)

--
L A Walsh | Trust Technology, Core Linux, SGI
[EMAIL PROTECTED] | Voice: (650) 933-5338
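For reference, the current per-queue knob is the BLKELVGET/BLKELVSET ioctl pair that elvtune uses.  This sketch copies the argument structure locally the way elvtune does (the kernel header isn't userspace-clean); treat the exact field layout as an assumption of the sketch, taken from my reading of the 2.4 linux/blkdev.h:

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Assumed to mirror the 2.4 kernel's definitions (see linux/blkdev.h). */
typedef struct blkelv_ioctl_arg_s {
    int queue_ID;
    int read_latency;
    int write_latency;
    int max_bomb_segments;
} blkelv_ioctl_arg_t;
#define BLKELVGET _IOR(0x12, 106, sizeof(blkelv_ioctl_arg_t))
#define BLKELVSET _IOW(0x12, 107, sizeof(blkelv_ioctl_arg_t))

/* Read the current elevator latencies for a block device and cut the
 * read latency to 1/8th of the write latency, the ratio discussed above. */
int tune_elevator(const char *dev)
{
    blkelv_ioctl_arg_t elv;
    int fd = open(dev, O_RDONLY);

    if (fd < 0 || ioctl(fd, BLKELVGET, &elv) < 0) {
        perror(dev);
        return -1;
    }
    printf("%s: read_latency=%d write_latency=%d\n",
           dev, elv.read_latency, elv.write_latency);

    elv.read_latency = elv.write_latency / 8;     /* e.g. 16384 -> 2048 */
    if (ioctl(fd, BLKELVSET, &elv) < 0)
        perror("BLKELVSET");

    close(fd);
    return 0;
}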
setfsuid
Why doesn't setfsuid return -EPERM when it can't perform the operation?

file: kernel/sys.c, 'sys_setfsuid', around line 779 depending on your source version.  There is a check if capable(CAP_SETUID) that, if it fails, doesn't return an error.  This seems inconsistent.  In fact, the manpage I have on it states:

RETURN VALUE
       On success, the previous value of fsuid is returned.  On error, the
       current value of fsuid is returned.

BUGS
       No error messages of any kind are returned to the caller.

At the very least, EPERM should be returned when the call fails.

-l
--
L A Walsh | Trust Technology, Core Linux, SGI
[EMAIL PROTECTED] | Voice: (650) 933-5338
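Because of that, the only way a caller can detect the failure today is to ask again and compare.  A sketch of the workaround, relying only on the documented "returns the previous value" behaviour:

#include <stdio.h>
#include <sys/types.h>
#include <sys/fsuid.h>     /* setfsuid() */

/* setfsuid() never reports errors, so verify the change took effect by
 * calling it a second time and checking the "previous value" it returns. */
static int set_fsuid_checked(uid_t uid)
{
    setfsuid(uid);                        /* attempt the change          */
    if ((uid_t)setfsuid(uid) != uid) {    /* re-issue: the previous value
                                             should now be 'uid'         */
        fprintf(stderr, "setfsuid(%ld) silently failed\n", (long)uid);
        return -1;
    }
    return 0;
}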
Re: Annoying CD-rom driver error messages
Alan Cox wrote:
> > support to function efficiently -- perhaps that technology needs to be
> > further developed on Linux so app writers don't also have to be kernel
> > experts and experts in all the various bus and device types out there?
>
> You mean someone should write a libcdrom that handles stuff like that - quite
> possibly
---
More generally -- what if I want to know if a DVD has been inserted and of what type, and/or a floppy has been inserted, or removable media of type "X", or, more generally still, not just whether a 'device' has changed but a file or directory?  I think that is what famd is supposed to do, but apparently it does so (I'm guessing from the external description) by polling, and it says it needs kernel support to be more efficient.  Famd was apparently ported to Linux from IRIX, where it had the kernel ability to be notified of changed file-space items (file-space = anything accessible w/a pathname).

Now if I can just remember where I saw this mythical port of the 'file-access monitoring daemon'...

-l
--
L A Walsh | Trust Technology, Core Linux, SGI
[EMAIL PROTECTED] | Voice: (650) 933-5338
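For the file/directory half of that wish, 2.4 does already have one notification hook: the F_NOTIFY fcntl (dnotify) delivers a signal when a watched directory changes.  A minimal userspace sketch, watching the current directory, and assuming a glibc recent enough to define the DN_* flags:

#define _GNU_SOURCE
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

static volatile sig_atomic_t changed;

static void on_change(int sig) { changed = 1; }

int main(void)
{
    int fd = open(".", O_RDONLY);

    signal(SIGIO, on_change);                 /* default dnotify signal  */
    /* ask the kernel to signal us when entries in "." are created,
     * deleted or modified; DN_MULTISHOT keeps the watch armed           */
    if (fd < 0 || fcntl(fd, F_NOTIFY,
                        DN_CREATE | DN_DELETE | DN_MODIFY | DN_MULTISHOT) < 0) {
        perror("F_NOTIFY");
        return 1;
    }
    for (;;) {
        pause();
        if (changed) {
            printf("directory changed\n");
            changed = 0;
        }
    }
}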
Re: Annoying CD-rom driver error messages
Alan Cox wrote:
> > Then it seems the less ideal question is what is the "approved and recommended"
> > way for a program to "poll" such devices to check for 'changes' and 'media type'
> > without the kernel generating spurious WARNINGS/ERRORS?
>
> The answer to that could probably fill a book unfortunately.  You need to use
> the various mtfuji and other ata or scsi query commands intended to notify you
> politely of media and other status changes
---
Putting myself in the role of someone who knows nothing about the kernel -- and only knows application writing in the fields of GUIs and audio -- what do you think I'm going to use to check if a playable CD has been inserted into the CD drive?

There is an application called 'famd' which says it needs some kernel support to function efficiently -- perhaps that technology needs to be further developed on Linux so app writers don't also have to be kernel experts and experts in all the various bus and device types out there?

Just an idea...?
-linda
--
L A Walsh | Trust Technology, Core Linux, SGI
[EMAIL PROTECTED] | Voice: (650) 933-5338
Re: Annoying CD-rom driver error messages
God wrote:
> On Mon, 5 Mar 2001, Alan Cox wrote:
>
> > > > this isnt a kernel problem, its a _very_ stupid app
> > > ---
> > > Must be more than one stupid app...
> >
> > Could well be.  You have something continually trying to open your cdrom and
> > see if there is media in it
>
> Gnome / KDE?  does exactly that... (rather annoying too) .. what app
> specifically I don't know...
---
So I'm still wondering what the "approved and recommended" way is for a program to be "automatically" informed of a CD or floppy change/insertion, and to be informed of the media 'type', w/o kernel warnings/error messages.  It sounds like there is no kernel support for this so far?

Then it seems the less ideal question is: what is the "approved and recommended" way for a program to "poll" such devices to check for 'changes' and 'media type' without the kernel generating spurious WARNINGS/ERRORS?

--
L A Walsh | Trust Technology, Core Linux, SGI
[EMAIL PROTECTED] | Voice: (650) 933-5338
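One polite polling route that does exist is the CD-ROM ioctl interface in linux/cdrom.h: CDROM_DRIVE_STATUS and CDROM_DISC_STATUS answer "is there media?" and "what kind?" without the driver attempting a read -- though whether a given drive/driver implements them fully is another matter.  A sketch (opening with O_NONBLOCK so the open itself doesn't require media):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/cdrom.h>

int main(void)
{
    /* O_NONBLOCK lets us open the drive even when it is empty */
    int fd = open("/dev/cdrom", O_RDONLY | O_NONBLOCK);
    int drive, disc;

    if (fd < 0) { perror("/dev/cdrom"); return 1; }

    drive = ioctl(fd, CDROM_DRIVE_STATUS, CDSL_CURRENT);
    disc  = ioctl(fd, CDROM_DISC_STATUS, 0);

    printf("drive status: %s\n",
           drive == CDS_DISC_OK   ? "disc present" :
           drive == CDS_NO_DISC   ? "no disc"      :
           drive == CDS_TRAY_OPEN ? "tray open"    : "not ready/unknown");

    if (drive == CDS_DISC_OK)
        printf("disc type: %s\n",
               disc == CDS_AUDIO ? "audio" :
               disc == CDS_MIXED ? "mixed" : "data");

    close(fd);
    return 0;
}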
Re: Annoying CD-rom driver error messages
Alan Cox wrote:
> > > this isnt a kernel problem, its a _very_ stupid app
> > ---
> > Must be more than one stupid app...
>
> Could well be.  You have something continually trying to open your cdrom and
> see if there is media in it
---
Is there some feature they *should* be using instead to check for media presence, so I can forward it to their dev team?

Thanks!
-l
--
L A Walsh | Trust Technology, Core Linux, SGI
[EMAIL PROTECTED] | Voice: (650) 933-5338
Re: Annoying CD-rom driver error messages
LA Walsh wrote:
> > > this isnt a kernel problem, its a _very_ stupid app
> > ---
> > Must be more than one stupid app...
>
> xena:/var/log# rpm -q magicdev
> package magicdev is not installed
> xena:/var/log# locate magicdev
> xena:/var/log#
> xena:/var/log# rpm -qa |grep -i magic
> ImageMagick-5.2.6-4
---
Maybe the stupid app is 'freeamp'?  It only happens when I run it... :-(

--
L A Walsh | Trust Technology, Core Linux, SGI
[EMAIL PROTECTED] | Voice: (650) 933-5338
Re: Annoying CD-rom driver error messages
> this isnt a kernel problem, its a _very_ stupid app --- Must be more than one stupid app... xena:/var/log# rpm -q magicdev package magicdev is not installed xena:/var/log# locate magicdev xena:/var/log# xena:/var/log# rpm -qa |grep -i magic ImageMagick-5.2.6-4 -- L A Walsh| Trust Technology, Core Linux, SGI [EMAIL PROTECTED] | Voice: (650) 933-5338 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Annoying CD-rom driver error messages
Slightly less annoying -- when no CD is in the drive, I'm getting:

Mar 5 09:30:42 xena kernel: VFS: Disk change detected on device ide1(22,0)
Mar 5 09:31:17 xena last message repeated 7 times
Mar 5 09:32:18 xena last message repeated 12 times
Mar 5 09:33:23 xena last message repeated 13 times
Mar 5 09:34:24 xena last message repeated 12 times

(22,0 = /dev/hdc, cdrom) Perturbing. -l - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Annoying CD-rom driver error messages
I have a music player program (freeamp) running, playing MP3s. It has a feature where it scans to see if a CD is in the drive and tries to look it up in CDDB. Well, I don't have a CD in the drive -- I have a DVD-ROM with a UDF file system on it. Freeamp doesn't complain, but in my syslog/warnings file, every 5 seconds I get:

Mar 5 09:17:00 xena kernel: hdc: packet command error: status=0x51 { DriveReady SeekComplete Error }
Mar 5 09:17:00 xena kernel: hdc: packet command error: error=0x50
Mar 5 09:17:00 xena kernel: ATAPI device hdc:
Mar 5 09:17:00 xena kernel: Error: Illegal request -- (Sense key=0x05)
Mar 5 09:17:00 xena kernel: Cannot read medium - incompatible format -- (asc=0x30, ascq=0x02)
Mar 5 09:17:00 xena kernel: The failed "Read Subchannel" packet command was:
Mar 5 09:17:00 xena kernel: "42 02 40 01 00 00 00 00 10 00 00 00 "

Needless to say, this fills up messages/warnings fairly quickly. If there's no DVD in the drive, or if there is a CD in the drive, I don't notice this problem. It seems like an undesirable feature for the kernel to write out a 7-line error message every time a program polls for a CD and fails. Is there a way to disable this when I have a DVD-ROM disk in the drive? (vanilla 2.4.2 kernel). Thanks... -l -- L A Walsh| Trust Technology, Core Linux, SGI [EMAIL PROTECTED] | Voice: (650) 933-5338 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
odd memory corrupt problem
I have a kernel driver that has a variable (surprise) 'audit_state'. It's statically initialized to 0 in the C code. The only way it can get set on is if the audit modules are loaded and one makes a system call to enable it. There is no 'driver' initialization performed. This code seemed to work in 2.2.17, but not in the 2.4.x series. Somehow the 'audit_state' variable is being mysteriously set to '1' (which, with the driver not loaded, causes less than perfect behavior). So I started sprinkling "if (audit_state) BUG();" in various places in the code. It fails during the pcnet32 driver initialization (compiled in vs. module). That in turn calls pci init code which calls net driver code. That calls 'core/' register_netdevice, which finally ends up calling run_sbin_hotplug in net/core/dev.c. That tries to load the program /sbin/hotplug via call_usermodehelper in kmod.c. That 'schedules' the task and things are still ok, then it goes down on the process sem to wait until it has started. The program it is trying to execute is "hotplug", which doesn't exist on my machine... ok, fine (the network interface seems to function just fine). The program doesn't exist, but when it gets back from the down(&sem), the value of "audit_state" has changed to 1. Any ideas why? Not that I'm whining, but a good debugger with a 'watch' capability would do wonders at this point. I'm trying to figure out code that has nothing to do with my driver -- it just happens to be randomly stomping on a key variable. I suppose something could be stomping on the checks to see if the module is loaded and something is randomly calling the system call to turn it on, but that seems like a less likely path. Note that the system hasn't even gotten up to the point of calling the 'boot' script yet. I get the same behavior in 2.4.0, 2.4.1 and 2.4.2 (was hoping some memory corruption bug got fixed along the way). Meanwhile, guess it's on to more debugging Linux style -- insert printk's. How quaint. Linda -- L A Walsh| Trust Technology, Core Linux, SGI [EMAIL PROTECTED] | Voice: (650) 933-5338 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
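Lacking a debugger with watchpoints, one stopgap is a software "watch": keep a shadow copy of the variable and compare it at suspect points, so the first site where the value changes gets logged. A rough sketch only -- the names are made up, it assumes it is dropped into kernel code where printk() and BUG() are already available, and it obviously cannot catch a corruption that happens between two check sites:

#include <linux/kernel.h>   /* printk, KERN_ERR */

/* Poor-man's watchpoint for a single global.  Call AUDIT_WATCH_INIT()
 * once early in boot, then sprinkle AUDIT_WATCH_CHECK() along the
 * suspect path (e.g. around call_usermodehelper / down(&sem)). */
extern int audit_state;
static int audit_state_shadow;

#define AUDIT_WATCH_INIT()  do { audit_state_shadow = audit_state; } while (0)

#define AUDIT_WATCH_CHECK()                                             \
    do {                                                                \
        if (audit_state != audit_state_shadow) {                        \
            printk(KERN_ERR "audit_state changed to %d at %s:%d\n",     \
                   audit_state, __FILE__, __LINE__);                    \
            BUG();                                                      \
        }                                                               \
    } while (0)

Bisecting the boot path with that (plus a printk of &audit_state, to see whether something adjacent in the image is the real target of the stomp) usually narrows things down faster than plain printk's alone.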
interactive disk performance
A problem that I seem to have noticed to some extent or another in the 2.4 series is that while the elevator algorithm may achieve best disk bandwidth utilization, it seems to be heavily at the expense of interactive use. I was running a disk intensive program over nfs, so the nfsd's were quite busy -- usually 3/4 were in 'D' wait. During this time, I tried to bring up this compose window for the email I am writing. It took over 2 minutes to come up. Now the CPU is 66%idle, 31%in idled -- meaning it's fairly inactive -- everything was waiting on the disk waits. I'm sure that the file the nfsd's were writing out was one long contiguous stream -- most of which could be coalesced into large multi-block writes. Somehow it seems that the multi-block writer was getting 1 block in, then more blocks kept coming in so fast that the Q would only unplug every once in a while -- and maybe 1 block of an interactive request would go through. I don't remember the exact timeout or max wait/sector while blocks are being coalesced, but it seems it heavily favors the heavy disk user. In Unix design, the CPU algorithm was designed to lower the priority of CPU intensive tasks such that interactive use got higher priority for short bursts. Maybe a process should have a disk (and maybe net while we are at it) priority that adjusts based on usage in the way the CPU algorithm adjusts -- then the block structure could have an added 'priority' field of what the process's priority was when it wrote the block. Thus even if a process goes away -- the blocks still retain priority. Then the elevator algorithm would sort not just by locality but also weighting it with the block's priority. Perhaps it would be a make-time or run-time configurable whether or not to optimize for disk-throughput, or interactive usage. Perhaps even a 'nice' value that allows the user to subjectively prioritize processes. Possible? Usefulness? -l -- L A Walsh| Trust Technology, Core Linux, SGI [EMAIL PROTECTED] | Voice: (650) 933-5338 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
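To make the idea a bit more concrete, here is a rough sketch of the kind of weighting that could be used -- the structure and field names are invented for illustration and are not the real 2.4 request/elevator code:

/* Hypothetical: each queued request carries a priority inherited from
 * the submitting process ("disk nice", -20..19, lower = more urgent).
 * The elevator would pick the request with the lowest score, so a
 * nearby-but-niced bulk write no longer always beats a more distant
 * interactive read. */
struct prio_request {
    unsigned long sector;   /* target sector (locality) */
    int prio;               /* inherited disk priority  */
};

static long elevator_score(const struct prio_request *rq,
                           unsigned long head_pos)
{
    long dist = (rq->sector >= head_pos) ? (long)(rq->sector - head_pos)
                                         : (long)(head_pos - rq->sector);

    /* +21 keeps the multiplier positive; a nice-19 request pays ~40x
     * the seek-distance weight of a nice -20 request. */
    return dist * (rq->prio + 21);
}

One open question with any such scheme is starvation: a pure score sort still needs an ageing term (or the existing unplug timeout) so that low-priority blocks eventually get written.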
Re: Linux stifles innovation...
"David D.W. Downey" wrote: > > Seriously though folks, look at who's doing this! > > They've already tried once to sue 'Linux', were told they couldn't because > Linux is a non-entity (or at least one that they can not effectively sue > due to the classification Linux holds), ... --- Not having a long memory on these things, do you have an article or reference on this -- I'd love to read about that one. Sue Linux? For what? Competing? Perhaps by saying Open Source is a threat to the "American Way", they mean they can't effectively 'sue', buy up or destroy it? -l -- L A Walsh| Trust Technology, Core Linux, SGI [EMAIL PROTECTED] | Voice: (650) 933-5338 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
To Linus: kdb in 2.4?
I'm wondering about the possibility of re-examining the idea of a kernel debugger option distributed with 2.4. I'm thinking that it could be a great teaching tool to break and examine structures, variables, process states, as well as an aid to people who may not have a grasp of the entire kernel but need to write device drivers. It's easy for someone who's "grown up" with Linux to know it all so thoroughly that such a tool seems fluff. But even the best mechanics on new cars use complex diagnostic tools to do car repair. Sure there may be experts that designed the engine that wouldn't need it, but large numbers of people need to repair cars or modify them for their purposes. Having tools to aid in that isn't so much a crutch as it is a learning tool. It's like being able to look at the characters of the alphabet individually before one learns to comprehend the entirety of the writings of Buddha. Certainly Buddha doesn't need to know how to read to know his own writings -- and certainly, if everyone meditates and 'evolves' to their Buddha nature, they wouldn't need to read the texts or recognize the letters either. But not everyone is at the same place on the mountain (or even the same mountain, for that matter). In wisdom, one would, I posit, understand others are in different places and may find it useful to have tools to learn to read before they comprehend. Just my 2-4 cents on the matter... -- L A Walsh| Trust Technology, Core Linux, SGI [EMAIL PROTECTED] | Voice: (650) 933-5338 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Block driver design issue
I have a block driver I inherited and am working on that has a problem, and I was wondering about cleaner solutions. The driver can accept written characters from either userspace programs or from the kernel. From userspace it uses sys_write. That in turn calls block_write. There's almost 100 lines of duplicated code in a copy of the block_write code in the driver, "block_writek", as well as duplicate code in audit_write vs. audit_writek. The only difference is down in block_write at the "copy_from_user(p,buf,chars);" which becomes a "memcpy(p,buf,chars)" in the "block_writek" version. I find this duplication of code to be inefficient. Is there a way to dummy up the 'buf' address so that the "copy_from_user" will copy the buffer from kernel space? My assumption is that it wouldn't "just work" (which may also be an invalid assumption). Suggestions? Abuse? Thanks! -linda -- L A Walsh| Trust Technology, Core Linux, SGI [EMAIL PROTECTED] | Voice: (650) 933-5338 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://vger.kernel.org/lkml/
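For what it's worth, the usual 2.2/2.4-era answer to "make copy_from_user accept a kernel pointer" was to temporarily widen the address-space limit with set_fs(KERNEL_DS). A minimal sketch, under the assumption that the kernel-side entry point can simply call the same write path the user-space case uses (this is not the actual driver's code):

#include <asm/uaccess.h>
#include <linux/fs.h>

static ssize_t audit_writek(struct file *file, const char *kbuf,
                            size_t count, loff_t *ppos)
{
    mm_segment_t old_fs = get_fs();
    ssize_t ret;

    set_fs(KERNEL_DS);   /* kernel addresses now pass the user-copy checks */
    ret = block_write(file, kbuf, count, ppos);
    set_fs(old_fs);      /* always restore the previous limit */

    return ret;
}

The window where the limit is widened should be kept as small as possible, since any user-controlled pointer dereferenced inside it bypasses the usual address checks.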
question on comment in fs.h
Excuse my ignorance, but in the 2.4.x source, in include/linux/fs.h, struct buffer_head has a member:

unsigned short b_size; /* block size */

and later there is a member:

char * b_data; /* pointer to data block (512 byte) */

Is the "(512 byte)" part of the comment in error, or do I misunderstand the nature of 'b_size'? -l -- Linda A Walsh| Trust Technology, Core Linux, SGI [EMAIL PROTECTED] | Voice: (650) 933-5338 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
2.4.x Shared memory question
Another oddity -- I notice things taking a lot more memory in 2.4. This coincides with 'top' consistently showing I have 0 shared memory. These two observations would have me wondering if I have somehow misconfigured my system to disallow sharing. Note that /proc/meminfo also shows 0 shared memory:

        total:      used:      free:   shared:  buffers:    cached:
Mem:  525897728  465264640   60633088        0  82145280  287862784
Swap: 270909440          0  270909440
MemTotal:      513572 kB
MemFree:        59212 kB
MemShared:          0 kB
Buffers:        80220 kB
Cached:        281116 kB
Active:         22340 kB
Inact_dirty:   338996 kB
Inact_clean:        0 kB
Inact_target:       0 kB
HighTotal:          0 kB
HighFree:           0 kB
LowTotal:      513572 kB
LowFree:        59212 kB
SwapTotal:     264560 kB
SwapFree:      264560 kB

Not sure whether it's related, but I do have a filesystem of type shm mounted on /dev/shm, as suggested for POSIX shared memory. -- Linda A Walsh| Trust Technology, Core Linux, SGI [EMAIL PROTECTED] | Voice: (650) 933-5338 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
2.4.2-test1 better on disk lock/freezups
In trying to apply Jens's patch I upgraded to 2.4.2-pre1. The figures on it(242-p1) look better at this point: a vmstat dump, same data...notice this time it only took maybe 45 seconds to write out the data. I also got better interactive performance. So write speed is up to about 3.5Mb/s. Fastest reads using 'hdparm' are in the 12-14Mb/s range. Sooo...IDE hdparm block dev read vs. file writes...3-4:1 ratio? I honestly have little clue as to what would be considered 'good' numbers. Note the maximum 'system freeze' seems under 10 seconds now -- alot more tolerable. Note also, this was without my applying Jens's patch -- as I could not figure out how to get it to apply cleanly :-(. 0 0 0 0 77564 80220 280164 0 0 0 348 287 1367 10 7 83 0 0 1 0 77560 80220 280164 0 0 0 304 193 225 0 1 99 0 1 1 0 77572 80220 280156 0 0 0 162 241 354 4 2 95 0 1 1 0 77572 80220 280156 0 0 0 156 218 182 0 1 99 1 1 1 0 77560 80220 280164 0 0 0 165 217 218 0 1 99 0 1 1 0 77328 80220 280164 0 0 0 134 213 215 1 1 97 0 1 1 0 77328 80220 280164 0 0 0 138 217 177 0 1 98 0 1 1 0 77328 80220 280164 0 0 0 206 215 178 0 1 99 0 1 1 0 77332 80220 280164 0 0 0 166 219 206 1 1 98 0 0 0 0 85632 80220 280172 0 01412 192 360 1 1 98 -- Linda A Walsh| Trust Technology, Core Linux, SGI [EMAIL PROTECTED] | Voice: (650) 933-5338 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: System unresponsitive when copying HD/HD
Alan Cox wrote: > But try 2.4.1 before worrying too much. That fixed a lot of the block > performance problems I was seeing (2.4.1 ruins the VM performance under paging > loads but the I/O speed is fixed ;)) --- Seems to have gotten a bit worse. Vmstat output after 'vmware' had completed write -- but system unresponsive and writing out a 155M file... 1 0 0 0 113960 47528 277152 0 0 0 0 397 861 1 24 75 1 0 0 0 114060 47560 277152 0 0 4 350 432 1435 4 17 79 0 0 1 0 127380 47560 266196 0 0 0 516 216 435 7 3 90 1 0 1 0 127380 47560 266196 0 0 0 240 203 173 0 1 99 0 0 1 0 127380 47560 266196 0 0 0 434 275 180 0 2 98 1 0 1 0 127376 47560 266196 0 0 0 218 204 173 0 2 98 0 0 1 0 127376 47560 266196 0 0 0 288 203 174 0 0 100 0 0 1 0 127376 47560 266196 0 0 0 337 230 176 0 1 99 0 0 1 0 127376 47560 266196 0 0 0 267 241 177 0 1 99 0 0 1 0 127376 47560 266196 0 0 0 210 204 173 0 1 99 0 0 1 0 127376 47560 266196 0 0 0 204 203 173 0 1 99 0 0 1 0 127376 47560 266196 0 0 0 216 212 250 0 1 99 0 0 1 0 127376 47560 266196 0 0 0 208 205 172 0 2 98 0 0 1 0 127372 47560 266196 0 0 0 225 203 160 0 2 98 0 0 1 0 127372 47560 266196 0 0 0 316 214 212 0 1 99 1 0 1 0 127144 47560 266196 0 0 0 281 218 304 1 2 96 0 0 0 0 127144 47560 266196 0 0 0 1 161 240 1 0 99 0 0 0 0 127144 47560 266196 0 0 0 0 101 232 0 1 99 --- What is the meaning of having a process in the 'w' column? On other systems, I was used to that meaning an executable had been *swapped* out completely (as opposed to no pages mapped in) and that it meant your system vm was 'thrashing'. But that obviously isn't the case here. Those columns are output from a 'vmstat 5'. Meaning it took about 70 seconds to write out 158M. Or about 2.2M/s. That's probably not bad. It still locks up the system for over a minute though -- which is really undesirable performance for interactive use. I'm guessing the vmstat output numbers are showing 4K? 8K? blocks? 8K would about make sense for the 2.2M average. -- Linda A Walsh| Trust Technology, Core Linux, SGI [EMAIL PROTECTED] | Voice: (650) 933-5338 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: System unresponsitive when copying HD/HD
I've noticed less responsive disk response on 2.4.0 vs. 2.2.17. For example -- I run vmware and suspend it frequently when I'm not using it. One of them requires a 158Mb save file. Before, I could suspend that one, then start another which reads in a smaller 50M save file. The smaller one would come up while the other was still saving. As of 2.4, the smaller one doesn't come up -- I can't even do an 'ls' until the big save finishes. Now big image program has actually exited and I can close the window -- the disk writes are going on from the disk cache with 'kupdate' taking some minor fraction (<1%) of the CPU and the rest of the system being mostly idle. If I have vmstat running, I notice blocks trickling out to the disk, 5sec averages 495,142,151,155,136,257,15,0. Note that the maximum read rate (hdparm -t) of this disk is in the 12-14M/s range. I'm getting about 1-5% of that on output with the system's disk subsystem being apparently unable to do anything else. This is with IDE hard disk with DMA enabled. a) is this expected performance on a large linear write? b) should I expect other disk operations to be denied service as long as the write is 'flushing'? -l - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Power usage Q and parallel make question (separate issues)
Keith Owens wrote: > > On Wed, 31 Jan 2001 19:02:03 -0800, > LA Walsh <[EMAIL PROTECTED]> wrote: > >This seems to serialize the delete, run the mod-installs in parallel, then run the > >depmod when they are done. > > It works, until somebody does this > > make -j 4 modules modules_install --- But that doesn't work now. > There is not, and never has been, any interlock between make modules > and make modules_install. If you let modules_install run in parallel > then people will be tempted to issue the incorrect command above > instead of the required separate commands. --- > > make -j 4 modules > make -j 4 modules_install > > You gain a few seconds on module_install but leave more room for user > error. --- A bit of documentation at the beginning of the Makefile would do wonders for kernel-developer (not end user, please!) clarity. I've oft'asked the question as to what really is supported. I've tried things like make dep bzImage modules -- I noticed it didn't work fairly quickly. Same with modules/modules_install -- people would probably figure that one out, but just a bit of documentation would help even that. -- Linda A Walsh| Trust Technology, Core Linux, SGI [EMAIL PROTECTED] | Voice: (650) 933-5338 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Power usage Q and parallel make question (separate issues)
Keith Owens wrote: > > The only bit that could run in parallel is this one. > > .PHONY: $(patsubst %, _modinst_%, $(SUBDIRS)) > $(patsubst %, _modinst_%, $(SUBDIRS)) : > $(MAKE) -C $(patsubst _modinst_%, %, $@) modules_install > > The erase must be done first (serial), then make modules_install in > every subdir (parallel), then depmod (serial). --- Right...Wouldn't something like this work? (Seems to)

--- Makefile.old	Wed Jan 31 18:57:21 2001
+++ Makefile	Wed Jan 31 18:54:53 2001
@@ -351,8 +351,12 @@
 $(patsubst %, _mod_%, $(SUBDIRS)) : include/linux/version.h include/config/MARKER
 	$(MAKE) -C $(patsubst _mod_%, %, $@) CFLAGS="$(CFLAGS) $(MODFLAGS)" MAKING_MODULES=1 modules
 
+modules_inst_subdirs: _modinst_
+	$(MAKE) $(patsubst %, _modinst_%, $(SUBDIRS))
+
+
 .PHONY: modules_install
-modules_install: _modinst_ $(patsubst %, _modinst_%, $(SUBDIRS)) _modinst_post
+modules_install: _modinst_post
 
 .PHONY: _modinst_
 _modinst_:
@@ -372,7 +376,7 @@
 depmod_opts	:= -b $(INSTALL_MOD_PATH) -r
 endif
 .PHONY: _modinst_post
-_modinst_post: _modinst_post_pcmcia
+_modinst_post: _modinst_post_pcmcia modules_inst_subdirs
 	if [ -r System.map ]; then $(DEPMOD) -ae -F System.map $(depmod_opts) $(KERNELRELEASE); fi
 
 # Backwards compatibilty symlinks for people still using old versions

--- This seems to serialize the delete, run the mod-installs in parallel, then run the depmod when they are done. -- Linda A Walsh| Trust Technology, Core Linux, SGI [EMAIL PROTECTED] | Voice: (650) 933-5338 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Power usage Q and parallel make question (separate issues)
I remember reading some time back that on a Pentium the difference between HLT and running was about 2-3 watts vs. 15-20 watts. Does anyone know the difference for today's CPUs? P-III/P-IV or other archs? How about the difference when calling the BIOS power-save feature? With the threat of rolling blackouts here in CA, I was wondering what the power consumption might be of 100,000 or 1,000,000 CPUs in HLT vs. doing complex mathematical computation? Separately -- parallel makes: So, just about anyone I know uses make -j X [-l Y] bzImage modules, but I noticed that make modules_install isn't parallel safe in 2.4 -- since it takes much longer than the old one, it would make sense to want to run it in parallel as well, but it has a delete-old, install-new, index-new-for-deps sequence. Those "3" steps can't be done in parallel safely. Was this intentional, or would a 'fix' be desired? Is it the intention of the Makefile maintainers to allow a parallel or distributed make? I know for me it makes a noticeable difference even on a 1-CPU machine (CPU overlap with disk I/O), and with multi-CPU machines it's even more noticeable. Is a make of the kernel and/or the modules designed to be parallel safe? Is it something I should 'rely' on? If it isn't, should it be? -l -- Linda A Walsh| Trust Technology, Core Linux, SGI [EMAIL PROTECTED] | Voice: (650) 933-5338 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: seti@home and es1371
Try "freeamp". It uses darn close to 0 CPU and may not be affected by setiathome. 2nd -- renice setiathome to '19' -- you only want it to use up 'background' cputime anyway Rainer Wiener wrote: > > Hi, > > I hope you can help me. I have a problem with my on board soundcard and > seti. I have a Gigabyte GA-7ZX Creative 5880 sound chip. I use the kernel > driver es1371 and it works goot. But when I run seti@home I got some noise > in my sound when I play mp3 and other sound. But it is not every time 10s > play good than for 2 s bad and than 10s good 2s bad and so on. When I kill > seti@home every thing is ok. So what can I do? > > I have a Athlon 800 Mhz and 128 MB RAM -- Linda A Walsh| Trust Technology, Core Linux, SGI [EMAIL PROTECTED] | Voice: (650) 933-5338 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
2.4 IDE slowdown (misconfigure)
This seems to have fixed the 66% slowdown -- disk speeds w/hdparm. They are reading in the same range. For others -- my problem was that I upgraded from a 2.2.x config -- I thought 'make xconfig' would add additional new params as needed as 'make config' does. Guess I thought wrong. Thanks, Andre, for the quick help/fix! -linda > -Original Message- > From: Andre Hedrick [mailto:[EMAIL PROTECTED]] > Sent: Tuesday, January 23, 2001 11:40 PM > To: Linda Walsh > Subject: Forwarded mail > > > > CONFIG_BLK_DEV_IDEDMA_PCI=y > was > CONFIG_BLK_DEV_IDEDMA=y > > Added a few missing > > > Andre Hedrick > Linux ATA Development > - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
2.4 cpu usage...
I decided put 2.4 on my laptop. After getting config issues seemingly sorted out, still have some things I can't explain. VMware seems to run about 30% slower. X was even sluggish at times. When I'm doing 'nothing', top shows about 67% IDLE and 30% in 'system time'. I notice that the process "kapm-idled" is being counted as receiving alot of CPU time. Now this could make some sense maybe that idled is getting 30% of the time, but then there's the remaining 67% that came up idle. I shut down X -- then top showed 5% idle and 95% in "kapm-idled" (and 95% system time) which could still make sense but is probably not the output you want to see when your computer is really idle. So the kapm thing could be a "display" / accounting problem, but the slowdown in vmware/X was real. I ran a WIN Norton "Benchmark" -- comes up reliably over "300" -- usually around 320-350 under 2.2.17. Under 2.4, it came up reliably *under* 300 with typical being about 265". So...I'm bummed. I'm assuming a 30% degradation in an app is probably not expected behavior? Swap usage is '0' in both OS's (i.e. it's not a run out of memory issue). -l -- L A Walsh| Trust Technology, Core Linux, SGI [EMAIL PROTECTED] | Voice/Vmail: (650) 933-5338 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
RE: Is sendfile all that sexy?
FYI - Another use sendfile(2) might be used for. Suppose you were to generate large amounts of data -- maybe kernel profiling data, audit data, whatever, in the kernel. You want to pull that data out as fast as possible and write it to a disk or network socket. Normally, I think you'd do a "read/write" that would xfer the data into user space, then write it back to the target in system space. With sendfile, it seems, one could write a dump-daemon that used sendfile to dump the data directly out to a target file descriptor w/o it going through user space. Just make sure the internal 'raw' data is massaged into the format of a block device and voila! A side benefit would be that data in the kernel that is written to the block device would be 'queued' in the block buffers and them being marked 'dirty' and needing to be written out. The device driver marks the buffers as clean once they are pushed out of a fd by doing a 'seek' to a new (later) position in the file -- whole buffers before that point are marked 'clean' and freed. Seems like this would have the benefit of reusing an existing buffer management system for buffering while also using a single-copy to get data to the target. ??? -l -- L A Walsh| Trust Technology, Core Linux, SGI [EMAIL PROTECTED] | Voice/Vmail: (650) 933-5338 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
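A rough user-space sketch of what such a dump-daemon's inner loop might look like -- the /dev/auditdump source device here is hypothetical, and on 2.4 sendfile() requires the source descriptor to support mmap, so a real in-kernel exporter would have to provide that:

#include <sys/sendfile.h>
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>

/* Copy everything readable from src_path to dst_path without staging
 * the data through a user-space buffer. */
static int dump_to_file(const char *src_path, const char *dst_path)
{
    int in, out;
    ssize_t n;

    in = open(src_path, O_RDONLY);
    out = open(dst_path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (in < 0 || out < 0)
        return -1;

    /* NULL offset: the kernel advances the source file position itself. */
    while ((n = sendfile(out, in, NULL, 1 << 20)) > 0)
        ;   /* 1 MB per call, purely illustrative */

    close(in);
    close(out);
    return (n < 0) ? -1 : 0;
}

Called as dump_to_file("/dev/auditdump", "/var/log/audit.raw"), the data never crosses into user space; the trade-off is that the kernel side has to present the data as something mmap-able/seekable, as described above.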
RE: Linus's include file strategy redux
> From: Werner Almesberger [mailto:[EMAIL PROTECTED]] > Sent: Friday, December 15, 2000 1:21 PM > I don't think restructuring the headers in this way would cause > a long period of instability. The main problem seems to be to > decide what is officially private and what isn't. --- If someone wants to restructure headers, that's fine. I was only trying to understand the confusingly stated intentions of Linus. I was attempting to fit into those intentions, not change the world. > > Any other solution, as I see it, would break existing module code. > > Hmm, I think what I've outlined above wouldn't break more code than > your approach. Obviously, modiles currently using "private" interfaces > are in trouble either way. --- You've misunderstood. My approach would break *nothing*. If module-public include file includes a private, it would still work since 'sys' would be a directory under 'include/linux'. No new links need be added, needed or referenced. Thus nothing breaks. -l - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
RE: Linus's include file strategy redux
> From: Werner Almesberger [mailto:[EMAIL PROTECTED]] > > I think there are three possible directions wrt visibility of kernel > headers: > > - non at all - anything that needs kernel headers needs to provide them >itself > - kernel-specific extentions only; libc is self-contained, but user >space can get items from .../include/linux (the current glibc >approach) > - share as much as possible; libc relies on kernel for "standard" >definitions (the libc5 approach, and also reasonably feasible >today) > > So we get at least the following levels of visibility: > > 0) kernel-internal interfaces; should only be visible to "base" kernel > 1) public kernel interfaces; should be visible to modules (exposing > type 0 interfaces to modules may create ways to undermine the GPL) > 2) interfaces to kernel-specific user space tools (modutils, mount, > etc.); should be visible to user space that really wants them > 3) interface to common non-POSIX extensions (BSD system calls, etc.); > should be visible to user space on request, or on an opt-out basis > 4) interfaces to POSIX elements (e.g. struct stat, mode_t); should be > visible unconditionally (**) --- The problem came up in a case where I had a kernel module that included the standard kernel memory-allocation header. That header, in turn, included others, and those pulled in still more, until the chain reached files in a kernel/kernel-module-only directory. It was at that point that the externally compiled module "barfed", because, like many externally compiled modules, it expected that it could simply access all of its needed files through /usr/include/linux, which it gets by putting /usr/include in its path. I've seen commercial modules like vmware's kernel modules use a similar system where they expect /usr/include/linux to contain or point to headers for the currently running kernel. So I'm doing my compile in a 'chrooted' environment where the headers for the new kernel are installed. However, now, with the new include/kernel dir in the linux kernel, modules compiled separately out of the kernel tree have no way of finding hidden kernel include files -- even though those files may be needed for modules. Precisely -- in this case, "memory allocation" for the kernel (not userland) was needed. Arguably, this belongs(ed) in a kernel-only directory. If that directory is not /usr/include/linux or *under* /usr/include/linux, then modules need a separate way to find it -- namely a new link in /usr/include to point to the new location, or we move the internal kernel interfaces to somewhere under the current include/linux, so that while the intent of "kernel-only" is made clear, they are still accessible the way they already are, thus not requiring rewrites of all the existing makefiles. I think in my specific case, perhaps, linux/malloc.h *is* a public interface that is to be included by module writers and belongs in the 'public interface' dir -- and that's great. But it includes files like 'slab.h' which are kernel-mm-specific and may change in the future. Those files should be in the private interface dir. But that dir may still need to be included by the public interface (malloc) file. So the user should/needs to be blind to how that is handled. They shouldn't have to change their makefiles or add new links just because how 'malloc' implements its functionality changes.
This would imply that kernel-only interfaces need to be include-able within the current model -- just moved out of the existing "public-for-module" interface directory (/usr/include/linux). For that to happen transparently, that directory needs to exist under the current hierarchy (under /usr/include/linux), not parallel to it. So at that point the question becomes what we should name it under /usr/include/linux. Should it be:
1) "/usr/include/linux/sys" (my preference)
2) "/usr/include/linux/kernel"
3) "/usr/include/linux/private"
4) "/usr/include/linux/kernel-only"
5) ???
Any other solution, as I see it, would break existing module code. Comments?? Any preferences from /dev/linus? Any flaws in my logic chain? tnx, -linda - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
RE: Linus's include file strategy redux
> Huh? > % ls -ld /usr/include/linux > drwxr-xr-x6 root root18432 Sep 2 22:35 > /usr/include/linux/ > > > So if we create a separate /usr/src/linux/include/kernel dir, does that > > imply that we'll have a 2nd link: > > What 2nd link? There should be _no_ links from /usr/include to the > kernel tree. Period. Case closed. --- > ll -d /usr/include/linux lrwxrwxrwx 1 root root 26 Dec 25 1999 /usr/include/linux -> ../src/linux/include/linux/ --- I've seen this setup on RH, SuSE and Mandrake systems. I thought this was somehow normal practice? > Stuff in /usr/include is private libc copy extracted from some kernel > version. Which may have _nothing_ to the kernel you are developing for. > In the situation above they should have > -I/include > in CFLAGS. Always had to. No links, no pain in ass, no interference with > userland compiles. > > IOW, let them fix their Makefiles. --- Why would Linus want two separate directories -- one for 'kernel-only' include files and one for kernel files that may be included in user land? It seems to me, if /usr/include/linux was normally a separate directory there would be no need for him to mention a desire to create a separate kernel-only include directory, so my assumption was the linked behavior was somehow 'normal'. I think many source packages only use "-I /usr/include" and make no provision for compiling against kernel header files in different locations that need to be entered by hand. It is difficult to create an automatic package regeneration mechanism like RPM if such details need to be entered for each package. So what you seem to be saying, if I may rephrase, is that the idea of automatic package generation for some given kernel is impractical because users should be expected to edit each package makefile for their own setup with no expectation from the packages designers of a standard kernel include location? I'm not convinced this is a desirable goal. :-/ -linda - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Linus's include file strategy redux
So, I brought up the idea of a linux/sys for kernel level include files. A few other people came up with a desire of a 'kernel' dir under include, parallel w/linux. So I ran into a snag with that scenario. Let's suppose we have a module developer or a company developing a driver in their own /home/nvidia/video/drivers/newcard directory. Now they need to include kernel development files and are used to just doing the: #include Which works because in a normal compile environment they have /usr/include in their include path and /usr/include/linux points to the directory under /usr/src/linux/include. So if we create a separate /usr/src/linux/include/kernel dir, does that imply that we'll have a 2nd link: /usr/include/kernel ==> /usr/src/linux/include/kernel ? If the idea was to 'hide' kernel interfaces and make them not 'easy' to include doesn't providing a 2nd link defeat that? If we don't provide a 2nd link, how do module writers access kernel includes? If the kernel directory is under 'linux' (as in linux/sys), then the link is already there and we can just say 'don't use sys in apps'. If we create 'kernel' under 'include', it seems we'll still end up having to tell users "don't include files under directory "x"' (either kernel/ or linux/sys/) Note that putting kernel as a new directory parallel to linux requires adding another symlink -- so is that solving anything or adding more administrative "gotcha's"? -linda -- L A Walsh| Trust Technology, Core Linux, SGI [EMAIL PROTECTED] | Voice/Vmail: (650) 933-5338 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
include conventions /usr/include/linux/sys ?
Linus has mentioned a desire to move kernel internal interfaces into a separate kernel include directory. In creating some code, I'm wondering what the name of this should/will be. Does it follow that convention would point toward a linux/sys directory? -l -- L A Walsh| Trust Technology, Core Linux, SGI [EMAIL PROTECTED] | Voice/Vmail: (650) 933-5338 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
RE: IDE0 /dev/hda performance hit in 2217 on my HW - more info - maybe extended partitions
It seems to be the output of vmstat that isn't matching things. First it says it's getting near 10M/s, but if you divide 128M/27 seconds, it's more like 4.7. So where is the time being wasted? It's not in cpu either. Now lets look at hda7 where vmstat reported 2-3meg/sec. Again, the math says it's a rate near 5. So it still doesn't make sense. > -Original Message- > From: [EMAIL PROTECTED] > [mailto:[EMAIL PROTECTED]]On Behalf Of Andries Brouwer > Sent: Monday, November 13, 2000 4:59 PM > To: LA Walsh > Cc: lkml > Subject: Re: IDE0 /dev/hda performance hit in 2217 on my HW - more info > - maybe extended partitions > > > On Mon, Nov 13, 2000 at 03:47:27PM -0800, LA Walsh wrote: > > > Some further information in response to a private email, I did > hdparm -ti > > under both > > 2216 and 2217 -- they are identical -- this may be something weird > > w/extended partitions... > > What nonsense. There is nothing special with extended partitions. > Partitions influence the logical view on the disk, but not I/O. > > (But the outer rim of a disk is faster than the inner side.) > > Moreover, you report elapsed times > 0:27, 0:22, 0:24, 0:28, 0:21, 0:24, 0:27 > where is this performance hit? > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to [EMAIL PROTECTED] > Please read the FAQ at http://www.tux.org/lkml/ > - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
RE: IDE0 /dev/hda performance hit in 2217 on my HW
According to hdparm, dma was already on. It was also suggested I try setting 32-bit mode and multcount (which I had tried before and not noticed much difference). Here's the current settings and results. Note that the timings still don't make alot of sense when comparing them to the vmstat numbers. All transfers were 256M (bs=256k, count=1k). /dev/hda: multcount= 16 (on) I/O support = 1 (32-bit) unmaskirq= 0 (off) using_dma= 1 (on) keepsettings = 0 (off) nowerr = 0 (off) readonly = 0 (off) readahead= 8 (on) geometry = 3278/240/63, sectors = 49577472, start = 0 procs memoryswap io system cpu r b w swpd free buff cache si sobibo incs us sy id 0 0 0 1004 3028 436452 11372 0 1 133118 338 757 3 17 80 0 0 0 1004 3020 436456 11372 0 0 0 1 103 166 0 1 99 /dev/hda 1 0 0 1004 2932 436464 11420 0 0 2 1 103 166 0 1 99 1 0 0 1004 2276 432752 11488 0 0 13751 1 319 594 0 12 88 0 2 0 1004 2704 428192 11456 0 0 11751 2 286 529 0 14 86 1 0 0 1004 2764 423784 11456 0 0 12685 4 303 557 0 13 87 1 0 0 1004 3124 418472 11456 0 0 14144 0 323 597 0 18 82 1024+0 records in 1024+0 records out 0.01user 2.60system 0:20.13elapsed 12%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (105major+76minor)pagefaults 0swaps /dev/hda1 3 0 0 1004 2772 414760 11456 0 0 11699 1 285 530 0 11 89 0 1 0 1004 2828 411688 11328 0 0 9037 0 242 439 0 11 89 1 0 0 1004 2528 411016 11296 0 0 2854 0 146 253 0 2 98 1 0 0 1004 2208 409680 10840 0 0 11366 0 279 511 0 13 87 2 0 0 1004 2344 409584 10808 0 0 13542 0 313 588 0 17 83 1024+0 records in 1024+0 records out 0.01user 2.55system 0:26.65elapsed 9%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (104major+76minor)pagefaults 0swaps /dev/hda3 2 0 0 1004 2560 409160 11024 0 0 12850 1 308 568 0 16 84 0 1 0 1004 2832 408904 11024 0 0 8346 1 232 424 0 11 89 1 0 0 1004 2560 409160 11024 0 0 13568 0 313 583 0 10 90 2 0 0 1004 2440 409288 11024 0 0 13952 0 320 597 0 22 78 1024+0 records in 1024+0 records out 0.00user 2.81system 0:21.34elapsed 13%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (105major+76minor)pagefaults 0swaps /dev/hda4 1 0 0 1004 2308 410064 11132 0 0 8524 1 275 508 0 12 88 2 0 0 1004 2096 412124 11124 0 0 2317 1 246 454 0 10 90 1 0 0 1004 2684 413788 11124 0 0 2406 0 252 456 0 9 91 2 0 0 1004 2564 416376 11096 0 0 2496 0 257 476 0 10 90 1 0 1 1004 3104 418168 11096 0 0 2470 0 255 464 0 8 92 procs memoryswap io system cpu r b w swpd free buff cache si sobibo incs us sy id 1 0 0 1004 2884 420344 11096 0 0 2304 1 246 455 0 7 93 1024+0 records in 1024+0 records out 0.00user 2.06system 0:27.79elapsed 7%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (104major+76minor)pagefaults 0swaps /dev/hda5 2 0 0 1004 2576 423288 11096 0 0 2880 1 282 521 0 10 89 1 0 0 1004 2900 425976 11096 0 0 3123 1 297 555 0 11 89 2 0 0 1004 2164 430124 10916 0 0 3174 0 300 549 0 15 85 1 0 0 1004 2048 431724 10856 0 0 3072 0 294 548 0 11 89 1024+0 records in 1024+0 records out 0.00user 2.19system 0:21.32elapsed 10%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (104major+76minor)pagefaults 0swaps /dev/hda6 2 0 0 1004 2556 432488 10944 0 0 2781 1 278 511 1 10 89 2 0 0 1004 2104 434284 10944 0 0 3098 1 296 542 0 11 88 2 0 0 1004 2572 435432 10944 0 0 3174 0 300 564 0 11 89 1 0 0 1004 3144 435048 10944 0 0 3046 0 292 536 0 12 88 1024+0 records in 1024+0 records out 0.02user 2.15system 0:21.50elapsed 10%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (105major+76minor)pagefaults 0swaps /dev/hda7 2 0 0 1004 2556 435672 10944 0 0 3020 1 290 549 0 12 88 1 0 0 1004 3108 435316 
10916 0 0 2278 1 244 441 0 7 93 2 0 0 1004 2588 436088 10912 0 0 2906 0 283 528 0 10 90 0 1 0 1004 2324 436596 10908 0 0 2316 0 247 444 0 8 92 2 0 0 1004 2140 437248 10904 0 0 2893 1 283 527 0 10 90 1024+0 records in 1024+0 records out 0.01user 1.94system 0:24.62elapsed 7%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (104major+76minor)pagefaults 0swaps 0 0 0 1004 2416 437724 10812 0 0 1920 1 221 399 0 5
RE: IDE0 /dev/hda performance hit in 2217 on my HW - more info - maybe extended partitions
It seems to be the output of vmstat that isn't matching things. First it says it's getting near 10M/s, but if you divide 128M/27 seconds, it's more like 4.7. So where is the time being wasted? It's not in cpu either. Now I look at hda7 where vmstat reported 2000-3000 blocks/sec. Again, the math says it's a rate near 5m/s. So it still doesn't make sense. > -Original Message- > From: [EMAIL PROTECTED] > [mailto:[EMAIL PROTECTED]]On Behalf Of Andries Brouwer > Sent: Monday, November 13, 2000 4:59 PM > To: LA Walsh > Cc: lkml > Subject: Re: IDE0 /dev/hda performance hit in 2217 on my HW - more info > - maybe extended partitions > > > On Mon, Nov 13, 2000 at 03:47:27PM -0800, LA Walsh wrote: > > > Some further information in response to a private email, I did > hdparm -ti > > under both > > 2216 and 2217 -- they are identical -- this may be something weird > > w/extended partitions... > > What nonsense. There is nothing special with extended partitions. > Partitions influence the logical view on the disk, but not I/O. > > (But the outer rim of a disk is faster than the inner side.) > > Moreover, you report elapsed times > 0:27, 0:22, 0:24, 0:28, 0:21, 0:24, 0:27 > where is this performance hit? > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to [EMAIL PROTECTED] > Please read the FAQ at http://www.tux.org/lkml/ > - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
RE: IDE0 /dev/hda performance hit in 2217 on my HW - more info - maybe extended partitions
Some further information in response to a private email, I did hdparm -ti under both 2216 and 2217 -- they are identical -- this may be something weird w/extended partitions... /dev/hda: multcount= 0 (off) I/O support = 0 (default 16-bit) unmaskirq= 0 (off) using_dma= 1 (on) keepsettings = 0 (off) nowerr = 0 (off) readonly = 0 (off) readahead= 8 (on) geometry = 3278/240/63, sectors = 49577472, start = 0 Model=IBM-DARA-225000, FwRev=SHAOA50A, SerialNo=SQASQ023976 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs } RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4 BuffType=3(DualPortCache), BuffSize=418kB, MaxMultSect=16, MultSect=off DblWordIO=no, OldPIO=2, DMA=yes, OldDMA=2 CurCHS=17475/15/63, CurSects=16513875, LBA=yes, LBAsects=49577472 tDMA={min:120,rec:120}, DMA modes: mword0 mword1 mword2 IORDY=on/off, tPIO={min:240,w/IORDY:120}, PIO modes: mode3 mode4 UDMA modes: mode0 mode1 *mode2 mode3 mode4 Drive Supports : ATA/ATAPI-4 T13 1153D revision 17 : ATA-1 ATA-2 ATA-3 ATA-4 --- Speed comparisons, 2216: Timing buffered disk reads: 64 MB in 4.61 seconds = 13.88 MB/sec Timing buffered disk reads: 64 MB in 4.65 seconds = 13.76 MB/sec Timing buffered disk reads: 64 MB in 4.69 seconds = 13.65 MB/sec 2217: Timing buffered disk reads: 64 MB in 4.59 seconds = 13.94 MB/sec Timing buffered disk reads: 64 MB in 4.63 seconds = 13.82 MB/sec Timing buffered disk reads: 64 MB in 4.56 seconds = 14.04 MB/sec - After rebooting several times, I can get equally bad performance on both. :-( Here's the key. I read from /dev/hda, hda1, {hda4, hda5, hda6, hda7} hda3. The performance in reading from a, a1 and a3 is near or above 10M/s -- but in the "Extended" partition, rates from 4-7 are all under 3M/s. So what's the deal? Why do extended partitions drop performance? Here's the log. Did dd's if=device of=/dev/null, bs=128k count=1k. 
Timings are interwoven with vmstat output: procs memoryswap io system cpu r b w swpd free buff cache si sobibo incs us sy id 1 0 0 1928 3188 432352 10976 1 3 3111 3 183 424 3 6 91 0 0 0 1928 3448 432352 10984 0 0 1 0 125 352 1 1 98 0 0 0 1928 3356 432352 11016 0 0 1 3 107 180 0 0 99 /dev/hda 1 0 0 1928 2068 433716 10984 0 0 12597 3 302 598 0 11 89 1 0 0 1928 2196 433600 10972 0 0 6810 0 208 388 0 6 94 0 1 0 1928 2132 433668 10968 0 0 8806 0 239 454 0 12 88 0 1 0 1928 2132 433668 10968 0 0 5914 0 193 357 0 4 96 2 0 0 1928 2100 430184 10484 0 0 12365 0 295 558 0 12 88 1024+0 records in 1024+0 records out 0.01user 2.31system 0:27.43elapsed 8%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (104major+76minor)pagefaults 0swaps /dev/hda1 0 2 0 2572 2120 426948 10268 0 129 1180533 292 544 0 14 86 0 1 0 2572 2940 422320 10268 0 0 10972 0 275 511 0 11 89 1 0 0 2572 2660 419024 10268 0 0 10266 2 264 485 0 9 91 0 1 0 2572 2052 418192 10268 0 0 11789 0 285 554 0 13 87 2 0 0 2572 2176 418044 10296 0 0 13045 0 307 608 0 17 83 1024+0 records in 1024+0 records out 0.01user 2.83system 0:22.71elapsed 12%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (104major+76minor)pagefaults 0swaps /dev/hda3 1 0 0 2572 2048 418168 10296 0 0 14220 0 324 655 0 11 89 0 1 0 2572 2180 418040 10296 0 0 7027 3 213 398 0 7 93 0 1 0 2700 2116 418104 10424 0 26 8858 7 240 460 0 10 90 1 0 0 2956 2112 418464 10288 0 51 965113 253 488 0 17 83 1024+0 records in 1024+0 records out 0.03user 2.65system 0:24.70elapsed 10%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (104major+76minor)pagefaults 0swaps /dev/hda4 2 1 0 2952 2736 417752 10424 26 0 13216 0 310 577 0 14 86 1 0 0 2952 2192 419716 10544 26 0 2159 0 237 428 0 9 91 1 0 0 2952 2808 419488 10484 0 0 2304 2 247 456 0 9 91 1 0 0 2948 3092 420260 10476 0 0 2406 1 252 461 0 9 91 procs memoryswap io system cpu r b w swpd free buff cache si sobibo incs us sy id 1 0 0 2948 2304 421540 10476 0 0 2355 0 249 459 0 7 93 2 0 0 2948 2588 421604 10476 0 0 2496 0 257 480 0 9 91 1024+0 records in 1024+0 records out 0.01user 2.12system 0:28.64elapsed 7%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (104major+76minor)pagefaults 0swaps /dev/hda5 1 0 0 2948 2340 423172 10476 0 0 2394 1 251 471 0 8 92 1 0 0 3460 2596 425420 9988 0 102 275228 282 512 0 1
writing out disk cache
Another question that's been bugging me -- this is behavior that seems identical in 2216/2217 and not related to my ealier performance degredation post. I run VMware. It runs w/144Mg and writes out a 153M suspend file when I suspend it to disk. My system has a total of 512M, so the entire suspend file gets written to the disk buffers pronto (often under 1 second). But a 'sync' done afterwards can take anywhere from 20-40 seconds. vmstat shows a maximum b/o rate of 700, with 200-500 being typical. So, I know that the maximum write rate through the disk cache is close to 10,000 blocks/second. So why when the disk cache of a large file is 'sync'ed out, do I get such low b/o rates? Two sample 'vmstat 5' outputs during a sync were: 1 0 0 6292 13500 254572 165712 0 0 1 0 119 282 1 1 98 2 0 0 6292 13444 254572 165716 0 0 0 702 279 534 0 2 98 1 1 0 6292 13444 254572 165716 0 0 0 501 352 669 0 1 99 0 1 0 6292 13444 254572 165716 0 0 0 520 372 697 0 2 97 1 0 0 6292 13444 254572 165716 0 0 0 510 367 694 0 2 98 0 1 0 6292 13444 254572 165716 0 0 0 694 379 715 0 2 98 1 0 1 6292 13444 254572 165716 0 0 0 618 391 964 0 2 98 0 1 1 6292 13444 254572 165716 0 0 0 441 302 765 0 1 98 0 0 0 6292 13496 254572 165716 0 0 063 180 355 1 1 99 0 0 0 6292 13496 254572 165716 0 0 0 0 103 195 0 1 99 and 0 0 0 6228 18836 246036 167824 0 0 0 0 232 563 6 13 82 0 1 0 6228 18784 246036 167824 0 0 0 506 175 489 2 1 97 1 0 0 6228 18780 246036 167824 0 0 0 292 305 647 0 1 99 0 1 0 6228 18780 246036 167824 0 0 0 253 285 602 0 1 99 0 1 0 6228 18780 246036 167824 0 0 0 226 289 612 0 1 99 1 0 0 6228 18832 246036 167824 0 0 0 157 199 406 0 1 99 0 0 0 6228 18832 246036 167824 0 0 0 0 101 240 1 1 99 --- Another oddity -- If you add up the rates in the 2nd example, and multiply the average rate by 5, you get around 5200 blocks written out (for a 152M file). Note that a du on it shows it to use 155352, so it isn't that it is sparse. Is vmstat an unreliable measure? The above tests were on a 2216 system. -l -- L A Walsh| Trust Technology, Core Linux, SGI [EMAIL PROTECTED] | Voice/Vmail: (650) 933-5338 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
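One way to separate "how fast can this one file be flushed" from whatever else a global sync ends up waiting on is to time an fsync() of just the suspend file. A small sketch -- the default path is made up, so pass the real file as an argument:

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/time.h>

int main(int argc, char **argv)
{
    struct timeval t0, t1;
    const char *path = (argc > 1) ? argv[1] : "/vmware/win.suspend";
    int fd = open(path, O_WRONLY);

    if (fd < 0) {
        perror("open");
        return 1;
    }

    gettimeofday(&t0, NULL);
    if (fsync(fd) < 0)            /* waits only for this file's dirty blocks */
        perror("fsync");
    gettimeofday(&t1, NULL);

    printf("fsync(%s): %.2f s\n", path,
           (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6);
    close(fd);
    return 0;
}

If that completes much faster than a global sync, the slow part is not this file's own blocks but the rest of the dirty buffer cache (or the order in which bdflush chooses to push it out).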
IDE0 /dev/hda performance hit in 2217 on my HW
I skimmed over the archives and didn't find a mention of this. I thought I'd noticed this when I first installed 2217, but I was too busy to verify it at the time. Simple case: Under 2216, I can do a 'badblocks /dev/hda1 X'. Vmstat shows about 10,000K/s average. This is consistent with 'dd' operations I use to copy partitions for disk mirroring/backup. Under 2217, the xfer speed drops to near 1,000K/s. This is for both 'badblocks' and a 'dd' if=/dev/hda of=/dev/hdb bs=256k. In both instances, I notice a near 90% performance degradation. Haven't tried any of the latest 2.2.18's -- has there been any work that might have fixed this problem in 2218? Am I the only person who noticed this? I.e. -- maybe it's something peculiar to my HW (Inspiron 7500), IBM DARA-22.5G HD. -- L A Walsh| Trust Technology, Core Linux, SGI [EMAIL PROTECTED] | Voice/Vmail: (650) 933-5338 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
RE: Weightless process class
> One problem here is that you might end up with a weightless > process having grabbed a superblock lock, after which a > normal priority CPU hog kicks in and starves the weightless > process. --- One way would be to set a flag "I'm holding a lock" and when it releases the lock(s), deschedule it? > This makes little sense. If the system doesn't page out > the least used page in the system, the disks will be more > busy on page faults than they need to be, taking away IO > bandwidth from more important processes ;) --- Strictly speaking, true, probably nothing to make an exception for. -l - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Weightless process class
I had another thought regarding resource scheduling -- has the idea of a "weightless" process been brought up? Weightless means it doesn't count toward 'load' and the class strictly has lowest priority in the system and gets *no* CPU unless there are "idle" cycles. So even a process niced to -19 could CPU starve a weightless process. Perhaps if memory was needed, the paging code would page out weightless processes first... etc?... ?? -linda -- L A Walsh| Trust Technology, Core Linux, SGI [EMAIL PROTECTED] | Voice/Vmail: (650) 933-5338 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
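A rough sketch of how the "idle cycles only" part might look, written against a 2.4-style goodness() selection -- the SCHED_WEIGHTLESS policy bit and the helper are invented for illustration, not existing kernel code:

#include <linux/sched.h>

#define SCHED_WEIGHTLESS 0x10   /* hypothetical policy flag */

/* Called from the scheduler's selection loop.  A weightless task only
 * scores above "don't pick me" when no normal task is runnable, so
 * even a process niced to 19 starves it of CPU. */
static int weightless_goodness(struct task_struct *p, int normal_runnable)
{
    if (p->policy & SCHED_WEIGHTLESS)
        return normal_runnable ? 0 : 1;

    return p->counter + 20 - p->nice;   /* stand-in for the usual goodness */
}

The load-average and "page these out first" pieces would be separate changes: skip such tasks when counting runnable processes for the load calculation, and bias the swap-out scan toward their pages.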
RE: Disk priorities...
I wasn't so worried about a 'trick' in this situation. I'm running all the processes. Three of them were clean-up and book-keeping processes that I didn't care so much about when they ended. The foreground process was also mine -- so I'm not so worried about cheating at this point. Specifically, I'm talking about 'nice'd "down" processes -- things I want to take lower priority than what I am doing in the foreground. I'd like that to apply to disk, CPU and network usage. CPUs are getting so fast these days that the bottlenecks are more and more becoming how fast you can get the data to them. Disk drives with millisecond seek times are a problem. I don't think the disk ops have been tuned to minimize seeking, have they? -- that'd be a good algorithm to use for same-disk-priority disk requests. We'd have to start thinking about disks as 'processors' and have per-disk queues that are prioritized, with the CPU constantly feeding one request ahead of the one the disk is currently processing; that way the CPU can reorder disk operations as needed. -l > -Original Message- > From: Alexander Viro [mailto:[EMAIL PROTECTED]] > Sent: Sunday, October 01, 2000 1:52 PM > To: Rik van Riel > Cc: LA Walsh; lkml > Subject: Re: Disk priorities... > > > > > On Sun, 1 Oct 2000, Rik van Riel wrote: > > > > And if you mean reads... Good luck propagating the originator > > > information. > > > > Isn't it the case that for most of the filesystem > > reads the current process is the one that is the > > originator of the request ? > > Not true for metadata (consider the access to indirect blocks done by > pageout, for one thing). Besides, even for data it's wide open to abuse - > it doesn't take much to trick another process into populating the caches > for you. > > - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Disk priorities...
Forgive me if this has been asked before, but has there ever been any thought of having a 'nice' value for disk accesses?. I was on a server with 4 CPU's but only 2 SCSI disks. Many times I'll see 4 processes on disk wait, 3 of them at a cpu-nice of 19 while the foreground processes get bogged down by the lower priority processes due to disk contention. I've also thought before a simple 'netnice' would be good as well -- real nice and easy to use, lets see: netnice disknice cpunice nice | -p , -d , -n Just wondering... -linda -- L A Walsh| Trust Technology, Core Linux, SGI [EMAIL PROTECTED] | Voice/Vmail: (650) 933-5338 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/