Re: [Orinoco-devel] [0/5] Orinoco merge updates, part the fourth
Hello, DG> Here's yet another batch of orinoco updates. Will this patches be included in CVS-Head code? Now I checked and the last modification of orinoco.c was about 4 days ago, when switch to enable monitor mode on all firmwares was included... DG> Smaller and less significant than the last, What was the last? DG> this is basically a handful of remaining small updates before DG> tackling the big changes (wext v15, monitor and scanning). So, the next thing that will be do, will be WPA and monitor mode improvement (maybe in all firmwares)? -- Bye, abuas_zmailto:[EMAIL PROTECTED] The Bat! v3.0.2.10, Windows XP 5.1, Build 2600, Service Pack 1 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Fastboot] Re: Query: Kdump: Core Image ELF Format
Mark Williamson <[EMAIL PROTECTED]> writes:

> > The Xen guys idea of memory hotplug is another matter it sounds
> > like the want to page an OS with memory hotplug which is just
> > plain silly, and also unimplemented so I will cross that bridge
> > when I come to it.
>
> The idea isn't to page the OS per se. The guest OS is responsible for the
> fine-grain paging of its applications in the usually way to fit within its
> physical memory allocation.
>
> In order to allow coarse-grained changes in physical memory allocation (e.g. I
> want to shrink a domain by 128MB so I can run another one), XenLinux uses a
> "balloon driver", which basically allocates a load of memory and gives it
> back to Xen to be used elsewhere.
>
> This is currently invoked by the administrator, although we've talked about a
> daemon that will automatically shift memory allocations around between
> domains based on their requirements.
>
> A memory hotplug interface would clean up the ballooning interface somewhat
> (rather than using pretend allocations) but would still only be activated
> relatively infrequently.

And what I am really objecting to is Xen doing memory allocation in 4KiB chunks. Pushing the chunk size up to 2MiB or 4MiB, or even doing plain extents of memory like the old protected mode OS's did before paging, sounds more reasonable.

Without allowing the OS access to large contiguous chunks of physical memory you are asking the OS to give up significant performance tuning opportunities. Plus, by giving the OS large pages, much of the mess of needing virtual, logical and physical addresses becomes unnecessary, and the OS can simply have virtual and physical addresses as they do now.

In addition, large chunks of memory are going to work better than 4KiB chunks with whatever memory hotplug infrastructure is implemented. Memory hotplug is going to be either memory controller hotplug (in NUMA systems) or possibly DIMM hotplug in extremely fault-tolerant servers.

So please simplify everyone's lives and code and use large pages in Xen.

Eric
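To put rough numbers on the chunk-size objection, here is a minimal sketch of the arithmetic for the 128MB shrink mentioned in the quoted text. The function name chunks_needed is made up for illustration; none of this is Xen or balloon-driver code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: how many allocation units does it take to cover
 * `bytes` of memory when the allocator hands out `chunk_size` units?
 * Rounds up so a partial chunk still counts as one.
 */
uint64_t chunks_needed(uint64_t bytes, uint64_t chunk_size)
{
    assert(chunk_size != 0);
    return (bytes + chunk_size - 1) / chunk_size;
}
```

Under these assumptions, moving 128MiB takes 32768 separate 4KiB units but only 64 units at 2MiB, which is the bookkeeping difference the message is pointing at.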
Re: [RFD] 'nice' attribute for executable files
Wiktor <[EMAIL PROTECTED]> wrote:

> furthermore, on many systems root may want to make users able to run
> some program with lowered nice, but not from root account and without
> having to know the root password... i've found a way to do this using
> shell scripts combined with suid bit and strange fils ownerships, but it
> is absolute diseaster.

You want su1, or maybe sudo.

> so i thought that it would be nice to add an attribute to file
> (changable only for root) that would modify nice value of process when
> it starts. if there is one byte free in ext2/3 file metadata, maybe it
> could be used for that? i think that it woundn't be more dangerous than
> setuid bit.

Remember: xmms might be configured to spawn the shell plugin.

I guess there should be a maximum renice value ulimit instead, which would allow running almost any user task on a higher nice level, except the important stuff, with the additional benefit of being able to temporarily renice some tasks until the more important work is done.

I remember something similar being discussed for realtime tasks, but I don't remember the outcome.
Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
On Fri, Mar 25, 2005 at 09:58:40AM -0500, Dmitry Torokhov wrote:
> I wonder why ALPS reconnect failed. You don't have a serial console
> set up, do you? If not then maybe you could make a huge framebuffer to
> capture as much info as you can... I hope you have a digital camera ;)

No serial ports brought out on this laptop, and I've not tried framebuffer...

> Then do "echo 1 > /sys/modules/i8042/parameters/debug" and try to
> suspend. I am interested of data coming in and out of i8042.

Transcribed by hand, the last few bytes are

< fa            ACK
> d4 e9         GETINFO
< fa 20 00 64
> d4 ff         RESET_BAT
< fa aa 00      RET_BAT

(Because I used O= the __FILE__ is very long so each dbg() takes two lines of my 80x25 console...)

Dunno if that's helpful, sorry...

-andy
[PATCH] ppc32: Fix MPC8555 & MPC8555E device lists (updated)
Andrew,

(As I decide how long to keep the paper bag on this time, here is an updated patch. This one actually reduces the number of devices as well as removes the device from the lists which makes things work better :)

Removed the FCC3 device from the lists of devices on MPC8555 & MPC8555E since it does not exist on these processors.

Signed-off-by: Jason McMullan <[EMAIL PROTECTED]>
Signed-off-by: Kumar Gala <[EMAIL PROTECTED]>

---
diff -Nru a/arch/ppc/syslib/mpc85xx_sys.c b/arch/ppc/syslib/mpc85xx_sys.c
--- a/arch/ppc/syslib/mpc85xx_sys.c 2005-03-30 01:23:14 -06:00
+++ b/arch/ppc/syslib/mpc85xx_sys.c 2005-03-30 01:23:14 -06:00
@@ -80,7 +80,7 @@
 		.ppc_sys_name = "8555",
 		.mask = 0x,
 		.value = 0x8071,
-		.num_devices= 20,
+		.num_devices= 19,
 		.device_list= (enum ppc_sys_devices[])
 		{
 			MPC85xx_TSEC1, MPC85xx_TSEC2, MPC85xx_IIC1,
@@ -88,7 +88,7 @@
 			MPC85xx_PERFMON, MPC85xx_DUART,
 			MPC85xx_CPM_SPI, MPC85xx_CPM_I2C, MPC85xx_CPM_SCC1,
 			MPC85xx_CPM_SCC2, MPC85xx_CPM_SCC3,
-			MPC85xx_CPM_FCC1, MPC85xx_CPM_FCC2, MPC85xx_CPM_FCC3,
+			MPC85xx_CPM_FCC1, MPC85xx_CPM_FCC2,
 			MPC85xx_CPM_SMC1, MPC85xx_CPM_SMC2,
 			MPC85xx_CPM_USB,
 		},
@@ -97,7 +97,7 @@
 		.ppc_sys_name = "8555E",
 		.mask = 0x,
 		.value = 0x8079,
-		.num_devices= 21,
+		.num_devices= 20,
 		.device_list= (enum ppc_sys_devices[])
 		{
 			MPC85xx_TSEC1, MPC85xx_TSEC2, MPC85xx_IIC1,
@@ -105,7 +105,7 @@
 			MPC85xx_PERFMON, MPC85xx_DUART, MPC85xx_SEC2,
 			MPC85xx_CPM_SPI, MPC85xx_CPM_I2C, MPC85xx_CPM_SCC1,
 			MPC85xx_CPM_SCC2, MPC85xx_CPM_SCC3,
-			MPC85xx_CPM_FCC1, MPC85xx_CPM_FCC2, MPC85xx_CPM_FCC3,
+			MPC85xx_CPM_FCC1, MPC85xx_CPM_FCC2,
 			MPC85xx_CPM_SMC1, MPC85xx_CPM_SMC2,
 			MPC85xx_CPM_USB,
 		},
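The patch has to keep a hand-written num_devices in step with the length of device_list. As a side note, one way to avoid that class of mismatch is to derive the count from the array itself. The sketch below is a standalone illustration with made-up names (demo_device, demo_sys_spec, demo_8555); it is not the ppc_sys code and does not claim that mpc85xx_sys.c could be restructured this way without other changes.

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical device identifiers; FCC3 exists in the enum but not on this part. */
enum demo_device {
    DEMO_TSEC1, DEMO_TSEC2, DEMO_FCC1, DEMO_FCC2, DEMO_FCC3, DEMO_USB
};

struct demo_sys_spec {
    const char *name;
    size_t num_devices;
    const enum demo_device *device_list;
};

/* The list is named so that its size can be taken at compile time. */
static const enum demo_device demo_8555_devices[] = {
    DEMO_TSEC1, DEMO_TSEC2, DEMO_FCC1, DEMO_FCC2, DEMO_USB,
};

const struct demo_sys_spec demo_8555 = {
    .name        = "8555",
    .num_devices = ARRAY_SIZE(demo_8555_devices), /* cannot drift from the list */
    .device_list = demo_8555_devices,
};

/* Returns 1 if `dev` appears in the spec's device list, 0 otherwise. */
int demo_has_device(const struct demo_sys_spec *spec, enum demo_device dev)
{
    assert(spec != NULL);
    for (size_t i = 0; i < spec->num_devices; i++)
        if (spec->device_list[i] == dev)
            return 1;
    return 0;
}
```

With this layout, dropping an entry such as FCC3 from the list adjusts the count automatically, at the cost of one named array per system entry.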
[RFC] DDRaid higher level cluster raid
Greetings, I am pleased to be able to present today an interesting project that has kept me busy for the last couple of months. DDRaid is a cluster block device that, together with a cluster filesystem like GFS, gives you the ability to operate a "distributed data cluster" where the cluster data is distributed redundantly over the nodes of a cluster rather than using a single, shared disk. You could also use ddraid with iscsi or fiber channel disks, and it even works reasonably well as a local software raid. But the interesting thing about it to me is the distributed data aspect. As far as I know, ddraid is the first higher level cluster raid, or if that is not correct, it is certainly the first to appear as open source. It is based on Raid 3.5, a simple raid scheme I investigated earlier, and presented a paper on at Linux Kongress 2002: http://sourceware.org/cluster/ddraid/raid35.pdf Raid 3.5 has the attractive property that it can be implemented without any caching or read-before-write, which is very important for a cluster. Cluster caching is a wretchedly complex affair that is normally implemented at a higher level by the cluster filesystem and/or vfs. We certainly do not want to have two wretchedly complex layers of cluster caching if we can avoid it. This is what you would get by extending Raid 5, say, to operate across a cluster. My Raid 3.5 scheme turned out to work pretty well. Some initial benchmarks were posted yesterday, here: https://www.redhat.com/archives/linux-cluster/2005-March/msg00112.html The executive summary is that on an ideal linear load, ddraid runs about 62% faster than a single raw disk. An example of such a linear load is copying a large file. On random IO loads, ddraid performs no worse than a single raw disk. Of course, increased performance is only the secondary goal of ddraid. The primary goal is data redundancy. 
Further details on ddraid were provided in the initial project announcement, and I will not repeat them here:

https://www.redhat.com/archives/linux-cluster/2005-March/msg00034.html

My purpose today is twofold: to solicit feedback on some of the kernel issues in the ddraid driver, and to introduce some relatively approachable cluster code that is easy to install and try out, even if you don't have a cluster. In other words, I would like to begin the process of involving more of the kernel community in cluster issues. The ddraid driver is a rather nice test case for this, because it touches on most of the interesting cluster issues without being particularly big and complex.

Let me start by defining the difference between a cluster block device and a non-cluster block device. It is not necessarily what you would think. For example, you can export a block device over the network, but that does not make it a cluster block device: you can still only mount one filesystem at a time on it.

Here are some of the things we expect of a cluster block device:

  * Since multiple nodes can access the device simultaneously, the cluster block device may need to prevent these accesses from interfering in situations that the cluster filesystem itself has no knowledge of and therefore cannot handle.

  * If the cluster block device has its own metadata, access to the metadata must be synchronized across the cluster.

  * Cluster control: The cluster block device needs to respond to management commands arriving from other nodes. For example, so that an instance of the device may be created simultaneously on all nodes of the cluster, and each instance will know how to access the same underlying hardware resources.

  * Fault tolerance: If the block device relies on services provided by other nodes, those services need to be able to fail over to other nodes in the event a node fails. If a connection is temporarily broken, the cluster block device should be able to resume operation without failing any IO.

A cluster block device does not need to or should not provide:

  * Caching and cache synchronization. Except for its own metadata, a cluster block device should let the cluster filesystem and vfs take care of this.

  * Multiple access. Every block device already provides this, albeit not necessarily safely.

A cluster block device may use a cluster lock manager (e.g., gdlm) to implement whatever synchronization it needs. I did not use this approach myself. Instead I used streaming message based synchronization over standard sockets, something like DBus. I did this for efficiency, but it also has the attractive side effect of avoiding a dependency on any particular cluster lock manager. Instead I depend only on sockets.

Which brings up an issue. I implement socket failover by arranging for a userspace process to open a new link and pass it to the kernel driver using SCM_RIGHTS. I don't think I can do that with netlink. So I use PF_UNIX, and kludge
Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
On Tue, Mar 29, 2005 at 01:42:26PM -0500, Dmitry Torokhov wrote:
> Could you please try the patch below - it should fix the issues you are
[snip]
> --- dtor.orig/drivers/input/serio/serio.c
> +++ dtor/drivers/input/serio/serio.c
>  	if (!serio->drv || !serio->drv->reconnect || serio->drv->reconnect(serio)) {
> -		serio_disconnect_port(serio);
>  		/*
>  		 * Driver re-probing can take a while, so better let kseriod

Yep, that fixes it. I applied your patch to 2.6.12-rc1-mm1 and suspended and resumed 5 times in a row without any difficulty.

Thanks!

-andy
Re: Accessing data structure from kernel space
Hello sir,

I successfully added a linked list data structure to the kernel in a header file. I wrote a C source file and added it to the kernel directory, then wrote 2 system calls that read and write the linked list from user space, and recompiled the kernel. I am now able to read/write that linked list.

I want to write user data into that linked list and allow the kernel to use the information in it. Is my approach right: sending data from user space to the kernel and storing it there for as long as the OS is not rebooted?

Please reply. Thanks in advance.

regards,
linux_lover.
Re: Strange memory problem with Linux booted from U-Boot
On Mon, Mar 28, 2005 at 07:57:52PM +0500, Ara Avanesyan wrote:
> Hi,
>
> I need some help on solving this strange problem.
> Here is what I have,
> I have a loadable module (linux.2.4.20) which contains a 2 mb static gloabal
> array.
> When I load it from linux booted via U-Boot the system crashes.
> Everything works ok if I do the same thing with the same linux booted with
> RedBoot.

As usual for such problems, check how different firmware configure memory controller, etc. Get dump of relevant chip registers under U-Boot and RedBoot and compare them.

Other possible problem area can be firmware -> kernel interface. I'm not familiar with that particular chip and RedBoot, but it's not uncommon for different firmware to have different conventions for the environment in which kernel starts execution.

I'd recommend posting to the specific mail-lists, lkml doesn't seem a good place for embedded and firmware related questions :)

--
Eugene
Re: [patch] new fifo I/O elevator that really does nothing at all
On Tue, Mar 29 2005, Bill Davidsen wrote: > Jens Axboe wrote: > >On Mon, Mar 28 2005, Chen, Kenneth W wrote: > > > >>The noop elevator is still too fat for db transaction processing > >>workload. Since the db application already merged all blocks before > >>sending it down, the I/O presented to the elevator are actually not > >>merge-able anymore. Since I/O are also random, we don't want to sort > >>them either. However the noop elevator is still doing a linear search > >>on the entire list of requests in the queue. A noop elevator after > >>all isn't really noop. > >> > >>We are proposing a true no-op elevator algorithm, no merge, no > >>nothing. Just do first in and first out list management for the I/O > >>request. The best name I can come up with is "FIFO". I also piggy > >>backed the code onto noop-iosched.c. I can easily pull those code > >>into a separate file if people object. Though, I hope Jens is OK with > >>it. > > > > > >It's not quite ok, because you don't honor the insertion point in > >fifo_add_request. The only 'fat' part of the noop io scheduler is the > >merge stuff, the original plan was to move that to a hash table lookup > >instead like the other io schedulers do. So I would suggest just > >changing noop to hash the request on the end point for back merges and > >forget about front merges, since they are rare anyways. Hmm actually, > >the last merge hint should catch most of the merges at almost zero cost. > > Making the noop faster is clearly a good thing, but some database > software may depend on transaction order as provided by a true fifo > process. It would be nice to have both. Just look at the code. It does FIFO for any request that _isn't_ specified as ELEVATOR_INSERT_FRONT - which means any fs request, or any plain pc request. There is no specific reordering going on. Drivers expect to be able to add a request back at the head, for eg retrying it after a QUEUE_BUSY or similar condition. 
--
Jens Axboe
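The behaviour described in the reply (plain FIFO for ordinary requests, with head insertion kept available so a driver can put a request back for a retry) can be sketched in a few lines of standalone C. This is an illustration under stated assumptions, not the noop-iosched.c code: the types and the names fifo_add, fifo_next, INSERT_BACK and INSERT_FRONT are hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the elevator's insertion hints. */
enum insert_where { INSERT_BACK, INSERT_FRONT };

struct request {
    int id;
    struct request *next;
};

struct fifo_queue {
    struct request *head;
    struct request *tail;
};

void fifo_init(struct fifo_queue *q)
{
    q->head = q->tail = NULL;
}

/* Ordinary requests go to the tail; a requeue honours the front hint. */
void fifo_add(struct fifo_queue *q, struct request *rq, enum insert_where where)
{
    assert(q != NULL && rq != NULL);
    if (where == INSERT_FRONT) {
        rq->next = q->head;
        q->head = rq;
        if (q->tail == NULL)
            q->tail = rq;
    } else {
        rq->next = NULL;
        if (q->tail != NULL)
            q->tail->next = rq;
        else
            q->head = rq;
        q->tail = rq;
    }
}

/* Dispatch strictly from the head; returns NULL when the queue is empty. */
struct request *fifo_next(struct fifo_queue *q)
{
    struct request *rq = q->head;

    if (rq != NULL) {
        q->head = rq->next;
        if (q->head == NULL)
            q->tail = NULL;
        rq->next = NULL;
    }
    return rq;
}
```

In this sketch a request that was dispatched and then handed back with INSERT_FRONT is served again before anything queued behind it, so the original submission order is preserved across a retry.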
Re: [patch 0/8] CKRM: Core patch set
On Tue, 29 Mar 2005 22:05:30 PST, Paul Jackson wrote: > gerrit wrote: > > This is the core patch set for CKRM > > Welcome. Hi Paul. > Newcomers to CKRM might want to start reading these patches with "[patch > 8/8] CKRM: Documentation". Starting with patch 0/8 or 1/8 will be > difficult, at least if you're as dimm witted as I am. > > Even the documentation included in patch 8/8 is missing the motivation > and context essential to understanding this patch set. It might have > helped if the Introduction text at http://ckrm.sourceforge.net/ had been > included in some form, as part of patch 0/8. I'm just a little penguin > here (lkml), but from what I can tell by watching how things work, > you're going to have to "make the case" -- explain what this is, how > it's put togeher, and why it's needed. This is a sizable patch, in > lines of code, in hooks in critical places, and in amount of "new > concepts." I presume (unless you've managed to bribe or blackmail some > big penguin) you're going to have convince some others that this is > worth having. I for one am a CKRM skeptic, so won't be much help to you > in that quest. Good luck. Good point on including the pointer to the web site. As you probably noticed, there is a history of the design, papers presented, etc. Also, Jonathan Corbet did a nice write up from the discussion at the 2004 Kernel summit which is archived here: http://lwn.net/Articles/94573/ which may be of use. The OLS and LinuxTag papers are archived at the site that you pointed to and there will be a tutorial on configuring, using and writing controllers for CKRM at OLS this year. You may also want to see the previous postings of this code to LKML for more background. In short, CKRM provides very basic desktop to server workload management capabilities similar to those provided by most of the old fashioned operating systems. 
The code provides a fairly simple mechanism for adding controllers for any resource type and the code is currently widely deployed by PlanetLab, a part of Novell/SuSE's distro, and the capabilities are requested by a fair number of Linux users and customers. > I don't see any performance numbers, either on small systems, or > scalability on large systems. Certainly this patch does not fall under > the "obviously no performance impact" exclusion. Fair point. We have been running some of the smaller benchmarks but have not yet had a chance to do any kind of performance comparison based on the current code. However, when configured out, it will have zero impact. We do have some performance analysis of the code with CONFIG_CKRM set to y but no rules configured planned for the very near future. > A couple of nits: > > 1) Instead of disabling routines with #defines: > #define numtasks_put_ref(core_class) do {} while (0) > one can do it with static inlines, preserving more compiler > checking. Yeah - that works well in some cases but it turns out to not do so well when an argument to a function refers to a structure element which is not configured in. In that case, the compiler emits a reference to an undefined structure value in the case of the static inline, where otherwise the entire set of code is pre-processed away. I think we've gone through the code and used the correct balance of static inlines and #define constructs as appropriate. If we've missed any, I'm more than willing to accept a patch to correct a specific instance. > 2) I take it that the following constitutes the 'documentation' > for what is in /proc//delay. Perhaps I missed something. 
> > +	res = sprintf(buffer,"%u %llu %llu %u %llu %u %llu\n",
> > +		(unsigned int) get_delay(task,runs),
> > +		(uint64_t) get_delay(task,runcpu_total),
> > +		(uint64_t) get_delay(task,waitcpu_total),
> > +		(unsigned int) get_delay(task,num_iowaits),
> > +		(uint64_t) get_delay(task,iowait_total),
> > +		(unsigned int) get_delay(task,num_memwaits),
> > +		(uint64_t) get_delay(task,mem_iowait_total)

The code is the documentation? :) There is probably some documentation on /proc// in general and we'll see if we can get it updated appropriately. Vivek?

> 3) Typo in init/Kconfig "atleast":
>
> If you say Y here, enable the Resource Class File System and atleast

Got it - thanks! Someone liked the new word "atleast" - at least three occurrences removed.

Oh - and uniformly updated diffstats - I probably missed some when I was playing with quilt originally.

gerrit
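The point made in the reply about #define stubs versus static inlines can be shown with a small standalone sketch. Everything here is hypothetical (DEMO_CONFIG_CKRM, demo_task, class_refs, demo_put_ref); it only illustrates the preprocessor behaviour and is not taken from the CKRM patches.

```c
#include <assert.h>
#include <stddef.h>

/* Uncomment to model the feature being configured in. */
/* #define DEMO_CONFIG_CKRM 1 */

struct demo_task {
    int pid;
#ifdef DEMO_CONFIG_CKRM
    int class_refs;   /* member exists only when the feature is enabled */
#endif
};

#ifdef DEMO_CONFIG_CKRM
static inline void demo_put_ref(int *refs)
{
    (*refs)--;
}
#else
/*
 * Macro stub: the argument text is dropped by the preprocessor, so a call
 * site may name a member that does not exist in this configuration.
 * An empty static inline with the same signature would still have its
 * argument compiled, and &t->class_refs would be an error here.
 */
#define demo_put_ref(refs) do { } while (0)
#endif

int demo_exit_task(struct demo_task *t)
{
    assert(t != NULL);
    demo_put_ref(&t->class_refs);   /* builds with or without the member */
    return t->pid;
}
```

The trade-off is the one raised in the review comment: the macro form gives up type checking of the argument, which is why it is usually reserved for call sites that touch configuration-dependent members.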
Re: no need to check for NULL before calling kfree() -fs/ext2/
Hi,

In my wlan driver module, I allocated some memory using kmalloc in interrupt context. The allocation failed but did not return NULL, so I proceeded further, everything went wrong, and finally the kernel crashed. Can any one of you tell me why this is happening? I cannot use GFP_KERNEL because I'm calling this function from interrupt context and it may block. Is there any other solution for this?

I'm concerned about why kmalloc is not returning NULL when it does not succeed. Is it not necessary to check for NULL before calling kfree()?

Regards,
Lavin

Pekka J Enberg wrote:

Hi,

Paul Jackson writes:
Even such obvious changes as removing redundant checks doesn't seem to ensure a performance improvement. Jesper Juhl posted performance data for such changes in his microbenchmark a couple of days ago.

It is not a performance issue, it's an API issue. Please note that kfree() is analogous to libc free() in terms of NULL checking. People are checking NULL twice now because they're confused whether kfree() deals with it or not.

Paul Jackson writes:
Maybe we should be following your good advice:
> You don't know that until you profile!
instead of continuing to make these code changes.

I am all for profiling but it should not stop us from merging the patches because we can restore the generated code with the included (totally untested) patch.
Pekka

Signed-off-by: Pekka Enberg <[EMAIL PROTECTED]>
---

Index: 2.6/include/linux/slab.h
===
--- 2.6.orig/include/linux/slab.h 2005-03-22 14:31:30.0 +0200
+++ 2.6/include/linux/slab.h 2005-03-30 09:08:13.0 +0300
@@ -105,8 +105,14 @@
 	return __kmalloc(size, flags);
 }

+static inline void kfree(const void * p)
+{
+	if (!p)
+		return;
+	__kfree(p);
+}
+
 extern void *kcalloc(size_t, size_t, int);
-extern void kfree(const void *);
 extern unsigned int ksize(const void *);
 extern int FASTCALL(kmem_cache_reap(int));

Index: 2.6/mm/slab.c
===
--- 2.6.orig/mm/slab.c 2005-03-22 14:31:31.0 +0200
+++ 2.6/mm/slab.c 2005-03-30 09:08:45.0 +0300
@@ -2567,13 +2567,11 @@
  * Don't free memory not originally allocated by kmalloc()
  * or you will run into trouble.
  */
-void kfree (const void *objp)
+void __kfree (const void *objp)
 {
 	kmem_cache_t *c;
 	unsigned long flags;

-	if (!objp)
-		return;
 	local_irq_save(flags);
 	kfree_debugcheck(objp);
 	c = GET_PAGE_CACHE(virt_to_page(objp));
@@ -2581,7 +2579,7 @@
 	local_irq_restore(flags);
 }

-EXPORT_SYMBOL(kfree);
+EXPORT_SYMBOL(__kfree);

 #ifdef CONFIG_SMP
 /**
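The shape of the patch above (an inline wrapper that handles NULL, in front of an out-of-line routine that assumes a valid pointer) can be tried out in user space. The following is a rough analogue built on libc free(); the names demo_free, demo_raw_free and demo_slow_path_count are invented for this sketch and it says nothing about how the slab allocator itself behaves.

```c
#include <assert.h>
#include <stdlib.h>

/* Counts how often the out-of-line path is actually entered. */
static unsigned long demo_slow_path_calls;

/* Out-of-line part: assumes the caller has already filtered out NULL. */
void demo_raw_free(void *p)
{
    assert(p != NULL);
    demo_slow_path_calls++;
    free(p);
}

/*
 * Inline wrapper, analogous to the proposed kfree(): the NULL test lives
 * at the call site, so callers need no check of their own and a NULL
 * argument never reaches the out-of-line function.
 */
static inline void demo_free(void *p)
{
    if (!p)
        return;
    demo_raw_free(p);
}

unsigned long demo_slow_path_count(void)
{
    return demo_slow_path_calls;
}
```

Callers written against demo_free() can drop their own "if (p)" tests, which is the API point argued in the message; whether inlining the check helps or hurts generated code size is the part that would need profiling.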
Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-10
* Lee Revell <[EMAIL PROTECTED]> wrote:

> > could you run a bit with tracing disabled (in the .config) on the C3?
> > (but wakeup timing still enabled) It may very well be tracing overhead
> > that makes those latencies that high. Also, we'd thus have some hard
> > data on how much overhead tracing is in such a situation, on that CPU.
>
> I have not left it to run overnight yet with the swappiness set to
> 100, which triggers the biggest latencies as my entire desktop is
> swapped out, but so far it looks like the problem was tracing
> overhead. With timing enabled but tracing disabled the longest
> latency on the C3 so far is 270 usecs.
>
> An important giveaway is that with tracing enabled the same code path
> only triggers ~200 usec latencies on the K7 but ~2ms on the C3. Since
> the longest latency with PREEMPT_DESKTOP is normally more a function
> of memory bandwidth than processor speed, and the machines differ much
> more in the latter, this agrees with the theory that the overhead is
> the problem.

besides cycle overhead, function tracing increases cache footprint - and with a CPU that has smaller caches (such as the C3) it can tip a loop over the edge, and can make it cache-thrashing, while it would fit into the cache before. In such a situation the difference can be dramatic. (on CPUs with larger caches similar artifacts can happen too, but it needs a 'fatter' loop, which are apparently rarer.)

	Ingo
Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07
* Steven Rostedt <[EMAIL PROTECTED]> wrote:

> OK, I'm declaring defeat here. I've been fighting race conditions all
> day, and it's now 1 in the morning where I live. It looks like this
> implementation has no other choice but to have the waking up "pending
> owner" take the wait_list lock once again. How heavy of a overhead is
> that really?

as i mentioned it before, taking a lock is not a big issue at all. Since you have to touch the lock data structure anyway (and all of it fits into a single cacheline), it doesn't really matter whether it's atomic flag setting/clearing, or raw spinlock based.

later on, once things are stable and well-understood, we can still attempt to micro-optimize it.

	Ingo
Re: [PATCH] fixup newly added jsm driver
On Tue, Mar 29, 2005 at 05:56:13PM -0500, Jeff Garzik wrote:
> It got a decent review, but from Christoph's list it looks like not all
> the issues raised during the review got addressed.

Exactly. Most reviews take more than one pass.
Re: [patch 1/2] fork_connector: add a fork connector
Guillaume wrote:
> I'm sorry but I really don't understand why you're speaking about
> accounting when I present results about fork connector. I agree that
> ELSA is using the fork connector but the fork connector has nothing to
> do with accounting.

True - sorry. I kinda hijacked your thread.

I had fork_connector associated in my mind with process accounting, so made the leap from analyzing the fork_connector mechanism on its own merit, to analyzing whether it was useful for collecting the new process accounting information that was needed from forks.

In my own defense, I don't see where the motivations for fork_connector are spelled out in the presentation to this patch, and it seems that the other potential uses of it are less well explored at this point. So I think my leap was a small one ;).

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[EMAIL PROTECTED]> 1.650.933.1373, 1.925.600.0401
Re: [PATCH] fixup newly added jsm driver
On Tue, Mar 29, 2005 at 01:47:34PM -0800, Andrew Morton wrote:
> Christoph Hellwig <[EMAIL PROTECTED]> wrote:
> >
> > One more prematurely added drivers..
>
> This driver was first sent out for review a month ago, was upissued twice
> and generated over seventy linux-kernel emails including some from Russell
> and some from Greg. It was by no means a "premature" addition.
>
> One could say that it was inadequately reviewed, but how is one to
> determine that? If the thing has been under discussion for a month and the
> submitter says "I've addressed all comments" then it's going to get merged.

I don't think the submitter should say all issues have been addressed but that should come from a sufficiently trusted reviewer.
Re: API changes to the slab allocator for NUMA memory allocation
Christoph Lameter wrote:

> The patch makes the following function calls available to allocate memory on
> a specific node without changing the basic operation of the slab
> allocator:
>
> kmem_cache_alloc_node(kmem_cache_t *cachep, unsigned int flags, int node);
> kmalloc_node(size_t size, unsigned int flags, int node);

I intentionally didn't add a kmalloc_node() function: kmalloc is just a wrapper around kmem_find_general_cachep+kmem_cache_alloc. It exists only for efficiency. The _node functions are slow, thus a wrapper is IMHO not required.

kmalloc_node(size,flags,node) is identical to kmem_cache_alloc_node(kmem_find_general_cachep(size,flags),flags,node).

What about making kmem_find_general_cachep() public again and removing kmalloc_node()?

And I don't know if it's a good idea to make kmalloc() a special case of kmalloc_node(): It adds one parameter to every kmalloc call and kmem_cache_alloc call, virtually everyone passes -1. Does it increase the .text size?

--
Manfred
Re: [patch 1/2] fork_connector: add a fork connector
On Tue, 2005-03-29 at 22:06 -0800, dean gaudet wrote: > On Tue, 29 Mar 2005, Jay Lan wrote: > > > The fork_connector is not designed to solve accounting data collection > > problem. > > > > The accounting data collection must be done via a hook from do_exit(). > > by the time do_exit() occurs the parent may have disappeared... you do > need to record something at fork() time so that you can account to the > correct ancestor. You're right. At fork(), the "job daemon", provided by ELSA, records information about parent PID, child PID and also about the group of processes they belong to. At exit(), accounting data are recorded by CSA or BSD-like accounting. > an example of where this ancestry is useful would be the summation of all > cpu time spent by children of apache, spamd, clamd, ... You're right. One usage can be: apache, spamd and clamd can each be put in a job (a group of processes) by using the "job daemon", and all their children will then automatically belong to the same respective jobs. So the goal here is really to perform accounting per group of processes using ELSA and CSA accounting data. Best Regards, Guillaume
Re: [patch 1/2] fork_connector: add a fork connector
Guillaume wrote: > When I wrote "several user space applications" it was just to say that > this fork connector is not designed only for ELSA and fork information > is available to every listeners. So I suppose if fork_connector were not used to collect information for accounting, then someone would have to make the case that there were enough other uses, of sufficient value, to add fork_connector. We have to be a bit careful, in the kernel, to avoid adding mechanisms until we have the immediate use in hand. If we don't do this, then the kernel ends up looking like the Gargoyles on a Renaissance church - burdened with overly ornate features serving no earthly purpose. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <[EMAIL PROTECTED]> 1.650.933.1373, 1.925.600.0401
Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07
On Sat, 2005-03-26 at 11:04 -0500, Steven Rostedt wrote: > > On Fri, 25 Mar 2005, Esben Nielsen wrote: > > > > I like the idea of having the scheduler take care of it - it is a very > > optimal coded queue-system after all. That will work on UP but not on SMP. > > Having the unlock operation to set the mutex in a "partially owned" state > > will work better. The only problem I see, relative to Ingo's > > implementation, is that then the awoken task have to go in and > > change the state of the mutex, i.e. it has to lock the wait_lock again. > > Will the extra schedulings being the problem happen offen enough in > > practise to have the extra overhead? > > Another answer is to have the "pending owner" bit be part of the task > structure. A flag maybe. If a higher priority process comes in and > decides to grab the lock from this owner, it does a test_and_clear on the > this flag on the pending owner task. When the pending owner task wakes > up, it does the test_and_clear on its own bit. Who ever had the bit set > on the test wins. If the higher prio task were to clear it first, then it > takes the ownership away from the pending owner. If the pending owner > were to clear the bit first, it won and would contiue as though it got the > lock. The higher priority tasks would do this test within the wait_lock > to keep from having more than one trying to grab the lock from the pending > owner, but the pending owner won't need to do anything since it will know > if it was the new owner just by testing its own bit. OK, I'm declaring defeat here. I've been fighting race conditions all day, and it's now 1 in the morning where I live. It looks like this implementation has no other choice but to have the waking up "pending owner" take the wait_list lock once again. How heavy of a overhead is that really? The problem I've painfully discovered, is that a task trying to take the lock must test the pending owner for two things. 
One is whether the pending owner is still pending on the same lock that the task is trying to get. Since the waking up of the pending owner has no synchronous locking, it can grab the lock and then become a pending owner of another lock after the other task thinks it's still the pending owner of the lock it's trying to get, but before testing it. So the task can mistake it for still being the pending owner of this lock, when in reality it owns the lock and is pending for another. The other test must also do the test_and_clear_bit on the pending owner bit. So you need to make sure the owner not only stays the owner of the lock the task is trying to get, but also be able to do the atomic test_and_clear on the owner's pending owner bit. I can't get these two in sync without grabbing a lock (in this case, the wait_list lock). Ingo, unless you can think of a way to do this, tomorrow (actually today), I'll change this to have the end of __down (and friends) grab the wait_list lock to test and clear its pending owner bit. -- Steve
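The test-and-clear arbitration described above can be sketched with C11 atomics. This is a userspace illustration under stated assumptions, not the -RT patch: `struct task`, `PENDING_OWNER`, `mark_pending_owner()` and `test_and_clear_pending()` are hypothetical names, and `atomic_fetch_and()` stands in for the kernel's test_and_clear_bit().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define PENDING_OWNER 0x1u /* hypothetical flag bit in the task flags word */

struct task {
    atomic_uint flags;
};

/* Called when the lock is handed to the task that is about to wake up. */
static void mark_pending_owner(struct task *t)
{
    assert(t != NULL);
    atomic_fetch_or(&t->flags, PENDING_OWNER);
}

/*
 * Atomically clear the bit and report whether it was set beforehand.
 * Of all callers racing on the same task, exactly one sees "true".
 */
static bool test_and_clear_pending(struct task *t)
{
    unsigned int old;

    assert(t != NULL);
    old = atomic_fetch_and(&t->flags, ~PENDING_OWNER);
    return (old & PENDING_OWNER) != 0;
}
```

The sketch only settles who clears the bit first. It says nothing about which lock the task is pending on, which is the second condition Steven needs and the reason he falls back to taking the wait_list lock.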
[PATCH] the nommu support for ARM linux 2.6.10
An updated MPU and noMMU support patch for ARM against the linux 2.6.10 kernel is available at : http://opensrc.sec.samsung.com/download/linux-2.6.10-hsc1.patch.gz You can select the memory management type "MPU" or "NONE" in the arm kernel configuration menu, which was traditionally known as "armnommu" or uClinux/ARM up to 2.6.9. (sure, you can choose "MMU" for vanilla Linux :-) This differs from the other uclinux archs (e.g. m68knommu) in that it makes "singular address space" support available even on MMU platforms. You can choose "MMU" or "NONE" for your mmu based arm platform with a few modifications, i.e. virtual address --> physical address conversion. The 2.6.11.6-hsc0 patch will be available this week, and some benchmarks will be provided for both cases on the same h/w platform. An additional MPU support API is pending for some services like memory protection, even for uClinux. Any suggestions are welcome. You can reach the project home at : http://opensrc.sec.samsung.com/ Currently officially supported platforms are : s3c24a0, s5c7375, atmel, espd_s3c510b, P2001, s3c3410, s3cb0x. Thanks for contributions : Tobias Lorenz and Jiun-Shian Ho Regards, Hyok --- CHOI, HYOK-SUNG Engineer (Kernel/System Architecture) Digital Media R&D Center, Samsung Electronics Co.,Ltd. tel: +82-31-200-8594 fax: +82-31-200-3427 e-mail: [EMAIL PROTECTED] [Linux 2.6 ARM MPU/noMMU Kernel Maintainer] http://opensrc.sec.samsung.com/
Re: 2.6.12-rc1-bk2+PREEMPT_BKL: Oops at serio_interrupt
On Tuesday 29 March 2005 14:49, Alexey Dobriyan wrote: > On Tuesday 29 March 2005 23:02, Dmitry Torokhov wrote: > > On Tue, 29 Mar 2005 21:28:20 +0400, Alexey Dobriyan <[EMAIL PROTECTED]> > > wrote: > > > On Tuesday 29 March 2005 10:27, Dmitry Torokhov wrote: > > > > On Monday 28 March 2005 12:26, Alexey Dobriyan wrote: > > > > > Steps to reproduce for me: > > > > > * Boot CONFIG_PREEMPT_BKL=y kernel (.config, dmesg are attached) > > > > > * Start rebooting > > > > > * Start moving serial mouse (I have Genius NetMouse Pro) > > > > > * Right after gpm is shut down I see the oops > > > > > * The system continues to reboot > > > > > > > > Could you try the patch below, please? Thanks! > > > > > > > Input: serport - fix an Oops when closing port - should not call > > > >serio_interrupt when serio port is being unregistered. > > > > > > Doesn't work, sorry. Even worse: rebooting now also produces many pages of > > > oopsen, then hang the system. I'm willing to test any new patches. > > > > Does it oops at the same place with this patch or at some other place? > > I manage to find this in the logs (nothing more :-( ): > > Unable to handle kernel NULL pointer dereference at virtual address 0068 > printing eip: > c0202947 > *pde = > Oops: [#1] > PREEMPT > Modules linked in: ipt_REJECT ipt_state ip_conntrack iptable_filter ip_tables > binfmt_misc uhci_hcd snd_intel8x0 snd_ac97_codec snd_pcm snd_timer snd > soundcore snd_page_alloc floppy > CPU:0 > EIP:0060:[]Not tainted VLI > EFLAGS: 00010282 (2.6.12-rc1-bk2-serio) > > According to vmlinux, c0202947 is at: > > c020293e : Could you please try this one instead? Thanks! 
-- Dmitry

 serport.c |   98 +++---
 1 files changed, 68 insertions(+), 30 deletions(-)

Index: dtor/drivers/input/serio/serport.c
===
--- dtor.orig/drivers/input/serio/serport.c
+++ dtor/drivers/input/serio/serport.c
@@ -27,11 +27,15 @@ MODULE_LICENSE("GPL");
 MODULE_ALIAS_LDISC(N_MOUSE);
 #define SERPORT_BUSY	1
+#define SERPORT_ACTIVE	2
+#define SERPORT_DEAD	3
 struct serport {
 	struct tty_struct *tty;
 	wait_queue_head_t wait;
 	struct serio *serio;
+	struct serio_device_id id;
+	spinlock_t lock;
 	unsigned long flags;
 };
@@ -45,11 +49,29 @@ static int serport_serio_write(struct se
 	return -(serport->tty->driver->write(serport->tty, &data, 1) != 1);
 }
+static int serport_serio_open(struct serio *serio)
+{
+	struct serport *serport = serio->port_data;
+	unsigned long flags;
+
+	spin_lock_irqsave(&serport->lock, flags);
+	set_bit(SERPORT_ACTIVE, &serport->flags);
+	spin_unlock_irqrestore(&serport->lock, flags);
+
+	return 0;
+}
+
+
 static void serport_serio_close(struct serio *serio)
 {
 	struct serport *serport = serio->port_data;
+	unsigned long flags;
+
+	spin_lock_irqsave(&serport->lock, flags);
+	clear_bit(SERPORT_ACTIVE, &serport->flags);
+	set_bit(SERPORT_DEAD, &serport->flags);
+	spin_unlock_irqrestore(&serport->lock, flags);
-	serport->serio->id.type = 0;
 	wake_up_interruptible(&serport->wait);
 }
@@ -61,36 +83,21 @@ static void serport_serio_close(struct s
 static int serport_ldisc_open(struct tty_struct *tty)
 {
 	struct serport *serport;
-	struct serio *serio;
-	char name[64];
 	if (!capable(CAP_SYS_ADMIN))
 		return -EPERM;
-	serport = kmalloc(sizeof(struct serport), GFP_KERNEL);
-	serio = kmalloc(sizeof(struct serio), GFP_KERNEL);
-	if (unlikely(!serport || !serio)) {
-		kfree(serport);
-		kfree(serio);
+	serport = kcalloc(1, sizeof(struct serport), GFP_KERNEL);
+	if (!serport)
 		return -ENOMEM;
-	}
-	memset(serport, 0, sizeof(struct serport));
-	serport->serio = serio;
-	set_bit(TTY_DO_WRITE_WAKEUP, &tty->flags);
 	serport->tty = tty;
-	tty->disc_data = serport;
-
-	memset(serio, 0, sizeof(struct serio));
-	strlcpy(serio->name, "Serial port", sizeof(serio->name));
-	snprintf(serio->phys, sizeof(serio->phys), "%s/serio0", tty_name(tty, name));
-	serio->id.type = SERIO_RS232;
-	serio->write = serport_serio_write;
-	serio->close = serport_serio_close;
-	serio->port_data = serport;
-
+	spin_lock_init(&serport->lock);
 	init_waitqueue_head(&serport->wait);
+	tty->disc_data = serport;
+	set_bit(TTY_DO_WRITE_WAKEUP, &tty->flags);
+
 	return 0;
 }
@@ -100,7 +107,8 @@ static int serport_ldisc_open(struct tty
 static void serport_ldisc_close(struct tty_struct *tty)
 {
-	struct serport *serport = (struct serport*) tty->disc_data;
+	struct serport *serport = (struct serport *)
Re: [patch 1/2] fork_connector: add a fork connector
Dean wrote: > by the time do_exit() occurs the parent may have disappeared I don't think Jay was disagreeing with this. I think he agrees that there is to be collected: 1) the classic bsd accounting data, in do_exit 2) the fork time by some mechanism at fork time (perhaps just not the fork_connect mechanism) 3) some additional data to be harvested at exit time, for CSA I suspect you two are just tripping over words to describe this. However, this does expose another possibility. Record the original forking parent pid in another task_struct field at fork time (didn't someone else have a 'bio_pid' patch to this effect?), and add that task struct value to the list of additional items to be written out at exit time. I was skeptical that CBUS could have zero impact on fork, but recording one more word in the task struct at fork gets about as close to zero impact as one can get on fork. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <[EMAIL PROTECTED]> 1.650.933.1373, 1.925.600.0401
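Paul's alternative, recording the forking parent's pid at fork time and writing it out at exit time, might look roughly like the following userspace sketch. Nothing here is kernel code: `struct task`, `struct exit_record`, `task_fork()` and `task_exit_record()` are hypothetical names chosen for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-in for task_struct. */
struct task {
    int pid;
    int forker_pid;          /* pid of the process that forked us */
    unsigned long cpu_ticks; /* accumulated while the task runs */
};

/* What would be written out at exit time. */
struct exit_record {
    int pid;
    int forker_pid;
    unsigned long cpu_ticks;
};

/* Fork path: one extra word is stored in the child, nothing is sent. */
static void task_fork(const struct task *parent, struct task *child, int child_pid)
{
    assert(parent != NULL && child != NULL);
    child->pid = child_pid;
    child->forker_pid = parent->pid;
    child->cpu_ticks = 0;
}

/* Exit path: the record carries the ancestry even if the parent is gone. */
static struct exit_record task_exit_record(const struct task *t)
{
    struct exit_record r;

    assert(t != NULL);
    r.pid = t->pid;
    r.forker_pid = t->forker_pid;
    r.cpu_ticks = t->cpu_ticks;
    return r;
}
```

The fork-time cost in this sketch is a single store, which matches the "one more word in the task struct" argument; all reporting is deferred to exit.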
Re: [PATCH] Reduce stack usage in module.c
On Tue, 29 Mar 2005 09:43:12 -0800, Randy.Dunlap <[EMAIL PROTECTED]> wrote:
> Yum Rayan wrote:
> > Attempt to reduce stack usage in module.c (linux-2.6.12-rc1-mm3).
> > Specifically from checkstack.pl
> >
> > Before patch
> > --
> > who_is_doing_it: 512
> > obsolete_params: 160
> >
> > After patch
> >
> > who_is_doing_it: none
> So all function local variables are in registers?

Yes, all function local variables of the patched who_is_doing_it(...) are in registers.

> > Also while at it, fix following in who_is_doing_it(...)
> > - use only as much memory as is needed
> > - do not write past array index for the boundary case
>
> I don't see a boundary case problem with the current code,
> hence I don't see why the kmalloc(len + 1, GFP_KERNEL) is
> needed...

Let's consider the original code and len = 513

1399 static void who_is_doing_it(void)
1400 {
1401 	/* Print out all the args. */
1402 	char args[512];
1403 	unsigned long i, len = current->mm->arg_end - current->mm->arg_start;
1404
1405 	if (len > 512)
1406 		len = 512;
1407
1408 	len -= copy_from_user(args, (void *)current->mm->arg_start, len);
1409
1410 	for (i = 0; i < len; i++) {
1411 		if (args[i] == '\0')
1412 			args[i] = ' ';
1413 	}
1414 	args[i] = 0;
1415 	printk("ARGS: %s\n", args);
1416 }

After lines 1410 thru 1413, "i" will be 512. So line 1414 will be "args[512] = 0". But args is a 512-byte array whose last legally accessible element is at index 511.

> File names start one level deeper than wanted. They should begin
> with linux/ or a/ or ./ e.g.
> There are plenty of docs on this, please let me know if you need
> references to them.

Point noted. Will post patch to linux/Documentation/SubmittingPatches, hopefully making it more clear. Reworked patch at end of email.
> > > @@ -769,15 +769,25 @@ > > struct kernel_param *kp; > > unsigned int i; > > int ret; > > + char *sym_name = NULL; > > + unsigned int sym_name_len = 0; > > > > kp = kmalloc(sizeof(kp[0]) * num, GFP_KERNEL); > > if (!kp) > > return -ENOMEM; > > Style thing, I guess, but since the case of num == 0 doesn't do > anything here, I would just begin the function with: > >if (!num) >return; > or goto out; > to maintain one return point. > > and then eliminate the kmalloc()s, if (num), kfree()s, and > parse_args(). Was attempting to preserve the call flow of the previous author. But yes, this makes more sense. I changed code to return "0" for !num case. Thanks, Rayan Summary: Reduce stack usage in obsolete_params() and who_is_doing_it() Target: linux-2.6.12-rc1-mm3 Signed-off-by: Yum Rayan <[EMAIL PROTECTED]> --- a/kernel/module.c 2005-03-25 22:11:06.0 -0800 +++ b/kernel/module.c 2005-03-29 22:16:09.0 -0800 @@ -767,17 +767,27 @@ const char *strtab) { struct kernel_param *kp; - unsigned int i; + char *sym_name; + unsigned int sym_name_len, i; int ret; + if (!num) + return 0; + kp = kmalloc(sizeof(kp[0]) * num, GFP_KERNEL); if (!kp) return -ENOMEM; - for (i = 0; i < num; i++) { - char sym_name[128 + sizeof(MODULE_SYMBOL_PREFIX)]; + sym_name_len = 128 + sizeof (MODULE_SYMBOL_PREFIX); + sym_name = kmalloc(sym_name_len, GFP_KERNEL); + if (!sym_name) { + ret = -ENOMEM; + goto free_kp; + } - snprintf(sym_name, sizeof(sym_name), "%s%s", + for (i = 0; i < num; i++) { + + snprintf(sym_name, sym_name_len, "%s%s", MODULE_SYMBOL_PREFIX, obsparm[i].name); kp[i].name = obsparm[i].name; @@ -791,13 +801,15 @@ printk("%s: falsely claims to have parameter %s\n", name, obsparm[i].name); ret = -EINVAL; - goto out; + goto free_sym; } kp[i].arg = [i]; } ret = parse_args(name, args, kp, num, NULL); - out: + free_sym: + kfree(sym_name); + free_kp: kfree(kp); return ret; } @@ -1399,12 +1411,16 @@ static void who_is_doing_it(void) { /* Print out all the args. 
*/ - char args[512]; + char *args; unsigned long i, len = current->mm->arg_end - current->mm->arg_start; if (len > 512) len = 512; + args = kmalloc(len + 1, GFP_KERNEL); + if (!args) + return; + len -= copy_from_user(args, (void *)current->mm->arg_start, len); for (i = 0; i < len; i++) { @@ -1413,6 +1429,7 @@ } args[i] = 0; printk("ARGS: %s\n", args); +
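The boundary case Rayan describes (a terminator written one element past a 512-byte buffer) and the `len + 1` fix can be reproduced in userspace. This is a sketch, not the kernel function: `flatten_args()` and `ARGS_MAX` are hypothetical names, `malloc()`/`memcpy()` replace kmalloc()/copy_from_user(), and the caller frees the result.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define ARGS_MAX 512 /* same cap as in the original function */

/*
 * Copy at most ARGS_MAX bytes of a NUL-separated argument area, turn the
 * embedded NULs into spaces and terminate the result.
 */
static char *flatten_args(const char *src, size_t len)
{
    char *args;
    size_t i;

    assert(src != NULL);
    if (len > ARGS_MAX)
        len = ARGS_MAX;

    args = malloc(len + 1); /* +1 leaves room for the terminator below */
    if (!args)
        return NULL;

    memcpy(args, src, len);
    for (i = 0; i < len; i++) {
        if (args[i] == '\0')
            args[i] = ' ';
    }
    args[i] = '\0'; /* i == len here, the last valid index of the buffer */
    return args;
}
```

With a fixed `char args[512]` and `len` clamped to 512, the final store would land at index 512; sizing the buffer as `len + 1` keeps that store in bounds and also avoids allocating 512 bytes for short argument lists.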
Re: no need to check for NULL before calling kfree() -fs/ext2/
Pekka writes: > It is not a performance issue, it's an API issue. > ... > I am all for profiling but it should not stop us from merging the patches Ok - sounds right. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <[EMAIL PROTECTED]> 1.650.933.1373, 1.925.600.0401
Re: memcpy(a,b,CONST) is not inlined by gcc 3.4.1 in Linux kernel
On Wednesday 30 March 2005 05:27, Gerold Jury wrote: > > >> On Tue, Mar 29, 2005 at 05:37:06PM +0300, Denis Vlasenko wrote: > >> > /* > >> > * This looks horribly ugly, but the compiler can optimize it totally, > >> > * as the count is constant. > >> > */ > >> > static inline void * __constant_memcpy(void * to, const void * from, > >> > size_t n) { > >> > if (n <= 128) > >> > return __builtin_memcpy(to, from, n); > >> > >> The problem is that in GCC < 4.0 there is no constant propagation > >> pass before expanding builtin functions, so the __builtin_memcpy > >> call above sees a variable rather than a constant. > > > >or change "size_t n" to "const size_t n" will also fix the issue. > >As we do some (well very little and with inlining and const values) > >const progation before 4.0.0 on the trees before expanding the builtin. > > > >-- Pinski > >- > I used the following "const size_t n" change on x86_64 > and it reduced the memcpy count from 1088 to 609 with my setup and gcc 3.4.3. > (kernel 2.6.12-rc1, running now) What do you mean, 'reduced'? (/me is checking) Oh shit... 
It still emits half of memcpys, to be exact - for struct copies: arch/i386/kernel/process.c: int copy_thread(int nr, unsigned long clone_flags, unsigned long esp, unsigned long unused, struct task_struct * p, struct pt_regs * regs) { struct pt_regs * childregs; struct task_struct *tsk; int err; childregs = ((struct pt_regs *) (THREAD_SIZE + (unsigned long) p->thread_info)) - 1; *childregs = *regs; ^^^ childregs->eax = 0; childregs->esp = esp; # make arch/i386/kernel/process.s copy_thread: pushl %ebp movl%esp, %ebp pushl %edi pushl %esi pushl %ebx subl$20, %esp movl24(%ebp), %eax movl4(%eax), %esi pushl $60 leal8132(%esi), %ebx pushl 28(%ebp) pushl %ebx callmemcpy <= movl$0, 24(%ebx) movl16(%ebp), %eax movl%eax, 52(%ebx) movl24(%ebp), %edx addl$8192, %esi movl%ebx, 516(%edx) movl%esi, -32(%ebp) movl%esi, 504(%edx) movl$ret_from_fork, 512(%edx) Jakub, is there a way to instruct gcc to inine this copy, or better yet, to use user-supplied inline version of memcpy? -- vda - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: no need to check for NULL before calling kfree() -fs/ext2/
Hi, Paul Jackson writes: Even such obvious changes as removing redundant checks doesn't seem to ensure a performance improvement. Jesper Juhl posted performance data for such changes in his microbenchmark a couple of days ago. It is not a performance issue, it's an API issue. Please note that kfree() is analogous libc free() in terms of NULL checking. People are checking NULL twice now because they're confused whether kfree() deals it or not. Paul Jackson writes: Maybe we should be following your good advice: > You don't know that until you profile! instead of continuing to make these code changes. I am all for profiling but it should not stop us from merging the patches because we can restore the generated code with the included (totally untested) patch. Pekka Signed-off-by: Pekka Enberg <[EMAIL PROTECTED]> --- Index: 2.6/include/linux/slab.h === --- 2.6.orig/include/linux/slab.h 2005-03-22 14:31:30.0 +0200 +++ 2.6/include/linux/slab.h2005-03-30 09:08:13.0 +0300 @@ -105,8 +105,14 @@ return __kmalloc(size, flags); } +static inline void kfree(const void * p) +{ + if (!p) + return; + __kfree(p); +} + extern void *kcalloc(size_t, size_t, int); -extern void kfree(const void *); extern unsigned int ksize(const void *); extern int FASTCALL(kmem_cache_reap(int)); Index: 2.6/mm/slab.c === --- 2.6.orig/mm/slab.c 2005-03-22 14:31:31.0 +0200 +++ 2.6/mm/slab.c 2005-03-30 09:08:45.0 +0300 @@ -2567,13 +2567,11 @@ * Don't free memory not originally allocated by kmalloc() * or you will run into trouble. 
*/ -void kfree (const void *objp) +void __kfree (const void *objp) { kmem_cache_t *c; unsigned long flags; - if (!objp) - return; local_irq_save(flags); kfree_debugcheck(objp); c = GET_PAGE_CACHE(virt_to_page(objp)); @@ -2581,7 +2579,7 @@ local_irq_restore(flags); } -EXPORT_SYMBOL(kfree); +EXPORT_SYMBOL(__kfree); #ifdef CONFIG_SMP /** - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
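The shape of Pekka's patch, an inline NULL check in the header in front of an out-of-line worker, can be tried in userspace. This is a sketch under stated assumptions: `my_free()` and `my_free_slow()` are hypothetical stand-ins for kfree() and __kfree(), and the call counter exists only so the effect can be observed.

```c
#include <assert.h>
#include <stdlib.h>

static int slow_path_calls; /* instrumentation for the sketch only */

/* Stand-in for the out-of-line __kfree(): relies on p being non-NULL. */
static void my_free_slow(void *p)
{
    assert(p != NULL);
    slow_path_calls++;
    free(p);
}

/*
 * Stand-in for the inline kfree(): the NULL test sits at the call site,
 * so the compiler can drop it where it can prove the pointer is NULL or
 * non-NULL, and callers never need their own check.
 */
static inline void my_free(void *p)
{
    if (!p)
        return;
    my_free_slow(p);
}
```

Callers can pass NULL unconditionally, as with libc free(), and the worker is entered only for real pointers.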
Re: [ACPI] Re: [BKPATCH] ACPI for 2.6.12-rc1
On Tuesday 29 March 2005 16:13, Romano Giannetti wrote: > This is to report an issue with 2.6.11 and ACPI battery/ac. The resume is: > acpi battery with preemptive kernel do not work, while the same kernel > with no preempt works ok. I have tried to collect all the possible info; > tell me if you need something more. > > The details: > > The working kernel is 2.6.11 with the patch from the acpi-devel list to > fix acpi keys (not working otherwise). See for a description > http://bugme.osdl.org/show_bug.cgi?id=4124 If you can find AE_AML_BUFFER_LIMIT in your log, then it should be an interpreter bug. Please see http://bugzilla.kernel.org/show_bug.cgi?id=4150 Otherwise, maybe it is related to the EC driver. -- Thanks, Luming
Re: [patch 1/2] fork_connector: add a fork connector
On Tue, 29 Mar 2005, Jay Lan wrote: > The fork_connector is not designed to solve accounting data collection > problem. > > The accounting data collection must be done via a hook from do_exit(). by the time do_exit() occurs the parent may have disappeared... you do need to record something at fork() time so that you can account to the correct ancestor. an example of where this ancestry is useful would be the summation of all cpu time spent by children of apache, spamd, clamd, ... -dean
Re: [patch 0/8] CKRM: Core patch set
gerrit wrote: > This is the core patch set for CKRM Welcome. Newcomers to CKRM might want to start reading these patches with "[patch 8/8] CKRM: Documentation". Starting with patch 0/8 or 1/8 will be difficult, at least if you're as dim-witted as I am. Even the documentation included in patch 8/8 is missing the motivation and context essential to understanding this patch set. It might have helped if the Introduction text at http://ckrm.sourceforge.net/ had been included in some form, as part of patch 0/8. I'm just a little penguin here (lkml), but from what I can tell by watching how things work, you're going to have to "make the case" -- explain what this is, how it's put together, and why it's needed. This is a sizable patch, in lines of code, in hooks in critical places, and in amount of "new concepts." I presume (unless you've managed to bribe or blackmail some big penguin) you're going to have to convince some others that this is worth having. I for one am a CKRM skeptic, so won't be much help to you in that quest. Good luck. I don't see any performance numbers, either on small systems, or scalability on large systems. Certainly this patch does not fall under the "obviously no performance impact" exclusion. Here's a combined diffstat showing how much code is added by these patches, and where. Some of the patches have individual diffstats, some don't seem to.
 Documentation/ckrm/TODO          |   17
 Documentation/ckrm/ckrm_basics   |   66 ++
 Documentation/ckrm/core_usage    |   72 +++
 Documentation/ckrm/crbce         |   33 +
 Documentation/ckrm/installation  |   70 +++
 Documentation/ckrm/rbce_basics   |   67 ++
 Documentation/ckrm/rbce_usage    |   98
 fs/Makefile                      |    1
 fs/exec.c                        |    2
 fs/proc/array.c                  |   18
 fs/proc/base.c                   |   17
 fs/proc/internal.h               |    1
 fs/rcfs/Makefile                 |    9
 fs/rcfs/dir.c                    |  220 +
 fs/rcfs/inode.c                  |  160 ++
 fs/rcfs/magic.c                  |  517 ++
 fs/rcfs/rootdir.c                |  220 +
 fs/rcfs/socket_fs.c              |  280
 fs/rcfs/super.c                  |  291
 fs/rcfs/tc_magic.c               |   93
 include/linux/ckrm_ce.h          |   95
 include/linux/ckrm_events.h      |  230 +-
 include/linux/ckrm_net.h         |   42 +
 include/linux/ckrm_rc.h          |  345 +++
 include/linux/ckrm_tc.h          |   46 ++
 include/linux/ckrm_tsk.h         |   35 +
 include/linux/rcfs.h             |  116 -
 include/linux/sched.h            |  105
 include/linux/taskdelays.h       |   35 +
 include/net/sock.h               |    3
 include/net/tcp.h                |    4
 init/Kconfig                     |   68 ++
 init/main.c                      |    2
 kernel/Makefile                  |    1
 kernel/ckrm/Makefile             |   14
 kernel/ckrm/ckrm.c               |  892 +++
 kernel/ckrm/ckrm_events.c        |   86 +++
 kernel/ckrm/ckrm_numtasks.c      |  522 ++
 kernel/ckrm/ckrm_numtasks_stub.c |   53 ++
 kernel/ckrm/ckrm_sockc.c         |  559
 kernel/ckrm/ckrm_tc.c            |  745
 kernel/ckrm/ckrmutils.c          |  188
 kernel/exit.c                    |    3
 kernel/fork.c                    |   12
 kernel/sched.c                   |   20
 kernel/sys.c                     |   11
 mm/memory.c                      |   10
 net/ipv4/tcp_ipv4.c              |    5
 48 files changed, 6460 insertions(+), 39 deletions(-)

A couple of nits:

1) Instead of disabling routines with #defines:

    #define numtasks_put_ref(core_class) do {} while (0)

one can do it with static inlines, preserving more compiler checking.

2) I take it that the following constitutes the 'documentation' for what is in /proc/<pid>/delay. Perhaps I missed something.
+ res = sprintf(buffer,"%u %llu %llu %u %llu %u %llu\n", + (unsigned int) get_delay(task,runs), + (uint64_t) get_delay(task,runcpu_total), + (uint64_t) get_delay(task,waitcpu_total), + (unsigned int) get_delay(task,num_iowaits), + (uint64_t) get_delay(task,iowait_total), + (unsigned int) get_delay(task,num_memwaits), + (uint64_t) get_delay(task,mem_iowait_total) 3) Typo in init/Kconfig "atleast": If you say Y here, enable the Resource Class File System and atleast -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <[EMAIL PROTECTED]> 1.650.933.1373, 1.925.600.0401 - To unsubscribe from this list: send the line "unsubscribe
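Nit 1 above (static inline stubs instead of empty macros) can be illustrated as follows. `numtasks_put_ref` is the name from the patch; `struct core_class` with a `refcount` field and the `CONFIG_NUMTASKS_SKETCH` switch are assumptions made up for this sketch and are not taken from the CKRM code.

```c
#include <assert.h>

/* Hypothetical class object; the real CKRM structure is not shown here. */
struct core_class {
    int refcount;
};

#ifdef CONFIG_NUMTASKS_SKETCH /* hypothetical config switch */
static inline void numtasks_put_ref(struct core_class *cls)
{
    assert(cls->refcount > 0);
    cls->refcount--;
}
#else
/*
 * Disabled variant: does nothing, but unlike
 *   #define numtasks_put_ref(core_class) do {} while (0)
 * the argument is still evaluated and type-checked, so a caller passing
 * the wrong type fails to compile in both configurations.
 */
static inline void numtasks_put_ref(struct core_class *cls)
{
    (void)cls;
}
#endif
```

With the macro form, a call such as `numtasks_put_ref(some_undeclared_name)` compiles silently when the feature is off and only breaks once someone enables it.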
Re: prefetch on ppc64
Antonio Vargas writes: > Don't know exactly about power5, but G5 processor is described on IBM > docs as doing automatic whole-page prefetch read-ahead when detecting > linear accesses. Sure, but linked lists would rarely be laid out linearly in memory. Paul.
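Since hardware read-ahead only helps linear access patterns, a list walk has to issue its own hint for the next node if it wants one. A minimal sketch, assuming GCC or Clang (`__builtin_prefetch` is a compiler builtin, not part of standard C); `struct node` and `sum_list()` are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>

struct node {
    struct node *next;
    long value;
};

/* Walk a singly linked list, hinting the next node before using this one. */
static long sum_list(const struct node *head)
{
    long sum = 0;

    while (head) {
        /* Hint only: it does not fault, even when next is NULL. */
        __builtin_prefetch(head->next);
        sum += head->value;
        head = head->next;
    }
    return sum;
}
```

Whether the hint pays off depends on how much work is done per node and on the processor, so this is something to measure rather than assume.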
Re: [RFC] Driver States
On Sun, 27 Mar 2005, Adam Belay wrote:
> Dynamic power management may require devices and drivers to transition
> between various physical and logical states. I would like to start a
> discussion on how these might be defined at the bus, driver, and class
> levels.

> Bus Level
> =
> At the bus level, there are two state attributes, power and
> enable/disable. Enable/disable may mean different things on different
> buses, but they generally refer to resource decoding. A device can only
> be enabled during a non-off power state.

<...>

> Driver Level
>
> At the driver level there are two areas of interest, physical and
> logical state. There is an additional concern of transitioning between
> these states multiple times. Because a driver acts as a bridge between
> physical and logical components, I think separating these steps seems
> natural.

<...>

> *attach - allocates data structures, creates sysfs entries, prepares driver
>           to handle the hardware.
>
> *start - Sets up device resources and configures the hardware. Loads
>          firmware, etc.
>          (physical)
>
> *open - engages the hardware, and makes it usable by the class device.
>         (logical and physical)
>
> *close - disengages the hardware, and stops class level access
>          (logical and physical)
>
> *stop - physically disables the hardware
>         (physical)
>
> *detach - tears down the driver and releases it from the "struct device"
>

You have a few things here that can easily conflict, and that will be developed at different paces. I like the direction that it's going, but how do you intend to do it gradually? I.e. what to do first?

Pat
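The six driver-level operations quoted above pair up naturally (attach/detach, start/stop, open/close), which suggests a small state machine. The following is one possible reading, not Adam's proposal: the four state names, the strict ordering and `drv_transition()` are assumptions made for this sketch.

```c
#include <assert.h>
#include <stddef.h>

enum drv_state { DRV_DETACHED, DRV_ATTACHED, DRV_STARTED, DRV_OPEN };
enum drv_op { OP_ATTACH, OP_START, OP_OPEN, OP_CLOSE, OP_STOP, OP_DETACH };

/*
 * Apply one operation. Returns 0 and updates *state when the operation is
 * legal in the current state, -1 (state unchanged) otherwise.
 */
static int drv_transition(enum drv_state *state, enum drv_op op)
{
    /* Each operation is legal in exactly one state in this sketch. */
    static const struct {
        enum drv_state from, to;
    } table[] = {
        [OP_ATTACH] = { DRV_DETACHED, DRV_ATTACHED },
        [OP_START]  = { DRV_ATTACHED, DRV_STARTED },
        [OP_OPEN]   = { DRV_STARTED,  DRV_OPEN },
        [OP_CLOSE]  = { DRV_OPEN,     DRV_STARTED },
        [OP_STOP]   = { DRV_STARTED,  DRV_ATTACHED },
        [OP_DETACH] = { DRV_ATTACHED, DRV_DETACHED },
    };

    assert(state != NULL);
    if (table[op].from != *state)
        return -1;
    *state = table[op].to;
    return 0;
}
```

A table like this makes it visible that start/stop and open/close can be repeated any number of times between a single attach and detach, which is the "transitioning between these states multiple times" concern in the quoted text.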
Re: [ACPI] 2.6.12-rc1-mm[1-3]: ACPI battery monitor does not work
On Tuesday 29 March 2005 17:56, Rafael J. Wysocki wrote: > Hi, > > There is a problem on my box (Asus L5D, x86-64 kernel) with the ACPI > battery driver in the 2.6.12-rc1-mm[1-3] kernels. Namely, the battery > monitor that I use (the kpowersave applet from SUSE 9.2) is no longer able > to report the battery status (ie how much % it is loaded). It can only > check if the AC power is connected (if it is connected, kpowersave behaves > as though there was no battery in the box, and if it is not connected, > kpowersave always shows that the battery is 1% loaded). > > Also, there are big latencies on loading and accessing the battery module, > but the module loads successfully and there's nothing suspicious in dmesg. > > Please let me know if you need any additional information. > > Greets, > Rafael Could you just revert ec-mode patch, then retest? -- Thanks, Luming
Re: [PATCH] embarassing typo
On Tuesday 29 March 2005 20:40, Dmitry Torokhov wrote: >On Tuesday 29 March 2005 16:58, Michael Tokarev wrote: >> Well, it's a matter of readability mostly. For now at least, when >> char is always 8 bytes... > >Wow, that's one huge char you have there ;) Yeah, I was gonna ask what language is so complex as to need an 8 byte char? Certainly not an earthly one I'd think ;) -- Cheers, Gene "There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order." -Ed Howdershelt (Author) 99.34% setiathome rank, not too shabby for a WV hillbilly Yahoo.com and AOL/TW attorneys please note, additions to the above message by Gene Heskett are: Copyright 2005 by Maurice Eugene Heskett, all rights reserved.
Re: [patch 1/2] fork_connector: add a fork connector
On Tue, 2005-03-29 at 07:35 -0800, Paul Jackson wrote: > Guillaume wrote: > > I ran some test using the CBUS instead of the cn_netlink_send() routine > > and the overhead is nearly 0%: > > Overhead of what? Does this include merging the data and getting it to > disk? I test the overhead of sending the fork information to a user space application. The merge of the data is done later and it has nothing to do with the fork connector... > Am I even asking the right question here - is it true that this data, > when collected for accounting purposes, needs to go to disk, and that > summarizing and analyzing the data is done 'off-line', perhaps hours > later? That's the way it was 25 years ago ... but perhaps the basic > data flow appropriate for accounting has changed since then. Accounting is another problem and, as you said previously, summarizing and analyzing the data is done later. I'm sorry but I really don't understand why you're speaking about accounting when I present results about fork connector. I agree that ELSA is using the fork connector but the fork connector has nothing to do with accounting. Regards, Guillaume - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] Pageset Localization V2
This patch modifies the way pagesets in struct zone are allocated. It relocates the pagesets contained in a zone for each cpu to the node that is nearest to the cpu, instead of keeping the pagesets in the (possibly remote) target zone. This means that the operations to manage caches of pages on remote zones can be done with information available in the local zone.

The patch depends on the API changes to the slab allocator posted before this patch.

AIM7 benchmark on a 32 CPU SMP system:

w/o patches:

Tasks   jobs/min  jti  jobs/min/task     real      cpu
    1     484.68  100       484.6769    12.01     1.97   Fri Mar 25 11:01:42 2005
  100   27140.46   89       271.4046    21.44   148.71   Fri Mar 25 11:02:04 2005
  200   30792.02   82       153.9601    37.80   296.72   Fri Mar 25 11:02:42 2005
  300   32209.27   81       107.3642    54.21   451.34   Fri Mar 25 11:03:37 2005
  400   34962.83   78        87.4071    66.59   588.97   Fri Mar 25 11:04:44 2005
  500   31676.92   75        63.3538    91.87   742.71   Fri Mar 25 11:06:16 2005
  600   36032.69   73        60.0545    96.91   885.44   Fri Mar 25 11:07:54 2005
  700   35540.43   77        50.7720   114.63  1024.28   Fri Mar 25 11:09:49 2005
  800   33906.70   74        42.3834   137.32  1181.65   Fri Mar 25 11:12:06 2005
  900   34120.67   73        37.9119   153.51  1325.26   Fri Mar 25 11:14:41 2005
 1000   34802.37   74        34.8024   167.23  1465.26   Fri Mar 25 11:17:28 2005

with Slab API changes and pageset patch:

Tasks   jobs/min  jti  jobs/min/task     real      cpu
    1     485.00  100       485.        12.00     1.96   Fri Mar 25 11:46:18 2005
  100   28000.96   89       280.0096    20.79   150.45   Fri Mar 25 11:46:39 2005
  200   32285.80   79       161.4290    36.05   293.37   Fri Mar 25 11:47:16 2005
  300   40424.15   84       134.7472    43.19   438.42   Fri Mar 25 11:47:59 2005
  400   39155.01   79        97.8875    59.46   590.05   Fri Mar 25 11:48:59 2005
  500   37881.25   82        75.7625    76.82   730.19   Fri Mar 25 11:50:16 2005
  600   39083.14   78        65.1386    89.35   872.79   Fri Mar 25 11:51:46 2005
  700   38627.83   77        55.1826   105.47  1022.46   Fri Mar 25 11:53:32 2005
  800   39631.94   78        49.5399   117.48  1169.94   Fri Mar 25 11:55:30 2005
  900   36903.70   79        41.0041   141.94  1310.78   Fri Mar 25 11:57:53 2005
 1000   36201.23   77        36.2012   160.77  1458.31   Fri Mar 25 12:00:34 2005

The major improvement is in the mid range when running 100-600 tasks. For 1 task there is barely any improvement since most data will be locally allocated. In the high range other factors seem to become important.

Patch against 2.6.11.6-bk3

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
Signed-off-by: Shobhit Dayal <[EMAIL PROTECTED]>
Signed-off-by: Shai Fultheim <[EMAIL PROTECTED]>

Index: linux-2.6.11/drivers/base/node.c
===================================================================
--- linux-2.6.11.orig/drivers/base/node.c	2005-03-21 13:18:06.0 -0800
+++ linux-2.6.11/drivers/base/node.c	2005-03-21 13:22:06.0 -0800
@@ -87,7 +87,7 @@ static ssize_t node_read_numastat(struct
 	for (i = 0; i < MAX_NR_ZONES; i++) {
 		struct zone *z = &pgdat->node_zones[i];
 		for (cpu = 0; cpu < NR_CPUS; cpu++) {
-			struct per_cpu_pageset *ps = &z->pageset[cpu];
+			struct per_cpu_pageset *ps = z->pageset[cpu];
 			numa_hit += ps->numa_hit;
 			numa_miss += ps->numa_miss;
 			numa_foreign += ps->numa_foreign;
Index: linux-2.6.11/include/linux/mm.h
===================================================================
--- linux-2.6.11.orig/include/linux/mm.h	2005-03-21 13:18:06.0 -0800
+++ linux-2.6.11/include/linux/mm.h	2005-03-21 13:22:06.0 -0800
@@ -691,6 +691,7 @@ extern void mem_init(void);
 extern void show_mem(void);
 extern void si_meminfo(struct sysinfo * val);
 extern void si_meminfo_node(struct sysinfo *val, int nid);
+extern void setup_per_cpu_pageset(void);
 
 /* prio_tree.c */
 void vma_prio_tree_add(struct vm_area_struct *, struct vm_area_struct *old);
Index: linux-2.6.11/include/linux/mmzone.h
===================================================================
--- linux-2.6.11.orig/include/linux/mmzone.h	2005-03-21 13:21:59.0 -0800
+++ linux-2.6.11/include/linux/mmzone.h	2005-03-21 13:22:06.0 -0800
@@ -122,7 +122,7 @@ struct zone {
 	 */
 	unsigned long		lowmem_reserve[MAX_NR_ZONES];
 
-	struct per_cpu_pageset	pageset[NR_CPUS];
+	struct per_cpu_pageset	*pageset[NR_CPUS];
 
 	/*
 	 * free areas of different sizes
Index: linux-2.6.11/init/main.c
===================================================================
--- linux-2.6.11.orig/init/main.c
Re: Mac mini sound woes
On Wed, 2005-03-30 at 03:48 +0200, Marcin Dalecki wrote: > On 2005-03-30, at 01:39, Benjamin Herrenschmidt wrote: > > Look at the pile of junk that are most winmodem driver implementations, > > nothing I want to see in the kernel ever. Those things should be in > > userland. > > You are joking? Linux IS NOT an RT OS. Are you joking? Any system that can capture audio, do a little DSP on it and play it back without skipping can drive a Winmodem. Are you saying Linux can't possibly do that because it's not an RTOS? I bet you could implement a Winmodem driver as a JACK client. Lee - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Aligning file system data
On Tue, Mar 29, 2005 at 11:32:16PM -0500, John Richard Moser wrote: > Does crossing a > track boundary incur anything expensive? AFAIK, yes. It's going to involve some kind of seeking (even a head switch needs microjogging on modern drives), and it will certainly add latency (although I don't remember how much, off the top of my head). However, trying to control this from the kernel may be vastly harder than you're expecting (assuming a modern hard drive). You may want to look at these pages for more info: http://www.storagereview.com/guide2000/ref/hdd/geom/tracksZBR.html http://www.storagereview.com/guide2000/ref/hdd/geom/geomLogical.html Also look at the last paragraph on this page -- not the paragraph with the "Stop" sign, but the one after it: http://www.storagereview.com/guide2000/ref/hdd/geom/formatDefect.html I think this could in fact be done, but it would be a lot of effort, and the kernel would need knowledge on a per-drive-model basis (or at least it would need a way to obtain such knowledge from user space, and the per-model knowledge would need to be stored there somehow). For all I know, vendor-specific commands might also be needed in order to find out which blocks are remapped, in order to use that knowledge to avoid changing tracks spuriously. (And one other note: Since your device almost certainly has many tracks with well over 256 sectors in reality, your device is actually incapable of reading or writing a single track with a single ATA command unless it supports LBA48.) -Barry K. Nathan <[EMAIL PROTECTED]> - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 1/2] fork_connector: add a fork connector
On Tue, 2005-03-29 at 07:23 -0800, Paul Jackson wrote: > Guillaume wrote: > > The goal of the fork connector is to inform a user space application > > that a fork occurs in the kernel. This information (cpu ID, parent PID > > and child PID) can be used by several user space applications. It's not > > only for accounting. Accounting and fork_connector are two different > > things and thus, fork_connector doesn't do the merge of any kinds of > > data (and it will never do). > > Yes - it is clear that the fork_connector does this - inform user space > of fork information . I'm not saying that > fork_connector should merge data; I'm observing that it doesn't, and > that this would seem to serve the needs of accounting poorly. > > Out of curiosity, what are these 'several user space applications?' The > only one I know of is this extension to bsd accounting to include > capturing parent and child pid at fork. Probably you've mentioned some > other uses of fork_connector before here, but I missed it. During the discussion some people like Erich Focht and Ram mentioned that this information can be useful for them. I remember that Erich had in mind something like cluster-wide pid tracking in user space. When I wrote "several user space applications" it was just to say that this fork connector is not designed only for ELSA and fork information is available to every listeners. Regards, Guillaume - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: prefetch on ppc64
On Wed, 30 Mar 2005 13:55:25 +1000, Paul Mackerras <[EMAIL PROTECTED]> wrote: > Serge E. Hallyn writes: > > > While investigating the inordinate performance impact one of my patches > > seemed to be having, we tracked it down to two hlist_for_each_entry > > loops, and finally to the prefetch instruction in the loop. > > I would be interested to know what results you get if you leave the > loops using hlist_for_each_entry but change prefetch() and prefetchw() > to do the dcbt or dcbtst instruction only if the address is non-zero, > like this: > > static inline void prefetch(const void *x) > { > if (x) > __asm__ __volatile__ ("dcbt 0,%0" : : "r" (x)); > } > > static inline void prefetchw(const void *x) > { > if (x) > __asm__ __volatile__ ("dcbtst 0,%0" : : "r" (x)); > } > > It seems that doing a prefetch on a NULL pointer, while it doesn't > cause a fault, does waste time looking for a translation of the zero > address. > > Paul. Don't know exactly about power5, but G5 processor is described on IBM docs as doing automatic whole-page prefetch read-ahead when detecting linear accesses. -- Greetz, Antonio Vargas aka winden of network http://wind.codepixel.com/ Las cosas no son lo que parecen, excepto cuando parecen lo que si son. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
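The NULL case Paul describes comes up naturally in list walks: an iterator that prefetches the next node before working on the current one will hand a NULL pointer to the prefetch on the last node of every list. A minimal user-space sketch of the guarded variant follows. It is not the kernel code: `struct node`, `prefetch_nonnull`, `sum_list` and the `skipped_null_prefetches` counter are hypothetical names, and the GCC/Clang builtin `__builtin_prefetch` stands in for the ppc64 `dcbt` instruction.

```c
#include <stddef.h>

/* Hypothetical singly linked node; stands in for the kernel's hlist_node. */
struct node {
    struct node *next;
    int value;
};

/* Counts prefetches that were skipped because the pointer was NULL. */
static unsigned long skipped_null_prefetches;

/* Prefetch for reading, but only when there is something to prefetch.
 * __builtin_prefetch is a GCC/Clang builtin, used here in place of dcbt. */
static inline void prefetch_nonnull(const void *x)
{
    if (x)
        __builtin_prefetch(x, 0, 3);
    else
        skipped_null_prefetches++;
}

/* Walk the list the way a prefetching iterator would: hint the next node,
 * then work on the current one. */
static int sum_list(const struct node *head)
{
    int sum = 0;

    for (const struct node *pos = head; pos != NULL; pos = pos->next) {
        prefetch_nonnull(pos->next);    /* pos->next is NULL on the last node */
        sum += pos->value;
    }
    return sum;
}
```

With short chains, as is typical for hash buckets, the skipped NULL prefetch is a large share of all prefetches issued, which may be why the cost showed up in the two loops Serge measured. Whether the extra branch pays for itself on a given CPU is something only a measurement can settle.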
Re: Aligning file system data
In article <[EMAIL PROTECTED]> you wrote:
> How likely is it that I can actually align stuff to 31.5KiB on the
> physical disk, i.e. have each block be a track?

It is not that easy to align on tracks, even on a raw partition. Some disks have tracks of different lengths (of course, because the inner cylinders are shorter), some show a totally different geometry than they have internally, and the disks are happily remapping. With RAID and LVM the situation gets worse.

Why do you want to do those micro-optimizations? With a filesystem in between you have virtually no way to align larger files for streaming. Let the buffer cache and prefetch do what they are intended for, and be happy.

Greetings
Bernd
Re: Aligning file system data
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Well then, the verdict is reached. My original design is based around storing related data in the same block so that the track cache allows me to evade doing reads while I poke around. The design will stay the same; but the dependency on the track cache will dissappear. I'll simply consider 32KiB or 64KiB to be a nice block size, 64KiB being the biggest, and leverage the design on the kernel reading whole blocks into main memory to play with at a time. Back to designing my file system. . . . The only lasting regrets I have is that I don't have a good, fast way to do on-disk locking for a cluster file system. This would make my FS a complete solution. . . . It doesn't matter, finishing the design is a while off anyway. I still have to define several extended journal transaction types to support fault tolerant dynamic resizing (grow, shrink) while running. I don't see how to grow left; shrinking from the left is easy enough. Wait, suddenly I see how to grow left: Superblock at the end, and a bit of magic. . . . Robert Hancock wrote: > John Richard Moser wrote: > >> How likely is it that I can actually align stuff to 31.5KiB on the >> physical disk, i.e. have each block be a track? > > > I don't think this is very likely. Even being able to find out what the > physical disk arrangement is, or whether it is consistent in terms of > track size, etc. seems unlikely. > >> >> Rather than leveraging the track cache, would it be less expensive for >> me to simply read in blocks totaling about 16 or 32KiB all at once? > > > For block sizes that small I think that the kernel should be smart > enough to do this itself, there is no need to concern with such low > level details in the application. > >> How much more latency is involved in (B) than in (C)? Does crossing a >> track boundary incur anything expensive? 
> > > Given that both the disk and the kernel will likely read far more than > 32KB ahead I can't see much difference other than the overhead inside > your application.. > - -- All content of all messages exchanged herein are left in the Public Domain, unless otherwise explicitly stated. Creative brains are a valuable, limited resource. They shouldn't be wasted on re-inventing the wheel when there are so many fascinating new problems waiting out there. -- Eric Steven Raymond -BEGIN PGP SIGNATURE- Version: GnuPG v1.2.5 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org iD4DBQFCSjmPhDd4aOud5P8RAgB7AJiWq4Qiyfk1G0SJa+5ZCtJ//WH8AJ9ysogo 3z6+FLvkNgyU/k0o9HBf1w== =OPXo -END PGP SIGNATURE- - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
API changes to the slab allocator for NUMA memory allocation
The patch makes the following function calls available to allocate memory on a specific node without changing the basic operation of the slab allocator: kmem_cache_alloc_node(kmem_cache_t *cachep, unsigned int flags, int node); kmalloc_node(size_t size, unsigned int flags, int node); These are similar then to the existing node-blind functions: kmem_cache_alloc(kmem_cache_t *cachep, unsigned int flags); kmalloc(size, flags); The implementation for kmalloc_node is a slight variation on the old kmalloc function. kmem_cache_alloc_node was changed to pass flags and the node information through the existing layers of the slab allocator (which lead to some minor rearrangements). The functions at the lowest layer (kmem_getpages, cache_grow) are already node aware. Also __alloc_percpu can call kmalloc_node now. This patch is necessary for the pageset localization patch posted after this patch. The pageset patch also contains results of an AIM7 benchmark that exercises this patch. Patch against 2.6.11.6-bk3 Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> Index: linux-2.6.11/include/linux/slab.h === --- linux-2.6.11.orig/include/linux/slab.h 2005-03-29 15:02:20.0 -0800 +++ linux-2.6.11/include/linux/slab.h 2005-03-29 18:17:19.0 -0800 @@ -61,15 +61,6 @@ extern kmem_cache_t *kmem_cache_create(c void (*)(void *, kmem_cache_t *, unsigned long)); extern int kmem_cache_destroy(kmem_cache_t *); extern int kmem_cache_shrink(kmem_cache_t *); -extern void *kmem_cache_alloc(kmem_cache_t *, unsigned int __nocast); -#ifdef CONFIG_NUMA -extern void *kmem_cache_alloc_node(kmem_cache_t *, int); -#else -static inline void *kmem_cache_alloc_node(kmem_cache_t *cachep, int node) -{ - return kmem_cache_alloc(cachep, GFP_KERNEL); -} -#endif extern void kmem_cache_free(kmem_cache_t *, void *); extern unsigned int kmem_cache_size(kmem_cache_t *); @@ -80,9 +71,23 @@ struct cache_sizes { kmem_cache_t*cs_dmacachep; }; extern struct cache_sizes malloc_sizes[]; -extern void *__kmalloc(size_t, 
unsigned int __nocast); -static inline void *kmalloc(size_t size, unsigned int __nocast flags) +extern void *__kmalloc_node(size_t, unsigned int __nocast, int node); +#ifdef CONFIG_NUMA +extern void *kmem_cache_alloc_node(kmem_cache_t *, unsigned int __nocast, int); +#define kmem_cache_alloc(cachep, flags) kmem_cache_alloc_node(cachep, flags, -1) +#else +extern void *kmem_cache_alloc(kmem_cache_t *, unsigned int __nocast); +#define kmem_cache_alloc_node(cachep, flags, node) kmem_cache_alloc(cachep, flags) +#endif + +#define __kmalloc(size, flags) __kmalloc_node(size, flags, -1) +#define kmalloc(size, flags) kmalloc_node(size, flags, -1) + +/* + * Allocating memory on a specific node. + */ +static inline void *kmalloc_node(size_t size, unsigned int flags, int node) { if (__builtin_constant_p(size)) { int i = 0; @@ -98,11 +103,11 @@ static inline void *kmalloc(size_t size, __you_cannot_kmalloc_that_much(); } found: - return kmem_cache_alloc((flags & GFP_DMA) ? + return kmem_cache_alloc_node((flags & GFP_DMA) ? 
malloc_sizes[i].cs_dmacachep : - malloc_sizes[i].cs_cachep, flags); + malloc_sizes[i].cs_cachep, flags, node); } - return __kmalloc(size, flags); + return __kmalloc_node(size, flags, node); } extern void *kcalloc(size_t, size_t, unsigned int __nocast); Index: linux-2.6.11/mm/slab.c === --- linux-2.6.11.orig/mm/slab.c 2005-03-29 15:02:20.0 -0800 +++ linux-2.6.11/mm/slab.c 2005-03-29 15:02:27.0 -0800 @@ -676,7 +676,7 @@ static struct array_cache *alloc_arrayca kmem_cache_t *cachep; cachep = kmem_find_general_cachep(memsize, GFP_KERNEL); if (cachep) - nc = kmem_cache_alloc_node(cachep, cpu_to_node(cpu)); + nc = kmem_cache_alloc_node(cachep, GFP_KERNEL, cpu_to_node(cpu)); } if (!nc) nc = kmalloc(memsize, GFP_KERNEL); @@ -1988,7 +1988,7 @@ bad: #define check_slabp(x,y) do { } while(0) #endif -static void *cache_alloc_refill(kmem_cache_t *cachep, unsigned int __nocast flags) +static void *cache_alloc_refill(kmem_cache_t *cachep, unsigned int __nocast flags, int node) { int batchcount; struct kmem_list3 *l3; @@ -2070,7 +2070,7 @@ alloc_done: if (unlikely(!ac->avail)) { int x; - x = cache_grow(cachep, flags, -1); + x = cache_grow(cachep, flags, node); // cache_grow can reenable interrupts, then ac could change. ac = ac_data(cachep); @@ -2140,7 +2140,7 @@ cache_alloc_debugcheck_after(kmem_cache_
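The header change above keeps one implementation and turns the node-blind calls into thin wrappers that pass -1 as the node. The user-space model below sketches only that wrapper pattern. Every `model_` name is hypothetical, `malloc` stands in for the slab allocator, and the per-node counters exist purely so the routing can be observed; none of this is the kernel API itself.

```c
#include <stdlib.h>

#define MODEL_MAX_NODES 4     /* hypothetical node count for the model */
#define MODEL_NO_NODE   (-1)  /* the patch passes a literal -1 for "no specific node" */

/* Per-node request counters, plus one for node-blind requests. */
static unsigned long model_allocs_on_node[MODEL_MAX_NODES];
static unsigned long model_allocs_node_blind;

/* Node-aware entry point: records the placement request, then allocates.
 * The flags argument is only carried through; this model does not interpret it. */
static void *model_kmalloc_node(size_t size, unsigned int flags, int node)
{
    (void)flags;
    if (node >= 0 && node < MODEL_MAX_NODES)
        model_allocs_on_node[node]++;
    else
        model_allocs_node_blind++;
    return malloc(size);
}

/* Node-blind wrapper, mirroring
 * "#define kmalloc(size, flags) kmalloc_node(size, flags, -1)" in the patch. */
#define model_kmalloc(size, flags) \
    model_kmalloc_node((size), (flags), MODEL_NO_NODE)
```

A caller that does not care about placement keeps writing `model_kmalloc(64, 0)`, while a caller setting up per-cpu data can ask for a particular node with `model_kmalloc_node(64, 0, 2)`. That is the distinction the pageset patch relies on.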
Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-10
On Sun, 2005-03-27 at 10:58 +0200, Ingo Molnar wrote: > * Lee Revell <[EMAIL PROTECTED]> wrote: > > > Running for several days with PREEMPT_DESKTOP, on the Athlon XP the > > worst latency I am seeing is ~150 usecs! But on the C3 its about 4ms: > > could you run a bit with tracing disabled (in the .config) on the C3? > (but wakeup timing still enabled) It may very well be tracing overhead > that makes those latencies that high. Also, we'd thus have some hard > data on how much overhead tracing is in such a situation, on that CPU. > I have not left it to run overnight yet with the swappiness set to 100, which triggers the biggest latencies as my entire desktop is swapped out, but so far it looks like the problem was tracing overhead. With timing enabled but tracing disabled the longest latency on the C3 so far is 270 usecs. An important giveaway is that with tracing enabled the same code path only triggers ~200 usec latencies on the K7 but ~2ms on the C3. Since the longest latency with PREEMPT_DESKTOP is normally more a function of memory bandwidth than processor speed, and the machines differ much more in the latter, this agrees with the theory that the overhead is the problem. Lee - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Mac mini sound woes
On Wed, 2005-03-30 at 03:45 +0200, Marcin Dalecki wrote:
> > I think your misunderstanding is that you beliieve user-space can't do
> > RT. It's wrong. See JACK (jackit.sf.net), for example.
>
> I know JACK in and out. It doesn't provide what you claim.

Are you implying that "He don't know JACK!"

Sorry, couldn't resist. Move along now, nothing to see here :-)

God it's late, I need to go to bed.

Is that an American phrase? If so, it might not be understood elsewhere. So, just in case others don't understand this stupid joke: there's a phrase "You don't know Jack", which is equivalent to saying "you don't know what you're talking about". Which makes this kind of a pun.

-- Steve
Re: Aligning file system data
John Richard Moser wrote:
> How likely is it that I can actually align stuff to 31.5KiB on the
> physical disk, i.e. have each block be a track?

I don't think this is very likely. Even being able to find out what the physical disk arrangement is, or whether it is consistent in terms of track size, etc. seems unlikely.

> Rather than leveraging the track cache, would it be less expensive for
> me to simply read in blocks totaling about 16 or 32KiB all at once?

For block sizes that small I think that the kernel should be smart enough to do this itself, there is no need to concern with such low level details in the application.

> How much more latency is involved in (B) than in (C)? Does crossing a
> track boundary incur anything expensive?

Given that both the disk and the kernel will likely read far more than 32KB ahead I can't see much difference other than the overhead inside your application.

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/
Aligning file system data
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 How likely is it that I can actually align stuff to 31.5KiB on the physical disk, i.e. have each block be a track? Rather than leveraging the track cache, would it be less expensive for me to simply read in blocks totaling about 16 or 32KiB all at once? Let's say I have two situations... A) My blocks are all 31.5KiB (512 bytes/sector * 63 sectors) and aligned to tracks. The track cache on the disk stores the entire block, so repeted reads to the disk are 0mS seek. I leverage this to read a couple sectors at a time and seek as I care within the block while it's cached, making several requests to the ATA device. B) My blocks are all 32KiB and cross track boundaries. All of them exist in part in two separate tracks. Upon reading a block, I request the entire block and work with it in main memory. Which situation has less overhead? C) My blocks are all 31.5KiB and perfectly aligned within tracks. I read the entire block as in (B) and work with it in main memory. How much more latency is involved in (B) than in (C)? Does crossing a track boundary incur anything expensive? - -- All content of all messages exchanged herein are left in the Public Domain, unless otherwise explicitly stated. Creative brains are a valuable, limited resource. They shouldn't be wasted on re-inventing the wheel when there are so many fascinating new problems waiting out there. -- Eric Steven Raymond -BEGIN PGP SIGNATURE- Version: GnuPG v1.2.5 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org iD8DBQFCSivPhDd4aOud5P8RAszeAJ4wPonhpXas8IprMBUq8/NdM57aegCdEBva 24LXB3O+7GEE0XKxPBFr1L0= =iTEm -END PGP SIGNATURE- - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
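The arithmetic behind scenarios (A), (B) and (C) can be checked with a few lines of C. The sketch below assumes an idealized disk where every track holds exactly 63 sectors of 512 bytes and blocks are laid out back to back from sector 0. The replies in this thread explain why real drives do not behave like that, so treat it as a model of the question, not of the hardware. The names `track_of`, `crosses_track` and `count_crossings` are made up for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum {
    SECTOR_BYTES      = 512,
    SECTORS_PER_TRACK = 63,   /* assumption: every track has 63 sectors */
    TRACK_BYTES       = SECTOR_BYTES * SECTORS_PER_TRACK  /* 32256 bytes = 31.5 KiB */
};

/* Track index of a sector under the fixed-geometry assumption. */
static uint64_t track_of(uint64_t lba)
{
    return lba / SECTORS_PER_TRACK;
}

/* True if the block [start_lba, start_lba + nsectors) touches two tracks. */
static bool crosses_track(uint64_t start_lba, uint32_t nsectors)
{
    assert(nsectors > 0);
    return track_of(start_lba) != track_of(start_lba + nsectors - 1);
}

/* Count how many of the first nblocks back-to-back blocks straddle a boundary. */
static unsigned count_crossings(uint32_t sectors_per_block, unsigned nblocks)
{
    unsigned crossings = 0;

    for (unsigned i = 0; i < nblocks; i++)
        if (crosses_track((uint64_t)i * sectors_per_block, sectors_per_block))
            crossings++;
    return crossings;
}
```

Under these assumptions, 63-sector blocks never cross a boundary and 64-sector (32KiB) blocks always do, matching the statement in (B) that every block lies partly in two tracks. On a drive with zoned recording or remapped sectors, neither result can be relied on.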
Re: [PATCH] ppc32: CPM2 PIC cleanup irq_to_siubit array
Done, updated patch w/comment sent to Andrew. - kumar On Mar 29, 2005, at 7:10 PM, Dan Malek wrote: On Mar 29, 2005, at 5:30 PM, Kumar Gala wrote: > Cleaned up irq_to_siubit array so we no longer need to do 1 << > (31-bit), > just 1 << bit. Will you please put a comment in here that indicates this array now has this computation done? When I wrote it, these bit numbers matched the registers and the documentation, so I didn't take the time to explain. :-) Thanks. -- Dan - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] ppc32: CPM2 PIC cleanup irq_to_siubit array (updated)
Andrew, (Updated this patch to include a comment at Dan Malek's request.) Cleaned up irq_to_siubit array so we no longer need to do 1 << (31-bit), just 1 << bit. Signed-off-by: Kumar Gala <[EMAIL PROTECTED]> --- diff -Nru a/arch/ppc/syslib/cpm2_pic.c b/arch/ppc/syslib/cpm2_pic.c --- a/arch/ppc/syslib/cpm2_pic.c2005-03-29 22:15:24 -06:00 +++ b/arch/ppc/syslib/cpm2_pic.c2005-03-29 22:15:24 -06:00 @@ -32,15 +32,17 @@ 0, 0, 0, 0, 0, 0, 0, 0 }; +/* bit numbers do not match the docs, these are precomputed so the bit for + * a given irq is (1 << irq_to_siubit[irq]) */ static u_char irq_to_siubit[] = { - 31, 16, 17, 18, 19, 20, 21, 22, - 23, 24, 25, 26, 27, 28, 29, 30, - 29, 30, 16, 17, 18, 19, 20, 21, - 22, 23, 24, 25, 26, 27, 28, 31, -0, 1, 2, 3, 4, 5, 6, 7, -8, 9, 10, 11, 12, 13, 14, 15, - 15, 14, 13, 12, 11, 10, 9, 8, -7, 6, 5, 4, 3, 2, 1, 0 +0, 15, 14, 13, 12, 11, 10, 9, +8, 7, 6, 5, 4, 3, 2, 1, +2, 1, 15, 14, 13, 12, 11, 10, +9, 8, 7, 6, 5, 4, 3, 0, + 31, 30, 29, 28, 27, 26, 25, 24, + 23, 22, 21, 20, 19, 18, 17, 16, + 16, 17, 18, 19, 20, 21, 22, 23, + 24, 25, 26, 27, 28, 29, 30, 31, }; static void cpm2_mask_irq(unsigned int irq_nr) @@ -54,7 +56,7 @@ word = irq_to_siureg[irq_nr]; simr = &(cpm2_immr->im_intctl.ic_simrh); - ppc_cached_irq_mask[word] &= ~(1 << (31 - bit)); + ppc_cached_irq_mask[word] &= ~(1 << bit); simr[word] = ppc_cached_irq_mask[word]; } @@ -69,7 +71,7 @@ word = irq_to_siureg[irq_nr]; simr = &(cpm2_immr->im_intctl.ic_simrh); - ppc_cached_irq_mask[word] |= (1 << (31 - bit)); + ppc_cached_irq_mask[word] |= 1 << bit; simr[word] = ppc_cached_irq_mask[word]; } @@ -85,9 +87,9 @@ simr = &(cpm2_immr->im_intctl.ic_simrh); sipnr = &(cpm2_immr->im_intctl.ic_sipnrh); - ppc_cached_irq_mask[word] &= ~(1 << (31 - bit)); + ppc_cached_irq_mask[word] &= ~(1 << bit); simr[word] = ppc_cached_irq_mask[word]; - sipnr[word] = 1 << (31 - bit); + sipnr[word] = 1 << bit; } static void cpm2_end_irq(unsigned int irq_nr) @@ -103,7 +105,7 @@ word = irq_to_siureg[irq_nr]; simr = 
&(cpm2_immr->im_intctl.ic_simrh); - ppc_cached_irq_mask[word] |= (1 << (31 - bit)); + ppc_cached_irq_mask[word] |= 1 << bit; simr[word] = ppc_cached_irq_mask[word]; } } - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
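The table rewrite in this patch is easy to sanity-check outside the kernel: each new entry should equal 31 minus the old entry, so that `1 << new` produces the same mask as the old `1 << (31 - old)`. The stand-alone sketch below copies both tables from the diff and compares the masks. The names `old_siubit`, `new_siubit` and `first_mismatch` are made up for the check, and it uses unsigned shifts where the driver uses plain `1 << bit`.

```c
#include <stdint.h>

#define NR_SIU_IRQS 64

/* Table before the patch: bit numbers as used with 1 << (31 - bit). */
static const uint8_t old_siubit[NR_SIU_IRQS] = {
    31, 16, 17, 18, 19, 20, 21, 22,
    23, 24, 25, 26, 27, 28, 29, 30,
    29, 30, 16, 17, 18, 19, 20, 21,
    22, 23, 24, 25, 26, 27, 28, 31,
     0,  1,  2,  3,  4,  5,  6,  7,
     8,  9, 10, 11, 12, 13, 14, 15,
    15, 14, 13, 12, 11, 10,  9,  8,
     7,  6,  5,  4,  3,  2,  1,  0,
};

/* Table after the patch: precomputed so the mask is simply 1 << bit. */
static const uint8_t new_siubit[NR_SIU_IRQS] = {
     0, 15, 14, 13, 12, 11, 10,  9,
     8,  7,  6,  5,  4,  3,  2,  1,
     2,  1, 15, 14, 13, 12, 11, 10,
     9,  8,  7,  6,  5,  4,  3,  0,
    31, 30, 29, 28, 27, 26, 25, 24,
    23, 22, 21, 20, 19, 18, 17, 16,
    16, 17, 18, 19, 20, 21, 22, 23,
    24, 25, 26, 27, 28, 29, 30, 31,
};

/* Returns the first irq whose old and new masks differ, or -1 if none do. */
static int first_mismatch(void)
{
    for (int irq = 0; irq < NR_SIU_IRQS; irq++) {
        uint32_t old_mask = UINT32_C(1) << (31 - old_siubit[irq]);
        uint32_t new_mask = UINT32_C(1) << new_siubit[irq];

        if (old_mask != new_mask)
            return irq;
    }
    return -1;
}
```

For the tables as posted, `first_mismatch()` returns -1, so the mask, unmask, ack and end paths compute the same register values as before.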
Re: Mac mini sound woes
On Wed, 2005-03-30 at 03:45 +0200, Marcin Dalecki wrote: > > I think your misunderstanding is that you beliieve user-space can't do > > RT. It's wrong. See JACK (jackit.sf.net), for example. > > I know JACK in and out. It doesn't provide what you claim. > This was just an example, to prove the point that user space can do RT just fine. JACK can do low latency sample accurate audio, and mixing and volume control are fairly trivial compared to what some JACK clients do. If it works well enough for professional hard disk recording systems, then it can certainly handle system sounds and playing movies and MP3s. And as a matter of fact you can implement all the audio needs of a desktop system with JACK, this is what Linspire is doing for the next release, even though it wasn't designed for this. The system mixer is just a JACK mixing client and each app opens ports for I/O, and only JACK talks to the hardware (through ALSA). The fact that OSX and Windows do this in the kernel is not a good argument, those kernels are bloated. Windows drivers also do things like AC3 decoding in the kernel. And the OSX kernel uses 16K stacks. If audio does not work as well OOTB as on those other OSes, it's an indication of their relative maturity vs JACK/ALSA, not an inherently superior design. Most audio people consider JACK + ALSA a better design than anything in the proprietary world (CoreAudio, ASIO). Lee - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 2.4.30-rc3 md/ext3 problems (ext3 gurus : please check)
On Tuesday March 29, [EMAIL PROTECTED] wrote:
>
> Attached is the backout patch, for convenience.

Thanks. I had another look, and think I may be able to see the problem. If I'm right, it is a problem with this patch.

> diff -Nru a/fs/jbd/commit.c b/fs/jbd/commit.c
> --- a/fs/jbd/commit.c	2005-03-29 18:50:55 -03:00
> +++ b/fs/jbd/commit.c	2005-03-29 18:50:55 -03:00
> @@ -92,7 +92,7 @@
>  	struct buffer_head *wbuf[64];
>  	int bufs;
>  	int flags;
> -	int err = 0;
> +	int err;
>  	unsigned long blocknr;
>  	char *tagp = NULL;
>  	journal_header_t *header;
> @@ -299,8 +299,6 @@
>  			spin_unlock(&journal_datalist_lock);
>  			unlock_journal(journal);
>  			wait_on_buffer(bh);
> -			if (unlikely(!buffer_uptodate(bh)))
> -				err = -EIO;
>  			/* the journal_head may have been removed now */
>  			lock_journal(journal);
>  			goto write_out_data;

I think the "!buffer_uptodate(bh)" test is not safe at this point as, after the wait_on_buffer which could cause a schedule, the bh may no longer exist, or be for the same block. There doesn't seem to be any locking or refcounting that would keep it valid.

Note the comment "the journal_head may have been removed now". If the journal_head is gone, the associated buffer_head is likely gone as well.

I'm not certain that this is right, but it seems possible and would explain the symptoms. Maybe Stephen or Andrew could comment?

> --- a/mm/filemap.c	2005-03-29 18:50:55 -03:00
> +++ b/mm/filemap.c	2005-03-29 18:50:55 -03:00
> @@ -3261,12 +3261,7 @@
>  		status = generic_osync_inode(inode,
>  OSYNC_METADATA|OSYNC_DATA);
>  	}
>
> -	/*
> -	 * generic_osync_inode always returns 0 or negative value.
> -	 * So 'status < written' is always true, and written should
> -	 * be returned if status >= 0.
> -	 */
> -	err = (status < 0) ? status : written;
> +	err = written ? written : status;
>  out:
>
>  	return err;

As an aside, this looks extremely dubious to me.

There is a loop earlier in this routine (do_generic_file_write) that passes a piece-at-a-time of the write request to prepare_write / commit_write. Successes are counted in 'written'. A failure causes the loop to abort with a status in 'status'.

If some of the write succeeded and some failed, then I believe the correct behaviour is to return the number of bytes that succeeded. However this change to the return status (remember the above patch is a reversal) causes any failure to over-ride any success. This, I think, is wrong.

NeilBrown
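The two return-value policies in the filemap.c hunk differ only when a write made some progress and then failed. A small stand-alone comparison makes that concrete. The function names and the `MODEL_EIO` value are illustrative and are not taken from the kernel source; only the two expressions come from the diff.

```c
#define MODEL_EIO 5   /* illustrative errno value, used as -MODEL_EIO below */

/* Policy restored by the backout: if any bytes were written, report them;
 * otherwise report the status. */
static long result_partial_success(long written, long status)
{
    return written ? written : status;
}

/* Policy being backed out: a negative status wins, even if some bytes
 * had already been written. */
static long result_error_overrides(long written, long status)
{
    return (status < 0) ? status : written;
}
```

For `written = 4096` and `status = -MODEL_EIO`, the first function returns 4096 and the second returns -5. When nothing was written, or when no error occurred, the two agree. The first is the behaviour argued for above: the caller learns how many bytes made it and can retry from that offset.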
Re: i386/x86_64 segment register issuses (Re: PATCH: Fix x86 segment register access)
On Tue, Mar 29, 2005 at 06:44:18PM -0800, Linus Torvalds wrote:
> 
> 
> On Tue, 29 Mar 2005, H. J. Lu wrote:
> > >
> > > the smaller and faster version do not want to just rely on gas
> > > automatically getting it right, especially since gas has historically been
> > > very very bad at getting things right.
> > 
> > We are fixing those issues in assembler. If people run into problems
> > like that with gas, they can report them. They will be fixed.
> 
> It's fine if gas fixes things. It's not fine if gas breaks things that
> used to work, for no really good reason.
> 
> > > What is the advantage of not allowing "movl %ds,mem"? Really? Especially
> > > since I suspect the kernel is pretty much the only one who does this, and
> > > the kernel really does do it on purpose. The kernel explicitly wants the
> > > 32-bit version, knowing that the upper bits are undefined.
> > > 
> > 
> > Kernel has
> > 
> > 	unsigned gsindex;
> > 	asm volatile("movl %%gs,%0" : "=g" (gsindex));
> 
> Ok, that's a real x86-64 bug, it seems. Andi, please fix, preferably by
> just making the "g" be a "r".
> 
> However, your argument isn't very valid, since:
> 
> > The new assembler will make sure that it won't happen.
> 
> Not true, since the suggestion was just to change all segment "movl"
> things to "mov", at which point the same old bug is still there, and the
> assembler didn't really help us at all.

The new assembler won't accept

	movl %gs,128(%rsp)

It makes it harder to generate binary code the user doesn't intend.
FWIW, what I suggested is in

http://sourceware.org/ml/binutils/2005-03/msg00873.html

There are things like

-	asm volatile("movl %%fs,%0" : "=g" (fsindex));
+	asm volatile("movl %%fs,%0" : "=r" (fsindex));

> 
> See the problem? You're not actually protecting anything. The change just
> makes it _harder_ to make sizes explicit, and suddenly we have to trust an
> assembler to be clever about sizes, when that assembler historically has
> definitely _not_ been very clever about them at all.
> 

There is no such instruction as "movl %ds,(%eax)". The old assembler
accepts it and turns it into "movw %ds,(%eax)". It won't catch
problems like

	unsigned fsindex;
	asm volatile("movl %%fs,%0" : "=m" (fsindex));

The "movw %ds,(%eax)" bug was fixed in binutils 2.15.94.0.1. Gas no
longer generates 0x66 for it. If you find gas preventing you from
doing what the hardware supports, I will be happy to fix it.


H.J.
Re: prefetch on ppc64
Serge E. Hallyn writes:

> While investigating the inordinate performance impact one of my patches
> seemed to be having, we tracked it down to two hlist_for_each_entry
> loops, and finally to the prefetch instruction in the loop.

I would be interested to know what results you get if you leave the
loops using hlist_for_each_entry but change prefetch() and prefetchw()
to do the dcbt or dcbtst instruction only if the address is non-zero,
like this:

static inline void prefetch(const void *x)
{
	if (x)
		__asm__ __volatile__ ("dcbt 0,%0" : : "r" (x));
}

static inline void prefetchw(const void *x)
{
	if (x)
		__asm__ __volatile__ ("dcbtst 0,%0" : : "r" (x));
}

It seems that doing a prefetch on a NULL pointer, while it doesn't
cause a fault, does waste time looking for a translation of the zero
address.

Paul.
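For readers without a ppc64 box, the same guard can be tried in portable form. The following is a sketch only, assuming a GCC- or Clang-compatible compiler that provides `__builtin_prefetch`; `struct node`, `prefetch_nonnull()` and `sum_list()` are hypothetical names and are not taken from the kernel's list.h.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical singly linked list node, standing in for an hlist entry. */
struct node {
	int value;
	struct node *next;
};

/* Issue a read prefetch only for a non-NULL address, mirroring the
 * "if (x)" guard suggested above.  __builtin_prefetch is a compiler
 * builtin (assumption: GCC or Clang), not the dcbt instruction itself. */
static inline void prefetch_nonnull(const void *p)
{
	if (p)
		__builtin_prefetch(p, 0, 3);
}

/* Walk the list, prefetching the next node while the current one is used. */
long sum_list(const struct node *head)
{
	const struct node *n;
	long sum = 0;

	for (n = head; n; n = n->next) {
		prefetch_nonnull(n->next);	/* NULL on the last iteration */
		sum += n->value;
	}
	return sum;
}

/* Usage example: on the final node the guarded helper sees NULL and
 * issues no prefetch at all. */
long demo_sum(void)
{
	struct node c = { 3, NULL }, b = { 2, &c }, a = { 1, &b };
	long sum = sum_list(&a);

	assert(sum == 6);
	return sum;
}
```

Whether the guard pays off depends on how expensive a prefetch of address zero is on the target CPU, which is exactly the measurement asked for above.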
[3/5] Orinoco merge updates, part the fourth: kill dump_recs
Remove the dump_recs debugging iwpriv command. It will be replaced later with the simpler and more flexible get_rid command. Signed-off-by: David Gibson <[EMAIL PROTECTED]> Index: working-2.6/drivers/net/wireless/orinoco.c === --- working-2.6.orig/drivers/net/wireless/orinoco.c 2005-03-24 15:57:43.0 +1100 +++ working-2.6/drivers/net/wireless/orinoco.c 2005-03-24 15:57:44.0 +1100 @@ -607,7 +607,6 @@ static int orinoco_ioctl(struct net_device *dev, struct ifreq *rq, int cmd); static int __orinoco_program_rids(struct net_device *dev); static void __orinoco_set_multicast_list(struct net_device *dev); -static int orinoco_debug_dump_recs(struct net_device *dev); // /* Internal helper functions*/ @@ -3861,7 +3860,6 @@ { SIOCIWFIRSTPRIV + 0x7, 0, IW_PRIV_TYPE_INT | IW_PRIV_SIZE_FIXED | 1, "get_ibssport" }, - { SIOCIWLASTPRIV, 0, 0, "dump_recs" }, }; wrq->u.data.length = sizeof(privtab) / sizeof(privtab[0]); @@ -3949,14 +3947,6 @@ err = orinoco_ioctl_getibssport(dev, wrq); break; - case SIOCIWLASTPRIV: - err = orinoco_debug_dump_recs(dev); - if (err) - printk(KERN_ERR "%s: Unable to dump records (%d)\n", - dev->name, err); - break; - - default: err = -EOPNOTSUPP; } @@ -3970,187 +3960,6 @@ return err; } -struct { - u16 rid; - char *name; - int displaytype; -#define DISPLAY_WORDS 0 -#define DISPLAY_BYTES 1 -#define DISPLAY_STRING 2 -#define DISPLAY_XSTRING3 -} record_table[] = { -#define DEBUG_REC(name,type) { HERMES_RID_##name, #name, DISPLAY_##type } - DEBUG_REC(CNFPORTTYPE,WORDS), - DEBUG_REC(CNFOWNMACADDR,BYTES), - DEBUG_REC(CNFDESIREDSSID,STRING), - DEBUG_REC(CNFOWNCHANNEL,WORDS), - DEBUG_REC(CNFOWNSSID,STRING), - DEBUG_REC(CNFOWNATIMWINDOW,WORDS), - DEBUG_REC(CNFSYSTEMSCALE,WORDS), - DEBUG_REC(CNFMAXDATALEN,WORDS), - DEBUG_REC(CNFPMENABLED,WORDS), - DEBUG_REC(CNFPMEPS,WORDS), - DEBUG_REC(CNFMULTICASTRECEIVE,WORDS), - DEBUG_REC(CNFMAXSLEEPDURATION,WORDS), - DEBUG_REC(CNFPMHOLDOVERDURATION,WORDS), - DEBUG_REC(CNFOWNNAME,STRING), - DEBUG_REC(CNFOWNDTIMPERIOD,WORDS), - 
DEBUG_REC(CNFMULTICASTPMBUFFERING,WORDS), - DEBUG_REC(CNFWEPENABLED_AGERE,WORDS), - DEBUG_REC(CNFMANDATORYBSSID_SYMBOL,WORDS), - DEBUG_REC(CNFWEPDEFAULTKEYID,WORDS), - DEBUG_REC(CNFDEFAULTKEY0,BYTES), - DEBUG_REC(CNFDEFAULTKEY1,BYTES), - DEBUG_REC(CNFMWOROBUST_AGERE,WORDS), - DEBUG_REC(CNFDEFAULTKEY2,BYTES), - DEBUG_REC(CNFDEFAULTKEY3,BYTES), - DEBUG_REC(CNFWEPFLAGS_INTERSIL,WORDS), - DEBUG_REC(CNFWEPKEYMAPPINGTABLE,WORDS), - DEBUG_REC(CNFAUTHENTICATION,WORDS), - DEBUG_REC(CNFMAXASSOCSTA,WORDS), - DEBUG_REC(CNFKEYLENGTH_SYMBOL,WORDS), - DEBUG_REC(CNFTXCONTROL,WORDS), - DEBUG_REC(CNFROAMINGMODE,WORDS), - DEBUG_REC(CNFHOSTAUTHENTICATION,WORDS), - DEBUG_REC(CNFRCVCRCERROR,WORDS), - DEBUG_REC(CNFMMLIFE,WORDS), - DEBUG_REC(CNFALTRETRYCOUNT,WORDS), - DEBUG_REC(CNFBEACONINT,WORDS), - DEBUG_REC(CNFAPPCFINFO,WORDS), - DEBUG_REC(CNFSTAPCFINFO,WORDS), - DEBUG_REC(CNFPRIORITYQUSAGE,WORDS), - DEBUG_REC(CNFTIMCTRL,WORDS), - DEBUG_REC(CNFTHIRTY2TALLY,WORDS), - DEBUG_REC(CNFENHSECURITY,WORDS), - DEBUG_REC(CNFGROUPADDRESSES,BYTES), - DEBUG_REC(CNFCREATEIBSS,WORDS), - DEBUG_REC(CNFFRAGMENTATIONTHRESHOLD,WORDS), - DEBUG_REC(CNFRTSTHRESHOLD,WORDS), - DEBUG_REC(CNFTXRATECONTROL,WORDS), - DEBUG_REC(CNFPROMISCUOUSMODE,WORDS), - DEBUG_REC(CNFBASICRATES_SYMBOL,WORDS), - DEBUG_REC(CNFPREAMBLE_SYMBOL,WORDS), - DEBUG_REC(CNFSHORTPREAMBLE,WORDS), - DEBUG_REC(CNFWEPKEYS_AGERE,BYTES), - DEBUG_REC(CNFEXCLUDELONGPREAMBLE,WORDS), - DEBUG_REC(CNFTXKEY_AGERE,WORDS), - DEBUG_REC(CNFAUTHENTICATIONRSPTO,WORDS), - DEBUG_REC(CNFBASICRATES,WORDS), - DEBUG_REC(CNFSUPPORTEDRATES,WORDS), - DEBUG_REC(CNFTICKTIME,WORDS), - DEBUG_REC(CNFSCANREQUEST,WORDS), - DEBUG_REC(CNFJOINREQUEST,WORDS), - DEBUG_REC(CNFAUTHENTICATESTATION,WORDS), - DEBUG_REC(CNFCHANNELINFOREQUEST,WORDS), - DEBUG_REC(MAXLOADTIME,WORDS), - DEBUG_REC(DOWNLOADBUFFER,WORDS), - DEBUG_REC(PRIID,WORDS), - DEBUG_REC(PRISUPRANGE,WORDS), - DEBUG_REC(CFIACTRANGES,WORDS), - DEBUG_REC(NICSERNUM,XSTRING), - DEBUG_REC(NICID,WORDS), -
[4/5] Orinoco merge updates, part the fourth: don't set channel in managed mode
Don't attempt to manually set the channel in infrastructure mode; the
firmware doesn't like that much. Also don't attempt to override the
firmware's default channel number for IBSS mode (I believe the default
channel can vary by regulatory domain).

Signed-off-by: David Gibson <[EMAIL PROTECTED]>

Index: working-2.6/drivers/net/wireless/orinoco.c
===
--- working-2.6.orig/drivers/net/wireless/orinoco.c	2005-03-11 15:07:08.0 +1100
+++ working-2.6/drivers/net/wireless/orinoco.c	2005-03-11 16:13:31.0 +1100
@@ -1615,17 +1615,15 @@
 		return err;
 	}
 	/* Set the channel/frequency */
-	if (priv->channel == 0) {
-		printk(KERN_DEBUG "%s: Channel is 0 in __orinoco_program_rids()\n", dev->name);
-		if (priv->createibss)
-			priv->channel = 10;
-	}
-	err = hermes_write_wordrec(hw, USER_BAP, HERMES_RID_CNFOWNCHANNEL,
-				   priv->channel);
-	if (err) {
-		printk(KERN_ERR "%s: Error %d setting channel\n",
-		       dev->name, err);
-		return err;
+	if (priv->channel != 0 && priv->iw_mode != IW_MODE_INFRA) {
+		err = hermes_write_wordrec(hw, USER_BAP,
+					   HERMES_RID_CNFOWNCHANNEL,
+					   priv->channel);
+		if (err) {
+			printk(KERN_ERR "%s: Error %d setting channel %d\n",
+			       dev->name, err, priv->channel);
+			return err;
+		}
 	}
 
 	if (priv->has_ibss) {
@@ -2405,7 +2403,7 @@
 
 	/* By default use IEEE/IBSS ad-hoc mode if we have it */
 	priv->prefer_port3 = priv->has_port3 && (! priv->has_ibss);
 	set_port_type(priv);
-	priv->channel = 10; /* default channel, more-or-less arbitrary */
+	priv->channel = 0; /* use firmware default */
 
 	priv->promiscuous = 0;
 	priv->wep_on = 0;

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/people/dgibson
[1/5] Orinoco merge updates, part the fourth: wireless stats updates
Minor updates/bugfixes to the handling of wireless statistics.

Signed-off-by: David Gibson <[EMAIL PROTECTED]>

Index: working-2.6/drivers/net/wireless/orinoco.c
===
--- working-2.6.orig/drivers/net/wireless/orinoco.c	2005-02-25 15:47:53.314373136 +1100
+++ working-2.6/drivers/net/wireless/orinoco.c	2005-02-25 16:20:13.951351472 +1100
@@ -686,7 +686,7 @@
 	struct orinoco_private *priv = netdev_priv(dev);
 	hermes_t *hw = &priv->hw;
 	struct iw_statistics *wstats = &priv->wstats;
-	int err = 0;
+	int err;
 	unsigned long flags;
 
 	if (! netif_device_present(dev)) {
@@ -695,9 +695,21 @@
 		return NULL; /* FIXME: Can we do better than this? */
 	}
 
+	/* If busy, return the old stats.  Returning NULL may cause
+	 * the interface to disappear from /proc/net/wireless */
 	if (orinoco_lock(priv, &flags) != 0)
-		return NULL; /* FIXME: Erg, we've been signalled, how
-			      * do we propagate this back up? */
+		return wstats;
+
+	/* We can't really wait for the tallies inquiry command to
+	 * complete, so we just use the previous results and trigger
+	 * a new tallies inquiry command for next time - Jean II */
+	/* FIXME: Really we should wait for the inquiry to come back -
+	 * as it is the stats we give don't make a whole lot of sense.
+	 * Unfortunately, it's not clear how to do that within the
+	 * wireless extensions framework: I think we're in user
+	 * context, but a lock seems to be held by the time we get in
+	 * here so we're not safe to sleep here. */
+	hermes_inquire(hw, HERMES_INQ_TALLIES);
 
 	if (priv->iw_mode == IW_MODE_ADHOC) {
 		memset(&wstats->qual, 0, sizeof(wstats->qual));
@@ -716,25 +728,16 @@
 
 		err = HERMES_READ_RECORD(hw, USER_BAP,
 					 HERMES_RID_COMMSQUALITY, &cq);
-		
-		wstats->qual.qual = (int)le16_to_cpu(cq.qual);
-		wstats->qual.level = (int)le16_to_cpu(cq.signal) - 0x95;
-		wstats->qual.noise = (int)le16_to_cpu(cq.noise) - 0x95;
-		wstats->qual.updated = 7;
+
+		if (!err) {
+			wstats->qual.qual = (int)le16_to_cpu(cq.qual);
+			wstats->qual.level = (int)le16_to_cpu(cq.signal) - 0x95;
+			wstats->qual.noise = (int)le16_to_cpu(cq.noise) - 0x95;
+			wstats->qual.updated = 7;
+		}
 	}
 
-	/* We can't really wait for the tallies inquiry command to
-	 * complete, so we just use the previous results and trigger
-	 * a new tallies inquiry command for next time - Jean II */
-	/* FIXME: We're in user context (I think?), so we should just
-	   wait for the tallies to come through */
-	err = hermes_inquire(hw, HERMES_INQ_TALLIES);
-
 	orinoco_unlock(priv, &flags);
-
-	if (err)
-		return NULL;
-
 	return wstats;
 }
 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/people/dgibson
[5/5] Orinoco merge updates, part the fourth: consolidate allocation code
Consolidate allocation of firmware buffers. In the process, remove
duplication of a workaround for an old Symbol firmware bug, and fix a
bug where we could retry the workaround, even if it already failed to
help.

Signed-off-by: David Gibson <[EMAIL PROTECTED]>

Index: working-2.6/drivers/net/wireless/orinoco.c
===
--- working-2.6.orig/drivers/net/wireless/orinoco.c	2005-03-11 16:13:31.0 +1100
+++ working-2.6/drivers/net/wireless/orinoco.c	2005-03-11 16:21:55.0 +1100
@@ -1418,7 +1418,7 @@
 		return err;
 
 	err = hermes_allocate(hw, priv->nicbuf_size, &priv->txfid);
-	if (err == -EIO) {
+	if (err == -EIO && priv->nicbuf_size > TX_NICBUF_SIZE_BUG) {
 		/* Try workaround for old Symbol firmware bug */
 		printk(KERN_WARNING "%s: firmware ALLOC bug detected "
 		       "(old Symbol firmware?). Trying to work around... ",
@@ -2270,7 +2270,7 @@
 	priv->nicbuf_size = IEEE802_11_FRAME_LEN + ETH_HLEN;
 
 	/* Initialize the firmware */
-	err = hermes_init(hw);
+	err = orinoco_reinit_firmware(dev);
 	if (err != 0) {
 		printk(KERN_ERR "%s: failed to initialize firmware (err = %d)\n",
 		       dev->name, err);
@@ -2409,25 +2409,6 @@
 	priv->wep_on = 0;
 	priv->tx_key = 0;
 
-	err = hermes_allocate(hw, priv->nicbuf_size, &priv->txfid);
-	if (err == -EIO) {
-		/* Try workaround for old Symbol firmware bug */
-		printk(KERN_WARNING "%s: firmware ALLOC bug detected "
-		       "(old Symbol firmware?). Trying to work around... ",
-		       dev->name);
-		
-		priv->nicbuf_size = TX_NICBUF_SIZE_BUG;
-		err = hermes_allocate(hw, priv->nicbuf_size, &priv->txfid);
-		if (err)
-			printk("failed!\n");
-		else
-			printk("ok.\n");
-	}
-	if (err) {
-		printk("%s: Error %d allocating Tx buffer\n", dev->name, err);
-		goto out;
-	}
-
 	/* Make the hardware available, as long as it hasn't been
 	 * removed elsewhere (e.g. by PCMCIA hot unplug) */
 	spin_lock_irq(&priv->lock);

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/people/dgibson
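The guarded retry from the first hunk of this patch can be illustrated outside the driver. This is a sketch under stated assumptions: `FALLBACK_SIZE` is a made-up value standing in for TX_NICBUF_SIZE_BUG, and `alloc_fn`, `alloc_with_fallback()`, `picky_alloc()` and `broken_alloc()` are hypothetical names that play the role of hermes_allocate() and the firmware.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for TX_NICBUF_SIZE_BUG; the real value is not
 * shown in the patch above. */
#define FALLBACK_SIZE 1024u

/* Hypothetical allocator signature standing in for hermes_allocate(). */
typedef int (*alloc_fn)(unsigned int size);

/* Try the requested size; on -EIO fall back to the smaller size, but only
 * if we are not already at (or below) it.  That extra condition is what
 * stops the workaround from being retried after it has already failed. */
int alloc_with_fallback(alloc_fn alloc, unsigned int *size)
{
	int err;

	assert(alloc && size);
	err = alloc(*size);
	if (err == -EIO && *size > FALLBACK_SIZE) {
		*size = FALLBACK_SIZE;
		err = alloc(*size);
	}
	return err;
}

/* Fake firmware for the usage example below. */
int alloc_calls;

/* Rejects any buffer larger than FALLBACK_SIZE. */
int picky_alloc(unsigned int size)
{
	alloc_calls++;
	return size > FALLBACK_SIZE ? -EIO : 0;
}

/* Fails every request. */
int broken_alloc(unsigned int size)
{
	(void)size;
	alloc_calls++;
	return -EIO;
}
```

Calling `alloc_with_fallback(broken_alloc, &size)` a second time, once `size` has already been lowered, makes a single allocation attempt instead of two.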
[2/5] Orinoco merge updates, part the fourth: ignore_disconnect flag
Adds an ignore_disconnect module parameter. When enabled, the driver
will continue attempting to send packets even when the firmware has
told us we've lost our link to the AP. On some firmwares this
substantially increases the usable range of the card (presumably
because we have an intermittent connection, but the firmware is able
to queue the packets for us until we're connected again). On some
other cards, it causes the firmware to fall in a screaming heap :(
(hence, default off).

Signed-off-by: David Gibson <[EMAIL PROTECTED]>

Index: working-2.6/drivers/net/wireless/orinoco.c
===
--- working-2.6.orig/drivers/net/wireless/orinoco.c	2005-03-11 14:44:09.0 +1100
+++ working-2.6/drivers/net/wireless/orinoco.c	2005-03-11 14:51:33.0 +1100
@@ -492,6 +492,9 @@
 static int suppress_linkstatus; /* = 0 */
 module_param(suppress_linkstatus, bool, 0644);
 MODULE_PARM_DESC(suppress_linkstatus, "Don't log link status changes");
+static int ignore_disconnect; /* = 0 */
+module_param(ignore_disconnect, int, 0644);
+MODULE_PARM_DESC(ignore_disconnect, "Don't report lost link to the network layer");
 
 //
 /* Compile time configuration and compatibility stuff */
@@ -1320,7 +1323,7 @@
 
 	if (connected)
 		netif_carrier_on(dev);
-	else
+	else if (!ignore_disconnect)
 		netif_carrier_off(dev);
 
 	if (newstatus != priv->last_linkstatus)

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/people/dgibson
[0/5] Orinoco merge updates, part the fourth
Hi Jeff, please apply:

Here's yet another batch of orinoco updates. Smaller and less
significant than the last, this is basically a handful of remaining
small updates before tackling the big changes (wext v15, monitor and
scanning).

Patches are:

orinoco-wstats-updates
	Updates and bugfixes to wireless stats handling
orinoco-ignore-disconnect
	Add the ignore_disconnect module parameter
orinoco-kill-dump-recs
	Remove ugly debugging code, to be replaced later with simpler
	and more useful stuff
orinoco-no-infra-channel
	Don't attempt to set channel in managed mode, the firmware
	doesn't like that.
orinoco-consolidate-allocate
	Remove some duplicated code for firmware buffer allocation,
	removing a bug in a hw workaround in the process.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/people/dgibson
prefetch on ppc64
Hi,

While investigating the inordinate performance impact one of my patches
seemed to be having, we tracked it down to two hlist_for_each_entry
loops, and finally to the prefetch instruction in the loop.

The machine I'm testing on has 4 power5 1.5GHz cpus and 16G RAM. I was
mostly using dbench (v3.03) in runs of 50 and 100 on an ext2 system.
Kernel was 2.6.11-rc5.

I've not had much of a chance to test on x86, but the few tests I've run
have shown that prefetch does improve performance there. From what I've
seen this seems to be a ppc (perhaps ppc64) specific symptom.

Following are two sets of interesting results on the ppc64 system. The
first is on a stock 2.6.11-rc5 kernel. The actual stock kernel gave the
following results for 100 runs of dbench:

	# elements: 100, mean 862.580380, variance 5.973441, std dev 2.444062

When I patched fs/dcache.c to replace the three hlist_for_each_entry{,_rcu}
loops with manual loops, as shown in the attached file dcache-nohlist.patch,
I got:

	# elements: 50, mean 881.804980, variance 10.695022, std dev 3.270325

The next set of results is based on 2.6.11-rc5 with the LSM stacking
patches (from www.sf.net/projects/lsm-stacker). I was understandably
alarmed to find the original patched version gave me:

	# elements: 100, mean 797.654870, variance 7.503588, std dev 2.739268

The code which I determined to be responsible contained two
list_for_each_entry loops. Replacing one with a manual loop gave me

	# elements: 50, mean 835.859980, variance 81.901719, std dev 9.049957

and replacing the second gave me

	# elements: 50, mean 846.541060, variance 17.095401, std dev 4.134658

Finally I followed Paul McKenney's suggestion and just commented out the
ppc definition of prefetch altogether, which gave me:

	# elements: 50, mean 860.823880, variance 47.567428, std dev 6.896914

I am currently testing this same patch against a non-stacking kernel.

thanks,
-serge

Index: linux-2.6.11-rc5-nostack/fs/dcache.c
===
--- linux-2.6.11-rc5-nostack.orig/fs/dcache.c	2005-03-11 15:19:58.0 -0600
+++ linux-2.6.11-rc5-nostack/fs/dcache.c	2005-03-26 01:35:29.0 -0600
@@ -656,7 +656,7 @@
 	do {
 		found = 0;
 		spin_lock(&dcache_lock);
-		hlist_for_each(lp, head) {
+		for (lp=head->first; lp; lp = lp->next) {
 			struct dentry *this = hlist_entry(lp, struct dentry, d_hash);
 			if (!list_empty(&this->d_lru)) {
 				dentry_stat.nr_unused--;
@@ -1047,7 +1047,9 @@
 
 	rcu_read_lock();
 	
-	hlist_for_each_rcu(node, head) {
+	for (node=head->first; node;
+		({ node = node->next; smp_read_barrier_depends(); }))
+	{
 		struct dentry *dentry;
 		struct qstr *qstr;
 
@@ -1123,7 +1125,7 @@
 	spin_lock(&dcache_lock);
 	base = d_hash(dparent, dentry->d_name.hash);
-	hlist_for_each(lhp,base) {
+	for (lhp=base->first; lhp; lhp = lhp->next) {
 		/* hlist_for_each_rcu() not required for d_hash list
 		 * as it is parsed under dcache_lock
 		 */
Re: [PATCH] embarassing typo
Vicente Feito <[EMAIL PROTECTED]> writes:

> On Tuesday 29 March 2005 09:58 pm, you wrote:
>> Måns Rullgård wrote:
>> > "Ronald S. Bultje" <[EMAIL PROTECTED]> writes:
>> >>--- linux-2.6.5/drivers/media/video/zr36050.c.old 16 Sep 2004 22:53:27
>> >> - 1.2 +++ linux-2.6.5/drivers/media/video/zr36050.c 29 Mar 2005
>> >> 20:30:23 - @@ -419,7 +419,7 @@
>> >> 	dri_data[2] = 0x00;
>> >> 	dri_data[3] = 0x04;
>> >> 	dri_data[4] = ptr->dri >> 8;
>> >>-	dri_data[5] = ptr->dri * 0xff;
>> >>+	dri_data[5] = ptr->dri & 0xff;
>> >
>> > Hey, that's a nice obfuscation of a simple negation.
>>
>> It's not a negation. This statement always assigns zero to
>> dri_data[5] if dri_data is char[]. Looks like gcc isn't catching
>> this problem.
>>
> As long as the variable doesn't get overflowed you would have a
> negation, you shouldn't do dri_data[5] = ptr->dri * 0xff; if
> ptr->dri it's 255, but if ptr->dri = 1 i.e. (like is set in
> zr36050_setup) then you would be getting the negation, -1. the
> Direct rendering support is a flag afaik, so in this case I believe
> is a worthy C obfuscated negation code :)
> btw, are you sure about this patch?I would contact the maintainer
> first, because and'ing that doesn't make much sense...

It seems pretty obvious to me, that the code is supposed to store the
high byte in dri_data[4], and the low byte in dri_data[5]. Mistyping
& as * doesn't seem too unlikely, either.

-- 
Måns Rullgård
[EMAIL PROTECTED]
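The high-byte/low-byte reading of the patched lines can be checked with a few lines of standalone C. This is only an illustration of that reading; `store_be16()` is a hypothetical helper and does not appear in zr36050.c.

```c
#include <assert.h>
#include <stdint.h>

/* Store a 16-bit value as two bytes, high byte first, using the same
 * shift and mask as the patched lines for dri_data[4] and dri_data[5]. */
void store_be16(uint8_t *dst, uint16_t value)
{
	assert(dst);
	dst[0] = (uint8_t)(value >> 8);		/* high byte */
	dst[1] = (uint8_t)(value & 0xff);	/* low byte */
}
```

For a value of 0x1234 this stores 0x12 and then 0x34; with `*` in place of `&` the second byte would instead be the low byte of the product `value * 0xff`.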
Re: [PATCH] embarassing typo
On Wed, Mar 30, 2005 at 04:07:39AM +0200, Måns Rullgård wrote:

| Michael Tokarev <[EMAIL PROTECTED]> writes:
| 
| > Måns Rullgård wrote:
| >> "Ronald S. Bultje" <[EMAIL PROTECTED]> writes:
| >>
| >>>--- linux-2.6.5/drivers/media/video/zr36050.c.old 16 Sep 2004 22:53:27 - 1.2
| >>>+++ linux-2.6.5/drivers/media/video/zr36050.c 29 Mar 2005 20:30:23 -
| >>>@@ -419,7 +419,7 @@
| >>> 	dri_data[2] = 0x00;
| >>> 	dri_data[3] = 0x04;
| >>> 	dri_data[4] = ptr->dri >> 8;
| >>>-	dri_data[5] = ptr->dri * 0xff;
| >>>+	dri_data[5] = ptr->dri & 0xff;
| >> Hey, that's a nice obfuscation of a simple negation.
| >
| > It's not a negation. This statement always assigns zero to
| > dri_data[5] if dri_data is char[].
| 
| Sure about that?
| 
| __u16 i;
| char c;
| i = 1; c = i * 255; /* c = 255 = -1 */
| i = 2; c = i * 255; /* c = 510 & 0xff = 254 = -2 */
| ...
| 
| Looks like negation to me.

Sure it's negation because 255 _is_ 256 - 1. Basic finite math.

( x * 256 ) mod 256 == 0
( ( x * 256 ) - ( x * 1 ) ) mod 256 == - ( x * 1 )
( x * ( 256 - 1 ) ) mod 256 == - ( x * 1 )
( x * 255 ) mod 256 == - ( x * 1 )
( x * 255 ) mod 256 == - x

Now what I am interested in is if gcc optimized it to a faster negation
or subtraction instruction.

-- 
| Phil Howard KA9WGN       | http://linuxhomepage.com/      http://ham.org/ |
| (first name) at ipal.net | http://phil.ipal.org/   http://ka9wgn.ham.org/ |
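The identity derived above is easy to confirm by brute force over every 16-bit input. This is a small standalone sketch; the function names are hypothetical, and unsigned 8-bit results are used so that the modulo-256 wraparound is well defined in C.

```c
#include <assert.h>
#include <stdint.h>

/* Low byte of x * 255, i.e. what ends up in an 8-bit destination. */
uint8_t low_byte_times_255(uint16_t x)
{
	return (uint8_t)(x * 255u);
}

/* Low byte of -x, computed in unsigned arithmetic so that it wraps. */
uint8_t low_byte_negated(uint16_t x)
{
	return (uint8_t)(0u - x);
}

/* Returns 1 if ( x * 255 ) mod 256 == ( -x ) mod 256 for every 16-bit x. */
int identity_holds_for_all_u16(void)
{
	uint32_t x;

	for (x = 0; x <= 0xffff; x++)
		if (low_byte_times_255((uint16_t)x) != low_byte_negated((uint16_t)x))
			return 0;
	return 1;
}

/* The two worked values from the quoted message: 1 -> 255, 2 -> 254. */
void check_examples_from_thread(void)
{
	assert(low_byte_times_255(1) == 255);
	assert(low_byte_times_255(2) == 254);
}
```

Note that the result is the negation of the low byte, not the low byte itself, so `* 0xff` and `& 0xff` agree only when the low byte of x is 0 or 128.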
Re: Do not misuse Coverity please (Was: sound/oss/cs46xx.c: fix a check after use)
"Jean Delvare" <[EMAIL PROTECTED]> said:

[Attributions missing, sorry]

> > > Think about it. If the pointer could be NULL, then it's unlikely that
> > > the bug would have gone unnoticed so far (unless the code is very
> > > recent). Coverity found 3 such bugs in one i2c driver [1], and the
> > > correct solution was to NOT check for NULL because it just couldn't
> > > happen.

> > No, there is a third case: the pointer can be NULL, but the compiler
> > happened to move the dereference down to after the check.

> Wow. Great point. I completely missed that possibility. In fact I didn't
> know that the compiler could possibly alter the order of the
> instructions. For one thing, I thought it was simply not allowed to. For
> another, I didn't know that it had been made so aware that it could
> actually figure out how to do this kind of things. What a mess. Let's
> just hope that the gcc folks know their business :)

The compiler is most definitely /not/ allowed to change the results the
code gives.
-- 
Dr. Horst H. von Brand                   User #22616 counter.li.org
Departamento de Informatica                     Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria              +56 32 654239
Casilla 110-V, Valparaiso, Chile                Fax:  +56 32 797513
[patch 8/8] CKRM: Documentation
This patch adds all current documentation on CKRM. Signed-Off-By: Hubertus Franke <[EMAIL PROTECTED]> Signed-Off-By: Chandra Seetharaman <[EMAIL PROTECTED]> Signed-Off-By: Shailabh Nagar <[EMAIL PROTECTED]> Signed-Off-By: Vivek Kashyap <[EMAIL PROTECTED]> Signed-Off-By: Gerrit Huizenga <[EMAIL PROTECTED]> Index: linux-2.6.12-rc1/Documentation/ckrm/ckrm_basics === --- /dev/null 1970-01-01 00:00:00.0 + +++ linux-2.6.12-rc1/Documentation/ckrm/ckrm_basics 2005-03-18 15:16:46.010477430 -0800 @@ -0,0 +1,66 @@ +CKRM Basics +- +A brief review of CKRM concepts and terminology will help make installation +and testing easier. For more details, please visit http://ckrm.sf.net. + +Currently there are two class types, taskclass and socketclass for grouping, +regulating and monitoring tasks and sockets respectively. + +To avoid repeating instructions for each classtype, this document assumes a +task to be the kernel object being grouped. By and large, one can replace task +with socket and taskclass with socketclass. + +RCFS depicts a CKRM class as a directory. Hierarchy of classes can be +created in which children of a class share resources allotted to +the parent. Tasks can be classified to any class which is at any level. +There is no correlation between parent-child relationship of tasks and +the parent-child relationship of classes they belong to. + +Without a Classification Engine, class is inherited by a task. A privileged +user can reassigned a task to a class as described below, after which all +the child tasks under that task will be assigned to that class, unless the +user reassigns any of them. + +A Classification Engine, if one exists, will be used by CKRM to +classify a task to a class. The Rule based classification engine uses some +of the attributes of the task to classify a task. When a CE is present +class is not inherited by a task. 
+ +Characteristics of a class can be accessed/changed through the following magic +files under the directory representing the class: + +shares: allows to change the shares of different resources managed by the + class +stats: allows to see the statistics associated with each resources managed + by the class +target: allows to assign a task to a class. If a CE is present, assigning + a task to a class through this interface will prevent CE from +reassigning the task to any class during reclassification. +members: allows to see which tasks has been assigned to a class +config: allow to view and modify configuration information of different + resources in a class. + +Resource allocations for a class is controlled by the parameters: + +guarantee: specifies how much of a resource is guranteed to a class. A + special value DONT_CARE(-2) mean that there is no specific + guarantee of a resource is specified, this class may not get + any resource if the system is runing short of resources +limit: specifies the maximum amount of resource that is allowed to be + allocated by a class. A special value DONT_CARE(-2) mean that + there is no specific limit is specified, this class can get all + the resources available. +total_guarantee: total guarantee that is allowed among the children of this + class. In other words, the sum of "guarantee"s of all children + of this class cannot exit this number. +max_limit: Maximum "limit" allowed for any of this class's children. In + other words, "limit" of any children of this class cannot exceed + this value. + +None of this parameters are absolute or have any units associated with +them. These are just numbers(that are relative to its parents') that are +used to calculate the absolute number of resource available for a specific +class. + +Note: The root class has an absolute number of resource units associated with it. 
+ Index: linux-2.6.12-rc1/Documentation/ckrm/core_usage === --- /dev/null 1970-01-01 00:00:00.0 + +++ linux-2.6.12-rc1/Documentation/ckrm/core_usage 2005-03-18 15:16:46.011477350 -0800 @@ -0,0 +1,72 @@ +Usage of CKRM without a classification engine +--- + +1. Create a class + + # mkdir /rcfs/taskclass/c1 + creates a taskclass named c1 , while + # mkdir /rcfs/socket_class/s1 + creates a socketclass named s1 + +The newly created class directory is automatically populated by magic files +shares, stats, members, target and config. + +2. View default shares + + # cat /rcfs/taskclass/c1/shares + + "guarantee=-2,limit=-2,total_guarantee=100,max_limit=100" is the default + value set for resources that have controllers registered with CKRM. + +3. change shares of a + + One or more of the following fields can/must be specified + res= #mandatory +
[patch 7/8] CKRM: Numtasks Controller
This patch provides a resource controller for limiting the number of tasks per class in CKRM. Signed-Off-By: Chandra Seetharaman <[EMAIL PROTECTED]> Signed-Off-By: Hubertus Franke <[EMAIL PROTECTED]> Signed-Off-By: Shailabh Nagar <[EMAIL PROTECTED]> Signed-Off-By: Gerrit Huizenga <[EMAIL PROTECTED]> Index: linux-2.6.12-rc1/include/linux/ckrm_tsk.h === --- /dev/null 1970-01-01 00:00:00.0 + +++ linux-2.6.12-rc1/include/linux/ckrm_tsk.h 2005-03-18 15:16:41.818810820 -0800 @@ -0,0 +1,35 @@ +/* ckrm_tsk.h - No. of tasks resource controller for CKRM + * + * Copyright (C) Chandra Seetharaman, IBM Corp. 2003 + * + * Provides No. of tasks resource controller for CKRM + * + * Latest version, more details at http://ckrm.sf.net + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + */ + +#ifndef _LINUX_CKRM_TSK_H +#define _LINUX_CKRM_TSK_H + +#ifdef CONFIG_CKRM_TYPE_TASKCLASS +#include + +typedef int (*get_ref_t) (struct ckrm_core_class *, int); +typedef void (*put_ref_t) (struct ckrm_core_class *); + +extern int numtasks_get_ref(struct ckrm_core_class *, int); +extern void numtasks_put_ref(struct ckrm_core_class *); +extern void ckrm_numtasks_register(get_ref_t, put_ref_t); + +#else /* CONFIG_CKRM_TYPE_TASKCLASS */ + +#define numtasks_get_ref(core_class, ref) (1) +#define numtasks_put_ref(core_class) do {} while (0) + +#endif /* CONFIG_CKRM_TYPE_TASKCLASS */ +#endif /* _LINUX_CKRM_RES_H */ Index: linux-2.6.12-rc1/init/Kconfig === --- linux-2.6.12-rc1.orig/init/Kconfig 2005-03-18 15:16:37.397162502 -0800 +++ linux-2.6.12-rc1/init/Kconfig 2005-03-18 15:16:41.819810740 -0800 @@ -185,6 +185,16 @@ config CKRM_TYPE_SOCKETCLASS Say Y if unsure. 
+config CKRM_RES_NUMTASKS + tristate "Number of Tasks Resource Manager" + depends on CKRM_TYPE_TASKCLASS + default y + help + Provides a Resource Controller for CKRM that allows limiting no of + tasks a task class can have. + + Say N if unsure, Y to use the feature. + endmenu config SYSCTL Index: linux-2.6.12-rc1/kernel/ckrm/ckrm_numtasks.c === --- /dev/null 1970-01-01 00:00:00.0 + +++ linux-2.6.12-rc1/kernel/ckrm/ckrm_numtasks.c2005-03-18 15:16:41.820810661 -0800 @@ -0,0 +1,522 @@ +/* ckrm_numtasks.c - "Number of tasks" resource controller for CKRM + * + * Copyright (C) Chandra Seetharaman, IBM Corp. 2003 + * + * Latest version, more details at http://ckrm.sf.net + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + */ + +/* + * CKRM Resource controller for tracking number of tasks in a class. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define TOTAL_NUM_TASKS (131072) /* 128 K */ +#define NUMTASKS_DEBUG +#define NUMTASKS_NAME "numtasks" + +struct ckrm_numtasks { + struct ckrm_core_class *core; /* the core i am part of... */ + struct ckrm_core_class *parent; /* parent of the core above. */ + struct ckrm_shares shares; + spinlock_t cnt_lock;/* always grab parent's lock before child's */ + int cnt_guarantee; /* num_tasks guarantee in local units */ + int cnt_unused; /* has to borrow if more than this is needed */ + int cnt_limit; /* no tasks over this limit. 
*/ + atomic_t cnt_cur_alloc; /* current alloc from self */ + atomic_t cnt_borrowed; /* borrowed from the parent */ + + int over_guarantee; /* turn on/off when cur_alloc goes */ + /* over/under guarantee */ + + /* internally maintained statictics to compare with max numbers */ + int limit_failures; /* # failures as request was over the limit */ + int borrow_sucesses;/* # successful borrows */ + int borrow_failures;/* # borrow failures */ + + /* Maximum the specific statictics has reached. */ + int max_limit_failures; + int max_borrow_sucesses; + int max_borrow_failures; + + /* Total number of specific statistics */ + int tot_limit_failures; + int tot_borrow_sucesses; + int tot_borrow_failures; +}; + +struct ckrm_res_ctlr numtasks_rcbs; + +/* Initialize rescls values + * May be called on each rcfs unmount or as part of error recovery + * to make share values sane. + * Does not traverse hierarchy
[patch 6/8] CKRM: Socket Class Controller
This patch provides the extensions for CKRM to track per socket classes. This is the base to enable socket based resource control for inbound connection control, bandwidth control etc. Signed-Off-By: Vivek Kashyap <[EMAIL PROTECTED]> Signed-Off-By: Gerrit Huizenga <[EMAIL PROTECTED]> Signed-off-by: Nishanth Aravamudan <[EMAIL PROTECTED]> Index: linux-2.6.12-rc1/fs/rcfs/Makefile === --- linux-2.6.12-rc1.orig/fs/rcfs/Makefile 2005-03-18 15:16:33.370482769 -0800 +++ linux-2.6.12-rc1/fs/rcfs/Makefile 2005-03-18 15:16:37.387163297 -0800 @@ -6,3 +6,4 @@ obj-$(CONFIG_RCFS_FS) += rcfs.o rcfs-y := super.o inode.o dir.o rootdir.o magic.o rcfs-$(CONFIG_CKRM_TYPE_TASKCLASS) += tc_magic.o +rcfs-$(CONFIG_CKRM_TYPE_SOCKETCLASS) += socket_fs.o Index: linux-2.6.12-rc1/fs/rcfs/rootdir.c === --- linux-2.6.12-rc1.orig/fs/rcfs/rootdir.c 2005-03-18 15:16:33.372482610 -0800 +++ linux-2.6.12-rc1/fs/rcfs/rootdir.c 2005-03-18 15:16:37.387163297 -0800 @@ -187,6 +187,10 @@ EXPORT_SYMBOL_GPL(rcfs_deregister_classt extern struct rcfs_mfdesc tc_mfdesc; #endif +#ifdef CONFIG_CKRM_TYPE_SOCKETCLASS +extern struct rcfs_mfdesc rcfs_sock_mfdesc; +#endif + /* Common root and magic file entries. * root name, root permissions, magic file names and magic file permissions * are needed by all entities (classtypes and classification engines) existing @@ -203,4 +207,10 @@ struct rcfs_mfdesc *genmfdesc[CKRM_MAX_C #else NULL, #endif +#ifdef CONFIG_CKRM_TYPE_SOCKETCLASS + _sock_mfdesc, +#else + NULL, +#endif + }; Index: linux-2.6.12-rc1/fs/rcfs/socket_fs.c === --- /dev/null 1970-01-01 00:00:00.0 + +++ linux-2.6.12-rc1/fs/rcfs/socket_fs.c2005-03-18 15:16:37.391162979 -0800 @@ -0,0 +1,280 @@ +/* ckrm_socketaq.c + * + * Copyright (C) Vivek Kashyap, IBM Corp. 
2004 + * + * Latest version, more details at http://ckrm.sf.net + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + */ + +/*** + * Socket class type + * + * Defines the root structure for socket based classes. Currently only inbound + * connection control is supported based on prioritized accept queues. + **/ + +#include +#include + +extern int rcfs_create_noperm(struct inode *, struct dentry *, int, + struct nameidata *); +extern int rcfs_symlink_noperm(struct inode *, struct dentry *, const char *); +extern int rcfs_mkdir_noperm(struct inode *, struct dentry *, int); +extern int rcfs_rmdir_noperm(struct inode *, struct dentry *); +extern int rcfs_link_noperm(struct dentry *, struct inode *, struct dentry *); +extern int rcfs_unlink_noperm(struct inode *, struct dentry *); +extern int rcfs_mknod_noperm(struct inode *, struct dentry *, int mode, dev_t); + +extern int rcfs_rmdir(struct inode *, struct dentry *); +extern int rcfs_unlink(struct inode *, struct dentry *); +extern int rcfs_rename(struct inode *, struct dentry *, struct inode *, + struct dentry *); + +extern int rcfs_create_coredir(struct inode *, struct dentry *); + +int rcfs_sock_mkdir(struct inode *, struct dentry *, int mode); +int rcfs_sock_rmdir(struct inode *, struct dentry *); +struct inode_operations my_iops; +struct inode_operations class_iops; +struct inode_operations sub_iops; + + +struct rcfs_magf def_magf = { + .mode = RCFS_DEFAULT_DIR_MODE, + .i_op = _iops, + .i_fop = NULL, +}; + +struct rcfs_magf rcfs_sock_rootdesc[] = { + { +/* .name = should not be set, copy from classtype name, */ +.mode = RCFS_DEFAULT_DIR_MODE, +.i_op = _iops, +/* .i_fop = _dir_operations, */ +.i_fop = NULL, +}, + { +.name = "members", +.mode = RCFS_DEFAULT_FILE_MODE, +.i_op = _iops, +.i_fop = _fileops, +}, 
+ { +.name = "target", +.mode = RCFS_DEFAULT_FILE_MODE, +.i_op = _iops, +.i_fop = _fileops, +}, + { +.name = "reclassify", +.mode = RCFS_DEFAULT_FILE_MODE, +.i_op = _iops, +.i_fop = _fileops, +}, +}; + +struct rcfs_magf rcfs_sock_magf[] = { + { +.name = "config", +.mode = RCFS_DEFAULT_FILE_MODE, +.i_op = _iops, +.i_fop = _fileops, +}, + { +.name = "members", +.mode = RCFS_DEFAULT_FILE_MODE, +.i_op = _iops, +.i_fop = _fileops, +}, + { +.name = "shares", +
[patch 5/8] CKRM: Task Class Controller
This patch provides the extensions for CKRM to track task classes. This is the base to enable task class based resource control for cpu, memory and disk I/O. Signed-Off-By: Chandra Seetharaman <[EMAIL PROTECTED]> Signed-Off-By: Hubertus Franke <[EMAIL PROTECTED]> Signed-Off-By: Shailabh Nagar <[EMAIL PROTECTED]> Signed-Off-By: Vivek Kashyap <[EMAIL PROTECTED]> Signed-Off-By: Gerrit Huizenga <[EMAIL PROTECTED]> Signed-off-by: Nishanth Aravamudan <[EMAIL PROTECTED]> Index: linux-2.6.12-rc1/fs/rcfs/Makefile === --- linux-2.6.12-rc1.orig/fs/rcfs/Makefile 2005-03-18 15:16:29.721772974 -0800 +++ linux-2.6.12-rc1/fs/rcfs/Makefile 2005-03-18 15:16:33.370482769 -0800 @@ -5,3 +5,4 @@ obj-$(CONFIG_RCFS_FS) += rcfs.o rcfs-y := super.o inode.o dir.o rootdir.o magic.o +rcfs-$(CONFIG_CKRM_TYPE_TASKCLASS) += tc_magic.o Index: linux-2.6.12-rc1/fs/rcfs/rootdir.c === --- linux-2.6.12-rc1.orig/fs/rcfs/rootdir.c 2005-03-18 15:16:29.721772974 -0800 +++ linux-2.6.12-rc1/fs/rcfs/rootdir.c 2005-03-18 15:16:33.372482610 -0800 @@ -58,7 +58,7 @@ int rcfs_unregister_engine(struct rbce_e return 0; } -EXPORT_SYMBOL(rcfs_unregister_engine); +EXPORT_SYMBOL_GPL(rcfs_unregister_engine); /* * rcfs_mkroot @@ -183,6 +183,10 @@ int rcfs_deregister_classtype(struct ckr EXPORT_SYMBOL_GPL(rcfs_deregister_classtype); +#ifdef CONFIG_CKRM_TYPE_TASKCLASS +extern struct rcfs_mfdesc tc_mfdesc; +#endif + /* Common root and magic file entries. * root name, root permissions, magic file names and magic file permissions * are needed by all entities (classtypes and classification engines) existing @@ -193,6 +197,10 @@ EXPORT_SYMBOL_GPL(rcfs_deregister_classt * table to initialize their magf entries. 
*/ -struct rcfs_mfdesc *genmfdesc[] = { +struct rcfs_mfdesc *genmfdesc[CKRM_MAX_CLASSTYPES] = { +#ifdef CONFIG_CKRM_TYPE_TASKCLASS + _mfdesc, +#else NULL, +#endif }; Index: linux-2.6.12-rc1/fs/rcfs/tc_magic.c === --- /dev/null 1970-01-01 00:00:00.0 + +++ linux-2.6.12-rc1/fs/rcfs/tc_magic.c 2005-03-18 15:16:33.373482530 -0800 @@ -0,0 +1,93 @@ +/* + * fs/rcfs/tc_magic.c + * + * Copyright (C) Shailabh Nagar, IBM Corp. 2004 + * (C) Vivek Kashyap, IBM Corp. 2004 + * (C) Chandra Seetharaman, IBM Corp. 2004 + * (C) Hubertus Franke, IBM Corp. 2004 + * + * define magic fileops for taskclass classtype + * + * Latest version, more details at http://ckrm.sf.net + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + */ + +#include +#include + +/* + * Taskclass general + * + * Define structures for taskclass root directory and its magic files + * In taskclasses, there is one set of magic files, created automatically under + * the taskclass root (upon classtype registration) and each directory (class) + * created subsequently. However, classtypes can also choose to have different + * sets of magic files created under their root and other directories under + * root using their mkdir function. 
RCFS only provides helper functions for + * creating the root directory and its magic files + * + */ + +#define TC_FILE_MODE (S_IFREG | S_IRUGO | S_IWUSR) + +#define NR_TCROOTMF 7 +struct rcfs_magf tc_rootdesc[NR_TCROOTMF] = { + /* First entry must be root */ + { + /* .name = should not be set, copy from classtype name */ +.mode = RCFS_DEFAULT_DIR_MODE, +.i_op = _dir_inode_operations, +.i_fop = _dir_operations, +}, + /* Rest are root's magic files */ + { +.name = "target", +.mode = TC_FILE_MODE, +.i_fop = _fileops, +.i_op = _file_inode_operations, +}, + { +.name = "members", +.mode = TC_FILE_MODE, +.i_fop = _fileops, +.i_op = _file_inode_operations, +}, + { +.name = "stats", +.mode = TC_FILE_MODE, +.i_fop = _fileops, +.i_op = _file_inode_operations, +}, + { +.name = "shares", +.mode = TC_FILE_MODE, +.i_fop = _fileops, +.i_op = _file_inode_operations, +}, + /* +* Reclassify and Config should be made available only at the +* root level. Make sure they are the last two entries, as +* rcfs_mkdir depends on it. +*/ + { +.name = "reclassify", +.mode = TC_FILE_MODE, +.i_fop = _fileops, +.i_op = _file_inode_operations, +}, + { +.name = "config", +.mode = TC_FILE_MODE, +.i_fop = _fileops, +.i_op
[patch 4/8] CKRM: Resource Control File System (rcfs)
Updates CKRM Resource Control Filesystem (rcfs) to include full directory structure support. Signed-Off-By: Chandra Seetharaman <[EMAIL PROTECTED]> Signed-Off-By: Hubertus Franke <[EMAIL PROTECTED]> Signed-Off-By: Shailabh Nagar <[EMAIL PROTECTED]> Signed-Off-By: Vivek Kashyap <[EMAIL PROTECTED]> Signed-Off-By: Gerrit Huizenga <[EMAIL PROTECTED]> Signed-off-by: Nishanth Aravamudan <[EMAIL PROTECTED]> Index: linux-2.6.12-rc1/fs/Makefile === --- linux-2.6.12-rc1.orig/fs/Makefile 2005-03-17 17:34:17.0 -0800 +++ linux-2.6.12-rc1/fs/Makefile2005-03-18 15:16:29.717773292 -0800 @@ -92,6 +92,7 @@ obj-$(CONFIG_JFS_FS) += jfs/ obj-$(CONFIG_XFS_FS) += xfs/ obj-$(CONFIG_AFS_FS) += afs/ obj-$(CONFIG_BEFS_FS) += befs/ +obj-$(CONFIG_RCFS_FS) += rcfs/ obj-$(CONFIG_HOSTFS) += hostfs/ obj-$(CONFIG_HPPFS)+= hppfs/ obj-$(CONFIG_DEBUG_FS) += debugfs/ Index: linux-2.6.12-rc1/fs/rcfs/dir.c === --- /dev/null 1970-01-01 00:00:00.0 + +++ linux-2.6.12-rc1/fs/rcfs/dir.c 2005-03-18 15:16:29.718773213 -0800 @@ -0,0 +1,220 @@ +/* + * fs/rcfs/dir.c + * + * Copyright (C) Shailabh Nagar, IBM Corp. 2004 + * Vivek Kashyap, IBM Corp. 2004 + * + * + * Directory operations for rcfs + * + * Latest version, more details at http://ckrm.sf.net + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the version 2 of the GNU General Public License + * as published by the Free Software Foundation. 
+ * + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define rcfs_positive(dentry) ((dentry)->d_inode && !d_unhashed((dentry))) + +int rcfs_empty(struct dentry *dentry) +{ + struct dentry *child; + int ret = 0; + + spin_lock(_lock); + list_for_each_entry(child, >d_subdirs, d_child) + if (!rcfs_is_magic(child) && rcfs_positive(child)) + goto out; + ret = 1; +out: + spin_unlock(_lock); + return ret; +} + +/* Directory inode operations */ + +int rcfs_create_coredir(struct inode *dir, struct dentry *dentry) +{ + + struct rcfs_inode_info *ripar, *ridir; + int sz; + + ripar = rcfs_get_inode_info(dir); + ridir = rcfs_get_inode_info(dentry->d_inode); + /* Inform resource controllers - do Core operations */ + if (ckrm_is_core_valid(ripar->core)) { + sz = strlen(ripar->name) + strlen(dentry->d_name.name) + 2; + ridir->name = kmalloc(sz, GFP_KERNEL); + if (!ridir->name) { + return -ENOMEM; + } + snprintf(ridir->name, sz, "%s/%s", ripar->name, +dentry->d_name.name); + ridir->core = (*(ripar->core->classtype->alloc)) + (ripar->core, ridir->name); + } else { + printk(KERN_ERR "rcfs_mkdir: Invalid parent core %p\n", + ripar->core); + return -EINVAL; + } + + return 0; +} + +int rcfs_mkdir(struct inode *dir, struct dentry *dentry, int mode) +{ + + int retval = 0; + struct ckrm_classtype *clstype; + + if (rcfs_mknod(dir, dentry, mode | S_IFDIR, 0)) { + printk(KERN_ERR "rcfs_mkdir: error in rcfs_mknod\n"); + return retval; + } + dir->i_nlink++; + /* Inherit parent's ops since rcfs_mknod assigns noperm ops. 
*/ + dentry->d_inode->i_op = dir->i_op; + dentry->d_inode->i_fop = dir->i_fop; + retval = rcfs_create_coredir(dir, dentry); + if (retval) { + simple_rmdir(dir, dentry); + return retval; + } + /* create the default set of magic files */ + clstype = (rcfs_get_inode_info(dentry->d_inode))->core->classtype; + rcfs_create_magic(dentry, &(((struct rcfs_magf *)clstype->mfdesc)[1]), + clstype->mfcount - 3); + return retval; +} + +int rcfs_rmdir(struct inode *dir, struct dentry *dentry) +{ + struct rcfs_inode_info *ri = rcfs_get_inode_info(dentry->d_inode); + + if (!rcfs_empty(dentry)) { + printk(KERN_ERR "rcfs_rmdir: directory not empty\n"); + return -ENOTEMPTY; + } + /* Core class removal */ + + if (ri->core == NULL) { + printk(KERN_ERR "rcfs_rmdir: core==NULL\n"); + /* likely a race condition */ + return 0; + } + + if ((*(ri->core->classtype->free)) (ri->core)) { + printk(KERN_ERR "rcfs_rmdir: ckrm_free_core_class failed\n"); + goto out; + } + ri->core = NULL;/* just to be safe */ + + /*
[patch 3/8] CKRM: Default Classification Engine
Main code for CKRM default classification engine. Adds Resource Control
(rc) filesystem as mechanism for setting policies for class assignments
in CKRM.

Signed-Off-By: Chandra Seetharaman <[EMAIL PROTECTED]>
Signed-Off-By: Hubertus Franke <[EMAIL PROTECTED]>
Signed-Off-By: Shailabh Nagar <[EMAIL PROTECTED]>
Signed-Off-By: Gerrit Huizenga <[EMAIL PROTECTED]>
Signed-Off-By: Vivek Kashyap <[EMAIL PROTECTED]>

 include/linux/ckrm_ce.h     |  108
 include/linux/ckrm_events.h |    8
 include/linux/ckrm_rc.h     |  355
 include/linux/rcfs.h        |   96
 include/linux/sched.h       |    6
 init/main.c                 |    2
 kernel/ckrm/Makefile        |    2
 kernel/ckrm/ckrm.c          |  927
 kernel/ckrm/ckrmutils.c     |  195
 9 files changed, 1694 insertions(+), 5 deletions(-)

Index: linux-2.6.12-rc1/include/linux/ckrm_ce.h
===
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.12-rc1/include/linux/ckrm_ce.h	2005-03-18 15:16:24.330201800 -0800
@@ -0,0 +1,95 @@
+/*
+ * ckrm_ce.h - Header file to be used by Classification Engine of CKRM
+ *
+ * Copyright (C) Hubertus Franke, IBM Corp. 2003
+ *           (C) Shailabh Nagar, IBM Corp. 2003
+ *           (C) Chandra Seetharaman, IBM Corp. 2003
+ *
+ * Provides data structures, macros and kernel API of CKRM for
+ * classification engine.
+ *
+ * Latest version, more details at http://ckrm.sf.net
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of version 2.1 of the GNU Lesser General Public License
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+ *
+ */
+
+#ifndef _LINUX_CKRM_CE_H
+#define _LINUX_CKRM_CE_H
+
+#ifdef CONFIG_CKRM
+
+#include
+
+/*
+ * Action parameters identifying the cause of a task<->class notify callback
+ * these can perculate up to user daemon consuming records send by the
+ * classification engine
+ */
+
+typedef void *(*ce_classify_fct) (enum ckrm_event event, void *obj, ...);
+typedef void (*ce_notify_fct) (enum ckrm_event event, void *classobj,
+			       void *obj);
+
+struct ckrm_eng_callback {
+	/* general state information */
+	int always_callback;	/* set if CE should always be called back
+				   regardless of numclasses */
+
+	/* callbacks which are called without holding locks */
+
+	unsigned long c_interest;	/* set of classification events of
+					 * interest to CE
+					 */
+
+	/* generic classify */
+	ce_classify_fct classify;
+
+	/* class added */
+	void (*class_add) (const char *name, void *core, int classtype);
+
+	/* class deleted */
+	void (*class_delete) (const char *name, void *core, int classtype);
+
+	/* callbacks which are called while holding task_lock(tsk) */
+	unsigned long n_interest;	/* set of notification events of
+					 * interest to CE
+					 */
+	/* notify on class switch */
+	ce_notify_fct notify;
+};
+
+struct inode;
+struct dentry;
+
+struct rbce_eng_callback {
+	int (*mkdir) (struct inode *, struct dentry *, int);	/* mkdir */
+	int (*rmdir) (struct inode *, struct dentry *);	/* rmdir */
+	int (*mnt) (void);
+	int (*umnt) (void);
+};
+
+extern int ckrm_register_engine(const char *name, struct ckrm_eng_callback *);
+extern int ckrm_unregister_engine(const char *name);
+
+extern void *ckrm_classobj(char *, int *classtype);
+
+extern int rcfs_register_engine(struct rbce_eng_callback *);
+extern int rcfs_unregister_engine(struct rbce_eng_callback *);
+
+extern int ckrm_reclassify(int pid);
+
+#ifndef _LINUX_CKRM_RC_H
+
+extern void ckrm_core_grab(struct ckrm_core_class *core);
+extern void ckrm_core_drop(struct ckrm_core_class *core);
+#endif
+
+#endif /* CONFIG_CKRM */
+#endif /* _LINUX_CKRM_CE_H */
Index: linux-2.6.12-rc1/include/linux/ckrm_events.h
===
--- linux-2.6.12-rc1.orig/include/linux/ckrm_events.h	2005-03-18 15:16:16.981786266 -0800
+++ linux-2.6.12-rc1/include/linux/ckrm_events.h	2005-03-18 15:16:24.335201402 -0800
@@ -108,70 +108,78 @@ int ckrm_unregister_event_cb(enum ckrm_e
 extern void ckrm_invoke_event_cb_chain(enum ckrm_event ev, void *arg);
 /* forward declarations for function arguments */
-struct task_struct;
+
+#include/* for task_struct */
+
 struct sock;
 struct user_struct;
 static inline void ckrm_cb_fork(struct task_struct *p)
 {
-
[patch 1/8] CKRM: Core CKRM Event Callbacks
Core CKRM Event Callbacks. On exec, fork, exit, real/effective gid/uid, use CKRM to associate tasks with appropriate class. Addressed all review comments except: Greg KH: Use of __bitwise and sparse in enum's Use of kernel list type Signed-off-by: Shailabh Nagar <[EMAIL PROTECTED]> Signed-off-by: Hubertus Franke <[EMAIL PROTECTED]> Signed-off-by: Chandra Seetharaman <[EMAIL PROTECTED]> Signed-off-by: Gerrit Huizenga <[EMAIL PROTECTED]> fs/exec.c |2 include/linux/ckrm_events.h | 190 include/linux/sched.h |1 init/Kconfig| 16 +++ kernel/Makefile |2 kernel/ckrm/Makefile|7 + kernel/ckrm/ckrm_events.c | 97 ++ kernel/exit.c |3 kernel/fork.c |4 kernel/sys.c| 10 ++ 10 files changed, 331 insertions(+), 1 deletion(-) Index: linux-2.6.12-rc1/fs/exec.c === --- linux-2.6.12-rc1.orig/fs/exec.c 2005-03-17 17:34:09.0 -0800 +++ linux-2.6.12-rc1/fs/exec.c 2005-03-18 15:16:16.981786266 -0800 @@ -48,6 +48,7 @@ #include #include #include +#include #include #include @@ -1087,6 +1088,7 @@ int search_binary_handler(struct linux_b fput(bprm->file); bprm->file = NULL; current->did_exec = 1; + ckrm_cb_exec(bprm->filename); return retval; } read_lock(_lock); Index: linux-2.6.12-rc1/include/linux/ckrm_events.h === --- /dev/null 1970-01-01 00:00:00.0 + +++ linux-2.6.12-rc1/include/linux/ckrm_events.h2005-03-18 15:16:16.981786266 -0800 @@ -0,0 +1,192 @@ +/* + * ckrm_events.h - Class-based Kernel Resource Management (CKRM) + * event handling + * + * Copyright (C) Hubertus Franke, IBM Corp. 2003,2004 + * (C) Shailabh Nagar, IBM Corp. 2003 + * (C) Chandra Seetharaman, IBM Corp. 2003 + * + * + * Provides a base header file including macros and basic data structures. + * + * Latest version, more details at http://ckrm.sf.net + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of version 2.1 of the GNU Lesser General Public License + * as published by the Free Software Foundation. 
+ * + * This program is distributed in the hope that it would be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. + * + */ + +#ifndef _LINUX_CKRM_EVENTS_H +#define _LINUX_CKRM_EVENTS_H + +#ifdef CONFIG_CKRM + +/* + * Data structure and function to get the list of registered + * resource controllers. + */ + +/* + * CKRM defines a set of events at particular points in the kernel + * at which callbacks registered by various class types are called + */ + +enum ckrm_event { + /* +* we distinguish these events types: +* +* (a) CKRM_LATCHABLE_EVENTS +* events can be latched for event callbacks by classtypes +* +* (b) CKRM_NONLATACHBLE_EVENTS +* events can not be latched but can be used to call classification +* +* (c) event that are used for notification purposes +* range: [ CKRM_EVENT_CANNOT_CLASSIFY .. ) +*/ + + /* events (a) */ + + CKRM_LATCHABLE_EVENTS, + + CKRM_EVENT_NEWTASK = CKRM_LATCHABLE_EVENTS, + CKRM_EVENT_FORK, + CKRM_EVENT_EXIT, + CKRM_EVENT_EXEC, + CKRM_EVENT_UID, + CKRM_EVENT_GID, + CKRM_EVENT_LOGIN, + CKRM_EVENT_USERADD, + CKRM_EVENT_USERDEL, + CKRM_EVENT_LISTEN_START, + CKRM_EVENT_LISTEN_STOP, + CKRM_EVENT_APPTAG, + + /* events (b) */ + + CKRM_NONLATCHABLE_EVENTS, + + CKRM_EVENT_RECLASSIFY = CKRM_NONLATCHABLE_EVENTS, + + /* events (c) */ + + CKRM_NOTCLASSIFY_EVENTS, + + CKRM_EVENT_MANUAL = CKRM_NOTCLASSIFY_EVENTS, + + CKRM_NUM_EVENTS +}; + +/* + * CKRM event callback specification for the classtypes or resource controllers + * typically an array is specified using CKRM_EVENT_SPEC terminated with + * CKRM_EVENT_SPEC_LAST and then that array is registered using + * ckrm_register_event_set. 
+ * Individual registration of event_cb is also possible + */ + +struct ckrm_hook_cb { + void (*fct)(void *arg); + struct ckrm_hook_cb *next; +}; + +struct ckrm_event_spec { + enum ckrm_event ev; + struct ckrm_hook_cb cb; +}; + +int ckrm_register_event_set(struct ckrm_event_spec especs[]); +int ckrm_unregister_event_set(struct ckrm_event_spec especs[]); +int ckrm_register_event_cb(enum ckrm_event ev, struct
[patch 2/8] CKRM: Processor Delay Accounting
CKRM processor scheduling delay accounting - provides a mechanism to In addition to counting frequency the total delay in ns is also recorded. CPU delays are specified as cpu-wait and cpu-run. I/O delays are recorded for memory and regular I/O. Information is accessible through /proc//delay. Signed-Off-By: Chandra Seetharaman <[EMAIL PROTECTED]> Signed-Off-By: Hubertus Franke <[EMAIL PROTECTED]> Signed-Off-By: Shailabh Nagar <[EMAIL PROTECTED]> Signed-Off-By: Gerrit Huizenga <[EMAIL PROTECTED]> fs/proc/array.c| 18 + fs/proc/base.c | 18 + include/linux/sched.h | 86 + include/linux/taskdelays.h | 45 +++ init/Kconfig |8 kernel/fork.c |1 kernel/sched.c | 17 mm/memory.c|9 +++- 8 files changed, 200 insertions(+), 2 deletions(-) Index: linux-2.6.12-rc1/fs/proc/array.c === --- linux-2.6.12-rc1.orig/fs/proc/array.c 2005-03-17 17:34:18.0 -0800 +++ linux-2.6.12-rc1/fs/proc/array.c2005-03-18 15:16:20.884475861 -0800 @@ -482,3 +482,21 @@ int proc_pid_statm(struct task_struct *t return sprintf(buffer,"%d %d %d %d %d %d %d\n", size, resident, shared, text, lib, data, 0); } + + +int proc_pid_delay(struct task_struct *task, char * buffer) +{ + int res; + + res = sprintf(buffer,"%u %llu %llu %u %llu %u %llu\n", + (unsigned int) get_delay(task,runs), + (uint64_t) get_delay(task,runcpu_total), + (uint64_t) get_delay(task,waitcpu_total), + (unsigned int) get_delay(task,num_iowaits), + (uint64_t) get_delay(task,iowait_total), + (unsigned int) get_delay(task,num_memwaits), + (uint64_t) get_delay(task,mem_iowait_total) + ); + return res; +} + Index: linux-2.6.12-rc1/fs/proc/base.c === --- linux-2.6.12-rc1.orig/fs/proc/base.c2005-03-17 17:34:18.0 -0800 +++ linux-2.6.12-rc1/fs/proc/base.c 2005-03-18 15:16:20.889475463 -0800 @@ -120,6 +120,10 @@ enum pid_directory_inos { #ifdef CONFIG_AUDITSYSCALL PROC_TID_LOGINUID, #endif +#ifdef CONFIG_DELAY_ACCT +PROC_TID_DELAY_ACCT, +PROC_TGID_DELAY_ACCT, +#endif PROC_TID_FD_DIR = 0x8000, /* 0x8000-0x */ PROC_TID_OOM_SCORE, PROC_TID_OOM_ADJUST, @@ -155,6 
+159,9 @@ static struct pid_entry tgid_base_stuff[ #ifdef CONFIG_SECURITY E(PROC_TGID_ATTR, "attr",S_IFDIR|S_IRUGO|S_IXUGO), #endif +#ifdef CONFIG_DELAY_ACCT + E(PROC_TGID_DELAY_ACCT,"delay", S_IFREG|S_IRUGO), +#endif #ifdef CONFIG_KALLSYMS E(PROC_TGID_WCHAN, "wchan", S_IFREG|S_IRUGO), #endif @@ -191,6 +198,9 @@ static struct pid_entry tid_base_stuff[] #ifdef CONFIG_SECURITY E(PROC_TID_ATTR, "attr",S_IFDIR|S_IRUGO|S_IXUGO), #endif +#ifdef CONFIG_DELAY_ACCT + E(PROC_TGID_DELAY_ACCT,"delay", S_IFREG|S_IRUGO), +#endif #ifdef CONFIG_KALLSYMS E(PROC_TID_WCHAN, "wchan", S_IFREG|S_IRUGO), #endif @@ -1564,6 +1574,13 @@ static struct dentry *proc_pident_lookup ei->op.proc_read = proc_pid_wchan; break; #endif +#ifdef CONFIG_DELAY_ACCT + case PROC_TID_DELAY_ACCT: + case PROC_TGID_DELAY_ACCT: + inode->i_fop = _info_file_operations; + ei->op.proc_read = proc_pid_delay; + break; +#endif #ifdef CONFIG_SCHEDSTATS case PROC_TID_SCHEDSTAT: case PROC_TGID_SCHEDSTAT: Index: linux-2.6.12-rc1/fs/proc/internal.h === --- linux-2.6.12-rc1.orig/fs/proc/internal.h2005-03-17 17:33:50.0 -0800 +++ linux-2.6.12-rc1/fs/proc/internal.h 2005-03-18 15:16:20.889475463 -0800 @@ -36,6 +36,7 @@ extern int proc_tid_stat(struct task_str extern int proc_tgid_stat(struct task_struct *, char *); extern int proc_pid_status(struct task_struct *, char *); extern int proc_pid_statm(struct task_struct *, char *); +extern int proc_pid_delay(struct task_struct *, char*); static inline struct task_struct *proc_task(struct inode *inode) { Index: linux-2.6.12-rc1/include/linux/sched.h === --- linux-2.6.12-rc1.orig/include/linux/sched.h 2005-03-17 17:33:50.0 -0800 +++ linux-2.6.12-rc1/include/linux/sched.h 2005-03-18 15:16:20.891475304 -0800 @@ -34,6 +34,7 @@ #include #include #include +#include struct exec_domain; @@ -727,6 +728,9 @@ struct task_struct { nodemask_t mems_allowed; int cpuset_mems_generation;
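The record that proc_pid_delay() formats in the patch above is one line of seven space-separated counters. Below is a minimal userspace sketch of parsing such a line, assuming the field order shown in the sprintf() call. The names delay_record and parse_delay_line are hypothetical and not part of the patch.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical container mirroring the seven fields printed by
 * proc_pid_delay(): "%u %llu %llu %u %llu %u %llu\n". */
struct delay_record {
	unsigned int runs;
	unsigned long long runcpu_total;
	unsigned long long waitcpu_total;
	unsigned int num_iowaits;
	unsigned long long iowait_total;
	unsigned int num_memwaits;
	unsigned long long mem_iowait_total;
};

/* Parse one line in the patch's output format.
 * Returns 0 on success, -1 if fewer than seven fields were read. */
static int parse_delay_line(const char *line, struct delay_record *rec)
{
	int n;

	assert(line != NULL && rec != NULL);
	n = sscanf(line, "%u %llu %llu %u %llu %u %llu",
		   &rec->runs, &rec->runcpu_total, &rec->waitcpu_total,
		   &rec->num_iowaits, &rec->iowait_total,
		   &rec->num_memwaits, &rec->mem_iowait_total);
	return n == 7 ? 0 : -1;
}
```

A caller would read the line from the per-task delay file and pass it to parse_delay_line(). According to the patch description, the *_total fields are delays in nanoseconds and the remaining fields are event counts.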
[patch 0/8] CKRM: Core patch set
This is the core patch set for CKRM, with almost all review comments
applied (there are a few we are still working on, mostly cosmetic).
However, this set has been extensively regression tested on IA32,
x86-64/EM64T, and PPC64, with various CKRM CONFIG options on and off,
using both regression tests and CKRM's functional tests. I believe this
set is ready for additional testing in -mm.

We have an additional 4 patch sets that will follow this (classification
engines, memory controller, IO controller, updated network controller).

Continued comments are welcome; once we have patches for the last of
the cleanups, we are hoping we'll have sufficient testing to be able to
push this towards mainline.

gerrit
Re: Linux 2.4.30-rc3 md/ext3 problems
On Tue, Mar 29, 2005 at 10:10:34AM +1000, Neil Brown wrote:
> On Monday March 28, [EMAIL PROTECTED] wrote:
> > On Mon, Mar 28, 2005 at 10:34:05AM +0300, [Ville Herva] wrote:
> > >
> > > I just upgraded from linux-2.4.21 + vserver 0.17 to 2.4.30rc3 + vserver
> > > 1.2.10. The box has been running stable with 2.4.21 + vserver 0.17/0.16 for
> > > a few years (uptime before reboot was nearly 400 days.)
> > >
> > > The boot went fine, but after few hours I got
> > > Message from [EMAIL PROTECTED] at Sun Mar 27 22:07:00 2005 ...
> > > kernel: journal commit I/O error
>
> I got that error on 2.4.30-rc1 a couple of times, and now cannot
> reproduce it :-(
> But if you got it too, then it wasn't just bad luck.
>
> The ext3 code in 2.4.30-rc does have a few more checks for IO errors
> which will cause the journal to be aborted and produce this error, so
> I suspect that change which caused the problem is a change in ext3.
> However that doesn't mean the bug is there.
>
> The extra code in ext3 seems to just check if buffer_uptodate is false
> after it has waited on a locked buffer, and triggers a journal abort
> if it isn't. This should be perfectly safe, and I cannot find any
> logic error near by. But nor can I find any errors that would cause a
> buffer returned from raid1 to not be uptodate (unless there really was
> an IO error).

Attached is the backout patch, for convenience.

# This is a BitKeeper generated diff -Nru style patch.
#
# ChangeSet
#   2005/03/29 18:49:25-03:00 [EMAIL PROTECTED]
#   Cset exclude: [EMAIL PROTECTED]|ChangeSet|20050226095914|25750
#
# mm/filemap.c
#   2005/03/29 18:49:22-03:00 [EMAIL PROTECTED] +0 -0
#   Exclude
#
# include/linux/jbd.h
#   2005/03/29 18:49:22-03:00 [EMAIL PROTECTED] +0 -0
#   Exclude
#
# fs/jbd/transaction.c
#   2005/03/29 18:49:21-03:00 [EMAIL PROTECTED] +0 -0
#   Exclude
#
# fs/jbd/journal.c
#   2005/03/29 18:49:21-03:00 [EMAIL PROTECTED] +0 -0
#   Exclude
#
# fs/jbd/commit.c
#   2005/03/29 18:49:21-03:00 [EMAIL PROTECTED] +0 -0
#   Exclude
#
# fs/ext3/super.c
#   2005/03/29 18:49:21-03:00 [EMAIL PROTECTED] +0 -0
#   Exclude
#
# fs/ext3/fsync.c
#   2005/03/29 18:49:21-03:00 [EMAIL PROTECTED] +0 -0
#   Exclude
#
diff -Nru a/fs/ext3/fsync.c b/fs/ext3/fsync.c
--- a/fs/ext3/fsync.c	2005-03-29 18:50:56 -03:00
+++ b/fs/ext3/fsync.c	2005-03-29 18:50:56 -03:00
@@ -69,7 +69,7 @@
 	if (test_opt(inode->i_sb, DATA_FLAGS) == EXT3_MOUNT_WRITEBACK_DATA)
 		ret |= fsync_inode_data_buffers(inode);
 
-	ret |= ext3_force_commit(inode->i_sb);
+	ext3_force_commit(inode->i_sb);
 
 	return ret;
 }
diff -Nru a/fs/ext3/super.c b/fs/ext3/super.c
--- a/fs/ext3/super.c	2005-03-29 18:50:56 -03:00
+++ b/fs/ext3/super.c	2005-03-29 18:50:56 -03:00
@@ -1608,13 +1608,12 @@
 
 static int ext3_sync_fs(struct super_block *sb)
 {
-	int err;
 	tid_t target;
 
 	sb->s_dirt = 0;
 	target = log_start_commit(EXT3_SB(sb)->s_journal, NULL);
-	err = log_wait_commit(EXT3_SB(sb)->s_journal, target);
-	return err;
+	log_wait_commit(EXT3_SB(sb)->s_journal, target);
+	return 0;
 }
 
 /*
diff -Nru a/fs/jbd/commit.c b/fs/jbd/commit.c
--- a/fs/jbd/commit.c	2005-03-29 18:50:55 -03:00
+++ b/fs/jbd/commit.c	2005-03-29 18:50:55 -03:00
@@ -92,7 +92,7 @@
 	struct buffer_head *wbuf[64];
 	int bufs;
 	int flags;
-	int err = 0;
+	int err;
 	unsigned long blocknr;
 	char *tagp = NULL;
 	journal_header_t *header;
@@ -299,8 +299,6 @@
 				spin_unlock(_datalist_lock);
 				unlock_journal(journal);
 				wait_on_buffer(bh);
-				if (unlikely(!buffer_uptodate(bh)))
-					err = -EIO;
 				/* the journal_head may have been removed now */
 				lock_journal(journal);
 				goto write_out_data;
@@ -328,8 +326,6 @@
 			spin_unlock(_datalist_lock);
 			unlock_journal(journal);
 			wait_on_buffer(bh);
-			if (unlikely(!buffer_uptodate(bh)))
-				err = -EIO;
 			lock_journal(journal);
 			spin_lock(_datalist_lock);
 			continue;	/* List may have changed */
@@ -355,9 +351,6 @@
 	}
 	spin_unlock(_datalist_lock);
 
-	if (err)
-		__journal_abort_hard(journal);
-
 	/*
 	 * If we found any dirty or locked buffers, then we should have
 	 * looped back up to the write_out_data label. If there weren't
@@ -548,8 +541,6 @@
 		if (buffer_locked(bh)) {
 			unlock_journal(journal);
 			wait_on_buffer(bh);
-			if (unlikely(!buffer_uptodate(bh)))
-				err = -EIO;
 			lock_journal(journal);
Re: no need to check for NULL before calling kfree() -fs/ext2/
Pekka wrote: > (4) The cleanups Jesper and others are doing are to remove the > _redundant_ NULL checks (i.e. it is now checked twice). Even such obvious changes as removing redundant checks doesn't seem to ensure a performance improvement. Jesper Juhl posted performance data for such changes in his microbenchmark a couple of days ago. As I posted then, I could swear that his numbers show: > Just looking at the third run, it seems to me that "if (likely(p)) > kfree(p);" beats a naked "kfree(p);" everytime, whether p is half > NULL's, or very few NULL's, or almost all NULL's. Twice now I have asked Jesper to explain this strange result. I have heard no explanation (not even a terse "you idiot ;)"), nor anyone else comment on these numbers. Maybe we should be following your good advice: > You don't know that until you profile! instead of continuing to make these code changes. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <[EMAIL PROTECTED]> 1.650.933.1373, 1.925.600.0401 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
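The two variants being compared can be sketched in userspace, assuming free() as a stand-in for kfree() and the GCC/Clang __builtin_expect() builtin as a stand-in for the kernel's likely(). The names counted_free, release_checked, release_naked and run_mix are illustrative only, not kernel code. The sketch counts how often the allocator is entered; it says nothing about timing, which is the part that still needs a profile.

```c
#include <assert.h>
#include <stdlib.h>

/* Userspace stand-in for the kernel's likely(); GCC/Clang builtin. */
#define likely(x) __builtin_expect(!!(x), 1)

static unsigned long free_calls;	/* how many times the allocator was entered */

/* Stand-in for kfree(): free(NULL) is defined by the C standard to do nothing. */
static void counted_free(void *p)
{
	free_calls++;
	free(p);
}

/* Variant A: caller-side check, skips the call entirely for NULL. */
static void release_checked(void *p)
{
	if (likely(p))
		counted_free(p);
}

/* Variant B: "naked" call, relies on the callee handling NULL. */
static void release_naked(void *p)
{
	counted_free(p);
}

/* Release n_total pointers, the first n_null of them NULL; return allocator entries. */
static unsigned long run_mix(int n_null, int n_total, void (*release)(void *))
{
	unsigned long before = free_calls;

	assert(n_null >= 0 && n_null <= n_total);
	for (int i = 0; i < n_total; i++)
		release(i < n_null ? NULL : malloc(16));
	return free_calls - before;
}
```

Both variants are correct; the only difference is whether a NULL pointer costs a function call or a predicted branch, which is why the mix of NULL and non-NULL pointers matters to any measurement.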
Re: i386/x86_64 segment register issuses (Re: PATCH: Fix x86 segment register access)
On Tue, 29 Mar 2005, H. J. Lu wrote: > > > the smaller and faster version do not want to just rely on gas > > automatically getting it right, especially since gas has historically been > > very very bad at getting things right. > > We are fixing those issues in assembler. If people run into problems > like that with gas, they can report them. They will be fixed. It's fine if gas fixes things. It's not fine if gas breaks things that used to work, for no really good reason. > > What is the advantage of not allowing "movl %ds,mem"? Really? Especially > > since I suspect the kernel is pretty much the only one who does this, and > > the kernel really does do it on purpose. The kernel explicitly wants the > > 32-bit version, knowing that the upper bits are undefined. > > > > Kernel has > > unsigned gsindex; > asm volatile("movl %%gs,%0" : "=g" (gsindex)); Ok, that's a real x86-64 bug, it seems. Andi, please fix, preferably by just making the "g" be a "r". However, your argument isn't very valid, since: > The new assembler will make sure that it won't happen. Not true, since the suggestion was just to change all segment "movl" things to "mov", at which point the same old bug is still there, and the assembler didn't really help us at all. See the problem? You're not actually protecting anything. The change just makes it _harder_ to make sizes explicit, and suddenly we have to trust an assembler to be clever about sizes, when that assembler historically has definitely _not_ been very clever about them at all. Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] embarassing typo
As long as the variable doesn't overflow, you do get a negation. You shouldn't do dri_data[5] = ptr->dri * 0xff; if ptr->dri is 255, but if ptr->dri = 1 (as it is set in zr36050_setup), then you get the negation, -1. Direct rendering support is a flag AFAIK, so in this case I believe it is a worthy piece of obfuscated C negation code :) BTW, are you sure about this patch? I would contact the maintainer first, because and'ing that doesn't make much sense... Disclaimer: all this is AFAIK! :) On Tuesday 29 March 2005 09:58 pm, you wrote: > Måns Rullgård wrote: > > "Ronald S. Bultje" <[EMAIL PROTECTED]> writes: > >>--- linux-2.6.5/drivers/media/video/zr36050.c.old 16 Sep 2004 22:53:27 > >> - 1.2 +++ linux-2.6.5/drivers/media/video/zr36050.c 29 Mar 2005 > >> 20:30:23 - @@ -419,7 +419,7 @@ > >> dri_data[2] = 0x00; > >> dri_data[3] = 0x04; > >> dri_data[4] = ptr->dri >> 8; > >>- dri_data[5] = ptr->dri * 0xff; > >>+ dri_data[5] = ptr->dri & 0xff; > > > > Hey, that's a nice obfuscation of a simple negation. > > It's not a negation. This statement always assigns zero to > dri_data[5] if dri_data is char[]. Looks like gcc isn't catching > this problem. > > > BTW, when assigning to a char type, is the masking really necessary at > > all? I can't see that it should make a difference. Am I missing > > something subtle? > > Well, it's a matter of readability mostly. For now at least, when > char is always 8 bytes... > > /mjt
Re: Mac mini sound woes
On Wed, 2005-03-30 at 03:45 +0200, Marcin Dalecki wrote: > On 2005-03-29, at 12:22, Takashi Iwai wrote: > > > > ALSA provides the "driver" feature in user-space because it's more > > flexible, more efficient and safer than doing in kernel. It's > > transparent from apps perspective. It really doesn't matter whether > > it's in kernel or user space. > > Yes because it's that wonder full linux sound processing sucks in > compare > to the other OSs out there doing it in kernel. What are you talking about? It's actually quite good. Have you actually tried these other OSes lately? These devices in question (those lacking hardware mixing and volume control) don't exactly work great under that OS. Lee
Re: [PATCH] embarassing typo
Michael Tokarev <[EMAIL PROTECTED]> writes: > Måns Rullgård wrote: >> "Ronald S. Bultje" <[EMAIL PROTECTED]> writes: >> >>>--- linux-2.6.5/drivers/media/video/zr36050.c.old16 Sep 2004 22:53:27 >>>- 1.2 >>>+++ linux-2.6.5/drivers/media/video/zr36050.c29 Mar 2005 20:30:23 >>>- >>>@@ -419,7 +419,7 @@ >>> dri_data[2] = 0x00; >>> dri_data[3] = 0x04; >>> dri_data[4] = ptr->dri >> 8; >>>-dri_data[5] = ptr->dri * 0xff; >>>+dri_data[5] = ptr->dri & 0xff; >> Hey, that's a nice obfuscation of a simple negation. > > It's not a negation. This statement always assigns zero to > dri_data[5] if dri_data is char[]. Sure about that? __u16 i; char c; i = 1; c = i * 255; /* c = 255 = -1 */ i = 2; c = i * 255; /* c = 510 & 0xff = 254 = -2 */ ... Looks like negation to me. -- Måns Rullgård [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
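The arithmetic in this exchange can be checked mechanically. The following is a small sketch, assuming an 8-bit unsigned destination byte and a 16-bit dri field as in the example above; the function names are made up for illustration. Since 255 is congruent to -1 modulo 256, the low byte of dri * 0xff equals the low byte of -dri, which is neither "always zero" nor the same as dri & 0xff.

```c
#include <assert.h>
#include <stdint.h>

/* Original line: dri_data[5] = ptr->dri * 0xff;  (low byte of dri * 255) */
static uint8_t low_byte_mul(uint16_t dri)
{
	return (uint8_t)(dri * 0xff);
}

/* Patched line: dri_data[5] = ptr->dri & 0xff;  (low byte of dri) */
static uint8_t low_byte_and(uint16_t dri)
{
	return (uint8_t)(dri & 0xff);
}

/* 255 == -1 (mod 256), so dri * 255 == -dri (mod 256). */
static uint8_t low_byte_neg(uint16_t dri)
{
	return (uint8_t)(0u - dri);
}

/* Exhaustive check of the identity over every 16-bit value. */
void check_negation_identity(void)
{
	for (unsigned int d = 0; d <= 0xffffu; d++)
		assert(low_byte_mul((uint16_t)d) == low_byte_neg((uint16_t)d));
}
```

For dri = 1 the multiply form stores 255 while the mask form stores 1, so the patch does change behaviour for every value whose low byte is neither 0 nor 128.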
Re: Industry db benchmark result on recent 2.6 kernels
Chen, Kenneth W wrote: > Nick Piggin wrote on Tuesday, March 29, 2005 5:32 PM > > If it is doing a lot of mapping/unmapping (or fork/exit), then that > > might explain why 2.6.11 is worse. > > > > Fortunately there are more patches to improve this on the way. > > Once benchmark reaches steady state, there is no mapping/unmapping > going on. Actually, the virtual address space for all the processes > are so stable at steady state that we don't even see it grow or shrink. Oh, well there goes that theory ;) The only other thing I can think of is the CPU scheduler changes that went into 2.6.11 (but there are obviously a lot that I can't think of). I'm sure I don't need to tell you it would be nice to track down the source of these problems rather than papering over them with improvements to the block layer... any indication of what has gone wrong? Typically if the CPU scheduler has gone bad and is moving too many tasks around (and hurting caches), you'll see things like copy_*_user increase in cost for the same units of work performed. Whereas if it is too reluctant to move tasks, you'll see increased idle time.
Re: i386/x86_64 segment register issuses (Re: PATCH: Fix x86 segment register access)
On Tue, Mar 29, 2005 at 04:30:01PM -0800, Linus Torvalds wrote: > > > On Mon, 28 Mar 2005, Andi Kleen wrote: > > > > "H. J. Lu" <[EMAIL PROTECTED]> writes: > > > The new assembler will disallow them since those instructions with > > > memory operand will only use the first 16bits. If the memory operand > > > is 16bit, you won't see any problems. But if the memory destinatin > > > is 32bit, the upper 16bits may have random values. The new assembler > > > > Does it really have random values on existing x86 hardware? > > The upper bits are not written at all, so it's not random. > > > If it is a only a "theoretical" problem that does not happen > > in practice I would advise to not do the change. > > My preference too. The reason we use "movl" is because we really do want > the 32-bit versions, since they are faster. It's a conscious choice. In > contrast "movw" generates bigger and slower code on all assemblers out > there, and "mov" doesn't make it clear which one it is. Is it the slow > one, or the fast one? "mov" shouldn't generate the 0x66 prefix, at least with the assembler since binutils 2.14.90.0.4 20030523. The assembler in CVS won't generate 0x66 for "movw" either. > Now, those versions of gas may be so old that nobody cares, but the > explicit size still is a GOOD THING. The size DOES MATTER. People who want Suggesting "mov" instead of "movw" is for the existing assemblers. Or kernel can check assembler version to decide if "movw" should be used. I can verify the first Linux assembler which won't generate 0x66 for "movw". > the smaller and faster version do not want to just rely on gas > automatically getting it right, especially since gas has historically been > very very bad at getting things right. We are fixing those issues in assembler. If people run into problems like that with gas, they can report them. They will be fixed. > > What is the advantage of not allowing "movl %ds,mem"? Really? 
Especially > since I suspect the kernel is pretty much the only one who does this, and > the kernel really does do it on purpose. The kernel explicitly wants the > 32-bit version, knowing that the upper bits are undefined. > Kernel has unsigned gsindex; asm volatile("movl %%gs,%0" : "=g" (gsindex)); ... if (gsindex) It is OK if gcc never generates memory access like movl %gs,0x128(%rsp) Otherwise, the upper bits in gsindex are undefined. The new assembler will make sure that it won't happen. H.J. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
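The hazard described here can be modelled without any inline assembly. This is only a sketch of the behaviour as stated in the thread (a segment-register store to memory writes 16 bits and leaves the rest of a 32-bit slot untouched), not a measurement of real hardware; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Model of a segment store to memory: only the low 16 bits of the slot change. */
static uint32_t store_selector_to_memory(uint32_t old_slot, uint16_t selector)
{
	return (old_slot & 0xffff0000u) | selector;
}

/* Safe test for "is the selector non-zero": look at the low 16 bits only. */
static int selector_is_set(uint32_t slot)
{
	return (slot & 0xffffu) != 0;
}

void demo_selector_store(void)
{
	/* The stack slot held stale data before a zero selector was stored. */
	uint32_t slot = store_selector_to_memory(0xdeadbeefu, 0);

	assert(slot == 0xdead0000u);
	assert(slot != 0);		/* a naive 32-bit test sees "non-zero" */
	assert(!selector_is_set(slot));	/* masking gives the intended answer */
}
```

Under that model, "if (gsindex)" is only reliable when the value was written to a register (which is what an "r" constraint guarantees) or when the test masks off the upper half.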
Re: Mac mini sound woes
On 2005-03-30, at 01:39, Benjamin Herrenschmidt wrote: > On Tue, 2005-03-29 at 17:25 -0600, Chris Friesen wrote: > > Lee Revell wrote: > > > This is the exact line of reasoning that led to Winmodems. > > > > My main issue with winmodems is not so much the software offload, but > > rather that the vendors don't release full specs. If all winmodem > > manufacturers released full hardware specs, I doubt people would really > > complain all that much. There's a fairly large pool of talent available > > to write drivers once the interfaces are known. > > Look at the pile of junk that are most winmodem driver implementations, > nothing I want to see in the kernel ever. Those things should be in > userland. You are joking? Linux IS NOT an RT OS. And, well, not too long ago you could be jailed, for example in Germany, for using badly behaved communication devices.
Re: Mac mini sound woes
On 2005-03-30, at 00:13, Lee Revell wrote: > On Tue, 2005-03-29 at 11:22 +0200, Marcin Dalecki wrote: > > No. You didn't get it. I'm taking the view that mixing sound is simply a > > task you would typically love to make a DSP firmware do. However > > providing a DSP for sound processing at 44kHZ on the same PCB as an 1GHZ > > CPU is a ridiculous waste of resources. Thus most hardware vendors out > > there decided to use the main CPU instead. Thus the "firmware" is simply > > running on the main CPU now. Now where should it go? I'm convinced that > > its better to put it near the hardware in the whole stack. You think it's > > best to put it far away and to invent artificial synchronization problems > > between different applications putting data down to the same hardware > > device. > > This is the exact line of reasoning that led to Winmodems. Yes, and BTW those are, from a hardware point of view, a technically perfectly fine solution. The obstacles here are twofold: the Win32 kernel sucks big rocks on latency issues (however, now that we are past 1 GHz and use XP, they work perfectly fine), and on Linux you don't get the necessary DSP processing code/docs. Both are just pragmatic arguments which don't apply to sound processing at all. And for your note: I'm the guy who several years ago wrote the first ever GDI printer driver for Linux (oki4linux), despite claims from quite prominent people here that this could never be done. And yes, I did it in user space, because pages are not data streams.
Re: Mac mini sound woes
On 2005-03-29, at 12:22, Takashi Iwai wrote: > ALSA provides the "driver" feature in user-space because it's more > flexible, more efficient and safer than doing in kernel. It's > transparent from apps perspective. It really doesn't matter whether > it's in kernel or user space. Yes, because it's that wonderful, Linux sound processing sucks in comparison to the other OSs out there doing it in the kernel. > I think your misunderstanding is that you believe user-space can't do > RT. It's wrong. See JACK (jackit.sf.net), for example. I know JACK in and out. It doesn't provide what you claim.
Re: [PATCH] embarassing typo
On Tuesday 29 March 2005 16:58, Michael Tokarev wrote: > Well, it's a matter of readability mostly. For now at least, when > char is always 8 bytes... Wow, that's one huge char you have there ;) -- Dmitry
Re: Disc driver is module, software suspend fails
On Tue, 29 Mar 2005, Pavel Machek wrote: > You insmod driver for your swap device, then you echo device numbers > to /sys... then initiate resume. So you're saying, let the machine come all the way up, log in as root, "echo 8:5 > /sys/power/resume" (I think that was the name), then "echo resume > /sys/power/state"? Hmm, you would have to bypass "swapon -a", e.g. boot with the -b kernel parameter. Or I'll bet one could do something equivalent in the initrd -- much more user friendly. But the friendliest of all would be if the swsusp resume call were not a late_initcall but rather were called just before the root was mounted, after the initrd (if any) had loaded whatever modules. I think you're confirming that that approach would not blow up the kernel -- if it will work with the root mounted and user space in full roar (well, skimpy roar with the -b switch), then it's got to be OK at the earlier time. I'll see what I can do. James F. Carter Voice 310 825 2897 FAX 310 206 6673 UCLA-Mathnet; 6115 MSA; 405 Hilgard Ave.; Los Angeles, CA, USA 90095-1555 Email: [EMAIL PROTECTED] http://www.math.ucla.edu/~jimc (q.v. for PGP key)
RE: Industry db benchmark result on recent 2.6 kernels
Nick Piggin wrote on Tuesday, March 29, 2005 5:32 PM > If it is doing a lot of mapping/unmapping (or fork/exit), then that > might explain why 2.6.11 is worse. > > Fortunately there are more patches to improve this on the way. Once benchmark reaches steady state, there is no mapping/unmapping going on. Actually, the virtual address space for all the processes are so stable at steady state that we don't even see it grow or shrink.
Re: Industry db benchmark result on recent 2.6 kernels
Linus Torvalds wrote: > On Tue, 29 Mar 2005, Chen, Kenneth W wrote: > > Linus Torvalds wrote on Tuesday, March 29, 2005 4:00 PM > > > The fact that it seems to fluctuate pretty wildly makes me wonder > > > how stable the numbers are. > > > > I can't resist myself from bragging. The high point in the fluctuation > > might be because someone is working hard trying to make 2.6 kernel run > > faster. Hint hint hint . ;-) > > Heh. How do you explain the low-point? If there's somebody out there > working hard on making it run slower, I want to whack the guy ;) If it is doing a lot of mapping/unmapping (or fork/exit), then that might explain why 2.6.11 is worse. Fortunately there are more patches to improve this on the way. Kernel profiles would be useful if possible.
Re: memcpy(a,b,CONST) is not inlined by gcc 3.4.1 in Linux kernel
>> On Tue, Mar 29, 2005 at 05:37:06PM +0300, Denis Vlasenko wrote: >> > /* >> > * This looks horribly ugly, but the compiler can optimize it totally, >> > * as the count is constant. >> > */ >> > static inline void * __constant_memcpy(void * to, const void * from, >> > size_t n) { >> > if (n <= 128) >> > return __builtin_memcpy(to, from, n); >> >> The problem is that in GCC < 4.0 there is no constant propagation >> pass before expanding builtin functions, so the __builtin_memcpy >> call above sees a variable rather than a constant. > >or change "size_t n" to "const size_t n" will also fix the issue. >As we do some (well very little and with inlining and const values) >const progation before 4.0.0 on the trees before expanding the builtin. > >-- Pinski >- I used the following "const size_t n" change on x86_64 and it reduced the memcpy count from 1088 to 609 with my setup and gcc 3.4.3. (kernel 2.6.12-rc1, running now) --- include/asm-x86_64/string.h.~1~ 2005-03-02 08:38:33.0 +0100 +++ include/asm-x86_64/string.h 2005-03-30 03:24:35.0 +0200 @@ -28,9 +28,9 @@ function. */ #define __HAVE_ARCH_MEMCPY 1 -extern void *__memcpy(void *to, const void *from, size_t len); +extern void *__memcpy(void *to, const void *from, const size_t len); #define memcpy(dst,src,len) \ - ({ size_t __len = (len);\ + ({ const size_t __len = (len); \ void *__ret; \ if (__builtin_constant_p(len) && __len >= 64)\ __ret = __memcpy((dst),(src),__len); \ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
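The dispatch that the quoted memcpy() macro performs can be sketched in isolation. This is a userspace illustration with hypothetical names (my_memcpy, my_out_of_line_memcpy); it relies on the GNU statement-expression extension and the __builtin_constant_p() and __builtin_memcpy() builtins. It shows only how the macro chooses a path, not the constant-propagation behaviour of GCC 3.4 inside inline functions that the thread is about.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static unsigned long out_of_line_calls;

/* Hypothetical stand-in for the kernel's out-of-line __memcpy(). */
static void *my_out_of_line_memcpy(void *to, const void *from, size_t len)
{
	out_of_line_calls++;
	return memmove(to, from, len);
}

/* Large compile-time-constant sizes go out of line; everything else to the builtin. */
#define my_memcpy(dst, src, len)					\
	({ const size_t len_ = (len);					\
	   void *ret_;							\
	   if (__builtin_constant_p(len) && len_ >= 64)			\
		ret_ = my_out_of_line_memcpy((dst), (src), len_);	\
	   else								\
		ret_ = __builtin_memcpy((dst), (src), len_);		\
	   ret_; })

void demo_memcpy_dispatch(void)
{
	char src[128] = "payload", dst[128];
	/* volatile: the compiler can never treat this as a constant */
	volatile size_t runtime_len = sizeof(src);

	(void)my_memcpy(dst, src, 128);		/* constant and >= 64 */
	assert(out_of_line_calls == 1);
	(void)my_memcpy(dst, src, runtime_len);	/* not a constant: builtin path */
	assert(out_of_line_calls == 1);
	assert(strcmp(dst, "payload") == 0);
}
```

Whether a length that is constant only after inlining reaches __builtin_constant_p() as a constant depends on the compiler version and optimisation level, which is exactly where the "const size_t" change is reported to help.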
Re: [PATCH] ppc32: CPM2 PIC cleanup irq_to_siubit array
On Mar 29, 2005, at 5:30 PM, Kumar Gala wrote: > Cleaned up irq_to_siubit array so we no longer need to do 1 << (31-bit), > just 1 << bit. Will you please put a comment in here that indicates this array now has this computation done? When I wrote it, these bit numbers matched the registers and the documentation, so I didn't take the time to explain. :-) Thanks. -- Dan
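The conversion being discussed can be written out as follows. This is a sketch that assumes the documentation numbers bits with bit 0 as the most significant bit of a 32-bit register, as PowerPC manuals usually do; the function names are illustrative and are not taken from the CPM2 PIC code.

```c
#include <assert.h>
#include <stdint.h>

/* Mask for a bit given in MSB-0 numbering (bit 0 = most significant bit). */
static uint32_t mask_from_msb0(unsigned int bit)
{
	assert(bit < 32);
	return UINT32_C(1) << (31 - bit);
}

/* Mask for a bit given in LSB-0 numbering (bit 0 = least significant bit). */
static uint32_t mask_from_lsb0(unsigned int bit)
{
	assert(bit < 32);
	return UINT32_C(1) << bit;
}

/* Convert a table entry once, so that users can write 1 << entry directly. */
static unsigned int msb0_to_lsb0(unsigned int bit)
{
	assert(bit < 32);
	return 31 - bit;
}
```

Once the table stores the converted values, its entries no longer match the numbers printed in the manual, which is the reason for asking for a comment next to the array.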
Re: [patch 1/2] fork_connector: add a fork connector
[ Hmmm .. the following pertains more to accounting than to fork_connector, as have my other remarks earlier today. I notice just now I am on a thread whose Subject is "fork_connector". Oh well. Sorry. - pj ] Jay wrote: > You probably can look at it this way: the accounting data being > written out by BSD are per process data and the fork connector > provides information needed to group processes into process > aggregates. I guess so. Though that doesn't provide any explicit guidance as to what the necessary dataflow must be -- who (which essential piece(s) of software) needs the data when, to accomplish what purposes that some Linux users will desire. Well, maybe to someone expert in Process Aggregates, it provides such guidance, implicitly. That's definitely not I. Let me step back a minute here. What's needed to is work from the actual user requirements down to what technical pieces are needed. There's an old saying that if you want something done bad enough, do it yourself. Or, on usenet and now on mailing lists, this has become: if you want something done, post a sufficiently botched example yourself, and someone who actually knows will become sufficiently annoyed to post a useful answer. So here goes my botched effort to work from user requirements down to actual technical pieces needed. I look forward to being shot down in flames. My current understanding of the 'system accounting' requirement is that users of large shared resource servers want to determine, after the fact, what was the usage by or for various tasks/jobs/users/groups/time-periods of various compute resources, in order to perform such tasks as billing and sizing of future equipment needs, and to identify patterns of over or under utilized system resources that might present other opportunities for useful action, or causes for remedial action. 
I am working under the assumption that there is some accounting (of computer users and resources, not of money ;) software (runacct, CSA, and ELSA, for example) that runs, after the fact in some post-processing mode, that reads records of actual usage details from disk files and does useful stuff (like generate reports useful to the above requirements) with what it can glean from those records and from other configuration information it can find about the current system (by reading other disk files, typically). This processing can be and often is done in batch mode, and is often scheduled out of a cron job for some time when the system is normally under relatively lighter load, such as late at night. I assume that the information needed by this accounting software includes both the classic BSD accounting records and the information at fork. I am not aware of any other uses of the information from fork, though it would not surprise me to learn that there are other such uses - you're welcome to educate me on this matter. I suspect that there is other information, or will be, in addition to the specific details collected by the classic BSD accounting kernel hooks, and in addition to the information at fork, which will also be needed by CSA and/or ELSA, and which also needs to be written to disk files as the data is collected, for subsequent processing by such accounting software as CSA and ELSA, or the classic runacct(1M) daily accounting software and variants. If the above is all true, then the basic problem to solve regarding the information collected at fork is how to get it into a disk file, with close to minimum impact on the system. Since the data is not needed in anything like realtime (or if it is, I don't realize that yet), there is an opportunity to combine the data records into buffers of data, so as to amortize some of the costs of writing the data to disk over several records.
The classic bsd accounting hooks do this merging aggressively, in the context of the process doing the exit. The classic accounting hooks may have a problem that they are not NUMA friendly - having all the nodes in a big system trying to simultaneously add small (64 bytes, typically) snippets to the same shared file buffers at the same time might not scale well. These hooks were designed over 25 years ago, when multiprocessing was in its infancy, and may need overhaul. The fork_connector mechanism is being proposed to get the particular bit of information from fork moved to what I presume is a data collector daemon user process, which will I presume then write merged records of this data to disk. This may have the problem that it moves the individual records between various contexts on the system, more than is necessary, before it can be merged into buffers and written. While such data motion does not happen inline to the fork itself, it still has to occur in near realtime (minutes) of the fork event, so still impacts system performance (both CPU cycles and memory footprint) during peak usage hours. Performance impact numbers have been
[PATCH 2.6.12-rc1-mm3] m32r: m32r_sio driver update (was Re: [PATCH] Re: Bitrotting serial drivers)
Hello, Here is an additional patch to update m32r_sio driver. This patch is against 2.6.12-rc1-mm3. m32r_sio driver updates: - Move m32r_sio specific description from asm-m32r/serial.h to driver/serial/m32r_sio.c. - Remove __register_m32r_sio, register_m32r_sio and unregister_m32r_sio from driver/serial/m32r_sio.c. Thank you. From: Russell King <[EMAIL PROTECTED]> Subject: Re: [PATCH] Re: Bitrotting serial drivers Date: Thu, 24 Mar 2005 12:17:46 + > On Thu, Mar 24, 2005 at 07:14:24PM +0900, Hirokazu Takata wrote: > > diff -ruNp a/include/asm-m32r/serial.h b/include/asm-m32r/serial.h > > --- a/include/asm-m32r/serial.h 2004-12-25 06:35:40.0 +0900 > > +++ b/include/asm-m32r/serial.h 2005-03-24 17:25:05.812651363 +0900 > > Can m32r accept PCMCIA cards? If so, this may mean that 8250.c gets > built, which will use this file to determine where it should look for > built-in 8250 ports. > > If this file is used to describe non-8250 compatible ports, you could > end up with a nasty mess. Therefore, I recommend that you do not use > asm-m32r/serial.h to describe your SIO ports. > > Instead, since these definitions are private to your own driver, you > may consider moving them into the driver, or a header file closely > associated with your driver in drivers/serial. Signed-off-by: Hirokazu Takata <[EMAIL PROTECTED]> --- drivers/serial/m32r_sio.c | 131 ++ include/asm-m32r/serial.h | 41 -- 2 files changed, 31 insertions(+), 141 deletions(-) diff -ruNp a/include/asm-m32r/serial.h b/include/asm-m32r/serial.h --- a/include/asm-m32r/serial.h 2005-03-29 21:47:12.912822762 +0900 +++ b/include/asm-m32r/serial.h 2005-03-29 18:15:37.0 +0900 @@ -1,47 +1,10 @@ #ifndef _ASM_M32R_SERIAL_H #define _ASM_M32R_SERIAL_H -/* - * include/asm-m32r/serial.h - */ +/* include/asm-m32r/serial.h */ #include -#include -/* - * This assumes you have a 1.8432 MHz clock for your UART. 
- * - * It'd be nice if someone built a serial card with a 24.576 MHz - * clock, since the 16550A is capable of handling a top speed of 1.5 - * megabits/second; but this requires the faster clock. - */ -#define BASE_BAUD ( 1843200 / 16 ) - -/* Standard COM flags */ -#define STD_COM_FLAGS (ASYNC_BOOT_AUTOCONF | ASYNC_SKIP_TEST) - -/* Standard PORT definitions */ -#if defined(CONFIG_PLAT_USRV) - -#define STD_SERIAL_PORT_DEFNS \ - /* UART CLK PORT IRQFLAGS */ \ - { 0, BASE_BAUD, 0x3F8, PLD_IRQ_UART0, STD_COM_FLAGS }, /* ttyS0 */ \ - { 0, BASE_BAUD, 0x2F8, PLD_IRQ_UART1, STD_COM_FLAGS }, /* ttyS1 */ - -#else /* !CONFIG_PLAT_USRV */ - -#if defined(CONFIG_SERIAL_M32R_PLDSIO) -#define STD_SERIAL_PORT_DEFNS \ - { 0, BASE_BAUD, ((unsigned long)PLD_ESIO0CR), PLD_IRQ_SIO0_RCV, \ - STD_COM_FLAGS }, /* ttyS0 */ -#else -#define STD_SERIAL_PORT_DEFNS \ - { 0, BASE_BAUD, M32R_SIO_OFFSET, M32R_IRQ_SIO0_R, \ - STD_COM_FLAGS }, /* ttyS0 */ -#endif - -#endif /* !CONFIG_PLAT_USRV */ - -#define SERIAL_PORT_DFNS STD_SERIAL_PORT_DEFNS +#define BASE_BAUD 115200 #endif /* _ASM_M32R_SERIAL_H */ diff -ruNp a/drivers/serial/m32r_sio.c b/drivers/serial/m32r_sio.c --- a/drivers/serial/m32r_sio.c 2005-03-29 21:47:12.924820913 +0900 +++ b/drivers/serial/m32r_sio.c 2005-03-29 21:56:38.001930365 +0900 @@ -54,13 +54,6 @@ #include "m32r_sio_reg.h" /* - * Configuration: - * share_irqs - whether we pass SA_SHIRQ to request_irq(). This option - *is unsafe when used on edge-triggered interrupts. - */ -unsigned int share_irqs_sio = M32R_SIO_SHARE_IRQS; - -/* * Debugging. */ #if 0 @@ -86,15 +79,36 @@ unsigned int share_irqs_sio = M32R_SIO_S #include +/* Standard COM flags */ +#define STD_COM_FLAGS (ASYNC_BOOT_AUTOCONF | ASYNC_SKIP_TEST) + /* * SERIAL_PORT_DFNS tells us about built-in ports that have no * standard enumeration mechanism. Platforms that can find all * serial ports via mechanisms like ACPI or PCI need not supply it. 
*/ -#ifndef SERIAL_PORT_DFNS -#define SERIAL_PORT_DFNS +#undef SERIAL_PORT_DFNS +#if defined(CONFIG_PLAT_USRV) + +#define SERIAL_PORT_DFNS \ + /* UART CLK PORT IRQFLAGS */ \ + { 0, BASE_BAUD, 0x3F8, PLD_IRQ_UART0, STD_COM_FLAGS }, /* ttyS0 */ \ + { 0, BASE_BAUD, 0x2F8, PLD_IRQ_UART1, STD_COM_FLAGS }, /* ttyS1 */ + +#else /* !CONFIG_PLAT_USRV */ + +#if defined(CONFIG_SERIAL_M32R_PLDSIO) +#define SERIAL_PORT_DFNS \ + { 0, BASE_BAUD, ((unsigned long)PLD_ESIO0CR), PLD_IRQ_SIO0_RCV, \ + STD_COM_FLAGS }, /* ttyS0 */ +#else +#define SERIAL_PORT_DFNS \ + { 0, BASE_BAUD, M32R_SIO_OFFSET,
[PATCH 2.6.12-rc1] m32r: Fix spinlock.h for CONFIG_DEBUG_SPINLOCK
This patch is for fixing a build error of asm-m32r/spinlock.h for CONFIG_DEBUG_SPINLOCK.

Please apply.

Thanks,

Signed-off-by: Hirokazu Takata <[EMAIL PROTECTED]>
---

 include/asm-m32r/spinlock.h |6 ++
 1 files changed, 2 insertions(+), 4 deletions(-)

diff -ruNp a/include/asm-m32r/spinlock.h b/include/asm-m32r/spinlock.h
--- a/include/asm-m32r/spinlock.h 2005-03-07 14:10:57.0 +0900
+++ b/include/asm-m32r/spinlock.h 2005-03-08 14:08:57.0 +0900
@@ -102,10 +102,8 @@ static inline void _raw_spin_lock(spinlo
 	unsigned long tmp0, tmp1;
 
 #ifdef CONFIG_DEBUG_SPINLOCK
-	__label__ here;
-here:
-	if (lock->magic != SPINLOCK_MAGIC) {
-		printk("pc: %p\n", &&here);
+	if (unlikely(lock->magic != SPINLOCK_MAGIC)) {
+		printk("pc: %p\n", __builtin_return_address(0));
 		BUG();
 	}
 #endif

--
Hirokazu Takata <[EMAIL PROTECTED]>
Linux/M32R Project: http://www.linux-m32r.org/
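The replacement idiom in this patch can be tried in userspace. The sketch below assumes GCC or Clang (for __builtin_return_address() and __builtin_expect()); struct demo_lock, DEMO_MAGIC and demo_lock_check() are made-up names, and the magic value is arbitrary rather than the kernel's SPINLOCK_MAGIC. Instead of printing the address of a local label, the check records the address the function will return to, which identifies the caller that passed in the bad lock.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's unlikely(); GCC/Clang builtin. */
#define unlikely(x) __builtin_expect(!!(x), 0)

#define DEMO_MAGIC 0x1234abcdu	/* arbitrary value for this sketch */

struct demo_lock {
	unsigned int magic;
};

static const void *last_bad_caller;	/* set when a corrupted lock is seen */

/* Returns 0 for a sane lock, -1 (and records the caller) for a corrupted one. */
static __attribute__((noinline)) int demo_lock_check(const struct demo_lock *lock)
{
	assert(lock != NULL);
	if (unlikely(lock->magic != DEMO_MAGIC)) {
		last_bad_caller = __builtin_return_address(0);
		return -1;
	}
	return 0;
}
```

In the kernel the function is inline, so the value reported is the return address of whichever function the check ended up in; the sketch marks its function noinline only to keep that behaviour predictable.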
Re: [patch] use cheaper elv_queue_empty when unplug a device
Nick Piggin wrote: > Jens Axboe wrote: > > Looks good, I've been toying with something very similar for a long time > > myself. > > Here is another thing I just noticed that should further reduce the > locking by at least 1, sometimes 2 lock/unlock pairs per request. > At the cost of uglifying the code somewhat. Although it is pretty > nicely contained, so Jens you might consider it acceptable as is, > or we could investigate how to make it nicer if Kenneth reports some > improvement. > > Note, this isn't runtime tested - it could easily have a bug. OK - I have booted this on a 4-way SMP with SCSI disks, and done some IO tests, and no hangs. So Kenneth if you could look into this one as well, to see if it is worthwhile, that would be great.
RE: Industry db benchmark result on recent 2.6 kernels
On Tue, 29 Mar 2005, Chen, Kenneth W wrote:
>
> Linus Torvalds wrote on Tuesday, March 29, 2005 4:00 PM
> > The fact that it seems to fluctuate pretty wildly makes me wonder
> > how stable the numbers are.
>
> I can't resist bragging. The high point in the fluctuation
> might be because someone is working hard trying to make the 2.6 kernel run
> faster. Hint hint hint. ;-)

Heh. How do you explain the low-point? If there's somebody out there working hard on making it run slower, I want to whack the guy ;)

Good luck with the million-dollar grants, btw. We're all rooting for you, and hope your manager is a total push-over.

		Linus
Re: drivers/net/at1700.c: at1700_probe1: array overflow
On Fri, 25 Mar 2005, Adrian Bunk wrote:

> Date: Fri, 25 Mar 2005 21:38:20 +0100
> From: Adrian Bunk <[EMAIL PROTECTED]>
> To: Roland Dreier <[EMAIL PROTECTED]>
> Cc: [EMAIL PROTECTED], linux-net@vger.kernel.org,
>     linux-kernel@vger.kernel.org
> Subject: Re: drivers/net/at1700.c: at1700_probe1: array overflow
>
> On Fri, Mar 25, 2005 at 10:42:11AM -0800, Roland Dreier wrote:
> >     Adrian> This can result in indexing in an array with 8 entries the
> >     Adrian> 10th entry.
> >
> > Well, not really, since the first 8 entries of the array have every
> > 3-bit pattern. So pos3 & 0x07 will always match one of them.
> >
> > I agree it would be cleaner to make the loop only go up to 7 though.
>
> You either have this (impossible) overflow, or the case l_i == 7 isn't
> tested explicitly.
>
> I'd say simply leave it as it is now.
>
> But if no one disagrees, I'm inclined to add a comment.
>
> > - R.
>
> cu
> Adrian
>

But on the other hand why loop if you don't have to?

 static int at1700_ioaddr_pattern[] __initdata = {
-	0x00, 0x04, 0x01, 0x05, 0x02, 0x06, 0x03, 0x07
+	0x00, 0x02, 0x04, 0x06, 0x01, 0x03, 0x05, 0x07
 };
...
 static int __init at1700_probe1(struct net_device *dev, int ioaddr)
 {
...
-	for (l_i = 0; l_i < 0x09; l_i++)
-		if (( pos3 & 0x07) == at1700_ioaddr_pattern[l_i])
-			break;
-	ioaddr = at1700_mca_probe_list[l_i];
+	ioaddr = at1700_mca_probe_list[at1700_ioaddr_pattern[pos3&7]];
...
 }
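The suggestion above works only if the new table is the inverse permutation of the old one, so that a direct lookup returns the same index the linear search would have found. The following stand-alone sketch checks that for all eight 3-bit values. The two tables are copied from the diff; the function names are invented for this check and do not exist in the driver.

```c
/* Original table from the driver: searched for the entry equal to (pos3 & 7). */
static const int old_pattern[8] = {
	0x00, 0x04, 0x01, 0x05, 0x02, 0x06, 0x03, 0x07
};

/* Proposed table: indexed directly by (pos3 & 7). */
static const int new_pattern[8] = {
	0x00, 0x02, 0x04, 0x06, 0x01, 0x03, 0x05, 0x07
};

/* Index found by the original linear search; -1 if nothing matches. */
int index_by_search(int pos3)
{
	for (int i = 0; i < 8; i++)
		if ((pos3 & 0x07) == old_pattern[i])
			return i;
	return -1;
}

/* Index produced by the proposed direct lookup. */
int index_by_lookup(int pos3)
{
	return new_pattern[pos3 & 7];
}

/* Returns 1 if both methods agree for every possible 3-bit value. */
int tables_agree(void)
{
	for (int pos3 = 0; pos3 < 8; pos3++)
		if (index_by_search(pos3) != index_by_lookup(pos3))
			return 0;
	return 1;
}
```

Because the old table contains every 3-bit pattern exactly once, the search always terminates within the first eight entries, which is also why the out-of-range index discussed earlier in the thread cannot actually occur.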
RE: Industry db benchmark result on recent 2.6 kernels
Linus Torvalds wrote on Tuesday, March 29, 2005 4:00 PM
> The fact that it seems to fluctuate pretty wildly makes me wonder
> how stable the numbers are.

I can't resist bragging. The high point in the fluctuation might be because someone is working hard trying to make the 2.6 kernel run faster. Hint hint hint. ;-)
Re: i386/x86_64 segment register issuses (Re: PATCH: Fix x86 segment register access)
On Mon, 28 Mar 2005, Andi Kleen wrote:
>
> "H. J. Lu" <[EMAIL PROTECTED]> writes:
> > The new assembler will disallow them since those instructions with
> > memory operand will only use the first 16bits. If the memory operand
> > is 16bit, you won't see any problems. But if the memory destination
> > is 32bit, the upper 16bits may have random values. The new assembler
>
> Does it really have random values on existing x86 hardware?

The upper bits are not written at all, so it's not random.

> If it is only a "theoretical" problem that does not happen
> in practice I would advise to not do the change.

My preference too.

The reason we use "movl" is because we really do want the 32-bit versions, since they are faster. It's a conscious choice. In contrast "movw" generates bigger and slower code on all assemblers out there, and "mov" doesn't make it clear which one it is. Is it the slow one, or the fast one?

For example, "mov %ds,%eax" does seem to generate the (faster) 32-bit code on modern assemblers, while "mov %ds,%ax" generates (slower) 16-bit code that leaves the high bits of %eax untouched. Sometimes you may want the slower one, sometimes the faster one.

I have this pretty strong memory of old versions of gas not making any difference between %ax and %eax as a target, and that you really needed to set the size explicitly. Now, those versions of gas may be so old that nobody cares, but the explicit size still is a GOOD THING.

The size DOES MATTER. People who want the smaller and faster version do not want to just rely on gas automatically getting it right, especially since gas has historically been very very bad at getting things right.

What is the advantage of not allowing "movl %ds,mem"? Really? Especially since I suspect the kernel is pretty much the only one who does this, and the kernel really does do it on purpose. The kernel explicitly wants the 32-bit version, knowing that the upper bits are undefined.

		Linus
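The memory-destination behaviour argued over above (only the low 16 bits are written, the upper 16 bits keep whatever was there before) can be pictured with a small plain-C model. This is a sketch of the effect only, not of the instruction encoding; the function names are hypothetical and nothing here executes an actual segment-register move.

```c
#include <stdint.h>

/*
 * Model of a segment-register store to a 32-bit memory slot: only the
 * low 16 bits are replaced, the upper 16 bits keep their old contents.
 */
void model_store_selector(uint32_t *slot, uint16_t selector)
{
	*slot = (*slot & 0xFFFF0000u) | selector;
}

/* Reading the slot back safely: ignore the stale upper half. */
uint16_t model_load_selector(const uint32_t *slot)
{
	return (uint16_t)(*slot & 0xFFFFu);
}
```

In this model the upper half is stale rather than random, which matches the point that those bits are simply not written. Code that later reads the slot as a full 32-bit value has to mask it, as `model_load_selector()` does.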
Re: How to measure time accurately.
Peter Chubb wrote:
>> "Chris" == Chris Friesen <[EMAIL PROTECTED]> writes:
>
> Chris> Most cpus have some way of getting at a counter or decrementer
> Chris> of various frequencies. Usually it requires low-level hardware
> Chris> knowledge and often it needs assembly code.
>
> As a device driver is inside the linux kernel (unless you're writing a
> user-mode device driver :-)) you can use the get_cycles() macro that's
> defined for most architectures. It provides a snapshot of the
> cycle-counter.

For ppc this only gives 32-bit values, which overflow every 129 seconds on my G5. Depending on how long you're trying to time, this could be a problem.

Chris
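A short sketch of the arithmetic behind the 32-bit overflow concern. It is plain user-space C with made-up helper names. The roughly 33.33 MHz counter frequency mentioned in the note below is an assumption inferred from the 129-second figure, not a number stated in the thread.

```c
#include <stdint.h>

/*
 * Elapsed ticks between two 32-bit counter samples. Unsigned arithmetic
 * is modulo 2^32, so the result is correct across a single wrap as long
 * as the two samples are less than one full wrap period apart.
 */
uint32_t ticks_elapsed32(uint32_t start, uint32_t end)
{
	return end - start;
}

/* Seconds until a 32-bit counter running at hz ticks per second wraps. */
double wrap_period_seconds(double hz)
{
	return 4294967296.0 / hz;
}
```

With a counter near 33.33 MHz, `wrap_period_seconds()` comes out just under 129 seconds. Intervals shorter than that can still be timed with the subtraction above; longer ones need a wider counter or explicit wrap counting.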
RE: Industry db benchmark result on recent 2.6 kernels
On Mon, 28 Mar 2005, Chen, Kenneth W wrote:
> With that said, here goes our first data point along with some historical data
> we have collected so far.
>
> 2.6.11   -13%
> 2.6.9    - 6%
> 2.6.8    -23%
> 2.6.2    - 1%
> baseline (rhel3)

Linus Torvalds wrote on Tuesday, March 29, 2005 4:00 PM
> How repeatable are the numbers across reboots with the same kernel? Some
> benchmarks will depend heavily on just where things land in memory,
> especially with things like PAE or even just cache behaviour (ie if some
> frequently-used page needs to be kmap'ped or not depending on where it
> landed).

Very repeatable. This workload is very steady, and throughput is repeatable down to a resolution of 0.1%. We toss everything below that level as noise.

> You don't have the PAE issue on ia64, but there could be other issues.
> Some of them just disk-layout issues or similar, ie performance might
> change depending on where on the disk the data is written in relationship
> to where most of the reads come from etc etc. The fact that it seems to
> fluctuate pretty wildly makes me wonder how stable the numbers are.

This workload has been around for 10+ years, and people at Intel have studied its characteristics inside out for all of that time. Every stone gets turned more than once while we tune the entire setup, making sure everything is well balanced. And we tune the system whenever there is a hardware change. The data layout on the disk spindles is very well balanced.

> Also, it would be absolutely wonderful to see a finer granularity (which
> would likely also answer the stability question of the numbers). If you
> can do this with the daily snapshots, that would be great. If it's not
> easily automatable, or if a run takes a long time, maybe every other or
> every third day would be possible?

I will certainly let my management know that Linus wants to see the performance numbers on a daily basis (I will ask my manager for a couple of million dollars for this project :-))
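As a rough illustration of how figures like the ones in the table relate to a baseline, and of the 0.1% noise floor described above, here is a small sketch. The helper names are invented, and nothing about the actual benchmark harness is implied.

```c
/* Percentage change of a measurement relative to a baseline (negative = slower). */
double percent_vs_baseline(double measured, double baseline)
{
	return (measured - baseline) / baseline * 100.0;
}

/* Returns 1 if two percentage results differ by less than the noise floor. */
int within_noise(double pct_a, double pct_b, double noise_floor_pct)
{
	double diff = pct_a - pct_b;

	if (diff < 0)
		diff = -diff;
	return diff < noise_floor_pct;
}
```

For example, a purely hypothetical run scoring 87 units against a baseline of 100 comes out at about -13%. Two runs whose results differ by less than 0.1 percentage points would be treated as indistinguishable under the rule quoted above.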