Re: [Orinoco-devel] [0/5] Orinoco merge updates, part the fourth

2005-03-29 Thread abuas_z
Hello,

DG> Here's yet another batch of orinoco updates.

Will these patches be included in the CVS HEAD code? I just checked, and the
last modification of orinoco.c was about 4 days ago, when the switch to
enable monitor mode on all firmwares was included...

DG> Smaller and less significant than the last,

What was the last?

DG> this is basically a handful of remaining small updates before
DG> tackling the big changes (wext v15, monitor and scanning).

So the next thing to be done will be WPA and monitor mode
improvements (maybe on all firmwares)?

-- 
Bye,
 abuas_z mailto:[EMAIL PROTECTED]

The Bat! v3.0.2.10, Windows XP 5.1, Build 2600, Service Pack 1

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Fastboot] Re: Query: Kdump: Core Image ELF Format

2005-03-29 Thread Eric W. Biederman
Mark Williamson <[EMAIL PROTECTED]> writes:

> > The Xen guys' idea of memory hotplug is another matter; it sounds
> > like they want to page an OS with memory hotplug, which is just
> > plain silly, and also unimplemented, so I will cross that bridge
> > when I come to it.
> 
> The idea isn't to page the OS per se.  The guest OS is responsible for the
> fine-grain paging of its applications in the usual way to fit within its
> physical memory allocation.
>
> In order to allow coarse-grained changes in physical memory allocation (e.g.
> I want to shrink a domain by 128MB so I can run another one), XenLinux uses a
> "balloon driver", which basically allocates a load of memory and gives it
> back to Xen to be used elsewhere.
> 
> This is currently invoked by the administrator, although we've talked about a 
> daemon that will automatically shift memory allocations around between 
> domains based on their requirements.
> 
> A memory hotplug interface would clean up the ballooning interface somewhat 
> (rather than using pretend allocations) but would still only be activated 
> relatively infrequently.

And what I am really objecting to is xen doing memory allocation in 4KiB
chunks.  Pushing the chunk size up to 2MiB or 4MiB, or even doing
plain extents of memory like the old protected mode OS's did before
paging sounds more reasonable.  Without allowing the OS access to
large contiguous chunks of physical memory you are asking the OS to
give up significant performance tuning opportunities.

Plus, by giving the OS large pages, much of the mess of needing a
virtual, logical and physical address is unnecessary, and the OS can
simply have virtual and physical addresses as they do now.

In addition, large chunks of memory are going to work better with
whatever memory hotplug infrastructure is implemented than 4KiB
chunks, since memory hotplug is going to be either memory controller
hotplug (in NUMA systems) or possibly DIMM hotplug in extremely
fault-tolerant servers.

So please simplify everyone's lives and code and use large pages
in Xen.
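
To put rough numbers on the chunk-size argument, here is a small sketch that counts how many allocation units are needed to move a given amount of memory at a given granularity. The helper name chunks_needed is made up for illustration; the 128MB figure is the one from the quoted balloon-driver example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of fixed-size chunks needed to cover
 * `bytes` of memory, rounding up. chunk_size must be non-zero. */
static uint64_t chunks_needed(uint64_t bytes, uint64_t chunk_size)
{
    assert(chunk_size != 0);
    return (bytes + chunk_size - 1) / chunk_size;
}
```

For a 128MB change this gives 32768 units at 4KiB granularity versus 64 units at 2MiB, which is the bookkeeping difference the argument above is about.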

Eric


Re: [RFD] 'nice' attribute for executable files

2005-03-29 Thread Bodo Eggert
Wiktor <[EMAIL PROTECTED]> wrote:

> furthermore, on many systems root may want to make users able to run
> some program with lowered nice, but not from root account and without
> having to know the root password... i've found a way to do this using
> shell scripts combined with suid bit and strange file ownerships, but it
> is an absolute disaster.

You want su1, or maybe sudo.

> so i thought that it would be nice to add an attribute to file
> (changeable only for root) that would modify nice value of process when
> it starts. if there is one byte free in ext2/3 file metadata, maybe it
> could be used for that? i think that it wouldn't be more dangerous than
> setuid bit.

Remember: xmms might be configured to spawn the shell plugin.



I guess there should be a maximum renice value ulimit instead, which would
allow running almost any user task on a higher nice level, except the
important stuff, with the additional benefit of being able to temporarily
renice some tasks until the more important work is done.

I remember something similar being discussed for realtime tasks, but I don't
remember the outcome.


Re: swsusp 'disk' fails in bk-current - intel_agp at fault?

2005-03-29 Thread Andy Isaacson
On Fri, Mar 25, 2005 at 09:58:40AM -0500, Dmitry Torokhov wrote:
> I wonder why ALPS reconnect failed. You don't have a serial console
> set up, do you? If not then maybe you could make a huge framebuffer to
> capture as much info as you can... I hope you have a digital camera ;)

No serial ports brought out on this laptop, and I've not tried
framebuffer...

> Then do "echo 1 > /sys/module/i8042/parameters/debug" and try to
> suspend. I am interested in the data coming in and out of i8042.

Transcribed by hand, the last few bytes are
< fa            ACK
> d4 e9         GETINFO
< fa 20 00 64
> d4 ff         RESET_BAT
< fa aa 00      RET_BAT
(Because I used O= the __FILE__ is very long so each dbg() takes two lines
of my 80x25 console...)

Dunno if that's helpful, sorry...

-andy


[PATCH] ppc32: Fix MPC8555 & MPC8555E device lists (updated)

2005-03-29 Thread Kumar Gala
Andrew,

(As I decide how long to keep the paper bag on this time, here is an 
updated patch.  This one actually reduces the number of devices as well as 
removes the device from the lists which makes things work better :)

Removed the FCC3 device from the lists of devices on MPC8555 & MPC8555E
since it does not exist on these processors.

Signed-off-by: Jason McMullan <[EMAIL PROTECTED]>
Signed-off-by: Kumar Gala <[EMAIL PROTECTED]>

---

diff -Nru a/arch/ppc/syslib/mpc85xx_sys.c b/arch/ppc/syslib/mpc85xx_sys.c
--- a/arch/ppc/syslib/mpc85xx_sys.c 2005-03-30 01:23:14 -06:00
+++ b/arch/ppc/syslib/mpc85xx_sys.c 2005-03-30 01:23:14 -06:00
@@ -80,7 +80,7 @@
.ppc_sys_name   = "8555",
.mask   = 0x,
.value  = 0x8071,
-   .num_devices= 20,
+   .num_devices= 19,
.device_list= (enum ppc_sys_devices[])
{
MPC85xx_TSEC1, MPC85xx_TSEC2, MPC85xx_IIC1,
@@ -88,7 +88,7 @@
MPC85xx_PERFMON, MPC85xx_DUART,
MPC85xx_CPM_SPI, MPC85xx_CPM_I2C, MPC85xx_CPM_SCC1,
MPC85xx_CPM_SCC2, MPC85xx_CPM_SCC3,
-   MPC85xx_CPM_FCC1, MPC85xx_CPM_FCC2, MPC85xx_CPM_FCC3,
+   MPC85xx_CPM_FCC1, MPC85xx_CPM_FCC2,
MPC85xx_CPM_SMC1, MPC85xx_CPM_SMC2,
MPC85xx_CPM_USB,
},
@@ -97,7 +97,7 @@
.ppc_sys_name   = "8555E",
.mask   = 0x,
.value  = 0x8079,
-   .num_devices= 21,
+   .num_devices= 20,
.device_list= (enum ppc_sys_devices[])
{
MPC85xx_TSEC1, MPC85xx_TSEC2, MPC85xx_IIC1,
@@ -105,7 +105,7 @@
MPC85xx_PERFMON, MPC85xx_DUART, MPC85xx_SEC2,
MPC85xx_CPM_SPI, MPC85xx_CPM_I2C, MPC85xx_CPM_SCC1,
MPC85xx_CPM_SCC2, MPC85xx_CPM_SCC3,
-   MPC85xx_CPM_FCC1, MPC85xx_CPM_FCC2, MPC85xx_CPM_FCC3,
+   MPC85xx_CPM_FCC1, MPC85xx_CPM_FCC2,
MPC85xx_CPM_SMC1, MPC85xx_CPM_SMC2,
MPC85xx_CPM_USB,
},
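
The patch above has to adjust num_devices by hand whenever an entry is dropped from device_list. One way to keep the two in step, sketched here with made-up demo_ names rather than the real ppc_sys structures, is to name the array and derive the count from it:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-ins only; these are not the kernel's definitions. */
enum demo_dev { DEMO_TSEC1, DEMO_TSEC2, DEMO_FCC1, DEMO_FCC2, DEMO_USB };

#define DEMO_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

static const enum demo_dev demo_8555_devices[] = {
    DEMO_TSEC1, DEMO_TSEC2,
    DEMO_FCC1, DEMO_FCC2,       /* no FCC3 entry on this part */
    DEMO_USB,
};

struct demo_spec {
    const char *name;
    size_t num_devices;
    const enum demo_dev *device_list;
};

static const struct demo_spec demo_8555 = {
    .name        = "8555",
    /* Count follows the array, so removing an entry cannot desync it. */
    .num_devices = DEMO_ARRAY_SIZE(demo_8555_devices),
    .device_list = demo_8555_devices,
};

/* Bounds-checked accessor for the device list. */
static enum demo_dev demo_device_at(const struct demo_spec *spec, size_t i)
{
    assert(i < spec->num_devices);
    return spec->device_list[i];
}
```

Whether this fits the existing table layout, which uses anonymous compound literals, is a separate question; the sketch only shows the counting idea.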




[RFC] DDRaid higher level cluster raid

2005-03-29 Thread Daniel Phillips
Greetings,

I am pleased to be able to present today an interesting project that has kept 
me busy for the last couple of months.

DDRaid is a cluster block device that, together with a cluster filesystem like 
GFS, gives you the ability to operate a "distributed data cluster" where the 
cluster data is distributed redundantly over the nodes of a cluster rather 
than using a single, shared disk.  You could also use ddraid with iscsi or 
fiber channel disks, and it even works reasonably well as a local software 
raid.  But the interesting thing about it to me is the distributed data 
aspect.

As far as I know, ddraid is the first higher level cluster raid, or if that is 
not correct, it is certainly the first to appear as open source.  It is based 
on Raid 3.5, a simple raid scheme I investigated earlier, and presented a 
paper on at Linux Kongress 2002:

   http://sourceware.org/cluster/ddraid/raid35.pdf

Raid 3.5 has the attractive property that it can be implemented without any 
caching or read-before-write, which is very important for a cluster.  Cluster 
caching is a wretchedly complex affair that is normally implemented at a 
higher level by the cluster filesystem and/or vfs.  We certainly do not want 
to have two wretchedly complex layers of cluster caching if we can avoid it. 
This is what you would get by extending Raid 5, say, to operate across a 
cluster.

My Raid 3.5 scheme turned out to work pretty well.  Some initial benchmarks 
were posted yesterday, here:

   https://www.redhat.com/archives/linux-cluster/2005-March/msg00112.html

The executive summary is that on an ideal linear load, ddraid runs about 62% 
faster than a single raw disk.  An example of such a linear load is copying a 
large file.  On random IO loads, ddraid performs no worse than a single raw 
disk.  Of course, increased performance is only the secondary goal of ddraid.  
The primary goal is data redundancy.

Further details on ddraid were provided in the initial project announcement, 
and I will not repeat them here:

   https://www.redhat.com/archives/linux-cluster/2005-March/msg00034.html

My purpose today is twofold: to solicit feedback on some of the kernel issues 
in the ddraid driver, and to introduce some relatively approachable cluster 
code that is easy to install and try out, even if you don't have a cluster.  
In other words, I would like to begin the process of involving more of the 
kernel community in cluster issues.  The ddraid driver is a rather nice test 
case for this, because it touches on most of the interesting cluster issues 
without being particularly big and complex.

Let me start by defining the difference between a cluster block device and a 
non-cluster block device.  It is not necessarily what you would think.  For 
example, you can export a block device over the network, but that does not 
make it a cluster block device: you can still only mount one filesystem at a 
time on it.

Here are some of the things we expect of a cluster block device:

  * Since multiple nodes can access the device simultaneously, the cluster
block device may need to prevent these accesses from interfering in
situations that the cluster filesystem itself has no knowledge of and
therefore cannot handle.

  * If the cluster block device has its own metadata, access to the metadata
must be synchronized across the cluster.

  * Cluster control: The cluster block device needs to respond to management
commands arriving from other nodes, for example so that an instance of
the device can be created simultaneously on all nodes of the cluster, with
each instance knowing how to access the same underlying hardware
resources.

  * Fault tolerance: If the block device relies on services provided by other
nodes, those services need to be able to fail over to other nodes in the
event a node fails.  If a connection is temporarily broken, the cluster
block device should be able to resume operation without failing any IO.

A cluster block device does not need to or should not provide:

  * Caching and cache synchronization.  Except for its own metadata, a cluster
block device should let the cluster filesystem and vfs take care of this.

  * Multiple access.  Every block device already provides this, albeit not
necessarily safely.

A cluster block device may use a cluster lock manager (e.g., gdlm) to 
implement whatever synchronization it needs.  I did not use this approach 
myself.  Instead I used streaming message based synchronization over standard 
sockets, something like DBus.  I did this for efficiency, but it also has the 
attractive side effect of avoiding a dependency on any particular cluster 
lock manager.  Instead I depend only on sockets.

Which brings up an issue.  I implement socket failover by arranging for a 
userspace process to open a new link and pass it to the kernel driver using 
SCM_RIGHTS.  I don't think I can do that with netlink.  So I use PF_UNIX, and 
kludge 
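
For readers unfamiliar with SCM_RIGHTS, the following is a minimal userspace sketch of passing an open descriptor over a PF_UNIX socket. It only shows the generic sendmsg()/recvmsg() mechanism between two userspace endpoints; how the ddraid driver receives the link on the kernel side is not shown in this message, and the helper names send_fd and recv_fd are made up for illustration.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send descriptor `fd` over the connected PF_UNIX socket `sock`.
 * Returns 0 on success, -1 on error. */
static int send_fd(int sock, int fd)
{
    char byte = 0;              /* one byte of ordinary data rides along */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {                     /* union keeps the buffer cmsghdr-aligned */
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } u;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    assert(sock >= 0 && fd >= 0);
    memset(&msg, 0, sizeof(msg));
    memset(&u, 0, sizeof(u));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor from `sock`.
 * Returns the new descriptor, or -1 on error. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } u;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    if (recvmsg(sock, &msg, 0) <= 0)
        return -1;

    cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;

    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The receiver gets its own descriptor number referring to the same open file, so a replacement link opened by one process can be handed to whoever owns the other end of the PF_UNIX socket.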

Re: swsusp 'disk' fails in bk-current - intel_agp at fault?

2005-03-29 Thread Andy Isaacson
On Tue, Mar 29, 2005 at 01:42:26PM -0500, Dmitry Torokhov wrote:
> Could you please try the patch below - it should fix the issues you are
[snip]
> --- dtor.orig/drivers/input/serio/serio.c
> +++ dtor/drivers/input/serio/serio.c
>   if (!serio->drv || !serio->drv->reconnect || 
> serio->drv->reconnect(serio)) {
> - serio_disconnect_port(serio);
>   /*
>* Driver re-probing can take a while, so better let kseriod

Yep, that fixes it.  I applied your patch to 2.6.12-rc1-mm1 and
suspended and resumed 5 times in a row without any difficulty.  Thanks!

-andy


Re: Accessing data structure from kernel space

2005-03-29 Thread linux lover
Hello sir,

   I successfully added a linked list data structure to the
kernel in a header file. I wrote a C source file and
added it to the kernel directory, then wrote 2 system calls
that read and write the linked list from user space.
   After recompiling the kernel, I am now able to read and
write that linked list.
   I want to write user data into that linked list
and allow the kernel to use that info. Is my approach of
sending data from user space to the kernel, and storing it
there until the OS is rebooted, the right one?
 Please reply.
Thanks in advance.
regards,
linux_lover.
   






Re: Strange memory problem with Linux booted from U-Boot

2005-03-29 Thread Eugene Surovegin
On Mon, Mar 28, 2005 at 07:57:52PM +0500, Ara Avanesyan wrote:
> Hi,
> 
> I need some help on solving this strange problem.
> Here is what I have,
> I have a loadable module (linux 2.4.20) which contains a 2 MB static global
> array.
> When I load it from linux booted via U-Boot the system crashes.
> Everything works ok if I do the same thing with the same linux booted with
> RedBoot.

As usual for such problems, check how the different firmware configures the 
memory controller, etc. Get a dump of the relevant chip registers under 
U-Boot and RedBoot and compare them.

Other possible problem area can be firmware -> kernel interface. I'm 
not familiar with that particular chip and RedBoot, but it's not 
uncommon for different firmware to have different conventions for the 
environment in which kernel starts execution.

I'd recommend posting to the specific mailing lists; lkml doesn't seem 
like a good place for embedded and firmware related questions :)

--
Eugene


Re: [patch] new fifo I/O elevator that really does nothing at all

2005-03-29 Thread Jens Axboe
On Tue, Mar 29 2005, Bill Davidsen wrote:
> Jens Axboe wrote:
> >On Mon, Mar 28 2005, Chen, Kenneth W wrote:
> >
> >>The noop elevator is still too fat for db transaction processing
> >>workload.  Since the db application already merged all blocks before
> >>sending it down, the I/O presented to the elevator are actually not
> >>merge-able anymore. Since I/O are also random, we don't want to sort
> >>them either.  However the noop elevator is still doing a linear search
> >>on the entire list of requests in the queue.  A noop elevator after
> >>all isn't really noop.
> >>
> >>We are proposing a true no-op elevator algorithm, no merge, no
> >>nothing. Just do first in and first out list management for the I/O
> >>request.  The best name I can come up with is "FIFO".  I also piggy
> >>backed the code onto noop-iosched.c.  I can easily pull those code
> >>into a separate file if people object.  Though, I hope Jens is OK with
> >>it.
> >
> >
> >It's not quite ok, because you don't honor the insertion point in
> >fifo_add_request. The only 'fat' part of the noop io scheduler is the
> >merge stuff, the original plan was to move that to a hash table lookup
> >instead like the other io schedulers do. So I would suggest just
> >changing noop to hash the request on the end point for back merges and
> >forget about front merges, since they are rare anyways. Hmm actually,
> >the last merge hint should catch most of the merges at almost zero cost.
> 
> Making the noop faster is clearly a good thing, but some database 
> software may depend on transaction order as provided by a true fifo 
> process. It would be nice to have both.

Just look at the code. It does FIFO for any request that _isn't_
specified as ELEVATOR_INSERT_FRONT - which means any fs request, or any
plain pc request. There is no specific reordering going on.

Drivers expect to be able to add a request back at the head, e.g. for
retrying it after a QUEUE_BUSY or similar condition.
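
As an illustration of the behaviour described here (FIFO order for ordinary requests, plus head insertion for a request the driver hands back), a minimal queue sketch follows. All demo_ names are made up; this is not the block layer API or the noop-iosched code.

```c
#include <assert.h>
#include <stddef.h>

enum demo_where { DEMO_INSERT_BACK, DEMO_INSERT_FRONT };

struct demo_req {
    int id;
    struct demo_req *next;
};

struct demo_queue {
    struct demo_req *head;
    struct demo_req *tail;
};

/* Queue a request. Ordinary requests go to the tail (FIFO); a request
 * being retried goes back to the head so it is dispatched first. */
static void demo_add_request(struct demo_queue *q, struct demo_req *rq,
                             enum demo_where where)
{
    assert(q && rq);
    if (where == DEMO_INSERT_FRONT) {
        rq->next = q->head;
        q->head = rq;
        if (!q->tail)
            q->tail = rq;
    } else {
        rq->next = NULL;
        if (q->tail)
            q->tail->next = rq;
        else
            q->head = rq;
        q->tail = rq;
    }
}

/* Remove and return the request at the head, or NULL if empty. */
static struct demo_req *demo_next_request(struct demo_queue *q)
{
    struct demo_req *rq = q->head;

    if (rq) {
        q->head = rq->next;
        if (!q->head)
            q->tail = NULL;
        rq->next = NULL;
    }
    return rq;
}
```

A scheduler that always appended would push a retried request behind everything queued in the meantime, which is why the insertion point has to be honored.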

-- 
Jens Axboe



Re: [patch 0/8] CKRM: Core patch set

2005-03-29 Thread Gerrit Huizenga

On Tue, 29 Mar 2005 22:05:30 PST, Paul Jackson wrote:
> gerrit wrote:
> > This is the core patch set for CKRM
> 
> Welcome.
 
 Hi Paul.

> Newcomers to CKRM might want to start reading these patches with "[patch
> 8/8] CKRM:  Documentation".  Starting with patch 0/8 or 1/8 will be
> difficult, at least if you're as dimm witted as I am.
> 
> Even the documentation included in patch 8/8 is missing the motivation
> and context essential to understanding this patch set.  It might have
> helped if the Introduction text at http://ckrm.sourceforge.net/ had been
> included in some form, as part of patch 0/8.  I'm just a little penguin
> here (lkml), but from what I can tell by watching how things work,
> you're going to have to "make the case" -- explain what this is, how
> it's put together, and why it's needed.  This is a sizable patch, in
> lines of code, in hooks in critical places, and in amount of "new
> concepts."  I presume (unless you've managed to bribe or blackmail some
> big penguin) you're going to have to convince some others that this is
> worth having.  I for one am a CKRM skeptic, so won't be much help to you
> in that quest.  Good luck.
 
 Good point on including the pointer to the web site.  As you probably
 noticed, there is a history of the design, papers presented, etc.
 Also, Jonathan Corbet did a nice write up from the discussion at the
 2004 Kernel summit which is archived here: http://lwn.net/Articles/94573/
 which may be of use.

 The OLS and LinuxTag papers are archived at the site that you pointed
 to and there will be a tutorial on configuring, using and writing
 controllers for CKRM at OLS this year.  You may also want to see the
 previous postings of this code to LKML for more background.

 In short, CKRM provides very basic desktop to server workload management
 capabilities similar to those provided by most of the old fashioned
 operating systems.  The code provides a fairly simple mechanism for
 adding controllers for any resource type and the code is currently
 widely deployed by PlanetLab, a part of Novell/SuSE's distro, and
 the capabilities are requested by a fair number of Linux users and
 customers.

> I don't see any performance numbers, either on small systems, or
> scalability on large systems.  Certainly this patch does not fall under
> the "obviously no performance impact" exclusion.

 Fair point.  We have been running some of the smaller benchmarks but
 have not yet had a chance to do any kind of performance comparison
 based on the current code.  However, when configured out, it will
 have zero impact.  We also have performance analysis planned for the
 very near future with CONFIG_CKRM set to y but no rules configured.
 
> A couple of nits:
> 
>  1) Instead of disabling routines with #defines:
>  #define numtasks_put_ref(core_class)  do {} while (0)
> one can do it with static inlines, preserving more compiler
> checking.
 
 Yeah - that works well in some cases but it turns out to not do so
 well when an argument to a function refers to a structure element
 which is not configured in.  In that case, the compiler emits a
 reference to an undefined structure value in the case of the static
 inline, where otherwise the entire set of code is pre-processed
 away.  I think we've gone through the code and used the correct
 balance of static inlines and #define constructs as appropriate.
 If we've missed any, I'm more than willing to accept a patch to
 correct a specific instance.
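
 The trade-off described above can be sketched as follows. The names (demo_class, demo_put_ref, DEMO_CONFIG_NUMTASKS) are invented for illustration and are not taken from the CKRM patches; the config symbol is deliberately left undefined to model the configured-out case.

```c
#include <assert.h>
#include <stddef.h>

struct demo_class {
    int id;
#ifdef DEMO_CONFIG_NUMTASKS
    int numtasks_ref;   /* only exists when the feature is built in */
#endif
};

#ifdef DEMO_CONFIG_NUMTASKS
static inline void demo_put_ref(int *ref) { (*ref)--; }
#else
/* Macro stub: the argument tokens are dropped by the preprocessor, so a
 * reference to the missing member never reaches the compiler.
 * A static inline stub such as
 *     static inline void demo_put_ref(int *ref) { }
 * would still need `&cls->numtasks_ref` to compile at every call site. */
#define demo_put_ref(ref) do { } while (0)
#endif

static int demo_release(struct demo_class *cls)
{
    assert(cls != NULL);
    demo_put_ref(&cls->numtasks_ref);   /* builds even without the member */
    return cls->id;
}
```

 The cost is the one Paul points out: with the macro stub the argument is never type-checked in the configured-out build.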

>  2) I take it that the following constitutes the 'documentation'
> for what is in /proc/<pid>/delay.  Perhaps I missed something.
> 
>   +   res  = sprintf(buffer,"%u %llu %llu %u %llu %u %llu\n",
>   +  (unsigned int) get_delay(task,runs),
>   +  (uint64_t) get_delay(task,runcpu_total),
>   +  (uint64_t) get_delay(task,waitcpu_total),
>   +  (unsigned int) get_delay(task,num_iowaits),
>   +  (uint64_t) get_delay(task,iowait_total),
>   +  (unsigned int) get_delay(task,num_memwaits),
>   +  (uint64_t) get_delay(task,mem_iowait_total)
 
 The code is the documentation?  :)

 There is probably some documentation on /proc/<pid>/ in general and
 we'll see if we can get it updated appropriately.  Vivek?
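
 Until that documentation exists, a reader of the per-task delay file can at least rely on the sprintf() format quoted above. The sketch below parses one such line; the struct and function names are made up, the field names simply mirror the get_delay() arguments, and the units of the totals are not stated in this thread.

```c
#include <assert.h>
#include <stdio.h>

/* Field names follow the get_delay() arguments in the quoted code. */
struct demo_delay {
    unsigned int runs;
    unsigned long long runcpu_total;
    unsigned long long waitcpu_total;
    unsigned int num_iowaits;
    unsigned long long iowait_total;
    unsigned int num_memwaits;
    unsigned long long mem_iowait_total;
};

/* Parse one "%u %llu %llu %u %llu %u %llu" line.
 * Returns 0 on success, -1 if fewer than seven fields were found. */
static int demo_parse_delay(const char *line, struct demo_delay *d)
{
    int n;

    assert(line && d);
    n = sscanf(line, "%u %llu %llu %u %llu %u %llu",
               &d->runs, &d->runcpu_total, &d->waitcpu_total,
               &d->num_iowaits, &d->iowait_total,
               &d->num_memwaits, &d->mem_iowait_total);
    return n == 7 ? 0 : -1;
}
```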

>  3) Typo in init/Kconfig "atleast":
> 
> If you say Y here, enable the Resource Class File System and atleast

 Got it - thanks!  Someone liked the new word "atleast" - at least
 three occurrences removed.

 Oh - and uniformly updated diffstats - I probably missed some when
 I was playing with quilt originally.

gerrit


Re: no need to check for NULL before calling kfree() -fs/ext2/

2005-03-29 Thread P Lavin
Hi,
In my wlan driver module, I allocate some memory using kmalloc in 
interrupt context. The allocation failed, but kmalloc did not return NULL, 
so I proceeded and everything went wrong until the kernel finally 
crashed. Can anyone tell me why this is happening? I cannot use 
GFP_KERNEL because I'm calling this function from interrupt context and it 
may block. Is there any other solution? I'm concerned about why kmalloc 
does not return NULL when it fails.

Is it not necessary to check for NULL before calling kfree()?
Regards,
Lavin
Pekka J Enberg wrote:
Hi,
Paul Jackson writes:
Even such obvious changes as removing redundant checks doesn't
seem to ensure a performance improvement.  Jesper Juhl posted
performance data for such changes in his microbenchmark a couple
of days ago.

It is not a performance issue, it's an API issue. Please note that 
kfree() is analogous to libc free() in terms of NULL checking. People are 
checking NULL twice now because they're confused about whether kfree() 
deals with it or not.
Paul Jackson writes:

Maybe we should be following your good advice:
> You don't know that until you profile! 
instead of continuing to make these code changes.

I am all for profiling but it should not stop us from merging the 
patches because we can restore the generated code with the included 
(totally untested) patch.
   Pekka
Signed-off-by: Pekka Enberg <[EMAIL PROTECTED]>
---
Index: 2.6/include/linux/slab.h
===
--- 2.6.orig/include/linux/slab.h   2005-03-22 14:31:30.0 +0200
+++ 2.6/include/linux/slab.h2005-03-30 09:08:13.0 +0300
@@ -105,8 +105,14 @@
  return __kmalloc(size, flags);
}
+static inline void kfree(const void * p)
+{
+   if (!p)
+   return;
+   __kfree(p);
+}
+
extern void *kcalloc(size_t, size_t, int);
-extern void kfree(const void *);
extern unsigned int ksize(const void *);
extern int FASTCALL(kmem_cache_reap(int));
Index: 2.6/mm/slab.c
===
--- 2.6.orig/mm/slab.c  2005-03-22 14:31:31.0 +0200
+++ 2.6/mm/slab.c   2005-03-30 09:08:45.0 +0300
@@ -2567,13 +2567,11 @@
* Don't free memory not originally allocated by kmalloc()
* or you will run into trouble.
*/
-void kfree (const void *objp)
+void __kfree (const void *objp)
{
  kmem_cache_t *c;
  unsigned long flags;
-   if (!objp)
-   return;
  local_irq_save(flags);
  kfree_debugcheck(objp);
  c = GET_PAGE_CACHE(virt_to_page(objp));
@@ -2581,7 +2579,7 @@
  local_irq_restore(flags);
}
-EXPORT_SYMBOL(kfree);
+EXPORT_SYMBOL(__kfree);
#ifdef CONFIG_SMP
/**
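
The shape of the quoted patch (an inline NULL check in front of an out-of-line free routine) can be tried in userspace with libc. This is only an analogue under that assumption: the demo_ names are invented, free() stands in for the slab code, and the counter exists purely to make the behaviour observable.

```c
#include <assert.h>
#include <stdlib.h>

/* Counts how often the out-of-line path is actually taken. */
static unsigned long demo_slow_free_calls;

/* Stand-in for the out-of-line routine: must not be given NULL. */
static void demo_free_nonnull(void *p)
{
    assert(p != NULL);
    demo_slow_free_calls++;
    free(p);
}

/* Inline wrapper: NULL is handled here, so callers do not need their
 * own check, just as with libc free(). */
static inline void demo_free(void *p)
{
    if (!p)
        return;
    demo_free_nonnull(p);
}
```

With this split the NULL test is visible to the compiler at each call site, while callers can still pass NULL without a check of their own.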


Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-10

2005-03-29 Thread Ingo Molnar

* Lee Revell <[EMAIL PROTECTED]> wrote:

> > could you run a bit with tracing disabled (in the .config) on the C3?  
> > (but wakeup timing still enabled) It may very well be tracing overhead 
> > that makes those latencies that high.  Also, we'd thus have some hard 
> > data on how much overhead tracing is in such a situation, on that CPU.
> 
> I have not left it to run overnight yet with the swappiness set to 
> 100, which triggers the biggest latencies as my entire desktop is 
> swapped out, but so far it looks like the problem was tracing 
> overhead.  With timing enabled but tracing disabled the longest 
> latency on the C3 so far is 270 usecs.
> 
> An important giveaway is that with tracing enabled the same code path 
> only triggers ~200 usec latencies on the K7 but ~2ms on the C3.  Since 
> the longest latency with PREEMPT_DESKTOP is normally more a function 
> of memory bandwidth than processor speed, and the machines differ much 
> more in the latter, this agrees with the theory that the overhead is 
> the problem.

besides cycle overhead, function tracing increases cache footprint - and 
with a CPU that has smaller caches (such as the C3) it can tip a loop 
over the edge, and can make it cache-trashing, while it would fit into 
the cache before. In such a situation the difference can be dramatic.

(on CPUs with larger caches similar artifacts can happen too, but it 
needs a 'fatter' loop, which are apparently rarer.)

Ingo


Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07

2005-03-29 Thread Ingo Molnar

* Steven Rostedt <[EMAIL PROTECTED]> wrote:

> OK, I'm declaring defeat here. I've been fighting race conditions all 
> day, and it's now 1 in the morning where I live. It looks like this 
> implementation has no other choice but to have the waking up "pending 
> owner" take the wait_list lock once again.  How heavy of a overhead is 
> that really?

as i mentioned it before, taking a lock is not a big issue at all. Since 
you have to touch the lock data structure anyway (and all of it fits 
into a single cacheline), it doesnt really matter whether it's atomic 
flag setting/clearing, or raw spinlock based.

later on, once things are stable and well-understood, we can still 
attempt to micro-optimize it.

Ingo


Re: [PATCH] fixup newly added jsm driver

2005-03-29 Thread Christoph Hellwig
On Tue, Mar 29, 2005 at 05:56:13PM -0500, Jeff Garzik wrote:
> It got a decent review, but from Christoph's list it looks like not all 
> the issues raised during the review got addressed.

Exactly.  Most reviews take more than one pass.



Re: [patch 1/2] fork_connector: add a fork connector

2005-03-29 Thread Paul Jackson
Guillaume wrote:
>   I'm sorry but I really don't understand why you're speaking about
> accounting when I present results about fork connector. I agree that
> ELSA is using the fork connector but the fork connector has nothing to
> do with accounting.

True - sorry.  I kinda hijacked your thread.  I had fork_connector
associated in my mind with process accounting, so made the leap from
analyzing the fork_connector mechanism on its own merit, to analyzing
whether it was useful for collecting the new process accounting
information that was needed from forks.

In my own defense, I don't see where the motivations for fork_connector
are spelled out in the presentation to this patch, and it seems that
the other potential uses of it are less well explored at this point.

So I think my leap was a small one ;).

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.650.933.1373, 
1.925.600.0401


Re: [PATCH] fixup newly added jsm driver

2005-03-29 Thread Christoph Hellwig
On Tue, Mar 29, 2005 at 01:47:34PM -0800, Andrew Morton wrote:
> Christoph Hellwig <[EMAIL PROTECTED]> wrote:
> >
> > One more prematurely added driver..
> 
> This driver was first sent out for review a month ago, was upissued twice
> and generated over seventy linux-kernel emails including some from Russell
> and some from Greg.  It was by no means a "premature" addition.
> 
> One could say that it was inadequately reviewed, but how is one to
> determine that?  If the thing has been under discussion for a month and the
> submitter says "I've addressed all comments" then it's going to get merged.

I don't think the submitter should say all issues have been addressed
but that should come from a sufficiently trusted reviewer.



Re: API changes to the slab allocator for NUMA memory allocation

2005-03-29 Thread Manfred Spraul
Christoph Lameter wrote:
> The patch makes the following function calls available to allocate memory on
> a specific node without changing the basic operation of the slab
> allocator:
> kmem_cache_alloc_node(kmem_cache_t *cachep, unsigned int flags, int node);
> kmalloc_node(size_t size, unsigned int flags, int node);

I intentionally didn't add a kmalloc_node() function:
kmalloc is just a wrapper around 
kmem_find_general_cachep+kmem_cache_alloc. It exists only for 
efficiency. The _node functions are slow, thus a wrapper is IMHO not 
required. kmalloc_node(size,flags,node) is identical to 
kmem_cache_alloc_node(kmem_find_general_cachep(size,flags),flags,node). What 
about making kmem_find_general_cachep() public again and removing 
kmalloc_node()?
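
A minimal user-space sketch of the equivalence described above: a size-based allocation is just a cache lookup followed by a per-cache, node-aware allocation. The names `find_general_cache`, `cache_alloc_node` and `my_kmalloc_node`, and the size classes, are hypothetical stand-ins; this is not the slab code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a general-purpose size-class cache. */
struct cache {
    size_t obj_size;
};

static struct cache general_caches[] = {
    { 32 }, { 64 }, { 128 }, { 256 }, { 512 }, { 1024 },
};

/* Stand-in for kmem_find_general_cachep(): smallest class that fits. */
static struct cache *find_general_cache(size_t size)
{
    size_t i;

    for (i = 0; i < sizeof(general_caches) / sizeof(general_caches[0]); i++)
        if (size <= general_caches[i].obj_size)
            return &general_caches[i];
    return NULL;    /* larger than the biggest class */
}

/* Stand-in for a node-aware cache allocation; node is ignored here. */
static void *cache_alloc_node(struct cache *c, int node)
{
    assert(c != NULL);
    (void)node;
    return malloc(c->obj_size);
}

/* The wrapper under discussion: lookup plus node-aware allocation. */
static void *my_kmalloc_node(size_t size, int node)
{
    struct cache *c = find_general_cache(size);

    return c ? cache_alloc_node(c, node) : NULL;
}
```

If the lookup helper is public, a caller can write the two steps itself and the wrapper becomes optional, which is the trade-off being asked about.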

And I don't know if it's a good idea to make kmalloc() a special case of 
kmalloc_node(): It adds one parameter to every kmalloc call and 
kmem_cache_alloc call, virtually everyone passes -1. Does it increase 
the .text size?

--
   Manfred


Re: [patch 1/2] fork_connector: add a fork connector

2005-03-29 Thread Guillaume Thouvenin
On Tue, 2005-03-29 at 22:06 -0800, dean gaudet wrote:
> On Tue, 29 Mar 2005, Jay Lan wrote:
> 
> > The fork_connector is not designed to solve accounting data collection
> > problem.
> > 
> > The accounting data collection must be done via a hook from do_exit().
> 
> by the time do_exit() occurs the parent may have disappeared... you do 
> need to record something at fork() time so that you can account to the 
> correct ancestor.

You're right. At fork(), the "job daemon", provided by ELSA, records
information about parent PID, child PID and also about the group of
processes they belong to. At exit(), accounting data are recorded by CSA
or BSD-like accounting. 

> an example of where this ancestry is useful would be the summation of all 
> cpu time spent by children of apache, spamd, clamd, ...

You're right. One usage can be: apache, spamd and clamd can each be put in a
job (a group of processes) by using the "job daemon", and all their children
will then automatically belong to the same job as their parent. So the goal
here is really to perform per-group of processes accounting using ELSA and
CSA accounting data.
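
A rough user-space sketch of the bookkeeping described above, assuming a daemon that receives fork and exit events. All names (`job_of`, `on_fork`, `on_exit_event`, ...) and the fixed-size tables are hypothetical and only illustrate why the parent/child link has to be recorded at fork time.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PIDS 64
#define NO_JOB   (-1)

/* Hypothetical per-process record kept by a user-space job daemon. */
static int job_of[MAX_PIDS];
/* Accumulated CPU time per job, in arbitrary ticks. */
static unsigned long job_cpu[MAX_PIDS];

static void jobs_init(void)
{
    int i;

    for (i = 0; i < MAX_PIDS; i++) {
        job_of[i] = NO_JOB;
        job_cpu[i] = 0;
    }
}

/* Put a process into a job explicitly (e.g. apache into job 1). */
static void job_assign(int pid, int job)
{
    assert(pid >= 0 && pid < MAX_PIDS);
    assert(job >= 0 && job < MAX_PIDS);
    job_of[pid] = job;
}

/* Fork event: the child inherits the job of its parent. */
static void on_fork(int parent, int child)
{
    assert(parent >= 0 && parent < MAX_PIDS);
    assert(child >= 0 && child < MAX_PIDS);
    job_of[child] = job_of[parent];
}

/* Exit event: charge the accounting data to the job recorded at fork. */
static void on_exit_event(int pid, unsigned long cpu_ticks)
{
    assert(pid >= 0 && pid < MAX_PIDS);
    if (job_of[pid] != NO_JOB)
        job_cpu[job_of[pid]] += cpu_ticks;
    job_of[pid] = NO_JOB;
}
```

Because the job is copied at fork, a child is still charged to the right job even if its parent has already exited by the time the child's exit event arrives.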

Best Regards,
Guillaume



Re: [patch 1/2] fork_connector: add a fork connector

2005-03-29 Thread Paul Jackson
Guillaume wrote:
> When I wrote "several user space applications" it was just to say that
> this fork connector is not designed only for ELSA and fork information
> is available to every listeners.

So I suppose if fork_connector were not used to collect  information for accounting, then someone would have to make
the case that there were enough other uses, of sufficient value, to add
fork_connector.  We have to be a bit careful, in the kernel, to avoid
adding mechanisms until we have the immediate use in hand.  If we don't
do this, then the kernel ends up looking like the Gargoyles on a
Renaissance church - burdened with overly ornate features serving no
earthly purpose.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.650.933.1373, 1.925.600.0401


Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07

2005-03-29 Thread Steven Rostedt
On Sat, 2005-03-26 at 11:04 -0500, Steven Rostedt wrote:
> 
> On Fri, 25 Mar 2005, Esben Nielsen wrote:

> >
> > I like the idea of having the scheduler take care of it - it is a very
> > optimal coded queue-system after all. That will work on UP but not on SMP.
> > Having the unlock operation to set the mutex in a "partially owned" state
> > will work better. The only problem I see, relative to Ingo's
> > implementation, is that then the awoken task has to go in and
> > change the state of the mutex, i.e. it has to lock the wait_lock again.
> > Will the extra schedulings being the problem happen often enough in
> > practice to have the extra overhead?
> 
> Another answer is to have the "pending owner" bit be part of the task
> structure. A flag maybe.  If a higher priority process comes in and
> decides to grab the lock from this owner, it does a test_and_clear on the
> flag on the pending owner task.  When the pending owner task wakes
> up, it does the test_and_clear on its own bit.  Whoever had the bit set
> on the test wins. If the higher prio task were to clear it first, then it
> takes the ownership away from the pending owner.  If the pending owner
> were to clear the bit first, it won and would continue as though it got the
> lock.  The higher priority tasks would do this test within the wait_lock
> to keep from having more than one trying to grab the lock from the pending
> owner, but the pending owner won't need to do anything since it will know
> if it was the new owner just by testing its own bit.

OK, I'm declaring defeat here. I've been fighting race conditions all
day, and it's now 1 in the morning where I live. It looks like this
implementation has no other choice but to have the waking up "pending
owner" take the wait_list lock once again.  How heavy an overhead is
that really?

The problem I've painfully discovered, is that a task trying to take the
lock must test the pending owner for two things. One is, is the pending
owner owning the same lock as the one the task is trying to get. Since
the waking up of the pending owner has no synchronous locking, it can
grab the lock and then become a pending owner to another lock after the
other task thinks it's still the pending owner of the lock it's trying to
get, but before testing it. So it can mistake it as the pending owner
still for this lock, when in reality it owns the lock and is pending for
another.

The other test must also do the test_and_clear_bit on the pending owner
bit. So you need to make sure the owner not only stays the owner of the
lock the task is trying to get, but also be able to do the atomic
test_and_clear on the owner's pending owner bit.

I can't get these two in sync without grabbing a lock (in this case, the
wait_list lock). 
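
A minimal user-space sketch of the test-and-clear handoff discussed above, using C11 atomics rather than the kernel's bitops. `struct task`, `PENDING_OWNER` and the helper names are hypothetical. It shows only the single-bit race (whoever clears the bit first wins); it does not address the second check, that the pending owner still refers to the same lock, which is the part that seems to need the wait_list lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define PENDING_OWNER 0x1u

/* Hypothetical task with a flags word holding the pending-owner bit. */
struct task {
    atomic_uint flags;
};

/* Mark the task as pending owner of some lock. */
static void mark_pending(struct task *t)
{
    assert(t != NULL);
    atomic_fetch_or(&t->flags, PENDING_OWNER);
}

/*
 * Atomically clear the bit and report whether it was set before,
 * in the spirit of test_and_clear_bit().
 */
static bool test_and_clear_pending(struct task *t)
{
    unsigned int old = atomic_fetch_and(&t->flags, ~PENDING_OWNER);

    return (old & PENDING_OWNER) != 0;
}
```

Both the stealing task and the waking pending owner would call `test_and_clear_pending()`; exactly one of them sees `true`.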

Ingo, unless you can think of a way to do this, tomorrow (actually
today), I'll change this to have the end of __down (and friends) grab
the wait_list lock to test and clear its pending owner bit. 

-- Steve




[PATCH] the nommu support for ARM linux 2.6.10

2005-03-29 Thread Hyok S. Choi
An updated MPU and noMMU support patch for ARM against linux 2.6.10 kernel
is available at :

http://opensrc.sec.samsung.com/download/linux-2.6.10-hsc1.patch.gz

You can select the memory management type "MPU" or "NONE" in the arm kernel
configuration menu, which was traditionally known as "armnommu" or
uClinux/ARM by 2.6.9. (sure, you can choose "MMU" for vanilla Linux :-)

It's a different way from other uclinux arch. (i.e. m68knommu), which
enables simultaneous support to use "singular address space" support even
for MMU platforms.
You can choose "MMU" or "NONE" for your mmu based arm platform with a few
modifications, i.e. virtual address --> physical address conversion.

The 2.6.11.6-hsc0 patch will be available this week, and some benchmarks
will be provided for both cases on the same h/w platform.
An additional MPU support API is also pending for some services like memory
protection, even for uClinux.

Any suggestions are welcome.

You can reach the project home at : http://opensrc.sec.samsung.com/

currently officially supported platforms are : s3c24a0, s5c7375, atmel,
espd_s3c510b, P2001, s3c3410, s3cb0x.
thanks for contribution : Tobias Lorenz and Jiun-Shian Ho

Regards,
Hyok

---
CHOI, HYOK-SUNG
Engineer (Kernel/System Architecture)
Digital Media R&D Center, Samsung Electronics Co.,Ltd.
tel: +82-31-200-8594  fax: +82-31-200-3427
e-mail: [EMAIL PROTECTED]
[Linux 2.6 ARM MPU/noMMU Kernel Maintainer] http://opensrc.sec.samsung.com/



Re: 2.6.12-rc1-bk2+PREEMPT_BKL: Oops at serio_interrupt

2005-03-29 Thread Dmitry Torokhov
On Tuesday 29 March 2005 14:49, Alexey Dobriyan wrote:
> On Tuesday 29 March 2005 23:02, Dmitry Torokhov wrote:
> > On Tue, 29 Mar 2005 21:28:20 +0400, Alexey Dobriyan <[EMAIL PROTECTED]> 
> > wrote:
> > > On Tuesday 29 March 2005 10:27, Dmitry Torokhov wrote:
> > > > On Monday 28 March 2005 12:26, Alexey Dobriyan wrote:
> > > > > Steps to reproduce for me:
> > > > > * Boot CONFIG_PREEMPT_BKL=y kernel (.config, dmesg are attached)
> > > > > * Start rebooting
> > > > > * Start moving serial mouse (I have Genius NetMouse Pro)
> > > > > * Right after gpm is shut down I see the oops
> > > > > * The system continues to reboot
> > > >
> > > > Could you try the patch below, please? Thanks!
> > > 
> > > > Input: serport - fix an Oops when closing port - should not call
> > > >serio_interrupt when serio port is being unregistered.
> > > 
> > > Doesn't work, sorry. Even worse: rebooting now also produces many pages of
> > > oopsen, then hang the system. I'm willing to test any new patches.
> > 
> > Does it oops at the same place with this patch or at some other place?
> 
> I manage to find this in the logs (nothing more :-( ):
> 
> Unable to handle kernel NULL pointer dereference at virtual address 0068
>  printing eip:
> c0202947
> *pde = 
> Oops:  [#1]
> PREEMPT 
> Modules linked in: ipt_REJECT ipt_state ip_conntrack iptable_filter ip_tables 
> binfmt_misc uhci_hcd snd_intel8x0 snd_ac97_codec snd_pcm snd_timer snd 
> soundcore snd_page_alloc floppy
> CPU:0
> EIP:0060:[]Not tainted VLI
> EFLAGS: 00010282   (2.6.12-rc1-bk2-serio) 
> 
> According to vmlinux, c0202947 is at:
> 
> c020293e :

Could you please try this one instead? Thanks!

-- 
Dmitry


 serport.c |   98 +++---
 1 files changed, 68 insertions(+), 30 deletions(-)

Index: dtor/drivers/input/serio/serport.c
===
--- dtor.orig/drivers/input/serio/serport.c
+++ dtor/drivers/input/serio/serport.c
@@ -27,11 +27,15 @@ MODULE_LICENSE("GPL");
 MODULE_ALIAS_LDISC(N_MOUSE);
 
 #define SERPORT_BUSY   1
+#define SERPORT_ACTIVE 2
+#define SERPORT_DEAD   3
 
 struct serport {
struct tty_struct *tty;
wait_queue_head_t wait;
struct serio *serio;
+   struct serio_device_id id;
+   spinlock_t lock;
unsigned long flags;
 };
 
@@ -45,11 +49,29 @@ static int serport_serio_write(struct se
return -(serport->tty->driver->write(serport->tty, &data, 1) != 1);
 }
 
+static int serport_serio_open(struct serio *serio)
+{
+   struct serport *serport = serio->port_data;
+   unsigned long flags;
+
+   spin_lock_irqsave(&serport->lock, flags);
+   set_bit(SERPORT_ACTIVE, &serport->flags);
+   spin_unlock_irqrestore(&serport->lock, flags);
+
+   return 0;
+}
+
+
 static void serport_serio_close(struct serio *serio)
 {
struct serport *serport = serio->port_data;
+   unsigned long flags;
+
+   spin_lock_irqsave(&serport->lock, flags);
+   clear_bit(SERPORT_ACTIVE, &serport->flags);
+   set_bit(SERPORT_DEAD, &serport->flags);
+   spin_unlock_irqrestore(&serport->lock, flags);
 
-   serport->serio->id.type = 0;
wake_up_interruptible(&serport->wait);
 }
 
@@ -61,36 +83,21 @@ static void serport_serio_close(struct s
 static int serport_ldisc_open(struct tty_struct *tty)
 {
struct serport *serport;
-   struct serio *serio;
-   char name[64];
 
if (!capable(CAP_SYS_ADMIN))
return -EPERM;
 
-   serport = kmalloc(sizeof(struct serport), GFP_KERNEL);
-   serio = kmalloc(sizeof(struct serio), GFP_KERNEL);
-   if (unlikely(!serport || !serio)) {
-   kfree(serport);
-   kfree(serio);
+   serport = kcalloc(1, sizeof(struct serport), GFP_KERNEL);
+   if (!serport)
return -ENOMEM;
-   }
 
-   memset(serport, 0, sizeof(struct serport));
-   serport->serio = serio;
-   set_bit(TTY_DO_WRITE_WAKEUP, &tty->flags);
serport->tty = tty;
-   tty->disc_data = serport;
-
-   memset(serio, 0, sizeof(struct serio));
-   strlcpy(serio->name, "Serial port", sizeof(serio->name));
-   snprintf(serio->phys, sizeof(serio->phys), "%s/serio0", tty_name(tty, 
name));
-   serio->id.type = SERIO_RS232;
-   serio->write = serport_serio_write;
-   serio->close = serport_serio_close;
-   serio->port_data = serport;
-
+   spin_lock_init(&serport->lock);
init_waitqueue_head(&serport->wait);
 
+   tty->disc_data = serport;
+   set_bit(TTY_DO_WRITE_WAKEUP, &tty->flags);
+
return 0;
 }
 
@@ -100,7 +107,8 @@ static int serport_ldisc_open(struct tty
 
 static void serport_ldisc_close(struct tty_struct *tty)
 {
-   struct serport *serport = (struct serport*) tty->disc_data;
+   struct serport *serport = (struct serport *) 

Re: [patch 1/2] fork_connector: add a fork connector

2005-03-29 Thread Paul Jackson
Dean wrote:
> by the time do_exit() occurs the parent may have disappeared

I don't think Jay was disagreeing with this.  I think he agrees
that there is to be collected:
 1) the classic bsd accounting data, in do_exit
 2) the fork time  by some mechanism at
fork time (perhaps just not the fork_connect mechanism)
 3) some additional data to be harvested at exit time, for CSA

I suspect you two are just tripping over words to describe this.

However, this does expose another possibility.  Record the original
forking parent pid in another task_struct field at fork time (didn't
someone else have a 'bio_pid' patch to this effect?), and add that task
struct value to the list of additional items to be written out, at exit
time.

I was skeptical that CBUS could have zero impact on fork, but recording
one more word in the task struct at fork gets about as close to zero
impact as one can get on fork.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.650.933.1373, 1.925.600.0401


Re: [PATCH] Reduce stack usage in module.c

2005-03-29 Thread Yum Rayan
On Tue, 29 Mar 2005 09:43:12 -0800, Randy.Dunlap <[EMAIL PROTECTED]> wrote:
> Yum Rayan wrote:
> > Attempt to reduce stack usage in module.c (linux-2.6.12-rc1-mm3).
> > Specifically from checkstack.pl
> >
> > Before patch
> > --
> > who_is_doing_it: 512
> > obsolete_params: 160
> >
> > After patch
> > 
> > who_is_doing_it: none
> So all function local variables are in registers?

Yes, all function local variables of the patched who_is_doing_it(...)
are in registers.

> > Also while at it, fix following in who_is_doing_it(...)
> > - use only as much memory is needed
> > - do not write past array index for the boundary case
> 
> I don't see a boundary case problem with the current code,
> hence I don't see why the kmalloc(len + 1, GFP_KERNEL) is
> needed...

Let's consider the original code and len = 513

   1399 static void who_is_doing_it(void)
   1400 {
   1401 /* Print out all the args. */
   1402 char args[512];
   1403 unsigned long i, len = current->mm->arg_end -
current->mm->arg_start;
   1404
   1405 if (len > 512)
   1406 len = 512;
   1407
   1408 len -= copy_from_user(args, (void
*)current->mm->arg_start, len);
   1409
   1410 for (i = 0; i < len; i++) {
   1411 if (args[i] == '\0')
   1412 args[i] = ' ';
   1413 }
   1414 args[i] = 0;
   1415 printk("ARGS: %s\n", args);
   1416 }

After lines 1410 thru 1413, "i" will be 512. So line 1414 will be
"args[512] = 0". But args is a 512-byte array with the last legally
accessible element at index 511?
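
The boundary case can be reproduced outside the kernel. The following is a user-space sketch, not the module.c code: `flatten_args` and `ARGS_MAX` are hypothetical names, and `memcpy` stands in for `copy_from_user`. It makes the requirement explicit that the destination needs `len + 1` bytes, because the terminator lands at index `len`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ARGS_MAX 512

/*
 * Copy up to ARGS_MAX bytes of a NUL-separated argument block into dst,
 * turn the separators into spaces and terminate the result.
 * dst must hold at least len + 1 bytes after clamping: when
 * len == ARGS_MAX the terminator is written at index ARGS_MAX.
 */
static size_t flatten_args(char *dst, size_t dst_size,
                           const char *src, size_t len)
{
    size_t i;

    if (len > ARGS_MAX)
        len = ARGS_MAX;
    assert(dst_size >= len + 1);    /* room for the terminator */

    memcpy(dst, src, len);
    for (i = 0; i < len; i++)
        if (dst[i] == '\0')
            dst[i] = ' ';
    dst[i] = '\0';                  /* i == len here */
    return len;
}
```

With a 512-byte destination and a clamped length of 512, the assertion fires; with `ARGS_MAX + 1` bytes it does not.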

> File names start one level deeper than wanted.  They should begin
> with linux/ or a/ or ./ e.g.
> There are plenty of docs on this, please let me know if you need
> references to them.

Point noted. Will post patch to linux/Documentation/SubmittingPatches,
hopefully making it more clear. Reworked patch at end of email.

> 
> > @@ -769,15 +769,25 @@
> >   struct kernel_param *kp;
> >   unsigned int i;
> >   int ret;
> > + char *sym_name = NULL;
> > + unsigned int sym_name_len = 0;
> >
> >   kp = kmalloc(sizeof(kp[0]) * num, GFP_KERNEL);
> >   if (!kp)
> >   return -ENOMEM;
> 
> Style thing, I guess, but since the case of num == 0 doesn't do
> anything here, I would just begin the function with:
> 
>if (!num)
>return;
> or  goto out;
> to maintain one return point.
> 
> and then eliminate the kmalloc()s, if (num), kfree()s, and
> parse_args().

Was attempting to preserve the call flow of the previous author. But
yes, this makes more sense. I changed code to return "0" for !num
case.
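
For reference, the early-return plus ordered-label cleanup shape discussed here looks roughly like the following user-space sketch. `process_items`, `MAX_ITEMS` and the buffer sizes are hypothetical, and `malloc`/`free` stand in for `kmalloc`/`kfree`.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define MAX_ITEMS 1024

/*
 * Hypothetical function with two heap buffers and ordered cleanup labels,
 * so that each failure path frees only what was already allocated.
 */
static int process_items(unsigned int num)
{
    int *items;
    char *scratch;
    int ret;

    if (!num)
        return 0;               /* nothing to do, nothing to free */
    assert(num <= MAX_ITEMS);   /* keeps the size computation small */

    items = malloc(sizeof(items[0]) * num);
    if (!items)
        return -ENOMEM;

    scratch = malloc(128);
    if (!scratch) {
        ret = -ENOMEM;
        goto free_items;
    }

    scratch[0] = '\0';
    items[0] = 0;
    ret = 0;

    free(scratch);
free_items:
    free(items);
    return ret;
}
```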

Thanks,
Rayan

Summary: Reduce stack usage in obsolete_params() and who_is_doing_it()
Target: linux-2.6.12-rc1-mm3
Signed-off-by: Yum Rayan <[EMAIL PROTECTED]>

--- a/kernel/module.c   2005-03-25 22:11:06.0 -0800
+++ b/kernel/module.c   2005-03-29 22:16:09.0 -0800
@@ -767,17 +767,27 @@
   const char *strtab)
 {
struct kernel_param *kp;
-   unsigned int i;
+   char *sym_name;
+   unsigned int sym_name_len, i;
int ret;
 
+   if (!num)
+   return 0;
+
kp = kmalloc(sizeof(kp[0]) * num, GFP_KERNEL);
if (!kp)
return -ENOMEM;
 
-   for (i = 0; i < num; i++) {
-   char sym_name[128 + sizeof(MODULE_SYMBOL_PREFIX)];
+   sym_name_len = 128 + sizeof (MODULE_SYMBOL_PREFIX);
+   sym_name = kmalloc(sym_name_len, GFP_KERNEL);
+   if (!sym_name) {
+   ret = -ENOMEM;
+   goto free_kp;
+   }
 
-   snprintf(sym_name, sizeof(sym_name), "%s%s",
+   for (i = 0; i < num; i++) {
+   
+   snprintf(sym_name, sym_name_len, "%s%s",
 MODULE_SYMBOL_PREFIX, obsparm[i].name);
 
kp[i].name = obsparm[i].name;
@@ -791,13 +801,15 @@
printk("%s: falsely claims to have parameter %s\n",
   name, obsparm[i].name);
ret = -EINVAL;
-   goto out;
+   goto free_sym;
}
kp[i].arg = &obsparm[i];
}
 
ret = parse_args(name, args, kp, num, NULL);
- out:
+ free_sym:
+   kfree(sym_name);
+ free_kp:
kfree(kp);
return ret;
 }
@@ -1399,12 +1411,16 @@
 static void who_is_doing_it(void)
 {
/* Print out all the args. */
-   char args[512];
+   char *args;
unsigned long i, len = current->mm->arg_end - current->mm->arg_start;
 
if (len > 512)
len = 512;
 
+   args = kmalloc(len + 1, GFP_KERNEL);
+   if (!args)
+   return;
+
len -= copy_from_user(args, (void *)current->mm->arg_start, len);
 
for (i = 0; i < len; i++) {
@@ -1413,6 +1429,7 @@
}
args[i] = 0;
printk("ARGS: %s\n", args);
+   

Re: no need to check for NULL before calling kfree() -fs/ext2/

2005-03-29 Thread Paul Jackson
Pekka writes:
> It is not a performance issue, it's an API issue.
> ...
> I am all for profiling but it should not stop us from merging the patches 

Ok - sounds right.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.650.933.1373, 1.925.600.0401


Re: memcpy(a,b,CONST) is not inlined by gcc 3.4.1 in Linux kernel

2005-03-29 Thread Denis Vlasenko
On Wednesday 30 March 2005 05:27, Gerold Jury wrote:
> 
> >> On Tue, Mar 29, 2005 at 05:37:06PM +0300, Denis Vlasenko wrote:
> >> > /*
> >> >  * This looks horribly ugly, but the compiler can optimize it totally,
> >> >  * as the count is constant.
> >> >  */
> >> > static inline void * __constant_memcpy(void * to, const void * from,
> >> > size_t n) {
> >> > if (n <= 128)
> >> > return __builtin_memcpy(to, from, n);
> >>
> >> The problem is that in GCC < 4.0 there is no constant propagation
> >> pass before expanding builtin functions, so the __builtin_memcpy
> >> call above sees a variable rather than a constant.
> >
> >or change "size_t n" to "const size_t n" will also fix the issue.
> >As we do some (well very little and with inlining and const values)
> >const progation before 4.0.0 on the trees before expanding the builtin.
> >
> >-- Pinski
> >-
> I used the following "const size_t n" change on x86_64
> and it reduced the memcpy count from 1088 to 609 with my setup and gcc 3.4.3.
> (kernel 2.6.12-rc1, running now)

What do you mean, 'reduced'?

(/me is checking)

Oh shit... It still emits half of memcpys, to be exact - for
struct copies:

arch/i386/kernel/process.c:

int copy_thread(int nr, unsigned long clone_flags, unsigned long esp,
unsigned long unused,
struct task_struct * p, struct pt_regs * regs)
{
struct pt_regs * childregs;
struct task_struct *tsk;
int err;

childregs = ((struct pt_regs *) (THREAD_SIZE + (unsigned long) 
p->thread_info)) - 1;
*childregs = *regs;
^^^
childregs->eax = 0;
childregs->esp = esp;

# make arch/i386/kernel/process.s

copy_thread:
pushl   %ebp
movl    %esp, %ebp
pushl   %edi
pushl   %esi
pushl   %ebx
subl    $20, %esp
movl    24(%ebp), %eax
movl    4(%eax), %esi
pushl   $60
leal    8132(%esi), %ebx
pushl   28(%ebp)
pushl   %ebx
call    memcpy  <=
movl    $0, 24(%ebx)
movl    16(%ebp), %eax
movl    %eax, 52(%ebx)
movl    24(%ebp), %edx
addl    $8192, %esi
movl    %ebx, 516(%edx)
movl    %esi, -32(%ebp)
movl    %esi, 504(%edx)
movl    $ret_from_fork, 512(%edx)

Jakub, is there a way to instruct gcc to inline this copy, or better yet,
to use a user-supplied inline version of memcpy?
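
For anyone who wants to compare the two forms in isolation, here is a small user-space sketch. `struct regs` is a hypothetical stand-in for `struct pt_regs`, and the helper names are made up. It only shows that a struct assignment and an explicit fixed-size `__builtin_memcpy` copy the same bytes; whether either one becomes a `call memcpy` or inline moves depends on the compiler version and flags.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical register frame, standing in for struct pt_regs. */
struct regs {
    long bx, cx, dx, si, di, bp, ax;
    long ip, flags, sp;
};

/* Plain struct assignment; the compiler picks how to copy. */
static void copy_by_assignment(struct regs *dst, const struct regs *src)
{
    *dst = *src;
}

/* Explicit fixed-size copy through the GCC/Clang builtin. */
static void copy_by_builtin(struct regs *dst, const struct regs *src)
{
    assert(dst != src);    /* memcpy requires non-overlapping objects */
    __builtin_memcpy(dst, src, sizeof(*dst));
}
```

Compiling a file like this with `gcc -O2 -S` and reading the generated assembly is one way to see what a given compiler does with each form.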
--
vda



Re: no need to check for NULL before calling kfree() -fs/ext2/

2005-03-29 Thread Pekka J Enberg
Hi, 

Paul Jackson writes:
> Even such obvious changes as removing redundant checks doesn't
> seem to ensure a performance improvement.  Jesper Juhl posted
> performance data for such changes in his microbenchmark a couple
> of days ago.

It is not a performance issue, it's an API issue. Please note that kfree()
is analogous to libc free() in terms of NULL checking. People are checking
NULL twice now because they're confused about whether kfree() deals with it
or not.

Paul Jackson writes:
> Maybe we should be following your good advice:
>
> > You don't know that until you profile!
>
> instead of continuing to make these code changes.

I am all for profiling but it should not stop us from merging the patches
because we can restore the generated code with the included (totally
untested) patch.
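
The shape of the idea, as a user-space sketch: an inline wrapper does the NULL check at the call site and an out-of-line function does the real work. `my_free`, `my_free_slow` and the call counter are hypothetical names used only for illustration; `free()` stands in for the slab release path.

```c
#include <assert.h>
#include <stdlib.h>

/* Counts how often the out-of-line path actually runs. */
static unsigned long slow_path_calls;

/* Out-of-line part: assumes a valid pointer. */
static void my_free_slow(void *p)
{
    assert(p != NULL);
    slow_path_calls++;
    free(p);
}

/* Inline wrapper: the NULL check happens at the call site. */
static inline void my_free(void *p)
{
    if (!p)
        return;
    my_free_slow(p);
}
```

Callers can then drop their own `if (p)` guards, and a NULL pointer never reaches the out-of-line function.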

   Pekka 

Signed-off-by: Pekka Enberg <[EMAIL PROTECTED]>
--- 

Index: 2.6/include/linux/slab.h
===
--- 2.6.orig/include/linux/slab.h   2005-03-22 14:31:30.0 +0200
+++ 2.6/include/linux/slab.h2005-03-30 09:08:13.0 +0300
@@ -105,8 +105,14 @@
  return __kmalloc(size, flags);
} 

+static inline void kfree(const void * p)
+{
+   if (!p)
+   return;
+   __kfree(p);
+}
+
extern void *kcalloc(size_t, size_t, int);
-extern void kfree(const void *);
extern unsigned int ksize(const void *); 

extern int FASTCALL(kmem_cache_reap(int));
Index: 2.6/mm/slab.c
===
--- 2.6.orig/mm/slab.c  2005-03-22 14:31:31.0 +0200
+++ 2.6/mm/slab.c   2005-03-30 09:08:45.0 +0300
@@ -2567,13 +2567,11 @@
* Don't free memory not originally allocated by kmalloc()
* or you will run into trouble.
*/
-void kfree (const void *objp)
+void __kfree (const void *objp)
{
  kmem_cache_t *c;
  unsigned long flags; 

-   if (!objp)
-   return;
  local_irq_save(flags);
  kfree_debugcheck(objp);
  c = GET_PAGE_CACHE(virt_to_page(objp));
@@ -2581,7 +2579,7 @@
  local_irq_restore(flags);
} 

-EXPORT_SYMBOL(kfree);
+EXPORT_SYMBOL(__kfree); 

#ifdef CONFIG_SMP
/** 



Re: [ACPI] Re: [BKPATCH] ACPI for 2.6.12-rc1

2005-03-29 Thread Yu, Luming
On Tuesday 29 March 2005 16:13, Romano Giannetti wrote:
>  This is to report an issue with 2.6.11 and ACPI battery/ac. The resume is:
>  acpi battery with preemptive kernel do not work, while the same kernel
> with no preempt works ok. I have tried to collect all the possible info;
> tell me if you need something more.
>
>  The details:
>
>  The working kernel is 2.6.11 with the patch from the acpi-devel list to
> fix acpi keys (not working otherwise). See for a description
>  http://bugme.osdl.org/show_bug.cgi?id=4124

If you can find AE_AML_BUFFER_LIMIT in your log, then it should be an
interpreter bug.  Please see http://bugzilla.kernel.org/show_bug.cgi?id=4150
Otherwise, maybe it is related to the EC driver.
-- 
Thanks,
Luming


Re: [patch 1/2] fork_connector: add a fork connector

2005-03-29 Thread dean gaudet
On Tue, 29 Mar 2005, Jay Lan wrote:

> The fork_connector is not designed to solve accounting data collection
> problem.
> 
> The accounting data collection must be done via a hook from do_exit().

by the time do_exit() occurs the parent may have disappeared... you do 
need to record something at fork() time so that you can account to the 
correct ancestor.

an example of where this ancestry is useful would be the summation of all 
cpu time spent by children of apache, spamd, clamd, ...

-dean


Re: [patch 0/8] CKRM: Core patch set

2005-03-29 Thread Paul Jackson
gerrit wrote:
> This is the core patch set for CKRM

Welcome.

Newcomers to CKRM might want to start reading these patches with "[patch
8/8] CKRM:  Documentation".  Starting with patch 0/8 or 1/8 will be
difficult, at least if you're as dim-witted as I am.

Even the documentation included in patch 8/8 is missing the motivation
and context essential to understanding this patch set.  It might have
helped if the Introduction text at http://ckrm.sourceforge.net/ had been
included in some form, as part of patch 0/8.  I'm just a little penguin
here (lkml), but from what I can tell by watching how things work,
you're going to have to "make the case" -- explain what this is, how
it's put together, and why it's needed.  This is a sizable patch, in
lines of code, in hooks in critical places, and in amount of "new
concepts."  I presume (unless you've managed to bribe or blackmail some
big penguin) you're going to have to convince some others that this is
worth having.  I for one am a CKRM skeptic, so won't be much help to you
in that quest.  Good luck.

I don't see any performance numbers, either on small systems, or
scalability on large systems.  Certainly this patch does not fall under
the "obviously no performance impact" exclusion.

Here's a combined diffstat showing how much code is added by these
patches, where.  Some of the patches have individual diffstat's, some
don't seem to.

 Documentation/ckrm/TODO          |   17
 Documentation/ckrm/ckrm_basics   |   66 ++
 Documentation/ckrm/core_usage    |   72 +++
 Documentation/ckrm/crbce         |   33 +
 Documentation/ckrm/installation  |   70 +++
 Documentation/ckrm/rbce_basics   |   67 ++
 Documentation/ckrm/rbce_usage    |   98
 fs/Makefile                      |    1
 fs/exec.c                        |    2
 fs/proc/array.c                  |   18
 fs/proc/base.c                   |   17
 fs/proc/internal.h               |    1
 fs/rcfs/Makefile                 |    9
 fs/rcfs/dir.c                    |  220 +
 fs/rcfs/inode.c                  |  160 ++
 fs/rcfs/magic.c                  |  517 ++
 fs/rcfs/rootdir.c                |  220 +
 fs/rcfs/socket_fs.c              |  280
 fs/rcfs/super.c                  |  291
 fs/rcfs/tc_magic.c               |   93
 include/linux/ckrm_ce.h          |   95
 include/linux/ckrm_events.h      |  230 +-
 include/linux/ckrm_net.h         |   42 +
 include/linux/ckrm_rc.h          |  345 +++
 include/linux/ckrm_tc.h          |   46 ++
 include/linux/ckrm_tsk.h         |   35 +
 include/linux/rcfs.h             |  116 -
 include/linux/sched.h            |  105
 include/linux/taskdelays.h       |   35 +
 include/net/sock.h               |    3
 include/net/tcp.h                |    4
 init/Kconfig                     |   68 ++
 init/main.c                      |    2
 kernel/Makefile                  |    1
 kernel/ckrm/Makefile             |   14
 kernel/ckrm/ckrm.c               |  892 +++
 kernel/ckrm/ckrm_events.c        |   86 +++
 kernel/ckrm/ckrm_numtasks.c      |  522 ++
 kernel/ckrm/ckrm_numtasks_stub.c |   53 ++
 kernel/ckrm/ckrm_sockc.c         |  559
 kernel/ckrm/ckrm_tc.c            |  745
 kernel/ckrm/ckrmutils.c          |  188
 kernel/exit.c                    |    3
 kernel/fork.c                    |   12
 kernel/sched.c                   |   20
 kernel/sys.c                     |   11
 mm/memory.c                      |   10
 net/ipv4/tcp_ipv4.c              |    5
 48 files changed, 6460 insertions(+), 39 deletions(-)

A couple of nits:

 1) Instead of disabling routines with #defines:
 #define numtasks_put_ref(core_class)  do {} while (0)
one can do it with static inlines, preserving more compiler
checking.

 2) I take it that the following constitutes the 'documentation'
for what is in /proc//delay.  Perhaps I missed something.

+   res  = sprintf(buffer,"%u %llu %llu %u %llu %u %llu\n",
+  (unsigned int) get_delay(task,runs),
+  (uint64_t) get_delay(task,runcpu_total),
+  (uint64_t) get_delay(task,waitcpu_total),
+  (unsigned int) get_delay(task,num_iowaits),
+  (uint64_t) get_delay(task,iowait_total),
+  (unsigned int) get_delay(task,num_memwaits),
+  (uint64_t) get_delay(task,mem_iowait_total)

 3) Typo in init/Kconfig "atleast":

If you say Y here, enable the Resource Class File System and atleast
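
To illustrate nit 1 above, here is a sketch of the static inline alternative. `struct core_class`, the `HAVE_NUMTASKS` switch and `release_class` are hypothetical; only the `numtasks_put_ref` name comes from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical class object with a reference count. */
struct core_class {
    int refs;
};

#ifdef HAVE_NUMTASKS
/* Real implementation lives elsewhere when the feature is enabled. */
void numtasks_put_ref(struct core_class *cls);
#else
/* Disabled variant: does nothing, but arguments are still type-checked. */
static inline void numtasks_put_ref(struct core_class *cls)
{
    (void)cls;
}
#endif

/* A caller compiles the same way whether the feature is on or off. */
static void release_class(struct core_class *cls)
{
    assert(cls != NULL);
    numtasks_put_ref(cls);
    cls->refs--;
}
```

With the `do {} while (0)` macro, a call such as `numtasks_put_ref(42, "oops")` compiles silently in the disabled configuration; with the inline stub it is a compile error in both configurations.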

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.650.933.1373, 1.925.600.0401

Re: prefetch on ppc64

2005-03-29 Thread Paul Mackerras
Antonio Vargas writes:

> Don't know exactly about power5, but G5 processor is described on IBM
> docs as doing automatic whole-page prefetch read-ahead when detecting
> linear accesses.

Sure, but linked lists would rarely be laid out linearly in memory.

Paul.


Re: [RFC] Driver States

2005-03-29 Thread Patrick Mochel

On Sun, 27 Mar 2005, Adam Belay wrote:

> Dynamic power management may require devices and drivers to transition
> between various physical and logical states.  I would like to start a
> discussion on how these might be defined at the bus, driver, and class
> levels.



> Bus Level
> =========
> At the bus level, there are two state attributes, power and
> enable/disable.  Enable/disable may mean different things on different
> buses, but they generally refer to resource decoding.  A device can only
> be enabled during a non-off power state.

<...>

> Driver Level
> ============
> At the driver level there are two areas of interest, physical and
> logical state.  There is an additional concern of transitioning between
> these states multiple times.  Because a driver acts as a bridge between
> physical and logical components, I think separating these steps seems
> natural.

<...>

> *attach - allocates data structures, creates sysfs entries, prepares driver
>to handle the hardware.
>
> *start -  Sets up device resources and configures the hardware.  Loads
> firmware, etc.
> (physical)
>
> *open -   engages the hardware, and makes it usable by the class device.
> (logical and physical)
>
> *close -  disengages the hardware, and stops class level access
> (logical and physical)
>
> *stop -   physically disables the hardware
> (physical)
>
> *detach - tears down the driver and releases it from the "struct device"
>

You have a few things here that can easily conflict, and that will be
developed at different paces. I like the direction that it's going, but
how do you intend to do it gradually? I.e., what would you do first?
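
To make the question concrete, here is a minimal sketch of the six quoted
steps as a small state machine. It is only an illustration: the enum names
and drv_transition() are hypothetical and are not part of the RFC or of any
existing kernel API. It assumes the steps must be taken strictly in order
(attach, start, open) and undone in reverse (close, stop, detach).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical states mirroring the attach/start/open/close/stop/detach
 * steps quoted above. */
enum drv_state {
	DRV_DETACHED,	/* no driver bound to the device */
	DRV_ATTACHED,	/* data structures allocated, hardware untouched */
	DRV_STARTED,	/* resources set up, firmware loaded (physical) */
	DRV_OPEN	/* usable by the class device (logical and physical) */
};

enum drv_op { OP_ATTACH, OP_START, OP_OPEN, OP_CLOSE, OP_STOP, OP_DETACH };

/* Apply one operation. Returns false and leaves *state untouched if the
 * operation is not legal in the current state. */
bool drv_transition(enum drv_state *state, enum drv_op op)
{
	static const struct {
		enum drv_op op;
		enum drv_state from, to;
	} table[] = {
		{ OP_ATTACH, DRV_DETACHED, DRV_ATTACHED },
		{ OP_START,  DRV_ATTACHED, DRV_STARTED  },
		{ OP_OPEN,   DRV_STARTED,  DRV_OPEN     },
		{ OP_CLOSE,  DRV_OPEN,     DRV_STARTED  },
		{ OP_STOP,   DRV_STARTED,  DRV_ATTACHED },
		{ OP_DETACH, DRV_ATTACHED, DRV_DETACHED },
	};
	size_t i;

	for (i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
		if (table[i].op == op && table[i].from == *state) {
			*state = table[i].to;
			return true;
		}
	}
	return false;
}
```

With a table like this, a power-management transition that needs the
hardware off would walk close and stop without detaching, which is the
"multiple times" case the RFC mentions.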


Pat



Re: [ACPI] 2.6.12-rc1-mm[1-3]: ACPI battery monitor does not work

2005-03-29 Thread Yu, Luming
On Tuesday 29 March 2005 17:56, Rafael J. Wysocki wrote:
> Hi,
>
> There is a problem on my box (Asus L5D, x86-64 kernel) with the ACPI
> battery driver in the 2.6.12-rc1-mm[1-3] kernels.  Namely, the battery
> monitor that I use (the kpowersave applet from SUSE 9.2) is no longer able
> to report the battery status (ie how much % it is loaded).  It can only
> check if the AC power is connected (if it is connected, kpowersave behaves
> as though there was no battery in the box, and if it is not connected,
> kpowersave always shows that the battery is 1% loaded).
>
> Also, there are big latencies on loading and accessing the battery module,
> but the module loads successfully and there's nothing suspicious in dmesg.
>
> Please let me know if you need any additional information.
>
> Greets,
> Rafael

Could you revert the ec-mode patch and then retest?
-- 
Thanks,
Luming


Re: [PATCH] embarassing typo

2005-03-29 Thread Gene Heskett
On Tuesday 29 March 2005 20:40, Dmitry Torokhov wrote:
>On Tuesday 29 March 2005 16:58, Michael Tokarev wrote:
>> Well, it's a matter of readability mostly.  For now at least, when
>> char is always 8 bytes...
>
>Wow, that's one huge char you have there ;)

Yeah, I was gonna ask what language is so complex as to need an 8 byte 
char?

Certainly not an earthly one I'd think ;)

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.34% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com and AOL/TW attorneys please note, additions to the above
message by Gene Heskett are:
Copyright 2005 by Maurice Eugene Heskett, all rights reserved.


Re: [patch 1/2] fork_connector: add a fork connector

2005-03-29 Thread Guillaume Thouvenin
On Tue, 2005-03-29 at 07:35 -0800, Paul Jackson wrote:
> Guillaume wrote:
> > I ran some test using the CBUS instead of the cn_netlink_send() routine
> > and the overhead is nearly 0%:
> 
> Overhead of what?  Does this include merging the data and getting it to
> disk?

I tested the overhead of sending the fork information to a user space
application. The merging of the data is done later and has nothing to
do with the fork connector...

> Am I even asking the right question here - is it true that this data,
> when collected for accounting purposes, needs to go to disk, and that
> summarizing and analyzing the data is done 'off-line', perhaps hours
> later?  That's the way it was 25 years ago ... but perhaps the basic
> data flow appropriate for accounting has changed since then.

  Accounting is another problem and, as you said previously, summarizing
and analyzing the data is done later. 

  I'm sorry, but I really don't understand why you're talking about
accounting when I present results about the fork connector. I agree that
ELSA uses the fork connector, but the fork connector itself has nothing to
do with accounting.

Regards,
Guillaume




[PATCH] Pageset Localization V2

2005-03-29 Thread Christoph Lameter
This patch modifies the way pagesets in struct zone are allocated. It relocates
the pagesets contained in a zone for each cpu to the node that is nearest to
the cpu instead of keeping the pagesets in the (possibly remote) target zone.
This means that the operations to manage caches of pages on remote zones can
be done with information available in the local zone.
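
As a rough user-space illustration of the data structure change (not the
patch itself): the per-cpu pagesets stop being an array embedded in the zone
and become pointers to separately allocated objects, so each one can live on
the node of the cpu that uses it. All names ending in _sketch, the 4-cpu
limit, and the two-cpus-per-node mapping are assumptions made up for this
example; calloc() stands in for a node-aware allocator and ignores the node.

```c
#include <stdlib.h>

#define NR_CPUS_SKETCH 4	/* illustrative only */

/* Cut-down stand-in for struct per_cpu_pageset. */
struct per_cpu_pageset_sketch {
	int count;
	unsigned long numa_hit;
};

struct zone_sketch {
	/* before: struct per_cpu_pageset_sketch pageset[NR_CPUS_SKETCH]; */
	struct per_cpu_pageset_sketch *pageset[NR_CPUS_SKETCH];
};

/* Hypothetical topology: two cpus per node. */
int cpu_to_node_sketch(int cpu)
{
	return cpu / 2;
}

/* Stand-in for a node-aware allocator; plain calloc() ignores the node. */
void *alloc_on_node_sketch(size_t size, int node)
{
	(void)node;
	return calloc(1, size);
}

/* Release every pageset of the zone and clear the pointers. */
void zone_free_pagesets(struct zone_sketch *z)
{
	int cpu;

	for (cpu = 0; cpu < NR_CPUS_SKETCH; cpu++) {
		free(z->pageset[cpu]);
		z->pageset[cpu] = NULL;
	}
}

/* Allocate one pageset per cpu on that cpu's node.
 * Returns 0 on success, -1 on allocation failure. */
int zone_setup_pagesets(struct zone_sketch *z)
{
	int cpu;

	for (cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		z->pageset[cpu] = NULL;

	for (cpu = 0; cpu < NR_CPUS_SKETCH; cpu++) {
		z->pageset[cpu] = alloc_on_node_sketch(sizeof(*z->pageset[cpu]),
						       cpu_to_node_sketch(cpu));
		if (!z->pageset[cpu]) {
			zone_free_pagesets(z);
			return -1;
		}
	}
	return 0;
}
```

Users of the field then change from taking the address of an array element
to reading a pointer, which is the one-line change visible in the node.c hunk.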

The patch depends on the API changes to the slab allocator posted before
this patch.

AIM7 benchmark on a 32 CPU SMP system:

w/o patches:
Tasks   jobs/min  jti  jobs/min/task     real       cpu
    1     484.68  100       484.6769    12.01      1.97   Fri Mar 25 11:01:42 2005
  100   27140.46   89       271.4046    21.44    148.71   Fri Mar 25 11:02:04 2005
  200   30792.02   82       153.9601    37.80    296.72   Fri Mar 25 11:02:42 2005
  300   32209.27   81       107.3642    54.21    451.34   Fri Mar 25 11:03:37 2005
  400   34962.83   78        87.4071    66.59    588.97   Fri Mar 25 11:04:44 2005
  500   31676.92   75        63.3538    91.87    742.71   Fri Mar 25 11:06:16 2005
  600   36032.69   73        60.0545    96.91    885.44   Fri Mar 25 11:07:54 2005
  700   35540.43   77        50.7720   114.63   1024.28   Fri Mar 25 11:09:49 2005
  800   33906.70   74        42.3834   137.32   1181.65   Fri Mar 25 11:12:06 2005
  900   34120.67   73        37.9119   153.51   1325.26   Fri Mar 25 11:14:41 2005
 1000   34802.37   74        34.8024   167.23   1465.26   Fri Mar 25 11:17:28 2005

with Slab API changes and pageset patch:

Tasks   jobs/min  jti  jobs/min/task     real       cpu
    1     485.00  100       485.0000    12.00      1.96   Fri Mar 25 11:46:18 2005
  100   28000.96   89       280.0096    20.79    150.45   Fri Mar 25 11:46:39 2005
  200   32285.80   79       161.4290    36.05    293.37   Fri Mar 25 11:47:16 2005
  300   40424.15   84       134.7472    43.19    438.42   Fri Mar 25 11:47:59 2005
  400   39155.01   79        97.8875    59.46    590.05   Fri Mar 25 11:48:59 2005
  500   37881.25   82        75.7625    76.82    730.19   Fri Mar 25 11:50:16 2005
  600   39083.14   78        65.1386    89.35    872.79   Fri Mar 25 11:51:46 2005
  700   38627.83   77        55.1826   105.47   1022.46   Fri Mar 25 11:53:32 2005
  800   39631.94   78        49.5399   117.48   1169.94   Fri Mar 25 11:55:30 2005
  900   36903.70   79        41.0041   141.94   1310.78   Fri Mar 25 11:57:53 2005
 1000   36201.23   77        36.2012   160.77   1458.31   Fri Mar 25 12:00:34 2005

The major improvement is in the mid range when running 100-600 tasks. For 1 task
there is barely any improvement since most data will be locally allocated. In
the high range other factors seem to become important.
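
For anyone who wants the relative change per row rather than the raw
numbers, a small helper along these lines does the arithmetic. The jobs/min
values are copied from the two AIM7 runs above; the array and function names
are made up for this example.

```c
#include <stddef.h>

/* jobs/min columns copied from the two AIM7 runs above. */
const int aim7_tasks[] = {
	1, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000
};
const double aim7_base[] = {
	484.68, 27140.46, 30792.02, 32209.27, 34962.83, 31676.92,
	36032.69, 35540.43, 33906.70, 34120.67, 34802.37
};
const double aim7_patched[] = {
	485.00, 28000.96, 32285.80, 40424.15, 39155.01, 37881.25,
	39083.14, 38627.83, 39631.94, 36903.70, 36201.23
};

#define AIM7_ROWS (sizeof(aim7_tasks) / sizeof(aim7_tasks[0]))

/* Percentage change in jobs/min for one row, patched vs. unpatched. */
double aim7_gain_percent(size_t row)
{
	return (aim7_patched[row] - aim7_base[row]) / aim7_base[row] * 100.0;
}
```

For the 300-task row this works out to roughly 25.5%, while the 1-task row
stays below 0.1%.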

Patch against 2.6.11.6-bk3

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
Signed-off-by: Shobhit Dayal <[EMAIL PROTECTED]>
Signed-off-by: Shai Fultheim <[EMAIL PROTECTED]>

Index: linux-2.6.11/drivers/base/node.c
===
--- linux-2.6.11.orig/drivers/base/node.c   2005-03-21 13:18:06.0 -0800
+++ linux-2.6.11/drivers/base/node.c   2005-03-21 13:22:06.0 -0800
@@ -87,7 +87,7 @@ static ssize_t node_read_numastat(struct
for (i = 0; i < MAX_NR_ZONES; i++) {
struct zone *z = >node_zones[i];
for (cpu = 0; cpu < NR_CPUS; cpu++) {
-   struct per_cpu_pageset *ps = &z->pageset[cpu];
+   struct per_cpu_pageset *ps = z->pageset[cpu];
numa_hit += ps->numa_hit;
numa_miss += ps->numa_miss;
numa_foreign += ps->numa_foreign;
Index: linux-2.6.11/include/linux/mm.h
===
--- linux-2.6.11.orig/include/linux/mm.h   2005-03-21 13:18:06.0 -0800
+++ linux-2.6.11/include/linux/mm.h   2005-03-21 13:22:06.0 -0800
@@ -691,6 +691,7 @@ extern void mem_init(void);
 extern void show_mem(void);
 extern void si_meminfo(struct sysinfo * val);
 extern void si_meminfo_node(struct sysinfo *val, int nid);
+extern void setup_per_cpu_pageset(void);

 /* prio_tree.c */
 void vma_prio_tree_add(struct vm_area_struct *, struct vm_area_struct *old);
Index: linux-2.6.11/include/linux/mmzone.h
===
--- linux-2.6.11.orig/include/linux/mmzone.h   2005-03-21 13:21:59.0 -0800
+++ linux-2.6.11/include/linux/mmzone.h   2005-03-21 13:22:06.0 -0800
@@ -122,7 +122,7 @@ struct zone {
 */
unsigned long   lowmem_reserve[MAX_NR_ZONES];

-   struct per_cpu_pageset  pageset[NR_CPUS];
+   struct per_cpu_pageset  *pageset[NR_CPUS];

/*
 * free areas of different sizes
Index: linux-2.6.11/init/main.c
===
--- linux-2.6.11.orig/init/main.c   

Re: Mac mini sound woes

2005-03-29 Thread Lee Revell
On Wed, 2005-03-30 at 03:48 +0200, Marcin Dalecki wrote:
> On 2005-03-30, at 01:39, Benjamin Herrenschmidt wrote:
> > Look at the pile of junk that are most winmodem driver implementations,
> > nothing I want to see in the kernel ever. Those things should be in
> > userland.
> 
> You are joking? Linux IS NOT an RT OS.

Are you joking?  Any system that can capture audio, do a little DSP on
it and play it back without skipping can drive a Winmodem.  Are you
saying Linux can't possibly do that because it's not an RTOS?

I bet you could implement a Winmodem driver as a JACK client.

Lee





Re: Aligning file system data

2005-03-29 Thread Barry K. Nathan
On Tue, Mar 29, 2005 at 11:32:16PM -0500, John Richard Moser wrote:
> Does crossing a
> track boundary incur anything expensive?

AFAIK, yes. It's going to involve some kind of seeking (even a head
switch needs microjogging on modern drives), and it will certainly add
latency (although I don't remember how much, off the top of my head).

However, trying to control this from the kernel may be vastly harder
than you're expecting (assuming a modern hard drive). You may want to
look at these pages for more info:

http://www.storagereview.com/guide2000/ref/hdd/geom/tracksZBR.html
http://www.storagereview.com/guide2000/ref/hdd/geom/geomLogical.html

Also look at the last paragraph on this page -- not the paragraph with
the "Stop" sign, but the one after it:
http://www.storagereview.com/guide2000/ref/hdd/geom/formatDefect.html


I think this could in fact be done, but it would be a lot of effort,
and the kernel would need knowledge on a per-drive-model basis (or
at least it would need a way to obtain such knowledge from user space,
and the per-model knowledge would need to be stored there somehow).
For all I know, vendor-specific commands might also be needed in order
to find out which blocks are remapped, in order to use that knowledge to
avoid changing tracks spuriously. (And one other note: Since your device
almost certainly has many tracks with well over 256 sectors in reality,
your device is actually incapable of reading or writing a single track
with a single ATA command unless it supports LBA48.)
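
To put numbers on that last note: a 28-bit LBA read/write command carries an
8-bit sector count (with 0 meaning 256), while the LBA48 variants carry a
16-bit count (with 0 meaning 65536). The sketch below just does that
division; the function names are made up for this example and the 600-sector
track in the usage note is a hypothetical figure, not a measured one.

```c
#include <stdint.h>

/* Largest transfer, in sectors, of a single ATA read/write command. */
uint32_t ata_max_sectors(int lba48)
{
	return lba48 ? 65536u : 256u;
}

/* Number of commands needed to transfer `sectors` sectors. */
uint32_t ata_cmds_needed(uint32_t sectors, int lba48)
{
	uint32_t max = ata_max_sectors(lba48);

	return sectors / max + (sectors % max != 0);
}
```

A hypothetical 600-sector track would take three commands without LBA48 and
one with it.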

-Barry K. Nathan <[EMAIL PROTECTED]>


Re: [patch 1/2] fork_connector: add a fork connector

2005-03-29 Thread Guillaume Thouvenin
On Tue, 2005-03-29 at 07:23 -0800, Paul Jackson wrote:
> Guillaume wrote:
> >   The goal of the fork connector is to inform a user space application
> > that a fork occurs in the kernel. This information (cpu ID, parent PID
> > and child PID) can be used by several user space applications. It's not
> > only for accounting. Accounting and fork_connector are two different
> > things and thus, fork_connector doesn't do the merge of any kinds of
> > data (and it will never do). 
> 
> Yes - it is clear that the fork_connector does this - inform user space
> of fork information .  I'm not saying that
> fork_connector should merge data; I'm observing that it doesn't, and
> that this would seem to serve the needs of accounting poorly.
> 
> Out of curiosity, what are these 'several user space applications?'  The
> only one I know of is this extension to bsd accounting to include
> capturing parent and child pid at fork.  Probably you've mentioned some
> other uses of fork_connector before here, but I missed it.

During the discussion some people like Erich Focht and Ram mentioned
that this information can be useful for them. I remember that Erich had
in mind something like cluster-wide pid tracking in user space. 

When I wrote "several user space applications" it was just to say that
this fork connector is not designed only for ELSA and that the fork
information is available to every listener.
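
The "every listener" idea can be pictured with a small in-process stand-in.
This is not the connector or netlink API: struct fork_event, struct fork_bus
and the functions below are hypothetical names for illustration only, and
the payload simply carries the three fields mentioned in this thread (cpu
ID, parent PID, child PID).

```c
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical payload: the three fields named in the thread. */
struct fork_event {
	int   cpu;	/* cpu the fork ran on */
	pid_t ppid;	/* parent PID */
	pid_t cpid;	/* child PID */
};

typedef void (*fork_listener_fn)(const struct fork_event *ev, void *ctx);

#define FORK_BUS_MAX 8

struct fork_bus {
	fork_listener_fn fn[FORK_BUS_MAX];
	void *ctx[FORK_BUS_MAX];
	size_t count;
};

/* Register a listener; returns 0 on success, -1 if the bus is full. */
int fork_bus_subscribe(struct fork_bus *bus, fork_listener_fn fn, void *ctx)
{
	if (bus->count == FORK_BUS_MAX)
		return -1;
	bus->fn[bus->count] = fn;
	bus->ctx[bus->count] = ctx;
	bus->count++;
	return 0;
}

/* Deliver one event to every registered listener, in subscription order. */
void fork_bus_publish(const struct fork_bus *bus, const struct fork_event *ev)
{
	size_t i;

	for (i = 0; i < bus->count; i++)
		bus->fn[i](ev, bus->ctx[i]);
}

/* Example listener: counts the events it has seen. */
void count_forks(const struct fork_event *ev, void *ctx)
{
	(void)ev;
	++*(unsigned int *)ctx;
}
```

An accounting tool and a cluster-wide pid tracker would simply be two
subscribers here; the publisher does not need to know about either.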

Regards,
Guillaume



Re: prefetch on ppc64

2005-03-29 Thread Antonio Vargas
On Wed, 30 Mar 2005 13:55:25 +1000, Paul Mackerras <[EMAIL PROTECTED]> wrote:
> Serge E. Hallyn writes:
> 
> > While investigating the inordinate performance impact one of my patches
> > seemed to be having, we tracked it down to two hlist_for_each_entry
> > loops, and finally to the prefetch instruction in the loop.
> 
> I would be interested to know what results you get if you leave the
> loops using hlist_for_each_entry but change prefetch() and prefetchw()
> to do the dcbt or dcbtst instruction only if the address is non-zero,
> like this:
> 
> static inline void prefetch(const void *x)
> {
> if (x)
> __asm__ __volatile__ ("dcbt 0,%0" : : "r" (x));
> }
> 
> static inline void prefetchw(const void *x)
> {
> if (x)
> __asm__ __volatile__ ("dcbtst 0,%0" : : "r" (x));
> }
> 
> It seems that doing a prefetch on a NULL pointer, while it doesn't
> cause a fault, does waste time looking for a translation of the zero
> address.
> 
> Paul.

I don't know exactly about POWER5, but IBM docs describe the G5 processor
as doing automatic whole-page prefetch read-ahead when it detects
linear accesses.
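
For anyone who wants to try the guarded variant outside the kernel, a rough
portable sketch follows. It assumes GCC or Clang: __builtin_prefetch() stands
in for the dcbt instruction, and struct node, prefetch_nonnull() and
list_sum() are made-up names for this example, not kernel code.

```c
#include <stddef.h>

struct node {
	struct node *next;
	int value;
};

/* Prefetch only non-NULL addresses, as in the variant quoted above.
 * __builtin_prefetch is a GCC/Clang builtin standing in for dcbt. */
static inline void prefetch_nonnull(const void *x)
{
	if (x)
		__builtin_prefetch(x);
}

/* Walk a singly linked list, prefetching the next element while the
 * current one is being processed. */
long list_sum(const struct node *head)
{
	const struct node *n;
	long sum = 0;

	for (n = head; n; n = n->next) {
		prefetch_nonnull(n->next);
		sum += n->value;
	}
	return sum;
}
```

The last iteration is the interesting one: n->next is NULL there, so the
guard skips the prefetch that would otherwise go looking for a translation
of address zero.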

-- 
Greetz, Antonio Vargas aka winden of network

http://wind.codepixel.com/

Las cosas no son lo que parecen, excepto cuando parecen lo que si son.


Re: Aligning file system data

2005-03-29 Thread Bernd Eckenfels
In article <[EMAIL PROTECTED]> you wrote:
> How likely is it that I can actually align stuff to 31.5KiB on the
> physical disk, i.e. have each block be a track?

It is not that easy to align on tracks, even on a raw partition. Some disks
have tracks of different lengths (of course, because the inner cylinders are
shorter), some report a totally different geometry than they have internally,
and the disks are happily remapping sectors.

With RAID and LVM the situation gets worse.

Why do you want to do those micro-optimizations?

With a filesystem in between you have virtually no way to align larger
files for streaming.

Let the buffer cache and prefetch do what they are intended for, and be
happy.

Greetings
Bernd


Re: Aligning file system data

2005-03-29 Thread John Richard Moser
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Well then, the verdict is reached.

My original design is based around storing related data in the same
block so that the track cache allows me to evade doing reads while I
poke around.

The design will stay the same; but the dependency on the track cache
will disappear.  I'll simply consider 32KiB or 64KiB to be a nice block
size, 64KiB being the biggest, and leverage the design on the kernel
reading whole blocks into main memory to play with at a time.

Back to designing my file system. . . .


The only lasting regret I have is that I don't have a good, fast way to
do on-disk locking for a cluster file system.  This would make my FS a
complete solution. . . .

It doesn't matter, finishing the design is a while off anyway.  I still
have to define several extended journal transaction types to support
fault tolerant dynamic resizing (grow, shrink) while running.  I don't
see how to grow left; shrinking from the left is easy enough.  Wait,
suddenly I see how to grow left:  Superblock at the end, and a bit of
magic. . . .


Robert Hancock wrote:
> John Richard Moser wrote:
> 
>> How likely is it that I can actually align stuff to 31.5KiB on the
>> physical disk, i.e. have each block be a track?
> 
> 
> I don't think this is very likely. Even being able to find out what the
> physical disk arrangement is, or whether it is consistent in terms of
> track size, etc. seems unlikely.
> 
>>
>> Rather than leveraging the track cache, would it be less expensive for
>> me to simply read in blocks totaling about 16 or 32KiB all at once?
> 
> 
> For block sizes that small I think that the kernel should be smart
> enough to do this itself, there is no need to concern with such low
> level details in the application.
> 
>> How much more latency is involved in (B) than in (C)?  Does crossing a
>> track boundary incur anything expensive?
> 
> 
> Given that both the disk and the kernel will likely read far more than
> 32KB ahead I can't see much difference other than the overhead inside
> your application..
> 

- --
All content of all messages exchanged herein are left in the
Public Domain, unless otherwise explicitly stated.

Creative brains are a valuable, limited resource. They shouldn't be
wasted on re-inventing the wheel when there are so many fascinating
new problems waiting out there.
 -- Eric Steven Raymond
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD4DBQFCSjmPhDd4aOud5P8RAgB7AJiWq4Qiyfk1G0SJa+5ZCtJ//WH8AJ9ysogo
3z6+FLvkNgyU/k0o9HBf1w==
=OPXo
-END PGP SIGNATURE-


API changes to the slab allocator for NUMA memory allocation

2005-03-29 Thread Christoph Lameter
The patch makes the following function calls available to allocate memory on
a specific node without changing the basic operation of the slab
allocator:

 kmem_cache_alloc_node(kmem_cache_t *cachep, unsigned int flags, int node);
 kmalloc_node(size_t size, unsigned int flags, int node);

These are similar to the existing node-blind functions:

 kmem_cache_alloc(kmem_cache_t *cachep, unsigned int flags);
 kmalloc(size, flags);

The implementation for kmalloc_node is a slight variation on the old
kmalloc function. kmem_cache_alloc_node was changed to pass flags and
the node information through the existing layers of the slab allocator
(which led to some minor rearrangements). The functions at the lowest
layer (kmem_getpages, cache_grow) are already node aware.
Also __alloc_percpu can call kmalloc_node now.

This patch is necessary for the pageset localization patch posted
after this patch. The pageset patch also contains results of an
AIM7 benchmark that exercises this patch.
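
The relationship between the node-aware and node-blind calls can be mimicked
in user space as follows. This is only a mock to show the shape of the
wrappers: the *_sketch names and last_node_requested are made up, malloc()
replaces the slab caches, and -1 is used for "no node preference" as in the
slab.h hunk below.

```c
#include <stdlib.h>

#define ANY_NODE_SKETCH (-1)	/* "no node preference", as in the patch */

/* Records the node the mock allocator was last asked for. */
int last_node_requested;

/* Mock of a node-aware allocation: user space has no per-node slab
 * caches here, so this only records the node and calls malloc(). */
void *kmalloc_node_sketch(size_t size, unsigned int flags, int node)
{
	(void)flags;
	last_node_requested = node;
	return malloc(size);
}

/* The node-blind call expressed through the node-aware one. */
#define kmalloc_sketch(size, flags) \
	kmalloc_node_sketch(size, flags, ANY_NODE_SKETCH)
```

Existing callers keep the two-argument form unchanged, and only callers that
care about placement need to switch to the three-argument one.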

Patch against 2.6.11.6-bk3

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

Index: linux-2.6.11/include/linux/slab.h
===
--- linux-2.6.11.orig/include/linux/slab.h  2005-03-29 15:02:20.0 -0800
+++ linux-2.6.11/include/linux/slab.h   2005-03-29 18:17:19.0 -0800
@@ -61,15 +61,6 @@ extern kmem_cache_t *kmem_cache_create(c
   void (*)(void *, kmem_cache_t *, 
unsigned long));
 extern int kmem_cache_destroy(kmem_cache_t *);
 extern int kmem_cache_shrink(kmem_cache_t *);
-extern void *kmem_cache_alloc(kmem_cache_t *, unsigned int __nocast);
-#ifdef CONFIG_NUMA
-extern void *kmem_cache_alloc_node(kmem_cache_t *, int);
-#else
-static inline void *kmem_cache_alloc_node(kmem_cache_t *cachep, int node)
-{
-   return kmem_cache_alloc(cachep, GFP_KERNEL);
-}
-#endif
 extern void kmem_cache_free(kmem_cache_t *, void *);
 extern unsigned int kmem_cache_size(kmem_cache_t *);

@@ -80,9 +71,23 @@ struct cache_sizes {
kmem_cache_t*cs_dmacachep;
 };
 extern struct cache_sizes malloc_sizes[];
-extern void *__kmalloc(size_t, unsigned int __nocast);

-static inline void *kmalloc(size_t size, unsigned int __nocast flags)
+extern void *__kmalloc_node(size_t, unsigned int __nocast, int node);
+#ifdef CONFIG_NUMA
+extern void *kmem_cache_alloc_node(kmem_cache_t *, unsigned int __nocast, int);
+#define kmem_cache_alloc(cachep, flags) kmem_cache_alloc_node(cachep, flags, -1)
+#else
+extern void *kmem_cache_alloc(kmem_cache_t *, unsigned int __nocast);
+#define kmem_cache_alloc_node(cachep, flags, node) kmem_cache_alloc(cachep, flags)
+#endif
+
+#define __kmalloc(size, flags) __kmalloc_node(size, flags, -1)
+#define kmalloc(size, flags) kmalloc_node(size, flags, -1)
+
+/*
+ * Allocating memory on a specific node.
+ */
+static inline void *kmalloc_node(size_t size, unsigned int flags, int node)
 {
if (__builtin_constant_p(size)) {
int i = 0;
@@ -98,11 +103,11 @@ static inline void *kmalloc(size_t size,
__you_cannot_kmalloc_that_much();
}
 found:
-   return kmem_cache_alloc((flags & GFP_DMA) ?
+   return kmem_cache_alloc_node((flags & GFP_DMA) ?
malloc_sizes[i].cs_dmacachep :
-   malloc_sizes[i].cs_cachep, flags);
+   malloc_sizes[i].cs_cachep, flags, node);
}
-   return __kmalloc(size, flags);
+   return __kmalloc_node(size, flags, node);
 }

 extern void *kcalloc(size_t, size_t, unsigned int __nocast);
Index: linux-2.6.11/mm/slab.c
===
--- linux-2.6.11.orig/mm/slab.c 2005-03-29 15:02:20.0 -0800
+++ linux-2.6.11/mm/slab.c  2005-03-29 15:02:27.0 -0800
@@ -676,7 +676,7 @@ static struct array_cache *alloc_arrayca
kmem_cache_t *cachep;
cachep = kmem_find_general_cachep(memsize, GFP_KERNEL);
if (cachep)
-   nc = kmem_cache_alloc_node(cachep, cpu_to_node(cpu));
+   nc = kmem_cache_alloc_node(cachep, GFP_KERNEL, cpu_to_node(cpu));
}
if (!nc)
nc = kmalloc(memsize, GFP_KERNEL);
@@ -1988,7 +1988,7 @@ bad:
 #define check_slabp(x,y) do { } while(0)
 #endif

-static void *cache_alloc_refill(kmem_cache_t *cachep, unsigned int __nocast flags)
+static void *cache_alloc_refill(kmem_cache_t *cachep, unsigned int __nocast flags, int node)
 {
int batchcount;
struct kmem_list3 *l3;
@@ -2070,7 +2070,7 @@ alloc_done:

if (unlikely(!ac->avail)) {
int x;
-   x = cache_grow(cachep, flags, -1);
+   x = cache_grow(cachep, flags, node);

// cache_grow can reenable interrupts, then ac could change.
ac = ac_data(cachep);
@@ -2140,7 +2140,7 @@ cache_alloc_debugcheck_after(kmem_cache_
 

Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-10

2005-03-29 Thread Lee Revell
On Sun, 2005-03-27 at 10:58 +0200, Ingo Molnar wrote:
> * Lee Revell <[EMAIL PROTECTED]> wrote:
> 
> > Running for several days with PREEMPT_DESKTOP, on the Athlon XP the 
> > worst latency I am seeing is ~150 usecs!  But on the C3 its about 4ms:
> 
> could you run a bit with tracing disabled (in the .config) on the C3?  
> (but wakeup timing still enabled) It may very well be tracing overhead 
> that makes those latencies that high.  Also, we'd thus have some hard 
> data on how much overhead tracing is in such a situation, on that CPU.
> 

I have not left it to run overnight yet with the swappiness set to 100,
which triggers the biggest latencies as my entire desktop is swapped
out, but so far it looks like the problem was tracing overhead.  With
timing enabled but tracing disabled the longest latency on the C3 so far
is 270 usecs.

An important giveaway is that with tracing enabled the same code path
only triggers ~200 usec latencies on the K7 but ~2ms on the C3.  Since
the longest latency with PREEMPT_DESKTOP is normally more a function of
memory bandwidth than processor speed, and the machines differ much more
in the latter, this agrees with the theory that the overhead is the
problem.

Lee



Re: Mac mini sound woes

2005-03-29 Thread Steven Rostedt
On Wed, 2005-03-30 at 03:45 +0200, Marcin Dalecki wrote:

> > I think your misunderstanding is that you believe user-space can't do
> > RT.  It's wrong.  See JACK (jackit.sf.net), for example.
> 
> I know JACK in and out. It doesn't provide what you claim.

Are you implying that "He don't know JACK!"?

Sorry, couldn't resist. Move along now, nothing to see here :-)  God
it's late, I need to go to bed.

Is that an American phrase? If so, it might not be understood elsewhere.
So, just in case others don't understand this stupid joke: there's a
phrase "You don't know Jack" which is equivalent to saying "you don't
know what you're talking about".  Which makes this kind of a pun.

-- Steve




Re: Aligning file system data

2005-03-29 Thread Robert Hancock
John Richard Moser wrote:
> How likely is it that I can actually align stuff to 31.5KiB on the
> physical disk, i.e. have each block be a track?

I don't think this is very likely. Even being able to find out what the
physical disk arrangement is, or whether it is consistent in terms of
track size, etc. seems unlikely.

> Rather than leveraging the track cache, would it be less expensive for
> me to simply read in blocks totaling about 16 or 32KiB all at once?

For block sizes that small I think that the kernel should be smart
enough to do this itself, there is no need to concern with such low
level details in the application.

> How much more latency is involved in (B) than in (C)?  Does crossing a
> track boundary incur anything expensive?

Given that both the disk and the kernel will likely read far more than
32KB ahead I can't see much difference other than the overhead inside
your application.

--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/


Aligning file system data

2005-03-29 Thread John Richard Moser
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

How likely is it that I can actually align stuff to 31.5KiB on the
physical disk, i.e. have each block be a track?

Rather than leveraging the track cache, would it be less expensive for
me to simply read in blocks totaling about 16 or 32KiB all at once?


Let's say I have two situations...

A)
  My blocks are all 31.5KiB (512 bytes/sector * 63 sectors) and aligned
to tracks.  The track cache on the disk stores the entire block, so
repeated reads to the disk are 0 ms seek.  I leverage this to read a
couple sectors at a time and seek as I care within the block while it's
cached, making several requests to the ATA device.

B)
  My blocks are all 32KiB and cross track boundaries.  All of them exist
in part in two separate tracks.  Upon reading a block, I request the
entire block and work with it in main memory.

Which situation has less overhead?

C)
  My blocks are all 31.5KiB and perfectly aligned within tracks.  I read
the entire block as in (B) and work with it in main memory.

How much more latency is involved in (B) than in (C)?  Does crossing a
track boundary incur anything expensive?
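
The arithmetic behind (B) and (C) can be checked with a few lines of C. This
is only a sketch under the assumption stated in (A), namely that every track
really holds 63 sectors of 512 bytes; the function names are made up for the
example.

```c
#include <stdint.h>

#define SECTOR_BYTES	512u
#define TRACK_SECTORS	63u	/* assumed uniform geometry, as in (A) */
#define TRACK_BYTES	(SECTOR_BYTES * TRACK_SECTORS)	/* 32256 = 31.5 KiB */

/* Returns 1 if a block of block_bytes (> 0) starting at byte `offset`
 * touches more than one track, 0 otherwise. */
int block_crosses_track(uint64_t offset, uint64_t block_bytes)
{
	uint64_t first = offset / TRACK_BYTES;
	uint64_t last = (offset + block_bytes - 1) / TRACK_BYTES;

	return first != last;
}

/* Counts how many of the first nblocks back-to-back blocks of size
 * block_bytes cross a track boundary. */
unsigned int count_crossing_blocks(uint64_t block_bytes, unsigned int nblocks)
{
	unsigned int i, crossing = 0;

	for (i = 0; i < nblocks; i++)
		crossing += block_crosses_track((uint64_t)i * block_bytes,
						block_bytes);
	return crossing;
}
```

With 32KiB blocks every block crosses a boundary, because the block is 512
bytes longer than a track; with 31.5KiB blocks laid end to end none do.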


- --
All content of all messages exchanged herein are left in the
Public Domain, unless otherwise explicitly stated.

Creative brains are a valuable, limited resource. They shouldn't be
wasted on re-inventing the wheel when there are so many fascinating
new problems waiting out there.
 -- Eric Steven Raymond
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFCSivPhDd4aOud5P8RAszeAJ4wPonhpXas8IprMBUq8/NdM57aegCdEBva
24LXB3O+7GEE0XKxPBFr1L0=
=iTEm
-END PGP SIGNATURE-


Re: [PATCH] ppc32: CPM2 PIC cleanup irq_to_siubit array

2005-03-29 Thread Kumar Gala
Done, updated patch w/comment sent to Andrew.

- kumar

On Mar 29, 2005, at 7:10 PM, Dan Malek wrote:

> On Mar 29, 2005, at 5:30 PM, Kumar Gala wrote:
>
>> Cleaned up irq_to_siubit array so we no longer need to do 1 <<
>> (31-bit), just 1 << bit.
>
> Will you please put a comment in here that indicates this array now
> has this computation done?  When I wrote it, these bit numbers
> matched the registers and the documentation, so I didn't take
> the time to explain. :-)
>
> Thanks.
>
>     -- Dan


[PATCH] ppc32: CPM2 PIC cleanup irq_to_siubit array (updated)

2005-03-29 Thread Kumar Gala
Andrew,

(Updated this patch to include a comment at Dan Malek's request.)

Cleaned up irq_to_siubit array so we no longer need to do 1 << (31-bit), 
just 1 << bit.

Signed-off-by: Kumar Gala <[EMAIL PROTECTED]>

---

diff -Nru a/arch/ppc/syslib/cpm2_pic.c b/arch/ppc/syslib/cpm2_pic.c
--- a/arch/ppc/syslib/cpm2_pic.c	2005-03-29 22:15:24 -06:00
+++ b/arch/ppc/syslib/cpm2_pic.c	2005-03-29 22:15:24 -06:00
@@ -32,15 +32,17 @@
0, 0, 0, 0, 0, 0, 0, 0
 };
 
+/* bit numbers do not match the docs, these are precomputed so the bit for
+ * a given irq is (1 << irq_to_siubit[irq]) */
 static u_char  irq_to_siubit[] = {
-   31, 16, 17, 18, 19, 20, 21, 22,
-   23, 24, 25, 26, 27, 28, 29, 30,
-   29, 30, 16, 17, 18, 19, 20, 21,
-   22, 23, 24, 25, 26, 27, 28, 31,
-0,  1,  2,  3,  4,  5,  6,  7,
-8,  9, 10, 11, 12, 13, 14, 15,
-   15, 14, 13, 12, 11, 10,  9,  8,
-7,  6,  5,  4,  3,  2,  1,  0
+0, 15, 14, 13, 12, 11, 10,  9, 
+8,  7,  6,  5,  4,  3,  2,  1, 
+2,  1, 15, 14, 13, 12, 11, 10, 
+9,  8,  7,  6,  5,  4,  3,  0, 
+   31, 30, 29, 28, 27, 26, 25, 24, 
+   23, 22, 21, 20, 19, 18, 17, 16, 
+   16, 17, 18, 19, 20, 21, 22, 23, 
+   24, 25, 26, 27, 28, 29, 30, 31, 
 };
 
 static void cpm2_mask_irq(unsigned int irq_nr)
@@ -54,7 +56,7 @@
word = irq_to_siureg[irq_nr];
 
simr = &(cpm2_immr->im_intctl.ic_simrh);
-   ppc_cached_irq_mask[word] &= ~(1 << (31 - bit));
+   ppc_cached_irq_mask[word] &= ~(1 << bit);
simr[word] = ppc_cached_irq_mask[word];
 }
 
@@ -69,7 +71,7 @@
word = irq_to_siureg[irq_nr];
 
simr = &(cpm2_immr->im_intctl.ic_simrh);
-   ppc_cached_irq_mask[word] |= (1 << (31 - bit));
+   ppc_cached_irq_mask[word] |= 1 << bit;
simr[word] = ppc_cached_irq_mask[word];
 }
 
@@ -85,9 +87,9 @@
 
simr = &(cpm2_immr->im_intctl.ic_simrh);
sipnr = &(cpm2_immr->im_intctl.ic_sipnrh);
-   ppc_cached_irq_mask[word] &= ~(1 << (31 - bit));
+   ppc_cached_irq_mask[word] &= ~(1 << bit);
simr[word] = ppc_cached_irq_mask[word];
-   sipnr[word] = 1 << (31 - bit);
+   sipnr[word] = 1 << bit;
 }
 
 static void cpm2_end_irq(unsigned int irq_nr)
@@ -103,7 +105,7 @@
word = irq_to_siureg[irq_nr];
 
simr = &(cpm2_immr->im_intctl.ic_simrh);
-   ppc_cached_irq_mask[word] |= (1 << (31 - bit));
+   ppc_cached_irq_mask[word] |= 1 << bit;
simr[word] = ppc_cached_irq_mask[word];
}
 }
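
For readers checking the arithmetic behind this change, here is a minimal sketch. The helper names are hypothetical and do not come from cpm2_pic.c; it assumes a 32-bit mask register and the MSB-first bit numbering implied by the old "31 - bit" expression. The old table held the documented bit number and every user shifted by 31 - bit; the new table stores 31 - bit once, so users shift directly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; not taken from cpm2_pic.c. */

/* Old scheme: the table holds the MSB-first bit number from the docs. */
static uint32_t mask_from_doc_bit(unsigned int doc_bit)
{
	return (uint32_t)1 << (31 - doc_bit);
}

/* New scheme: the table holds a precomputed shift count. */
static uint32_t mask_from_shift(unsigned int shift)
{
	return (uint32_t)1 << shift;
}

/* Value stored in the new table for a given documented bit number. */
static unsigned int doc_bit_to_shift(unsigned int doc_bit)
{
	return 31 - doc_bit;
}

/* Returns 1 if both schemes yield the same mask for all 32 bits. */
static int schemes_agree(void)
{
	unsigned int bit;

	for (bit = 0; bit < 32; bit++)
		if (mask_from_doc_bit(bit) !=
		    mask_from_shift(doc_bit_to_shift(bit)))
			return 0;
	return 1;
}

static void example_siubit(void)
{
	/* First two old table entries were 31 and 16; the new ones are 0 and 15. */
	assert(doc_bit_to_shift(31) == 0);
	assert(doc_bit_to_shift(16) == 15);
	assert(schemes_agree());
}
```

The first two entries of the old and new tables in the diff (31, 16 versus 0, 15) follow this mapping.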
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Mac mini sound woes

2005-03-29 Thread Lee Revell
On Wed, 2005-03-30 at 03:45 +0200, Marcin Dalecki wrote:
> > I think your misunderstanding is that you believe user-space can't do
> > RT.  It's wrong.  See JACK (jackit.sf.net), for example.
> 
> I know JACK in and out. It doesn't provide what you claim.
> 

This was just an example, to prove the point that user space can do RT
just fine.  JACK can do low latency sample accurate audio, and mixing
and volume control are fairly trivial compared to what some JACK clients
do.  If it works well enough for professional hard disk recording
systems, then it can certainly handle system sounds and playing movies
and MP3s.

And as a matter of fact you can implement all the audio needs of a
desktop system with JACK, this is what Linspire is doing for the next
release, even though it wasn't designed for this.  The system mixer is
just a JACK mixing client and each app opens ports for I/O, and only
JACK talks to the hardware (through ALSA).

The fact that OSX and Windows do this in the kernel is not a good
argument, those kernels are bloated.  Windows drivers also do things
like AC3 decoding in the kernel.  And the OSX kernel uses 16K stacks.

If audio does not work as well OOTB as on those other OSes, it's an
indication of their relative maturity vs JACK/ALSA, not an inherently
superior design.  Most audio people consider JACK + ALSA a better design
than anything in the proprietary world (CoreAudio, ASIO).

Lee





Re: Linux 2.4.30-rc3 md/ext3 problems (ext3 gurus : please check)

2005-03-29 Thread Neil Brown
On Tuesday March 29, [EMAIL PROTECTED] wrote:
> 
> Attached is the backout patch, for convenience.

Thanks.  I had another look, and think I may be able to see the
problem.  If I'm right, it is a problem with this patch.

> diff -Nru a/fs/jbd/commit.c b/fs/jbd/commit.c
> --- a/fs/jbd/commit.c 2005-03-29 18:50:55 -03:00
> +++ b/fs/jbd/commit.c 2005-03-29 18:50:55 -03:00
> @@ -92,7 +92,7 @@
>   struct buffer_head *wbuf[64];
>   int bufs;
>   int flags;
> - int err = 0;
> + int err;
>   unsigned long blocknr;
>   char *tagp = NULL;
>   journal_header_t *header;
> @@ -299,8 +299,6 @@
>   spin_unlock(&journal_datalist_lock);
>   unlock_journal(journal);
>   wait_on_buffer(bh);
> - if (unlikely(!buffer_uptodate(bh)))
> - err = -EIO;
>   /* the journal_head may have been removed now */
>   lock_journal(journal);
>   goto write_out_data;


I think the "!buffer_uptodate(bh)" test is not safe at this point as,
after the wait_on_buffer which could cause a schedule, the bh may
no longer exist, or may no longer be for the same block.  There doesn't
seem to be any locking or refcounting that would keep it valid.

Note the comment "the journal_head may have been removed now".
If the journal_head is gone, the associated buffer_head is likely gone
as well. 

I'm not certain that this is right, but it seems possible and would
explain the symptoms.  Maybe Stephen or Andrew could comment?


> --- a/mm/filemap.c 2005-03-29 18:50:55 -03:00
> +++ b/mm/filemap.c 2005-03-29 18:50:55 -03:00
> @@ -3261,12 +3261,7 @@
>   status = generic_osync_inode(inode, OSYNC_METADATA|OSYNC_DATA);
>   }
>   
> - /*
> -  * generic_osync_inode always returns 0 or negative value.
> -  * So 'status < written' is always true, and written should
> -  * be returned if status >= 0.
> -  */
> - err = (status < 0) ? status : written;
> + err = written ? written : status;
>  out:
>  
>   return err;

As an aside, this looks extremely dubious to me.

There is a loop earlier in this routine (do_generic_file_write) that
passes a piece-at-a-time of the write request to prepare_write /
commit_write.
Successes are counted in 'written'.  A failure causes the loop to
abort with a status in 'status'.

If some of the write succeeded and some failed, then I believe the
correct behaviour is to return the number of bytes that succeeded.
However this change to the return status (remember the above patch is
a reversal) causes any failure to over-ride any success. This, I
think, is wrong.
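
To make the partial-write argument concrete, here is a minimal user-space sketch. It is not the code of do_generic_file_write; the function and callback names are hypothetical. It mirrors the structure described above: a loop that counts successes in 'written', stops at the first failure with a code in 'status', and then returns using the "written ? written : status" rule that the backout restores.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-chunk callback: 0 on success, negative errno on failure. */
typedef long (*chunk_fn)(size_t offset, size_t len);

/*
 * Write 'total' bytes in pieces of at most 'chunk' bytes (chunk must be > 0).
 * Successes accumulate in 'written'; the first failure stops the loop and
 * leaves its code in 'status'.
 */
static long write_in_chunks(size_t total, size_t chunk, chunk_fn write_chunk)
{
	size_t written = 0;
	long status = 0;

	while (written < total) {
		size_t len = total - written;

		if (len > chunk)
			len = chunk;
		status = write_chunk(written, len);
		if (status < 0)
			break;
		written += len;
	}
	/* Partial success takes precedence over the error code. */
	return written ? (long)written : status;
}

/* Sample callback: everything at or beyond offset 8192 fails. */
static long fail_from_8192(size_t offset, size_t len)
{
	(void)len;
	return offset >= 8192 ? -EIO : 0;
}

/* Sample callback: every chunk fails. */
static long always_fail(size_t offset, size_t len)
{
	(void)offset;
	(void)len;
	return -EIO;
}

static void example_partial_write(void)
{
	/* Two 4096-byte chunks succeed before the third fails. */
	assert(write_in_chunks(16384, 4096, fail_from_8192) == 8192);
	/* Nothing was written, so the error itself is reported. */
	assert(write_in_chunks(16384, 4096, always_fail) == -EIO);
}
```

With "(status < 0) ? status : written" instead, the first call would return -EIO even though 8192 bytes had already been written.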

NeilBrown


Re: i386/x86_64 segment register issuses (Re: PATCH: Fix x86 segment register access)

2005-03-29 Thread H. J. Lu
On Tue, Mar 29, 2005 at 06:44:18PM -0800, Linus Torvalds wrote:
> 
> 
> On Tue, 29 Mar 2005, H. J. Lu wrote:
> > 
> > > the smaller and faster version do not want to just rely on gas
> > > automatically getting it right, especially since gas has historically been
> > > very very bad at getting things right.
> > 
> > We are fixing those issues in assembler. If people run into problems
> > like that with gas, they can report them. They will be fixed.
> 
> It's fine if gas fixes things. It's not fine if gas breaks things that 
> used to work, for no really good reason.
> 
> > > What is the advantage of not allowing "movl %ds,mem"? Really? Especially
> > > since I suspect the kernel is pretty much the only one who does this, and
> > > the kernel really does do it on purpose. The kernel explicitly wants the
> > > 32-bit version, knowing that the upper bits are undefined.
> > > 
> > 
> > Kernel has
> > 
> > unsigned gsindex;
> > asm volatile("movl %%gs,%0" : "=g" (gsindex));
> 
> Ok, that's a real x86-64 bug, it seems. Andi, please fix, preferably by 
> just making the "g" be a "r".
> 
> However, your argument isn't very valid, since:
> 
> > The new assembler will make sure that it won't happen.
> 
> Not true, since the suggestion was just to change all segment "movl"  
> things to "mov", at which point the same old bug is still there, and the
> assembler didn't really help us at all.

The new assembler won't accept

movl %gs,128(%rsp)

It makes it harder to generate binary code the user doesn't intend.  FWIW,
what I suggested is in

http://sourceware.org/ml/binutils/2005-03/msg00873.html

There are things like

-   asm volatile("movl %%fs,%0" : "=g" (fsindex)); 
+   asm volatile("movl %%fs,%0" : "=r" (fsindex)); 

> 
> See the problem? You're not actually protecting anything. The change just 
> makes it _harder_ to make sizes explicit, and suddenly we have to trust an 
> assembler to be clever about sizes, when that assembler historically has 
> definitely _not_ been very clever about them at all. 
> 

There is no such instruction as "movl %ds,(%eax)". The old assembler
accepts it and turns it into "movw %ds,(%eax)". It won't catch problems
like

unsigned fsindex;
asm volatile("movl %%fs,%0" : "=m" (fsindex)); 

The "movw %ds,(%eax)" bug was fixed in binutils 2.15.94.0.1. Gas no
longer generates 0x66 for it. If you find gas preventing you from doing
what the hardware supports, I will be happy to fix it.


H.J.


Re: prefetch on ppc64

2005-03-29 Thread Paul Mackerras
Serge E. Hallyn writes:

> While investigating the inordinate performance impact one of my patches
> seemed to be having, we tracked it down to two hlist_for_each_entry
> loops, and finally to the prefetch instruction in the loop.

I would be interested to know what results you get if you leave the
loops using hlist_for_each_entry but change prefetch() and prefetchw()
to do the dcbt or dcbtst instruction only if the address is non-zero,
like this:

static inline void prefetch(const void *x)
{
	if (x)
		__asm__ __volatile__ ("dcbt 0,%0" : : "r" (x));
}

static inline void prefetchw(const void *x)
{
	if (x)
		__asm__ __volatile__ ("dcbtst 0,%0" : : "r" (x));
}

It seems that doing a prefetch on a NULL pointer, while it doesn't
cause a fault, does waste time looking for a translation of the zero
address.
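
For anyone who wants to try the same idea outside the kernel, here is a minimal portable sketch. It assumes GCC or Clang and uses the compiler builtin __builtin_prefetch rather than the dcbt instruction; the node type and function names are hypothetical. The point is only the NULL guard: the last iteration of a list walk always has a NULL next pointer, and the guard skips the hint in that case.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical singly linked list node, for illustration only. */
struct node {
	struct node *next;
	int value;
};

/* Guarded prefetch: issue no hint at all for a NULL pointer. */
static inline void prefetch_nonnull(const void *p)
{
	if (p)
		__builtin_prefetch(p);
}

/* Walk the list, hinting the next node before working on this one. */
static int sum_list(const struct node *head)
{
	const struct node *n;
	int sum = 0;

	for (n = head; n; n = n->next) {
		prefetch_nonnull(n->next);	/* NULL on the last node */
		sum += n->value;
	}
	return sum;
}

static void example_prefetch(void)
{
	struct node c = { NULL, 3 };
	struct node b = { &c, 2 };
	struct node a = { &b, 1 };

	assert(sum_list(&a) == 6);
	assert(sum_list(NULL) == 0);
}
```

Whether the guard helps on a given machine is something only measurement can tell; the sketch shows the shape of the change, not its effect.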

Paul.


[3/5] Orinoco merge updates, part the fourth: kill dump_recs

2005-03-29 Thread David Gibson
Remove the dump_recs debugging iwpriv command.  It will be replaced
later with the simpler and more flexible get_rid command.

Signed-off-by: David Gibson <[EMAIL PROTECTED]>

Index: working-2.6/drivers/net/wireless/orinoco.c
===
--- working-2.6.orig/drivers/net/wireless/orinoco.c 2005-03-24 15:57:43.0 +1100
+++ working-2.6/drivers/net/wireless/orinoco.c  2005-03-24 15:57:44.0 +1100
@@ -607,7 +607,6 @@
 static int orinoco_ioctl(struct net_device *dev, struct ifreq *rq, int cmd);
 static int __orinoco_program_rids(struct net_device *dev);
 static void __orinoco_set_multicast_list(struct net_device *dev);
-static int orinoco_debug_dump_recs(struct net_device *dev);
 
 //
 /* Internal helper functions*/
@@ -3861,7 +3860,6 @@
{ SIOCIWFIRSTPRIV + 0x7, 0,
  IW_PRIV_TYPE_INT | IW_PRIV_SIZE_FIXED | 1,
  "get_ibssport" },
-   { SIOCIWLASTPRIV, 0, 0, "dump_recs" },
};
 
wrq->u.data.length = sizeof(privtab) / sizeof(privtab[0]);
@@ -3949,14 +3947,6 @@
err = orinoco_ioctl_getibssport(dev, wrq);
break;
 
-   case SIOCIWLASTPRIV:
-   err = orinoco_debug_dump_recs(dev);
-   if (err)
-   printk(KERN_ERR "%s: Unable to dump records (%d)\n",
-  dev->name, err);
-   break;
-
-
default:
err = -EOPNOTSUPP;
}
@@ -3970,187 +3960,6 @@
return err;
 }
 
-struct {
-   u16 rid;
-   char *name;
-   int displaytype;
-#define DISPLAY_WORDS  0
-#define DISPLAY_BYTES  1
-#define DISPLAY_STRING 2
-#define DISPLAY_XSTRING3
-} record_table[] = {
-#define DEBUG_REC(name,type) { HERMES_RID_##name, #name, DISPLAY_##type }
-   DEBUG_REC(CNFPORTTYPE,WORDS),
-   DEBUG_REC(CNFOWNMACADDR,BYTES),
-   DEBUG_REC(CNFDESIREDSSID,STRING),
-   DEBUG_REC(CNFOWNCHANNEL,WORDS),
-   DEBUG_REC(CNFOWNSSID,STRING),
-   DEBUG_REC(CNFOWNATIMWINDOW,WORDS),
-   DEBUG_REC(CNFSYSTEMSCALE,WORDS),
-   DEBUG_REC(CNFMAXDATALEN,WORDS),
-   DEBUG_REC(CNFPMENABLED,WORDS),
-   DEBUG_REC(CNFPMEPS,WORDS),
-   DEBUG_REC(CNFMULTICASTRECEIVE,WORDS),
-   DEBUG_REC(CNFMAXSLEEPDURATION,WORDS),
-   DEBUG_REC(CNFPMHOLDOVERDURATION,WORDS),
-   DEBUG_REC(CNFOWNNAME,STRING),
-   DEBUG_REC(CNFOWNDTIMPERIOD,WORDS),
-   DEBUG_REC(CNFMULTICASTPMBUFFERING,WORDS),
-   DEBUG_REC(CNFWEPENABLED_AGERE,WORDS),
-   DEBUG_REC(CNFMANDATORYBSSID_SYMBOL,WORDS),
-   DEBUG_REC(CNFWEPDEFAULTKEYID,WORDS),
-   DEBUG_REC(CNFDEFAULTKEY0,BYTES),
-   DEBUG_REC(CNFDEFAULTKEY1,BYTES),
-   DEBUG_REC(CNFMWOROBUST_AGERE,WORDS),
-   DEBUG_REC(CNFDEFAULTKEY2,BYTES),
-   DEBUG_REC(CNFDEFAULTKEY3,BYTES),
-   DEBUG_REC(CNFWEPFLAGS_INTERSIL,WORDS),
-   DEBUG_REC(CNFWEPKEYMAPPINGTABLE,WORDS),
-   DEBUG_REC(CNFAUTHENTICATION,WORDS),
-   DEBUG_REC(CNFMAXASSOCSTA,WORDS),
-   DEBUG_REC(CNFKEYLENGTH_SYMBOL,WORDS),
-   DEBUG_REC(CNFTXCONTROL,WORDS),
-   DEBUG_REC(CNFROAMINGMODE,WORDS),
-   DEBUG_REC(CNFHOSTAUTHENTICATION,WORDS),
-   DEBUG_REC(CNFRCVCRCERROR,WORDS),
-   DEBUG_REC(CNFMMLIFE,WORDS),
-   DEBUG_REC(CNFALTRETRYCOUNT,WORDS),
-   DEBUG_REC(CNFBEACONINT,WORDS),
-   DEBUG_REC(CNFAPPCFINFO,WORDS),
-   DEBUG_REC(CNFSTAPCFINFO,WORDS),
-   DEBUG_REC(CNFPRIORITYQUSAGE,WORDS),
-   DEBUG_REC(CNFTIMCTRL,WORDS),
-   DEBUG_REC(CNFTHIRTY2TALLY,WORDS),
-   DEBUG_REC(CNFENHSECURITY,WORDS),
-   DEBUG_REC(CNFGROUPADDRESSES,BYTES),
-   DEBUG_REC(CNFCREATEIBSS,WORDS),
-   DEBUG_REC(CNFFRAGMENTATIONTHRESHOLD,WORDS),
-   DEBUG_REC(CNFRTSTHRESHOLD,WORDS),
-   DEBUG_REC(CNFTXRATECONTROL,WORDS),
-   DEBUG_REC(CNFPROMISCUOUSMODE,WORDS),
-   DEBUG_REC(CNFBASICRATES_SYMBOL,WORDS),
-   DEBUG_REC(CNFPREAMBLE_SYMBOL,WORDS),
-   DEBUG_REC(CNFSHORTPREAMBLE,WORDS),
-   DEBUG_REC(CNFWEPKEYS_AGERE,BYTES),
-   DEBUG_REC(CNFEXCLUDELONGPREAMBLE,WORDS),
-   DEBUG_REC(CNFTXKEY_AGERE,WORDS),
-   DEBUG_REC(CNFAUTHENTICATIONRSPTO,WORDS),
-   DEBUG_REC(CNFBASICRATES,WORDS),
-   DEBUG_REC(CNFSUPPORTEDRATES,WORDS),
-   DEBUG_REC(CNFTICKTIME,WORDS),
-   DEBUG_REC(CNFSCANREQUEST,WORDS),
-   DEBUG_REC(CNFJOINREQUEST,WORDS),
-   DEBUG_REC(CNFAUTHENTICATESTATION,WORDS),
-   DEBUG_REC(CNFCHANNELINFOREQUEST,WORDS),
-   DEBUG_REC(MAXLOADTIME,WORDS),
-   DEBUG_REC(DOWNLOADBUFFER,WORDS),
-   DEBUG_REC(PRIID,WORDS),
-   DEBUG_REC(PRISUPRANGE,WORDS),
-   DEBUG_REC(CFIACTRANGES,WORDS),
-   DEBUG_REC(NICSERNUM,XSTRING),
-   DEBUG_REC(NICID,WORDS),
-   

[4/5] Orinoco merge updates, part the fourth: don't set channel in managed mode

2005-03-29 Thread David Gibson
Don't attempt to manually set the channel in infrastructure mode, the
firmware doesn't like that much.  Also don't attempt to override the
firmware's default channel number for IBSS mode (I believe default
channel can vary by regulatory domain).

Signed-off-by: David Gibson <[EMAIL PROTECTED]>

Index: working-2.6/drivers/net/wireless/orinoco.c
===
--- working-2.6.orig/drivers/net/wireless/orinoco.c 2005-03-11 15:07:08.0 +1100
+++ working-2.6/drivers/net/wireless/orinoco.c  2005-03-11 16:13:31.0 +1100
@@ -1615,17 +1615,15 @@
return err;
}
/* Set the channel/frequency */
-   if (priv->channel == 0) {
-   printk(KERN_DEBUG "%s: Channel is 0 in __orinoco_program_rids()\n", dev->name);
-   if (priv->createibss)
-   priv->channel = 10;
-   }
-   err = hermes_write_wordrec(hw, USER_BAP, HERMES_RID_CNFOWNCHANNEL,
-  priv->channel);
-   if (err) {
-   printk(KERN_ERR "%s: Error %d setting channel\n",
-  dev->name, err);
-   return err;
+   if (priv->channel != 0 && priv->iw_mode != IW_MODE_INFRA) {
+   err = hermes_write_wordrec(hw, USER_BAP,
+  HERMES_RID_CNFOWNCHANNEL,
+  priv->channel);
+   if (err) {
+   printk(KERN_ERR "%s: Error %d setting channel %d\n",
+  dev->name, err, priv->channel);
+   return err;
+   }
}
 
if (priv->has_ibss) {
@@ -2405,7 +2403,7 @@
/* By default use IEEE/IBSS ad-hoc mode if we have it */
priv->prefer_port3 = priv->has_port3 && (! priv->has_ibss);
set_port_type(priv);
-   priv->channel = 10; /* default channel, more-or-less arbitrary */
+   priv->channel = 0; /* use firmware default */
 
priv->promiscuous = 0;
priv->wep_on = 0;

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/people/dgibson


[1/5] Orinoco merge updates, part the fourth: wireless stats updates

2005-03-29 Thread David Gibson
Minor updates/bugfixes to the handling of wireless statistics.

Signed-off-by: David Gibson <[EMAIL PROTECTED]>

Index: working-2.6/drivers/net/wireless/orinoco.c
===
--- working-2.6.orig/drivers/net/wireless/orinoco.c 2005-02-25 15:47:53.314373136 +1100
+++ working-2.6/drivers/net/wireless/orinoco.c  2005-02-25 16:20:13.951351472 +1100
@@ -686,7 +686,7 @@
struct orinoco_private *priv = netdev_priv(dev);
hermes_t *hw = &priv->hw;
struct iw_statistics *wstats = &priv->wstats;
-   int err = 0;
+   int err;
unsigned long flags;
 
if (! netif_device_present(dev)) {
@@ -695,9 +695,21 @@
return NULL; /* FIXME: Can we do better than this? */
}
 
+   /* If busy, return the old stats.  Returning NULL may cause
+* the interface to disappear from /proc/net/wireless */
if (orinoco_lock(priv, &flags) != 0)
-   return NULL;  /* FIXME: Erg, we've been signalled, how
-  * do we propagate this back up? */
+   return wstats;
+
+   /* We can't really wait for the tallies inquiry command to
+* complete, so we just use the previous results and trigger
+* a new tallies inquiry command for next time - Jean II */
+   /* FIXME: Really we should wait for the inquiry to come back -
+* as it is the stats we give don't make a whole lot of sense.
+* Unfortunately, it's not clear how to do that within the
+* wireless extensions framework: I think we're in user
+* context, but a lock seems to be held by the time we get in
+* here so we're not safe to sleep here. */
+   hermes_inquire(hw, HERMES_INQ_TALLIES);
 
if (priv->iw_mode == IW_MODE_ADHOC) {
memset(&wstats->qual, 0, sizeof(wstats->qual));
@@ -716,25 +728,16 @@
 
err = HERMES_READ_RECORD(hw, USER_BAP,
 HERMES_RID_COMMSQUALITY, &cq);
-   
-   wstats->qual.qual = (int)le16_to_cpu(cq.qual);
-   wstats->qual.level = (int)le16_to_cpu(cq.signal) - 0x95;
-   wstats->qual.noise = (int)le16_to_cpu(cq.noise) - 0x95;
-   wstats->qual.updated = 7;
+
+   if (!err) {
+   wstats->qual.qual = (int)le16_to_cpu(cq.qual);
+   wstats->qual.level = (int)le16_to_cpu(cq.signal) - 0x95;
+   wstats->qual.noise = (int)le16_to_cpu(cq.noise) - 0x95;
+   wstats->qual.updated = 7;
+   }
}
 
-   /* We can't really wait for the tallies inquiry command to
-* complete, so we just use the previous results and trigger
-* a new tallies inquiry command for next time - Jean II */
-   /* FIXME: We're in user context (I think?), so we should just
-   wait for the tallies to come through */
-   err = hermes_inquire(hw, HERMES_INQ_TALLIES);
-   
orinoco_unlock(priv, &flags);
-
-   if (err)
-   return NULL;
-   
return wstats;
 }
 


-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/people/dgibson


[5/5] Orinoco merge updates, part the fourth: consolidate allocation code

2005-03-29 Thread David Gibson
Consolidate allocation of firmware buffers.  In the process, remove
duplication of a workaround for an old symbol firmware bug, and fix a
bug where we could retry the workaround, even if it already failed to
help.

Signed-off-by: David Gibson <[EMAIL PROTECTED]>

Index: working-2.6/drivers/net/wireless/orinoco.c
===
--- working-2.6.orig/drivers/net/wireless/orinoco.c 2005-03-11 16:13:31.0 +1100
+++ working-2.6/drivers/net/wireless/orinoco.c  2005-03-11 16:21:55.0 +1100
@@ -1418,7 +1418,7 @@
return err;
 
err = hermes_allocate(hw, priv->nicbuf_size, &priv->txfid);
-   if (err == -EIO) {
+   if (err == -EIO && priv->nicbuf_size > TX_NICBUF_SIZE_BUG) {
/* Try workaround for old Symbol firmware bug */
printk(KERN_WARNING "%s: firmware ALLOC bug detected "
   "(old Symbol firmware?). Trying to work around... ",
@@ -2270,7 +2270,7 @@
priv->nicbuf_size = IEEE802_11_FRAME_LEN + ETH_HLEN;
 
/* Initialize the firmware */
-   err = hermes_init(hw);
+   err = orinoco_reinit_firmware(dev);
if (err != 0) {
printk(KERN_ERR "%s: failed to initialize firmware (err = %d)\n",
   dev->name, err);
@@ -2409,25 +2409,6 @@
priv->wep_on = 0;
priv->tx_key = 0;
 
-   err = hermes_allocate(hw, priv->nicbuf_size, &priv->txfid);
-   if (err == -EIO) {
-   /* Try workaround for old Symbol firmware bug */
-   printk(KERN_WARNING "%s: firmware ALLOC bug detected "
-  "(old Symbol firmware?). Trying to work around... ",
-  dev->name);
-   
-   priv->nicbuf_size = TX_NICBUF_SIZE_BUG;
-   err = hermes_allocate(hw, priv->nicbuf_size, &priv->txfid);
-   if (err)
-   printk("failed!\n");
-   else
-   printk("ok.\n");
-   }
-   if (err) {
-   printk("%s: Error %d allocating Tx buffer\n", dev->name, err);
-   goto out;
-   }
-
/* Make the hardware available, as long as it hasn't been
 * removed elsewhere (e.g. by PCMCIA hot unplug) */
spin_lock_irq(&priv->lock);
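
The retry fix in the first hunk can be illustrated with a minimal sketch. Everything here is hypothetical: the helper names, the stand-in allocators and the value of FALLBACK_SIZE (which only plays the role of TX_NICBUF_SIZE_BUG) are assumptions for illustration, not driver code. The idea is that the smaller-buffer workaround is only worth retrying while the current size is still larger than the fallback.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical value; stands in for the driver's TX_NICBUF_SIZE_BUG. */
#define FALLBACK_SIZE 1024

static int alloc_calls;	/* counts attempts, for the usage below */

/* Stand-in for firmware that only accepts small buffers. */
static int small_only_alloc(size_t size)
{
	alloc_calls++;
	return size <= FALLBACK_SIZE ? 0 : -EIO;
}

/* Stand-in for firmware that fails regardless of size. */
static int broken_alloc(size_t size)
{
	(void)size;
	alloc_calls++;
	return -EIO;
}

/* Try the requested size, falling back once to the smaller size. */
static int alloc_with_fallback(size_t *size, int (*try_alloc)(size_t))
{
	int err = try_alloc(*size);

	/* Retry only if shrinking can still change the outcome. */
	if (err == -EIO && *size > FALLBACK_SIZE) {
		*size = FALLBACK_SIZE;
		err = try_alloc(*size);
	}
	return err;
}

static void example_alloc(void)
{
	size_t size = 4096;

	alloc_calls = 0;
	assert(alloc_with_fallback(&size, small_only_alloc) == 0);
	assert(size == FALLBACK_SIZE && alloc_calls == 2);

	/* Already at the fallback size: one attempt, no pointless retry. */
	alloc_calls = 0;
	assert(alloc_with_fallback(&size, broken_alloc) == -EIO);
	assert(alloc_calls == 1);
}
```

Without the size check, the second call would attempt the same fallback allocation again even though it had already been tried.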

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/people/dgibson


[2/5] Orinoco merge updates, part the fourth: ignore_disconnect flag

2005-03-29 Thread David Gibson
Adds an ignore_disconnect module parameter.  When enabled, the driver
will continue attempting to send packets even when the firmware has
told us we've lost our link to the AP.  On some firmwares this
substantially increases the usable range of the card (presumably
because we have an intermittent connection, but the firmware is able
to queue the packets for us until we're connected again).  On some
other cards, it causes the firmware to fall in a screaming heap :(
(hence, default off).

Signed-off-by: David Gibson <[EMAIL PROTECTED]>

Index: working-2.6/drivers/net/wireless/orinoco.c
===
--- working-2.6.orig/drivers/net/wireless/orinoco.c 2005-03-11 14:44:09.0 +1100
+++ working-2.6/drivers/net/wireless/orinoco.c  2005-03-11 14:51:33.0 +1100
@@ -492,6 +492,9 @@
 static int suppress_linkstatus; /* = 0 */
 module_param(suppress_linkstatus, bool, 0644);
 MODULE_PARM_DESC(suppress_linkstatus, "Don't log link status changes");
+static int ignore_disconnect; /* = 0 */
+module_param(ignore_disconnect, int, 0644);
+MODULE_PARM_DESC(ignore_disconnect, "Don't report lost link to the network layer");
 
 //
 /* Compile time configuration and compatibility stuff   */
@@ -1320,7 +1323,7 @@
 
if (connected)
netif_carrier_on(dev);
-   else
+   else if (!ignore_disconnect)
netif_carrier_off(dev);
 
if (newstatus != priv->last_linkstatus)

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/people/dgibson


[0/5] Orinoco merge updates, part the fourth

2005-03-29 Thread David Gibson
Hi Jeff, please apply:

Here's yet another batch of orinoco updates.  Smaller and less
significant than the last, this is basically a handful of remaining
small updates before tackling the big changes (wext v15, monitor and
scanning).  Patches are:
orinoco-wstats-updates
Updates and bugfixes to wireless stats handling
orinoco-ignore-disconnect
Add the ignore_disconnect module parameter
orinoco-kill-dump-recs
Remove ugly debugging code, to be replaced later with
simpler and more useful stuff
orinoco-no-infra-channel
Don't attempt to set channel in managed mode, the
firmware doesn't like that.
orinoco-consolidate-allocate
Remove some duplicated code for firmware buffer
allocation, removing a bug in a hw workaround in the
process.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/people/dgibson


prefetch on ppc64

2005-03-29 Thread Serge E. Hallyn
Hi,

While investigating the inordinate performance impact one of my patches
seemed to be having, we tracked it down to two hlist_for_each_entry
loops, and finally to the prefetch instruction in the loop.

The machine I'm testing on has 4 POWER5 1.5GHz CPUs and 16GB RAM.  I was
mostly using dbench (v3.03) in runs of 50 and 100 on an ext2 system.
Kernel was 2.6.11-rc5.

I've not had much of a chance to test on x86, but the few tests I've run
have shown that prefetch does improve performance there.  From what I've
seen this seems to be a ppc (perhaps ppc64) specific symptom.

Following are two sets of interesting results on the ppc64 system.  The
first is on a stock 2.6.11-rc5 kernel.  The actual stock kernel gave the
following results for 100 runs of dbench:
# elements: 100, mean 862.580380, variance 5.973441, std dev 2.444062

When I patched fs/dcache.c to replace the three hlist_for_each_entry{,_rcu}
rules with manual loops, as shown in the attached file dcache-nohlist.patch,
I got:
# elements: 50, mean 881.804980, variance 10.695022, std dev 3.270325

The next set of results is based on 2.6.11-rc5 with the LSM stacking
patches (from www.sf.net/projects/lsm-stacker).  I was understandably
alarmed to find the original patched version gave me:
# elements: 100, mean 797.654870, variance 7.503588, std dev 2.739268

The code which I determined to be responsible contained two
list_for_each_entry loops.  Replacing one with a manual loop gave me
# elements: 50, mean 835.859980, variance 81.901719, std dev 9.049957
and replacing the second gave me
# elements: 50, mean 846.541060, variance 17.095401, std dev 4.134658

Finally I followed Paul McKenney's suggestion and just commented out the
ppc definition of prefetch altogether, which gave me:

# elements: 50, mean 860.823880, variance 47.567428, std dev 6.896914

I am currently testing this same patch against a non-stacking kernel.

thanks,
-serge
Index: linux-2.6.11-rc5-nostack/fs/dcache.c
===
--- linux-2.6.11-rc5-nostack.orig/fs/dcache.c   2005-03-11 15:19:58.0 -0600
+++ linux-2.6.11-rc5-nostack/fs/dcache.c        2005-03-26 01:35:29.0 -0600
@@ -656,7 +656,7 @@
do {
found = 0;
spin_lock(&dcache_lock);
-   hlist_for_each(lp, head) {
+   for (lp=head->first; lp; lp = lp->next) {
struct dentry *this = hlist_entry(lp, struct dentry, d_hash);
if (!list_empty(&this->d_lru)) {
dentry_stat.nr_unused--;
@@ -1047,7 +1047,9 @@
 
rcu_read_lock();

-   hlist_for_each_rcu(node, head) {
+   for (node=head->first; node;
+   ({ node = node->next; smp_read_barrier_depends(); }))
+   {
struct dentry *dentry; 
struct qstr *qstr;
 
@@ -1123,7 +1125,7 @@
 
spin_lock(&dcache_lock);
base = d_hash(dparent, dentry->d_name.hash);
-   hlist_for_each(lhp,base) { 
+   for (lhp=base->first; lhp; lhp = lhp->next) {
/* hlist_for_each_rcu() not required for d_hash list
 * as it is parsed under dcache_lock
 */


Re: [PATCH] embarassing typo

2005-03-29 Thread Måns Rullgård
Vicente Feito <[EMAIL PROTECTED]> writes:

> On Tuesday 29 March 2005 09:58 pm, you wrote:
>> Måns Rullgård wrote:
>> > "Ronald S. Bultje" <[EMAIL PROTECTED]> writes:
>> >>--- linux-2.6.5/drivers/media/video/zr36050.c.old 16 Sep 2004 22:53:27 - 1.2
>> >>+++ linux-2.6.5/drivers/media/video/zr36050.c 29 Mar 2005 20:30:23 -
>> >>@@ -419,7 +419,7 @@
>> >>  dri_data[2] = 0x00;
>> >>  dri_data[3] = 0x04;
>> >>  dri_data[4] = ptr->dri >> 8;
>> >>- dri_data[5] = ptr->dri * 0xff;
>> >>+ dri_data[5] = ptr->dri & 0xff;
>> >
>> > Hey, that's a nice obfuscation of a simple negation.
>>
>> It's not a negation.  This statement always assigns zero to
>> dri_data[5] if dri_data is char[].  Looks like gcc isn't catching
>> this problem.
>>
> As long as the variable doesn't overflow you would have a
> negation. You shouldn't do dri_data[5] = ptr->dri * 0xff; if
> ptr->dri is 255, but if ptr->dri = 1 (as is set in
> zr36050_setup) then you would be getting the negation, -1. The
> direct rendering support is a flag afaik, so in this case I believe
> it is a worthy obfuscated C negation :)
> btw, are you sure about this patch? I would contact the maintainer
> first, because and'ing that doesn't make much sense...

It seems pretty obvious to me, that the code is supposed to store the
high byte in dri_data[4], and the low byte in dri_data[5].  Mistyping
& as * doesn't seem too unlikely, either.
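
A minimal sketch of the byte split described above, with hypothetical helper names (this is not code from zr36050.c): the high byte comes from a right shift by 8, the low byte from a mask with 0xff.

```c
#include <assert.h>
#include <stdint.h>

/* Split a 16-bit value into its high and low bytes. */
static void split_u16(uint16_t value, uint8_t *hi, uint8_t *lo)
{
	*hi = (uint8_t)(value >> 8);
	*lo = (uint8_t)(value & 0xff);
}

/* Inverse of split_u16(), used to check the round trip. */
static uint16_t join_u16(uint8_t hi, uint8_t lo)
{
	return (uint16_t)((hi << 8) | lo);
}

static void example_split(void)
{
	uint16_t value = 0x0104;
	uint8_t hi, lo;

	split_u16(value, &hi, &lo);
	assert(hi == 0x01 && lo == 0x04);
	assert(join_u16(hi, lo) == value);

	/* With '*' in place of '&' the low byte comes out different. */
	assert((uint8_t)(value * 0xff) != lo);
}
```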

-- 
Måns Rullgård
[EMAIL PROTECTED]



Re: [PATCH] embarassing typo

2005-03-29 Thread Phil Howard
On Wed, Mar 30, 2005 at 04:07:39AM +0200, Måns Rullgård wrote:

| Michael Tokarev <[EMAIL PROTECTED]> writes:
| 
| > Måns Rullgård wrote:
| >> "Ronald S. Bultje" <[EMAIL PROTECTED]> writes:
| >>
| >>>--- linux-2.6.5/drivers/media/video/zr36050.c.old  16 Sep 2004 22:53:27 -  1.2
| >>>+++ linux-2.6.5/drivers/media/video/zr36050.c  29 Mar 2005 20:30:23 -
| >>>@@ -419,7 +419,7 @@
| >>>   dri_data[2] = 0x00;
| >>>   dri_data[3] = 0x04;
| >>>   dri_data[4] = ptr->dri >> 8;
| >>>-  dri_data[5] = ptr->dri * 0xff;
| >>>+  dri_data[5] = ptr->dri & 0xff;
| >> Hey, that's a nice obfuscation of a simple negation.
| >
| > It's not a negation.  This statement always assigns zero to
| > dri_data[5] if dri_data is char[].
| 
| Sure about that?
| 
| __u16 i;
| char c;
| i = 1; c = i * 255; /* c = 255 = -1 */
| i = 2; c = i * 255; /* c = 510 & 0xff = 254 = -2 */
| ...
| 
| Looks like negation to me.

Sure it's negation because 255 _is_ 256 - 1.  Basic finite math.

( x * 256 ) mod 256 == 0
( ( x * 256 ) - ( x * 1 ) ) mod 256 == - ( x * 1 )
( x * ( 256 - 1 ) ) mod 256 == - ( x * 1 )
( x * 255 ) mod 256 == - ( x * 1 )
( x * 255 ) mod 256 == - x

Now what I am interested in is if gcc optimized it to a faster negation
or subtraction instruction.
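
The identity is easy to check exhaustively. The sketch below uses hypothetical helper names and unsigned 8-bit results (uint8_t rather than plain char, to keep signedness out of the picture), and compares x * 255 truncated to 8 bits against -x truncated to 8 bits for every 16-bit x. It says nothing about what code gcc actually emits.

```c
#include <assert.h>
#include <stdint.h>

/* Low 8 bits of x * 255. */
static uint8_t times_255(uint16_t x)
{
	return (uint8_t)(x * 255u);
}

/* Low 8 bits of -x, computed with unsigned wraparound. */
static uint8_t negate_mod_256(uint16_t x)
{
	return (uint8_t)(0u - x);
}

/* Returns 1 if (x * 255) mod 256 == (-x) mod 256 for every 16-bit x. */
static int identity_holds(void)
{
	uint32_t x;

	for (x = 0; x <= 0xffff; x++)
		if (times_255((uint16_t)x) != negate_mod_256((uint16_t)x))
			return 0;
	return 1;
}

static void example_negation(void)
{
	assert(times_255(1) == 255);	/* -1 mod 256 */
	assert(times_255(2) == 254);	/* -2 mod 256 */
	assert(identity_holds());
}
```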

-- 
-
| Phil Howard KA9WGN   | http://linuxhomepage.com/  http://ham.org/ |
| (first name) at ipal.net | http://phil.ipal.org/   http://ka9wgn.ham.org/ |
-


Re: Do not misuse Coverity please (Was: sound/oss/cs46xx.c: fix a check after use)

2005-03-29 Thread Horst von Brand
"Jean Delvare" <[EMAIL PROTECTED]> said:

[Attributions missing, sorry]

> > >  Think about it. If the pointer could be NULL, then it's unlikely that
> > >  the bug would have gone unnoticed so far (unless the code is very
> > >  recent). Coverity found 3 such bugs in one i2c driver [1], and the
> > >  correct solution was to NOT check for NULL because it just couldn't
> > >  happen.

> > No, there is a third case: the pointer can be NULL, but the compiler
> > happened to move the dereference down to after the check.

> Wow. Great point. I completely missed that possibility. In fact I didn't
> know that the compiler could possibly alter the order of the
> instructions. For one thing, I thought it was simply not allowed to. For
> another, I didn't know that it had been made so aware that it could
> actually figure out how to do this kind of things. What a mess. Let's
> just hope that the gcc folks know their business :)

The compiler is most definitely /not/ allowed to change the results the
code gives.
-- 
Dr. Horst H. von Brand   User #22616 counter.li.org
Departamento de Informatica Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria  +56 32 654239
Casilla 110-V, Valparaiso, ChileFax:  +56 32 797513


[patch 8/8] CKRM: Documentation

2005-03-29 Thread gh

This patch adds all current documentation on CKRM.

Signed-Off-By: Hubertus Franke <[EMAIL PROTECTED]>
Signed-Off-By: Chandra Seetharaman <[EMAIL PROTECTED]>
Signed-Off-By: Shailabh Nagar <[EMAIL PROTECTED]>
Signed-Off-By: Vivek Kashyap <[EMAIL PROTECTED]>
Signed-Off-By: Gerrit Huizenga <[EMAIL PROTECTED]>

Index: linux-2.6.12-rc1/Documentation/ckrm/ckrm_basics
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6.12-rc1/Documentation/ckrm/ckrm_basics 2005-03-18 
15:16:46.010477430 -0800
@@ -0,0 +1,66 @@
+CKRM Basics
+-
+A brief review of CKRM concepts and terminology will help make installation
+and testing easier. For more details, please visit http://ckrm.sf.net. 
+
+Currently there are two class types, taskclass and socketclass for grouping,
+regulating and monitoring tasks and sockets respectively.
+
+To avoid repeating instructions for each classtype, this document assumes a
+task to be the kernel object being grouped. By and large, one can replace task
+with socket and taskclass with socketclass.
+
+RCFS depicts a CKRM class as a directory. Hierarchy of classes can be
+created in which children of a class share resources allotted to
+the parent. Tasks can be classified to any class which is at any level.
+There is no correlation between parent-child relationship of tasks and
+the parent-child relationship of classes they belong to.
+
+Without a Classification Engine, class is inherited by a task. A privileged
+user can reassign a task to a class as described below, after which all
+the child tasks under that task will be assigned to that class, unless the
+user reassigns any of them.
+
+A Classification Engine, if one exists, will be used by CKRM to
+classify a task to a class. The Rule based classification engine uses some
+of the attributes of the task to classify a task. When a CE is present
+class is not inherited by a task.
+
+Characteristics of a class can be accessed/changed through the following magic
+files under the directory representing the class:
+
+shares:  allows changing the shares of different resources managed by the
+ class
+stats:   shows the statistics associated with each resource managed
+ by the class
+target:  allows assigning a task to a class. If a CE is present, assigning
+ a task to a class through this interface will prevent the CE from
+ reassigning the task to any class during reclassification.
+members: shows which tasks have been assigned to a class
+config:  allows viewing and modifying configuration information of different
+ resources in a class.
+
+Resource allocations for a class are controlled by the parameters:
+
+guarantee: specifies how much of a resource is guaranteed to a class. A
+   special value DONT_CARE(-2) means that no specific guarantee
+  of a resource is specified; this class may not get any
+  resource if the system is running short of resources.
+limit: specifies the maximum amount of a resource that a class is allowed
+   to allocate. A special value DONT_CARE(-2) means that no
+  specific limit is specified; this class can get all the
+  resources available.
+total_guarantee: total guarantee that is allowed among the children of this
+   class. In other words, the sum of "guarantee"s of all children
+  of this class cannot exceed this number.
+max_limit: Maximum "limit" allowed for any of this class's children. In
+  other words, the "limit" of any child of this class cannot exceed
+  this value.
+
+None of these parameters are absolute or have any units associated with
+them. These are just numbers (relative to the parent's) that are
+used to calculate the absolute amount of a resource available to a specific
+class.
+
+Note: The root class has an absolute number of resource units associated with it.
+
Index: linux-2.6.12-rc1/Documentation/ckrm/core_usage
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6.12-rc1/Documentation/ckrm/core_usage  2005-03-18 
15:16:46.011477350 -0800
@@ -0,0 +1,72 @@
+Usage of CKRM without a classification engine
+---
+
+1. Create a class
+
+   # mkdir /rcfs/taskclass/c1
+   creates a taskclass named c1 , while
+   # mkdir /rcfs/socket_class/s1
+   creates a socketclass named s1 
+
+The newly created class directory is automatically populated by magic files
+shares, stats, members, target and config.
+
+2. View default shares 
+
+   # cat /rcfs/taskclass/c1/shares
+
+   "guarantee=-2,limit=-2,total_guarantee=100,max_limit=100" is the default
+   value set for resources that have controllers registered with CKRM.
+
+3. change shares of a 
+
+   One or more of the following fields can/must be specified
+   res= #mandatory
+   
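
A rough sketch of how the relative share values described in ckrm_basics
could be turned into absolute numbers. The proportional formula is an
assumption on my part (the documentation only says the numbers are relative
to the parent's), and `share_to_absolute` is a hypothetical helper, not a
function from the patch:

```c
#include <assert.h>
#include <stdint.h>

#define DONT_CARE (-2)  /* special value named in ckrm_basics */

/*
 * Convert a relative share into absolute resource units.
 *
 * parent_abs   - absolute units available to the parent class
 * share        - this class's "guarantee" (or "limit")
 * parent_total - the parent's "total_guarantee" (or "max_limit")
 *
 * Assumed rule: absolute = parent_abs * share / parent_total.
 * DONT_CARE on either side propagates unchanged.
 */
static int share_to_absolute(int parent_abs, int share, int parent_total)
{
    assert(parent_total > 0);

    if (share == DONT_CARE || parent_abs == DONT_CARE)
        return DONT_CARE;
    /* 64-bit intermediate so the product cannot overflow an int */
    return (int)(((int64_t)parent_abs * share) / parent_total);
}
```

With a root class holding 1000 units and the default total_guarantee=100, a
child with guarantee=25 would get 250 units under this rule, and a grandchild
with guarantee=50 would get 125 of those.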

[patch 7/8] CKRM: Numtasks Controller

2005-03-29 Thread gh

This patch provides a resource controller for limiting the number
of tasks per class in CKRM.

Signed-Off-By: Chandra Seetharaman <[EMAIL PROTECTED]>
Signed-Off-By: Hubertus Franke <[EMAIL PROTECTED]>
Signed-Off-By: Shailabh Nagar <[EMAIL PROTECTED]>
Signed-Off-By: Gerrit Huizenga <[EMAIL PROTECTED]>


Index: linux-2.6.12-rc1/include/linux/ckrm_tsk.h
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6.12-rc1/include/linux/ckrm_tsk.h   2005-03-18 15:16:41.818810820 
-0800
@@ -0,0 +1,35 @@
+/* ckrm_tsk.h - No. of tasks resource controller for CKRM
+ *
+ * Copyright (C) Chandra Seetharaman, IBM Corp. 2003
+ * 
+ * Provides No. of tasks resource controller for CKRM
+ *
+ * Latest version, more details at http://ckrm.sf.net
+ * 
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ */
+
+#ifndef _LINUX_CKRM_TSK_H
+#define _LINUX_CKRM_TSK_H
+
+#ifdef CONFIG_CKRM_TYPE_TASKCLASS
+#include 
+
+typedef int (*get_ref_t) (struct ckrm_core_class *, int);
+typedef void (*put_ref_t) (struct ckrm_core_class *);
+
+extern int numtasks_get_ref(struct ckrm_core_class *, int);
+extern void numtasks_put_ref(struct ckrm_core_class *);
+extern void ckrm_numtasks_register(get_ref_t, put_ref_t);
+
+#else /* CONFIG_CKRM_TYPE_TASKCLASS */
+
+#define numtasks_get_ref(core_class, ref) (1)
+#define numtasks_put_ref(core_class)  do {} while (0)
+
+#endif /* CONFIG_CKRM_TYPE_TASKCLASS */
+#endif /* _LINUX_CKRM_TSK_H */
Index: linux-2.6.12-rc1/init/Kconfig
===
--- linux-2.6.12-rc1.orig/init/Kconfig  2005-03-18 15:16:37.397162502 -0800
+++ linux-2.6.12-rc1/init/Kconfig   2005-03-18 15:16:41.819810740 -0800
@@ -185,6 +185,16 @@ config CKRM_TYPE_SOCKETCLASS

  Say Y if unsure.  
 
+config CKRM_RES_NUMTASKS
+   tristate "Number of Tasks Resource Manager"
+   depends on CKRM_TYPE_TASKCLASS
+   default y
+   help
+ Provides a Resource Controller for CKRM that allows limiting the number
+ of tasks a task class can have.
+   
+ Say N if unsure, Y to use the feature.
+
 endmenu
 
 config SYSCTL
Index: linux-2.6.12-rc1/kernel/ckrm/ckrm_numtasks.c
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6.12-rc1/kernel/ckrm/ckrm_numtasks.c2005-03-18 
15:16:41.820810661 -0800
@@ -0,0 +1,522 @@
+/* ckrm_numtasks.c - "Number of tasks" resource controller for CKRM
+ *
+ * Copyright (C) Chandra Seetharaman,  IBM Corp. 2003
+ * 
+ * Latest version, more details at http://ckrm.sf.net
+ * 
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ */
+
+/*
+ * CKRM Resource controller for tracking number of tasks in a class.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define TOTAL_NUM_TASKS (131072)   /* 128 K */
+#define NUMTASKS_DEBUG
+#define NUMTASKS_NAME "numtasks"
+
+struct ckrm_numtasks {
+   struct ckrm_core_class *core;   /* the core i am part of... */
+   struct ckrm_core_class *parent; /* parent of the core above. */
+   struct ckrm_shares shares;
+   spinlock_t cnt_lock;/* always grab parent's lock before child's */
+   int cnt_guarantee;  /* num_tasks guarantee in local units */
+   int cnt_unused; /* has to borrow if more than this is needed */
+   int cnt_limit;  /* no tasks over this limit. */
+   atomic_t cnt_cur_alloc; /* current alloc from self */
+   atomic_t cnt_borrowed;  /* borrowed from the parent */
+
+   int over_guarantee; /* turn on/off when cur_alloc goes  */
+   /* over/under guarantee */
+
+   /* internally maintained statistics to compare with max numbers */
+   int limit_failures; /* # failures as request was over the limit */
+   int borrow_sucesses;/* # successful borrows */
+   int borrow_failures;/* # borrow failures */
+
+   /* Maximum each specific statistic has reached. */
+   int max_limit_failures;
+   int max_borrow_sucesses;
+   int max_borrow_failures;
+
+   /* Total number of specific statistics */
+   int tot_limit_failures;
+   int tot_borrow_sucesses;
+   int tot_borrow_failures;
+};
+
+struct ckrm_res_ctlr numtasks_rcbs;
+
+/* Initialize rescls values
+ * May be called on each rcfs unmount or as part of error recovery
+ * to make share values sane.
+ * Does not traverse hierarchy 
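
For readers trying to follow the counters in struct ckrm_numtasks, here is a
single-threaded sketch of how charging a task against a class might work.
This is my reading of the field comments only (cnt_unused: "has to borrow if
more than this is needed", cnt_limit: "no tasks over this limit"); the real
numtasks_get_ref() may well differ, there is no locking or atomics here, and
`struct numtasks` and `numtasks_try_charge` are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down stand-in for struct ckrm_numtasks; same field names where used. */
struct numtasks {
    struct numtasks *parent;   /* NULL for the root class */
    int cnt_unused;            /* tasks this class can take without borrowing */
    int cnt_limit;             /* no tasks over this limit */
    int cnt_cur_alloc;         /* tasks currently charged to this class */
    int cnt_borrowed;          /* how many of those came from the parent */
    int limit_failures;        /* # failures as request was over the limit */
    int borrow_failures;       /* # borrow failures */
};

/* Returns 1 if one more task may be charged to res, 0 otherwise. */
static int numtasks_try_charge(struct numtasks *res)
{
    assert(res != NULL);

    if (res->cnt_cur_alloc >= res->cnt_limit) {
        res->limit_failures++;
        return 0;
    }
    /* Own allotment used up: the extra task has to come from the parent. */
    if (res->cnt_cur_alloc - res->cnt_borrowed >= res->cnt_unused) {
        if (res->parent == NULL || !numtasks_try_charge(res->parent)) {
            res->borrow_failures++;
            return 0;
        }
        res->cnt_borrowed++;
    }
    res->cnt_cur_alloc++;
    return 1;
}
```

A borrow walks up the hierarchy one level at a time, which is presumably why
the struct comment insists on taking the parent's lock before the child's.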

[patch 6/8] CKRM: Socket Class Controller

2005-03-29 Thread gh

This patch provides the extensions for CKRM to track per socket classes.
This is the base to enable socket based resource control for inbound
connection control, bandwidth control etc.

Signed-Off-By: Vivek Kashyap <[EMAIL PROTECTED]>
Signed-Off-By: Gerrit Huizenga <[EMAIL PROTECTED]>
Signed-off-by: Nishanth Aravamudan <[EMAIL PROTECTED]>

Index: linux-2.6.12-rc1/fs/rcfs/Makefile
===
--- linux-2.6.12-rc1.orig/fs/rcfs/Makefile  2005-03-18 15:16:33.370482769 
-0800
+++ linux-2.6.12-rc1/fs/rcfs/Makefile   2005-03-18 15:16:37.387163297 -0800
@@ -6,3 +6,4 @@ obj-$(CONFIG_RCFS_FS) += rcfs.o 
 
 rcfs-y := super.o inode.o dir.o rootdir.o magic.o
 rcfs-$(CONFIG_CKRM_TYPE_TASKCLASS) += tc_magic.o
+rcfs-$(CONFIG_CKRM_TYPE_SOCKETCLASS) += socket_fs.o
Index: linux-2.6.12-rc1/fs/rcfs/rootdir.c
===
--- linux-2.6.12-rc1.orig/fs/rcfs/rootdir.c 2005-03-18 15:16:33.372482610 
-0800
+++ linux-2.6.12-rc1/fs/rcfs/rootdir.c  2005-03-18 15:16:37.387163297 -0800
@@ -187,6 +187,10 @@ EXPORT_SYMBOL_GPL(rcfs_deregister_classt
 extern struct rcfs_mfdesc tc_mfdesc;
 #endif
 
+#ifdef CONFIG_CKRM_TYPE_SOCKETCLASS
+extern struct rcfs_mfdesc rcfs_sock_mfdesc;
+#endif
+
 /* Common root and magic file entries.
  * root name, root permissions, magic file names and magic file permissions 
  * are needed by all entities (classtypes and classification engines) existing 
@@ -203,4 +207,10 @@ struct rcfs_mfdesc *genmfdesc[CKRM_MAX_C
 #else
NULL,
 #endif
+#ifdef CONFIG_CKRM_TYPE_SOCKETCLASS
+   &rcfs_sock_mfdesc,
+#else
+   NULL,
+#endif
+
 };
Index: linux-2.6.12-rc1/fs/rcfs/socket_fs.c
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6.12-rc1/fs/rcfs/socket_fs.c2005-03-18 15:16:37.391162979 
-0800
@@ -0,0 +1,280 @@
+/* ckrm_socketaq.c 
+ *
+ * Copyright (C) Vivek Kashyap,  IBM Corp. 2004
+ * 
+ * Latest version, more details at http://ckrm.sf.net
+ * 
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ */
+
+/***
+ *  Socket class type
+ *   
+ * Defines the root structure for socket based classes. Currently only inbound
+ * connection control is supported based on prioritized accept queues. 
+ 
**/
+
+#include 
+#include 
+
+extern int rcfs_create_noperm(struct inode *, struct dentry *, int,
+  struct nameidata *);
+extern int rcfs_symlink_noperm(struct inode *, struct dentry *, const char *);
+extern int rcfs_mkdir_noperm(struct inode *, struct dentry *, int);
+extern int rcfs_rmdir_noperm(struct inode *, struct dentry *);
+extern int rcfs_link_noperm(struct dentry *, struct inode *, struct dentry *);
+extern int rcfs_unlink_noperm(struct inode *, struct dentry *);
+extern int rcfs_mknod_noperm(struct inode *, struct dentry *, int mode, dev_t);
+
+extern int rcfs_rmdir(struct inode *, struct dentry *);
+extern int rcfs_unlink(struct inode *, struct dentry *);
+extern int rcfs_rename(struct inode *, struct dentry *, struct inode *,
+  struct dentry *);
+
+extern int rcfs_create_coredir(struct inode *, struct dentry *);
+
+int rcfs_sock_mkdir(struct inode *, struct dentry *, int mode);
+int rcfs_sock_rmdir(struct inode *, struct dentry *);
+struct inode_operations my_iops;
+struct inode_operations class_iops;
+struct inode_operations sub_iops;
+ 
+
+struct rcfs_magf def_magf = {
+   .mode = RCFS_DEFAULT_DIR_MODE,
+   .i_op = _iops,
+   .i_fop = NULL,
+};
+
+struct rcfs_magf rcfs_sock_rootdesc[] = {
+   {
+/* .name = should not be set, copy from classtype name, */
+.mode = RCFS_DEFAULT_DIR_MODE,
+.i_op = _iops,
+/* .i_fop   = _dir_operations, */
+.i_fop = NULL,
+},
+   {
+.name = "members",
+.mode = RCFS_DEFAULT_FILE_MODE,
+.i_op = _iops,
+.i_fop = _fileops,
+},
+   {
+.name = "target",
+.mode = RCFS_DEFAULT_FILE_MODE,
+.i_op = _iops,
+.i_fop = _fileops,
+},
+   {
+.name = "reclassify",
+.mode = RCFS_DEFAULT_FILE_MODE,
+.i_op = _iops,
+.i_fop = _fileops,
+},
+};
+
+struct rcfs_magf rcfs_sock_magf[] = {
+   {
+.name = "config",
+.mode = RCFS_DEFAULT_FILE_MODE,
+.i_op = _iops,
+.i_fop = _fileops,
+},
+   {
+.name = "members",
+.mode = RCFS_DEFAULT_FILE_MODE,
+.i_op = _iops,
+.i_fop = _fileops,
+},
+   {
+.name = "shares",
+

[patch 5/8] CKRM: Task Class Controller

2005-03-29 Thread Gerrit Huizenga

 This patch provides the extensions for CKRM to track task classes.
 This is the base to enable task class based resource control for
 cpu, memory and disk I/O.

Signed-Off-By: Chandra Seetharaman <[EMAIL PROTECTED]>
Signed-Off-By: Hubertus Franke <[EMAIL PROTECTED]>
Signed-Off-By: Shailabh Nagar <[EMAIL PROTECTED]>
Signed-Off-By: Vivek Kashyap <[EMAIL PROTECTED]>
Signed-Off-By: Gerrit Huizenga <[EMAIL PROTECTED]>
Signed-off-by: Nishanth Aravamudan <[EMAIL PROTECTED]>


Index: linux-2.6.12-rc1/fs/rcfs/Makefile
===
--- linux-2.6.12-rc1.orig/fs/rcfs/Makefile  2005-03-18 15:16:29.721772974 
-0800
+++ linux-2.6.12-rc1/fs/rcfs/Makefile   2005-03-18 15:16:33.370482769 -0800
@@ -5,3 +5,4 @@
 obj-$(CONFIG_RCFS_FS) += rcfs.o 
 
 rcfs-y := super.o inode.o dir.o rootdir.o magic.o
+rcfs-$(CONFIG_CKRM_TYPE_TASKCLASS) += tc_magic.o
Index: linux-2.6.12-rc1/fs/rcfs/rootdir.c
===
--- linux-2.6.12-rc1.orig/fs/rcfs/rootdir.c 2005-03-18 15:16:29.721772974 
-0800
+++ linux-2.6.12-rc1/fs/rcfs/rootdir.c  2005-03-18 15:16:33.372482610 -0800
@@ -58,7 +58,7 @@ int rcfs_unregister_engine(struct rbce_e
return 0;
 }
 
-EXPORT_SYMBOL(rcfs_unregister_engine);
+EXPORT_SYMBOL_GPL(rcfs_unregister_engine);
 
 /*
  * rcfs_mkroot
@@ -183,6 +183,10 @@ int rcfs_deregister_classtype(struct ckr
 
 EXPORT_SYMBOL_GPL(rcfs_deregister_classtype);
 
+#ifdef CONFIG_CKRM_TYPE_TASKCLASS
+extern struct rcfs_mfdesc tc_mfdesc;
+#endif
+
 /* Common root and magic file entries.
  * root name, root permissions, magic file names and magic file permissions 
  * are needed by all entities (classtypes and classification engines) existing 
@@ -193,6 +197,10 @@ EXPORT_SYMBOL_GPL(rcfs_deregister_classt
  * table to initialize their magf entries. 
  */
 
-struct rcfs_mfdesc *genmfdesc[] = {
+struct rcfs_mfdesc *genmfdesc[CKRM_MAX_CLASSTYPES] = {
+#ifdef CONFIG_CKRM_TYPE_TASKCLASS
+   &tc_mfdesc,
+#else
NULL,
+#endif
 };
Index: linux-2.6.12-rc1/fs/rcfs/tc_magic.c
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6.12-rc1/fs/rcfs/tc_magic.c 2005-03-18 15:16:33.373482530 -0800
@@ -0,0 +1,93 @@
+/* 
+ * fs/rcfs/tc_magic.c 
+ *
+ * Copyright (C) Shailabh Nagar,  IBM Corp. 2004
+ *   (C) Vivek Kashyap,   IBM Corp. 2004
+ *   (C) Chandra Seetharaman, IBM Corp. 2004
+ *   (C) Hubertus Franke, IBM Corp. 2004
+ *   
+ * define magic fileops for taskclass classtype
+ *
+ * Latest version, more details at http://ckrm.sf.net
+ * 
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include 
+#include 
+
+/*
+ * Taskclass general
+ *
+ * Define structures for taskclass root directory and its magic files 
+ * In taskclasses, there is one set of magic files, created automatically under
+ * the taskclass root (upon classtype registration) and each directory (class) 
+ * created subsequently. However, classtypes can also choose to have different 
+ * sets of magic files created under their root and other directories under 
+ * root using their mkdir function. RCFS only provides helper functions for 
+ * creating the root directory and its magic files
+ * 
+ */
+
+#define TC_FILE_MODE (S_IFREG | S_IRUGO | S_IWUSR)
+
+#define NR_TCROOTMF  7
+struct rcfs_magf tc_rootdesc[NR_TCROOTMF] = {
+   /* First entry must be root */
+   {
+   /* .name = should not be set, copy from classtype name */
+.mode = RCFS_DEFAULT_DIR_MODE,
+.i_op = _dir_inode_operations,
+.i_fop = _dir_operations,
+},
+   /* Rest are root's magic files */
+   {
+.name = "target",
+.mode = TC_FILE_MODE,
+.i_fop = _fileops,
+.i_op = _file_inode_operations,
+},
+   {
+.name = "members",
+.mode = TC_FILE_MODE,
+.i_fop = _fileops,
+.i_op = _file_inode_operations,
+},
+   {
+.name = "stats",
+.mode = TC_FILE_MODE,
+.i_fop = _fileops,
+.i_op = _file_inode_operations,
+},
+   {
+.name = "shares",
+.mode = TC_FILE_MODE,
+.i_fop = _fileops,
+.i_op = _file_inode_operations,
+},
+   /*
+* Reclassify and Config should be made available only at the 
+* root level. Make sure they are the last two entries, as 
+* rcfs_mkdir depends on it.
+*/
+   {
+.name = "reclassify",
+.mode = TC_FILE_MODE,
+.i_fop = _fileops,
+.i_op = _file_inode_operations,
+},
+   {
+.name = "config",
+.mode = TC_FILE_MODE,
+.i_fop = _fileops,
+.i_op 

[patch 4/8] CKRM: Resource Control File System (rcfs)

2005-03-29 Thread Gerrit Huizenga

Updates CKRM Resource Control Filesystem (rcfs) to include full
directory structure support.

Signed-Off-By: Chandra Seetharaman <[EMAIL PROTECTED]>
Signed-Off-By: Hubertus Franke <[EMAIL PROTECTED]>
Signed-Off-By: Shailabh Nagar <[EMAIL PROTECTED]>
Signed-Off-By: Vivek Kashyap <[EMAIL PROTECTED]>
Signed-Off-By: Gerrit Huizenga <[EMAIL PROTECTED]>
Signed-off-by: Nishanth Aravamudan <[EMAIL PROTECTED]>

Index: linux-2.6.12-rc1/fs/Makefile
===
--- linux-2.6.12-rc1.orig/fs/Makefile   2005-03-17 17:34:17.0 -0800
+++ linux-2.6.12-rc1/fs/Makefile2005-03-18 15:16:29.717773292 -0800
@@ -92,6 +92,7 @@ obj-$(CONFIG_JFS_FS)  += jfs/
 obj-$(CONFIG_XFS_FS)   += xfs/
 obj-$(CONFIG_AFS_FS)   += afs/
 obj-$(CONFIG_BEFS_FS)  += befs/
+obj-$(CONFIG_RCFS_FS)  += rcfs/
 obj-$(CONFIG_HOSTFS)   += hostfs/
 obj-$(CONFIG_HPPFS)+= hppfs/
 obj-$(CONFIG_DEBUG_FS) += debugfs/
Index: linux-2.6.12-rc1/fs/rcfs/dir.c
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6.12-rc1/fs/rcfs/dir.c  2005-03-18 15:16:29.718773213 -0800
@@ -0,0 +1,220 @@
+/* 
+ * fs/rcfs/dir.c 
+ *
+ * Copyright (C) Shailabh Nagar,  IBM Corp. 2004
+ *   Vivek Kashyap,   IBM Corp. 2004
+ *   
+ * 
+ * Directory operations for rcfs
+ *
+ * Latest version, more details at http://ckrm.sf.net
+ * 
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the version 2 of the GNU General Public License
+ * as published by the Free Software Foundation.
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define rcfs_positive(dentry)  ((dentry)->d_inode && !d_unhashed((dentry)))
+
+int rcfs_empty(struct dentry *dentry)
+{
+   struct dentry *child;
+   int ret = 0;
+
+   spin_lock(&dcache_lock);
+   list_for_each_entry(child, &dentry->d_subdirs, d_child)
+   if (!rcfs_is_magic(child) && rcfs_positive(child))
+   goto out;
+   ret = 1;
+out:
+   spin_unlock(&dcache_lock);
+   return ret;
+}
+
+/* Directory inode operations */
+
+int rcfs_create_coredir(struct inode *dir, struct dentry *dentry)
+{
+
+   struct rcfs_inode_info *ripar, *ridir;
+   int sz;
+
+   ripar = rcfs_get_inode_info(dir);
+   ridir = rcfs_get_inode_info(dentry->d_inode);
+   /* Inform resource controllers - do Core operations */
+   if (ckrm_is_core_valid(ripar->core)) {
+   sz = strlen(ripar->name) + strlen(dentry->d_name.name) + 2;
+   ridir->name = kmalloc(sz, GFP_KERNEL);
+   if (!ridir->name) {
+   return -ENOMEM;
+   }
+   snprintf(ridir->name, sz, "%s/%s", ripar->name,
+dentry->d_name.name);
+   ridir->core = (*(ripar->core->classtype->alloc))
+   (ripar->core, ridir->name);
+   } else {
+   printk(KERN_ERR "rcfs_mkdir: Invalid parent core %p\n",
+  ripar->core);
+   return -EINVAL;
+   }
+
+   return 0;
+}
+
+int rcfs_mkdir(struct inode *dir, struct dentry *dentry, int mode)
+{
+
+   int retval = 0;
+   struct ckrm_classtype *clstype;
+
+   if ((retval = rcfs_mknod(dir, dentry, mode | S_IFDIR, 0))) {
+   printk(KERN_ERR "rcfs_mkdir: error in rcfs_mknod\n");
+   return retval;
+   }
+   dir->i_nlink++;
+   /* Inherit parent's ops since rcfs_mknod assigns noperm ops. */
+   dentry->d_inode->i_op = dir->i_op;
+   dentry->d_inode->i_fop = dir->i_fop;
+   retval = rcfs_create_coredir(dir, dentry);
+   if (retval) {
+   simple_rmdir(dir, dentry);
+   return retval;
+   }
+   /* create the default set of magic files */
+   clstype = (rcfs_get_inode_info(dentry->d_inode))->core->classtype;
+   rcfs_create_magic(dentry, &(((struct rcfs_magf *)clstype->mfdesc)[1]),
+ clstype->mfcount - 3);
+   return retval;
+}
+
+int rcfs_rmdir(struct inode *dir, struct dentry *dentry)
+{
+   struct rcfs_inode_info *ri = rcfs_get_inode_info(dentry->d_inode);
+
+   if (!rcfs_empty(dentry)) {
+   printk(KERN_ERR "rcfs_rmdir: directory not empty\n");
+   return -ENOTEMPTY;
+   }
+   /* Core class removal  */
+
+   if (ri->core == NULL) {
+   printk(KERN_ERR "rcfs_rmdir: core==NULL\n");
+   /* likely a race condition */
+   return 0;
+   }
+
+   if ((*(ri->core->classtype->free)) (ri->core)) {
+   printk(KERN_ERR "rcfs_rmdir: ckrm_free_core_class failed\n");
+   goto out;
+   }
+   ri->core = NULL;/* just to be safe */
+
+   /* 
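
The name handling in rcfs_create_coredir() above (size the buffer for
parent + '/' + child + NUL, allocate, then snprintf) can be tried out in
userspace with the sketch below. It uses malloc() in place of kmalloc(), and
`join_class_name` is a made-up name for illustration:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Build "parent/child" in a freshly allocated buffer.
 * Returns NULL if the allocation fails; the caller frees the result.
 */
static char *join_class_name(const char *parent, const char *child)
{
    size_t sz;
    char *name;

    assert(parent != NULL && child != NULL);

    /* +2: one byte for the '/' separator, one for the trailing NUL */
    sz = strlen(parent) + strlen(child) + 2;
    name = malloc(sz);
    if (name == NULL)
        return NULL;
    snprintf(name, sz, "%s/%s", parent, child);
    return name;
}
```

Passing the same sz to both the allocation and snprintf() keeps the two from
drifting apart if the format string is ever changed.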

[patch 3/8] CKRM: Default Classification Engine

2005-03-29 Thread gh

Main code for CKRM default classification engine.  Adds Resource
Control (rc) filesystem as mechanism for setting policies for
class assignments in CKRM.

Signed-Off-By: Chandra Seetharaman <[EMAIL PROTECTED]>
Signed-Off-By: Hubertus Franke <[EMAIL PROTECTED]>
Signed-Off-By: Shailabh Nagar <[EMAIL PROTECTED]>
Signed-Off-By: Gerrit Huizenga <[EMAIL PROTECTED]>
Signed-Off-By: Vivek Kashyap <[EMAIL PROTECTED]>


 include/linux/ckrm_ce.h |  108 +
 include/linux/ckrm_events.h |8 
 include/linux/ckrm_rc.h |  355 
 include/linux/rcfs.h|   96 
 include/linux/sched.h   |6 
 init/main.c |2 
 kernel/ckrm/Makefile|2 
 kernel/ckrm/ckrm.c  |  927 
 kernel/ckrm/ckrmutils.c |  195 +
 9 files changed, 1694 insertions(+), 5 deletions(-)

Index: linux-2.6.12-rc1/include/linux/ckrm_ce.h
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6.12-rc1/include/linux/ckrm_ce.h2005-03-18 15:16:24.330201800 
-0800
@@ -0,0 +1,95 @@
+/*
+ *  ckrm_ce.h - Header file to be used by Classification Engine of CKRM
+ *
+ * Copyright (C) Hubertus Franke, IBM Corp. 2003
+ *   (C) Shailabh Nagar,  IBM Corp. 2003
+ *   (C) Chandra Seetharaman, IBM Corp. 2003
+ * 
+ * Provides data structures, macros and kernel API of CKRM for 
+ * classification engine.
+ *
+ * Latest version, more details at http://ckrm.sf.net
+ * 
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of version 2.1 of the GNU Lesser General Public License
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+ *
+ */
+
+#ifndef _LINUX_CKRM_CE_H
+#define _LINUX_CKRM_CE_H
+
+#ifdef CONFIG_CKRM
+
+#include 
+
+/*
+ * Action parameters identifying the cause of a task<->class notify callback.
+ * These can percolate up to a user daemon consuming records sent by the
+ * classification engine.
+ */
+
+typedef void *(*ce_classify_fct) (enum ckrm_event event, void *obj, ...);
+typedef void (*ce_notify_fct) (enum ckrm_event event, void *classobj,
+void *obj);
+
+struct ckrm_eng_callback {
+   /* general state information */
+   int always_callback;/* set if CE should always be called back 
+  regardless of numclasses */
+
+   /* callbacks which are called without holding locks */
+
+   unsigned long c_interest;   /* set of classification events of 
+* interest to CE 
+*/
+
+   /* generic classify */
+   ce_classify_fct classify;
+
+   /* class added */
+   void (*class_add) (const char *name, void *core, int classtype);
+
+   /* class deleted */
+   void (*class_delete) (const char *name, void *core, int classtype);
+
+   /* callbacks which are called while holding task_lock(tsk) */
+   unsigned long n_interest;   /* set of notification events of 
+*  interest to CE 
+*/
+   /* notify on class switch */
+   ce_notify_fct notify;   
+};
+
+struct inode;
+struct dentry;
+
+struct rbce_eng_callback {
+   int (*mkdir) (struct inode *, struct dentry *, int);/* mkdir */
+   int (*rmdir) (struct inode *, struct dentry *); /* rmdir */
+   int (*mnt) (void);
+   int (*umnt) (void);
+};
+
+extern int ckrm_register_engine(const char *name, struct ckrm_eng_callback *);
+extern int ckrm_unregister_engine(const char *name);
+
+extern void *ckrm_classobj(char *, int *classtype);
+
+extern int rcfs_register_engine(struct rbce_eng_callback *);
+extern int rcfs_unregister_engine(struct rbce_eng_callback *);
+
+extern int ckrm_reclassify(int pid);
+
+#ifndef _LINUX_CKRM_RC_H
+
+extern void ckrm_core_grab(struct ckrm_core_class *core);
+extern void ckrm_core_drop(struct ckrm_core_class *core);
+#endif
+
+#endif /* CONFIG_CKRM */
+#endif /* _LINUX_CKRM_CE_H */
Index: linux-2.6.12-rc1/include/linux/ckrm_events.h
===
--- linux-2.6.12-rc1.orig/include/linux/ckrm_events.h   2005-03-18 
15:16:16.981786266 -0800
+++ linux-2.6.12-rc1/include/linux/ckrm_events.h2005-03-18 
15:16:24.335201402 -0800
@@ -108,70 +108,78 @@ int ckrm_unregister_event_cb(enum ckrm_e
 extern void ckrm_invoke_event_cb_chain(enum ckrm_event ev, void *arg);
 
 /* forward declarations for function arguments */
-struct task_struct;
+
+#include/* for task_struct */
+
 struct sock;
 struct user_struct;
 
 static inline void ckrm_cb_fork(struct task_struct *p)
 {
- 

[patch 1/8] CKRM: Core CKRM Event Callbacks

2005-03-29 Thread gh

Core CKRM Event Callbacks.

On exec, fork, exit, real/effective gid/uid, use CKRM to associate
tasks with appropriate class.

Addressed all review comments except:

Greg KH:
Use of __bitwise and sparse in enum's
Use of kernel list type

Signed-off-by:  Shailabh Nagar <[EMAIL PROTECTED]>
Signed-off-by:  Hubertus Franke <[EMAIL PROTECTED]>
Signed-off-by:  Chandra Seetharaman <[EMAIL PROTECTED]>
Signed-off-by:  Gerrit Huizenga <[EMAIL PROTECTED]>


 fs/exec.c   |2 
 include/linux/ckrm_events.h |  190 
 include/linux/sched.h   |1 
 init/Kconfig|   16 +++
 kernel/Makefile |2 
 kernel/ckrm/Makefile|7 +
 kernel/ckrm/ckrm_events.c   |   97 ++
 kernel/exit.c   |3 
 kernel/fork.c   |4 
 kernel/sys.c|   10 ++
 10 files changed, 331 insertions(+), 1 deletion(-)

Index: linux-2.6.12-rc1/fs/exec.c
===
--- linux-2.6.12-rc1.orig/fs/exec.c 2005-03-17 17:34:09.0 -0800
+++ linux-2.6.12-rc1/fs/exec.c  2005-03-18 15:16:16.981786266 -0800
@@ -48,6 +48,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -1087,6 +1088,7 @@ int search_binary_handler(struct linux_b
fput(bprm->file);
bprm->file = NULL;
current->did_exec = 1;
+   ckrm_cb_exec(bprm->filename);
return retval;
}
   read_lock(&binfmt_lock);
Index: linux-2.6.12-rc1/include/linux/ckrm_events.h
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6.12-rc1/include/linux/ckrm_events.h2005-03-18 
15:16:16.981786266 -0800
@@ -0,0 +1,192 @@
+/*
+ * ckrm_events.h - Class-based Kernel Resource Management (CKRM)
+ * event handling
+ *
+ * Copyright (C) Hubertus Franke, IBM Corp. 2003,2004
+ *   (C) Shailabh Nagar,  IBM Corp. 2003
+ *   (C) Chandra Seetharaman, IBM Corp. 2003
+ * 
+ * 
+ * Provides a base header file including macros and basic data structures.
+ *
+ * Latest version, more details at http://ckrm.sf.net
+ * 
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of version 2.1 of the GNU Lesser General Public License
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+ *
+ */
+
+#ifndef _LINUX_CKRM_EVENTS_H
+#define _LINUX_CKRM_EVENTS_H
+
+#ifdef CONFIG_CKRM
+
+/*
+ * Data structure and function to get the list of registered 
+ * resource controllers.
+ */
+
+/*
+ * CKRM defines a set of events at particular points in the kernel
+ * at which callbacks registered by various class types are called
+ */
+
+enum ckrm_event {
+   /*
+* we distinguish these events types:
+*
+* (a) CKRM_LATCHABLE_EVENTS
+*  events can be latched for event callbacks by classtypes
+*
+* (b) CKRM_NONLATCHABLE_EVENTS
+* events can not be latched but can be used to call classification
+* 
+* (c) events that are used for notification purposes
+* range: [ CKRM_EVENT_CANNOT_CLASSIFY .. )
+*/
+
+   /* events (a) */
+
+   CKRM_LATCHABLE_EVENTS,
+
+   CKRM_EVENT_NEWTASK = CKRM_LATCHABLE_EVENTS,
+   CKRM_EVENT_FORK,
+   CKRM_EVENT_EXIT,
+   CKRM_EVENT_EXEC,
+   CKRM_EVENT_UID,
+   CKRM_EVENT_GID,
+   CKRM_EVENT_LOGIN,
+   CKRM_EVENT_USERADD,
+   CKRM_EVENT_USERDEL,
+   CKRM_EVENT_LISTEN_START,
+   CKRM_EVENT_LISTEN_STOP,
+   CKRM_EVENT_APPTAG,
+
+   /* events (b) */
+
+   CKRM_NONLATCHABLE_EVENTS,
+
+   CKRM_EVENT_RECLASSIFY = CKRM_NONLATCHABLE_EVENTS,
+
+   /* events (c) */
+
+   CKRM_NOTCLASSIFY_EVENTS,
+
+   CKRM_EVENT_MANUAL = CKRM_NOTCLASSIFY_EVENTS,
+
+   CKRM_NUM_EVENTS
+};
+
+/*
+ * CKRM event callback specification for the classtypes or resource controllers
+ *   typically an array is specified using CKRM_EVENT_SPEC terminated with 
+ *   CKRM_EVENT_SPEC_LAST and then that array is registered using
+ *   ckrm_register_event_set.
+ *   Individual registration of event_cb is also possible
+ */
+
+struct ckrm_hook_cb {
+   void (*fct)(void *arg);
+   struct ckrm_hook_cb *next;
+};
+
+struct ckrm_event_spec {
+   enum ckrm_event ev;
+   struct ckrm_hook_cb cb;
+};
+
+int ckrm_register_event_set(struct ckrm_event_spec especs[]);
+int ckrm_unregister_event_set(struct ckrm_event_spec especs[]);
+int ckrm_register_event_cb(enum ckrm_event ev, struct 
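
Since the header is cut off here, a small userspace sketch of what a
per-event callback chain built on a struct like ckrm_hook_cb could look like.
Only the struct layout (fct, next) comes from the patch; `struct hook_cb`,
`NUM_EVENTS`, `register_cb`, `unregister_cb`, `invoke_chain` and `count_cb`
are hypothetical names, and a kernel version would need locking around the
chain, which is left out:

```c
#include <assert.h>
#include <stddef.h>

/* Same shape as struct ckrm_hook_cb in ckrm_events.h. */
struct hook_cb {
    void (*fct)(void *arg);
    struct hook_cb *next;
};

/* Hypothetical stand-in for CKRM_NUM_EVENTS. */
#define NUM_EVENTS 4

/* One singly linked chain of callbacks per event. */
static struct hook_cb *chains[NUM_EVENTS];

/* Append cb to the chain for ev. Returns 0, or -1 on bad input or duplicate. */
static int register_cb(int ev, struct hook_cb *cb)
{
    struct hook_cb **p;

    if (ev < 0 || ev >= NUM_EVENTS || cb == NULL)
        return -1;
    for (p = &chains[ev]; *p != NULL; p = &(*p)->next)
        if (*p == cb)
            return -1;          /* already registered */
    cb->next = NULL;
    *p = cb;
    return 0;
}

/* Unlink cb from the chain for ev. Returns 0, or -1 if it was not found. */
static int unregister_cb(int ev, struct hook_cb *cb)
{
    struct hook_cb **p;

    if (ev < 0 || ev >= NUM_EVENTS || cb == NULL)
        return -1;
    for (p = &chains[ev]; *p != NULL; p = &(*p)->next) {
        if (*p == cb) {
            *p = cb->next;
            cb->next = NULL;
            return 0;
        }
    }
    return -1;
}

/* Call every callback registered for ev, in registration order. */
static void invoke_chain(int ev, void *arg)
{
    struct hook_cb *cb;

    assert(ev >= 0 && ev < NUM_EVENTS);
    for (cb = chains[ev]; cb != NULL; cb = cb->next)
        cb->fct(arg);
}

/* Example callback: treats arg as an int counter and increments it. */
static void count_cb(void *arg)
{
    ++*(int *)arg;
}
```

Embedding the next pointer in the callback structure means registration
needs no allocation, at the cost that one structure can sit on only one chain
at a time.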

[patch 2/8] CKRM: Processor Delay Accounting

2005-03-29 Thread gh

CKRM processor scheduling delay accounting - provides a mechanism to
account for the delays a task encounters. In addition to counting frequency,
the total delay in ns is also recorded. CPU delays are specified as cpu-wait
and cpu-run.  I/O delays are recorded for memory and regular I/O.
Information is accessible through /proc/<pid>/delay.
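
For anyone wanting to consume the file from userspace, a minimal parsing
sketch follows. The field order is taken from the sprintf() in
proc_pid_delay() in this patch ("%u %llu %llu %u %llu %u %llu"); the struct
and function names (`delay_sample`, `parse_delay_line`) are made up here, and
the sketch assumes the format stays exactly as posted:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/* One line of the delay file, fields in the order they are printed. */
struct delay_sample {
    unsigned int runs;
    uint64_t runcpu_total;      /* ns */
    uint64_t waitcpu_total;     /* ns */
    unsigned int num_iowaits;
    uint64_t iowait_total;      /* ns */
    unsigned int num_memwaits;
    uint64_t mem_iowait_total;  /* ns */
};

/* Returns 0 on success, -1 if the line does not hold all seven fields. */
static int parse_delay_line(const char *line, struct delay_sample *out)
{
    int n;

    assert(line != NULL && out != NULL);

    n = sscanf(line,
               "%u %" SCNu64 " %" SCNu64 " %u %" SCNu64 " %u %" SCNu64,
               &out->runs, &out->runcpu_total, &out->waitcpu_total,
               &out->num_iowaits, &out->iowait_total,
               &out->num_memwaits, &out->mem_iowait_total);
    return n == 7 ? 0 : -1;
}
```

A caller would read one line from the delay file with fgets() and hand it to
parse_delay_line(); dividing a total by its matching count gives an average
delay per event.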

Signed-Off-By: Chandra Seetharaman <[EMAIL PROTECTED]>
Signed-Off-By: Hubertus Franke <[EMAIL PROTECTED]>
Signed-Off-By: Shailabh Nagar <[EMAIL PROTECTED]>
Signed-Off-By: Gerrit Huizenga <[EMAIL PROTECTED]>

 fs/proc/array.c|   18 +
 fs/proc/base.c |   18 +
 include/linux/sched.h  |   86 +
 include/linux/taskdelays.h |   45 +++
 init/Kconfig   |8 
 kernel/fork.c  |1 
 kernel/sched.c |   17 
 mm/memory.c|9 +++-
 8 files changed, 200 insertions(+), 2 deletions(-)

Index: linux-2.6.12-rc1/fs/proc/array.c
===
--- linux-2.6.12-rc1.orig/fs/proc/array.c   2005-03-17 17:34:18.0 
-0800
+++ linux-2.6.12-rc1/fs/proc/array.c2005-03-18 15:16:20.884475861 -0800
@@ -482,3 +482,21 @@ int proc_pid_statm(struct task_struct *t
return sprintf(buffer,"%d %d %d %d %d %d %d\n",
   size, resident, shared, text, lib, data, 0);
 }
+
+
+int proc_pid_delay(struct task_struct *task, char * buffer)
+{
+   int res;
+
+   res  = sprintf(buffer,"%u %llu %llu %u %llu %u %llu\n",
+  (unsigned int) get_delay(task,runs),
+  (uint64_t) get_delay(task,runcpu_total),
+  (uint64_t) get_delay(task,waitcpu_total),
+  (unsigned int) get_delay(task,num_iowaits),
+  (uint64_t) get_delay(task,iowait_total),
+  (unsigned int) get_delay(task,num_memwaits),
+  (uint64_t) get_delay(task,mem_iowait_total)
+   );
+   return res;
+}
+
Index: linux-2.6.12-rc1/fs/proc/base.c
===
--- linux-2.6.12-rc1.orig/fs/proc/base.c    2005-03-17 17:34:18.0 -0800
+++ linux-2.6.12-rc1/fs/proc/base.c 2005-03-18 15:16:20.889475463 -0800
@@ -120,6 +120,10 @@ enum pid_directory_inos {
 #ifdef CONFIG_AUDITSYSCALL
PROC_TID_LOGINUID,
 #endif
+#ifdef CONFIG_DELAY_ACCT
+PROC_TID_DELAY_ACCT,
+PROC_TGID_DELAY_ACCT,
+#endif
PROC_TID_FD_DIR = 0x8000,   /* 0x8000-0x */
PROC_TID_OOM_SCORE,
PROC_TID_OOM_ADJUST,
@@ -155,6 +159,9 @@ static struct pid_entry tgid_base_stuff[
 #ifdef CONFIG_SECURITY
E(PROC_TGID_ATTR,  "attr",S_IFDIR|S_IRUGO|S_IXUGO),
 #endif
+#ifdef CONFIG_DELAY_ACCT
+   E(PROC_TGID_DELAY_ACCT,"delay",   S_IFREG|S_IRUGO),
+#endif
 #ifdef CONFIG_KALLSYMS
E(PROC_TGID_WCHAN, "wchan",   S_IFREG|S_IRUGO),
 #endif
@@ -191,6 +198,9 @@ static struct pid_entry tid_base_stuff[]
 #ifdef CONFIG_SECURITY
E(PROC_TID_ATTR,   "attr",S_IFDIR|S_IRUGO|S_IXUGO),
 #endif
+#ifdef CONFIG_DELAY_ACCT
+   E(PROC_TID_DELAY_ACCT, "delay",   S_IFREG|S_IRUGO),
+#endif
 #ifdef CONFIG_KALLSYMS
E(PROC_TID_WCHAN,  "wchan",   S_IFREG|S_IRUGO),
 #endif
@@ -1564,6 +1574,13 @@ static struct dentry *proc_pident_lookup
ei->op.proc_read = proc_pid_wchan;
break;
 #endif
+#ifdef CONFIG_DELAY_ACCT
+   case PROC_TID_DELAY_ACCT:
+   case PROC_TGID_DELAY_ACCT:
+   inode->i_fop = &proc_info_file_operations;
+   ei->op.proc_read = proc_pid_delay;
+   break;
+#endif
 #ifdef CONFIG_SCHEDSTATS
case PROC_TID_SCHEDSTAT:
case PROC_TGID_SCHEDSTAT:
Index: linux-2.6.12-rc1/fs/proc/internal.h
===
--- linux-2.6.12-rc1.orig/fs/proc/internal.h    2005-03-17 17:33:50.0 -0800
+++ linux-2.6.12-rc1/fs/proc/internal.h 2005-03-18 15:16:20.889475463 -0800
@@ -36,6 +36,7 @@ extern int proc_tid_stat(struct task_str
 extern int proc_tgid_stat(struct task_struct *, char *);
 extern int proc_pid_status(struct task_struct *, char *);
 extern int proc_pid_statm(struct task_struct *, char *);
+extern int proc_pid_delay(struct task_struct *, char*);
 
 static inline struct task_struct *proc_task(struct inode *inode)
 {
Index: linux-2.6.12-rc1/include/linux/sched.h
===
--- linux-2.6.12-rc1.orig/include/linux/sched.h 2005-03-17 17:33:50.0 -0800
+++ linux-2.6.12-rc1/include/linux/sched.h  2005-03-18 15:16:20.891475304 -0800
@@ -34,6 +34,7 @@
 #include 
 #include 
 #include 
+#include <linux/taskdelays.h>
 
 struct exec_domain;
 
@@ -727,6 +728,9 @@ struct task_struct {
nodemask_t mems_allowed;
int cpuset_mems_generation;
 

[patch 0/8] CKRM: Core patch set

2005-03-29 Thread gh
--

This is the core patch set for CKRM, review comments almost all
applied (there are a few we are still working on, mostly cosmetic).
However, this set has been extensively regression tested on IA32,
x86-64/EM64T, and PPC64, with various CKRM CONFIG options on and
off and both regression tests and ckrm's functional tests.

I believe this set is ready for additional testing in -mm.  We
have an additional 4 patch sets that will follow this (classification
engines, memory controller, IO controller, updated network controller).

Continued comments are welcome; once we have patches for the last
of the cleanups, we are hoping we'll have sufficient testing to be
able to push this towards mainline.

gerrit


Re: Linux 2.4.30-rc3 md/ext3 problems

2005-03-29 Thread Marcelo Tosatti
On Tue, Mar 29, 2005 at 10:10:34AM +1000, Neil Brown wrote:
> On Monday March 28, [EMAIL PROTECTED] wrote:
> > On Mon, Mar 28, 2005 at 10:34:05AM +0300, [Ville Herva] wrote:
> > > 
> > > I just upgraded from linux-2.4.21 + vserver 0.17 to 2.4.30rc3 + vserver
> > > 1.2.10. The box has been running stable with 2.4.21 + vserver 0.17/0.16
> > > for a few years (uptime before reboot was nearly 400 days.)
> > > 
> > > The boot went fine, but after few hours I got 
> > > Message from [EMAIL PROTECTED] at Sun Mar 27 22:07:00 2005 ...
> > > kernel: journal commit I/O error
> 
> I got that error on 2.4.30-rc1 a couple of times, and now cannot
> reproduce it :-(
> But if you got it too, then it wasn't just bad luck.
> 
> The ext3 code in 2.4.30-rc does have a few more checks for IO errors
> which will cause the journal to be aborted and produce this error, so
> I suspect that change which caused the problem is a change in ext3.
> However that doesn't mean the bug is there.
> 
> The extra code in ext3 seems to just check if buffer_uptodate is false
> after it has waited on a locked buffer, and triggers a journal abort
> if it isn't.  This should be perfectly safe, and I cannot find any
> logic error near by.  But nor can I find any errors that would cause a
> buffer returned from raid1 to not be uptodate (unless there really was
> an IO error).

Attached is the backout patch, for convenience.
# This is a BitKeeper generated diff -Nru style patch.
#
# ChangeSet
#   2005/03/29 18:49:25-03:00 [EMAIL PROTECTED] 
#   Cset exclude: [EMAIL PROTECTED]|ChangeSet|20050226095914|25750
# 
# mm/filemap.c
#   2005/03/29 18:49:22-03:00 [EMAIL PROTECTED] +0 -0
#   Exclude
# 
# include/linux/jbd.h
#   2005/03/29 18:49:22-03:00 [EMAIL PROTECTED] +0 -0
#   Exclude
# 
# fs/jbd/transaction.c
#   2005/03/29 18:49:21-03:00 [EMAIL PROTECTED] +0 -0
#   Exclude
# 
# fs/jbd/journal.c
#   2005/03/29 18:49:21-03:00 [EMAIL PROTECTED] +0 -0
#   Exclude
# 
# fs/jbd/commit.c
#   2005/03/29 18:49:21-03:00 [EMAIL PROTECTED] +0 -0
#   Exclude
# 
# fs/ext3/super.c
#   2005/03/29 18:49:21-03:00 [EMAIL PROTECTED] +0 -0
#   Exclude
# 
# fs/ext3/fsync.c
#   2005/03/29 18:49:21-03:00 [EMAIL PROTECTED] +0 -0
#   Exclude
# 
diff -Nru a/fs/ext3/fsync.c b/fs/ext3/fsync.c
--- a/fs/ext3/fsync.c   2005-03-29 18:50:56 -03:00
+++ b/fs/ext3/fsync.c   2005-03-29 18:50:56 -03:00
@@ -69,7 +69,7 @@
if (test_opt(inode->i_sb, DATA_FLAGS) == EXT3_MOUNT_WRITEBACK_DATA)
ret |= fsync_inode_data_buffers(inode);
 
-   ret |= ext3_force_commit(inode->i_sb);
+   ext3_force_commit(inode->i_sb);
 
return ret;
 }
diff -Nru a/fs/ext3/super.c b/fs/ext3/super.c
--- a/fs/ext3/super.c   2005-03-29 18:50:56 -03:00
+++ b/fs/ext3/super.c   2005-03-29 18:50:56 -03:00
@@ -1608,13 +1608,12 @@
 
 static int ext3_sync_fs(struct super_block *sb)
 {
-   int err;
tid_t target;

sb->s_dirt = 0;
target = log_start_commit(EXT3_SB(sb)->s_journal, NULL);
-   err = log_wait_commit(EXT3_SB(sb)->s_journal, target);
-   return err;
+   log_wait_commit(EXT3_SB(sb)->s_journal, target);
+   return 0;
 }
 
 /*
diff -Nru a/fs/jbd/commit.c b/fs/jbd/commit.c
--- a/fs/jbd/commit.c   2005-03-29 18:50:55 -03:00
+++ b/fs/jbd/commit.c   2005-03-29 18:50:55 -03:00
@@ -92,7 +92,7 @@
struct buffer_head *wbuf[64];
int bufs;
int flags;
-   int err = 0;
+   int err;
unsigned long blocknr;
char *tagp = NULL;
journal_header_t *header;
@@ -299,8 +299,6 @@
spin_unlock(&journal_datalist_lock);
unlock_journal(journal);
wait_on_buffer(bh);
-   if (unlikely(!buffer_uptodate(bh)))
-   err = -EIO;
/* the journal_head may have been removed now */
lock_journal(journal);
goto write_out_data;
@@ -328,8 +326,6 @@
spin_unlock(&journal_datalist_lock);
unlock_journal(journal);
wait_on_buffer(bh);
-   if (unlikely(!buffer_uptodate(bh)))
-   err = -EIO;
lock_journal(journal);
spin_lock(&journal_datalist_lock);
continue;   /* List may have changed */
@@ -355,9 +351,6 @@
}
spin_unlock(&journal_datalist_lock);
 
-   if (err)
-   __journal_abort_hard(journal);
-
/*
 * If we found any dirty or locked buffers, then we should have
 * looped back up to the write_out_data label.  If there weren't
@@ -548,8 +541,6 @@
if (buffer_locked(bh)) {
unlock_journal(journal);
wait_on_buffer(bh);
-   if (unlikely(!buffer_uptodate(bh)))
-   err = -EIO;
lock_journal(journal);

Re: no need to check for NULL before calling kfree() -fs/ext2/

2005-03-29 Thread Paul Jackson
Pekka wrote:
>  (4) The cleanups Jesper and others are doing are to remove the
>  _redundant_ NULL checks (i.e. it is now checked twice). 

Even such obvious changes as removing redundant checks don't
seem to ensure a performance improvement.  Jesper Juhl posted
performance data for such changes in his microbenchmark a couple
of days ago.

As I posted then, I could swear that his numbers show:

> Just looking at the third run, it seems to me that "if (likely(p))
> kfree(p);" beats a naked "kfree(p);" everytime, whether p is half
> NULL's, or very few NULL's, or almost all NULL's.

Twice now I have asked Jesper to explain this strange result.

I have heard no explanation (not even a terse "you idiot ;)"),
nor anyone else comment on these numbers.

Maybe we should be following your good advice:

> You don't know that until you profile! 

instead of continuing to make these code changes.
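
A minimal user-space sketch of the two call patterns being compared, for
anyone who wants to count the tests involved. demo_kfree() is a hypothetical
stand-in built on free(), not the kernel's kfree(), and the counter exists
only for illustration. It says nothing about timing, which is exactly what
would need profiling.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical counter: how many NULL tests have executed so far. */
static unsigned long null_tests;

/* Stand-in for kfree(): it tests for NULL itself before freeing. */
static void demo_kfree(void *p)
{
    null_tests++;
    if (!p)
        return;
    free(p);
}

/* The caller-side pattern: test first, then call. */
static void checked_free(void *p)
{
    null_tests++;
    if (p)
        demo_kfree(p);
}

static void null_check_example(void)
{
    void *p = malloc(16);
    assert(p != NULL);

    null_tests = 0;
    checked_free(p);        /* non-NULL: tested by caller and by callee */
    assert(null_tests == 2);

    null_tests = 0;
    checked_free(NULL);     /* NULL: the caller's test skips the call */
    assert(null_tests == 1);

    null_tests = 0;
    demo_kfree(NULL);       /* naked call: one test, inside the callee */
    assert(null_tests == 1);
}
```

The counts show the trade-off only in outline: the extra test is redundant
for non-NULL pointers, but it avoids a function call for NULL ones.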

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.650.933.1373, 1.925.600.0401


Re: i386/x86_64 segment register issuses (Re: PATCH: Fix x86 segment register access)

2005-03-29 Thread Linus Torvalds


On Tue, 29 Mar 2005, H. J. Lu wrote:
> 
> > the smaller and faster version do not want to just rely on gas
> > automatically getting it right, especially since gas has historically been
> > very very bad at getting things right.
> 
> We are fixing those issues in assembler. If people run into problems
> like that with gas, they can report them. They will be fixed.

It's fine if gas fixes things. It's not fine if gas breaks things that 
used to work, for no really good reason.

> > What is the advantage of not allowing "movl %ds,mem"? Really? Especially
> > since I suspect the kernel is pretty much the only one who does this, and
> > the kernel really does do it on purpose. The kernel explicitly wants the
> > 32-bit version, knowing that the upper bits are undefined.
> > 
> 
> Kernel has
> 
>   unsigned gsindex;
>   asm volatile("movl %%gs,%0" : "=g" (gsindex));

Ok, that's a real x86-64 bug, it seems. Andi, please fix, preferably by 
just making the "g" be a "r".

However, your argument isn't very valid, since:

> The new assembler will make sure that it won't happen.

Not true, since the suggestion was just to change all segment "movl"  
things to "mov", at which point the same old bug is still there, and the
assembler didn't really help us at all.

See the problem? You're not actually protecting anything. The change just 
makes it _harder_ to make sizes explicit, and suddenly we have to trust an 
assembler to be clever about sizes, when that assembler historically has 
definitely _not_ been very clever about them at all. 
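
The "g" versus "r" point above can be sketched in user space as follows. This
is only an illustration under stated assumptions: read_gs_selector() is a
hypothetical helper, not kernel code, and the non-x86 fallback exists only so
the file still builds elsewhere.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Read the %gs selector with an explicit 32-bit move.
 * The "=r" constraint forces a register destination, so the compiler
 * cannot pick a memory operand the way "=g" would allow.
 */
static uint32_t read_gs_selector(void)
{
#if defined(__i386__) || defined(__x86_64__)
    uint32_t sel;

    __asm__ volatile("movl %%gs,%0" : "=r" (sel));
    /* Keep only the 16 selector bits; do not rely on the upper half. */
    return sel & 0xffffu;
#else
    return 0;   /* assumption: placeholder for non-x86 builds */
#endif
}

static void gs_selector_example(void)
{
    uint32_t sel = read_gs_selector();

    assert(sel <= 0xffffu);
}
```

Masking in C keeps the result well defined whatever ends up in the upper
bits, while the instruction size stays explicit in the source.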

Linus


Re: [PATCH] embarassing typo

2005-03-29 Thread Vicente Feito
As long as the variable doesn't overflow you get a negation. You shouldn't do
dri_data[5] = ptr->dri * 0xff; if ptr->dri is 255, but if ptr->dri = 1 (as it
is set in zr36050_setup) then you get the negation, -1. The Direct rendering
support is a flag AFAIK, so in this case I believe it is a worthy piece of
obfuscated C negation code :)
BTW, are you sure about this patch? I would contact the maintainer first,
because and'ing that doesn't make much sense...
Disclaimer: all of this is AFAIK! :)

On Tuesday 29 March 2005 09:58 pm, you wrote:
> Måns Rullgård wrote:
> > "Ronald S. Bultje" <[EMAIL PROTECTED]> writes:
> >>--- linux-2.6.5/drivers/media/video/zr36050.c.old 16 Sep 2004 22:53:27 - 1.2
> >>+++ linux-2.6.5/drivers/media/video/zr36050.c 29 Mar 2005 20:30:23 -
> >>@@ -419,7 +419,7 @@
> >>  dri_data[2] = 0x00;
> >>  dri_data[3] = 0x04;
> >>  dri_data[4] = ptr->dri >> 8;
> >>- dri_data[5] = ptr->dri * 0xff;
> >>+ dri_data[5] = ptr->dri & 0xff;
> >
> > Hey, that's a nice obfuscation of a simple negation.
>
> It's not a negation.  This statement always assigns zero to
> dri_data[5] if dri_data is char[].  Looks like gcc isn't catching
> this problem.
>
> > BTW, when assigning to a char type, is the masking really necessary at
> > all?  I can't see that it should make a difference.  Am I missing
> > something subtle?
>
> Well, it's a matter of readability mostly.  For now at least, when
> char is always 8 bytes...
>
> /mjt


Re: Mac mini sound woes

2005-03-29 Thread Lee Revell
On Wed, 2005-03-30 at 03:45 +0200, Marcin Dalecki wrote:
> On 2005-03-29, at 12:22, Takashi Iwai wrote:
> >
> > ALSA provides the "driver" feature in user-space because it's more
> > flexible, more efficient and safer than doing in kernel.  It's
> > transparent from apps perspective.  It really doesn't matter whether
> > it's in kernel or user space.
> 
> Yes, because it's that wonderful, Linux sound processing sucks in
> comparison to the other OSs out there doing it in kernel.

What are you talking about?  It's actually quite good.

Have you actually tried these other OSes lately?  These devices in
question (those lacking hardware mixing and volume control) don't
exactly work great under that OS.

Lee



Re: [PATCH] embarassing typo

2005-03-29 Thread Måns Rullgård
Michael Tokarev <[EMAIL PROTECTED]> writes:

> Måns Rullgård wrote:
>> "Ronald S. Bultje" <[EMAIL PROTECTED]> writes:
>>
>>>--- linux-2.6.5/drivers/media/video/zr36050.c.old 16 Sep 2004 22:53:27 - 1.2
>>>+++ linux-2.6.5/drivers/media/video/zr36050.c 29 Mar 2005 20:30:23 -
>>>@@ -419,7 +419,7 @@
>>> dri_data[2] = 0x00;
>>> dri_data[3] = 0x04;
>>> dri_data[4] = ptr->dri >> 8;
>>>-dri_data[5] = ptr->dri * 0xff;
>>>+dri_data[5] = ptr->dri & 0xff;
>> Hey, that's a nice obfuscation of a simple negation.
>
> It's not a negation.  This statement always assigns zero to
> dri_data[5] if dri_data is char[].

Sure about that?

__u16 i;
char c;
i = 1; c = i * 255; /* c = 255 = -1 */
i = 2; c = i * 255; /* c = 510 & 0xff = 254 = -2 */
...

Looks like negation to me.
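
A self-contained version of the same check, as a sketch: it uses uint8_t
instead of char to sidestep the signedness question, and all function names
are made up for the illustration. Since v * 0xff = v * 256 - v, the low byte
equals the low byte of -v for every 16-bit v.

```c
#include <assert.h>
#include <stdint.h>

/* Low byte of v * 0xff: what the old line stores into an 8-bit element. */
static uint8_t times_ff_low_byte(uint16_t v)
{
    return (uint8_t)((unsigned)v * 0xffu);
}

/* Negation of v, truncated to 8 bits. */
static uint8_t negated_low_byte(uint16_t v)
{
    return (uint8_t)(0u - (unsigned)v);
}

/* Low byte of v: what the patched line (v & 0xff) stores. */
static uint8_t masked_low_byte(uint16_t v)
{
    return (uint8_t)(v & 0xffu);
}

/* Number of 16-bit values for which "* 0xff" is NOT a negation mod 256. */
static unsigned count_mismatches(void)
{
    unsigned mismatches = 0;

    for (unsigned v = 0; v <= 0xffffu; v++)
        if (times_ff_low_byte((uint16_t)v) != negated_low_byte((uint16_t)v))
            mismatches++;
    return mismatches;
}

static void negation_example(void)
{
    assert(times_ff_low_byte(1) == 255);   /* -1 as an 8-bit value */
    assert(times_ff_low_byte(2) == 254);   /* -2 as an 8-bit value */
    assert(masked_low_byte(1) == 1);       /* the patched expression */
    assert(count_mismatches() == 0);
}
```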

-- 
Måns Rullgård
[EMAIL PROTECTED]



Re: Industry db benchmark result on recent 2.6 kernels

2005-03-29 Thread Nick Piggin
Chen, Kenneth W wrote:
> Nick Piggin wrote on Tuesday, March 29, 2005 5:32 PM
>> If it is doing a lot of mapping/unmapping (or fork/exit), then that
>> might explain why 2.6.11 is worse.
>>
>> Fortunately there are more patches to improve this on the way.
>
> Once benchmark reaches steady state, there is no mapping/unmapping
> going on.  Actually, the virtual address space for all the processes
> are so stable at steady state that we don't even see it grow or shrink.

Oh, well there goes that theory ;)

The only other thing I can think of is the CPU scheduler changes
that went into 2.6.11 (but there are obviously a lot that I can't
think of).

I'm sure I don't need to tell you it would be nice to track down
the source of these problems rather than papering over them with
improvements to the block layer... any indication of what has gone
wrong?

Typically if the CPU scheduler has gone bad and is moving too many
tasks around (and hurting caches), you'll see things like copy_*_user
increase in cost for the same units of work performed. Whereas if it
is too reluctant to move tasks, you'll see increased idle time.



Re: i386/x86_64 segment register issuses (Re: PATCH: Fix x86 segment register access)

2005-03-29 Thread H. J. Lu
On Tue, Mar 29, 2005 at 04:30:01PM -0800, Linus Torvalds wrote:
> 
> 
> On Mon, 28 Mar 2005, Andi Kleen wrote:
> >
> > "H. J. Lu" <[EMAIL PROTECTED]> writes:
> > > The new assembler will disallow them since those instructions with
> > > memory operand will only use the first 16bits. If the memory operand
> > > is 16bit, you won't see any problems. But if the memory destination
> > > is 32bit, the upper 16bits may have random values. The new assembler
> > 
> > Does it really have random values on existing x86 hardware?
> 
> The upper bits are not written at all, so it's not random.
> 
> > If it is a only a "theoretical" problem that does not happen
> > in practice I would advise to not do the change.
> 
> My preference too. The reason we use "movl" is because we really do want 
> the 32-bit versions, since they are faster. It's a conscious choice. In 
> contrast "movw" generates bigger and slower code on all assemblers out 
> there, and "mov" doesn't make it clear which one it is. Is it the slow 
> one, or the fast one? 

"mov" shouldn't generate the 0x66 prefix, at least with the assembler
since binutils 2.14.90.0.4 20030523. The assembler in CVS won't generate
0x66 for "movw" either.

> Now, those versions of gas may be so old that nobody cares, but the
> explicit size still is a GOOD THING. The size DOES MATTER. People who want

Suggesting "mov" instead of "movw" is for the existing assemblers. Or
kernel can check assembler version to decide if "movw" should be used.
I can verify the first Linux assembler which won't generate 0x66 for
"movw".

> the smaller and faster version do not want to just rely on gas
> automatically getting it right, especially since gas has historically been
> very very bad at getting things right.

We are fixing those issues in assembler. If people run into problems
like that with gas, they can report them. They will be fixed.

> 
> What is the advantage of not allowing "movl %ds,mem"? Really? Especially
> since I suspect the kernel is pretty much the only one who does this, and
> the kernel really does do it on purpose. The kernel explicitly wants the
> 32-bit version, knowing that the upper bits are undefined.
> 

Kernel has

unsigned gsindex;
asm volatile("movl %%gs,%0" : "=g" (gsindex));
...
if (gsindex)


It is OK if gcc never generates memory access like

movl %gs,0x128(%rsp)

Otherwise, the upper bits in gsindex are undefined. The new
assembler will make sure that it won't happen.


H.J.


Re: Mac mini sound woes

2005-03-29 Thread Marcin Dalecki
On 2005-03-30, at 01:39, Benjamin Herrenschmidt wrote:
> On Tue, 2005-03-29 at 17:25 -0600, Chris Friesen wrote:
>> Lee Revell wrote:
>>> This is the exact line of reasoning that led to Winmodems.
>>
>> My main issue with winmodems is not so much the software offload, but
>> rather that the vendors don't release full specs.
>>
>> If all winmodem manufacturers released full hardware specs, I doubt
>> people would really complain all that much.  There's a fairly large
>> pool of talent available to write drivers once the interfaces are known.
>
> Look at the pile of junk that are most winmodem driver implementations,
> nothing I want to see in the kernel ever. Those things should be in
> userland.

You are joking? Linux IS NOT an RT OS. And well, not too long ago you could
be jailed, for example in Germany, for using badly behaved communication
devices.



Re: Mac mini sound woes

2005-03-29 Thread Marcin Dalecki
On 2005-03-30, at 00:13, Lee Revell wrote:
> On Tue, 2005-03-29 at 11:22 +0200, Marcin Dalecki wrote:
>> No. You didn't get it. I'm taking the view that mixing sound is simply
>> a task you would typically love to make a DSP firmware do.
>> However providing a DSP for sound processing at 44kHz on the same
>> PCB as a 1GHz CPU is a ridiculous waste of resources. Thus most hardware
>> vendors out there decided to use the main CPU instead. Thus the "firmware"
>> is simply running on the main CPU now. Now where should it go? I'm
>> convinced that it's better to put it near the hardware in the whole stack.
>> You think it's best to put it far away and to invent artificial
>> synchronization problems between different applications putting data down
>> to the same hardware device.
>
> This is the exact line of reasoning that led to Winmodems.

Yes, and BTW those are, from a hardware point of view, a technically
perfectly fine solution. The obstacles here are twofold: the Win32 kernel
sucks big rocks on latency issues. However, since we are over 1GHz and use
XP, they work perfectly fine. On Linux you don't get the necessary DSP
processing code/docs. Both are just pragmatic arguments which don't apply to
sound processing at all.

And for your note: I'm the guy who several years ago wrote the first-ever
GDI printer driver for Linux (oki4linux) despite claims from quite prominent
people here that this couldn't ever be done. And yes, I did it in user space
because pages are not data streams.



Re: Mac mini sound woes

2005-03-29 Thread Marcin Dalecki
On 2005-03-29, at 12:22, Takashi Iwai wrote:
> ALSA provides the "driver" feature in user-space because it's more
> flexible, more efficient and safer than doing in kernel.  It's
> transparent from apps perspective.  It really doesn't matter whether
> it's in kernel or user space.

Yes, because it's that wonderful, Linux sound processing sucks in
comparison to the other OSs out there doing it in kernel.

> I think your misunderstanding is that you believe user-space can't do
> RT.  It's wrong.  See JACK (jackit.sf.net), for example.

I know JACK in and out. It doesn't provide what you claim.


Re: [PATCH] embarassing typo

2005-03-29 Thread Dmitry Torokhov
On Tuesday 29 March 2005 16:58, Michael Tokarev wrote:
> Well, it's a matter of readability mostly.  For now at least, when
> char is always 8 bytes...

Wow, that's one huge char you have there ;)

-- 
Dmitry


Re: Disc driver is module, software suspend fails

2005-03-29 Thread Jim Carter
On Tue, 29 Mar 2005, Pavel Machek wrote:

> You insmod driver for your swap device, then you echo device numbers
> to /sys... then initiate resume.

So you're saying, let the machine come all the way up, log in as root, 
"echo 8:5 > /sys/power/resume" (I think that was the name), then "echo 
resume > /sys/power/state"?  Hmm, you would have to bypass "swapon -a",
e.g. boot with the -b kernel parameter.  

Or I'll bet one could do something equivalent in the initrd -- much more 
user friendly.  But the friendliest of all would be if the swsusp resume 
call were not a late_initcall but rather were called just before the root 
was mounted, after the initrd (if any) had loaded whatever modules.  I 
think you're confirming that that approach would not blow up the kernel -- 
if it will work with the root mounted and user space in full roar (well, 
skimpy roar with the -b switch), then it's got to be OK at the earlier 
time.

I'll see what I can do.


James F. Carter  Voice 310 825 2897FAX 310 206 6673
UCLA-Mathnet;  6115 MSA; 405 Hilgard Ave.; Los Angeles, CA, USA 90095-1555
Email: [EMAIL PROTECTED]  http://www.math.ucla.edu/~jimc (q.v. for PGP key)


RE: Industry db benchmark result on recent 2.6 kernels

2005-03-29 Thread Chen, Kenneth W
Nick Piggin wrote on Tuesday, March 29, 2005 5:32 PM
> If it is doing a lot of mapping/unmapping (or fork/exit), then that
> might explain why 2.6.11 is worse.
>
> Fortunately there are more patches to improve this on the way.

Once benchmark reaches steady state, there is no mapping/unmapping
going on.  Actually, the virtual address space for all the processes
are so stable at steady state that we don't even see it grow or shrink.




Re: Industry db benchmark result on recent 2.6 kernels

2005-03-29 Thread Nick Piggin
Linus Torvalds wrote:
> On Tue, 29 Mar 2005, Chen, Kenneth W wrote:
>> Linus Torvalds wrote on Tuesday, March 29, 2005 4:00 PM
>>> The fact that it seems to fluctuate pretty wildly makes me wonder
>>> how stable the numbers are.
>>
>> I can't resist myself from bragging. The high point in the fluctuation
>> might be because someone is working hard trying to make 2.6 kernel run
>> faster.  Hint hint hint .  ;-)
>
> Heh. How do you explain the low-point? If there's somebody out there
> working hard on making it run slower, I want to whack the guy ;)

If it is doing a lot of mapping/unmapping (or fork/exit), then that
might explain why 2.6.11 is worse.

Fortunately there are more patches to improve this on the way.

Kernel profiles would be useful if possible.


Re: memcpy(a,b,CONST) is not inlined by gcc 3.4.1 in Linux kernel

2005-03-29 Thread Gerold Jury

>> On Tue, Mar 29, 2005 at 05:37:06PM +0300, Denis Vlasenko wrote:
>> > /*
>> >  * This looks horribly ugly, but the compiler can optimize it totally,
>> >  * as the count is constant.
>> >  */
>> > static inline void * __constant_memcpy(void * to, const void * from,
>> > size_t n) {
>> > if (n <= 128)
>> > return __builtin_memcpy(to, from, n);
>>
>> The problem is that in GCC < 4.0 there is no constant propagation
>> pass before expanding builtin functions, so the __builtin_memcpy
>> call above sees a variable rather than a constant.
>
>or change "size_t n" to "const size_t n" will also fix the issue.
>As we do some (well very little and with inlining and const values)
>const progation before 4.0.0 on the trees before expanding the builtin.
>
>-- Pinski
I used the following "const size_t n" change on x86_64
and it reduced the memcpy count from 1088 to 609 with my setup and gcc 3.4.3.
(kernel 2.6.12-rc1, running now)

--- include/asm-x86_64/string.h.~1~ 2005-03-02 08:38:33.0 +0100
+++ include/asm-x86_64/string.h 2005-03-30 03:24:35.0 +0200
@@ -28,9 +28,9 @@
function. */

 #define __HAVE_ARCH_MEMCPY 1
-extern void *__memcpy(void *to, const void *from, size_t len);
+extern void *__memcpy(void *to, const void *from, const size_t len);
 #define memcpy(dst,src,len) \
-   ({ size_t __len = (len);\
+   ({ const size_t __len = (len);  \
   void *__ret; \
   if (__builtin_constant_p(len) && __len >= 64)\
 __ret = __memcpy((dst),(src),__len);   \
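
A stand-alone sketch of the same dispatch pattern, for experimenting outside
the kernel tree. It assumes a GNU C compiler (statement expressions and
__builtin_constant_p); demo_memcpy and demo_out_of_line_memcpy are
hypothetical names, and the else branch is my assumption since the hunk above
stops before it. Which branch a non-literal length takes depends on the
compiler version and optimization level, so the example only checks the
copied bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the out-of-line __memcpy(). */
static void *demo_out_of_line_memcpy(void *to, const void *from,
                                     const size_t len)
{
    return memmove(to, from, len);
}

/* Same shape as the macro in the patch, with the const-qualified length. */
#define demo_memcpy(dst, src, len)                                      \
    ({ const size_t __len = (len);                                      \
       void *__ret;                                                     \
       if (__builtin_constant_p(len) && __len >= 64)                    \
           __ret = demo_out_of_line_memcpy((dst), (src), __len);        \
       else                                                             \
           __ret = __builtin_memcpy((dst), (src), __len);               \
       __ret; })

/* Returns 1 if both copies reproduced the source bytes. */
static int memcpy_dispatch_example(void)
{
    char src[128], a[128], b[128];
    size_t n = 100;

    for (size_t i = 0; i < sizeof src; i++)
        src[i] = (char)i;

    demo_memcpy(a, src, 128);   /* literal constant >= 64: out-of-line helper */
    demo_memcpy(b, src, n);     /* variable: branch taken is compiler dependent */

    return memcmp(a, src, 128) == 0 && memcmp(b, src, n) == 0;
}
```

Counting the remaining out-of-line calls in the generated code (for example
by looking at the disassembly) is how the numbers above would be reproduced.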


Re: [PATCH] ppc32: CPM2 PIC cleanup irq_to_siubit array

2005-03-29 Thread Dan Malek
On Mar 29, 2005, at 5:30 PM, Kumar Gala wrote:
> Cleaned up irq_to_siubit array so we no longer need to do 1 << (31-bit),
> just 1 << bit.

Will you please put a comment in here that indicates this array now
has this computation done?  When I wrote it, these bit numbers
matched the registers and the documentation, so I didn't take
the time to explain. :-)

Thanks.

-- Dan
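
A small sketch of the equivalence such a comment would document, assuming the
documentation numbers bits with 0 as the most significant bit of a 32-bit
register. The function names are hypothetical and no real irq_to_siubit
values are reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/* Old scheme: the table holds documentation bit numbers (bit 0 = MSB). */
static uint32_t mask_from_doc_bit(unsigned doc_bit)
{
    return UINT32_C(1) << (31u - doc_bit);
}

/* New scheme: the table entry already has the 31 - bit conversion applied. */
static unsigned precomputed_entry(unsigned doc_bit)
{
    return 31u - doc_bit;
}

static uint32_t mask_from_entry(unsigned entry)
{
    return UINT32_C(1) << entry;
}

/* Returns 1 if both schemes yield the same mask for every bit 0..31. */
static int schemes_agree(void)
{
    for (unsigned doc_bit = 0; doc_bit < 32; doc_bit++)
        if (mask_from_doc_bit(doc_bit) !=
            mask_from_entry(precomputed_entry(doc_bit)))
            return 0;
    return 1;
}

static void siubit_example(void)
{
    /* Documentation bit 0 is the top bit of the register. */
    assert(mask_from_doc_bit(0) == UINT32_C(0x80000000));
    assert(mask_from_entry(precomputed_entry(0)) == UINT32_C(0x80000000));
    assert(schemes_agree());
}
```

The cost of the cleanup is the one Dan points out: the stored numbers no
longer match the manual, hence the request for a comment.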


Re: [patch 1/2] fork_connector: add a fork connector

2005-03-29 Thread Paul Jackson
[ Hmmm .. the following pertains more to accounting than to fork_connector,
  as have my other remarks earlier today.  I notice just now I am on a thread
  whose Subject is "fork_connector".   Oh well.  Sorry.  - pj ]

Jay wrote:
> You probably can look at it this way: the accounting data being
> written out by BSD are per process data and the fork connector
> provides information needed to group processes into process
> aggregates.

I guess so.  Though that doesn't provide any explicit guidance as to
what the necessary dataflow must be -- who (which essential piece(s) of
software) needs the data when, to accomplish what purposes that some
Linux users will desire.

Well, maybe to someone expert in Process Aggregates, it provides
such guidance, implicitly.  That's definitely not I.

  Let me step back a minute here.

  What's needed is to work from the actual user requirements down
  to what technical pieces are needed.  There's an old saying
  that if you want something done bad enough, do it yourself.

  Or, on usenet and now on mailing lists, this has become:
  if you want something done, post a sufficiently botched
  example yourself, and someone who actually knows will become
  sufficiently annoyed to post a useful answer.

  So here goes my botched effort to work from user requirements
  down to actual technical pieces needed.  I look forward to
  being shot down in flames.

My current understanding of the 'system accounting' requirement
is that users of large shared resource servers want to determine,
after the fact, what was the usage by or for various
tasks/jobs/users/groups/time-periods of various compute
resources, in order to perform such tasks as billing and sizing
of future equipment needs, and to identify patterns of over or
under utilized system resources that might present other
opportunities for useful action, or causes for remedial action.

I am working under the assumption that there is some accounting
(of computer users and resources, not of money ;) software
(runacct, CSA, and ELSA, for example) that runs, after the fact
in some post-processing mode, that reads records of actual usage
details from disk files and does useful stuff (like generate
reports useful to the above requirements) with what it can glean
from those records and from other configuration information
it can find about the current system (by reading other disk
files, typically).  This processing can be and often is done
in batch mode, and is often scheduled out of a cron job for some
time when the system is normally under relatively lighter load,
such as late at night.

I assume that the information needed by this accounting software
includes both the classic BSD accounting records and the
 information at fork.

I am not aware of any other uses of the 
information from fork, though it would not surprise me to learn
that there are other such uses - you're welcome to educate me on
this matter.

I suspect that there is other information, or will be, in
addition to the specific details collected by the classic bsd
accounting kernel hooks, and in addition to the  information at fork, which will also be needed by CSA
and/or ELSA, and which also needs to be written to disk files
as the data is collected, for subsequent processing by such
accounting software as CSA and ELSA, or the classic runacct(1M)
daily accounting software and variants.

If the above is all true, then the basic problem to solve
regarding the  information collected at
fork is how to get it into a disk file, with close to minimum
impact on the system.

Since the data is not needed in anything like realtime (or
if it is, I don't realize that yet), there is an
opportunity to combine the data records into buffers of data,
so as to amortize some of the costs of writing the data to
disk over several records.  The classic bsd accounting hooks
do this merging aggressively, in the context of the process
doing the exit.
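To make the amortization argument concrete, here is a rough user-space sketch of batching fixed-size records so that many records share one write.  It is not the BSD accounting code and not the fork_connector code; the names rec_buf, REC_SIZE and RECS_PER_BUF are made up for illustration, and 64 bytes is used only because that is the typical size quoted for the classic records.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define REC_SIZE     64     /* assumed record size (illustrative) */
#define RECS_PER_BUF 128    /* hypothetical batch size */

struct rec_buf {
    unsigned char data[REC_SIZE * RECS_PER_BUF];
    size_t used;            /* records currently held in the buffer */
    unsigned long flushes;  /* number of batched writes issued so far */
    FILE *out;              /* destination file */
};

/* Write all buffered records with a single call; 0 on success, -1 on error. */
static int rec_buf_flush(struct rec_buf *b)
{
    if (b->used == 0)
        return 0;
    if (fwrite(b->data, REC_SIZE, b->used, b->out) != b->used)
        return -1;
    b->used = 0;
    b->flushes++;
    return 0;
}

/* Append one record, flushing first only when the buffer is already full. */
static int rec_buf_add(struct rec_buf *b, const void *rec)
{
    assert(b->used <= RECS_PER_BUF);
    if (b->used == RECS_PER_BUF && rec_buf_flush(b) != 0)
        return -1;
    memcpy(b->data + b->used * REC_SIZE, rec, REC_SIZE);
    b->used++;
    return 0;
}
```

With these made-up sizes, 300 records cost three writes (two automatic flushes plus a final explicit one) instead of 300.  The cost is that records sit in memory until the buffer fills, which only matters if something does turn out to need them in near realtime.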

The classic accounting hooks may have a problem that they are not
NUMA friendly - having all the nodes in a big system trying to
simultaneously add small (64 bytes, typically) snippets to the
same shared file buffers at the same time might not scale well.
These hooks were designed over 25 years ago, when multiprocessing
was in its infancy, and may need overhaul.

The fork_connector mechanism is being proposed to get the
particular bit of information  from
fork moved to what I presume is a data collector daemon user
process, which will I presume then write merged records of
this data to disk.  This may have the problem that it moves
the individual records between various contexts on the system,
more than is necessary, before it can be merged into buffers
and written.  While such data motion does not happen inline
to the fork itself, it still has to occur in near realtime
(minutes) of the fork event, so still impacts system performance
(both CPU cycles and memory footprint) during peak usage hours.
Performance impact numbers have been 

[PATCH 2.6.12-rc1-mm3] m32r: m32r_sio driver update (was Re: [PATCH] Re: Bitrotting serial drivers)

2005-03-29 Thread Hirokazu Takata
Hello, 

Here is an additional patch to update m32r_sio driver.
This patch is against 2.6.12-rc1-mm3.

m32r_sio driver updates:
- Move m32r_sio specific description from asm-m32r/serial.h to 
  drivers/serial/m32r_sio.c.
- Remove __register_m32r_sio, register_m32r_sio and unregister_m32r_sio
  from drivers/serial/m32r_sio.c.

Thank you.

From: Russell King <[EMAIL PROTECTED]>
Subject: Re: [PATCH] Re: Bitrotting serial drivers
Date: Thu, 24 Mar 2005 12:17:46 +
> On Thu, Mar 24, 2005 at 07:14:24PM +0900, Hirokazu Takata wrote:
> > diff -ruNp a/include/asm-m32r/serial.h b/include/asm-m32r/serial.h
> > --- a/include/asm-m32r/serial.h 2004-12-25 06:35:40.0 +0900
> > +++ b/include/asm-m32r/serial.h 2005-03-24 17:25:05.812651363 +0900
> 
> Can m32r accept PCMCIA cards?  If so, this may mean that 8250.c gets
> built, which will use this file to determine where it should look for
> built-in 8250 ports.
> 
> If this file is used to describe non-8250 compatible ports, you could
> end up with a nasty mess.  Therefore, I recommend that you do not use
> asm-m32r/serial.h to describe your SIO ports.
> 
> Instead, since these definitions are private to your own driver, you
> may consider moving them into the driver, or a header file closely
> associated with your driver in drivers/serial.


Signed-off-by: Hirokazu Takata <[EMAIL PROTECTED]>
---

 drivers/serial/m32r_sio.c |  131 ++
 include/asm-m32r/serial.h |   41 --
 2 files changed, 31 insertions(+), 141 deletions(-)


diff -ruNp a/include/asm-m32r/serial.h b/include/asm-m32r/serial.h
--- a/include/asm-m32r/serial.h 2005-03-29 21:47:12.912822762 +0900
+++ b/include/asm-m32r/serial.h 2005-03-29 18:15:37.0 +0900
@@ -1,47 +1,10 @@
 #ifndef _ASM_M32R_SERIAL_H
 #define _ASM_M32R_SERIAL_H
 
-/*
- * include/asm-m32r/serial.h
- */
+/* include/asm-m32r/serial.h */
 
 #include 
-#include 
 
-/*
- * This assumes you have a 1.8432 MHz clock for your UART.
- *
- * It'd be nice if someone built a serial card with a 24.576 MHz
- * clock, since the 16550A is capable of handling a top speed of 1.5
- * megabits/second; but this requires the faster clock.
- */
-#define BASE_BAUD ( 1843200 / 16 )
-
-/* Standard COM flags */
-#define STD_COM_FLAGS (ASYNC_BOOT_AUTOCONF | ASYNC_SKIP_TEST)
-
-/* Standard PORT definitions */
-#if defined(CONFIG_PLAT_USRV)
-
-#define STD_SERIAL_PORT_DEFNS  \
-   /* UART  CLK PORT   IRQ FLAGS */ \
-   { 0, BASE_BAUD, 0x3F8, PLD_IRQ_UART0, STD_COM_FLAGS }, /* ttyS0 */ \
-   { 0, BASE_BAUD, 0x2F8, PLD_IRQ_UART1, STD_COM_FLAGS }, /* ttyS1 */
-
-#else /* !CONFIG_PLAT_USRV */
-
-#if defined(CONFIG_SERIAL_M32R_PLDSIO)
-#define STD_SERIAL_PORT_DEFNS  \
-   { 0, BASE_BAUD, ((unsigned long)PLD_ESIO0CR), PLD_IRQ_SIO0_RCV, \
- STD_COM_FLAGS }, /* ttyS0 */
-#else
-#define STD_SERIAL_PORT_DEFNS  \
-   { 0, BASE_BAUD, M32R_SIO_OFFSET, M32R_IRQ_SIO0_R,   \
- STD_COM_FLAGS }, /* ttyS0 */
-#endif
-
-#endif /* !CONFIG_PLAT_USRV */
-
-#define SERIAL_PORT_DFNS   STD_SERIAL_PORT_DEFNS
+#define BASE_BAUD  115200
 
 #endif  /* _ASM_M32R_SERIAL_H */
diff -ruNp a/drivers/serial/m32r_sio.c b/drivers/serial/m32r_sio.c
--- a/drivers/serial/m32r_sio.c 2005-03-29 21:47:12.924820913 +0900
+++ b/drivers/serial/m32r_sio.c 2005-03-29 21:56:38.001930365 +0900
@@ -54,13 +54,6 @@
 #include "m32r_sio_reg.h"
 
 /*
- * Configuration:
- *   share_irqs - whether we pass SA_SHIRQ to request_irq().  This option
- *is unsafe when used on edge-triggered interrupts.
- */
-unsigned int share_irqs_sio = M32R_SIO_SHARE_IRQS;
-
-/*
  * Debugging.
  */
 #if 0
@@ -86,15 +79,36 @@ unsigned int share_irqs_sio = M32R_SIO_S
 
 #include 
 
+/* Standard COM flags */
+#define STD_COM_FLAGS (ASYNC_BOOT_AUTOCONF | ASYNC_SKIP_TEST)
+
 /*
  * SERIAL_PORT_DFNS tells us about built-in ports that have no
  * standard enumeration mechanism.   Platforms that can find all
  * serial ports via mechanisms like ACPI or PCI need not supply it.
  */
-#ifndef SERIAL_PORT_DFNS
-#define SERIAL_PORT_DFNS
+#undef SERIAL_PORT_DFNS
+#if defined(CONFIG_PLAT_USRV)
+
+#define SERIAL_PORT_DFNS   \
+   /* UART  CLK PORT   IRQ FLAGS */ \
+   { 0, BASE_BAUD, 0x3F8, PLD_IRQ_UART0, STD_COM_FLAGS }, /* ttyS0 */ \
+   { 0, BASE_BAUD, 0x2F8, PLD_IRQ_UART1, STD_COM_FLAGS }, /* ttyS1 */
+
+#else /* !CONFIG_PLAT_USRV */
+
+#if defined(CONFIG_SERIAL_M32R_PLDSIO)
+#define SERIAL_PORT_DFNS   \
+   { 0, BASE_BAUD, ((unsigned long)PLD_ESIO0CR), PLD_IRQ_SIO0_RCV, \
+ STD_COM_FLAGS }, /* ttyS0 */
+#else
+#define SERIAL_PORT_DFNS   \
+   { 0, BASE_BAUD, M32R_SIO_OFFSET, 

[PATCH 2.6.12-rc1] m32r: Fix spinlock.h for CONFIG_DEBUG_SPINLOCK

2005-03-29 Thread Hirokazu Takata
This patch is for fixing a build error of asm-m32r/spinlock.h
for CONFIG_DEBUG_SPINLOCK.
Please apply.

Thanks,

Signed-off-by: Hirokazu Takata <[EMAIL PROTECTED]>
---

 include/asm-m32r/spinlock.h |6 ++
 1 files changed, 2 insertions(+), 4 deletions(-)

diff -ruNp a/include/asm-m32r/spinlock.h b/include/asm-m32r/spinlock.h
--- a/include/asm-m32r/spinlock.h   2005-03-07 14:10:57.0 +0900
+++ b/include/asm-m32r/spinlock.h   2005-03-08 14:08:57.0 +0900
@@ -102,10 +102,8 @@ static inline void _raw_spin_lock(spinlo
unsigned long tmp0, tmp1;
 
 #ifdef CONFIG_DEBUG_SPINLOCK
-   __label__ here;
-here:
-   if (lock->magic != SPINLOCK_MAGIC) {
-   printk("pc: %p\n", &&here);
+   if (unlikely(lock->magic != SPINLOCK_MAGIC)) {
+   printk("pc: %p\n", __builtin_return_address(0));
BUG();
}
 #endif

--
Hirokazu Takata <[EMAIL PROTECTED]>
Linux/M32R Project:  http://www.linux-m32r.org/



Re: [patch] use cheaper elv_queue_empty when unplug a device

2005-03-29 Thread Nick Piggin
Nick Piggin wrote:
> Jens Axboe wrote:
> > Looks good, I've been toying with something very similar for a long time
> > myself.
>
> Here is another thing I just noticed that should further reduce the
> locking by at least 1, sometimes 2 lock/unlock pairs per request.
> At the cost of uglifying the code somewhat. Although it is pretty
> nicely contained, so Jens you might consider it acceptable as is,
> or we could investigate how to make it nicer if Kenneth reports some
> improvement.
>
> Note, this isn't runtime tested - it could easily have a bug.

OK - I have booted this on a 4-way SMP with SCSI disks, and done
some IO tests, and no hangs.

So Kenneth if you could look into this one as well, to see if
it is worthwhile, that would be great.


RE: Industry db benchmark result on recent 2.6 kernels

2005-03-29 Thread Linus Torvalds


On Tue, 29 Mar 2005, Chen, Kenneth W wrote:
>
> Linus Torvalds wrote on Tuesday, March 29, 2005 4:00 PM
> > The fact that it seems to fluctuate pretty wildly makes me wonder
> > how stable the numbers are.
> 
> I can't resist bragging. The high point in the fluctuation
> might be because someone is working hard trying to make the 2.6 kernel run
> faster.  Hint hint hint .  ;-)

Heh. How do you explain the low-point? If there's somebody out there 
working hard on making it run slower, I want to whack the guy ;)

Good luck with the million-dollar grants, btw. We're all rooting for you, 
and hope your manager is a total push-over.

Linus


Re: drivers/net/at1700.c: at1700_probe1: array overflow

2005-03-29 Thread null
On Fri, 25 Mar 2005, Adrian Bunk wrote:

> Date: Fri, 25 Mar 2005 21:38:20 +0100
> From: Adrian Bunk <[EMAIL PROTECTED]>
> To: Roland Dreier <[EMAIL PROTECTED]>
> Cc: [EMAIL PROTECTED], linux-net@vger.kernel.org,
>  linux-kernel@vger.kernel.org
> Subject: Re: drivers/net/at1700.c: at1700_probe1: array overflow
>
> On Fri, Mar 25, 2005 at 10:42:11AM -0800, Roland Dreier wrote:
> > Adrian> This can result in indexing in an array with 8 entries the
> > Adrian> 10th entry.
> >
> > Well, not really, since the first 8 entries of the array have every
> > 3-bit pattern.  So pos3 & 0x07 will always match one of them.
> >
> > I agree it would be cleaner to make the loop only go up to 7 though.
>
> You either have this (impossible) overflow, or the case l_i == 7 isn't
> tested explicitly.
>
> I'd say simply leave it as it is now.
>
> But if no one disagrees, I'm inclined to add a comment.
>
> >  - R.
>
> cu
> Adrian
>

But on the other hand why loop if you don't have to?

static int at1700_ioaddr_pattern[] __initdata = {
- 0x00, 0x04, 0x01, 0x05, 0x02, 0x06, 0x03, 0x07
+ 0x00, 0x02, 0x04, 0x06, 0x01, 0x03, 0x05, 0x07
};
...

static int __init at1700_probe1(struct net_device *dev, int ioaddr)
{
...
-   for (l_i = 0; l_i < 0x09; l_i++)
-   if (( pos3 & 0x07) == at1700_ioaddr_pattern[l_i])
-   break;
-   ioaddr = at1700_mca_probe_list[l_i];
+   ioaddr = at1700_mca_probe_list[at1700_ioaddr_pattern[pos3&7]];
...
}
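
The reordered table works because it is the inverse permutation of the old one: the old table maps index to bit pattern, the new one maps bit pattern straight back to index.  A small stand-alone check, with array and function names made up for illustration (only the table values come from the patch above), might look like this:

```c
#include <assert.h>

/* Old table, as in the driver: index -> 3-bit pattern. */
static const int pattern_by_index[8] = {
    0x00, 0x04, 0x01, 0x05, 0x02, 0x06, 0x03, 0x07
};

/* Proposed table: 3-bit pattern -> index (inverse of the table above). */
static const int index_by_pattern[8] = {
    0x00, 0x02, 0x04, 0x06, 0x01, 0x03, 0x05, 0x07
};

/* Linear search in the style of the existing loop, bounded to 8 entries. */
static int index_by_search(int pos3)
{
    int i;

    for (i = 0; i < 8; i++)
        if ((pos3 & 0x07) == pattern_by_index[i])
            break;
    /* The table holds every 3-bit value, so a match is always found. */
    assert(i < 8);
    return i;
}

/* Direct lookup in the style of the proposed one-liner. */
static int index_by_lookup(int pos3)
{
    return index_by_pattern[pos3 & 0x07];
}

/* Returns 1 if both methods agree for every possible byte value of pos3. */
static int lookup_matches_search(void)
{
    int pos3;

    for (pos3 = 0; pos3 < 256; pos3++)
        if (index_by_search(pos3) != index_by_lookup(pos3))
            return 0;
    return 1;
}
```

Since both functions agree on all 256 byte values, the lookup yields the same index into at1700_mca_probe_list as the loop, and the question of whether the loop bound should be 8 or 9 goes away.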



RE: Industry db benchmark result on recent 2.6 kernels

2005-03-29 Thread Chen, Kenneth W
Linus Torvalds wrote on Tuesday, March 29, 2005 4:00 PM
> The fact that it seems to fluctuate pretty wildly makes me wonder
> how stable the numbers are.

I can't resist bragging. The high point in the fluctuation
might be because someone is working hard trying to make the 2.6 kernel run
faster.  Hint hint hint .  ;-)




Re: i386/x86_64 segment register issuses (Re: PATCH: Fix x86 segment register access)

2005-03-29 Thread Linus Torvalds


On Mon, 28 Mar 2005, Andi Kleen wrote:
>
> "H. J. Lu" <[EMAIL PROTECTED]> writes:
> > The new assembler will disallow them since those instructions with
> > memory operand will only use the first 16bits. If the memory operand
> > is 16bit, you won't see any problems. But if the memory destination
> > is 32bit, the upper 16bits may have random values. The new assembler
> 
> Does it really have random values on existing x86 hardware?

The upper bits are not written at all, so it's not random.

> If it is only a "theoretical" problem that does not happen
> in practice I would advise to not do the change.

My preference too. The reason we use "movl" is because we really do want 
the 32-bit versions, since they are faster. It's a conscious choice. In 
contrast "movw" generates bigger and slower code on all assemblers out 
there, and "mov" doesn't make it clear which one it is. Is it the slow 
one, or the fast one? 

For example, "mov %ds,%eax" does seem to generate the (faster) 32-bit code
on modern assemblers, while "mov %ds,%ax" generates (slower) 16-bit code 
that leaves the high bits of %eax untouched. Sometimes you may want the 
slower one, sometimes the faster one. I have this pretty strong memory of 
old versions of gas not making any difference between %ax and %eax as a 
target, and that you really needed to set the size explicitly.

Now, those versions of gas may be so old that nobody cares, but the
explicit size still is a GOOD THING. The size DOES MATTER. People who want
the smaller and faster version do not want to just rely on gas
automatically getting it right, especially since gas has historically been
very very bad at getting things right.

What is the advantage of not allowing "movl %ds,mem"? Really? Especially
since I suspect the kernel is pretty much the only one who does this, and
the kernel really does do it on purpose. The kernel explicitly wants the
32-bit version, knowing that the upper bits are undefined.

Linus


Re: How to measure time accurately.

2005-03-29 Thread Chris Friesen
Peter Chubb wrote:
> "Chris" == Chris Friesen <[EMAIL PROTECTED]> writes:
>
> Chris> Most cpus have some way of getting at a counter or decrementer
> Chris> of various frequencies.  Usually it requires low-level hardware
> Chris> knowledge and often it needs assembly code.
>
> As a device driver is inside the linux kernel (unless you're writing a
> user-mode device driver :-)) you can use the get_cycles() macro that's
> defined for most architectures.  It provides a snapshot of the
> cycle-counter.

For ppc this only gives 32-bit values, which overflow every 129 seconds 
on my G5.  Depending on how long you're trying to time, this could be a 
problem.
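
For intervals shorter than one wrap, the usual trick is plain unsigned subtraction, which stays correct across a single rollover.  A minimal user-space sketch follows; the function names are made up, and the 33.3 MHz rate is only what the 129-second figure implies (2^32 / 129 s), not a measured value.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Ticks elapsed between two samples of a free-running 32-bit counter.
 * Unsigned subtraction is modulo 2^32, so the result is right even if
 * the counter wrapped once between the samples.  It is wrong if more
 * than one full period elapsed.
 */
static uint32_t ticks_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Seconds until a 32-bit counter ticking at hz ticks per second wraps. */
static double wrap_period_seconds(double hz)
{
    assert(hz > 0.0);
    return 4294967296.0 / hz;
}
```

Anything that may run longer than one period needs a 64-bit count, for example by sampling often enough to accumulate the 32-bit deltas into a wider total.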

Chris


RE: Industry db benchmark result on recent 2.6 kernels

2005-03-29 Thread Chen, Kenneth W
On Mon, 28 Mar 2005, Chen, Kenneth W wrote:
> With that said, here goes our first data point along with some historical data
> we have collected so far.
>
> 2.6.11    -13%
> 2.6.9     - 6%
> 2.6.8     -23%
> 2.6.2     - 1%
> baseline  (rhel3)

Linus Torvalds wrote on Tuesday, March 29, 2005 4:00 PM
> How repeatable are the numbers across reboots with the same kernel? Some
> benchmarks will depend heavily on just where things land in memory,
> especially with things like PAE or even just cache behaviour (ie if some
> frequently-used page needs to be kmap'ped or not depending on where it
> landed).

Very repeatable.  This workload is very steady and resolution in throughput
is repeatable down to 0.1%.  We toss everything below that level as noise.


> You don't have the PAE issue on ia64, but there could be other issues.
> Some of them just disk-layout issues or similar, ie performance might
> change depending on where on the disk the data is written in relationship
> to where most of the reads come from etc etc. The fact that it seems to
> fluctuate pretty wildly makes me wonder how stable the numbers are.

This workload has been around for 10+ years, and people at Intel have studied
its characteristics inside out for all of that time.  Every stone gets turned
more than once while we tune the entire setup to make sure everything is well
balanced.  And we retune the system whenever there is a hardware change.  Data
layout across the disk spindles is very well balanced.


> Also, it would be absolutely wonderful to see a finer granularity (which
> would likely also answer the stability question of the numbers). If you
> can do this with the daily snapshots, that would be great. If it's not
> easily automatable, or if a run takes a long time, maybe every other or
> every third day would be possible?

I will certainly let my management know that Linus wants to see the
performance numbers on a daily basis (I will ask my manager for a couple of
million dollars for this project :-))



