Re: [patch] CFS scheduler, -v6

2007-04-29 Thread Ingo Molnar

* Willy Tarreau <[EMAIL PROTECTED]> wrote:

> I don't know if Mike still has problems with SD, but there are now 
> several interesting reports of SD giving better feedback than CFS on 
> real work. In my experience, CFS seems smoother on *technical* tests, 
> which I agree that they do not really simulate real work.

well, there are several reports of CFS being significantly better than 
SD on a number of workloads - and i know of only two reports where SD 
was reported to be better than CFS: in Kasper's test (where i'd like to 
know what the "3D stuff" he uses is and take a good look at that 
workload), and another 3D report which was done against -v6. (And even 
in these two reports the 'smoothness advantage' was not dramatic. If you 
know of any other reports then please let me know!)

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] CFS scheduler, -v7

2007-04-29 Thread Ingo Molnar

* S.Çağlar Onur <[EMAIL PROTECTED]> wrote:

> Ingo, please ignore my first report until I find a proper way to 
> reproduce the slowness, because currently CFS-v7, CFS-v7 + "renice 
> patch", CFS-v7 + renice + your private mail suggestions and CFS-v6 + 
> "PI support for futexes patch" all seem to work equally well (which is a 
> good thing, so X renicing seems really not needed, [...]

oh, good!

> [...] and there was no regression, just my daydreams) or I'm too 
> tired to understand the differences.

could the CPU have dropped speed for that bootup (some CPUs do that 
automatically upon overheating), or perhaps if you are using some RAID 
array, could it have done a background resync? Especially the bootup 
slowdown you saw seemed significant, and because bootup speed is 90% IO 
dominated, the CPU scheduler seems an unlikely candidate.

Ingo


Re: [2.6.21-git2] sk_buff changes break Cisco VPN client

2007-04-29 Thread Alessandro Suardi

On 4/29/07, David Miller <[EMAIL PROTECTED]> wrote:

> From: Roland Dreier <[EMAIL PROTECTED]>
> Date: Sat, 28 Apr 2007 14:05:27 -0700
>
> > However I can suggest vpnc (http://www.unix-ag.uni-kl.de/~massar/vpnc/)
> > as an alternative.  I'm not forced to use Cisco VPN access any more,
> > but when I tried it, vpnc was tons better than the Cisco product.
>
> Also, and I know this might be a COMPLETE SHOCK to some people, but we
> do have a full in-kernel IPSEC stack and using it with openswan to
> connect to VPNs works perfectly fine.
>
> I use it every day.
>
> It's quite amusing that people use a userland IPSEC implementation
> via VPNC, in spite of this.

Have a look here

https://lists.dulug.duke.edu/pipermail/dulug/2007-March/010792.html

where someone seems to be in the same boat as me. No, openswan/IPSEC
does not work in all configurations with Cisco VPN concentrators -
and by Murphy's law, I'm in the non-working configuration.

Hope this clarifies the reason I asked. And if anyone out there
is in doubt about it, I absolutely hate having to rebuild an out
of kernel module every time I build a kernel, crossing fingers
it doesn't break (again). I would use *anything* else.

--alessandro

"Did you get married but forgot to get divorced ?"

(Danny and Dusty, 'The Good Old Days')


Re: High Resolution Timer DOS

2007-04-29 Thread Ingo Molnar

* Lee Revell <[EMAIL PROTECTED]> wrote:

> > Well, it is not really a DoS. The rescheduling of the process is 
> > limited by the scheduler and the available CPU time (depending on 
> > the number of runnable tasks in the system).
> 
> Shouldn't an unprivileged process be rate limited somehow to avoid 
> flooding the machine with interrupts?  We restrict nonroot users from 
> setting the RTC interrupt rate higher than 64Hz for a similar reason 
> (granted, this limit dates back to the 486 days and should probably be 
> increased to 1024 Hz).

No. An interrupt in this case is really just 'CPU time used up', and an 
unprivileged process can take up as much CPU time as the scheduler 
allows. So it's _not_ a DoS, and neither is any other unprivileged 
infinite loop (or high-rate context-switching task) a DoS.

Ingo


Re: [patch] CFS scheduler, -v6

2007-04-29 Thread Willy Tarreau
On Sun, Apr 29, 2007 at 08:59:01AM +0200, Ingo Molnar wrote:
> 
> * Willy Tarreau <[EMAIL PROTECTED]> wrote:
> 
> > I don't know if Mike still has problems with SD, but there are now 
> > several interesting reports of SD giving better feedback than CFS on 
> > real work. In my experience, CFS seems smoother on *technical* tests, 
> > which I agree that they do not really simulate real work.
> 
> well, there are several reports of CFS being significantly better than 
> SD on a number of workloads - and i know of only two reports where SD 
> was reported to be better than CFS: in Kasper's test (where i'd like to 
> know what the "3D stuff" he uses is and take a good look at that 
> workload), and another 3D report which was done against -v6. (And even 
> in these two reports the 'smoothness advantage' was not dramatic. If you 
> know of any other reports then please let me know!)

There was Caglar Onur too but he said he will redo all the tests. I'm
not tracking all tests nor versions, so it might be possible that some
of the differences vanish with v7.

In fact, what I'd like to see in 2.6.22 is something better for everybody
and with *no* regression, even if it's not perfect. I had the feeling
that SD matched that goal right now, except for Mike who has not tested
recent versions. Don't get me wrong, I still think that CFS is a more
interesting long-term target. But it may require more time to satisfy
everyone. At least with one of them in 2.6.22, we won't waste time
comparing to current mainline.

>   Ingo

Willy



Re: Linux 2.6.21

2007-04-29 Thread David Lang

On Sat, 28 Apr 2007, David Miller wrote:

> From: "Markus Rechberger" <[EMAIL PROTECTED]>
> Date: Sun, 29 Apr 2007 00:58:09 +0200
>
> > On 4/29/07, Linus Torvalds <[EMAIL PROTECTED]> wrote:
> > > On Sat, 28 Apr 2007, Adrian Bunk wrote:
> > > > We are already quite good at ignoring bug reports that come through
> > > > linux-kernel, and it's an _advantage_ of the kernel Bugzilla to see more
> > > > than 1600 open bugs because this tells how bad we are at handling bugs.
> > >
> > > No, it just shows that bugzilla doesn't matter for most of the kernel.
> > >
> > > Don't say that "bugzilla tells how bad we are at handling bugs". It tells
> > > how bad *bugzilla* is for handling bugs, nothing more.
> >
> > I totally disagree here, bugzilla is a very good tool.
>
> No, Bugzilla really does suck, and I personally refuse to use it when
> I have a choice.  And guess what?  You better be concerned about that
> because I maintain all of the networking code :-)
>
> It puts the onus FAR too much on the developer and not enough on the
> reporter and other minions.  We have a small resource of developers,
> yet lots of users, bug reporters, and minions, so something that
> doesn't take advantage of the larger resource we have is going to
> not function efficiently at all.  Yet that is what bugzilla does.


I'll say that as a user I hate having to deal with bugzilla.

there's nothing more frustrating than spending a good chunk of time trying to 
find a similar bug, then jumping through all the bugzilla hoops to file a report, 
to eventually (days/weeks later) get a message 'closed because it's a duplicate 
report', then have to go and track down what it's a duplicate of, read through 
that bug report, only to find that it's not solved there either, and to top it 
off, the people working on that bug won't see my report or that I'm available to 
troubleshoot it.


from a user point of view, e-mailing the kernel list (retrying a few days later 
if there is no response) tends to work _much_ better.


David Lang


Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage

2007-04-29 Thread Jeremy Fitzhardinge
Eric W. Biederman wrote:
> All it does is set a flag that tells a bootloader.
> "Hey. I can run when loaded at a non-default address, and this is what
>  you have to align me to."
>
> All relocation processing happens in the kernel itself.
>   

Is it possible to decompress and extract the kernel image from the
bzImage without executing it?  Ie, is there enough information to find
the compressed data part of the bzImage by inspection?

At some point we'll need to change the Xen domain builder to handle
bzImage files, and it would be best if we didn't need to run them.
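
To make the "by inspection" part concrete, here is a rough sketch of reading
the setup header fields under discussion from a bzImage buffer without
executing anything. The field offsets are assumptions taken from my reading
of the x86 boot protocol document (setup_sects, the "HdrS" magic, the
protocol version, kernel_alignment and relocatable_kernel); the struct and
function names are made up for illustration. It only finds where the
protected-mode part starts and does not locate the compressed payload
inside it, which is the open question here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Offsets into the first sectors of a bzImage (assumed, see note above). */
#define BZ_OFF_SETUP_SECTS  0x1f1  /* 1 byte: real-mode setup size in sectors */
#define BZ_OFF_MAGIC        0x202  /* 4 bytes: "HdrS" */
#define BZ_OFF_VERSION      0x206  /* 2 bytes, little endian: protocol version */
#define BZ_OFF_KERNEL_ALIGN 0x230  /* 4 bytes, little endian (protocol 2.05+) */
#define BZ_OFF_RELOCATABLE  0x234  /* 1 byte, non-zero if relocatable (2.05+) */
#define BZ_MIN_HEADER_LEN   (BZ_OFF_RELOCATABLE + 1)

/* Hypothetical result structure, not a kernel or Xen type. */
struct bz_info {
	uint16_t version;          /* boot protocol version, e.g. 0x0205 */
	uint32_t kernel_alignment; /* 0 if the protocol is too old to say */
	int relocatable;           /* 0 if the protocol is too old to say */
	size_t pm_offset;          /* file offset of the protected-mode part */
};

static uint16_t rd16(const unsigned char *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const unsigned char *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if the buffer does not look like a bzImage. */
static int bz_inspect(const unsigned char *img, size_t len, struct bz_info *out)
{
	unsigned int setup_sects;

	assert(img != NULL && out != NULL);

	if (len < BZ_MIN_HEADER_LEN)
		return -1;
	if (memcmp(img + BZ_OFF_MAGIC, "HdrS", 4) != 0)
		return -1;

	out->version = rd16(img + BZ_OFF_VERSION);

	/* A setup_sects of 0 is treated as 4 for backwards compatibility. */
	setup_sects = img[BZ_OFF_SETUP_SECTS];
	if (setup_sects == 0)
		setup_sects = 4;
	out->pm_offset = ((size_t)setup_sects + 1) * 512;

	if (out->version >= 0x0205) {
		out->kernel_alignment = rd32(img + BZ_OFF_KERNEL_ALIGN);
		out->relocatable = img[BZ_OFF_RELOCATABLE] != 0;
	} else {
		out->kernel_alignment = 0;
		out->relocatable = 0;
	}
	return 0;
}
```

A domain builder could call something like bz_inspect() on the first sector
or two of the file and refuse images whose magic or version it does not
understand.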

J


Fwd: Re: [RFC] pata_icside driver

2007-04-29 Thread Russell King
Resend.  Copying linux-ide as requested appears to result in being
ignored. ;(

- Forwarded message from Russell King <[EMAIL PROTECTED]> -

Date:   Sat, 21 Apr 2007 16:09:03 +0100
From:   Russell King <[EMAIL PROTECTED]>
To: linux-kernel@vger.kernel.org,
Andrew Morton <[EMAIL PROTECTED]>,
Jeff Garzik <[EMAIL PROTECTED]>, [EMAIL PROTECTED]
Subject: Re: [RFC] pata_icside driver

On Sun, Apr 08, 2007 at 11:18:26AM +0100, Russell King wrote:
> Below is an initial attempt at converting the ICS IDE driver to fit
> into the PATA infrastructure.
> 
> There's a number of FIXMEs in there: due to the hardware missing
> resistors on the interrupt signals from the drives, a port
> without any drives attached results in spurious interrupts being
> generated.
> 
> To prevent this, we need to disable the interrupts from the port
> on the card if no drives are found, but unfortunately ATA doesn't
> call the "port_disable" method in this circumstance.

Here's an updated version.  I've removed the correction of the cycle
time - since we're checking whether all of active, recovery and cycle
periods fit the hardware, the correction becomes unnecessary.

I still suggest that the PATA core folk consider fixing their timing
calculation function in that respect though.

This driver continues to have the so far ignored issue concerning
port_disable.  It would be good to have some feedback on this instead
of this driver continuing to be crippled by the libata core code.  This
really needs resolving before this driver can be merged, though I'm not
sure how.

diff --git a/drivers/ata/Kconfig b/drivers/ata/Kconfig
index 7bdbe5a..9cd8a61 100644
--- a/drivers/ata/Kconfig
+++ b/drivers/ata/Kconfig
@@ -552,6 +552,14 @@ config PATA_PLATFORM
 
  If unsure, say N.
 
+config PATA_ICSIDE
+   tristate "Acorn ICS PATA support"
+   depends on ARM && ARCH_ACORN
+   help
+ On Acorn systems, say Y here if you wish to use the ICS PATA
+ interface card.  This is not required for ICS partition support.
+ If you are unsure, say N to this.
+
 config PATA_IXP4XX_CF
tristate "IXP4XX Compact Flash support"
depends on ARCH_IXP4XX
diff --git a/drivers/ata/Makefile b/drivers/ata/Makefile
index 13d7397..cc8798b 100644
--- a/drivers/ata/Makefile
+++ b/drivers/ata/Makefile
@@ -61,6 +61,7 @@ obj-$(CONFIG_PATA_TRIFLEX)+= pata_triflex.o
 obj-$(CONFIG_PATA_IXP4XX_CF)   += pata_ixp4xx_cf.o
 obj-$(CONFIG_PATA_SCC) += pata_scc.o
 obj-$(CONFIG_PATA_PLATFORM)+= pata_platform.o
+obj-$(CONFIG_PATA_ICSIDE)  += pata_icside.o
 # Should be last but one libata driver
 obj-$(CONFIG_ATA_GENERIC)  += ata_generic.o
 # Should be last libata driver
diff --git a/drivers/ata/pata_icside.c b/drivers/ata/pata_icside.c
new file mode 100644
index 000..75b22da
--- /dev/null
+++ b/drivers/ata/pata_icside.c
@@ -0,0 +1,686 @@
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+
+#define DRV_NAME   "pata_icside"
+
+#define ICS_IDENT_OFFSET   0x2280
+
+#define ICS_ARCIN_V5_INTRSTAT  0x
+#define ICS_ARCIN_V5_INTROFFSET    0x0004
+
+#define ICS_ARCIN_V6_INTROFFSET_1  0x2200
+#define ICS_ARCIN_V6_INTRSTAT_1    0x2290
+#define ICS_ARCIN_V6_INTROFFSET_2  0x3200
+#define ICS_ARCIN_V6_INTRSTAT_2    0x3290
+
+struct portinfo {
+   unsigned int dataoffset;
+   unsigned int ctrloffset;
+   unsigned int stepping;
+};
+
+static const struct portinfo pata_icside_portinfo_v5 = {
+   .dataoffset = 0x2800,
+   .ctrloffset = 0x2b80,
+   .stepping   = 6,
+};
+
+static const struct portinfo pata_icside_portinfo_v6_1 = {
+   .dataoffset = 0x2000,
+   .ctrloffset = 0x2380,
+   .stepping   = 6,
+};
+
+static const struct portinfo pata_icside_portinfo_v6_2 = {
+   .dataoffset = 0x3000,
+   .ctrloffset = 0x3380,
+   .stepping   = 6,
+};
+
+#define PATA_ICSIDE_MAX_SG 128
+
+struct pata_icside_state {
+   void __iomem *irq_port;
+   void __iomem *ioc_base;
+   unsigned int type;
+   unsigned int dma;
+   struct {
+   u8 port_sel;
+   u8 disabled;
+   unsigned int speed[ATA_MAX_DEVICES];
+   } port[2];
+   struct scatterlist sg[PATA_ICSIDE_MAX_SG];
+};
+
+#define ICS_TYPE_A3IN  0
+#define ICS_TYPE_A3USER    1
+#define ICS_TYPE_V6    3
+#define ICS_TYPE_V5    15
+#define ICS_TYPE_NOTYPE((unsigned int)-1)
+
+/*  Version 5 PCB Support Functions - */
+/* Prototype: pata_icside_irqenable_arcin_v5 (struct expansion_card *ec, int irqnr)
+ * Purpose  : enable interrupts from card
+ */
+static void pata_icside_irqenable_arcin_v5 (struct expansion_card *ec, int irqnr)
+{
+   struct pata_icside_state *state = ec->irq_data;
+
+   writeb(0, state->irq_port + ICS_ARCIN_V5_INTROFFSET);
+}
+
+/* Pro

Re: [patch] CFS scheduler, -v6

2007-04-29 Thread Ingo Molnar

* Willy Tarreau <[EMAIL PROTECTED]> wrote:

> > know of any other reports then please let me know!)
> 
> There was Caglar Onur too but he said he will redo all the tests. 
> [...]

well, Caglar said CFSv7 works as well as CFSv6 in his latest tests and 
that he'll redo all the tests to re-verify his original regression 
report :)

> In fact, what I'd like to see in 2.6.22 is something better for 
> everybody and with *no* regression, even if it's not perfect.
>
> I had the feeling that SD matched that goal right now, [...]

curious, which are the reports where in your opinion CFS behaves worse 
than vanilla? There were two audio skipping reports against CFS, the 
most serious one got resolved and i hope the other one has been resolved 
by the same fix as well. (i'm still waiting for feedback on that one)

> [...] except for Mike who has not tested recent versions. [...]

actually, don't discount Mark Lord's test results either. And it might be 
a good idea for Mike to re-test SD 0.46?

Ingo


Re: Linux 2.6.21

2007-04-29 Thread Russell King
On Sun, Apr 29, 2007 at 12:49:04AM +0200, Adrian Bunk wrote:
> On Sat, Apr 28, 2007 at 09:27:01PM +0100, Russell King wrote:
> > On Sat, Apr 28, 2007 at 09:53:20PM +0200, Adrian Bunk wrote:
> > > We are already quite good at ignoring bug reports that come through 
> > > linux-kernel, and it's an _advantage_ of the kernel Bugzilla to see more 
> > > than 1600 open bugs because this tells how bad we are at handling bugs.
> > > How many thousand bug reports have been ignored during the same time on 
> > > linux-kernel?
> > 
> > However, look at this bug:
> > 
> >   http://bugme.osdl.org/show_bug.cgi?id=7760
> > 
> > It's outside my knowledge to be able to fix for various reasons:
> >...
> > I'm personally very tempted to close it as "won't fix" (I wish there was
> > a "can't fix" category.)
> >...
> 
> So this is a completely debugged bug in a well-maintained subsystem
> (no matter what the status in Bugzilla is).

You're being very optimistic.

I'm not sure where you get the idea that it's "completely debugged".
It isn't - I've no real idea what the problem is, let alone what the
solution might be.  I've only one guess based upon what is sane in
the kernel, and that isn't even based on the data provided in the
bug report.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:


Re: [PATCH] crypto: Use padlock.ko only as a module

2007-04-29 Thread Ingo Oeser
Hi Scott,

On Sunday 29 April 2007, Simon Arlott wrote:
> Ideally I'd just remove that module completely, all it does is 
> trigger the loading of the other two modules when modules are 
> used - so I'll submit a patch for that instead.

That's much better! 

When you force a feature to be a module on a kernel without 
module support, it will effectively be disabled.

And if it is as simple to do the same in userspace as you suggest,
then that's much better.

Best Regards

Ingo Oeser


Re: Linux 2.6.21

2007-04-29 Thread Russell King
On Sun, Apr 29, 2007 at 12:58:09AM +0200, Markus Rechberger wrote:
> I totally disagree here, bugzilla is a very good tool. If someone is
> too lazy to look at it it's his problem.

If you think so, try reading my email and responding constructively
on how the issues there can be resolved.

That email contains good examples where bugzilla fails, and bugs end
up sitting around for ages untouched.  And no, it's not because I'm
"lazy".

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:


[PATCH] reduce AER init error information

2007-04-29 Thread Zhang, Yanmin
PCI-Express AER support in the kernel requires the BIOS to provide _OSC support
so that the AER Root port service driver can request native control
of AER. If a root port supports the AER capability but the BIOS doesn't provide
_OSC for it, aerdriver prints a lot of debug information to the system console.
Below is a log example.

***Error information example
Evaluate _OSC Set fails. Status = 0x0005
Evaluate _OSC Set fails. Status = 0x0005
aer_init: AER service init fails - Run ACPI _OSC fails
aer: probe of :00:02.0:pcie01 failed with error 2
Evaluate _OSC Set fails. Status = 0x0005
Evaluate _OSC Set fails. Status = 0x0005
aer_init: AER service init fails - Run ACPI _OSC fails
aer: probe of :00:04.0:pcie01 failed with error 2
Evaluate _OSC Set fails. Status = 0x0005
Evaluate _OSC Set fails. Status = 0x0005
aer_init: AER service init fails - Run ACPI _OSC fails
aer: probe of :00:06.0:pcie01 failed with error 2
**End of Error information example**


As _OSC is an optional capability of the BIOS, such error information looks
overly verbose. The patch against kernel 2.6.21 changes it to print just a
one-line report message if aerdriver fails to attach the root port
service device.

Below is an example of new output.
AER service couldn't init device :00:02.0:pcie01 - no _OSC support
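
As a rough standalone illustration of the reporting change (not the kernel
code itself), the sketch below collapses the different failure statuses into
a single line per device. The enum values and the function name are
hypothetical stand-ins for acpi_status and the driver's own helpers. The
"no _OSC support" text matches the example output above; reusing the old
"Run ACPI _OSC fails" text for other failures is an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the ACPI status codes used by the patch. */
enum sketch_status {
	SKETCH_OK,
	SKETCH_AE_SUPPORT,   /* firmware does not support the request */
	SKETCH_AE_NOT_FOUND, /* no handle / no _OSC method found */
	SKETCH_AE_ERROR      /* any other failure */
};

/*
 * Format at most one report line per device into buf.
 * Returns 0 and an empty string on success, otherwise the snprintf() result.
 */
static int sketch_aer_report(char *buf, size_t len, const char *bus_id,
			     enum sketch_status st)
{
	assert(buf != NULL && bus_id != NULL);

	if (st == SKETCH_OK) {
		if (len > 0)
			buf[0] = '\0';
		return 0;
	}

	return snprintf(buf, len, "AER service couldn't init device %s - %s",
			bus_id,
			(st == SKETCH_AE_SUPPORT || st == SKETCH_AE_NOT_FOUND) ?
			"no _OSC support" : "Run ACPI _OSC fails");
}
```

The point of the shape is that the low-level _OSC helpers stay silent and
only the caller, which knows the device name, decides what to print.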


Signed-off-by: Zhang Yanmin <[EMAIL PROTECTED]>

---

diff -Nraup linux-2.6.21/drivers/pci/pci-acpi.c linux-2.6.21_aer/drivers/pci/pci-acpi.c
--- linux-2.6.21/drivers/pci/pci-acpi.c 2007-02-05 02:44:54.0 +0800
+++ linux-2.6.21_aer/drivers/pci/pci-acpi.c 2007-04-30 22:03:08.0 +0800
@@ -55,8 +55,6 @@ acpi_query_osc (
 
status = acpi_evaluate_object(handle, "_OSC", &input, &output);
if (ACPI_FAILURE (status)) {
-   printk(KERN_DEBUG  
-   "Evaluate _OSC Set fails. Status = 0x%04x\n", status);
*ret_status = status;
return status;
}
@@ -124,11 +122,9 @@ acpi_run_osc (
in_params[3].buffer.pointer = (u8 *)context;
 
status = acpi_evaluate_object(handle, "_OSC", &input, &output);
-   if (ACPI_FAILURE (status)) {
-   printk(KERN_DEBUG  
-   "Evaluate _OSC Set fails. Status = 0x%04x\n", status);
+   if (ACPI_FAILURE (status))
return status;
-   }
+
out_obj = output.pointer;
if (out_obj->type != ACPI_TYPE_BUFFER) {
printk(KERN_DEBUG  
diff -Nraup linux-2.6.21/drivers/pci/pcie/aer/aerdrv_acpi.c linux-2.6.21_aer/drivers/pci/pcie/aer/aerdrv_acpi.c
--- linux-2.6.21/drivers/pci/pcie/aer/aerdrv_acpi.c 2007-02-05 02:44:54.0 +0800
+++ linux-2.6.21_aer/drivers/pci/pcie/aer/aerdrv_acpi.c 2007-04-30 21:06:43.0 +0800
@@ -27,10 +27,9 @@
  * Invoked when PCIE bus loads AER service driver. To avoid conflict with
  * BIOS AER support requires BIOS to yield AER control to OS native driver.
  **/
-int aer_osc_setup(struct pci_dev *dev)
+acpi_status aer_osc_setup(struct pci_dev *dev)
 {
-   int retval = OSC_METHOD_RUN_SUCCESS;
-   acpi_status status;
+   acpi_status status = AE_NOT_FOUND;
acpi_handle handle = DEVICE_ACPI_HANDLE(&dev->dev);
struct pci_dev *pdev = dev;
struct pci_bus *parent;
@@ -51,18 +50,12 @@ int aer_osc_setup(struct pci_dev *dev)
}
 
if (!handle)
-   return OSC_METHOD_NOT_SUPPORTED;
+   return status;
 
pci_osc_support_set(OSC_EXT_PCI_CONFIG_SUPPORT);
status = pci_osc_control_set(handle, OSC_PCI_EXPRESS_AER_CONTROL |
OSC_PCI_EXPRESS_CAP_STRUCTURE_CONTROL);
-   if (ACPI_FAILURE(status)) {
-   if (status == AE_SUPPORT)
-   retval = OSC_METHOD_NOT_SUPPORTED;
-   else
-   retval = OSC_METHOD_RUN_FAILURE;
-   }
 
-   return retval;
+   return status;
 }
 
diff -Nraup linux-2.6.21/drivers/pci/pcie/aer/aerdrv_core.c linux-2.6.21_aer/drivers/pci/pcie/aer/aerdrv_core.c
--- linux-2.6.21/drivers/pci/pcie/aer/aerdrv_core.c 2007-02-05 02:44:54.0 +0800
+++ linux-2.6.21_aer/drivers/pci/pcie/aer/aerdrv_core.c 2007-04-30 22:33:16.0 +0800
@@ -733,19 +733,19 @@ void aer_delete_rootport(struct aer_rpc 
  **/
 int aer_init(struct pcie_device *dev)
 {
-   int status;
+   acpi_status status;
 
/* Run _OSC Method */
status = aer_osc_setup(dev->port);
 
-   if(status != OSC_METHOD_RUN_SUCCESS) {
-   printk(KERN_DEBUG "%s: AER service init fails - %s\n",
-   __FUNCTION__,
-   (status == OSC_METHOD_NOT_SUPPORTED) ?
-   "No ACPI _OSC support" : "Run ACPI _OSC fails");
+   if (ACPI_FAILURE(status)) {
+   printk(KERN_DEBUG "AER service couldn't init device %s - %s\n",
+   dev->device.bus_id,
+   (status == AE_SUPPORT || status == AE_NOT_FOUND

Re: [patch] CFS scheduler, -v6

2007-04-29 Thread Willy Tarreau
On Sun, Apr 29, 2007 at 09:30:30AM +0200, Ingo Molnar wrote:
> > In fact, what I'd like to see in 2.6.22 is something better for 
> > everybody and with *no* regression, even if it's not perfect.
> >
> > I had the feeling that SD matched that goal right now, [...]
> 
> curious, which are the reports where in your opinion CFS behaves worse 
> than vanilla?

see below :-)

> There were two audio skipping reports against CFS, the 
> most serious one got resolved and i hope the other one has been resolved 
> by the same fix as well. (i'm still waiting for feedback on that one)

your answer to your question above ;-)
Yes, we're all waiting for feedback. And I said I did not track the
versions involved, so it is possible that all previously encountered
regressions are fixed by now.

> > [...] except for Mike who has not tested recent versions. [...]
> 
> actually, don't discount Mark Lord's test results either. And it might be 
> a good idea for Mike to re-test SD 0.46?

In any case, it might be a good idea because Mike encountered a problem
that nobody could reproduce. It may come from hardware, scheduler design,
scheduler bug, or any other bug, but whatever the cause, it would be
interesting to conclude on it.

>   Ingo

Willy



Re: [PATCH 22/25] xen: xen-netfront: use skb.cb for storing private data

2007-04-29 Thread Jeremy Fitzhardinge
Herbert Xu wrote:
> BTW, the version I posted to you is missing the following line.
>   
> --- linux-2.6.20.i386/drivers/xen/core/skbuff.c   2007-04-28 15:30:16.0 +1000
> +++ build-2.6.20.i386/drivers/xen/core/skbuff.c   2007-04-28 15:30:52.0 +1000
> @@ -89,6 +89,7 @@
>   skb->h.raw = (unsigned char *)skb->nh.iph + 4*skb->nh.iph->ihl;
>   if (skb->h.raw >= skb->tail)
>   goto out;
> + skb->csum_start = skb->h.raw - skb->head;
>   switch (skb->nh.iph->protocol) {
>   case IPPROTO_TCP:
>   skb->csum_offset = offsetof(struct tcphdr, check);
>   

drivers/xen/core/skbuff.c?  What's that?
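
For anyone following the quoted hunk, a small userspace sketch of what the
added line computes may help: csum_start is the offset from the start of the
buffer to the transport header, and csum_offset is where the checksum field
sits inside that header. This works on a flat byte buffer rather than an
sk_buff; the struct, function and constant names are invented for the
example, and the field positions (IHL in the low nibble of byte 0, protocol
in byte 9, TCP checksum at offset 16, UDP checksum at offset 6) are the
standard IPv4/TCP/UDP layouts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_IPPROTO_TCP      6
#define SKETCH_IPPROTO_UDP      17
#define SKETCH_TCP_CHECK_OFFSET 16 /* offsetof(struct tcphdr, check) */
#define SKETCH_UDP_CHECK_OFFSET 6  /* offsetof(struct udphdr, check) */

/* Hypothetical result type standing in for the two skb fields. */
struct sketch_csum {
	size_t csum_start;  /* offset of the transport header from the buffer head */
	size_t csum_offset; /* offset of the checksum field within that header */
};

/*
 * head/len describe the whole buffer, ip_off is where the IPv4 header starts.
 * Returns 0 on success, -1 if the packet is malformed or not TCP/UDP.
 */
static int sketch_csum_locate(const unsigned char *head, size_t len,
			      size_t ip_off, struct sketch_csum *out)
{
	const unsigned char *iph;
	size_t ihl_bytes, transport;

	assert(head != NULL && out != NULL);

	if (ip_off > len || len - ip_off < 20)
		return -1;
	iph = head + ip_off;

	/* Header length is given in 32-bit words, like 4*iph->ihl above. */
	ihl_bytes = 4 * (size_t)(iph[0] & 0x0f);
	if (ihl_bytes < 20)
		return -1;

	/* Mirrors the "h.raw >= tail" bail-out in the quoted code. */
	transport = ip_off + ihl_bytes;
	if (transport >= len)
		return -1;

	out->csum_start = transport;

	switch (iph[9]) {
	case SKETCH_IPPROTO_TCP:
		out->csum_offset = SKETCH_TCP_CHECK_OFFSET;
		break;
	case SKETCH_IPPROTO_UDP:
		out->csum_offset = SKETCH_UDP_CHECK_OFFSET;
		break;
	default:
		return -1;
	}
	return 0;
}
```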

J


Re: [DOC] Fix wrong identifier name in Documentation/driver-model/devres.txt

2007-04-29 Thread Tejun Heo
Jeff Garzik wrote:
> Now that devres is in the kernel, I don't think I am the best person to
> merge these sort of patches.  Certainly I can, and I know the code from
> my original review and subsequent usage, but I think the patch is more
> appropriate for Greg, going through normal maintainership channels.
> 
> IOW, I think devres is too generic to be queued via libata-dev.git.
> 
> Tejun, comments?

I don't have problem either way.  If it's okay with Greg, I'll queue
future devres updates through Greg.

BTW, converting network drivers to devres is on my ever-growing todo
list and when those are done they will go through you, Jeff.  :-)

-- 
tejun


Re: What's in infiniband.git for 2.6.22

2007-04-29 Thread Michael S. Tsirkin
> Quoting Roland Dreier <[EMAIL PROTECTED]>:
> Subject: Re: What's in infiniband.git for 2.6.22
> 
>  > What about the mthca patch to use separate HW queues for kernel
>  > RC/UD/userspace RC?
> 
> right, I'll queue that up too.

I think you want to queue the following obvious bugfix up as well:
http://www.openfabrics.org/git/?p=~vlad/ofed_1_2/.git;a=blob;f=kernel_patches/fixes/ipoib_crash_on_error.patch;hb=HEAD

-- 
MST


Re: [patch] CFS scheduler, -v6

2007-04-29 Thread William Lee Irwin III
On Sun, Apr 29, 2007 at 09:16:27AM +0200, Willy Tarreau wrote:
> In fact, what I'd like to see in 2.6.22 is something better for everybody
> and with *no* regression, even if it's not perfect. I had the feeling
> that SD matched that goal right now, except for Mike who has not tested
> recent versions. Don't get me wrong, I still think that CFS is a more
> interesting long-term target. But it may require more time to satisfy
> everyone. At least with one of them in 2.6.22, we won't waste time
> comparing to current mainline.

I think it'd be a good idea to merge scheduler classes before changing
over the policy so future changes to policy have smaller code impact.
Basically, get scheduler classes going with the mainline scheduler.

There are other pieces that can be merged earlier, too, for instance,
the correction to the comment in init/main.c. Directed yields can
probably also go in as nops or -ENOSYS returns if not fully implemented,
though I suspect there shouldn't be much in the way of implementing them.
p->array vs. p->on_rq can be merged early too. Common code for rbtree-
based priority queues can be factored out of cfq, cfs, and hrtimers.
There are extensive /proc/ reporting changes, large chunks of which
could go in before the policy as well.
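
To illustrate what "scheduler classes" buys, here is a toy userspace sketch,
not the interface from the CFS patch: each policy is a small table of
function pointers, and the core only walks the classes in priority order.
All names (sketch_sched_class, sketch_rq, and so on) are hypothetical, and
both classes here are plain FIFOs just to keep it short. Swapping the policy
behind the second class would not touch sketch_pick_next_task(), which is the
"smaller code impact" argument.

```c
#include <assert.h>
#include <stddef.h>

/* Toy task and runqueue types, invented for this sketch. */
struct sketch_task {
	int pid;
	struct sketch_task *next;
};

struct sketch_rq {
	struct sketch_task *rt_head;   /* tasks owned by the "rt" class */
	struct sketch_task *fair_head; /* tasks owned by the "fair" class */
};

/* One policy = one table of operations. */
struct sketch_sched_class {
	const char *name;
	void (*enqueue)(struct sketch_rq *rq, struct sketch_task *t);
	struct sketch_task *(*pick_next)(struct sketch_rq *rq);
};

/* Append to the tail of a singly linked list. */
static void list_append(struct sketch_task **head, struct sketch_task *t)
{
	t->next = NULL;
	while (*head)
		head = &(*head)->next;
	*head = t;
}

/* Remove and return the first element, or NULL if the list is empty. */
static struct sketch_task *list_pop(struct sketch_task **head)
{
	struct sketch_task *t = *head;

	if (t) {
		*head = t->next;
		t->next = NULL;
	}
	return t;
}

static void rt_enqueue(struct sketch_rq *rq, struct sketch_task *t)
{
	list_append(&rq->rt_head, t);
}

static struct sketch_task *rt_pick_next(struct sketch_rq *rq)
{
	return list_pop(&rq->rt_head);
}

static void fair_enqueue(struct sketch_rq *rq, struct sketch_task *t)
{
	list_append(&rq->fair_head, t);
}

static struct sketch_task *fair_pick_next(struct sketch_rq *rq)
{
	return list_pop(&rq->fair_head);
}

/* Highest-priority class first. */
static const struct sketch_sched_class sketch_classes[] = {
	{ "rt",   rt_enqueue,   rt_pick_next },
	{ "fair", fair_enqueue, fair_pick_next },
};

/* Policy-independent core: ask each class in turn for a task. */
static struct sketch_task *sketch_pick_next_task(struct sketch_rq *rq)
{
	size_t i;

	assert(rq != NULL);

	for (i = 0; i < sizeof sketch_classes / sizeof sketch_classes[0]; i++) {
		struct sketch_task *t = sketch_classes[i].pick_next(rq);

		if (t)
			return t;
	}
	return NULL;
}
```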

I'm camping in this weekend, so I'll see what I can eke out.


-- wli


Re: [patch] CFS scheduler, -v6

2007-04-29 Thread Kasper Sandberg
On Sun, 2007-04-29 at 08:59 +0200, Ingo Molnar wrote:
> * Willy Tarreau <[EMAIL PROTECTED]> wrote:
> 
> > I don't know if Mike still has problems with SD, but there are now 
> > several interesting reports of SD giving better feedback than CFS on 
> > real work. In my experience, CFS seems smoother on *technical* tests, 
> > which I agree that they do not really simulate real work.
> 
> well, there are several reports of CFS being significantly better than 
> SD on a number of workloads - and i know of only two reports where SD 
> was reported to be better than CFS: in Kasper's test (where i'd like to 
> know what the "3D stuff" he uses is and take a good look at that 
> workload), and another 3D report which was done against -v6. (And even 
> in these two reports the 'smoothness advantage' was not dramatic. If you 
> know of any other reports then please let me know!)

I can tell you one thing: it's not just me that has observed the
smoothness in 3D stuff. After I tried rsdl first, I've had lots of people
try rsdl and subsequently sd because of the significant improvement in
smoothness, and they have all found the same results.

The stuff I have tested with in particular is Unreal Tournament 2004 and
World of Warcraft through wine, both running OpenGL and consuming all
the CPU time they can get.

What happens is simply that even when there's only that
process, sd is still smoother, but the difference is much larger once
something else starts. If the mail client starts fetching mail and
running somewhat demanding stuff like spamassassin, the only way you
notice it is by the drop in fps; smoothness is 100% intact with SD
(of course, if you started a HUGE load it probably would get so little CPU
that it would stutter). With every other scheduler you will notice
immediate and quite severe stuttering; in fact, to many it will seem
intolerable.

I can tell you how I first noticed this. I was experimenting in ut2k4
with sd, and usually I always have to close my mail client, because when
spamassassin starts (nice 0) the game would stutter quite a lot. But while
I was playing I noticed some IO activity and work noises from my disk,
and that was all: no noticeable stutter or problems with the 3D. I
couldn't figure out why, and then discovered I had forgotten to close my
mail client, which I previously ALWAYS had to do.

If you have some ideas on how these problems might be fixed, I'd surely
try fixes and stuff, or collect any data you need to
better understand what's going on. But I suspect any somewhat demanding
3D application will do, and the difference is so staggering that when
you see it in effect, you can't miss it.

> 
>   Ingo
> 

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [2.6 patch] drivers/kvm/mmu.c: fix an if() condition

2007-04-29 Thread Avi Kivity

Adrian Bunk wrote:
> It might have worked in this case since PT_PRESENT_MASK is 1, but let's
> express this correctly.

Applied, thanks.


--
error compiling committee.c: too many arguments to function



[PATCH (v2)] crypto: Remove pointless padlock module

2007-04-29 Thread Simon Arlott

When this is compiled in it is run too early to do anything useful:
[6.052000] padlock: No VIA PadLock drivers have been loaded.
[6.052000] padlock: Using VIA PadLock ACE for AES algorithm.
[6.052000] padlock: Using VIA PadLock ACE for SHA1/SHA256 algorithms.

When it's a module it isn't doing anything special; the same functionality
can be provided in userspace by "probeall padlock padlock-aes padlock-sha"
in modules.conf if it is required.


Signed-off-by: Simon Arlott <[EMAIL PROTECTED]>
Cc: Herbert Xu <[EMAIL PROTECTED]>
Cc: Michal Ludvig <[EMAIL PROTECTED]>
---
On 29/04/07 02:59, Randy Dunlap wrote:
> Simon Arlott wrote:
> > +depends on CRYPTO && X86_32
>
> All of drivers/crypto/Kconfig already depends on CRYPTO, so just
> depends on X86_32 should be enough.

Ok, I've changed this for geode too.

On 29/04/07 08:28, Ingo Oeser wrote:
> On Sunday 29 April 2007, Simon Arlott wrote:
> > Ideally I'd just remove that module completely, all it does is
> > trigger the loading of the other two modules when modules are
> > used - so I'll submit a patch for that instead.
>
> That's much better!
>
> When you force a feature to be a module on a kernel without
> module support, it will effectively be disabled.

Well that's mostly the point - it shouldn't get compiled in - ever,
but it also has other modules depending on it in Kconfig that
shouldn't need to be modules.


drivers/crypto/Kconfig   |   16 ++--
drivers/crypto/Makefile  |1 -
drivers/crypto/padlock.c |   58 --
3 files changed, 3 insertions(+), 72 deletions(-)
delete mode 100644 drivers/crypto/padlock.c

diff --git a/drivers/crypto/Kconfig b/drivers/crypto/Kconfig
index ff8c4be..f21fe66 100644
--- a/drivers/crypto/Kconfig
+++ b/drivers/crypto/Kconfig
@@ -1,10 +1,10 @@
menu "Hardware crypto devices"

config CRYPTO_DEV_PADLOCK
-   tristate "Support for VIA PadLock ACE"
+   bool "Support for VIA PadLock ACE"
depends on X86_32
select CRYPTO_ALGAPI
-   default m
+   default y
help
  Some VIA processors come with an integrated crypto engine
  (so called VIA PadLock ACE, Advanced Cryptography Engine)
@@ -14,16 +14,6 @@ config CRYPTO_DEV_PADLOCK
  The instructions are used only when the CPU supports them.
  Otherwise software encryption is used.

- Selecting M for this option will compile a helper module
- padlock.ko that should autoload all below configured
- algorithms. Don't worry if your hardware does not support
- some or all of them. In such case padlock.ko will
- simply write a single line into the kernel log informing
- about its failure but everything will keep working fine.
-
- If you are unsure, say M. The compiled module will be
- called padlock.ko
-
config CRYPTO_DEV_PADLOCK_AES
tristate "PadLock driver for AES algorithm"
depends on CRYPTO_DEV_PADLOCK
@@ -55,7 +45,7 @@ source "arch/s390/crypto/Kconfig"

config CRYPTO_DEV_GEODE
tristate "Support for the Geode LX AES engine"
-   depends on CRYPTO && X86_32 && PCI
+   depends on X86_32 && PCI
select CRYPTO_ALGAPI
select CRYPTO_BLKCIPHER
default m
diff --git a/drivers/crypto/Makefile b/drivers/crypto/Makefile
index 6059cf8..d070030 100644
--- a/drivers/crypto/Makefile
+++ b/drivers/crypto/Makefile
@@ -1,4 +1,3 @@
-obj-$(CONFIG_CRYPTO_DEV_PADLOCK) += padlock.o
obj-$(CONFIG_CRYPTO_DEV_PADLOCK_AES) += padlock-aes.o
obj-$(CONFIG_CRYPTO_DEV_PADLOCK_SHA) += padlock-sha.o
obj-$(CONFIG_CRYPTO_DEV_GEODE) += geode-aes.o
diff --git a/drivers/crypto/padlock.c b/drivers/crypto/padlock.c
deleted file mode 100644
index d6d7dd5..000
--- a/drivers/crypto/padlock.c
+++ /dev/null
@@ -1,58 +0,0 @@
-/*
- * Cryptographic API.
- *
- * Support for VIA PadLock hardware crypto engine.
- *
- * Copyright (c) 2006  Michal Ludvig <[EMAIL PROTECTED]>
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- *
- */
-
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include "padlock.h"
-
-static int __init padlock_init(void)
-{
-   int success = 0;
-
-   if (crypto_has_cipher("aes-padlock", 0, 0))
-   success++;
-
-   if (crypto_has_hash("sha1-padlock", 0, 0))
-   success++;
-
-   if (crypto_has_hash("sha256-padlock", 0, 0))
-   success++;
-
-   if (!success) {
-   printk(KERN_WARNING PFX "No VIA PadLock drivers have been loaded.\n");
-   return -ENODEV;
-   }
-
-   printk(KERN_NOTICE PFX "%d drivers are available.\n", success);
-
-   return 0;
-}
-
-static void __exit padlock_fini(void)
-{
-}
-
-module_init(padlock_init);
-module_exit(padlock_fini);

Re: [patch] CFS scheduler, -v6

2007-04-29 Thread Ingo Molnar

* Willy Tarreau <[EMAIL PROTECTED]> wrote:

> > > [...] except for Mike who has not tested recent versions. [...]
> > 
> > actually, dont discount Mark Lord's test results either. And it 
> > might be a good idea for Mike to re-test SD 0.46?
> 
> In any case, it might be a good idea because Mike encountered a 
> problem that nobody could reproduce. [...]

actually, Mark Lord too reproduced something similar to Mike's results. 
Please try those workloads yourself.

Ingo


Re: [GIT PATCH] ACPI patches for 2.6.22

2007-04-29 Thread Andrew Morton
On Sun, 29 Apr 2007 01:02:33 -0400 Len Brown <[EMAIL PROTECTED]> wrote:

> please pull from: 
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6.git release
> 
> This batch mostly updates the platform-specific drivers that use ACPI.
> The EC and sbs changes are primarily cleanups.
> There are no changes to the ACPICA core, except a single bugfix
> that was related to a 2.6.21 boot regression on some older machines.
> And then the usual mix of random tweaks.

There might still be a few regressions in this lot:

- Miles Lane's "2.6.21-rc7-mm2 -- gnome-power-manager always shows the
  power as coming from AC"

- "battery caching introduces a lock up"
  http://bugzilla.kernel.org/show_bug.cgi?id=8351

These are older and might have been fixed:

- Matt Mackall's "Thinkpads not waking up on lid open with -rc6-mm1"

- Helge Hafting's "2.6.21-rc3-mm2 hangs my opteron during bootup, ACPI?"



Re: [patch] CFS scheduler, -v6

2007-04-29 Thread Willy Tarreau
On Sun, Apr 29, 2007 at 10:00:28AM +0200, Ingo Molnar wrote:
> 
> * Willy Tarreau <[EMAIL PROTECTED]> wrote:
> 
> > > > [...] except for Mike who has not tested recent versions. [...]
> > > 
> > > actually, dont discount Mark Lord's test results either. And it 
> > > might be a good idea for Mike to re-test SD 0.46?
> > 
> > In any case, it might be a good idea because Mike encountered a 
> > problem that nobody could reproduce. [...]
> 
> actually, Mark Lord too reproduced something similar to Mike's results. 

OK.

> Please try those workloads yourself.

Unfortunately, I do not have their tools, environments nor hardware.
That's the advantage of having multiple testers ;-)

>   Ingo

Willy



Re: 2.6.21-rc7-mm2

2007-04-29 Thread Geert Uytterhoeven
On Thu, 26 Apr 2007, Randy Dunlap wrote:
> On Thu, 26 Apr 2007 13:37:20 -0700 Andrew Morton wrote:
> > On Thu, 26 Apr 2007 13:47:14 +0200 Gabriel C <[EMAIL PROTECTED]> wrote:
> > > Andrew Morton wrote:
> > > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc7/2.6.21-rc7-mm2/
> > > >
> > > >
> > > > - this has everything which is in 2.6.21.  Plus more!
> > > >
> > > > - a number of nasty bugs were fixed.  This should be (a lot) more stable
> > > >   than 2.6.21-rc7-mm1.
> > > >
> > > >   
> > > 
> > > I get this warning here :
> > > 
> > > 
> > > drivers/net/Kconfig:2327:warning: 'select' used by config symbol 
> > > 'UCC_GETH' refer to undefined symbol 'UCC_FAST'
> > 
> > Yes, we get so many of those that I tend to ignore them, assuming that
> > someone will pick it up and fix it.
> > 
> > This one was added by git-powerpc, presumably
> > 7d776cb596994219584257eb5956b87628e5deaf "QE: automatically select QE
> > options"
> 
> There was a similar problem with PS3_xyz (don't recall exactly
> which PS3_option it was) that was "solved" by introducing an
> intermediate config symbol IIRC.  Maybe that trick^W fix can be
> done here also...

CONFIG_PS3_ADVANCED, commit 3f555c700b6c90f9ac24bc81a4f509583d906278

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- Sony Network and Software Technology Center Europe (NSCE)
[EMAIL PROTECTED] --- Sint-Stevens-Woluwestraat 55
Voice +32-2-2908453 Fax +32-2-7262686  B-1130 Brussels, Belgium


Re: [patch] CFS scheduler, -v6

2007-04-29 Thread Ingo Molnar

* William Lee Irwin III <[EMAIL PROTECTED]> wrote:

> I think it'd be a good idea to merge scheduler classes before changing 
> over the policy so future changes to policy have smaller code impact. 
> Basically, get scheduler classes going with the mainline scheduler.

i've got a split up patch for the class stuff already, but let's first 
get some wider test-coverage before even thinking about upstream 
integration. This is all v2.6.22 stuff at the earliest.

Ingo


Re: [patch] CFS scheduler, -v6

2007-04-29 Thread Ingo Molnar

* Kasper Sandberg <[EMAIL PROTECTED]> wrote:

> If you have some ideas on how these problems might be fixed i'd surely 
> try fixes and stuff, or if you have some data you need me to collect 
> to better understand whats going on. But i suspect any somewhat 
> demanding 3d application will do, and the difference is so staggering 
> that when you see it in effect, you cant miss it.

it would be great if you could try a simple experiment: does something 
as simple as glxgears resized to a large window trigger this 
'stuttering' phenomenon when other stuff is running? If not, could you 
try to find the simplest 3D stuff under Linux that already triggers it 
so that i can reproduce it?

(Also, as an independent debug-test, could you try CONFIG_PREEMPT too 
perhaps? I.e. is this 'stuttering' behavior independent of the 
preemption model and a general property of CFS?)

Ingo


Re: [PATCH 22/25] xen: xen-netfront: use skb.cb for storing private data

2007-04-29 Thread Herbert Xu
On Sun, Apr 29, 2007 at 12:43:33AM -0700, Jeremy Fitzhardinge wrote:
> Herbert Xu wrote:
> > BTW, the version I posted to you is missing the following line.
> >   
> > --- linux-2.6.20.i386/drivers/xen/core/skbuff.c 2007-04-28 15:30:16.0 +1000
> > +++ build-2.6.20.i386/drivers/xen/core/skbuff.c 2007-04-28 15:30:52.0 +1000
> > @@ -89,6 +89,7 @@
> > skb->h.raw = (unsigned char *)skb->nh.iph + 4*skb->nh.iph->ihl;
> > if (skb->h.raw >= skb->tail)
> > goto out;
> > +   skb->csum_start = skb->h.raw - skb->head;
> > switch (skb->nh.iph->protocol) {
> > case IPPROTO_TCP:
> > skb->csum_offset = offsetof(struct tcphdr, check);
> >   
> 
> drivers/xen/core/skbuff.c?  What's that?

It's part of the skb_checksum_setup function which we still need
for this because the current netback protocol doesn't pass the
csum_start and csum_offset values along.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [patch] CFS scheduler, -v6

2007-04-29 Thread Mike Galbraith
On Sun, 2007-04-29 at 09:16 +0200, Willy Tarreau wrote:

> In fact, what I'd like to see in 2.6.22 is something better for everybody
> and with *no* regression, even if it's not perfect. I had the feeling
> that SD matched that goal right now, except for Mike who has not tested
> recent versions. Don't get me wrong, I still think that CFS is a more
> interesting long-term target.

While I haven't tested recent SD versions, unless its design has
radically changed recently, I know what to expect.  CFS is giving me a
very high quality experience already (it's at a whopping v7), while
RSDL/SD irritated me greatly at version v40.  As far as I'm concerned,
CFS is the superior target, short-term, long-term whatever-term.  For
the tree where I make the decisions, the hammer has fallen, and RSDL/SD
is history.  Heck, I'm _almost_ ready to rm -rf my own scheduler trees
as well... I could really use some free disk space.

-Mike



Re: [patch] CFS scheduler, -v6

2007-04-29 Thread Willy Tarreau
On Sun, Apr 29, 2007 at 12:54:36AM -0700, William Lee Irwin III wrote:
> On Sun, Apr 29, 2007 at 09:16:27AM +0200, Willy Tarreau wrote:
> > In fact, what I'd like to see in 2.6.22 is something better for everybody
> > and with *no* regression, even if it's not perfect. I had the feeling
> > that SD matched that goal right now, except for Mike who has not tested
> > recent versions. Don't get me wrong, I still think that CFS is a more
> > interesting long-term target. But it may require more time to satisfy
> > everyone. At least with one of them in 2.6.22, we won't waste time
> > comparing to current mainline.
> 
> I think it'd be a good idea to merge scheduler classes before changing
> over the policy so future changes to policy have smaller code impact.
> Basically, get scheduler classes going with the mainline scheduler.
> 
> There are other pieces that can be merged earlier, too, for instance,
> the correction to the comment in init/main.c. Directed yields can
> probably also go in as nops or -ENOSYS returns if not fully implemented,
> though I suspect there shouldn't be much in the way of implementing them.
> p->array vs. p->on_rq can be merged early too.

I agree that merging some framework is a good way to proceed.

> Common code for rbtree-based priority queues can be factored out of
> cfq, cfs, and hrtimers.

In my experience, rbtrees are painfully slow. Yesterday, I spent the
day replacing them in haproxy with other trees I developed a few
years ago, which look like radix trees. They are about 2-3 times as
fast to insert 64-bit data, and you walk through them in O(1). I have
many changes to apply to them before they could be used in kernel, but
at least I think we already have code available for other types of trees.

> There are extensive /proc/ reporting changes, large chunks of which
> could go in before the policy as well.
> 
> I'm camping in this weekend, so I'll see what I can eke out.

good luck !

Willy



Re: vmstat: use our own timer events

2007-04-29 Thread Andrew Morton
On Sat, 28 Apr 2007 22:09:04 -0700 (PDT) Christoph Lameter <[EMAIL PROTECTED]> wrote:

> vmstat is currently using the cache reaper to periodically bring the
> statistics up to date. The cache reaper only exists in SLUB
> as a way to provide compatibility with SLAB. This patch removes
> the vmstat calls from the slab allocators and provides its own
> handling.
> 
> The advantage is also that we can use a different frequency for the
> updates. Refreshing vm stats is a pretty fast job so we can run this
> every second and stagger this by only one tick. This will lead to
> some overlap in large systems. F.e a system running at 250 HZ with
> 1024 processors will have 4 vm updates occurring at once.
> 
> However, the vm stats update only accesses per node information.
> It is only necessary to stagger the vm statistics updates per
> processor in each node. Vm counter updates occurring on distant
> nodes will not cause cacheline contention.
> 
> We could implement an alternate approach that runs the first processor
> on each node at the second and then each of the other processor on a
> node on a subsequent tick. That may be useful to keep a large amount
> of the second free of timer activity. Maybe the timer folks will have
> some feedback on this one?

The one-per-second timer interrupt will upset the people who are really
aggressive about power consumption (eg, OLPC).  Perhaps there isn't (yet)
an intersection between those people and SMP.

However a knob to set the frequency would be nice, if it's not too
expensive to implement.  Presumably anyone who cares enough will come along
and add one, but then they have to wait for a long period for that change
to propagate out to their users, which is a bit sad for something which we
already knew about.

Having each CPU touch every zone looks a bit expensive - I'd have thought
that it would be showing up a little on your monster NUMA machines?

> @@ -648,11 +664,21 @@ static int __cpuinit vmstat_cpuup_callba
>   unsigned long action,
>   void *hcpu)
>  {
> + long cpu = (long)hcpu;
> +
>   switch (action) {
> - case CPU_UP_PREPARE:
> - case CPU_UP_PREPARE_FROZEN:
> - case CPU_UP_CANCELED:
> - case CPU_UP_CANCELED_FROZEN:
> + case CPU_ONLINE:
> + case CPU_ONLINE_FROZEN:
> + start_cpu_timer(cpu);
> + break;
> + case CPU_DOWN_PREPARE:
> + case CPU_DOWN_PREPARE_FROZEN:
> + cancel_rearming_delayed_work(&per_cpu(vmstat_work, cpu));
> + per_cpu(vmstat_work, cpu).work.func = NULL;
> + case CPU_DOWN_FAILED:
> + case CPU_DOWN_FAILED_FROZEN:
> + start_cpu_timer(cpu);
> + break;
>   case CPU_DEAD:
>   case CPU_DEAD_FROZEN:
>   refresh_zone_stat_thresholds();

Oh dear.  Some of these new notifier types are added by a patch which is a
few hundred patches later than slub.  I can park this patch after that one,
but that introduces a risk that later slub patches will also get
disconnected.

Oh well, we'll see how things go.


bad_page from quicklist patches

2007-04-29 Thread Paul Jackson
I am getting many 'bad_page' failures from the quicklist patches
in 2.6.21-rc7-mm1.  I have bisected the problem down to the following
patches:

quicklists-for-page-table-pages.patch
quicklists-for-page-table-pages-avoid-useless-virt_to_page-conversion.patch
quicklist-support-for-ia64.patch
quicklist-support-for-x86_64.patch
quicklist-support-for-sparc64.patch

This is on an ia64, compiled with the sn2_defconfig configuration.

If I have these quicklist patches included in the build, then starting
with the init process entering user land, I start getting many
complaints such as the following.  I don't know if it ever actually
gets past the init scripts to finish userland booting, due to the large
number of such complaints, and due to another problem that I haven't
looked into yet, involving my boot hanging after the "Setting up
service network" message is displayed during the init script sequence.

But ... back to this problem ... here is the boot output, up through
the first few such 'bad_page' failures that I see when booting with
a kernel that includes the above quicklist patches:

===

ELILO
Uncompressing Linux... done
Linux version 2.6.21-rc7-mm1 ([EMAIL PROTECTED]) (gcc version 4.1.2 20070115 (prerelease) (SUSE Linux)) #12 SMP PREEMPT Sun Apr 29 00:43:14 PDT 2007
EFI v1.10 by INTEL: SALsystab=0x3002a092f0 ACPI 2.0=0x3002a09ac0
ACPI: RSDP 3002A09AC0, 0024 (r2SGI)
ACPI: XSDT 3002A09B00, 0044 (r1SGI  XSDTSN2100011)
ACPI: APIC 3002A09B60, 008C (r1SGI  APICSN2100011)
ACPI: SRAT 3002A09C00, 0150 (r1SGI  SRATSN2100011)
ACPI: SLIT 3002A09D60, 003C (r1SGI  SLITSN2100011)
ACPI: FACP 3002A09E40, 00F4 (r3SGI  FACPSN2300011)
ACPI: DSDT 3002A09E00, 0024 (r2SGI  DSDTSN2200011)
ACPI: FACS 3002A09DB0, 0040
Number of logical nodes in system = 4
Number of memory chunks in system = 4
SAL 2.9: SGI SN2 version 4.50
SAL Platform features: ITC_Drift
SAL: AP wakeup using external interrupt vector 0x12
No logical to physical processor mapping available
SAL_CAL_FLUSH failed with -1
ACPI: Local APIC address c000fee0
ACPI: Error parsing MADT - no IOSAPIC entries
register_intr: No IOSAPIC for GSI 52
8 CPUs available, 8 CPUs total
Increasing MCA rendezvous timeout from 2 to 49000 milliseconds
MCA related initialization done
ACPI: RSDP 3002A09AC0, 0024 (r2SGI)
ACPI: XSDT 3002A09B00, 0044 (r1SGI  XSDTSN2100011)
ACPI: APIC 3002A09B60, 008C (r1SGI  APICSN2100011)
ACPI: SRAT 3002A09C00, 0150 (r1SGI  SRATSN2100011)
ACPI: SLIT 3002A09D60, 003C (r1SGI  SLITSN2100011)
ACPI: FACP 3002A09E40, 00F4 (r3SGI  FACPSN2300011)
ACPI: DSDT 3002A09E00, 0024 (r2SGI  DSDTSN2200011)
ACPI: FACS 3002A09DB0, 0040
SGI SAL version 4.50
Virtual mem_map starts at 0xa0007ffe85c8
Zone PFN ranges:
  Normal   12585984 -> 113307648
Movable zone start PFN for each node
early_node_map[7] active PFN ranges
0: 12585984 -> 12709887
1: 46140416 -> 46264320
2: 79694848 -> 79753216
3: 113249280 -> 113306111
3: 113307136 -> 113307481
3: 113307496 -> 113307524
3: 113307552 -> 113307560
Built 4 zonelists, mobility grouping on.  Total pages: 362143
Kernel command line: BOOT_IMAGE=scsi1:\efi\SuSE\vmlinuz.pj6 root=/dev/sda5 console=ttySG0 splash=silent thash_entries=2097152 kdb=on ro
PID hash table entries: 4096 (order: 12, 32768 bytes)
Console: colour dummy device 80x25
Memory: 5644256k/5796608k available (7960k code, 169936k reserved, 6062k data, 1664k init)
McKinley Errata 9 workaround not needed; disabling it
SLUB: General Slabs=20, HW alignment=128, Processors=8, Nodes=1024
Dentry cache hash table entries: 1048576 (order: 9, 8388608 bytes)
Inode-cache hash table entries: 524288 (order: 8, 4194304 bytes)
Mount-cache hash table entries: 1024
ACPI: Core revision 20070126
Boot processor id 0x0/0x0
Brought up 8 CPUs
Total of 8 processors activated (15564.80 BogoMIPS).
migration_cost=5743,44436
DMI not present or invalid.
NET: Registered protocol family 16
ACPI  DSDT OEM Rev 0x20001
ACPI: bus type pci registered
ACPI: SCI (ACPI GSI 52) not registered
ACPI: Interpreter enabled
ACPI: Using IOSAPIC for interrupt routing
Linux Plug and Play Support v0.97 (c) Adam Belay
pnp: PnP ACPI init
pnp: PnP ACPI: found 0 devices
SCSI subsystem initialized
NET: Registered protocol family 2
IP route cache hash table entries: 262144 (order: 7, 2097152 bytes)
TCP established hash table entries: 2097152 (order: 11, 50331648 bytes)
TCP bind hash table entries: 65536 (order: 6, 1048576 bytes)
TCP: Hash tables configured (established 2097152 bind 65536)
TCP reno registered
perfmon: version 2.0 IRQ 238
perfmon: Itanium 2 PMU detected, 16 PMCs, 18 PMDs, 4 counters (47 bits)
PAL Information Facility v0.5
perfmon: added sampling format d

Re: [patch] CFS scheduler, -v6

2007-04-29 Thread William Lee Irwin III
* William Lee Irwin III <[EMAIL PROTECTED]> wrote:
>> I think it'd be a good idea to merge scheduler classes before changing 
>> over the policy so future changes to policy have smaller code impact. 
>> Basically, get scheduler classes going with the mainline scheduler.

On Sun, Apr 29, 2007 at 10:03:59AM +0200, Ingo Molnar wrote:
> i've got a split up patch for the class stuff already, but let's first 
> get some wider test-coverage before even thinking about upstream 
> integration. This is all v2.6.22 stuff at the earliest.

I'd like to get some regression testing (standard macrobenchmarks) in
on the scheduler class bits in isolation, as they do have rather
non-negligible impacts on load balancing code, to changes in which such
macrobenchmarks are quite sensitive.

This shouldn't take much more than kicking off a benchmark on an
internal box at work already set up to do such testing routinely.
I won't need to write any fresh testcases etc. for it. Availability
of the test systems may have to wait until Monday, since various people
not wanting benchmarks disturbed are likely to be out for the weekend.

It would also be beneficial for the other schedulers to be able to
standardize on the scheduling class framework as far in advance as
possible. In such a manner comparative testing by end-users and more
industrial regression testing can be facilitated.


-- wli


Re: [PATCH] mm/memory.c: remove warning from an uninitialized spinlock. was: Re: 2.6.21-rc7-mm2

2007-04-29 Thread Andrew Morton
On Sun, 29 Apr 2007 08:50:49 +0200 Borislav Petkov <[EMAIL PROTECTED]> wrote:

> 
> Introduce a macro for suppressing gcc from generating a warning about a
> probable unitialized state of a variable.
> 
> Signed-off-by: Borislav Petkov <[EMAIL PROTECTED]>
> 
> ---
> 
> Index: linux-mm/include/linux/compiler.h
> ===
> --- linux-mm.orig/include/linux/compiler.h
> +++ linux-mm/include/linux/compiler.h
> @@ -109,6 +109,10 @@ extern int do_check_likely(struct likeli
>  (typeof(ptr)) (__ptr + (off)); })
>  #endif
>  
> +#ifndef unitialized_var
> +# define unitialized_var(x) x = x
> +#endif
> +
>  #endif /* __KERNEL__ */
>  
>  #endif /* __ASSEMBLY__ */
> Index: linux-mm/mm/memory.c
> ===
> --- linux-mm.orig/mm/memory.c
> +++ linux-mm/mm/memory.c
> @@ -1488,7 +1488,7 @@ static int apply_to_pte_range(struct mm_
>   pte_t *pte;
>   int err;
>   struct page *pmd_page;
> - spinlock_t *ptl;
> + spinlock_t *unitialized_var(ptl);
> 
>   pte = (mm == &init_mm) ?
>   pte_alloc_kernel(pmd, addr) :

Ho hum.  I guess I'll slide this over to Linus if there's not too much
howling, and unless someone can come up with anything better.

I will, however, fix the spelling to "uninitialized" ;)



Re: bad_page from quicklist patches

2007-04-29 Thread Andrew Morton
On Sun, 29 Apr 2007 01:16:10 -0700 Paul Jackson <[EMAIL PROTECTED]> wrote:

> I am getting many 'bad_page' failures from the quicklist patches
> in 2.6.21-rc7-mm1.  I have bisected the problem down to the following
> patches:
> 
> quicklists-for-page-table-pages.patch
> quicklists-for-page-table-pages-avoid-useless-virt_to_page-conversion.patch
> quicklist-support-for-ia64.patch
> quicklist-support-for-x86_64.patch
> quicklist-support-for-sparc64.patch
> 
> This is on an ia64, compiled with the sn2_defconfig configuration.

That should have been fixed in -mm2, by the below:

--- a/include/linux/quicklist.h~quicklists-for-page-table-pages-avoid-useless-virt_to_page-conversion-fix
+++ a/include/linux/quicklist.h
@@ -61,7 +61,7 @@ static inline void __quicklist_free(int 
if (unlikely(nid != numa_node_id())) {
if (dtor)
dtor(p);
-   free_hot_page(page);
+   __free_page(page);
return;
}
 
_



Re: send_IPI_mask_bitmask() (Re: 2.6.21 known regressions (v2) (for -stable team))

2007-04-29 Thread Thomas Gleixner
On Sun, 2007-04-29 at 01:34 -0400, Len Brown wrote:
> > clockevents_notify() is called with the power verify information for an
> > offline CPU. I can handle this in the clockevents code, but I think acpi
> > is the correct place.
> 
> So the CONFIG_GENERIC_CLOCKEVENTS=y case is broken,
> but the CONFIG_GENERIC_CLOCKEVENTS=n below is okay?
> Not immediately clear why both cases can't fail.

True, that's strange. Even stranger is that 2.6.21-rc7 does not have
the problem, but 2.6.21 does. We did not change anything in those code
paths between rc7 and final.

Jeff, can you please verify which -rcX was the last which did not have
this problem.

Thanks,

tglx




Re: Back to the future.

2007-04-29 Thread Pavel Machek
Hi!

> > > The freezer has *caused* those deadlocks (eg by stopping threads that
> > > were needed for the suspend writeouts to succeed!), not solved them.
> > 
> > I can't remember anything like this, but I believe you have a specific test
> > case in mind.
> 
> Ehh.. Why do you think we _have_ that PF_NOFREEZE thing in the first place?
> 
> Rafael, you really don't know what you're talking about, do you?
> 
> Just _look_ at them. It's the IO threads etc that shouldn't be frozen, 
> exactly *because* they do IO. You claim that kernel threads shouldn't do 
> IO, but that's the point: if you cannot do IO when snapshotting to disk, 
> here's a damn big clue for you: how do you think that snapshot is going to 
> get written?
> 
> I *guarantee* you that we've had a lot more problems with threads that 
> should *not* have been frozen than with those hypothetical threads that 
> you think should have been frozen.

Well, we had nasty corruption on XFS, caused by a thread that was not
frozen and should have been. (While the other case leads "only" to
deadlocks, so it is easier to debug.)

The locking point.. when I added freezing to swsusp, I knew very
little about kernel locking, so I "simply" decided to avoid the
problem altogether... using the freezer.

You may be right that locks are not a big problem for the hibernation
after all; I just do not know.
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: bad_page from quicklist patches

2007-04-29 Thread Paul Jackson
> That should have been fixed in -mm2, by the below:

Ah - ok - you're quick - thanks.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401


Re: [GIT PATCH] UIO patches for 2.6.21

2007-04-29 Thread Thomas Gleixner
On Sat, 2007-04-28 at 18:23 -0700, Greg KH wrote:
> On Sat, Apr 28, 2007 at 10:31:37PM +0200, Thomas Gleixner wrote:
> > On Sat, 2007-04-28 at 21:15 +0100, Alan Cox wrote:
> > > > > I have a political question, if I have a user space driver, is my 
> > > > > kernel 
> > > > > tainted or not? 
> > > > 
> > > > Surely not. By using the kernel's userspace interface, you create no
> > > > "derived work" of the kernel. See COPYING in the root directory of the 
> > > > kernel sources for details.
> > > 
> > > That only covers normal system calls - but I don't think thats what is
> > > relevant, taints are for debug assistance not politics.
> > > 
> > > I think we should have a taint flag for UIO type drivers. Not for any
> > > licensing or political reason but for the simple fact it means that there
> > > may be other complexities to debugging - and not the same one as a binary
> > > module. Probably we want the same marker for mmap /dev/mem too.
> > 
> > I agree, if we make it entirely clear that the flag is nonpolitical. 
> 
> Hm, I don't know, what makes this different from the fact that we can
> mmap PCI device space today through the proc and sysfs entries?  That's
> how X gets direct access to the hardware for a number of different
> cards, and that's pretty much the same thing as the UIO interface is
> doing.
> 
> Unless you think we should also use the same "taint" flag on those
> accesses too, and if so, I have no objection.

Right, this is just a hint that something in user space is accessing
the hardware directly. Not too bad an idea, but pretty much useless when
we add X to the picture, as it will always be set :)

tglx




Re: Linux 2.6.21

2007-04-29 Thread Thomas Gleixner
On Sat, 2007-04-28 at 20:41 -0700, David Miller wrote:
> From: Adrian Bunk <[EMAIL PROTECTED]>
> Date: Sun, 29 Apr 2007 01:04:16 +0200
> 
> > Bugzilla has an email interface.
> > Andrew forwards bugs from Bugzilla to developers.
> 
> Therefore, bugzilla only works at all when Andrew forwards things
> around by-hand.

That's not entirely true. There are people watching the bugs which might
be relevant for them on their own.

It does not make bugzilla better though. The user interface sucks and
getting things correlated is simply not possible.

tglx




Re: [PATCH -mm 1/2] Separate freezer from PM code

2007-04-29 Thread Sam Ravnborg
On Fri, Apr 27, 2007 at 11:29:34PM +0200, Rafael J. Wysocki wrote:
> On Friday, 27 April 2007 22:20, Jeremy Fitzhardinge wrote:
> > Rafael J. Wysocki wrote:
> > > Makes sense.  Please have a look at the updated patch below.
> > >
> > > Sam, does this one look better to you?
> > >   
> > 
> > If freezer.c is in kernel/, then shouldn't the corresponding config var
> > be in a non-arch Kconfig file?
> 
> Well, I though it would look strange.  Still, I can do that, of course:

It would have been much better to have a single kernel/Kconfig
so that you could avoid all the arch-specific source lines.
But that's not this patch and can come later.

So I'm OK with this one.

Sam

> 
> ---
> From: Rafael J. Wysocki <[EMAIL PROTECTED]>
> 
> Now that the freezer is used by kprobes, it is no longer a PM-specific piece of
> code.  Move the freezer code out of kernel/power and introduce the
> CONFIG_FREEZER option that will be chosen automatically if PM or KPROBES is set.
> 
> Signed-off-by: Rafael J. Wysocki <[EMAIL PROTECTED]>
> ---
>  arch/arm/Kconfig         |    2
>  arch/avr32/Kconfig       |    2
>  arch/avr32/Kconfig.debug |    1
>  arch/blackfin/Kconfig    |    2
>  arch/frv/Kconfig         |    2
>  arch/i386/Kconfig        |    3
>  arch/ia64/Kconfig        |    3
>  arch/mips/Kconfig        |    2
>  arch/powerpc/Kconfig     |    3
>  arch/ppc/Kconfig         |    2
>  arch/s390/Kconfig        |    3
>  arch/sh/Kconfig          |    2
>  arch/sparc64/Kconfig     |    3
>  arch/x86_64/Kconfig      |    3
>  include/linux/freezer.h  |    2
>  kernel/Kconfig.freezer   |    5
>  kernel/Makefile          |    1
>  kernel/freezer.c         |  236 +++
>  kernel/kprobes.c         |    2
>  kernel/power/Kconfig     |    1
>  kernel/power/Makefile    |    2
>  kernel/power/process.c   |  236 ---
>  22 files changed, 279 insertions(+), 239 deletions(-)
> 
> Index: linux-2.6.21-rc7-mm2/arch/x86_64/Kconfig
> ===
> --- linux-2.6.21-rc7-mm2.orig/arch/x86_64/Kconfig 2007-04-27 21:41:05.0 +0200
> +++ linux-2.6.21-rc7-mm2/arch/x86_64/Kconfig  2007-04-27 23:20:43.0 +0200
> @@ -703,6 +703,8 @@ config GENERIC_PENDING_IRQ
>   depends on GENERIC_HARDIRQS && SMP
>   default y
>  
> +source "kernel/Kconfig.freezer"
> +
>  menu "Power management options"
>  
>  source kernel/power/Kconfig
> @@ -791,6 +793,7 @@ source "arch/x86_64/oprofile/Kconfig"
>  config KPROBES
>   bool "Kprobes (EXPERIMENTAL)"
>   depends on KALLSYMS && EXPERIMENTAL && MODULES
> + select FREEZER
>   help
> Kprobes allows you to trap at almost any kernel address and
> execute a callback function.  register_kprobe() establishes
> Index: linux-2.6.21-rc7-mm2/arch/avr32/Kconfig.debug
> ===
> --- linux-2.6.21-rc7-mm2.orig/arch/avr32/Kconfig.debug 2007-04-27 21:41:05.0 +0200
> +++ linux-2.6.21-rc7-mm2/arch/avr32/Kconfig.debug 2007-04-27 21:54:19.0 +0200
> @@ -12,6 +12,7 @@ menu "Instrumentation Support"
>  config KPROBES
>   bool "Kprobes"
>   depends on DEBUG_KERNEL
> + select FREEZER
>   help
> Kprobes allows you to trap at almost any kernel address and
>execute a callback function.  register_kprobe() establishes
> Index: linux-2.6.21-rc7-mm2/arch/frv/Kconfig
> ===
> --- linux-2.6.21-rc7-mm2.orig/arch/frv/Kconfig 2007-04-27 21:41:05.0 +0200
> +++ linux-2.6.21-rc7-mm2/arch/frv/Kconfig 2007-04-27 23:13:27.0 +0200
> @@ -364,6 +364,8 @@ source "drivers/pcmcia/Kconfig"
>  #  sleep-deprived psychotic hacker types can say Y now, everyone else
>  #  should probably wait a while.
>  
> +source "kernel/Kconfig.freezer"
> +
>  menu "Power management options"
>  source kernel/power/Kconfig
>  endmenu
> Index: linux-2.6.21-rc7-mm2/arch/i386/Kconfig
> ===
> --- linux-2.6.21-rc7-mm2.orig/arch/i386/Kconfig   2007-04-27 21:41:05.0 +0200
> +++ linux-2.6.21-rc7-mm2/arch/i386/Kconfig    2007-04-27 23:17:36.0 +0200
> @@ -912,6 +912,8 @@ config ARCH_ENABLE_MEMORY_HOTPLUG
>   def_bool y
>   depends on HIGHMEM
>  
> +source "kernel/Kconfig.freezer"
> +
>  menu "Power management options (ACPI, APM)"
>   depends on !X86_VOYAGER
>  
> @@ -1218,6 +1220,7 @@ source "arch/i386/oprofile/Kconfig"
>  config KPROBES
>   bool "Kprobes (EXPERIMENTAL)"
>   depends on KALLSYMS && EXPERIMENTAL && MODULES
> + select FREEZER
>   help
> Kprobes allows you to trap at almost any kernel address and
> execute a callback function.  register_kprobe() establishes
> Index: linux-2.6.21-rc7-mm2/arch/ia64/Kconf

Re: Back to the future.

2007-04-29 Thread Rafael J. Wysocki
On Sunday, 29 April 2007 01:45, Linus Torvalds wrote:
> 
> On Sun, 29 Apr 2007, Rafael J. Wysocki wrote:
> > 
> > OK, more precisely: fs-related threads should not try to process their 
> > queues,
> > etc., after the snapshot is done, because that may cause some fs data to be
> > written at that time and then the fs in question may be corrupted after the
> > restore.  Not all of the I/O in general, fs data.
> 
> But that's not true _either_. That's only true because right now I think 
> we cannot even suspend to a swapfile (I might be wrong). 

You are.
 
> If you have a swapfile on a filesystem, you'd need those fs queues 
> running!

No, I don't.  It's done by bmapping the file and writing directly to the
underlying blockdev.  Otherwise we'd have corrupted filesystems after the
restore.

Swapfiles are handled this way anyway, so we just use the same code.

> > Well, I'm not sure whether or not that still would have been the case if we 
> > had
> > stopped to freeze kernel threads for the hibernation/suspend.
> 
> Did you miss the email where Paul pointed out that Mac/PowerPC didn't use 
> to do any of this?

No, I didn't.

> And apparently never had any issues with it?

On one platform with a limited subset of device drivers.

> And probably worked more reliably several years ago than suspend/hibernation 
> does _today_?

I have no problems with the hibernation on my test boxes (six of them), except
for one network driver that doesn't bother to define a .suspend() callback.

There are problems with the suspend (s2ram), but they are _not_ related to the
freezing of kernel threads.  Some of them are related to the other issue that
you have raised, which is that the same callbacks should not be used for the
suspend and hibernation, and which I think is absolutely valid.  The remaining
ones are related to the fact that graphic card vendors don't care for us at
all.

> Ie we do have history of _not_ freezing things.  The freezing came later, 
> and came with the subsystem that had more problems..

It doesn't have as many problems as you are trying to suggest.  At present,
the only problems with it happen if someone tries to "improve" it in the way
I did with the workqueues.

Anyway, the freezing of tasks, including kernel threads, is one of the few
things on which Pavel, Nigel and I completely agree that it should be done,
so perhaps you could accept that?

Greetings,
Rafael


Re: bad_page from quicklist patches

2007-04-29 Thread Paul Jackson
> That should have been fixed in -mm2

Verified.  2.6.21-rc7-mm2 builds and boots on my SN2 (ia64).

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401


Re: Back to the future.

2007-04-29 Thread Pavel Machek
Hi!

> > Ie we do have history of _not_ freezing things.  The freezing came later, 
> > and came with the subsystem that had more problems..
> 
> It doesn't have that many problems as you are trying to suggest.  At present,
> the only problems with it happen if someone tries to "improve" it in the way
> I did with the workqueues.
> 
> Anyway, the freezing of tasks, including kernel threads, is one of the few
> things on which Pavel, Nigel and me completely agree that they should be done,
> so perhaps you could accept that?

Actually, if we want to support OLPC _nicely_, we'll need to get rid
of the freezer from suspend-to-RAM. Of course, that _will_ put more
pressure on the drivers -- and break a few of them...

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: [patch] CFS scheduler, -v6

2007-04-29 Thread William Lee Irwin III
On Sun, Apr 29, 2007 at 12:54:36AM -0700, William Lee Irwin III wrote:
>> Common code for rbtree-based priority queues can be factored out of
>> cfq, cfs, and hrtimers.

On Sun, Apr 29, 2007 at 10:13:17AM +0200, Willy Tarreau wrote:
> In my experience, rbtrees are painfully slow. Yesterday, I spent the
> day replacing them in haproxy with other trees I developped a few
> years ago, which look like radix trees. They are about 2-3 times as
> fast to insert 64-bit data, and you walk through them in O(1). I have
> many changes to apply to them before they could be used in kernel, but
> at least I think we already have code available for other types of trees.

Dynamic allocation of auxiliary indexing structures is problematic for
the scheduler, which significantly constrains the algorithms one may
use for this purpose.

rbtrees are not my favorite either. Faster alternatives to rbtrees
exist even among binary trees; for instance, it's not so difficult to
implement a heap-ordered tree maintaining the red-black invariant with
looser constraints on the tree structure and hence less rebalancing.
One could always try implementing a van Emde Boas queue, if one felt
particularly brave.
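
As a rough illustration of a priority queue that needs no dynamic
allocation, here is a minimal userspace sketch of a fixed-capacity,
array-backed binary min-heap. It is a simpler stand-in, not the
heap-ordered red-black variant mentioned above, and it is not kernel code;
PQ_CAPACITY and the pq_* names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PQ_CAPACITY 64	/* hypothetical fixed bound; nothing is allocated at runtime */

struct pq {
	uint64_t key[PQ_CAPACITY];
	size_t len;
};

void pq_init(struct pq *q)
{
	assert(q != NULL);
	q->len = 0;
}

/* Insert a key; returns 0 on success, -1 if the queue is full. */
int pq_push(struct pq *q, uint64_t k)
{
	size_t i;

	if (q->len == PQ_CAPACITY)
		return -1;
	i = q->len++;
	/* Sift up: move larger parents down until k's slot respects heap order. */
	while (i > 0 && q->key[(i - 1) / 2] > k) {
		q->key[i] = q->key[(i - 1) / 2];
		i = (i - 1) / 2;
	}
	q->key[i] = k;
	return 0;
}

/* Remove the smallest key into *out; returns 0, or -1 if the queue is empty. */
int pq_pop(struct pq *q, uint64_t *out)
{
	size_t i = 0, child;
	uint64_t last;

	if (q->len == 0)
		return -1;
	*out = q->key[0];
	last = q->key[--q->len];
	/* Sift down: pull the smaller child up until 'last' fits. */
	while ((child = 2 * i + 1) < q->len) {
		if (child + 1 < q->len && q->key[child + 1] < q->key[child])
			child++;
		if (last <= q->key[child])
			break;
		q->key[i] = q->key[child];
		i = child;
	}
	q->key[i] = last;
	return 0;
}
```

Both operations are O(lg(n)) and touch only the embedded array, which is the
property that matters here; a real scheduler queue would also need removal
of arbitrary entries, which this sketch leaves out.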

Some explanation of the structure may be found at:
http://courses.csail.mit.edu/6.897/spring03/scribe_notes/L1/lecture1.pdf

According to that, y-trees use less space, and exponential trees are
asymptotically faster with a worst-case asymptotic running time of

O(min(lg(lg(u))*lg(lg(n))/lg(lg(lg(u))), sqrt(lg(n)/lg(lg(n)))))

for all operations, so van Emde Boas is not the ultimate algorithm by
any means at O(lg(lg(u))); in these estimates, u is the size of the
"universe," or otherwise the range of the key data type. Not to say
that any of those are appropriate for the kernel; it's rather likely
we'll have to settle for something less interesting, if we bother
ditching rbtrees at all, on account of the constraints of the kernel
environment.

I'll see what I can do about a userspace test harness for priority
queues more comprehensive than smart-queue.c. I have in mind the
ability to replay traces obtained from queues in the kernel and loading
priority queue implementations via dlopen()/dlsym() et al. valgrind can
do most of the dirty work. Otherwise running a trace for some period of
time and emitting the number of operations it got through should serve
as a benchmark. With that in hand, people can grind out priority queue
implementations and see how they compare on real operation sequences
logged from live kernels.


-- wli


[PATCH] paravirt: fix startup_ipi_hook config dependency

2007-04-29 Thread Jeremy Fitzhardinge
startup_ipi_hook depends on CONFIG_X86_LOCAL_APIC, so move it to the
right part of the paravirt_ops initialization.

Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>

---
 arch/i386/kernel/paravirt.c |3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

===
--- a/arch/i386/kernel/paravirt.c
+++ b/arch/i386/kernel/paravirt.c
@@ -292,6 +292,7 @@ struct paravirt_ops paravirt_ops = {
 	.apic_read = native_apic_read,
 	.setup_boot_clock = setup_boot_APIC_clock,
 	.setup_secondary_clock = setup_secondary_APIC_clock,
+	.startup_ipi_hook = paravirt_nop,
 #endif
 	.set_lazy_mode = paravirt_nop,
 
@@ -342,8 +343,6 @@ struct paravirt_ops paravirt_ops = {
 	.dup_mmap = paravirt_nop,
 	.exit_mmap = paravirt_nop,
 	.activate_mm = paravirt_nop,
-
-	.startup_ipi_hook = paravirt_nop,
 };
 
 EXPORT_SYMBOL(paravirt_ops);




Re: Back to the future.

2007-04-29 Thread Rafael J. Wysocki
On Sunday, 29 April 2007 10:23, Pavel Machek wrote:
> Hi!
> 
> > > > The freezer has *caused* those deadlocks (eg by stopping threads that 
> > > > were 
> > > > needed for the suspend writeouts to succeed!), not solved them.
> > > 
> > > I can't remember anything like this, but I believe you have a specific 
> > > test
> > > case in mind.
> > 
> > Ehh.. Why do you thik we _have_ that PF_NOFREEZE thing in the first place?
> > 
> > Rafael, you really don't know what you're talking about, do you?
> > 
> > Just _look_ at them. It's the IO threads etc that shouldn't be frozen, 
> > exactly *because* they do IO. You claim that kernel threads shouldn't do 
> > IO, but that's the point: if you cannot do IO when snapshotting to disk, 
> > here's a damn big clue for you: how do you think that snapshot is going to 
> > get written?
> > 
> > I *guarantee* you that we've had a lot more problems with threads that 
> > should *not* have been frozen than with those hypothetical threads that 
> > you think should have been frozen.
> 
> Well, we had nasty corruption on XFS, caused by thread that was not
> frozen and should be. (While the other case leads "only" to deadlocks,
> so it is easier to debug.)
> 
> The locking point.. when I added freezing to swsusp, I knew very
> little about kernel locking, so I "simply" decided to avoid the
> problem altogether... using the freezer.
> 
> You may be right that locks are not a big problem for the hibernation
> after all; I just do not know.

Still, I think, if a kernel thread is a part of a device driver, then _in_
_principle_ it needs _some_ synchronization with the driver's suspend/freeze
and resume/thaw callbacks.  For example, it's reasonable to assume that the
thread should be quiet between suspend/freeze and resume/thaw.

With the freezing of kernel threads we provide a simple means of such
synchronization: use try_to_freeze() in a suitable place of your kernel thread
and you're done.  [Well, there should be a second part for making the thread
die if the thaw callback doesn't find the device, but that's in the works.]
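
To make the synchronization idea concrete, here is a userspace analogy
using pthreads. It is only a sketch of the pattern (a worker parks itself
at a known safe point when asked to), not the kernel freezer itself; the
struct worker type and all worker_* names are invented for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

struct worker {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool freeze_req;		/* set by the "suspend" side */
	bool frozen;			/* set by the worker once it is parked */
	bool stop;
	unsigned long work_done;
	pthread_t tid;
};

/* Stand-in for try_to_freeze(): park while a freeze is requested. Lock held. */
static void worker_try_to_freeze(struct worker *w)
{
	while (w->freeze_req) {
		w->frozen = true;
		pthread_cond_broadcast(&w->cond);	/* tell the freezer we are parked */
		pthread_cond_wait(&w->cond, &w->lock);
	}
	w->frozen = false;
}

static void *worker_main(void *arg)
{
	struct worker *w = arg;
	const struct timespec tick = { 0, 1000000 };	/* 1 ms of pretend work */

	for (;;) {
		pthread_mutex_lock(&w->lock);
		worker_try_to_freeze(w);		/* the one safe point per iteration */
		if (w->stop) {
			pthread_mutex_unlock(&w->lock);
			break;
		}
		w->work_done++;
		pthread_mutex_unlock(&w->lock);
		nanosleep(&tick, NULL);
	}
	return NULL;
}

int worker_start(struct worker *w)
{
	pthread_mutex_init(&w->lock, NULL);
	pthread_cond_init(&w->cond, NULL);
	w->freeze_req = w->frozen = w->stop = false;
	w->work_done = 0;
	return pthread_create(&w->tid, NULL, worker_main, w);
}

/* Returns only after the worker has parked itself. */
void worker_freeze(struct worker *w)
{
	pthread_mutex_lock(&w->lock);
	w->freeze_req = true;
	while (!w->frozen)
		pthread_cond_wait(&w->cond, &w->lock);
	pthread_mutex_unlock(&w->lock);
}

void worker_thaw(struct worker *w)
{
	pthread_mutex_lock(&w->lock);
	w->freeze_req = false;
	pthread_cond_broadcast(&w->cond);
	pthread_mutex_unlock(&w->lock);
}

unsigned long worker_count(struct worker *w)
{
	unsigned long n;

	pthread_mutex_lock(&w->lock);
	n = w->work_done;
	pthread_mutex_unlock(&w->lock);
	return n;
}

void worker_stop(struct worker *w)
{
	pthread_mutex_lock(&w->lock);
	w->stop = true;
	w->freeze_req = false;
	pthread_cond_broadcast(&w->cond);
	pthread_mutex_unlock(&w->lock);
	pthread_join(w->tid, NULL);
}
```

Between worker_freeze() returning and worker_thaw() being called, the
worker is guaranteed to be quiet, which is the property a driver's
suspend/freeze and resume/thaw callbacks would rely on.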

Without it, there may be race conditions that we are not even aware of and that
may trigger in, say, 1 in 10 suspends or so and I wish you good luck with
debugging such things.

Greetings,
Rafael


Re: [PATCH] mm/memory.c: remove warning from an uninitialized spinlock. was: Re: 2.6.21-rc7-mm2

2007-04-29 Thread Andrew Morton
On Sun, 29 Apr 2007 08:50:49 +0200 Borislav Petkov <[EMAIL PROTECTED]> wrote:

> Introduce a macro for suppressing gcc from generating a warning about a
> probable uninitialized state of a variable.

I ended up doing the below.

It's better to make this a per-compiler-version thing: later versions of
gcc might need different tricks, or might provide __attribute__((stfu)) or
whatever.

Plus I don't know if the x=x trick is needed on the intel compiler, nor if
it even works, so I left ICC alone.
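
For readers unfamiliar with the trick, a small userspace sketch of how the
macro is meant to be used follows. The macro body matches the gcc variant
in the patch below; lookup() and demo() are hypothetical helpers invented
for the illustration, and the warning behaviour is assumed to be that of
gcc, which treats the self-initialisation as a no-op.

```c
#include <assert.h>

/* Same definition as the gcc variant in the patch: a self-initialisation. */
#define uninitialized_var(x) x = x

/* Hypothetical helper: stores to *out only on success. */
int lookup(int key, int *out)
{
	assert(out != 0);
	if (key < 0)
		return -1;
	*out = key * 2;
	return 0;
}

int demo(int key)
{
	int uninitialized_var(val);	/* expands to: int val = val; */

	if (lookup(key, &val))
		return -1;
	return val;	/* only reached after lookup() has stored a value */
}
```

The compiler cannot always prove that val is written before it is read on
the success path, which is the kind of false positive the macro silences.
The cost is that it would equally hide a true bug if the error check were
ever dropped.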




From: Borislav Petkov <[EMAIL PROTECTED]>

Introduce a macro for suppressing gcc from generating a warning about a
probable uninitialized state of a variable.

Example:

-   spinlock_t *ptl;
+   spinlock_t *uninitialized_var(ptl);

Not a happy solution, but those warnings are obnoxious.

- Using the usual pointlessly-set-it-to-zero approach wastes several
  bytes of text.

- Using a macro means we can (hopefully) do something else if gcc changes
  cause the `x = x' hack to stop working

- Using a macro means that people who are worried about hiding true bugs
  can easily turn it off.

Signed-off-by: Borislav Petkov <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 include/linux/compiler-gcc3.h  |6 ++
 include/linux/compiler-gcc4.h  |6 ++
 include/linux/compiler-intel.h |2 ++
 3 files changed, 14 insertions(+)

diff -puN include/linux/compiler-gcc3.h~add-unitialized_var-macro-for-suppressing-gcc-warnings include/linux/compiler-gcc3.h
--- a/include/linux/compiler-gcc3.h~add-unitialized_var-macro-for-suppressing-gcc-warnings
+++ a/include/linux/compiler-gcc3.h
@@ -13,4 +13,10 @@
 #define __must_check   __attribute__((warn_unused_result))
 #endif
 
+/*
+ * A trick to suppress uninitialized variable warning without generating any
+ * code
+ */
+#define uninitialized_var(x) x = x
+
 #define __always_inline		inline __attribute__((always_inline))
diff -puN include/linux/compiler-gcc4.h~add-unitialized_var-macro-for-suppressing-gcc-warnings include/linux/compiler-gcc4.h
--- a/include/linux/compiler-gcc4.h~add-unitialized_var-macro-for-suppressing-gcc-warnings
+++ a/include/linux/compiler-gcc4.h
@@ -16,3 +16,9 @@
 #define __must_check   __attribute__((warn_unused_result))
 #define __compiler_offsetof(a,b) __builtin_offsetof(a,b)
 #define __always_inline		inline __attribute__((always_inline))
+
+/*
+ * A trick to suppress uninitialized variable warning without generating any
+ * code
+ */
+#define uninitialized_var(x) x = x
diff -puN include/linux/compiler-intel.h~add-unitialized_var-macro-for-suppressing-gcc-warnings include/linux/compiler-intel.h
--- a/include/linux/compiler-intel.h~add-unitialized_var-macro-for-suppressing-gcc-warnings
+++ a/include/linux/compiler-intel.h
@@ -22,3 +22,5 @@
 (typeof(ptr)) (__ptr + (off)); })
 
 #endif
+
+#define uninitialized_var(x) x
_



Re: Back to the future.

2007-04-29 Thread Rafael J. Wysocki
On Sunday, 29 April 2007 10:59, Pavel Machek wrote:
> Hi!
> 
> > > Ie we do have history of _not_ freezing things.  The freezing came later, 
> > > and came with the subsystem that had more problems..
> > 
> > It doesn't have that many problems as you are trying to suggest.  At 
> > present,
> > the only problems with it happen if someone tries to "improve" it in the way
> > I did with the workqueues.
> > 
> > Anyway, the freezing of tasks, including kernel threads, is one of the few
> > things on which Pavel, Nigel and me completely agree that they should be 
> > done,
> > so perhaps you could accept that?
> 
> Actually, if we want to support OLPC _nicely_, we'll need to get rid
> of freezer from suspend-to-RAM. Of course, that _will_ put more
> pressure at the drivers -- and break few of them...

I think the removal of sys_sync() from freeze_processes() in the s2ram case
might help.

I'm really afraid of dropping the freezing of kernel threads from the
hibernation/suspend altogether before we know we won't break drivers, because
we can introduce some very subtle and difficult to debug problems this way.

Moreover, apart from speeding up the suspend slightly (kernel threads are
frozen very quickly) this won't buy us anything, since kprobes uses the freezer
and all of the infrastructure is needed anyway.

Greetings,
Rafael


Re: Linux 2.6.21

2007-04-29 Thread Stefan Richter
David Lang wrote:
> I'll say that as a user I hate having to deal with bugzilla.
> 
> there's nothing more frustrating then spending a good chunk of time
> trying to find a similar bug, then jumping through all the bugzilla
> hoops to file a report to eventually (days/weeks later) get a message
> 'closed becouse it's a duplicate report), then have to go and track down
> what it's a duplicate of, read through that bug report, only to find
> that it's not solved there either, and to top it off, the people working
> on that bug won't see my report or that I'm available to troubleshoot it.

Ideally, joining duplicate reports should be a low-cost, lossless operation.

That said, when bug B is marked as duplicate of bug A, people at bug A
at least get a link to bug B, don't they?  If they are too lazy to read
the report B, they obviously are not very interested in A either.  Tough
luck.  Vice versa, people at bug B get notified that the matter is now
continued at bug A and can add their Cc there.  Of course that addition
is one of the very few things that could probably be automated.

Joining duplicate reports on a mailinglist involves responding to
multiple threads and sending links into web archives of the list, which
happen to be redundant to and disparate from your local e-mail storage.
I can't see how this aspect of bug-handling works easier on mailinglists.

> from a user poit of view, e-mailing the kernel list (retrying a few days
> later of there is no response) tends to work _much_ better.

What I agree with, from a maintainer's POV, is that a report to the
appropriate mailinglist is often easier to triage than a report at
bugzilla, because the reporter often needs initial help to properly
define the problem.  Bugzilla becomes useful after a report reached a
minimum level of quality (after minimum initial triage) and if the bug
can be clearly associated with a maintained subsystem of the kernel (as
e.g. Linus already pointed out in this thread).
-- 
Stefan Richter
-=-=-=== -=-- ===-=
http://arcgraph.de/sr/


Re: [PATCH 0/9] Containers (V9): Generic Process Containers

2007-04-29 Thread Paul Jackson
I'm afraid that this patch set doesn't do cpusets very well.

It builds and boots and mounts the cpuset file system ok.
But trying to write the 'mems' file hangs the system hard.

I had to test it against 2.6.21-rc7-mm2, because I can't boot
2.6.21-rc7-mm1, due to the 'bad_page' problem that I noted in an
earlier post this evening on lkml.

These container patches seemed to apply ok against 2.6.21-rc7-mm2,
and built and booted ok.  I built with an sn2_defconfig configuration,
having the following CONTAINER and CPUSET settings:

CONFIG_CONTAINERS=y
CONFIG_CONTAINER_DEBUG=y
CONFIG_CPUSETS=y
CONFIG_CONTAINER_CPUACCT=y
CONFIG_PROC_PID_CPUSET=y
# CONFIG_ACPI_CONTAINER is not set

I could mount the cpuset file system on /dev/cpuset just fine.

Then I invoked the following commands:

# cd /dev/cpuset
# mkdir foo
# cd foo
# echo 0-3 > cpus
# echo 0-1 > mems

At that point, the system hangs.  Reproduced three times, on two boots.
I never get a shell prompt back from that second echo.  I have to hit
Reset.  The three different hangs were done with the following three
different values:

echo 0-3 > mems
echo 0-1 > mems
echo 1 > mems

On that last one, "echo 1 > mems", I did not do the echo to cpus first.

The test system had 8 cpus, numbered 0-7, and 4 mems, numbered 0-3.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401


Re: Linux 2.6.21

2007-04-29 Thread Stefan Richter
I wrote:
> Joining duplicate reports at a mailinglist involves responding to
> multiple threads and send links into web archives of the list, which
> happens to be redundant to and disparate from your local e-mail storage.
> I can't see how this aspect of bug-handling works easier on mailinglists.

PS:  Of course what _does_ work better on mailinglists than on bugzilla
is recognizing duplicates as such in the first place, when the symptoms
seem only loosely related.  (I.e. seeing the big picture and recognizing
patterns.)
-- 
Stefan Richter
-=-=-=== -=-- ===-=
http://arcgraph.de/sr/


Re: Linux 2.6.21

2007-04-29 Thread Stefan Richter
Willy Tarreau wrote:
> On Sun, Apr 29, 2007 at 12:58:09AM +0200, Markus Rechberger wrote:
>> I totally disagree here, bugzilla is a very good tool. If someone is
>> too lazy to look at it it's his problem.
> 
> I'm glad we finally found _the_ person using it !
> 
> More seriously, it's so much a complicated interface ! It's hard to
> bring more people into a discussion, it's hard to comment on code or
> suggested patches, etc... Mail is by far more adapted to the job !

To continue on the sarcastic tangent:  This flaw of bugzilla is
irrelevant for subsystems where there are fewer than two or three people
who steadily hunt bugs anyway.  In the field I work on, I wouldn't have
anybody else to bring in in the first place, except that I sometimes
suggest to reporters that they subscribe to a bug ticket.
-- 
Stefan Richter
-=-=-=== -=-- ===-=
http://arcgraph.de/sr/


Re: [patch] CFS scheduler, -v6

2007-04-29 Thread Con Kolivas
On Sunday 29 April 2007 18:00, Ingo Molnar wrote:
> * Willy Tarreau <[EMAIL PROTECTED]> wrote:
> > > > [...] except for Mike who has not tested recent versions. [...]
> > >
> > > actually, dont discount Mark Lord's test results either. And it
> > > might be a good idea for Mike to re-test SD 0.46?
> >
> > In any case, it might be a good idea because Mike encountered a
> > problem that nobody could reproduce. [...]
>
> actually, Mark Lord too reproduced something similar to Mike's results.
> Please try those workloads yourself.

I see no suggestion that either Mark or Mike have tested, or for that matter 
_have any intention of testing_, the current version of SD without fancy 
renicing or anything involved. Willy, I greatly appreciate you trying, but I 
don't know why you're bothering even trying here since clearly 1. Ingo is the 
scheduler maintainer 2. he's working on a competing implementation and 3. in 
my excellent physical and mental state I seem to have slighted the two 
testers (both?) somewhere along the line. Mike feels his testing was a 
complete waste of time yet it would be ludicrous for me to say that SD didn't 
evolve 20 versions further due to his earlier testing, and was the impetus 
for you to start work on CFS. The crunch came that we couldn't agree that 
fair was appropriate for mainline and we parted ways. That fairness has not 
been a problem for his view on CFS though but he has only tested older 
versions of SD that still had bugs.

Given facts 1 and 2 above I have all but resigned myself to the fact that SD 
has -less than zero- chance of ever being considered for mainline and it's my 
job to use it as something to compare your competing design with to make sure 
that when (and I do mean when since there seems no doubt in everyone else's 
mind) CFS becomes part of mainline that it is as good as SD. Saying people 
found CFS better than SD is, in my humble opinion, an exaggeration since 
every one I could find was a glowing standalone report of CFS rather than any 
comparison to the current very stable bug free version of SD. On the other 
hand I still see that when people compare them side to side they find SD is 
better, so I will hold CFS against that comparison - when comparing fairness 
based designs.

On a related note - implementing a framework is nice but doesn't address any 
of the current fairness/starvation/corner case problems mainline has. I don't 
see much point in rushing the framework merging since it's still in flux.

-- 
-ck


Re: [Fastboot] restoring x86 BIOS state before reboot

2007-04-29 Thread Bernhard Walle
* Eric W. Biederman <[EMAIL PROTECTED]> [2007-04-29 08:51]:
> 
> It may also be worth investigating if it is possible to bypass
> the part of windows that uses BIOS calls.  I really don't have
> a clue how a modern windows system boots.

I think ReactOS has a bootloader that can also boot Windows, or at
least they are close to booting the Windows kernel. That bootloader
could be then used for booting Windows without BIOS calls.

I also heard that the 100-$-Laptop should now be able to use Windows.
As it's using LinuxBIOS, this could also be interesting. However,
maybe Microsoft simply modifies a special Windows for this, or they
integrate a ClosedSource part in that LinuxBIOS, I don't know ... :-(


Thanks,
   Bernhard


Re: PCI Express MMCONFIG and BIOS Bug messages..

2007-04-29 Thread Andi Kleen

> 
> I tried adapting a patch by Rajesh Shah to do this for current kernels:

The Intel patches checked against ACPI which also didn't work in all cases.

You're right the e820 check is overzealous and has a lot of false positives,
but it is the only generic way we know right now to handle a common i965 BIOS
bug. Also there is the nasty case of the Apple EFI boxes where only mmconfig
works which has to be handled too.

I expect eventually the logic to be:

- If we know the hardware: read it from hw registers; trust them; ignore BIOS.
- Otherwise check e820 and ACPI resources and be very trigger happy at not
  using it
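
The two-step logic above can be sketched roughly as follows. This is only an
illustration: the struct, its fields and the function name are made up for the
example and are not kernel interfaces, and requiring both e820 and ACPI to
agree is just one reading of "trigger happy at not using it".

```c
#include <stdbool.h>

/* Hypothetical probe results; none of these names exist in the kernel. */
struct mmcfg_probe {
	bool known_chipset;   /* we know how to read the chipset registers */
	bool hw_reg_enabled;  /* the registers say MMCONFIG is enabled */
	bool e820_reserved;   /* the MMCONFIG window is reserved in e820 */
	bool acpi_reserved;   /* the window is an ACPI motherboard resource */
};

/* Returns true to use MMCONFIG, false to stay on Type 1 access. */
static bool mmcfg_should_use(const struct mmcfg_probe *p)
{
	/* Known hardware: trust the registers and ignore the BIOS tables. */
	if (p->known_chipset)
		return p->hw_reg_enabled;

	/* Unknown hardware: any doubt at all means MMCONFIG stays off. */
	return p->e820_reserved && p->acpi_reserved;
}
```

Falling back to Type 1 on any doubt is cheap because, as noted further down,
mmconfig is normally not needed.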

> It walks through all the motherboard resource devices and tries to pull 
> out the resource settings for all of them using the _CRS method. 

I tested it originally on an Intel system with the above BIOS problem
and it didn't help there.

> (Depending on how you do the probing, the _STA method is called as well, 
> either before or after.) From my limited ACPI knowledge, the problem is 
> that the PCI MMCONFIG initialization is called before the main ACPI 
> interpreter is enabled, and these control methods may try to access 
> operation regions who don't have handlers set up for them yet, so a 
> bunch of "no handler for region" errors show up.

mmconfig access can be switched later without problems; so it would
be possible to boot using Type1 if it works (e.g. detect the Apple case) 
and switch later.

It's all quite tricky unfortunately; that is why i left it at the current
relatively safe state for now. After all mmconfig is normally not needed.

> So essentially if we want to do this check based on ACPI resource 
> reservations, we need to be able to execute control methods at the point 
> that MMCONFIG is set up. Is there a reason why this can't be made 
> possible (like by moving the necessary parts of ACPI initialization 
> earlier)?

ACPI Interpreter wants to allocate memory and use other kernel services that
are not available in really early boot. It could be probably done somehow,
but would be quite ugly with lots of special cases.

-Andi
 




Re: [patch] CFS scheduler, -v6

2007-04-29 Thread Mike Galbraith
On Sun, 2007-04-29 at 19:52 +1000, Con Kolivas wrote:
> On Sunday 29 April 2007 18:00, Ingo Molnar wrote:
> > * Willy Tarreau <[EMAIL PROTECTED]> wrote:
> > > > > [...] except for Mike who has not tested recent versions. [...]
> > > >
> > > > actually, dont discount Mark Lord's test results either. And it
> > > > might be a good idea for Mike to re-test SD 0.46?
> > >
> > > In any case, it might be a good idea because Mike encountered a
> > > problem that nobody could reproduce. [...]
> >
> > actually, Mark Lord too reproduced something similar to Mike's results.
> > Please try those workloads yourself.
> 
> I see no suggestion that either Mark or Mike have tested, or for that matter 
> _have any intention of testing_, the current version of SD without fancy 
> renicing or anything involved. Willy I greatly appreciate you trying, but I 
> don't know why you're bothering even trying here since clearly 1. Ingo is the 
> scheduler maintainer 2. he's working on a competing implementation and 3. in 
> my excellent physical and mental state I seem to have slighted the two 
> testers (both?) somewhere along the line. Mike feels his testing was a 
> complete waste of time yet it would be ludicrous for me to say that SD didn't 
> evolve 20 versions further due to his earlier testing, and was the impetus 
> for you to start work on CFS. The crunch came that we couldn't agree that 
> fair was appropriate for mainline and we parted ways. That fairness has not 
> been a problem for his view on CFS though but he has only tested older 
> versions of SD that still had bugs.

The crunch for me came when you started hand-waving and spin-doctoring
as you are doing now.  Listening to twisted echoes of my voice is not my
idea of a good time.

-Mike



Re: [patch] CFS scheduler, -v6

2007-04-29 Thread Thomas Gleixner
Willy,

On Sun, 2007-04-29 at 09:16 +0200, Willy Tarreau wrote:
> In fact, what I'd like to see in 2.6.22 is something better for everybody
> and with *no* regression, even if it's not perfect. I had the feeling
> that SD matched that goal right now, except for Mike who has not tested
> recent versions. Don't get me wrong, I still think that CFS is a more
> interesting long-term target. But it may require more time to satisfy
> everyone. At least with one of them in 2.6.22, we won't waste time
> comparing to current mainline.

Oh no, we really do _NOT_ want to throw SD or anything else at mainline
in a hurry just for not wasting time on comparing to the current
scheduler.

I agree that CFS is the more interesting target and I prefer to push the
more interesting one even if it takes a release cycle longer. The main
reason for me is the design of CFS. Even if it is not really modular
right now, it is not rocket science to make it fully modular.

Looking at the areas people are working on, e.g. containers, resource
management, cpu isolation, fully tickless systems, we really need
to go in that direction if we want to avoid permanent tinkering in
the core scheduler code for the next five years.

As a sidenote: I really wonder if anybody noticed yet, that the whole
CFS / SD comparison is so ridiculous, that it is not even funny anymore.
CFS modifies the scheduler and nothing else, SD fiddles all over the
kernel in interesting ways. 

This is worse than apples and oranges, it's more like apples and
screwdrivers. 

Can we please stop this useless pissing contest and sit down and get a
modular design into mainline, which allows folks to work and integrate
their "workload X perfect scheduler" and gives us the flexibility to
adjust to the needs of upcoming functionality.
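
To make "modular design" a bit more concrete, here is a minimal user-space
sketch of a core that only talks to scheduling policies through a table of
hooks. Everything in it (struct sched_policy, the hook names, the FIFO policy)
is hypothetical and is not taken from CFS, SD or any posted patch.

```c
#include <stddef.h>

struct task {
	int id;
	struct task *next;
};

/* Hypothetical policy interface: the core only calls these hooks. */
struct sched_policy {
	const char *name;
	void (*enqueue)(struct task *t);
	struct task *(*pick_next)(void);  /* NULL if nothing is runnable */
};

/* One example policy: a plain FIFO run queue. */
static struct task *fifo_head, *fifo_tail;

static void fifo_enqueue(struct task *t)
{
	t->next = NULL;
	if (fifo_tail)
		fifo_tail->next = t;
	else
		fifo_head = t;
	fifo_tail = t;
}

static struct task *fifo_pick_next(void)
{
	struct task *t = fifo_head;

	if (t) {
		fifo_head = t->next;
		if (!fifo_head)
			fifo_tail = NULL;
	}
	return t;
}

static const struct sched_policy fifo_policy = {
	.name = "fifo",
	.enqueue = fifo_enqueue,
	.pick_next = fifo_pick_next,
};

/* Core: ask each registered policy in priority order. */
static struct task *core_pick_next(const struct sched_policy *const *policies,
				   size_t n)
{
	for (size_t i = 0; i < n; i++) {
		struct task *t = policies[i]->pick_next();

		if (t)
			return t;
	}
	return NULL;
}
```

With an interface of this shape, a "workload X perfect scheduler" is one more
entry in the policy array rather than a rewrite of the core.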

tglx




Re: [patch] CFS scheduler, -v6

2007-04-29 Thread William Lee Irwin III
On Sun, Apr 29, 2007 at 12:30:54PM +0200, Thomas Gleixner wrote:
> Can we please stop this useless pissing contest and sit down and get a
> modular design into mainline, which allows folks to work and integrate
> their "workload X perfect scheduler" and gives us the flexibility to
> adjust to the needs of upcoming functionality.

If I don't see some sort of modularity patch soon I'll post one myself.


-- wli


Re: [2.6.22 patch] the scheduled removal of the i8xx_tco watchdog driver

2007-04-29 Thread Wim Van Sebroeck
Hi Adrian,

> I might be blind, but I'm not seeing what should be removed.

[EMAIL PROTECTED] linux-2.6]$ grep -i tco Documentation/watchdog/*
Documentation/watchdog/watchdog-api.txt:i810-tco.c -- Intel 810 chipset
Documentation/watchdog/watchdog.txt:The i810 TCO watchdog modules can be configured with the "i810_margin"
Documentation/watchdog/watchdog.txt:The i810 TCO watchdog driver also implements the WDIOC_GETSTATUS and
Documentation/watchdog/watchdog.txt:and WDIOC_GETBOOTSTATUS returns the value of TCO2 Status Register (see Intel's
Documentation/watchdog/watchdog.txt:WDT501P WDT500P Software Berkshire i810 TCO SA1100WD

But I need to clean this up anyway. So no real need to put this in the same 
patch yet.
I'll create a new patch for this that deals with the complete watchdog 
Documentation.

Greetings,
Wim.



[2.6.21-rc7-mm2] BUG while suspend to ram

2007-04-29 Thread Maciej Rutecki
BUG: at kernel/kthread.c:166 kthread_bind()
 [] _cpu_down+0x16c/0x250
 [] disable_nonboot_cpus+0x60/0xf0
 [] pm_suspend_disk+0x177/0x2c0
 [] enter_state+0xb5/0x200
 [] state_store+0xbd/0xd0
 [] state_store+0x0/0xd0
 [] subsys_attr_store+0x29/0x40
 [] sysfs_write_file+0xd4/0x160
 [] vfs_write+0xc1/0x160
 [] sysfs_write_file+0x0/0x160
 [] sys_write+0x41/0x70
 [] sys_dup2+0xd5/0x100
 [] sysenter_past_esp+0x5f/0x85
 [] xfrm_policy_insert+0x210/0x400
 ===

dmesg:
http://www.unixy.pl/maciek/download/kernel/2.6.21-rc7--mm2/dmesg.txt.gz
lsmod:
http://www.unixy.pl/maciek/download/kernel/2.6.21-rc7--mm2/lsmod.txt.gz
ver_linux:
http://www.unixy.pl/maciek/download/kernel/2.6.21-rc7--mm2/ver_linux.txt.gz
lspci:
http://www.unixy.pl/maciek/download/kernel/2.6.21-rc7--mm2/lspci.txt.gz
config:
http://www.unixy.pl/maciek/download/kernel/2.6.21-rc7--mm2/config-2.6.21-rc7-mm2.gz

-- 
Maciej Rutecki <[EMAIL PROTECTED]>
www.unixy.pl
LTG - Linux Testers Group (PL)
(http://www.stardust.webpages.pl/ltg/)





Re: [patch] CFS scheduler, -v6

2007-04-29 Thread Kasper Sandberg
On Sun, 2007-04-29 at 12:30 +0200, Thomas Gleixner wrote:
> Willy,

> As a sidenote: I really wonder if anybody noticed yet, that the whole
> CFS / SD comparison is so ridiculous, that it is not even funny anymore.
> CFS modifies the scheduler and nothing else, SD fiddles all over the
> kernel in interesting ways. 
> 

have you looked at diffstat lately? :)

sd:
 Documentation/sched-design.txt  |  241 +++
 Documentation/sysctl/kernel.txt |   14
 Makefile                        |    2
 fs/pipe.c                       |    7
 fs/proc/array.c                 |    2
 include/linux/init_task.h       |    4
 include/linux/sched.h           |   32 -
 kernel/sched.c                  | 1279 +++-
 kernel/softirq.c                |    2
 kernel/sysctl.c                 |   26
 kernel/workqueue.c              |    2
 11 files changed, 919 insertions(+), 692 deletions(-)

cfs:
 Documentation/kernel-parameters.txt |   43
 Documentation/sched-design-CFS.txt  |  107 +
 Makefile                            |    2
 arch/i386/kernel/smpboot.c          |   13
 arch/i386/kernel/tsc.c              |    8
 arch/ia64/kernel/setup.c            |    6
 arch/mips/kernel/smp.c              |   11
 arch/sparc/kernel/smp.c             |   10
 arch/sparc64/kernel/smp.c           |   36
 fs/proc/array.c                     |   11
 fs/proc/base.c                      |    2
 fs/proc/internal.h                  |    1
 include/asm-i386/unistd.h           |    3
 include/asm-x86_64/unistd.h         |    4
 include/linux/hardirq.h             |   13
 include/linux/sched.h               |   94 +
 init/main.c                         |    2
 kernel/exit.c                       |    3
 kernel/fork.c                       |    4
 kernel/posix-cpu-timers.c           |   34
 kernel/sched.c                      | 2288 +---
 kernel/sched_debug.c                |  152 ++
 kernel/sched_fair.c                 |  601 +
 kernel/sched_rt.c                   |  184 ++
 kernel/sched_stats.h                |  235 +++
 kernel/sysctl.c                     |   32
 26 files changed, 2062 insertions(+), 1837 deletions(-)


> This is worse than apples and oranges, it's more like apples and
> screwdrivers. 

> 
>   tglx
> 
> 
> 



[PATCH] [3/48] x86_64: Use new shared sched_clock in x86-64 too

2007-04-29 Thread Andi Kleen

Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>

---
 arch/x86_64/kernel/Makefile |3 ++-
 arch/x86_64/kernel/time.c   |1 -
 arch/x86_64/kernel/tsc.c|   28 
 include/asm-x86_64/timer.h  |1 +
 include/asm-x86_64/timex.h  |1 -
 5 files changed, 3 insertions(+), 31 deletions(-)

Index: linux/arch/x86_64/kernel/Makefile
===
--- linux.orig/arch/x86_64/kernel/Makefile
+++ linux/arch/x86_64/kernel/Makefile
@@ -8,7 +8,7 @@ obj-y   := process.o signal.o entry.o trap
ptrace.o time.o ioport.o ldt.o setup.o i8259.o sys_x86_64.o \
x8664_ksyms.o i387.o syscall.o vsyscall.o \
setup64.o bootflag.o e820.o reboot.o quirks.o i8237.o \
-   pci-dma.o pci-nommu.o alternative.o hpet.o tsc.o
+   pci-dma.o pci-nommu.o alternative.o hpet.o tsc.o sched-clock.o
 
 obj-$(CONFIG_STACKTRACE)   += stacktrace.o
 obj-$(CONFIG_X86_MCE)  += mce.o therm_throt.o
@@ -57,3 +57,4 @@ i8237-y   += ../../i386/kernel/i8237.o
 msr-$(subst m,y,$(CONFIG_X86_MSR))  += ../../i386/kernel/msr.o
 alternative-y  += ../../i386/kernel/alternative.o
 pcspeaker-y+= ../../i386/kernel/pcspeaker.o
+sched-clock-y  += ../../i386/kernel/sched-clock.o
Index: linux/arch/x86_64/kernel/tsc.c
===
--- linux.orig/arch/x86_64/kernel/tsc.c
+++ linux/arch/x86_64/kernel/tsc.c
@@ -16,32 +16,6 @@ EXPORT_SYMBOL(cpu_khz);
 unsigned int tsc_khz;
 EXPORT_SYMBOL(tsc_khz);
 
-static unsigned int cyc2ns_scale __read_mostly;
-
-void set_cyc2ns_scale(unsigned long khz)
-{
-   cyc2ns_scale = (NSEC_PER_MSEC << NS_SCALE) / khz;
-}
-
-static unsigned long long cycles_2_ns(unsigned long long cyc)
-{
-   return (cyc * cyc2ns_scale) >> NS_SCALE;
-}
-
-unsigned long long sched_clock(void)
-{
-   unsigned long a = 0;
-
-   /* Could do CPU core sync here. Opteron can execute rdtsc speculatively,
-* which means it is not completely exact and may not be monotonous
-* between CPUs. But the errors should be too small to matter for
-* scheduling purposes.
-*/
-
-   rdtscll(a);
-   return cycles_2_ns(a);
-}
-
 static int tsc_unstable;
 
 static inline int check_tsc_unstable(void)
@@ -114,8 +88,6 @@ static int time_cpufreq_notifier(struct 
mark_tsc_unstable();
}
 
-   set_cyc2ns_scale(tsc_khz_ref);
-
return 0;
 }
 
Index: linux/include/asm-x86_64/timer.h
===
--- /dev/null
+++ linux/include/asm-x86_64/timer.h
@@ -0,0 +1 @@
+#define get_scheduled_cycles(x) rdtscll(x)
Index: linux/arch/x86_64/kernel/time.c
===
--- linux.orig/arch/x86_64/kernel/time.c
+++ linux/arch/x86_64/kernel/time.c
@@ -404,7 +404,6 @@ void __init time_init(void)
else
vgetcpu_mode = VGETCPU_LSL;
 
-   set_cyc2ns_scale(tsc_khz);
printk(KERN_INFO "time.c: Detected %d.%03d MHz processor.\n",
cpu_khz / 1000, cpu_khz % 1000);
init_tsc_clocksource();
Index: linux/include/asm-x86_64/timex.h
===
--- linux.orig/include/asm-x86_64/timex.h
+++ linux/include/asm-x86_64/timex.h
@@ -28,5 +28,4 @@ extern int read_current_timer(unsigned l
 #define US_SCALE32 /* 2^32, arbitralrily chosen */
 
 extern void mark_tsc_unstable(void);
-extern void set_cyc2ns_scale(unsigned long khz);
 #endif
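
The x86-64 sched_clock() removed by the patch above turns TSC cycles into
nanoseconds with a precomputed fixed-point scale. The arithmetic can be tried
in user space with the sketch below. The value NS_SCALE = 10 and the function
names are assumptions made for the example; the rdtsc read itself is left out.

```c
#include <stdint.h>

#define NSEC_PER_MSEC 1000000ULL
#define NS_SCALE      10   /* scale factor 2^10, assumed for this sketch */

/* Precompute "nanoseconds per cycle" as a fixed-point number. */
static uint64_t make_cyc2ns_scale(uint64_t khz)
{
	return (NSEC_PER_MSEC << NS_SCALE) / khz;
}

/* Convert a cycle count with one multiply and one shift, no division. */
static uint64_t cycles_to_ns(uint64_t cyc, uint64_t scale)
{
	return (cyc * scale) >> NS_SCALE;
}
```

For a 2 GHz clock (khz = 2000000) the scale is 512, so 2000 cycles come out as
1000 ns. Note that cyc * scale wraps around 64 bits for very large cycle
counts, which this sketch does not guard against.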


[PATCH] [0/48] x86 candidate patches for review III: various stuff

2007-04-29 Thread Andi Kleen

- Rewritten all dancing sched_clock()
- Extended numa emulation support from David Rientjes; now the sizes
of the emulated nodes can be configured on the command line.
- Faster vgettimeofday from Eric Dumazet
- GDT cleanups from Rusty
- cpa() fixes and better kernel protection from Jan Beulich
- PGD handling cleanup from Christoph Lameter
- Various other changes
- Lots of minor cleanups from various people

Please review.

-Andi



[PATCH] [28/48] i386: prevent ACPI quirk warning mass spamming in logs

2007-04-29 Thread Andi Kleen

From: Thierry Vignaud <[EMAIL PROTECTED]>

The following patch prevents this warning from being displayed again and again
(e.g. nine times on my NForce2 motherboard) and thus improves the
signal-to-noise ratio in logs.
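
The technique is the usual function-local static flag. A stand-alone sketch of
the pattern, with a made-up function name and message, looks like this:

```c
#include <stdio.h>

/* Returns 1 if the message was printed, 0 if it was suppressed. */
static int report_quirk_once(const char *board)
{
	static int warned;  /* zero-initialized, keeps its value across calls */

	if (warned)
		return 0;
	warned = 1;
	printf("%s board detected. Ignoring timer override.\n", board);
	return 1;
}
```

In the patch itself the flag is tested before acpi_table_parse(), so the table
lookup and the assignment to acpi_skip_timer_override are also skipped on
repeat calls, not just the printk.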

The ATI quirk below probably needs a similar "fix" but I don't have
the hardware to test.

Btw arch/x86_64/kernel/early-quirks.c::nvidia_bugs() would probably need to
be synced (but I don't have an x86_64 NVidia motherboard to boot test it).
Still, it shows the usefulness of the recent x86 merge thread.

[EMAIL PROTECTED]: cleanup]
Signed-off-by: Thierry Vignaud <[EMAIL PROTECTED]>
Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Cc: Len Brown <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 arch/i386/kernel/acpi/earlyquirk.c |5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

Index: linux/arch/i386/kernel/acpi/earlyquirk.c
===
--- linux.orig/arch/i386/kernel/acpi/earlyquirk.c
+++ linux/arch/i386/kernel/acpi/earlyquirk.c
@@ -21,11 +21,14 @@ static int __init nvidia_hpet_check(stru
 
 static int __init check_bridge(int vendor, int device)
 {
+   static int warned;
 #ifdef CONFIG_ACPI
/* According to Nvidia all timer overrides are bogus unless HPET
   is enabled. */
if (!acpi_use_timer_override && vendor == PCI_VENDOR_ID_NVIDIA) {
-   if (acpi_table_parse(ACPI_SIG_HPET, nvidia_hpet_check)) {
+   if (!warned && acpi_table_parse(ACPI_SIG_HPET,
+   nvidia_hpet_check)) {
+   warned = 1;
acpi_skip_timer_override = 1;
  printk(KERN_INFO "Nvidia board "
"detected. Ignoring ACPI "


[PATCH] [30/48] i386: Use per-cpu variables for GDT, PDA

2007-04-29 Thread Andi Kleen

From: Rusty Russell <[EMAIL PROTECTED]>

Allocating PDA and GDT at boot is a pain.  Using simple per-cpu variables adds
happiness (although we need the GDT page-aligned for Xen, which we do in a
followup patch).
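
As a rough user-space analogy of why this is simpler: with statically sized
per-CPU storage there is nothing to allocate and therefore no failure path.
The sketch below uses a plain two-dimensional array, which is not how kernel
per-cpu variables are implemented; NR_CPUS, GDT_ENTRIES and struct desc are
stand-ins chosen for the example.

```c
#include <string.h>

#define NR_CPUS     4   /* hypothetical CPU count */
#define GDT_ENTRIES 32  /* hypothetical table size */

struct desc {
	unsigned int a, b;
};

/* One table per CPU, sized at build time: no allocation, nothing to free. */
static struct desc cpu_gdt[NR_CPUS][GDT_ENTRIES];

/* Template that every CPU copies; one entry is filled in as sample data. */
static const struct desc boot_gdt[GDT_ENTRIES] = {
	[1] = { 0xffff, 0xcf9a00 },
};

/* Cannot fail, so callers need no error handling. */
static struct desc *init_gdt(int cpu)
{
	memcpy(cpu_gdt[cpu], boot_gdt, sizeof(boot_gdt));
	return cpu_gdt[cpu];
}
```

This mirrors the change in the diff below, where init_gdt() goes from
returning int to returning void once alloc_gdt() is gone.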

[EMAIL PROTECTED]: build fix]
Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>
Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 arch/i386/kernel/cpu/common.c|   94 ---
 arch/i386/kernel/smpboot.c   |   21 ---
 arch/i386/mach-voyager/voyager_smp.c |   10 ---
 include/asm-generic/percpu.h |1 
 include/asm-i386/desc.h  |1 
 include/asm-i386/pda.h   |7 +-
 include/asm-i386/processor.h |2 
 7 files changed, 21 insertions(+), 115 deletions(-)

Index: linux/arch/i386/kernel/cpu/common.c
===
--- linux.orig/arch/i386/kernel/cpu/common.c
+++ linux/arch/i386/kernel/cpu/common.c
@@ -25,8 +25,10 @@
 DEFINE_PER_CPU(struct Xgt_desc_struct, cpu_gdt_descr);
 EXPORT_PER_CPU_SYMBOL(cpu_gdt_descr);
 
-struct i386_pda *_cpu_pda[NR_CPUS] __read_mostly;
-EXPORT_SYMBOL(_cpu_pda);
+DEFINE_PER_CPU(struct desc_struct, cpu_gdt[GDT_ENTRIES]);
+
+DEFINE_PER_CPU(struct i386_pda, _cpu_pda);
+EXPORT_PER_CPU_SYMBOL(_cpu_pda);
 
 static int cachesize_override __cpuinitdata = -1;
 static int disable_x86_fxsr __cpuinitdata;
@@ -609,52 +611,6 @@ struct pt_regs * __devinit idle_regs(str
return regs;
 }
 
-static __cpuinit int alloc_gdt(int cpu)
-{
-   struct Xgt_desc_struct *cpu_gdt_descr = &per_cpu(cpu_gdt_descr, cpu);
-   struct desc_struct *gdt;
-   struct i386_pda *pda;
-
-   gdt = (struct desc_struct *)cpu_gdt_descr->address;
-   pda = cpu_pda(cpu);
-
-   /*
-* This is a horrible hack to allocate the GDT.  The problem
-* is that cpu_init() is called really early for the boot CPU
-* (and hence needs bootmem) but much later for the secondary
-* CPUs, when bootmem will have gone away
-*/
-   if (NODE_DATA(0)->bdata->node_bootmem_map) {
-   BUG_ON(gdt != NULL || pda != NULL);
-
-   gdt = alloc_bootmem_pages(PAGE_SIZE);
-   pda = alloc_bootmem(sizeof(*pda));
-   /* alloc_bootmem(_pages) panics on failure, so no check */
-
-   memset(gdt, 0, PAGE_SIZE);
-   memset(pda, 0, sizeof(*pda));
-   } else {
-   /* GDT and PDA might already have been allocated if
-  this is a CPU hotplug re-insertion. */
-   if (gdt == NULL)
-   gdt = (struct desc_struct *)get_zeroed_page(GFP_KERNEL);
-
-   if (pda == NULL)
-   pda = kmalloc_node(sizeof(*pda), GFP_KERNEL, cpu_to_node(cpu));
-
-   if (unlikely(!gdt || !pda)) {
-   free_pages((unsigned long)gdt, 0);
-   kfree(pda);
-   return 0;
-   }
-   }
-
-   cpu_gdt_descr->address = (unsigned long)gdt;
-   cpu_pda(cpu) = pda;
-
-   return 1;
-}
-
 /* Initial PDA used by boot CPU */
 struct i386_pda boot_pda = {
._pda = &boot_pda,
@@ -670,31 +626,17 @@ static inline void set_kernel_fs(void)
asm volatile ("mov %0, %%fs" : : "r" (__KERNEL_PDA) : "memory");
 }
 
-/* Initialize the CPU's GDT and PDA.  The boot CPU does this for
-   itself, but secondaries find this done for them. */
-__cpuinit int init_gdt(int cpu, struct task_struct *idle)
+/* Initialize the CPU's GDT and PDA.  This is either the boot CPU doing itself
+   (still using cpu_gdt_table), or a CPU doing it for a secondary which
+   will soon come up. */
+__cpuinit void init_gdt(int cpu, struct task_struct *idle)
 {
struct Xgt_desc_struct *cpu_gdt_descr = &per_cpu(cpu_gdt_descr, cpu);
-   struct desc_struct *gdt;
-   struct i386_pda *pda;
-
-   /* For non-boot CPUs, the GDT and PDA should already have been
-  allocated. */
-   if (!alloc_gdt(cpu)) {
-   printk(KERN_CRIT "CPU%d failed to allocate GDT or PDA\n", cpu);
-   return 0;
-   }
-
-   gdt = (struct desc_struct *)cpu_gdt_descr->address;
-   pda = cpu_pda(cpu);
-
-   BUG_ON(gdt == NULL || pda == NULL);
+   struct desc_struct *gdt = per_cpu(cpu_gdt, cpu);
+   struct i386_pda *pda = &per_cpu(_cpu_pda, cpu);
 
-   /*
-* Initialize the per-CPU GDT with the boot GDT,
-* and set up the GDT descriptor:
-*/
memcpy(gdt, cpu_gdt_table, GDT_SIZE);
+   cpu_gdt_descr->address = (unsigned long)gdt;
cpu_gdt_descr->size = GDT_SIZE - 1;
 
pack_descriptor((u32 *)&gdt[GDT_ENTRY_PDA].a,
@@ -706,17 +648,12 @@ __cpuinit int init_gdt(int cpu, struct t
pda->_pda = pda;
pda->cpu_number = cpu;
pda->pcurrent = idle;
-
-   return 1;
 }
 
 void __cpuinit c

[PATCH] [40/48] x86_64: use lru instead of page->index and page->private for pgd lists management.

2007-04-29 Thread Andi Kleen

From: Christoph Lameter <[EMAIL PROTECTED]>

x86_64 currently simulates a list using the index and private fields of the
page struct.  Seems that the code was inherited from i386.  But x86_64 does
not use the slab to allocate pgds and pmds etc.  So the lru field is not
used by the slab and therefore available.

This patch uses standard list operations on page->lru to realize pgd
tracking.
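
For readers who have not used the kernel list API, the idea is an intrusive
doubly linked list: the link node lives inside the tracked object. Below is a
minimal user-space re-implementation for illustration only. The struct page
here is a two-field stand-in, and pgd_list_count() and pgd_list_first() are
helpers invented for the example.

```c
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

#define LIST_HEAD_INIT(name) { &(name), &(name) }

/* Recover the enclosing object from a pointer to its embedded link. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Insert item right after head. */
static void list_add(struct list_head *item, struct list_head *head)
{
	item->next = head->next;
	item->prev = head;
	head->next->prev = item;
	head->next = item;
}

/* Unlink item; no walk is needed because it knows both neighbours. */
static void list_del(struct list_head *item)
{
	item->prev->next = item->next;
	item->next->prev = item->prev;
}

/* Stand-in for the real struct page: only the embedded link matters here. */
struct page {
	unsigned long flags;
	struct list_head lru;
};

static struct list_head pgd_list = LIST_HEAD_INIT(pgd_list);

static int pgd_list_count(void)
{
	int n = 0;

	for (struct list_head *p = pgd_list.next; p != &pgd_list; p = p->next)
		n++;
	return n;
}

static struct page *pgd_list_first(void)
{
	if (pgd_list.next == &pgd_list)
		return NULL;
	return container_of(pgd_list.next, struct page, lru);
}
```

The hand-rolled index/private chaining removed below needed a "next" pointer
plus a pointer to the previous "next" field to get the same O(1) removal.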

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 arch/x86_64/mm/fault.c   |5 ++---
 include/asm-x86_64/pgalloc.h |   14 +++---
 include/asm-x86_64/pgtable.h |2 +-
 3 files changed, 6 insertions(+), 15 deletions(-)

Index: linux/arch/x86_64/mm/fault.c
===
--- linux.orig/arch/x86_64/mm/fault.c
+++ linux/arch/x86_64/mm/fault.c
@@ -585,7 +585,7 @@ do_sigbus:
 }
 
 DEFINE_SPINLOCK(pgd_lock);
-struct page *pgd_list;
+LIST_HEAD(pgd_list);
 
 void vmalloc_sync_all(void)
 {
@@ -605,8 +605,7 @@ void vmalloc_sync_all(void)
if (pgd_none(*pgd_ref))
continue;
spin_lock(&pgd_lock);
-   for (page = pgd_list; page;
-page = (struct page *)page->index) {
+   list_for_each_entry(page, &pgd_list, lru) {
pgd_t *pgd;
pgd = (pgd_t *)page_address(page) + 
pgd_index(address);
if (pgd_none(*pgd))
Index: linux/include/asm-x86_64/pgalloc.h
===
--- linux.orig/include/asm-x86_64/pgalloc.h
+++ linux/include/asm-x86_64/pgalloc.h
@@ -44,24 +44,16 @@ static inline void pgd_list_add(pgd_t *p
struct page *page = virt_to_page(pgd);
 
spin_lock(&pgd_lock);
-   page->index = (pgoff_t)pgd_list;
-   if (pgd_list)
-   pgd_list->private = (unsigned long)&page->index;
-   pgd_list = page;
-   page->private = (unsigned long)&pgd_list;
+   list_add(&page->lru, &pgd_list);
spin_unlock(&pgd_lock);
 }
 
 static inline void pgd_list_del(pgd_t *pgd)
 {
-   struct page *next, **pprev, *page = virt_to_page(pgd);
+   struct page *page = virt_to_page(pgd);
 
spin_lock(&pgd_lock);
-   next = (struct page *)page->index;
-   pprev = (struct page **)page->private;
-   *pprev = next;
-   if (next)
-   next->private = (unsigned long)pprev;
+   list_del(&page->lru);
spin_unlock(&pgd_lock);
 }
 
Index: linux/include/asm-x86_64/pgtable.h
===
--- linux.orig/include/asm-x86_64/pgtable.h
+++ linux/include/asm-x86_64/pgtable.h
@@ -410,7 +410,7 @@ static inline pte_t pte_modify(pte_t pte
 #define __swp_entry_to_pte(x)  ((pte_t) { (x).val })
 
 extern spinlock_t pgd_lock;
-extern struct page *pgd_list;
+extern struct list_head pgd_list;
 void vmalloc_sync_all(void);
 
 extern int kern_addr_valid(unsigned long addr); 


[PATCH] [38/48] x86_64: Remove unused stext symbol

2007-04-29 Thread Andi Kleen

suggested by Jan Beulich

Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>

---
 arch/x86_64/kernel/head.S |1 -
 1 file changed, 1 deletion(-)

Index: linux/arch/x86_64/kernel/head.S
===
--- linux.orig/arch/x86_64/kernel/head.S
+++ linux/arch/x86_64/kernel/head.S
@@ -279,7 +279,6 @@ early_idt_ripmsg:
.asciz "RIP %s\n"
 
 .balign PAGE_SIZE
-ENTRY(stext)
 
 #define NEXT_PAGE(name) \
.balign PAGE_SIZE; \


[PATCH] [39/48] i386: remove the APM_RTC_IS_GMT config option.

2007-04-29 Thread Andi Kleen

From: "Parag Warudkar" <[EMAIL PROTECTED]>

Signed-off-by: Parag Warudkar <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>

---

 arch/i386/Kconfig |   13 -
 1 file changed, 13 deletions(-)

Index: linux/arch/i386/Kconfig
===
--- linux.orig/arch/i386/Kconfig
+++ linux/arch/i386/Kconfig
@@ -1029,19 +1029,6 @@ config APM_DISPLAY_BLANK
  backlight at all, or it might print a lot of errors to the console,
  especially if you are using gpm.
 
-config APM_RTC_IS_GMT
-   bool "RTC stores time in GMT"
-   depends on APM
-   help
- Say Y here if your RTC (Real Time Clock a.k.a. hardware clock)
- stores the time in GMT (Greenwich Mean Time). Say N if your RTC
- stores localtime.
-
- It is in fact recommended to store GMT in your RTC, because then you
- don't have to worry about daylight savings time changes. The only
- reason not to use GMT in your RTC is if you also run a broken OS
- that doesn't understand GMT.
-
 config APM_ALLOW_INTS
bool "Allow interrupts during APM BIOS calls"
depends on APM


[PATCH] [45/48] x86_64: Inhibit machine from asserting an NMI when doing Alt-SysRq-M operation.

2007-04-29 Thread Andi Kleen

From: Konrad Rzeszutek <[EMAIL PROTECTED]>
This patch touches the NMI watchdog every MAX_ORDER_NR_PAGES pages
to inhibit the machine from triggering an NMI while the CPUs
are locked. This situation happens on boxes with more
than 64 CPUs and 128GB of RAM when Alt-SysRq-M is performed.

It has been successfully regression-tested on uni, 2, 4, 8,
32, and 64 CPU boxes with various memory configurations.
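
The pattern is a periodic "still alive" call inside a long loop. A stand-alone
sketch, with a counter standing in for touch_nmi_watchdog() and an assumed
value for MAX_ORDER_NR_PAGES:

```c
#define MAX_ORDER_NR_PAGES 1024UL  /* assumed value for this sketch */

static unsigned long watchdog_touches;

/* Stand-in for touch_nmi_watchdog(): just count the calls. */
static void touch_watchdog(void)
{
	watchdog_touches++;
}

/* Walk nr_pages entries, touching the watchdog once per block of pages. */
static unsigned long scan_pages(unsigned long nr_pages)
{
	unsigned long total = 0;

	for (unsigned long i = 0; i < nr_pages; ++i) {
		if (i % MAX_ORDER_NR_PAGES == 0)
			touch_watchdog();
		total++;
	}
	return total;
}
```

Touching once per block keeps the overhead negligible compared with touching
on every iteration, while still bounding the time between touches.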

Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>

---
 arch/x86_64/mm/init.c |6 ++
 1 file changed, 6 insertions(+)

Index: linux/arch/x86_64/mm/init.c
===
--- linux.orig/arch/x86_64/mm/init.c
+++ linux/arch/x86_64/mm/init.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -73,6 +74,11 @@ void show_mem(void)
 
for_each_online_pgdat(pgdat) {
for (i = 0; i < pgdat->node_spanned_pages; ++i) {
+   /* this loop can take a while with 256 GB and 4k pages
+  so update the NMI watchdog */
+   if (unlikely(i % MAX_ORDER_NR_PAGES == 0)) {
+   touch_nmi_watchdog();
+   }
page = pfn_to_page(pgdat->node_start_pfn + i);
total++;
if (PageReserved(page))


[PATCH] [36/48] i386: get rid of unused variables

2007-04-29 Thread Andi Kleen

From: Parag Warudkar <[EMAIL PROTECTED]>

Signed-off-by: Parag Warudkar <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>

---

 arch/i386/kernel/apm.c |7 ---
 1 file changed, 7 deletions(-)

Index: linux/arch/i386/kernel/apm.c
===
--- linux.orig/arch/i386/kernel/apm.c
+++ linux/arch/i386/kernel/apm.c
@@ -384,13 +384,6 @@ static int ignore_sys_suspend;
 static int ignore_normal_resume;
 static int bounce_interval __read_mostly = 
DEFAULT_BOUNCE_INTERVAL;
 
-#ifdef CONFIG_APM_RTC_IS_GMT
-#  define  clock_cmos_diff 0
-#  define  got_clock_diff  1
-#else
-static longclock_cmos_diff;
-static int got_clock_diff;
-#endif
 static int debug __read_mostly;
 static int smp __read_mostly;
 static int apm_disabled = -1;


[PATCH] [43/48] x86_64: fix vtime() vsyscall

2007-04-29 Thread Andi Kleen

From: Eric Dumazet <[EMAIL PROTECTED]>

There is a tiny probability that the return value from vtime(time_t *t) is
different from the value stored in *t.

Using a temporary variable solves the problem and gives faster code:

   17:   48 85 ff                test   %rdi,%rdi
   1a:   48 8b 05 00 00 00 00    mov    0(%rip),%rax   # __vsyscall_gtod_data.wall_time_tv.tv_sec
   21:   74 03                   je     26
   23:   48 89 07                mov    %rax,(%rdi)
   26:   c9                      leaveq
   27:   c3                      retq

Signed-off-by: Eric Dumazet <[EMAIL PROTECTED]>
Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>
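
Editorial sketch (not part of the patch): the race described above can be
illustrated in plain userspace C. The variable shared_sec is a hypothetical
stand-in for __vsyscall_gtod_data.wall_time_tv.tv_sec, and both function
names are made up for illustration. The point is only that two loads of a
concurrently updated field may disagree, while one load into a local cannot.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for the shared seconds field; volatile models a
 * value that the timer interrupt may change between any two loads. */
static volatile time_t shared_sec;

/* Old shape: two separate loads, so *t and the return value can differ
 * if shared_sec is updated between them. */
static time_t vtime_two_loads(time_t *t)
{
	if (t)
		*t = shared_sec;
	return shared_sec;
}

/* New shape: a single load into a local, reused for the store and the
 * return value, so both are always identical. */
static time_t vtime_one_load(time_t *t)
{
	time_t result = shared_sec;

	if (t)
		*t = result;
	return result;
}
```

With no concurrent writer both variants agree; only the single-load variant
keeps that guarantee when the field changes underneath the caller.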


---
 arch/x86_64/kernel/vsyscall.c |8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

Index: linux/arch/x86_64/kernel/vsyscall.c
===
--- linux.orig/arch/x86_64/kernel/vsyscall.c
+++ linux/arch/x86_64/kernel/vsyscall.c
@@ -156,11 +156,13 @@ int __vsyscall(0) vgettimeofday(struct t
  * unlikely */
 time_t __vsyscall(1) vtime(time_t *t)
 {
+   time_t result;
if (unlikely(!__vsyscall_gtod_data.sysctl_enabled))
return time_syscall(t);
-   else if (t)
-   *t = __vsyscall_gtod_data.wall_time_tv.tv_sec;
-   return __vsyscall_gtod_data.wall_time_tv.tv_sec;
+   result = __vsyscall_gtod_data.wall_time_tv.tv_sec;
+   if (t)
+   *t = result;
+   return result;
 }
 
 /* Fast way to get current CPU and node.


[PATCH] [32/48] i386: clean up cpu_init()

2007-04-29 Thread Andi Kleen

From: Rusty Russell <[EMAIL PROTECTED]>

We now have cpu_init() and secondary_cpu_init() doing nothing but calling
_cpu_init() with the same arguments.  Rename _cpu_init() to cpu_init() and use
it as a replacement for secondary_cpu_init().

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>
Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 arch/i386/kernel/cpu/common.c |   34 +-
 arch/i386/kernel/smpboot.c|8 
 include/asm-i386/processor.h  |2 +-
 3 files changed, 14 insertions(+), 30 deletions(-)

Index: linux/arch/i386/kernel/cpu/common.c
===
--- linux.orig/arch/i386/kernel/cpu/common.c
+++ linux/arch/i386/kernel/cpu/common.c
@@ -644,9 +644,16 @@ struct i386_pda boot_pda = {
.pcurrent = &init_task,
 };
 
-/* Common CPU init for both boot and secondary CPUs */
-static void __cpuinit _cpu_init(int cpu, struct task_struct *curr)
+/*
+ * cpu_init() initializes state that is per-CPU. Some data is already
+ * initialized (naturally) in the bootstrap process, such as the GDT
+ * and IDT. We reload them nevertheless, this function acts as a
+ * 'CPU state barrier', nothing should get across.
+ */
+void __cpuinit cpu_init(void)
 {
+   int cpu = smp_processor_id();
+   struct task_struct *curr = current;
struct tss_struct * t = &per_cpu(init_tss, cpu);
struct thread_struct *thread = &curr->thread;
 
@@ -706,29 +713,6 @@ static void __cpuinit _cpu_init(int cpu,
mxcsr_feature_mask_init();
 }
 
-/* Entrypoint to initialize secondary CPU */
-void __cpuinit secondary_cpu_init(void)
-{
-   int cpu = smp_processor_id();
-   struct task_struct *curr = current;
-
-   _cpu_init(cpu, curr);
-}
-
-/*
- * cpu_init() initializes state that is per-CPU. Some data is already
- * initialized (naturally) in the bootstrap process, such as the GDT
- * and IDT. We reload them nevertheless, this function acts as a
- * 'CPU state barrier', nothing should get across.
- */
-void __cpuinit cpu_init(void)
-{
-   int cpu = smp_processor_id();
-   struct task_struct *curr = current;
-
-   _cpu_init(cpu, curr);
-}
-
 #ifdef CONFIG_HOTPLUG_CPU
 void __cpuinit cpu_uninit(void)
 {
Index: linux/arch/i386/kernel/smpboot.c
===
--- linux.orig/arch/i386/kernel/smpboot.c
+++ linux/arch/i386/kernel/smpboot.c
@@ -378,14 +378,14 @@ set_cpu_sibling_map(int cpu)
 static void __cpuinit start_secondary(void *unused)
 {
/*
-* Don't put *anything* before secondary_cpu_init(), SMP
-* booting is too fragile that we want to limit the
-* things done here to the most necessary things.
+* Don't put *anything* before cpu_init(), SMP booting is too
+* fragile that we want to limit the things done here to the
+* most necessary things.
 */
 #ifdef CONFIG_VMI
vmi_bringup();
 #endif
-   secondary_cpu_init();
+   cpu_init();
preempt_disable();
smp_callin();
while (!cpu_isset(smp_processor_id(), smp_commenced_mask))
Index: linux/include/asm-i386/processor.h
===
--- linux.orig/include/asm-i386/processor.h
+++ linux/include/asm-i386/processor.h
@@ -744,6 +744,6 @@ extern void enable_sep_cpu(void);
 extern int sysenter_setup(void);
 
 extern void cpu_set_gdt(int);
-extern void secondary_cpu_init(void);
+extern void cpu_init(void);
 
 #endif /* __ASM_I386_PROCESSOR_H */


[PATCH] [33/48] i386: Rename boot_gdt_table to boot_gdt

2007-04-29 Thread Andi Kleen

From: Sebastien Dugue <[EMAIL PROTECTED]>

Rename boot_gdt_table to boot_gdt to avoid the duplicate T(able).

Signed-off-by: Sebastien Dugue <[EMAIL PROTECTED]>
Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>
Acked-by: Rusty Russell <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 arch/i386/kernel/head.S   |9 -
 arch/i386/kernel/trampoline.S |   12 ++--
 2 files changed, 10 insertions(+), 11 deletions(-)

Index: linux/arch/i386/kernel/head.S
===
--- linux.orig/arch/i386/kernel/head.S
+++ linux/arch/i386/kernel/head.S
@@ -147,8 +147,7 @@ page_pde_offset = (__PAGE_OFFSET >> 20);
 /*
  * Non-boot CPU entry point; entered from trampoline.S
  * We can't lgdt here, because lgdt itself uses a data segment, but
- * we know the trampoline has already loaded the boot_gdt_table GDT
- * for us.
+ * we know the trampoline has already loaded the boot_gdt for us.
  *
  * If cpu hotplug is not supported then this code can go in init section
  * which will be freed later
@@ -588,7 +587,7 @@ fault_msg:
.word 0 # 32 bit align gdt_desc.address
 boot_gdt_descr:
.word __BOOT_DS+7
-   .long boot_gdt_table - __PAGE_OFFSET
+   .long boot_gdt - __PAGE_OFFSET
 
.word 0 # 32-bit align idt_desc.address
 idt_descr:
@@ -602,11 +601,11 @@ ENTRY(early_gdt_descr)
.long per_cpu__cpu_gdt  /* Overwritten for secondary CPUs */
 
 /*
- * The boot_gdt_table must mirror the equivalent in setup.S and is
+ * The boot_gdt must mirror the equivalent in setup.S and is
  * used only for booting.
  */
.align L1_CACHE_BYTES
-ENTRY(boot_gdt_table)
+ENTRY(boot_gdt)
.fill GDT_ENTRY_BOOT_CS,8,0
.quad 0x00cf9a00/* kernel 4GB code at 0x */
.quad 0x00cf9200/* kernel 4GB data at 0x */
Index: linux/arch/i386/kernel/trampoline.S
===
--- linux.orig/arch/i386/kernel/trampoline.S
+++ linux/arch/i386/kernel/trampoline.S
@@ -29,7 +29,7 @@
  *
  * TYPE  VALUE
  * R_386_32  startup_32_smp
- * R_386_32  boot_gdt_table
+ * R_386_32  boot_gdt
  */
 
 #include 
@@ -62,8 +62,8 @@ r_base = .
 * to 32 bit.
 */
 
-   lidtl   boot_idt - r_base   # load idt with 0, 0
-   lgdtl   boot_gdt - r_base   # load gdt with whatever is appropriate
+   lidtl   boot_idt_descr - r_base # load idt with 0, 0
+   lgdtl   boot_gdt_descr - r_base # load gdt with whatever is appropriate
 
xor %ax, %ax
inc %ax # protected mode (PE) bit
@@ -73,11 +73,11 @@ r_base = .
 
# These need to be in the same 64K segment as the above;
# hence we don't use the boot_gdt_descr defined in head.S
-boot_gdt:
+boot_gdt_descr:
.word   __BOOT_DS + 7   # gdt limit
-   .long   boot_gdt_table-__PAGE_OFFSET# gdt base
+   .long   boot_gdt - __PAGE_OFFSET# gdt base
 
-boot_idt:
+boot_idt_descr:
.word   0   # idt limit = 0
.long   0   # idt base = 0L
 


[PATCH] [31/48] i386: Use per-cpu GDT immediately upon boot

2007-04-29 Thread Andi Kleen

From: Rusty Russell <[EMAIL PROTECTED]>

Now we are no longer dynamically allocating the GDT, we don't need the
"cpu_gdt_table" at all: we can switch straight from "boot_gdt_table" to the
per-cpu GDT.  This means initializing the cpu_gdt array in C.

The boot CPU uses the per-cpu var directly, then in smp_prepare_cpus() it
switches to the per-cpu copy just allocated.  For secondary CPUs, the
early_gdt_descr is set to point directly to their per-cpu copy.

For UP the code is very simple: it keeps using the "per-cpu" GDT as per SMP,
but we never have to move.

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>
Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 arch/i386/kernel/cpu/common.c|   72 +--
 arch/i386/kernel/head.S  |   55 --
 arch/i386/kernel/smpboot.c   |   59 ++--
 arch/i386/mach-voyager/voyager_smp.c |6 --
 include/asm-i386/desc.h  |2 
 include/asm-i386/processor.h |1 
 6 files changed, 75 insertions(+), 120 deletions(-)

Index: linux/arch/i386/kernel/cpu/common.c
===
--- linux.orig/arch/i386/kernel/cpu/common.c
+++ linux/arch/i386/kernel/cpu/common.c
@@ -25,7 +25,33 @@
 DEFINE_PER_CPU(struct Xgt_desc_struct, cpu_gdt_descr);
 EXPORT_PER_CPU_SYMBOL(cpu_gdt_descr);
 
-DEFINE_PER_CPU(struct desc_struct, cpu_gdt[GDT_ENTRIES]);
+DEFINE_PER_CPU(struct desc_struct, cpu_gdt[GDT_ENTRIES]) = {
+   [GDT_ENTRY_KERNEL_CS] = { 0x, 0x00cf9a00 },
+   [GDT_ENTRY_KERNEL_DS] = { 0x, 0x00cf9200 },
+   [GDT_ENTRY_DEFAULT_USER_CS] = { 0x, 0x00cffa00 },
+   [GDT_ENTRY_DEFAULT_USER_DS] = { 0x, 0x00cff200 },
+   /*
+* Segments used for calling PnP BIOS have byte granularity.
+* They code segments and data segments have fixed 64k limits,
+* the transfer segment sizes are set at run time.
+*/
+   [GDT_ENTRY_PNPBIOS_CS32] = { 0x, 0x00409a00 },/* 32-bit code */
+   [GDT_ENTRY_PNPBIOS_CS16] = { 0x, 0x9a00 },/* 16-bit code */
+   [GDT_ENTRY_PNPBIOS_DS] = { 0x, 0x9200 }, /* 16-bit data */
+   [GDT_ENTRY_PNPBIOS_TS1] = { 0x, 0x9200 },/* 16-bit data */
+   [GDT_ENTRY_PNPBIOS_TS2] = { 0x, 0x9200 },/* 16-bit data */
+   /*
+* The APM segments have byte granularity and their bases
+* are set at run time.  All have 64k limits.
+*/
+   [GDT_ENTRY_APMBIOS_BASE] = { 0x, 0x00409a00 },/* 32-bit code */
+   /* 16-bit code */
+   [GDT_ENTRY_APMBIOS_BASE+1] = { 0x, 0x9a00 },
+   [GDT_ENTRY_APMBIOS_BASE+2] = { 0x, 0x00409200 }, /* data */
+
+   [GDT_ENTRY_ESPFIX_SS] = { 0x, 0x00c09200 },
+   [GDT_ENTRY_PDA] = { 0x, 0x00c09200 }, /* set in setup_pda */
+};
 
 DEFINE_PER_CPU(struct i386_pda, _cpu_pda);
 EXPORT_PER_CPU_SYMBOL(_cpu_pda);
@@ -618,46 +644,6 @@ struct i386_pda boot_pda = {
.pcurrent = &init_task,
 };
 
-static inline void set_kernel_fs(void)
-{
-   /* Set %fs for this CPU's PDA.  Memory clobber is to create a
-  barrier with respect to any PDA operations, so the compiler
-  doesn't move any before here. */
-   asm volatile ("mov %0, %%fs" : : "r" (__KERNEL_PDA) : "memory");
-}
-
-/* Initialize the CPU's GDT and PDA.  This is either the boot CPU doing itself
-   (still using cpu_gdt_table), or a CPU doing it for a secondary which
-   will soon come up. */
-__cpuinit void init_gdt(int cpu, struct task_struct *idle)
-{
-   struct Xgt_desc_struct *cpu_gdt_descr = &per_cpu(cpu_gdt_descr, cpu);
-   struct desc_struct *gdt = per_cpu(cpu_gdt, cpu);
-   struct i386_pda *pda = &per_cpu(_cpu_pda, cpu);
-
-   memcpy(gdt, cpu_gdt_table, GDT_SIZE);
-   cpu_gdt_descr->address = (unsigned long)gdt;
-   cpu_gdt_descr->size = GDT_SIZE - 1;
-
-   pack_descriptor((u32 *)&gdt[GDT_ENTRY_PDA].a,
-   (u32 *)&gdt[GDT_ENTRY_PDA].b,
-   (unsigned long)pda, sizeof(*pda) - 1,
-   0x80 | DESCTYPE_S | 0x2, 0); /* present read-write data segment */
-
-   memset(pda, 0, sizeof(*pda));
-   pda->_pda = pda;
-   pda->cpu_number = cpu;
-   pda->pcurrent = idle;
-}
-
-void __cpuinit cpu_set_gdt(int cpu)
-{
-   struct Xgt_desc_struct *cpu_gdt_descr = &per_cpu(cpu_gdt_descr, cpu);
-
-   load_gdt(cpu_gdt_descr);
-   set_kernel_fs();
-}
-
 /* Common CPU init for both boot and secondary CPUs */
 static void __cpuinit _cpu_init(int cpu, struct task_struct *curr)
 {
@@ -740,10 +726,6 @@ void __cpuinit cpu_init(void)
int cpu = smp_processor_id();
struct task_struct *curr = current;
 
-   /* Set up the real GDT and PDA, so we can transition from the
-  boot_gdt_table & b

[PATCH] [48/48] i386: cleanup GDT Access

2007-04-29 Thread Andi Kleen

From: Rusty Russell <[EMAIL PROTECTED]>

Now we have an explicit per-cpu GDT variable, we don't need to keep the
descriptors around to use them to find the GDT: expose cpu_gdt directly.

We could go further and make load_gdt() pack the descriptor for us, or even
assume it means "load the current cpu's GDT" which is what it always does.
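
Editorial sketch (not part of the patch): once the table itself is the
per-cpu variable, the descriptor handed to lgdt can be built on the stack
whenever it is needed, which is the pattern the diff below repeats at each
call site. The names gdt_descr_sketch and make_gdt_descr are hypothetical,
GDT_ENTRIES = 32 is an assumption for the example, and uintptr_t is used so
the sketch also compiles on a 64-bit host (on i386 the address field is
32 bits wide).

```c
#include <stdint.h>
#include <stddef.h>

#define GDT_ENTRIES 32	/* assumed value for this sketch */

/* One 8-byte segment descriptor, as two 32-bit halves. */
struct desc_struct {
	uint32_t a, b;
};

#define GDT_SIZE (GDT_ENTRIES * sizeof(struct desc_struct))

/* Hypothetical equivalent of Xgt_desc_struct: limit followed by base,
 * packed so that no padding separates the two fields. */
struct gdt_descr_sketch {
	uint16_t size;
	uintptr_t address;
} __attribute__((packed));

/* Build the descriptor from the table pointer instead of storing it. */
static struct gdt_descr_sketch make_gdt_descr(const struct desc_struct *gdt)
{
	struct gdt_descr_sketch d;

	d.size = (uint16_t)(GDT_SIZE - 1);	/* limit is size minus one */
	d.address = (uintptr_t)gdt;
	return d;
}
```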

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>
Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 arch/i386/kernel/cpu/common.c |4 +---
 arch/i386/kernel/efi.c|   16 
 arch/i386/kernel/entry.S  |3 +--
 arch/i386/kernel/smpboot.c|   12 ++--
 arch/i386/kernel/traps.c  |4 +---
 include/asm-i386/desc.h   |7 ++-
 6 files changed, 19 insertions(+), 27 deletions(-)

Index: linux/arch/i386/kernel/cpu/common.c
===
--- linux.orig/arch/i386/kernel/cpu/common.c
+++ linux/arch/i386/kernel/cpu/common.c
@@ -22,9 +22,6 @@
 
 #include "cpu.h"
 
-DEFINE_PER_CPU(struct Xgt_desc_struct, cpu_gdt_descr);
-EXPORT_PER_CPU_SYMBOL(cpu_gdt_descr);
-
 DEFINE_PER_CPU(struct desc_struct, cpu_gdt[GDT_ENTRIES]) = {
[GDT_ENTRY_KERNEL_CS] = { 0x, 0x00cf9a00 },
[GDT_ENTRY_KERNEL_DS] = { 0x, 0x00cf9200 },
@@ -52,6 +49,7 @@ DEFINE_PER_CPU(struct desc_struct, cpu_g
[GDT_ENTRY_ESPFIX_SS] = { 0x, 0x00c09200 },
[GDT_ENTRY_PDA] = { 0x, 0x00c09200 }, /* set in setup_pda */
 };
+EXPORT_PER_CPU_SYMBOL_GPL(cpu_gdt);
 
 DEFINE_PER_CPU(struct i386_pda, _cpu_pda);
 EXPORT_PER_CPU_SYMBOL(_cpu_pda);
Index: linux/arch/i386/kernel/efi.c
===
--- linux.orig/arch/i386/kernel/efi.c
+++ linux/arch/i386/kernel/efi.c
@@ -69,13 +69,11 @@ static void efi_call_phys_prelog(void) _
 {
unsigned long cr4;
unsigned long temp;
-   struct Xgt_desc_struct *cpu_gdt_descr;
+   struct Xgt_desc_struct gdt_descr;
 
spin_lock(&efi_rt_lock);
local_irq_save(efi_rt_eflags);
 
-   cpu_gdt_descr = &per_cpu(cpu_gdt_descr, 0);
-
/*
 * If I don't have PSE, I should just duplicate two entries in page
 * directory. If I have PSE, I just need to duplicate one entry in
@@ -105,17 +103,19 @@ static void efi_call_phys_prelog(void) _
 */
local_flush_tlb();
 
-   cpu_gdt_descr->address = __pa(cpu_gdt_descr->address);
-   load_gdt(cpu_gdt_descr);
+   gdt_descr.address = __pa(get_cpu_gdt_table(0));
+   gdt_descr.size = GDT_SIZE - 1;
+   load_gdt(&gdt_descr);
 }
 
 static void efi_call_phys_epilog(void) __releases(efi_rt_lock)
 {
unsigned long cr4;
-   struct Xgt_desc_struct *cpu_gdt_descr = &per_cpu(cpu_gdt_descr, 0);
+   struct Xgt_desc_struct gdt_descr;
 
-   cpu_gdt_descr->address = (unsigned long)__va(cpu_gdt_descr->address);
-   load_gdt(cpu_gdt_descr);
+   gdt_descr.address = (unsigned long)get_cpu_gdt_table(0);
+   gdt_descr.size = GDT_SIZE - 1;
+   load_gdt(&gdt_descr);
 
cr4 = read_cr4();
 
Index: linux/arch/i386/kernel/entry.S
===
--- linux.orig/arch/i386/kernel/entry.S
+++ linux/arch/i386/kernel/entry.S
@@ -561,8 +561,7 @@ END(syscall_badsys)
 #define FIXUP_ESPFIX_STACK \
/* since we are on a wrong stack, we cant make it a C code :( */ \
movl %fs:PDA_cpu, %ebx; \
-   PER_CPU(cpu_gdt_descr, %ebx); \
-   movl GDS_address(%ebx), %ebx; \
+   PER_CPU(cpu_gdt, %ebx); \
GET_DESC_BASE(GDT_ENTRY_ESPFIX_SS, %ebx, %eax, %ax, %al, %ah); \
addl %esp, %eax; \
pushl $__KERNEL_DS; \
Index: linux/arch/i386/kernel/smpboot.c
===
--- linux.orig/arch/i386/kernel/smpboot.c
+++ linux/arch/i386/kernel/smpboot.c
@@ -786,13 +786,9 @@ static inline struct task_struct * alloc
secondary which will soon come up. */
 static __cpuinit void init_gdt(int cpu, struct task_struct *idle)
 {
-   struct Xgt_desc_struct *cpu_gdt_descr = &per_cpu(cpu_gdt_descr, cpu);
-   struct desc_struct *gdt = per_cpu(cpu_gdt, cpu);
+   struct desc_struct *gdt = get_cpu_gdt_table(cpu);
struct i386_pda *pda = &per_cpu(_cpu_pda, cpu);
 
-   cpu_gdt_descr->address = (unsigned long)gdt;
-   cpu_gdt_descr->size = GDT_SIZE - 1;
-
pack_descriptor((u32 *)&gdt[GDT_ENTRY_PDA].a,
(u32 *)&gdt[GDT_ENTRY_PDA].b,
(unsigned long)pda, sizeof(*pda) - 1,
@@ -1187,7 +1183,11 @@ void __init smp_prepare_cpus(unsigned in
  * it's on the real one. */
 static inline void switch_to_new_gdt(void)
 {
-   load_gdt(&per_cpu(cpu_gdt_descr, smp_processor_id()));
+   struct Xgt_desc_struct gdt_descr;
+
+   gdt_descr.address = (long)get_cpu_gdt_table(smp_processor_id());
+   gdt

Re: Probable PCIE prob

2007-04-29 Thread Andi Kleen
Syren Baran <[EMAIL PROTECTED]> writes:

> i got a problem with the combination of an Asrock AM2NF4G-SATA2
> mainboard with a Radeon X1900 (chip 1002,724b) graphics
> card. /i386/pci/mmconfig.c reports a buggy bios (e000 is not
> E820-reserved).

This message is harmless and likely unrelated.

> System crashes only happen when viewing films (neither
> xine nor mplayer run with root privileges) and independent of video
> drivers (framebuffer, vesa and fglrx). Logs dont show any anomalies
> before crashing. Anybody got a clue?

Sounds like some sort of hardware problem. Maybe the power supply
is not up to the task? Or the board could be broken. Also always
worth updating the BIOS.

-Andi


[PATCH] [24/48] i386: Initialize esp0 properly all the time

2007-04-29 Thread Andi Kleen

From: Rusty Russell <[EMAIL PROTECTED]>

Whenever we schedule, __switch_to calls load_esp0 which does:

tss->esp0 = thread->esp0;

This is never initialized for the initial thread (ie "swapper"), so when we're
scheduling that, we end up setting esp0 to 0.  This is fine: the swapper never
leaves ring 0, so this field is never used.

lguest, however, gets upset that we're trying to use an unmapped page as our
kernel stack.  Rather than work around it there, let's initialize it.
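
Editorial sketch (not part of the patch): the initializer added below is
the usual "base plus size" computation for the top of a stack that grows
downwards. The array init_stack here is a plain stand-in for the kernel's
init stack, THREAD_SIZE = 8192 is an assumed value, and initial_esp0() is a
hypothetical helper name.

```c
#include <stdint.h>

#define THREAD_SIZE 8192	/* assumed stack size for this sketch */

/* Stand-in for the initial task's kernel stack. */
static unsigned long init_stack[THREAD_SIZE / sizeof(unsigned long)];

/* Top of the stack: one byte past the end of the array, which is where
 * the stack pointer starts on a stack that grows towards lower addresses. */
static uintptr_t initial_esp0(void)
{
	return sizeof(init_stack) + (uintptr_t)&init_stack;
}
```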

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>
Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 include/asm-i386/processor.h |1 +
 1 file changed, 1 insertion(+)

Index: linux/include/asm-i386/processor.h
===
--- linux.orig/include/asm-i386/processor.h
+++ linux/include/asm-i386/processor.h
@@ -421,6 +421,7 @@ struct thread_struct {
 };
 
 #define INIT_THREAD  { \
+   .esp0 = sizeof(init_stack) + (long)&init_stack, \
.vm86_info = NULL,  \
.sysenter_cs = __KERNEL_CS, \
.io_bitmap_ptr = NULL,  \


[PATCH] [25/48] x86_64: Introduce load_TLS to the "for" loop.

2007-04-29 Thread Andi Kleen

From: Rusty Russell <[EMAIL PROTECTED]>

GCC (4.1 at least) unrolls it anyway, but I can't believe this code
was ever justifiable.  (I've also submitted a patch which cleans up
i386, which is even uglier).
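
Editorial sketch (not part of the patch): the change replaces three
hand-written assignments, guarded by a "#if TLS_SIZE != 24" check, with a
loop bounded by the entry count. The struct thread_sketch and
load_tls_sketch() names are hypothetical, and GDT_ENTRY_TLS_ENTRIES is set
to 3 here to match the three assignments being removed.

```c
#include <stdint.h>

#define GDT_ENTRY_TLS_ENTRIES 3	/* matches the three removed assignments */

/* Hypothetical cut-down thread state holding only the TLS descriptors. */
struct thread_sketch {
	uint64_t tls_array[GDT_ENTRY_TLS_ENTRIES];
};

/* Copy every TLS descriptor into the GDT slots starting at gdt_tls.
 * The loop bound follows the constant, so no size guard is needed. */
static void load_tls_sketch(const struct thread_sketch *t, uint64_t *gdt_tls)
{
	unsigned int i;

	for (i = 0; i < GDT_ENTRY_TLS_ENTRIES; i++)
		gdt_tls[i] = t->tls_array[i];
}
```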

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>
Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 include/asm-x86_64/desc.h |   11 ---
 1 file changed, 4 insertions(+), 7 deletions(-)

Index: linux/include/asm-x86_64/desc.h
===
--- linux.orig/include/asm-x86_64/desc.h
+++ linux/include/asm-x86_64/desc.h
@@ -135,16 +135,13 @@ static inline void set_ldt_desc(unsigned
(info)->useable == 0&& \
(info)->lm  == 0)
 
-#if TLS_SIZE != 24
-# error update this code.
-#endif
-
 static inline void load_TLS(struct thread_struct *t, unsigned int cpu)
 {
+   unsigned int i;
u64 *gdt = (u64 *)(cpu_gdt(cpu) + GDT_ENTRY_TLS_MIN);
-   gdt[0] = t->tls_array[0];
-   gdt[1] = t->tls_array[1];
-   gdt[2] = t->tls_array[2];
+
+   for (i = 0; i < GDT_ENTRY_TLS_ENTRIES; i++)
+   gdt[i] = t->tls_array[i];
 } 
 
 /*


[PATCH] [47/48] x86_64: Fix "Section mismatch" compile warning

2007-04-29 Thread Andi Kleen

From: Bernhard Walle <[EMAIL PROTECTED]>
Fix "Section mismatch" warnings in arch/x86_64/kernel/time.c

Signed-off-by: Bernhard Walle <[EMAIL PROTECTED]>
Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>

---
 arch/x86_64/kernel/time.c |6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

Index: linux/arch/x86_64/kernel/time.c
===
--- linux.orig/arch/x86_64/kernel/time.c
+++ linux/arch/x86_64/kernel/time.c
@@ -328,7 +328,7 @@ static unsigned int __init pit_calibrate
 #define PIT_MODE 0x43
 #define PIT_CH0  0x40
 
-static void __init __pit_init(int val, u8 mode)
+static void __pit_init(int val, u8 mode)
 {
unsigned long flags;
 
@@ -344,12 +344,12 @@ void __init pit_init(void)
__pit_init(LATCH, 0x34); /* binary, mode 2, LSB/MSB, ch 0 */
 }
 
-void __init pit_stop_interrupt(void)
+void pit_stop_interrupt(void)
 {
__pit_init(0, 0x30); /* mode 0 */
 }
 
-void __init stop_timer_interrupt(void)
+void stop_timer_interrupt(void)
 {
char *name;
if (hpet_address) {


[PATCH] [44/48] x86_64: vsyscall_gtod_data diet and vgettimeofday() fix

2007-04-29 Thread Andi Kleen

From: Eric Dumazet <[EMAIL PROTECTED]>

The current vsyscall_gtod_data is large (3 or 4 cache lines dirtied at each
timer interrupt). We can shrink it to exactly 64 bytes (1 cache line on AMD64).

Instead of copying a whole struct clocksource, we copy only the needed fields.

I deleted an unused field: offset_base.

This patch also fixes one oddity in vgettimeofday(): it can return a timeval
with tv_usec = 1000000. Maybe not a bug, but why not do the right thing?
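
Editorial sketch (not part of the patch): the oddity comes from the old
normalization loop comparing with '>' so a tv_usec exactly equal to
USEC_PER_SEC (1000000) is left in place. The new code accumulates
nanoseconds, compares with '>=', and converts to microseconds once at the
end. The struct timeval_sketch and both function names are hypothetical;
only the loop shapes are taken from the diff below.

```c
#include <stdint.h>

#define NSEC_PER_SEC  1000000000ULL
#define NSEC_PER_USEC 1000ULL
#define USEC_PER_SEC  1000000L

/* Hypothetical stand-in for struct timeval. */
struct timeval_sketch {
	long tv_sec;
	long tv_usec;
};

/* Old shape: add the delta in microseconds and normalize with '>',
 * which lets tv_usec == USEC_PER_SEC slip through. */
static void normalize_old(struct timeval_sketch *tv, uint64_t nsec_delta)
{
	tv->tv_usec += (long)(nsec_delta / NSEC_PER_USEC);
	while (tv->tv_usec > USEC_PER_SEC) {
		tv->tv_sec += 1;
		tv->tv_usec -= USEC_PER_SEC;
	}
}

/* New shape: accumulate nanoseconds, normalize with '>=', convert once. */
static void normalize_new(struct timeval_sketch *tv, uint64_t wall_nsec,
			  uint64_t nsec_delta)
{
	uint64_t nsec = wall_nsec + nsec_delta;

	while (nsec >= NSEC_PER_SEC) {
		tv->tv_sec += 1;
		nsec -= NSEC_PER_SEC;
	}
	tv->tv_usec = (long)(nsec / NSEC_PER_USEC);
}
```

Starting from 999999 usec and adding 1000 ns, the old shape leaves
tv_usec at 1000000, while the new shape carries into tv_sec.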

Signed-off-by: Eric Dumazet <[EMAIL PROTECTED]>
Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>



---
 arch/x86_64/kernel/vsyscall.c |   53 --
 1 file changed, 36 insertions(+), 17 deletions(-)

Index: linux/arch/x86_64/kernel/vsyscall.c
===
--- linux.orig/arch/x86_64/kernel/vsyscall.c
+++ linux/arch/x86_64/kernel/vsyscall.c
@@ -51,13 +51,28 @@
  asm("" : "=r" (v) : "0" (x)); \
  ((v - VSYSCALL_FIRST_PAGE) + __pa_symbol(&__vsyscall_0)); })
 
+/*
+ * vsyscall_gtod_data contains data that is :
+ * - readonly from vsyscalls
+ * - writen by timer interrupt or systcl (/proc/sys/kernel/vsyscall64)
+ * Try to keep this structure as small as possible to avoid cache line ping pongs
+ */
 struct vsyscall_gtod_data_t {
-   seqlock_t lock;
-   int sysctl_enabled;
-   struct timeval wall_time_tv;
+   seqlock_t   lock;
+
+   /* open coded 'struct timespec' */
+   time_t  wall_time_sec;
+   u32 wall_time_nsec;
+
+   int sysctl_enabled;
struct timezone sys_tz;
-   cycle_t offset_base;
-   struct clocksource clock;
+   struct { /* extract of a clocksource struct */
+   cycle_t (*vread)(void);
+   cycle_t cycle_last;
+   cycle_t mask;
+   u32 mult;
+   u32 shift;
+   } clock;
 };
 int __vgetcpu_mode __section_vgetcpu_mode;
 
@@ -73,9 +88,13 @@ void update_vsyscall(struct timespec *wa
 
write_seqlock_irqsave(&vsyscall_gtod_data.lock, flags);
/* copy vsyscall data */
-   vsyscall_gtod_data.clock = *clock;
-   vsyscall_gtod_data.wall_time_tv.tv_sec = wall_time->tv_sec;
-   vsyscall_gtod_data.wall_time_tv.tv_usec = wall_time->tv_nsec/1000;
+   vsyscall_gtod_data.clock.vread = clock->vread;
+   vsyscall_gtod_data.clock.cycle_last = clock->cycle_last;
+   vsyscall_gtod_data.clock.mask = clock->mask;
+   vsyscall_gtod_data.clock.mult = clock->mult;
+   vsyscall_gtod_data.clock.shift = clock->shift;
+   vsyscall_gtod_data.wall_time_sec = wall_time->tv_sec;
+   vsyscall_gtod_data.wall_time_nsec = wall_time->tv_nsec;
vsyscall_gtod_data.sys_tz = sys_tz;
write_sequnlock_irqrestore(&vsyscall_gtod_data.lock, flags);
 }
@@ -110,7 +129,8 @@ static __always_inline long time_syscall
 static __always_inline void do_vgettimeofday(struct timeval * tv)
 {
cycle_t now, base, mask, cycle_delta;
-   unsigned long seq, mult, shift, nsec_delta;
+   unsigned seq;
+   unsigned long mult, shift, nsec;
cycle_t (*vread)(void);
do {
seq = read_seqbegin(&__vsyscall_gtod_data.lock);
@@ -126,21 +146,20 @@ static __always_inline void do_vgettimeo
mult = __vsyscall_gtod_data.clock.mult;
shift = __vsyscall_gtod_data.clock.shift;
 
-   *tv = __vsyscall_gtod_data.wall_time_tv;
-
+   tv->tv_sec = __vsyscall_gtod_data.wall_time_sec;
+   nsec = __vsyscall_gtod_data.wall_time_nsec;
} while (read_seqretry(&__vsyscall_gtod_data.lock, seq));
 
/* calculate interval: */
cycle_delta = (now - base) & mask;
/* convert to nsecs: */
-   nsec_delta = (cycle_delta * mult) >> shift;
+   nsec += (cycle_delta * mult) >> shift;
 
-   /* convert to usecs and add to timespec: */
-   tv->tv_usec += nsec_delta / NSEC_PER_USEC;
-   while (tv->tv_usec > USEC_PER_SEC) {
+   while (nsec >= NSEC_PER_SEC) {
tv->tv_sec += 1;
-   tv->tv_usec -= USEC_PER_SEC;
+   nsec -= NSEC_PER_SEC;
}
+   tv->tv_usec = nsec / NSEC_PER_USEC;
 }
 
 int __vsyscall(0) vgettimeofday(struct timeval * tv, struct timezone * tz)
@@ -159,7 +178,7 @@ time_t __vsyscall(1) vtime(time_t *t)
time_t result;
if (unlikely(!__vsyscall_gtod_data.sysctl_enabled))
return time_syscall(t);
-   result = __vsyscall_gtod_data.wall_time_tv.tv_sec;
+   result = __vsyscall_gtod_data.wall_time_sec;
if (t)
*t = result;
return result;


[PATCH] [46/48] x86_64: adjust EDID retrieval

2007-04-29 Thread Andi Kleen

From: "Jan Beulich" <[EMAIL PROTECTED]>

Commit 5e518d7672dea4cd7c60871e40d0490c52f01d13 made the same change to
i386's variant.

With this change, i386's and x86-64's versions are identical, raising
the question whether the x86-64 one should go (just like there's only
one instance of edd.S).

Signed-off-by: Jan Beulich <[EMAIL PROTECTED]>
Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>

---
 arch/x86_64/boot/video.S |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux/arch/x86_64/boot/video.S
===
--- linux.orig/arch/x86_64/boot/video.S
+++ linux/arch/x86_64/boot/video.S
@@ -1977,7 +1977,7 @@ store_edid:
movw$0x4f15, %ax# do VBE/DDC
movw$0x01, %bx
movw$0x00, %cx
-   movw$0x01, %dx
+   movw$0x00, %dx
movw$0x140, %di
int $0x10
 


Re: [patch] CFS scheduler, -v6

2007-04-29 Thread Con Kolivas
On Sunday 29 April 2007 20:30, Thomas Gleixner wrote:
> As a sidenote: I really wonder if anybody noticed yet, that the whole
> CFS / SD comparison is so ridiculous, that it is not even funny anymore.
> CFS modifies the scheduler and nothing else, SD fiddles all over the
> kernel in interesting ways.

This is a WTF if ever I saw one.

-- 
-ck


[PATCH] [27/48] i386: Allow i386 crash kernels to handle x86_64 dumps

2007-04-29 Thread Andi Kleen

From: Ian Campbell <[EMAIL PROTECTED]>

The specific case I am encountering is kdump under Xen with a 64 bit
hypervisor and 32 bit kernel/userspace.  The dump created is 64 bit due to
the hypervisor but the dump kernel is 32 bit for maximum compatibility.

It's possibly less likely to be useful in a purely native scenario but I
see no reason to disallow it.
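
Editorial sketch (not part of the patch): the mechanism is an overridable
macro. Generic code supplies a default that rejects every foreign machine
type, and an architecture header may define its own version first. The
struct ehdr_sketch type is hypothetical and elf_check_arch() is reduced to
a single comparison; the EM_386 and EM_X86_64 values are the ones from the
ELF specification.

```c
#include <stdint.h>

#define EM_386     3	/* ELF machine number for i386 */
#define EM_X86_64 62	/* ELF machine number for x86-64 */

/* Hypothetical cut-down ELF header with only the field tested here. */
struct ehdr_sketch {
	uint16_t e_machine;
};

/* Simplified native check, standing in for the real elf_check_arch(). */
#define elf_check_arch(x) ((x)->e_machine == EM_386)

/* Architecture header: i386 also accepts dumps from a 64-bit kernel. */
#define vmcore_elf_check_arch_cross(x) ((x)->e_machine == EM_X86_64)

/* Generic header: default to "no cross types" unless the arch said more. */
#ifndef vmcore_elf_check_arch_cross
#define vmcore_elf_check_arch_cross(x) 0
#endif

#define vmcore_elf_check_arch(x) \
	(elf_check_arch(x) || vmcore_elf_check_arch_cross(x))
```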

[EMAIL PROTECTED]: build fix]
Signed-off-by: Ian Campbell <[EMAIL PROTECTED]>
Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>
Acked-by: Vivek Goyal <[EMAIL PROTECTED]>
Cc: Horms <[EMAIL PROTECTED]>
Cc: Magnus Damm <[EMAIL PROTECTED]>
Cc: "Eric W. Biederman" <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 fs/proc/vmcore.c   |2 +-
 include/asm-i386/kexec.h   |3 +++
 include/linux/crash_dump.h |8 
 3 files changed, 12 insertions(+), 1 deletion(-)

Index: linux/fs/proc/vmcore.c
===
--- linux.orig/fs/proc/vmcore.c
+++ linux/fs/proc/vmcore.c
@@ -514,7 +514,7 @@ static int __init parse_crash_elf64_head
/* Do some basic Verification. */
if (memcmp(ehdr.e_ident, ELFMAG, SELFMAG) != 0 ||
(ehdr.e_type != ET_CORE) ||
-   !elf_check_arch(&ehdr) ||
+   !vmcore_elf_check_arch(&ehdr) ||
ehdr.e_ident[EI_CLASS] != ELFCLASS64 ||
ehdr.e_ident[EI_VERSION] != EV_CURRENT ||
ehdr.e_version != EV_CURRENT ||
Index: linux/include/asm-i386/kexec.h
===
--- linux.orig/include/asm-i386/kexec.h
+++ linux/include/asm-i386/kexec.h
@@ -42,6 +42,9 @@
 /* The native architecture */
 #define KEXEC_ARCH KEXEC_ARCH_386
 
+/* We can also handle crash dumps from 64 bit kernel. */
+#define vmcore_elf_check_arch_cross(x) ((x)->e_machine == EM_X86_64)
+
 #define MAX_NOTE_BYTES 1024
 
 /* CPU does not save ss and esp on stack if execution is already
Index: linux/include/linux/crash_dump.h
===
--- linux.orig/include/linux/crash_dump.h
+++ linux/include/linux/crash_dump.h
@@ -14,5 +14,13 @@ extern ssize_t copy_oldmem_page(unsigned
 extern const struct file_operations proc_vmcore_operations;
 extern struct proc_dir_entry *proc_vmcore;
 
+/* Architecture code defines this if there are other possible ELF
+ * machine types, e.g. on bi-arch capable hardware. */
+#ifndef vmcore_elf_check_arch_cross
+#define vmcore_elf_check_arch_cross(x) 0
+#endif
+
+#define vmcore_elf_check_arch(x) (elf_check_arch(x) || vmcore_elf_check_arch_cross(x))
+
 #endif /* CONFIG_CRASH_DUMP */
 #endif /* LINUX_CRASHDUMP_H */


[PATCH] [29/48] x86: add command line length to boot protocol

2007-04-29 Thread Andi Kleen

From: Bernhard Walle <[EMAIL PROTECTED]>

Now that the command line grows to 2048 characters after 2.6.21, boot
loaders and userspace tools have no way to determine the length of the
command line the kernel can understand.  Knowing the length lets them warn
users when the command line is too long, which avoids surprises when things
don't work after bootup.

This patch updates the boot protocol to contain a field called
"cmdline_size" that contains the length of the command line (excluding the
terminating zero).

The patch also adds missing fields (of protocol version 2.05) to the x86_64
setup code.
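
Editorial sketch (not part of the patch): a boot loader could use the new
field roughly as follows. The 0x238 offset and the 255-character fallback
come from the boot.txt changes below; the 0x206 offset of the version field
and its (major << 8) + minor encoding come from the existing boot protocol
document. The function names are hypothetical, and the sketch assumes the
caller has already verified the "HdrS" signature and that the buffer covers
the header.

```c
#include <stdint.h>
#include <stddef.h>

#define HDR_VERSION_OFFSET      0x206	/* 2 bytes, boot protocol version */
#define HDR_CMDLINE_SIZE_OFFSET 0x238	/* 4 bytes, protocol 2.06+ */
#define LEGACY_CMDLINE_MAX      255	/* limit before protocol 2.06 */

/* Read an nbytes-wide little-endian value (nbytes <= 4). */
static uint32_t read_le(const uint8_t *p, size_t nbytes)
{
	uint32_t v = 0;
	size_t i;

	for (i = 0; i < nbytes; i++)
		v |= (uint32_t)p[i] << (8 * i);
	return v;
}

/* Maximum command line length, excluding the terminating zero. */
static uint32_t max_cmdline_len(const uint8_t *setup)
{
	uint32_t version = read_le(setup + HDR_VERSION_OFFSET, 2);

	if (version >= 0x0206)
		return read_le(setup + HDR_CMDLINE_SIZE_OFFSET, 4);
	return LEGACY_CMDLINE_MAX;
}
```

A loader would compare the length of the string it is about to pass against
this value and warn the user instead of letting the kernel truncate silently.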

Signed-off-by: Bernhard Walle <[EMAIL PROTECTED]>
Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>
Cc: Alon Bar-Lev <[EMAIL PROTECTED]>
Acked-by: H. Peter Anvin <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 Documentation/i386/boot.txt |   23 +--
 arch/i386/boot/setup.S  |7 ++-
 arch/x86_64/boot/setup.S|7 ++-
 3 files changed, 29 insertions(+), 8 deletions(-)

Index: linux/Documentation/i386/boot.txt
===
--- linux.orig/Documentation/i386/boot.txt
+++ linux/Documentation/i386/boot.txt
@@ -2,7 +2,7 @@
 
 
H. Peter Anvin <[EMAIL PROTECTED]>
-   Last update 2007-01-26
+   Last update 2007-03-06
 
 On the i386 platform, the Linux kernel uses a rather complicated boot
 convention.  This has evolved partially due to historical aspects, as
@@ -35,9 +35,13 @@ Protocol 2.03:   (Kernel 2.4.18-pre1) Expl
initrd address available to the bootloader.
 
 Protocol 2.04: (Kernel 2.6.14) Extend the syssize field to four bytes.
+
 Protocol 2.05: (Kernel 2.6.20) Make protected mode kernel relocatable.
Introduce relocatable_kernel and kernel_alignment fields.
 
+Protocol 2.06: (Kernel 2.6.22) Added a field that contains the size of
+   the boot command line
+
 
  MEMORY LAYOUT
 
@@ -133,6 +137,8 @@ Offset  Proto   Name            Meaning
 022C/4 2.03+   initrd_addr_max Highest legal initrd address
 0230/4 2.05+   kernel_alignment Physical addr alignment required for kernel
 0234/1 2.05+   relocatable_kernel Whether kernel is relocatable or not
+0235/3 N/A     pad2            Unused
+0238/4 2.06+   cmdline_size    Maximum size of the kernel command line
 
 (1) For backwards compatibility, if the setup_sects field contains 0, the
 real value is 4.
@@ -233,6 +239,12 @@ filled out, however:
if your ramdisk is exactly 131072 bytes long and this field is
0x37FF, you can start your ramdisk at 0x37FE.)
 
+  cmdline_size:
+   The maximum size of the command line without the terminating
+   zero. This means that the command line can contain at most
+   cmdline_size characters. With protocol version 2.05 and
+   earlier, the maximum size was 255.
+
 
  THE KERNEL COMMAND LINE
 
@@ -241,11 +253,10 @@ loader to communicate with the kernel.  
 relevant to the boot loader itself, see "special command line options"
 below.
 
-The kernel command line is a null-terminated string currently up to
-255 characters long, plus the final null.  A string that is too long
-will be automatically truncated by the kernel, a boot loader may allow
-a longer command line to be passed to permit future kernels to extend
-this limit.
+The kernel command line is a null-terminated string. The maximum
+length can be retrieved from the field cmdline_size.  Before protocol
+version 2.06, the maximum was 255 characters.  A string that is too
+long will be automatically truncated by the kernel.
 
 If the boot protocol version is 2.02 or later, the address of the
 kernel command line is given by the header field cmd_line_ptr (see
Index: linux/arch/i386/boot/setup.S
===
--- linux.orig/arch/i386/boot/setup.S
+++ linux/arch/i386/boot/setup.S
@@ -52,6 +52,7 @@
 #include 
 #include 
 #include 
+#include 

 /* Signature words to ensure LILO loaded us right */
 #define SIG1   0xAA55
@@ -81,7 +82,7 @@ start:
 # This is the setup header, and it must start at %cs:2 (old 0x9020:2)
 
.ascii  "HdrS"  # header signature
-   .word   0x0205  # header version number (>= 0x0105)
+   .word   0x0206  # header version number (>= 0x0105)
# or else old loadlin-1.5 will fail)
 realmode_swtch:.word   0, 0# default_switch, SETUPSEG
 start_sys_seg: .word   SYSSEG
@@ -171,6 +172,10 @@ relocatable_kernel:.byte 0
 pad2:  .byte 0
 pad3:  .word 0
 
+cmdline_size:   .long   COMMAND_LINE_SIZE-1 #length of the command line,
+   

[PATCH] [1/48] x86_64: fix x86_64-mm-sched-clock-share

2007-04-29 Thread Andi Kleen

From: Andrew Morton <[EMAIL PROTECTED]>

Fix for the following patch. Provide dummy cpufreq functions when
CPUFREQ is not compiled in. 
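
The stub pattern this fix relies on looks roughly like the sketch below. All
names here (DEMO_HAVE_FEATURE, demo_register_notifier, demo_get_khz) are
invented for illustration and are not the cpufreq API; the point is only that
callers compile unchanged whether or not the feature is configured in.

```c
#include <assert.h>
#include <stddef.h>

struct notifier { int (*call)(unsigned long event); };

#ifdef DEMO_HAVE_FEATURE
/* Real implementations live in the feature's own source file. */
int demo_register_notifier(struct notifier *nb);
unsigned int demo_get_khz(unsigned int cpu);
#else
/* Feature compiled out: cheap inline stubs keep callers building. */
static inline int demo_register_notifier(struct notifier *nb)
{
    (void)nb;
    return 0;   /* report success; nothing will ever be notified */
}
static inline unsigned int demo_get_khz(unsigned int cpu)
{
    (void)cpu;
    return 0;   /* zero means "frequency unknown" */
}
#endif

/* The caller needs no #ifdef of its own. */
static int demo_init(struct notifier *nb)
{
    assert(nb != NULL);
    return demo_register_notifier(nb);
}
```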

Cc: Andi Kleen <[EMAIL PROTECTED]>
Cc: Dave Jones <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>

---

 include/linux/cpufreq.h |   19 ---
 1 file changed, 16 insertions(+), 3 deletions(-)

Index: linux/include/linux/cpufreq.h
===
--- linux.orig/include/linux/cpufreq.h
+++ linux/include/linux/cpufreq.h
@@ -32,7 +32,15 @@
  * CPUFREQ NOTIFIER INTERFACE*
  */
 
+#ifdef CONFIG_CPU_FREQ
 int cpufreq_register_notifier(struct notifier_block *nb, unsigned int list);
+#else
+static inline int cpufreq_register_notifier(struct notifier_block *nb,
+   unsigned int list)
+{
+   return 0;
+}
+#endif
 int cpufreq_unregister_notifier(struct notifier_block *nb, unsigned int list);
 
 #define CPUFREQ_TRANSITION_NOTIFIER(0)
@@ -261,17 +269,22 @@ int cpufreq_set_policy(struct cpufreq_po
 int cpufreq_get_policy(struct cpufreq_policy *policy, unsigned int cpu);
 int cpufreq_update_policy(unsigned int cpu);
 
-/* query the current CPU frequency (in kHz). If zero, cpufreq couldn't detect it */
-unsigned int cpufreq_get(unsigned int cpu);
 
-/* query the last known CPU freq (in kHz). If zero, cpufreq couldn't detect it */
+/*
+ * query the last known CPU freq (in kHz). If zero, cpufreq couldn't detect it
+ */
 #ifdef CONFIG_CPU_FREQ
 unsigned int cpufreq_quick_get(unsigned int cpu);
+unsigned int cpufreq_get(unsigned int cpu);
 #else
 static inline unsigned int cpufreq_quick_get(unsigned int cpu)
 {
return 0;
 }
+static inline unsigned int cpufreq_get(unsigned int cpu)
+{
+   return 0;
+}
 #endif
 
 


[PATCH] [37/48] i386: ignore vgacon if hardware not present

2007-04-29 Thread Andi Kleen

From: Rusty Russell <[EMAIL PROTECTED]>
On Thu, 2007-03-29 at 12:36 +0200, Andi Kleen wrote:
> On Thu, Mar 29, 2007 at 05:46:48PM +1000, Rusty Russell wrote:
> > (Did this fall through the cracks?  I don't see it in -mm.  It's
> > standalone, and saves some silly code in lguest and presumably others).
> 
> Normally it should go to some console maintainer? 

Hmm, but who?

> Ok I can add it.

Thanks.  While you're in a patch-applying mood, how about this?

Cheers,
Rusty.
==
Use X86_EFLAGS_IF in irqflags.h.

Move X86_EFLAGS_IF et al out to a new header: processor-flags.h, so we
can include it from irqflags.h and use it in raw_irqs_disabled_flags().

As a side-effect, we could now use these flags in .S files.
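
For reference, the named constant and the old magic number select the same
bit. A minimal user-space sketch (the function names are shortened stand-ins,
not the kernel's):

```c
#include <assert.h>

#define X86_EFLAGS_IF 0x00000200UL /* Interrupt Flag, bit 9 of EFLAGS */

/* The named mask and the old (1 << 9) expression are the same value. */
static inline void check_if_bit(void)
{
    assert(X86_EFLAGS_IF == (1UL << 9));
}

/* Interrupts are disabled exactly when the IF bit is clear. */
static inline int irqs_disabled_flags(unsigned long flags)
{
    return !(flags & X86_EFLAGS_IF);
}
```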

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>
Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>

---
 include/asm-i386/irqflags.h|3 ++-
 include/asm-i386/processor-flags.h |   26 ++
 include/asm-i386/processor.h   |   22 +-
 3 files changed, 29 insertions(+), 22 deletions(-)

===
Index: linux/include/asm-i386/processor-flags.h
===
--- /dev/null
+++ linux/include/asm-i386/processor-flags.h
@@ -0,0 +1,26 @@
+#ifndef __ASM_I386_PROCESSOR_FLAGS_H
+#define __ASM_I386_PROCESSOR_FLAGS_H
+/* Various flags defined: can be included from assembler. */
+
+/*
+ * EFLAGS bits
+ */
+#define X86_EFLAGS_CF   0x00000001 /* Carry Flag */
+#define X86_EFLAGS_PF   0x00000004 /* Parity Flag */
+#define X86_EFLAGS_AF   0x00000010 /* Auxillary carry Flag */
+#define X86_EFLAGS_ZF   0x00000040 /* Zero Flag */
+#define X86_EFLAGS_SF   0x00000080 /* Sign Flag */
+#define X86_EFLAGS_TF   0x00000100 /* Trap Flag */
+#define X86_EFLAGS_IF   0x00000200 /* Interrupt Flag */
+#define X86_EFLAGS_DF   0x00000400 /* Direction Flag */
+#define X86_EFLAGS_OF   0x00000800 /* Overflow Flag */
+#define X86_EFLAGS_IOPL 0x00003000 /* IOPL mask */
+#define X86_EFLAGS_NT   0x00004000 /* Nested Task */
+#define X86_EFLAGS_RF   0x00010000 /* Resume Flag */
+#define X86_EFLAGS_VM   0x00020000 /* Virtual Mode */
+#define X86_EFLAGS_AC   0x00040000 /* Alignment Check */
+#define X86_EFLAGS_VIF  0x00080000 /* Virtual Interrupt Flag */
+#define X86_EFLAGS_VIP  0x00100000 /* Virtual Interrupt Pending */
+#define X86_EFLAGS_ID   0x00200000 /* CPUID detection flag */
+
+#endif /* __ASM_I386_PROCESSOR_FLAGS_H */
Index: linux/include/asm-i386/irqflags.h
===
--- linux.orig/include/asm-i386/irqflags.h
+++ linux/include/asm-i386/irqflags.h
@@ -9,6 +9,7 @@
  */
 #ifndef _ASM_IRQFLAGS_H
 #define _ASM_IRQFLAGS_H
+#include 
 
 #ifndef __ASSEMBLY__
 static inline unsigned long native_save_fl(void)
@@ -119,7 +120,7 @@ static inline unsigned long __raw_local_
 
 static inline int raw_irqs_disabled_flags(unsigned long flags)
 {
-   return !(flags & (1 << 9));
+   return !(flags & X86_EFLAGS_IF);
 }
 
 static inline int raw_irqs_disabled(void)
Index: linux/include/asm-i386/processor.h
===
--- linux.orig/include/asm-i386/processor.h
+++ linux/include/asm-i386/processor.h
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /* flag for disabling the tsc */
 extern int tsc_disable;
@@ -126,27 +127,6 @@ extern void detect_ht(struct cpuinfo_x86
 static inline void detect_ht(struct cpuinfo_x86 *c) {}
 #endif
 
-/*
- * EFLAGS bits
- */
-#define X86_EFLAGS_CF   0x00000001 /* Carry Flag */
-#define X86_EFLAGS_PF   0x00000004 /* Parity Flag */
-#define X86_EFLAGS_AF   0x00000010 /* Auxillary carry Flag */
-#define X86_EFLAGS_ZF   0x00000040 /* Zero Flag */
-#define X86_EFLAGS_SF   0x00000080 /* Sign Flag */
-#define X86_EFLAGS_TF   0x00000100 /* Trap Flag */
-#define X86_EFLAGS_IF   0x00000200 /* Interrupt Flag */
-#define X86_EFLAGS_DF   0x00000400 /* Direction Flag */
-#define X86_EFLAGS_OF   0x00000800 /* Overflow Flag */
-#define X86_EFLAGS_IOPL 0x00003000 /* IOPL mask */
-#define X86_EFLAGS_NT   0x00004000 /* Nested Task */
-#define X86_EFLAGS_RF   0x00010000 /* Resume Flag */
-#define X86_EFLAGS_VM   0x00020000 /* Virtual Mode */
-#define X86_EFLAGS_AC   0x00040000 /* Alignment Check */
-#define X86_EFLAGS_VIF  0x00080000 /* Virtual Interrupt Flag */
-#define X86_EFLAGS_VIP  0x00100000 /* Virtual Interrupt Pending */
-#define X86_EFLAGS_ID   0x00200000 /* CPUID detection flag */
-
 static inline void native_cpuid(unsigned int *eax, unsigned int *ebx,
 unsigned int *ecx, unsigned int *edx)
 {


[PATCH] [26/48] x86_64: Clarify CONFIG_REORDER explanation

2007-04-29 Thread Andi Kleen

From: Rusty Russell <[EMAIL PROTECTED]>

if (1 && X) => if (X).

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>
Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 arch/x86_64/Kconfig |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux/arch/x86_64/Kconfig
===
--- linux.orig/arch/x86_64/Kconfig
+++ linux/arch/x86_64/Kconfig
@@ -665,8 +665,8 @@ config REORDER
default n
help
  This option enables the toolchain to reorder functions for a more 
- optimal TLB usage. If you have pretty much any version of binutils, 
-this can increase your kernel build time by roughly one minute.
+ optimal TLB usage.  This will slow your kernel build by
+roughly one minute.
 
 config K8_NB
def_bool y


[PATCH] [4/48] x86_64: Don't disable basic block reordering

2007-04-29 Thread Andi Kleen

When compiling with -Os (which is the default), the compiler already leaves
basic block reordering off anyway. With -O2, allowing it probably generates
somewhat better (although also larger) code. 

Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>

---
 arch/x86_64/Makefile |3 ---
 1 file changed, 3 deletions(-)

Index: linux/arch/x86_64/Makefile
===
--- linux.orig/arch/x86_64/Makefile
+++ linux/arch/x86_64/Makefile
@@ -41,9 +41,6 @@ cflags-y += -mno-red-zone
 cflags-y += -mcmodel=kernel
 cflags-y += -pipe
 cflags-kernel-$(CONFIG_REORDER) += -ffunction-sections
-# this makes reading assembly source easier, but produces worse code
-# actually it makes the kernel smaller too.
-cflags-y += -fno-reorder-blocks
 cflags-y += -Wno-sign-compare
 cflags-y += -fno-asynchronous-unwind-tables
 ifneq ($(CONFIG_DEBUG_INFO),y)


[PATCH] [10/48] i386: type cast clean up for find_next_zero_bit

2007-04-29 Thread Andi Kleen

From: "Ken Chen" <[EMAIL PROTECTED]>

Clean up an unneeded type cast by properly declaring the data type.
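
The idea can be shown outside the kernel with a deliberately simple
bit-at-a-time search. This is a sketch for illustration only (the kernel
routine works a word at a time and its name and signature differ): once the
cursor is declared as a pointer to const, pointer arithmetic on `addr` needs
no cast at all.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Return the index of the first zero bit at or after `offset`,
 * or `size` if every remaining bit is set. */
static size_t demo_next_zero_bit(const unsigned long *addr, size_t size,
                                 size_t offset)
{
    assert(addr != NULL);
    for (size_t i = offset; i < size; i++) {
        /* Arithmetic on a const pointer: no cast required. */
        const unsigned long *p = addr + i / BITS_PER_WORD;
        if (!(*p & (1UL << (i % BITS_PER_WORD))))
            return i;
    }
    return size;
}
```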

Signed-off-by: Ken Chen <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>

---

 arch/i386/lib/bitops.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux/arch/i386/lib/bitops.c
===
--- linux.orig/arch/i386/lib/bitops.c
+++ linux/arch/i386/lib/bitops.c
@@ -43,7 +43,7 @@ EXPORT_SYMBOL(find_next_bit);
  */
 int find_next_zero_bit(const unsigned long *addr, int size, int offset)
 {
-   unsigned long * p = ((unsigned long *) addr) + (offset >> 5);
+   const unsigned long *p = addr + (offset >> 5);
int set = 0, bit = offset & 31, res;
 
if (bit) {
@@ -64,7 +64,7 @@ int find_next_zero_bit(const unsigned lo
/*
 * No zero yet, search remaining full bytes for a zero
 */
-   res = find_first_zero_bit (p, size - 32 * (p - (unsigned long *) addr));
+   res = find_first_zero_bit(p, size - 32 * (p - addr));
return (offset + set + res);
 }
 EXPORT_SYMBOL(find_next_zero_bit);


[PATCH] [9/48] i386: make struct vmi_ops static

2007-04-29 Thread Andi Kleen

From: Adrian Bunk <[EMAIL PROTECTED]>

Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>
Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Cc: Zachary Amsden <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 arch/i386/kernel/vmi.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux/arch/i386/kernel/vmi.c
===
--- linux.orig/arch/i386/kernel/vmi.c
+++ linux/arch/i386/kernel/vmi.c
@@ -56,7 +56,7 @@ static int disable_noidle;
 static int disable_vmi_timer;
 
 /* Cached VMI operations */
-struct {
+static struct {
void (*cpuid)(void /* non-c */);
void (*_set_ldt)(u32 selector);
void (*set_tr)(u32 selector);


[PATCH] [8/48] i386: modpost apic related warning fixes

2007-04-29 Thread Andi Kleen

From: Vivek Goyal <[EMAIL PROTECTED]>

o Modpost generates warnings for i386 if compiled with CONFIG_RELOCATABLE=y

WARNING: vmlinux - Section mismatch: reference to .init.text:find_unisys_acpi_oem_table from .text between 'acpi_madt_oem_check' (at offset 0xc0101eda) and 'enable_apic_mode'
WARNING: vmlinux - Section mismatch: reference to .init.text:acpi_get_table_header_early from .text between 'acpi_madt_oem_check' (at offset 0xc0101ef0) and 'enable_apic_mode'
WARNING: vmlinux - Section mismatch: reference to .init.text:parse_unisys_oem from .text between 'acpi_madt_oem_check' (at offset 0xc0101f2e) and 'enable_apic_mode'
WARNING: vmlinux - Section mismatch: reference to .init.text:setup_unisys from .text between 'acpi_madt_oem_check' (at offset 0xc0101f37) and 'enable_apic_mode'
WARNING: vmlinux - Section mismatch: reference to .init.text:parse_unisys_oem from .text between 'mps_oem_check' (at offset 0xc0101ec7) and 'acpi_madt_oem_check'
WARNING: vmlinux - Section mismatch: reference to .init.text:es7000_sw_apic from .text between 'enable_apic_mode' (at offset 0xc0101f48) and 'check_apicid_present'

o Some functions which are declared inline (acpi_madt_oem_check) are not
  inlined by the compiler because they are accessed through a function
  pointer. These functions are put in the .text section and they in turn
  call __init functions, hence modpost generates warnings.

o Do not inline acpi_madt_oem_check; instead make it __init.

Signed-off-by: Vivek Goyal <[EMAIL PROTECTED]>
Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Cc: Len Brown <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 arch/i386/mach-generic/es7000.c |   41 
 include/asm-i386/mach-es7000/mach_apic.h|7 
 include/asm-i386/mach-es7000/mach_mpparse.h |   32 -
 scripts/mod/modpost.c   |1 
 4 files changed, 42 insertions(+), 39 deletions(-)

Index: linux/arch/i386/mach-generic/es7000.c
===
--- linux.orig/arch/i386/mach-generic/es7000.c
+++ linux/arch/i386/mach-generic/es7000.c
@@ -25,4 +25,45 @@ static int probe_es7000(void)
return 0;
 }
 
+extern void es7000_sw_apic(void);
+static void __init enable_apic_mode(void)
+{
+   es7000_sw_apic();
+   return;
+}
+
+static __init int mps_oem_check(struct mp_config_table *mpc, char *oem,
+   char *productid)
+{
+   if (mpc->mpc_oemptr) {
+   struct mp_config_oemtable *oem_table =
+   (struct mp_config_oemtable *)mpc->mpc_oemptr;
+   if (!strncmp(oem, "UNISYS", 6))
+   return parse_unisys_oem((char *)oem_table);
+   }
+   return 0;
+}
+
+#ifdef CONFIG_ACPI
+/* Hook from generic ACPI tables.c */
+static int __init acpi_madt_oem_check(char *oem_id, char *oem_table_id)
+{
+   unsigned long oem_addr;
+   if (!find_unisys_acpi_oem_table(&oem_addr)) {
+   if (es7000_check_dsdt())
+   return parse_unisys_oem((char *)oem_addr);
+   else {
+   setup_unisys();
+   return 1;
+   }
+   }
+   return 0;
+}
+#else
+static int __init acpi_madt_oem_check(char *oem_id, char *oem_table_id)
+{
+   return 0;
+}
+#endif
+
 struct genapic apic_es7000 = APIC_INIT("es7000", probe_es7000);
Index: linux/include/asm-i386/mach-es7000/mach_apic.h
===
--- linux.orig/include/asm-i386/mach-es7000/mach_apic.h
+++ linux/include/asm-i386/mach-es7000/mach_apic.h
@@ -73,13 +73,6 @@ static inline void init_apic_ldr(void)
apic_write_around(APIC_LDR, val);
 }
 
-extern void es7000_sw_apic(void);
-static inline void enable_apic_mode(void)
-{
-   es7000_sw_apic();
-   return;
-}
-
 extern int apic_version [MAX_APICS];
 static inline void setup_apic_routing(void)
 {
Index: linux/include/asm-i386/mach-es7000/mach_mpparse.h
===
--- linux.orig/include/asm-i386/mach-es7000/mach_mpparse.h
+++ linux/include/asm-i386/mach-es7000/mach_mpparse.h
@@ -18,18 +18,6 @@ extern int parse_unisys_oem (char *oempt
 extern int find_unisys_acpi_oem_table(unsigned long *oem_addr);
 extern void setup_unisys(void);
 
-static inline int mps_oem_check(struct mp_config_table *mpc, char *oem,
-   char *productid)
-{
-   if (mpc->mpc_oemptr) {
-   struct mp_config_oemtable *oem_table =
-   (struct mp_config_oemtable *)mpc->mpc_oemptr;
-   if (!strncmp(oem, "UNISYS", 6))
-   return parse_unisys_oem((char *)oem_table);
-   }
-   return 0;
-}
-
 #ifdef CONFIG_ACPI
 
 static inline int es7000_check_dsdt(void)
@@ -41,26 +29,6 @@ static inline int es7000_check_dsdt(void
return 1;
return 0;
 }

[PATCH] [11/48] i386: workaround for a -Wmissing-prototypes warning

2007-04-29 Thread Andi Kleen

From: Adrian Bunk <[EMAIL PROTECTED]>

Work around a warning with -Wmissing-prototypes in
arch/i386/kernel/asm-offsets.c

The warning isn't gcc's fault - asm-offsets.c is simply a special file.

Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>
Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 arch/i386/kernel/asm-offsets.c |3 +++
 1 file changed, 3 insertions(+)

Index: linux/arch/i386/kernel/asm-offsets.c
===
--- linux.orig/arch/i386/kernel/asm-offsets.c
+++ linux/arch/i386/kernel/asm-offsets.c
@@ -25,6 +25,9 @@
 #define OFFSET(sym, str, mem) \
DEFINE(sym, offsetof(struct str, mem));
 
+/* workaround for a warning with -Wmissing-prototypes */
+void foo(void);
+
 void foo(void)
 {
OFFSET(SIGCONTEXT_eax, sigcontext, eax);


[PATCH] [13/48] x86_64: fix ia32_binfmt.c build error

2007-04-29 Thread Andi Kleen

From: Ralf Baechle <[EMAIL PROTECTED]>

Reorder code to avoid multiple inclusion of elf.h.

#undef several symbols to avoid build errors over redefinitions.
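
The two preprocessor tricks can be sketched in a single file. The DEMO_*
names are hypothetical, and the guarded block below merely stands in for the
contents of a header: claiming the include guard up front makes the later
"header" a no-op, and #undef before #define avoids a redefinition error if
some other header got there first.

```c
#include <assert.h>

/* Step 1: claim the (hypothetical) header's include guard up front. */
#define DEMO_NATIVE_ELF_H 1

/* Step 2: drop any earlier definition, then set the value we want. */
#undef DEMO_ELF_CLASS
#define DEMO_ELF_CLASS 1 /* stands in for ELFCLASS32 */

/* Stand-in for the native header body: skipped, because its guard
 * is already defined. */
#ifndef DEMO_NATIVE_ELF_H
#define DEMO_NATIVE_ELF_H 1
#define DEMO_ELF_CLASS 2 /* stands in for ELFCLASS64 */
#endif

static int demo_elf_class(void)
{
    int cls = DEMO_ELF_CLASS;
    assert(cls == 1 || cls == 2);
    return cls;
}
```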

Signed-off-by: Ralf Baechle <[EMAIL PROTECTED]>
Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 arch/x86_64/ia32/ia32_binfmt.c |   10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

Index: linux/arch/x86_64/ia32/ia32_binfmt.c
===
--- linux.orig/arch/x86_64/ia32/ia32_binfmt.c
+++ linux/arch/x86_64/ia32/ia32_binfmt.c
@@ -5,6 +5,11 @@
  * This tricks binfmt_elf.c into loading 32bit binaries using lots 
  * of ugly preprocessor tricks. Talk about very very poor man's inheritance.
  */ 
+#define __ASM_X86_64_ELF_H 1
+
+#undef ELF_CLASS
+#define ELF_CLASS ELFCLASS32
+
 #include 
 #include 
 #include 
@@ -50,9 +55,6 @@ struct elf_phdr; 
 #undef ELF_ARCH
 #define ELF_ARCH EM_386
 
-#undef ELF_CLASS
-#define ELF_CLASS ELFCLASS32
-
 #define ELF_DATA   ELFDATA2LSB
 
 #define USE_ELF_CORE_DUMP 1
@@ -136,7 +138,7 @@ struct elf_prpsinfo
 
 #define user user32
 
-#define __ASM_X86_64_ELF_H 1
+#undef elf_read_implies_exec
 #define elf_read_implies_exec(ex, executable_stack) (executable_stack != EXSTACK_DISABLE_X)
 //#include 
 #include 


2.6.21 frozen for a few minutes, swapping to disk

2007-04-29 Thread Miguel Figueiredo

Hi all,

today, with 2.6.21, my laptop behaved really oddly. It started 
writing to disk for a few minutes with no interactivity at all (no 
redraw on screen, only the hdd led on). It's the first time i noticed 
the OOM-killer starting to kill programs.


It was totally unresponsive for minutes; after coming back to life it had a 
load of ~19.0 and 300+ MB in swap (first time i saw this).


It's an HP Pavilion, Core Duo 2.0 GHz, 1 GB RAM

kern.log details: 
http://www.debianpt.org/~elmig/pool/kernel/20070429/kern.log

.config: http://www.debianpt.org/~elmig/pool/kernel/20070429/2.6.21.config
dmesg: http://www.debianpt.org/~elmig/pool/kernel/20070429/dmesg

As this is the first time it happened and it felt odd, i am reporting it.

If additional info is needed please CC me as i am not on the list.

--

Com os melhores cumprimentos/Best regards,

Miguel Figueiredo
http://www.DebianPT.org


[PATCH] [2/48] i386: Rewrite sched_clock

2007-04-29 Thread Andi Kleen

Move it into its own file for easy sharing.
Do everything per CPU. This avoids problems with TSCs that
tick at different frequencies per CPU.
Resync properly on cpufreq changes. The TSC is unstable around
CPU frequency changes, so fall back to a backing clock (jiffies)
during this period.
Hopefully the TSC will now work on all systems except those without a
physical TSC. 
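
The cycles-to-nanoseconds scaling that the new file uses can be tried out in
user space. The sketch below follows the formula from the patch's own comment
(ns = cycles * cyc2ns_scale / SC, with SC = 2^10); the function names are
illustrative, and the per-CPU sync_base/ns_base offsets are left out.

```c
#include <assert.h>
#include <stdint.h>

#define CYC2NS_SCALE_FACTOR 10 /* SC = 2^10 */

/* cyc2ns_scale = 10^6 * SC / cpu_khz; at most 10^6 * 2^10, so it
 * fits in 32 bits. */
static uint32_t cyc2ns_scale(uint32_t cpu_khz)
{
    assert(cpu_khz != 0);
    return (uint32_t)((UINT64_C(1000000) << CYC2NS_SCALE_FACTOR) / cpu_khz);
}

/* ns = cycles * cyc2ns_scale / SC, with the divide done as a shift. */
static uint64_t cycles_to_ns(uint64_t cycles, uint32_t scale)
{
    return (cycles * scale) >> CYC2NS_SCALE_FACTOR;
}
```

At 2 GHz (cpu_khz = 2000000) the scale is exactly 512, so 2000 cycles map to
1000 ns. At 3 GHz the scale truncates to 341 and 3000 cycles map to 999 ns,
which shows the small rounding error the fixed-point form accepts.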

And

+From: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
Three cleanups there:
 - change "instable" -> "unstable"
 - it's better to use get_cpu_var for getting this cpu's variables
 - change cycles_2_ns to do the full computation rather than just the
   tsc->ns scaling.  It's a simpler interface, and it makes the function

Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>

---
 arch/i386/kernel/Makefile  |3 
 arch/i386/kernel/sched-clock.c |  213 +
 arch/i386/kernel/tsc.c |   62 ---
 3 files changed, 215 insertions(+), 63 deletions(-)

Index: linux/arch/i386/kernel/sched-clock.c
===
--- /dev/null
+++ linux/arch/i386/kernel/sched-clock.c
@@ -0,0 +1,213 @@
+/* A fast clock for the scheduler. */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/*
+ * convert from cycles(64bits) => nanoseconds (64bits)
+ *  basic equation:
+ * ns = cycles / (freq / ns_per_sec)
+ * ns = cycles * (ns_per_sec / freq)
+ * ns = cycles * (10^9 / (cpu_khz * 10^3))
+ * ns = cycles * (10^6 / cpu_khz)
+ *
+ * Then we use scaling math (suggested by [EMAIL PROTECTED]) to get:
+ * ns = cycles * (10^6 * SC / cpu_khz) / SC
+ * ns = cycles * cyc2ns_scale / SC
+ *
+ * And since SC is a constant power of two, we can convert the div
+ *  into a shift.
+ *
+ *  We can use khz divisor instead of mhz to keep a better percision, since
+ *  cyc2ns_scale is limited to 10^6 * 2^10, which fits in 32 bits.
+ *  ([EMAIL PROTECTED])
+ *
+ * [EMAIL PROTECTED] "math is hard, lets go shopping!"
+ */
+
+#define CYC2NS_SCALE_FACTOR 10 /* 2^10, carefully chosen */
+
+struct sc_data {
+   unsigned cyc2ns_scale;
+   unsigned unstable;
+   unsigned long long sync_base;   /* TSC or jiffies at syncpoint*/
+   unsigned long long ns_base; /* nanoseconds at sync point */
+   unsigned long long last_val;/* Last returned value */
+};
+
+static DEFINE_PER_CPU(struct sc_data, sc_data) =
+   { .unstable = 1, .sync_base = INITIAL_JIFFIES };
+
+static inline u64 cycles_2_ns(struct sc_data *sc, u64 cyc)
+{
+   u64 ns;
+
+   cyc -= sc->sync_base;
+   ns = (cyc * sc->cyc2ns_scale) >> CYC2NS_SCALE_FACTOR;
+   ns += sc->ns_base;
+
+   return ns;
+}
+
+/*
+ * Scheduler clock - returns current time in nanosec units.
+ * All data is local to the CPU.
+ * The values are approximately[1] monotonic local to a CPU, but not
+ * between CPUs.   There might be also an occasionally random error,
+ * but not too bad. Between CPUs the values can be non monotonic.
+ *
+ * [1] no attempt to stop CPU instruction reordering, which can hit
+ * in a 100 instruction window or so.
+ *
+ * The clock can be in two states: stable and unstable.
+ * When it is stable we use the TSC per CPU.
+ * When it is unstable we use jiffies as fallback.
+ * stable->unstable->stable transitions can happen regularly
+ * during CPU frequency changes.
+ * There is special code to avoid having the clock jump backwards
+ * when we switch from TSC to jiffies, which needs to keep some state
+ * per CPU. This state is protected against parallel state changes
+ * with interrupts off.
+ */
+unsigned long long sched_clock(void)
+{
+   unsigned long long r;
+   struct sc_data *sc = &get_cpu_var(sc_data);
+
+   if (sc->unstable) {
+   unsigned long flags;
+   r = (jiffies_64 - sc->sync_base) * (1000000000 / HZ);
+   r += sc->ns_base;
+   local_irq_save(flags);
+   /* last_val is used to avoid non monotonity on a
+  stable->unstable transition. Make sure the time
+  never goes to before the last value returned by
+  the TSC clock */
+   if (r <= sc->last_val)
+   r = sc->last_val + 1;
+   sc->last_val = r;
+   local_irq_restore(flags);
+   } else {
+   get_scheduled_cycles(r);
+   r = cycles_2_ns(sc, r);
+   sc->last_val = r;
+   }
+
+   put_cpu_var(sc_data);
+
+   return r;
+}
+
+/* Resync with new CPU frequency */
+static void resync_sc_freq(struct sc_data *sc, unsigned int newfreq)
+{
+   sc->sync_base = jiffies;
+   if (!cpu_has_tsc) {
+   sc->unstable = 1;
+   return;
+   }
+   /* Handle nesting, but when we're z

[PATCH] [5/48] x86_64: Allow sys_uselib unconditionally

2007-04-29 Thread Andi Kleen

Previously it wasn't enabled when binfmt_aout was built as a module. 

Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>

---
 arch/x86_64/ia32/ia32entry.S |4 
 1 file changed, 4 deletions(-)

Index: linux/arch/x86_64/ia32/ia32entry.S
===
--- linux.orig/arch/x86_64/ia32/ia32entry.S
+++ linux/arch/x86_64/ia32/ia32entry.S
@@ -481,11 +481,7 @@ ia32_sys_call_table:
.quad sys_symlink
.quad sys_lstat
.quad sys_readlink  /* 85 */
-#ifdef CONFIG_IA32_AOUT
.quad sys_uselib
-#else
-   .quad quiet_ni_syscall
-#endif
.quad sys_swapon
.quad sys_reboot
.quad compat_sys_old_readdir


[PATCH] [6/48] x86_64: Minor white space cleanup in traps.c

2007-04-29 Thread Andi Kleen

Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>

---
 arch/x86_64/kernel/traps.c |4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

Index: linux/arch/x86_64/kernel/traps.c
===
--- linux.orig/arch/x86_64/kernel/traps.c
+++ linux/arch/x86_64/kernel/traps.c
@@ -426,8 +426,7 @@ void show_registers(struct pt_regs *regs
const int cpu = smp_processor_id();
struct task_struct *cur = cpu_pda(cpu)->pcurrent;
 
-   rsp = regs->rsp;
-
+   rsp = regs->rsp;
printk("CPU %d ", cpu);
__show_regs(regs);
printk("Process %s (pid: %d, threadinfo %p, task %p)\n",
@@ -438,7 +437,6 @@ void show_registers(struct pt_regs *regs
 * time of the fault..
 */
if (in_kernel) {
-
printk("Stack: ");
_show_stack(NULL, regs, (unsigned long*)rsp);
 


[PATCH] [19/48] x86_64: split remaining fake nodes equally

2007-04-29 Thread Andi Kleen

From: David Rientjes <[EMAIL PROTECTED]>

Extends the numa=fake x86_64 command-line option to split the remaining
system memory into equal-sized nodes.

For example:
numa=fake=2*512,4*  gives two 512M nodes and the remaining system
memory is split into four approximately equal
chunks.

This is beneficial for systems where the exact size of RAM is unknown or not
necessarily relevant, but the granularity with which nodes shall be allocated
is known.
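
The "approximately equal" split can be sketched as below. This is a
simplified stand-in, not the patch's split_nodes_equally(): it ignores any
minimum node size or alignment the kernel enforces and simply hands the
leftover units to the first chunks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Split `remaining` units into `chunks` pieces whose sizes differ by at
 * most one unit. Writes the sizes to `sizes` and returns the number of
 * pieces written. */
static size_t split_equally(uint64_t remaining, size_t chunks, uint64_t *sizes)
{
    assert(sizes != NULL);
    if (chunks == 0)
        return 0;

    uint64_t base = remaining / chunks;
    uint64_t extra = remaining % chunks;

    for (size_t i = 0; i < chunks; i++)
        sizes[i] = base + (i < extra ? 1 : 0);
    return chunks;
}
```

With numa=fake=2*512,4* on a hypothetical 2024M machine, the 1000M left after
the two 512M nodes would come out as four 250M chunks under this scheme.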

Cc: Andi Kleen <[EMAIL PROTECTED]>
Signed-off-by: David Rientjes <[EMAIL PROTECTED]>
Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>
Cc: Paul Jackson <[EMAIL PROTECTED]>
Cc: Christoph Lameter <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 Documentation/x86_64/boot-options.txt |4 +++-
 arch/x86_64/mm/numa.c |   22 ++
 2 files changed, 21 insertions(+), 5 deletions(-)

Index: linux/Documentation/x86_64/boot-options.txt
===
--- linux.orig/Documentation/x86_64/boot-options.txt
+++ linux/Documentation/x86_64/boot-options.txt
@@ -155,7 +155,9 @@ NUMA
depending on the sizes and coefficients listed.  For example:
numa=fake=2*512,1024,4*256
gives two 512M nodes, a 1024M node, and four 256M nodes.  The
-   remaining system RAM is allocated to an additional node.
+   remaining system RAM is allocated to an additional node.  If
+   the last character of CMDLINE is a *, the remaining system RAM
+   is instead divided up equally among its coefficient.
 
   numa=hotadd=percent
Only allow hotadd memory to preallocate page structures upto
Index: linux/arch/x86_64/mm/numa.c
===
--- linux.orig/arch/x86_64/mm/numa.c
+++ linux/arch/x86_64/mm/numa.c
@@ -418,11 +418,25 @@ static int __init numa_emulation(unsigne
 done:
if (!num_nodes)
return -1;
-   /* Fill remainder of system RAM with a final node, if appropriate. */
+   /* Fill remainder of system RAM, if appropriate. */
if (addr < max_addr) {
-   setup_node_range(num_nodes, nodes, &addr, max_addr - addr,
-max_addr);
-   num_nodes++;
+   switch (*(cmdline - 1)) {
+   case '*':
+   /* Split remaining nodes into coeff chunks */
+   if (coeff <= 0)
+   break;
+   num_nodes += split_nodes_equally(nodes, &addr, max_addr,
+num_nodes, coeff);
+   break;
+   case ',':
+   /* Do not allocate remaining system RAM */
+   break;
+   default:
+   /* Give one final node */
+   setup_node_range(num_nodes, nodes, &addr,
+max_addr - addr, max_addr);
+   num_nodes++;
+   }
}
 out:
memnode_shift = compute_hash_shift(nodes, num_nodes);


[PATCH] [12/48] x86: Log reason why TSC was marked unstable

2007-04-29 Thread Andi Kleen

From: john stultz <[EMAIL PROTECTED]>

Change mark_tsc_unstable() so it takes a string argument, which holds the
reason the TSC was marked unstable.

This is then displayed the first time mark_tsc_unstable is called.

This should help us better debug why the TSC was marked unstable on certain
systems and allow us to make sure we're not being overly paranoid when
throwing out this troublesome clocksource.
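
The "report the reason once" behaviour described above can be sketched in plain userspace C. This is a minimal illustration, not the kernel code: mark_tsc_unstable_sketch() and the reports_printed counter are hypothetical names, and printf() stands in for printk().

```c
#include <stdio.h>

static int tsc_unstable;	/* set once, never cleared */
static int reports_printed;	/* hypothetical counter, for illustration only */

/*
 * Record that the TSC is unstable and print the reason, but only on
 * the first call; later callers (and their reasons) are silently ignored.
 */
static void mark_tsc_unstable_sketch(const char *reason)
{
	if (!tsc_unstable) {
		tsc_unstable = 1;
		printf("Marking TSC unstable due to: %s.\n", reason);
		reports_printed++;
	}
}
```

One consequence of this design is that only the first reason reaches the log: if mark_tsc_unstable_sketch("cpufreq changes") is followed by mark_tsc_unstable_sketch("TSCs unsynchronized"), the second string is never printed.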

Cc: Ingo Molnar <[EMAIL PROTECTED]>
Cc: Thomas Gleixner <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>

---

 arch/i386/kernel/cpu/cyrix.c|2 +-
 arch/i386/kernel/tsc.c  |5 +++--
 arch/x86_64/kernel/time.c   |2 +-
 arch/x86_64/kernel/tsc.c|5 +++--
 arch/x86_64/kernel/tsc_sync.c   |2 +-
 drivers/acpi/processor_idle.c   |4 ++--
 include/asm-i386/mach-summit/mach_mpparse.h |4 ++--
 include/asm-i386/tsc.h  |2 +-
 include/asm-x86_64/timex.h  |2 +-
 9 files changed, 15 insertions(+), 13 deletions(-)

Index: linux/arch/i386/kernel/cpu/cyrix.c
===
--- linux.orig/arch/i386/kernel/cpu/cyrix.c
+++ linux/arch/i386/kernel/cpu/cyrix.c
@@ -279,7 +279,7 @@ static void __cpuinit init_cyrix(struct 
 */  
if (vendor == PCI_VENDOR_ID_CYRIX &&
 (device == PCI_DEVICE_ID_CYRIX_5510 || device == PCI_DEVICE_ID_CYRIX_5520))
-   mark_tsc_unstable();
+   mark_tsc_unstable("cyrix 5510/5520 detected");
}
 #endif
c->x86_cache_size=16;   /* Yep 16K integrated cache thats it */
Index: linux/arch/i386/kernel/tsc.c
===
--- linux.orig/arch/i386/kernel/tsc.c
+++ linux/arch/i386/kernel/tsc.c
@@ -172,7 +172,7 @@ time_cpufreq_notifier(struct notifier_bl
ref_freq, freq->new);
if (!(freq->flags & CPUFREQ_CONST_LOOPS)) {
tsc_khz = cpu_khz;
-   mark_tsc_unstable();
+   mark_tsc_unstable("cpufreq changes");
}
}
}
@@ -220,11 +220,12 @@ static struct clocksource clocksource_ts
  CLOCK_SOURCE_MUST_VERIFY,
 };
 
-void mark_tsc_unstable(void)
+void mark_tsc_unstable(char *reason)
 {
if (!tsc_unstable) {
tsc_unstable = 1;
tsc_enabled = 0;
+   printk("Marking TSC unstable due to: %s.\n", reason);
/* Can be called before registration */
if (clocksource_tsc.mult)
clocksource_change_rating(&clocksource_tsc, 0);
Index: linux/arch/x86_64/kernel/time.c
===
--- linux.orig/arch/x86_64/kernel/time.c
+++ linux/arch/x86_64/kernel/time.c
@@ -397,7 +397,7 @@ void __init time_init(void)
cpu_khz = tsc_calibrate_cpu_khz();
 
if (unsynchronized_tsc())
-   mark_tsc_unstable();
+   mark_tsc_unstable("TSCs unsynchronized");
 
if (cpu_has(&boot_cpu_data, X86_FEATURE_RDTSCP))
vgetcpu_mode = VGETCPU_RDTSCP;
Index: linux/arch/x86_64/kernel/tsc.c
===
--- linux.orig/arch/x86_64/kernel/tsc.c
+++ linux/arch/x86_64/kernel/tsc.c
@@ -85,7 +85,7 @@ static int time_cpufreq_notifier(struct 
 
tsc_khz = cpufreq_scale(tsc_khz_ref, ref_freq, freq->new);
if (!(freq->flags & CPUFREQ_CONST_LOOPS))
-   mark_tsc_unstable();
+   mark_tsc_unstable("cpufreq changes");
}
 
return 0;
@@ -171,10 +171,11 @@ static struct clocksource clocksource_ts
.vread  = vread_tsc,
 };
 
-void mark_tsc_unstable(void)
+void mark_tsc_unstable(char *reason)
 {
if (!tsc_unstable) {
tsc_unstable = 1;
+   printk("Marking TSC unstable due to %s\n", reason);
/* Change only the rating, when not registered */
if (clocksource_tsc.mult)
clocksource_change_rating(&clocksource_tsc, 0);
Index: linux/arch/x86_64/kernel/tsc_sync.c
===
--- linux.orig/arch/x86_64/kernel/tsc_sync.c
+++ linux/arch/x86_64/kernel/tsc_sync.c
@@ -138,7 +138,7 @@ void __cpuinit check_tsc_sync_source(int
printk("\n");
printk(KERN_WARNING "Measured %Ld cycles TSC warp between CPUs,"
" turning off TSC clock.\n", max_warp);
-   mark_tsc_unstable();
+   mark_tsc_unstable("check_ts
