Bug#404143: Fans unreliable under load, permanent memory leak

2006-12-28 Thread Frans Pop
Hi Bas,

On Thursday 28 December 2006 20:15, you wrote:
> You wrote:
> > forgot to ask you for the output of dmidecode and acpidump,
> > for the acpi blacklisting.
>
> Attached.

If the solution turns out to be blacklisting machines, this seems to me a
candidate for documentation in the Release Notes. A proposed text for that
would be very welcome (bug report against release-notes).

Regards,
Frans




Bug#404143: Fans unreliable under load, permanent memory leak

2006-12-28 Thread Matthew Garrett
A couple of observations:

* This bug will not cause hardware damage. The hard thermal cutoff 
temperature is well below the temperature at which actual damage will 
occur.

* It's not clear that the vendor DSDT is broken. It's an unusual 
interpretation of the spec, but not necessarily an invalid one - sadly, 
the ACPI specification is not entirely clear on every point.

The patch is /probably/ safe, and we've been shipping it in Ubuntu. On 
the other hand, previous versions did cause problems on certain other 
items of hardware. It's not clear what the best option is, but it's 
certainly not a regression over Sarge.

-- 
Matthew Garrett | [EMAIL PROTECTED]


-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]



Bug#404143: Fans unreliable under load, permanent memory leak

2006-12-28 Thread maximilian attems
hello,

On Wed, Dec 27, 2006 at 09:46:04PM +0100, Bas Zoetekouw wrote:
> 
> At Jurij's request, I've tried out his patch.  It seems to work
> perfectly here (HP nc6120) , and fixes the "no fans after suspend"
> problem of #400488.

forgot to ask you for the output of dmidecode and acpidump,
for the acpi blacklisting.

thanks

-- 
maks





Bug#404143: Fans unreliable under load, permanent memory leak

2006-12-28 Thread maximilian attems
hello,

On Fri, 22 Dec 2006, Marc 'HE' Brockschmidt wrote:

> [EMAIL PROTECTED] writes:
> > I'm more than willing to help test a kernel package, but I'll be on
> > [VAC] from 2006-12-23 to 2007-01-03 inclusive.  So, please do not
> > release Etch just now :)
> 
> I have ordered an nx6325, which should arrive directly after
> Christmas. I would also be happy to test a fixed kernel. Due to this
> being an overheating problem, I would prefer if you could provide kernel
> images, so that I don't have to compile it.
> 
> Marc
> -- 
> BOFH #34:
> (l)user error

could you please send in the output of:
dmidecode
acpidump

thanks

-- 
maks





Bug#404143: Fans unreliable under load, permanent memory leak

2006-12-28 Thread Steve Langasek
On Tue, Dec 26, 2006 at 06:52:06PM -0800, Jurij Smakov wrote:
> On Wed, Dec 27, 2006 at 03:40:58AM +0100, maximilian attems wrote:

> > > I have reviewed the information available on the thermal problems with 
> > > HP laptops, and it appears that there is a fairly conservative set of 
> > > patches which takes care of the problems (thanks to Bas for pointing 
> > > > most of them out). I might have missed some upstream bugs, so please 
> > > let me know if there is anything else available on the issue. Below is 
> > > the summary, describing the relevant patches:

> > i nack the mentioned patches!

> Well, that's one in favor and one vote against then.

I'm going to have to side with maks on this.  The last thing we need at this
point of the release is a complex backported patch, targeted or not, that's
going to require a lot of third-party testing before we can even establish
whether it's caused regressions for other systems.

I think that leaves the best option as ACPI blacklisting, in the kernel, for
those models known to have problems.  I think this is strictly better than
trying to have the kernel give a warning when it detects such a model; it's
more likely to reach the target audience than a note in the release notes;
and it's far less of a support burden overall than trying to add a
special 2.6.19 kernel and pretend that support for it could be at all
comparable to that of the main kernel for the release.

Cheers,
-- 
Steve Langasek   Give me a lever long enough and a Free OS
Debian Developer   to set it on, and I can move the world.
[EMAIL PROTECTED]   http://www.debian.org/





Bug#404143: Fans unreliable under load, permanent memory leak

2006-12-27 Thread Bas Zoetekouw
Hi!

At Jurij's request, I've tried out his patch.  It seems to work
perfectly here (HP nc6120) , and fixes the "no fans after suspend"
problem of #400488.

-- 
Kind regards,
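+-------------------+---------------------------------------+
| Bas Zoetekouw     | GPG key: 0644fab7                     |
|                   | Fingerprint: c1f5 f24c d514 3fec 8bf6 |
| [EMAIL PROTECTED] |              a2b1 2bae e41f 0644 fab7 |
+-------------------+---------------------------------------+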





Bug#404143: Fans unreliable under load, permanent memory leak

2006-12-27 Thread maximilian attems
On Tue, 26 Dec 2006, Jurij Smakov wrote:

> On Wed, Dec 27, 2006 at 04:22:45AM +0100, maximilian attems wrote:

> > why was that fact never rc for sarge?
> > #259481, #262383
> 
> Discussing why it was not RC for Sarge seems pretty irrelevant to me. 
> It's up to release managers what is RC, and Etch release managers have 
> stated repeatedly that this issue is RC. I happen to agree with their 
> position.

a broken dsdt is a vendor fault.

for sarge the affected range was across all boxes,
here this affects 2 specific hp laptop models.
 
> > the dsdt of those hp notebooks is quite strange,
> > if you follow mjg59 posts you have read a funny story:
> > http://mjg59.livejournal.com/67443.html
> > 
> > the reference is easily readable in the git-commits-mail,
> > if you are interested in a 2006 tarball, i can send it.
> > 
> > check b976fe19acc565e5137e6f12af7b6633a23e6b7c
> > it reverts your proposed patch.
> 
> From the comments in patch #9746:
> 
>  The first attempt to create a new thread was made by Peter Wainwright.
>  He created a bunch of threads, which were stealing work from the kacpid
>  workqueue.
>  This patch appeared in the 2.6.15 kernel shipped with Ubuntu 6.06 LTS.
> 
>  The second attempt was made by me; I created a new thread for each Notify
>  event. This worked OK on HP nx machines, but broke Linus' Compaq
>  n620c by producing threads so quickly that they stopped the machine
>  completely. Thus this patch was reverted from 18-rc2, as I remember.
>  I re-made the patch to create a second workqueue just for notify events,
>  hoping it would not break Linus' machine. The patch was tested on the
>  same HP nx machines in #5534 and #7122, but I did not receive a reply
>  from Linus on a test patch sent to him.
>  The patch went to 19-rc and was rejected with much fanfare again.
>  There was a 4th patch, which inserted schedule_timeout(1) into the deferred
>  execution of kacpid if we had any notify requests pending, but Linus
>  decided that it was too complex (it involved either changes to the
>  workqueue to see if it's empty, or atomic inc/dec).
>  Now you see the last variant, which adds yield() to every GPE execution.
>  http://bugzilla.kernel.org/show_bug.cgi?id=5534
> 
> Obviously, this version of the patch is not the one which was 
> reverted. It has already gone through some pretty stringent review and 
> incremental improvement.

again i'm highly skeptical about the patch quality.
the semantics of yield() changed fundamentally from 2.4 to 2.6.
afaik only b0rked code in 2.6 needs yield().
 
> > fully agreed.
> > the cost analysis of acpi patches seems quite high,
> > that's why we currently have the policy not to add any.
> > i hate to do name dropping, but that goes back to hch.
> 
> I'm not aware of any such policy. We have backported a fair amount of 
> fixes from newer upstream releases, I don't see what qualifies ACPI as 
> some magic which should not be touched.
> -- 
> Jurij Smakov   [EMAIL PROTECTED]
> Key: http://www.wooyd.org/pgpkey/  KeyID: C99E03CC

the high risk of unwanted/unrelated side effects of the acpi subsys.

-- 
maks





Bug#404143: Fans unreliable under load, permanent memory leak

2006-12-26 Thread Jurij Smakov
On Wed, Dec 27, 2006 at 04:22:45AM +0100, maximilian attems wrote:
> On Tue, Dec 26, 2006 at 06:52:06PM -0800, Jurij Smakov wrote:
> >  
> > > backports are risky, again as you see for the net-r8169-1.patch,
> > > that is a "localized" driver enhancement with big slow down consequences
> > > #400524 and #403782. yes upstream has a fix for that and it should
> > > land soon, but still no one else bothered yet.
> > 
> > That's because slower networking will not break your hardware.
> 
> why was that fact never rc for sarge?
> #259481, #262383

Discussing why it was not RC for Sarge seems pretty irrelevant to me. 
It's up to release managers what is RC, and Etch release managers have 
stated repeatedly that this issue is RC. I happen to agree with their 
position.

> the dsdt of those hp notebooks is quite strange,
> if you follow mjg59 posts you have read a funny story:
> http://mjg59.livejournal.com/67443.html
> 
> the reference is easily readable in the git-commits-mail,
> if you are interested in a 2006 tarball, i can send it.
> 
> check b976fe19acc565e5137e6f12af7b6633a23e6b7c
> it reverts your proposed patch.

From the comments in patch #9746:

 The first attempt to create a new thread was made by Peter Wainwright.
 He created a bunch of threads, which were stealing work from the kacpid
 workqueue.
 This patch appeared in the 2.6.15 kernel shipped with Ubuntu 6.06 LTS.

 The second attempt was made by me; I created a new thread for each Notify
 event. This worked OK on HP nx machines, but broke Linus' Compaq
 n620c by producing threads so quickly that they stopped the machine
 completely. Thus this patch was reverted from 18-rc2, as I remember.
 I re-made the patch to create a second workqueue just for notify events,
 hoping it would not break Linus' machine. The patch was tested on the
 same HP nx machines in #5534 and #7122, but I did not receive a reply
 from Linus on a test patch sent to him.
 The patch went to 19-rc and was rejected with much fanfare again.
 There was a 4th patch, which inserted schedule_timeout(1) into the deferred
 execution of kacpid if we had any notify requests pending, but Linus
 decided that it was too complex (it involved either changes to the
 workqueue to see if it's empty, or atomic inc/dec).
 Now you see the last variant, which adds yield() to every GPE execution.
 http://bugzilla.kernel.org/show_bug.cgi?id=5534

Obviously, this version of the patch is not the one which was 
reverted. It has already gone through some pretty stringent review and 
incremental improvement.

> fully agreed.
> the cost analysis of acpi patches seems quite high,
> that's why we currently have the policy not to add any.
> i hate to do name dropping, but that goes back to hch.

I'm not aware of any such policy. We have backported a fair amount of 
fixes from newer upstream releases, I don't see what qualifies ACPI as 
some magic which should not be touched.
-- 
Jurij Smakov   [EMAIL PROTECTED]
Key: http://www.wooyd.org/pgpkey/  KeyID: C99E03CC





Bug#404143: Fans unreliable under load, permanent memory leak

2006-12-26 Thread maximilian attems
On Tue, Dec 26, 2006 at 06:52:06PM -0800, Jurij Smakov wrote:
>  
> > backports are risky, again as you see for the net-r8169-1.patch,
> > that is a "localized" driver enhancement with big slow down consequences
> > #400524 and #403782. yes upstream has a fix for that and it should
> > land soon, but still no one else bothered yet.
> 
> That's because slower networking will not break your hardware.

why was that fact never rc for sarge?
#259481, #262383
 
> > the acpi patches may solve the troubles with those stupid HP laptops,
> > but they have _certainly_ side effects.
> > if you look at the acpi commits of this day you see that they broke
> > a toshiba laptop.
> 
> Do you have a reference to that? And we do have a possibility to test 
> the changes pretty extensively by uploading to unstable plus 
> specifically asking people to test.

the dsdt of those hp notebooks is quite strange,
if you follow mjg59 posts you have read a funny story:
http://mjg59.livejournal.com/67443.html

the reference is easily readable in the git-commits-mail,
if you are interested in a 2006 tarball, i can send it.

check b976fe19acc565e5137e6f12af7b6633a23e6b7c
it reverts your proposed patch.
  
> > and push a newer linux in a point release.
> 
> Do you have a patch which does that? If that would exist, I might 
> reconsider my position.
 
no that is a release manager position. ;)
but i assume you mean a patch for drivers/acpi/blacklist.c
that should be fairly easy to create once we get dmidecode
output of the bug reporter.

fully untested:

diff --git a/drivers/acpi/blacklist.c b/drivers/acpi/blacklist.c
index f9c972b..669d81d 100644
--- a/drivers/acpi/blacklist.c
+++ b/drivers/acpi/blacklist.c
@@ -69,6 +69,9 @@ static struct acpi_blacklist_item acpi_blacklist[] __initdata = {
 	 "Incorrect _ADR", 1},
 	{"ASUS\0\0", "P2B-S   ", 0, ACPI_DSDT, all_versions,
 	 "Bogus PCI routing", 1},
+	/* HP nx6125 */
+	{"Hewlett-Packard ", "68DTT Ver. F.0", 0xE, ACPI_DSDT, all_versions,
+	 "Bogus fan support", 1},
 
 	{""}
 };
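
For readers unfamiliar with table-driven blacklists, here is a rough
user-space sketch of how such a table can be consulted. It is not the
kernel's code: the struct layout, the blacklist_match() helper and the
prefix-matching rule are assumptions made for illustration, and the sample
entry simply reuses the untested strings from the diff above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a blacklist entry. */
struct blacklist_item {
	const char *oem_id;        /* vendor string to match */
	const char *oem_table_id;  /* table/BIOS string to match */
	const char *reason;        /* message reported on a hit */
};

/* Sample table; the HP entry copies the untested strings from the diff. */
static const struct blacklist_item blacklist[] = {
	{"Hewlett-Packard ", "68DTT Ver. F.0", "Bogus fan support"},
	{NULL, NULL, NULL} /* terminator */
};

/*
 * Return the reason string if both identifiers start with the strings
 * of some entry, or NULL if the machine is not listed.
 */
static const char *blacklist_match(const char *oem_id,
				   const char *oem_table_id)
{
	const struct blacklist_item *it;

	assert(oem_id != NULL && oem_table_id != NULL);

	for (it = blacklist; it->oem_id != NULL; it++) {
		if (strncmp(oem_id, it->oem_id, strlen(it->oem_id)) == 0 &&
		    strncmp(oem_table_id, it->oem_table_id,
			    strlen(it->oem_table_id)) == 0)
			return it->reason;
	}
	return NULL;
}
```

A caller would feed in the identifiers reported by the firmware and disable
ACPI (or just log the reason) when the result is non-NULL. Which identifiers
the real table compares against is exactly what the requested dmidecode and
acpidump output is needed for.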

> > playing with acpi fire is not appropriate for a stable release.
> 
> It's all about cost/benefit analysis. In my eyes the benefits of 
> introducing these patches significantly outweigh the possible 
> problems, given the proper testing.

fully agreed.
the cost analysis of acpi patches seems quite high,
that's why we currently have the policy not to add any.
i hate to do name dropping, but that goes back to hch.

best regards

--
maks





Bug#404143: Fans unreliable under load, permanent memory leak

2006-12-26 Thread Jurij Smakov
On Tue, Dec 26, 2006 at 06:09:02PM -0800, Jurij Smakov wrote:

> So far I have not tried building the kernel with these patches, but I think
> this is a reasonable way to resolve the problem, as the resulting cumulative
> patch (attached) is only 19K.

Sorry, I made this patch reversed by mistake. Use the one attached to 
this message, or apply the old one with 'patch -R' :-P
-- 
Jurij Smakov   [EMAIL PROTECTED]
Key: http://www.wooyd.org/pgpkey/  KeyID: C99E03CC
diff -aur a/drivers/acpi/bus.c b/drivers/acpi/bus.c
--- a/drivers/acpi/bus.c	2006-09-19 20:42:06.0 -0700
+++ b/drivers/acpi/bus.c	2006-12-26 19:21:33.0 -0800
@@ -202,15 +202,14 @@
 	 * Get device's current power state if it's unknown
 	 * This means device power state isn't initialized or previous setting failed
 	 */
-	if (!device->flags.force_power_state) {
-		if (device->power.state == ACPI_STATE_UNKNOWN)
-			acpi_bus_get_power(device->handle, &device->power.state);
-		if (state == device->power.state) {
-			ACPI_DEBUG_PRINT((ACPI_DB_INFO, "Device is already at D%d\n",
-	  state));
-			return 0;
-		}
+	if ((device->power.state == ACPI_STATE_UNKNOWN) || device->flags.force_power_state)
+		acpi_bus_get_power(device->handle, &device->power.state);
+	if ((state == device->power.state) && !device->flags.force_power_state) {
+		ACPI_DEBUG_PRINT((ACPI_DB_INFO, "Device is already at D%d\n",
+  state));
+		return 0;
 	}
+
 	if (!device->power.states[state].flags.valid) {
 		printk(KERN_WARNING PREFIX "Device does not support D%d\n", state);
 		return -ENODEV;
diff -aur a/drivers/acpi/events/evmisc.c b/drivers/acpi/events/evmisc.c
--- a/drivers/acpi/events/evmisc.c	2006-09-19 20:42:06.0 -0700
+++ b/drivers/acpi/events/evmisc.c	2006-12-26 19:21:15.0 -0800
@@ -342,20 +342,8 @@
 	if (acquired) {
 
 		/* Got the lock, now wake all threads waiting for it */
-
 		acpi_gbl_global_lock_acquired = TRUE;
-
-		/* Run the Global Lock thread which will signal all waiting threads */
-
-		status =
-		acpi_os_execute(OSL_GLOBAL_LOCK_HANDLER,
-acpi_ev_global_lock_thread, context);
-		if (ACPI_FAILURE(status)) {
-			ACPI_EXCEPTION((AE_INFO, status,
-	"Could not queue Global Lock thread"));
-
-			return (ACPI_INTERRUPT_NOT_HANDLED);
-		}
+		acpi_ev_global_lock_thread(context);
 	}
 
 	return (ACPI_INTERRUPT_HANDLED);
diff -aur a/drivers/acpi/osl.c b/drivers/acpi/osl.c
--- a/drivers/acpi/osl.c	2006-09-19 20:42:06.0 -0700
+++ b/drivers/acpi/osl.c	2006-12-26 19:21:30.0 -0800
@@ -73,6 +73,7 @@
 static acpi_osd_handler acpi_irq_handler;
 static void *acpi_irq_context;
 static struct workqueue_struct *kacpid_wq;
+static struct workqueue_struct *kacpi_notify_wq;
 
 acpi_status acpi_os_initialize(void)
 {
@@ -91,8 +92,9 @@
 		return AE_NULL_ENTRY;
 	}
 	kacpid_wq = create_singlethread_workqueue("kacpid");
+	kacpi_notify_wq = create_singlethread_workqueue("kacpi_notify");
 	BUG_ON(!kacpid_wq);
-
+	BUG_ON(!kacpi_notify_wq);
 	return AE_OK;
 }
 
@@ -104,6 +106,7 @@
 	}
 
 	destroy_workqueue(kacpid_wq);
+	destroy_workqueue(kacpi_notify_wq);
 
 	return AE_OK;
 }
@@ -566,10 +569,24 @@
 
 static void acpi_os_execute_deferred(void *context)
 {
-	struct acpi_os_dpc *dpc = NULL;
+	struct acpi_os_dpc *dpc = context;
+	if (!dpc) {
+		printk(KERN_ERR PREFIX "Invalid (NULL) context\n");
+		return;
+	}
+
+	dpc->function(dpc->context);
+	kfree(dpc);
+
+	/* Yield cpu to notify thread */
+	cond_resched();
 
+	return;
+}
 
-	dpc = (struct acpi_os_dpc *)context;
+static void acpi_os_execute_notify(void *context)
+{
+	struct acpi_os_dpc *dpc = context;
 	if (!dpc) {
 		printk(KERN_ERR PREFIX "Invalid (NULL) context\n");
 		return;
@@ -604,14 +621,12 @@
 	struct acpi_os_dpc *dpc;
 	struct work_struct *task;
 
-	ACPI_FUNCTION_TRACE("os_queue_for_execution");
-
 	ACPI_DEBUG_PRINT((ACPI_DB_EXEC,
 			  "Scheduling function [%p(%p)] for deferred execution.\n",
 			  function, context));
 
 	if (!function)
-		return_ACPI_STATUS(AE_BAD_PARAMETER);
+		return AE_BAD_PARAMETER;
 
 	/*
 	 * Allocate/initialize DPC structure.  Note that this memory will be
@@ -624,23 +639,27 @@
 	 * from the same memory.
 	 */
 
-	dpc =
-	kmalloc(sizeof(struct acpi_os_dpc) + sizeof(struct work_struct),
-		GFP_ATOMIC);
+	dpc = kzalloc(sizeof(struct acpi_os_dpc) +
+			sizeof(struct work_struct), GFP_ATOMIC);
 	if (!dpc)
 		return_ACPI_STATUS(AE_NO_MEMORY);
 
 	dpc->function = function;
 	dpc->context = context;
 
-	task = (void *)(dpc + 1);
-	INIT_WORK(task, acpi_os_execute_deferred, (void *)dpc);
-
-	if (!queue_work(kacpid_wq, task)) {
-		ACPI_DEBUG_PRINT((ACPI_DB_ERROR,
-  "Call to queue_work() failed.\n"));
-		kfree(dpc);
-		status = AE_ERROR;
+	task = (struct work_struct *)(dpc + 1);
+	if (type == OSL_NOTIFY_HANDLER) {
+		INIT_WORK(task, acpi_os_execute_notify, (void *)dpc);
+		if (!queue_work(kacpi_notify_wq, task)) {
+			status = AE_ERROR;
+			kfree(dpc);
+		}
+	} else {
+		INIT_WO

Bug#404143: Fans unreliable under load, permanent memory leak

2006-12-26 Thread Jurij Smakov
On Wed, Dec 27, 2006 at 03:40:58AM +0100, maximilian attems wrote:

> > I have reviewed the information available on the thermal problems with 
> > HP laptops, and it appears that there is a fairly conservative set of 
> > patches which takes care of the problems (thanks to Bas for pointing 
> > most of them out). I might have missed some upstream bugs, so please 
> > let me know if there is anything else available on the issue. Below is 
> > the summary, describing the relevant patches:
> 
> i nack the mentioned patches!

Well, that's one in favor and one vote against then.
 
> backports are risky, again as you see for the net-r8169-1.patch,
> that is a "localized" driver enhancement with big slow down consequences
> #400524 and #403782. yes upstream has a fix for that and it should
> land soon, but still no one else bothered yet.

That's because slower networking will not break your hardware.

> the acpi patches may solve the troubles with those stupid HP laptops,
> but they have _certainly_ side effects.
> if you look at the acpi commits of this day you see that they broke
> a toshiba laptop.

Do you have a reference to that? And we do have a possibility to test 
the changes pretty extensively by uploading to unstable plus 
specifically asking people to test.
 
> back to the facts
> * the sarge kernel was released with *huge* thermal problems
>   and without any userspace help for early loading
> * the etch 2.6.18 linux acpi supports *many* thermal boxes
>   thermal hooks load modules at earliest possible stage
> * acpi releases have regression tests that are only run
>   for the complete release itself
> 
> the sanest way is to disable acpi for the affected laptops
> and push a newer linux in a point release.

Do you have a patch which does that? If that would exist, I might 
reconsider my position.

> playing with acpi fire is not appropriate for a stable release.

It's all about cost/benefit analysis. In my eyes the benefits of 
introducing these patches significantly outweigh the possible 
problems, given the proper testing.

Best regards,
-- 
Jurij Smakov   [EMAIL PROTECTED]
Key: http://www.wooyd.org/pgpkey/  KeyID: C99E03CC





Bug#404143: Fans unreliable under load, permanent memory leak

2006-12-26 Thread maximilian attems
On Tue, Dec 26, 2006 at 06:09:02PM -0800, Jurij Smakov wrote:
> On Sun, Dec 24, 2006 at 03:07:55AM +0100, Frederik Schueler wrote:
> > 
> > Hi *,
> > 
> > this is indeed a severe issue which requires all our attention and care
> > to solve or circumvent so that nobody's box comes to any harm, you
> > know how expensive these laptops are.
> > 
> > I basically see 3 solutions/workarounds:
> > 
> > 1. the brutal one: deactivate ACPI in 2.6.18, have the bios keep control
> > of the fans - better a noisy laptop until I upgrade the kernel than a
> > fried box.
> > 
> > 2. port 2.6.19 ACPI - noop because way too much work, unless someone is
> > "crazy enough" to accomplish this task.
> 
> I have reviewed the information available on the thermal problems with 
> HP laptops, and it appears that there is a fairly conservative set of 
> patches which takes care of the problems (thanks to Bas for pointing 
> most of them out). I might have missed some upstream bugs, so please 
> let me know if there is anything else available on the issue. Below is 
> the summary, describing the relevant patches:

i nack the mentioned patches!

backports are risky, again as you see for the net-r8169-1.patch,
that is a "localized" driver enhancement with big slow down consequences
#400524 and #403782. yes upstream has a fix for that and it should
land soon, but still no one else bothered yet.

the acpi patches may solve the troubles with those stupid HP laptops,
but they have _certainly_ side effects.
if you look at the acpi commits of this day you see that they broke
a toshiba laptop.


back to the facts
* the sarge kernel was released with *huge* thermal problems
  and without any userspace help for early loading
* the etch 2.6.18 linux acpi supports *many* thermal boxes
  thermal hooks load modules at earliest possible stage
* acpi releases have regression tests that are only run
  for the complete release itself

the sanest way is to disable acpi for the affected laptops
and push a newer linux in a point release.
playing with acpi fire is not appropriate for a stable release.

 
-- 
maks





Bug#404143: Fans unreliable under load, permanent memory leak

2006-12-26 Thread Jurij Smakov
On Sun, Dec 24, 2006 at 03:07:55AM +0100, Frederik Schueler wrote:
> 
> Hi *,
> 
> this is indeed a severe issue which requires all our attention and care
> to solve or circumvent so that nobody's box comes to any harm, you
> know how expensive these laptops are.
> 
> I basically see 3 solutions/workarounds:
> 
> 1. the brutal one: deactivate ACPI in 2.6.18, have the bios keep control
> of the fans - better a noisy laptop until I upgrade the kernel than a
> fried box.
> 
> 2. port 2.6.19 ACPI - noop because way too much work, unless someone is
> "crazy enough" to accomplish this task.

I have reviewed the information available on the thermal problems with 
HP laptops, and it appears that there is a fairly conservative set of 
patches which takes care of the problems (thanks to Bas for pointing 
most of them out). I might have missed some upstream bugs, so please 
let me know if there is anything else available on the issue. Below is 
the summary, describing the relevant patches:

Bug #5534: No thermal events until acpi -t - HP nx6125
------------------------------------------------------
Summary: thermal events generated by the ACPI subsystem do not get
processed by the kernel because both the interrupt due to a thermal
event and event handler are managed by the same thread (kacpid). The
solution is to create a separate thread for the handler, so that the
processing of thermal events may happen asynchronously.
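
The idea of giving notify handlers their own queue can be modelled in user
space roughly as below. This is only a sketch under stated assumptions: the
fixed-size FIFO stands in for a single-threaded workqueue, and the names
submit_work(), run_one(), WORK_GPE and WORK_NOTIFY are made up for the
example; none of this is the kernel API.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_CAP 16

enum work_type { WORK_GPE, WORK_NOTIFY };  /* hypothetical names */

typedef void (*work_fn)(void *ctx);

struct work_item {
	work_fn fn;
	void *ctx;
};

/* Fixed-size FIFO without wrap-around; enough for a demonstration. */
struct work_queue {
	struct work_item items[QUEUE_CAP];
	size_t head;  /* next item to run */
	size_t tail;  /* next free slot */
};

static struct work_queue main_wq;    /* models "kacpid" */
static struct work_queue notify_wq;  /* models "kacpi_notify" */

/* Append to a queue; returns 0 on success, -1 if the queue is full. */
static int queue_push(struct work_queue *q, work_fn fn, void *ctx)
{
	if (q->tail == QUEUE_CAP)
		return -1;
	q->items[q->tail].fn = fn;
	q->items[q->tail].ctx = ctx;
	q->tail++;
	return 0;
}

/* Route notify handlers to their own queue, everything else to main_wq. */
static int submit_work(enum work_type type, work_fn fn, void *ctx)
{
	assert(fn != NULL);
	return queue_push(type == WORK_NOTIFY ? &notify_wq : &main_wq,
			  fn, ctx);
}

/* Run at most one pending item; returns 1 if something ran, else 0. */
static int run_one(struct work_queue *q)
{
	struct work_item it;

	if (q->head == q->tail)
		return 0;
	it = q->items[q->head++];
	it.fn(it.ctx);
	return 1;
}

/* Example handler: increments the int that ctx points to. */
static void count_call(void *ctx)
{
	++*(int *)ctx;
}
```

Because each queue is drained independently, a backlog of GPE work in
main_wq no longer delays a notify handler sitting in notify_wq, which is
the property the separate-thread approach is after.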

I have identified the following patches which appear to finally resolve
the problem:

#8951 from comment #159 Don't defer release of the global lock.
                        (applies to drivers/acpi/events/evmisc.c)
#8952 from comment #160 Create another workqueue for notify()
                        execution.
                        (applies to drivers/acpi/osl.c)

These patches presumably solve the problem, but the problem persists after
suspend/resume cycle. Followup patches which are supposed to improve the
situation include:

#9631 from comment #171 Improved version of #8952, which prevents
                        flooding of certain machines with thermal
                        events (Linus owns one of those, so he was
                        very unhappy :-)
#9746 from comment #180 Some further improvements. AFAICT, supersedes
                        #9631 and #8952.

So, it looks like we need #8951 and #9746 from this bug. Both apply cleanly
to our 2.6.18-8 source.

Bug #7122: Thermal management problems - HPC nx6325
---------------------------------------------------
Summary: the fans do not come on properly after resume/suspend cycle. Looks
like the reason for the problem is that the ACPI logic which turns on the
fans cannot cope with the fact that it might be needed to execute the
"power on" method for fans a few times before they actually turn on.

The following patches appear to be relevant:

#9254 from comment #37  Reset number of resource references on resume
                        and make power on/off routines more strict and
                        robust.
#9255 from comment #38  Make ACPI suspend handlers run before the
                        _PTS/_GTS methods and ACPI resume handlers
                        run after the _WAK method.
#9263 from comment #41  A modification of #9254 to apply to
                        2.6.19-rc1-mm1.
#9355 from comment #48  Implement power resource references as a list,
                        so that if two devices use the same power
                        resource, it cannot be disabled by two
                        subsequent calls from a single device.
                        Supersedes #9254 and #9263.
#9337 from comment #52  Improved final version of #9355.

We need #9255 and #9337 from this bug. They apply cleanly to 2.6.18-8.
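
To illustrate why tracking power resource references per device helps, here
is a small user-space model. It is a sketch, not the patch itself: the
struct, the integer device ids and the function names are hypothetical. The
point is that a second release from the same device is a no-op, so it
cannot drop a reference that belongs to another device.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_REFS 8

/* Hypothetical model of a power resource shared by several devices. */
struct power_resource {
	int device_ids[MAX_REFS];  /* devices currently holding a reference */
	size_t nrefs;
};

/* Index of dev in the reference list, or -1 if it holds no reference. */
static int find_ref(const struct power_resource *r, int dev)
{
	size_t i;

	for (i = 0; i < r->nrefs; i++)
		if (r->device_ids[i] == dev)
			return (int)i;
	return -1;
}

/*
 * Record that dev needs the resource. Repeated calls from the same
 * device are idempotent. Returns 0 on success, -1 if the list is full.
 */
static int resource_acquire(struct power_resource *r, int dev)
{
	assert(r != NULL);
	if (find_ref(r, dev) >= 0)
		return 0;
	if (r->nrefs == MAX_REFS)
		return -1;
	r->device_ids[r->nrefs++] = dev;
	return 0;
}

/* Drop dev's reference, if any; releasing twice does nothing extra. */
static void resource_release(struct power_resource *r, int dev)
{
	int i;

	assert(r != NULL);
	i = find_ref(r, dev);
	if (i < 0)
		return;
	/* Fill the hole with the last entry and shrink the list. */
	r->device_ids[i] = r->device_ids[--r->nrefs];
}

/* The resource must stay powered while any device references it. */
static int resource_is_needed(const struct power_resource *r)
{
	return r->nrefs > 0;
}
```

With a plain counter, two releases from device 1 would bring the count to
zero even though device 2 still needs the fan; with the list, the resource
stays needed until device 2 releases it as well.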

Bug #7570: S3: fan doesn't work properly after resume
-----------------------------------------------------
Summary: one of the four fans is not turned on after suspend/resume cycle.

Relevant patch:

#9802 from comment #8   When the 'force_power_state' flag is set, the
                        check whether the required power state equals
                        the current one is skipped. In that case the
                        list of power resources being enabled is the
                        same as the list being disabled, which leads
                        to these resources being enabled and then
                        disabled again.

This patch may be included, even though the issue it fixes is not as critical
as the other ones. Applies fine to 2.6.18-8 too.
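
The check that #9802 changes can be modelled in user space roughly as
follows. This is a sketch that mirrors the control flow of the
drivers/acpi/bus.c hunk in the cumulative patch; the struct, the hw_state
field (standing in for a query of the real hardware) and the state
constants are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative values, not the kernel's definitions. */
enum { STATE_D0 = 0, STATE_D3 = 3, STATE_UNKNOWN = 0xFF };

/* Hypothetical stand-in for the fields the check looks at. */
struct device_power {
	int cached_state;  /* last state recorded by the OS */
	int hw_state;      /* what the hardware would report if queried */
	int force;         /* models flags.force_power_state */
};

/*
 * Returns 1 if a transition to target should be carried out,
 * 0 if it can be skipped because the device is already there.
 */
static int needs_transition(struct device_power *d, int target)
{
	assert(d != NULL);

	/* Re-read the real state when it is unknown or when forced. */
	if (d->cached_state == STATE_UNKNOWN || d->force)
		d->cached_state = d->hw_state;

	/* Only skip the transition when it is not being forced. */
	if (target == d->cached_state && !d->force)
		return 0;

	return 1;
}
```

The difference from the old code is that a forced transition now refreshes
the cached state first, so later steps work from the state the device is
actually in rather than from a stale value.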

So far I have not tried building the kernel with these patches, but I think
this is a reasonable way to resolve the problem, as the resulting cumulative
patch (attached) is only 19K.

Best regards, 
-- 
Jurij Smakov   [EMAIL PR

Bug#404143: Fans unreliable under load, permanent memory leak

2006-12-24 Thread Frans Pop
On Sunday 24 December 2006 15:22, you wrote:
> This is exactly the same kind of
> argument you are using in d-i, don't you think ?

There is a difference between being conservative with fixes for minor 
issues and fixes for issues that can fry people's hardware, don't you 
think?

Of course care is needed for such changes and I would certainly encourage 
a careful review and possibly some contact with upstream maintainers to 
get a better feeling for feasibility and possible risks.

The sooner some action is taken on this, the earlier a kernel could be 
uploaded (or made available for testing) and a call for testing be done 
on the appropriate lists. If patches do cause regressions there would 
still be time to revert them. After all, this is an RC issue and the 
release will wait for it.




Bug#404143: Fans unreliable under load, permanent memory leak

2006-12-24 Thread Sven Luther
On Sun, Dec 24, 2006 at 03:42:46PM +0100, maximilian attems wrote:
> On Sun, Dec 24, 2006 at 03:31:15PM +0100, Frederik Schueler wrote:
> > Hello,
> > 
> > On Sun, Dec 24, 2006 at 02:02:58PM +0100, Moritz Muehlenhoff wrote:
> > > Do you intend to disable ACPI entirely for all systems?
> > > 
> > > It appears to me that the affected HP models could be disabled on a
> > > per-case basis using drivers/acpi/blacklist.c
> > 
> > This looks like a good idea to me, do we know which models are affected?
> > 
> > OTOH, I doubt we have a complete list of affected models, and who knows
> > what problems may arise for yet to be released laptops...
> 
> indeed this is a good way.
> acpi patches have known side-effects so i would nack any hand-picking
> of those.
> 
> do we have a report from an affected laptop that booting with noacpi
> solves the thermal issues?

Ah, neat, there is the noacpi option.

We could simply add this flag to affected laptops by d-i. No need to touch the
kernel or otherwise.

Friendly,

Sven Luther





Bug#404143: Fans unreliable under load, permanent memory leak

2006-12-24 Thread Martin Michlmayr
* Moritz Muehlenhoff <[EMAIL PROTECTED]> [2006-12-24 15:57]:
> Since HP supports Debian officially now

not on laptops.

> I'm sure Dann or someone else from HP can provide us a list of
> affected models.

-- 
Martin Michlmayr
http://www.cyrius.com/





Bug#404143: Fans unreliable under load, permanent memory leak

2006-12-24 Thread maximilian attems
On Sun, Dec 24, 2006 at 03:31:15PM +0100, Frederik Schueler wrote:
> Hello,
> 
> On Sun, Dec 24, 2006 at 02:02:58PM +0100, Moritz Muehlenhoff wrote:
> > Do you intend to disable ACPI entirely for all systems?
> > 
> > It appears to me that the affected HP models could be disabled on a per-case
> > basis using drivers/acpi/blacklist.c
> 
> This looks like a good idea to me, do we know which models are affected?
> 
> OTOH, I doubt we have a complete list of affected models, and who knows
> what problems may arise for yet to be released laptops...

indeed this is a good way.
acpi patches have known side-effects so i would nack any hand-picking
of those.

do we have a report from an affected laptop that booting with noacpi
solves the thermal issues?

i don't agree with the fuss about this bug report nor with the severity.
for the sarge release kernel-image 2.6.8 did not boot on a wide range
of market available intel boards and there were overheating bug reports.
completely disabling acpi seems like an overreaction, based on the fact
that the affected laptops are quite specific. on the other hand i'm
delighted to see discussions about the linux-image upgrade in a stable
revision.

happy christmas

-- 
maks







Bug#404143: Fans unreliable under load, permanent memory leak

2006-12-24 Thread Sven Luther
On Sun, Dec 24, 2006 at 02:48:27PM +0100, Frans Pop wrote:
> On Sunday 24 December 2006 03:07, Frederik Schueler wrote:
> > 2. port 2.6.19 ACPI - noop because way too much work, unless someone
> > "crazy enough" to accomplish this task.
> 
> Did you see that Bas Zoetekouw managed [1, #400488] to solve the problem 
> for his box by applying some selected patches from upstream?
> Wouldn't that be an option?

I thought i saw Maximilian say that there are indeed some patches, but that
the risk to destabilize the whole ACPI subsystem was too great this near to
the etch release. This is exactly the same kind of argument you are using in
d-i, don't you think ? 

> I'd suggest asking other people that see the same issues to also test a 
> kernel with these patches and decide based on the results.

No, what we would need is huge testing of these patches by people *WHO DIDN'T
SEE THE SAME ISSUES* to make sure there is no regression.

Friendly,

Sven Luther





Bug#404143: Fans unreliable under load, permanent memory leak

2006-12-24 Thread Moritz Muehlenhoff
Frederik Schueler wrote:
> Hello,
> 
> On Sun, Dec 24, 2006 at 02:02:58PM +0100, Moritz Muehlenhoff wrote:
> > Do you intend to disable ACPI entirely for all systems?
> > 
> > It appears to me that ACPI could be disabled for the affected HP models on a
> > per-case basis using drivers/acpi/blacklist.c
> 
> This looks like a good idea to me, do we know which models are affected?
> OTOH, I doubt we have a complete list of affected models, 

Since HP supports Debian officially now, I'm sure Dann or someone else from
HP can provide us a list of affected models.

If not, we can contact Len Brown to get the ACPI-OEM-ID for HP and
blacklist all HP models.

> and who knows what problems may arise for yet to be released laptops...

Well, even Debian can't predict the future :-)
Plus, we can still address these in point updates.

Cheers,
Moritz





Bug#404143: Fans unreliable under load, permanent memory leak

2006-12-24 Thread Frederik Schueler
Hello,

On Sun, Dec 24, 2006 at 02:02:58PM +0100, Moritz Muehlenhoff wrote:
> Do you intend to disable ACPI entirely for all systems?
> 
> It appears to me that ACPI could be disabled for the affected HP models on a
> per-case basis using drivers/acpi/blacklist.c

This looks like a good idea to me, do we know which models are affected?

OTOH, I doubt we have a complete list of affected models, and who knows
what problems may arise for yet to be released laptops...

Best regards
Frederik Schueler

-- 
ENOSIG




Bug#404143: Fans unreliable under load, permanent memory leak

2006-12-24 Thread Frans Pop
On Sunday 24 December 2006 03:07, Frederik Schueler wrote:
> 2. port 2.6.19 ACPI - noop because way too much work, unless someone
> "crazy enough" to accomplish this task.

Did you see that Bas Zoetekouw managed [1, #400488] to solve the problem 
for his box by applying some selected patches from upstream?
Wouldn't that be an option?

I'd suggest asking other people that see the same issues to also test a 
kernel with these patches and decide based on the results.

[1] http://lists.debian.org/debian-kernel/2006/12/msg00768.html




Bug#404143: Fans unreliable under load, permanent memory leak

2006-12-24 Thread Moritz Muehlenhoff
On Sun, Dec 24, 2006 at 03:07:55AM +0100, Frederik Schueler wrote:
> 
> Hi *,
> 
> this is indeed a severe issue which requires all our attention and care
> to solve or circumvent in order for nobodies boxes to get any harm, you
> know how expensive these laptops are.
> 
> I basically see 3 solutions/workarounds:
> 
> 1. the brutal one: deactivate ACPI in 2.6.18, have the bios keep control
> of the fans - better a noisy laptop until I upgrade the kernel than a
> fried box.

Do you intend to disable ACPI entirely for all systems?

It appears to me that ACPI could be disabled for the affected HP models on a
per-case basis using drivers/acpi/blacklist.c

Cheers,
Moritz





Bug#404143: Fans unreliable under load, permanent memory leak

2006-12-24 Thread Sven Luther
On Sun, Dec 24, 2006 at 03:07:55AM +0100, Frederik Schueler wrote:
> 
> Hi *,
> 
> this is indeed a severe issue which requires all our attention and care
> to solve or circumvent in order for nobodies boxes to get any harm, you
> know how expensive these laptops are.
> 
> I basically see 3 solutions/workarounds:
> 
> 1. the brutal one: deactivate ACPI in 2.6.18, have the bios keep control
> of the fans - better a noisy laptop until I upgrade the kernel than a
> fried box.
> 
> 2. port 2.6.19 ACPI - noop because way too much work, unless someone 
> "crazy enough" to accomplish this task.
> 
> 3. go for 2.6.19

As said, i can imagine another solution.

  4. Provide both a stable 2.6.18, and an easily usable backported 2.6.19
  (or newer) kernel, which would be built for etch, but built out of our
  trunk/unstable/testing archive.

Then we can add a bit of logic into d-i's base-installer, so that the kernel
installation step detects the laptops which have this problem (do we know how
to detect them ?), informs the user and installs the newer kernel.

Alternatively, we can go with option 1, create a -noacpi flavour usable on those
laptops, and install that flavour in d-i. This would probably be the easiest
solution.
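
The detection step asked about above could be driven by the DMI strings that
dmidecode reports. What follows is only a minimal sketch in Python, not
anything d-i provides: the table format, the function names and the vendor
string are assumptions, and the single entry is illustrative (the HP nc6120 is
the one model named in this thread), not a verified list of affected machines.

```python
# Hypothetical table of affected machines: (vendor substring, product substring).
# The entries are illustrative assumptions, not a verified list.
AFFECTED_MODELS = [
    ("hewlett-packard", "nc6120"),
]


def is_affected(vendor: str, product: str, table=AFFECTED_MODELS) -> bool:
    """Return True if the DMI vendor/product strings match a table entry."""
    v = vendor.strip().lower()
    p = product.strip().lower()
    return any(tv in v and tp in p for tv, tp in table)


def kernel_choice(vendor: str, product: str) -> str:
    """Pick between the default kernel and the workaround discussed above."""
    return "workaround" if is_affected(vendor, product) else "default"
```

The two strings could come from `dmidecode -s system-manufacturer` and
`dmidecode -s system-product-name`; what "workaround" means (a newer kernel or
a noacpi flavour) is left open, as it is in the discussion.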

> Documenting arbitrary breakage in the release notes is not a solution,
> just consider how well manuals are usually read (if at all). Users will 
> end with damaged hardware and blame us for it.

/me agrees.

> We released woody with disabled ide dma due to somewhat similar issues
> (boxes hanging), so disabling ACPI in 2.6.18 and going for a 2.6.19
> based 4.0r1 ASAP seems the best thing to me personally, but this is of
> course up for discussion.

I have been thinking of another solution, but since i am kind of ignored or
this is a subject a certain amount of the powers-who-be don't want me to
mention, i doubt it will be gaining much momentum. I am going to propose a
talk at fosdem about these ideas, where issues and everything else can be
discussed.

The idea goes as follows :

  1) We take the kernel out of the main debian archive, into a separate kernel
  pool. This pool would hold the kernel and all assorted modules or
  abi-depending packages. This pool would hold per-abi subpools
  (dists/kernel/2.6.18-3, dists/kernel/2.6.19-1, etc).

  2) Eventually, we have some symlink or mirroring logic which would allow the
  chosen kernel to be accessible from the main archives. This means we can
  prepare kernels in this kernel pool, test them, and once they are ready, do a
  one-shot move of those packages (without rebuild) into the main pools.

  3) This pool will include both kernel .debs and .udebs. A further
  improvement would allow to split the d-i initramfs into two, having a single
  copy of the non-kernel specific stuff, and a per-flavour copy of the kernel
  initramfs stuff. This way, we move together the kernel and the module
  .udebs, and can easily switch d-i to change kernel version, or even build
  various d-i for various kernel versions. Furthermore this would avoid d-i
  trying to import 2.6.18-3 modules when you build a local 2.6.19-1 kernel,
  and simplify the whole .udeb version checking and downloading logic.

Well, there is more to it, and i will present that at fosdem, but i hope this
already gave you all a taste of what could be, and that these ideas will not
be rejected out of hand, just because they come from me.

Friendly,

Sven Luther





Bug#404143: Fans unreliable under load, permanent memory leak

2006-12-23 Thread Frederik Schueler

Hi *,

this is indeed a severe issue which requires all our attention and care
to solve or circumvent so that nobody's box comes to any harm, you
know how expensive these laptops are.

I basically see 3 solutions/workarounds:

1. the brutal one: deactivate ACPI in 2.6.18, have the bios keep control
of the fans - better a noisy laptop until I upgrade the kernel than a
fried box.

2. port 2.6.19 ACPI - noop because way too much work, unless someone 
"crazy enough" to accomplish this task.

3. go for 2.6.19

Documenting arbitrary breakage in the release notes is not a solution,
just consider how well manuals are usually read (if at all). Users will 
end with damaged hardware and blame us for it.

We released woody with disabled ide dma due to somewhat similar issues
(boxes hanging), so disabling ACPI in 2.6.18 and going for a 2.6.19
based 4.0r1 ASAP seems the best thing to me personally, but this is of
course up for discussion.

Best regards
Frederik Schueler

-- 
ENOSIG




Bug#404143: Fans unreliable under load, permanent memory leak

2006-12-23 Thread Sven Luther
On Sat, Dec 23, 2006 at 11:50:40AM +0100, Andreas Barth wrote:
> * Sven Luther ([EMAIL PROTECTED]) [061222 05:42]:
> > On Fri, Dec 22, 2006 at 12:53:09PM +0100, Marc 'HE' Brockschmidt wrote:
> > > maximilian attems <[EMAIL PROTECTED]> writes:
> > > > On Fri, Dec 22, 2006 at 11:28:29AM +0100, Marc 'HE' Brockschmidt wrote:
> > > >> Fix it or document it, I don't care. But the current state is not
> > > >> releasable.
> > > > we are not talking about "a" patch.
> > > > what you need is an backport of the 2.6.19 acpi release to 2.6.18.
> > > 
> > > Read again what I wrote. I will not allow Debian to release with a
> > > Kernel that may damage hardware without even a notice in the release
> > > notes. If you are not able to fix it, note that you have provided a
> > > broken kernel.
> > 
> > Cool, let's delay etch a couple of weeks and move to a (now released) 2.6.19
> > kernel, to solve this issue.
> 
> Sven, stop this!

Why ? /me guesses that even though debian is about free software, there are
many who feel that freedom of speech is to be banned. Do you also follow that
line of thought ? Was it not enough that some people felt that i should be
burned at the stake for having sent mails while i was not at my best ? 

Really, this kind of behavior is disgusting.

> I can remember well how you promised that moving to
> 2.6.18 will magically solve almost all of our issues - 6 (or more)
> release critical bugs against 2.6.18 don't show that this has worked so
> well. Please try helping us on solutions rather than breaking things
> again.

I did not promise any such thing. I simply stated at that time, that there
were many RC issues which were already fixed in the 2.6.18 tree, and which
would be a pain to backport to the 2.6.17 tree. Quite a different thing, don't
you think ? 

I personally will need to maintain 2.6.19+ backports to etch, because there is
no sane way to get Efika support in 2.6.18 without a lot of work.

> Please try to look at it from another perspective:
> 
> Consider you have bought such a laptop, and you install Debian. You have
> even read the release notes first.  Everything works well.  Until one
> day you notice your laptop gets too warm, and eventually even breaks
> because of this.  On deeper research, you notice that this issue was
> well-known to Debian, but they refused to deal with it at all. How would
> you feel as a user? I think this is an unacceptable perspective.

Bah. hardware which can be broken by software is broken. That said, if in fact
this is not a bug of the bios as was first mentioned here, but that the linux
support is not able to cope with some not usual but legal features of acpi,
then it is another matter.

But you should *NEVER* try to stop discussion about the subject, or bash on
someone for writing a single sentence as i did.

Friendly,

Sven Luther





Bug#404143: Fans unreliable under load, permanent memory leak

2006-12-23 Thread Andreas Barth
* Sven Luther ([EMAIL PROTECTED]) [061222 05:42]:
> On Fri, Dec 22, 2006 at 12:53:09PM +0100, Marc 'HE' Brockschmidt wrote:
> > maximilian attems <[EMAIL PROTECTED]> writes:
> > > On Fri, Dec 22, 2006 at 11:28:29AM +0100, Marc 'HE' Brockschmidt wrote:
> > >> Fix it or document it, I don't care. But the current state is not
> > >> releasable.
> > > we are not talking about "a" patch.
> > > what you need is an backport of the 2.6.19 acpi release to 2.6.18.
> > 
> > Read again what I wrote. I will not allow Debian to release with a
> > Kernel that may damage hardware without even a notice in the release
> > notes. If you are not able to fix it, note that you have provided a
> > broken kernel.
> 
> Cool, let's delay etch a couple of weeks and move to a (now released) 2.6.19
> kernel, to solve this issue.

Sven, stop this! I can remember well how you promised that moving to
2.6.18 will magically solve almost all of our issues - 6 (or more)
release critical bugs against 2.6.18 don't show that this has worked so
well. Please try helping us on solutions rather than breaking things
again.


Please try to look at it from another perspective:

Consider you have bought such a laptop, and you install Debian. You have
even read the release notes first.  Everything works well.  Until one
day you notice your laptop gets too warm, and eventually even breaks
because of this.  On deeper research, you notice that this issue was
well-known to Debian, but they refused to deal with it at all. How would
you feel as a user? I think this is an unacceptable perspective.


Ok, what can we do? 
1. ignore the problem,
2. document it in the release notes and README.Debian of the kernel,
3. prevent the kernel running on such buggy laptops [is this possible?],
4. backport ACPI from 2.6.19, or use 2.6.19,
5. isolate a smaller fix and apply it.

I personally consider options 1 and 4 to be unacceptable. Option 5 would
be the best, but I have yet to see that this is possible (or rather,
someone knowledgeable enough has time to do it).

So, we should at least document it inside of the release notes, and
README.Debian, and, if possible without being too invasive, get some
check inside the kernel to print a big warning on bootup, or even refuse
to work until some special parameter is used.
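
The "warn, or refuse unless forced" idea can be written down as a small
decision function. This is a sketch of the proposal only, in Python rather
than kernel code, and the policy itself is an assumption: acpi=off and
acpi=force are existing kernel parameters, but no existing kernel makes the
decision shown here, and the function name is made up for illustration.

```python
def boot_policy(machine_affected: bool, cmdline: str) -> str:
    """Return 'run', 'warn' or 'refuse' for a given kernel command line.

    Assumed policy (not an existing kernel feature): an affected machine
    runs normally with ACPI switched off, runs with a big warning when the
    user explicitly forces ACPI, and is refused otherwise.
    """
    params = cmdline.split()
    if not machine_affected or "acpi=off" in params:
        return "run"
    if "acpi=force" in params:
        return "warn"
    return "refuse"
```

Keeping a forced mode is what would still let users boot far enough to do a
BIOS upgrade, if one turns out to help.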


How does this proposal sound to the kernel team?



Cheers,
Andi
-- 
  http://home.arcor.de/andreas-barth/





Bug#404143: Fans unreliable under load, permanent memory leak

2006-12-22 Thread Ludovic Brenta
Some more information.

1) On my machine, reading the temperature using, say, yacpi, causes
   one processor to process all the pending ACPI events.  On a
   uniprocessor machine, the machine would appear to hang for several
   seconds; not so on my dual-core machine :)

2) The large slab usage (1.1 GB) was in part due to the XFS cache data;
   all three of my machine's filesystems are XFS.  So the Acpi-State
   line in /proc/slabinfo is the really meaningful one.
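
To turn an Acpi-State line into a memory figure, one can parse its fields and
multiply out the slab count. The sketch below is in Python and assumes the
usual whitespace-separated slabinfo 2.x layout (name, active_objs, num_objs,
objsize, objperslab, pagesperslab, then the tunables and slabdata groups) and
4 KiB pages; the type and function names are made up for illustration, and the
timestamp prefix of the log has to be stripped before parsing.

```python
from typing import NamedTuple

PAGE_SIZE = 4096  # assumption: 4 KiB pages


class SlabLine(NamedTuple):
    name: str
    active_objs: int
    num_objs: int
    objsize: int
    objperslab: int
    pagesperslab: int
    num_slabs: int


def parse_slab_line(line: str) -> SlabLine:
    """Parse one cache line of /proc/slabinfo (assumed 2.x field layout)."""
    stats, _tunables, slabdata = (part.split() for part in line.split(":"))
    active, total, size, per_slab, pages = map(int, stats[1:6])
    # slabdata is: "slabdata" <active_slabs> <num_slabs> <sharedavail>
    return SlabLine(stats[0], active, total, size, per_slab, pages,
                    int(slabdata[2]))


def footprint_bytes(entry: SlabLine) -> int:
    """Memory held by the cache: slabs times pages per slab times page size."""
    return entry.num_slabs * entry.pagesperslab * PAGE_SIZE
```

Under these assumptions, the last reading before yacpi was started (489696
objects in 10202 slabs) corresponds to roughly 40 MiB held by the Acpi-State
cache alone.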

Here is my complete log so far, with annotations.

2006-06-21T20:06:10: Slab:   30296 kB
2006-17-21T20:17:01: Slab:   37756 kB
2006-17-21T21:17:01: Slab:   48116 kB
2006-17-21T22:17:01: Slab:   55764 kB
2006-17-21T23:17:01: Slab:   69904 kB
-- Reboot with acpi=noirq: only one CPU found --
2006-24-21T23:24:10: Slab:   10444 kB
-- Reboot with pci=noacpi: only one CPU found --
2006-30-21T23:30:26: Slab:    9676 kB
2006-30-21T23:30:26: Acpi-State      0      0 80 48 1 : tunables 120 60 8 : slabdata     0     0 0
-- Reboot with no options: OK, both CPUs found --
2006-34-21T23:34:23: Slab:   10584 kB
2006-34-21T23:34:23: Acpi-State      0      0 80 48 1 : tunables 120 60 8 : slabdata     0     0 0
2006-17-22T00:17:01: Slab:   15424 kB
2006-17-22T00:17:01: Acpi-State  23088  23088 80 48 1 : tunables 120 60 8 : slabdata   481   481 0
2006-17-22T01:17:01: Slab:   29956 kB
2006-17-22T01:17:01: Acpi-State  59136  59136 80 48 1 : tunables 120 60 8 : slabdata  1232  1232 0
2006-17-22T02:17:01: Slab:   37764 kB
2006-17-22T02:17:01: Acpi-State  95088  95088 80 48 1 : tunables 120 60 8 : slabdata  1981  1981 0
2006-17-22T03:17:01: Slab:   45544 kB
2006-17-22T03:17:01: Acpi-State 130992 130992 80 48 1 : tunables 120 60 8 : slabdata  2729  2729 0
2006-17-22T04:17:01: Slab:   53328 kB
2006-17-22T04:17:01: Acpi-State 166944 166944 80 48 1 : tunables 120 60 8 : slabdata  3478  3478 0
2006-17-22T05:17:01: Slab:   61120 kB
2006-17-22T05:17:01: Acpi-State 202896 202896 80 48 1 : tunables 120 60 8 : slabdata  4227  4227 0
2006-17-22T06:17:01: Slab:   68904 kB
2006-17-22T06:17:01: Acpi-State 238800 238800 80 48 1 : tunables 120 60 8 : slabdata  4975  4975 0
2006-17-22T07:17:01: Slab: 1152624 kB
2006-17-22T07:17:01: Acpi-State 274656 274656 80 48 1 : tunables 120 60 8 : slabdata  5722  5722 0
2006-17-22T08:17:01: Slab: 1160376 kB
2006-17-22T08:17:01: Acpi-State 310608 310608 80 48 1 : tunables 120 60 8 : slabdata  6471  6471 0
2006-17-22T09:17:01: Slab: 1168168 kB
2006-17-22T09:17:01: Acpi-State 346464 346464 80 48 1 : tunables 120 60 8 : slabdata  7218  7218 0
2006-17-22T10:17:01: Slab: 1175892 kB
2006-17-22T10:17:01: Acpi-State 382176 382176 80 48 1 : tunables 120 60 8 : slabdata  7962  7962 0
2006-17-22T11:17:01: Slab: 1183660 kB
2006-17-22T11:17:01: Acpi-State 417984 417984 80 48 1 : tunables 120 60 8 : slabdata  8708  8708 0
2006-17-22T12:17:01: Slab: 1191400 kB
2006-17-22T12:17:01: Acpi-State 453744 453744 80 48 1 : tunables 120 60 8 : slabdata  9453  9453 0
2006-17-22T13:17:01: Slab: 1202924 kB
2006-17-22T13:17:01: Acpi-State 489696 489696 80 48 1 : tunables 120 60 8 : slabdata 10202 10202 0
-- Start yacpi, monitoring the temperature every second.
-- Note how the slab allocation drops by ~100M and then stays constant.
2006-17-22T14:17:01: Slab: 1097584 kB
2006-17-22T14:17:01: Acpi-State    109    144 80 48 1 : tunables 120 60 8 : slabdata     3     3 0
2006-17-22T15:17:01: Slab: 1097532 kB
2006-17-22T15:17:01: Acpi-State     45     96 80 48 1 : tunables 120 60 8 : slabdata     2     2 0
2006-17-22T16:17:01: Slab: 1097536 kB
2006-17-22T16:17:01: Acpi-State     75    144 80 48 1 : tunables 120 60 8 : slabdata     3     3 0
2006-17-22T17:17:01: Slab: 1097668 kB
2006-17-22T17:17:01: Acpi-State    141    144 80 48 1 : tunables 120 60 8 : slabdata     3     3 0
-- Stop the yacpi monitoring.
2006-17-22T18:17:01: Slab: 1098904 kB
2006-17-22T18:17:01: Acpi-State   5808   5808 80 48 1 : tunables 120 60 8 : slabdata   121   121 0
-- At this point the Acpi-State has started increasing again, but is still
-- small.  Most of the slab allocations are in the XFS caches (all three
-- filesystems on this computer are XFS).
-- T

Bug#404143: Fans unreliable under load, permanent memory leak

2006-12-22 Thread Marc 'HE' Brockschmidt
Sven Luther <[EMAIL PROTECTED]> writes:
> On Fri, Dec 22, 2006 at 12:53:09PM +0100, Marc 'HE' Brockschmidt wrote:
>> maximilian attems <[EMAIL PROTECTED]> writes:
>>> On Fri, Dec 22, 2006 at 11:28:29AM +0100, Marc 'HE' Brockschmidt wrote:
>>>> Fix it or document it, I don't care. But the current state is not
>>>> releasable.
>>> we are not talking about "a" patch.
>>> what you need is an backport of the 2.6.19 acpi release to 2.6.18.
>> Read again what I wrote. I will not allow Debian to release with a
>> Kernel that may damage hardware without even a notice in the release
>> notes. If you are not able to fix it, note that you have provided a
>> broken kernel.
> Cool, let's delay etch a couple of weeks and move to a (now released) 2.6.19
> kernel, to solve this issue.

Let's try again: Fix it *OR* explain in the release notes that the
kernel in etch is broken for some hardware.

Marc
-- 
Computer science terms - simply explained
79: Usenet
   I have too much free time. (Florian Kuehnert)




Bug#404143: Fans unreliable under load, permanent memory leak

2006-12-22 Thread Sven Luther
On Fri, Dec 22, 2006 at 12:53:09PM +0100, Marc 'HE' Brockschmidt wrote:
> severity 404143 critical
> thanks
> 
> maximilian attems <[EMAIL PROTECTED]> writes:
> > On Fri, Dec 22, 2006 at 11:28:29AM +0100, Marc 'HE' Brockschmidt wrote:
> >> Fix it or document it, I don't care. But the current state is not
> >> releasable.
> > we are not talking about "a" patch.
> > what you need is an backport of the 2.6.19 acpi release to 2.6.18.
> 
> Read again what I wrote. I will not allow Debian to release with a
> Kernel that may damage hardware without even a notice in the release
> notes. If you are not able to fix it, note that you have provided a
> broken kernel.

Cool, let's delay etch a couple of weeks and move to a (now released) 2.6.19
kernel, to solve this issue.

Friendly,

Sven Luther





Bug#404143: Fans unreliable under load, permanent memory leak

2006-12-22 Thread Andreas Barth
* Sven Luther ([EMAIL PROTECTED]) [061222 11:34]:
> On Fri, Dec 22, 2006 at 10:54:50AM +0100, Andreas Barth wrote:
> > severity 404143 critical
> > thanks
> > 
> > * Bastian Blank ([EMAIL PROTECTED]) [061222 01:27]:
> > > On Fri, Dec 22, 2006 at 01:51:36AM +0100, [EMAIL PROTECTED] wrote:
> > > > Consequence: linux-image-2.6.18-3-amd64 (=2.6.18-7) is unsuitable for
> > > > release.
> > > 
> > > Failing for you doesn't make it unsuitable.
> > 
> > That is a true statement by itself. This bug however has the potential
> > to damage hardware. Which is a critical bug.
> 
> Euh, it seems to me more that the hardware has a bug which causes normal
> operation to damage it.
> 
> As such, i think that any damage done would be the responsibility of the
> manufacturer to repair or fix. This seems to be both the position of Bastian
> and Maximilian, and it seems reasonable.
> 
> So, users of such hardware, please bother your vendor to either exchange it
> for a not broken one, or at least provide a bios upgrade which fixes the
> brokenness.

If a bios upgrade is a solution, the kernel could e.g. refuse to run
with a broken bios unless forced to ("runs if forced to" so that people
can do a bios upgrade)? (And of course, write about that in the release
notes).

I'm not saying the fix needs to happen in the kernel. But I do say that
we must not ship software where we know that hardware damage could
happen on a certain platform - this is not a question of "who made the
mistake", but of protecting our users.



Cheers,
Andi
-- 
  http://home.arcor.de/andreas-barth/





Bug#404143: Fans unreliable under load, permanent memory leak

2006-12-22 Thread Ludovic Brenta
forward 400488 http://bugzilla.kernel.org/show_bug.cgi?id=7122
forward 404143 http://bugzilla.kernel.org/show_bug.cgi?id=5534
thanks

When I said there's a memory leak, that's not technically true.  What
happens is that ACPI events get piled up in a queue and never
processed, due to a deadlock in Linux' ACPI subsystem.  Thus the
memory is not exactly "lost" but the net effect is the same as for a
genuine memory leak.
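
Why a stalled queue looks like a leak can be shown with a toy model. This is
a deliberately simplified Python sketch, not kernel code: it assumes events
arrive at a constant rate and are removed only when something explicitly
drains the queue; the function name and the numbers are made up for
illustration.

```python
from collections import deque


def simulate(hours: int, events_per_hour: int, drain_at: set) -> list:
    """Toy model: queue length at the end of each hour.

    The handler is assumed to be stalled, so queued events are removed
    only in the hours listed in drain_at.
    """
    queue = deque()
    lengths = []
    for hour in range(hours):
        queue.extend(range(events_per_hour))  # new events pile up
        if hour in drain_at:
            queue.clear()                     # something finally processes them
        lengths.append(len(queue))
    return lengths
```

Every queued event is still referenced, so nothing is lost in the strict
sense, yet the memory in use grows linearly until a drain happens.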

Now here is some additional information; my hourly cron job has
monitored the slab allocation for some more time and the bug appears
even more severe than I first thought.  Notice how the slab allocation
jumped from 64M to 1G between 6:17 and 7:17?  The only thing happening
at that time in the system was the execution of the daily crontabs at
6:47.  These are the stock (unmodified) Debian crontabs for apt,
aptitude, apt-show-versions, bsdmainutils, dlocate, find, logrotate,
man-db, modutils, prelink, standard, sysklogd, tetex-bin, zz-backup2l.

2006-06-21T20:06:10: Slab:   30296 kB
2006-17-21T20:17:01: Slab:   37756 kB
2006-17-21T21:17:01: Slab:   48116 kB
2006-17-21T22:17:01: Slab:   55764 kB
2006-17-21T23:17:01: Slab:   69904 kB
-- Reboot with acpi=noirq: only one CPU found --
2006-24-21T23:24:10: Slab:   10444 kB
-- Reboot with pci=noacpi: only one CPU found --
2006-30-21T23:30:26: Slab:    9676 kB
2006-30-21T23:30:26: Acpi-State      0      0 80 48 1 : tunables 120 60 8 : slabdata     0     0 0
-- Reboot with no options: OK, both CPUs found --
2006-34-21T23:34:23: Slab:   10584 kB
2006-34-21T23:34:23: Acpi-State      0      0 80 48 1 : tunables 120 60 8 : slabdata     0     0 0
2006-17-22T00:17:01: Slab:   15424 kB
2006-17-22T00:17:01: Acpi-State  23088  23088 80 48 1 : tunables 120 60 8 : slabdata   481   481 0
2006-17-22T01:17:01: Slab:   29956 kB
2006-17-22T01:17:01: Acpi-State  59136  59136 80 48 1 : tunables 120 60 8 : slabdata  1232  1232 0
2006-17-22T02:17:01: Slab:   37764 kB
2006-17-22T02:17:01: Acpi-State  95088  95088 80 48 1 : tunables 120 60 8 : slabdata  1981  1981 0
2006-17-22T03:17:01: Slab:   45544 kB
2006-17-22T03:17:01: Acpi-State 130992 130992 80 48 1 : tunables 120 60 8 : slabdata  2729  2729 0
2006-17-22T04:17:01: Slab:   53328 kB
2006-17-22T04:17:01: Acpi-State 166944 166944 80 48 1 : tunables 120 60 8 : slabdata  3478  3478 0
2006-17-22T05:17:01: Slab:   61120 kB
2006-17-22T05:17:01: Acpi-State 202896 202896 80 48 1 : tunables 120 60 8 : slabdata  4227  4227 0
2006-17-22T06:17:01: Slab:   68904 kB
2006-17-22T06:17:01: Acpi-State 238800 238800 80 48 1 : tunables 120 60 8 : slabdata  4975  4975 0
2006-17-22T07:17:01: Slab: 1152624 kB
2006-17-22T07:17:01: Acpi-State 274656 274656 80 48 1 : tunables 120 60 8 : slabdata  5722  5722 0
2006-17-22T08:17:01: Slab: 1160376 kB
2006-17-22T08:17:01: Acpi-State 310608 310608 80 48 1 : tunables 120 60 8 : slabdata  6471  6471 0
2006-17-22T09:17:01: Slab: 1168168 kB
2006-17-22T09:17:01: Acpi-State 346464 346464 80 48 1 : tunables 120 60 8 : slabdata  7218  7218 0
2006-17-22T10:17:01: Slab: 1175892 kB
2006-17-22T10:17:01: Acpi-State 382176 382176 80 48 1 : tunables 120 60 8 : slabdata  7962  7962 0
2006-17-22T11:17:01: Slab: 1183660 kB
2006-17-22T11:17:01: Acpi-State 417984 417984 80 48 1 : tunables 120 60 8 : slabdata  8708  8708 0
2006-17-22T12:17:01: Slab: 1191400 kB
2006-17-22T12:17:01: Acpi-State 453744 453744 80 48 1 : tunables 120 60 8 : slabdata  9453  9453 0
2006-17-22T13:17:01: Slab: 1202924 kB
2006-17-22T13:17:01: Acpi-State 489696 489696 80 48 1 : tunables 120 60 8 : slabdata 10202 10202 0
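
A log of this shape can be produced by a small sampler run from cron. The
following is a minimal Python sketch and not the script actually used here:
the function names are made up, the caller is assumed to pass in the contents
of /proc/meminfo and /proc/slabinfo as text, and the timestamp is built
separately so the formatting stays testable.

```python
import time


def slab_kb(meminfo_text: str) -> int:
    """Return the Slab: value (in kB) from the text of /proc/meminfo."""
    for line in meminfo_text.splitlines():
        if line.startswith("Slab:"):
            return int(line.split()[1])
    raise ValueError("no Slab: line found")


def cache_line(slabinfo_text: str, cache: str = "Acpi-State"):
    """Return the /proc/slabinfo line for one cache, or None if absent."""
    for line in slabinfo_text.splitlines():
        if line.split(None, 1)[:1] == [cache]:
            return line
    return None


def timestamp(now: float) -> str:
    # %m is the month; %M would print the minutes instead.
    return time.strftime("%Y-%m-%dT%H:%M:%S", time.localtime(now))


def format_sample(stamp: str, meminfo_text: str, slabinfo_text: str) -> list:
    """Build the log lines for one sample: Slab total, then the cache line."""
    lines = [f"{stamp}: Slab: {slab_kb(meminfo_text)} kB"]
    entry = cache_line(slabinfo_text)
    if entry is not None:
        lines.append(f"{stamp}: {entry}")
    return lines
```

An hourly cron job would read the two /proc files (slabinfo may need root),
call `format_sample(timestamp(time.time()), ...)` and append the result to a
log file.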






Bug#404143: Fans unreliable under load, permanent memory leak

2006-12-22 Thread Ludovic Brenta
Sven Luther writes:
> Euh, it seems to me more that the hardware has a bug which causes
> normal operation to damage it.
>
> As such, i think that any damage done would be the
> responsibility of the manufacturer to repair or fix. This seems to
> be both the position of Bastian and Maximilian, and it seems
> reasonable.
>
> So, users of such hardware, please bother your vendor to either
> exchange it for a not broken one, or at least provide a bios upgrade
> which fixes the brokenness.

No, the problem is not in the BIOS, it is in the kernel and it is
described at length in the upstream bug report.  If I understand this
description correctly, the kernel is not compliant with the ACPI
specification in that it handles all ACPI events in a single thread,
whereas the ACPI spec only says that the *interpreter* must be
single-threaded.  Also, there is a deadlock situation in the kernel
which is clearly a kernel, not BIOS, bug.

-- 
Ludovic Brenta.






Bug#404143: Fans unreliable under load, permanent memory leak

2006-12-22 Thread Sven Luther
On Fri, Dec 22, 2006 at 12:09:45PM +0100, maximilian attems wrote:
> On Fri, Dec 22, 2006 at 11:28:29AM +0100, Marc 'HE' Brockschmidt wrote:
> > Bastian Blank <[EMAIL PROTECTED]> writes:
> > > On Fri, Dec 22, 2006 at 10:30:57AM +0100, Marc 'HE' Brockschmidt wrote:
> > >> Sorry, I don't accept this. We are talking about an *overheating*
> > >> problem, which means *broken* hardware. There needs to be at least a fix
> > >> documented in the release-notes.
> > > Garbage-in, garbage-out. The BIOS of that machines is broken. Do you
> > > really expect that an interpreter (in this case the ACPI interpreter)
> > > accepts any garbage?
> > 
> > Other OSes don't destroy the hardware. There is a patch for Linux not to
> > - I don't see why Debian should release with a kernel that destroys
> > hardware, without even giving users a warning. Not everyone who buys a
> > notebook is aware of ACPI problems, and we shouldn't expect all users to
> > do so.
> > 
> > Fix it or document it, I don't care. But the current state is not
> > releasable.
> 
> we are not talking about "a" patch.
> what you need is a backport of the 2.6.19 acpi release to 2.6.18.
> 
> acpi linux releases are tested as one release and you open a can of worms
> once you start picking acpi patches. only mjg59 is insane enough to do
> that. anyway the fix for those broken aml tables has a big dependency
> so the backport is insane.
> 
> i looked at it 2 months ago and dropped the case, we are shortly before
> release. i restate: this broken hardware needs a newer kernel, full stop.

Well, this would mean that we could provide a semi-official set of newer
kernels for etch. We would, once etch is released, provide a backported kernel
of the new unstable kernel, as well as an etch-installing d-i for them.

This would allow users to install a stable etch, but including a newer kernel,
which is what probably most of us are doing anyway.

Friendly,

Sven Luther





Processed: Re: Bug#404143: Fans unreliable under load, permanent memory leak

2006-12-22 Thread Debian Bug Tracking System
Processing commands for [EMAIL PROTECTED]:

> severity 404143 critical
Bug#404143: Fans unreliable under load, permanent memory leak
Severity set to `critical' from `important'

> thanks
Stopping processing here.

Please contact me if you need assistance.

Debian bug tracking system administrator
(administrator, Debian Bugs database)





Processed: Re: Bug#404143: Fans unreliable under load, permanent memory leak

2006-12-22 Thread Debian Bug Tracking System
Processing commands for [EMAIL PROTECTED]:

> severity 404143 important
Bug#404143: Fans unreliable under load, permanent memory leak
Severity set to `important' from `critical'

> thanks
Stopping processing here.

Please contact me if you need assistance.

Debian bug tracking system administrator
(administrator, Debian Bugs database)





Bug#404143: Fans unreliable under load, permanent memory leak

2006-12-22 Thread maximilian attems
severity 404143 important
thanks

On Fri, Dec 22, 2006 at 10:54:50AM +0100, Andreas Barth wrote:
> severity 404143 critical
> thanks
> 
> 
> This bug however has the potential to damage hardware. Which is a
> critical bug.

yes, but it is a very specific affected hardware range.
upstream did not issue a fix for the stable series 2.6.18.X,
because it's not possible.

-- 
maks





Bug#404143: Fans unreliable under load, permanent memory leak

2006-12-22 Thread maximilian attems
On Fri, Dec 22, 2006 at 11:28:29AM +0100, Marc 'HE' Brockschmidt wrote:
> Bastian Blank <[EMAIL PROTECTED]> writes:
> > On Fri, Dec 22, 2006 at 10:30:57AM +0100, Marc 'HE' Brockschmidt wrote:
> >> Sorry, I don't accept this. We are talking about an *overheating*
> >> problem, which means *broken* hardware. There needs to be at least a fix
> >> documented in the release-notes.
> > Garbage-in, garbage-out. The BIOS of those machines is broken. Do you
> > really expect that an interpreter (in this case the ACPI interpreter)
> > accepts any garbage?
> 
> Other OSes don't destroy the hardware. There is a patch for Linux not to
> - I don't see why Debian should release with a kernel that destroys
> hardware, without even giving users a warning. Not everyone who buys a
> notebook is aware of ACPI problems, and we shouldn't expect all users to
> do so.
> 
> Fix it or document it, I don't care. But the current state is not
> releasable.

we are not talking about "a" patch.
what you need is a backport of the 2.6.19 acpi release to 2.6.18.

acpi linux releases are tested as one release, and you open a can of worms
once you start picking acpi patches. only mjg59 is insane enough to do
that. anyway, the fix for those broken aml tables has a big dependency,
so the backport is insane.

i looked at it 2 months ago and dropped the case; we are shortly before
release. i restate: that broken hardware needs a newer kernel, full stop.

--
maks





Bug#404143: Fans unreliable under load, permanent memory leak

2006-12-22 Thread Marc 'HE' Brockschmidt
Bastian Blank <[EMAIL PROTECTED]> writes:
> On Fri, Dec 22, 2006 at 10:30:57AM +0100, Marc 'HE' Brockschmidt wrote:
>> Sorry, I don't accept this. We are talking about an *overheating*
>> problem, which means *broken* hardware. There needs to be at least a fix
>> documented in the release-notes.
> Garbage-in, garbage-out. The BIOS of those machines is broken. Do you
> really expect that an interpreter (in this case the ACPI interpreter)
> accepts any garbage?

Other OSes don't destroy the hardware. There is a patch for Linux not to
- I don't see why Debian should release with a kernel that destroys
hardware, without even giving users a warning. Not everyone who buys a
notebook is aware of ACPI problems, and we shouldn't expect all users to
do so.

Fix it or document it, I don't care. But the current state is not
releasable.

Marc
-- 
BOFH #241:
_Rosin_ core solder? But...




Bug#404143: Fans unreliable under load, permanent memory leak

2006-12-22 Thread Sven Luther
On Fri, Dec 22, 2006 at 10:54:50AM +0100, Andreas Barth wrote:
> severity 404143 critical
> thanks
> 
> * Bastian Blank ([EMAIL PROTECTED]) [061222 01:27]:
> > On Fri, Dec 22, 2006 at 01:51:36AM +0100, [EMAIL PROTECTED] wrote:
> > > Consequence: linux-image-2.6.18-3-amd64 (=2.6.18-7) is unsuitable for
> > > release.
> > 
> > Failing for you doesn't make it unsuitable.
> 
> That is a true statement by itself. This bug however has the potential
> to damage hardware. Which is a critical bug.

Euh, it seems to me more that the hardware has a bug which causes normal
operation to damage it.

As such, I think that any damage done would be the responsibility of the
manufacturer to repair or fix. This seems to be the position of both Bastian
and Maximilian, and it seems reasonable.

So, users of such hardware, please bother your vendor to either exchange it
for a non-broken one, or at least provide a BIOS upgrade which fixes the
brokenness.

Friendly,

Sven Luther





Bug#404143: Fans unreliable under load, permanent memory leak

2006-12-22 Thread Bastian Blank
On Fri, Dec 22, 2006 at 10:30:57AM +0100, Marc 'HE' Brockschmidt wrote:
> Sorry, I don't accept this. We are talking about an *overheating*
> problem, which means *broken* hardware. There needs to be at least a fix
> documented in the release-notes.

Garbage-in, garbage-out. The BIOS of those machines is broken. Do you
really expect that an interpreter (in this case the ACPI interpreter)
accepts any garbage?

Bastian

-- 
Deflector shields just came on, Captain.





Processed: Re: Bug#404143: Fans unreliable under load, permanent memory leak

2006-12-22 Thread Debian Bug Tracking System
Processing commands for [EMAIL PROTECTED]:

> severity 404143 critical
Bug#404143: Fans unreliable under load, permanent memory leak
Severity set to `critical' from `serious'

> thanks
Stopping processing here.

Please contact me if you need assistance.

Debian bug tracking system administrator
(administrator, Debian Bugs database)





Bug#404143: Fans unreliable under load, permanent memory leak

2006-12-22 Thread Marc 'HE' Brockschmidt
[EMAIL PROTECTED] writes:
> I'm more than willing to help test a kernel package, but I'll be on
> [VAC] from 2006-12-23 to 2007-01-03 inclusive.  So, please do not
> release Etch just now :)

I have ordered an nx6325, which should arrive directly after
Christmas. I would also be happy to test a fixed kernel. Due to this
being an overheating problem, I would prefer if you could provide kernel
images, so that I don't have to compile it.

Marc
-- 
BOFH #34:
(l)user error




Bug#404143: Fans unreliable under load, permanent memory leak

2006-12-22 Thread Andreas Barth
severity 404143 critical
thanks

* Bastian Blank ([EMAIL PROTECTED]) [061222 01:27]:
> On Fri, Dec 22, 2006 at 01:51:36AM +0100, [EMAIL PROTECTED] wrote:
> > Consequence: linux-image-2.6.18-3-amd64 (=2.6.18-7) is unsuitable for
> > release.
> 
> Failing for you doesn't make it unsuitable.

That is a true statement by itself. This bug however has the potential
to damage hardware. Which is a critical bug.


Cheers,
Andi
-- 
  http://home.arcor.de/andreas-barth/





Processed: Re: Bug#404143: Fans unreliable under load, permanent memory leak

2006-12-22 Thread Debian Bug Tracking System
Processing commands for [EMAIL PROTECTED]:

> severity 404143 serious
Bug#404143: Fans unreliable under load, permanent memory leak
Severity set to `serious' from `important'

> thanks
Stopping processing here.

Please contact me if you need assistance.

Debian bug tracking system administrator
(administrator, Debian Bugs database)





Processed: Re: Bug#404143: Fans unreliable under load, permanent memory leak

2006-12-22 Thread Debian Bug Tracking System
Processing commands for [EMAIL PROTECTED]:

> severity 404143 important
Bug#404143: Fans unreliable under load, permanent memory leak
Severity set to `important' from `grave'

> tags 404143 upstream
Bug#404143: Fans unreliable under load, permanent memory leak
There were no tags set.
Tags added: upstream

> stop
Stopping processing here.

Please contact me if you need assistance.

Debian bug tracking system administrator
(administrator, Debian Bugs database)





Bug#404143: Fans unreliable under load, permanent memory leak

2006-12-22 Thread maximilian attems
severity 404143 important
tags 404143 upstream
stop

On Fri, Dec 22, 2006 at 01:51:36AM +0100, [EMAIL PROTECTED] wrote:
> Package: linux-image-2.6.18-3-amd64
> Version: 2.6.18-7
> Severity: grave
> Justification: hardware overheating hazard; requires periodic reboots
> 
> (This is not the same bug as #400488 (upstream #7122))
> 
> This bug affects several amd64 notebooks from HP, notably the nx6125
> and the nx6325; there may be other affected machines as well.

yes, this is a known problem in 2.6.18.
the real cause is that HP is shipping a broken BIOS in those models.
 
> Kernel team, please apply the patches for
> http://bugzilla.kernel.org/show_bug.cgi?id=5534
> 
> This bug is there merely to remind the kernel team not to release etch
> without the patches :) However I'm not sure which upstream version of
> linux, if any, contains the patches in the (long) trail of comments.
> So, it might be necessary to wait for a few days until the patches
> arrive in Linus' tree.

big nack,
acpi has a huge potential for destabilisation.
at this stage of the game, adding acpi patches is prone to regressions
in unexpected corners.

etch will get a newer kernel in a point release;
those laptops will have to get one on backports soon after release.
 
> Symptoms:
> - under load, the fans fail to turn on when the temperature reaches
>   and then exceeds the normal threshold, which is 58°C.
> - there is a permanent memory leak in the kernel, even when the system
>   is idle.  The leak is visible by looking at
>   $ grep Slab: /proc/meminfo and
>   $ grep Acpi-State /proc/slabinfo
> 
> Workaround:
> - if overheating, shut down the computer and let it cool down; or
>   let it shut itself down to prevent a fire hazard.
> - if the only problem is the memory leak, reboot.
> 
> Consequence: linux-image-2.6.18-3-amd64 (=2.6.18-7) is unsuitable for
> release.
> 
> The memory leak is described at:
> 
> http://www.mail-archive.com/linux-acpi@vger.kernel.org/msg03119.html
> 
> Today I had to reboot my HP Compaq nx6325 because the kernel was
> eating 1.8 GB out of the 1.9 GB of RAM in the system, after about 9
> days of uptime.  Then I started an hourly cron job to monitor
> /proc/meminfo and /proc/slabinfo as described above:
> 
> 2006-06-21T20:06:10: Slab:30296 kB
> 2006-17-21T20:17:01: Slab:37756 kB
> 2006-17-21T21:17:01: Slab:48116 kB
> 2006-17-21T22:17:01: Slab:55764 kB
> 2006-17-21T23:17:01: Slab:69904 kB
> -- Reboot with acpi=noirq: only one CPU found --
> 2006-24-21T23:24:10: Slab:10444 kB
> -- Reboot with pci=noacpi: only one CPU found --
> 2006-30-21T23:30:26: Slab: 9676 kB
> 2006-30-21T23:30:26: Acpi-State 0 0 80 48 1 : tunables 120 60 8 : slabdata 0 0 0
> -- Reboot with no options: OK, both CPUs found --
> 2006-34-21T23:34:23: Slab:10584 kB
> 2006-34-21T23:34:23: Acpi-State 0 0 80 48 1 : tunables 120 60 8 : slabdata 0 0 0
> 2006-17-22T00:17:01: Slab:15424 kB
> 2006-17-22T00:17:01: Acpi-State 23088 23088 80 48 1 : tunables 120 60 8 : slabdata 481 481 0
> 2006-17-22T01:17:01: Slab:29956 kB
> 2006-17-22T01:17:01: Acpi-State 59136 59136 80 48 1 : tunables 120 60 8 : slabdata 1232 1232 0
> 
> I'm more than willing to help test a kernel package, but I'll be on
> [VAC] from 2006-12-23 to 2007-01-03 inclusive.  So, please do not
> release Etch just now :)
> 
> -- 
> Ludovic Brenta.

anyway this bug report is helpful as documentation.
happy vacation

-- 
maks



Bug#404143: Fans unreliable under load, permanent memory leak

2006-12-21 Thread ludovic
Package: linux-image-2.6.18-3-amd64
Version: 2.6.18-7
Severity: grave
Justification: hardware overheating hazard; requires periodic reboots

(This is not the same bug as #400488 (upstream #7122))

This bug affects several amd64 notebooks from HP, notably the nx6125
and the nx6325; there may be other affected machines as well.

Kernel team, please apply the patches for
http://bugzilla.kernel.org/show_bug.cgi?id=5534

This bug is there merely to remind the kernel team not to release etch
without the patches :) However I'm not sure which upstream version of
linux, if any, contains the patches in the (long) trail of comments.
So, it might be necessary to wait for a few days until the patches
arrive in Linus' tree.

Symptoms:
- under load, the fans fail to turn on when the temperature reaches
  and then exceeds the normal threshold, which is 58°C.
- there is a permanent memory leak in the kernel, even when the system
  is idle.  The leak is visible by looking at
  $ grep Slab: /proc/meminfo and
  $ grep Acpi-State /proc/slabinfo
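
The two checks above can be combined into a small helper that prints
timestamped lines. This is only a minimal sketch, assuming a POSIX shell and
awk; the name log_slab and its optional file arguments are hypothetical, added
so the helper can also be run against saved copies of the two files. On newer
kernels /proc/slabinfo may be readable by root only.

```shell
#!/bin/sh
# log_slab (hypothetical helper): print the Slab: total from meminfo and the
# Acpi-State cache line from slabinfo, each prefixed with a timestamp.
# $1 and $2 default to the live /proc files; they are parameters only so the
# function can be pointed at saved copies.
log_slab() {
    meminfo=${1:-/proc/meminfo}
    slabinfo=${2:-/proc/slabinfo}
    stamp=$(date +%Y-%m-%dT%H:%M:%S)
    # awk exits 0 even when nothing matches, unlike grep.
    awk -v stamp="$stamp" '/^Slab:/ { print stamp ": " $0 }' "$meminfo"
    awk -v stamp="$stamp" '/^Acpi-State/ { print stamp ": " $0 }' "$slabinfo"
}
```

Run periodically (for example from cron, appending the output to a log file
of your choice), a steadily growing Acpi-State object count on an idle system
would point to the leak.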

Workaround:
- if overheating, shut down the computer and let it cool down; or
  let it shut itself down to prevent a fire hazard.
- if the only problem is the memory leak, reboot.

Consequence: linux-image-2.6.18-3-amd64 (=2.6.18-7) is unsuitable for
release.

The memory leak is described at:

http://www.mail-archive.com/linux-acpi@vger.kernel.org/msg03119.html

Today I had to reboot my HP Compaq nx6325 because the kernel was
eating 1.8 GB out of the 1.9 GB of RAM in the system, after about 9
days of uptime.  Then I started an hourly cron job to monitor
/proc/meminfo and /proc/slabinfo as described above:

2006-06-21T20:06:10: Slab:30296 kB
2006-17-21T20:17:01: Slab:37756 kB
2006-17-21T21:17:01: Slab:48116 kB
2006-17-21T22:17:01: Slab:55764 kB
2006-17-21T23:17:01: Slab:69904 kB
-- Reboot with acpi=noirq: only one CPU found --
2006-24-21T23:24:10: Slab:10444 kB
-- Reboot with pci=noacpi: only one CPU found --
2006-30-21T23:30:26: Slab: 9676 kB
2006-30-21T23:30:26: Acpi-State 0 0 80 48 1 : tunables 120 60 8 : slabdata 0 0 0
-- Reboot with no options: OK, both CPUs found --
2006-34-21T23:34:23: Slab:10584 kB
2006-34-21T23:34:23: Acpi-State 0 0 80 48 1 : tunables 120 60 8 : slabdata 0 0 0
2006-17-22T00:17:01: Slab:15424 kB
2006-17-22T00:17:01: Acpi-State 23088 23088 80 48 1 : tunables 120 60 8 : slabdata 481 481 0
2006-17-22T01:17:01: Slab:29956 kB
2006-17-22T01:17:01: Acpi-State 59136 59136 80 48 1 : tunables 120 60 8 : slabdata 1232 1232 0
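
To express the Acpi-State figures in kB, the slab count can be multiplied by
the pages per slab and the page size. A rough sketch, assuming the slabinfo
2.x column order (name, active_objs, num_objs, objsize, objperslab,
pagesperslab, then the "tunables" and "slabdata" groups) and 4 kB pages;
slab_kb is a hypothetical helper name, not an existing tool.

```shell
#!/bin/sh
# slab_kb (hypothetical helper): print the memory in kB held by one slab cache.
#   $1 = cache name, e.g. Acpi-State
#   $2 = slabinfo file (defaults to /proc/slabinfo)
#   $3 = page size in kB (defaults to 4, an assumption that fits amd64)
slab_kb() {
    cache=$1
    file=${2:-/proc/slabinfo}
    page_kb=${3:-4}
    awk -v cache="$cache" -v page_kb="$page_kb" '
        $1 == cache {
            # Field 6 is pagesperslab; num_slabs is the second value after
            # the "slabdata" keyword.
            for (i = 1; i <= NF; i++)
                if ($i == "slabdata") {
                    print $(i + 2) * $6 * page_kb
                    exit
                }
        }' "$file"
}
```

For the last sample above (1232 slabs, read as one page per slab), this gives
1232 x 1 x 4 = 4928 kB held by the Acpi-State cache.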

I'm more than willing to help test a kernel package, but I'll be on
[VAC] from 2006-12-23 to 2007-01-03 inclusive.  So, please do not
release Etch just now :)

-- 
Ludovic Brenta.


