Bug#688198: megasas: Failed to alloc kernel SGL buffer for IOCTL - Possible regression from 2.6.32.41~3

2012-11-21 Thread Todd Fleisher
FWIW - I don't experience the problem/message on a Debian Squeeze box running:
Linux deb003.pod01 2.6.32-5-xen-amd64 #1 SMP Mon Jan 16 20:48:30 UTC 2012 
x86_64 GNU/Linux

I'm not currently able to recompile my Ubuntu 12.04 (3.2) kernel, but I will 
try to find a comparable system to do it on.



--
To UNSUBSCRIBE, email to debian-kernel-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Archive: 
http://lists.debian.org/ec4cf311-1d76-4377-9ca8-f2fdca291...@fleetstreetops.com



Bug#688198: megasas: Failed to alloc kernel SGL buffer for IOCTL - Possible regression from 2.6.32.41~3

2012-11-20 Thread Todd Fleisher
FYI - I'm seeing this same issue in Ubuntu 12.04: Linux deb015.pod02 
3.2.0-32-generic #51-Ubuntu SMP Wed Sep 26 21:33:09 UTC 2012 x86_64 x86_64 
x86_64 GNU/Linux





Bug#688198: megasas: Failed to alloc kernel SGL buffer for IOCTL - Possible regression from 2.6.32.41~3

2012-11-20 Thread Bjørn Mork
Todd Fleisher <t...@fleetstreetops.com> writes:

> FYI - I'm seeing this same issue in Ubuntu 12.04: Linux deb015.pod02
> 3.2.0-32-generic #51-Ubuntu SMP Wed Sep 26 21:33:09 UTC 2012 x86_64
> x86_64 x86_64 GNU/Linux

Shit!  I have a bad feeling I might be responsible here...

Looks like the fix I submitted a while ago results in leaking
dma_allocated memory instead of BUGing out. Maybe slightly better in the
short term, but slightly more difficult to notice. Does it take a while
before this error starts appearing?  Do you run some smartctl commands
periodically?

I'd appreciate it if the good Debian kernel team could take a look at
this before it goes upstream, but I believe something like the attached
patch might fix the bug.  This patch is based on v3.2.34, but I'll
rebase it on current mainline and submit it upstream with Cc stable if
any of you confirms that this looks sane.


Bjørn

From 4c41818461c2604f859d2fecda2657827071f0d4 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Bj=C3=B8rn=20Mork?= <bj...@mork.no>
Date: Tue, 20 Nov 2012 18:17:48 +0100
Subject: [PATCH] megaraid_sas: fix memory leak if SGL has 0 length entries
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

commit 98cb7e44 ("[SCSI] megaraid_sas: Sanity check user
supplied length before passing it to dma_alloc_coherent()")
introduced a memory leak.  Memory allocated for entries
following zero length SGL entries will not be freed.

Signed-off-by: Bjørn Mork <bj...@mork.no>
---
 drivers/scsi/megaraid/megaraid_sas_base.c |5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c
index 7c471eb..f013432 100644
--- a/drivers/scsi/megaraid/megaraid_sas_base.c
+++ b/drivers/scsi/megaraid/megaraid_sas_base.c
@@ -4886,8 +4886,9 @@ megasas_mgmt_fw_ioctl(struct megasas_instance *instance,
 				    sense, sense_handle);
 	}
 
-	for (i = 0; i < ioc->sge_count && kbuff_arr[i]; i++) {
-		dma_free_coherent(&instance->pdev->dev,
+	for (i = 0; i < ioc->sge_count; i++) {
+		if (kbuff_arr[i])
+			dma_free_coherent(&instance->pdev->dev,
 				    kern_sge32[i].length,
 				    kbuff_arr[i], kern_sge32[i].phys_addr);
 	}
-- 
1.7.10.4



Bug#688198: megasas: Failed to alloc kernel SGL buffer for IOCTL - Possible regression from 2.6.32.41~3

2012-11-20 Thread Todd Fleisher
I get this periodically (seemingly random - but usually once it starts 
happening it sticks around for a while, then disappears only to return later) 
when I'm using LSI's MegaCli64 utility. When the kernel logs the error the 
MegaCli64 command doesn't return any data either.

Ex:
root@deb015.pod02:~# MegaCli64 -PDList -aALL


Exit Code: 0x00


Which is paired with a kernel message:
Nov 20 20:29:50 deb015 kernel: [797020.797811] megasas: Failed to alloc kernel SGL buffer for IOCTL

Other times that same command (or other MegaCli64 commands) will succeed and 
return the associated data. When this happens, there is no megasas kernel 
message.

-T
 





Bug#688198: megasas: Failed to alloc kernel SGL buffer for IOCTL - Possible regression from 2.6.32.41~3

2012-11-20 Thread Bjørn Mork
Todd Fleisher <t...@fleetstreetops.com> writes:

> I get this periodically (seemingly random - but usually once it starts
> happening it sticks around for a while, then disappears only to return later)
> when I'm using LSI's MegaCli64 utility. When the kernel logs the error the
> MegaCli64 command doesn't return any data either.
>
> Ex:
> root@deb015.pod02:~# MegaCli64 -PDList -aALL
>
>
> Exit Code: 0x00
>
>
> Which is paired with a kernel message:
> Nov 20 20:29:50 deb015 kernel: [797020.797811] megasas: Failed to alloc kernel SGL buffer for IOCTL
>
> Other times that same command (or other MegaCli64 commands) will succeed and
> return the associated data. When this happens, there is no megasas kernel
> message.


Thanks.  I don't know what the MegaCli64 utility does, but I assume it
uses the driver-specific ioctls to send passthrough commands like the
smartmontools do.  That is consistent with your description.

But I was concluding too fast as usual.  The bug I found needs to be
fixed, but it cannot be the cause of this problem.  If it were then you
would most likely see many other effects on your system.  And the same
bug has been backported to 2.6.32 as well.  And if it had not been, and
you are in fact hit by it, then your system would have crashed instead.

So that cannot be the problem.  And then I don't know what could have
changed between 2.6.32 and 3.2.  Could be something outside this driver.

It would be interesting to know something about the size of the buffers
which cannot be allocated.  But running with debug patches is maybe out
of the question? Otherwise you could try running with something like
this to get a better picture of why this is failing:


diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c
index f013432..1c0fa1d 100644
--- a/drivers/scsi/megaraid/megaraid_sas_base.c
+++ b/drivers/scsi/megaraid/megaraid_sas_base.c
@@ -4797,6 +4797,7 @@ megasas_mgmt_fw_ioctl(struct megasas_instance *instance,
 		if (!kbuff_arr[i]) {
 			printk(KERN_DEBUG "megasas: Failed to alloc "
 			       "kernel SGL buffer for IOCTL \n");
+			printk(KERN_DEBUG "megasas: iov_len=%d\n", ioc->sgl[i].iov_len);
 			error = -ENOMEM;
 			goto out;
 		}




Bjørn

