Bug#688198: megasas: Failed to alloc kernel SGL buffer for IOCTL - Possible regression from 2.6.32.41~3
FWIW - I don't experience the problem/message on a Debian Squeeze box running:

Linux deb003.pod01 2.6.32-5-xen-amd64 #1 SMP Mon Jan 16 20:48:30 UTC 2012 x86_64 GNU/Linux

I'm not currently able to re-compile my 3.2 Ubuntu 12.04 kernel, but will try to find a comparable system to do it on.

On Nov 20, 2012, at 11:20 PM, Bjørn Mork wrote:

> Todd Fleisher <t...@fleetstreetops.com> writes:
>
>> I get this periodically (seemingly random - but usually once it starts happening it sticks around for a while, then disappears only to return later) when I'm using LSI's MegaCli64 utility. When the kernel logs the error the MegaCli64 command doesn't return any data either. Ex:
>>
>>   root@deb015.pod02:~# MegaCli64 -PDList -aALL
>>   Exit Code: 0x00
>>
>> Which is paired with a kernel message:
>>
>>   Nov 20 20:29:50 deb015 kernel: [797020.797811] megasas: Failed to alloc kernel SGL buffer for IOCTL
>>
>> Other times that same command (or other MegaCli64 commands) will succeed and return the associated data. When this happens, there is no megasas kernel message.
>
> Thanks. I don't know what the MegaCli64 utility does, but I assume it uses the driver-specific ioctls to send passthrough commands like the smartmontools do. That is consistent with your description.
>
> But I was concluding too fast as usual. The bug I found needs to be fixed, but it cannot be the cause of this problem. If it were, then you would most likely see many other effects on your system. And the same bug has been backported to 2.6.32 as well. And if it had not been, and you are in fact hit by it, then your system would have crashed instead. So that cannot be the problem.
>
> And then I don't know what could have changed between 2.6.32 and 3.2. Could be something outside this driver.
>
> It would be interesting to know something about the size of the buffers which cannot be allocated. But running with debug patches is maybe out of the question? Otherwise you could try running with something like this to get a better picture of why this is failing:
>
> diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c
> index f013432..1c0fa1d 100644
> --- a/drivers/scsi/megaraid/megaraid_sas_base.c
> +++ b/drivers/scsi/megaraid/megaraid_sas_base.c
> @@ -4797,6 +4797,7 @@ megasas_mgmt_fw_ioctl(struct megasas_instance *instance,
>  		if (!kbuff_arr[i]) {
>  			printk(KERN_DEBUG "megasas: Failed to alloc "
>  			       "kernel SGL buffer for IOCTL \n");
> +			printk(KERN_DEBUG "megasas: iov_len=%d\n", ioc->sgl[i].iov_len);
>  			error = -ENOMEM;
>  			goto out;
>  		}
>
> Bjørn

--
To UNSUBSCRIBE, email to debian-kernel-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/ec4cf311-1d76-4377-9ca8-f2fdca291...@fleetstreetops.com
Bug#688198: megasas: Failed to alloc kernel SGL buffer for IOCTL - Possible regression from 2.6.32.41~3
FYI - I'm seeing this same issue in Ubuntu 12.04:

Linux deb015.pod02 3.2.0-32-generic #51-Ubuntu SMP Wed Sep 26 21:33:09 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

Archive: http://lists.debian.org/2cca949c-9192-47dd-91bb-7b3af80f7...@fleetstreetops.com
Bug#688198: megasas: Failed to alloc kernel SGL buffer for IOCTL - Possible regression from 2.6.32.41~3
Todd Fleisher <t...@fleetstreetops.com> writes:

> FYI - I'm seeing this same issue in Ubuntu 12.04:
>
> Linux deb015.pod02 3.2.0-32-generic #51-Ubuntu SMP Wed Sep 26 21:33:09 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

Shit! I have a bad feeling I might be responsible here... Looks like the fix I submitted a while ago results in leaking dma_allocated memory instead of BUGing out. Maybe slightly better in the short term, but slightly more difficult to notice.

Does it take a while before this error starts appearing? Do you run some smartctl commands periodically?

I'd appreciate it if the good Debian kernel team could take a look at this before it goes upstream, but I believe something like the attached patch might fix the bug. This patch is based on v3.2.34, but I'll rebase it on current mainline and submit it upstream with Cc stable if any of you confirms that this looks sane.

Bjørn

From 4c41818461c2604f859d2fecda2657827071f0d4 Mon Sep 17 00:00:00 2001
From: Bjørn Mork <bj...@mork.no>
Date: Tue, 20 Nov 2012 18:17:48 +0100
Subject: [PATCH] megaraid_sas: fix memory leak if SGL has 0 length entries
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

commit 98cb7e44 ([SCSI] megaraid_sas: Sanity check user supplied length before passing it to dma_alloc_coherent()) introduced a memory leak. Memory allocated for entries following zero length SGL entries will not be freed.

Signed-off-by: Bjørn Mork <bj...@mork.no>
---
 drivers/scsi/megaraid/megaraid_sas_base.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c
index 7c471eb..f013432 100644
--- a/drivers/scsi/megaraid/megaraid_sas_base.c
+++ b/drivers/scsi/megaraid/megaraid_sas_base.c
@@ -4886,8 +4886,9 @@ megasas_mgmt_fw_ioctl(struct megasas_instance *instance,
 					     sense, sense_handle);
 	}
 
-	for (i = 0; i < ioc->sge_count && kbuff_arr[i]; i++) {
-		dma_free_coherent(&instance->pdev->dev,
+	for (i = 0; i < ioc->sge_count; i++) {
+		if (kbuff_arr[i])
+			dma_free_coherent(&instance->pdev->dev,
 				    kern_sge32[i].length,
 				    kbuff_arr[i], kern_sge32[i].phys_addr);
 	}
-- 
1.7.10.4
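
The leak described in the patch above can be illustrated outside the kernel. The following is only a userspace sketch, not driver code: the array of pointers is a stand-in for kbuff_arr[], a NULL slot is assumed to represent a zero-length SGL entry that was skipped at allocation time, and the helper names count_freed_old and count_freed_new are hypothetical. Incrementing a counter stands in for the call to dma_free_coherent().

```c
#include <stddef.h>

/* Old cleanup: the loop condition itself tests bufs[i], so the walk
 * ends at the first NULL slot and later buffers are never released. */
static int count_freed_old(void *const bufs[], int sge_count)
{
    int freed = 0;

    for (int i = 0; i < sge_count && bufs[i]; i++)
        freed++;    /* stand-in for dma_free_coherent() */
    return freed;
}

/* Patched cleanup: visit every slot and skip only the NULL ones. */
static int count_freed_new(void *const bufs[], int sge_count)
{
    int freed = 0;

    for (int i = 0; i < sge_count; i++) {
        if (bufs[i])
            freed++;    /* stand-in for dma_free_coherent() */
    }
    return freed;
}
```

With four slots of which only the second is NULL, the old loop releases one buffer and leaves two behind, while the patched loop releases all three.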
Bug#688198: megasas: Failed to alloc kernel SGL buffer for IOCTL - Possible regression from 2.6.32.41~3
I get this periodically (seemingly random - but usually once it starts happening it sticks around for a while, then disappears only to return later) when I'm using LSI's MegaCli64 utility. When the kernel logs the error the MegaCli64 command doesn't return any data either. Ex:

  root@deb015.pod02:~# MegaCli64 -PDList -aALL
  Exit Code: 0x00

Which is paired with a kernel message:

  Nov 20 20:29:50 deb015 kernel: [797020.797811] megasas: Failed to alloc kernel SGL buffer for IOCTL

Other times that same command (or other MegaCli64 commands) will succeed and return the associated data. When this happens, there is no megasas kernel message.

-T

On Nov 20, 2012, at 9:24 AM, Bjørn Mork wrote:

> Todd Fleisher <t...@fleetstreetops.com> writes:
>
>> FYI - I'm seeing this same issue in Ubuntu 12.04:
>>
>> Linux deb015.pod02 3.2.0-32-generic #51-Ubuntu SMP Wed Sep 26 21:33:09 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
>
> Shit! I have a bad feeling I might be responsible here... Looks like the fix I submitted a while ago results in leaking dma_allocated memory instead of BUGing out. Maybe slightly better in the short term, but slightly more difficult to notice.
>
> Does it take a while before this error starts appearing? Do you run some smartctl commands periodically?
>
> I'd appreciate it if the good Debian kernel team could take a look at this before it goes upstream, but I believe something like the attached patch might fix the bug. This patch is based on v3.2.34, but I'll rebase it on current mainline and submit it upstream with Cc stable if any of you confirms that this looks sane.
>
> Bjørn
>
> From 4c41818461c2604f859d2fecda2657827071f0d4 Mon Sep 17 00:00:00 2001
> From: Bjørn Mork <bj...@mork.no>
> Date: Tue, 20 Nov 2012 18:17:48 +0100
> Subject: [PATCH] megaraid_sas: fix memory leak if SGL has 0 length entries
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
>
> commit 98cb7e44 ([SCSI] megaraid_sas: Sanity check user supplied length before passing it to dma_alloc_coherent()) introduced a memory leak. Memory allocated for entries following zero length SGL entries will not be freed.
>
> Signed-off-by: Bjørn Mork <bj...@mork.no>
> ---
>  drivers/scsi/megaraid/megaraid_sas_base.c |    5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c
> index 7c471eb..f013432 100644
> --- a/drivers/scsi/megaraid/megaraid_sas_base.c
> +++ b/drivers/scsi/megaraid/megaraid_sas_base.c
> @@ -4886,8 +4886,9 @@ megasas_mgmt_fw_ioctl(struct megasas_instance *instance,
>  					     sense, sense_handle);
>  	}
>  
> -	for (i = 0; i < ioc->sge_count && kbuff_arr[i]; i++) {
> -		dma_free_coherent(&instance->pdev->dev,
> +	for (i = 0; i < ioc->sge_count; i++) {
> +		if (kbuff_arr[i])
> +			dma_free_coherent(&instance->pdev->dev,
>  				    kern_sge32[i].length,
>  				    kbuff_arr[i], kern_sge32[i].phys_addr);
>  	}
> -- 
> 1.7.10.4

Archive: http://lists.debian.org/84cd168f-0f6e-42e6-a3e2-a1ef620fa...@fleetstreetops.com
Bug#688198: megasas: Failed to alloc kernel SGL buffer for IOCTL - Possible regression from 2.6.32.41~3
Todd Fleisher <t...@fleetstreetops.com> writes:

> I get this periodically (seemingly random - but usually once it starts happening it sticks around for a while, then disappears only to return later) when I'm using LSI's MegaCli64 utility. When the kernel logs the error the MegaCli64 command doesn't return any data either. Ex:
>
>   root@deb015.pod02:~# MegaCli64 -PDList -aALL
>   Exit Code: 0x00
>
> Which is paired with a kernel message:
>
>   Nov 20 20:29:50 deb015 kernel: [797020.797811] megasas: Failed to alloc kernel SGL buffer for IOCTL
>
> Other times that same command (or other MegaCli64 commands) will succeed and return the associated data. When this happens, there is no megasas kernel message.

Thanks. I don't know what the MegaCli64 utility does, but I assume it uses the driver-specific ioctls to send passthrough commands like the smartmontools do. That is consistent with your description.

But I was concluding too fast as usual. The bug I found needs to be fixed, but it cannot be the cause of this problem. If it were, then you would most likely see many other effects on your system. And the same bug has been backported to 2.6.32 as well. And if it had not been, and you are in fact hit by it, then your system would have crashed instead. So that cannot be the problem.

And then I don't know what could have changed between 2.6.32 and 3.2. Could be something outside this driver.

It would be interesting to know something about the size of the buffers which cannot be allocated. But running with debug patches is maybe out of the question? Otherwise you could try running with something like this to get a better picture of why this is failing:

diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c
index f013432..1c0fa1d 100644
--- a/drivers/scsi/megaraid/megaraid_sas_base.c
+++ b/drivers/scsi/megaraid/megaraid_sas_base.c
@@ -4797,6 +4797,7 @@ megasas_mgmt_fw_ioctl(struct megasas_instance *instance,
 		if (!kbuff_arr[i]) {
 			printk(KERN_DEBUG "megasas: Failed to alloc "
 			       "kernel SGL buffer for IOCTL \n");
+			printk(KERN_DEBUG "megasas: iov_len=%d\n", ioc->sgl[i].iov_len);
 			error = -ENOMEM;
 			goto out;
 		}

Bjørn

Archive: http://lists.debian.org/87txsj5smj@nemi.mork.no
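
A possible refinement for anyone trying the debug line above: assuming ioc->sgl[i] is a struct iovec, its iov_len member is a size_t, and the matching conversion for a size_t is %zu rather than %d (printk accepts %zu as well). The following is a minimal userspace sketch of that formatting only. The helper name format_iov_len is hypothetical and is not part of the driver.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: formats a buffer length the way the debug line
 * would, using %zu so that the full size_t value is printed. Returns
 * the number of characters snprintf() would have written. */
static int format_iov_len(char *out, size_t out_size, size_t iov_len)
{
    return snprintf(out, out_size, "megasas: iov_len=%zu\n", iov_len);
}
```

On a 64-bit kernel, %d would also draw a compiler format warning, so switching the specifier keeps a debug build quiet.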