On Wed, 13 Feb 2008 13:43:24 -0800
Tim Pepper <[EMAIL PROTECTED]> wrote:

> We recently upgraded a production x86_64 machine with serveraid
> cards to 2.6.24 and noted that /proc/scsi/scsi showed garbage for our
> serveraid service processors.  sg_inq also returned garbage from the
> service processors' sg devices.  After a few iterations I started seeing
> meaningful stuff in the garbage.  Not sure if it was returning live memory
> or just unzero'd.  Either way not good so we went back to a known good,
> older kernel and tried to repro on a similar machine.  We got different,
> but still bad results in terms of pointing at memory badness.
> 
> FWIW, the original machine had the following hardware:
>     scsi0 : IBM PCI ServeRAID 7.12.05  Build 761 <ServeRAID 4H>
>     scsi1 : IBM PCI ServeRAID 7.12.05  Build 761 <ServeRAID 4M>
> and the repros have been on a machine with just:
>     scsi0 : IBM PCI ServeRAID 7.12.05  Build 761 <ServeRAID 4Mx>
> 
> On the repro machine I'm getting a hang on ips driver load with the following
> logged:
> 
> Feb 13 13:16:08 ipstest kernel: [  915.236563] scsi3 : IBM PCI ServeRAID 
> 7.12.05  Build 761 <ServeRAID 4Mx>
> Feb 13 13:16:08 ipstest kernel: [  915.236839] Unable to handle kernel NULL 
> pointer dereference at 0000000000000000 RIP:
> Feb 13 13:16:08 ipstest kernel: [  915.236863]  [check_addr+16/80] 
> check_addr+0x10/0x50
> Feb 13 13:16:08 ipstest kernel: [  915.237209] PGD 79fff067 PUD 7a898067 PMD 0
> Feb 13 13:16:08 ipstest kernel: [  915.237341] Oops: 0000 [1] SMP
> Feb 13 13:16:08 ipstest kernel: [  915.237463] CPU 1
> Feb 13 13:16:08 ipstest kernel: [  915.239436] Modules linked in: ips aic94xx
> Feb 13 13:16:08 ipstest kernel: [  915.239559] Pid: 5213, comm: scsi_scan_3 
> Not tainted 2.6.23-ips_as_module #3
> Feb 13 13:16:08 ipstest kernel: [  915.239692] RIP: 0010:[check_addr+16/80]  
> [check_addr+16/80] check_addr+0x10/0x50
> Feb 13 13:16:08 ipstest kernel: [  915.239932] RSP: 0018:ffff810076d87900  
> EFLAGS: 00010082
> Feb 13 13:16:08 ipstest kernel: [  915.240059] RAX: 0000000000000000 RBX: 
> ffff81007b636300 RCX: 0000000000000024
> Feb 13 13:16:08 ipstest kernel: [  915.240196] RDX: 000000007b636b00 RSI: 
> ffffffff8077cde0 RDI: ffffffff806c4ed5
> Feb 13 13:16:08 ipstest kernel: [  915.240332] RBP: ffff810076d87900 R08: 
> 0000000000000500 R09: 0000000000000000
> Feb 13 13:16:08 ipstest kernel: [  915.240468] R10: ffff81007aa33b40 R11: 
> 0000000000000060 R12: 0000000000000000
> Feb 13 13:16:08 ipstest kernel: [  915.240605] R13: 0000000000000001 R14: 
> ffffffff8077cde0 R15: ffff81007aa33a80
> Feb 13 13:16:08 ipstest kernel: [  915.240741] FS:  0000000000000000(0000) 
> GS:ffff810001039300(0000) knlGS:0000000000000000
> Feb 13 13:16:08 ipstest kernel: [  915.240981] CS:  0010 DS: 0018 ES: 0018 
> CR0: 000000008005003b
> Feb 13 13:16:08 ipstest kernel: [  915.241111] CR2: 0000000000000000 CR3: 
> 0000000078a98000 CR4: 00000000000006e0
> Feb 13 13:16:08 ipstest kernel: [  915.241248] DR0: 0000000000000000 DR1: 
> 0000000000000000 DR2: 0000000000000000
> Feb 13 13:16:08 ipstest kernel: [  915.241384] DR3: 0000000000000000 DR6: 
> 00000000ffff0ff0 DR7: 0000000000000400
> Feb 13 13:16:08 ipstest kernel: [  915.241520] Process scsi_scan_3 (pid: 
> 5213, threadinfo ffff810076d86000, task ffff81007be26720)
> Feb 13 13:16:08 ipstest kernel: [  915.241761] Stack:  ffff810076d87930 
> ffffffff802125c3 ffff81007aa33a80 ffff81007480cf50
> Feb 13 13:16:08 ipstest kernel: [  915.242006]  0000000000000000 
> ffff81007ba38ca8 ffff810076d87940 ffffffff8046fb42
> Feb 13 13:16:08 ipstest kernel: [  915.242248]  ffff810076d879c0 
> ffffffff8801c2ee ffff81007aa33af0 000000017aa33af0
> Feb 13 13:16:08 ipstest kernel: [  915.242389] Call Trace:
> Feb 13 13:16:08 ipstest kernel: [  915.242606]  [nommu_map_sg+115/144] 
> nommu_map_sg+0x73/0x90
> Feb 13 13:16:08 ipstest kernel: [  915.242736]  [scsi_dma_map+66/96] 
> scsi_dma_map+0x42/0x60
> Feb 13 13:16:08 ipstest kernel: [  915.242867]  [_end+124884230/2127548952] 
> :ips:ips_next+0x33e/0xc00
> Feb 13 13:16:08 ipstest kernel: [  915.242986]  [scsi_done+0/48] 
> scsi_done+0x0/0x30
> Feb 13 13:16:08 ipstest kernel: [  915.243114]  [_end+124896894/2127548952] 
> :ips:ips_queue+0x106/0x1f0
> Feb 13 13:16:08 ipstest kernel: [  915.243240]  [scsi_dispatch_cmd+498/784] 
> scsi_dispatch_cmd+0x1f2/0x310
> Feb 13 13:16:08 ipstest kernel: [  915.243370]  [scsi_request_fn+491/976] 
> scsi_request_fn+0x1eb/0x3d0
> Feb 13 13:16:08 ipstest kernel: [  915.243500]  
> [__generic_unplug_device+37/48] __generic_unplug_device+0x25/0x30
> Feb 13 13:16:08 ipstest kernel: [  915.243630]  
> [blk_execute_rq_nowait+99/176] blk_execute_rq_nowait+0x63/0xb0
> Feb 13 13:16:08 ipstest kernel: [  915.243761]  [blk_execute_rq+122/224] 
> blk_execute_rq+0x7a/0xe0
> Feb 13 13:16:08 ipstest kernel: [  915.243889]  [scsi_execute+240/288] 
> scsi_execute+0xf0/0x120
> Feb 13 13:16:08 ipstest kernel: [  915.244016]  [scsi_execute_req+134/240] 
> scsi_execute_req+0x86/0xf0
> Feb 13 13:16:08 ipstest kernel: [  915.244145]  
> [scsi_probe_and_add_lun+594/3472] scsi_probe_and_add_lun+0x252/0xd90
> Feb 13 13:16:08 ipstest kernel: [  915.244279]  [sas_expander_match+27/160] 
> sas_expander_match+0x1b/0xa0
> Feb 13 13:16:08 ipstest kernel: [  915.244412]  [get_device+23/32] 
> get_device+0x17/0x20
> Feb 13 13:16:08 ipstest kernel: [  915.244534]  [__scsi_scan_target+220/1696] 
> __scsi_scan_target+0xdc/0x6a0
> Feb 13 13:16:08 ipstest kernel: [  915.244665]  [enqueue_entity+172/432] 
> enqueue_entity+0xac/0x1b0
> Feb 13 13:16:08 ipstest kernel: [  915.244793]  [update_curr_load+135/160] 
> update_curr_load+0x87/0xa0
> Feb 13 13:16:08 ipstest kernel: [  915.244923]  
> [__check_preempt_curr_fair+107/128] __check_preempt_curr_fair+0x6b/0x80
> Feb 13 13:16:08 ipstest kernel: [  915.245057]  [update_curr+258/272] 
> update_curr+0x102/0x110
> Feb 13 13:16:08 ipstest kernel: [  915.245186]  [scsi_scan_channel+139/160] 
> scsi_scan_channel+0x8b/0xa0
> Feb 13 13:16:08 ipstest kernel: [  915.245315]  
> [scsi_scan_host_selected+158/352] scsi_scan_host_selected+0x9e/0x160
> Feb 13 13:16:08 ipstest kernel: [  915.245447]  [do_scan_async+0/320] 
> do_scan_async+0x0/0x140
> Feb 13 13:16:08 ipstest kernel: [  915.245574]  [do_scsi_scan_host+126/128] 
> do_scsi_scan_host+0x7e/0x80
> Feb 13 13:16:08 ipstest kernel: [  915.245703]  [do_scan_async+23/320] 
> do_scan_async+0x17/0x140
> Feb 13 13:16:08 ipstest kernel: [  915.245832]  [do_scan_async+0/320] 
> do_scan_async+0x0/0x140
> Feb 13 13:16:08 ipstest kernel: [  915.245962]  [kthread+77/128] 
> kthread+0x4d/0x80
> Feb 13 13:16:08 ipstest kernel: [  915.246086]  [child_rip+10/18] 
> child_rip+0xa/0x12
> Feb 13 13:16:08 ipstest kernel: [  915.246209]  [kthread+0/128] 
> kthread+0x0/0x80
> Feb 13 13:16:08 ipstest kernel: [  915.246333]  [child_rip+0/18] 
> child_rip+0x0/0x12
> Feb 13 13:16:08 ipstest kernel: [  915.246457]
> Feb 13 13:16:08 ipstest kernel: [  915.246564]
> Feb 13 13:16:08 ipstest kernel: [  915.246565] Code: 4c 8b 00 48 8d 04 0a 4c 
> 39 c0 76 2b b8 fe ff ff ff 31 f6 49
> Feb 13 13:16:08 ipstest kernel: [  915.246933] RIP  [check_addr+16/80] 
> check_addr+0x10/0x50
> Feb 13 13:16:08 ipstest kernel: [  915.247062]  RSP <ffff810076d87900>
> Feb 13 13:16:08 ipstest kernel: [  915.247181] CR2: 0000000000000000
> 
> I was able to narrow it down inasmuch as, with this commit reverted, the
> machine seems to run fine:
>     commit 2f4cf91cc0a1f32f75e1fa0a4d70a9bc7340a302
>     [SCSI] ips: convert to use the data buffer accessors
> 
> Nothing looks overly suspicious in that patch per se, although based
> on the list archives it looks like related changes caused other drivers
> grief.  I've tried a variety of things to get a little more debug info,
> but to no avail.  If anybody has any suggestions, I'd appreciate them!

Really sorry about the bug.

I have some doubts about the breakup code, though I'm not sure you are
actually hitting that path. Does reverting only the breakup part fix the
problem? The patch below is against 2.6.24.


diff --git a/drivers/scsi/ips.c b/drivers/scsi/ips.c
index 5c5a9b2..acabb19 100644
--- a/drivers/scsi/ips.c
+++ b/drivers/scsi/ips.c
@@ -3251,34 +3251,52 @@ ips_done(ips_ha_t * ha, ips_scb_t * scb)
                 * the rest of the data and continue.
                 */
                if ((scb->breakup) || (scb->sg_break)) {
-                        struct scatterlist *sg;
-                        int i, sg_dma_index, ips_sg_index = 0;
-
                        /* we had a data breakup */
                        scb->data_len = 0;
 
-                        sg = scsi_sglist(scb->scsi_cmd);
-
-                        /* Spin forward to last dma chunk */
-                        sg_dma_index = scb->breakup;
-                        for (i = 0; i < scb->breakup; i++)
-                                sg = sg_next(sg);
-
-                       /* Take care of possible partial on last chunk */
-                        ips_fill_scb_sg_single(ha,
-                                               sg_dma_address(sg),
-                                               scb, ips_sg_index++,
-                                               sg_dma_len(sg));
-
-                        for (; sg_dma_index < scsi_sg_count(scb->scsi_cmd);
-                             sg_dma_index++, sg = sg_next(sg)) {
-                                if (ips_fill_scb_sg_single
-                                    (ha,
-                                     sg_dma_address(sg),
-                                     scb, ips_sg_index++,
-                                     sg_dma_len(sg)) < 0)
-                                        break;
-                        }
+                       if (scb->sg_count) {
+                               /* S/G request */
+                               struct scatterlist *sg;
+                               int ips_sg_index = 0;
+                               int sg_dma_index;
+
+                               sg = scb->scsi_cmd->request_buffer;
+
+                               /* Spin forward to last dma chunk */
+                               sg_dma_index = scb->breakup;
+
+                               /* Take care of possible partial on last chunk */
+                               ips_fill_scb_sg_single(ha,
+                                                      sg_dma_address(&sg
+                                                                     [sg_dma_index]),
+                                                      scb, ips_sg_index++,
+                                                      sg_dma_len(&sg
+                                                                 [sg_dma_index]));
+
+                               for (; sg_dma_index < scb->sg_count;
+                                    sg_dma_index++) {
+                                       if (ips_fill_scb_sg_single
+                                           (ha,
+                                            sg_dma_address(&sg[sg_dma_index]),
+                                            scb, ips_sg_index++,
+                                            sg_dma_len(&sg[sg_dma_index])) < 0)
+                                               break;
+
+                               }
+
+                       } else {
+                               /* Non S/G Request */
+                               (void) ips_fill_scb_sg_single(ha,
+                                                             scb->
+                                                             data_busaddr +
+                                                             (scb->sg_break *
+                                                              ha->max_xfer),
+                                                             scb, 0,
+                                                             scb->scsi_cmd->
+                                                             request_bufflen -
+                                                             (scb->sg_break *
+                                                              ha->max_xfer));
+                       }
 
                        scb->dcdb.transfer_length = scb->data_len;
                        scb->dcdb.cmd_attribute |=
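
For anyone comparing the two versions of the "Spin forward to last dma
chunk" step in the diff above: the removed lines step through the list
with sg_next(), while the restored lines index the table directly with
&sg[sg_dma_index]. The following is only a userspace sketch, not kernel
code. struct model_sg, model_sg_next() and the other names are
hypothetical stand-ins, and it assumes a single flat (unchained) table,
in which case both forms should land on the same entry.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct scatterlist. */
struct model_sg {
    unsigned long dma_address;
    unsigned int dma_len;
};

/*
 * Stand-in for sg_next(). Assumption: the table is one flat array,
 * so the next entry is simply the next array element.
 */
struct model_sg *model_sg_next(struct model_sg *sg)
{
    return sg + 1;
}

/* Indexed style, as in the lines the patch restores. */
struct model_sg *spin_forward_indexed(struct model_sg *table, int breakup)
{
    return &table[breakup];
}

/* Accessor style, as in the lines the patch removes. */
struct model_sg *spin_forward_walk(struct model_sg *table, int breakup)
{
    struct model_sg *sg = table;
    int i;

    for (i = 0; i < breakup; i++)
        sg = model_sg_next(sg);
    return sg;
}

/* Returns 1 if both styles reach the same entry for every breakup < count. */
int styles_agree(struct model_sg *table, int count)
{
    int i;

    for (i = 0; i < count; i++)
        if (spin_forward_indexed(table, i) != spin_forward_walk(table, i))
            return 0;
    return 1;
}
```

Calling styles_agree() on a small flat table is enough to check the
model; it says nothing about chained lists, which this sketch does not
cover.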