On Sat, Jun 6, 2009 at 1:36 AM, Bart Van Assche <[email protected]> wrote:
> On Sat, Jun 6, 2009 at 1:15 AM, Chris Worley<[email protected]> wrote:
>> Setup: 1.4.1 w/ 3 dual-port QDR cards in each of two hosts, all ports
>> direct connected, opensm running on all port GUIDs from one host, all
>> links active.
>>
>> Problem: ibsrpdm only advertises the first port of the first HCA of the
>> target.
>> Next problem: I can add targets via
>> /sys/class/infiniband_srp/srp-*/add_target on the initiator, but only
>> when naming the two port guids of the first HCA on the target. In
>> testing, both ports are used.
>>
>> Can somebody aim me in the right direction of what/who's stopping
>> after the first HCA?
>
> Please have a look at the /sys/class/infiniband_srpt/srpt-*/login_info
> information on the target. The following information should be
> present:
> * One /sys/class/infiniband_srpt/srpt-* entry per HCA.
> * For each HCA, /sys/class/infiniband_srpt/srpt-${HCA}/login_info
> should contain one line for each port of that HCA.
# cat /sys/class/infiniband_srpt/srpt-*/login_info
tid_ext=0024710000000040,ioc_guid=0024710000000040,pkey=ffff,dgid=fe800000000000000024710000000041,service_id=0024710000000040
tid_ext=0024710000000040,ioc_guid=0024710000000040,pkey=ffff,dgid=fe800000000000000024710000000042,service_id=0024710000000040
tid_ext=0024710000000040,ioc_guid=0024710000000040,pkey=ffff,dgid=fe800000000000000024710000000045,service_id=0024710000000040
tid_ext=0024710000000040,ioc_guid=0024710000000040,pkey=ffff,dgid=fe800000000000000024710000000046,service_id=0024710000000040
tid_ext=0024710000000040,ioc_guid=0024710000000040,pkey=ffff,dgid=fe800000000000000002c903000292af,service_id=0024710000000040
tid_ext=0024710000000040,ioc_guid=0024710000000040,pkey=ffff,dgid=fe800000000000000002c903000292b0,service_id=0024710000000040

Each port has an entry, and the port GUIDs are correct (dgid's), but
the rest of the GUIDs refer to the node GUID of the first IB HCA:
0024710000000040. Is that expected?

>
> On the initiator you can use the information obtained from
> "login_info" (after having replaced tid_ext by id_ext) to log in to
> the target:
> echo ... > /sys/class/infiniband_srp/srp-mlx4_0-1/add_target

Using the first HCA's node GUIDs from my target adds on the initiator
seems to work, but soon after (and not doing anything w/ the devices)
the system panic'd (and remote power cycling is not working). It
doesn't look like the panic was anywhere in IB or SRP modules:

...
SCSI device sdbo: 314287168 512-byte hdwr sectors (160915 MB)
sdbo: Write Protect is off
sdbo: Mode Sense: 83 00 10 08
SCSI device sdbo: drive cache: write back w/ FUA
SCSI device sdbo: 314287168 512-byte hdwr sectors (160915 MB)
sdbo: Write Protect is off
sdbo: Mode Sense: 83 00 10 08
SCSI device sdbo: drive cache: write back w/ FUA
sdbo: unknown partition table
sd 42:0:0:5: Attached scsi disk sdbo
Vendor: SCST_BIO Model: vdisk6 Rev: 102
Type: Direct-Access ANSI SCSI revision: 04
SCSI device sdbp: 314287168 512-byte hdwr sectors (160915 MB)
sdbp: Write Protect is off
sdbp: Mode Sense: 83 00 10 08
SCSI device sdbp: drive cache: write back w/ FUA
host32: ib_srp: connection closed
ib_srp: host32: add qp_in_err timer
SCSI device sdbp: 314287168 512-byte hdwr sectors (160915 MB)
sdbp: Write Protect is off
sdbp: Mode Sense: 83 00 10 08
SCSI device sdbp: drive cache: write back w/ FUA
sdbp: unknown partition table
sd 42:0:0:6: Attached scsi disk sdbp
Vendor: SCST_BIO Model: vdisk7 Rev: 102
Type: Direct-Access ANSI SCSI revision: 04
SCSI device sdbq: 314287168 512-byte hdwr sectors (160915 MB)
sdbq: Write Protect is off
sdbq: Mode Sense: 83 00 10 08
SCSI device sdbq: drive cache: write back w/ FUA
SCSI device sdbq: 314287168 512-byte hdwr sectors (160915 MB)
sdbq: Write Protect is off
sdbq: Mode Sense: 83 00 10 08
SCSI device sdbq: drive cache: write back w/ FUA
sdbq: unknown partition table
sd 42:0:0:7: Attached scsi disk sdbq
host31: ib_srp: srp_qp_in_err_timer called
host31: ib_srp: srp_qp_in_err_timer flushed reset - done
host31: ib_srp: Sending CM DREQ failed
host37: ib_srp: DREQ received - connection closed
host32: ib_srp: srp_qp_in_err_timer called
host32: ib_srp: srp_qp_in_err_timer flushed reset - done
host32: ib_srp: Sending CM DREQ failed
host38: ib_srp: DREQ received - connection closed
host37: ib_srp: connection closed
ib_srp: host37: add qp_in_err timer
host38: ib_srp: connection closed
ib_srp: host38: add qp_in_err timer
host37: ib_srp: srp_qp_in_err_timer called
host37: ib_srp: srp_qp_in_err_timer flushed reset - done
host37: ib_srp: Sending CM DREQ failed
host31: ib_srp: DREQ received - connection closed
host38: ib_srp: srp_qp_in_err_timer called
host38: ib_srp: srp_qp_in_err_timer flushed reset - done
host38: ib_srp: Sending CM DREQ failed
host32: ib_srp: DREQ received - connection closed
host31: ib_srp: connection closed
ib_srp: host31: add qp_in_err timer
host32: ib_srp: connection closed
ib_srp: host32: add qp_in_err timer
host31: ib_srp: srp_qp_in_err_timer called
host31: ib_srp: srp_qp_in_err_timer flushed reset - done
host31: ib_srp: Sending CM DREQ failed
host37: ib_srp: DREQ received - connection closed
host32: ib_srp: srp_qp_in_err_timer called
host32: ib_srp: srp_qp_in_err_timer flushed reset - done
host32: ib_srp: Sending CM DREQ failed
host38: ib_srp: DREQ received - connection closed
host37: ib_srp: connection closed
ib_srp: host37: add qp_in_err timer
host38: ib_srp: connection closed
ib_srp: host38: add qp_in_err timer
host37: ib_srp: srp_qp_in_err_timer called
host37: ib_srp: srp_qp_in_err_timer flushed reset - done
host37: ib_srp: Sending CM DREQ failed
host31: ib_srp: DREQ received - connection closed
host38: ib_srp: srp_qp_in_err_timer called
host38: ib_srp: srp_qp_in_err_timer flushed reset - done
host38: ib_srp: Sending CM DREQ failed
host32: ib_srp: DREQ received - connection closed
host31: ib_srp: connection closed
ib_srp: host31: add qp_in_err timer
host32: ib_srp: connection closed
ib_srp: host32: add qp_in_err timer
host31: ib_srp: srp_qp_in_err_timer called
host31: ib_srp: srp_qp_in_err_timer flushed reset - done
host31: ib_srp: Sending CM DREQ failed
host37: ib_srp: DREQ received - connection closed
host32: ib_srp: srp_qp_in_err_timer called
host32: ib_srp: srp_qp_in_err_timer flushed reset - done
host32: ib_srp: Sending CM DREQ failed
host38: ib_srp: DREQ received - connection closed
host37: ib_srp: connection closed
ib_srp: host37: add qp_in_err timer
host38: ib_srp: connection closed
ib_srp: host38: add qp_in_err timer
host37: ib_srp: srp_qp_in_err_timer called
host37: ib_srp: srp_qp_in_err_timer flushed reset - done
host37: ib_srp: Sending CM DREQ failed
host31: ib_srp: DREQ received - connection closed
host38: ib_srp: srp_qp_in_err_timer called
host38: ib_srp: srp_qp_in_err_timer flushed reset - done
host38: ib_srp: Sending CM DREQ failed
host32: ib_srp: DREQ received - connection closed
host31: ib_srp: connection closed
ib_srp: host31: add qp_in_err timer
host32: ib_srp: connection closed
ib_srp: host32: add qp_in_err timer
host31: ib_srp: srp_qp_in_err_timer called
host31: ib_srp: srp_qp_in_err_timer flushed reset - done
host31: ib_srp: Sending CM DREQ failed
host37: ib_srp: DREQ received - connection closed
host32: ib_srp: srp_qp_in_err_timer called
host32: ib_srp: srp_qp_in_err_timer flushed reset - done
host32: ib_srp: Sending CM DREQ failed
host38: ib_srp: DREQ received - connection closed
host37: ib_srp: connection closed
ib_srp: host37: add qp_in_err timer
host38: ib_srp: connection closed
ib_srp: host38: add qp_in_err timer
host37: ib_srp: srp_qp_in_err_timer called
host37: ib_srp: srp_qp_in_err_timer flushed reset - done
host37: ib_srp: Sending CM DREQ failed
host31: ib_srp: DREQ received - connection closed
host38: ib_srp: srp_qp_in_err_timer called
host38: ib_srp: srp_qp_in_err_timer flushed reset - done
host38: ib_srp: Sending CM DREQ failed
host32: ib_srp: DREQ received - connection closed
host31: ib_srp: connection closed
ib_srp: host31: add qp_in_err timer
host32: ib_srp: connection closed
ib_srp: host32: add qp_in_err timer
host31: ib_srp: srp_qp_in_err_timer called
host31: ib_srp: srp_qp_in_err_timer flushed reset - done
host31: ib_srp: Sending CM DREQ failed
host37: ib_srp: DREQ received - connection closed
host32: ib_srp: srp_qp_in_err_timer called
host32: ib_srp: srp_qp_in_err_timer flushed reset - done
host32: ib_srp: Sending CM DREQ failed
host38: ib_srp: DREQ received - connection closed
host37: ib_srp: connection closed
ib_srp: host37: add qp_in_err timer
host38: ib_srp: connection closed
ib_srp: host38: add qp_in_err timer
host37: ib_srp: srp_qp_in_err_timer called
host37: ib_srp: srp_qp_in_err_timer flushed reset - done
host37: ib_srp: Sending CM DREQ failed
host31: ib_srp: DREQ received - connection closed
host38: ib_srp: srp_qp_in_err_timer called
host38: ib_srp: srp_qp_in_err_timer flushed reset - done
host38: ib_srp: Sending CM DREQ failed
host32: ib_srp: DREQ received - connection closed
host31: ib_srp: connection closed
ib_srp: host31: add qp_in_err timer
host32: ib_srp: connection closed
ib_srp: host32: add qp_in_err timer
host31: ib_srp: srp_qp_in_err_timer called
host31: ib_srp: srp_qp_in_err_timer flushed reset - done
host31: ib_srp: Sending CM DREQ failed
host37: ib_srp: DREQ received - connection closed
host32: ib_srp: srp_qp_in_err_timer called
host32: ib_srp: srp_qp_in_err_timer flushed reset - done
host32: ib_srp: Sending CM DREQ failed
host38: ib_srp: DREQ received - connection closed
host37: ib_srp: connection closed
ib_srp: host37: add qp_in_err timer
host38: ib_srp: connection closed
ib_srp: host38: add qp_in_err timer
host37: ib_srp: srp_qp_in_err_timer called
host37: ib_srp: srp_qp_in_err_timer flushed reset - done
host37: ib_srp: Sending CM DREQ failed
host31: ib_srp: DREQ received - connection closed
host38: ib_srp: srp_qp_in_err_timer called
host38: ib_srp: srp_qp_in_err_timer flushed reset - done
host38: ib_srp: Sending CM DREQ failed
host32: ib_srp: DREQ received - connection closed
host31: ib_srp: connection closed
ib_srp: host31: add qp_in_err timer
host32: ib_srp: connection closed
ib_srp: host32: add qp_in_err timer
host31: ib_srp: srp_qp_in_err_timer called
host31: ib_srp: srp_qp_in_err_timer flushed reset - done
host31: ib_srp: Sending CM DREQ failed
host37: ib_srp: DREQ received - connection closed
host32: ib_srp: srp_qp_in_err_timer called
host32: ib_srp: srp_qp_in_err_timer flushed reset - done
host32: ib_srp: Sending CM DREQ failed
host38: ib_srp: DREQ received - connection closed
host37: ib_srp: connection closed
ib_srp: host37: add qp_in_err timer
host38: ib_srp: connection closed
ib_srp: host38: add qp_in_err timer
host37: ib_srp: srp_qp_in_err_timer called
host37: ib_srp: srp_qp_in_err_timer flushed reset - done
host37: ib_srp: Sending CM DREQ failed
host31: ib_srp: DREQ received - connection closed
host38: ib_srp: srp_qp_in_err_timer called
host38: ib_srp: srp_qp_in_err_timer flushed reset - done
host38: ib_srp: Sending CM DREQ failed
host32: ib_srp: DREQ received - connection closed
host31: ib_srp: connection closed
ib_srp: host31: add qp_in_err timer
host32: ib_srp: connection closed
ib_srp: host32: add qp_in_err timer
host31: ib_srp: srp_qp_in_err_timer called
host31: ib_srp: srp_qp_in_err_timer flushed reset - done
host31: ib_srp: Sending CM DREQ failed
host37: ib_srp: DREQ received - connection closed
host32: ib_srp: srp_qp_in_err_timer called
host32: ib_srp: srp_qp_in_err_timer flushed reset - done
host32: ib_srp: Sending CM DREQ failed
host38: ib_srp: DREQ received - connection closed
host37: ib_srp: connection closed
ib_srp: host37: add qp_in_err timer
host38: ib_srp: connection closed
ib_srp: host38: add qp_in_err timer
host37: ib_srp: srp_qp_in_err_timer called
host37: ib_srp: srp_qp_in_err_timer flushed reset - done
host37: ib_srp: Sending CM DREQ failed
host31: ib_srp: DREQ received - connection closed
host38: ib_srp: srp_qp_in_err_timer called
host38: ib_srp: srp_qp_in_err_timer flushed reset - done
host38: ib_srp: Sending CM DREQ failed
host32: ib_srp: DREQ received - connection closed
host31: ib_srp: connection closed
ib_srp: host31: add qp_in_err timer
host32: ib_srp: connection closed
ib_srp: host32: add qp_in_err timer
host31: ib_srp: srp_qp_in_err_timer called
host31: ib_srp: srp_qp_in_err_timer flushed reset - done
host31: ib_srp: Sending CM DREQ failed
host37: ib_srp: DREQ received - connection closed
host32: ib_srp: srp_qp_in_err_timer called
host32: ib_srp: srp_qp_in_err_timer flushed reset - done
host32: ib_srp: Sending CM DREQ failed
host38: ib_srp: DREQ received - connection closed
host37: ib_srp: connection closed
ib_srp: host37: add qp_in_err timer
host38: ib_srp: connection closed
ib_srp: host38: add qp_in_err timer
host37: ib_srp: srp_qp_in_err_timer called
host37: ib_srp: srp_qp_in_err_timer flushed reset - done
host37: ib_srp: Sending CM DREQ failed
host31: ib_srp: DREQ received - connection closed
host38: ib_srp: srp_qp_in_err_timer called
host38: ib_srp: srp_qp_in_err_timer flushed reset - done
host38: ib_srp: Sending CM DREQ failed
host32: ib_srp: DREQ received - connection closed
host31: ib_srp: connection closed
ib_srp: host31: add qp_in_err timer
host32: ib_srp: connection closed
ib_srp: host32: add qp_in_err timer
host31: ib_srp: srp_qp_in_err_timer called
host31: ib_srp: srp_qp_in_err_timer flushed reset - done
host31: ib_srp: Sending CM DREQ failed
host37: ib_srp: DREQ received - connection closed
host32: ib_srp: srp_qp_in_err_timer called
host32: ib_srp: srp_qp_in_err_timer flushed reset - done
host32: ib_srp: Sending CM DREQ failed
host38: ib_srp: DREQ received - connection closed
host37: ib_srp: connection closed
ib_srp: host37: add qp_in_err timer
host38: ib_srp: connection closed
ib_srp: host38: add qp_in_err timer
host37: ib_srp: srp_qp_in_err_timer called
host37: ib_srp: srp_qp_in_err_timer flushed reset - done
host37: ib_srp: Sending CM DREQ failed
host31: ib_srp: DREQ received - connection closed
host38: ib_srp: srp_qp_in_err_timer called
host38: ib_srp: srp_qp_in_err_timer flushed reset - done
host38: ib_srp: Sending CM DREQ failed
host32: ib_srp: DREQ received - connection closed
host31: ib_srp: connection closed
ib_srp: host31: add qp_in_err timer
host32: ib_srp: connection closed
ib_srp: host32: add qp_in_err timer
host31: ib_srp: Sending CM DREQ failed
host32: ib_srp: Sending CM DREQ failed
Unable to handle kernel paging request at ffffffff882539ee RIP:
[<ffffffff882539ee>]
PGD 203027 PUD 205027 PMD 407f4f067 PTE 0
Oops: 0010 [1] PREEMPT SMP
CPU 0
Modules linked in: mlx4_ib mlx4_core ib_uverbs ib_umad ib_mad ib_core ppdev parport_pc lp parport button ac battery tsdev dm_snapshot dm_mirror dm_mod loop i2c_i801 psmouse i2c_core floppy serio_raw pcspkr shpchp pci_hotplug evdev ext2 mbcache ide_cd cdrom piix ata_piix libata sd_mod generic ehci_hcd ide_core uhci_hcd e1000 qla2xxx firmware_class scsi_transport_fc scsi_mod thermal processor fan
Pid: 0, comm: swapper Not tainted 2.6.18-6-clim-amd64 #1
RIP: 0010:[<ffffffff882539ee>] [<ffffffff882539ee>]
RSP: 0018:ffffffff80597ef8 EFLAGS: 00010246
RAX: ffffffff80625fd8 RBX: ffff8103f12584f0 RCX: ffff8103f125a840
RDX: ffffffff80597f00 RSI: 1144ab87d59a6f6a RDI: ffff8103f12584f0
RBP: ffffffff805cc400 R08: 0000000000000000 R09: ffffffff80597ed8
R10: 00004131a65e699e R11: 0000000000000000 R12: 0000000000000102
R13: ffffffff882539ee R14: 0000000000000000 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffffffff80616000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: ffffffff882539ee CR3: 000000041ac7e000 CR4: 00000000000006e0
Process swapper (pid: 0, threadinfo ffffffff80624000, task ffffffff805144c0)
Stack: ffffffff8028de7c ffffffff80597f00 ffffffff80597f00 ffff810001035400
0000000000000001 ffffffff80619110 000000000000000a 0000000000000000
ffffffff8020ffbc ffffffff805144c0 0000000000000046 ffffffff80597f78
Call Trace:
<IRQ> [<ffffffff8028de7c>] run_timer_softirq+0x13b/0x1be
[<ffffffff8020ffbc>] __do_softirq+0x52/0xcb
[<ffffffff8025c31c>] call_softirq+0x1c/0x28
[<ffffffff8026990d>] do_softirq+0x2c/0x7d
[<ffffffff8028a7a1>] irq_exit+0x3f/0x4c
[<ffffffff80272d19>] smp_apic_timer_interrupt+0x3d/0x3f
[<ffffffff80255a47>] mwait_idle+0x0/0x4a
[<ffffffff8025bcba>] apic_timer_interrupt+0x66/0x6c
<EOI> [<ffffffff80255a7d>] mwait_idle+0x36/0x4a
[<ffffffff80247a78>] cpu_idle+0x92/0xc9
[<ffffffff80267617>] rest_init+0x3f/0x41
[<ffffffff8062e8bd>] start_kernel+0x241/0x246
[<ffffffff8062e288>] _sinittext+0x288/0x28c
Code: Bad RIP value.
RIP [<ffffffff882539ee>] RSP <ffffffff80597ef8>
CR2: ffffffff882539ee
<0>Kernel panic - not syncing: Aiee, killing interrupt handler!

I'll go in and power-cycle this in a few hours and try again.

Chris

>
> Bart.
> _______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
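For anyone following the thread: the login step Bart describes above (take a login_info line from the target, replace tid_ext by id_ext, and write the result to add_target on the initiator) might be sketched roughly as below. This is only an illustration under stated assumptions. The function name login_info_to_add_target is made up for this example and is not part of ib_srp or ib_srpt, and the sample line is simply the first entry from the login_info output quoted earlier.

```shell
#!/bin/sh
# Hypothetical helper (name invented for this sketch): read login_info
# lines on stdin and rename the leading tid_ext key to id_ext, as the
# reply above suggests. All other key=value pairs pass through unchanged.
login_info_to_add_target() {
    sed 's/^tid_ext=/id_ext=/'
}

# Example input: the first login_info line quoted earlier in this thread.
echo 'tid_ext=0024710000000040,ioc_guid=0024710000000040,pkey=ffff,dgid=fe800000000000000024710000000041,service_id=0024710000000040' |
    login_info_to_add_target
# prints: id_ext=0024710000000040,ioc_guid=0024710000000040,pkey=ffff,dgid=fe800000000000000024710000000041,service_id=0024710000000040
```

Since login_info lives on the target and add_target on the initiator, the converted lines would still have to be copied across by hand (or over ssh) and then echoed one at a time into the add_target file of the chosen initiator port, for example the /sys/class/infiniband_srp/srp-mlx4_0-1/add_target path mentioned above.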
