Re: [openib-general] ipath and current git woes
On Wed, Jan 31, 2007 at 04:42:25PM -0800, Robert Walsh wrote:
> Jason Gunthorpe wrote:
> >Has anyone been able to use ipath with the current latest git
> >everything?
>
> We're working on getting this up to date right now. Give us a couple of
> days and we'll have some new patches ready.

OK. Things are working ok here using the same kernel and a 64 bit OFED 1.1 user space built in a chroot. That makes sense after reading Roland's analysis...

Thanks,
Jason

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [openib-general] ipath and current git woes
> After applying that patch the user space consumers load but we got a
> kernel oops when we tried to run a test here :<
>
> Unable to handle kernel NULL pointer dereference at 0918 RIP:
> [] :ib_ipath:ipath_mmap+0x37/0x95

So I had a look at this, and it seems that there are two bugs that lead to this.

First of all, libipathverbs gets a response from the kernel that has a 64-bit kernel address in it, and passes that back into a call to mmap(), where it uses that address as the offset. On 32-bit userspace, that chops off the high bits of the address and so the ipath kernel driver can't find the address in its list. So that explains why things don't work.

And unfortunately the obvious fix for libipathverbs to use mmap64() instead of mmap() doesn't work, because on Linux, mmap64() is implemented with the mmap2 system call, which just allows the offset to be 12 bits bigger -- so it only gets you to 44 bits, which is not enough to reach a 64-bit kernel address (which is typically something like 0xc2072000). So you probably want to use something like a 32-bit serial number to point at your buffers or something like that.

The oops is caused by another more serious problem. Obviously a buggy libipathverbs shouldn't be able to crash the kernel, because even if libipathverbs is fixed then malicious userspace could do the same thing too. It turns out that all the handling of pending_mmaps in the ipath driver is not really careful about userspace screwing it up. When userspace creates a CQ, the CQ buffer is added to the device-wide list of pending mmaps. Of course 32-bit userspace never succeeds in mapping that CQ, so it stays on the list (the only way it gets removed is if it is successfully mmapped). But then the destroy CQ operation sees that the mmap is pending, and frees the structure holding the information (without removing it from the list). And of course when that memory gets reused, then the pending mmap list gets corrupted, etc etc.
Of course the pending mmap bug is ugly to fix with the current data structure -- the list of pending mmaps is singly-linked, which means I have to walk the whole list to delete an entry. It also makes the list walking in ipath_mmap() unnecessarily obfuscated. I think it's much better to just use the standard kernel list_head stuff if you're going to delete things from the middle of the list, rather than implementing your own singly-linked list. Sure, it costs an extra pointer in each entry, but no one ever has to worry about whether you're deleting things correctly, etc.

There's some other silly stuff I noticed too, like:

    grep -n mmap_cnt *.[ch] /dev/null
    ipath_cq.c:232: ip->mmap_cnt = 0;
    ipath_mmap.c:63:ip->mmap_cnt++;
    ipath_mmap.c:70:ip->mmap_cnt--;
    ipath_qp.c:837: ip->mmap_cnt = 0;
    ipath_srq.c:162:ip->mmap_cnt = 0;
    ipath_verbs.h:178: unsigned mmap_cnt;

umm -- no one ever looks at mmap_cnt (there's a kref too), so why keep it at all?

So QLogic guys -- please fix this up!

 - R.
Re: [openib-general] ipath and current git woes
Jason Gunthorpe wrote:
> Has anyone been able to use ipath with the current latest git
> everything?

We're working on getting this up to date right now. Give us a couple of days and we'll have some new patches ready.

Regards,
Robert.
[openib-general] ipath and current git woes
Has anyone been able to use ipath with the current latest git everything?

The libipathverbs git repository seems to be missing a patch from Roland to make it work with libibverbs.2 in an email titled:

[openib-general] [PATCH 3/7] libipathverbs: Update libipathverbs for new libibverbs driver handling

After applying that patch the user space consumers load but we got a kernel oops when we tried to run a test here :<

    Unable to handle kernel NULL pointer dereference at 0918 RIP:
    [] :ib_ipath:ipath_mmap+0x37/0x95
    PGD 3ad46067 PUD 3ad4f067 PMD 0
    Oops: [1]
    CPU 0
    Modules linked in: usb_storage skge bitrev crc32 ib_ipath k8temp hwmon forcedeth ehci_hcd ohci_hcd usbcore i2c_nforce2 i2c_core ib_uverbs ib_umad ib_mad ib_core
    Pid: 4009, comm: ib_rdma_lat Not tainted 2.6.20-rc4-gf3a2c3ee-dirty #6
    RIP: 0010:[] [] :ib_ipath:ipath_mmap+0x37/0x95
    RSP: :81003aaa3e88 EFLAGS: 00010002
    RAX: 81003c434000 RBX: 0910 RCX: 1000
    RDX: 002b3000 RSI: 81003bcab440 RDI: 81003bc2d840
    RBP: 81003b7af918 R08: 81003aaa3f08 R09: 81003aa38c98
    R10: 81003aa38c90 R11: 88074c3f R12: ffea
    R13: 81003ac496c0 R14: f7ee3000 R15: 1000
    FS: () GS:8053(0063) knlGS:f7d8c6c0
    CS: 0010 DS: 002b ES: 002b CR0: 8005003b
    CR2: 0918 CR3: 3aa2f000 CR4: 06e0
    Process ib_rdma_lat (pid: 4009, threadinfo 81003aaa2000, task 81003c0495e0)
    Stack: 81003b7af918 001000fb ffea 80250151
     81003be9f440 81003bc2d140 0028 ff99df20
     81003b87d818 81003dc04840
    Call Trace:
     [] do_mmap_pgoff+0x4d5/0x739
     [] sys32_mmap2+0x76/0x9e
     [] ia32_sysret+0x0/0xa
    Code: 48 3b 7b 08 75 46 48 3b 53 10 75 40 8b 43 1c 48 39 c1 77 40
    RIP [] :ib_ipath:ipath_mmap+0x37/0x95
     RSP
    CR2: 0918

This is with a PCI-E QLogic card:

    :03:00.0 InfiniBand: Unknown device 1fc1:0010 (rev 01)

Anyone have any clues? One notable thing is that I have a 32 bit user space and a 64 bit kernel. I'll try a 64 bit user space tomorrow in case there is something wrong with 32-bit compatibility...
The last time we had these cards working was with OFED 1.1 on 64 bit FC4 using a linus kernel (2.6.18 I think)..

Thanks,
Jason