Re: [openib-general] ipath and current git woes

2007-02-01 Thread Jason Gunthorpe
On Wed, Jan 31, 2007 at 04:42:25PM -0800, Robert Walsh wrote:
> Jason Gunthorpe wrote:
> >Has anyone been able to use ipath with the current latest git
> >everything?
> 
> We're working on getting this up to date right now.  Give us a couple of 
> days and we'll have some new patches ready.

OK. Things are working ok here using the same kernel and a 64 bit OFED
1.1 user space built in a chroot. That makes sense after reading
Roland's analysis...

Thanks,
Jason

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] ipath and current git woes

2007-02-01 Thread Roland Dreier
 > After applying that patch the user space consumers load but we got a
 > kernel oops when we tried to run a test here :<
 > 
 > Unable to handle kernel NULL pointer dereference at 0918 RIP: 
 >  [] :ib_ipath:ipath_mmap+0x37/0x95

So I had a look at this, and it seems that there are two bugs that
lead to this.

First of all, libipathverbs gets a response from the kernel that has a
64-bit kernel address in it, and passes that back into a call to
mmap(), where it uses that address as the offset.  On 32-bit
userspace, that chops off the high bits of the address and so the
ipath kernel driver can't find the address in its list.
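
To illustrate the truncation described above, here is a minimal sketch in plain userspace C. The helper names and the example address are made up for illustration and do not come from libipathverbs or the ipath driver; the only assumption is that a 32-bit userspace passes the offset through a 32-bit type.

```c
#include <stdint.h>

/* Hypothetical helper: what is left of a 64-bit kernel address once it
 * has been squeezed through a 32-bit offset argument. */
static uint32_t truncate_to_32bit_offset(uint64_t kernel_addr)
{
    return (uint32_t)kernel_addr;   /* the high 32 bits are discarded */
}

/* Returns 1 if the value comes out of the 32-bit offset unchanged,
 * 0 if the driver would see a different number than it handed out. */
static int survives_32bit_offset(uint64_t kernel_addr)
{
    return (uint64_t)truncate_to_32bit_offset(kernel_addr) == kernel_addr;
}
```

With an illustrative address such as 0xffff810012345000, the driver would be asked to look up 0x12345000, which is not on its list.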

So that explains why things don't work.  And unfortunately the obvious
fix for libipathverbs to use mmap64() instead of mmap() doesn't work,
because on Linux, mmap64() is implemented with the mmap2 system call,
which just allows the offset to be 12 bits bigger -- so it only gets
you to 44 bits, which is not enough to reach a 64-bit kernel address
(which is typically something like 0xc2072000).  So you
probably want to use something like a 32-bit serial number to point at
your buffers or something like that.
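
As a rough sketch of the arithmetic and of the serial-number idea, assuming 4096-byte pages and a 32-bit page-offset argument for mmap2: the names below (reachable_via_mmap2, serial_source, next_mmap_offset) are hypothetical and are not taken from the ipath code.

```c
#include <stdint.h>

#define ASSUMED_PAGE_SHIFT 12   /* assumption: 4096-byte pages */

/* mmap2 takes the offset in pages, in a 32-bit argument, so the largest
 * byte offset it can express is just under 2^(32+12) = 2^44. */
static int reachable_via_mmap2(uint64_t byte_offset)
{
    return (byte_offset >> ASSUMED_PAGE_SHIFT) <= UINT32_MAX;
}

/* Hypothetical per-device counter that hands out small, page-aligned
 * offsets instead of exposing kernel addresses to userspace. */
struct serial_source {
    uint32_t next_serial;
};

static uint32_t next_mmap_offset(struct serial_source *s)
{
    uint32_t serial = s->next_serial++;

    /* Page-aligned so it is a valid mmap() offset. A real driver would
     * also have to handle wraparound and serialize access. */
    return serial << ASSUMED_PAGE_SHIFT;
}
```

An offset produced this way fits in 32 bits, so it passes through a 32-bit mmap() unchanged, and the driver can match it against the value it stored when the buffer was created.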

The oops is caused by another more serious problem.  Obviously a buggy
libipathverbs shouldn't be able to crash the kernel, because even if
libipathverbs is fixed then malicious userspace could do the same
thing too.

It turns out that all the handling of pending_mmaps in the ipath
driver is not really careful about userspace screwing it up.  When
userspace creates a CQ, the CQ buffer is added to the device-wide list
of pending mmaps.  Of course 32-bit userspace never succeeds in
mapping that CQ, so it stays on the list (the only way it gets removed
is if it is successfully mmapped).  But then the destroy CQ operation
sees that the mmap is pending, and frees the structure holding the
information (without removing it from the list).  And of course when
that memory gets reused, then the pending mmap list gets corrupted,
etc etc.

Of course this is ugly to fix with the current data structure -- the
list of pending mmaps is singly-linked, which means I have to walk the
whole list to delete an entry.  It also makes the list walking in
ipath_mmap() unnecessarily obfuscated.  I think it's much better to
just use the standard kernel list_head stuff if you're going to delete
things from the middle of the list, rather than implementing your own
singly-linked list.  Sure it costs an extra pointer in each entry but
no one ever has to worry about whether you're deleting things
correctly, etc.
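
The point about deletion can be sketched with a small userspace imitation of a doubly-linked circular list. This is not the kernel's list.h and not the ipath code: the type and function names here (list_node, list_del_node, pending_mmap, pending_find) are stand-ins, and in the driver the real list_add_tail(), list_del() and list_for_each_entry() would do this work.

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's struct list_head. */
struct list_node {
    struct list_node *next, *prev;
};

static void list_init(struct list_node *head)
{
    head->next = head;
    head->prev = head;
}

static void list_add_tail_node(struct list_node *n, struct list_node *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* O(1) removal: no walk from the head is needed to find the predecessor. */
static void list_del_node(struct list_node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->next = n;
    n->prev = n;
}

/* Hypothetical pending-mmap record with the list node embedded in it. */
struct pending_mmap {
    struct list_node link;
    unsigned long key;
};

static struct pending_mmap *pending_find(struct list_node *head,
                                         unsigned long key)
{
    for (struct list_node *p = head->next; p != head; p = p->next) {
        struct pending_mmap *e = (struct pending_mmap *)
            ((char *)p - offsetof(struct pending_mmap, link));
        if (e->key == key)
            return e;
    }
    return NULL;
}
```

With this layout, a destroy path can unlink its entry with one call before freeing it, so a freed structure is never left on the pending list.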

There's some other silly stuff I noticed too, like:

grep -n mmap_cnt *.[ch] /dev/null
ipath_cq.c:232: ip->mmap_cnt = 0;
ipath_mmap.c:63:ip->mmap_cnt++;
ipath_mmap.c:70:ip->mmap_cnt--;
ipath_qp.c:837: ip->mmap_cnt = 0;
ipath_srq.c:162:ip->mmap_cnt = 0;
ipath_verbs.h:178:  unsigned mmap_cnt;

umm -- no one ever looks at mmap_cnt (there's a kref too), so why keep
it at all?

So Qlogic guys -- please fix this up!

 - R.




Re: [openib-general] ipath and current git woes

2007-01-31 Thread Robert Walsh
Jason Gunthorpe wrote:
> Has anyone been able to use ipath with the current latest git
> everything?

We're working on getting this up to date right now.  Give us a couple of 
days and we'll have some new patches ready.

Regards,
  Robert.




[openib-general] ipath and current git woes

2007-01-31 Thread Jason Gunthorpe
Has anyone been able to use ipath with the current latest git
everything?

The libipathverbs git repository seems to be missing a patch from
Roland to make it work with libibverbs.2 in an email titled:

[openib-general] [PATCH 3/7] libipathverbs: Update libipathverbs for
new libibverbs driver handling

After applying that patch the user space consumers load but we got a
kernel oops when we tried to run a test here :<

Unable to handle kernel NULL pointer dereference at 0918 RIP: 
 [] :ib_ipath:ipath_mmap+0x37/0x95
PGD 3ad46067 PUD 3ad4f067 PMD 0 
Oops:  [1] 
CPU 0 
Modules linked in: usb_storage skge bitrev crc32 ib_ipath k8temp hwmon 
forcedeth ehci_hcd ohci_hcd usbcore i2c_nforce2 i2c_core ib_uverbs ib_umad 
ib_mad ib_core
Pid: 4009, comm: ib_rdma_lat Not tainted 2.6.20-rc4-gf3a2c3ee-dirty #6
RIP: 0010:[]  [] 
:ib_ipath:ipath_mmap+0x37/0x95
RSP: :81003aaa3e88  EFLAGS: 00010002
RAX: 81003c434000 RBX: 0910 RCX: 1000
RDX: 002b3000 RSI: 81003bcab440 RDI: 81003bc2d840
RBP: 81003b7af918 R08: 81003aaa3f08 R09: 81003aa38c98
R10: 81003aa38c90 R11: 88074c3f R12: ffea
R13: 81003ac496c0 R14: f7ee3000 R15: 1000
FS:  () GS:8053(0063) knlGS:f7d8c6c0
CS:  0010 DS: 002b ES: 002b CR0: 8005003b
CR2: 0918 CR3: 3aa2f000 CR4: 06e0
Process ib_rdma_lat (pid: 4009, threadinfo 81003aaa2000, task 
81003c0495e0)
Stack:  81003b7af918 001000fb ffea 80250151
 81003be9f440 81003bc2d140 0028 ff99df20
   81003b87d818 81003dc04840
Call Trace:
 [] do_mmap_pgoff+0x4d5/0x739
 [] sys32_mmap2+0x76/0x9e
 [] ia32_sysret+0x0/0xa


Code: 48 3b 7b 08 75 46 48 3b 53 10 75 40 8b 43 1c 48 39 c1 77 40 
RIP  [] :ib_ipath:ipath_mmap+0x37/0x95
 RSP 
CR2: 0918

This is with a PCI-E qlogic card: 
:03:00.0 InfiniBand: Unknown device 1fc1:0010 (rev 01)

Anyone have any clues?

One notable thing is that I have a 32 bit user space and a 64 bit
kernel. I'll try a 64 bit user space tomorrow in case there is
something wrong with 32 bit compatibility...

The last time we had these cards working was with OFED 1.1 on 64 bit
FC4 using a Linus kernel (2.6.18 I think).

Thanks,
Jason
