On Wed, Dec 08, 2010 at 04:55:22PM +0200, Nir Muchtar wrote:
> On Tue, 2010-12-07 at 14:29 -0700, Jason Gunthorpe wrote:
>
> > What you've done in your v2 patch won't work if the table you are
> > dumping is too large, once you pass sk_rmem_alloc for the netlink
> > socket it will deadlock. The purpose of dump_start is to avoid that
> > deadlock. (review my past messages on the subject)
> >
> > Your v1 patch wouldn't deadlock, but it would fail to dump with
> > ENOMEM, and provides an avenue to build an unprivileged kernel OOM
> > DOS.
> >
> > The places in the kernel that don't use dump_start have to stay under
> > sk_rmem_alloc.
> >
> > Jason
>
> Sorry, I still need some clarifications...
> When you say deadlocks, do you mean when calling malloc with a lock or
> when overflowing a socket receive buffer?
> For the second case, when we use netlink_unicast, the skbuff is sent and
> freed. It is transferred to the userspace's socket using netlink_sendskb
> and accumulated in its recv buff.
>
> Are you referring to a deadlock there? I still fail to see the issue.
> Why would the kernel socket recv buff reach a limit? Could you please
> elaborate?

Netlink is all driven from user space syscalls, so the call chain looks
like:

  sendmsg()
  [..]
  ibnl_rcv_msg
  cma_get_stats
  [..]
  ibnl_unicast
  [..]
  netlink_attachskb
   (now we block on the socket recv queue once it fills)

The deadlock is that userspace is sitting in sendmsg() while the kernel
is sleeping in netlink_attachskb waiting for the recvbuf to empty. User
space cannot call recvmsg() while it is blocked in sendmsg(), so it
all goes boom.

Even if cma_get_stats was executed from a kernel thread and
ibnl_rcv_msg returned back to userspace, you still hold the dev_list
mutex while calling ibnl_unicast, which can sleep waiting on
userspace. That creates an easy DoS against the RDMA CM (I can write
a program that causes the kernel to hold the mutex indefinitely).

You can't hold the mutex while sleeping for userspace, so you have to
unlock it. If you unlock it, you have to fix up your position when you
re-lock it. If you can fix up your position, then you can use
dump_start.

I don't see malloc being a concern anywhere in what you've done...

Jason
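
To make the unlock / fix-up-position / re-lock idea concrete, here is a
rough userspace sketch of the pattern, not kernel code and not the
actual dump_start API. Every name in it (dump_state, dump_chunk,
dump_all, TABLE_SIZE, CHUNK_MAX) is made up for illustration. It assumes
the table does not change between calls, so a plain index is enough to
resume; a real list that can gain or lose entries while unlocked would
need a sturdier way to find its place again.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define TABLE_SIZE 10   /* hypothetical number of entries to dump */
#define CHUNK_MAX  4    /* hypothetical per-call limit, stands in for one skb */

/* Resume position carried between calls (assumed analogue of per-dump state). */
struct dump_state {
    size_t next;
};

static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static int table[TABLE_SIZE] = { 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 };

/*
 * Copy at most cap entries, starting at st->next, into out.
 * The lock is held only while copying and is always released before
 * returning, so nothing ever waits on the reader with the lock held.
 * Returns the number of entries copied; 0 means the dump is complete.
 */
static size_t dump_chunk(struct dump_state *st, int *out, size_t cap)
{
    size_t n = 0;

    assert(cap > 0);
    pthread_mutex_lock(&table_lock);
    while (st->next < TABLE_SIZE && n < cap)
        out[n++] = table[st->next++];
    pthread_mutex_unlock(&table_lock);
    return n;
}

/*
 * Stand-in for the reader draining one bounded chunk at a time.
 * Stores up to dst_len entries in dst, counts the chunks needed,
 * and returns the number of entries stored.
 */
static size_t dump_all(int *dst, size_t dst_len, size_t *chunks)
{
    struct dump_state st = { .next = 0 };
    int buf[CHUNK_MAX];
    size_t total = 0;
    size_t n;

    *chunks = 0;
    while ((n = dump_chunk(&st, buf, CHUNK_MAX)) > 0) {
        for (size_t i = 0; i < n && total < dst_len; i++)
            dst[total++] = buf[i];
        (*chunks)++;
    }
    return total;
}
```

Each dump_chunk() call does a bounded amount of work and returns, so
the reader is free to drain what it received before asking for more.
That is the property the blocking ibnl_unicast path lacks.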