Potential lost receive WCs (was "[PATCH WIP 38/43]")

2015-07-24 Chuck Lever
During some other testing I found that when a completion upcall returns to the provider leaving CQEs still on the completion queue, there is a non-zero probability that a completion will be lost. >>> >>> What does lost mean? >> >> Lost means a WC in the CQ is skipped by ib_poll_cq

Re: Potential lost receive WCs (was "[PATCH WIP 38/43]")

2015-07-24 Jason Gunthorpe
On Fri, Jul 24, 2015 at 04:26:00PM -0400, Chuck Lever wrote: > Basically RPC work flow stopped because an RPC reply never > arrived. Oh, that is what I expect to see.. Remember the cq upcall is edge triggered, so if you leave stuff in the cq then you don't get another upcall until another CQE is a

Re: Potential lost receive WCs (was "[PATCH WIP 38/43]")

2015-07-29 Chuck Lever
Hi Jason- On Jul 24, 2015, at 4:46 PM, Jason Gunthorpe wrote: > On Fri, Jul 24, 2015 at 04:26:00PM -0400, Chuck Lever wrote: >> Basically RPC work flow stopped because an RPC reply never >> arrived. > > Oh, that is what I expect to see.. Remember the cq upcall is edge > triggered, so if you l

Re: Potential lost receive WCs (was "[PATCH WIP 38/43]")

2015-07-29 Jason Gunthorpe
On Wed, Jul 29, 2015 at 04:47:59PM -0400, Chuck Lever wrote: > Apparently this is true for some providers, and not for others, and > I misunderstood that when I put this together last year. Really? In kernel providers? Interesting, those are probably wrong... > > The idea that you can completely

Re: Potential lost receive WCs (was "[PATCH WIP 38/43]")

2015-07-29 Chuck Lever
On Jul 29, 2015, at 5:15 PM, Jason Gunthorpe wrote: > On Wed, Jul 29, 2015 at 04:47:59PM -0400, Chuck Lever wrote: > >> Apparently this is true for some providers, and not for others, and >> I misunderstood that when I put this together last year. > > Really? In kernel providers? Interesting,

Re: Potential lost receive WCs (was "[PATCH WIP 38/43]")

2015-07-30 Sagi Grimberg
The drivers we have that don't dequeue all the CQEs are doing something like NAPI polling and have other mechanisms to guarantee progress. Don't copy something like budget without copying the other mechanisms :) OK, that makes total sense. Thanks for clarifying. IIRC NAPI is soft-IRQ which c

Re: Potential lost receive WCs (was "[PATCH WIP 38/43]")

2015-07-30 Chuck Lever
On Jul 30, 2015, at 3:00 AM, Sagi Grimberg wrote: > >>> The drivers we have that don't dequeue all the CQEs are doing >>> something like NAPI polling and have other mechanisms to guarantee >>> progress. Don't copy something like budget without copying the other >>> mechanisms :) >> >> OK, that

Re: Potential lost receive WCs (was "[PATCH WIP 38/43]")

2015-07-30 Jason Gunthorpe
On Thu, Jul 30, 2015 at 10:00:08AM +0300, Sagi Grimberg wrote: > I still think that draining the CQ without respecting a quota is > wrong, even if driverX has a glitch there. Sure, but you can't just return from the CQ upcall after doing a budget and expect to be called again in the future. That