> 
> OTOH it is quite possible that ipoib is corrupting an skb somehow so
> that when it gets reused by e1000, you see a crash.  The fact that you
> were running netperf on IB when e1000 crashed is somewhat suspicious.

Yes, that is exactly the lingering suspicion I had. I ran several iterations
of netperf on e1000 alone and there were no crashes. So I started looking at
the patch more closely. I think I am on to something now.

In ipoib_cm_handle_rx_wc() I see two things (I have not yet looked at the
latest changes that you mentioned earlier today):

1. I do not understand the usage and purpose of recv_count (something new that
you have introduced). Can you please explain? My suspicion is that if the
if clause is somehow executed, the rx_ring gets freed, and so all the skb
pointers become bogus. I have commented out this segment of code.

2. The call to ipoib_cm_alloc_rx_skb() in ipoib_cm_handle_rx_wc() uses a
hard-coded index value of 0, which is incorrect for the no-SRQ case. I have
changed it to use index instead.
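
To illustrate why I think point 2 matters, here is a minimal userspace sketch.
It is not the driver code: struct model_skb, struct model_ring and all the
model_* functions are hypothetical stand-ins I made up, and the sketch assumes
the handler takes the skb from the completed slot, allocates a replacement,
and then hands the old skb up the stack. Under that assumption, refilling
slot 0 instead of the completed slot leaves the completed slot pointing at a
buffer the stack already owns.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SIZE 4
#define POOL_SIZE 16

/* Hypothetical stand-in for struct sk_buff; not the kernel type. */
struct model_skb {
    int consumed;   /* set once the buffer has been handed up the stack */
};

/* Hypothetical stand-in for a per-connection receive ring. */
struct model_ring {
    struct model_skb *skb[RING_SIZE];
};

/* Static pool so the model never frees memory it may still point at. */
static struct model_skb pool[POOL_SIZE];
static size_t pool_next;

/* Models the role of ipoib_cm_alloc_rx_skb(): put a fresh buffer in slot id. */
static void model_alloc_rx_skb(struct model_ring *ring, size_t id)
{
    assert(id < RING_SIZE && pool_next < POOL_SIZE);
    pool[pool_next].consumed = 0;
    ring->skb[id] = &pool[pool_next++];
}

/* Reset the pool and fill every slot of the ring with a fresh buffer. */
static void model_ring_init(struct model_ring *ring)
{
    pool_next = 0;
    for (size_t i = 0; i < RING_SIZE; i++)
        model_alloc_rx_skb(ring, i);
}

/*
 * Models one receive completion on slot `index`. `refill` is the slot that
 * receives the replacement buffer: pass `index` for the fixed behaviour,
 * or 0 to mimic the hard-coded value.
 */
static void model_handle_rx_wc(struct model_ring *ring, size_t index,
                               size_t refill)
{
    assert(index < RING_SIZE);
    struct model_skb *old = ring->skb[index];

    model_alloc_rx_skb(ring, refill);
    old->consumed = 1;  /* handed to the stack, which will free it later */
}

/* Count ring slots that still point at a buffer the stack already owns. */
static size_t model_stale_slots(const struct model_ring *ring)
{
    size_t n = 0;

    for (size_t i = 0; i < RING_SIZE; i++)
        if (ring->skb[i]->consumed)
            n++;
    return n;
}
```

In this model, a completion on slot 2 handled with
model_handle_rx_wc(&r, 2, 2) leaves no stale slots, while
model_handle_rx_wc(&r, 2, 0) leaves slot 2 pointing at a consumed buffer and
silently drops the buffer that was in slot 0. In the real driver a stale slot
like that would be a pointer to an skb that may already have been freed and
reused elsewhere, which would fit a crash showing up in e1000.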

I have been running this for some hours now with no crashes and no errors.
This is using SLUB. If I get a chance I will run with SLAB over the weekend
and let you know the results.

Pradeep

