On 02/17/2011 03:34 PM, Mike Marciniszyn wrote:
We too have had instability, perhaps associated with these lists, but it has 
been difficult to diagnose.

We reproduce it by forcing dropped packets and seeing the QPs come and go at a
rate of thousands per second because rnr_retry and the retry counts are 0.  This
analysis is queued behind other bug investigations.

The list patch was a result of code inspection.

Ralph's patch predates me.  His appears to move some list inserts to before a
post, I assume because an intervening completion could occur, but I haven't
studied it in detail to see if any locking prevents that.

I would be interested in Pradeep's test (OS, Hardware, scripts...)

As described in one of my previous mails (in the url given below):

The test is basically to run netperf in a loop from several client machines to
a server. Meanwhile, the server unloads and reloads the modules (basically an
"openibd restart") at random times. The crashes reproduce within several hours.
I used some of the large IBM servers. The crashes did not seem to reproduce on,
say, smaller blades.


Mike

-----Original Message-----
From: rol...@purestorage.com [mailto:rol...@purestorage.com] On Behalf Of 
Roland Dreier
Sent: Thursday, February 17, 2011 6:24 PM
To: Pradeep Satyanarayana
Cc: Mike Marciniszyn; linux-rdma@vger.kernel.org; Gary Leshner; Tom Elken
Subject: Re: [PATCH] IPoIB: fix faulty list maintenance in path and neigh list

Yes, that is the crux of the issue. I had missed that ipoib_mcast_free() is
only called on remove_list.

So do we have any idea of what this patch is fixing?  Any thoughts from
the qlogic people involved in this patch?

While we are discussing IPoIB issues, how about the two other issues that
I raised previously: Ralph Campbell's patch for fixes to
ipoib_cm_start_rx_drain(), and my questions wrt ipoib_neigh_cleanup()?

I do need to take a good look at Ralph's patches to try and understand them
and, I hope, apply them.  Not sure I still have any link to your questions though.

Here is the link to the detailed mail I sent:
http://www.spinics.net/lists/linux-rdma/msg07352.html

Thanks
Pradeep