On Apr 14, 2010, at 3:23 AM, Sasha Khapyorsky wrote:
On 13:44 Tue 13 Apr , Ira Weiny wrote:
This changes the logic. "num_smps_outstanding" is NOT the number
on the wire, but it appears you have made it so.
Actually yes, it made it so.
This is the number which will cause process_smp_queue to continue
being called.
If you are going to do this I think you need to change
process_mads as well as process_one_recv. We discussed
process_one_recv in the error case.
process_one_recv() failure breaks the loop anyway.
What were you trying to fix?
Ok, I think I see. We should move cl_qmap_insert to after a
successful umad_send, and putting total_smps here is ok. But I think
num_smps_outstanding should be put back.
But then process_mads() would loop forever after a single
send_smp() failure (with all queues empty and umad_recv() running
without a timeout).
But moving the cl_qmap_insert below the send call fixes that.
However, it does cause a memory leak because the smp is no longer in
the smp_queue_head list. It needs to be put back on that list to be
retried, with a limit on the retries (to prevent what you are
describing here). Are you seeing a hang?
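
To make the requeue-with-a-retry-limit idea concrete, here is a rough,
self-contained sketch. It is not the actual libibnetdisc code: struct
engine, struct smp, fake_send(), send_one(), drain_queue() and
MAX_SEND_RETRIES are all made-up names for illustration, and fake_send()
only stands in for umad_send(). The point is just the ordering: the smp
is counted as outstanding (where cl_qmap_insert would go) only after a
successful send, a failed send goes back on the queue, and the retry cap
keeps the loop from spinning forever.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SEND_RETRIES 3	/* hypothetical retry limit */

struct smp {
	struct smp *next;
	int retries;		/* failed send attempts so far */
	int fail_count;		/* test hook: sends that will still fail */
};

struct engine {
	struct smp *queue_head;	/* smps waiting to be sent */
	struct smp *queue_tail;
	int num_smps_outstanding;	/* sent, reply not yet received */
	int num_dropped;		/* gave up after MAX_SEND_RETRIES */
};

enum send_result { SEND_OK, SEND_REQUEUED, SEND_DROPPED, SEND_EMPTY };

static void queue_smp(struct engine *e, struct smp *s)
{
	assert(e != NULL && s != NULL);
	s->next = NULL;
	if (e->queue_tail)
		e->queue_tail->next = s;
	else
		e->queue_head = s;
	e->queue_tail = s;
}

static struct smp *dequeue_smp(struct engine *e)
{
	struct smp *s = e->queue_head;

	if (!s)
		return NULL;
	e->queue_head = s->next;
	if (!e->queue_head)
		e->queue_tail = NULL;
	s->next = NULL;
	return s;
}

/* Stand-in for umad_send(): fails while fail_count is positive. */
static int fake_send(struct smp *s)
{
	if (s->fail_count > 0) {
		s->fail_count--;
		return -1;
	}
	return 0;
}

static enum send_result send_one(struct engine *e)
{
	struct smp *s = dequeue_smp(e);

	if (!s)
		return SEND_EMPTY;
	if (fake_send(s) == 0) {
		/* Track the smp only after it really went out. */
		e->num_smps_outstanding++;
		return SEND_OK;
	}
	if (++s->retries < MAX_SEND_RETRIES) {
		queue_smp(e, s);	/* back on the list, so no leak */
		return SEND_REQUEUED;
	}
	e->num_dropped++;	/* real code would free the smp here */
	return SEND_DROPPED;
}

/* Terminates: every smp is either sent or dropped after the cap. */
static void drain_queue(struct engine *e)
{
	while (e->queue_head)
		send_one(e);
}
```

With something like this, a send failure never bumps
num_smps_outstanding, so a receive loop keyed on that counter has
nothing to wait for once the queue is empty.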
I have seen a hang when running "iblinkinfo -S <guid>". However, the
problem is not with send_smp. I am seeing the mad going on the wire
and returning (according to madeye) but I am not receiving it from
umad_recv. I don't know why. If I run with 1 outstanding mad it
works???
Ira
Sasha