I modified the cmpost program to use individual send and receive completion queues. The cmpost
server acts like an echo server, echoing back anything it receives. The client program keeps sending
packets.

The test works fine up to around 600 connections. After 600 connections, I start to see ibv_post_send
errors. I added some debug messages in libmthca/src/qp.c where the check for wq_overflow is made, and
the work queue really is overflowing. I checked the code to make sure all the send descriptors are
recovered by polling the CQ. The wc.status field is also checked for errors.
I am attaching the modified code.
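
To illustrate the accounting that the wq_overflow check depends on, here is a minimal sketch of per-QP send bookkeeping. It is not taken from the attached cmpost.c and makes no verbs calls. The struct send_credits and the credits_* helpers are hypothetical names used only for illustration. It assumes each QP has its own send CQ, so every send completion polled from that CQ belongs to that QP.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-connection bookkeeping; not part of libibverbs. */
struct send_credits {
    unsigned max_send_wr;   /* max_send_wr the QP was created with */
    unsigned outstanding;   /* sends posted but not yet reaped from the CQ */
};

static void credits_init(struct send_credits *c, unsigned max_send_wr)
{
    c->max_send_wr = max_send_wr;
    c->outstanding = 0;
}

/* Call before ibv_post_send(); returns false if the post would overflow. */
static bool credits_reserve(struct send_credits *c)
{
    if (c->outstanding >= c->max_send_wr)
        return false;
    c->outstanding++;
    return true;
}

/* Call once per send completion returned by ibv_poll_cq() for this QP. */
static void credits_release(struct send_credits *c)
{
    assert(c->outstanding > 0);
    c->outstanding--;
}
```

If credits_reserve() ever returns false in the client loop, the application itself has more sends in flight than the queue can hold. If it never does and ibv_post_send still fails, the accounting in the library or driver would be the next place to look.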

bash-3.00$ svn info
Path: .
URL: https://openib.org/svn/gen2/trunk
Repository UUID: 21a7a0b7-18d7-0310-8e21-e8b31bdbf5cd
Revision: 3344
Node Kind: directory
Schedule: normal
Last Changed Author: jlentini
Last Changed Rev: 3344
Last Changed Date: 2005-09-08 16:39:25 -0700 (Thu, 08 Sep 2005)


To run the test, compile the code:

cc -o cmpost cmpost.c -libcm -libverbs -libat

$ cmpost -n 1024        <=== as server

$ cmpost -c  -n 1024 -l <dest_lid> -g <dest_guid>

After some time you start seeing post_send errors. On my system, up to 600 connections work fine.


While running the test I saw kernel panics a couple of times, but they are difficult to reproduce:

kernel BUG at include/asm/spinlock.h:149!
invalid operand: 0000 [#1]
SMP
Modules linked in: nfs nfsd exportfs lockd autofs4 sunrpc uhci_hcd ehci_hcd hw_random e1000 ext3 jbd sd_mod
CPU:    1
EIP:    0060:[<c02fef92>]    Not tainted VLI
EFLAGS: 00010086   (2.6.13)
EIP is at _spin_lock_irqsave+0x47/0x51
eax: 00000011   ebx: 00000282   ecx: c035950c   edx: 00000082
esi: f7d82010   edi: 00000000   ebp: f6792c80   esp: c1a33ed0
ds: 007b   es: 007b   ss: 0068
Process ib_mad1 (pid: 308, threadinfo=c1a32000 task=f7e3c540)
Stack: c03123ee c0276963 f6792c80 f7d82010 c0276963 f79a6adc f7974b00 00000001
       c1a33f0c f7912e00 f7df2000 f7df4200 c1a33f0c 00000292 c0276b96 f6792c80
       00000000 00000000 00000000 b93e2c00 00000128 00000296 00000402 00000001
Call Trace:
 [<c0276963>] ib_mad_send_done_handler+0x72/0x11e
 [<c0276963>] ib_mad_send_done_handler+0x72/0x11e
 [<c0276b96>] ib_mad_completion_handler+0x80/0x8d
 [<c0120000>] wait_noreap_copyout+0x55/0xbe
 [<c012bd0d>] worker_thread+0x1b0/0x23a
 [<c02fdb43>] schedule+0x5d3/0xbdf
 [<c0276b16>] ib_mad_completion_handler+0x0/0x8d
 [<c011942d>] default_wake_function+0x0/0xc
 [<c011942d>] default_wake_function+0x0/0xc
 [<c012bb5d>] worker_thread+0x0/0x23a
 [<c012f700>] kthread+0x8a/0xb2
 [<c012f676>] kthread+0x0/0xb2
 [<c0101cf9>] kernel_thread_helper+0x5/0xb
Code: 00 00 74 01 fb f3 90 80 3e 00 7e f9 fa eb e8 83 c4 08 89 d8 5b 5e c3 8b 44 24 10 c7 04 24 ee 23 31 c0 89 44 24 04 e8 2f e7 e1 ff <0f> 0b 95 00 39 1c 31 c0 eb c2 53 89 c3 83 ec 08 fa 81 78 04 ad



-Viswa



Attachment: cmpost.c
Description: Binary data
