Re: [openib-general] WC Error code question

2007-01-15 Thread Steven Wooding

Sorry guys. Fixed the problem. I gave it an incorrect pointer for the MR.

Thanks for replying.

Steve.

On 15/01/07, Devesh Sharma [EMAIL PROTECTED] wrote:


On which side are you getting this error?

If it's at the initiator side then it's a bad lkey; if it's on the other side then
you have a bad rkey.

On 1/11/07, Dotan Barak [EMAIL PROTECTED] wrote:
 Steven Wooding wrote:
  Hi,
 
  I'm getting an IBV_WC_LOC_ACCESS_ERR when getting a work completion
  item related to an RDMA with ImmData transfer.
 
  What does this error actually mean?
 
  Thanks,
 
  Steve.
 On which side do you get this completion?
 My guess is that you are trying to send an RDMA Write with immediate
 which has a bad rkey (an rkey which doesn't match the remote address),
 and you get this status at the responder side.


 If you still have this problem, I will need some more info ..

 Dotan

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

[openib-general] WC Error code question

2007-01-11 Thread Steven Wooding

Hi,

I'm getting an IBV_WC_LOC_ACCESS_ERR when getting a work completion item
related to an RDMA with ImmData transfer.

What does this error actually mean?

Thanks,

Steve.

[openib-general] Quick RDMA Write with Immediate Data Item question

2006-05-12 Thread Steven Wooding
Hi,

Just a quick question I can't seem to find the answer to.

With an RDMA Write with Immediate Data Item transfer, in the CQE at the destination (the thing that has the Immediate Data in it), does the CQE also contain the memory location where the message just got written to?
i.e. does the scatter/gather buffer member of the work completion structure get filled in at all? Or do you just get the ImmdDataItem?

Thanks for your help.

Steve Wooding.

Re: [openib-general] Quick RDMA Write with Immediate Data Item question

2006-05-12 Thread Steven Wooding
The work completion will also include the length of the RDMA write.

This leads me to another question I had about memory protection for RDMA writes. What's the best way to stop the sender accidentally writing a larger message than they should have, if I didn't want to use a different rkey for each message (as setting up rkeys is expensive and too inflexible for my application)?

Any thoughts?

Thanks for your answers to my original question. I thought this was the case. I just couldn't find it written down anywhere. My system is unavailable at the moment, so I couldn't just do a quick test.

Regards,
Steve.

[openib-general] Unknown symbol ip_dev_find (2.6.15.1 kernel)

2006-01-17 Thread Steven Wooding
Hi,

I was updating my kernel and openib drivers (haven't done so for a
couple of months) and I've got stuck on the following problem.

When you do make modules_install you get the following warnings at the end:

WARNING: /lib/modules/2.6.15.1/kernel/drivers/infiniband/ulp/sdp/ib_sdp.ko
needs unknown symbol ip_dev_find
WARNING: /lib/modules/2.6.15.1/kernel/drivers/infiniband/core/ib_at.ko
needs unknown symbol ip_dev_find
WARNING: /lib/modules/2.6.15.1/kernel/drivers/infiniband/core/ib_addr.ko
needs unknown symbol ip_dev_find

I tried to reboot anyway, but these modules do indeed fail to load due
to this problem.

I notice this was fixed for 2.6.14 with a patch that exported the
ip_dev_find symbol. Do we need one for 2.6.15.1 or have I missed a
step out of my installation process?

Thanks for the help.

Cheers,


Steve.


[openib-general] UC connection server

2005-12-06 Thread Steven Wooding
Hi,

I wonder if anybody could give some advice about an idea for making UC
connections with a device that doesn't support a CM (it's a
custom-made embedded device).

The idea is to use a PC-based stack that does use the standard CM
interface. I can then make a connection with that. The PC then gets
the info about the real QP from the embedded device via some
proprietary method.

The problem with this idea is that in the standard CM protocol, it
forms the connection using the LID that the REQ was sent to. But I
need to change this to the LID of the embedded device. I've looked at
doing path migration which looked like it might do this, but I could
do with some advice.

For example, in path migration, does the original connection remain?

Any other suggestions are welcome (I know I could do an Ethernet
connection with the PC and exchange the info that way, but that's a
last resort at the moment).

Thanks for your time.

Cheers,

Steve.


[openib-general] Relaying data through an HCA card

2005-12-06 Thread Steven Wooding
Hi,

I have the requirement for a PC that acts as a data relay. I basically
need to pass data from an input QP connection to an output QP
connection on the PC.

Could this be done entirely within the HCA card, without touching
system memory or using a userspace application to supervise the data?
This is so the data throughput remains as high as possible.

Thanks for your time.

Cheers,


Steve.


Re: [openib-general] UC connection server

2005-12-06 Thread Steven Wooding
Hi Fabain,

 I think what Steve wants to do is issue a REQ, send it to the PC, but
 have the path record go to his embedded device (which has a different
 LID).  The CM protocol supports this, but the implementation of the CM
 looks at the path record to determine the destination of the CM MADs.
 Supporting this would require some way for the user to set the target
 of the CM MADs independently of the path information contained in the
 REQ.  Adding an optional extra path record for the CM path might do
 the trick.

That's it in a nutshell really. I don't know how useful such a feature
would be in the wider IB community. We've been forced into this
position by a vendor not following the standard.

I wanted to check with you guys whether there was a quick solution
that was ready to fly. It seems that this feature would need to go
into the openib drivers, which we don't have time or money to do. We
do have a backup solution from the vendor, but it's non-standard and I
was trying to keep our side of the interface standard.

Anyway, thanks for your suggestions. It's all useful info.

Regards,


Steve.


RE: [openib-general] Missing ib_al.h file?

2005-10-30 Thread Steven Wooding
Thanks Hal.

That makes sense. I'll give that a go.

Cheers,

Steve.

--- Hal Rosenstock [EMAIL PROTECTED] wrote:

 Hi,
  
 That's an IBAL file (gen1). You need to build with
 VENDOR=openib to use this which should not need that
 file.
  
 -- Hal
 








[openib-general] Missing ib_al.h file?

2005-10-29 Thread Steven Wooding
Hi,

I am trying to get my app to use serviceRecords with
the SA.

Anyway, my problem is that a file called ib_al.h seems
to be missing. It should be in
trunk/...osm/include/iba/ directory, along with
ib_types.h. The file osm_vendor_al.h includes it,
which is included by osm_vendor.h. I notice that file
selects the vendor. I have Mellanox IB cards, so have
I got that right?

Thanks,

Steve.





RE: [openib-general] Support for UC connections using the CM API?

2005-10-24 Thread Steven Wooding
Hi Sean,

Your patch works if I put the IB_QP_MAX_QP_RD_ATOMIC
mask into the UC (default) QP attr mask. Otherwise
fine.

Thanks,

Steve.
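
One reading of the report above is that, with the patch applied, the RTS attribute mask for UC still needs IB_QP_MAX_QP_RD_ATOMIC, while the timeout and retry bits stay RC-only. The sketch below illustrates that selection only. All DEMO_ names and their numeric values are stand-ins invented for the example; the real constants live in the kernel's ib_verbs.h, and whether a given driver requires this bit for UC is taken from the report, not verified here.

```c
#include <assert.h>

/* Stand-in values for illustration only; not the real ib_verbs.h constants. */
enum demo_qp_type { DEMO_QPT_RC, DEMO_QPT_UC };

enum demo_attr_mask {
    DEMO_QP_STATE            = 1 << 0,
    DEMO_QP_SQ_PSN           = 1 << 1,
    DEMO_QP_TIMEOUT          = 1 << 2,
    DEMO_QP_RETRY_CNT        = 1 << 3,
    DEMO_QP_RNR_RETRY        = 1 << 4,
    DEMO_QP_MAX_QP_RD_ATOMIC = 1 << 5
};

/* Build the RTS mask: bits common to both transports first,
 * then the bits that only a reliable connection uses. */
int demo_rts_attr_mask(enum demo_qp_type type)
{
    assert(type == DEMO_QPT_RC || type == DEMO_QPT_UC);

    /* Default mask, including the bit reported as needed for UC too. */
    int mask = DEMO_QP_STATE | DEMO_QP_SQ_PSN | DEMO_QP_MAX_QP_RD_ATOMIC;

    if (type == DEMO_QPT_RC)
        mask |= DEMO_QP_TIMEOUT | DEMO_QP_RETRY_CNT | DEMO_QP_RNR_RETRY;

    return mask;
}
```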
--- Sean Hefty [EMAIL PROTECTED] wrote:

 I had a look at where the mask is set in cm.c
 (cm_init_qp_rtr_attr() and cm_init_qp_rts_attr()) but
 I was unsure how to make the mask depend on the QP
 type. Maybe you have a better idea of how to do this.
 
 Here's a patch (edited by hand, so let me know if there's any issue
 applying it) that should permit UC connections over the CM.  I was able to
 test this using cmpost.
 
 Signed-off-by: Sean Hefty [EMAIL PROTECTED]
 
 
 Index: cm.c
 ===
 --- cm.c  (revision 3830)
 +++ cm.c  (working copy)
 @@ -135,6 +135,7 @@
   __be64 tid;
   __be32 local_qpn;
   __be32 remote_qpn;
 + enum ib_qp_type qp_type;
   __be32 sq_psn;
   __be32 rq_psn;
   int timeout_ms;
 @@ -926,6 +923,7 @@
   cm_id_priv->responder_resources = param->responder_resources;
   cm_id_priv->retry_count = param->retry_count;
   cm_id_priv->path_mtu = param->primary_path->mtu;
 + cm_id_priv->qp_type = param->qp_type;
  
   ret = cm_alloc_msg(cm_id_priv, &cm_id_priv->msg);
   if (ret)
 @@ -1320,6 +1314,7 @@
   cm_req_get_primary_local_ack_timeout(req_msg);
   cm_id_priv->retry_count = cm_req_get_retry_count(req_msg);
   cm_id_priv->rnr_retry_count = cm_req_get_rnr_retry_count(req_msg);
 + cm_id_priv->qp_type = cm_req_get_qp_type(req_msg);
  
   cm_format_req_event(work, cm_id_priv, &listen_cm_id_priv->id);
   cm_process_work(cm_id_priv, work);
 @@ -3079,10 +3035,10 @@
   case IB_CM_ESTABLISHED:
   *qp_attr_mask = IB_QP_STATE | IB_QP_ACCESS_FLAGS |
   IB_QP_PKEY_INDEX | IB_QP_PORT;
 - qp_attr->qp_access_flags = IB_ACCESS_LOCAL_WRITE;
 + qp_attr->qp_access_flags = IB_ACCESS_LOCAL_WRITE |
 + IB_ACCESS_REMOTE_WRITE;
   if (cm_id_priv->responder_resources)
 - qp_attr->qp_access_flags |= IB_ACCESS_REMOTE_WRITE |
 - IB_ACCESS_REMOTE_READ;
 + qp_attr->qp_access_flags |= IB_ACCESS_REMOTE_READ;
   qp_attr->pkey_index = cm_id_priv->av.pkey_index;
   qp_attr->port_num = cm_id_priv->av.port->port_num;
   ret = 0;
 @@ -3112,14 +3068,18 @@
   case IB_CM_MRA_REP_RCVD:
   case IB_CM_ESTABLISHED:
   *qp_attr_mask = IB_QP_STATE | IB_QP_AV | IB_QP_PATH_MTU |
 - IB_QP_DEST_QPN | IB_QP_RQ_PSN |
 - IB_QP_MAX_DEST_RD_ATOMIC | IB_QP_MIN_RNR_TIMER;
 + IB_QP_DEST_QPN | IB_QP_RQ_PSN;
   qp_attr->ah_attr = cm_id_priv->av.ah_attr;
   qp_attr->path_mtu = cm_id_priv->path_mtu;
   qp_attr->dest_qp_num = be32_to_cpu(cm_id_priv->remote_qpn);
   qp_attr->rq_psn = be32_to_cpu(cm_id_priv->rq_psn);
 - qp_attr->max_dest_rd_atomic = cm_id_priv->responder_resources;
 - qp_attr->min_rnr_timer = 0;
 + if (cm_id_priv->qp_type == IB_QPT_RC) {
 + *qp_attr_mask |= IB_QP_MAX_DEST_RD_ATOMIC |
 +  IB_QP_MIN_RNR_TIMER;
 + qp_attr->max_dest_rd_atomic =
 + cm_id_priv->responder_resources;
 + qp_attr->min_rnr_timer = 0;
 + }
   if (cm_id_priv->alt_av.ah_attr.dlid) {
   *qp_attr_mask |= IB_QP_ALT_PATH;
   qp_attr->alt_ah_attr = cm_id_priv->alt_av.ah_attr;
 @@ -3148,14 +3108,17 @@
   case IB_CM_REP_SENT:
   case IB_CM_MRA_REP_RCVD:
   case IB_CM_ESTABLISHED:
 - *qp_attr_mask = IB_QP_STATE | IB_QP_TIMEOUT | IB_QP_RETRY_CNT |
 - IB_QP_RNR_RETRY | IB_QP_SQ_PSN |
 - IB_QP_MAX_QP_RD_ATOMIC;
 - qp_attr->timeout = cm_id_priv->local_ack_timeout;
 - qp_attr->retry_cnt = cm_id_priv->retry_count;
 - qp_attr->rnr_retry = cm_id_priv->rnr_retry_count;
 + *qp_attr_mask = IB_QP_STATE | IB_QP_SQ_PSN;
   qp_attr->sq_psn = be32_to_cpu(cm_id_priv->sq_psn);
 - qp_attr->max_rd_atomic = cm_id_priv->initiator_depth;
 + if (cm_id_priv->qp_type == IB_QPT_RC) {
 + *qp_attr_mask |= IB_QP_TIMEOUT | IB_QP_RETRY_CNT |
 +  IB_QP_RNR_RETRY |
 +  IB_QP_MAX_QP_RD_ATOMIC;
 + qp_attr->timeout = cm_id_priv->local_ack_timeout;
 + qp_attr->retry_cnt = cm_id_priv->retry_count;
 + qp_attr->rnr_retry = cm_id_priv->rnr_retry_count;
 + qp_attr->max_rd_atomic = cm_id_priv->initiator_depth;
 +   

RE: [openib-general] Support for UC connections using the CM API?

2005-10-22 Thread Steven Wooding

--- Sean Hefty [EMAIL PROTECTED] wrote:
 
 Here's a patch (edited by hand, so let me know if there's any issue
 applying it) that should permit UC connections over the CM.  I was able to
 test this using cmpost.

Thanks for the quick response. I'll try this patch out
next week (Monday), but it looks good.

Regards,

Steve.







[openib-general] Support for UC connections using the CM API?

2005-10-19 Thread Steven Wooding

Hi there,

I was wondering whether the CM API currently (I'm currently using svn 3470) supports establishing UC connections? I have the RC transport type working fine using the CM.

I've found that somewhere between me sending and receiving the REQ message, the qp_type variable changes from UC to RC (3 to 2). I've checked the value just before the user-space call ib_cm_send_req() and just after receiving the CM event that contains the REQ, so I believe I've ruled out a bug in my app. So this must mean it is switched some where in kernel space driver.

I had a quick look in the kernel space code, but I'm not really sure what's going on. Could be a bug is either cm_req_get_qp_type() or cm_req_set_qp_type() in cm_msgs.h.

Anyway, perhaps you could confirm whether the CM supports UC and if so, look in to this possible bug.

Thank you very much.

Regards,

Steve.
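
One way to narrow down a suspected mismatch between a field setter and getter is a round-trip test: set the type, read it back, and check that neighbouring bits are untouched. The sketch below shows the idea on a made-up message layout. The demo_ names, the bit position of the transport field, and the wire encoding are all assumptions for illustration and are not taken from cm_msgs.h; only the values 2 (RC) and 3 (UC) come from the output reported in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* QP type values as printed in this thread: RC is 2, UC is 3. */
enum demo_qp_type { DEMO_QPT_RC = 2, DEMO_QPT_UC = 3 };

/* Hypothetical message: bits 1-2 of 'flags' carry the transport type. */
struct demo_req_msg { uint8_t flags; };

#define DEMO_TYPE_SHIFT 1
#define DEMO_TYPE_MASK  (0x3u << DEMO_TYPE_SHIFT)

void demo_set_qp_type(struct demo_req_msg *m, enum demo_qp_type t)
{
    assert(m != NULL);
    uint8_t wire = (t == DEMO_QPT_UC) ? 1 : 0;  /* assumed wire encoding */
    m->flags = (uint8_t)((m->flags & ~DEMO_TYPE_MASK) | (wire << DEMO_TYPE_SHIFT));
}

enum demo_qp_type demo_get_qp_type(const struct demo_req_msg *m)
{
    uint8_t wire = (uint8_t)((m->flags & DEMO_TYPE_MASK) >> DEMO_TYPE_SHIFT);
    return wire == 1 ? DEMO_QPT_UC : DEMO_QPT_RC;
}

/* Returns 1 when the type survives set-then-get and the other bits are kept. */
int demo_round_trip_ok(enum demo_qp_type t)
{
    struct demo_req_msg m = { 0xFF };  /* start with every bit set */
    demo_set_qp_type(&m, t);
    return demo_get_qp_type(&m) == t &&
           (m.flags & ~DEMO_TYPE_MASK) == 0xF9u;
}
```

If a getter masks or shifts differently from the setter, a check like demo_round_trip_ok(DEMO_QPT_UC) fails while the RC case can still pass, which would match the 3-to-2 symptom described above.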

Re: [openib-general] Support for UC connections using the CM API?

2005-10-19 Thread Steven Wooding
Hi Sean,

I've modified cmpost to try UC and get similar
results to my app. The changes I made to cmpost.c were
to change RC to UC (two places) and remove the
following req and rep parameters as I believe these
are not required for UC:

req.retry_count = 5;
rep.rnr_retry_count = req->rnr_retry_count;

I also put in some print statements to observe the
value of qp_type. Here are the results:

Client-side output:

starting client
req.qp_type = 3
Received REJ
Error sending REQ or REP
receiving data transfers
initiating data transfers
data transfers complete
test complete

(note to anybody else reading this thread; the last
four lines do not mean the data got transferred
successfully, as no error checking is done on the
connect_events() function)


Server-side output:

starting server
event->param.req_rcvd.qp_type = 2
failed to modify QP to RTR: 22
failing connection request
initiating data transfers
receiving data transfers
data transfers complete
test complete

So basically the server-side thinks the QP being
requested is an RC, not the required UC.

Hope this helps Sean.

Cheers,


Steve.


--- Sean Hefty [EMAIL PROTECTED] wrote:

 I'll look into this more.  If you have time, you could change cmpost
 and ucmpost to use UC and run those.  This would help narrow down if
 the issue is in the kernel, userspace, or the application.  (I'm testing
 some MAD changes, and will try this myself once I'm done testing.)
 







Re: [openib-general] Strange output when calling ibv_poll_cq function

2005-10-19 Thread Steven Wooding
Sorry Roland,

My fault. I had the wrong access flags set when I
registered the memory region.

Thanks for the reply though.

Cheers,

Steve.


--- Roland Dreier [EMAIL PROTECTED] wrote:

 However, reading the completion contents, I see that it is a receive
 completion with status local protection error.  So something is
 wrong with the receive request you posted -- the address is out of
 bounds, you used the wrong L_Key, or something like that.







[openib-general] Strange output when calling ibv_poll_cq function

2005-10-18 Thread Steven Wooding
Hi,

I got a strange problem that I can't figure out. Turning kernel debugging on might help, but I thought I'd run it by the mailing list to see if anyone has come across this before.

When calling the ibv_poll_cq() function I get the following printed to standard output:

 [ 0] 00620406
 [ 4] 1500
 [ 8] 0200
 [ c] 0004
 [10] 0433
 [14] 
 [18] 0002
 [1c] fe10

The output is not exactly the same each time. The indexes in square brackets are the same, but the eight digit number field changes (though not much).

Also, the data I'm sending does not arrive (though this could be some other problem with my app). I'm using svn 3470 on x86_64 platform.

Thanks,

Steve.

[openib-general] Compiling an application that calls ib_cm_* functions

2005-10-11 Thread Steven Wooding
Hi,

I wonder if someone could help me with compiling my IB application? The problem is that when I go to link my program, all of the ib_cm* function calls come up as "undefined reference". So do dlist_start and _dlist_mark_move (dlist_next in the code).

Here is my linking command:
icpc -o ib_comms_test1 ib_comms_test1.o ib_queue_pair.o ib_comms_manager.o -L/usr/local/lib -libcm -libat -libverbs -libumad -lsysfs -ldl

I get the same result when using g++.
The cmpost.c example compiles fine. I've tried to see what it is doing. It seems to link in the libibcm.la file, but when I try this with icpc or g++, they say they cannot recognise the file type.

Maybe someone can spot the simple mistake I'm making.

Cheers,

Steve. 

[openib-general] Portability of AIO

2005-06-10 Thread Steven Wooding
Hi,

Does anybody know of any plans to make the API of libaio (Native Linux AIO) portable (i.e. make it the standard AIO API)?

Thanks,


Steve.

Re: [openib-general] uverbs performance; ibv_pingpong poll vs sleep

2005-06-10 Thread Steven Wooding

Thanks. I'll give pipelining a go in my app.

Cheers,

Steve.

Roland Dreier [EMAIL PROTECTED] wrote:
Steven> I have compared the data rates using ibv_pingpong with
Steven> and without the -e option. Therefore, using polling and
Steven> waiting the CQ events (sleeping).

Steven> Is there any way to trade off the data rate with the CPU
Steven> usage (I was thinking of some timeout from polling).

I suppose you could have some sort of adaptive polling scheme that
spins polling for a while and then sleeps waiting for an event.

However, as Michael said, it's probably better to use pipelining and
post multiple send work requests. This hides the latency of getting a
completion event by keeping the HCA busy.

 - R.
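
The adaptive scheme Roland mentions (spin on the CQ for a while, then sleep until an event arrives) could be sketched roughly as follows. Everything here is an assumption made for illustration: the poll and wait callbacks, the max_spins budget, and the fake_cq type used to exercise the loop. In a real program the poll callback would wrap a non-blocking ibv_poll_cq() and the wait callback would arm the CQ and block on its completion channel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callbacks standing in for the real verbs calls.
 * poll: returns the number of completions found (0 if none, <0 on error).
 * wait: blocks until a completion event arrives; 0 on success, <0 on error. */
typedef int (*poll_fn)(void *ctx);
typedef int (*wait_fn)(void *ctx);

/* Spin for up to max_spins polls, then fall back to sleeping on an event. */
int adaptive_wait(poll_fn poll, wait_fn wait, void *ctx, unsigned max_spins)
{
    assert(poll != NULL && wait != NULL);

    for (unsigned i = 0; i < max_spins; i++) {
        int n = poll(ctx);
        if (n != 0)
            return n;          /* completion (or error) seen while spinning */
    }

    int rc = wait(ctx);        /* spin budget used up: give the CPU back */
    if (rc < 0)
        return rc;
    return poll(ctx);          /* collect whatever the event announced */
}

/* Simulated CQ, used only to exercise the loop above. */
struct fake_cq {
    unsigned polls_until_ready;  /* empty polls before a completion shows up */
    unsigned polls;              /* poll calls made so far */
    unsigned sleeps;             /* blocking waits made so far */
};

int fake_poll(void *ctx)
{
    struct fake_cq *cq = ctx;
    cq->polls++;
    if (cq->polls_until_ready > 0) {
        cq->polls_until_ready--;
        return 0;
    }
    return 1;
}

int fake_wait(void *ctx)
{
    struct fake_cq *cq = ctx;
    cq->sleeps++;
    cq->polls_until_ready = 0;   /* the event means a completion is queued */
    return 0;
}
```

A small max_spins keeps CPU usage close to the event-driven case, while a large one approaches pure polling; the right value would have to be measured on the target system.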


[openib-general] uverbs performance; ibv_pingpong poll vs sleep

2005-06-08 Thread Steven Wooding
Hi,

I wonder if anyone could help me with uverbs performance.

I have compared the data rates using ibv_pingpong with and without the -e option, that is, polling the CQ versus waiting for CQ events (sleeping).

For a data message size of 16K I get the following results:

Poll: 718 MB/s with CPU at 100%
Sleep: 479 MB/s with CPU at 16%

I suppose the decrease in throughput is due to the time it takes to get the CQ event.

Is there any way to trade off the data rate with the CPU usage (I was thinking of some timeout from polling).

Any suggestions would be very welcome.

Regards,

Steve.

Re: [openib-general] [PATCH][SDP] AIO buffer corruption

2005-05-06 Thread Steven Wooding
Libor,
Here are the details of my setup:
HCA: PCI-Express MT25208 (MT23108 compat mode) with 256MB memory
HCA Firmware: 4.6.2
Command: ttcp.aio.c.x -r -fM -a 8 -l 2048
         ttcp.aio.c.x -t -fM -a 8 -n 1 -l 2048 {ip of receiver}

Syslog: Get the following message with a successful transfer, but not with the -32/-104 error
Kernel: ERR: IOCB 0 cancel 0 flag 00e4 size {-l}:0:{-l}

SDP Debug: Get the following message with all ttcp.aio.c.x runs
Kernel: WARN: 9 0101:3b01 CM state 0 event 9 error -2

I'm away for two weeks, so I'll get back to you with any further info
you require when I get back.

Cheers,
Steve Wooding.
Libor Michalek wrote:
On Mon, May 02, 2005 at 03:48:51PM +0100, Steven Wooding wrote:
 

Hello Libor,
I've tried your patch, but unfortunately it made no difference to 
the -32/-104 errors I get. I have observed the following features, 
which may help you diagnose my problem:

Platform: 64 bit
OS: RHEL 4
Kernel: 2.6.11.6
OpenIB gen2: 2225
SM: IO5000 switch
   

Steve,
 
 I'm not seeing this issue on my x86_64 systems. Is there anything
in the syslog on either system? If you build SDP with debug, there
will be messages in the log, any errors? Which HCA are you using, and
which firmware? Also, can you send the exact command line?

-Libor
 



Re: [openib-general] [PATCH][SDP] AIO buffer corruption

2005-05-04 Thread Steven Wooding
Libor Michalek wrote:
On Mon, May 02, 2005 at 03:48:51PM +0100, Steven Wooding wrote:
 

Hello Libor,
I've tried your patch, but unfortunately it made no difference to 
the -32/-104 errors I get. I have observed the following features, 
which may help you diagnose my problem:

Platform: 64 bit
OS: RHEL 4
Kernel: 2.6.11.6
OpenIB gen2: 2225
SM: IO5000 switch
   

Steve,
 
 I'm not seeing this issue on my x86_64 systems. Is there anything
in the syslog on either system? If you build SDP with debug, there
will be messages in the log, any errors? Which HCA are you using, and
which firmware? Also, can you send the exact command line?

-Libor
 

Hi Libor,
Actually your patch did make a difference (thanks). After playing around 
with ttcp.aio a bit more I now find that long runs now work fine if the 
message size is larger than the zero copy threshold. I'll get back to 
you tomorrow about the details of my setup. I did turn debugging on, but 
did not see any different messages when the -32/-104 error occurred, now 
for message sizes smaller than the zero threshold.

Cheers,
Steve.


[openib-general] [PATCH][SDP] AIO buffer corruption

2005-05-02 Thread Steven Wooding
Hello Libor,

I've tried your patch, but unfortunately it made no difference to the -32/-104 
errors I get. I have observed the following features, which may help you 
diagnose my problem:

1. Sometimes the receiver reports -32 and the transmitter -104, but other 
times the errors are reversed.

2. The number of ping-pongs seems to be the main factor. However, the value of 
-n at which ttcp.aio fails is not fixed, if all other parameters are the 
same.

3. -32/-104 error occurs for a much smaller value of -n if -a is 1 to 3.

4. This may be unrelated, but I see very poor performance for -a = 1 (50 - 75 
MB/s). If -a is larger than 4 then the performance comes back (400 - 900 
MB/s).

Is there any other information I can supply to help you?

Hope this finds you well.

Cheers,

Steve.

Platform: 64 bit
OS: RHEL 4
Kernel: 2.6.11.6
OpenIB gen2: 2225
SM: IO5000 switch


[openib-general] Advice about adapting ibv_pingpong to use UC

2005-04-17 Thread Steven Wooding
Hi,
I wonder if someone working on the gen2 uverbs would be so kind as to 
give me some advice on adapting the ibv_pingpong program to use a UC QP 
type rather than RC. I was previously able to do this with the Mellanox 
stack by changing the qp_type attribute and then not setting variables 
that are only needed for RC (timeout and retry periods etc).

However, when I perform the same trick with ibv_pingpong it errors on 
the function call that should put the QP into the INIT state. I can't 
see what to change in that function call to get it to the next state change.

I realise that the general demand for the UC type connection is low, but 
my application is a real-time interface where retries are not an option 
I'm afraid.

Thank you in advance for any help the busy gen2 developers are able to 
offer.

Regards,
Steve.
x86_64, RHEL 4, gen2 2169.


Re: [openib-general] Repost: AIO SDP and ttcp.aio: Event errors

2005-04-12 Thread Steven Wooding
1. OK. That's fair enough. I'll give that a go.
2. Yeah, it also occurs for large values of -n, say 1.
3. Great.
4. Yeah, the times are small as I'm only doing short runs (-n 1000) to 
avoid the -32/-104 errors. I'll try pushing -n up a bit.

Thanks Libor,
Steve.
Libor Michalek wrote:
On Tue, Apr 12, 2005 at 09:08:15PM +0100, Steven Wooding wrote:
 

Hi,
I have been putting ttcp.aio through its paces and have a few questions.
1. When -l is larger than 131072 I get an Event error -22 on the transmit
side and no data is transferred. Changing the values of -n and -a does not make
any difference.
   

 The FMRs need to be sized at initialization time. The code currently
picks 128K as the size for the FMRs, and does not support an AIO operation
that would span multiple FMRs. If you want to try larger AIO operations
with the current code you will need to recompile SDP with a larger FMR
size, which is determined by the constant SDP_IOCB_SIZE_MAX in sdp_iocb.h
It's been a while since I've last tried this, if you try it and have
problems let me know.
 

2. When using a value of 1 for -a (so I suppose this is non-aio), I get an
Event error of -32 on the transmit side and an -104 on the receiver end.
Only some of the data is transferred.
   

 I'll look into this, I'm seeing a problem on longer runs myself. With a
value of 1 for -a it still uses aio, the value only means how many aio
operations can be outstanding at a given time. This just means that a
single buffer will be submitted for read/write and a new one will not
be submitted until that buffer's IO completes.
 

3. For future reference, where can I find out what these Event error codes
mean, to give me a clue of what's going wrong?
   

 The errors are errno values. I'll make a note to write up which errors
are possible and what they are likely to mean.
 

4. I sometimes see significant differences in the transfer speed reported on
the transmit and receiver ends. Is one more right than the other?
   

 Are the wall clock times for the data transfers small, on the order
of a few seconds? How big of a wall clock time difference are you seeing?
-Libor
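
Since the event errors are negated errno values, strerror() gives a first hint at their meaning. On Linux, 22 is EINVAL (invalid argument), 32 is EPIPE (broken pipe), and 104 is ECONNRESET (connection reset by peer). The helper names below are made up for the example, and the numeric mapping is an assumption about the Linux systems discussed in this thread; other platforms number some of these differently.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Translate a negative "Event error" value, as printed by ttcp.aio,
 * into the C library's description of the matching errno. */
const char *event_error_text(int event_error)
{
    /* The tool reports errors as negated errno values (e.g. -22). */
    assert(event_error <= 0);
    return strerror(-event_error);
}

/* Print one line for each error code that appears in this thread. */
void print_thread_errors(void)
{
    static const int seen[] = { -22, -32, -104 };
    for (size_t i = 0; i < sizeof seen / sizeof seen[0]; i++)
        printf("%4d: %s\n", seen[i], event_error_text(seen[i]));
}
```

Calling print_thread_errors() from main() prints the three descriptions; the exact wording comes from the C library and the current locale.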
 
