Re: [ewg] [PATCH] libibmad: Handle MAD redirection

2009-06-30 Thread Joachim Fenkes
On Tuesday 30 June 2009 00:01, Hal Rosenstock wrote:
 On Mon, Jun 29, 2009 at 8:10 AM, Joachim Fenkesfen...@de.ibm.com wrote:

  Previously, libibmad reacted to GSI MAD responses with a redirect status
  by throwing an error. IBM eHCA adapters use redirection, so most
  infiniband_diags tools didn't work against eHCA.
 
 Are there GS classes other than PerfMgt which would be redirected by eHCA ?

Not right now, no. If you're interested in the details of how and when the
eHCA driver redirects, please have a look at 
drivers/infiniband/hw/ehca/ehca_sqp.c.
 
  --- a/libibmad/src/gs.c
  +++ b/libibmad/src/gs.c
  @@ -70,7 +70,8 @@ uint8_t *pma_query_via(void *rcvbuf, ib_portid_t * dest, 
  int port,
         rpc.datasz = IB_PC_DATA_SZ;
         rpc.dataoffs = IB_PC_DATA_OFFS;
 
  -       dest-qp = 1;
  +       if (!dest-qp)
  +               dest-qp = 1;
 
 Is this change part of this patch or unrelated/separate ?

Part of the patch. Without this change, pma_query_via() would overwrite the
redirected QP with QP1 again, and the MAD would never arrive at the right
destination.
 
  +               /* check for exact match instead of only the redirect bit;
  +                * that way, weird statuses cause an error, too */
  +               if (status == IB_MAD_STS_REDIRECT) {
  +                       /* update dport for next request and retry */
  +                       dport-lid = mad_get_field(mad, 64, 
  IB_CPI_REDIRECT_LID_F);
  +                       dport-qp = mad_get_field(mad, 64, 
  IB_CPI_REDIRECT_QP_F);
  +                       dport-qkey = mad_get_field(mad, 64, 
  IB_CPI_REDIRECT_QKEY_F);
 
 Are those the only 3 fields which eHCA changes on a redirect ? There
 may be others we would want to add in here (PKey, SL, ...) ?

Yeah, I agree on the SL, I can add it to the patch.

At first, I also tried to set the PKey, but ClassPortInfo specifies a PKey
while ib_portid_t needs a PKey Index, and I found no way of converting
between the two, so I left it at zero. Incidentally, there isn't a single
code line in management.git that actually changes the pkey_index from its
init value of 0, so I figured that omission couldn't be too bad.

Then there's the GRH stuff, but I refrained from coding that because I
wouldn't be able to test it -- InfiniBand isn't going to evolve beyond a
single subnet any time toon, is it?

 Also, are the offsets above correct ?

Yes, they are, I tested. The ClassPortInfo data starts at offset 64 in the
MAD, and I didn't find a constant for this in mad.h.
 
 Depending on which GS classes are to be supported for redirection, we
 may want to do something similar to the rmpp equivalent of this
 routine too.

The spec says in 13.5.2 that The SA as well as each GSA may individually
support this mechanism or not, so we should probably be prepared for any GS
class to redirect. I don't care much about RMPP, though, so I left it alone.

Regards,
  Joachim



___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


Re: [ewg] [PATCH] libibmad: Handle MAD redirection

2009-06-30 Thread Hal Rosenstock
On Tue, Jun 30, 2009 at 8:04 AM, Joachim Fenkesfen...@de.ibm.com wrote:
 On Tuesday 30 June 2009 00:01, Hal Rosenstock wrote:
 On Mon, Jun 29, 2009 at 8:10 AM, Joachim Fenkesfen...@de.ibm.com wrote:

  Previously, libibmad reacted to GSI MAD responses with a redirect status
  by throwing an error. IBM eHCA adapters use redirection, so most
  infiniband_diags tools didn't work against eHCA.

 Are there GS classes other than PerfMgt which would be redirected by eHCA ?

 Not right now, no. If you're interested in the details of how and when the
 eHCA driver redirects, please have a look at 
 drivers/infiniband/hw/ehca/ehca_sqp.c.

SL is always set to 0 and RespTimeValue is hardcoded.

  --- a/libibmad/src/gs.c
  +++ b/libibmad/src/gs.c
  @@ -70,7 +70,8 @@ uint8_t *pma_query_via(void *rcvbuf, ib_portid_t * dest, 
  int port,
         rpc.datasz = IB_PC_DATA_SZ;
         rpc.dataoffs = IB_PC_DATA_OFFS;
 
  -       dest-qp = 1;
  +       if (!dest-qp)
  +               dest-qp = 1;

 Is this change part of this patch or unrelated/separate ?

 Part of the patch. Without this change, pma_query_via() would overwrite the
 redirected QP with QP1 again, and the MAD would never arrive at the right
 destination.

  +               /* check for exact match instead of only the redirect bit;
  +                * that way, weird statuses cause an error, too */
  +               if (status == IB_MAD_STS_REDIRECT) {
  +                       /* update dport for next request and retry */
  +                       dport-lid = mad_get_field(mad, 64, 
  IB_CPI_REDIRECT_LID_F);
  +                       dport-qp = mad_get_field(mad, 64, 
  IB_CPI_REDIRECT_QP_F);
  +                       dport-qkey = mad_get_field(mad, 64, 
  IB_CPI_REDIRECT_QKEY_F);

 Are those the only 3 fields which eHCA changes on a redirect ? There
 may be others we would want to add in here (PKey, SL, ...) ?

 Yeah, I agree on the SL, I can add it to the patch.

 At first, I also tried to set the PKey, but ClassPortInfo specifies a PKey
 while ib_portid_t needs a PKey Index, and I found no way of converting
 between the two,

It's available via libibumad. Note that umad version 5 is needed for
pkey index support.

 so I left it at zero. Incidentally, there isn't a single
 code line in management.git that actually changes the pkey_index from its
 init value of 0, so I figured that omission couldn't be too bad.

Agreed. I think you're referring to infiniband-diags and not opensm.

 Then there's the GRH stuff, but I refrained from coding that because I
 wouldn't be able to test it

That's fine for now IMO. I think there's only some minimal GRH support
now elsewhere in the diags and this needs more work in general.

 -- InfiniBand isn't going to evolve beyond a
 single subnet any time toon, is it?

There's ongoing work in this area but not sure how soon is soon...

 Also, are the offsets above correct ?

 Yes, they are, I tested. The ClassPortInfo data starts at offset 64 in the
 MAD, and I didn't find a constant for this in mad.h.

 Depending on which GS classes are to be supported for redirection, we
 may want to do something similar to the rmpp equivalent of this
 routine too.

 The spec says in 13.5.2 that The SA as well as each GSA may individually
 support this mechanism or not, so we should probably be prepared for any GS
 class to redirect. I don't care much about RMPP, though, so I left it alone.

Understood.

-- Hal

 Regards,
  Joachim




___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


Re: [ofa-general] Re: [ewg] [PATCH] libibmad: Handle MAD redirection

2009-06-30 Thread Hal Rosenstock
On Tue, Jun 30, 2009 at 2:00 PM, Jason
Gunthorpejguntho...@obsidianresearch.com wrote:
 On Tue, Jun 30, 2009 at 02:04:03PM +0200, Joachim Fenkes wrote:
 On Tuesday 30 June 2009 00:01, Hal Rosenstock wrote:
  On Mon, Jun 29, 2009 at 8:10 AM, Joachim Fenkesfen...@de.ibm.com wrote:

   Previously, libibmad reacted to GSI MAD responses with a redirect 
   status
   by throwing an error. IBM eHCA adapters use redirection, so most
   infiniband_diags tools didn't work against eHCA.
 
  Are there GS classes other than PerfMgt which would be redirected by eHCA ?

 Not right now, no. If you're interested in the details of how and when the
 eHCA driver redirects, please have a look at 
 drivers/infiniband/hw/ehca/ehca_sqp.c.

 Hmm.. That definately doesn't look right overall. You are not forming
 the redirect reply in a way that will work with all possible
 fabrics. You can't just return a 0 SL and the default PKey and assume
 things will work out.

 It looks like all you want to do is redirect to a different QPN? If so
 I recommend you copy all the values from the incoming MAD's LRH and,
 if present, GRH into the ClassPortInfo reply. Copy the PKey too.

 This way you have the best chance of sending back the right information.

Agreed.

 If there is no GRH then you can use GID index 0 and a 0 TC and a 0
 FL. According to the spec returning the port GID is NOT optional.

These are not needed when using LID based redirection (see
ClassPortInfo RedirectLID description).

-- Hal

   +            /* update dport for next request and retry */
   +            dport-lid = mad_get_field(mad, 64, IB_CPI_REDIRECT_LID_F);
   +            dport-qp = mad_get_field(mad, 64, IB_CPI_REDIRECT_QP_F);
   +            dport-qkey = mad_get_field(mad, 64, 
   IB_CPI_REDIRECT_QKEY_F);

 This code sould also check for 0 LID and bail.

 Jason

___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


Re: [ofa-general] Re: [ewg] [PATCH] libibmad: Handle MAD redirection

2009-06-30 Thread Jason Gunthorpe
On Tue, Jun 30, 2009 at 02:37:26PM -0400, Hal Rosenstock wrote:

  If there is no GRH then you can use GID index 0 and a 0 TC and a 0
  FL. According to the spec returning the port GID is NOT optional.
 
 These are not needed when using LID based redirection (see
 ClassPortInfo RedirectLID description).

Hmm?

RedirectGID RO 128 64 The GID a requester shall use as the destination
  GID in the GRH of messages used to access redirected class
  services. If redirection is not being 
  performed, this shall be set to zero.

They *might* not be used when the LID is returned, but they still
must be set.

-- 
Jason Gunthorpe jguntho...@obsidianresearch.com(780)4406067x832
Chief Technology Officer, Obsidian Research Corp Edmonton, Canada
___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


Re: [ofa-general] Re: [ewg] [PATCH] libibmad: Handle MAD redirection

2009-06-30 Thread Hal Rosenstock
On Tue, Jun 30, 2009 at 2:53 PM, Jason
Gunthorpejguntho...@obsidianresearch.com wrote:
 On Tue, Jun 30, 2009 at 02:37:26PM -0400, Hal Rosenstock wrote:

  If there is no GRH then you can use GID index 0 and a 0 TC and a 0
  FL. According to the spec returning the port GID is NOT optional.

 These are not needed when using LID based redirection (see
 ClassPortInfo RedirectLID description).

 Hmm?

 RedirectGID RO 128 64 The GID a requester shall use as the destination
                      GID in the GRH of messages used to access redirected 
 class
                      services. If redirection is not being
                      performed, this shall be set to zero.

 They *might* not be used when the LID is returned, but they still
 must be set.

Sure; they can all be set to 0 as eHCA is doing now. That was all I
meant to say.

-- Hal


 --
 Jason Gunthorpe jguntho...@obsidianresearch.com        (780)4406067x832
 Chief Technology Officer, Obsidian Research Corp         Edmonton, Canada

___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


[ewg] [PATCH] libibmad: Handle MAD redirection

2009-06-29 Thread Joachim Fenkes
Previously, libibmad reacted to GSI MAD responses with a redirect status
by throwing an error. IBM eHCA adapters use redirection, so most
infiniband_diags tools didn't work against eHCA.

Fix: Modify mad_rpc() so that it resends the request to the redirection
target if a redirect GS response is received. This is repeated until no
redirect response is received, allowing for multiple levels of
indirection.

The dport argument is updated with the redirection target, so subsequent
MADs will not go through the redirection process again but reach the target
directly.

Tested using perfquery between ehca, mlx4 and mthca in all possible
combinations.

Signed-off-by: Joachim Fenkes fen...@de.ibm.com
---

Vlad, please queue this patch for OFED 1.5 if Hal doesn't object -- thanks!

Regards,
  Joachim

 libibmad/include/infiniband/mad.h |9 +++
 libibmad/src/gs.c |6 +++-
 libibmad/src/rpc.c|   47 +---
 3 files changed, 45 insertions(+), 17 deletions(-)

diff --git a/libibmad/include/infiniband/mad.h 
b/libibmad/include/infiniband/mad.h
index aa27eb5..bdf5158 100644
--- a/libibmad/include/infiniband/mad.h
+++ b/libibmad/include/infiniband/mad.h
@@ -115,6 +115,8 @@ enum MAD_ATTR_ID {
 
 enum MAD_STATUS {
IB_MAD_STS_OK= (0  2),
+   IB_MAD_STS_BUSY  = (1  0),
+   IB_MAD_STS_REDIRECT  = (1  1),
IB_MAD_STS_BAD_BASE_VER_OR_CLASS = (1  2),
IB_MAD_STS_METHOD_NOT_SUPPORTED  = (2  2),
IB_MAD_STS_METHOD_ATTR_NOT_SUPPORTED = (3  2),
@@ -783,8 +785,15 @@ MAD_EXPORT int madrpc_set_timeout(int timeout);
 MAD_EXPORT struct ibmad_port *mad_rpc_open_port(char *dev_name, int dev_port,
int *mgmt_classes, int num_classes);
 MAD_EXPORT void mad_rpc_close_port(struct ibmad_port *srcport);
+
+/*
+ * On redirection, the dport argument is updated with the redirection target,
+ * so subsequent MADs will not go through the redirection process again but
+ * reach the target directly.
+ */
 MAD_EXPORT void *mad_rpc(const struct ibmad_port *srcport, ib_rpc_t * rpc,
ib_portid_t * dport, void *payload, void *rcvdata);
+
 MAD_EXPORT void *mad_rpc_rmpp(const struct ibmad_port *srcport, ib_rpc_t * rpc,
  ib_portid_t * dport, ib_rmpp_hdr_t * rmpp,
  void *data);
diff --git a/libibmad/src/gs.c b/libibmad/src/gs.c
index f3d245e..c7e4ff6 100644
--- a/libibmad/src/gs.c
+++ b/libibmad/src/gs.c
@@ -70,7 +70,8 @@ uint8_t *pma_query_via(void *rcvbuf, ib_portid_t * dest, int 
port,
rpc.datasz = IB_PC_DATA_SZ;
rpc.dataoffs = IB_PC_DATA_OFFS;
 
-   dest-qp = 1;
+   if (!dest-qp)
+   dest-qp = 1;
if (!dest-qkey)
dest-qkey = IB_DEFAULT_QP1_QKEY;
 
@@ -109,7 +110,8 @@ uint8_t *performance_reset_via(void *rcvbuf, ib_portid_t * 
dest,
rpc.timeout = timeout;
rpc.datasz = IB_PC_DATA_SZ;
rpc.dataoffs = IB_PC_DATA_OFFS;
-   dest-qp = 1;
+   if (!dest-qp)
+   dest-qp = 1;
if (!dest-qkey)
dest-qkey = IB_DEFAULT_QP1_QKEY;
 
diff --git a/libibmad/src/rpc.c b/libibmad/src/rpc.c
index 07b623d..c809060 100644
--- a/libibmad/src/rpc.c
+++ b/libibmad/src/rpc.c
@@ -189,27 +189,44 @@ void *mad_rpc(const struct ibmad_port *port, ib_rpc_t * 
rpc,
int status, len;
uint8_t sndbuf[1024], rcvbuf[1024], *mad;
int timeout, retries;
+   int redirect = 1;
 
-   len = 0;
-   memset(sndbuf, 0, umad_size() + IB_MAD_SIZE);
+   while (redirect) {
+   len = 0;
+   memset(sndbuf, 0, umad_size() + IB_MAD_SIZE);
 
-   if ((len = mad_build_pkt(sndbuf, rpc, dport, 0, payload))  0)
-   return 0;
+   if ((len = mad_build_pkt(sndbuf, rpc, dport, 0, payload))  0)
+   return 0;
 
-   timeout = rpc-timeout ? rpc-timeout :
-   port-timeout ? port-timeout : madrpc_timeout;
-   retries = port-retries ? port-retries : madrpc_retries;
+   timeout = rpc-timeout ? rpc-timeout :
+   port-timeout ? port-timeout : madrpc_timeout;
+   retries = port-retries ? port-retries : madrpc_retries;
 
-   if ((len = _do_madrpc(port-port_id, sndbuf, rcvbuf,
- port-class_agents[rpc-mgtclass],
- len, timeout, retries))  0) {
-   IBWARN(_do_madrpc failed; dport (%s), portid2str(dport));
-   return 0;
-   }
+   if ((len = _do_madrpc(port-port_id, sndbuf, rcvbuf,
+ port-class_agents[rpc-mgtclass],
+ len, timeout, retries))  0) {
+   IBWARN(_do_madrpc failed; dport (%s), 
portid2str(dport));
+   return 0;
+   }
 
-   mad = 

Re: [ewg] [PATCH] libibmad: Handle MAD redirection

2009-06-29 Thread Hal Rosenstock
On Mon, Jun 29, 2009 at 8:10 AM, Joachim Fenkesfen...@de.ibm.com wrote:
 Previously, libibmad reacted to GSI MAD responses with a redirect status
 by throwing an error. IBM eHCA adapters use redirection, so most
 infiniband_diags tools didn't work against eHCA.

Are there GS classes other than PerfMgt which would be redirected by eHCA ?

 Fix: Modify mad_rpc() so that it resends the request to the redirection
 target if a redirect GS response is received. This is repeated until no
 redirect response is received, allowing for multiple levels of
 indirection.

 The dport argument is updated with the redirection target, so subsequent
 MADs will not go through the redirection process again but reach the target
 directly.

 Tested using perfquery between ehca, mlx4 and mthca in all possible
 combinations.

 Signed-off-by: Joachim Fenkes fen...@de.ibm.com
 ---

 Vlad, please queue this patch for OFED 1.5 if Hal doesn't object -- thanks!

See other comments below.

-- Hal

 Regards,
  Joachim

  libibmad/include/infiniband/mad.h |    9 +++
  libibmad/src/gs.c                 |    6 +++-
  libibmad/src/rpc.c                |   47 +---
  3 files changed, 45 insertions(+), 17 deletions(-)

 diff --git a/libibmad/include/infiniband/mad.h 
 b/libibmad/include/infiniband/mad.h
 index aa27eb5..bdf5158 100644
 --- a/libibmad/include/infiniband/mad.h
 +++ b/libibmad/include/infiniband/mad.h
 @@ -115,6 +115,8 @@ enum MAD_ATTR_ID {

  enum MAD_STATUS {
        IB_MAD_STS_OK                        = (0  2),
 +       IB_MAD_STS_BUSY                      = (1  0),
 +       IB_MAD_STS_REDIRECT                  = (1  1),
        IB_MAD_STS_BAD_BASE_VER_OR_CLASS     = (1  2),
        IB_MAD_STS_METHOD_NOT_SUPPORTED      = (2  2),
        IB_MAD_STS_METHOD_ATTR_NOT_SUPPORTED = (3  2),
 @@ -783,8 +785,15 @@ MAD_EXPORT int madrpc_set_timeout(int timeout);
  MAD_EXPORT struct ibmad_port *mad_rpc_open_port(char *dev_name, int dev_port,
                        int *mgmt_classes, int num_classes);
  MAD_EXPORT void mad_rpc_close_port(struct ibmad_port *srcport);
 +
 +/*
 + * On redirection, the dport argument is updated with the redirection target,
 + * so subsequent MADs will not go through the redirection process again but
 + * reach the target directly.
 + */
  MAD_EXPORT void *mad_rpc(const struct ibmad_port *srcport, ib_rpc_t * rpc,
                        ib_portid_t * dport, void *payload, void *rcvdata);
 +
  MAD_EXPORT void *mad_rpc_rmpp(const struct ibmad_port *srcport, ib_rpc_t * 
 rpc,
                              ib_portid_t * dport, ib_rmpp_hdr_t * rmpp,
                              void *data);
 diff --git a/libibmad/src/gs.c b/libibmad/src/gs.c
 index f3d245e..c7e4ff6 100644
 --- a/libibmad/src/gs.c
 +++ b/libibmad/src/gs.c
 @@ -70,7 +70,8 @@ uint8_t *pma_query_via(void *rcvbuf, ib_portid_t * dest, 
 int port,
        rpc.datasz = IB_PC_DATA_SZ;
        rpc.dataoffs = IB_PC_DATA_OFFS;

 -       dest-qp = 1;
 +       if (!dest-qp)
 +               dest-qp = 1;

Is this change part of this patch or unrelated/separate ?

        if (!dest-qkey)
                dest-qkey = IB_DEFAULT_QP1_QKEY;

 @@ -109,7 +110,8 @@ uint8_t *performance_reset_via(void *rcvbuf, ib_portid_t 
 * dest,
        rpc.timeout = timeout;
        rpc.datasz = IB_PC_DATA_SZ;
        rpc.dataoffs = IB_PC_DATA_OFFS;
 -       dest-qp = 1;
 +       if (!dest-qp)
 +               dest-qp = 1;

Same as above.

        if (!dest-qkey)
                dest-qkey = IB_DEFAULT_QP1_QKEY;

 diff --git a/libibmad/src/rpc.c b/libibmad/src/rpc.c
 index 07b623d..c809060 100644
 --- a/libibmad/src/rpc.c
 +++ b/libibmad/src/rpc.c
 @@ -189,27 +189,44 @@ void *mad_rpc(const struct ibmad_port *port, ib_rpc_t * 
 rpc,
        int status, len;
        uint8_t sndbuf[1024], rcvbuf[1024], *mad;
        int timeout, retries;
 +       int redirect = 1;

 -       len = 0;
 -       memset(sndbuf, 0, umad_size() + IB_MAD_SIZE);
 +       while (redirect) {
 +               len = 0;
 +               memset(sndbuf, 0, umad_size() + IB_MAD_SIZE);

 -       if ((len = mad_build_pkt(sndbuf, rpc, dport, 0, payload))  0)
 -               return 0;
 +               if ((len = mad_build_pkt(sndbuf, rpc, dport, 0, payload))  0)
 +                       return 0;

 -       timeout = rpc-timeout ? rpc-timeout :
 -           port-timeout ? port-timeout : madrpc_timeout;
 -       retries = port-retries ? port-retries : madrpc_retries;
 +               timeout = rpc-timeout ? rpc-timeout :
 +                       port-timeout ? port-timeout : madrpc_timeout;
 +               retries = port-retries ? port-retries : madrpc_retries;

 -       if ((len = _do_madrpc(port-port_id, sndbuf, rcvbuf,
 -                             port-class_agents[rpc-mgtclass],
 -                             len, timeout, retries))  0) {
 -               IBWARN(_do_madrpc failed; dport (%s), portid2str(dport));
 -               return 0;
 -       }
 +               if ((len =