Michael,
Do you have any idea what the problem with the Arbel driver could be?

Itay

On Mon, Jul 12, 2010 at 3:18 AM, M Lowe <ml...@shaw.ca> wrote:

> I have now been able to log the debug messages; however, I see no errors
> that would indicate where the problem is.
>
> Just to recap quickly, the problem is that san-booting over InfiniBand
> using SRP doesn't work and just times out. The timeout occurs while
> waiting for a response to the SRP login request. I'm fairly certain the
> problem lies within gPXE because I can access the SRP target just fine
> through a local installation of Windows. In addition, on the SRP target
> side I have traced through the ib_srpt module and found that a login
> response is generated and sent (or at least posted to the mthca module
> work queue).
>
> On the gPXE side I've found that I'm not receiving the SRP_LOGIN_RSP
> packet even at the InfiniBand protocol level (net/infiniband.c). So far
> I have been able to determine the packet is lost at some point in the
> Arbel driver (drivers/infiniband/arbel.c) before arbel_complete(). This
> would indicate the problem exists within the Arbel driver and explains
> why SRP sanboot worked with the Hermon driver. Despite compiling with
> DEBUG=arbel:3 I get no errors indicating there are any problems or
> dropped packets.
>
> Here is the output from autoboot with
> DEBUG=srp,ipoib,arp,infiniband,ib_cm,ib_cmrc,ib_mcast,ib_mi,ib_packet,ib_pathrec,ib_sma,ib_smc,ib_srp
>
> Note: I have added some debug messages to help illustrate the flow of
> packets. At the beginning of ipoib_complete_recv, ib_complete_recv, and
> ib_mi_complete_recv I have added "RX" debug messages.
>
> Booting from root path "ib_srp::::fe800000000000000002c9020022e5e5::0002c9020022e5e4::0002c9020022e5e4:0002c9020022e5e4"
> SRP 0xbb134 using ib_srp::::fe800000000000000002c9020022e5e5::0002c9020022e5e4::0002c9020022e5e4:0002c9020022e5e4
> SRP attached successfully
> IBDEV 0xb9a84 creating completion queue
> IBDEV 0xb9a84 created 8-entry completion queue 0xbb4c4 (0xbb214) with CQN 0x83
> IBDEV 0xb9a84 creating queue pair
> IBDEV 0xb9a84 created queue pair 0xbb4f4 (0xbb5c4) with QPN 0x550403
> IBDEV 0xb9a84 QPN 0x550403 has 4 send entries at [0xbb5a0,0xbb5b0)
> IBDEV 0xb9a84 QPN 0x550403 has 2 receive entries at [0xbb5b0,0xbb5b8)
> CMRC 0xbb1b4 using QPN 550403
> SRP 0xbb134 TX login request tag 0000000000000001
> CM 0xbbb64 created for IBDEV 0xb9a84 QPN 550403
> CM 0xbbb64 connecting to fe800000:00000000:0002c902:0022e5e5 0002c902:0022e5e4
> MI 0xba564 TX TID 6750584500000003 (03,02,01,0035) status 0000
> infiniband RX
> MI 0xba564 RX
> MI 0xba564 RX TID 6750584500000003 (03,02,81,0035) status 0000
> IBDEV 0xb9a84 path to fe800000:00000000:0002c902:0022e5e5 is 0007 sl 0 rate 6
> MI 0xba564 TX TID 6750584500000004 (07,02,03,0010) status 0000
> MI 0xba564 TX TID 6750584500000004 (07,02,03,0010) status 0000
> MI 0xba564 TX TID 6750584500000004 (07,02,03,0010) status 0000
> MI 0xba564 TX TID 6750584500000004 (07,02,03,0010) status 0000
> infiniband RX
> IPoIB 0xb9ccc RX
> ARP cache add: IP 10.20.76.1 => IPoIB 80000404:fe800000:00000000:0002c902:0022e5e5
> ARP reply: IP 10.20.76.45 => IPoIB 00550402:fe800000:00000000:0002c902:00243035
> IPoIB peer 4 has MAC 80000404:fe800000:00000000:0002c902:0022e5e5
> MI 0xba564 TX TID 6750584500000005 (03,02,01,0035) status 0000
> infiniband RX
> MI 0xba564 RX
> MI 0xba564 RX TID 6750584500000005 (03,02,81,0035) status 0000
> MI 0xba564 RX TID 6750584500000005 handling via transaction handler
> IBDEV 0xb9a84 path to fe800000:00000000:0002c902:0022e5e5 is 0007 sl 0 rate 6
> infiniband RX
> IPoIB 0xb9ccc RX
> ARP cache update: IP 10.20.76.1 => IPoIB 80000404:fe800000:00000000:0002c902:0022e5e5
> ARP reply: IP 10.20.76.45 => IPoIB 00550402:fe800000:00000000:0002c902:00243035
> MI 0xba564 TX TID 6750584500000004 (07,02,03,0010) status 0000
> MI 0xba564 abandoning TID 6750584500000004
> CM 0xbbb64 connection request failed: Connection timed out (0x4c206035)
> CMRC 0xbb1b4 disconnected: Connection timed out (0x4c206035)
> SRP 0xbb134 socket closed: Connection timed out (0x4c206035)
>
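One way to read the trace above is to pair each "MI ... TX TID" line with an "MI ... RX TID" line carrying the same transaction ID. The sketch below does that in Python. It assumes only the line format visible in the trace (`MI <pointer> TX|RX TID <16 hex digits> (...)`); the function name `unanswered_tids` is made up for illustration and is not part of gPXE.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   MI 0xba564 TX TID 6750584500000004 (07,02,03,0010) status 0000
# Lines without the "(...)" tuple (for example "RX TID ... handling via
# transaction handler") are deliberately not matched, so each received
# MAD is counted only once.
MI_LINE = re.compile(r"MI\s+\S+\s+(TX|RX)\s+TID\s+([0-9a-f]{16})\s+\(")


def unanswered_tids(log_text):
    """Return {tid: transmit_count} for TIDs that were sent but never received."""
    tx_counts = defaultdict(int)
    rx_seen = set()
    for line in log_text.splitlines():
        match = MI_LINE.search(line)  # search() tolerates a leading "> " quote prefix
        if match is None:
            continue
        direction, tid = match.groups()
        if direction == "TX":
            tx_counts[tid] += 1
        else:
            rx_seen.add(tid)
    return {tid: count for tid, count in tx_counts.items() if tid not in rx_seen}
```

Run over the trace above, this should single out TID 6750584500000004, which is transmitted repeatedly and then abandoned, while the TIDs ending in 03 and 05 each have a matching RX line. That is consistent with the connection request never being answered on the gPXE side.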
>
>
> From: Itay Gazit [mailto:itayga...@gmail.com]
> Sent: Friday, June 25, 2010 11:47 AM
> To: Stefan Hajnoczi; M Lowe
> Cc: etherboot-disc...@lists.sourceforge.net; gpxe; Michael Brown
> Subject: Re: [Etherboot-discuss] SRP timeout
>
> Hi Matthew,
> Stefan is right, you should reduce the depth of the DEBUG messages to find
> the cause of the failure.
> I have tried SRP boot only with the Hermon driver (ConnectX), and it worked
> for me.
> Regards,
> Itay
>
_______________________________________________
gPXE mailing list
gPXE@etherboot.org
http://etherboot.org/mailman/listinfo/gpxe
