On 12/23/2015 11:35 AM, Matan Barak wrote:
> On Wed, Dec 23, 2015 at 6:08 PM, Doug Ledford <dledf...@redhat.com> wrote:
>> On 12/22/2015 02:26 PM, Matan Barak wrote:
>>> On Tue, Dec 22, 2015 at 8:58 PM, Doug Ledford <dledf...@redhat.com> wrote:
>>>> On 12/22/2015 05:47 AM, Or Gerlitz wrote:
>>>>> On 12/21/2015 5:01 PM, Matan Barak wrote:
>>>>>> Previously, cma_match_net_dev called cma_protocol_roce which
>>>>>> tried to verify that the IB device uses RoCE protocol. However,
>>>>>> if rdma_id didn't have a bound port, it used the first port
>>>>>> of the device.
>>>>>>
>>>>>> In VPI systems, the first port might be an IB port while the second
>>>>>> one could be an Ethernet port. This made requests for unbound rdma_ids
>>>>>> that come from the Ethernet port fail.
>>>>>> Fix this by passing the port of the request and checking this port
>>>>>> of the device.
>>>>>>
>>>>>> Fixes: b8cab5dab15f ('IB/cma: Accept connection without a valid netdev
>>>>>> on RoCE')
>>>>>> Signed-off-by: Matan Barak<mat...@mellanox.com>
>>>>>
>>>>> seems that the patch is missing from patchworks, I can't explain that.
>>>>
>>>> I've already downloaded it and marked it accepted.
>>>>
>>>
>>> Thanks Doug. Would you like me to repost the patch with the commit
>>> message changed as Or suggested, or is the current version good enough?
>>>
>>> Regarding the Ethernet loopback issue, I started looking into that,
>>> but as Or stated, it's broken even before the RoCE patches.
>>
>> Ping.  Any progress on this?
> 
> Yeah, there's some progress - the basic problem is that we don't have
> a bound ndev and thus cma_resolve_iboe_route returns -ENODEV.

Which makes sense considering that 127.0.0.1 doesn't belong to any of
the devs.

> The root cause for this is that we have to store the ndev in
> cma_bind_loopback. Even after doing that, cma_set_loopback changes the
> sgid to be the localhost GID, which doesn't exist in the GID table and
> thus will fail later in the GID lookup.

Again, makes sense.

> I think that regarding loopback, we actually want to send the data on
> the link local default GID,

Which link local default GID?  If you have more than one port or card,
then that is not a unique value.

> which is guaranteed to exist.

And in many cases, multiple times.

> That's why I
> think we should:
> 1. Change the cma_src_addr and cma_dst_addr in cma_bind_loopback to be
> the default GID.
> 2. Store the associated ndev of this default GID as the bound device.
> 3. In cma_resolve_loopback, get the MAC of this bound device and
> store it as the DMAC.
> 4. In cma_resolve_iboe_route, don't try to do route resolve if the
> dGID matches the default GID.
> 
> It's still not working though, but this is where I'm headed. What do you 
> think?

Let's punt this until later.  It only affects the situation when you use
127.0.0.1 as the address.  If you use the local IP address of a specific
interface, you get the same loopback behavior, but no failures (and on
top of that instead of getting a random device to handle the loopback
transfer, you get a specific device of your choosing).  To me, that
qualifies as a reasonable workaround.  The 127.0.0.1 behavior has been
broken for a while (and I'm not sure it should have ever been relied
upon anyway), so I don't think we have to hold things up.
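
For anyone who wants to apply the workaround from userspace in the
meantime, here is a minimal sketch of a check that flags a loopback
address before it is handed to the CM, so the caller can substitute the
IP of a specific interface.  This is an illustration only: the helper
name is_ipv4_loopback is made up here and is not part of the kernel or
of any RDMA library, and it only covers IPv4.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <sys/socket.h>

/*
 * Hypothetical helper, not from the kernel or librdmacm.
 * Returns 1 if addr is an IPv4 address in 127.0.0.0/8, 0 if it is some
 * other valid IPv4 address, and -1 if it does not parse as IPv4.
 */
static int is_ipv4_loopback(const char *addr)
{
	struct in_addr in;

	if (inet_pton(AF_INET, addr, &in) != 1)
		return -1;

	/* s_addr is in network byte order; the top octet selects the /8. */
	return (ntohl(in.s_addr) >> 24) == 127;
}
```

A caller that gets 1 back would pick the address of the interface it
actually wants the loopback traffic to go through and use that instead.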

-- 
Doug Ledford <dledf...@redhat.com>
              GPG KeyID: 0E572FDD

