Re: [PATCH for-next V2 00/11] Add RoCE v2 support

2015-12-15 Thread Moni Shoua
> The part that bothers me about this is that this statement makes sense
> when just thinking about the spec, as you say.  However, once you
> consider namespaces, security implications make this statement spec
> compliant, but still unacceptable.  The spec itself is silent on
> namespaces.  But, you guys wanted, and you got, namespace support.
> Since that's beyond spec, and carries security requirements, I think
> it's fair to say that from now on, the Linux kernel RDMA stack can no
> longer *just* be spec compliant.  There are additional concerns that
> must always be addressed with new changes, and those are the namespace
> constraint preservation concerns.

I can't object to that but I really would like to get an example of a
security risk.
So far, besides hearing that the way we choose to handle completions
is wrong, I didn't get a convincing example of how where it doesn't
work. Moreover, regarding security, all we wanted is for HW to report
the L3 protocol (IB, IPv4, or IPv6) in the packet. This is data that
with some extra CPU cycles can be obtained from the 40 bytes that are
scattered to the receive bufs anyway. So, if there is a security hole
it exists from day one of the IB stack and this is not the time we
should insist on fixing it.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RESEND] infiniband:core:Add needed error path in cm_init_av_by_path

2015-12-15 Thread ira.weiny
On Tue, Dec 15, 2015 at 10:30:22AM -0700, Jason Gunthorpe wrote:
> On Tue, Dec 15, 2015 at 05:38:34PM +0100, Michael Wang wrote:
> > The hop_limit is only suggest that the package allowed to be
> > routed, not have to, correct?
> 
> If the hop limit is >= 2 (?) then the GRH is mandatory.

Yes >= 2.

"Setting this value to 0 or 1 will ensure that the packet 
will not be forwarded beyond the local subnet."

-- IBTA 8.3.6 Hop Limit

Ira

> The
> SM will return this information in the PathRecord if it determines a
> GRH is required. The whole stack follows this protocol.
> 
> The GRH is optional for in-subnet communications.
> 
> Jason
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: device attr cleanup

2015-12-15 Thread Or Gerlitz

On 12/15/2015 9:03 PM, Doug Ledford wrote:


Or, you specifically asked me to wait until this week.  I made my
initial impressions clear (I don't necessarily like the removal of the
attr struct, but I like the removal of all of the query calls, and I'm
inclined to take the patch in spite of not liking the removal of the
struct).  Do you have anything to add or have we beat this horse to death?


Hi Doug,

Lets stop beating, both horses and people.

I do understand that

1. you don't link the removal of the attr
2. you do like the removal of all the query calls

I am proposing to take the path of a patch that
does exactly #2 while avoiding #1.

What's wrong with that? I haven't heard any reasoning for why its
so good to stash ~50 new fields on the IB device structure except
for the author saying that other subsystems do that and other people
saying they are in favor of this approach while not providing any
reasoning, except for maybe something on bikes.

Why you or anyone else has to be from now and ever the cache line police
making sure that people don't add new attributes in random locations
over the IB device structure?

What's wrong with putting fifty attributesin a structure which is a field
of the device struct and have people go there to see what are the d
ifferentattrs and add news ones there?

This will make the 4.5 merge window extremely complex or even totally
threatened  w.r.t to the RDMA subsystem and related drivers by 3.3K LOC 
patch.


Sorry, but, I still don't get it.

Or.


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH for-next V2 00/11] Add RoCE v2 support

2015-12-15 Thread Jason Gunthorpe
On Tue, Dec 15, 2015 at 04:45:21PM -0500, Doug Ledford wrote:

> In particular, Liran piped up with this comment:
> 
> "Also, I don't want to do any route resolution on the Rx path. A UD QP
> completion just reports the details of the packet it received.
> 
> Conceptually, an incoming packet may not even match an SGID index at
> all.  Maybe, responses should be sent from another device. This should
> not be decided that the point that a packet was received."
> 
> The part that bothers me about this is that this statement makes sense
> when just thinking about the spec, as you say.  However, once you
> consider namespaces, security implications make this statement spec
> compliant, but still unacceptable.

This is where I got to and decided there is no desire to make a
correct patch for this stuff.

You are completely correct in your comments above.

> Since you and Jason did not reach a consensus, I have to dig in and see
> if these patches make it possible to break namespace confinement, either
> accidentally or with intentionally tricky behavior.  That's going to
> take me some time.

Everything to do with parsing a wc and constructing an AH is wrong in
this series, and the fixes require the API changes I mentioned ( add
'wc to gid index' API call, add 'route to AH' API call)

Every time you read 'route validation' - that is an error, the route
should never just be validated, it is needed information to construct
a rocev2 AH. All the places that roughly hand parse the rocev2 WC
should not be open coded.

Even if current HW is broken for namespaces we should not enshrine that
in the kapi.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] IB core: Display 64 bit counters from the extended set

2015-12-15 Thread Doug Ledford
On 12/15/2015 04:42 PM, Hal Rosenstock wrote:
> On 12/15/2015 4:20 PM, Jason Gunthorpe wrote:
>>> The unicast/multicast extended counters are not always supported -
 depends on setting of PerfMgt ClassPortInfo
 CapabilityMask.IsExtendedWidthSupportedNoIETF (bit 10).
> 
>> Yes.. certainly this proposed patch needs to account for that and
>> continue to use the 32 bit ones in that case.
> 
> There are no 32 bit equivalents of those 4 "IETF" counters ([uni
> multi]cast [xmit rcv] pkts).
> 
> When not supported, perhaps it is best not to populate these counters in
> sysfs so one can discern between counter not supported and 0 value.
> 
> I'm still working on definitive mthca answer but think the attribute is
> not supported there. Does anyone out there have an mthca setup where
> they can try this ?

Yes.


-- 
Doug Ledford 
  GPG KeyID: 0E572FDD




signature.asc
Description: OpenPGP digital signature


Re: [PATCH for-next V2 00/11] Add RoCE v2 support

2015-12-15 Thread Doug Ledford
On 12/15/2015 02:15 AM, Moni Shoua wrote:
> On Thu, Dec 3, 2015 at 3:47 PM, Matan Barak  wrote:
>> Hi Doug,
>>
>> This series adds the support for RoCE v2. In order to support RoCE v2,
>> we add gid_type attribute to every GID. When the RoCE GID management
>> populates the GID table, it duplicates each GID with all supported types.
>> This gives the user the ability to communicate over each supported
>> type.
>>
>> Patch 0001, 0002 and 0003 add support for multiple GID types to the
>> cache and related APIs. The third patch exposes the GID attributes
>> information is sysfs.
>>
>> Patch 0004 adds the RoCE v2 GID type and the capabilities required
>> from the vendor in order to implement RoCE v2. These capabilities
>> are grouped together as RDMA_CORE_PORT_IBA_ROCE_UDP_ENCAP.
>>
>> RoCE v2 could work at IPv4 and IPv6 networks. When receiving ib_wc, this
>> information should come from the vendor's driver. In case the vendor
>> doesn't supply this information, we parse the packet headers and resolve
>> its network type. Patch 0005 adds this information and required utilities.
>>
>> Patches 0006 and 0007 adds route validation. This is mandatory to ensure
>> that we send packets using GIDS which corresponds to a net-device that
>> can be routed to the destination.
>>
>> Patches 0008 and 0009 add configfs support (and the required
>> infrastructure) for CMA. The administrator should be able to set the
>> default RoCE type. This is done through a new per-port
>> default_roce_mode configfs file.
>>
>> Patch 0010 formats a QP1 packet in order to support RoCE v2 CM
>> packets. This is required for vendors which implement their
>> QP1 as a Raw QP.
>>
>> Patch 0011 adds support for IPv4 multicast as an IPv4 network
>> requires IGMP to be sent in order to join multicast groups.
>>
>> Vendors code aren't part of this patch-set. Soft-Roce will be
>> sent soon and depends on these patches. Other vendors, like
>> mlx4, ocrdma and mlx5 will follow.
>>
>> This patch is applied on "Change per-entry locks in GID cache to table lock"
>> which was sent to the mailing list.
>>
>> Thanks,
>> Matan
>>
>> Changed from V1:
>>  - Rebased against Linux 4.4-rc2 master branch.
>>  - Add route validation
>>  - ConfigFS - avoid compiling INFINIBAND=y and CONFIGFS_FS=m
>>  - Add documentation for configfs and sysfs ABI
>>  - Remove ifindex and gid_type from mcmember
>>
>> Changes from V0:
>>  - Rebased patches against Doug's latest k.o/for-4.4 tree.
>>  - Fixed a bug in configfs (rmdir caused an incorrect free).
>>
>> Matan Barak (8):
>>   IB/core: Add gid_type to gid attribute
>>   IB/cm: Use the source GID index type
>>   IB/core: Add gid attributes to sysfs
>>   IB/core: Add ROCE_UDP_ENCAP (RoCE V2) type
>>   IB/core: Move rdma_is_upper_dev_rcu to header file
>>   IB/core: Validate route in ib_init_ah_from_wc and ib_init_ah_from_path
>>   IB/rdma_cm: Add wrapper for cma reference count
>>   IB/cma: Add configfs for rdma_cm
>>
>> Moni Shoua (2):
>>   IB/core: Initialize UD header structure with IP and UDP headers
>>   IB/cma: Join and leave multicast groups with IGMP
>>
>> Somnath Kotur (1):
>>   IB/core: Add rdma_network_type to wc
>>
>>  Documentation/ABI/testing/configfs-rdma_cm   |  22 ++
>>  Documentation/ABI/testing/sysfs-class-infiniband |  16 ++
>>  drivers/infiniband/Kconfig   |   9 +
>>  drivers/infiniband/core/Makefile |   2 +
>>  drivers/infiniband/core/addr.c   | 185 +
>>  drivers/infiniband/core/cache.c  | 169 
>>  drivers/infiniband/core/cm.c |  31 ++-
>>  drivers/infiniband/core/cma.c| 261 --
>>  drivers/infiniband/core/cma_configfs.c   | 321 
>> +++
>>  drivers/infiniband/core/core_priv.h  |  45 
>>  drivers/infiniband/core/device.c |  10 +-
>>  drivers/infiniband/core/multicast.c  |  17 +-
>>  drivers/infiniband/core/roce_gid_mgmt.c  |  81 --
>>  drivers/infiniband/core/sa_query.c   |  76 +-
>>  drivers/infiniband/core/sysfs.c  | 184 -
>>  drivers/infiniband/core/ud_header.c  | 155 ++-
>>  drivers/infiniband/core/uverbs_marshall.c|   1 +
>>  drivers/infiniband/core/verbs.c  | 170 ++--
>>  drivers/infiniband/hw/mlx4/qp.c  |   7 +-
>>  drivers/infiniband/hw/mthca/mthca_qp.c   |   2 +-
>>  drivers/infiniband/hw/ocrdma/ocrdma_ah.c |   2 +-
>>  include/rdma/ib_addr.h   |  11 +-
>>  include/rdma/ib_cache.h  |   4 +
>>  include/rdma/ib_pack.h   |  45 +++-
>>  include/rdma/ib_sa.h |   3 +
>>  include/rdma/ib_verbs.h  |  78 +-
>>  26 files changed, 1704 insertions(+), 203 deletions(-)
>>  create mode 100644 Documentation/ABI/testing/configfs-rdma

Re: [PATCH 3/3] IB core: Display 64 bit counters from the extended set

2015-12-15 Thread Hal Rosenstock
On 12/15/2015 4:20 PM, Jason Gunthorpe wrote:
>> The unicast/multicast extended counters are not always supported -
>> > depends on setting of PerfMgt ClassPortInfo
>> > CapabilityMask.IsExtendedWidthSupportedNoIETF (bit 10).

> Yes.. certainly this proposed patch needs to account for that and
> continue to use the 32 bit ones in that case.

There are no 32 bit equivalents of those 4 "IETF" counters ([uni
multi]cast [xmit rcv] pkts).

When not supported, perhaps it is best not to populate these counters in
sysfs so one can discern between counter not supported and 0 value.

I'm still working on definitive mthca answer but think the attribute is
not supported there. Does anyone out there have an mthca setup where
they can try this ?
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] IB core: Display 64 bit counters from the extended set

2015-12-15 Thread Jason Gunthorpe
On Tue, Dec 15, 2015 at 03:01:03PM -0500, Hal Rosenstock wrote:
> > I would agree, from my observation, that the two main byte counters
> > are always available.
> 
> The extended packet counts work but I thought there was a PMA with one
> of the extended byte counts wired to 0.

Can't remember

> > But not necessarily the others.
> 
> The unicast/multicast extended counters are not always supported -
> depends on setting of PerfMgt ClassPortInfo
> CapabilityMask.IsExtendedWidthSupportedNoIETF (bit 10).

Yes.. certainly this proposed patch needs to account for that and
continue to use the 32 bit ones in that case.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] IB core: Display 64 bit counters from the extended set

2015-12-15 Thread Hal Rosenstock
On 12/15/2015 2:55 PM, Jason Gunthorpe wrote:
> On Tue, Dec 15, 2015 at 01:51:35PM -0600, Christoph Lameter wrote:
>> On Mon, 14 Dec 2015, Hal Rosenstock wrote:
>>
 Mellanox should really confirm this for their hardware matrix.
>>>
>>> I am trying to get definitive answer to this.
>>
>> I was told today on a conf call with a couple of Mellanox employees that
>> extended counters are always available.
> 
> .. and every field is valid?
> 
> I would agree, from my observation, that the two main byte counters
> are always available.

The extended packet counts work but I thought there was a PMA with one
of the extended byte counts wired to 0.

> But not necessarily the others.

The unicast/multicast extended counters are not always supported -
depends on setting of PerfMgt ClassPortInfo
CapabilityMask.IsExtendedWidthSupportedNoIETF (bit 10).

-- Hal

> Jason 
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] IB core: Display 64 bit counters from the extended set

2015-12-15 Thread Jason Gunthorpe
On Tue, Dec 15, 2015 at 01:51:35PM -0600, Christoph Lameter wrote:
> On Mon, 14 Dec 2015, Hal Rosenstock wrote:
> 
> > > Mellanox should really confirm this for their hardware matrix.
> >
> > I am trying to get definitive answer to this.
> 
> I was told today on a conf call with a couple of Mellanox employees that
> extended counters are always available.

.. and every field is valid?

I would agree, from my observation, that the two main byte counters
are always available.. But not necessarily the others.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] IB core: Display 64 bit counters from the extended set

2015-12-15 Thread Christoph Lameter
On Mon, 14 Dec 2015, Hal Rosenstock wrote:

> > Mellanox should really confirm this for their hardware matrix.
>
> I am trying to get definitive answer to this.

I was told today on a conf call with a couple of Mellanox employees that
extended counters are always available.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 00/11] NFS/RDMA client patches for 4.5

2015-12-15 Thread Anna Schumaker
Thanks, Chuck!

Everything looks okay to me, so I'll apply these patches and send them to Trond 
before the holiday.

On 12/14/2015 04:17 PM, Chuck Lever wrote:
> For 4.5, I'd like to address the send queue accounting and
> invalidation/unmap ordering issues Jason brought up a couple of
> months ago.
> 
> In preparation for Doug's final topic branch, Anna, I've rebased
> these on Christoph's ib_device_attr branch, but there were no merge
> conflicts or other changes needed. Could you begin preparing these
> for linux-next and other final testing and review?

No merge conflicts is nice, and we might not need to worry about ordering the 
pull request.

Thanks,
Anna

> 
> Also available in the "nfs-rdma-for-4.5" topic branch of this git repo:
> 
> git://git.linux-nfs.org/projects/cel/cel-2.6.git
> 
> Or for browsing:
> 
> http://git.linux-nfs.org/?p=cel/cel-2.6.git;a=log;h=refs/heads/nfs-rdma-for-4.5
> 
> 
> Changes since v2:
> - Rebased on Christoph's ib_device_attr branch
> 
> 
> Changes since v1:
> 
> - Rebased on v4.4-rc3
> - Receive buffer safety margin patch dropped
> - Backchannel pr_err and pr_info converted to dprintk
> - Backchannel spin locks converted to work queue-safe locks
> - Fixed premature release of backchannel request buffer
> - NFSv4.1 callbacks tested with for-4.5 server
> 
> ---
> 
> Chuck Lever (11):
>   xprtrdma: Fix additional uses of spin_lock_irqsave(rb_lock)
>   xprtrdma: xprt_rdma_free() must not release backchannel reqs
>   xprtrdma: Disable RPC/RDMA backchannel debugging messages
>   xprtrdma: Move struct ib_send_wr off the stack
>   xprtrdma: Introduce ro_unmap_sync method
>   xprtrdma: Add ro_unmap_sync method for FRWR
>   xprtrdma: Add ro_unmap_sync method for FMR
>   xprtrdma: Add ro_unmap_sync method for all-physical registration
>   SUNRPC: Introduce xprt_commit_rqst()
>   xprtrdma: Invalidate in the RPC reply handler
>   xprtrdma: Revert commit e7104a2a9606 ('xprtrdma: Cap req_cqinit').
> 
> 
>  include/linux/sunrpc/xprt.h|1 
>  net/sunrpc/xprt.c  |   14 +++
>  net/sunrpc/xprtrdma/backchannel.c  |   22 ++---
>  net/sunrpc/xprtrdma/fmr_ops.c  |   64 +
>  net/sunrpc/xprtrdma/frwr_ops.c |  175 
> +++-
>  net/sunrpc/xprtrdma/physical_ops.c |   13 +++
>  net/sunrpc/xprtrdma/rpc_rdma.c |   14 +++
>  net/sunrpc/xprtrdma/transport.c|3 +
>  net/sunrpc/xprtrdma/verbs.c|   13 +--
>  net/sunrpc/xprtrdma/xprt_rdma.h|   12 +-
>  10 files changed, 283 insertions(+), 48 deletions(-)
> 
> --
> Chuck Lever
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: device attr cleanup

2015-12-15 Thread Doug Ledford
On 12/08/2015 06:15 PM, Or Gerlitz wrote:
> On Wed, Dec 9, 2015 at 1:13 AM, Or Gerlitz  wrote:
>> On Wed, Dec 9, 2015 at 12:59 AM, Jason Gunthorpe
>>  wrote:
>>> On Wed, Dec 09, 2015 at 12:47:55AM +0200, Or Gerlitz wrote:
>>
 The patch is three liner to add the cached attrs --
 http://marc.info/?l=linux-rdma&m=142309296813985&w=2 -- if you are OK
 with that, I will add a 2nd patch that ports all ULPs to use the
 cached copy instead of their code which does the query.
>>
 Actually, why not start with this approach and later decide if we need
 to go further of this is enough?
>>
>>> Or, can we please stop this bikeshedding. Christoph's patch is done,
>>> well tested and does a lot more clean up that this hacky three liner.
>>
>> Christoph patch is here indeed, it does two things
>>
>> 1. remove all the ULP device attr alloc, device query, attr free hassle
>> 2. adds tons of new fields to struct ib_device
>>
>> I think it just goes too much and needlessly adds tons of these new
>> fields directly to struct ib_device where we can have them all well
>> scoped into ib_device_attr member or pointer from struct ib_device
>>
>>> It is a good patch,
>>
>> I didn't say it's bad, I said I think we can achieve #1 above with way
>> less drastic patch, and I'd like to hear the maintainer option, and
>> have the chance to argu with the maintainer if needed, yours I heard,
>> you like it, I know.
> 
> and I will be OOO for the rest of this week, since this is in the air
> for ***months***, it would be fair enough to ask the maintainer not to
> cut it this way or another before next week, Doug, agree?

Or, you specifically asked me to wait until this week.  I made my
initial impressions clear (I don't necessarily like the removal of the
attr struct, but I like the removal of all of the query calls, and I'm
inclined to take the patch in spite of not liking the removal of the
struct).  Do you have anything to add or have we beat this horse to death?


-- 
Doug Ledford 
  GPG KeyID: 0E572FDD




signature.asc
Description: OpenPGP digital signature


Re: [PATCH RESEND] infiniband:core:Add needed error path in cm_init_av_by_path

2015-12-15 Thread Jason Gunthorpe
On Tue, Dec 15, 2015 at 05:38:34PM +0100, Michael Wang wrote:
> The hop_limit is only suggest that the package allowed to be
> routed, not have to, correct?

If the hop limit is >= 2 (?) then the GRH is mandatory. The
SM will return this information in the PathRecord if it determines a
GRH is required. The whole stack follows this protocol.

The GRH is optional for in-subnet communications.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] IB/ipoib: Change sendq threshold size

2015-12-15 Thread Doug Ledford
On 11/17/2015 01:45 AM, Erez Shitrit wrote:
> On Mon, Nov 16, 2015 at 8:54 PM, Yuval Shaia  wrote:
>> Expecting half of the queue to be empty before reopening netif_queue seems
>> too high.
>> With this fix threshold will be 90%.
> 
> Is this backed by tests? why not 75% or 25%?

Much like Erez, I suspect this is a bad change.  Unless there are
performance numbers to show that under heavy usage this is actually a
positive change, I'm not inclined to take it.


-- 
Doug Ledford 
  GPG KeyID: 0E572FDD




signature.asc
Description: OpenPGP digital signature


[PATCH for-next V2 4/5] IB/mlx5: Add hca_core_clock_offset to udata in init_ucontext

2015-12-15 Thread Matan Barak
Pass hca_core_clock_offset to user-space is mandatory in order to
let the user-space read the free-running clock register from the
right offset in the memory mapped page.
Passing this value is done by changing the vendor's command
and response of init_ucontext to be in extensible form.

Signed-off-by: Matan Barak 
Reviewed-by: Moshe Lazer 
---
 drivers/infiniband/hw/mlx5/main.c| 37 
 drivers/infiniband/hw/mlx5/mlx5_ib.h |  3 +++
 drivers/infiniband/hw/mlx5/user.h| 12 ++--
 include/linux/mlx5/device.h  |  7 +--
 4 files changed, 47 insertions(+), 12 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c 
b/drivers/infiniband/hw/mlx5/main.c
index c707c43..917363c 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -582,8 +582,8 @@ static struct ib_ucontext *mlx5_ib_alloc_ucontext(struct 
ib_device *ibdev,
  struct ib_udata *udata)
 {
struct mlx5_ib_dev *dev = to_mdev(ibdev);
-   struct mlx5_ib_alloc_ucontext_req_v2 req;
-   struct mlx5_ib_alloc_ucontext_resp resp;
+   struct mlx5_ib_alloc_ucontext_req_v2 req = {};
+   struct mlx5_ib_alloc_ucontext_resp resp = {};
struct mlx5_ib_ucontext *context;
struct mlx5_uuar_info *uuari;
struct mlx5_uar *uars;
@@ -598,20 +598,19 @@ static struct ib_ucontext *mlx5_ib_alloc_ucontext(struct 
ib_device *ibdev,
if (!dev->ib_active)
return ERR_PTR(-EAGAIN);
 
-   memset(&req, 0, sizeof(req));
reqlen = udata->inlen - sizeof(struct ib_uverbs_cmd_hdr);
if (reqlen == sizeof(struct mlx5_ib_alloc_ucontext_req))
ver = 0;
-   else if (reqlen == sizeof(struct mlx5_ib_alloc_ucontext_req_v2))
+   else if (reqlen >= sizeof(struct mlx5_ib_alloc_ucontext_req_v2))
ver = 2;
else
return ERR_PTR(-EINVAL);
 
-   err = ib_copy_from_udata(&req, udata, reqlen);
+   err = ib_copy_from_udata(&req, udata, min(reqlen, sizeof(req)));
if (err)
return ERR_PTR(err);
 
-   if (req.flags || req.reserved)
+   if (req.flags)
return ERR_PTR(-EINVAL);
 
if (req.total_num_uuars > MLX5_MAX_UUARS)
@@ -620,6 +619,14 @@ static struct ib_ucontext *mlx5_ib_alloc_ucontext(struct 
ib_device *ibdev,
if (req.total_num_uuars == 0)
return ERR_PTR(-EINVAL);
 
+   if (req.comp_mask)
+   return ERR_PTR(-EOPNOTSUPP);
+
+   if (reqlen > sizeof(req) &&
+   !ib_is_udata_cleared(udata, sizeof(req),
+udata->inlen - sizeof(req)))
+   return ERR_PTR(-EOPNOTSUPP);
+
req.total_num_uuars = ALIGN(req.total_num_uuars,
MLX5_NON_FP_BF_REGS_PER_PAGE);
if (req.num_low_latency_uuars > req.total_num_uuars - 1)
@@ -635,6 +642,8 @@ static struct ib_ucontext *mlx5_ib_alloc_ucontext(struct 
ib_device *ibdev,
resp.max_send_wqebb = 1 << MLX5_CAP_GEN(dev->mdev, log_max_qp_sz);
resp.max_recv_wr = 1 << MLX5_CAP_GEN(dev->mdev, log_max_qp_sz);
resp.max_srq_recv_wr = 1 << MLX5_CAP_GEN(dev->mdev, log_max_srq_sz);
+   resp.response_length = min(offsetof(typeof(resp), response_length) +
+  sizeof(resp.response_length), udata->outlen);
 
context = kzalloc(sizeof(*context), GFP_KERNEL);
if (!context)
@@ -685,8 +694,20 @@ static struct ib_ucontext *mlx5_ib_alloc_ucontext(struct 
ib_device *ibdev,
 
resp.tot_uuars = req.total_num_uuars;
resp.num_ports = MLX5_CAP_GEN(dev->mdev, num_ports);
-   err = ib_copy_to_udata(udata, &resp,
-  sizeof(resp) - sizeof(resp.reserved));
+
+   if (field_avail(typeof(resp), reserved2, udata->outlen))
+   resp.response_length += sizeof(resp.reserved2);
+
+   if (field_avail(typeof(resp), hca_core_clock_offset, udata->outlen)) {
+   resp.comp_mask |=
+   MLX5_IB_ALLOC_UCONTEXT_RESP_MASK_CORE_CLOCK_OFFSET;
+   resp.hca_core_clock_offset =
+   offsetof(struct mlx5_init_seg, internal_timer_h) %
+   PAGE_SIZE;
+   resp.response_length += sizeof(resp.hca_core_clock_offset);
+   }
+
+   err = ib_copy_to_udata(udata, &resp, resp.response_length);
if (err)
goto out_uars;
 
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h 
b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 6333472..43b3c58 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -55,6 +55,9 @@ pr_err("%s:%s:%d:(pid %d): " format, (dev)->ib_dev.name, 
__func__,\
 pr_warn("%s:%s:%d:(pid %d): " format, (dev)->ib_dev.name, __func__,\
__LINE__, current->pid, ##arg)
 
+#define field_avail(type, fld, sz) (offsetof(type, fld) +  \
+   

[PATCH for-next V2 0/5] User-space time-stamping support for mlx5_ib

2015-12-15 Thread Matan Barak
Hi Eli,

This patch-set adds user-space support for time-stamping in mlx5_ib.
It implements the necessary API:
(a) ib_create_cq_ex - Add support for CQ creation flags
(b) ib_query_device - return timestamp_mask and hca_core_clock.

We also add support for mmaping the HCA's free running clock.
In order to do so, we use the response of the vendor's extended
part in init_ucontext. This allows us to pass the page offset
of the free running clock register to the user-space driver.
In order to implement it in a future extensible manner, we  use the
same mechanism of verbs extensions to the mlx5 vendor part as well.

Regards,
Matan

Changes from v1:
 * Change ib_is_udata_cleared to use memchr_inv.

Changes from v0:
 * Limit mmap PAGE_SIZE to 4K (security wise).
 * Optimize ib_is_udata_cleared.
 * Pass hca_core_clock_offset in the vendor's response part of init_ucontext.

Matan Barak (5):
  IB/mlx5: Add create_cq extended command
  IB/core: Add ib_is_udata_cleared
  IB/mlx5: Add support for hca_core_clock and timestamp_mask
  IB/mlx5: Add hca_core_clock_offset to udata in init_ucontext
  IB/mlx5: Mmap the HCA's core clock register to user-space

 drivers/infiniband/hw/mlx5/cq.c  |  7 
 drivers/infiniband/hw/mlx5/main.c| 67 +++-
 drivers/infiniband/hw/mlx5/mlx5_ib.h |  7 +++-
 drivers/infiniband/hw/mlx5/user.h| 12 +--
 include/linux/mlx5/device.h  |  7 ++--
 include/linux/mlx5/mlx5_ifc.h|  9 +++--
 include/rdma/ib_verbs.h  | 27 +++
 7 files changed, 120 insertions(+), 16 deletions(-)

-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH for-next V2 1/5] IB/mlx5: Add create_cq extended command

2015-12-15 Thread Matan Barak
In order to create a CQ that supports timestamp, mlx5 needs to
support the extended create CQ command with the timestamp flag.

Signed-off-by: Matan Barak 
Reviewed-by: Eli Cohen 
---
 drivers/infiniband/hw/mlx5/cq.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
index 3dfd287..186debf 100644
--- a/drivers/infiniband/hw/mlx5/cq.c
+++ b/drivers/infiniband/hw/mlx5/cq.c
@@ -743,6 +743,10 @@ static void destroy_cq_kernel(struct mlx5_ib_dev *dev, 
struct mlx5_ib_cq *cq)
mlx5_db_free(dev->mdev, &cq->db);
 }
 
+enum {
+   CQ_CREATE_FLAGS_SUPPORTED = IB_CQ_FLAGS_TIMESTAMP_COMPLETION
+};
+
 struct ib_cq *mlx5_ib_create_cq(struct ib_device *ibdev,
const struct ib_cq_init_attr *attr,
struct ib_ucontext *context,
@@ -766,6 +770,9 @@ struct ib_cq *mlx5_ib_create_cq(struct ib_device *ibdev,
if (entries < 0)
return ERR_PTR(-EINVAL);
 
+   if (attr->flags & ~CQ_CREATE_FLAGS_SUPPORTED)
+   return ERR_PTR(-EOPNOTSUPP);
+
entries = roundup_pow_of_two(entries + 1);
if (entries > (1 << MLX5_CAP_GEN(dev->mdev, log_max_cq_sz)))
return ERR_PTR(-EINVAL);
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH for-next V2 2/5] IB/core: Add ib_is_udata_cleared

2015-12-15 Thread Matan Barak
Extending core and vendor verb commands require us to check that the
unknown part of the user's given command is all zeros.
Adding ib_is_udata_cleared in order to do so.

Signed-off-by: Matan Barak 
---
 include/rdma/ib_verbs.h | 27 +++
 1 file changed, 27 insertions(+)

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 9a68a19..33ab4eb 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -50,6 +50,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include 
 #include 
@@ -1887,6 +1889,31 @@ static inline int ib_copy_to_udata(struct ib_udata 
*udata, void *src, size_t len
return copy_to_user(udata->outbuf, src, len) ? -EFAULT : 0;
 }
 
+static inline bool ib_is_udata_cleared(struct ib_udata *udata,
+  size_t offset,
+  size_t len)
+{
+   const void __user *p = udata->inbuf + offset;
+   bool ret = false;
+   u8 *buf;
+
+   if (len > USHRT_MAX)
+   return false;
+
+   buf = kmalloc(len, GFP_KERNEL);
+   if (!buf)
+   return false;
+
+   if (copy_from_user(buf, p, len))
+   goto free;
+
+   ret = !memchr_inv(buf, 0, len);
+
+free:
+   kfree(buf);
+   return ret;
+}
+
 /**
  * ib_modify_qp_is_ok - Check that the supplied attribute mask
  * contains all required attributes and no attributes not allowed for
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: device attr cleanup (was: Handle mlx4 max_sge_rd correctly)

2015-12-15 Thread Anna Schumaker
On Thu, Dec 10, 2015 at 7:49 PM, Chuck Lever  wrote:
>
>> On Dec 10, 2015, at 6:30 PM, Christoph Hellwig  wrote:
>>
>> On Thu, Dec 10, 2015 at 11:07:03AM -0700, Jason Gunthorpe wrote:
>>> The ARM folks do this sort of stuff on a regular basis.. Very early on
>>> Doug prepares a topic branch with only the big change, NFS folks pull
>>> it and then pull your work. Then Doug would send the topic branch to
>>> Linus as soon as the merge window opens, then NFS would send theirs.
>>>
>>> This is alot less work overall than trying to sequence multiple
>>> patches over multiple releases..
>>
>> Agreed.  Staging has alaways been a giant pain and things tend to never
>> finish moving over that way if they are non-trivial enough.
>
> In that case:
>
> You need to make sure you have all the right Acks. I've added
> Anna and Bruce to Ack the NFS-related portions. Santosh should
> Ack the RDS part.

The NFS client parts look okay to me:

Acked-by: Anna Schumaker 

>
> http://git.infradead.org/users/hch/rdma.git/shortlog/refs/heads/ib_device_attr
>
> Given the proximity to the holidays and the next merge window,
> Doug will need to get a properly-acked topic branch published
> in the next day or two so the rest of us can rebase and start
> testing before the relevant parties disappear for the holiday.
>
>
> --
> Chuck Lever
>
>
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH for-next V2 5/5] IB/mlx5: Mmap the HCA's core clock register to user-space

2015-12-15 Thread Matan Barak
In order to read the HCA's current cycles register, we need
to map it to user-space. Add support to map this register
via mmap command.

Signed-off-by: Matan Barak 
Reviewed-by: Moshe Lazer 
---
 drivers/infiniband/hw/mlx5/main.c| 28 
 drivers/infiniband/hw/mlx5/mlx5_ib.h |  4 +++-
 2 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c 
b/drivers/infiniband/hw/mlx5/main.c
index 917363c..d037a72 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -810,6 +810,34 @@ static int mlx5_ib_mmap(struct ib_ucontext *ibcontext, 
struct vm_area_struct *vm
case MLX5_IB_MMAP_GET_CONTIGUOUS_PAGES:
return -ENOSYS;
 
+   case MLX5_IB_MMAP_CORE_CLOCK:
+   {
+   phys_addr_t pfn;
+
+   if (vma->vm_end - vma->vm_start != PAGE_SIZE)
+   return -EINVAL;
+
+   if (vma->vm_flags & (VM_WRITE | VM_EXEC))
+   return -EPERM;
+
+   /* Don't expose to user-space information it shouldn't have */
+   if (PAGE_SIZE > 4096)
+   return -EOPNOTSUPP;
+
+   vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
+   pfn = (dev->mdev->iseg_base +
+  offsetof(struct mlx5_init_seg, internal_timer_h)) >>
+   PAGE_SHIFT;
+   if (io_remap_pfn_range(vma, vma->vm_start, pfn,
+  PAGE_SIZE, vma->vm_page_prot))
+   return -EAGAIN;
+
+   mlx5_ib_dbg(dev, "mapped internal timer at 0x%lx, PA 0x%llx\n",
+   vma->vm_start,
+   (unsigned long long)pfn << PAGE_SHIFT);
+   break;
+   }
+
default:
return -EINVAL;
}
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h 
b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 43b3c58..a87f312 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -65,7 +65,9 @@ enum {
 
 enum mlx5_ib_mmap_cmd {
MLX5_IB_MMAP_REGULAR_PAGE   = 0,
-   MLX5_IB_MMAP_GET_CONTIGUOUS_PAGES   = 1, /* always last */
+   MLX5_IB_MMAP_GET_CONTIGUOUS_PAGES   = 1,
+   /* 5 is chosen in order to be compatible with old versions of libmlx5 */
+   MLX5_IB_MMAP_CORE_CLOCK = 5,
 };
 
 enum {
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH for-next V2 3/5] IB/mlx5: Add support for hca_core_clock and timestamp_mask

2015-12-15 Thread Matan Barak
Reporting the hca_core_clock (in kHZ) and the timestamp_mask in
query_device extended verb. timestamp_mask is used by users in order
to know what is the valid range of the raw timestamps, while
hca_core_clock reports the clock frequency that is used for
timestamps.

Signed-off-by: Matan Barak 
Reviewed-by: Moshe Lazer 
---
 drivers/infiniband/hw/mlx5/main.c | 2 ++
 include/linux/mlx5/mlx5_ifc.h | 9 ++---
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c 
b/drivers/infiniband/hw/mlx5/main.c
index 7e97cb5..c707c43 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -293,6 +293,8 @@ static int mlx5_ib_query_device(struct ib_device *ibdev,
props->max_total_mcast_qp_attach = props->max_mcast_qp_attach *
   props->max_mcast_grp;
props->max_map_per_fmr = INT_MAX; /* no limit in ConnectIB */
+   props->hca_core_clock = MLX5_CAP_GEN(mdev, device_frequency_khz);
+   props->timestamp_mask = 0x7FFFULL;
 
 #ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
if (MLX5_CAP_GEN(mdev, pg))
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 1565324..1a74114 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -794,15 +794,18 @@ struct mlx5_ifc_cmd_hca_cap_bits {
u8 reserved_63[0x8];
u8 log_uar_page_sz[0x10];
 
-   u8 reserved_64[0x100];
+   u8 reserved_64[0x20];
+   u8 device_frequency_mhz[0x20];
+   u8 device_frequency_khz[0x20];
+   u8 reserved_65[0xa0];
 
-   u8 reserved_65[0x1f];
+   u8 reserved_66[0x1f];
u8 cqe_zip[0x1];
 
u8 cqe_zip_timeout[0x10];
u8 cqe_zip_max_num[0x10];
 
-   u8 reserved_66[0x220];
+   u8 reserved_67[0x220];
 };
 
 enum {
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: device attr cleanup (was: Handle mlx4 max_sge_rd correctly)

2015-12-15 Thread santosh shilimkar

On 12/9/2015 10:42 AM, Christoph Hellwig wrote:

On Tue, Dec 08, 2015 at 07:52:03PM -0500, ira.weiny wrote:

Searching patchworks...

I'm a bit worried about the size of the patch and I would like to see it split
up for review.  But I agree Christophs method is better long term.


I'd be happy to split it up if I could see a way to split it.  So if
anyone has an idea you're welcome!


Christoph do you have this on github somewhere?  Perhaps it is split but I'm
not finding in on patchworks?


No need for github, we have much better (and older) git hosting sites :)

http://git.infradead.org/users/hch/rdma.git/shortlog/refs/heads/ib_device_attr


net/rds/ib.c   |  34 ++---
net/rds/iw.c   |  23 +--

For RDS changes,
Acked-by: Santosh Shilimlkar 

I will try and find some time to test them out. Thanks !!
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RESEND] infiniband:core:Add needed error path in cm_init_av_by_path

2015-12-15 Thread Michael Wang


On 12/15/2015 04:52 PM, Nicholas Krause wrote:
> This adds a needed error path in the function, cm_init_av_by_path
> after the call to ib_init_ah_from_path in order to avoid incorrectly
> accessing the path pointer of structure type ib_sa_path_rec if this
> function call fails to complete its intended work successfully by
> returning a error code.
> 
> Signed-off-by: Nicholas Krause 
> ---
>  drivers/infiniband/core/cm.c | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
> index 0a26dd6..e9b36ea 100644
> --- a/drivers/infiniband/core/cm.c
> +++ b/drivers/infiniband/core/cm.c
> @@ -383,8 +383,11 @@ static int cm_init_av_by_path(struct ib_sa_path_rec 
> *path, struct cm_av *av)
>   return ret;
>  
>   av->port = port;
> - ib_init_ah_from_path(cm_dev->ib_device, port->port_num, path,
> -  &av->ah_attr);
> + ret = ib_init_ah_from_path(cm_dev->ib_device, port->port_num, path,
> +&av->ah_attr);

..Just wondering what if the transport don't require GRH?
eg inside the same subnet?

The hop_limit is only suggest that the package allowed to be
routed, not have to, correct?

Regards,
Michael Wang

> + if (ret)
> + return ret;
> +
>   av->timeout = path->packet_life_time + 1;
>  
>   return 0;
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH for-next 6/6] IB/mlx5: Add Raw Packet Queue Pair (QP) support

2015-12-15 Thread Or Gerlitz

On 12/10/2015 10:09 PM, Majd Dibbiny wrote:

This patchs adds support for Raw Packet QP for the mlx5 device.

Raw Packet QP, unlike other QP types, has no matching mlx5_core_qp
object but rather it is built of RQ/SQ/TIR/TIS/TD mlx5_core object.

The Raw Packet QP state changes are implemented by changing the
state of the sub-objects.

Since the SQ and RQ work-queue (WQ) buffers are not contiguous like
other QPs, we allocate separate buffers in the user-space and pass
the address of each one of them separately to the kernel.

Signed-off-by: Majd Dibbiny
---
  drivers/infiniband/hw/mlx5/main.c  |  15 +-
  drivers/infiniband/hw/mlx5/mlx5_ib.h   |  52 +-
  drivers/infiniband/hw/mlx5/qp.c| 894 ++---
  drivers/infiniband/hw/mlx5/user.h  |   1 +
  drivers/net/ethernet/mellanox/mlx5/core/qp.c   |  48 +-
  drivers/net/ethernet/mellanox/mlx5/core/transobj.c |  39 +
  include/linux/mlx5/mlx5_ifc.h  |  11 +-
  include/linux/mlx5/qp.h|   3 +-
  include/linux/mlx5/transobj.h  |   4 +
  9 files changed, 913 insertions(+), 154 deletions(-)


Seems way too big for single patch, seek how to split to 2-3 patches

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH for-next V1 2/5] IB/core: Add ib_is_udata_cleared

2015-12-15 Thread Haggai Eran
On 14/12/2015 18:31, Matan Barak wrote:
> I'm not sure regarding the string.c location, as it deals with user
> buffers, but in order not to
> be dependent on this, I'll change this code to the following.
> 
> static inline bool ib_is_udata_cleared(struct ib_udata *udata,
>u8 cleared_char,
>size_t offset,
>size_t len)
> {
> const void __user *p = udata->inbuf + offset;
> bool ret = false;
> u8 *buf;
> 
> if (len > USHRT_MAX)
> return false;
> 
> buf = kmalloc(len, GFP_KERNEL);
> if (!buf)
> return false;
> 
> if (copy_from_user(buf, p, len))
> goto free;
> 
> ret = !memchr_inv(buf, cleared_char, len);
> 
> free:
> kfree(buf);
> return ret;
> }

Looks good to me.

Thanks,
Haggai

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Bug 109391] New: Packet and byte counters only 32bit

2015-12-15 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=109391

Bug ID: 109391
   Summary: Packet and byte counters only 32bit
   Product: Drivers
   Version: 2.5
Kernel Version: git HEAD
  Hardware: All
OS: Linux
  Tree: Mainline
Status: NEW
  Severity: normal
  Priority: P1
 Component: Infiniband/RDMA
  Assignee: drivers_infiniband-r...@kernel-bugs.osdl.org
  Reporter: philipp.ma...@linbit.com
Regression: No

Trying to find packet and byte counters for the RDMA transports I stumbled upon
this directory:

/sys/class/infiniband/mlx5_1/ports/1/counters/

This seems to have all the requested data; but, looking again, I can see

port_rcv_data: 4294967295

which looks awful.


And indeed, in 

   
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/infiniband/core/sysfs.c#n401

there's a 32bit width, with explicit code in

   
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/infiniband/hw/qib/qib_mad.c#n1507

to cap on that limit...


That is useless for the upcoming speeds. The MLX5 driver can drive Connect-X4
cards which go 100GBit/sec; that is 11GByte/sec, which means that the 32bit
counter will overflow ~3 times a second. So it can't be monitored.


Please provide 64bit counters (for packets and bytes, at least)!

Thanks a lot.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html