[PATCH 1/8] vfio/mdev: Fix to not do put_device on device_register failure

2019-03-22 Thread Parav Pandit
device_register() performs put_device() if device_add() fails.
This balances with device_initialize().

When device_register() fails, the mdev core calls put_device() as well,
which puts an already released device a second time.
Therefore, don't put the device on error.

Fixes: 7b96953bc640 ("vfio: Mediated device Core driver")
Signed-off-by: Parav Pandit 
---
 drivers/vfio/mdev/mdev_core.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index 0212f0e..3e5880a 100644
--- a/drivers/vfio/mdev/mdev_core.c
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -318,10 +318,8 @@ int mdev_device_create(struct kobject *kobj, struct device *dev, uuid_le uuid)
 	dev_set_name(&mdev->dev, "%pUl", uuid.b);
 
 	ret = device_register(&mdev->dev);
-	if (ret) {
-		put_device(&mdev->dev);
+	if (ret)
 		goto mdev_fail;
-	}
 
 	ret = mdev_device_create_ops(kobj, mdev);
 	if (ret)
-- 
1.8.3.1



[PATCH 2/8] vfio/mdev: Avoid release parent reference during error path

2019-03-22 Thread Parav Pandit
During mdev parent registration in mdev_register_device(), if the parent
device is a duplicate, the error path releases the reference of the
existing parent device.
This is incorrect. The existing parent device should not be touched.
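
For illustration only, a minimal userspace sketch of the problem described
above. It assumes the registration routine has a single shared error label
that drops a reference on whatever 'parent' points at; the struct and
function names here are hypothetical and are not the mdev core code.

```c
#include <stddef.h>
#include <errno.h>

/* Hypothetical stand-in for a refcounted parent; only the count is modeled. */
struct parent_model {
	int refcount;
};

static void parent_put(struct parent_model *p)
{
	if (p)
		p->refcount--;
}

/*
 * Model of the duplicate check. 'existing' is what the lookup returned
 * (NULL when there is no duplicate). With clear_on_duplicate set, the
 * local pointer is reset before the shared error path runs, so the
 * existing parent keeps its reference.
 */
int register_parent_model(struct parent_model *existing, int clear_on_duplicate)
{
	struct parent_model *parent = existing;
	int ret = 0;

	if (parent) {
		if (clear_on_duplicate)
			parent = NULL;
		ret = -EEXIST;
		goto err;
	}
	return ret;

err:
	/* Shared unwind (assumed): drops whatever 'parent' points at. */
	parent_put(parent);
	return ret;
}
```

Without the reset, a failed duplicate registration lowers the refcount of a
parent that belongs to someone else; with it, the count is left alone.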

Fixes: 7b96953bc640 ("vfio: Mediated device Core driver")
Signed-off-by: Parav Pandit 
---
 drivers/vfio/mdev/mdev_core.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index 3e5880a..4f213e4d 100644
--- a/drivers/vfio/mdev/mdev_core.c
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -182,6 +182,7 @@ int mdev_register_device(struct device *dev, const struct mdev_parent_ops *ops)
 	/* Check for duplicate */
 	parent = __find_parent_device(dev);
 	if (parent) {
+		parent = NULL;
 		ret = -EEXIST;
 		goto add_dev_err;
 	}
-- 
1.8.3.1



[PATCH 5/8] vfio/mdev: Avoid masking error code to EBUSY

2019-03-22 Thread Parav Pandit
Instead of masking the returned error to -EBUSY, return the actual error
returned by the vendor driver.
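
For illustration only, a small userspace sketch contrasting the two
behaviours. The callback type and the vendor_remove_*() helpers are
hypothetical; only the 'ret && !force_remove' check mirrors the patch.

```c
#include <errno.h>

/* Hypothetical vendor callback type: returns 0 or a negative errno. */
typedef int (*remove_fn)(void);

int vendor_remove_ok(void)       { return 0; }
int vendor_remove_in_use(void)   { return -EBUSY; }
int vendor_remove_io_error(void) { return -EIO; }

/* Old behaviour: every non-forced failure is reported as -EBUSY. */
int remove_ops_masked(remove_fn remove, int force_remove)
{
	int ret = remove();

	if (ret && !force_remove)
		return -EBUSY;
	return 0;
}

/* Behaviour after the change: the vendor's error code is passed through. */
int remove_ops_passthrough(remove_fn remove, int force_remove)
{
	int ret = remove();

	if (ret && !force_remove)
		return ret;
	return 0;
}
```

With the pass-through variant a caller can tell an in-use device (-EBUSY)
from any other failure the vendor driver reports.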

Signed-off-by: Parav Pandit 
---
 drivers/vfio/mdev/mdev_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index 3d91f62..ab05464 100644
--- a/drivers/vfio/mdev/mdev_core.c
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -142,7 +142,7 @@ static int mdev_device_remove_ops(struct mdev_device *mdev, bool force_remove)
 	 */
 	ret = parent->ops->remove(mdev);
 	if (ret && !force_remove)
-		return -EBUSY;
+		return ret;
 
 	sysfs_remove_groups(&mdev->dev.kobj, parent->ops->mdev_attr_groups);
 	return 0;
-- 
1.8.3.1



[PATCH 7/8] vfio/mdev: Fix aborting mdev child device removal if one fails

2019-03-22 Thread Parav Pandit
device_for_each_child() stops calling the callback for the remaining
child devices as soon as the callback returns an error.
Child mdev devices are independent of each other.
While unregistering the parent device, the mdev core must remove all
child mdev devices.
Therefore, mdev_device_remove_cb() always returns success so that
device_for_each_child() doesn't abort if one child removal hits an error.
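
For illustration only, a userspace sketch of the iteration behaviour
described above. for_each_child_model() is a hypothetical stand-in that
copies only the stop-on-first-non-zero contract of device_for_each_child();
the child struct and callbacks are likewise made up for this example.

```c
#include <stddef.h>
#include <errno.h>

/* Hypothetical child record: 'fail' makes removal report an error. */
struct child_model {
	int fail;
	int removed;
};

typedef int (*child_fn)(struct child_model *child, void *data);

/*
 * Stops at the first callback that returns non-zero and hands that
 * value back to the caller.
 */
int for_each_child_model(struct child_model *children, size_t n,
			 void *data, child_fn fn)
{
	int ret = 0;
	size_t i;

	for (i = 0; i < n && !ret; i++)
		ret = fn(&children[i], data);
	return ret;
}

static int remove_child(struct child_model *child)
{
	if (child->fail)
		return -EBUSY;
	child->removed = 1;
	return 0;
}

/* Propagates the error, so the iterator aborts early. */
int remove_cb_propagate(struct child_model *child, void *data)
{
	(void)data;
	return remove_child(child);
}

/* Swallows the error, so every child is visited. */
int remove_cb_always_ok(struct child_model *child, void *data)
{
	(void)data;
	remove_child(child);
	return 0;
}
```

With three children where the second one fails, the propagating callback
leaves the third child untouched, while the always-ok callback still
removes it.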

While at it, simplify the remove and unregister functions as described
below.

There is no need to pass a forced flag pointer during mdev parent
removal, which invokes mdev_device_remove(). So simplify the flow.

mdev_device_remove() is called from two paths.
1. mdev_unregister_device()
     mdev_device_remove_cb()
       mdev_device_remove()
2. remove_store()
     mdev_device_remove()

When the device is removed by the user through remove_store(), the device
under removal is an mdev device.
When the device is removed during parent device removal using the generic
child iterator, the mdev check is already done using dev_is_mdev().

Hence, remove the unnecessary loop in mdev_device_remove().

Fixes: 7b96953bc640 ("vfio: Mediated device Core driver")
Signed-off-by: Parav Pandit 
---
 drivers/vfio/mdev/mdev_core.c | 24 +---
 1 file changed, 5 insertions(+), 19 deletions(-)

diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index ab05464..944a058 100644
--- a/drivers/vfio/mdev/mdev_core.c
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -150,10 +150,10 @@ static int mdev_device_remove_ops(struct mdev_device *mdev, bool force_remove)
 
 static int mdev_device_remove_cb(struct device *dev, void *data)
 {
-	if (!dev_is_mdev(dev))
-		return 0;
+	if (dev_is_mdev(dev))
+		mdev_device_remove(dev, true);
 
-	return mdev_device_remove(dev, data ? *(bool *)data : true);
+	return 0;
 }
 
 /*
@@ -241,7 +241,6 @@ int mdev_register_device(struct device *dev, const struct mdev_parent_ops *ops)
 void mdev_unregister_device(struct device *dev)
 {
 	struct mdev_parent *parent;
-	bool force_remove = true;
 
 	mutex_lock(&parent_list_lock);
 	parent = __find_parent_device(dev);
@@ -255,8 +254,7 @@ void mdev_unregister_device(struct device *dev)
 	list_del(&parent->next);
 	class_compat_remove_link(mdev_bus_compat_class, dev, NULL);
 
-	device_for_each_child(dev, (void *)&force_remove,
-			      mdev_device_remove_cb);
+	device_for_each_child(dev, NULL, mdev_device_remove_cb);
 
 	parent_remove_sysfs_files(parent);
 
@@ -346,24 +344,12 @@ int mdev_device_create(struct kobject *kobj, struct device *dev, uuid_le uuid)
 
 int mdev_device_remove(struct device *dev, bool force_remove)
 {
-	struct mdev_device *mdev, *tmp;
+	struct mdev_device *mdev;
 	struct mdev_parent *parent;
 	struct mdev_type *type;
 	int ret;
 
 	mdev = to_mdev_device(dev);
-
-	mutex_lock(&mdev_list_lock);
-	list_for_each_entry(tmp, &mdev_list, next) {
-		if (tmp == mdev)
-			break;
-	}
-
-	if (tmp != mdev) {
-		mutex_unlock(&mdev_list_lock);
-		return -ENODEV;
-	}
-
 	if (!mdev->active) {
 		mutex_unlock(&mdev_list_lock);
 		return -EAGAIN;
-- 
1.8.3.1



[PATCH 0/8] vfio/mdev: Improve vfio/mdev core module

2019-03-22 Thread Parav Pandit
We would like to use the mdev subsystem for a wider use case, as
discussed in [1] and [2] and in an offline discussion.
This use case was also discussed in a wider forum [4], in the track
'Lightweight NIC HW functions for container offload use cases'.

This series is prep-work and improves the vfio/mdev module in the
following ways.

Patch-1 and 2 fix releasing the parent dev reference during error unwinding
of mdev create and mdev parent registration.
Patch-3 removes the unused kref from the mdev device.
Patch-4 drops the redundant extern prefix of exported symbols.
Patch-5 returns the right error code from the vendor driver.
Patch-6 fixes the sysfs remove sequence.
Patch-7 fixes removal of all child devices when one of them fails.
Patch-8 improves mdev in the following ways.

1. Fix race conditions among the mdev parent's create(), remove() and
mdev parent unregistration routines that lead to call traces.

2. Set up the vendor mdev device before placing the device on the mdev bus.
This ensures that when vfio_mdev or any other module accesses the mdev in
any of the callbacks registered through mdev_register_driver(), the vendor
device is already set up.
This now follows the Linux driver model.
Similarly, follow the exact reverse sequence on remove, i.e. take the
device off the bus first, before removing the underlying hardware mdev.

This series is tested using
(a) mtty with a VM using the vfio_mdev driver for positive tests.
(b) mtty with vfio_mdev for error and race condition cases of create,
remove and the mtty driver.
(c) mlx5 core driver using RFC patches [3] and internal patches.
The internal patches are large and cannot be combined with these
prep-work patches. They will be posted once the prep-work completes.

[1] https://www.spinics.net/lists/netdev/msg556978.html
[2] https://lkml.org/lkml/2019/3/7/696
[3] https://lkml.org/lkml/2019/3/8/819
[4] https://netdevconf.org/0x13/session.html?workshop-hardware-offload


Parav Pandit (8):
  vfio/mdev: Fix to not do put_device on device_register failure
  vfio/mdev: Avoid release parent reference during error path
  vfio/mdev: Removed unused kref
  vfio/mdev: Drop redundant extern for exported symbols
  vfio/mdev: Avoid masking error code to EBUSY
  vfio/mdev: Follow correct remove sequence
  vfio/mdev: Fix aborting mdev child device removal if one fails
  vfio/mdev: Improve the create/remove sequence

 drivers/vfio/mdev/mdev_core.c| 164 +++
 drivers/vfio/mdev/mdev_private.h |   8 +-
 drivers/vfio/mdev/mdev_sysfs.c   |   8 +-
 include/linux/mdev.h |  21 +++--
 4 files changed, 98 insertions(+), 103 deletions(-)

-- 
1.8.3.1



[PATCH 8/8] vfio/mdev: Improve the create/remove sequence

2019-03-22 Thread Parav Pandit
ve flow, first remove the device from the bus. This
ensures that any bus specific devices and data are cleared.
Once the device is taken off the mdev bus, perform remove() of the mdev
from the vendor driver.

3. The Linux core device model provides a way to register and automatically
unregister the device sysfs attribute groups through dev->groups.
Make use of dev->groups to let the core create the groups, and simplify the
code by avoiding explicit group creation and removal.

4. Wait for any ongoing mdev create() and remove() to finish before
unregistering the parent device, using SRCU. This continues to allow
multiple create and remove operations to progress in parallel. At the
same time, it guards parent removal while the parent is being accessed by
the create() and remove() callbacks.

Fixes: 7b96953bc640 ("vfio: Mediated device Core driver")
Signed-off-by: Parav Pandit 
---
 drivers/vfio/mdev/mdev_core.c| 142 +--
 drivers/vfio/mdev/mdev_private.h |   7 +-
 drivers/vfio/mdev/mdev_sysfs.c   |   6 +-
 3 files changed, 84 insertions(+), 71 deletions(-)

diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index 944a058..8fe0ed1 100644
--- a/drivers/vfio/mdev/mdev_core.c
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -84,6 +84,7 @@ static void mdev_release_parent(struct kref *kref)
 						  ref);
 	struct device *dev = parent->dev;
 
+	cleanup_srcu_struct(&parent->unreg_srcu);
 	kfree(parent);
 	put_device(dev);
 }
@@ -103,56 +104,30 @@ static inline void mdev_put_parent(struct mdev_parent *parent)
 		kref_put(&parent->ref, mdev_release_parent);
 }
 
-static int mdev_device_create_ops(struct kobject *kobj,
-				  struct mdev_device *mdev)
+static int mdev_device_must_remove(struct mdev_device *mdev)
 {
-	struct mdev_parent *parent = mdev->parent;
+	struct mdev_parent *parent;
+	struct mdev_type *type;
 	int ret;
 
-	ret = parent->ops->create(kobj, mdev);
-	if (ret)
-		return ret;
+	type = to_mdev_type(mdev->type_kobj);
 
-	ret = sysfs_create_groups(&mdev->dev.kobj,
-				  parent->ops->mdev_attr_groups);
+	mdev_remove_sysfs_files(&mdev->dev, type);
+	device_del(&mdev->dev);
+	parent = mdev->parent;
+	ret = parent->ops->remove(mdev);
 	if (ret)
-		parent->ops->remove(mdev);
+		dev_err(&mdev->dev, "Remove failed: err=%d\n", ret);
 
+	/* Balances with device_initialize() */
+	put_device(&mdev->dev);
 	return ret;
 }
 
-/*
- * mdev_device_remove_ops gets called from sysfs's 'remove' and when parent
- * device is being unregistered from mdev device framework.
- * - 'force_remove' is set to 'false' when called from sysfs's 'remove' which
- *   indicates that if the mdev device is active, used by VMM or userspace
- *   application, vendor driver could return error then don't remove the device.
- * - 'force_remove' is set to 'true' when called from mdev_unregister_device()
- *   which indicate that parent device is being removed from mdev device
- *   framework so remove mdev device forcefully.
- */
-static int mdev_device_remove_ops(struct mdev_device *mdev, bool force_remove)
-{
-	struct mdev_parent *parent = mdev->parent;
-	int ret;
-
-	/*
-	 * Vendor driver can return error if VMM or userspace application is
-	 * using this mdev device.
-	 */
-	ret = parent->ops->remove(mdev);
-	if (ret && !force_remove)
-		return ret;
-
-	sysfs_remove_groups(&mdev->dev.kobj, parent->ops->mdev_attr_groups);
-	return 0;
-}
-
 static int mdev_device_remove_cb(struct device *dev, void *data)
 {
 	if (dev_is_mdev(dev))
-		mdev_device_remove(dev, true);
-
+		mdev_device_must_remove(to_mdev_device(dev));
 	return 0;
 }
 
@@ -194,6 +169,7 @@ int mdev_register_device(struct device *dev, const struct mdev_parent_ops *ops)
 	}
 
 	kref_init(&parent->ref);
+	init_srcu_struct(&parent->unreg_srcu);
 
 	parent->dev = dev;
 	parent->ops = ops;
@@ -214,6 +190,7 @@ int mdev_register_device(struct device *dev, const struct mdev_parent_ops *ops)
 	if (ret)
 		dev_warn(dev, "Failed to create compatibility class link\n");
 
+	rcu_assign_pointer(parent->self, parent);
 	list_add(&parent->next, &parent_list);
 	mutex_unlock(&parent_list_lock);
 
@@ -244,21 +221,36 @@ void mdev_unregister_device(struct device *dev)
 
 	mutex_lock(&parent_list_lock);
 	parent = __find_parent_device(dev);
-
 	if (!parent) {
 		mutex_unlock(&parent_list_lock);
 		return;
 	}
+	list_del(&parent->next);
+	mutex_unlock(&parent_list_lock);
+
 	dev_info(dev, "MDEV: Unregistering\n");
 
-	list_del(&parent->next);
+	/* Publish that th

[PATCH 4/8] vfio/mdev: Drop redundant extern for exported symbols

2019-03-22 Thread Parav Pandit
There is no need to use 'extern' for exported functions.

Signed-off-by: Parav Pandit 
---
 include/linux/mdev.h | 21 ++---
 1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/include/linux/mdev.h b/include/linux/mdev.h
index b6e048e..0924c48 100644
--- a/include/linux/mdev.h
+++ b/include/linux/mdev.h
@@ -118,21 +118,20 @@ struct mdev_driver {
 
 #define to_mdev_driver(drv)	container_of(drv, struct mdev_driver, driver)
 
-extern void *mdev_get_drvdata(struct mdev_device *mdev);
-extern void mdev_set_drvdata(struct mdev_device *mdev, void *data);
-extern uuid_le mdev_uuid(struct mdev_device *mdev);
+void *mdev_get_drvdata(struct mdev_device *mdev);
+void mdev_set_drvdata(struct mdev_device *mdev, void *data);
+uuid_le mdev_uuid(struct mdev_device *mdev);
 
 extern struct bus_type mdev_bus_type;
 
-extern int  mdev_register_device(struct device *dev,
-				 const struct mdev_parent_ops *ops);
-extern void mdev_unregister_device(struct device *dev);
+int mdev_register_device(struct device *dev, const struct mdev_parent_ops *ops);
+void mdev_unregister_device(struct device *dev);
 
-extern int  mdev_register_driver(struct mdev_driver *drv, struct module *owner);
-extern void mdev_unregister_driver(struct mdev_driver *drv);
+int mdev_register_driver(struct mdev_driver *drv, struct module *owner);
+void mdev_unregister_driver(struct mdev_driver *drv);
 
-extern struct device *mdev_parent_dev(struct mdev_device *mdev);
-extern struct device *mdev_dev(struct mdev_device *mdev);
-extern struct mdev_device *mdev_from_dev(struct device *dev);
+struct device *mdev_parent_dev(struct mdev_device *mdev);
+struct device *mdev_dev(struct mdev_device *mdev);
+struct mdev_device *mdev_from_dev(struct device *dev);
 
 #endif /* MDEV_H */
-- 
1.8.3.1



[PATCH 6/8] vfio/mdev: Follow correct remove sequence

2019-03-22 Thread Parav Pandit
mdev_remove_sysfs_files() should follow the exact mirror sequence of
create, similar to what is done in the error unwinding path of
mdev_create_sysfs_files().
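
For illustration only, a userspace sketch of the "remove mirrors create"
rule. The step names are hypothetical labels, and the create order shown is
an assumption inferred from the remove order in the diff below; this is not
the sysfs code itself.

```c
#include <string.h>

#define MAX_STEPS 8

/* Hypothetical log of sysfs-like objects, in the order they were touched. */
struct step_log {
	const char *steps[MAX_STEPS];
	int count;
};

static void log_step(struct step_log *log, const char *name)
{
	if (log->count < MAX_STEPS)
		log->steps[log->count++] = name;
}

/* Assumed create order: devices link, mdev_type link, attribute files. */
void create_files_model(struct step_log *log)
{
	log_step(log, "devices_link");
	log_step(log, "mdev_type_link");
	log_step(log, "attr_files");
}

/* Remove order after the change: exact reverse of create. */
void remove_files_model(struct step_log *log)
{
	log_step(log, "attr_files");
	log_step(log, "mdev_type_link");
	log_step(log, "devices_link");
}

/* Returns 1 when 'removed' lists the same steps as 'created', reversed. */
int is_mirror(const struct step_log *created, const struct step_log *removed)
{
	int i;

	if (created->count != removed->count)
		return 0;
	for (i = 0; i < created->count; i++)
		if (strcmp(created->steps[i],
			   removed->steps[removed->count - 1 - i]) != 0)
			return 0;
	return 1;
}
```

A check like is_mirror() is only a way to state the rule; the patch itself
simply reorders the three removal calls.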

Fixes: 7b96953bc640 ("vfio: Mediated device Core driver")
Signed-off-by: Parav Pandit 
---
 drivers/vfio/mdev/mdev_sysfs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/vfio/mdev/mdev_sysfs.c b/drivers/vfio/mdev/mdev_sysfs.c
index ce5dd21..c782fa9 100644
--- a/drivers/vfio/mdev/mdev_sysfs.c
+++ b/drivers/vfio/mdev/mdev_sysfs.c
@@ -280,7 +280,7 @@ int  mdev_create_sysfs_files(struct device *dev, struct mdev_type *type)
 
 void mdev_remove_sysfs_files(struct device *dev, struct mdev_type *type)
 {
+	sysfs_remove_files(&dev->kobj, mdev_device_attrs);
 	sysfs_remove_link(&dev->kobj, "mdev_type");
 	sysfs_remove_link(type->devices_kobj, dev_name(dev));
-	sysfs_remove_files(&dev->kobj, mdev_device_attrs);
 }
-- 
1.8.3.1



[PATCH 3/8] vfio/mdev: Removed unused kref

2019-03-22 Thread Parav Pandit
Remove unused kref from the mdev_device structure.

Fixes: 7b96953bc640 ("vfio: Mediated device Core driver")
Signed-off-by: Parav Pandit 
---
 drivers/vfio/mdev/mdev_core.c| 1 -
 drivers/vfio/mdev/mdev_private.h | 1 -
 2 files changed, 2 deletions(-)

diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index 4f213e4d..3d91f62 100644
--- a/drivers/vfio/mdev/mdev_core.c
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -311,7 +311,6 @@ int mdev_device_create(struct kobject *kobj, struct device *dev, uuid_le uuid)
 	mutex_unlock(&mdev_list_lock);
 
 	mdev->parent = parent;
-	kref_init(&mdev->ref);
 
 	mdev->dev.parent  = dev;
 	mdev->dev.bus     = &mdev_bus_type;
diff --git a/drivers/vfio/mdev/mdev_private.h b/drivers/vfio/mdev/mdev_private.h
index b5819b7..84b2b6c 100644
--- a/drivers/vfio/mdev/mdev_private.h
+++ b/drivers/vfio/mdev/mdev_private.h
@@ -30,7 +30,6 @@ struct mdev_device {
 	struct mdev_parent *parent;
 	uuid_le uuid;
 	void *driver_data;
-	struct kref ref;
 	struct list_head next;
 	struct kobject *type_kobj;
 	bool active;
-- 
1.8.3.1



RE: [RFC net-next v1 1/3] vfio/mdev: Inherit dma masks of parent device

2019-03-08 Thread Parav Pandit



> -----Original Message-----
> From: Alex Williamson 
> Sent: Friday, March 8, 2019 4:33 PM
> To: Parav Pandit 
> Cc: net...@vger.kernel.org; linux-kernel@vger.kernel.org;
> michal.l...@markovi.net; da...@davemloft.net;
> gre...@linuxfoundation.org; Jiri Pirko ;
> kwankh...@nvidia.com; Vu Pham ; Yuval Avnery
> ; jakub.kicin...@netronome.com;
> k...@vger.kernel.org
> Subject: Re: [RFC net-next v1 1/3] vfio/mdev: Inherit dma masks of parent
> device
> 
> On Fri,  8 Mar 2019 16:07:54 -0600
> Parav Pandit  wrote:
> 
> > Inherit dma mask of parent device in child mdev devices, so that
> > protocol stack can use right dma mask while doing dma mappings.
> >
> > Signed-off-by: Parav Pandit 
> > ---
> >  drivers/vfio/mdev/mdev_core.c | 4 
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/drivers/vfio/mdev/mdev_core.c
> > b/drivers/vfio/mdev/mdev_core.c index 0212f0e..9b8bdc9 100644
> > --- a/drivers/vfio/mdev/mdev_core.c
> > +++ b/drivers/vfio/mdev/mdev_core.c
> > @@ -315,6 +315,10 @@ int mdev_device_create(struct kobject *kobj,
> struct device *dev, uuid_le uuid)
> >  	mdev->dev.parent  = dev;
> >  	mdev->dev.bus     = &mdev_bus_type;
> >  	mdev->dev.release = mdev_device_release;
> > +	mdev->dev.dma_mask = dev->dma_mask;
> > +	mdev->dev.dma_parms = dev->dma_parms;
> > +	mdev->dev.coherent_dma_mask = dev->coherent_dma_mask;
> > +
> >  	dev_set_name(&mdev->dev, "%pUl", uuid.b);
> >
> >  	ret = device_register(&mdev->dev);
> 
> This seems like a rather large assumption and none of the existing mdev
> drivers even make use of DMA ops.
So it is harmless anyway.

> Why shouldn't this be done in mdev_parent_ops.create?  Thanks,
> 
Struct device should be set up correctly before calling device_register().
That is the sane way to use the device_register() API.
Doing this in mdev_parent_ops.create() would do it after device_register().
If you want to make it optional, mdev_register_device() can take a bool flag
that says whether to set up the dma params, which can then be applied
conditionally in mdev_device_create() before device_register().
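
For illustration only, a userspace sketch of the optional variant suggested
above. struct dev_model is a hypothetical, reduced stand-in for struct
device, and 'inherit_dma' is a made-up flag; neither exists in the mdev API.
dma_parms is left out to keep the model small.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical stand-in for struct device; only DMA fields are modeled. */
struct dev_model {
	uint64_t *dma_mask;
	uint64_t coherent_dma_mask;
	struct dev_model *parent;
};

/*
 * Fill in the child's fields from its parent before the child would be
 * registered. When inherit_dma is 0 the DMA fields are left untouched.
 */
void child_setup_model(struct dev_model *child, struct dev_model *parent,
		       int inherit_dma)
{
	child->parent = parent;
	if (!inherit_dma)
		return;
	child->dma_mask = parent->dma_mask;
	child->coherent_dma_mask = parent->coherent_dma_mask;
}
```

The point of the ordering is that everything in child_setup_model() happens
before the registration step, so a driver probing the child already sees
the final DMA settings.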


[RFC net-next v1 1/3] vfio/mdev: Inherit dma masks of parent device

2019-03-08 Thread Parav Pandit
Inherit dma mask of parent device in child mdev devices, so that
protocol stack can use right dma mask while doing dma mappings.

Signed-off-by: Parav Pandit 
---
 drivers/vfio/mdev/mdev_core.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index 0212f0e..9b8bdc9 100644
--- a/drivers/vfio/mdev/mdev_core.c
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -315,6 +315,10 @@ int mdev_device_create(struct kobject *kobj, struct device *dev, uuid_le uuid)
 	mdev->dev.parent  = dev;
 	mdev->dev.bus     = &mdev_bus_type;
 	mdev->dev.release = mdev_device_release;
+	mdev->dev.dma_mask = dev->dma_mask;
+	mdev->dev.dma_parms = dev->dma_parms;
+	mdev->dev.coherent_dma_mask = dev->coherent_dma_mask;
+
 	dev_set_name(&mdev->dev, "%pUl", uuid.b);
 
 	ret = device_register(&mdev->dev);
-- 
1.8.3.1



[RFC net-next v1 2/3] net/mlx5: Add mdev sub device life cycle command support

2019-03-08 Thread Parav Pandit
Implement mdev hooks to create mediated devices using the mdev driver.
The actual mlx5_core driver in the host is expected to bind to these
devices using the standard device driver model.

mdev devices are created using a sysfs file, as in the example below.

$ uuidgen
49d0e9ac-61b8-4c91-957e-6f6dbc42557d

$ echo 49d0e9ac-61b8-4c91-957e-6f6dbc42557d > \
./bus/pci/devices/:05:00.0/mdev_supported_types/mlx5_core-mgmt/create

$ echo 49d0e9ac-61b8-4c91-957e-6f6dbc42557d > \
/sys/bus/mdev/drivers/vfio_mdev/unbind

Once the mlx5 core driver is registered as an mdev driver, the mdev can be
attached to the mlx5_core driver as below.

$ echo 49d0e9ac-61b8-4c91-957e-6f6dbc42557d > \
/sys/bus/mdev/drivers/mlx5_core/bind

devlink output:
$ devlink dev show
pci/:05:00.0
mdev/69ea1551-d054-46e9-974d-8edae8f0aefe

Signed-off-by: Parav Pandit 
---
 drivers/net/ethernet/mellanox/mlx5/core/Kconfig|   9 ++
 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   5 +
 drivers/net/ethernet/mellanox/mlx5/core/main.c |   9 ++
 drivers/net/ethernet/mellanox/mlx5/core/mdev.c | 120 +
 .../net/ethernet/mellanox/mlx5/core/mlx5_core.h|  15 +++
 include/linux/mlx5/driver.h|   5 +
 6 files changed, 163 insertions(+)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/mdev.c

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig 
b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
index 37a5514..881ae1a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
@@ -117,3 +117,12 @@ config MLX5_EN_TLS
  Build support for TLS cryptography-offload accelaration in the NIC.
  Note: Support for hardware with this capability needs to be selected
  for this option to become available.
+
+config MLX5_MDEV
+   bool "Mellanox Technologies Mediated device support"
+   depends on MLX5_CORE
+   depends on VFIO_MDEV
+   default y
+   help
+ Build support for mediated devices. Mediated devices allow creating
+ multiple netdev and/or rdma device(s) on single PCI function.
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile 
b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index 82d636b..e5c0822c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -58,4 +58,9 @@ mlx5_core-$(CONFIG_MLX5_EN_IPSEC) += en_accel/ipsec.o 
en_accel/ipsec_rxtx.o \
 
 mlx5_core-$(CONFIG_MLX5_EN_TLS) += en_accel/tls.o en_accel/tls_rxtx.o 
en_accel/tls_stats.o
 
+#
+# Mdev basic
+#
+mlx5_core-$(CONFIG_MLX5_MDEV) += mdev.o
+
 CFLAGS_tracepoint.o := -I$(src)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 40d591c..72b0072 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -851,10 +851,18 @@ static int mlx5_init_once(struct mlx5_core_dev *dev, struct mlx5_priv *priv)
 		goto err_sriov_cleanup;
 	}
 
+	err = mlx5_mdev_init(dev);
+	if (err) {
+		dev_err(&pdev->dev, "Failed to init mdev device %d\n", err);
+		goto err_fpga_cleanup;
+	}
+
 	dev->tracer = mlx5_fw_tracer_create(dev);
 
 	return 0;
 
+err_fpga_cleanup:
+	mlx5_fpga_cleanup(dev);
 err_sriov_cleanup:
 	mlx5_sriov_cleanup(dev);
 err_eswitch_cleanup:
@@ -881,6 +889,7 @@ static int mlx5_init_once(struct mlx5_core_dev *dev, struct mlx5_priv *priv)
 static void mlx5_cleanup_once(struct mlx5_core_dev *dev)
 {
 	mlx5_fw_tracer_destroy(dev->tracer);
+	mlx5_mdev_cleanup(dev);
 	mlx5_fpga_cleanup(dev);
 	mlx5_sriov_cleanup(dev);
 	mlx5_eswitch_cleanup(dev->priv.eswitch);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mdev.c 
b/drivers/net/ethernet/mellanox/mlx5/core/mdev.c
new file mode 100644
index 0000000..e8e4aac
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/mdev.c
@@ -0,0 +1,120 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2018-19 Mellanox Technologies
+
+#include 
+#include 
+
+#include "mlx5_core.h"
+
+#define MLX5_MAX_MDEVS 1
+
+struct mlx5_mdev {
+   struct mlx5_core_dev *dev;
+};
+
+static int mlx5_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
+{
+	struct mlx5_core_dev *mlx5_dev;
+	struct device *parent_dev;
+	struct mlx5_mdev *mmdev;
+	struct devlink *devlink;
+	bool added;
+	int err;
+
+	parent_dev = mdev_parent_dev(mdev);
+	mlx5_dev = pci_get_drvdata(to_pci_dev(parent_dev));
+
+	added = atomic_add_unless(&mlx5_dev->mdev_info.cnt, 1, MLX5_MAX_MDEVS);
+	if (!added)
+		return -ENOSPC;
+
+	devlink = devlink_alloc(NULL, sizeof(*mmdev));
+	if (!devlink) {
+		atomic_dec(&mlx5_dev->mdev_info.cnt);
+		return -ENOMEM;
+	}
+   mmd

[RFC net-next v1 3/3] net/mlx5: Add mdev driver to bind to mdev devices

2019-03-08 Thread Parav Pandit
Add an mdev driver to probe the mdev devices and create a fake
netdevice for them.
Similar to a pci driver, when new mdev devices are created/removed, or when
the user triggers binding of an mdev to the mlx5_core driver by writing the
mdev device id to the /sys/bus/mdev/drivers/mlx5_core/bind and unbind files,
the mlx5_core driver's probe() and remove() are invoked to handle the life
cycle of the netdev and rdma device associated with the mdev.

The current RFC patch only creates one fake netdev, but a subsequent non-RFC
patch will create the related hw objects and netdev.

Signed-off-by: Parav Pandit 
---
 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/dev.c  |  18 
 drivers/net/ethernet/mellanox/mlx5/core/main.c |  13 +++
 .../net/ethernet/mellanox/mlx5/core/mdev_driver.c  | 106 +
 .../net/ethernet/mellanox/mlx5/core/mlx5_core.h|   4 +
 5 files changed, 142 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/mdev_driver.c

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile 
b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index e5c0822c..bded136a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -61,6 +61,6 @@ mlx5_core-$(CONFIG_MLX5_EN_TLS) += en_accel/tls.o 
en_accel/tls_rxtx.o en_accel/t
 #
 # Mdev basic
 #
-mlx5_core-$(CONFIG_MLX5_MDEV) += mdev.o
+mlx5_core-$(CONFIG_MLX5_MDEV) += mdev.o mdev_driver.o
 
 CFLAGS_tracepoint.o := -I$(src)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/dev.c 
b/drivers/net/ethernet/mellanox/mlx5/core/dev.c
index ebc046f..91b8d8ba 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/dev.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/dev.c
@@ -321,3 +321,21 @@ int mlx5_dev_list_trylock(void)
 {
 	return mutex_trylock(&mlx5_intf_mutex);
 }
+
+struct mlx5_core_dev *mlx5_get_core_dev(const struct device *dev)
+{
+	struct mlx5_core_dev *found = NULL;
+	struct mlx5_core_dev *tmp_dev;
+	struct mlx5_priv *priv;
+
+	mutex_lock(&mlx5_intf_mutex);
+	list_for_each_entry(priv, &mlx5_dev_list, dev_list) {
+		tmp_dev = container_of(priv, struct mlx5_core_dev, priv);
+		if (&tmp_dev->pdev->dev == dev) {
+			found = tmp_dev;
+			break;
+		}
+	}
+	mutex_unlock(&mlx5_intf_mutex);
+	return found;
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 72b0072..c1fc0f4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -40,6 +40,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1553,7 +1554,15 @@ static int __init init(void)
mlx5e_init();
 #endif
 
+#if IS_ENABLED(CONFIG_VFIO_MDEV)
+   err = mdev_register_driver(_mdev_driver, THIS_MODULE);
+   if (err) {
+   pci_unregister_driver(_core_driver);
+   goto err_debug;
+   }
+#else
return 0;
+#endif
 
 err_debug:
mlx5_unregister_debugfs();
@@ -1562,6 +1571,10 @@ static int __init init(void)
 
 static void __exit cleanup(void)
 {
+#if IS_ENABLED(CONFIG_VFIO_MDEV)
+   mdev_unregister_driver(_mdev_driver);
+#endif
+
 #ifdef CONFIG_MLX5_CORE_EN
mlx5e_cleanup();
 #endif
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mdev_driver.c 
b/drivers/net/ethernet/mellanox/mlx5/core/mdev_driver.c
new file mode 100644
index 0000000..7618c5e
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/mdev_driver.c
@@ -0,0 +1,106 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2018-19 Mellanox Technologies
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "mlx5_core.h"
+
+struct mlx5_subdev_ndev {
+	struct net_device ndev;
+};
+
+static void mlx5_dma_test(struct device *dev)
+{
+	dma_addr_t pa;
+	void *va;
+
+	va = dma_alloc_coherent(dev, 4096, &pa, GFP_KERNEL);
+	if (va)
+		dma_free_coherent(dev, 4096, va, pa);
+}
+
+static struct net_device *ndev;
+
+static int mlx5e_mdev_open(struct net_device *netdev)
+{
+	return 0;
+}
+
+static int mlx5e_mdev_close(struct net_device *netdev)
+{
+	return 0;
+}
+
+static netdev_tx_t
+mlx5e_mdev_xmit(struct sk_buff *skb, struct net_device *netdev)
+{
+	return NETDEV_TX_BUSY;
+}
+
+const struct net_device_ops mlx5e_mdev_netdev_ops = {
+	.ndo_open        = mlx5e_mdev_open,
+	.ndo_stop        = mlx5e_mdev_close,
+	.ndo_start_xmit  = mlx5e_mdev_xmit,
+};
+
+static int mlx5_mdev_probe(struct device *dev)
+{
+	struct mdev_device *mdev = mdev_from_dev(dev);
+	struct device *parent_dev = mdev_parent_dev(mdev);
+	struct mlx5_core_dev *mlx5_dev;
+	int err;
+
+	mlx5_dev = mlx5_get_core_dev(parent_dev);
+	if (!mlx5_dev)
+		return -ENODEV;
+
+   m

[RFC net-next v1 0/3] Support mlx5 mediated devices in host

2019-03-08 Thread Parav Pandit
Use case:
---------
A user wants to create/delete hardware linked sub devices without
using SR-IOV.
For a pci device, these sub devices can be a netdev (optionally with an
rdma device) or other devices. Such sub devices share some of the PCI
device resources and also have their own dedicated resources.
A user wants to use such a device in a host where the PF PCI device exists
(not in a guest VM). In the future, a user may want to use such a sub
device in a guest VM.

A few examples are:
1. netdev having its own txq(s), rq(s) and/or hw offload parameters.
2. netdev with switchdev mode using netdev representor
3. rdma device with IB link layer and IPoIB netdev
4. rdma/RoCE device and a netdev
5. rdma device with multiple ports

Requirements for the above use cases:

1. We need a generic user interface & core APIs to create sub devices
from a parent pci device, generic enough to also work for other parent
devices
2. Interface should be vendor agnostic
3. User should be able to set device params at creation time
4. In the future, if needed, a tool should be able to create a passthrough
device to map to a virtual machine
5. A device can have multiple ports
6. Orchestration software wants to know how many such sub devices
can be created from a parent device so that it can manage them as global
cluster resources.

So how is it done?
------------------
The kernel has existing mediated device infrastructure, provided by the
mdev driver, for managing the lifecycle of such sub devices.
Hence, these sub devices are created with the help of the mdev driver.

The mlx5_core driver registers with the mdev core to do so and exposes the
necessary sysfs files.

Each created sub device has a unique UUID assigned by the user.

mdev sub devices inherit their parent's dma parameters.

Each registered mdev has a corresponding devlink instance. The device and
its port(s) are managed through this devlink instance.

To use a mediated device in a VM or in the host, the user decides
which driver to bind to it. Typically vfio_mdev is used to expose an mdev
in a guest VM. In the current use case, mlx5 mediated devices are only
usable inside the host, through the mlx5_core driver binding to them.

Patchset summary:
-----------------
Patch-1 adds support to inherit dma params of parent device in child mdev.
Patch-2 registers with mdev core.
Patch-3 registers a mdev device driver to create actual netdev.

Summary of alternatives considered and discussed:
-------------------------------------------------
1. new subdev bus
   Fits the need but mdev simplifies it.
2. visorbus
   Very specific to Unisys s-Par devices.
3. platform devices
   Primarily meant for autonomous, SoC etc devices.
4. mfd devices
   Depends on platform device infra.
5. Directly creating netdev, rdma device instead of sub device
   Doesn't fit use case of passthrough mode.
6. creating subports of devlink instance
   Doesn't cover multiport rdma device usecase.

While discussions [1], [2] are still ongoing, v1 is posted to describe
how the two use cases of using an mdev in the host or in a guest are
addressed via the standard Linux device driver model.

[1] https://www.spinics.net/lists/netdev/msg556552.html
[2] https://www.spinics.net/lists/netdev/msg556944.html

All patches are only a reference implementation to show how the framework
works at the devlink, sysfs, mdev and device model level. Once the RFC looks
good, a solid upstreamable version of the implementation will be done.

System view with one mdev:
--------------------------

$ ls -l /sys/bus/pci/devices/:05:00.0
[..]
drwxr-xr-x 3 root root    0 Mar  8 14:53 69ea1551-d054-46e9-974d-8edae8f0aefe
drwxr-xr-x 3 root root    0 Mar  8 15:41 infiniband
drwxr-xr-x 3 root root    0 Mar  8 15:41 mdev_supported_types
-rw-r--r-- 1 root root 4096 Mar  8 13:17 msi_bus
drwxr-xr-x 2 root root    0 Mar  8 15:41 msi_irqs
drwxr-xr-x 3 root root    0 Mar  8 15:41 net

$ ls -l /sys/bus/mdev/drivers
total 0
drwxr-xr-x 2 root root 0 Mar  8 13:39 mlx5_core
drwxr-xr-x 2 root root 0 Mar  8 14:53 vfio_mdev

$ ls -l /sys/bus/mdev/devices/
total 0
lrwxrwxrwx 1 root root 0 Mar  8 14:53 69ea1551-d054-46e9-974d-8edae8f0aefe -> ../../../devices/pci:00/:00:02.2/:05:00.0/69ea1551-d054-46e9-974d-8edae8f0aefe

Bind mdev to mlx5_core driver:
$ echo 69ea1551-d054-46e9-974d-8edae8f0aefe > /sys/bus/mdev/drivers/mlx5_core/bind

$ ls -l /sys/class/net/eth0/
-r--r--r-- 1 root root 4096 Mar  8 15:43 carrier_up_count
lrwxrwxrwx 1 root root    0 Mar  8 15:43 device -> ../../../69ea1551-d054-46e9-974d-8edae8f0aefe
-r--r--r-- 1 root root 4096 Mar  8 15:43 dev_id

$ devlink dev show
pci/:05:00.0
mdev/69ea1551-d054-46e9-974d-8edae8f0aefe
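
The system view above can also be gathered by a script. The following is a
minimal sketch that walks the mdev bus and prints each device together with
the driver it is currently bound to. It relies only on the generic driver
model layout visible in the listings (a 'driver' symlink inside each device
directory). The function names mdev_driver and mdev_list are hypothetical.

```shell
#!/bin/sh
# Sketch only: mdev_driver and mdev_list are hypothetical helper names.

# mdev_driver MDEV_DIR
# Print the name of the driver bound to the device, or "(none)".
mdev_driver() {
    if [ -L "$1/driver" ]; then
        basename "$(readlink "$1/driver")"
    else
        echo "(none)"
    fi
}

# mdev_list [DEVICES_DIR]
# Print "<uuid> <driver>" for every device on the mdev bus.
mdev_list() {
    dir=${1:-/sys/bus/mdev/devices}
    for m in "$dir"/*; do
        [ -e "$m" ] || continue
        printf '%s %s\n' "$(basename "$m")" "$(mdev_driver "$m")"
    done
}

# Example:
#   mdev_list
```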

Changelog
---------
v0->v1:
 - Removed subdev bus, instead using existing mdev bus which fits
   the need.
 - Dropped devlink patches which are not needed anymore due to use of
   mdev framework.
 - Updated SPDX license line in patches.
 - Added TODO to patches where more hardware specific code will be added.

Parav Pandit (3):
  vfio/mdev: Inherit dma masks of parent device
  ne

RE: [RFC net-next 0/8] Introducing subdev bus and devlink extension

2019-03-08 Thread Parav Pandit


> -Original Message-
> From: Kirti Wankhede 
> Sent: Friday, March 8, 2019 6:19 AM
> To: Parav Pandit ; Jakub Kicinski
> 
> Cc: Or Gerlitz ; net...@vger.kernel.org; linux-
> ker...@vger.kernel.org; michal.l...@markovi.net; da...@davemloft.net;
> gre...@linuxfoundation.org; Jiri Pirko ; Alex
> Williamson 
> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension
> 
> 
> 
> >>>>>> 
> >>>>>>
> >>>>>>>>>
> >>>>>>>>> Yes. I got my patches to adapt to mdev way. Will be posting
> >>>>>>>>> RFC
> >>>>>>>>> v2
> >>>> soon.
> >>>>>>>>> Will wait for a day to receive more comments/views from Greg
> >>>>>>>>> and
> >>>>>> others.
> >>>>>>>>>
> >>>>>>>>> As I explained in this cover-letter and discussion, First use
> >>>>>>>>> case is to create and use mdevs in the host (and not in VM).
> >>>>>>>>> Later on, I am sure once we have mdevs available, VM users
> >>>>>>>>> will likely use
> >>>>>>>> it.
> >>>>>>>>>
> >>>>>>>>> So, mlx5_core driver will have two components as starting point.
> >>>>>>>>>
> >>>>>>>>> 1. drivers/net/ethernet/mellanox/mlx5/core/mdev/mdev.c
> >>>>>>>>> This is mdev device life cycle driver which will do,
> >>>>>>>>> mdev_register_device()
> >>>>>>>> and implements mlx5_mdev_ops.
> >>>>>>>>>
> >>>>>>>> Ok. I would suggest not use mdev.c file name, may be add device
> >>>>>>>> name, something like mlx_mdev.c or vfio_mlx.c
> >>>>>>>>
> >>>>>>> mlx5/core is coding convention is not following to prefix mlx to
> >>>>>>> its
> >>>>>>> 40+
> >>>>>> files.
> >>>>>>>
> >>>>>>> it uses actual subsystem or functionality name, such as, sriov.c
> >>>>>>> eswitch.c fw.c en_tc.c (en for Ethernet) lag.c so, mdev.c aligns
> >>>>>>> to rest of the 40+ files.
> >>>>>>>
> >>>>>>>
> >>>>>>>>> 2. drivers/net/ethernet/mellanox/mlx5/core/mdev/mdev_driver.c
> >>>>>>>>> This is mdev device driver which does mdev_register_driver()
> >>>>>>>>> and
> >>>>>>>>> probe() creates netdev by heavily reusing existing code of the
> >>>>>>>>> PF
> >>>> device.
> >>>>>>>>> These drivers will not be placed under drivers/vfio/mdev,
> >>>>>>>>> because this is
> >>>>>>>> not a vfio driver.
> >>>>>>>>> This is fine, right?
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> I'm not too familiar with netdev, but can you create netdev on
> >>>>>>>> open() call on mlx mdev device? Then you don't have to write
> >>>>>>>> mdev device
> >>>>>> driver.
> >>>>>>>>
> >>>>>>> Who invokes open() and release()?
> >>>>>>> I believe it is the qemu would do open(), release,
> read/write/mmap?
> >>>>>>>
> >>>>>>> Assuming that is the case,
> >>>>>>> I think its incorrect to create netdev in open.
> >>>>>>> Because when we want to map the mdev to VM using above mdev
> >> calls,
> >>>>>>> we
> >>>>>> actually wont be creating netdev in host.
> >>>>>>> Instead, some queues etc will be setup as part of these calls.
> >>>>>>>
> >>>>>>> By default this created mdev is bound to vfio_mdev.
> >>>>>>> And once we unbind the device from this driver, we need to bind
> >>>>>>> to
> >>>>>>> mlx5
> >>>>>> driver so that driver can create the netdev etc.
> >>>>>>>
> >>>>>>> Or did I

RE: [RFC net-next 0/8] Introducing subdev bus and devlink extension

2019-03-07 Thread Parav Pandit


> -Original Message-
> From: Kirti Wankhede 
> Sent: Thursday, March 7, 2019 4:02 PM
> To: Parav Pandit ; Jakub Kicinski
> 
> Cc: Or Gerlitz ; net...@vger.kernel.org; linux-
> ker...@vger.kernel.org; michal.l...@markovi.net; da...@davemloft.net;
> gre...@linuxfoundation.org; Jiri Pirko ; Alex
> Williamson 
> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension
> 
> 
> 
> On 3/8/2019 2:51 AM, Parav Pandit wrote:
> >
> >
> >> -Original Message-
> >> From: Kirti Wankhede 
> >> Sent: Thursday, March 7, 2019 3:08 PM
> >> To: Parav Pandit ; Jakub Kicinski
> >> 
> >> Cc: Or Gerlitz ; net...@vger.kernel.org; linux-
> >> ker...@vger.kernel.org; michal.l...@markovi.net;
> da...@davemloft.net;
> >> gre...@linuxfoundation.org; Jiri Pirko ; Alex
> >> Williamson 
> >> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink
> >> extension
> >>
> >>
> >>
> >> On 3/8/2019 2:32 AM, Parav Pandit wrote:
> >>>
> >>>
> >>>> -Original Message-
> >>>> From: Kirti Wankhede 
> >>>> Sent: Thursday, March 7, 2019 2:54 PM
> >>>> To: Parav Pandit ; Jakub Kicinski
> >>>> 
> >>>> Cc: Or Gerlitz ; net...@vger.kernel.org;
> >>>> linux- ker...@vger.kernel.org; michal.l...@markovi.net;
> >> da...@davemloft.net;
> >>>> gre...@linuxfoundation.org; Jiri Pirko ; Alex
> >>>> Williamson 
> >>>> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink
> >>>> extension
> >>>>
> >>>>
> >>>>
> >>>> 
> >>>>
> >>>>>>>
> >>>>>>> Yes. I got my patches to adapt to mdev way. Will be posting RFC
> >>>>>>> v2
> >> soon.
> >>>>>>> Will wait for a day to receive more comments/views from Greg and
> >>>> others.
> >>>>>>>
> >>>>>>> As I explained in this cover-letter and discussion, First use
> >>>>>>> case is to create and use mdevs in the host (and not in VM).
> >>>>>>> Later on, I am sure once we have mdevs available, VM users will
> >>>>>>> likely use
> >>>>>> it.
> >>>>>>>
> >>>>>>> So, mlx5_core driver will have two components as starting point.
> >>>>>>>
> >>>>>>> 1. drivers/net/ethernet/mellanox/mlx5/core/mdev/mdev.c
> >>>>>>> This is mdev device life cycle driver which will do,
> >>>>>>> mdev_register_device()
> >>>>>> and implements mlx5_mdev_ops.
> >>>>>>>
> >>>>>> Ok. I would suggest not use mdev.c file name, may be add device
> >>>>>> name, something like mlx_mdev.c or vfio_mlx.c
> >>>>>>
> >>>>> mlx5/core is coding convention is not following to prefix mlx to
> >>>>> its
> >>>>> 40+
> >>>> files.
> >>>>>
> >>>>> it uses actual subsystem or functionality name, such as, sriov.c
> >>>>> eswitch.c fw.c en_tc.c (en for Ethernet) lag.c so, mdev.c aligns
> >>>>> to rest of the 40+ files.
> >>>>>
> >>>>>
> >>>>>>> 2. drivers/net/ethernet/mellanox/mlx5/core/mdev/mdev_driver.c
> >>>>>>> This is mdev device driver which does mdev_register_driver() and
> >>>>>>> probe() creates netdev by heavily reusing existing code of the
> >>>>>>> PF
> >> device.
> >>>>>>> These drivers will not be placed under drivers/vfio/mdev,
> >>>>>>> because this is
> >>>>>> not a vfio driver.
> >>>>>>> This is fine, right?
> >>>>>>>
> >>>>>>
> >>>>>> I'm not too familiar with netdev, but can you create netdev on
> >>>>>> open() call on mlx mdev device? Then you don't have to write mdev
> >>>>>> device
> >>>> driver.
> >>>>>>
> >>>>> Who invokes open() and release()?
> >>>>> I believe it is the qemu would do open(), release, read/write/mmap?
> >>>>>
> >>>>> Assuming that

RE: [RFC net-next 0/8] Introducing subdev bus and devlink extension

2019-03-07 Thread Parav Pandit


> -Original Message-
> From: Kirti Wankhede 
> Sent: Thursday, March 7, 2019 3:08 PM
> To: Parav Pandit ; Jakub Kicinski
> 
> Cc: Or Gerlitz ; net...@vger.kernel.org; linux-
> ker...@vger.kernel.org; michal.l...@markovi.net; da...@davemloft.net;
> gre...@linuxfoundation.org; Jiri Pirko ; Alex
> Williamson 
> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension
> 
> 
> 
> On 3/8/2019 2:32 AM, Parav Pandit wrote:
> >
> >
> >> -Original Message-
> >> From: Kirti Wankhede 
> >> Sent: Thursday, March 7, 2019 2:54 PM
> >> To: Parav Pandit ; Jakub Kicinski
> >> 
> >> Cc: Or Gerlitz ; net...@vger.kernel.org; linux-
> >> ker...@vger.kernel.org; michal.l...@markovi.net;
> da...@davemloft.net;
> >> gre...@linuxfoundation.org; Jiri Pirko ; Alex
> >> Williamson 
> >> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink
> >> extension
> >>
> >>
> >>
> >> 
> >>
> >>>>>
> >>>>> Yes. I got my patches to adapt to mdev way. Will be posting RFC v2
> soon.
> >>>>> Will wait for a day to receive more comments/views from Greg and
> >> others.
> >>>>>
> >>>>> As I explained in this cover-letter and discussion, First use case
> >>>>> is to create and use mdevs in the host (and not in VM).
> >>>>> Later on, I am sure once we have mdevs available, VM users will
> >>>>> likely use
> >>>> it.
> >>>>>
> >>>>> So, mlx5_core driver will have two components as starting point.
> >>>>>
> >>>>> 1. drivers/net/ethernet/mellanox/mlx5/core/mdev/mdev.c
> >>>>> This is mdev device life cycle driver which will do,
> >>>>> mdev_register_device()
> >>>> and implements mlx5_mdev_ops.
> >>>>>
> >>>> Ok. I would suggest not use mdev.c file name, may be add device
> >>>> name, something like mlx_mdev.c or vfio_mlx.c
> >>>>
> >>> mlx5/core is coding convention is not following to prefix mlx to its
> >>> 40+
> >> files.
> >>>
> >>> it uses actual subsystem or functionality name, such as, sriov.c
> >>> eswitch.c fw.c en_tc.c (en for Ethernet) lag.c so, mdev.c aligns to
> >>> rest of the 40+ files.
> >>>
> >>>
> >>>>> 2. drivers/net/ethernet/mellanox/mlx5/core/mdev/mdev_driver.c
> >>>>> This is mdev device driver which does mdev_register_driver() and
> >>>>> probe() creates netdev by heavily reusing existing code of the PF
> device.
> >>>>> These drivers will not be placed under drivers/vfio/mdev, because
> >>>>> this is
> >>>> not a vfio driver.
> >>>>> This is fine, right?
> >>>>>
> >>>>
> >>>> I'm not too familiar with netdev, but can you create netdev on
> >>>> open() call on mlx mdev device? Then you don't have to write mdev
> >>>> device
> >> driver.
> >>>>
> >>> Who invokes open() and release()?
> >>> I believe it is the qemu would do open(), release, read/write/mmap?
> >>>
> >>> Assuming that is the case,
> >>> I think its incorrect to create netdev in open.
> >>> Because when we want to map the mdev to VM using above mdev calls,
> >>> we
> >> actually wont be creating netdev in host.
> >>> Instead, some queues etc will be setup as part of these calls.
> >>>
> >>> By default this created mdev is bound to vfio_mdev.
> >>> And once we unbind the device from this driver, we need to bind to
> >>> mlx5
> >> driver so that driver can create the netdev etc.
> >>>
> >>> Or did I get open() and friends call wrong?
> >>>
> >>
> >> In 'struct mdev_parent_ops' there are create() and remove(). When
> >> user creates mdev device by writing UUID to create sysfs, vendor
> >> driver's
> >> create() callback gets called. This should be used to allocate/commit
> > Yes. I am already past that stage.
> >
> >> resources from parent device and on remove() callback free those
> resources.
> >> So there is no need to bind mlx5 driver to that mdev device.
> >>
> > If we don't bind mlx5 driver, vfio_mdev driver is bound to it. Such driver
> won't create ne

RE: [RFC net-next 0/8] Introducing subdev bus and devlink extension

2019-03-07 Thread Parav Pandit


> -Original Message-
> From: Kirti Wankhede 
> Sent: Thursday, March 7, 2019 2:54 PM
> To: Parav Pandit ; Jakub Kicinski
> 
> Cc: Or Gerlitz ; net...@vger.kernel.org; linux-
> ker...@vger.kernel.org; michal.l...@markovi.net; da...@davemloft.net;
> gre...@linuxfoundation.org; Jiri Pirko ; Alex
> Williamson 
> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension
> 
> 
> 
> 
> 
> >>>
> >>> Yes. I got my patches to adapt to mdev way. Will be posting RFC v2 soon.
> >>> Will wait for a day to receive more comments/views from Greg and
> others.
> >>>
> >>> As I explained in this cover-letter and discussion, First use case
> >>> is to create and use mdevs in the host (and not in VM).
> >>> Later on, I am sure once we have mdevs available, VM users will
> >>> likely use
> >> it.
> >>>
> >>> So, mlx5_core driver will have two components as starting point.
> >>>
> >>> 1. drivers/net/ethernet/mellanox/mlx5/core/mdev/mdev.c
> >>> This is mdev device life cycle driver which will do,
> >>> mdev_register_device()
> >> and implements mlx5_mdev_ops.
> >>>
> >> Ok. I would suggest not use mdev.c file name, may be add device name,
> >> something like mlx_mdev.c or vfio_mlx.c
> >>
> > mlx5/core is coding convention is not following to prefix mlx to its 40+
> files.
> >
> > it uses actual subsystem or functionality name, such as, sriov.c
> > eswitch.c fw.c en_tc.c (en for Ethernet) lag.c so, mdev.c aligns to
> > rest of the 40+ files.
> >
> >
> >>> 2. drivers/net/ethernet/mellanox/mlx5/core/mdev/mdev_driver.c
> >>> This is mdev device driver which does mdev_register_driver() and
> >>> probe() creates netdev by heavily reusing existing code of the PF device.
> >>> These drivers will not be placed under drivers/vfio/mdev, because
> >>> this is
> >> not a vfio driver.
> >>> This is fine, right?
> >>>
> >>
> >> I'm not too familiar with netdev, but can you create netdev on open()
> >> call on mlx mdev device? Then you don't have to write mdev device
> driver.
> >>
> > Who invokes open() and release()?
> > I believe it is the qemu would do open(), release, read/write/mmap?
> >
> > Assuming that is the case,
> > I think its incorrect to create netdev in open.
> > Because when we want to map the mdev to VM using above mdev calls, we
> actually wont be creating netdev in host.
> > Instead, some queues etc will be setup as part of these calls.
> >
> > By default this created mdev is bound to vfio_mdev.
> > And once we unbind the device from this driver, we need to bind to mlx5
> driver so that driver can create the netdev etc.
> >
> > Or did I get open() and friends call wrong?
> >
> 
> In 'struct mdev_parent_ops' there are create() and remove(). When user
> creates mdev device by writing UUID to create sysfs, vendor driver's
> create() callback gets called. This should be used to allocate/commit
Yes. I am already past that stage.

> resources from parent device and on remove() callback free those resources.
> So there is no need to bind mlx5 driver to that mdev device.
> 
If we don't bind the mlx5 driver, the vfio_mdev driver is bound to the mdev.
That driver won't create a netdev.
Again, we do not want to map this mdev to a VM.
We want to consume it in the host where the mdev is created.
So I am able to detach this mdev from the vfio_mdev driver as usual using
$ echo mdev_name > ../drivers/vfio_mdev/unbind

Followed by binding it to the mlx5_core driver.

Below is sample output before binding it to the mlx5_core driver.
When we bind it to the mlx5_core driver, that driver creates the netdev in
the host.
If the user wants to map this mdev to a VM, the user won't bind it to the
mlx5_core driver. Instead, the user binds it to the vfio driver, which does
the usual open/release/...


lrwxrwxrwx 1 root root 0 Mar  7 14:24 69ea1551-d054-46e9-974d-8edae8f0aefe -> ../../../devices/pci:00/:00:02.2/:05:00.0/69ea1551-d054-46e9-974d-8edae8f0aefe
[root@sw-mtx-036 net-next]# ls -l /sys/bus/mdev/devices/69ea1551-d054-46e9-974d-8edae8f0aefe/
total 0
lrwxrwxrwx 1 root root    0 Mar  7 14:24 driver -> ../../../../../bus/mdev/drivers/vfio_mdev
lrwxrwxrwx 1 root root    0 Mar  7 14:24 iommu_group -> ../../../../../kernel/iommu_groups/0
lrwxrwxrwx 1 root root    0 Mar  7 14:24 mdev_type -> ../mdev_supported_types/mlx5_core-mgmt
drwxr-xr-x 2 root root    0 Mar  7 14:24 power
--w------- 1 root root 4096 Mar  7 14:24 remove
lrwxrwxrwx 1 root root    0 Mar  7 14:24 subsystem -> ../../../../../bus/mdev
-rw-r--r-- 1 root root 4096 Mar  7 14:24 uevent
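
The unbind-then-bind sequence described above can be sketched as a small
shell helper. It uses the generic driver model 'bind' and 'unbind' files
under the bus' drivers directory, as in the commands quoted in this thread.
The function name mdev_rebind and its optional third argument are
assumptions made for illustration only.

```shell
#!/bin/sh
# Sketch only: mdev_rebind is a hypothetical helper name.

# mdev_rebind UUID NEW_DRIVER [BUS_DIR]
# Detach the mdev from its current driver (if any) and bind it to
# NEW_DRIVER, for example mlx5_core for host use or vfio_mdev for VM use.
mdev_rebind() {
    uuid=$1
    new=$2
    bus=${3:-/sys/bus/mdev}
    dev="$bus/devices/$uuid"

    # The 'driver' symlink exists only while a driver is bound.
    if [ -L "$dev/driver" ]; then
        printf '%s\n' "$uuid" > "$dev/driver/unbind" || return 1
    fi

    printf '%s\n' "$uuid" > "$bus/drivers/$new/bind"
}

# Example (needs root):
#   mdev_rebind 69ea1551-d054-46e9-974d-8edae8f0aefe mlx5_core
```

Keeping the target driver as a parameter mirrors the point made above: the
same mdev is consumed in the host or handed to a VM depending only on which
driver it is bound to.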

> open/release/read/write/mmap/ioctl are regular file operations for that
> mdev device.
> 

> Thanks,
> Kirti



RE: [RFC net-next 0/8] Introducing subdev bus and devlink extension

2019-03-07 Thread Parav Pandit


> -Original Message-
> From: Kirti Wankhede 
> Sent: Thursday, March 7, 2019 1:04 PM
> To: Parav Pandit ; Jakub Kicinski
> 
> Cc: Or Gerlitz ; net...@vger.kernel.org; linux-
> ker...@vger.kernel.org; michal.l...@markovi.net; da...@davemloft.net;
> gre...@linuxfoundation.org; Jiri Pirko ; Alex
> Williamson 
> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension
> 
> CC += Alex
> 
> On 3/6/2019 11:12 AM, Parav Pandit wrote:
> > Hi Kirti,
> >
> >> -Original Message-
> >> From: Kirti Wankhede 
> >> Sent: Tuesday, March 5, 2019 9:51 PM
> >> To: Parav Pandit ; Jakub Kicinski
> >> 
> >> Cc: Or Gerlitz ; net...@vger.kernel.org; linux-
> >> ker...@vger.kernel.org; michal.l...@markovi.net;
> da...@davemloft.net;
> >> gre...@linuxfoundation.org; Jiri Pirko 
> >> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink
> >> extension
> >>
> >>
> >>
> >> On 3/6/2019 6:14 AM, Parav Pandit wrote:
> >>> Hi Greg, Kirti,
> >>>
> >>>> -Original Message-
> >>>> From: Parav Pandit
> >>>> Sent: Tuesday, March 5, 2019 5:45 PM
> >>>> To: Parav Pandit ; Kirti Wankhede
> >>>> ; Jakub Kicinski
> >> 
> >>>> Cc: Or Gerlitz ; net...@vger.kernel.org;
> >>>> linux- ker...@vger.kernel.org; michal.l...@markovi.net;
> >> da...@davemloft.net;
> >>>> gre...@linuxfoundation.org; Jiri Pirko 
> >>>> Subject: RE: [RFC net-next 0/8] Introducing subdev bus and devlink
> >>>> extension
> >>>>
> >>>>
> >>>>
> >>>>> -Original Message-
> >>>>> From: linux-kernel-ow...@vger.kernel.org On Behalf Of Parav Pandit
> >>>>> Sent: Tuesday, March 5, 2019 5:17 PM
> >>>>> To: Kirti Wankhede ; Jakub Kicinski
> >>>>> 
> >>>>> Cc: Or Gerlitz ; net...@vger.kernel.org;
> >>>>> linux- ker...@vger.kernel.org; michal.l...@markovi.net;
> >>>>> da...@davemloft.net; gre...@linuxfoundation.org; Jiri Pirko
> >>>>> 
> >>>>> Subject: RE: [RFC net-next 0/8] Introducing subdev bus and devlink
> >>>>> extension
> >>>>>
> >>>>> Hi Kirti,
> >>>>>
> >>>>>> -Original Message-
> >>>>>> From: Kirti Wankhede 
> >>>>>> Sent: Tuesday, March 5, 2019 4:40 PM
> >>>>>> To: Parav Pandit ; Jakub Kicinski
> >>>>>> 
> >>>>>> Cc: Or Gerlitz ; net...@vger.kernel.org;
> >>>>>> linux- ker...@vger.kernel.org; michal.l...@markovi.net;
> >>>>>> da...@davemloft.net; gre...@linuxfoundation.org; Jiri Pirko
> >>>>>> 
> >>>>>> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and
> >>>>>> devlink extension
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>> I am novice at mdev level too. mdev or vfio mdev.
> >>>>>>> Currently by default we bind to same vendor driver, but when it
> >>>>>>> was
> >>>>>> created as passthrough device, vendor driver won't create
> >>>>>> netdevice or rdma device for it.
> >>>>>>> And vfio/mdev or whatever mature available driver would bind at
> >>>>>>> that
> >>>>>> point.
> >>>>>>>
> >>>>>>
> >>>>>> Using mdev framework, if you want to partition a physical device
> >>>>>> into multiple logic devices, you can bind those devices to same
> >>>>>> vendor driver through vfio-mdev, where as if you want to
> >>>>>> passthrough the device bind it to vfio-pci. If I understand
> >>>>>> correctly, that is what you are
> >>>>> looking for.
> >>>>>>
> >>>>>>
> >>>>> We cannot bind a whole PCI device to vfio-pci, reason is, A given
> >>>>> PCI device has existing protocol devices on it such as netdevs and
> >>>>> rdma
> >> dev.
> >>>>> This device is partitioned while those protocol devices exist and
> >>>>> mlx5_core, mlx5_ib drivers are

RE: [RFC net-next 0/8] Introducing subdev bus and devlink extension

2019-03-05 Thread Parav Pandit
Hi Kirti,

> -Original Message-
> From: Kirti Wankhede 
> Sent: Tuesday, March 5, 2019 9:51 PM
> To: Parav Pandit ; Jakub Kicinski
> 
> Cc: Or Gerlitz ; net...@vger.kernel.org; linux-
> ker...@vger.kernel.org; michal.l...@markovi.net; da...@davemloft.net;
> gre...@linuxfoundation.org; Jiri Pirko 
> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension
> 
> 
> 
> On 3/6/2019 6:14 AM, Parav Pandit wrote:
> > Hi Greg, Kirti,
> >
> >> -Original Message-
> >> From: Parav Pandit
> >> Sent: Tuesday, March 5, 2019 5:45 PM
> >> To: Parav Pandit ; Kirti Wankhede
> >> ; Jakub Kicinski
> 
> >> Cc: Or Gerlitz ; net...@vger.kernel.org; linux-
> >> ker...@vger.kernel.org; michal.l...@markovi.net;
> da...@davemloft.net;
> >> gre...@linuxfoundation.org; Jiri Pirko 
> >> Subject: RE: [RFC net-next 0/8] Introducing subdev bus and devlink
> >> extension
> >>
> >>
> >>
> >>> -Original Message-
> >>> From: linux-kernel-ow...@vger.kernel.org On Behalf Of Parav Pandit
> >>> Sent: Tuesday, March 5, 2019 5:17 PM
> >>> To: Kirti Wankhede ; Jakub Kicinski
> >>> 
> >>> Cc: Or Gerlitz ; net...@vger.kernel.org;
> >>> linux- ker...@vger.kernel.org; michal.l...@markovi.net;
> >>> da...@davemloft.net; gre...@linuxfoundation.org; Jiri Pirko
> >>> 
> >>> Subject: RE: [RFC net-next 0/8] Introducing subdev bus and devlink
> >>> extension
> >>>
> >>> Hi Kirti,
> >>>
> >>>> -Original Message-
> >>>> From: Kirti Wankhede 
> >>>> Sent: Tuesday, March 5, 2019 4:40 PM
> >>>> To: Parav Pandit ; Jakub Kicinski
> >>>> 
> >>>> Cc: Or Gerlitz ; net...@vger.kernel.org;
> >>>> linux- ker...@vger.kernel.org; michal.l...@markovi.net;
> >>>> da...@davemloft.net; gre...@linuxfoundation.org; Jiri Pirko
> >>>> 
> >>>> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink
> >>>> extension
> >>>>
> >>>>
> >>>>
> >>>>> I am novice at mdev level too. mdev or vfio mdev.
> >>>>> Currently by default we bind to same vendor driver, but when it
> >>>>> was
> >>>> created as passthrough device, vendor driver won't create netdevice
> >>>> or rdma device for it.
> >>>>> And vfio/mdev or whatever mature available driver would bind at
> >>>>> that
> >>>> point.
> >>>>>
> >>>>
> >>>> Using mdev framework, if you want to partition a physical device
> >>>> into multiple logic devices, you can bind those devices to same
> >>>> vendor driver through vfio-mdev, where as if you want to
> >>>> passthrough the device bind it to vfio-pci. If I understand
> >>>> correctly, that is what you are
> >>> looking for.
> >>>>
> >>>>
> >>> We cannot bind a whole PCI device to vfio-pci, reason is, A given
> >>> PCI device has existing protocol devices on it such as netdevs and rdma
> dev.
> >>> This device is partitioned while those protocol devices exist and
> >>> mlx5_core, mlx5_ib drivers are loaded on it.
> >>> And we also need to connect these objects rightly to eswitch exposed
> >>> by devlink interface (net/core/devlink.c) that supports eswitch
> >>> binding, health, registers, parameters, ports support.
> >>> It also supports existing PCI VFs.
> >>>
> >>> I don’t think we want to replicate all of this again in mdev subsystem 
> >>> [1].
> >>>
> >>> [1]
> >>> https://www.kernel.org/doc/Documentation/vfio-mediated-device.txt
> >>>
> >>> So devlink interface to migrate users from managing VFs to non_VF
> >>> sub device is natural progression.
> >>>
> >>> However, in future, I believe we would be creating mediated devices
> >>> on user request, to use mdev modules and map them to VM.
> >>>
> >>> Also 'mdev_bus' is created as a class and not as a bus. This limits
> >>> to not use devlink interface whose handle is bus+device name.
> >>>
> >>> So one option is to change mdev from class to bus.
> >>> devlink will create mdevs on the bus, mdev driver can pr

RE: [RFC net-next 0/8] Introducing subdev bus and devlink extension

2019-03-05 Thread Parav Pandit
Hi Greg, Kirti,

> -Original Message-
> From: Parav Pandit
> Sent: Tuesday, March 5, 2019 5:45 PM
> To: Parav Pandit ; Kirti Wankhede
> ; Jakub Kicinski 
> Cc: Or Gerlitz ; net...@vger.kernel.org; linux-
> ker...@vger.kernel.org; michal.l...@markovi.net; da...@davemloft.net;
> gre...@linuxfoundation.org; Jiri Pirko 
> Subject: RE: [RFC net-next 0/8] Introducing subdev bus and devlink extension
> 
> 
> 
> > -Original Message-
> > From: linux-kernel-ow...@vger.kernel.org On Behalf Of Parav Pandit
> > Sent: Tuesday, March 5, 2019 5:17 PM
> > To: Kirti Wankhede ; Jakub Kicinski
> > 
> > Cc: Or Gerlitz ; net...@vger.kernel.org; linux-
> > ker...@vger.kernel.org; michal.l...@markovi.net; da...@davemloft.net;
> > gre...@linuxfoundation.org; Jiri Pirko 
> > Subject: RE: [RFC net-next 0/8] Introducing subdev bus and devlink
> > extension
> >
> > Hi Kirti,
> >
> > > -Original Message-
> > > From: Kirti Wankhede 
> > > Sent: Tuesday, March 5, 2019 4:40 PM
> > > To: Parav Pandit ; Jakub Kicinski
> > > 
> > > Cc: Or Gerlitz ; net...@vger.kernel.org;
> > > linux- ker...@vger.kernel.org; michal.l...@markovi.net;
> > > da...@davemloft.net; gre...@linuxfoundation.org; Jiri Pirko
> > > 
> > > Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink
> > > extension
> > >
> > >
> > >
> > > > I am novice at mdev level too. mdev or vfio mdev.
> > > > Currently by default we bind to same vendor driver, but when it
> > > > was
> > > created as passthrough device, vendor driver won't create netdevice
> > > or rdma device for it.
> > > > And vfio/mdev or whatever mature available driver would bind at
> > > > that
> > > point.
> > > >
> > >
> > > Using mdev framework, if you want to partition a physical device
> > > into multiple logic devices, you can bind those devices to same
> > > vendor driver through vfio-mdev, where as if you want to passthrough
> > > the device bind it to vfio-pci. If I understand correctly, that is
> > > what you are
> > looking for.
> > >
> > >
> > We cannot bind a whole PCI device to vfio-pci, reason is, A given PCI
> > device has existing protocol devices on it such as netdevs and rdma dev.
> > This device is partitioned while those protocol devices exist and
> > mlx5_core, mlx5_ib drivers are loaded on it.
> > And we also need to connect these objects rightly to eswitch exposed
> > by devlink interface (net/core/devlink.c) that supports eswitch
> > binding, health, registers, parameters, ports support.
> > It also supports existing PCI VFs.
> >
> > I don’t think we want to replicate all of this again in mdev subsystem [1].
> >
> > [1] https://www.kernel.org/doc/Documentation/vfio-mediated-device.txt
> >
> > So devlink interface to migrate users from managing VFs to non_VF sub
> > device is natural progression.
> >
> > However, in future, I believe we would be creating mediated devices on
> > user request, to use mdev modules and map them to VM.
> >
> > Also 'mdev_bus' is created as a class and not as a bus. This limits to
> > not use devlink interface whose handle is bus+device name.
> >
> > So one option is to change mdev from class to bus.
> > devlink will create mdevs on the bus, mdev driver can probe these
> > devices on host system by default.
> > And if told to do passthrough, a different driver exposes them to VM.
> > How feasible is this?
> >
> Wait, I do see a mdev bus and mdevs are created on this bus using
> mdev_device_create().
> So how about we create mdevs on this bus using devlink, instead of sysfs?
> And driver side on host gets the mdev_register_driver()->probe()?
> 

Thinking more and reviewing more mdev code, I believe mdev fits
this need a lot better than a new subdev bus, mfd, platform device, or
devlink subport.
In the future, mapping this sub device (mdev) to a VM will also be easier
using the mdev bus.

I also believe we can use the sysfs interface for the mdev life cycle
(instead of having the life cycle via devlink).
Here, when an mdev is created it will register as a devlink instance, and
the user will be able to query/configure parameters before a driver probes
the device.

A few enhancements would be needed on the mdev side:
1. making iommu optional.
2. configuring mdev device parameters at creation time

More once I get my hands dirty with mdev in RFCv2.

What do you think?



RE: [RFC net-next 0/8] Introducing subdev bus and devlink extension

2019-03-05 Thread Parav Pandit


> -Original Message-
> From: linux-kernel-ow...@vger.kernel.org On Behalf Of Parav Pandit
> Sent: Tuesday, March 5, 2019 5:17 PM
> To: Kirti Wankhede ; Jakub Kicinski
> 
> Cc: Or Gerlitz ; net...@vger.kernel.org; linux-
> ker...@vger.kernel.org; michal.l...@markovi.net; da...@davemloft.net;
> gre...@linuxfoundation.org; Jiri Pirko 
> Subject: RE: [RFC net-next 0/8] Introducing subdev bus and devlink extension
> 
> Hi Kirti,
> 
> > -Original Message-
> > From: Kirti Wankhede 
> > Sent: Tuesday, March 5, 2019 4:40 PM
> > To: Parav Pandit ; Jakub Kicinski
> > 
> > Cc: Or Gerlitz ; net...@vger.kernel.org; linux-
> > ker...@vger.kernel.org; michal.l...@markovi.net; da...@davemloft.net;
> > gre...@linuxfoundation.org; Jiri Pirko 
> > Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink
> > extension
> >
> >
> >
> > > I am novice at mdev level too. mdev or vfio mdev.
> > > Currently by default we bind to same vendor driver, but when it was
> > created as passthrough device, vendor driver won't create netdevice or
> > rdma device for it.
> > > And vfio/mdev or whatever mature available driver would bind at that
> > point.
> > >
> >
> > Using mdev framework, if you want to partition a physical device into
> > multiple logic devices, you can bind those devices to same vendor
> > driver through vfio-mdev, where as if you want to passthrough the
> > device bind it to vfio-pci. If I understand correctly, that is what you are
> looking for.
> >
> >
> We cannot bind a whole PCI device to vfio-pci, reason is, A given PCI device
> has existing protocol devices on it such as netdevs and rdma dev.
> This device is partitioned while those protocol devices exist and mlx5_core,
> mlx5_ib drivers are loaded on it.
> And we also need to connect these objects rightly to eswitch exposed by
> devlink interface (net/core/devlink.c) that supports eswitch binding, health,
> registers, parameters, ports support.
> It also supports existing PCI VFs.
> 
> I don’t think we want to replicate all of this again in mdev subsystem [1].
> 
> [1] https://www.kernel.org/doc/Documentation/vfio-mediated-device.txt
> 
> So devlink interface to migrate users from managing VFs to non_VF sub
> device is natural progression.
> 
> However, in future, I believe we would be creating mediated devices on user
> request, to use mdev modules and map them to VM.
> 
> Also 'mdev_bus' is created as a class and not as a bus. This limits to not use
> devlink interface whose handle is bus+device name.
> 
> So one option is to change mdev from class to bus.
> devlink will create mdevs on the bus, mdev driver can probe these devices
> on host system by default.
> And if told to do passthrough, a different driver exposes them to VM.
> How feasible is this?
> 
Wait, I do see an mdev bus, and mdevs are created on this bus using
mdev_device_create().
So how about we create mdevs on this bus using devlink instead of sysfs?
And the driver side on the host gets mdev_register_driver()->probe()?




RE: [RFC net-next 0/8] Introducing subdev bus and devlink extension

2019-03-05 Thread Parav Pandit
Hi Kirti,

> -Original Message-
> From: Kirti Wankhede 
> Sent: Tuesday, March 5, 2019 4:40 PM
> To: Parav Pandit ; Jakub Kicinski
> 
> Cc: Or Gerlitz ; net...@vger.kernel.org; linux-
> ker...@vger.kernel.org; michal.l...@markovi.net; da...@davemloft.net;
> gre...@linuxfoundation.org; Jiri Pirko 
> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension
> 
> 
> 
> >> On Mon, 4 Mar 2019 04:41:01 +, Parav Pandit wrote:
> >>>> -Original Message-
> >>>> From: Jakub Kicinski 
> >>>> Sent: Friday, March 1, 2019 2:04 PM
> >>>> To: Parav Pandit ; Or Gerlitz
> >>>> 
> >>>> Cc: net...@vger.kernel.org; linux-kernel@vger.kernel.org;
> >>>> michal.l...@markovi.net; da...@davemloft.net;
> >>>> gre...@linuxfoundation.org; Jiri Pirko 
> >>>> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink
> >>>> extension
> >>>>
> >>>> On Thu, 28 Feb 2019 23:37:44 -0600, Parav Pandit wrote:
> >>>>> Requirements for above use cases:
> >>>>> 
> >>>>> 1. We need a generic user interface & core APIs to create sub
> >>>>> devices from a parent pci device but should be generic enough for
> >>>>> other parent devices 2. Interface should be vendor agnostic 3.
> >>>>> User should be able to set device params at creation time 4. In
> >>>>> future if needed, tool should be able to create passthrough device
> >>>>> to map to a virtual machine
> >>>>
> >>>> Like a mediated device?
> >>>
> >>> Yes.
> >>>
> >>>> https://www.kernel.org/doc/Documentation/vfio-mediated-device.txt
> >>>> https://www.dpdk.org/wp-
> content/uploads/sites/35/2018/06/Mediated-
> >>>> Devices-Better-Userland-IO.pdf
> >>>>
> >>>> Other than pass-through it is entirely unclear to me why you'd need
> >>>> a
> >> bus.
> >>>> (Or should I say VM pass through or DPDK?)  Could you clarify why
> >>>> the need for a bus?
> >>>>
> >>> A bus follow standard linux kernel device driver model to attach a
> >>> driver to specific device. Platform device with my limited
> >>> understanding looks a hack/abuse of it based on documentation [1],
> >>> but it can possibly be an alternative to bus if it looks fine to
> >>> Greg and others.
> >>
> >> I grok from this text that the main advantage you see is the ability
> >> to choose a driver for the subdevice.
> >>
> > Yes.
> >
> >>>> My thinking is that we should allow spawning subports in devlink
> >>>> and if user specifies "passthrough" the device spawned would be an
> mdev.
> >>>
> >>> devlink device is much more comprehensive way to create sub-devices
> >>> than sub-ports for at least below reasons.
> >>>
> >>> 1. devlink device already defines device->port relation which
> >>> enables to create multiport device.
> >>
> >> I presume that by devlink device you mean devlink instance?  Yes,
> >> this part I'm following.
> >>
> > Yes -> 'struct devlink'
> >>> subport breaks that.
> >>
> >> Breaks what?  The ability to create a devlink instance with multiple ports?
> >>
> > Right.
> >
> >>> 2. With bus model, it enables us to load driver of same vendor or
> >>> generic one such a vfio in future.
> >>
> 
> You can achieve this with mdev as well.
> 
> >> Yes, sorry, I'm not an expert on mdevs, but isn't that the goal of those?
> >> Could you go into more detail why not just use mdevs?
> >>
> > I am novice at mdev level too. mdev or vfio mdev.
> > Currently by default we bind to same vendor driver, but when it was
> created as passthrough device, vendor driver won't create netdevice or rdma
> device for it.
> > And vfio/mdev or whatever mature available driver would bind at that
> point.
> >
> 
> Using mdev framework, if you want to partition a physical device into
> multiple logic devices, you can bind those devices to same vendor driver
> through vfio-mdev, where as if you want to passthrough the device bind it to
> vfio-pci. If I understand correctly, that is what you are looking for.
> 
> 
We cannot bind a whole PCI device to vfio-pci, reason is,

RE: [RFC net-next 8/8] net/mlx5: Add subdev driver to bind to subdev devices

2019-03-05 Thread Parav Pandit



> -Original Message-
> From: Greg KH 
> Sent: Tuesday, March 5, 2019 1:27 PM
> To: Parav Pandit 
> Cc: net...@vger.kernel.org; linux-kernel@vger.kernel.org;
> michal.l...@markovi.net; da...@davemloft.net; Jiri Pirko
> ; Jakub Kicinski 
> Subject: Re: [RFC net-next 8/8] net/mlx5: Add subdev driver to bind to
> subdev devices
> 
> On Tue, Mar 05, 2019 at 05:57:58PM +, Parav Pandit wrote:
> >
> >
> > > -Original Message-
> > > From: Greg KH 
> > > Sent: Tuesday, March 5, 2019 1:14 AM
> > > To: Parav Pandit 
> > > Cc: net...@vger.kernel.org; linux-kernel@vger.kernel.org;
> > > michal.l...@markovi.net; da...@davemloft.net; Jiri Pirko
> > > ; Jakub Kicinski 
> > > Subject: Re: [RFC net-next 8/8] net/mlx5: Add subdev driver to bind
> > > to subdev devices
> > >
> > > On Fri, Mar 01, 2019 at 05:21:13PM +, Parav Pandit wrote:
> > > >
> > > >
> > > > > -Original Message-
> > > > > From: Greg KH 
> > > > > Sent: Friday, March 1, 2019 1:22 AM
> > > > > To: Parav Pandit 
> > > > > Cc: net...@vger.kernel.org; linux-kernel@vger.kernel.org;
> > > > > michal.l...@markovi.net; da...@davemloft.net; Jiri Pirko
> > > > > 
> > > > > Subject: Re: [RFC net-next 8/8] net/mlx5: Add subdev driver to
> > > > > bind to subdev devices
> > > > >
> > > > > On Thu, Feb 28, 2019 at 11:37:52PM -0600, Parav Pandit wrote:
> > > > > > Add a subdev driver to probe the subdev devices and create
> > > > > > fake netdevice for it.
> > > > >
> > > > > So I'm guessing here is the "meat" of the whole goal here?
> > > > >
> > > > > You just want multiple netdevices per PCI device?  Why can't you
> > > > > do that today in your PCI driver?
> > > > >
> > > > Yes, but it just not multiple netdevices.
> > > > Let me please elaborate in detail.
> > > >
> > > > There is a swichdev mode of a PCI function for netdevices.
> > > > In this mode a given netdev has additional control netdev (called
> > > representor netdevice = rep-ndev).
> > > > This rep-ndev is attached to OVS for adding rules, offloads etc
> > > > using
> > > standard tc, netfilter infra.
> > > > Currently this rep-ndev controls switch side of the settings, but
> > > > not the
> > > host side of netdev.
> > > > So there is discussion to create another netdev or devlink port..
> > > >
> > > > Additionally this subdev has optional rdma device too.
> > > >
> > > > And when we are in switchdev mode, this rdma dev has similar rdma
> > > > rep
> > > device for control.
> > > >
> > > > In some cases we actually don't create netdev when it is in
> > > > InfiniBand
> > > mode.
> > > > Here there is PCI device->rdma_device.
> > > >
> > > > In other case, a given sub device for rdma is dual port device,
> > > > having
> > > netdevice for each that can use existing netdev->dev_port.
> > > >
> > > > Creating 4 devices of two different classes using one iproute2/ip
> > > > or
> > > iproute2/rdma command is horrible thing to do.
> > >
> > > Why is that?
> > >
> > When user creates the device, user tool needs to return a device handle
> that got created.
> > Creating multiple devices doesn't make sense. I haven't seen any tool
> doing such crazy thing.
> 
> And what do you mean by "device handle"?  All you get here is a sysfs device
> tree.
> 
Subdev devices are created using the devlink tool, which works on a device
handle.
A device handle is defined using the bus/device name of a 'struct device'.
It is described in [1].
'$ devlink dev add DEV' creates a new devlink device instance and the
'struct device' holding it.
This command returns the device handle = bus/name of the new devlink instance.
Patch 6 in the series returns the device handle.
Patch 6 is at [2], with an example in it where the sysfs name and the devlink
handle match each other.
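
To make the handle format concrete, here is a minimal userspace sketch,
assuming only the "bus/name" convention described above. The helpers
make_handle() and split_handle() are hypothetical names for illustration;
they are not part of the devlink tool or of this series.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build a devlink-style handle "bus/name". */
static int make_handle(char *buf, size_t len, const char *bus, const char *dev)
{
	int n;

	assert(buf && bus && dev);
	n = snprintf(buf, len, "%s/%s", bus, dev);
	/* Treat an encoding error or a truncated handle as failure. */
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Hypothetical helper: split a handle at the first '/' into bus and name. */
static int split_handle(const char *handle, char *bus, size_t bus_len,
			char *dev, size_t dev_len)
{
	const char *slash;
	size_t n;

	assert(handle && bus && dev);
	slash = strchr(handle, '/');
	/* Reject a missing separator, an empty bus, or an empty name. */
	if (!slash || slash == handle || slash[1] == '\0')
		return -1;
	n = (size_t)(slash - handle);
	if (n >= bus_len || strlen(slash + 1) >= dev_len)
		return -1;
	memcpy(bus, handle, n);
	bus[n] = '\0';
	strcpy(dev, slash + 1);
	return 0;
}
```

With these, make_handle(buf, sizeof(buf), "subdev", "subdev0") would yield
"subdev/subdev0", the same form as the handle shown by 'devlink dev show'.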

> > > > In case if this sub device has to be a passthrough device, ip link
> > > > command
> > > will fail badly that day, because we are creating some sub device
> > > which is not even a netdevice.
> > >
> > > But it is a network device, right?
> > >
> > When there is passthrough subdevice, there won't be netdevice creat

RE: [RFC net-next 0/8] Introducing subdev bus and devlink extension

2019-03-05 Thread Parav Pandit



> -Original Message-
> From: Jakub Kicinski 
> Sent: Monday, March 4, 2019 7:35 PM
> To: Parav Pandit 
> Cc: Or Gerlitz ; net...@vger.kernel.org; linux-
> ker...@vger.kernel.org; michal.l...@markovi.net; da...@davemloft.net;
> gre...@linuxfoundation.org; Jiri Pirko 
> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension
> 
> Parav, please wrap your responses to at most 80 characters.
> This is hard to read.
> 
Sorry about that. I will wrap from now on.

> On Mon, 4 Mar 2019 04:41:01 +, Parav Pandit wrote:
> > > -Original Message-
> > > From: Jakub Kicinski 
> > > Sent: Friday, March 1, 2019 2:04 PM
> > > To: Parav Pandit ; Or Gerlitz
> > > 
> > > Cc: net...@vger.kernel.org; linux-kernel@vger.kernel.org;
> > > michal.l...@markovi.net; da...@davemloft.net;
> > > gre...@linuxfoundation.org; Jiri Pirko 
> > > Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink
> > > extension
> > >
> > > On Thu, 28 Feb 2019 23:37:44 -0600, Parav Pandit wrote:
> > > > Requirements for above use cases:
> > > > 
> > > > 1. We need a generic user interface & core APIs to create sub
> > > > devices from a parent pci device but should be generic enough for
> > > > other parent devices 2. Interface should be vendor agnostic 3.
> > > > User should be able to set device params at creation time 4. In
> > > > future if needed, tool should be able to create passthrough device
> > > > to map to a virtual machine
> > >
> > > Like a mediated device?
> >
> > Yes.
> >
> > > https://www.kernel.org/doc/Documentation/vfio-mediated-device.txt
> > > https://www.dpdk.org/wp-content/uploads/sites/35/2018/06/Mediated-
> > > Devices-Better-Userland-IO.pdf
> > >
> > > Other than pass-through it is entirely unclear to me why you'd need a
> bus.
> > > (Or should I say VM pass through or DPDK?)  Could you clarify why
> > > the need for a bus?
> > >
> > A bus follow standard linux kernel device driver model to attach a
> > driver to specific device. Platform device with my limited
> > understanding looks a hack/abuse of it based on documentation [1], but
> > it can possibly be an alternative to bus if it looks fine to Greg and
> > others.
> 
> I grok from this text that the main advantage you see is the ability to choose
> a driver for the subdevice.
> 
Yes.

> > > My thinking is that we should allow spawning subports in devlink and
> > > if user specifies "passthrough" the device spawned would be an mdev.
> >
> > devlink device is much more comprehensive way to create sub-devices
> > than sub-ports for at least below reasons.
> >
> > 1. devlink device already defines device->port relation which enables
> > to create multiport device.
> 
> I presume that by devlink device you mean devlink instance?  Yes, this part
> I'm following.
> 
Yes -> 'struct devlink' 
> > subport breaks that.
> 
> Breaks what?  The ability to create a devlink instance with multiple ports?
> 
Right.

> > 2. With bus model, it enables us to load driver of same vendor or
> > generic one such a vfio in future.
> 
> Yes, sorry, I'm not an expert on mdevs, but isn't that the goal of those?
> Could you go into more detail why not just use mdevs?
> 
I am a novice at the mdev level too (mdev or vfio mdev).
Currently by default we bind to the same vendor driver, but when the device is
created as a passthrough device, the vendor driver won't create a netdevice or
rdma device for it.
And vfio/mdev, or whatever mature driver is available, would bind at that point.

> > 3. Devices live on the bus, mapping a subport to 'struct device' is
> > not intuitive.
> 
> Are you saying that the main devlink instance would not have any port
> information for the subdevices?
> 
Right, this newly created devlink device is the control point of its port(s).

> Devices live on a bus.  Software constructs - depend on how one wants to
> model them - don't have to.
> 
> > 4. sub-device allows to use existing devlink port, registers, health
> > infrastructure to sub devices, which otherwise need to be duplicated
> > for ports.
> 
> Health stuff is not tied to a port, I'm not following you.  You can create a
> reporter per port, per ACL rule or per SB or per whatever your heart desires..
> 
Instead of creating multiple reporters and inventing these reporter naming
schemes, creating a devlink instance leverages all the health reporting done
for a devlink instance.
So whatever is done for 

RE: [RFC net-next 8/8] net/mlx5: Add subdev driver to bind to subdev devices

2019-03-05 Thread Parav Pandit



> -Original Message-
> From: Greg KH 
> Sent: Tuesday, March 5, 2019 1:14 AM
> To: Parav Pandit 
> Cc: net...@vger.kernel.org; linux-kernel@vger.kernel.org;
> michal.l...@markovi.net; da...@davemloft.net; Jiri Pirko
> ; Jakub Kicinski 
> Subject: Re: [RFC net-next 8/8] net/mlx5: Add subdev driver to bind to
> subdev devices
> 
> On Fri, Mar 01, 2019 at 05:21:13PM +, Parav Pandit wrote:
> >
> >
> > > -Original Message-
> > > From: Greg KH 
> > > Sent: Friday, March 1, 2019 1:22 AM
> > > To: Parav Pandit 
> > > Cc: net...@vger.kernel.org; linux-kernel@vger.kernel.org;
> > > michal.l...@markovi.net; da...@davemloft.net; Jiri Pirko
> > > 
> > > Subject: Re: [RFC net-next 8/8] net/mlx5: Add subdev driver to bind
> > > to subdev devices
> > >
> > > On Thu, Feb 28, 2019 at 11:37:52PM -0600, Parav Pandit wrote:
> > > > Add a subdev driver to probe the subdev devices and create fake
> > > > netdevice for it.
> > >
> > > So I'm guessing here is the "meat" of the whole goal here?
> > >
> > > You just want multiple netdevices per PCI device?  Why can't you do
> > > that today in your PCI driver?
> > >
> > Yes, but it just not multiple netdevices.
> > Let me please elaborate in detail.
> >
> > There is a swichdev mode of a PCI function for netdevices.
> > In this mode a given netdev has additional control netdev (called
> representor netdevice = rep-ndev).
> > This rep-ndev is attached to OVS for adding rules, offloads etc using
> standard tc, netfilter infra.
> > Currently this rep-ndev controls switch side of the settings, but not the
> host side of netdev.
> > So there is discussion to create another netdev or devlink port..
> >
> > Additionally this subdev has optional rdma device too.
> >
> > And when we are in switchdev mode, this rdma dev has similar rdma rep
> device for control.
> >
> > In some cases we actually don't create netdev when it is in InfiniBand
> mode.
> > Here there is PCI device->rdma_device.
> >
> > In other case, a given sub device for rdma is dual port device, having
> netdevice for each that can use existing netdev->dev_port.
> >
> > Creating 4 devices of two different classes using one iproute2/ip or
> iproute2/rdma command is horrible thing to do.
> 
> Why is that?
> 
When a user creates the device, the user tool needs to return a handle to the
device that got created.
Creating multiple devices with one command doesn't make sense. I haven't seen
any tool doing such a crazy thing.

> > In case if this sub device has to be a passthrough device, ip link command
> will fail badly that day, because we are creating some sub device which is not
> even a netdevice.
> 
> But it is a network device, right?
> 
When there is a passthrough subdevice, there won't be a netdevice created.
We don't want to create a passthrough subdevice using the iproute2/ip tool,
which primarily works on netdevices.

> > So iproute2/devlink which works on bus+device, mainly PCI today, seems
> right abstraction point to create sub devices.
> > This also extends to map ports of the device, health, registers debug, etc
> rich infrastructure that is already built.
> >
> > Additionally, we don't want mlx driver and other drivers to go through its
> child devices (split logic in netdev and rdma) for power management.
> 
> And how is power management going to work with your new devices?  All
> you have here is a tiny shim around a driver bus, 
So subdevices' power management is done before their parent's.
The vendor driver doesn't need to iterate over its child devices to
suspend/resume them.
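
As a rough illustration of that ordering, here is a small userspace sketch.
It is only a model: the driver core keeps its own ordered device list rather
than recursing, and struct node, suspend_tree() and resume_tree() are
hypothetical names. It shows children suspended before the parent and resumed
after it.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHILDREN 4
#define MAX_LOG 16

/* Hypothetical stand-in for a device with child devices below it. */
struct node {
	const char *name;
	struct node *children[MAX_CHILDREN];
	size_t nr_children;
};

/* Records the order in which devices were visited. */
struct order_log {
	const char *names[MAX_LOG];
	size_t count;
};

static void log_add(struct order_log *log, const char *name)
{
	assert(log->count < MAX_LOG);
	log->names[log->count++] = name;
}

/* Suspend: every child goes down before the device itself. */
static void suspend_tree(const struct node *dev, struct order_log *log)
{
	size_t i;

	for (i = 0; i < dev->nr_children; i++)
		suspend_tree(dev->children[i], log);
	log_add(log, dev->name);
}

/* Resume: the device comes up first, then its children. */
static void resume_tree(const struct node *dev, struct order_log *log)
{
	size_t i;

	log_add(log, dev->name);
	for (i = 0; i < dev->nr_children; i++)
		resume_tree(dev->children[i], log);
}
```

In this model a PCI function with two subdevices would log both subdevices
before the PCI function on suspend, and the reverse on resume, without the
parent's driver walking its children.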

> I do not see any new
> functionality, and as others have said, no way to actually share, or split up,
> the PCI resources.
> 
The devlink tool's create command will be able to accept more parameters at
device creation time to share and split PCI resources.
This is just the start of the development, and the RFC is to agree on
direction.
The devlink tool has parameter options that can be queried/set, and the
existing infra will be used for granular device config.

> > Kernel core code does that well today, that we like to leverage through
> subdev bus or mfd pm callbacks.
> >
> > So it is lot more than just creating netdevices.
> 
> But that's all you are showing here :)
> 
The starting use case is netdev and rdma, but we don't want to create new
tools a few months or a year later for passthrough mode or for different link
layers etc.

> > > What problem are you trying to solve that others also are having
> > > that requires all of this?
> > >
> > > Adding a new bus type

RE: [RFC net-next 0/8] Introducing subdev bus and devlink extension

2019-03-05 Thread Parav Pandit



> -Original Message-
> From: Jakub Kicinski 
> Sent: Monday, March 4, 2019 7:46 PM
> To: Parav Pandit 
> Cc: Or Gerlitz ; net...@vger.kernel.org; linux-
> ker...@vger.kernel.org; michal.l...@markovi.net; da...@davemloft.net;
> gre...@linuxfoundation.org; Jiri Pirko 
> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension
> 
> On Mon, 4 Mar 2019 04:41:01 +, Parav Pandit wrote:
> > > > $ devlink dev show
> > > > pci/:05:00.0
> > > > subdev/subdev0
> > >
> > > Please don't spawn devlink instances.  Devlink instance is supposed
> > > to represent an ASIC.  If we start spawning them willy nilly for
> > > whatever software construct we want to model the clarity of the
> > > ontology will suffer a lot.
> > Devlink devices not restricted to ASIC even though today it is
> > representing ASIC for one vendor. Today for one ASIC, it already
> > presents multiple devlink devices (128 or more) for PF and VFs, two
> > PFs on same ASIC etc. VF is just a sub-device which is well defined by
> > PCISIG, whereas sub-device is not. Sub-device do consume actual ASIC
> > resources (just like PFs and VFs), Hence point-(6) of cover-letter
> > indicate that the devlink capability to tell how many such sub-devices
> > can be created.
> >
> > In above example, they are created for a given bus-device following
> > existing devlink construct.
> 
> No, it's not "representing the ASIC for one vendor".  It's how it works for
> switches (including mlxsw) and how it was described in the original cover
> letter:
> 
Sorry for the confusion.
I meant to say, my understanding is Netronome creates one devlink instance for 
whole ASIC.
Please correct me if this is incorrect.
mlx5_core driver creates multiple devlink devices for PF and VFs for one ASIC.

> Introduce devlink interface and first drivers to use it
> 
> There a is need for some userspace API that would allow to expose things
> that are not directly related to any device class like net_device of
> ib_device, but rather chip-wide/switch-ASIC-wide stuff.
> 
> [...]
> 
> We can deviate from the original intent if need be and dilute the ontology.
> But let's be clear on the status quo, please.
Status quo is that the mlx5_core driver creates multiple devlink devices. It
creates a devlink device for each PF and VF of a single ASIC.


RE: [RFC net-next 8/8] net/mlx5: Add subdev driver to bind to subdev devices

2019-03-04 Thread Parav Pandit
Hi Saeed,

> -Original Message-
> From: Saeed Mahameed
> Sent: Friday, March 1, 2019 4:12 PM
> To: Jiri Pirko ; net...@vger.kernel.org; linux-
> ker...@vger.kernel.org; Parav Pandit ;
> da...@davemloft.net; gre...@linuxfoundation.org;
> michal.l...@markovi.net
> Subject: Re: [RFC net-next 8/8] net/mlx5: Add subdev driver to bind to
> subdev devices
> 
> On Thu, 2019-02-28 at 23:37 -0600, Parav Pandit wrote:
> > Add a subdev driver to probe the subdev devices and create fake
> > netdevice for it.
> >
> > Signed-off-by: Parav Pandit 
> > ---
> >  drivers/net/ethernet/mellanox/mlx5/core/Makefile   |  2 +-
> >  drivers/net/ethernet/mellanox/mlx5/core/main.c |  8 +-
> >  .../net/ethernet/mellanox/mlx5/core/mlx5_core.h|  3 +
> >  .../ethernet/mellanox/mlx5/core/subdev_driver.c| 93
> > ++
> >  4 files changed, 104 insertions(+), 2 deletions(-)  create mode
> > 100644 drivers/net/ethernet/mellanox/mlx5/core/subdev_driver.c
> >
> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
> > b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
> > index f218789..c8aeaf1 100644
> > --- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
> > @@ -16,7 +16,7 @@ mlx5_core-y :=main.o cmd.o debugfs.o fw.o
> > eq.o uar.o pagealloc.o \
> > transobj.o vport.o sriov.o fs_cmd.o fs_core.o \
> > fs_counters.o rl.o lag.o dev.o events.o wq.o lib/gid.o \
> > lib/devcom.o diag/fs_tracepoint.o diag/fw_tracer.o
> > -mlx5_core-$(CONFIG_SUBDEV) += subdev.o
> > +mlx5_core-$(CONFIG_SUBDEV) += subdev.o subdev_driver.o
> >
> >  #
> >  # Netdev basic
> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c
> > b/drivers/net/ethernet/mellanox/mlx5/core/main.c
> > index 5f8cf0d..7dfa8c4 100644
> > --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
> > @@ -1548,7 +1548,11 @@ static int __init init(void)
> > mlx5e_init();
> >  #endif
> >
> > -   return 0;
> > > +   err = subdev_register_driver(&mlx5_subdev_driver);
> > > +   if (err)
> > > +   pci_unregister_driver(&mlx5_core_driver);
> > +
> > +   return err;
> >
> >  err_debug:
> > mlx5_unregister_debugfs();
> > @@ -1557,6 +1561,8 @@ static int __init init(void)
> >
> >  static void __exit cleanup(void)
> >  {
> > > +   subdev_unregister_driver(&mlx5_subdev_driver);
> > +
> >  #ifdef CONFIG_MLX5_CORE_EN
> > mlx5e_cleanup();
> >  #endif
> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
> > b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
> > index 2a54148..1b733c7 100644
> > --- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
> > @@ -41,12 +41,15 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >
> >  #define DRIVER_NAME "mlx5_core"
> >  #define DRIVER_VERSION "5.0-0"
> >
> >  extern uint mlx5_core_debug_mask;
> >
> > +extern struct subdev_driver mlx5_subdev_driver;
> > +
> >  #define mlx5_core_dbg(__dev, format, ...)
> > \
> > dev_dbg(&(__dev)->pdev->dev, "%s:%d:(pid %d): " format,
> 
> > \
> >  __func__, __LINE__, current->pid,
> > \
> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/subdev_driver.c
> > b/drivers/net/ethernet/mellanox/mlx5/core/subdev_driver.c
> > new file mode 100644
> > index 000..880aa4f
> > --- /dev/null
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/subdev_driver.c
> > @@ -0,0 +1,93 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +// Copyright (c) 2018-19 Mellanox Technologies
> > +
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +
> > +struct mlx5_subdev_ndev {
> > +   struct net_device ndev;
> > +};
> > +
> > > +static void mlx5_dma_test(struct device *dev)
> > > +{
> > > +   dma_addr_t pa;
> > > +   void *va;
> > > +
> > > +   va = dma_alloc_coherent(dev, 4096, &pa, GFP_KERNEL);
> > > +   if (va)
> > > +           dma_free_coherent(dev, 4096, va, pa);
> > > +}
> > +
> > +static struct net_device *ndev;
> > +
> > +static int mlx5e_subdev_open(struct net_device *netdev) {
> > +   return 0;
> > +}
> > +
> > +static int mlx5e_subdev_close(struct net_device *netdev) {
> > +  

RE: [PATCH] RDMA/cma: Make CM response timeout and # CM retries configurable

2019-03-03 Thread Parav Pandit


> -Original Message-
> From: Leon Romanovsky 
> Sent: Saturday, February 23, 2019 2:50 AM
> To: Doug Ledford 
> Cc: Steve Wise ; Jason Gunthorpe
> ; Håkon Bugge ; Parav Pandit
> ; linux-r...@vger.kernel.org; linux-
> ker...@vger.kernel.org
> Subject: Re: [PATCH] RDMA/cma: Make CM response timeout and # CM
> retries configurable
> 
> On Fri, Feb 22, 2019 at 12:51:55PM -0500, Doug Ledford wrote:
> >
> >
> > > On Feb 22, 2019, at 12:14 PM, Steve Wise
>  wrote:
> > >
> > >
> > > On 2/22/2019 10:36 AM, Jason Gunthorpe wrote:
> > >> On Sun, Feb 17, 2019 at 06:09:09PM +0100, Håkon Bugge wrote:
> > >>> During certain workloads, the default CM response timeout is too
> > >>> short, leading to excessive retries. Hence, make it configurable
> > >>> through sysctl. While at it, also make number of CM retries
> > >>> configurable.
> > >>>
> > >>> The defaults are not changed.
> > >>>
> > >>> Signed-off-by: Håkon Bugge 
> > >>> drivers/infiniband/core/cma.c | 51
> > >>> ++-
> > >>> 1 file changed, 44 insertions(+), 7 deletions(-)
> > >>>
> > >>> diff --git a/drivers/infiniband/core/cma.c
> > >>> b/drivers/infiniband/core/cma.c index c43512752b8a..ce99e1cd1029
> > >>> 100644
> > >>> +++ b/drivers/infiniband/core/cma.c
> > >>> @@ -43,6 +43,7 @@
> > >>> #include 
> > >>> #include 
> > >>> #include 
> > >>> +#include 
> > >>> #include 
> > >>>
> > >>> #include 
> > > >>> @@ -68,13 +69,46 @@ MODULE_AUTHOR("Sean Hefty");
> > > >>> MODULE_DESCRIPTION("Generic RDMA CM Agent");
> > > >>> MODULE_LICENSE("Dual BSD/GPL");
> > > >>>
> > > >>> -#define CMA_CM_RESPONSE_TIMEOUT 20
> > > >>> #define CMA_QUERY_CLASSPORT_INFO_TIMEOUT 3000
> > > >>> -#define CMA_MAX_CM_RETRIES 15
> > > >>> #define CMA_CM_MRA_SETTING (IB_CM_MRA_FLAG_DELAY | 24)
> > > >>> #define CMA_IBOE_PACKET_LIFETIME 18
> > > >>> #define CMA_PREFERRED_ROCE_GID_TYPE IB_GID_TYPE_ROCE_UDP_ENCAP
> > > >>>
> > > >>> +#define CMA_DFLT_CM_RESPONSE_TIMEOUT 20
> > > >>> +static int cma_cm_response_timeout = CMA_DFLT_CM_RESPONSE_TIMEOUT;
> > > >>> +static int cma_cm_response_timeout_min = 8;
> > > >>> +static int cma_cm_response_timeout_max = 31;
> > > >>> +#undef CMA_DFLT_CM_RESPONSE_TIMEOUT
> > > >>> +
> > > >>> +#define CMA_DFLT_MAX_CM_RETRIES 15
> > > >>> +static int cma_max_cm_retries = CMA_DFLT_MAX_CM_RETRIES;
> > > >>> +static int cma_max_cm_retries_min = 1;
> > > >>> +static int cma_max_cm_retries_max = 100;
> > > >>> +#undef CMA_DFLT_MAX_CM_RETRIES
> > > >>> +
> > > >>> +static struct ctl_table_header *cma_ctl_table_hdr;
> > > >>> +static struct ctl_table cma_ctl_table[] = {
> > >>> +   {
> > >>> +   .procname   = "cma_cm_response_timeout",
> > > >>> +   .data   = &cma_cm_response_timeout,
> > > >>> +   .maxlen = sizeof(cma_cm_response_timeout),
> > > >>> +   .mode   = 0644,
> > > >>> +   .proc_handler   = proc_dointvec_minmax,
> > > >>> +   .extra1 = &cma_cm_response_timeout_min,
> > > >>> +   .extra2 = &cma_cm_response_timeout_max,
> > > >>> +   },
> > > >>> +   {
> > > >>> +   .procname   = "cma_max_cm_retries",
> > > >>> +   .data   = &cma_max_cm_retries,
> > > >>> +   .maxlen = sizeof(cma_max_cm_retries),
> > > >>> +   .mode   = 0644,
> > > >>> +   .proc_handler   = proc_dointvec_minmax,
> > > >>> +   .extra1 = &cma_max_cm_retries_min,
> > > >>> +   .extra2 = &cma_max_cm_retries_max,
> > >>> +   },
> > >>> +   { }
> > >>> +};
> > >> Is sysctl the right approach here? Should it be rdma tool instead?
> > >>
> > >> Jason
> > >
> > > There are other rdma sysctls currently:  net.rdma_ucm.max_backlog
> > > and net.iw_cm.default_backlog.  The core net

RE: [RFC net-next 0/8] Introducing subdev bus and devlink extension

2019-03-03 Thread Parav Pandit



> -Original Message-
> From: Jakub Kicinski 
> Sent: Friday, March 1, 2019 2:04 PM
> To: Parav Pandit ; Or Gerlitz 
> Cc: net...@vger.kernel.org; linux-kernel@vger.kernel.org;
> michal.l...@markovi.net; da...@davemloft.net;
> gre...@linuxfoundation.org; Jiri Pirko 
> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension
> 
> On Thu, 28 Feb 2019 23:37:44 -0600, Parav Pandit wrote:
> > Requirements for above use cases:
> > 
> > 1. We need a generic user interface & core APIs to create sub devices
> > from a parent pci device but should be generic enough for other parent
> > devices 2. Interface should be vendor agnostic 3. User should be able
> > to set device params at creation time 4. In future if needed, tool
> > should be able to create passthrough device to map to a virtual
> > machine
> 
> Like a mediated device?
>
Yes.
 
> https://www.kernel.org/doc/Documentation/vfio-mediated-device.txt
> https://www.dpdk.org/wp-content/uploads/sites/35/2018/06/Mediated-
> Devices-Better-Userland-IO.pdf
> 
> Other than pass-through it is entirely unclear to me why you'd need a bus.
> (Or should I say VM pass through or DPDK?)  Could you clarify why the need
> for a bus?
> 
A bus follows the standard Linux kernel device driver model to attach a driver
to a specific device.
Platform device, with my limited understanding, looks like a hack/abuse of it
based on documentation [1], but it can possibly be an alternative to a bus if
it looks fine to Greg and others.
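
For readers less familiar with that model, below is a minimal userspace
sketch of the idea: a bus walks its registered drivers, asks a match routine
whether a driver's id table covers the device, and binds the first driver
whose probe accepts it. All names here (struct sketch_device,
struct sketch_driver, bus_match(), bus_attach()) and the id values are
hypothetical; this is not the subdev bus code from the series.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical id pair; the id table ends with a {0, 0} entry. */
struct dev_id {
	unsigned int vendor;
	unsigned int device;
};

struct sketch_driver;

struct sketch_device {
	struct dev_id id;
	const struct sketch_driver *driver;	/* NULL until bound */
};

struct sketch_driver {
	const char *name;
	const struct dev_id *id_table;
	int (*probe)(struct sketch_device *dev);	/* optional */
};

/* Return 1 if any entry of the driver's id table matches the device. */
static int bus_match(const struct sketch_device *dev,
		     const struct sketch_driver *drv)
{
	const struct dev_id *id;

	assert(dev && drv && drv->id_table);
	for (id = drv->id_table; id->vendor || id->device; id++)
		if (id->vendor == dev->id.vendor &&
		    id->device == dev->id.device)
			return 1;
	return 0;
}

/* Bind the first matching driver whose probe (if any) succeeds. */
static int bus_attach(struct sketch_device *dev,
		      const struct sketch_driver **drivers, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (!bus_match(dev, drivers[i]))
			continue;
		if (drivers[i]->probe && drivers[i]->probe(dev) != 0)
			continue;	/* probe declined, try the next driver */
		dev->driver = drivers[i];
		return 0;
	}
	return -1;
}
```

The point of the sketch is that the same device could be claimed by a vendor
driver or by a generic passthrough driver, depending only on which driver's
id table matches.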

> My thinking is that we should allow spawning subports in devlink and if user
> specifies "passthrough" the device spawned would be an mdev.
>
A devlink device is a much more comprehensive way to create sub-devices than
sub-ports, for at least the reasons below.

1. A devlink device already defines the device->port relation, which enables
creating a multiport device.
A subport breaks that.
2. The bus model enables us to load a driver of the same vendor, or a generic
one such as vfio in future.
3. Devices live on the bus; mapping a subport to a 'struct device' is not
intuitive.
4. A sub-device allows using the existing devlink port, registers and health
infrastructure for sub-devices, which otherwise need to be duplicated for
ports.
5. Even though current devlink devices are networking devices, nothing
restricts them to be that way.
So a subport is a restricted view.
6. A devlink device already covers the port sub-object, hence creating a
devlink device is desired.

> > 5. A device can have multiple ports
> 
> What does this mean, in practice?  You want to spawn a subdev which can
> access both ports?  That'd be for RDMA use cases, more than Ethernet,
> right?  (Just clarifying :))
>
Yep, you got it right. :-)
 
> > So how is it done?
> > --
> > (a) user in control
> > To address above requirements, a generic tool iproute2/devlink is
> > extended for sub device's life cycle.
> > However a devlink tool and its kernel counter part is not sufficient
> > to create protocol agnostic devices on a existing PCI bus.
> 
> "Protocol agnostic"?...  What does that mean?
> 
Devlink works on the bus/device model. It doesn't matter what the class of the
device is.
For example, for pci the class can be anything. So newly created sub-devices
are not limited to netdev/rdma devices.
It is agnostic to protocol.
More importantly, we don't want to create these sub-devices with bus type
'pci'.
Because, as described below, PCI has its own addressing scheme, and the pci
bus must not have mix-and-match devices.

So probably better wording would be,
'a devlink tool and its kernel counterpart is not sufficient to create
sub-devices of the same class as that of the PCI device'.

> > (b) subdev bus
> > A given bus defines well defined addressing scheme. Creating sub
> > devices on existing PCI bus with a different naming scheme is just weird.
> > So, creating well named devices on appropriate bus is desired.
> 
> What's that address scheme you're referring to, you seem to assign IDs in
> sequence?
>
Yes. A device on the subdev bus follows the standard Linux driver model id
assignment scheme = u32.
And devices are well named, such as 'subdev0': prefix + id, the default scheme
of the core driver model.
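
A minimal userspace sketch of that prefix + id naming follows. The struct
and function names are hypothetical, and the plain counter is an assumption
made for brevity; kernel code would more likely use an id allocator so that
ids can be reused after a device is removed.

```c
#include <assert.h>
#include <stdio.h>

#define SUBDEV_NAME_LEN 32

/* Hypothetical stand-in for a device on the subdev bus. */
struct subdev_sketch {
	unsigned int id;
	char name[SUBDEV_NAME_LEN];
};

/* Assumption: ids are handed out in sequence from a simple counter. */
static unsigned int next_id;

/* Assign the next id and derive the device name as "subdev" + id. */
static int subdev_assign_name(struct subdev_sketch *sd)
{
	int n;

	assert(sd);
	sd->id = next_id++;
	n = snprintf(sd->name, sizeof(sd->name), "subdev%u", sd->id);
	/* Treat an encoding error or a truncated name as failure. */
	return (n < 0 || (size_t)n >= sizeof(sd->name)) ? -1 : 0;
}
```

Calling subdev_assign_name() on two fresh devices would name them 'subdev0'
and 'subdev1', which is the form that appears in the devlink handle.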
 
> >
> > Given that, these are user created devices for a given hardware and in
> > absence of a central entity like PCISIG to assign vendor and device
> > ids, A unique vendor and device id are maintained as enum in
> > include/linux/subdev_ids.h.
> 
> Why do we need IDs?  The sysfs hierarchy isn't sufficient?  

> Do we need a driver to match on those again?  Is it going to be a different 
> driver?
> 
IDs are used to match driver against the created device.
It can be same or different driver.
Eve

RE: [RFC net-next 8/8] net/mlx5: Add subdev driver to bind to subdev devices

2019-03-01 Thread Parav Pandit



> -Original Message-
> From: Greg KH 
> Sent: Friday, March 1, 2019 1:22 AM
> To: Parav Pandit 
> Cc: net...@vger.kernel.org; linux-kernel@vger.kernel.org;
> michal.l...@markovi.net; da...@davemloft.net; Jiri Pirko
> 
> Subject: Re: [RFC net-next 8/8] net/mlx5: Add subdev driver to bind to
> subdev devices
> 
> On Thu, Feb 28, 2019 at 11:37:52PM -0600, Parav Pandit wrote:
> > Add a subdev driver to probe the subdev devices and create fake
> > netdevice for it.
> 
> So I'm guessing here is the "meat" of the whole goal here?
> 
> You just want multiple netdevices per PCI device?  Why can't you do that
> today in your PCI driver?
> 
Yes, but it is not just multiple netdevices.
Let me please elaborate in detail.

There is a switchdev mode of a PCI function for netdevices.
In this mode a given netdev has an additional control netdev (called the
representor netdevice = rep-ndev).
This rep-ndev is attached to OVS for adding rules, offloads etc. using the
standard tc, netfilter infra.
Currently this rep-ndev controls the switch side of the settings, but not the
host side of the netdev.
So there is discussion to create another netdev or devlink port.

Additionally this subdev has an optional rdma device too.

And when we are in switchdev mode, this rdma dev has a similar rdma rep device
for control.

In some cases we actually don't create a netdev, when it is in InfiniBand mode.
Here there is PCI device->rdma_device.

In the other case, a given sub device for rdma is a dual port device, having a
netdevice for each port that can use the existing netdev->dev_port.

Creating 4 devices of two different classes using one iproute2/ip or
iproute2/rdma command is a horrible thing to do.

If this sub device has to be a passthrough device, the ip link command will
fail badly that day, because we are creating a sub device which is not even
a netdevice.

So iproute2/devlink, which works on bus+device (mainly PCI today), seems the
right abstraction point to create sub devices.
This also extends to mapping ports of the device, health, register debug, and
the rest of the rich infrastructure that is already built.

Additionally, we don't want the mlx driver and other drivers to walk their
child devices (split logic in netdev and rdma) for power management.
Kernel core code does that well today, and we would like to leverage it
through the subdev bus or mfd pm callbacks.

So it is a lot more than just creating netdevices.

> What problem are you trying to solve that others also are having that
> requires all of this?
> 
> Adding a new bus type and subsystem is fine, but usually we want more
> than just one user of it, as this does not really show how it is exercised 
> very
> well.
This subdev and devlink infrastructure solves the problem of creating smaller
sub devices out of one PCI device.
Someone has to start.. :-)

To my knowledge, Netronome, Broadcom and Mellanox are actively using this
devlink and switchdev infra today.
I added Jakub from Netronome in CC (he is on the netdev mailing list) to hear
his feedback.

> Ideally 3 users would be there as that is when it proves itself that it is
> flexible enough.
> 

We looked at whether drivers/visorbus could be repurposed, but its GUID device
naming scheme is just not user friendly.
It has only a single s-Par user, whose guest drivers have been in staging for
more than a year now. So it doesn't really fit well.

> Would just using the mfd subsystem work better for you?  That provides
> core support for "multi-function" drivers/devices already.  What is missing
> from that subsystem that does not work for you here?
> 
We were not aware of mfd until now. I have only looked at it at a very high
level. It's a wrapper around platform devices and seems widely used.
Before the subdev proposal, Jason suggested an alternative: create platform
devices and have a driver attach to them.

When I read the kernel documentation [1], it says "platform devices typically
appear as autonomous entities".
Here, instead of autonomy, the device is in the user's control.
Platform devices probably don't disappear often in a live system, as opposed
to subdevices, which are created and removed dynamically much more often.

Not sure if using a platform device for this purpose is abuse or not.
So guidance on which direction to go, devlink->mfd (platform wrapper) or
devlink->subdev, would obviously be a huge blessing.

[1] https://www.kernel.org/doc/Documentation/driver-model/platform.txt



RE: [RFC net-next 1/8] subdev: Introducing subdev bus

2019-03-01 Thread Parav Pandit
Hi Greg,

> -Original Message-
> From: Greg KH 
> Sent: Friday, March 1, 2019 1:17 AM
> To: Parav Pandit 
> Cc: net...@vger.kernel.org; linux-kernel@vger.kernel.org;
> michal.l...@markovi.net; da...@davemloft.net; Jiri Pirko
> 
> Subject: Re: [RFC net-next 1/8] subdev: Introducing subdev bus
> 
> On Thu, Feb 28, 2019 at 11:37:45PM -0600, Parav Pandit wrote:
> > Introduce a new subdev bus which holds sub devices created from a
> > primary device. These devices are named as 'subdev'.
> > A subdev is identified similarly to pci device using 16-bit vendor id
> > and device id.
> > Unlike PCI devices, scope of subdev is limited to Linux kernel.
> 
> But these are limited to only PCI devices, right?
> 
For the Mellanox use case, yes, it's limited to PCI devices.

> This sounds a lot like that ARM proposal a week or so ago that asked for
> something like this, are you working with them to make sure your proposal
> works for them as well?  (sorry, can't find where that was announced, it was
> online somewhere...)
> 
We were not aware of it, mostly because we are on the net side of the mailing
lists (netdev, rdma, virt etc).
The ARM proposal was likely on linux-kernel, I guess.
I will look up that proposal and see if both of us can use common
infrastructure.

> > A central entry that assigns unique subdev vendor and device id is:
> > include/linux/subdev_ids.h enums. Enum are chosen over define macro so
> > that two vendors do not end up with vendor id in kernel development
> > process.
> 
> Why not just make it dynamic with no static ids?
> 
Can you please elaborate?
Do you mean we should use something similar to pci_add_dynid() with enhancement 
to catch duplicate id addition?

> > subdev bus holds subdevices of multiple devices. A typical created
> > subdev for a PCI device in sysfs tree appears under their parent's
> > device as using core's default device naming scheme:
> >
> > subdev.
> > i.e.
> > subdev0
> > subdev1
> >
> > $ ls -l /sys/bus/pci/devices/:05:00.0 [..]
> > drwxr-xr-x 4 root root 0 Feb 13 15:57 subdev0
> > drwxr-xr-x 4 root root 0 Feb 13 15:57 subdev1
> >
> > Device model view:
> > ------------------
> >    +------+    +------+       +------+
> >    |subdev|    |subdev|       |subdev|
> >   -|  1   |----|  2   |-------|  3   |--
> >   |+--|---+    +--|---+       +--|---+ |
> >   ----|-----------|--subdev bus--|------
> >       |           |              |
> >    +--+-----------+--+       +---+---+
> >    |     pcidev      |       |pcidev |
> >   -|        A        |-------|   B   |--
> >   |+-----------------+       +-------+ |
> >   ---------------pci bus----------------
> 
> To be clear, "subdev bus" is just a logical grouping, there is no physical
> backing "bus" here at all, right?
> 
Yep. that's correct.

> What is going to "bind" to subdev devices?  PCI drivers?  Or new types of
> drivers?
> 
Devices are placed on the subdev bus using the devlink interface, and the
probe() method of drivers registered with subdev_register_driver() will be
called.
So yes, those are PCI vendor drivers.
I tried to capture this in the cover letter.
At present users haven't asked to map this subdev to a VM, but there is a very
high chance that once we have this without PCI SR-IOV, they would like to
extend it to VMs too.
In that case devlink will have an option to add a 'passthrough' device, and
instead of the vendor's pci driver, some high-level vfio-type driver will
bind to it.
That is just anticipation; we haven't really worked this out fully.
But this model allows for it.
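
A rough userspace sketch of the match-then-probe flow described above,
assuming the driver's id table ends with an all-zero sentinel entry (as PCI id
tables do). The sketch_* types are hypothetical stand-ins, not the structures
from the patch. Note that the loop stops at the sentinel; a loop that only
tests the table pointer itself, like the `while (ids)` in patch 1/8, would
not terminate on a miss.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the (vendor, device) pair described in the thread. */
struct sketch_subdev_id {
	uint16_t vendor_id;
	uint16_t device_id;
};

struct sketch_subdev {
	struct sketch_subdev_id dev_id;
	int probed;		/* set by the driver's probe() */
};

struct sketch_subdev_driver {
	const struct sketch_subdev_id *id_table; /* ends with an all-zero entry */
	int (*probe)(struct sketch_subdev *sdev);
};

/* Return 1 when the driver's table lists the device's id pair, else 0. */
static int sketch_bus_match(const struct sketch_subdev *sdev,
			    const struct sketch_subdev_driver *drv)
{
	const struct sketch_subdev_id *id;

	if (!drv->id_table)
		return 0;
	/* Stop at the zeroed sentinel entry, not at a NULL pointer. */
	for (id = drv->id_table; id->vendor_id || id->device_id; id++) {
		if (id->vendor_id == sdev->dev_id.vendor_id &&
		    id->device_id == sdev->dev_id.device_id)
			return 1;
	}
	return 0;
}

/* Model of what a bus core does: match first, then call probe(). */
static int sketch_bus_attach(struct sketch_subdev *sdev,
			     const struct sketch_subdev_driver *drv)
{
	assert(drv->probe != NULL);
	if (!sketch_bus_match(sdev, drv))
		return 0;	/* not ours; the device stays unbound */
	return drv->probe(sdev) == 0;
}

/* Example probe(): just records that it ran. */
static int sketch_probe(struct sketch_subdev *sdev)
{
	sdev->probed = 1;
	return 0;
}
```

One consequence of the zero sentinel is that an id pair of (0, 0) can never be
matched, so real ids would have to start above zero.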

> > subdev are allocated and freed using subdev_alloc(), subdev_free() APIs.
> > A driver which wants to create actual class driver such as
> > net/block/infiniband need to use subdev_register_driver(),
> > subdev_unregister_driver() APIs.
> >
> > +++ b/drivers/subdev/Kconfig
> > @@ -0,0 +1,12 @@
> > +#
> > +# subdev configuration
> > +#
> > +
> > +config SUBDEV
> > +   tristate "subdev bus driver"
> > +   help
> > +   The subdev bus driver allows creating hardware based sub devices
> > +   from a parent device. The subdev bus driver is required to create,
> > +   discover devices and to attach device drivers to this subdev
> > +   devices. These subdev devices are created using devlink tool by
> > +   user.
> 
> 
> Your definition of the bus uses the name of the bus in the definit

RE: [RFC net-next 7/8] net/mlx5: Add devlink subdev life cycle command support

2019-03-01 Thread Parav Pandit



> -Original Message-
> From: Greg KH 
> Sent: Friday, March 1, 2019 1:19 AM
> To: Parav Pandit 
> Cc: net...@vger.kernel.org; linux-kernel@vger.kernel.org;
> michal.l...@markovi.net; da...@davemloft.net; Jiri Pirko
> 
> Subject: Re: [RFC net-next 7/8] net/mlx5: Add devlink subdev life cycle
> command support
> 
> On Thu, Feb 28, 2019 at 11:37:51PM -0600, Parav Pandit wrote:
> > --- /dev/null
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/subdev.c
> > @@ -0,0 +1,55 @@
> > +// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
> 
> For new stuff, just use GPL-2.0, no need to keep the mistake of the Linux-
> OpenIB license around :)
> 
Oh yes. That's my copy-paste error and ignorance, carrying OpenIB forward.
Checkpatch did actually complain, but I didn't realize that it may be about
OpenIB.
Will fix this.

> thanks,
> 
> greg k-h


RE: [PATCH net-next 0/8] Introducing subdev bus and devlink extension

2019-02-28 Thread Parav Pandit



> -Original Message-
> From: Parav Pandit 
> Sent: Thursday, February 28, 2019 11:36 PM
> To: net...@vger.kernel.org; linux-kernel@vger.kernel.org;
> michal.l...@markovi.net; da...@davemloft.net;
> gre...@linuxfoundation.org; Jiri Pirko 
> Cc: Parav Pandit 
> Subject: [PATCH net-next 0/8] Introducing subdev bus and devlink extension

Please discard this one email which was sent out as PATCH.
Sending it RFC shortly.



[RFC net-next 2/8] subdev: Introduce pm callbacks

2019-02-28 Thread Parav Pandit
Keep power management callbacks in place to optionally notify drivers
who register them.

Signed-off-by: Parav Pandit 
---
 drivers/subdev/subdev_main.c | 59 
 1 file changed, 59 insertions(+)

diff --git a/drivers/subdev/subdev_main.c b/drivers/subdev/subdev_main.c
index 4aabcaa..e213331 100644
--- a/drivers/subdev/subdev_main.c
+++ b/drivers/subdev/subdev_main.c
@@ -23,10 +23,69 @@ static int subdev_bus_match(struct device *dev, struct device_driver *drv)
return 0;
 }
 
+static int subdev_pm_prepare(struct device *dev)
+{
+   if (dev->driver->pm && dev->driver->pm->prepare)
+   return dev->driver->pm->prepare(dev);
+   return 0;
+}
+
+static void subdev_pm_complete(struct device *dev)
+{
+   if (dev->driver->pm && dev->driver->pm->complete)
+   dev->driver->pm->complete(dev);
+}
+
+static int subdev_pm_suspend(struct device *dev)
+{
+   if (dev->driver->pm && dev->driver->pm->suspend)
+   return dev->driver->pm->suspend(dev);
+   return 0;
+}
+
+static int subdev_pm_suspend_late(struct device *dev)
+{
+   if (dev->driver->pm && dev->driver->pm->suspend_late)
+   return dev->driver->pm->suspend_late(dev);
+   return 0;
+}
+
+static int subdev_pm_resume(struct device *dev)
+{
+   if (dev->driver->pm && dev->driver->pm->resume)
+   return dev->driver->pm->resume(dev);
+   return 0;
+}
+
+static int subdev_pm_freeze(struct device *dev)
+{
+   if (dev->driver->pm && dev->driver->pm->freeze)
+   return dev->driver->pm->freeze(dev);
+   return 0;
+}
+
+static int subdev_pm_freeze_late(struct device *dev)
+{
+   if (dev->driver->pm && dev->driver->pm->freeze_late)
+   return dev->driver->pm->freeze_late(dev);
+   return 0;
+}
+
+static const struct dev_pm_ops subdev_dev_pm_ops = {
+   .prepare = subdev_pm_prepare,
+   .complete = subdev_pm_complete,
+   .suspend = subdev_pm_suspend,
+   .suspend_late = subdev_pm_suspend_late,
+   .resume = subdev_pm_resume,
+   .freeze = subdev_pm_freeze,
+   .freeze_late = subdev_pm_freeze_late,
+};
+
 static struct bus_type subdev_bus_type = {
.dev_name = "subdev",
.name = "subdev",
.match = subdev_bus_match,
+   .pm = &subdev_dev_pm_ops,
 };
 
 int __subdev_register_driver(struct subdev_driver *drv, struct module *owner,
-- 
1.8.3.1
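
The forwarding pattern in this patch can be sketched in userspace as below.
This is only an illustration with hypothetical sketch_* types (a reduced
stand-in for struct dev_pm_ops), and it adds one assumption the patch does not
make: that a device may have no driver bound when the bus callback runs, so
the driver pointer is checked before it is dereferenced.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_device;

/* Hypothetical, reduced stand-in for struct dev_pm_ops. */
struct sketch_pm_ops {
	int (*suspend)(struct sketch_device *dev);
	int (*resume)(struct sketch_device *dev);
};

struct sketch_driver {
	const struct sketch_pm_ops *pm;		/* may be NULL */
};

struct sketch_device {
	const struct sketch_driver *driver;	/* NULL while unbound */
	int suspended;
};

/* Forward to the driver's suspend() if it exists; otherwise succeed. */
static int sketch_bus_suspend(struct sketch_device *dev)
{
	const struct sketch_driver *drv = dev->driver;

	if (drv && drv->pm && drv->pm->suspend)
		return drv->pm->suspend(dev);
	return 0;
}

/* Forward to the driver's resume() if it exists; otherwise succeed. */
static int sketch_bus_resume(struct sketch_device *dev)
{
	const struct sketch_driver *drv = dev->driver;

	if (drv && drv->pm && drv->pm->resume)
		return drv->pm->resume(dev);
	return 0;
}

/* Example driver callbacks that only track state. */
static int sketch_drv_suspend(struct sketch_device *dev)
{
	dev->suspended = 1;
	return 0;
}

static int sketch_drv_resume(struct sketch_device *dev)
{
	dev->suspended = 0;
	return 0;
}
```

A device with no driver, or with a driver that has no pm table, simply reports
success, which matches the "optionally notify" wording of the commit message.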



[RFC net-next 3/8] modpost: Add support for subdev device id table

2019-02-28 Thread Parav Pandit
Add support to parse subdev module device id table.

Signed-off-by: Parav Pandit 
---
 scripts/mod/devicetable-offsets.c |  4 
 scripts/mod/file2alias.c  | 15 +++
 2 files changed, 19 insertions(+)

diff --git a/scripts/mod/devicetable-offsets.c b/scripts/mod/devicetable-offsets.c
index 2930044..77f6b6e 100644
--- a/scripts/mod/devicetable-offsets.c
+++ b/scripts/mod/devicetable-offsets.c
@@ -225,5 +225,9 @@ int main(void)
DEVID_FIELD(typec_device_id, svid);
DEVID_FIELD(typec_device_id, mode);
 
+   DEVID(subdev_id);
+   DEVID_FIELD(subdev_id, vendor_id);
+   DEVID_FIELD(subdev_id, device_id);
+
return 0;
 }
diff --git a/scripts/mod/file2alias.c b/scripts/mod/file2alias.c
index a37af7d..be89e8e 100644
--- a/scripts/mod/file2alias.c
+++ b/scripts/mod/file2alias.c
@@ -1287,6 +1287,20 @@ static int do_typec_entry(const char *filename, void *symval, char *alias)
return 1;
 }
 
+/* Looks like: subdev:vNdN. */
+static int do_subdev_entry(const char *filename, void *symval, char *alias)
+{
+   DEF_FIELD(symval, subdev_id, vendor_id);
+   DEF_FIELD(symval, subdev_id, device_id);
+
+   strcpy(alias, "subdev:");
+   ADD(alias, "v", 1, vendor_id);
+   ADD(alias, "d", 1, device_id);
+
+   add_wildcard(alias);
+   return 1;
+}
+
 /* Does namelen bytes of name exactly match the symbol? */
 static bool sym_is(const char *name, unsigned namelen, const char *symbol)
 {
@@ -1357,6 +1371,7 @@ static void do_table(void *symval, unsigned long size,
{"fslmc", SIZE_fsl_mc_device_id, do_fsl_mc_entry},
{"tbsvc", SIZE_tb_service_id, do_tbsvc_entry},
{"typec", SIZE_typec_device_id, do_typec_entry},
+   {"subdev", SIZE_subdev_id, do_subdev_entry},
 };
 
 /* Create MODULE_ALIAS() statements.
-- 
1.8.3.1
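
For illustration, the alias string that do_subdev_entry() is meant to produce
("subdev:vNdN" plus a trailing wildcard) can be sketched in plain C. The
four-hex-digit width per field is an assumption that follows from the 16-bit
ids described in patch 1/8; sketch_subdev_alias() is a hypothetical helper,
not part of modpost.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Build a "subdev:vNdN*" alias into buf.
 * Returns 0 on success, -1 if the alias does not fit (or on format error),
 * so callers never use a truncated alias.
 */
static int sketch_subdev_alias(char *buf, size_t len,
			       uint16_t vendor_id, uint16_t device_id)
{
	int n = snprintf(buf, len, "subdev:v%04Xd%04X*",
			 (unsigned int)vendor_id, (unsigned int)device_id);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

For vendor id 0x1 and device id 0x2a this yields "subdev:v0001d002A*".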



[RFC net-next 5/8] devlink: Add variant of devlink_register/unregister

2019-02-28 Thread Parav Pandit
Add variants of devlink_register and devlink_unregister which don't
explicitly acquire/release the devlink_mutex lock, but require that the
caller holds the devlink_mutex lock.

This is required to create child devlink devices while working on
parent devlink device.

Change-Id: I74417158144b28ff51ecfb2d1105c83ebefdf985
Signed-off-by: Parav Pandit 
---
 include/net/devlink.h | 15 ++-
 net/core/devlink.c| 36 +++-
 2 files changed, 45 insertions(+), 6 deletions(-)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index ae5e0e6..9a067b1 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -545,7 +545,9 @@ static inline struct devlink *priv_to_devlink(void *priv)
 void devlink_init(struct devlink *devlink, const struct devlink_ops *ops);
 void devlink_cleanup(struct devlink *devlink);
 struct devlink *devlink_alloc(const struct devlink_ops *ops, size_t priv_size);
+void __devlink_register(struct devlink *devlink, struct device *dev);
 int devlink_register(struct devlink *devlink, struct device *dev);
+void __devlink_unregister(struct devlink *devlink);
 void devlink_unregister(struct devlink *devlink);
 void devlink_free(struct devlink *devlink);
 int devlink_port_register(struct devlink *devlink,
@@ -713,6 +715,7 @@ int devlink_health_report(struct devlink_health_reporter *reporter,
 
 static inline void devlink_init(struct devlink *devlink,
const struct devlink_ops *ops)
+{
 }
 
 static inline void devlink_cleanup(struct devlink *devlink)
@@ -725,11 +728,21 @@ static inline struct devlink *devlink_alloc(const struct devlink_ops *ops,
return kzalloc(sizeof(struct devlink) + priv_size, GFP_KERNEL);
 }
 
-static inline int devlink_register(struct devlink *devlink, struct device *dev)
+static inline void __devlink_register(struct devlink *devlink,
+ struct device *dev)
+{
+}
+
+static inline int devlink_register(struct devlink *devlink,
+  struct device *dev)
 {
return 0;
 }
 
+static inline void __devlink_unregister(struct devlink *devlink)
+{
+}
+
 static inline void devlink_unregister(struct devlink *devlink)
 {
 }
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 25492c6..cfbad2c 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -5262,22 +5262,49 @@ struct devlink *devlink_alloc(const struct devlink_ops *ops, size_t priv_size)
 EXPORT_SYMBOL_GPL(devlink_alloc);
 
 /**
- * devlink_register - Register devlink instance
+ * __devlink_register - Register devlink instance
+ * Caller must hold devlink_mutex.
  *
  * @devlink: devlink
  */
-int devlink_register(struct devlink *devlink, struct device *dev)
+void __devlink_register(struct devlink *devlink, struct device *dev)
 {
-   mutex_lock(&devlink_mutex);
+   lockdep_assert_held(&devlink_mutex);
devlink->dev = dev;
list_add_tail(&devlink->list, &devlink_list);
devlink_notify(devlink, DEVLINK_CMD_NEW);
+}
+EXPORT_SYMBOL_GPL(__devlink_register);
+
+/**
+ * devlink_register - Register devlink instance
+ *
+ * @devlink: devlink
+ */
+int devlink_register(struct devlink *devlink, struct device *dev)
+{
+   mutex_lock(&devlink_mutex);
+   __devlink_register(devlink, dev);
mutex_unlock(&devlink_mutex);
return 0;
 }
 EXPORT_SYMBOL_GPL(devlink_register);
 
 /**
+ * __devlink_unregister - Unregister devlink instance
+ * Caller must hold the devlink_mutex while invoking this API.
+ *
+ * @devlink: devlink
+ */
+void __devlink_unregister(struct devlink *devlink)
+{
+   lockdep_assert_held(&devlink_mutex);
+   devlink_notify(devlink, DEVLINK_CMD_DEL);
+   list_del(&devlink->list);
+}
+EXPORT_SYMBOL_GPL(__devlink_unregister);
+
+/**
  * devlink_unregister - Unregister devlink instance
  *
  * @devlink: devlink
@@ -5285,8 +5312,7 @@ int devlink_register(struct devlink *devlink, struct device *dev)
 void devlink_unregister(struct devlink *devlink)
 {
mutex_lock(&devlink_mutex);
-   devlink_notify(devlink, DEVLINK_CMD_DEL);
-   list_del(&devlink->list);
+   __devlink_unregister(devlink);
mutex_unlock(&devlink_mutex);
 }
 EXPORT_SYMBOL_GPL(devlink_unregister);
-- 
1.8.3.1
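
The locked/unlocked split this patch introduces can be modelled in userspace
as below. This is a single-threaded sketch with a toy lock that only records
whether it is held; the assert() calls play the role lockdep_assert_held()
plays in the patch. All sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy lock that records whether it is held; stands in for devlink_mutex. */
struct sketch_lock {
	bool held;
};

static struct sketch_lock sketch_registry_lock;
static int sketch_registered;	/* number of registered instances */

static void sketch_lock_acquire(struct sketch_lock *l)
{
	assert(!l->held);	/* non-recursive, like a kernel mutex */
	l->held = true;
}

static void sketch_lock_release(struct sketch_lock *l)
{
	assert(l->held);
	l->held = false;
}

/* Locked variant: the caller must already hold sketch_registry_lock. */
static void sketch_register_locked(void)
{
	assert(sketch_registry_lock.held);
	sketch_registered++;
}

/* Public variant: takes the lock, then reuses the locked helper. */
static void sketch_register(void)
{
	sketch_lock_acquire(&sketch_registry_lock);
	sketch_register_locked();
	sketch_lock_release(&sketch_registry_lock);
}

/* A parent operation that already holds the lock and adds a child. */
static void sketch_parent_add_child(void)
{
	sketch_lock_acquire(&sketch_registry_lock);
	/* Calling sketch_register() here would try to take the lock twice. */
	sketch_register_locked();
	sketch_lock_release(&sketch_registry_lock);
}
```

The design choice is the usual one: keep a single implementation of the
registration body and let the outer function decide who owns the lock.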



[RFC net-next 1/8] subdev: Introducing subdev bus

2019-02-28 Thread Parav Pandit
Introduce a new subdev bus which holds sub devices created from a
primary device. These devices are named as 'subdev'.
A subdev is identified similarly to pci device using 16-bit vendor id
and device id.
Unlike PCI devices, scope of subdev is limited to Linux kernel.
The central place that assigns unique subdev vendor and device ids is the
set of enums in include/linux/subdev_ids.h. Enums are chosen over #define
macros so that two vendors do not end up with the same vendor id during the
kernel development process.

The subdev bus holds subdevices of multiple devices. A subdev created for a
PCI device typically appears in the sysfs tree under its parent device,
named using the core's default device naming scheme:

subdev.
i.e.
subdev0
subdev1

$ ls -l /sys/bus/pci/devices/:05:00.0
[..]
drwxr-xr-x 4 root root 0 Feb 13 15:57 subdev0
drwxr-xr-x 4 root root 0 Feb 13 15:57 subdev1

Device model view:
------------------
   +------+    +------+       +------+
   |subdev|    |subdev|       |subdev|
  -|  1   |----|  2   |-------|  3   |--
  |+--|---+    +--|---+       +--|---+ |
  ----|-----------|--subdev bus--|------
      |           |              |
   +--+-----------+--+       +---+---+
   |     pcidev      |       |pcidev |
  -|        A        |-------|   B   |--
  |+-----------------+       +-------+ |
  ---------------pci bus----------------

Subdevs are allocated and freed using the subdev_alloc() and subdev_free()
APIs. A driver which wants to create an actual class device such as
net/block/infiniband needs to use the subdev_register_driver() and
subdev_unregister_driver() APIs.

Signed-off-by: Parav Pandit 
---
 drivers/Kconfig |   2 +
 drivers/Makefile|   1 +
 drivers/subdev/Kconfig  |  12 
 drivers/subdev/Makefile |   8 +++
 drivers/subdev/subdev_main.c| 153 
 include/linux/mod_devicetable.h |  12 
 include/linux/subdev_bus.h  |  63 +
 include/linux/subdev_ids.h  |  17 +
 8 files changed, 268 insertions(+)
 create mode 100644 drivers/subdev/Kconfig
 create mode 100644 drivers/subdev/Makefile
 create mode 100644 drivers/subdev/subdev_main.c
 create mode 100644 include/linux/subdev_bus.h
 create mode 100644 include/linux/subdev_ids.h

diff --git a/drivers/Kconfig b/drivers/Kconfig
index 4f9f990..1818796 100644
--- a/drivers/Kconfig
+++ b/drivers/Kconfig
@@ -228,4 +228,6 @@ source "drivers/siox/Kconfig"
 
 source "drivers/slimbus/Kconfig"
 
+source "drivers/subdev/Kconfig"
+
 endmenu
diff --git a/drivers/Makefile b/drivers/Makefile
index e1ce029..a040e96 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -186,3 +186,4 @@ obj-$(CONFIG_MULTIPLEXER)   += mux/
 obj-$(CONFIG_UNISYS_VISORBUS)  += visorbus/
 obj-$(CONFIG_SIOX) += siox/
 obj-$(CONFIG_GNSS) += gnss/
+obj-$(CONFIG_SUBDEV)   += subdev/
diff --git a/drivers/subdev/Kconfig b/drivers/subdev/Kconfig
new file mode 100644
index 000..8ce3acc
--- /dev/null
+++ b/drivers/subdev/Kconfig
@@ -0,0 +1,12 @@
+#
+# subdev configuration
+#
+
+config SUBDEV
+   tristate "subdev bus driver"
+   help
+   The subdev bus driver allows creating hardware based sub devices
+   from a parent device. The subdev bus driver is required to create,
+   discover devices and to attach device drivers to this subdev
+   devices. These subdev devices are created using devlink tool by
+   user.
diff --git a/drivers/subdev/Makefile b/drivers/subdev/Makefile
new file mode 100644
index 000..405b74a
--- /dev/null
+++ b/drivers/subdev/Makefile
@@ -0,0 +1,8 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Makefile for subdev bus driver
+#
+
+obj-$(CONFIG_SUBDEV)   += subdev.o
+
+subdev-y := subdev_main.o
diff --git a/drivers/subdev/subdev_main.c b/drivers/subdev/subdev_main.c
new file mode 100644
index 000..4aabcaa
--- /dev/null
+++ b/drivers/subdev/subdev_main.c
@@ -0,0 +1,153 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include 
+#include 
+#include 
+#include 
+
+static DEFINE_XARRAY_FLAGS(subdev_ids, XA_FLAGS_ALLOC);
+
+static int subdev_bus_match(struct device *dev, struct device_driver *drv)
+{
+   struct subdev_driver *subdev_drv = to_subdev_driver(drv);
+   const struct subdev_id *ids = subdev_drv->id_table;
+   const struct subdev *subdev = to_subdev_device(dev);
+
+   while (ids) {
+   if (ids->vendor_id == subdev->dev_id.vendor_id &&
+   ids->device_id == subdev->dev_id.device_id)
+   return 1;
+
+   ids++;
+   }
+   return 0;
+}
+
+static struct bus_type subdev_bus_type = {
+   .dev_name = "subdev",
+   .name = "subdev",
+   .match = subdev_bus_match,

[RFC net-next 6/8] devlink: Add support for devlink subdev lifecycle

2019-02-28 Thread Parav Pandit
Add support for creating and deleting devlink subdevices.
Every subdev created on the subdev bus has a corresponding devlink device.
This devlink device serves as the control point for any internal device
configuration which is usually required before setting up the protocol
specific devices such as netdev, block or infiniband devices.

devlink subdevs are created and deleted using iproute2 devlink tool commands
such as:
(a) create a devlink subdev
$ devlink dev add DEV
output: subdev/subdev0

(b) delete a devlink subdev
$ devlink dev del DEV
$ devlink dev del subdev/subdev0

Signed-off-by: Parav Pandit 
---
 include/net/devlink.h|  6 ++-
 include/uapi/linux/devlink.h |  3 ++
 net/core/devlink.c   | 97 ++--
 3 files changed, 102 insertions(+), 4 deletions(-)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index 9a067b1..3265508 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -36,6 +36,7 @@ struct devlink {
struct device *dev;
possible_net_t _net;
struct mutex lock;
+   struct devlink *parent; /* optional if this is child devlink device */
char priv[0] __aligned(NETDEV_ALIGN);
 };
 
@@ -524,6 +525,8 @@ struct devlink_ops {
int (*flash_update)(struct devlink *devlink, const char *file_name,
const char *component,
struct netlink_ext_ack *extack);
+   struct devlink* (*dev_add)(struct devlink *devlink);
+   void (*dev_del)(struct devlink *del_dev);
 };
 
 static inline void *devlink_priv(struct devlink *devlink)
@@ -545,7 +548,8 @@ static inline struct devlink *priv_to_devlink(void *priv)
 void devlink_init(struct devlink *devlink, const struct devlink_ops *ops);
 void devlink_cleanup(struct devlink *devlink);
 struct devlink *devlink_alloc(const struct devlink_ops *ops, size_t priv_size);
-void __devlink_register(struct devlink *devlink, struct device *dev);
+int __devlink_register(struct devlink *devlink, struct device *dev,
+  struct devlink *parent);
 int devlink_register(struct devlink *devlink, struct device *dev);
 void __devlink_unregister(struct devlink *devlink);
 void devlink_unregister(struct devlink *devlink);
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 53de880..233f5bc 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -105,6 +105,9 @@ enum devlink_command {
 
DEVLINK_CMD_FLASH_UPDATE,
 
+   DEVLINK_CMD_DEV_ADD,
+   DEVLINK_CMD_DEV_DEL,
+
/* add new commands above here */
__DEVLINK_CMD_MAX,
DEVLINK_CMD_MAX = __DEVLINK_CMD_MAX - 1
diff --git a/net/core/devlink.c b/net/core/devlink.c
index cfbad2c..3b5c961 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -3759,6 +3759,57 @@ static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb,
return err;
 }
 
+static int
+devlink_nl_cmd_dev_add_doit(struct sk_buff *skb, struct genl_info *info)
+{
+   struct devlink *devlink = info->user_ptr[0];
+   struct devlink *new_devlink;
+   struct sk_buff *msg;
+   int err;
+
+   if (!devlink->ops->dev_add || !devlink->ops->dev_del)
+   return -EOPNOTSUPP;
+
+   msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+   if (!msg)
+   return -ENOMEM;
+
+   new_devlink = devlink->ops->dev_add(devlink);
+   if (IS_ERR(new_devlink)) {
+   err = PTR_ERR(new_devlink);
+   goto dev_err;
+   }
+
+   err = devlink_nl_put_handle(msg, new_devlink);
+   if (err)
+   goto put_err;
+
+   return genlmsg_reply(msg, info);
+
+put_err:
+   devlink->ops->dev_del(new_devlink);
+dev_err:
+   nlmsg_free(msg);
+   return err;
+}
+
+static int
+devlink_nl_cmd_dev_del_doit(struct sk_buff *skb, struct genl_info *info)
+{
+   struct devlink *devlink;
+   struct devlink *parent;
+
+   devlink = devlink_get_from_info(info);
+   if (!devlink)
+   return -ENODEV;
+   parent = devlink->parent;
+   if (!parent)
+   return -EOPNOTSUPP;
+
+   parent->ops->dev_del(devlink);
+   return 0;
+}
+
 struct devlink_info_req {
struct sk_buff *msg;
 };
@@ -5201,6 +5252,20 @@ static int devlink_nl_cmd_health_reporter_dump_get_doit(struct sk_buff *skb,
.flags = GENL_ADMIN_PERM,
.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
},
+   {
+   .cmd = DEVLINK_CMD_DEV_ADD,
+   .doit = devlink_nl_cmd_dev_add_doit,
+   .policy = devlink_nl_policy,
+   .flags = GENL_ADMIN_PERM,
+   .internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
+   },
+   {
+   .cmd = DEVLINK_CMD_DEV_DEL,
+   .doit = devlink_nl_cmd_dev_del_doit,
+   .policy = devlink_nl_policy,
+   .flags = GENL_ADMIN_P

[RFC net-next 7/8] net/mlx5: Add devlink subdev life cycle command support

2019-02-28 Thread Parav Pandit
Implement the devlink device add/del commands, which create dummy subdev
devices that the actual driver can bind to using the standard device driver
model.

Signed-off-by: Parav Pandit 
---
 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |  1 +
 drivers/net/ethernet/mellanox/mlx5/core/main.c |  4 ++
 .../net/ethernet/mellanox/mlx5/core/mlx5_core.h|  4 ++
 drivers/net/ethernet/mellanox/mlx5/core/subdev.c   | 55 ++
 4 files changed, 64 insertions(+)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/subdev.c

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index 82d636b..f218789 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -16,6 +16,7 @@ mlx5_core-y :=main.o cmd.o debugfs.o fw.o eq.o uar.o pagealloc.o \
transobj.o vport.o sriov.o fs_cmd.o fs_core.o \
fs_counters.o rl.o lag.o dev.o events.o wq.o lib/gid.o \
lib/devcom.o diag/fs_tracepoint.o diag/fw_tracer.o
+mlx5_core-$(CONFIG_SUBDEV) += subdev.o
 
 #
 # Netdev basic
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 40d591c..5f8cf0d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -1213,6 +1213,10 @@ static int mlx5_unload_one(struct mlx5_core_dev *dev, struct mlx5_priv *priv,
.eswitch_encap_mode_set = mlx5_devlink_eswitch_encap_mode_set,
.eswitch_encap_mode_get = mlx5_devlink_eswitch_encap_mode_get,
 #endif
+#if IS_ENABLED(CONFIG_SUBDEV)
+   .dev_add = mlx5_devlink_dev_add,
+   .dev_del = mlx5_devlink_dev_del,
+#endif
 };
 
 #define MLX5_IB_MOD "mlx5_ib"
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
index 9529cf9..2a54148 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
@@ -202,4 +202,8 @@ enum {
 
 u8 mlx5_get_nic_state(struct mlx5_core_dev *dev);
 void mlx5_set_nic_state(struct mlx5_core_dev *dev, u8 state);
+
+struct devlink *mlx5_devlink_dev_add(struct devlink *devlink);
+void mlx5_devlink_dev_del(struct devlink *devlink);
+
 #endif /* __MLX5_CORE_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/subdev.c b/drivers/net/ethernet/mellanox/mlx5/core/subdev.c
new file mode 100644
index 000..9e78ea01
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/subdev.c
@@ -0,0 +1,55 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+// Copyright (c) 2018-19 Mellanox Technologies
+
+#include 
+#include 
+#include 
+#include 
+
+#include "mlx5_core.h"
+
+struct mlx5_subdev {
+   struct subdev subdev;
+   struct devlink dl;
+};
+
+struct devlink *mlx5_devlink_dev_add(struct devlink *devlink)
+{
+   struct mlx5_subdev *subdev;
+   int ret;
+
+   subdev = subdev_alloc_dev(mlx5_subdev, subdev);
+   if (!subdev)
+   return ERR_PTR(-ENOMEM);
+
+   devlink_init(&subdev->dl, NULL);
+
+   ret = subdev_add_dev(&subdev->subdev, devlink->dev,
+SUBDEV_VENDOR_ID_MELLANOX,
+SUBDEV_DEVICE_ID_MELLANOX_SF);
+   if (ret)
+   goto add_err;
+
+   ret = __devlink_register(&subdev->dl, &subdev->subdev.dev, devlink);
+   if (ret)
+   goto reg_err;
+
+   return &subdev->dl;
+
+reg_err:
+   devlink_cleanup(&subdev->dl);
+add_err:
+   subdev_free_dev(&subdev->subdev);
+   return ERR_PTR(ret);
+}
+
+void mlx5_devlink_dev_del(struct devlink *devlink)
+{
+   struct mlx5_subdev *subdev =
+   container_of(devlink, struct mlx5_subdev, dl);
+
+   __devlink_unregister(devlink);
+   devlink_cleanup(devlink);
+   subdev_delete_dev(&subdev->subdev);
+   subdev_free_dev(&subdev->subdev);
+}
-- 
1.8.3.1
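
The patch above embeds both a subdev and a devlink instance in one wrapper
structure and recovers the wrapper from the devlink pointer with
container_of(). A minimal userspace sketch of that technique follows; the
sketch_* names are hypothetical, and sketch_container_of() is a plain
offsetof()-based equivalent of the kernel macro without its type checking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace equivalent of the kernel's container_of() macro. */
#define sketch_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct sketch_subdev {
	int id;
};

struct sketch_devlink {
	int registered;
};

/* Wrapper embedding both objects, like struct mlx5_subdev in the patch. */
struct sketch_mlx5_subdev {
	struct sketch_subdev subdev;
	struct sketch_devlink dl;
};

/* Allocate the wrapper and hand out only the embedded devlink handle. */
static struct sketch_devlink *sketch_dev_add(int id)
{
	struct sketch_mlx5_subdev *sdev = calloc(1, sizeof(*sdev));

	if (!sdev)
		return NULL;
	sdev->subdev.id = id;
	sdev->dl.registered = 1;
	return &sdev->dl;
}

/* Recover the wrapper from the embedded handle. */
static struct sketch_mlx5_subdev *sketch_to_subdev(struct sketch_devlink *dl)
{
	return sketch_container_of(dl, struct sketch_mlx5_subdev, dl);
}

/* Free the wrapper through the handle; returns the id that was released. */
static int sketch_dev_del(struct sketch_devlink *dl)
{
	struct sketch_mlx5_subdev *sdev = sketch_to_subdev(dl);
	int id = sdev->subdev.id;

	free(sdev);
	return id;
}
```

Because the handle is not the first member of the wrapper, the recovered
pointer differs from the handle itself, which is exactly the case
container_of() exists for.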



[RFC net-next 8/8] net/mlx5: Add subdev driver to bind to subdev devices

2019-02-28 Thread Parav Pandit
Add a subdev driver to probe the subdev devices and create fake
netdevice for it.

Signed-off-by: Parav Pandit 
---
 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/main.c |  8 +-
 .../net/ethernet/mellanox/mlx5/core/mlx5_core.h|  3 +
 .../ethernet/mellanox/mlx5/core/subdev_driver.c| 93 ++
 4 files changed, 104 insertions(+), 2 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/subdev_driver.c

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index f218789..c8aeaf1 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -16,7 +16,7 @@ mlx5_core-y :=main.o cmd.o debugfs.o fw.o eq.o uar.o pagealloc.o \
transobj.o vport.o sriov.o fs_cmd.o fs_core.o \
fs_counters.o rl.o lag.o dev.o events.o wq.o lib/gid.o \
lib/devcom.o diag/fs_tracepoint.o diag/fw_tracer.o
-mlx5_core-$(CONFIG_SUBDEV) += subdev.o
+mlx5_core-$(CONFIG_SUBDEV) += subdev.o subdev_driver.o
 
 #
 # Netdev basic
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 5f8cf0d..7dfa8c4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -1548,7 +1548,11 @@ static int __init init(void)
mlx5e_init();
 #endif
 
-   return 0;
+   err = subdev_register_driver(&mlx5_subdev_driver);
+   if (err)
+   pci_unregister_driver(&mlx5_core_driver);
+
+   return err;
 
 err_debug:
mlx5_unregister_debugfs();
@@ -1557,6 +1561,8 @@ static int __init init(void)
 
 static void __exit cleanup(void)
 {
+   subdev_unregister_driver(&mlx5_subdev_driver);
+
 #ifdef CONFIG_MLX5_CORE_EN
mlx5e_cleanup();
 #endif
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h 
b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
index 2a54148..1b733c7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
@@ -41,12 +41,15 @@
 #include 
 #include 
 #include 
+#include 
 
 #define DRIVER_NAME "mlx5_core"
 #define DRIVER_VERSION "5.0-0"
 
 extern uint mlx5_core_debug_mask;
 
+extern struct subdev_driver mlx5_subdev_driver;
+
 #define mlx5_core_dbg(__dev, format, ...)  \
dev_dbg(&(__dev)->pdev->dev, "%s:%d:(pid %d): " format, \
 __func__, __LINE__, current->pid,  \
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/subdev_driver.c 
b/drivers/net/ethernet/mellanox/mlx5/core/subdev_driver.c
new file mode 100644
index 000..880aa4f
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/subdev_driver.c
@@ -0,0 +1,93 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2018-19 Mellanox Technologies
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct mlx5_subdev_ndev {
+   struct net_device ndev;
+};
+
+static void mlx5_dma_test(struct device *dev)
+{
+   dma_addr_t pa;
+   void *va;
+
+   va = dma_alloc_coherent(dev, 4096, &pa, GFP_KERNEL);
+   if (va)
+   dma_free_coherent(dev, 4096, va, pa);
+}
+
+static struct net_device *ndev;
+
+static int mlx5e_subdev_open(struct net_device *netdev)
+{
+   return 0;
+}
+
+static int mlx5e_subdev_close(struct net_device *netdev)
+{
+   return 0;
+}
+
+static netdev_tx_t
+mlx5e_subdev_xmit(struct sk_buff *skb, struct net_device *netdev)
+{
+   return NETDEV_TX_BUSY;
+}
+
+const struct net_device_ops mlx5e_subdev_netdev_ops = {
+   .ndo_open= mlx5e_subdev_open,
+   .ndo_stop= mlx5e_subdev_close,
+   .ndo_start_xmit  = mlx5e_subdev_xmit,
+};
+
+static int mlx5_subdev_probe(struct device *dev)
+{
+   int err;
+
+   mlx5_dma_test(dev);
+   /* Only one device supported in rfc */
+   if (ndev)
+   return 0;
+
+   ndev = alloc_etherdev_mqs(sizeof(struct mlx5_subdev_ndev), 1, 1);
+   if (!ndev)
+   return -ENOMEM;
+
+   SET_NETDEV_DEV(ndev, dev);
+   ndev->netdev_ops = &mlx5e_subdev_netdev_ops;
+   err = register_netdev(ndev);
+   if (err) {
+   free_netdev(ndev);
+   ndev = NULL;
+   }
+   return err;
+}
+
+static int mlx5_subdev_remove(struct device *dev)
+{
+   if (ndev) {
+   unregister_netdev(ndev);
+   free_netdev(ndev);
+   ndev = NULL;
+   }
+   return 0;
+}
+
+static const struct subdev_id mlx5_subdev_id_table[] = {
+   { .vendor_id = SUBDEV_VENDOR_ID_MELLANOX,
+ .device_id = SUBDEV_DEVICE_ID_MELLANOX_SF },
+   { 0, }
+};
+MODULE_DEVICE_TABLE(subdev, mlx5_subdev_id_table);
+
+struct subdev_driver mlx5_subdev_driver = {
+   .id_tabl

[RFC net-next 4/8] devlink: Introduce and use devlink_init/cleanup() in alloc/free

2019-02-28 Thread Parav Pandit
There is a use case for allocating a devlink instance together with another
structure instance.
This is the case when struct devlink and struct device are desired to be
part of a single structure instance whose life cycle is driven by the life
cycle of the core device.
To support it, provide more granular init/cleanup APIs and reuse them in
the existing alloc/free APIs.

Signed-off-by: Parav Pandit 
---
 include/net/devlink.h | 10 ++
 net/core/devlink.c| 50 +-
 2 files changed, 47 insertions(+), 13 deletions(-)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index a2da49d..ae5e0e6 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -542,6 +542,8 @@ static inline struct devlink *priv_to_devlink(void *priv)
 
 #if IS_ENABLED(CONFIG_NET_DEVLINK)
 
+void devlink_init(struct devlink *devlink, const struct devlink_ops *ops);
+void devlink_cleanup(struct devlink *devlink);
 struct devlink *devlink_alloc(const struct devlink_ops *ops, size_t priv_size);
 int devlink_register(struct devlink *devlink, struct device *dev);
 void devlink_unregister(struct devlink *devlink);
@@ -709,6 +711,14 @@ int devlink_health_report(struct devlink_health_reporter 
*reporter,
 
 #else
 
+static inline void devlink_init(struct devlink *devlink,
+   const struct devlink_ops *ops)
+{
+}
+
+static inline void devlink_cleanup(struct devlink *devlink)
+{
+}
+
 static inline struct devlink *devlink_alloc(const struct devlink_ops *ops,
size_t priv_size)
 {
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 04d9855..25492c6 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -5218,21 +5218,16 @@ static int 
devlink_nl_cmd_health_reporter_dump_get_doit(struct sk_buff *skb,
 };
 
 /**
- * devlink_alloc - Allocate new devlink instance resources
+ * devlink_init - Initialize devlink instance
  *
- * @ops: ops
- * @priv_size: size of user private data
+ * @devlink: devlink pointer, which is not allocated using devlink_alloc().
  *
- * Allocate new devlink instance resources, including devlink index
- * and name.
+ * When user wants to allocate devlink object along with other objects
+ * in driver such as refcounted using struct device, it is useful to
+ * just init the devlink instance without allocating.
  */
-struct devlink *devlink_alloc(const struct devlink_ops *ops, size_t priv_size)
+void devlink_init(struct devlink *devlink, const struct devlink_ops *ops)
 {
-   struct devlink *devlink;
-
-   devlink = kzalloc(sizeof(*devlink) + priv_size, GFP_KERNEL);
-   if (!devlink)
-   return NULL;
devlink->ops = ops;
devlink_net_set(devlink, &init_net);
INIT_LIST_HEAD(&devlink->port_list);
@@ -5243,6 +5238,25 @@ struct devlink *devlink_alloc(const struct devlink_ops 
*ops, size_t priv_size)
INIT_LIST_HEAD(&devlink->region_list);
INIT_LIST_HEAD(&devlink->reporter_list);
mutex_init(&devlink->lock);
+}
+EXPORT_SYMBOL_GPL(devlink_init);
+
+/**
+ * devlink_alloc - Allocate new devlink instance resources
+ *
+ * @ops: ops
+ * @priv_size: size of user private data
+ *
+ * Allocate new devlink instance resources, including devlink index
+ * and name.
+ */
+struct devlink *devlink_alloc(const struct devlink_ops *ops, size_t priv_size)
+{
+   struct devlink *devlink;
+
+   devlink = kzalloc(sizeof(*devlink) + priv_size, GFP_KERNEL);
+   if (devlink)
+   devlink_init(devlink, ops);
return devlink;
 }
 EXPORT_SYMBOL_GPL(devlink_alloc);
@@ -5278,11 +5292,11 @@ void devlink_unregister(struct devlink *devlink)
 EXPORT_SYMBOL_GPL(devlink_unregister);
 
 /**
- * devlink_free - Free devlink instance resources
+ * devlink_cleanup - Cleanup devlink instance resources
  *
  * @devlink: devlink
  */
-void devlink_free(struct devlink *devlink)
+void devlink_cleanup(struct devlink *devlink)
 {
WARN_ON(!list_empty(&devlink->reporter_list));
WARN_ON(!list_empty(&devlink->region_list));
@@ -5291,7 +5305,17 @@ void devlink_free(struct devlink *devlink)
WARN_ON(!list_empty(&devlink->dpipe_table_list));
WARN_ON(!list_empty(&devlink->sb_list));
WARN_ON(!list_empty(&devlink->port_list));
+}
+EXPORT_SYMBOL_GPL(devlink_cleanup);
 
+/**
+ * devlink_free - Free devlink instance resources
+ *
+ * @devlink: devlink
+ */
+void devlink_free(struct devlink *devlink)
+{
+   devlink_cleanup(devlink);
kfree(devlink);
 }
 EXPORT_SYMBOL_GPL(devlink_free);
-- 
1.8.3.1
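
The split between init/cleanup and alloc/free described in this patch lets a
driver embed the devlink instance in a larger object and recover that object
with container_of(). The following is a minimal userspace sketch of that
pattern, not the kernel code: struct mini_devlink, struct outer_dev and all
mini_* helpers are hypothetical stand-ins, and container_of() is redefined
here because the kernel macro is not available in userspace.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical miniature of struct devlink; only what the sketch needs. */
struct mini_devlink {
	const void *ops;
	int initialized;
};

/* Init only: the caller owns the memory (the devlink_init() idea). */
static void mini_devlink_init(struct mini_devlink *dl, const void *ops)
{
	dl->ops = ops;
	dl->initialized = 1;
}

/* Cleanup only: never frees (the devlink_cleanup() idea). */
static void mini_devlink_cleanup(struct mini_devlink *dl)
{
	dl->initialized = 0;
}

/* alloc = allocate + init, so existing callers keep working. */
static struct mini_devlink *mini_devlink_alloc(const void *ops,
					       size_t priv_size)
{
	struct mini_devlink *dl = calloc(1, sizeof(*dl) + priv_size);

	if (dl)
		mini_devlink_init(dl, ops);
	return dl;
}

/* free = cleanup + release of the memory. */
static void mini_devlink_free(struct mini_devlink *dl)
{
	mini_devlink_cleanup(dl);
	free(dl);
}

/* Hypothetical outer object that embeds the devlink instance. */
struct outer_dev {
	int id;
	struct mini_devlink dl;
};

/* Recover the outer object from a pointer to its embedded member. */
static struct outer_dev *outer_from_devlink(struct mini_devlink *dl)
{
	return container_of(dl, struct outer_dev, dl);
}
```

With this layout the outer object's lifetime decides when the memory goes
away; the embedded instance is only initialized and cleaned up in place.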



[RFC net-next 0/8] Introducing subdev bus and devlink extension

2019-02-28 Thread Parav Pandit
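 +------------+      +-----------+      +------------------+
 | subdev bus |      | core      |      | subdev device    |
 | driver     |      | kernel    |      | drivers          |
 | (add/del)  |      | dev model |      | (netdev, rdma)   |
 |            ------------------------> probe/remove()     |
 +------------+      +-----------+      +------------------+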

Alternatives considered:

Will discuss separately if needed to keep this RFC short.


Parav Pandit (8):
  subdev: Introducing subdev bus
  subdev: Introduce pm callbacks
  modpost: Add support for subdev device id table
  devlink: Introduce and use devlink_init/cleanup() in alloc/free
  devlink: Add variant of devlink_register/unregister
  devlink: Add support for devlink subdev lifecycle
  net/mlx5: Add devlink subdev life cycle command support
  net/mlx5: Add subdev driver to bind to subdev devices

 drivers/Kconfig|   2 +
 drivers/Makefile   |   1 +
 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   1 +
 drivers/net/ethernet/mellanox/mlx5/core/main.c |  12 +-
 .../net/ethernet/mellanox/mlx5/core/mlx5_core.h|   7 +
 drivers/net/ethernet/mellanox/mlx5/core/subdev.c   |  55 ++
 .../ethernet/mellanox/mlx5/core/subdev_driver.c|  93 +
 drivers/subdev/Kconfig |  12 ++
 drivers/subdev/Makefile|   8 +
 drivers/subdev/subdev_main.c   | 212 +
 include/linux/mod_devicetable.h|  12 ++
 include/linux/subdev_bus.h |  63 ++
 include/linux/subdev_ids.h |  17 ++
 include/net/devlink.h  |  29 ++-
 include/uapi/linux/devlink.h   |   3 +
 net/core/devlink.c | 179 +++--
 scripts/mod/devicetable-offsets.c  |   4 +
 scripts/mod/file2alias.c   |  15 ++
 18 files changed, 704 insertions(+), 21 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/subdev.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/subdev_driver.c
 create mode 100644 drivers/subdev/Kconfig
 create mode 100644 drivers/subdev/Makefile
 create mode 100644 drivers/subdev/subdev_main.c
 create mode 100644 include/linux/subdev_bus.h
 create mode 100644 include/linux/subdev_ids.h

-- 
1.8.3.1



[PATCH net-next 0/8] Introducing subdev bus and devlink extension

2019-02-28 Thread Parav Pandit
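 +------------+      +-----------+      +------------------+
 | subdev bus |      | core      |      | subdev device    |
 | driver     |      | kernel    |      | drivers          |
 | (add/del)  |      | dev model |      | (netdev, rdma)   |
 |            ------------------------> probe/remove()     |
 +------------+      +-----------+      +------------------+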

Alternatives considered:

Will discuss separately if needed to keep this RFC short.

Parav Pandit (8):
  subdev: Introducing subdev bus
  subdev: Introduce pm callbacks
  modpost: Add support for subdev device id table
  devlink: Introduce and use devlink_init/cleanup() in alloc/free
  devlink: Add variant of devlink_register/unregister
  devlink: Add support for devlink subdev lifecycle
  net/mlx5: Add devlink subdev life cycle command support
  net/mlx5: Add subdev driver to bind to subdev devices

 drivers/Kconfig|   2 +
 drivers/Makefile   |   1 +
 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   1 +
 drivers/net/ethernet/mellanox/mlx5/core/main.c |  12 +-
 .../net/ethernet/mellanox/mlx5/core/mlx5_core.h|   7 +
 drivers/net/ethernet/mellanox/mlx5/core/subdev.c   |  55 ++
 .../ethernet/mellanox/mlx5/core/subdev_driver.c|  93 +
 drivers/subdev/Kconfig |  12 ++
 drivers/subdev/Makefile|   8 +
 drivers/subdev/subdev_main.c   | 212 +
 include/linux/mod_devicetable.h|  12 ++
 include/linux/subdev_bus.h |  63 ++
 include/linux/subdev_ids.h |  17 ++
 include/net/devlink.h  |  29 ++-
 include/uapi/linux/devlink.h   |   3 +
 net/core/devlink.c | 179 +++--
 scripts/mod/devicetable-offsets.c  |   4 +
 scripts/mod/file2alias.c   |  15 ++
 18 files changed, 704 insertions(+), 21 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/subdev.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/subdev_driver.c
 create mode 100644 drivers/subdev/Kconfig
 create mode 100644 drivers/subdev/Makefile
 create mode 100644 drivers/subdev/subdev_main.c
 create mode 100644 include/linux/subdev_bus.h
 create mode 100644 include/linux/subdev_ids.h

-- 
1.8.3.1



RE: [PATCH v2] RDMA/cma: Rollback source IP address if failing to acquire device

2019-01-14 Thread Parav Pandit



> -Original Message-
> From: Myungho Jung 
> Sent: Thursday, January 10, 2019 12:28 AM
> To: Doug Ledford ; Jason Gunthorpe ;
> Parav Pandit 
> Cc: linux-r...@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: [PATCH v2] RDMA/cma: Rollback source IP address if failing to
> acquire device
> 
> If cma_acquire_dev_by_src_ip() returns error in addr_handler(), the device
> state changes back to RDMA_CM_ADDR_BOUND but the resolved source IP
> address is still left. After that, if rdma_destroy_id() is called after
> rdma_listen(), the device is freed without removed from listen_any_list in
> cma_cancel_operation(). Revert to the previous IP address if acquiring device
> fails.
> 
> Reported-by: syzbot+f3ce716af730c8f96...@syzkaller.appspotmail.com
> Signed-off-by: Myungho Jung 
> ---
>  drivers/infiniband/core/cma.c | 13 -
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
> index 63a7cc00bae0..8cd113b0ddfb 100644
> --- a/drivers/infiniband/core/cma.c
> +++ b/drivers/infiniband/core/cma.c
> @@ -2963,13 +2963,22 @@ static void addr_handler(int status, struct
> sockaddr *src_addr,  {
>   struct rdma_id_private *id_priv = context;
>   struct rdma_cm_event event = {};
> + struct sockaddr *addr;
> + struct sockaddr_storage old_addr;
> 
>   mutex_lock(&id_priv->handler_mutex);
>   if (!cma_comp_exch(id_priv, RDMA_CM_ADDR_QUERY,
>  RDMA_CM_ADDR_RESOLVED))
>   goto out;
> 
> - memcpy(cma_src_addr(id_priv), src_addr,
> rdma_addr_size(src_addr));
> + /*
> +  * Store the previous src address, so that if we fail to acquire
> +  * matching rdma device, old address can be restored back, which
> helps
> +  * to cancel the cma listen operation correctly.
> +  */
> + addr = cma_src_addr(id_priv);
> + memcpy(&old_addr, addr, rdma_addr_size(addr));
> + memcpy(addr, src_addr, rdma_addr_size(src_addr));
>   if (!status && !id_priv->cma_dev) {
>   status = cma_acquire_dev_by_src_ip(id_priv);
>   if (status)
> @@ -2980,6 +2989,8 @@ static void addr_handler(int status, struct sockaddr
> *src_addr,
>   }
> 
>   if (status) {
> + memcpy(addr, &old_addr,
> +rdma_addr_size((struct sockaddr *)&old_addr));
>   if (!cma_comp_exch(id_priv, RDMA_CM_ADDR_RESOLVED,
>  RDMA_CM_ADDR_BOUND))
>   goto out;
> --
> 2.17.1
Reviewed-by: Parav Pandit 
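
The save-and-restore step in the patch above can be illustrated outside the
kernel. This is only a sketch under stated assumptions: struct addr, struct
conn, acquire_dev() and resolve() are hypothetical stand-ins and not the
rdma_cm types or functions; the fail flag merely forces the error path.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical stand-ins; these are not the rdma_cm structures. */
struct addr {
	char bytes[16];
};

struct conn {
	struct addr src; /* currently bound source address */
	int resolved;    /* 1 once resolution fully succeeded */
};

/* Pretend device lookup that fails on request. */
static int acquire_dev(const struct conn *c, int fail)
{
	(void)c;
	return fail ? -ENODEV : 0;
}

/*
 * Install new_src, then try to acquire a device for it. If that fails,
 * put the previous address back so later teardown sees consistent state.
 */
static int resolve(struct conn *c, const struct addr *new_src, int fail)
{
	struct addr old = c->src; /* keep the previous address for rollback */
	int status;

	c->src = *new_src;
	status = acquire_dev(c, fail);
	if (status) {
		c->src = old; /* roll back to the address we started with */
		c->resolved = 0;
		return status;
	}

	c->resolved = 1;
	return 0;
}
```

The point of the sketch is the ordering: the old value is copied before it is
overwritten, and it is restored on every failing exit.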


RE: [PATCH] RDMA/cma: Rollback source IP address if failing to acquire device

2019-01-09 Thread Parav Pandit



> -Original Message-
> From: linux-rdma-ow...@vger.kernel.org  ow...@vger.kernel.org> On Behalf Of Myungho Jung
> Sent: Friday, January 4, 2019 12:46 AM
> To: Doug Ledford ; Jason Gunthorpe 
> Cc: linux-r...@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: [PATCH] RDMA/cma: Rollback source IP address if failing to acquire
> device
> 
> If cma_acquire_dev_by_src_ip() returns error in addr_handler(), the device
> state changes back to RDMA_CM_ADDR_BOUND but the resolved source IP
> address is still left. After that, if rdma_destroy_id() is called after
> rdma_listen(), the device is freed without removed from listen_any_list in
> cma_cancel_operation(). Revert to the previous IP address if acquiring device
> fails.
> 
> Reported-by: syzbot+f3ce716af730c8f96...@syzkaller.appspotmail.com
> Signed-off-by: Myungho Jung 
> ---
>  drivers/infiniband/core/cma.c | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
> index 63a7cc00bae0..d27c3b154e71 100644
> --- a/drivers/infiniband/core/cma.c
> +++ b/drivers/infiniband/core/cma.c
> @@ -2963,13 +2963,17 @@ static void addr_handler(int status, struct
> sockaddr *src_addr,  {
>   struct rdma_id_private *id_priv = context;
>   struct rdma_cm_event event = {};
> + struct sockaddr *addr;
> + struct sockaddr_storage old_addr;
> 
>   mutex_lock(&id_priv->handler_mutex);
>   if (!cma_comp_exch(id_priv, RDMA_CM_ADDR_QUERY,
>  RDMA_CM_ADDR_RESOLVED))
>   goto out;
> 
> - memcpy(cma_src_addr(id_priv), src_addr,
> rdma_addr_size(src_addr));
> + addr = cma_src_addr(id_priv);
> + memcpy(&old_addr, addr, rdma_addr_size(addr));
Please add a comment here in the patch explaining why we need to store the old
src address and restore it.
/*
 * Store the previous src address, so that if we fail to acquire matching rdma
 * device, old address can be restored back, which helps to cancel the cma
 * listen operation correctly.
 */
> + memcpy(addr, src_addr, rdma_addr_size(src_addr));
>   if (!status && !id_priv->cma_dev) {
>   status = cma_acquire_dev_by_src_ip(id_priv);
>   if (status)
> @@ -2980,6 +2984,8 @@ static void addr_handler(int status, struct sockaddr
> *src_addr,
>   }
> 
>   if (status) {
> + memcpy(addr, &old_addr,
> +rdma_addr_size((struct sockaddr *)&old_addr));
>   if (!cma_comp_exch(id_priv, RDMA_CM_ADDR_RESOLVED,
>  RDMA_CM_ADDR_BOUND))
>   goto out;
> --
> 2.17.1

Reviewed-by: Parav Pandit 


RE: [PATCH] rdmacg: fix a typo in rdmacg documentation

2018-10-23 Thread Parav Pandit



> -Original Message-
> From: linux-rdma-ow...@vger.kernel.org  ow...@vger.kernel.org> On Behalf Of Rami Rosen
> Sent: Tuesday, October 23, 2018 8:31 PM
> To: pandit.pa...@gmail.com
> Cc: t...@kernel.org; linux-r...@vger.kernel.org; linux-kernel@vger.kernel.org;
> cgro...@vger.kernel.org; Rami Rosen 
> Subject: [PATCH] rdmacg: fix a typo in rdmacg documentation
> 
> This patch fixes a typo in RDMA cgroup documentation.
> 
> Signed-off-by: Rami Rosen 
> ---
>  Documentation/cgroup-v1/rdma.txt | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/Documentation/cgroup-v1/rdma.txt b/Documentation/cgroup-
> v1/rdma.txt
> index af618171e0eb..9bdb7fd03f83 100644
> --- a/Documentation/cgroup-v1/rdma.txt
> +++ b/Documentation/cgroup-v1/rdma.txt
> @@ -27,7 +27,7 @@ cgroup.
>  Currently user space applications can easily take away all the rdma verb
> specific resources such as AH, CQ, QP, MR etc. Due to which other
> applications  in other cgroup or kernel space ULPs may not even get chance
> to allocate any -rdma resources. This can leads to service unavailability.
> +rdma resources. This can lead to service unavailability.
> 
>  Therefore RDMA controller is needed through which resource consumption
> of processes can be limited. Through this controller different rdma
> --
> 2.17.1

Thanks Rami for the correction.
For sake of process correctness, below fixes tag is good to have.

Fixes: 9c1e67f94101 ("rdmacg: Added documentation for rdmacg")

Reviewed-by: Parav Pandit 


RE: [PATCHv1] RDMA/core: Check error status of rdma_find_ndev_for_src_ip_rcu

2018-10-03 Thread Parav Pandit



> -Original Message-
> From: linux-rdma-ow...@vger.kernel.org  ow...@vger.kernel.org> On Behalf Of Jason Gunthorpe
> Sent: Wednesday, October 3, 2018 9:48 PM
> To: Parav Pandit 
> Cc: linux-r...@vger.kernel.org; linux-kernel@vger.kernel.org;
> l...@kernel.org; Daniel Jurgens ;
> dledf...@redhat.com
> Subject: Re: [PATCHv1] RDMA/core: Check error status of
> rdma_find_ndev_for_src_ip_rcu
> 
> On Thu, Oct 04, 2018 at 02:28:54AM +, Parav Pandit wrote:
> > Hi Doug, Jason,
> >
> > > From: Parav Pandit 
> > > Sent: Friday, September 21, 2018 10:00 AM
> > > To: linux-r...@vger.kernel.org; linux-kernel@vger.kernel.org;
> > > l...@kernel.org; j...@ziepe.ca; syzkaller-b...@googlegroups.com;
> > > Daniel Jurgens ; dledf...@redhat.com
> > > Cc: Parav Pandit 
> > > Subject: [PATCHv1] RDMA/core: Check error status of
> > > rdma_find_ndev_for_src_ip_rcu
> > >
> > > rdma_find_ndev_for_src_ip_rcu() returns either valid netdev pointer
> > > or ERR_PTR().
> > > Instead of checking for NULL, check for error.
> > >
> > > Fixes: caf1e3ae9fa6 ("RDMA/core Introduce and use
> > > rdma_find_ndev_for_src_ip_rcu")
> > > Reported-by: syzbot+20c32fa6ff84a2d28...@syzkaller.appspotmail.com
> > > Signed-off-by: Parav Pandit 
> > > drivers/infiniband/core/addr.c | 4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/infiniband/core/addr.c
> > > b/drivers/infiniband/core/addr.c index c2ca9e4..3c07eeb 100644
> > > +++ b/drivers/infiniband/core/addr.c
> > > @@ -513,8 +513,8 @@ static int rdma_set_src_addr_rcu(struct
> > > rdma_dev_addr *dev_addr,
> > >* loopback IP address.
> > >*/
> > >   ndev = rdma_find_ndev_for_src_ip_rcu(dev_net(ndev),
> > > dst_in);
> > > - if (!ndev)
> > > - return -ENODEV;
> > > + if (IS_ERR(ndev))
> > > + return PTR_ERR(ndev);
> > >   }
> > >
> > >   return copy_src_l2_addr(dev_addr, dst_in, dst, ndev);
> >
> > Can you please review this fix?  I got below report from syzbot that
> > it tested the patch and reproducer didn't trigger.
> 
> It is very strange, but this patch does not show up in rdma's patch works.
> 
> This happened to Dennis as well for one patch, I'm afraid as a general rule,
> people will need to check that patchworks has their patches, and maybe talk
> to LF IT about why things have gone missing.
> 
> I would guess it is some spam filter issue?
Not sure, maybe something is wrong in my mail client configuration.
Most patches go through Leon, so I will continue that way for now.

> 
> I have applied this patch from my email.
Thanks a lot.



RE: [PATCHv1] RDMA/core: Check error status of rdma_find_ndev_for_src_ip_rcu

2018-10-03 Thread Parav Pandit
Hi Doug, Jason,

> -Original Message-
> From: Parav Pandit 
> Sent: Friday, September 21, 2018 10:00 AM
> To: linux-r...@vger.kernel.org; linux-kernel@vger.kernel.org;
> l...@kernel.org; j...@ziepe.ca; syzkaller-b...@googlegroups.com; Daniel
> Jurgens ; dledf...@redhat.com
> Cc: Parav Pandit 
> Subject: [PATCHv1] RDMA/core: Check error status of
> rdma_find_ndev_for_src_ip_rcu
> 
> rdma_find_ndev_for_src_ip_rcu() returns either valid netdev pointer or
> ERR_PTR().
> Instead of checking for NULL, check for error.
> 
> Fixes: caf1e3ae9fa6 ("RDMA/core Introduce and use
> rdma_find_ndev_for_src_ip_rcu")
> Reported-by: syzbot+20c32fa6ff84a2d28...@syzkaller.appspotmail.com
> Signed-off-by: Parav Pandit 
> ---
>  drivers/infiniband/core/addr.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
> index c2ca9e4..3c07eeb 100644
> --- a/drivers/infiniband/core/addr.c
> +++ b/drivers/infiniband/core/addr.c
> @@ -513,8 +513,8 @@ static int rdma_set_src_addr_rcu(struct
> rdma_dev_addr *dev_addr,
>* loopback IP address.
>*/
>   ndev = rdma_find_ndev_for_src_ip_rcu(dev_net(ndev),
> dst_in);
> - if (!ndev)
> - return -ENODEV;
> + if (IS_ERR(ndev))
> + return PTR_ERR(ndev);
>   }
> 
>   return copy_src_l2_addr(dev_addr, dst_in, dst, ndev);
> --
> 1.8.3.1

Can you please review this fix?
I got the report below from syzbot: it tested the patch and the reproducer
didn't trigger.

Report:

syzbot has tested the proposed patch and the reproducer did not trigger crash:

Reported-and-tested-by: syzbot+20c32fa6ff84a2d28...@syzkaller.appspotmail.com

Tested on:

commit:         41ab1cb7d1cd RDMA/cma: Introduce and use cma_ib_acquire_de..
git tree:       git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git for-next
kernel config:  https://syzkaller.appspot.com/x/.config?x=112cc1aec8b19ba4
compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
patch:          https://syzkaller.appspot.com/x/patch.diff?x=1055823140

Note: testing is done by a robot and is best-effort only.



RE: linux-next: manual merge of the rdma tree with Linus' tree

2018-09-28 Thread Parav Pandit
Hi Stephen,

> -Original Message-
> From: Stephen Rothwell 
> Sent: Thursday, September 27, 2018 7:01 PM
> To: Doug Ledford ; Jason Gunthorpe
> 
> Cc: Linux-Next Mailing List ; Linux Kernel
> Mailing List ; Parav Pandit
> 
> Subject: linux-next: manual merge of the rdma tree with Linus' tree
> 
> Hi all,
> 
> Today's linux-next merge of the rdma tree got a conflict in:
> 
>   drivers/infiniband/core/cache.c
> 
> between commit:
> 
>   5c5702e259dc ("RDMA/core: Set right entry state before releasing
> reference")
> 
> from Linus' tree and commit:
> 
>   43c7c851b9bc ("RDMA/core: Use dev_err/dbg/etc instead of pr_* + ibdev-
> >name")
> 
> from the rdma tree.
> 
> I fixed it up (see below) and can carry the fix as necessary. This is now 
> fixed
> as far as linux-next is concerned, but any non trivial conflicts should be
> mentioned to your upstream maintainer when your tree is submitted for
> merging.  You may also want to consider cooperating with the maintainer of
> the conflicting tree to minimise any particularly complex conflicts.
> 

Sorry for the late reply. For some reason the mail ended up in my spam folder,
which I noticed only now.
I should have watched the device naming series.
My fix went to for-rc and I guess it wasn't applied to for-next, which resulted
in this merge conflict.
My understanding is that maintainers usually try to avoid merge conflicts
between the for-next and for-rc branches before sending a pull request from the
rdma tree.
I will try to be more careful in future to notify the maintainer about it.

Your changes below look good. Thanks.
Reviewed-by: Parav Pandit  
> --
> Cheers,
> Stephen Rothwell
> 
> diff --cc drivers/infiniband/core/cache.c
> index 3208ad6ad540,ebc64418d809..
> --- a/drivers/infiniband/core/cache.c
> +++ b/drivers/infiniband/core/cache.c
> @@@ -337,39 -335,6 +335,38 @@@ static int add_roce_gid(struct ib_gid_t
>   return 0;
>   }
> 
>  +/**
>  + * del_gid - Delete GID table entry
>  + *
>  + * @ib_dev: IB device whose GID entry to be deleted
>  + * @port:   Port number of the IB device
>  + * @table:  GID table of the IB device for a port
>  + * @ix: GID entry index to delete
>  + *
>  + */
>  +static void del_gid(struct ib_device *ib_dev, u8 port,
>  +struct ib_gid_table *table, int ix)
>  +{
>  +struct ib_gid_table_entry *entry;
>  +
>  +lockdep_assert_held(&table->lock);
>  +
> - pr_debug("%s device=%s port=%d index=%d gid %pI6\n", __func__,
> -  ib_dev->name, port, ix,
> -  table->data_vec[ix]->attr.gid.raw);
> ++dev_dbg(&ib_dev->dev, "%s port=%d index=%d gid %pI6\n", __func__, port,
> ++ix, table->data_vec[ix]->attr.gid.raw);
>  +
>  +write_lock_irq(&table->rwlock);
>  +entry = table->data_vec[ix];
>  +entry->state = GID_TABLE_ENTRY_PENDING_DEL;
>  +/*
>  + * For non RoCE protocol, GID entry slot is ready to use.
>  + */
>  +if (!rdma_protocol_roce(ib_dev, port))
>  +table->data_vec[ix] = NULL;
>  +write_unlock_irq(&table->rwlock);
>  +
>  +put_gid_entry_locked(entry);
>  +}
>  +
>   /**
>* add_modify_gid - Add or modify GID table entry
>*
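
The ordering that the merged del_gid() keeps from the for-rc fix (mark the
entry as pending deletion, release the slot only for non-RoCE ports, and drop
the reference last) can be sketched in plain userspace C. This is only an
illustration under stated assumptions: the types gid_entry and gid_table and
the helpers put_entry() and del_entry() are hypothetical stand-ins, not the
kernel's structures, and the write_lock_irq() section is omitted.

```c
#include <stddef.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's GID table types. */
enum gid_state { GID_VALID, GID_PENDING_DEL };

struct gid_entry {
	enum gid_state state;
	int refs;
};

struct gid_table {
	struct gid_entry *slots[4];
};

/* Drop one reference; returns true when the last reference is gone. */
static bool put_entry(struct gid_entry *e)
{
	return --e->refs == 0;
}

/*
 * Same order of steps as the quoted del_gid(): set the state first,
 * clear the slot only for non-RoCE ports, release the reference last.
 * The kernel performs the first two steps under a write lock; locking
 * is left out of this sketch.
 */
static bool del_entry(struct gid_table *t, int ix, bool is_roce)
{
	struct gid_entry *e = t->slots[ix];

	e->state = GID_PENDING_DEL;
	if (!is_roce)
		t->slots[ix] = NULL;
	return put_entry(e);
}
```

With this order, any holder of a remaining reference already sees the
pending-delete state by the time the deleting path gives up its own reference.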


RE: general protection fault in addr_resolve

2018-09-21 Thread Parav Pandit


> -----Original Message-----
> From: linux-rdma-ow...@vger.kernel.org On Behalf Of Parav Pandit
> Sent: Friday, September 21, 2018 8:53 AM
> To: syzbot ;
> Daniel Jurgens ; dledf...@redhat.com;
> j...@ziepe.ca; l...@kernel.org; linux-kernel@vger.kernel.org; linux-
> r...@vger.kernel.org; syzkaller-b...@googlegroups.com
> Subject: RE: general protection fault in addr_resolve
> 
> 
> 
> > -----Original Message-----
> > From: syzbot 
> > Sent: Friday, September 21, 2018 3:35 AM
> > To: Daniel Jurgens ; dledf...@redhat.com;
> > j...@ziepe.ca; l...@kernel.org; linux-kernel@vger.kernel.org; linux-
> > r...@vger.kernel.org; Parav Pandit ; syzkaller-
> > b...@googlegroups.com
> > Subject: general protection fault in addr_resolve
> >
> > Hello,
> >
> > syzbot found the following crash on:
> >
> > HEAD commit:a0cb0cabe4bb Add linux-next specific files for 20180920
> > git tree:   linux-next
> > console output:
> > https://syzkaller.appspot.com/x/log.txt?x=14b14e2a40
> > kernel config:
> > https://syzkaller.appspot.com/x/.config?x=786006c5dafbadf6
> > dashboard link:
> > https://syzkaller.appspot.com/bug?extid=20c32fa6ff84a2d28c36
> > compiler:   gcc (GCC) 8.0.1 20180413 (experimental)
> >
> > Unfortunately, I don't have any reproducer for this crash yet.
> >
> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > Reported-by: syzbot+20c32fa6ff84a2d28...@syzkaller.appspotmail.com
> >
> > kasan: GPF could be caused by NULL-ptr deref or user memory access
> > general protection fault:  [#1] PREEMPT SMP KASAN
> > CPU: 0 PID: 976 Comm: syz-executor4 Not tainted 4.19.0-rc4-next-20180920+
> > #76
> > Hardware name: Google Google Compute Engine/Google Compute Engine,
> > BIOS Google 01/01/2011
> > RIP: 0010:copy_src_l2_addr drivers/infiniband/core/addr.c:489 [inline]
> > RIP: 0010:rdma_set_src_addr_rcu drivers/infiniband/core/addr.c:518 [inline]
> > RIP: 0010:addr_resolve+0x7c4/0x1c90 drivers/infiniband/core/addr.c:593
> > Code: 0f 84 dd 01 00 00 e8 9b 6c e9 fb 48 8b 85 d8 fd ff ff 48 8d b8 54 02
> > 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48
> > 89 f8 83 e0 07 83 c0 01 38 d0 7c 08 84 d2 0f 85 24
> > RSP: 0018:8801c863f360 EFLAGS: 00010202
> > RAX: dc00 RBX: 8801c863f578 RCX: c90003e76000
> > RDX: 003e RSI: 8593e355 RDI: 01f1
> > RBP: 8801c863f5a0 R08: 8801d95722c0 R09: ed003b585b57
> > R10: ed003b585b57 R11: 8801dac2dabb R12: 8801d78a1560
> > R13: 0001 R14: 000a R15: 
> > kobject: 'loop5' (9a4a0383): kobject_uevent_env
> > FS:  7fefce9ca700() GS:8801dac0()
> > knlGS:
> > CS:  0010 DS:  ES:  CR0: 80050033
> > CR2: 00706158 CR3: 0001d370a000 CR4: 001406f0
> > Call Trace:
> > kobject: 'loop5' (9a4a0383): fill_kobj_path: path =
> > '/devices/virtual/block/loop5'
> >   rdma_resolve_ip+0x499/0x790 drivers/infiniband/core/addr.c:697
> >   rdma_resolve_addr+0x2d6/0x2870 drivers/infiniband/core/cma.c:2992
> > kobject: 'loop3' (f158c859): kobject_uevent_env
> > kobject: 'loop3' (f158c859): fill_kobj_path: path =
> > '/devices/virtual/block/loop3'
> >   ucma_resolve_ip+0x242/0x2a0 drivers/infiniband/core/ucma.c:713
> >   ucma_write+0x336/0x420 drivers/infiniband/core/ucma.c:1686
> >   __vfs_write+0x119/0x9f0 fs/read_write.c:485
> > kobject: 'loop2' (ed1b199d): kobject_uevent_env
> >   vfs_write+0x1fc/0x560 fs/read_write.c:549
> > kobject: 'loop2' (ed1b199d): fill_kobj_path: path =
> > '/devices/virtual/block/loop2'
> >   ksys_write+0x101/0x260 fs/read_write.c:598
> >   __do_sys_write fs/read_write.c:610 [inline]
> >   __se_sys_write fs/read_write.c:607 [inline]
> >   __x64_sys_write+0x73/0xb0 fs/read_write.c:607
> >   do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
> >   entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > RIP: 0033:0x457679
> > Code: 1d b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7
> > 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f
> > 83 eb b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00
> > RSP: 002b:7fefce9c9c78 EFLAGS: 0246 ORIG_RAX: 0001
> > RAX: ffda RBX: 7fefce9ca6d4 RCX: 00457679
> > RDX: 00

RE: general protection fault in addr_resolve

2018-09-21 Thread Parav Pandit


> -----Original Message-----
> From: syzbot 
> Sent: Friday, September 21, 2018 3:35 AM
> To: Daniel Jurgens ; dledf...@redhat.com;
> j...@ziepe.ca; l...@kernel.org; linux-kernel@vger.kernel.org; linux-
> r...@vger.kernel.org; Parav Pandit ; syzkaller-
> b...@googlegroups.com
> Subject: general protection fault in addr_resolve
> 
> Hello,
> 
> syzbot found the following crash on:
> 
> HEAD commit:a0cb0cabe4bb Add linux-next specific files for 20180920
> git tree:   linux-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=14b14e2a40
> kernel config:  https://syzkaller.appspot.com/x/.config?x=786006c5dafbadf6
> dashboard link:
> https://syzkaller.appspot.com/bug?extid=20c32fa6ff84a2d28c36
> compiler:   gcc (GCC) 8.0.1 20180413 (experimental)
> 
> Unfortunately, I don't have any reproducer for this crash yet.
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+20c32fa6ff84a2d28...@syzkaller.appspotmail.com
> 
> kasan: GPF could be caused by NULL-ptr deref or user memory access
> general protection fault:  [#1] PREEMPT SMP KASAN
> CPU: 0 PID: 976 Comm: syz-executor4 Not tainted 4.19.0-rc4-next-20180920+
> #76
> Hardware name: Google Google Compute Engine/Google Compute Engine,
> BIOS Google 01/01/2011
> RIP: 0010:copy_src_l2_addr drivers/infiniband/core/addr.c:489 [inline]
> RIP: 0010:rdma_set_src_addr_rcu drivers/infiniband/core/addr.c:518 [inline]
> RIP: 0010:addr_resolve+0x7c4/0x1c90 drivers/infiniband/core/addr.c:593
> Code: 0f 84 dd 01 00 00 e8 9b 6c e9 fb 48 8b 85 d8 fd ff ff 48 8d b8 54 02
> 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48
> 89 f8 83 e0 07 83 c0 01 38 d0 7c 08 84 d2 0f 85 24
> RSP: 0018:8801c863f360 EFLAGS: 00010202
> RAX: dc00 RBX: 8801c863f578 RCX: c90003e76000
> RDX: 003e RSI: 8593e355 RDI: 01f1
> RBP: 8801c863f5a0 R08: 8801d95722c0 R09: ed003b585b57
> R10: ed003b585b57 R11: 8801dac2dabb R12: 8801d78a1560
> R13: 0001 R14: 000a R15: 
> kobject: 'loop5' (9a4a0383): kobject_uevent_env
> FS:  7fefce9ca700() GS:8801dac0()
> knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 00706158 CR3: 0001d370a000 CR4: 001406f0
> Call Trace:
> kobject: 'loop5' (9a4a0383): fill_kobj_path: path =
> '/devices/virtual/block/loop5'
>   rdma_resolve_ip+0x499/0x790 drivers/infiniband/core/addr.c:697
>   rdma_resolve_addr+0x2d6/0x2870 drivers/infiniband/core/cma.c:2992
> kobject: 'loop3' (f158c859): kobject_uevent_env
> kobject: 'loop3' (f158c859): fill_kobj_path: path =
> '/devices/virtual/block/loop3'
>   ucma_resolve_ip+0x242/0x2a0 drivers/infiniband/core/ucma.c:713
>   ucma_write+0x336/0x420 drivers/infiniband/core/ucma.c:1686
>   __vfs_write+0x119/0x9f0 fs/read_write.c:485
> kobject: 'loop2' (ed1b199d): kobject_uevent_env
>   vfs_write+0x1fc/0x560 fs/read_write.c:549
> kobject: 'loop2' (ed1b199d): fill_kobj_path: path =
> '/devices/virtual/block/loop2'
>   ksys_write+0x101/0x260 fs/read_write.c:598
>   __do_sys_write fs/read_write.c:610 [inline]
>   __se_sys_write fs/read_write.c:607 [inline]
>   __x64_sys_write+0x73/0xb0 fs/read_write.c:607
>   do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x457679
> Code: 1d b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7
> 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f
> 83 eb b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00
> RSP: 002b:7fefce9c9c78 EFLAGS: 0246 ORIG_RAX: 0001
> RAX: ffda RBX: 7fefce9ca6d4 RCX: 00457679
> RDX: 0048 RSI: 2100 RDI: 0004
> RBP: 0072bf00 R08:  R09: 
> R10:  R11: 0246 R12: 
> R13: 004d8a48 R14: 004cb698 R15: 
> Modules linked in:
> ---[ end trace 416e29924dbdc1a0 ]---
> RIP: 0010:copy_src_l2_addr drivers/infiniband/core/addr.c:489 [inline]
> RIP: 0010:rdma_set_src_addr_rcu drivers/infiniband/core/addr.c:518 [inline]
> RIP: 0010:addr_resolve+0x7c4/0x1c90 drivers/infiniband/core/addr.c:593
> Code: 0f 84 dd 01 00 00 e8 9b 6c e9 fb 48 8b 85 d8 fd ff ff 48 8d b8 54 02
> 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48
> 89 f8 83 e0 07 83 c0 01 38 d0 7c 08 84 d2 0f 85 24
> kobject: 'loop5' (9a4a0383): kobject_uevent_env
> kobject: 'loop5' (9a4a0383): fill_kobj_path: p
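
The "GPF could be caused by NULL-ptr deref" hint in the report above can be
reproduced by hand from the Code: bytes. The sequence
"48 b8 00 00 00 00 00 fc ff df" loads the constant 0xdffffc0000000000,
"48 c1 ea 03" shifts the address right by 3, and the faulting "<0f> b6 14 02"
reads the byte at the sum of the two. The sketch below redoes that arithmetic.
It rests on assumptions: that the archived register dump lost its leading
zeros so that RDI is 0x1f1 and RDX is 0x3e, and that the machine uses 48-bit
canonical addresses. The helper names shadow_addr() and is_canonical() are
hypothetical.

```c
#include <stdint.h>

/* Constant loaded by the movabs in the Code: line of the report. */
#define SHADOW_OFFSET 0xdffffc0000000000ULL

/* Address of the shadow byte for addr: one shadow byte per 8 bytes. */
static uint64_t shadow_addr(uint64_t addr)
{
	return (addr >> 3) + SHADOW_OFFSET;
}

/*
 * Canonical-address check, assuming 48-bit virtual addresses:
 * bits 63..47 must be all zeros or all ones.
 */
static int is_canonical(uint64_t addr)
{
	uint64_t top = addr >> 47;

	return top == 0 || top == 0x1ffff;
}
```

Under those assumptions, a near-NULL pointer such as 0x1f1 maps to a shadow
address that is not canonical, so the shadow load itself raises the general
protection fault before the real access is attempted.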

RE: [PATCH -next] RDMA/core: Properly return the error code of rdma_set_src_addr_rcu

2018-09-19 Thread Parav Pandit
Hi YueHaibing,

> -----Original Message-----
> From: linux-rdma-ow...@vger.kernel.org On Behalf Of Parav Pandit
> Sent: Wednesday, September 19, 2018 8:47 AM
> To: YueHaibing ; dledf...@redhat.com;
> j...@ziepe.ca; l...@kernel.org; Daniel Jurgens 
> Cc: linux-kernel@vger.kernel.org; linux-r...@vger.kernel.org
> Subject: RE: [PATCH -next] RDMA/core: Properly return the error code of
> rdma_set_src_addr_rcu
> 
> 
> 
> > -----Original Message-----
> > From: YueHaibing 
> > Sent: Wednesday, September 19, 2018 7:29 AM
> > To: dledf...@redhat.com; j...@ziepe.ca; l...@kernel.org; Parav Pandit
> > ; Daniel Jurgens 
> > Cc: linux-kernel@vger.kernel.org; linux-r...@vger.kernel.org;
> > YueHaibing 
> > Subject: [PATCH -next] RDMA/core: Properly return the error code of
> > rdma_set_src_addr_rcu
> >
> > rdma_set_src_addr_rcu should check copy_src_l2_addr fails, rather than
> > always return 0. Also copy_src_l2_addr should return 'ret' as its
> > return value while rdma_translate_ip fails.
> >
> > Fixes: c31d4b2ddf07 ("RDMA/core: Protect against changing dst->dev
> > during destination resolve")
> > Signed-off-by: YueHaibing 
> > ---
> >  drivers/infiniband/core/addr.c | 9 ++---
> >  1 file changed, 6 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
> > index 7a0356c..8a31b11 100644
> > --- a/drivers/infiniband/core/addr.c
> > +++ b/drivers/infiniband/core/addr.c
> > @@ -468,7 +468,7 @@ static int addr_resolve_neigh(const struct dst_entry *dst,
> > return ret;
> >  }
> >
> > -static void copy_src_l2_addr(struct rdma_dev_addr *dev_addr,
> > +static int copy_src_l2_addr(struct rdma_dev_addr *dev_addr,
> >  const struct sockaddr *dst_in,
> >  const struct dst_entry *dst,
> >  const struct net_device *ndev)
> > @@ -492,6 +492,8 @@ static void copy_src_l2_addr(struct rdma_dev_addr *dev_addr,
> > RDMA_NETWORK_IPV6;
> > else
> > dev_addr->network = RDMA_NETWORK_IB;
> > +
> > +   return ret;
> >  }
> >
> >  static int rdma_set_src_addr_rcu(struct rdma_dev_addr *dev_addr,
> > @@ -499,6 +501,7 @@ static int rdma_set_src_addr_rcu(struct rdma_dev_addr *dev_addr,
> >  const struct sockaddr *dst_in,
> >  const struct dst_entry *dst)
> >  {
> > +   int ret;
> > struct net_device *ndev = READ_ONCE(dst->dev);
> >
> > *ndev_flags = ndev->flags;
> > @@ -515,8 +518,8 @@ static int rdma_set_src_addr_rcu(struct rdma_dev_addr *dev_addr,
> > return -ENODEV;
> > }
> >
> > -   copy_src_l2_addr(dev_addr, dst_in, dst, ndev);
> > -   return 0;
> > +   ret = copy_src_l2_addr(dev_addr, dst_in, dst, ndev);
> > +   return ret;
> >  }
> >
> >  static int set_addr_netns_by_gid_rcu(struct rdma_dev_addr *addr)
> > --
> > 2.7.0
> >
> Reviewed-by: Parav Pandit 

Can you please fix the alignment of the remaining function arguments of
copy_src_l2_addr(), now that its return type changes from void to int?
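
For illustration, the alignment being asked for looks roughly like the sketch
below. "int" is one character shorter than "void", so the opening parenthesis
moves one column to the left and each continuation line has to follow it. The
forward declarations and the empty body exist only so the fragment compiles
outside the kernel tree; they are assumptions, not the real function. The
kernel itself indents with tabs plus spaces, and spaces are used here only to
keep the columns visible.

```c
#include <stddef.h>

/* Forward declarations so the prototype compiles on its own. */
struct rdma_dev_addr;
struct sockaddr;
struct dst_entry;
struct net_device;

/* Continuation lines start under the first parameter. */
static int copy_src_l2_addr(struct rdma_dev_addr *dev_addr,
                            const struct sockaddr *dst_in,
                            const struct dst_entry *dst,
                            const struct net_device *ndev)
{
	/* Placeholder body: the real function copies the L2 address. */
	(void)dev_addr;
	(void)dst_in;
	(void)dst;
	(void)ndev;
	return 0;
}
```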



RE: [PATCH -next] RDMA/core: Properly return the error code of rdma_set_src_addr_rcu

2018-09-19 Thread Parav Pandit



> -----Original Message-----
> From: YueHaibing 
> Sent: Wednesday, September 19, 2018 7:29 AM
> To: dledf...@redhat.com; j...@ziepe.ca; l...@kernel.org; Parav Pandit
> ; Daniel Jurgens 
> Cc: linux-kernel@vger.kernel.org; linux-r...@vger.kernel.org; YueHaibing
> 
> Subject: [PATCH -next] RDMA/core: Properly return the error code of
> rdma_set_src_addr_rcu
> 
> rdma_set_src_addr_rcu should check copy_src_l2_addr fails, rather than
> always return 0. Also copy_src_l2_addr should return 'ret' as its return value
> while rdma_translate_ip fails.
> 
> Fixes: c31d4b2ddf07 ("RDMA/core: Protect against changing dst->dev during
> destination resolve")
> Signed-off-by: YueHaibing 
> ---
>  drivers/infiniband/core/addr.c | 9 ++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
> index 7a0356c..8a31b11 100644
> --- a/drivers/infiniband/core/addr.c
> +++ b/drivers/infiniband/core/addr.c
> @@ -468,7 +468,7 @@ static int addr_resolve_neigh(const struct dst_entry *dst,
>   return ret;
>  }
> 
> -static void copy_src_l2_addr(struct rdma_dev_addr *dev_addr,
> +static int copy_src_l2_addr(struct rdma_dev_addr *dev_addr,
>const struct sockaddr *dst_in,
>const struct dst_entry *dst,
>const struct net_device *ndev)
> @@ -492,6 +492,8 @@ static void copy_src_l2_addr(struct rdma_dev_addr *dev_addr,
>   RDMA_NETWORK_IPV6;
>   else
>   dev_addr->network = RDMA_NETWORK_IB;
> +
> + return ret;
>  }
> 
>  static int rdma_set_src_addr_rcu(struct rdma_dev_addr *dev_addr,
> @@ -499,6 +501,7 @@ static int rdma_set_src_addr_rcu(struct rdma_dev_addr *dev_addr,
>const struct sockaddr *dst_in,
>const struct dst_entry *dst)
>  {
> + int ret;
>   struct net_device *ndev = READ_ONCE(dst->dev);
> 
>   *ndev_flags = ndev->flags;
> @@ -515,8 +518,8 @@ static int rdma_set_src_addr_rcu(struct rdma_dev_addr *dev_addr,
>   return -ENODEV;
>   }
> 
> - copy_src_l2_addr(dev_addr, dst_in, dst, ndev);
> - return 0;
> + ret = copy_src_l2_addr(dev_addr, dst_in, dst, ndev);
> + return ret;
>  }
> 
>  static int set_addr_netns_by_gid_rcu(struct rdma_dev_addr *addr)
> --
> 2.7.0
> 
Reviewed-by: Parav Pandit 
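
The behaviour change described in the patch (return the callee's status
instead of a hard-coded 0) can be sketched outside the kernel as follows.
Everything here is a hypothetical stand-in: translate_ip_stub() plays the role
of rdma_translate_ip(), the two *_sketch() functions only mirror the shape of
copy_src_l2_addr() and rdma_set_src_addr_rcu() after the patch, and the
needs_translation and should_fail flags are invented for the example.

```c
#include <errno.h>

/* Hypothetical stand-in for rdma_translate_ip(); fails on request. */
static int translate_ip_stub(int should_fail)
{
	return should_fail ? -EINVAL : 0;
}

/* Shape of copy_src_l2_addr() after the patch: report the result. */
static int copy_l2_addr_sketch(int needs_translation, int should_fail)
{
	int ret = 0;

	if (needs_translation)
		ret = translate_ip_stub(should_fail);
	return ret;
}

/*
 * Shape of rdma_set_src_addr_rcu() after the patch: propagate the
 * callee's status rather than always returning 0.
 */
static int set_src_addr_sketch(int needs_translation, int should_fail)
{
	return copy_l2_addr_sketch(needs_translation, should_fail);
}
```

A caller of set_src_addr_sketch(1, 1) now sees -EINVAL, where the pre-patch
shape would have reported success.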


RE: [PATCH -next] RDMA/core: Properly return the error code of rdma_set_src_addr_rcu

2018-09-19 Thread Parav Pandit



> -Original Message-
> From: YueHaibing 
> Sent: Wednesday, September 19, 2018 7:29 AM
> To: dledf...@redhat.com; j...@ziepe.ca; l...@kernel.org; Parav Pandit
> ; Daniel Jurgens 
> Cc: linux-kernel@vger.kernel.org; linux-r...@vger.kernel.org; YueHaibing
> 
> Subject: [PATCH -next] RDMA/core: Properly return the error code of
> rdma_set_src_addr_rcu
> 
> rdma_set_src_addr_rcu should check copy_src_l2_addr fails, rather than
> always return 0. Also copy_src_l2_addr should return 'ret' as its return value
> while rdma_translate_ip fails.
> 
> Fixes: c31d4b2ddf07 ("RDMA/core: Protect against changing dst->dev during
> destination resolve")
> Signed-off-by: YueHaibing 
> ---
>  drivers/infiniband/core/addr.c | 9 ++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
> index 7a0356c..8a31b11 100644
> --- a/drivers/infiniband/core/addr.c
> +++ b/drivers/infiniband/core/addr.c
> @@ -468,7 +468,7 @@ static int addr_resolve_neigh(const struct dst_entry
> *dst,
>   return ret;
>  }
> 
> -static void copy_src_l2_addr(struct rdma_dev_addr *dev_addr,
> +static int copy_src_l2_addr(struct rdma_dev_addr *dev_addr,
>const struct sockaddr *dst_in,
>const struct dst_entry *dst,
>const struct net_device *ndev)
> @@ -492,6 +492,8 @@ static void copy_src_l2_addr(struct rdma_dev_addr
> *dev_addr,
>   RDMA_NETWORK_IPV6;
>   else
>   dev_addr->network = RDMA_NETWORK_IB;
> +
> + return ret;
>  }
> 
>  static int rdma_set_src_addr_rcu(struct rdma_dev_addr *dev_addr, @@ -
> 499,6 +501,7 @@ static int rdma_set_src_addr_rcu(struct rdma_dev_addr
> *dev_addr,
>const struct sockaddr *dst_in,
>const struct dst_entry *dst)
>  {
> + int ret;
>   struct net_device *ndev = READ_ONCE(dst->dev);
> 
>   *ndev_flags = ndev->flags;
> @@ -515,8 +518,8 @@ static int rdma_set_src_addr_rcu(struct
> rdma_dev_addr *dev_addr,
>   return -ENODEV;
>   }
> 
> - copy_src_l2_addr(dev_addr, dst_in, dst, ndev);
> - return 0;
> + ret = copy_src_l2_addr(dev_addr, dst_in, dst, ndev);
> + return ret;
>  }
> 
>  static int set_addr_netns_by_gid_rcu(struct rdma_dev_addr *addr)
> --
> 2.7.0
> 
Reviewed-by: Parav Pandit 
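
The shape of the change reviewed above can be pictured with a small stand-alone sketch. It is not the code in drivers/infiniband/core/addr.c: the names translate_addr(), copy_l2_addr() and set_src_addr() are hypothetical stand-ins, and the should_fail flag exists only to make the failure path easy to exercise. The point is simply that a helper which used to be void now hands its error code back, and the caller returns it instead of an unconditional 0.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a translation step that may fail. */
static int translate_addr(int should_fail)
{
	return should_fail ? -EADDRNOTAVAIL : 0;
}

/* A void helper would silently drop 'ret'; returning int keeps it. */
static int copy_l2_addr(int should_fail, int *network)
{
	int ret;

	assert(network != NULL);
	ret = translate_addr(should_fail);
	*network = 1;	/* illustrative value only */
	return ret;
}

/* The caller propagates the helper's result instead of returning 0. */
static int set_src_addr(int should_fail, int *network)
{
	return copy_l2_addr(should_fail, network);
}
```

With this shape, a failure in the innermost step is visible to whoever calls set_src_addr(), which is the behaviour the commit message asks for.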


RE: [RDMA bug] KASAN: use-after-free Read in __list_del_entry_valid (4)

2018-08-23 Thread Parav Pandit
Hi Doug,

> -----Original Message-----
> From: Doug Ledford 
> Sent: Thursday, August 23, 2018 11:56 AM
> To: Parav Pandit ; Jason Gunthorpe ;
> Eric Biggers 
> Cc: linux-r...@vger.kernel.org; dasaratharaman.chandramo...@intel.com;
> Leon Romanovsky ; linux-kernel@vger.kernel.org;
> Mark Bloch ; Moni Shoua ;
> syzkaller-b...@googlegroups.com; syzbot
> 
> Subject: Re: [RDMA bug] KASAN: use-after-free Read in __list_del_entry_valid
> (4)
> 
> On Thu, 2018-08-23 at 16:39 +, Parav Pandit wrote:
> > > -----Original Message-----
> > > From: Jason Gunthorpe 
> > > Sent: Thursday, August 23, 2018 9:55 AM
> > > To: Eric Biggers 
> > > Cc: Doug Ledford ; linux-r...@vger.kernel.org;
> > > dasaratharaman.chandramo...@intel.com; Leon Romanovsky
> > > ; linux-kernel@vger.kernel.org; Mark Bloch
> > > ; Moni Shoua ; Parav
> Pandit
> > > ; syzkaller-b...@googlegroups.com; syzbot
> > > 
> > > Subject: Re: [RDMA bug] KASAN: use-after-free Read in
> > > __list_del_entry_valid
> > > (4)
> > >
> > > On Wed, Aug 22, 2018 at 11:16:31PM -0700, Eric Biggers wrote:
> > > > Hello RDMA / InfiniBand maintainers,
> > > >
> > > > This is an RDMA bug and it still occurs on Linus' tree as of today
> > > > (commit 815f0ddb346c1960).
> > > >
> > > > I've also simplified the reproducer for it; see below after the original
> > > > report.
> > > > Apparently it involves a race between RDMA_USER_CM_CMD_RESOLVE_IP and
> > > > RDMA_USER_CM_CMD_LISTEN.
> > >
> > > That is an amazing reproducer!
> > >
> > > I have a feeling this is the same cause as all the other syzkaller bugs
> > > in this code:
> > > lack of any sane locking at all :\
> > >
> > > We've talked about chucking a big lock around this whole thing, but
> > > nobody has done it yet.. It isn't so simple.
> > >
> >
> > I had some code in which reduces three locks (handler_lock, qp_mutex,
> > id_lock) to single mutex to protect the cm_id and protects every exported
> > symbol of rdmacm which works on cm_id.
> > But not ready enough to post it as patch yet. Lot of tests required before
> > I get there and some refactor too before that.
> 
> Does it finally address the fact that the rdmacm code was written so that it
> was always synchronous but RoCE src gid (I think that's what it was, I'm typing
> this from long ago memory) lookup broke that assumption?
> 
I am not sure.
To me it is unlikely, because rdma_resolve_route() for InfiniBand is not
synchronous either; it needs to query the SA.
But qp_mutex existed long before that, and it doesn't provide any performance
improvement (by being split out as a 3rd lock instead of reusing id_lock and
handler_lock), and so on.


> --
> Doug Ledford 
> GPG KeyID: B826A3330E572FDD
> Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD



RE: [RDMA bug] KASAN: use-after-free Read in __list_del_entry_valid (4)

2018-08-23 Thread Parav Pandit



> -----Original Message-----
> From: Jason Gunthorpe 
> Sent: Thursday, August 23, 2018 9:55 AM
> To: Eric Biggers 
> Cc: Doug Ledford ; linux-r...@vger.kernel.org;
> dasaratharaman.chandramo...@intel.com; Leon Romanovsky
> ; linux-kernel@vger.kernel.org; Mark Bloch
> ; Moni Shoua ; Parav Pandit
> ; syzkaller-b...@googlegroups.com; syzbot
> 
> Subject: Re: [RDMA bug] KASAN: use-after-free Read in __list_del_entry_valid
> (4)
> 
> On Wed, Aug 22, 2018 at 11:16:31PM -0700, Eric Biggers wrote:
> > Hello RDMA / InfiniBand maintainers,
> >
> > This is an RDMA bug and it still occurs on Linus' tree as of today
> > (commit 815f0ddb346c1960).
> >
> > I've also simplified the reproducer for it; see below after the original
> > report.
> > Apparently it involves a race between RDMA_USER_CM_CMD_RESOLVE_IP and
> > RDMA_USER_CM_CMD_LISTEN.
> 
> That is an amazing reproducer!
> 
> I have a feeling this is the same cause as all the other syzkaller bugs in
> this code:
> lack of any sane locking at all :\
> 
> We've talked about chucking a big lock around this whole thing, but nobody has
> done it yet.. It isn't so simple.
> 
I had some code which reduces the three locks (handler_lock, qp_mutex, id_lock)
to a single mutex that protects the cm_id and every exported symbol of rdmacm
which works on cm_id.
But it is not ready enough to post as a patch yet. A lot of tests are required
before I get there, and some refactoring too before that.

> Jason
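
The single-mutex idea described in the reply above can be sketched in user space as follows. This is only an illustration under stated assumptions: struct conn_id, its states and the conn_id_*() functions are hypothetical and are not the rdmacm data structures or API, and pthread mutexes stand in for kernel mutexes. Every entry point takes the same lock before it looks at or changes the object, so two operations racing on one id cannot both proceed.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

enum id_state { ID_IDLE, ID_RESOLVING, ID_LISTENING };

/* Hypothetical connection id guarded by one lock instead of several. */
struct conn_id {
	pthread_mutex_t lock;
	enum id_state state;
};

static void conn_id_init(struct conn_id *id)
{
	assert(id);
	pthread_mutex_init(&id->lock, NULL);
	id->state = ID_IDLE;
}

/* Common helper: state is checked and changed under the same lock. */
static int conn_id_move(struct conn_id *id, enum id_state next)
{
	int ret = 0;

	pthread_mutex_lock(&id->lock);
	if (id->state != ID_IDLE)
		ret = -EBUSY;	/* another operation already owns the id */
	else
		id->state = next;
	pthread_mutex_unlock(&id->lock);
	return ret;
}

/* Each exported entry point goes through the single lock. */
static int conn_id_resolve(struct conn_id *id)
{
	return conn_id_move(id, ID_RESOLVING);
}

static int conn_id_listen(struct conn_id *id)
{
	return conn_id_move(id, ID_LISTENING);
}
```

In this sketch a listen request arriving after a resolve has started gets -EBUSY rather than racing with it. Whether the real rework behaves this way is not stated in the thread.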


RE: [PATCH] IB/mlx5: avoid binding a new mpi unit to the same devices repeatedly

2018-07-23 Thread Parav Pandit
Hi Qing,


> -----Original Message-----
> From: Qing Huang [mailto:qing.hu...@oracle.com]
> Sent: Monday, July 23, 2018 10:36 AM
> To: Daniel Jurgens ; Or Gerlitz
> ; Parav Pandit 
> Cc: Linux Kernel ; RDMA mailing list  r...@vger.kernel.org>; Jason Gunthorpe ; Doug Ledford
> ; Leon Romanovsky ;
> gerald.gib...@oracle.com
> Subject: Re: [PATCH] IB/mlx5: avoid binding a new mpi unit to the same
> devices repeatedly
> 
> 
> 
> On 7/15/2018 12:48 PM, Daniel Jurgens wrote:
> > On 7/14/2018 10:57 AM, Or Gerlitz wrote:
> >> On Sat, Jul 14, 2018 at 2:50 AM, Qing Huang wrote:
> >>> When a CX5 device is configured in dual-port RoCE mode, after
> >>> creating many VFs against port 1, creating the same number of VFs
> >>> against port 2 will flood kernel/syslog with something like
> >>> "mlx5_*:mlx5_ib_bind_slave_port:4266:(pid 5269): port 2 already
> >>> affiliated."
> >>>
> >>> So basically, when traversing mlx5_ib_dev_list,
> >>> mlx5_ib_add_slave_port() shouldn't repeatedly attempt to bind the
> >>> new mpi data unit to every device on the list until it finds an unbound
> >>> device.
> >> Daniel,
> >>
> >> What is mpi data unit?
> > It's a structure to keep track of affiliated port info in dual port RoCE mode,
> > mpi meaning multi-port info. Parav can review this in my absence, otherwise I
> > can take a closer look when I return to the office.
> Hi Daniel/Parav,
> 
> Have you got a chance to review this patch? Thanks!
I haven't had a chance yet.
Will do this week.
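
As an illustration of the behaviour the quoted commit message asks for, here is a minimal stand-alone sketch. It is not the mlx5_ib code and not the patch itself: struct mp_info, struct ib_dev_sketch and bind_first_free() are hypothetical names. The loop skips devices whose port slot is already taken, instead of attempting a bind that can only fail and log a warning, and stops at the first free one.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_PORTS 2

/* Hypothetical per-port record ("multi-port info" in the thread). */
struct mp_info {
	int bound;
};

/* Hypothetical device with one slot per port; NULL means free. */
struct ib_dev_sketch {
	struct mp_info *port[NUM_PORTS];
};

/* Returns the index of the device the record was bound to, or -1. */
static int bind_first_free(struct ib_dev_sketch *devs, size_t ndevs,
			   unsigned int port_idx, struct mp_info *mpi)
{
	size_t i;

	assert(port_idx < NUM_PORTS);
	for (i = 0; i < ndevs; i++) {
		/* Already affiliated: skip quietly, do not try to bind. */
		if (devs[i].port[port_idx])
			continue;
		devs[i].port[port_idx] = mpi;
		mpi->bound = 1;
		return (int)i;
	}
	return -1;
}
```

With two devices, the first two records for a given port land on device 0 and device 1, and a third is rejected without touching either.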


RE: general protection fault in rdma_resolve_route

2018-04-19 Thread Parav Pandit


> -----Original Message-----
> From: syzbot
> [mailto:syzbot+17c13600b3977aa8e...@syzkaller.appspotmail.com]
> Sent: Thursday, April 19, 2018 11:04 AM
> To: Daniel Jurgens <dani...@mellanox.com>;
> dasaratharaman.chandramo...@intel.com; dledf...@redhat.com;
> j...@ziepe.ca; l...@kernel.org; linux-kernel@vger.kernel.org; linux-
> r...@vger.kernel.org; Moni Shoua <mo...@mellanox.com>; Parav Pandit
> <pa...@mellanox.com>; sw...@opengridcomputing.com; syzkaller-
> b...@googlegroups.com
> Subject: general protection fault in rdma_resolve_route
> 
> Hello,
> 
> syzbot hit the following crash on upstream commit
> a27fc14219f2e3c4a46ba9177b04d9b52c875532 (Mon Apr 16 21:07:39 2018
> +) Merge branch 'parisc-4.17-3' of
> git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
> syzbot dashboard link:
> https://syzkaller.appspot.com/bug?extid=17c13600b3977aa8ef7f
> 
> So far this crash happened 2 times on upstream.
> Unfortunately, I don't have any reproducer for this crash yet.
> Raw console output:
> https://syzkaller.appspot.com/x/log.txt?id=6198183931674624
> Kernel config:
> https://syzkaller.appspot.com/x/.config?id=-5914490758943236750
> compiler: gcc (GCC) 8.0.1 20180413 (experimental)
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+17c13600b3977aa8e...@syzkaller.appspotmail.com
> It will help syzbot understand when the bug is fixed. See footer for details.
> If you forward the report, please keep this part and the footer.
> 
> kasan: CONFIG_KASAN_INLINE enabled
> kasan: GPF could be caused by NULL-ptr deref or user memory access
> general protection fault:  [#1] SMP KASAN
> Dumping ftrace buffer:
> (ftrace buffer empty)
> Modules linked in:
> CPU: 1 PID: 750 Comm: syz-executor4 Not tainted 4.17.0-rc1+ #6
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> RIP: 0010:rdma_cap_ib_sa include/rdma/ib_verbs.h:2840 [inline]
> RIP: 0010:rdma_resolve_route+0x134/0x2160 drivers/infiniband/core/cma.c:2668
> RSP: 0018:8801b3e87850 EFLAGS: 00010202
> RAX:  RBX: 8801abf92c00 RCX: 0029
> RDX: dc00 RSI: 0004 RDI: 0148
> RBP: 8801b3e87a00 R08: ed00357f25e5 R09: ed00357f25e4
> R10: ed00357f25e4 R11: 8801abf92f23 R12: 1100367d0f12
> R13: dc00 R14: 8801abf92db8 R15: 
> FS:  7f673e752700() GS:8801db10()
> knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 00a3eab8 CR3: 0001b10e7000 CR4: 001426e0
> DR0:  DR1:  DR2: 
> DR3:  DR6: fffe0ff0 DR7: 0400
> Call Trace:
>   ucma_resolve_route+0x179/0x1c0 drivers/infiniband/core/ucma.c:741
>   ucma_write+0x328/0x410 drivers/infiniband/core/ucma.c:1664
>   __vfs_write+0x10b/0x880 fs/read_write.c:485
>   vfs_write+0x1f8/0x560 fs/read_write.c:549
>   ksys_write+0xf9/0x250 fs/read_write.c:598
>   __do_sys_write fs/read_write.c:610 [inline]
>   __se_sys_write fs/read_write.c:607 [inline]
>   __x64_sys_write+0x73/0xb0 fs/read_write.c:607
>   do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x455329
> RSP: 002b:7f673e751c68 EFLAGS: 0246 ORIG_RAX: 0001
> RAX: ffda RBX: 7f673e7526d4 RCX: 00455329
> RDX: 0010 RSI: 2100 RDI: 0014
> RBP: 0072c010 R08:  R09: 
> R10:  R11: 0246 R12: 
> R13: 06c3 R14: 006fd2e8 R15: 0002
> Code: ff df 48 c1 ea 03 80 3c 02 00 0f 85 14 1c 00 00 48 ba 00 00 00 00 00 fc ff df
> 48 8b 03 48 8d b8 48 01 00 00 48 89 f9 48 c1 e9 03 <80> 3c 11 00 0f 85 d7 1b 00
> 00 45 0f b6 ef 49 c1 e5 04 4c 03 a8
> RIP: rdma_cap_ib_sa include/rdma/ib_verbs.h:2840 [inline] RSP: 8801b3e87850
> RIP: rdma_resolve_route+0x134/0x2160 drivers/infiniband/core/cma.c:2668 RSP: 8801b3e87850
> ---[ end trace c34c2fb6aeff4a19 ]---
> 
> 
> ---
> This bug is generated by a dumb bot. It may contain errors.
> See https://goo.gl/tpsmEJ for details.
> Direct all questions to syzkal...@googlegroups.com.
> 
> syzbot will keep track of this bug report.
> If you forgot to add the Reported-by tag, once the fix for this bug is merged
> into any tree, please reply to this email with:
> #syz fix: exact-commit-title
> To mark this as a duplicate of another syzbot report, please reply with:
> #syz dup: exact-subject-of-another-repo
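
The Code: bytes in the report above can be read as an inline KASAN check, and a small sketch makes the arithmetic concrete. The assumptions are these: the archive has stripped runs of zeros from the register dump, so RDI is taken to be 0x148 and RCX to be 0x29; the constant comes from the bytes "48 ba 00 00 00 00 00 fc ff df", which load 0xdffffc0000000000 into RDX; and "48 c1 e9 03" shifts the address right by 3 before the marked instruction "<80> 3c 11 00" compares the byte at RCX + RDX with zero. The helper name shadow_addr() is made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Shadow offset as loaded into RDX by the quoted Code: bytes. */
#define SHADOW_OFFSET		0xdffffc0000000000ULL
/* One shadow byte describes 8 bytes of memory (shr $3). */
#define SHADOW_SCALE_SHIFT	3

/* Address of the shadow byte that describes 'addr'. */
static uint64_t shadow_addr(uint64_t addr)
{
	return (addr >> SHADOW_SCALE_SHIFT) + SHADOW_OFFSET;
}
```

For addr = 0x148 the shift gives 0x29, matching RCX, and the shadow lookup lands at 0xdffffc0000000029, which is not a canonical x86-64 address, so the check itself faults. That is consistent with the "NULL-ptr deref" hint in the log: a field at offset 0x148 of a NULL pointer was about to be read.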


RE: [Patch v2 2/6] cifs: Allocate validate negotiation request through kmalloc

2018-04-17 Thread Parav Pandit


> -----Original Message-----
> From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-
> ow...@vger.kernel.org] On Behalf Of Long Li
> Sent: Tuesday, April 17, 2018 2:17 PM
> To: Steve French <sfre...@samba.org>; linux-c...@vger.kernel.org; samba-
> techni...@lists.samba.org; linux-kernel@vger.kernel.org; linux-
> r...@vger.kernel.org
> Cc: longli <lon...@microsoft.com>; sta...@vger.kernel.org
> Subject: [Patch v2 2/6] cifs: Allocate validate negotiation request through
> kmalloc
> 
> From: Long Li <lon...@microsoft.com>
> 
> The data buffer allocated on the stack can't be DMA'ed, and hence can't send
> through RDMA via SMB Direct.
> 
> Fix this by allocating the request on the heap in smb3_validate_negotiate.
> 
> Fixes: ff1c038addc4f205d5f1ede449426c7d316c0eed "Check SMB3 dialects
> against downgrade attacks"
> 

The format is:
Fixes: ff1c038addc4 ("Check SMB3 dialects against downgrade attacks")
It should be placed right above the Signed-off-by line.
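
To make the requested layout concrete, here is a small sketch that builds such a line from a full commit hash and a subject. The helper format_fixes_tag() is hypothetical and exists only for this example; it is not part of any kernel tooling. It keeps the first 12 characters of the hash and wraps the subject in parentheses and double quotes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes 'Fixes: <12-char id> ("<subject>")'. */
static int format_fixes_tag(char *out, size_t len,
			    const char *full_sha, const char *subject)
{
	assert(out && full_sha && subject);
	if (strlen(full_sha) < 12)
		return -1;
	/* %.12s prints at most the first 12 characters of the hash. */
	return snprintf(out, len, "Fixes: %.12s (\"%s\")", full_sha, subject);
}
```

Given the 40-character id from the quoted patch, the output is the line shown in the reply above.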

> Changes in v2:
> Removed duplicated code on freeing buffers on function exit.
> (Thanks to Parav Pandit <pa...@mellanox.com>)
> 
> Fixed typo in the patch title.
> 
> Signed-off-by: Long Li <lon...@microsoft.com>
> Cc: sta...@vger.kernel.org
> ---
>  fs/cifs/smb2pdu.c | 57 ++
> -
>  1 file changed, 31 insertions(+), 26 deletions(-)
> 
> diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c
> index 0f044c4..41625e4 100644
> --- a/fs/cifs/smb2pdu.c
> +++ b/fs/cifs/smb2pdu.c
> @@ -729,8 +729,8 @@ SMB2_negotiate(const unsigned int xid, struct cifs_ses *ses)
> 
>  int smb3_validate_negotiate(const unsigned int xid, struct cifs_tcon *tcon)
>  {
> - int rc = 0;
> - struct validate_negotiate_info_req vneg_inbuf;
> + int ret, rc = -EIO;
> + struct validate_negotiate_info_req *pneg_inbuf;
>   struct validate_negotiate_info_rsp *pneg_rsp = NULL;
>   u32 rsplen;
>   u32 inbuflen; /* max of 4 dialects */
> @@ -741,6 +741,9 @@ int smb3_validate_negotiate(const unsigned int xid, struct cifs_tcon *tcon)
>   if (tcon->ses->server->rdma)
>   return 0;
>  #endif
> + pneg_inbuf = kmalloc(sizeof(*pneg_inbuf), GFP_KERNEL);
> + if (!pneg_inbuf)
> + return -ENOMEM;
> 
>   /* In SMB3.11 preauth integrity supersedes validate negotiate */
>   if (tcon->ses->server->dialect == SMB311_PROT_ID)
> @@ -764,53 +767,53 @@ int smb3_validate_negotiate(const unsigned int xid, struct cifs_tcon *tcon)
>   if (tcon->ses->session_flags & SMB2_SESSION_FLAG_IS_NULL)
>   cifs_dbg(VFS, "Unexpected null user (anonymous) auth flag sent by server\n");
> 
> - vneg_inbuf.Capabilities =
> + pneg_inbuf->Capabilities =
>   cpu_to_le32(tcon->ses->server->vals->req_capabilities);
> - memcpy(vneg_inbuf.Guid, tcon->ses->server->client_guid,
> + memcpy(pneg_inbuf->Guid, tcon->ses->server->client_guid,
>   SMB2_CLIENT_GUID_SIZE);
> 
>   if (tcon->ses->sign)
> - vneg_inbuf.SecurityMode =
> + pneg_inbuf->SecurityMode =
>   cpu_to_le16(SMB2_NEGOTIATE_SIGNING_REQUIRED);
>   else if (global_secflags & CIFSSEC_MAY_SIGN)
> - vneg_inbuf.SecurityMode =
> + pneg_inbuf->SecurityMode =
>   cpu_to_le16(SMB2_NEGOTIATE_SIGNING_ENABLED);
>   else
> - vneg_inbuf.SecurityMode = 0;
> + pneg_inbuf->SecurityMode = 0;
> 
> 
>   if (strcmp(tcon->ses->server->vals->version_string,
Please check if strncmp() should be used or not.

>   SMB3ANY_VERSION_STRING) == 0) {
> - vneg_inbuf.Dialects[0] = cpu_to_le16(SMB30_PROT_ID);
> - vneg_inbuf.Dialects[1] = cpu_to_le16(SMB302_PROT_ID);
> - vneg_inbuf.DialectCount = cpu_to_le16(2);
> + pneg_inbuf->Dialects[0] = cpu_to_le16(SMB30_PROT_ID);
> + pneg_inbuf->Dialects[1] = cpu_to_le16(SMB302_PROT_ID);
> + pneg_inbuf->DialectCount = cpu_to_le16(2);
>   /* structure is big enough for 3 dialects, sending only 2 */
>   inbuflen = sizeof(struct validate_negotiate_info_req) - 2;
>   } else if (strcmp(tcon->ses->server->vals->version_string,
>   SMBDEFAULT_VERSION_STRING) == 0) {
> - vneg_inbuf.Dialects[0] = cpu_to_le16(SMB21_PROT_ID);
> - vneg_inbuf.Dialects[1] = cpu_to_le16(SMB30_PROT_ID);
> - vneg_inbuf.Dialects[2] = cpu_to_le16(SMB302

RE: [PATCH 2/6] cifs: Allocate validate negoation request through kmalloc

2018-04-16 Thread Parav Pandit


> -----Original Message-----
> From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-
> ow...@vger.kernel.org] On Behalf Of Long Li
> Sent: Monday, April 16, 2018 7:49 PM
> To: Steve French ; linux-c...@vger.kernel.org; samba-
> techni...@lists.samba.org; linux-kernel@vger.kernel.org; linux-
> r...@vger.kernel.org; sta...@vger.kernel.org
> Cc: longli 
> Subject: [PATCH 2/6] cifs: Allocate validate negoation request through kmalloc
> 
> From: Long Li 
> 
> The data buffer allocated on the stack can't be DMA'ed, and hence can't send
> through RDMA via SMB Direct.
> 
> Fix this by allocating the request on the heap in smb3_validate_negotiate.
> 
Please provide a Fixes: tag with the 12-character commit id and the subject of
the commit which introduced this issue, so that others can apply this fix to
older versions.

> Signed-off-by: Long Li 
> ---
>  fs/cifs/smb2pdu.c | 38 ++
>  1 file changed, 22 insertions(+), 16 deletions(-)
> 
> diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c
> index 0f044c4..abbefe2 100644
> --- a/fs/cifs/smb2pdu.c
> +++ b/fs/cifs/smb2pdu.c
> @@ -730,7 +730,7 @@ SMB2_negotiate(const unsigned int xid, struct cifs_ses *ses)
>  int smb3_validate_negotiate(const unsigned int xid, struct cifs_tcon *tcon)
>  {
>   int rc = 0;
> - struct validate_negotiate_info_req vneg_inbuf;
> + struct validate_negotiate_info_req *pneg_inbuf;
[..]
 
>   rc = SMB2_ioctl(xid, tcon, NO_FILE_ID, NO_FILE_ID,
>   FSCTL_VALIDATE_NEGOTIATE_INFO, true /* is_fsctl */,
> - (char *)&vneg_inbuf, sizeof(struct validate_negotiate_info_req),
> + (char *)pneg_inbuf, sizeof(struct validate_negotiate_info_req),
>   (char **)&pneg_rsp, &rsplen);
> 
>   if (rc != 0) {
>   cifs_dbg(VFS, "validate protocol negotiate failed: %d\n", rc);
> + kfree(pneg_inbuf);
>   return -EIO;
Instead of duplicating code here, please jump to the err_rsp_free label;
kfree() handles a NULL pointer safely.
Or, for clarity, define a new label.

>   }
> 
> @@ -838,12 +842,14 @@ int smb3_validate_negotiate(const unsigned int xid,
> struct cifs_tcon *tcon)
> 
>   /* validate negotiate successful */
>   cifs_dbg(FYI, "validate negotiate info successful\n");
> + kfree(pneg_inbuf);
>   kfree(pneg_rsp);
>   return 0;
> 
>  vneg_out:
>   cifs_dbg(VFS, "protocol revalidation - security settings mismatch\n");
>  err_rsp_free:
> + kfree(pneg_inbuf);
>   kfree(pneg_rsp);
>   return -EIO;
>  }
> --
> 2.7.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the 
> body
> of a message to majord...@vger.kernel.org More majordomo info at
> http://vger.kernel.org/majordomo-info.html
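
The review comment above, about jumping to a shared label rather than repeating the frees, can be illustrated with a user-space sketch. Everything here is an assumption made for the example: struct req, struct rsp, send_req() and validate() are hypothetical, and malloc()/free() stand in for kmalloc()/kfree(). The request lives on the heap, mirroring the patch's point that a stack buffer cannot be handed to DMA, and every exit after the allocation goes through one cleanup label.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

struct req { char payload[32]; };	/* hypothetical request layout */
struct rsp { int status; };		/* hypothetical response layout */

/* Stand-in for the transport call; allocates *rsp only on success. */
static int send_req(const struct req *r, struct rsp **rsp, int fail)
{
	(void)r;
	if (fail)
		return -EIO;
	*rsp = calloc(1, sizeof(**rsp));
	return *rsp ? 0 : -ENOMEM;
}

static int validate(int fail)
{
	struct req *in;
	struct rsp *out = NULL;
	int rc;

	in = malloc(sizeof(*in));	/* heap, not stack */
	if (!in)
		return -ENOMEM;
	memset(in, 0, sizeof(*in));

	rc = send_req(in, &out, fail);
	if (rc)
		goto out_free;		/* 'out' is still NULL here */

	rc = out->status ? -EIO : 0;
out_free:
	free(in);
	free(out);	/* free(NULL) is a no-op, as is kfree(NULL) */
	return rc;
}
```

Because the response pointer starts out NULL, the early failure path can share the label with the later ones, and no exit has to remember which buffers exist.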


RE: [PATCH 2/6] cifs: Allocate validate negoation request through kmalloc

2018-04-16 Thread Parav Pandit


> -----Original Message-----
> From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-
> ow...@vger.kernel.org] On Behalf Of Long Li
> Sent: Monday, April 16, 2018 7:49 PM
> To: Steve French ; linux-c...@vger.kernel.org; samba-
> techni...@lists.samba.org; linux-kernel@vger.kernel.org; linux-
> r...@vger.kernel.org; sta...@vger.kernel.org
> Cc: longli 
> Subject: [PATCH 2/6] cifs: Allocate validate negoation request through kmalloc
> 
> From: Long Li 
> 
> The data buffer allocated on the stack can't be DMA'ed, and hence can't send
> through RDMA via SMB Direct.
> 
> Fix this by allocating the request on the heap in smb3_validate_negotiate.
> 
Please provide Fixes ("12-letters commit id") "commit string" which introduced 
this issue and it is getting fixed here, so that other can apply this fix to 
older versions.

> Signed-off-by: Long Li 
> ---
>  fs/cifs/smb2pdu.c | 38 ++
>  1 file changed, 22 insertions(+), 16 deletions(-)
> 
> diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c index 0f044c4..abbefe2
> 100644
> --- a/fs/cifs/smb2pdu.c
> +++ b/fs/cifs/smb2pdu.c
> @@ -730,7 +730,7 @@ SMB2_negotiate(const unsigned int xid, struct cifs_ses
> *ses)  int smb3_validate_negotiate(const unsigned int xid, struct cifs_tcon
> *tcon)  {
>   int rc = 0;
> - struct validate_negotiate_info_req vneg_inbuf;
> + struct validate_negotiate_info_req *pneg_inbuf;
[..]
 
>   rc = SMB2_ioctl(xid, tcon, NO_FILE_ID, NO_FILE_ID,
>   FSCTL_VALIDATE_NEGOTIATE_INFO, true /* is_fsctl */,
> - (char *)_inbuf, sizeof(struct
> validate_negotiate_info_req),
> + (char *)pneg_inbuf, sizeof(struct validate_negotiate_info_req),
>   (char **)_rsp, );
> 
>   if (rc != 0) {
>   cifs_dbg(VFS, "validate protocol negotiate failed: %d\n", rc);
> + kfree(pneg_inbuf);
>   return -EIO;
Instead of duplicating code here, please jump to err_rsp_free label. Kfree() 
takes care to not panic for NULL pointer.
Or for clarity define new label.

>   }
> 
> @@ -838,12 +842,14 @@ int smb3_validate_negotiate(const unsigned int xid,
> struct cifs_tcon *tcon)
> 
>   /* validate negotiate successful */
>   cifs_dbg(FYI, "validate negotiate info successful\n");
> + kfree(pneg_inbuf);
>   kfree(pneg_rsp);
>   return 0;
> 
>  vneg_out:
>   cifs_dbg(VFS, "protocol revalidation - security settings mismatch\n");
>  err_rsp_free:
> + kfree(pneg_inbuf);
>   kfree(pneg_rsp);
>   return -EIO;
>  }
> --
> 2.7.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the 
> body
> of a message to majord...@vger.kernel.org More majordomo info at
> http://vger.kernel.org/majordomo-info.html


RE: [PATCH] RDMA/core: reduce IB_POLL_BATCH constant

2018-02-20 Thread Parav Pandit
Hi Arnd Bergmann,

> -Original Message-
> From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-
> ow...@vger.kernel.org] On Behalf Of Arnd Bergmann
> Sent: Tuesday, February 20, 2018 2:59 PM
> To: Doug Ledford ; Jason Gunthorpe 
> Cc: Arnd Bergmann ; Leon Romanovsky
> ; Sagi Grimberg ; Bart Van Assche
> ; linux-r...@vger.kernel.org; linux-
> ker...@vger.kernel.org
> Subject: [PATCH] RDMA/core: reduce IB_POLL_BATCH constant
> 
> The ib_wc structure has grown so much that putting 16 of them on the stack
> hits the warning limit for dangerous kernel stack consumption:
> 
> drivers/infiniband/core/cq.c: In function 'ib_process_cq_direct':
> drivers/infiniband/core/cq.c:78:1: error: the frame size of 1032 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
> 
> Using half that number brings us comfortably below that limit again.
> 
> Fixes: 02d8883f520e ("RDMA/restrack: Add general infrastructure to track
> RDMA resources")

It is not clear to me how the above commit 02d8883f520e introduced this stack issue.

Bodong and I came across the ib_wc size increase in [1], and it was fixed in [2].
Did you hit this error before or after applying patch [2]?

[1] https://www.spinics.net/lists/linux-rdma/msg50754.html
[2] https://patchwork.kernel.org/patch/10159623/
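
The arithmetic behind the quoted warning can be sketched as follows. The
64-byte struct fake_wc is an assumption chosen to make the numbers easy to
follow; it is not the real layout or size of struct ib_wc. Only the 1024-byte
limit is taken from the compiler message.

```c
#include <stddef.h>

#define FRAME_LIMIT 1024	/* limit named in the frame-larger-than message */

/* Hypothetical stand-in for struct ib_wc; the real structure differs. */
struct fake_wc {
	unsigned char bytes[64];
};

/* Bytes consumed by an on-stack array of 'batch' completion entries. */
size_t batch_bytes(size_t batch)
{
	return batch * sizeof(struct fake_wc);
}

/* Returns 1 if the array alone stays strictly below the frame limit. */
int batch_fits(size_t batch)
{
	return batch_bytes(batch) < FRAME_LIMIT;
}
```

Under this assumption a batch of 16 already uses the whole 1024-byte budget
before any other local variable is counted, while a batch of 8 uses half of it.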


RE: [PATCH v3] Documentation/ABI: update infiniband sysfs interfaces

2018-02-07 Thread Parav Pandit
Hi Aishwarya,


> -Original Message-
> From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-
> ow...@vger.kernel.org] On Behalf Of Aishwarya Pant
> Sent: Tuesday, February 06, 2018 11:00 PM
> To: Doug Ledford ; Jason Gunthorpe ;
> linux-r...@vger.kernel.org; linux-kernel@vger.kernel.org; Jonathan Corbet
> ; Greg KH 
> Cc: Julia Lawall 
> Subject: [PATCH v3] Documentation/ABI: update infiniband sysfs interfaces
> 
> Add documentation for core and hardware specific infiniband interfaces.
> The descriptions have been collected from git commit logs, reading through
> code and data sheets. Some drivers have incomplete doc and are annotated
> with the comment '[to be documented]'.
> 
> Signed-off-by: Aishwarya Pant 
> ---
> Changes in v3:
> -  outbound -> inbound in description of port_rcv_constraint_errors
> v2:
> - Move infiniband interface from testing to stable
> - Fix typos
> - Update description of cap_mask, port_xmit_constraint_errors and
>   port_rcv_constraint_errors
> - Add doc for hw_counters
> - Remove old documentation
> 
>  Documentation/ABI/stable/sysfs-class-infiniband  | 818
> +++  Documentation/ABI/testing/sysfs-class-infiniband
> |  16 -
>  Documentation/infiniband/sysfs.txt   | 129 +---
>  3 files changed, 820 insertions(+), 143 deletions(-)  create mode 100644
> Documentation/ABI/stable/sysfs-class-infiniband
>  delete mode 100644 Documentation/ABI/testing/sysfs-class-infiniband
> 
> diff --git a/Documentation/ABI/stable/sysfs-class-infiniband
> b/Documentation/ABI/stable/sysfs-class-infiniband
> new file mode 100644
> index ..f3acf3713a91
> --- /dev/null
> +++ b/Documentation/ABI/stable/sysfs-class-infiniband
> @@ -0,0 +1,818 @@
> +sysfs interface common for all infiniband devices
> +-
> +

> +What:/sys/class/infiniband//ports/ number>/gid_attrs/ndevs/
> +Date:November 29, 2015
> +KernelVersion:   4.4.0
> +Contact: linux-r...@vger.kernel.org
> +Description: The net-device's name associated with the GID resides
> + at index .
> +
> +What:/sys/class/infiniband//ports/ number>/gid_attrs/types/
> +Date:November 29, 2015
> +KernelVersion:   4.4.0
> +Contact: linux-r...@vger.kernel.org
> +Description: The RoCE type of the associated GID resides at index 
>  index>.
> + This could either be "IB/RoCE v1" for IB and RoCE v1 based
> GODs
Typo: s/GODs/GIDs/



RE: [PATCH] IB/CM: fix memory corruption by avoiding unnecessary memset

2017-11-02 Thread Parav Pandit
Hi Qing,

> -Original Message-
> From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-
> ow...@vger.kernel.org] On Behalf Of Qing Huang
> Sent: Thursday, November 02, 2017 6:22 PM
> To: linux-r...@vger.kernel.org; linux-kernel@vger.kernel.org
> Cc: dledf...@redhat.com; sean.he...@intel.com; hal.rosenst...@gmail.com;
> ira.we...@intel.com; Mark Bloch ; Qing Huang
> 
> Subject: [PATCH] IB/CM: fix memory corruption by avoiding unnecessary
> memset
> 
> The size of path array could be dynamic. However the fixed number(2) of
> memset could cause memory corruption by writing into wrong memory space.
> 
> Fixes: 9fdca4da4d8c (IB/SA: Split struct sa_path_rec based on IB ands
>   ROCE specific fields)
> 
> Signed-off-by: Qing Huang 
> ---
>  drivers/infiniband/core/cm.c | 8 +---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c index
> 4c4b465..af4f6a0 100644
> --- a/drivers/infiniband/core/cm.c
> +++ b/drivers/infiniband/core/cm.c
> @@ -1856,7 +1856,9 @@ static int cm_req_handler(struct cm_work *work)
>   cm_process_routed_req(req_msg, work->mad_recv_wc->wc);
> 
>   memset(&work->path[0], 0, sizeof(work->path[0]));
> - memset(&work->path[1], 0, sizeof(work->path[1]));
> + if (cm_req_has_alt_path(req_msg))
> + memset(&work->path[1], 0, sizeof(work->path[1]));
> +
>   grh = rdma_ah_read_grh(&cm_id_priv->av.ah_attr);
>   ret = ib_get_cached_gid(work->port->cm_dev->ib_device,
>   work->port->port_num,
> @@ -3823,8 +3825,8 @@ static void cm_recv_handler(struct ib_mad_agent
> *mad_agent,
> 
>   switch (mad_recv_wc->recv_buf.mad->mad_hdr.attr_id) {
>   case CM_REQ_ATTR_ID:
> - paths = 1 + (((struct cm_req_msg *) mad_recv_wc-
> >recv_buf.mad)->
> - alt_local_lid != 0);
> + paths = 1 + cm_req_has_alt_path(
> + (struct cm_req_msg *)mad_recv_wc-
> >recv_buf.mad);
>   event = IB_CM_REQ_RECEIVED;
>   break;
>   case CM_MRA_ATTR_ID:
> --
> 2.9.3
> 
Thanks for the patch. A few weeks back I came across this bug, and the fix [1]
has now been merged by Doug.
[1] also includes one additional fix in the cm_format_req_event() function.

[1] https://patchwork.kernel.org/patch/10015997/
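
The bug class discussed here, clearing a fixed two entries of an array that may
hold only one, can be sketched in userspace as below. struct work, struct
path_rec and both helpers are hypothetical and only mirror the idea of the
quoted patch; they are not the cm.c definitions.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical path record; the real sa_path_rec is larger. */
struct path_rec {
	unsigned char data[32];
};

/* Work item whose path array is sized at allocation time. */
struct work {
	size_t npaths;
	struct path_rec path[];	/* flexible array member */
};

/* Allocates room for the primary path plus an optional alternate path. */
struct work *work_alloc(int has_alt_path)
{
	size_t npaths = 1 + (has_alt_path ? 1 : 0);
	struct work *w = malloc(sizeof(*w) + npaths * sizeof(w->path[0]));

	if (!w)
		return NULL;
	w->npaths = npaths;
	return w;
}

/* Clears exactly the entries that were allocated, never a fixed count. */
void work_clear_paths(struct work *w)
{
	memset(w->path, 0, w->npaths * sizeof(w->path[0]));
}
```

Deriving the memset length from the stored count keeps the clear inside the
allocation whether or not an alternate path is present.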


RE: [PATCH] IB/mlx5: give back valid speed/width even without plugged in SFP module

2017-10-27 Thread Parav Pandit


> -Original Message-
> From: Hal Rosenstock [mailto:h...@dev.mellanox.co.il]
> Sent: Friday, October 27, 2017 3:19 PM
> To: Parav Pandit <pa...@mellanox.com>; Thomas Bogendoerfer
> <tbogendoer...@suse.de>; Matan Barak <mat...@mellanox.com>; Leon
> Romanovsky <leo...@mellanox.com>; Doug Ledford <dledf...@redhat.com>;
> linux-r...@vger.kernel.org; linux-kernel@vger.kernel.org
> Cc: Ghazale Hosseinabadi <ghazale.hosseinab...@oracle.com>
> Subject: Re: [PATCH] IB/mlx5: give back valid speed/width even without plugged
> in SFP module
> 
> On 10/27/2017 2:32 PM, Parav Pandit wrote:
> > However I believe that ibstat tool should be enhanced to report unknown port
> speed instead of expecting drivers to supply some random number like this.
> 
> ibstat gets the rate from libibumad via /sys/class/infiniband/ device>/ports//rate file which is supposed to be populated by the
> driver. Is there no rate file in this error case ?
> 
The <...>//rate file exists.

rate_show() sees an invalid active_width, as expected, because no SFP is present.
So the sysfs call returns an error for the invalid value.
We don't have an invalid active_width value defined right now.
ibstat and other applications should therefore not crash on such legitimate errors.
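
On the tool side, tolerating a failed read of the rate file can be sketched as
below. read_rate() is a hypothetical helper, not code from ibstat or libibumad,
and the "unknown" fallback string is an assumption. The sketch only shows that
an open or read error is reported as unknown instead of being treated as fatal.

```c
#include <stdio.h>
#include <string.h>

/*
 * Copies the first line of 'path' into 'buf' (len must be > 0).
 * Returns 0 on success. On any open or read failure it stores
 * "unknown" in 'buf' and returns -1 so the caller can keep going.
 */
int read_rate(const char *path, char *buf, size_t len)
{
	FILE *f = fopen(path, "r");
	int ok = 0;

	if (f) {
		ok = fgets(buf, (int)len, f) != NULL;
		fclose(f);
	}
	if (!ok) {
		snprintf(buf, len, "unknown");
		return -1;
	}
	buf[strcspn(buf, "\n")] = '\0';	/* strip the trailing newline */
	return 0;
}
```

A caller can print the buffer unconditionally and use the return value only to
decide whether to flag the port as having no valid rate.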



RE: [PATCH] IB/mlx5: give back valid speed/width even without plugged in SFP module

2017-10-27 Thread Parav Pandit


> -Original Message-
> From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-
> ow...@vger.kernel.org] On Behalf Of Thomas Bogendoerfer
> Sent: Friday, October 27, 2017 7:30 AM
> To: Matan Barak ; Leon Romanovsky
> ; Doug Ledford ; linux-
> r...@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: [PATCH] IB/mlx5: give back valid speed/width even without plugged in
> SFP module
> 
> If there is no SFP module plugged into a port of mlx5 cards 'cat
> /sys/class/infiniband/mlx5_X/ports/1/rate' returns Invalid argument.
> This causes tools like 'ibstat' to malfunction. This change aligns mlx5 with
> all other RoCE/iWarp drivers, which always return valid speed/width.
> 
> Signed-off-by: Thomas Bogendoerfer 
> ---
>  drivers/infiniband/hw/mlx5/main.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/infiniband/hw/mlx5/main.c
> b/drivers/infiniband/hw/mlx5/main.c
> index 260f8be1d0ed..4388618e3434 100644
> --- a/drivers/infiniband/hw/mlx5/main.c
> +++ b/drivers/infiniband/hw/mlx5/main.c
> @@ -246,7 +246,10 @@ static int translate_eth_proto_oper(u32
> eth_proto_oper, u8 *active_speed,
>   *active_speed = IB_SPEED_EDR;
>   break;
>   default:
> - return -EINVAL;
> + /* Unknown */
> + *active_width = IB_WIDTH_1X;
> + *active_speed = IB_SPEED_SDR;
> + break;
>   }
> 
>   return 0;
> --
> 2.12.3

A similar issue was reported by Ghazale in an offline email, and she also
provided a similar patch.
I have added her to this mail thread.
Please add the Reported-by tag below if you find it appropriate.
Reported-by: Ghazale Hosseinabadi 

Thanks for the short-term fix.
However, I believe the ibstat tool should be enhanced to report an unknown port
speed instead of expecting drivers to supply some arbitrary number like this.
Similar tools such as ethtool do report an unknown port speed as unknown, as in
the output below from a port without an SFP.

ethtool ens2f0
<>
Speed: Unknown!
Duplex: Unknown! (255)
<>
Link detected: no


RE: [PATCH] checkpatch: Introduce check for format of Fixes line in commit log

2017-10-19 Thread Parav Pandit


> -Original Message-
> From: Joe Perches [mailto:j...@perches.com]
> Sent: Thursday, October 19, 2017 12:57 AM
> To: Parav Pandit <pa...@mellanox.com>; a...@canonical.com
> Cc: linux-kernel@vger.kernel.org; linux-r...@vger.kernel.org; Brad Erickson
> <brad...@mellanox.com>
> Subject: Re: [PATCH] checkpatch: Introduce check for format of Fixes line in
> commit log
> 
> On Thu, 2017-10-19 at 05:52 +, Parav Pandit wrote:
> > Hi Joe,
> 
> Hello Parav
> 
> > > -Original Message-
> > > From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-
> > > ow...@vger.kernel.org] On Behalf Of Parav Pandit
> > > Sent: Tuesday, October 10, 2017 5:44 PM
> > > To: j...@perches.com; a...@canonical.com
> > > Cc: linux-kernel@vger.kernel.org; linux-r...@vger.kernel.org; Parav
> > > Pandit <pa...@mellanox.com>; Brad Erickson <brad...@mellanox.com>
> > > Subject: [PATCH] checkpatch: Introduce check for format of Fixes
> > > line in commit log
> > >
> > > This patch introduces a format check for 'Fixes' line in commit log
> > > for 12 characters commit id and format for Fixes as ("...").
> 
> I think this doesn't handle case like:
> 
> Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=58735

I see such commits now. I will also cover this additional format:
Fixes: 
Will send v1.


RE: [PATCH] checkpatch: Introduce check for format of Fixes line in commit log

2017-10-18 Thread Parav Pandit
Hi Joe,

> -Original Message-
> From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-
> ow...@vger.kernel.org] On Behalf Of Parav Pandit
> Sent: Tuesday, October 10, 2017 5:44 PM
> To: j...@perches.com; a...@canonical.com
> Cc: linux-kernel@vger.kernel.org; linux-r...@vger.kernel.org; Parav Pandit
> <pa...@mellanox.com>; Brad Erickson <brad...@mellanox.com>
> Subject: [PATCH] checkpatch: Introduce check for format of Fixes line in 
> commit
> log
> 
> This patch introduces a format check for the 'Fixes' line in the commit log:
> a 12-character commit id followed by the subject as ("...").
> 
> It's created against Doug's linux-rdma tree [1].
> 
> [1] git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma.git
> 
> Signed-off-by: Parav Pandit <pa...@mellanox.com>
> Signed-off-by: Brad Erickson <brad...@mellanox.com>
> ---
>  scripts/checkpatch.pl | 8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl index
> dd2c262..7d933e4 100755
> --- a/scripts/checkpatch.pl
> +++ b/scripts/checkpatch.pl
> @@ -2548,6 +2548,14 @@ sub process {
> "Remove Gerrit Change-Id's before submitting
> upstream.\n" . $herecurr);
>   }
> 
> +# Check for incorrect format for Fixes line in commit log
> + if ($in_commit_log && $line =~ /^\s*Fixes:/i) {
> + if ($line !~ /^\s*Fixes: [a-z0-9]{12} \(\".*?\"\)$/i) {
> + ERROR("FIXES_FORMAT",
> +   "Follow format of Fixes: <12 characters
> commit id> (\"...\")\n" . $herecurr);
> + }
> + }
> +
>  # Check if the commit log is in a possible stack dump
>   if ($in_commit_log && !$commit_log_possible_stack_dump &&
>   ($line =~ /^\s*(?:WARNING:|BUG:)/ ||
> --

Did you get a chance to review this minor improvement patch?
Does it look ok?
Can it be pushed to Linux-rdma tree?


Re: [PATCH] IB/cma: make config_item_type const

2017-10-12 Thread Parav Pandit
On Thu, Oct 12, 2017 at 10:04 AM, Bhumika Goyal  wrote:
> On Thu, Oct 12, 2017 at 2:20 PM, Bhumika Goyal  wrote:
>> This is a followup patch for: https://lkml.org/lkml/2017/10/11/375 and
>> https://patchwork.kernel.org/patch/649/
>>
>> Make these structures const as they are either passed to the functions
>> having the argument as const or stored as a reference in the "ci_type"
>> const field of a config_item structure.
>>
>> Done using Coccinelle.
>>
>
> This patch is dependent on the patches in the links
> https://lkml.org/lkml/2017/10/11/375 and
> https://patchwork.kernel.org/patch/649/. Therefore, this patch
> won't be correct unless the patches in these links get applied.
>
>> Signed-off-by: Bhumika Goyal 
>> ---
>>  drivers/infiniband/core/cma_configfs.c | 8 
>>  1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/infiniband/core/cma_configfs.c 
>> b/drivers/infiniband/core/cma_configfs.c
>> index 54076a3..31dfee0 100644
>> --- a/drivers/infiniband/core/cma_configfs.c
>> +++ b/drivers/infiniband/core/cma_configfs.c
>> @@ -186,7 +186,7 @@ static ssize_t default_roce_tos_store(struct config_item 
>> *item,
>> NULL,
>>  };
>>
>> -static struct config_item_type cma_port_group_type = {
>> +static const struct config_item_type cma_port_group_type = {
>> .ct_attrs   = cma_configfs_attributes,
>> .ct_owner   = THIS_MODULE
>>  };
>> @@ -263,7 +263,7 @@ static void release_cma_ports_group(struct config_item  
>> *item)
>> .release = release_cma_ports_group
>>  };
>>
>> -static struct config_item_type cma_ports_group_type = {
>> +static const struct config_item_type cma_ports_group_type = {
>> .ct_item_ops= &cma_ports_item_ops,
>> .ct_owner   = THIS_MODULE
>>  };
>> @@ -272,7 +272,7 @@ static void release_cma_ports_group(struct config_item  
>> *item)
>> .release = release_cma_dev
>>  };
>>
>> -static struct config_item_type cma_device_group_type = {
>> +static const struct config_item_type cma_device_group_type = {
>> .ct_item_ops= &cma_device_item_ops,
>> .ct_owner   = THIS_MODULE
>>  };
>> @@ -323,7 +323,7 @@ static struct config_group *make_cma_dev(struct 
>> config_group *group,
>> .make_group = make_cma_dev,
>>  };
>>
>> -static struct config_item_type cma_subsys_type = {
>> +static const struct config_item_type cma_subsys_type = {
>> .ct_group_ops   = &cma_subsys_group_ops,
>> .ct_owner   = THIS_MODULE,
>>  };
>> --
>> 1.9.1
>>
> --
The patch subject prefix should be RDMA/cma, even though in the past we have
had a mix of IB/cma and RDMA/cma patches.


Re: [PATCH] IB/cma: make config_item_type const

2017-10-12 Thread Parav Pandit
On Thu, Oct 12, 2017 at 10:04 AM, Bhumika Goyal  wrote:
> On Thu, Oct 12, 2017 at 2:20 PM, Bhumika Goyal  wrote:
>> This is a followup patch for: https://lkml.org/lkml/2017/10/11/375 and
>> https://patchwork.kernel.org/patch/649/
>>
>> Make these structures const as they are either passed to the functions
>> having the argument as const or stored as a reference in the "ci_type"
>> const field of a config_item structure.
>>
>> Done using Coccienlle.
>>
>
> This patch is dependent on the patches in the links
> https://lkml.org/lkml/2017/10/11/375 and
> https://patchwork.kernel.org/patch/649/. Therefore, this patch
> won't be correct unless the patches in these links gets applied.
>
>> Signed-off-by: Bhumika Goyal 
>> ---
>>  drivers/infiniband/core/cma_configfs.c | 8 
>>  1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/infiniband/core/cma_configfs.c 
>> b/drivers/infiniband/core/cma_configfs.c
>> index 54076a3..31dfee0 100644
>> --- a/drivers/infiniband/core/cma_configfs.c
>> +++ b/drivers/infiniband/core/cma_configfs.c
>> @@ -186,7 +186,7 @@ static ssize_t default_roce_tos_store(struct config_item 
>> *item,
>> NULL,
>>  };
>>
>> -static struct config_item_type cma_port_group_type = {
>> +static const struct config_item_type cma_port_group_type = {
>> .ct_attrs   = cma_configfs_attributes,
>> .ct_owner   = THIS_MODULE
>>  };
>> @@ -263,7 +263,7 @@ static void release_cma_ports_group(struct config_item  
>> *item)
>> .release = release_cma_ports_group
>>  };
>>
>> -static struct config_item_type cma_ports_group_type = {
>> +static const struct config_item_type cma_ports_group_type = {
>> .ct_item_ops= _ports_item_ops,
>> .ct_owner   = THIS_MODULE
>>  };
>> @@ -272,7 +272,7 @@ static void release_cma_ports_group(struct config_item  
>> *item)
>> .release = release_cma_dev
>>  };
>>
>> -static struct config_item_type cma_device_group_type = {
>> +static const struct config_item_type cma_device_group_type = {
>> .ct_item_ops= _device_item_ops,
>> .ct_owner   = THIS_MODULE
>>  };
>> @@ -323,7 +323,7 @@ static struct config_group *make_cma_dev(struct 
>> config_group *group,
>> .make_group = make_cma_dev,
>>  };
>>
>> -static struct config_item_type cma_subsys_type = {
>> +static const struct config_item_type cma_subsys_type = {
>> .ct_group_ops   = &cma_subsys_group_ops,
>> .ct_owner   = THIS_MODULE,
>>  };
>> --
>> 1.9.1
>>
> --
The patch subject should use the RDMA/cma prefix, even though in the past we
have had a mix of IB/cma and RDMA/cma patches.


[PATCH] checkpatch: Introduce check for format of Fixes line in commit log

2017-10-10 Thread Parav Pandit
This patch introduces a format check for the 'Fixes' line in the commit log:
it requires a 12-character commit id followed by the subject as ("...").

It's created against Doug's linux-rdma tree [1].

[1] git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma.git

Signed-off-by: Parav Pandit <pa...@mellanox.com>
Signed-off-by: Brad Erickson <brad...@mellanox.com>
---
 scripts/checkpatch.pl | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index dd2c262..7d933e4 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -2548,6 +2548,14 @@ sub process {
 			      "Remove Gerrit Change-Id's before submitting upstream.\n" . $herecurr);
 		}
 
+# Check for incorrect format for Fixes line in commit log
+		if ($in_commit_log && $line =~ /^\s*Fixes:/i) {
+			if ($line !~ /^\s*Fixes: [a-z0-9]{12} \(\".*?\"\)$/i) {
+				ERROR("FIXES_FORMAT",
+				      "Follow format of Fixes: <12 characters commit id> (\"...\")\n" . $herecurr);
+			}
+		}
+
 # Check if the commit log is in a possible stack dump
 		if ($in_commit_log && !$commit_log_possible_stack_dump &&
 		    ($line =~ /^\s*(?:WARNING:|BUG:)/ ||
-- 
1.8.3.1
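
For readers who want to try the rule outside of checkpatch, the following is a
minimal sketch of the same check in C using POSIX regular expressions. The
function name fixes_line_ok() is hypothetical and not part of any kernel tool,
and the pattern is a POSIX ERE translation of the Perl pattern in the patch
above, so treat it as an approximation rather than the checkpatch
implementation.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns 1 if `line` matches the proposed Fixes
 * format, 0 if it does not, and -1 if the pattern fails to compile.
 * `line` is expected without a trailing newline.
 */
int fixes_line_ok(const char *line)
{
	/* POSIX ERE counterpart of the Perl pattern used in the patch */
	static const char pattern[] =
		"^[[:space:]]*Fixes: [a-z0-9]{12} \\(\".*\"\\)$";
	regex_t re;
	int rc;

	assert(line != NULL);

	/* REG_ICASE mirrors the /i modifier of the Perl pattern */
	if (regcomp(&re, pattern, REG_EXTENDED | REG_ICASE | REG_NOSUB) != 0)
		return -1;

	rc = regexec(&re, line, 0, NULL, 0);
	regfree(&re);

	return rc == 0 ? 1 : 0;
}
```

Like the Perl pattern, this accepts any 12 alphanumeric characters as the
commit id; it does not verify that they are hexadecimal digits or that the
commit exists.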



RE: [PATCH] IB/core: fix memory leak on ah on error return path

2017-08-08 Thread Parav Pandit
Hi,

I need to top-post because my comments are unrelated to the past discussion.

rdma_ah_retrieve_dmac() can never fail for RoCE, as it returns a pointer into 
the ah_attr structure.
The provider driver doesn't need to check for a null pointer, as ib/core would 
never call a provider that is not a RoCE provider.
So this memory leak exists only in theory.

If it is ever null, the driver should WARN_ON/BUG_ON in the extreme case, but 
that's not necessary either.

I have a patch in progress, under internal review, that does a nice small 
cleanup in many provider drivers and eliminates the check completely.
I am waiting for Moni to finish the review.

Parav

> -----Original Message-----
> From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-
> ow...@vger.kernel.org] On Behalf Of Johannes Thumshirn
> Sent: Tuesday, August 08, 2017 7:59 AM
> To: Colin Ian King
> Cc: Lijun Ou; Wei Hu; Doug Ledford; Sean Hefty;
> Hal Rosenstock; linux-r...@vger.kernel.org;
> kernel-janit...@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH] IB/core: fix memory leak on ah on error return path
> 
> On Tue, Aug 08, 2017 at 11:28:16AM +0100, Colin Ian King wrote:
> > I was using the same subject start as the patch that introduced the
> > memory leak and touched the same portion of code. I can resend if necessary.
> 
> I think having the hns prefix makes it clearer, as the patch doesn't touch IB
> core code but hns code. The reference to the patch which introduced the leak
> is given by the Fixes line.
> 
> Thanks,
>   Johannes
> 
> --
> Johannes Thumshirn                                          Storage
> jthumsh...@suse.de                                +49 911 74053 689
> SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
> GF: Felix Imendörffer, Jane Smithard, Graham Norton HRB 21284 (AG Nürnberg)
> Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850


RE: [infiniband-core] question about arguments position

2017-05-04 Thread Parav Pandit
Hi,

> -----Original Message-----
> From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-
> ow...@vger.kernel.org] On Behalf Of Gustavo A. R. Silva
> Sent: Thursday, May 4, 2017 12:42 PM
> To: Doug Ledford; Sean Hefty; Hal Rosenstock; Sagi Grimberg;
> Bart Van Assche; Steve Wise; Leon Romanovsky; Yishai Hadas;
> Moni Shoua
> Cc: linux-r...@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: [infiniband-core] question about arguments position
> 
> 
> Hello everybody,
> 
> While looking into Coverity ID 1351047 I ran into the following piece of code
> at drivers/infiniband/core/verbs.c:496:
> 
> ret = rdma_addr_find_l2_eth_by_grh(&dgid, &sgid,
> ah_attr->dmac,
> wc->wc_flags & IB_WC_WITH_VLAN ?
> NULL : &vlan_id,
> &if_index, &hoplimit);
> 
> 
> The issue here is that the position of the arguments in the call to
> rdma_addr_find_l2_eth_by_grh() does not match the order of the
> parameters:
> 
> &dgid is passed to sgid
> &sgid is passed to dgid
> 
> This is the function prototype:
> 
> int rdma_addr_find_l2_eth_by_grh(const union ib_gid *sgid,
>const union ib_gid *dgid,
>u8 *dmac, u16 *vlan_id, int *if_index,
>int *hoplimit)
> 
> My question here is if this is intentional?
> 
Yes, it is intentional. ib_init_ah_from_wc() creates an AH from the incoming 
packet.
The incoming packet's dgid is the GID of the receiver node, on which this code 
is executing, and its sgid is the GID of the sender.

When resolving the MAC address of the destination for the reply, you use the 
arrived dgid as the sgid, and the arrived sgid as the dgid, because the arrived 
sgid is the GID of the node to respond to.
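
The swap described above can be pictured with a small standalone sketch. All
of the types and the function reply_addr_from_rx() below are hypothetical
stand-ins written for illustration; they are not the kernel's union ib_gid or
the code in verbs.c, and only model the idea that the reply path reverses the
roles of the two GIDs found in the received packet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a 16-byte GID. */
union gid {
	uint8_t raw[16];
};

/* Hypothetical model of the GIDs carried by a received packet. */
struct rx_grh {
	union gid sgid;	/* GID of the remote sender */
	union gid dgid;	/* GID of the local receiver */
};

/* Hypothetical addressing information for the reply. */
struct reply_addr {
	union gid sgid;	/* local GID: source of the reply */
	union gid dgid;	/* remote GID: destination of the reply */
};

/* Build reply addressing by reversing the roles of the received GIDs. */
void reply_addr_from_rx(const struct rx_grh *rx, struct reply_addr *out)
{
	assert(rx != NULL && out != NULL);

	out->sgid = rx->dgid;	/* we were the destination, now we send */
	out->dgid = rx->sgid;	/* the sender is now the destination */
}
```

Seen this way, passing the received dgid in the sgid position is the expected
pattern for a responder, not a swapped-argument bug.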


Re: counting file descriptors with a cgroup controller

2017-03-07 Thread Parav Pandit
Hi,

On Tue, Mar 7, 2017 at 2:48 PM, Tejun Heo  wrote:
>
> Hello,
>
> On Tue, Mar 07, 2017 at 09:06:49PM +0100, Krzysztof Opasiak wrote:
> > Personally, I don't want to use rlimit for this as it ends up returning
> > error code from for example open() when we hit the limit. This may lead to
> > some unpredictable crashes in  services (esp. those poor proprietary binary
> > blobs). Instead of injecting errors to service we would like to just get
> > notification that this service has more opened fds than it should and ask it
> > to restart in a polite way.
> >

How do those poor proprietary binary blobs remain polite after a restart?
Do you mean you want to keep restarting them whenever they reach the limit?

> > For memory seems to be quite easy to achieve as we can just get eventfd
> > notification when application passes given memory usage using memory cgroup
> > controller. Maybe you know some efficient method to do the same for fds?
>
> So, if all you wanna do is reliably detecting open(2) failures, can't
> you do that with bpf tracing?
>
> Thanks.
>
> --
> tejun


RE: [PATCH 1/2] device: Stop requiring that struct device is embedded in struct pci_dev

2017-03-07 Thread Parav Pandit


> -----Original Message-----
> From: gre...@linuxfoundation.org [mailto:gre...@linuxfoundation.org]
> Sent: Tuesday, March 7, 2017 11:14 AM
> To: Bart Van Assche <bart.vanass...@sandisk.com>
> Cc: linux-kernel@vger.kernel.org; linux-r...@vger.kernel.org; Parav Pandit
> <pa...@mellanox.com>; seb...@linux.vnet.ibm.com;
> li...@armlinux.org.uk; h...@zytor.com; mi...@redhat.com;
> dw...@infradead.org; bhelg...@google.com; dledf...@redhat.com;
> b...@kernel.crashing.org
> Subject: Re: [PATCH 1/2] device: Stop requiring that struct device is
> embedded in struct pci_dev
> 
> On Tue, Mar 07, 2017 at 04:54:58PM +, Bart Van Assche wrote:
> > On Tue, 2017-03-07 at 05:52 +0100, Greg Kroah-Hartman wrote:
> > > Somehow all other subsystems work just fine, don't instantly think
> > > that the driver core needs to bend to the will of the IB code,
> > > because you are somehow "special".  Hint, you aren't :)
> >
> > Hi Greg,
> >
> > In another e-mail Parav compared IB drivers with networking drivers.
> 
> Great, then notice that networking drivers don't need to do this type of crud
> :)
> 

Well, what I compared is this:
netdev has a struct device, and it also has an underlying pci_dev based device.
ibdev has a struct device, and it also has an underlying pci_dev based device.
So let us try to treat them the same way wherever possible, and keep the 
needed setup in the ib drivers.
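
To make the comparison concrete, here is a simplified userspace model. The
structures below are hypothetical and much smaller than the real kernel ones,
and linking to the underlying device through a parent pointer is an assumption
of this sketch; it only shows that both kinds of class device carry their own
struct device and can lead back to the same kind of PCI-based device.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical model of the kernel's struct device. */
struct device {
	const char *name;
	struct device *parent;	/* underlying device, or NULL */
};

/* Bus-level device that embeds a struct device. */
struct pci_dev_model {
	struct device dev;
};

/* Class-level devices: each one has its own struct device. */
struct net_device_model {
	struct device dev;
};

struct ib_device_model {
	struct device dev;
};

/* Both class devices reach their underlying device in the same way. */
struct device *underlying_device(struct device *dev)
{
	assert(dev != NULL);
	return dev->parent;
}
```

In this model a netdev and an ibdev created on the same adapter both resolve
to the same underlying device, which is the symmetry the comparison relies on.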

> > But I think that's a bad comparison: in the networking stack it's the
> > network driver itself that sets up and triggers DMA while in the IB
> > stack it's the upper layer protocol (ULP) driver that calls the
> > functions defined in struct dma_ops. For some IB HW drivers (hfi1, qib
> > and rdma_rxe) the ULP driver must use the DMA mapping operations from

DMA mapping and allocation are done in a different layer for their own 
reasons, unrelated to this change.
If rxe, qib and hfi1 point to the right dma_device, can't we remove the 
ib_dma_*() functions?


