Re: [PATCH 2/2] drivers core: multi-threading device shutdown

2018-05-03 Thread Pavel Tatashin
On Thu, May 3, 2018 at 1:54 AM Tobin C. Harding  wrote:

> This code was a pleasure to read, super clean.


Hi Tobin,

Thank you very much for your review, I will address all of your comments in
the next revision.

BTW, I found  a lock ordering issue in my work that  that I will need to
fix:

In device_shutdown() device_lock() must be taken before
devices_kset->list_lock. Instead I will use device_trylock(), and if that
fails, will drop devices_kset->list_lock and acquiring the device lock
outside, and check that the device is still in the list after taking the
list lock again.

Pavel


Re: [PATCH 2/2] drivers core: multi-threading device shutdown

2018-05-02 Thread Tobin C. Harding
This code was a pleasure to read, super clean.

On Wed, May 02, 2018 at 11:59:31PM -0400, Pavel Tatashin wrote:
> When system is rebooted, halted or kexeced device_shutdown() is
> called.
> 
> This function shuts down every single device by calling either:
>   dev->bus->shutdown(dev)
>   dev->driver->shutdown(dev)
> 
> Even on a machine just with a moderate amount of devices, device_shutdown()
> may take multiple seconds to complete. Because many devices require a
> specific delays to perform this operation.
> 
> Here is sample analysis of time it takes to call device_shutdown() on
> two socket Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz machine.
> 
> device_shutdown   2.95s
>  mlx4_shutdown1.14s
>  megasas_shutdown 0.24s
>  ixgbe_shutdown   0.37s x 4 (four ixgbe devices on my machine).
>  the rest 0.09s
> 
> In mlx4 we spent the most time, but that is because there is a 1 second
> sleep:
> mlx4_shutdown
>  mlx4_unload_one
>   mlx4_free_ownership
>msleep(1000)
> 
> With megasas we spend quoter of second, but sometimes longer (up-to 0.5s)
> in this path:
> 
> megasas_shutdown
>   megasas_flush_cache
> megasas_issue_blocked_cmd
>   wait_event_timeout
> 
> Finally, with ixgbe_shutdown() it takes 0.37 for each device, but that time
> is spread all over the place, with bigger offenders:
> 
> ixgbe_shutdown
>   __ixgbe_shutdown
> ixgbe_close_suspend
>   ixgbe_down
> ixgbe_init_hw_generic
>   ixgbe_reset_hw_X540
> msleep(100);0.104483472
> ixgbe_get_san_mac_addr_generic  0.048414851
> ixgbe_get_wwn_prefix_generic0.048409893
>   ixgbe_start_hw_X540
> ixgbe_start_hw_generic
>   ixgbe_clear_hw_cntrs_generic  0.048581502
>   ixgbe_setup_fc_generic0.024225800
> 
> All the ixgbe_*generic functions end-up calling:
> ixgbe_read_eerd_X540()
>   ixgbe_acquire_swfw_sync_X540
> usleep_range(5000, 6000);
>   ixgbe_release_swfw_sync_X540
> usleep_range(5000, 6000);
> 
> While these are short sleeps, they end-up calling them over 24 times!
> 24 * 0.0055s = 0.132s. Adding-up to 0.528s for four devices.
> 
> While we should keep optimizing the individual device drivers, in some
> cases this is simply a hardware property that forces a specific delay, and
> we must wait.
> 
> So, the solution for this problem is to shutdown devices in parallel.
> However, we must shutdown children before shutting down parents, so parent
> device must wait for its children to finish.
> 
> With this patch, on the same machine devices_shutdown() takes 1.142s, and
> without mlx4 one second delay only 0.38s
> 
> Signed-off-by: Pavel Tatashin 
> ---
>  drivers/base/core.c | 238 +++-
>  1 file changed, 189 insertions(+), 49 deletions(-)
> 
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index b610816eb887..f370369a303b 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -25,6 +25,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "base.h"
>  #include "power/power.h"
> @@ -2102,6 +2103,59 @@ const char *device_get_devnode(struct device *dev,
>   return *tmp = s;
>  }
>  
> +/**
> + * device_children_count - device children count
> + * @parent: parent struct device.
> + *
> + * Returns number of children for this device or 0 if nonde.
> + */
> +static int device_children_count(struct device *parent)
> +{
> + struct klist_iter i;
> + int children = 0;
> +
> + if (!parent->p)
> + return 0;
> +
> + klist_iter_init(&parent->p->klist_children, &i);
> + while (next_device(&i))
> + children++;
> + klist_iter_exit(&i);
> +
> + return children;
> +}
> +
> +/**
> + * device_get_child_by_index - Return child using the provide index.
> + * @parent: parent struct device.
> + * @index:  Index of the child, where 0 is the first child in the children 
> list,
> + * and so on.
> + *
> + * Returns child or NULL if child with this index is not present.
> + */
> +static struct device *
> +device_get_child_by_index(struct device *parent, int index)
> +{
> + struct klist_iter i;
> + struct device *dev = NULL, *d;
> + int child_index = 0;
> +
> + if (!parent->p || index < 0)
> + return NULL;
> +
> + klist_iter_init(&parent->p->klist_children, &i);
> + while ((d = next_device(&i)) != NULL) {

perhaps:
while ((d = next_device(&i))) {

> + if (child_index == index) {
> + dev = d;
> + break;
> + }
> + child_index++;
> + }
> + klist_iter_exit(&i);
> +
> + return dev;
> +}
> +
>  /**
>   * device_for_each_child - device child iterator.
>   * @parent: parent struct device.
> @@ -2765,71 +281

[PATCH 2/2] drivers core: multi-threading device shutdown

2018-05-02 Thread Pavel Tatashin
When system is rebooted, halted or kexeced device_shutdown() is
called.

This function shuts down every single device by calling either:
dev->bus->shutdown(dev)
dev->driver->shutdown(dev)

Even on a machine just with a moderate amount of devices, device_shutdown()
may take multiple seconds to complete. Because many devices require a
specific delays to perform this operation.

Here is sample analysis of time it takes to call device_shutdown() on
two socket Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz machine.

device_shutdown 2.95s
 mlx4_shutdown  1.14s
 megasas_shutdown   0.24s
 ixgbe_shutdown 0.37s x 4 (four ixgbe devices on my machine).
 the rest   0.09s

In mlx4 we spent the most time, but that is because there is a 1 second
sleep:
mlx4_shutdown
 mlx4_unload_one
  mlx4_free_ownership
   msleep(1000)

With megasas we spend quoter of second, but sometimes longer (up-to 0.5s)
in this path:

megasas_shutdown
  megasas_flush_cache
megasas_issue_blocked_cmd
  wait_event_timeout

Finally, with ixgbe_shutdown() it takes 0.37 for each device, but that time
is spread all over the place, with bigger offenders:

ixgbe_shutdown
  __ixgbe_shutdown
ixgbe_close_suspend
  ixgbe_down
ixgbe_init_hw_generic
  ixgbe_reset_hw_X540
msleep(100);0.104483472
ixgbe_get_san_mac_addr_generic  0.048414851
ixgbe_get_wwn_prefix_generic0.048409893
  ixgbe_start_hw_X540
ixgbe_start_hw_generic
  ixgbe_clear_hw_cntrs_generic  0.048581502
  ixgbe_setup_fc_generic0.024225800

All the ixgbe_*generic functions end-up calling:
ixgbe_read_eerd_X540()
  ixgbe_acquire_swfw_sync_X540
usleep_range(5000, 6000);
  ixgbe_release_swfw_sync_X540
usleep_range(5000, 6000);

While these are short sleeps, they end-up calling them over 24 times!
24 * 0.0055s = 0.132s. Adding-up to 0.528s for four devices.

While we should keep optimizing the individual device drivers, in some
cases this is simply a hardware property that forces a specific delay, and
we must wait.

So, the solution for this problem is to shutdown devices in parallel.
However, we must shutdown children before shutting down parents, so parent
device must wait for its children to finish.

With this patch, on the same machine devices_shutdown() takes 1.142s, and
without mlx4 one second delay only 0.38s

Signed-off-by: Pavel Tatashin 
---
 drivers/base/core.c | 238 +++-
 1 file changed, 189 insertions(+), 49 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index b610816eb887..f370369a303b 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "base.h"
 #include "power/power.h"
@@ -2102,6 +2103,59 @@ const char *device_get_devnode(struct device *dev,
return *tmp = s;
 }
 
+/**
+ * device_children_count - device children count
+ * @parent: parent struct device.
+ *
+ * Returns number of children for this device or 0 if nonde.
+ */
+static int device_children_count(struct device *parent)
+{
+   struct klist_iter i;
+   int children = 0;
+
+   if (!parent->p)
+   return 0;
+
+   klist_iter_init(&parent->p->klist_children, &i);
+   while (next_device(&i))
+   children++;
+   klist_iter_exit(&i);
+
+   return children;
+}
+
+/**
+ * device_get_child_by_index - Return child using the provide index.
+ * @parent: parent struct device.
+ * @index:  Index of the child, where 0 is the first child in the children 
list,
+ * and so on.
+ *
+ * Returns child or NULL if child with this index is not present.
+ */
+static struct device *
+device_get_child_by_index(struct device *parent, int index)
+{
+   struct klist_iter i;
+   struct device *dev = NULL, *d;
+   int child_index = 0;
+
+   if (!parent->p || index < 0)
+   return NULL;
+
+   klist_iter_init(&parent->p->klist_children, &i);
+   while ((d = next_device(&i)) != NULL) {
+   if (child_index == index) {
+   dev = d;
+   break;
+   }
+   child_index++;
+   }
+   klist_iter_exit(&i);
+
+   return dev;
+}
+
 /**
  * device_for_each_child - device child iterator.
  * @parent: parent struct device.
@@ -2765,71 +2819,157 @@ int device_move(struct device *dev, struct device 
*new_parent,
 }
 EXPORT_SYMBOL_GPL(device_move);
 
+/*
+ * device_shutdown_one - call ->shutdown() for the device passed as
+ * argument.
+ */
+static void device_shutdown_one(struct device *dev)
+{
+   /* Don't allow any more runtime suspends */
+   pm_runtime_get_noresume(dev);
+   pm_runtime_barrier(dev);
+
+   if (dev->class && dev->class->shutd