date:20161126

Re: [PATCH 1/2] bcache: Remove redundant set_capacity

2016-11-26 Thread Coly Li

On 2016/11/25 9:39 AM, Yijing Wang wrote:
> set_capacity() has been called in bcache_device_init(),
> remove the redundant one.
> 
> Signed-off-by: Yijing Wang 
> ---
>  drivers/md/bcache/super.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
> index 849ad44..b638a16 100644
> --- a/drivers/md/bcache/super.c
> +++ b/drivers/md/bcache/super.c
> @@ -1126,9 +1126,6 @@ static int cached_dev_init(struct cached_dev *dc, unsigned block_size)
>   if (ret)
>   return ret;
>  
> - set_capacity(dc->disk.disk,
> -  dc->bdev->bd_part->nr_sects - dc->sb.data_offset);
> -
>   dc->disk.disk->queue->backing_dev_info.ra_pages =
>   max(dc->disk.disk->queue->backing_dev_info.ra_pages,
>   q->backing_dev_info.ra_pages);
> 

Hi Yijing,

Nice catch, it looks good to me.

Acked-by: Coly Li 

-- 
Coly Li

Re: [PATCH 05/29] fscrypt: Let fs select encryption index/tweak

2016-11-26 Thread Eric Biggers

On Thu, Nov 24, 2016 at 04:57:51PM +0100, David Gstir wrote:
> 
> > Also, if the intent is just that the 'index' represent the data's offset in
> > filesystem blocks rather than in pages, then perhaps it should be 
> > documented as
> > such.  (This would be correct for ext4 and f2fs; they just happen to only
> > support encryption with block_size = PAGE_SIZE currently.)
> 
> Yes, in case of UBIFS it is exactly that.
> 
> However, I'm actually not really happy with the name 'index'. I'd rather call 
> it 'iv' (or 'tweak') directly. In the context of encryption its purpose will 
> be more obvious, especially in regard to the "IV _must_ not be reused" 
> constraint you mentioned above.
> 

Well, the way I'd prefer to think about it is that the filesystem does not
provide an IV directly (it doesn't anyway, since the actual IV is a u8[16]), but
rather the number of the logical block of the file, like 'u64 lblk_num'.  And
that is sufficient to avoid IV reuse.

Eric
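
A minimal userspace sketch of the idea above: the filesystem supplies only a 'u64 lblk_num', and a 16-byte IV is derived from it. The helper name lblk_num_to_iv(), the little-endian layout, and the zero padding are assumptions made for illustration; they are not taken from the patch under discussion.

```c
#include <stdint.h>
#include <string.h>

#define IV_SIZE 16	/* matches the u8[16] IV mentioned above */

/*
 * Hypothetical helper: build a 16-byte IV from a file's logical block
 * number. Distinct block numbers always yield distinct IVs.
 */
static void lblk_num_to_iv(uint64_t lblk_num, uint8_t iv[IV_SIZE])
{
	int i;

	/* Upper 8 bytes stay zero (assumed padding). */
	memset(iv, 0, IV_SIZE);

	/* Store the block number little-endian in the first 8 bytes. */
	for (i = 0; i < 8; i++)
		iv[i] = (uint8_t)(lblk_num >> (8 * i));
}
```

Because the mapping from block number to IV is injective, two different logical blocks of the same file never share an IV in this sketch.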

Re: [PATCH 01/29] fscrypt: Add in-place encryption mode

2016-11-26 Thread Eric Biggers

On Fri, Nov 25, 2016 at 01:09:05PM +0100, David Gstir wrote:
> 
> > Additionally, after this change the name of the flag FS_WRITE_PATH_FL is
> > misleading, since it now really indicates the presence of a bounce buffer 
> > rather
> > than the "write path".
> 
> I can see no use case for FS_WRITE_PATH_FL other than to indicate that the 
> bounce buffer has to be free'd. Is there any reason why we should not just 
> remove it and check the presence of a bounce buffer by a simple "if 
> (ctx->w.bounce_page)" ?
> 

It appears that the flag is needed because the 'w' (write) and 'r' (read)
members are in a union.  So you can't simply check for 'ctx->w.bounce_page'.

Eric
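
The point about the union can be shown with a small sketch. The struct layout, the member names other than 'w', 'r' and 'bounce_page', and the flag name CTX_HAS_BOUNCE_PAGE are all hypothetical; the sketch only demonstrates why a separate flag is needed when the read and write members share storage. It uses a C11 anonymous union.

```c
#include <stdbool.h>
#include <stddef.h>

#define CTX_HAS_BOUNCE_PAGE 0x1u	/* hypothetical flag name */

/* Hypothetical context: 'w' and 'r' overlap in memory. */
struct crypt_ctx {
	unsigned int flags;
	union {
		struct {
			void *bounce_page;
			void *control_page;
		} w;			/* write side */
		struct {
			void *bio;
		} r;			/* read side */
	};
};

/*
 * Only trust w.bounce_page when the flag says the write member is the
 * one in use; otherwise the pointer may really be r.bio.
 */
static bool ctx_has_bounce_page(const struct crypt_ctx *ctx)
{
	return (ctx->flags & CTX_HAS_BOUNCE_PAGE) &&
	       ctx->w.bounce_page != NULL;
}
```

In a read-side context, 'ctx->w.bounce_page' aliases 'ctx->r.bio', so a bare NULL check would report a bounce buffer that does not exist.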

RE: [PATCH] net: fec: turn on device when extracting statistics

2016-11-26 Thread Andy Duan

From: Nikita Yushchenko  Sent: Friday, 
November 25, 2016 6:02 PM
 >To: Andy Duan ; David S. Miller
 >; Troy Kisky ;
 >Andrew Lunn ; Eric Nelson ; Philippe
 >Reynes ; Johannes Berg ;
 >net...@vger.kernel.org; linux-kernel@vger.kernel.org
 >Cc: Chris Healy ; Nikita Yushchenko
 >
 >Subject: [PATCH] net: fec: turn on device when extracting statistics
 >
 >Execution 'ethtool -S' on fec device that is down causes OOPS on Vybrid
 >board:
 >
 >Unhandled fault: external abort on non-linefetch (0x1008) at 0xe0898200 pgd
 >= ddecc000 [e0898200] *pgd=9e406811, *pte=400d1653, *ppte=400d1453
 >Internal error: : 1008 [#1] SMP ARM ...
 >
 >Reason of OOPS is that fec_enet_get_ethtool_stats() accesses fec registers
 >while IPG clock is stopped by PM.
 >
 >Fix that by wrapping statistics extraction into pm_runtime_get_sync() ...
 >pm_runtime_put_autosuspend() braces.
 >
 >Signed-off-by: Nikita Yushchenko 
 >---

Acked-by: Fugang Duan 

 > drivers/net/ethernet/freescale/fec_main.c | 11 ++-
 > 1 file changed, 10 insertions(+), 1 deletion(-)
 >
 >diff --git a/drivers/net/ethernet/freescale/fec_main.c
 >b/drivers/net/ethernet/freescale/fec_main.c
 >index 5aa9d4ded214..9c7592b80ce8 100644
 >--- a/drivers/net/ethernet/freescale/fec_main.c
 >+++ b/drivers/net/ethernet/freescale/fec_main.c
 >@@ -2317,10 +2317,19 @@ static void fec_enet_get_ethtool_stats(struct
 >net_device *dev,
 >  struct ethtool_stats *stats, u64 *data)  {
 >  struct fec_enet_private *fep = netdev_priv(dev);
 >- int i;
 >+ int i, ret;
 >+
 >+ ret = pm_runtime_get_sync(&fep->pdev->dev);
 >+ if (IS_ERR_VALUE(ret)) {
 >+ memset(data, 0, sizeof(*data) * ARRAY_SIZE(fec_stats));
 >+ return;
 >+ }
 >
 >  for (i = 0; i < ARRAY_SIZE(fec_stats); i++)
 >  data[i] = readl(fep->hwp + fec_stats[i].offset);
 >+
 >+ pm_runtime_mark_last_busy(&fep->pdev->dev);
 >+ pm_runtime_put_autosuspend(&fep->pdev->dev);
 > }
 >
 > static void fec_enet_get_strings(struct net_device *netdev,
 >--
 >2.1.4
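
The shape of the fix (take a power reference, read the counters, drop the reference, and zero the output if the device cannot be resumed) can be sketched in userspace with stand-in functions. Everything named mock_* below is hypothetical and is not the kernel runtime-PM API; the sketch only mirrors the control flow of the patch.

```c
#include <stdint.h>
#include <string.h>

#define NSTATS 4

/* Hypothetical device model standing in for the FEC and its clock. */
struct mock_dev {
	int powered;		/* nonzero while the pretend clock runs */
	int usage;		/* outstanding power references */
	int fail_resume;	/* force mock_power_get() to fail */
	uint32_t regs[NSTATS];
};

static int mock_power_get(struct mock_dev *d)
{
	if (d->fail_resume)
		return -1;
	d->usage++;
	d->powered = 1;
	return 0;
}

static void mock_power_put(struct mock_dev *d)
{
	/* Power down once the last reference is dropped. */
	if (--d->usage == 0)
		d->powered = 0;
}

static void mock_get_stats(struct mock_dev *d, uint64_t *data)
{
	int i;

	/* Never touch registers unless the device is powered. */
	if (mock_power_get(d) < 0) {
		memset(data, 0, sizeof(*data) * NSTATS);
		return;
	}

	for (i = 0; i < NSTATS; i++)
		data[i] = d->regs[i];

	mock_power_put(d);
}
```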

Re: INFO: rcu_sched detected stalls on CPUs/tasks with `kswapd` and `mem_cgroup_shrink_node`

2016-11-26 Thread Christopher S. Aker


> On Nov 24, 2016, at 5:15 AM, Michal Hocko  wrote:
> 
>> * No rcu_* warnings on that machine with 4.7.2, but with 4.8.4 , 4.8.6 ,
>> 4.8.8 and now 4.9.0-rc5+Pauls patch
> 
> I assume you haven't tried the Linus 4.8 kernel without any further
> stable patches? Just to be sure we are not talking about some later
> regression which found its way to the stable tree.

We are also seeing this frequently on our fleet since moving from 4.7.x to 4.8. 
This is from a machine running vanilla 4.8.6 just a few moments ago:

INFO: rcu_sched detected stalls on CPUs/tasks:
13-...: (420 ticks this GP) idle=ce1/140/0 
softirq=225550784/225550904 fqs=87105 
(detected by 26, t=600030 jiffies, g=68185325, c=68185324, q=344996)
Task dump for CPU 13:
kswapd1 R  running task12200  1840  2 0x0808
 0001 0034 012b 3139
 8b643fffb000 8b028cee7cf8 8b028cee7cf8 8b028cee7d08
 8b028cee7d08 8b028cee7d18 8b028cee7d18 8b02
Call Trace:
 [] ? shrink_node+0xcd/0x2f0
 [] ? kswapd+0x304/0x710
 [] ? mem_cgroup_shrink_node+0x160/0x160
 [] ? kthread+0xc4/0xe0
 [] ? ret_from_fork+0x1f/0x40
 [] ? kthread_worker_fn+0x140/0x140

The machine will lag terribly during these occurrences .. some will eventually 
recover, some will spiral down and require a reboot.

-Chris

[PATCH 3/3] hwmon: tmp108: Update driver to use hwmon_chip_info.

2016-11-26 Thread John Muir

Move the tmp108 driver from hwmon attribute groups to
hwmon_chip_info.

Signed-off-by: John Muir 
---
 drivers/hwmon/tmp108.c | 142 +
 1 file changed, 73 insertions(+), 69 deletions(-)

diff --git a/drivers/hwmon/tmp108.c b/drivers/hwmon/tmp108.c
index b13d652..29ddc2e 100644
--- a/drivers/hwmon/tmp108.c
+++ b/drivers/hwmon/tmp108.c
@@ -25,7 +25,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 
 #define DRIVER_NAME "tmp108"
@@ -87,12 +86,6 @@ struct tmp108 {
unsigned long ready_time;
 };
 
-static const u8 tmp108_temp_reg[TMP108_TEMP_REG_COUNT] = {
-   TMP108_REG_TEMP,
-   TMP108_REG_TLOW,
-   TMP108_REG_THIGH,
-};
-
 /* convert 12-bit TMP108 register value to milliCelsius */
 static inline int tmp108_temp_reg_to_mC(s16 val)
 {
@@ -105,23 +98,28 @@ static inline u16 tmp108_mC_to_temp_reg(int val)
return (val * 256) / 1000;
 }
 
-static int tmp108_read_reg_temp(struct device *dev, int reg, int *temp)
+static int tmp108_read(struct device *dev, enum hwmon_sensor_types type,
+  u32 attr, int channel, long *temp)
 {
struct tmp108 *tmp108 = dev_get_drvdata(dev);
unsigned int regval;
-   int err;
+   int err, reg;
 
-   switch (reg) {
-   case TMP108_REG_TEMP:
+   switch (attr) {
+   case hwmon_temp_input:
/* Is it too early to return a conversion ? */
if (time_before(jiffies, tmp108->ready_time)) {
dev_dbg(dev, "%s: Conversion not ready yet..\n",
__func__);
return -EAGAIN;
}
+   reg = TMP108_REG_TEMP;
+   break;
+   case hwmon_temp_min:
+   reg = TMP108_REG_TLOW;
break;
-   case TMP108_REG_TLOW:
-   case TMP108_REG_THIGH:
+   case hwmon_temp_max:
+   reg = TMP108_REG_THIGH;
break;
default:
return -EOPNOTSUPP;
@@ -135,68 +133,79 @@ static int tmp108_read_reg_temp(struct device *dev, int reg, int *temp)
return 0;
 }
 
-static ssize_t tmp108_show_temp(struct device *dev,
-   struct device_attribute *attr,
-   char *buf)
+static int tmp108_write(struct device *dev, enum hwmon_sensor_types type,
+   u32 attr, int channel, long temp)
 {
-   struct sensor_device_attribute *sda = to_sensor_dev_attr(attr);
-   int temp;
-   int err;
-
-   if (sda->index >= ARRAY_SIZE(tmp108_temp_reg))
-   return -EINVAL;
+   struct tmp108 *tmp108 = dev_get_drvdata(dev);
+   int reg;
 
-   err = tmp108_read_reg_temp(dev, tmp108_temp_reg[sda->index], &temp);
-   if (err)
-   return err;
+   switch (attr) {
+   case hwmon_temp_min:
+   reg = TMP108_REG_TLOW;
+   break;
+   case hwmon_temp_max:
+   reg = TMP108_REG_THIGH;
+   break;
+   default:
+   return -EOPNOTSUPP;
+   }
 
-   return snprintf(buf, PAGE_SIZE, "%d\n", temp);
+   temp = clamp_val(temp, TMP108_TEMP_MIN_MC, TMP108_TEMP_MAX_MC);
+   return regmap_write(tmp108->regmap, reg, tmp108_mC_to_temp_reg(temp));
 }
 
-static ssize_t tmp108_set_temp(struct device *dev,
-  struct device_attribute *attr,
-  const char *buf, size_t count)
+static umode_t tmp108_is_visible(const void *data, enum hwmon_sensor_types type,
+u32 attr, int channel)
 {
-   struct sensor_device_attribute *sda = to_sensor_dev_attr(attr);
-   struct tmp108 *tmp108 = dev_get_drvdata(dev);
-   long temp;
-   int err;
+   if (type != hwmon_temp)
+   return 0;
+
+   switch (attr) {
+   case hwmon_temp_input:
+   return S_IRUGO;
+   case hwmon_temp_min:
+   case hwmon_temp_max:
+   return S_IRUGO | S_IWUSR;
+   default:
+   return 0;
+   }
+}
 
-   if (sda->index >= ARRAY_SIZE(tmp108_temp_reg))
-   return -EINVAL;
+static u32 tmp108_chip_config[] = {
+   HWMON_C_REGISTER_TZ,
+   0
+};
 
-   if (kstrtol(buf, 10, &temp) < 0)
-   return -EINVAL;
+static const struct hwmon_channel_info tmp108_chip = {
+   .type = hwmon_chip,
+   .config = tmp108_chip_config,
+};
 
-   temp = clamp_val(temp, TMP108_TEMP_MIN_MC, TMP108_TEMP_MAX_MC);
-   err = regmap_write(tmp108->regmap, tmp108_temp_reg[sda->index],
-  tmp108_mC_to_temp_reg(temp));
-   if (err)
-   return err;
-   return count;
-}
+static u32 tmp108_temp_config[] = {
+   HWMON_T_INPUT | HWMON_T_MAX | HWMON_T_MIN,
+   0
+};
 
-static SENSOR_DEVICE_ATTR(temp1_input, S_IRUGO, tmp108_show_temp, NULL, 0);
-static SENSOR_DEVICE_ATTR(temp1_min, S_IWUSR | S_IRUGO,
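
The register conversion used by the write path can be tried out in userspace. In the sketch below, sketch_mC_to_reg() copies the '(val * 256) / 1000' expression visible in the patch; sketch_reg_to_mC() is assumed to be its inverse, since that function body is not shown here, and the clamp limits are placeholder values rather than the driver's TMP108_TEMP_MIN_MC and TMP108_TEMP_MAX_MC.

```c
#include <stdint.h>

/* Placeholder limits; the driver's real constants are not shown above. */
#define SKETCH_TEMP_MIN_MC	(-50000)
#define SKETCH_TEMP_MAX_MC	127999

/* Assumed inverse of the conversion below: register value to milliCelsius. */
static int sketch_reg_to_mC(int16_t val)
{
	return (val * 1000) / 256;
}

/* Same arithmetic as tmp108_mC_to_temp_reg() in the patch. */
static uint16_t sketch_mC_to_reg(int val)
{
	return (uint16_t)((val * 256) / 1000);
}

/* Plain clamp, standing in for the kernel's clamp_val(). */
static long sketch_clamp(long v, long lo, long hi)
{
	if (v < lo)
		return lo;
	if (v > hi)
		return hi;
	return v;
}
```

With this scaling one register count is 1/256 degree, so the 12-bit reading in the upper bits of the register steps by 16 counts, roughly 62 mC per step.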

[PATCH 2/3] hwmon: tmp108: Use devm variants of registration interfaces.

2016-11-26 Thread John Muir

From: John Muir 

Use the devm hwmon and thermal zone registration functions to
clean up the code and remove the need for an i2c_driver.remove
callback.

Signed-off-by: John Muir 
---
 drivers/hwmon/tmp108.c | 28 
 1 file changed, 8 insertions(+), 20 deletions(-)

diff --git a/drivers/hwmon/tmp108.c b/drivers/hwmon/tmp108.c
index da64517..b13d652 100644
--- a/drivers/hwmon/tmp108.c
+++ b/drivers/hwmon/tmp108.c
@@ -83,8 +83,6 @@
 
 struct tmp108 {
struct regmap *regmap;
-   struct device *hwmon_dev;
-   struct thermal_zone_device *tz;
u16 config;
unsigned long ready_time;
 };
@@ -245,6 +243,7 @@ static int tmp108_probe(struct i2c_client *client,
 {
    struct device *dev = &client->dev;
struct device *hwmon_dev;
+   struct thermal_zone_device *tz;
struct tmp108 *tmp108;
unsigned int regval;
int err;
@@ -353,18 +352,18 @@ static int tmp108_probe(struct i2c_client *client,
if (err)
return err;
 
-   hwmon_dev = hwmon_device_register_with_groups(dev, client->name,
- tmp108, tmp108_groups);
+   hwmon_dev = devm_hwmon_device_register_with_groups(dev, client->name,
+  tmp108,
+  tmp108_groups);
if (IS_ERR(hwmon_dev)) {
dev_dbg(dev, "unable to register hwmon device\n");
return PTR_ERR(hwmon_dev);
}
 
-   tmp108->hwmon_dev = hwmon_dev;
-   tmp108->tz = thermal_zone_of_sensor_register(hwmon_dev, 0, hwmon_dev,
-_of_thermal_ops);
-   if (IS_ERR(tmp108->tz))
-   return PTR_ERR(tmp108->tz);
+   tz = devm_thermal_zone_of_sensor_register(hwmon_dev, 0, hwmon_dev,
+ _of_thermal_ops);
+   if (IS_ERR(tz))
+   return PTR_ERR(tz);
 
dev_info(dev, "%s, alert: active %s, hyst: %uC, conv: %ucHz\n",
 (tmp108->config & TMP108_CONF_TM) != 0 ?
@@ -374,16 +373,6 @@ static int tmp108_probe(struct i2c_client *client,
return 0;
 }
 
-static int tmp108_remove(struct i2c_client *client)
-{
-   struct tmp108 *tmp108 = i2c_get_clientdata(client);
-
-   thermal_zone_of_sensor_unregister(tmp108->hwmon_dev, tmp108->tz);
-   hwmon_device_unregister(tmp108->hwmon_dev);
-
-   return 0;
-}
-
 #ifdef CONFIG_PM_SLEEP
 static int tmp108_suspend(struct device *dev)
 {
@@ -418,7 +407,6 @@ static struct i2c_driver tmp108_driver = {
.driver.name= DRIVER_NAME,
.driver.pm  = _dev_pm_ops,
.probe  = tmp108_probe,
-   .remove = tmp108_remove,
.id_table   = tmp108_id,
 };
 
-- 
2.8.0.rc3.226.g39d4020
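
Why the remove callback becomes unnecessary can be illustrated with a userspace analogy of managed resources: each registration records a release function against its owner, and tearing down the owner releases everything in reverse order. This is only an analogy under assumed names (owner_add(), owner_release_all(), record_release()); it is not how the kernel's devres code is implemented.

```c
#include <stdlib.h>

typedef void (*release_fn)(void *res);

/* One managed resource, linked newest-first. */
struct managed {
	release_fn release;
	void *res;
	struct managed *next;
};

/* Stand-in for the owning device. */
struct owner {
	struct managed *head;
};

/* Register a resource so the owner releases it automatically. */
static int owner_add(struct owner *o, release_fn release, void *res)
{
	struct managed *m = malloc(sizeof(*m));

	if (!m)
		return -1;
	m->release = release;
	m->res = res;
	m->next = o->head;
	o->head = m;
	return 0;
}

/* Release everything, most recently registered first. */
static void owner_release_all(struct owner *o)
{
	while (o->head) {
		struct managed *m = o->head;

		o->head = m->next;
		m->release(m->res);
		free(m);
	}
}

/* Demo release function: records the order in which resources go away. */
static int release_log[8];
static int release_count;

static void record_release(void *res)
{
	release_log[release_count++] = *(int *)res;
}
```

Registering a "hwmon" resource and then a "thermal zone" resource and calling owner_release_all() releases the thermal zone first, which is the ordering the deleted tmp108_remove() performed by hand.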

[PATCH 1/3] hwmon: Add Texas Instruments TMP108 temperature sensor driver.

2016-11-26 Thread John Muir

Add support for the TI TMP108 temperature sensor with some device
configuration parameters.

Signed-off-by: John Muir 
---
 Documentation/devicetree/bindings/hwmon/tmp108.txt |  27 ++
 Documentation/hwmon/tmp108 |  38 ++
 drivers/hwmon/Kconfig  |  11 +
 drivers/hwmon/Makefile |   1 +
 drivers/hwmon/tmp108.c | 429 +
 5 files changed, 506 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/hwmon/tmp108.txt
 create mode 100644 Documentation/hwmon/tmp108
 create mode 100644 drivers/hwmon/tmp108.c

diff --git a/Documentation/devicetree/bindings/hwmon/tmp108.txt 
b/Documentation/devicetree/bindings/hwmon/tmp108.txt
new file mode 100644
index 000..210af63
--- /dev/null
+++ b/Documentation/devicetree/bindings/hwmon/tmp108.txt
@@ -0,0 +1,27 @@
+TMP108 temperature sensor
+-
+
+This device supports I2C only.
+
+Requires node properties:
+- compatible : "ti,tmp108"
+- reg : the I2C address of the device. This is 0x48, 0x49, 0x4a, or 0x4b.
+
+Optional node properties:
+- ti,thermostat-mode-comparitor : (boolean) select the comparitor mode for the
+  thermostat rather than the default interrupt-mode.
+- ti,alert-active-high : (boolean) make the alert pin active-high instead of
+  the default active-low.
+- ti,conversion-rate-cHz : (integer, cHz) select the conversion frequency for
+  continuous mode, in centi-Hz: 25, 100 (default), 400, or 1600.
+- ti,hysteresis : (integer, C) select the hysteresis value: 0, 1, 2, or 4
+  (celcius).
+
+Example:
+   tmp108@48 {
+   compatible = "ti,tmp108";
+   reg = <0x48>;
+   ti,alert-active-high;
+   ti,hysteresis = <2>;
+   ti,conversion-rate-cHz = <25>;
+   };
diff --git a/Documentation/hwmon/tmp108 b/Documentation/hwmon/tmp108
new file mode 100644
index 000..ef2e9a3
--- /dev/null
+++ b/Documentation/hwmon/tmp108
@@ -0,0 +1,38 @@
+Kernel driver tmp108
+
+
+Supported chips:
+  * Texas Instruments TMP108
+Prefix: 'tmp108'
+Addresses scanned: none
+Datasheet: http://www.ti.com/product/tmp108
+
+Author:
+   John Muir 
+
+Description
+---
+
+The Texas Instruments TMP108 implements one temperature sensor. An alert pin
+can be set when temperatures exceed minimum or maximum values plus or minus a
+hysteresis value.
+
+The sensor is accurate to 0.75C over the range of -25 to +85 C, and to 1.0
+degree from -40 to +125 C. Resolution of the sensor is 0.0625 degree. The
+operating temperature has a minimum of -55 C and a maximum of +150 C.
+Hysteresis values can be set to 0, 1, 2, or 4C.
+
+The TMP108 has a programmable update rate that can select between 8, 4, 1, and
+0.5 Hz.
+
+By default the TMP108 reads the temperature continuously. To conserve power,
+the TMP108 has a one-shot mode where the device is normally shut-down. When a
+one shot is requested the temperature is read, the result can be retrieved,
+and then the device is shut down automatically. (This driver only supports
+continuous mode.)
+
+The driver provides the common sysfs-interface for temperatures (see
+Documentation/hwmon/sysfs-interface under Temperatures).
+
+See Documentation/devicetree/bindings/hwmon/tmp108.txt for configuration
+properties.
diff --git a/drivers/hwmon/Kconfig b/drivers/hwmon/Kconfig
index 45cef3d..4c173de 100644
--- a/drivers/hwmon/Kconfig
+++ b/drivers/hwmon/Kconfig
@@ -1591,6 +1591,17 @@ config SENSORS_TMP103
  This driver can also be built as a module.  If so, the module
  will be called tmp103.
 
+config SENSORS_TMP108
+   tristate "Texas Instruments TMP108"
+   depends on I2C
+   select REGMAP_I2C
+   help
+ If you say yes here you get support for Texas Instruments TMP108
+ sensor chips.
+
+ This driver can also be built as a module.  If so, the module
+ will be called tmp108.
+
 config SENSORS_TMP401
tristate "Texas Instruments TMP401 and compatibles"
depends on I2C
diff --git a/drivers/hwmon/Makefile b/drivers/hwmon/Makefile
index aecf4ba..51e5256 100644
--- a/drivers/hwmon/Makefile
+++ b/drivers/hwmon/Makefile
@@ -152,6 +152,7 @@ obj-$(CONFIG_SENSORS_TC74)  += tc74.o
 obj-$(CONFIG_SENSORS_THMC50)   += thmc50.o
 obj-$(CONFIG_SENSORS_TMP102)   += tmp102.o
 obj-$(CONFIG_SENSORS_TMP103)   += tmp103.o
+obj-$(CONFIG_SENSORS_TMP108)   += tmp108.o
 obj-$(CONFIG_SENSORS_TMP401)   += tmp401.o
 obj-$(CONFIG_SENSORS_TMP421)   += tmp421.o
 obj-$(CONFIG_SENSORS_TWL4030_MADC)+= twl4030-madc-hwmon.o
diff --git a/drivers/hwmon/tmp108.c b/drivers/hwmon/tmp108.c
new file mode 100644
index 000..da64517
--- /dev/null
+++ b/drivers/hwmon/tmp108.c
@@ -0,0 +1,429 @@
+/* Texas Instruments TMP108 SMBus temperature sensor driver
+ *
+ * Copyright (C) 2016 John Muir 
+ *
+ * This program is
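
The binding text above lists the permitted values for ti,conversion-rate-cHz (25, 100, 400, 1600) and ti,hysteresis (0, 1, 2, 4). A rough sketch of checking a property value against those lists follows; the function names are hypothetical, and the returned table index is not claimed to match any register encoding.

```c
#include <stddef.h>

/* Permitted values, copied from the binding document above. */
static const unsigned int conv_rates_cHz[] = { 25, 100, 400, 1600 };
static const unsigned int hysteresis_C[] = { 0, 1, 2, 4 };

/* Return the table index of v, or -1 if v is not a permitted value. */
static int lookup_value(const unsigned int *tbl, size_t n, unsigned int v)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (tbl[i] == v)
			return (int)i;
	return -1;
}

static int conv_rate_index(unsigned int cHz)
{
	return lookup_value(conv_rates_cHz,
			    sizeof(conv_rates_cHz) / sizeof(conv_rates_cHz[0]),
			    cHz);
}

static int hysteresis_index(unsigned int deg_c)
{
	return lookup_value(hysteresis_C,
			    sizeof(hysteresis_C) / sizeof(hysteresis_C[0]),
			    deg_c);
}
```

A value such as 50 cHz or a hysteresis of 3 is rejected with -1, so a caller could fall back to the documented default of 100 cHz.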

[PATCH 0/3] Texas Instruments TMP108 temperature sensor driver.

2016-11-26 Thread John Muir

(Attempting a resend due to apparent delivery failure.)

This driver is split into three patches as it is being ported forward from
a Linux 4.4 implementation where it was tested. The final driver code uses
interfaces that are not available in 4.4.

John Muir (3):
  hwmon: Add Texas Instruments TMP108 temperature sensor driver.
  hwmon: tmp108: Use devm variants of registration interfaces.
  hwmon: tmp108: Update driver to use hwmon_chip_info.

 Documentation/devicetree/bindings/hwmon/tmp108.txt |  27 ++
 Documentation/hwmon/tmp108 |  38 ++
 drivers/hwmon/Kconfig  |  11 +
 drivers/hwmon/Makefile |   1 +
 drivers/hwmon/tmp108.c | 421 +
 5 files changed, 498 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/hwmon/tmp108.txt
 create mode 100644 Documentation/hwmon/tmp108
 create mode 100644 drivers/hwmon/tmp108.c

-- 
2.8.0.rc3.226.g39d4020

Re: [PATCH] Avoid nested function definition

2016-11-26 Thread Coly Li

On 2016/11/27 上午6:24, Peter Foley wrote:
> Fixes below error with clang:
> ../drivers/md/bcache/sysfs.c:759:3: error: function definition is not allowed here
> {   return *((uint16_t *) r) - *((uint16_t *) l); }
> ^
> ../drivers/md/bcache/sysfs.c:789:32: error: use of undeclared identifier 'cmp'
> sort(p, n, sizeof(uint16_t), cmp, NULL);
>  ^
> 2 errors generated.
> 
> Signed-off-by: Peter Foley 
> ---
>  drivers/md/bcache/sysfs.c | 8 +---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/md/bcache/sysfs.c b/drivers/md/bcache/sysfs.c
> index b3ff57d61dde..22ac9e6676a0 100644
> --- a/drivers/md/bcache/sysfs.c
> +++ b/drivers/md/bcache/sysfs.c
> @@ -731,6 +731,11 @@ static struct attribute *bch_cache_set_internal_files[] = {
>  };
>  KTYPE(bch_cache_set_internal);
>  
> +static int cmp(const void *l, const void *r)
> +{
> + return *((uint16_t *)r) - *((uint16_t *)l);
> +}
> +
>  SHOW(__bch_cache)
>  {
>   struct cache *ca = container_of(kobj, struct cache, kobj);
> @@ -755,9 +760,6 @@ SHOW(__bch_cache)
>  CACHE_REPLACEMENT(&ca->sb));
>  
>   if (attr == &sysfs_priority_stats) {
> - int cmp(const void *l, const void *r)
> - {   return *((uint16_t *) r) - *((uint16_t *) l); }
> -
>   struct bucket *b;
>   size_t n = ca->sb.nbuckets, i;
>   size_t unused = 0, available = 0, dirty = 0, meta = 0;
> 

I agree with this fix. However, could we use a more distinctive name, such as
__bch_cache_cmp()?

Thanks.

-- 
Coly Li
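
As a minimal sketch of the rename suggested above (not necessarily what was
merged), the comparator can live at file scope under the proposed name. This
is a userspace analog: qsort() from the C library stands in for the kernel's
sort(), and sort_descending_u16() is a hypothetical helper added only for
illustration. Note that identifiers starting with a double underscore are
reserved in ordinary userspace C; the name is kept here only to mirror the
suggestion.

```c
#include <stdint.h>
#include <stdlib.h>

/* File-scope comparator, so no nested function (a GCC extension) is needed. */
static int __bch_cache_cmp(const void *l, const void *r)
{
	/* uint16_t promotes to int, so the subtraction cannot overflow. */
	return *((const uint16_t *)r) - *((const uint16_t *)l);
}

/* Hypothetical helper: sort n values from highest to lowest. */
void sort_descending_u16(uint16_t *p, size_t n)
{
	qsort(p, n, sizeof(uint16_t), __bch_cache_cmp);
}
```

Because the comparator returns r minus l, the array ends up in descending
order, which matches the behaviour of the nested version being removed.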

Re: automatic IRQ affinity for virtio

2016-11-26 Thread Michael S. Tsirkin

On Fri, Nov 25, 2016 at 08:25:38AM +0100, Christoph Hellwig wrote:
> Btw, what's the best way to get any response to this series?
> But this and the predecessor seem to have completely fallen on deaf
> ears.

I'm sorry, I intend to review soon and if OK merge it.

The fact that it depends on tip means I can't put
it in my first pull request (until tip is merged)
so I deferred review somewhat.

Re: [PATCH v2 1/2] virtio: introduce little endian functions for virtio_cread/write# family

2016-11-26 Thread Michael S. Tsirkin

On Tue, Nov 22, 2016 at 04:10:22PM +0800, Gonglei wrote:
> Virtio modern devices are always little endian, so let's introduce
> LE functions for reading and writing the configuration space of
> virtio modern devices, which avoids complaints from Sparse when
> we use virtio_cread/virtio_cwrite in VIRTIO_1 devices.
> 
> Signed-off-by: Gonglei 
> ---
>  include/linux/virtio_config.h | 45 
> +++
>  1 file changed, 45 insertions(+)
> 
> diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> index 26c155b..de05707 100644
> --- a/include/linux/virtio_config.h
> +++ b/include/linux/virtio_config.h
> @@ -414,4 +414,49 @@ static inline void virtio_cwrite64(struct virtio_device *vdev,
>   _r; \
>   })
>  
> +static inline __le16 virtio_cread16_le(struct virtio_device *vdev,
> +  unsigned int offset)
> +{
> + __le16 ret;
> +
> + vdev->config->get(vdev, offset, &ret, sizeof(ret));
> + return ret;
> +}
> +
> +static inline void virtio_cwrite16_le(struct virtio_device *vdev,
> +unsigned int offset, __le16 val)
> +{
> + vdev->config->set(vdev, offset, &val, sizeof(val));
> +}
> +
> +static inline __le32 virtio_cread32_le(struct virtio_device *vdev,
> +  unsigned int offset)
> +{
> + __le32 ret;
> +
> + vdev->config->get(vdev, offset, &ret, sizeof(ret));
> + return ret;
> +}
> +
> +static inline void virtio_cwrite32_le(struct virtio_device *vdev,
> +unsigned int offset, __le32 val)
> +{
> + vdev->config->set(vdev, offset, &val, sizeof(val));
> +}
> +
> +static inline __le64 virtio_cread64_le(struct virtio_device *vdev,
> +  unsigned int offset)
> +{
> + __le64 ret;
> +
> + __virtio_cread_many(vdev, offset, &ret, 1, sizeof(ret));
> + return ret;
> +}
> +
> +static inline void virtio_cwrite64_le(struct virtio_device *vdev,
> +unsigned int offset, __le64 val)
> +{
> + vdev->config->set(vdev, offset, &val, sizeof(val));
> +}
> +
>  #endif /* _LINUX_VIRTIO_CONFIG_H */

Could you please better explain what is the issue you are facing?
virtio_cwrite/virtio_cread all accept and return native types.

If you want it in LE format, swap it!



> -- 
> 1.8.3.1
>
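
The "swap it" advice above can be sketched in plain C. This is a userspace
illustration rather than kernel code: in the kernel the conversion would
normally be done with cpu_to_le32() and le32_to_cpu() applied to the native
value. native_to_le32() and le32_to_native() below are hypothetical stand-ins
for those helpers, written so that they behave the same on little-endian and
big-endian hosts.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for cpu_to_le32(): return a 32-bit object whose
 * in-memory bytes are the little-endian encoding of the native value v.
 */
uint32_t native_to_le32(uint32_t v)
{
	unsigned char b[4] = {
		(unsigned char)(v & 0xff),
		(unsigned char)((v >> 8) & 0xff),
		(unsigned char)((v >> 16) & 0xff),
		(unsigned char)((v >> 24) & 0xff),
	};
	uint32_t le;

	memcpy(&le, b, sizeof(le));
	return le;
}

/* Hypothetical stand-in for le32_to_cpu(): the inverse conversion. */
uint32_t le32_to_native(uint32_t le)
{
	unsigned char b[4];

	memcpy(b, &le, sizeof(b));
	return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
	       ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}
```

With helpers like these, the accessors keep accepting and returning native
types, and a caller that needs the LE representation converts explicitly at
the point of use.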

Re: [PATCH v2 2/2] crypto: add virtio-crypto driver

2016-11-26 Thread Michael S. Tsirkin

On Tue, Nov 22, 2016 at 04:10:23PM +0800, Gonglei wrote:
> This patch introduces virtio-crypto driver for Linux Kernel.
> 
> The virtio crypto device is a virtual cryptography device
> as well as a kind of virtual hardware accelerator for
> virtual machines. The encryption and decryption requests
> are placed in the data queue and are ultimately handled by
> the backend crypto accelerators. The second queue is the
> control queue used to create or destroy sessions for
> symmetric algorithms and will control some advanced features
> in the future. The virtio crypto device provides the following
> crypto services: CIPHER, MAC, HASH, and AEAD.
> 
> For more information about virtio-crypto device, please see:
>   http://qemu-project.org/Features/VirtioCrypto
> 
> CC: Michael S. Tsirkin 
> CC: Cornelia Huck 
> CC: Stefan Hajnoczi 
> CC: Herbert Xu 
> CC: Halil Pasic 
> CC: David S. Miller 
> CC: Zeng Xin 
> Signed-off-by: Gonglei 
> ---
>  MAINTAINERS  |   8 +
>  drivers/crypto/Kconfig   |   2 +
>  drivers/crypto/Makefile  |   1 +
>  drivers/crypto/virtio/Kconfig|  10 +
>  drivers/crypto/virtio/Makefile   |   5 +
>  drivers/crypto/virtio/virtio_crypto.c| 444 +++
>  drivers/crypto/virtio/virtio_crypto_algs.c   | 524 
> +++
>  drivers/crypto/virtio/virtio_crypto_common.h | 124 +++
>  drivers/crypto/virtio/virtio_crypto_mgr.c| 258 +
>  include/uapi/linux/Kbuild|   1 +
>  include/uapi/linux/virtio_crypto.h   | 435 ++
>  include/uapi/linux/virtio_ids.h  |   1 +
>  12 files changed, 1813 insertions(+)
>  create mode 100644 drivers/crypto/virtio/Kconfig
>  create mode 100644 drivers/crypto/virtio/Makefile
>  create mode 100644 drivers/crypto/virtio/virtio_crypto.c
>  create mode 100644 drivers/crypto/virtio/virtio_crypto_algs.c
>  create mode 100644 drivers/crypto/virtio/virtio_crypto_common.h
>  create mode 100644 drivers/crypto/virtio/virtio_crypto_mgr.c
>  create mode 100644 include/uapi/linux/virtio_crypto.h
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 411e3b8..d6b18bb 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -12844,6 +12844,14 @@ S:   Maintained
>  F:   drivers/virtio/virtio_input.c
>  F:   include/uapi/linux/virtio_input.h
>  
> +VIRTIO CRYPTO DRIVER
> +M:  Gonglei 
> +L:  virtualizat...@lists.linux-foundation.org
> +L:  linux-cry...@vger.kernel.org
> +S:  Maintained
> +F:  drivers/crypto/virtio/
> +F:  include/uapi/linux/virtio_crypto.h
> +
>  VIA RHINE NETWORK DRIVER
>  S:   Orphan
>  F:   drivers/net/ethernet/via/via-rhine.c
> diff --git a/drivers/crypto/Kconfig b/drivers/crypto/Kconfig
> index 4d2b81f..7956478 100644
> --- a/drivers/crypto/Kconfig
> +++ b/drivers/crypto/Kconfig
> @@ -555,4 +555,6 @@ config CRYPTO_DEV_ROCKCHIP
>  
>  source "drivers/crypto/chelsio/Kconfig"
>  
> +source "drivers/crypto/virtio/Kconfig"
> +
>  endif # CRYPTO_HW
> diff --git a/drivers/crypto/Makefile b/drivers/crypto/Makefile
> index ad7250f..bc53cb8 100644
> --- a/drivers/crypto/Makefile
> +++ b/drivers/crypto/Makefile
> @@ -32,3 +32,4 @@ obj-$(CONFIG_CRYPTO_DEV_VMX) += vmx/
>  obj-$(CONFIG_CRYPTO_DEV_SUN4I_SS) += sunxi-ss/
>  obj-$(CONFIG_CRYPTO_DEV_ROCKCHIP) += rockchip/
>  obj-$(CONFIG_CRYPTO_DEV_CHELSIO) += chelsio/
> +obj-$(CONFIG_CRYPTO_DEV_VIRTIO) += virtio/
> diff --git a/drivers/crypto/virtio/Kconfig b/drivers/crypto/virtio/Kconfig
> new file mode 100644
> index 000..ceae88c
> --- /dev/null
> +++ b/drivers/crypto/virtio/Kconfig
> @@ -0,0 +1,10 @@
> +config CRYPTO_DEV_VIRTIO
> + tristate "VirtIO crypto driver"
> + depends on VIRTIO
> +select CRYPTO_AEAD
> +select CRYPTO_AUTHENC
> +select CRYPTO_BLKCIPHER
> + default m
> + help
> +   This driver provides support for virtio crypto device. If you
> +   choose 'M' here, this module will be called virtio-crypto.
> diff --git a/drivers/crypto/virtio/Makefile b/drivers/crypto/virtio/Makefile
> new file mode 100644
> index 000..a316e92
> --- /dev/null
> +++ b/drivers/crypto/virtio/Makefile
> @@ -0,0 +1,5 @@
> +obj-$(CONFIG_CRYPTO_DEV_VIRTIO) += virtio-crypto.o
> +virtio-crypto-objs := \
> + virtio_crypto_algs.o \
> + virtio_crypto_mgr.o \
> + virtio_crypto.o
> diff --git a/drivers/crypto/virtio/virtio_crypto.c b/drivers/crypto/virtio/virtio_crypto.c
> new file mode 100644
> index 000..56fdfed
> --- /dev/null
> +++ b/drivers/crypto/virtio/virtio_crypto.c
> @@ -0,0 +1,444 @@
> + /* Driver for Virtio crypto device.
> +  *
> +  * Copyright 2016 HUAWEI TECHNOLOGIES CO., LTD.
> +  *
> +  * This program is free software; you can redistribute it and/or modify
> +  * it under

Re: [git pull] vfs fix

2016-11-26 Thread Al Viro

On Sun, Nov 27, 2016 at 02:51:18AM +, Al Viro wrote:

... and screwed up Cc due to comma in S-O-B of commit in question.
My apologies.

Re: [git pull] vfs fix

2016-11-26 Thread Linus Torvalds

On Sat, Nov 26, 2016 at 6:25 PM, Al Viro  wrote:
>
> Two issues here.  One is that iov_iter_get_pages{,_alloc}() calling
> conventions are fucking ugly.

No arguments there. I was going to suggest adding an "int *npages"
argument and letting that function fill that in together with the page
array, but it's not like that is going to really make the interface
any prettier.

> Anyway, leaving that BUG_ON() had been wrong; I can send a followup
> massaging that thing as you've suggested, if you are interested in
> that.  But keep in mind that the whole iov_iter_get_pages...() calling
> conventions are going to be changed, hopefully soon.

If it indeed happens soon, I can certainly live with the existing code
as a temporary "ugly but it works". But if it ends up being post-4.10
I'd prefer to clean this up independently first, ok?

Linus

Re: [PATCH 0/2] virtio: fix complaint by sparse

2016-11-26 Thread Michael S. Tsirkin

this is in my tree, will be in the next pull request.

On Sat, Nov 26, 2016 at 09:36:50AM +, Gonglei (Arei) wrote:
> Ping...?
> 
> > -Original Message-
> > From: Gonglei (Arei)
> > Sent: Tuesday, November 22, 2016 1:52 PM
> > To: virtualizat...@lists.linux-foundation.org; linux-kernel@vger.kernel.org
> > Cc: m...@redhat.com; jasow...@redhat.com; Gonglei (Arei)
> > Subject: [PATCH 0/2] virtio: fix complaint by sparse
> > 
> > I found some warnings reported by sparse in the virtio code
> > when I checked virtio-crypto's driver stuff. Let's fix them.
> > 
> > Gonglei (2):
> >   virtio_pci_modern: fix complaint by sparse
> >   virtio_ring: fix complaint by sparse
> > 
> >  drivers/virtio/virtio_pci_modern.c | 8 
> >  drivers/virtio/virtio_ring.c   | 4 ++--
> >  2 files changed, 6 insertions(+), 6 deletions(-)
> > 
> > --
> > 1.8.3.1
> >

Re: [git pull] vfs fix

2016-11-26 Thread Al Viro

On Sun, Nov 27, 2016 at 02:25:09AM +, Al Viro wrote:

> Anyway, leaving that BUG_ON() had been wrong; I can send a followup
> massaging that thing as you've suggested, if you are interested in
> that.  But keep in mind that the whole iov_iter_get_pages...() calling
> conventions are going to be changed, hopefully soon.

BTW, speaking of iov_iter_get_pages_alloc() callers and splice:
"ceph: combine as many iovec as possile into one OSD request" has added
static size_t dio_get_pagev_size(const struct iov_iter *it)
{
	const struct iovec *iov = it->iov;
	const struct iovec *iovend = iov + it->nr_segs;
	size_t size;

	size = iov->iov_len - it->iov_offset;
	/*
	 * An iov can be page vectored when both the current tail
	 * and the next base are page aligned.
	 */
	while (PAGE_ALIGNED((iov->iov_base + iov->iov_len)) &&
	       (++iov < iovend && PAGE_ALIGNED((iov->iov_base)))) {
		size += iov->iov_len;
	}
	dout("dio_get_pagevlen len = %zu\n", size);
	return size;
}

... with 'it' possibly being bio_vec-backed iterator.  Could somebody
explain what that code is trying to do?

I would really like to hear details - I'm not saying that the goal mentioned
in the commit message is worthless, but I don't know ceph codebase well
enough to tell what would be a good way to implement that.  In its current
form it's obviously broken.
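
For readers unfamiliar with the function quoted above, here is a rough
userspace sketch of what its loop appears to compute when the iterator really
is backed by an iovec array: the number of bytes in a run of segments where
each segment's tail and the next segment's base are both page aligned. It
assumes 4096-byte pages; pagev_size_sketch(), sketch_page_aligned() and
SKETCH_PAGE_SIZE are hypothetical names, and the sketch says nothing about the
bio_vec-backed case raised in the message.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages */

static int sketch_page_aligned(uintptr_t addr)
{
	return (addr & (SKETCH_PAGE_SIZE - 1)) == 0;
}

/*
 * iov and nr_segs describe the segments (nr_segs must be at least 1);
 * iov_offset is how far into the first segment the iterator already is.
 * Base pointers are only inspected as addresses, never dereferenced.
 */
size_t pagev_size_sketch(const struct iovec *iov, size_t nr_segs,
			 size_t iov_offset)
{
	const struct iovec *iovend = iov + nr_segs;
	size_t size = iov->iov_len - iov_offset;

	/* Extend the run while the current tail and the next base are aligned. */
	while (sketch_page_aligned((uintptr_t)iov->iov_base + iov->iov_len) &&
	       ++iov < iovend &&
	       sketch_page_aligned((uintptr_t)iov->iov_base))
		size += iov->iov_len;

	return size;
}
```

The walk only makes sense if the iterator's segment array actually holds
struct iovec entries, which is the assumption the message questions.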

Re: [PATCH 1/2] kbuild: provide include/asm/asm-prototypes.h for ARM

2016-11-26 Thread Nicolas Pitre

On Wed, 23 Nov 2016, Russell King - ARM Linux wrote:

> On Wed, Nov 23, 2016 at 09:33:32AM +, Russell King - ARM Linux wrote:
> > So, I still think the whole approach is wrong - it's added extra
> > fragility that wasn't there with the armksyms.c approach.
> 
> Here's what the diffstat and patch looks like when you combine the
> original commit and the three fixes.  The LoC delta of 25 lines can
> be accounted for as deleted commentry.  So, I think (as I've detailed
> above) that the _technical_ benefit of the approach is very low.
> 
> If we want to move the exports into assembly files, I've no problem
> with that, provided we can do it better than this - and by better I
> mean not creating the fragile asm-prototypes.h which divorces the
> prototypes from everything else.

I think we agree on that.

I see that the revert made it to Linus' tree.  That's probably the best 
outcome for now.

I was working on that better solution I alluded to previously. And it 
isn't really complicated either, with much fewer lines of code.  But 
this can wait for the next merge window.

Nicolas

Re: [git pull] vfs fix

2016-11-26 Thread Al Viro

On Sat, Nov 26, 2016 at 05:48:54PM -0800, Linus Torvalds wrote:
> That's what all the other users do, and that's what should be the
> "right usage pattern", afaik. The number of pages really *is*
> calculated as
> 
>    int n = DIV_ROUND_UP(result + offs, PAGE_SIZE);
> 
> in other iov_iter_get_pages_alloc() callers, although the nfs code
> open-codes it as
> 
> npages = (result + pgbase + PAGE_SIZE - 1) / PAGE_SIZE;
> 
> so it's not a very strong pattern.

Two issues here.  One is that iov_iter_get_pages{,_alloc}() calling
conventions are fucking ugly.  I'm guilty of that atrocity; my only
excuse is that this thing has congealed from many open-coded instances,
quite a few of those appearing only after considerable massage of the
code.  I _hate_ the boilerplate we have in the functions implementing
those for various iov_iter flavours and boilerplate in the callers.

I am going to try and come up with something less atrocious.  As it
is, renaming that variable and adding it to the return value of
iov_iter_get_pages_alloc() is certainly not a problem and would be
prettier, but TBH I just went "yet another place to go into that cleanup".
Shouldn't have.

Another thing is that it was a leftover from "Alexei, could you see if
that thing fixes your reproducer?" - just in case the things _really_ went
insane and it was not a wrong rounding but somehow completely buggered
pipe_get_pages_alloc().  They hadn't.

Anyway, leaving that BUG_ON() had been wrong; I can send a followup
massaging that thing as you've suggested, if you are interested in
that.  But keep in mind that the whole iov_iter_get_pages...() calling
conventions are going to be changed, hopefully soon.

Re: [PATCH 0/3] Texas Instruments TMP108 temperature sensor driver.

2016-11-26 Thread Guenter Roeck


On 11/26/2016 05:26 PM, John Muir wrote:

> This driver is split into three patches as it is being ported forward from
> a Linux 4.4 implementation where it was tested. The final driver code uses
> interfaces that are not available in 4.4.
>
> John Muir (3):
>   hwmon: Add Texas Instruments TMP108 temperature sensor driver.
>   hwmon: tmp108: Use devm variants of registration interfaces.
>   hwmon: tmp108: Update driver to use hwmon_chip_info.
>
>  Documentation/devicetree/bindings/hwmon/tmp108.txt |  27 ++
>  Documentation/hwmon/tmp108 |  38 ++
>  drivers/hwmon/Kconfig  |  11 +
>  drivers/hwmon/Makefile |   1 +
>  drivers/hwmon/tmp108.c | 421 +
>  5 files changed, 498 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/hwmon/tmp108.txt
>  create mode 100644 Documentation/hwmon/tmp108
>  create mode 100644 drivers/hwmon/tmp108.c



Hi John,

Maybe the rest of the patches are delayed, but so far I only see this
introduction. Just an early note, in case the other patches still show up.

Can you send me a register dump for this chip?

Thanks,
Guenter

Re: [git pull] vfs fix

2016-11-26 Thread Linus Torvalds

On Sat, Nov 26, 2016 at 5:13 PM, Al Viro  wrote:
>
> Al Viro (1):
>   fix default_file_splice_read()

Ugh. I absolutely _hate_ this:

BUG_ON(dummy);

because it makes no sense.

I'm assuming that "dummy" here is "start_offset", and that you want to
make sure that there are no initial offsets that would affect the
nrpages calculation.

But dammit, if so, just *call* it "start_offset", not "dummy". A dummy
value is just a place-holder, it makes no sense to have BUG_ON() on
such a value.

So adding random BUG_ON() statements is evil to begin with, but when
you do it on something that is mis-named and makes no sense, that's
just wrong.

I'm further assuming that the reason we can do that is because
"iov_iter_pipe()" has set iov_offset to zero, and as a result we end
up having

  iov_iter_get_pages_alloc() ->
pipe_get_pages_alloc() ->
  data_start() will set *offp to zero.

but quite frankly, you can not tell that from the code itself, which
makes no sense. You have to go digging.

I was hoping the splice code would become more readable, not filled
with more crazy nonsensical code.

So I've pulled this, but _please_:

 - rename "dummy" (which isn't dummy at all now that you *do* things
to it!) to something sane.

   Like perhaps 'pg_offset' or 'iter_offset' or something.

 - Does the "BUG_ON()" really make sense? If the issue is that you
didn't use the offset in calculations, maybe you should just do so, ie
instead of

BUG_ON(dummy);
nr_pages = DIV_ROUND_UP(res, PAGE_SIZE);

   just do

nr_pages = DIV_ROUND_UP(res + iter_offset, PAGE_SIZE);

or something? Even if "iter_offset" ends up always being zero, why is
that worthy of a BUG_ON()? The BUG_ON() is more expensive than just
doing the natural math..

That's what all the other users do, and that's what should be the
"right usage pattern", afaik. The number of pages really *is*
calculated as

   int n = DIV_ROUND_UP(result + offs, PAGE_SIZE);

in other iov_iter_get_pages_alloc() callers, although the nfs code
open-codes it as

npages = (result + pgbase + PAGE_SIZE - 1) / PAGE_SIZE;

so it's not a very strong pattern.

Linus
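
The page-count arithmetic discussed above can be sketched in user-space C. This is only an illustration: PAGE_SIZE is assumed to be 4096 here, DIV_ROUND_UP is re-defined locally as a stand-in for the kernel macro, and pages_spanned() and pages_spanned_open_coded() are hypothetical helper names, not functions from fs/splice.c.

```c
#include <stddef.h>

/* Local stand-ins for the kernel definitions; 4096 is an assumed page size. */
#define PAGE_SIZE 4096UL
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Number of pages touched by `len` bytes that begin `offset` bytes into
 * the first page. Adding the offset before rounding up is what makes the
 * result correct when the data does not start on a page boundary.
 */
static size_t pages_spanned(size_t len, size_t offset)
{
	return DIV_ROUND_UP(len + offset, PAGE_SIZE);
}

/* The open-coded form quoted from the nfs code; same arithmetic. */
static size_t pages_spanned_open_coded(size_t len, size_t offset)
{
	return (len + offset + PAGE_SIZE - 1) / PAGE_SIZE;
}
```

With a zero offset the two-argument form degenerates to DIV_ROUND_UP(len, PAGE_SIZE), so including the offset costs nothing when it happens to be zero and stays correct when it is not.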

[PATCH 0/3] Texas Instruments TMP108 temperature sensor driver.

2016-11-26 Thread John Muir

This driver is split into three patches as it is being ported forward from
a Linux 4.4 implementation where it was tested. The final driver code uses
interfaces that are not available in 4.4.

John Muir (3):
  hwmon: Add Texas Instruments TMP108 temperature sensor driver.
  hwmon: tmp108: Use devm variants of registration interfaces.
  hwmon: tmp108: Update driver to use hwmon_chip_info.

 Documentation/devicetree/bindings/hwmon/tmp108.txt |  27 ++
 Documentation/hwmon/tmp108 |  38 ++
 drivers/hwmon/Kconfig  |  11 +
 drivers/hwmon/Makefile |   1 +
 drivers/hwmon/tmp108.c | 421 +
 5 files changed, 498 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/hwmon/tmp108.txt
 create mode 100644 Documentation/hwmon/tmp108
 create mode 100644 drivers/hwmon/tmp108.c

-- 
2.8.0.rc3.226.g39d4020

[git pull] vfs fix

2016-11-26 Thread Al Viro

The following changes since commit 3ad0e83cf86bcaeb6ca3c37060a3ce866b25fb42:

  Merge branch 'parisc-4.9-4' of 
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux (2016-11-25 
16:47:15 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git for-linus

for you to fetch changes up to 8e54cadab447dae779f80f79c87cbeaea9594f60:

  fix default_file_splice_read() (2016-11-26 20:05:42 -0500)


Al Viro (1):
  fix default_file_splice_read()

 fs/splice.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Re: netlink: GPF in sock_sndtimeo

2016-11-26 Thread Cong Wang

On Sat, Nov 26, 2016 at 7:44 AM, Dmitry Vyukov  wrote:
> Hello,
>
> The following program triggers GPF in sock_sndtimeo:
> https://gist.githubusercontent.com/dvyukov/c19cadd309791cf5cb9b2bf936d3f48d/raw/1743ba0211079a5465d039512b427bc6b59b1a76/gistfile1.txt
>
> On commit 16ae16c6e5616c084168740990fc508bda6655d4 (Nov 24).
>
> general protection fault:  [#1] SMP DEBUG_PAGEALLOC KASAN
> Dumping ftrace buffer:
>(ftrace buffer empty)
> Modules linked in:
> CPU: 1 PID: 19950 Comm: syz-executor Not tainted 4.9.0-rc5+ #54
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> task: 88002a0d0840 task.stack: 88003692
> RIP: 0010:[]  [< inline >] sock_sndtimeo
> include/net/sock.h:2075
> RIP: 0010:[]  []
> netlink_unicast+0xe1/0x730 net/netlink/af_netlink.c:1232
> RSP: 0018:880036926f68  EFLAGS: 00010202
> RAX: 0068 RBX: 880036927000 RCX: c900021d
> RDX: 0d63 RSI: 024000c0 RDI: 0340
> RBP: 880036927028 R08: ed0006ea7aab R09: ed0006ea7aab
> R10: 0001 R11: ed0006ea7aaa R12: dc00
> R13:  R14: 880035de3400 R15: 880035de3400
> FS:  7f90a2fc7700() GS:88003ed0() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 006de0c0 CR3: 35de6000 CR4: 06e0
> Stack:
>  880035de3400 819f02a1 110006d24df4 0004
>  4db40014 880036926fd8  41b58ab3
>  89653c11 86cb3500 819f0345 880035de3400
> Call Trace:
>  [< inline >] audit_replace kernel/audit.c:817
>  [] audit_receive_msg+0x22c9/0x2ce0 kernel/audit.c:894
>  [< inline >] audit_receive_skb kernel/audit.c:1120
>  [] audit_receive+0x1dc/0x360 kernel/audit.c:1133
>  [< inline >] netlink_unicast_kernel net/netlink/af_netlink.c:1214
>  [] netlink_unicast+0x514/0x730 
> net/netlink/af_netlink.c:1240
>  [] netlink_sendmsg+0xaa4/0xe50 
> net/netlink/af_netlink.c:1786
>  [< inline >] sock_sendmsg_nosec net/socket.c:621
>  [] sock_sendmsg+0xcf/0x110 net/socket.c:631
>  [] sock_write_iter+0x32b/0x620 net/socket.c:829
>  [< inline >] new_sync_write fs/read_write.c:499
>  [] __vfs_write+0x4fe/0x830 fs/read_write.c:512
>  [] vfs_write+0x175/0x4e0 fs/read_write.c:560
>  [< inline >] SYSC_write fs/read_write.c:607
>  [] SyS_write+0x100/0x240 fs/read_write.c:599
>  [] do_syscall_64+0x2f4/0x940 arch/x86/entry/common.c:280
>  [] entry_SYSCALL64_slow_path+0x25/0x25
> Code: fe 4c 89 f7 e8 31 16 ff ff 8b 8d 70 ff ff ff 49 89 c7 31 c0 85
> c9 75 25 e8 7d 4a a3 fa 49 8d bd 40 03 00 00 48 89 f8 48 c1 e8 03 <42>
> 80 3c 20 00 0f 85 3a 06 00 00 49 8b 85 40 03 00 00 4c 8d 73
> RIP  [< inline >] sock_sndtimeo include/net/sock.h:2075
> RIP  [] netlink_unicast+0xe1/0x730
> net/netlink/af_netlink.c:1232
>  RSP 
> ---[ end trace 8383a15fba6fdc59 ]---

It is racy on audit_sock, especially on the netns exit path.

Could the following patch help a little bit? Also, I don't see how the
synchronize_net() here could sync with netlink rcv path, since unlike
packets from wire, netlink messages are not handled in BH context
nor do I see any RCU taken on the rcv path.

diff --git a/kernel/audit.c b/kernel/audit.c
index f1ca116..20bc79e 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -1167,10 +1167,13 @@ static void __net_exit audit_net_exit(struct net *net)
 {
        struct audit_net *aunet = net_generic(net, audit_net_id);
        struct sock *sock = aunet->nlsk;
+
+       mutex_lock(&audit_cmd_mutex);
        if (sock == audit_sock) {
                audit_pid = 0;
                audit_sock = NULL;
        }
+       mutex_unlock(&audit_cmd_mutex);
 
        RCU_INIT_POINTER(aunet->nlsk, NULL);
        synchronize_net();
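
The idea behind the patch above, clearing a shared pointer only while holding the lock that its users hold, can be sketched in user-space C with pthreads. Everything here is a hypothetical stand-in: struct conn, active_conn, active_pid, cmd_mutex, conn_exit() and conn_use() are illustrative names, and the sketch assumes the user path takes the same mutex, which the message implies but does not show.

```c
#include <pthread.h>
#include <stddef.h>

struct conn {
	int id;
};

/* Plays the role of the command mutex; all names here are illustrative. */
static pthread_mutex_t cmd_mutex = PTHREAD_MUTEX_INITIALIZER;
static struct conn *active_conn;	/* shared pointer, like audit_sock */
static int active_pid;			/* companion state, like audit_pid */

/*
 * Teardown path: reset the shared state under the lock, so a user that
 * holds the lock never sees the pointer change underneath it.
 */
static void conn_exit(struct conn *c)
{
	pthread_mutex_lock(&cmd_mutex);
	if (c == active_conn) {
		active_pid = 0;
		active_conn = NULL;
	}
	pthread_mutex_unlock(&cmd_mutex);
}

/* User path: returns the connection id, or -1 if it has been torn down. */
static int conn_use(void)
{
	int id = -1;

	pthread_mutex_lock(&cmd_mutex);
	if (active_conn)
		id = active_conn->id;
	pthread_mutex_unlock(&cmd_mutex);
	return id;
}
```

The sketch only covers the mutual-exclusion part; it says nothing about whether synchronize_net() is sufficient for the receive path, which is the open question raised in the message.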

Re: [PATCH v3 net-next 1/2] net: ethernet: slicoss: add slicoss gigabit ethernet driver

2016-11-26 Thread Lino Sanfilippo


Hi Rami,


On 26.11.2016 16:48, Rami Rosen wrote:
>> @@ -0,0 +1,28 @@
>> +config NET_VENDOR_ALACRITECH
>> +bool "Alacritech devices"
>> +default y
>> +---help---
>> +  If you have a network (Ethernet) card belonging to this class, 
>> say Y.
>> +
>> +  Note that the answer to this question doesn't directly affect the
>> +  kernel: saying N will just cause the configurator to skip all
> 
> Shouldn't it be "Alacritech devices" here, as appears earlier ?
> 
>> +  the questions about Renesas devices. If you say Y, you will be 
>> asked

Yes, it definitely should not be Renesas :). This is a stupid copy and paste
error, I will fix it, thank you!

>> +  for your specific device in the following questions.
>> +
> 
> ...
> ...
> ...
>> +struct slic_device {
>> +   struct pci_dev *pdev;
> ...
>> +   bool promisc;
> 
> Seems that the autoneg boolean is not used anywhere, apart from
> setting it once to true in
> the slic_set_link_autoneg() method. Apart from this member it is not
> accessed anywhere, so it seems it should be removed.
> 
>> +   bool autoneg;
>> +   int speed;

Agreed, this variable can be removed.

> ...
> 
>> +static int slic_load_rcvseq_firmware(struct slic_device *sdev)
>> +{
>> +   const struct firmware *fw;
>> +   const char *file;
>> +   u32 codelen;
>> +   int idx = 0;
>> +   u32 instr;
>> +   u32 addr;
>> +   int err;
>> +
> ...
>> +   /* Do an initial sanity check concerning firmware size now. A further
>> +* check follows below.
>> +*/
>> +   if (fw->size < SLIC_FIRMWARE_MIN_SIZE) {
>> +   dev_err(&sdev->pdev->dev,
>> +   "invalid firmware size %zu (min %u expected)\n",
>> +   fw->size, SLIC_FIRMWARE_MIN_SIZE);
>> +   err = -EINVAL;
> 
> in the release label, always 0 is returned:
> 
>> +   goto release;
>> +   }
>> +
>> +   codelen = slic_read_dword_from_firmware(fw, &idx);
>> +
>> +   /* do another sanity check against firmware size */
>> +   if ((codelen + 4) > fw->size) {
>> +   dev_err(&sdev->pdev->dev,
>> +   "invalid rcv-sequencer firmware size %zu\n", 
>> fw->size);
>> +   err = -EINVAL;
> 
> Again, in the release label, always 0 is returned:
> 
>> +   goto release;
>> +   }
>> +
>>
>> +release:
>> +   release_firmware(fw);
>> +
>> +   return 0;
>> +}

This should return "err", I will fix it.

Thanks a lot for the review!

Regards,
Lino
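
The fix agreed on above, returning "err" from the shared release label instead of a constant 0, can be sketched as follows. This is a user-space illustration only: struct fw_blob, fw_release(), fw_check() and FW_MIN_SIZE are hypothetical stand-ins, not the driver's types or values.

```c
#include <errno.h>
#include <stddef.h>

#define FW_MIN_SIZE 64u	/* hypothetical minimum, not the driver's value */

struct fw_blob {	/* hypothetical stand-in for struct firmware */
	size_t size;
	int released;
};

static void fw_release(struct fw_blob *fw)
{
	fw->released = 1;
}

/*
 * Every exit goes through the release label so the blob is always freed,
 * and the label returns err so that failures are not reported as success.
 */
static int fw_check(struct fw_blob *fw, size_t codelen)
{
	int err = 0;

	if (fw->size < FW_MIN_SIZE) {
		err = -EINVAL;
		goto release;
	}

	if (codelen + 4 > fw->size) {
		err = -EINVAL;
		goto release;
	}

release:
	fw_release(fw);

	return err;
}
```

Initialising err to 0 at its declaration keeps the success path correct without a separate assignment before the label.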

Re: [RFC PATCH v8 1/9] Restartable sequences system call

2016-11-26 Thread Paul Turner

On Sat, Aug 27, 2016 at 5:21 AM, Pavel Machek  wrote:
>
> Hi!
>
>> Expose a new system call allowing each thread to register one userspace
>> memory area to be used as an ABI between kernel and user-space for two
>> purposes: user-space restartable sequences and quick access to read the
>> current CPU number value from user-space.
>>
>> * Restartable sequences (per-cpu atomics)
>>
>> Restartable sequences allow user-space to perform update operations on
>> per-cpu data without requiring heavy-weight atomic operations.
>>
>> The restartable critical sections (percpu atomics) work has been started
>> by Paul Turner and Andrew Hunter. It lets the kernel handle restart of
>> critical sections. [1] [2] The re-implementation proposed here brings a
>> few simplifications to the ABI which facilitates porting to other
>> architectures and speeds up the user-space fast path. A locking-based
>> fall-back, purely implemented in user-space, is proposed here to deal
>> with debugger single-stepping. This fallback interacts with rseq_start()
>> and rseq_finish(), which force retries in response to concurrent
>> lock-based activity.
>
> Hmm. Purely software fallback needed for singlestepping... Looks like this is
> malware writer's dream come true...
>
> Also if you ever get bug in the restartable code, debugger will be useless to
> debug it... unless new abilities are added to debuggers to manually schedule
> threads on CPUs.
>
> Is this good idea?

We've had some off-list discussion.

I have a revised version, nearly ready for posting, which incorporates some of
Mathieu's improvements but avoids this requirement.

- Paul

Re: Linux 4.8.11

2016-11-26 Thread Adam Borowski

On Sat, Nov 26, 2016 at 05:12:35PM +0100, Greg KH wrote:
> I'm announcing the release of the 4.8.11 kernel.

... which splats during early boot where 4.8.10 worked fine.

[0.00] Linux version 4.8.11+ (kilobyte@umbar) (gcc version 6.2.1 
20161124 (Debian 6.2.1-5) ) #1 SMP Sat Nov 26 22:32:52 CET 2016
[0.00] Command line: BOOT_IMAGE=/sys/boot/vmlinuz-4.8.11+ 
root=UUID=b7c38da9-ae84-4083-a1f8-6d7b4fc33961 ro rootflags=subvol=sys 
syscall.x32=y syscall.x32=y console=tty1 console=ttyS0,115200,n81
[0.00] x86/fpu: Legacy x87 FPU detected.
[0.00] x86/fpu: Using 'eager' FPU context switches.
[0.00] e820: BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x0009ebff] usable
[0.00] BIOS-e820: [mem 0x0009ec00-0x0009] reserved
[0.00] BIOS-e820: [mem 0x000e4000-0x000f] reserved
[0.00] BIOS-e820: [mem 0x0010-0xc7e7] usable
[0.00] BIOS-e820: [mem 0xc7e8-0xc7e97fff] ACPI data
[0.00] BIOS-e820: [mem 0xc7e98000-0xc7ec] ACPI NVS
[0.00] BIOS-e820: [mem 0xc7ed-0xc7ef] reserved
[0.00] BIOS-e820: [mem 0xff70-0x] reserved
[0.00] BIOS-e820: [mem 0x0001-0x000237ff] usable
[0.00] NX (Execute Disable) protection: active
[0.00] SMBIOS 2.5 present.
[0.00] AGP: No AGP bridge found
[0.00] e820: last_pfn = 0x238000 max_arch_pfn = 0x4
[0.00] x86/PAT: Configuration [0-7]: WB  WC  UC- UC  WB  WC  UC- WT  
[0.00] e820: last_pfn = 0xc7e80 max_arch_pfn = 0x4
[0.00] Using GB pages for direct mapping
[0.00] RAMDISK: [mem 0x3668d000-0x3733dfff]
[0.00] ACPI: Early table checksum verification disabled
[0.00] ACPI: RSDP 0x000FB510 24 (v02 ACPIAM)
[0.00] ACPI: XSDT 0xC7E80100 5C (v01 051811 XSDT1038 
20110518 MSFT 0097)
[0.00] ACPI: FACP 0xC7E80290 F4 (v03 051811 FACP1038 
20110518 MSFT 0097)
[0.00] ACPI BIOS Warning (bug): Optional FADT field Pm2ControlBlock has 
valid Length but zero Address: 0x/0x1 (20160422/tbfadt-658)
[0.00] ACPI BIOS Warning (bug): 32/64X length mismatch in 
FADT/Gpe0Block: 64/32 (20160422/tbfadt-624)
[0.00] ACPI: DSDT 0xC7E80450 00DB3A (v01 A1540  A1540001 
0001 INTL 20060113)
[0.00] ACPI: FACS 0xC7E98000 40
[0.00] ACPI: FACS 0xC7E98000 40
[0.00] ACPI: APIC 0xC7E80390 7C (v01 051811 APIC1038 
20110518 MSFT 0097)
[0.00] ACPI: MCFG 0xC7E80410 3C (v01 051811 OEMMCFG  
20110518 MSFT 0097)
[0.00] ACPI: OEMB 0xC7E98040 72 (v01 051811 OEMB1038 
20110518 MSFT 0097)
[0.00] ACPI: SRAT 0xC7E8F650 000108 (v01 AMDFAM_F_10 
0002 AMD  0001)
[0.00] ACPI: HPET 0xC7E8F760 38 (v01 051811 OEMHPET  
20110518 MSFT 0097)
[0.00] ACPI: SSDT 0xC7E8F7A0 000DA4 (v01 A M I  POWERNOW 
0001 AMD  0001)
[0.00] SRAT: PXM 0 -> APIC 0x00 -> Node 0
[0.00] SRAT: PXM 0 -> APIC 0x01 -> Node 0
[0.00] SRAT: PXM 0 -> APIC 0x02 -> Node 0
[0.00] SRAT: PXM 0 -> APIC 0x03 -> Node 0
[0.00] SRAT: PXM 0 -> APIC 0x04 -> Node 0
[0.00] SRAT: PXM 0 -> APIC 0x05 -> Node 0
[0.00] ACPI: SRAT: Node 0 PXM 0 [mem 0x-0x0009]
[0.00] ACPI: SRAT: Node 0 PXM 0 [mem 0x0010-0xc7ff]
[0.00] ACPI: SRAT: Node 0 PXM 0 [mem 0x1-0x237ff]
[0.00] NUMA: Node 0 [mem 0x-0x0009] + [mem 
0x0010-0xc7ff] -> [mem 0x-0xc7ff]
[0.00] NUMA: Node 0 [mem 0x-0xc7ff] + [mem 
0x1-0x237ff] -> [mem 0x-0x237ff]
[0.00] NODE_DATA(0) allocated [mem 0x237ffa000-0x237ffdfff]
[0.00] Zone ranges:
[0.00]   DMA  [mem 0x1000-0x00ff]
[0.00]   DMA32[mem 0x0100-0x]
[0.00]   Normal   [mem 0x0001-0x000237ff]
[0.00] Movable zone start for each node
[0.00] Early memory node ranges
[0.00]   node   0: [mem 0x1000-0x0009dfff]
[0.00]   node   0: [mem 0x0010-0xc7e7]
[0.00]   node   0: [mem 0x0001-0x000237ff]
[0.00] Initmem setup node 0 [mem 0x1000-0x000237ff]
[0.00] ACPI: PM-Timer IO Port: 0x808
[0.00] IOAPIC[0]: apic_id 6, version 33, address 0xfec0, GSI 0-23
[0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)
[0.00] Using ACPI (MADT) for SMP configuration information
[0.00] ACPI:

[PATCH 02/22] cpufreq/acpi-cpufreq: drop rdmsr_on_cpus() usage

2016-11-26 Thread Sebastian Andrzej Siewior

The online / pre_down callback is invoked on the target CPU since commit
1cf4f629d9d2 ("cpu/hotplug: Move online calls to hotplugged cpu"), which
means the hotplug callbacks can use rdmsrl() instead of rdmsr_on_cpus().

This leaves set_boost() as the only user which still needs to read/write
the MSR on different CPUs. There is no point in doing that update on all
CPUs with the read-modify-write magic via per-CPU data. We can simply
issue a function call on all online CPUs, which also means that we need
half as many IPIs.

Cc: "Rafael J. Wysocki" 
Cc: Viresh Kumar 
Cc: linux...@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior 
---
 drivers/cpufreq/acpi-cpufreq.c | 58 +++---
 1 file changed, 20 insertions(+), 38 deletions(-)

diff --git a/drivers/cpufreq/acpi-cpufreq.c b/drivers/cpufreq/acpi-cpufreq.c
index 2c29cbaca7b5..3a98702b7445 100644
--- a/drivers/cpufreq/acpi-cpufreq.c
+++ b/drivers/cpufreq/acpi-cpufreq.c
@@ -84,7 +84,6 @@ static inline struct acpi_processor_performance 
*to_perf_data(struct acpi_cpufre
 static struct cpufreq_driver acpi_cpufreq_driver;
 
 static unsigned int acpi_pstate_strict;
-static struct msr __percpu *msrs;
 
 static bool boost_state(unsigned int cpu)
 {
@@ -104,11 +103,10 @@ static bool boost_state(unsigned int cpu)
return false;
 }
 
-static void boost_set_msrs(bool enable, const struct cpumask *cpumask)
+static int boost_set_msr(bool enable)
 {
-   u32 cpu;
u32 msr_addr;
-   u64 msr_mask;
+   u64 msr_mask, val;
 
switch (boot_cpu_data.x86_vendor) {
case X86_VENDOR_INTEL:
@@ -120,26 +118,31 @@ static void boost_set_msrs(bool enable, const struct 
cpumask *cpumask)
msr_mask = MSR_K7_HWCR_CPB_DIS;
break;
default:
-   return;
+   return -EINVAL;
}
 
-   rdmsr_on_cpus(cpumask, msr_addr, msrs);
+   rdmsrl(msr_addr, val);
 
-   for_each_cpu(cpu, cpumask) {
-   struct msr *reg = per_cpu_ptr(msrs, cpu);
-   if (enable)
-   reg->q &= ~msr_mask;
-   else
-   reg->q |= msr_mask;
-   }
+   if (enable)
+   val &= ~msr_mask;
+   else
+   val |= msr_mask;
 
-   wrmsr_on_cpus(cpumask, msr_addr, msrs);
+   wrmsrl(msr_addr, val);
+   return 0;
+}
+
+static void boost_set_msr_each(void *p_en)
+{
+   bool enable = (bool) p_en;
+
+   boost_set_msr(enable);
 }
 
 static int set_boost(int val)
 {
get_online_cpus();
-   boost_set_msrs(val, cpu_online_mask);
+   on_each_cpu(boost_set_msr_each, (void *)(long)val, 1);
put_online_cpus();
pr_debug("Core Boosting %sabled.\n", val ? "en" : "dis");
 
@@ -538,29 +541,20 @@ static void free_acpi_perf_data(void)
 
 static int cpufreq_boost_online(unsigned int cpu)
 {
-   const struct cpumask *cpumask;
-
-   cpumask = get_cpu_mask(cpu);
/*
 * On the CPU_UP path we simply keep the boost-disable flag
 * in sync with the current global state.
 */
-   boost_set_msrs(acpi_cpufreq_driver.boost_enabled, cpumask);
-   return 0;
+   return boost_set_msr(acpi_cpufreq_driver.boost_enabled);
 }
 
 static int cpufreq_boost_down_prep(unsigned int cpu)
 {
-   const struct cpumask *cpumask;
-
-   cpumask = get_cpu_mask(cpu);
-
/*
 * Clear the boost-disable bit on the CPU_DOWN path so that
 * this cpu cannot block the remaining ones from boosting.
 */
-   boost_set_msrs(1, cpumask);
-   return 0;
+   return boost_set_msr(1);
 }
 
 /*
@@ -918,11 +912,6 @@ static void __init acpi_cpufreq_boost_init(void)
if (!(boot_cpu_has(X86_FEATURE_CPB) || boot_cpu_has(X86_FEATURE_IDA)))
return;
 
-   msrs = msrs_alloc();
-
-   if (!msrs)
-   return;
-
acpi_cpufreq_driver.set_boost = set_boost;
acpi_cpufreq_driver.boost_enabled = boost_state(0);
 
@@ -934,7 +923,6 @@ static void __init acpi_cpufreq_boost_init(void)
cpufreq_boost_online, cpufreq_boost_down_prep);
if (ret < 0) {
pr_err("acpi_cpufreq: failed to register hotplug callbacks\n");
-   msrs_free(msrs);
return;
}
acpi_cpufreq_online = ret;
@@ -942,14 +930,8 @@ static void __init acpi_cpufreq_boost_init(void)
 
 static void acpi_cpufreq_boost_exit(void)
 {
-   if (!msrs)
-   return;
-
if (acpi_cpufreq_online >= 0)
cpuhp_remove_state_nocalls(acpi_cpufreq_online);
-
-   msrs_free(msrs);
-   msrs = NULL;
 }
 
 static int __init acpi_cpufreq_init(void)
-- 
2.10.2
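
Note on the read-modify-write step: the following is a minimal userspace
sketch of what boost_set_msr() and the on_each_cpu() call above do, not the
driver code itself. The names fake_msr, FAKE_BOOST_DIS, boost_apply(),
fake_boost_each() and fake_set_boost() are hypothetical, the "MSR" is a plain
array with one slot per simulated CPU, and the bit position is an assumption
made only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_FAKE_CPUS 4
/* Hypothetical "boost disable" bit; the position is an assumption. */
#define FAKE_BOOST_DIS (UINT64_C(1) << 25)

/* One simulated register per CPU instead of a real MSR. */
static uint64_t fake_msr[NR_FAKE_CPUS];

/*
 * Clear the disable bit to enable boost, set it to disable boost.
 * All other bits of the value are preserved.
 */
static uint64_t boost_apply(uint64_t val, uint64_t mask, bool enable)
{
	if (enable)
		val &= ~mask;
	else
		val |= mask;
	return val;
}

/* Per-CPU callback: the flag travels through a void pointer. */
static void fake_boost_each(unsigned int cpu, void *p_en)
{
	bool enable = (bool)(intptr_t)p_en;

	fake_msr[cpu] = boost_apply(fake_msr[cpu], FAKE_BOOST_DIS, enable);
}

/* Stand-in for "call the function once on every online CPU". */
static void fake_set_boost(int val)
{
	for (unsigned int cpu = 0; cpu < NR_FAKE_CPUS; cpu++)
		fake_boost_each(cpu, (void *)(intptr_t)val);
}
```

Each simulated CPU does its own local read, modify and write, so no per-CPU
scratch buffer is needed to carry values between a remote read and a remote
write.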

[PATCH 01/22] cpufreq/acpi-cpufreq: Convert to hotplug state machine

2016-11-26 Thread Sebastian Andrzej Siewior

Install the callbacks via the state machine.

Cc: "Rafael J. Wysocki" 
Cc: Viresh Kumar 
Cc: linux...@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior 
---
 drivers/cpufreq/acpi-cpufreq.c | 93 --
 1 file changed, 45 insertions(+), 48 deletions(-)

diff --git a/drivers/cpufreq/acpi-cpufreq.c b/drivers/cpufreq/acpi-cpufreq.c
index 297e9128fe9f..2c29cbaca7b5 100644
--- a/drivers/cpufreq/acpi-cpufreq.c
+++ b/drivers/cpufreq/acpi-cpufreq.c
@@ -536,46 +536,33 @@ static void free_acpi_perf_data(void)
free_percpu(acpi_perf_data);
 }
 
-static int boost_notify(struct notifier_block *nb, unsigned long action,
- void *hcpu)
+static int cpufreq_boost_online(unsigned int cpu)
+{
+   const struct cpumask *cpumask;
+
+   cpumask = get_cpu_mask(cpu);
+   /*
+* On the CPU_UP path we simply keep the boost-disable flag
+* in sync with the current global state.
+*/
+   boost_set_msrs(acpi_cpufreq_driver.boost_enabled, cpumask);
+   return 0;
+}
+
+static int cpufreq_boost_down_prep(unsigned int cpu)
 {
-   unsigned cpu = (long)hcpu;
const struct cpumask *cpumask;
 
cpumask = get_cpu_mask(cpu);
 
/*
 * Clear the boost-disable bit on the CPU_DOWN path so that
-* this cpu cannot block the remaining ones from boosting. On
-* the CPU_UP path we simply keep the boost-disable flag in
-* sync with the current global state.
+* this cpu cannot block the remaining ones from boosting.
 */
-
-   switch (action) {
-   case CPU_DOWN_FAILED:
-   case CPU_DOWN_FAILED_FROZEN:
-   case CPU_ONLINE:
-   case CPU_ONLINE_FROZEN:
-   boost_set_msrs(acpi_cpufreq_driver.boost_enabled, cpumask);
-   break;
-
-   case CPU_DOWN_PREPARE:
-   case CPU_DOWN_PREPARE_FROZEN:
-   boost_set_msrs(1, cpumask);
-   break;
-
-   default:
-   break;
-   }
-
-   return NOTIFY_OK;
+   boost_set_msrs(1, cpumask);
+   return 0;
 }
 
-
-static struct notifier_block boost_nb = {
-   .notifier_call  = boost_notify,
-};
-
 /*
  * acpi_cpufreq_early_init - initialize ACPI P-States library
  *
@@ -922,37 +909,47 @@ static struct cpufreq_driver acpi_cpufreq_driver = {
.attr   = acpi_cpufreq_attr,
 };
 
+static enum cpuhp_state acpi_cpufreq_online;
+
 static void __init acpi_cpufreq_boost_init(void)
 {
-   if (boot_cpu_has(X86_FEATURE_CPB) || boot_cpu_has(X86_FEATURE_IDA)) {
-   msrs = msrs_alloc();
+   int ret;
 
-   if (!msrs)
-   return;
+   if (!(boot_cpu_has(X86_FEATURE_CPB) || boot_cpu_has(X86_FEATURE_IDA)))
+   return;
 
-   acpi_cpufreq_driver.set_boost = set_boost;
-   acpi_cpufreq_driver.boost_enabled = boost_state(0);
+   msrs = msrs_alloc();
 
-   cpu_notifier_register_begin();
+   if (!msrs)
+   return;
 
-   /* Force all MSRs to the same value */
-   boost_set_msrs(acpi_cpufreq_driver.boost_enabled,
-  cpu_online_mask);
+   acpi_cpufreq_driver.set_boost = set_boost;
+   acpi_cpufreq_driver.boost_enabled = boost_state(0);
 
-   __register_cpu_notifier(&boost_nb);
-
-   cpu_notifier_register_done();
+   /*
+* This calls the online callback on all online cpu and forces all
+* MSRs to the same value.
+*/
+   ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "cpufreq/acpi:online",
+   cpufreq_boost_online, cpufreq_boost_down_prep);
+   if (ret < 0) {
+   pr_err("acpi_cpufreq: failed to register hotplug callbacks\n");
+   msrs_free(msrs);
+   return;
}
+   acpi_cpufreq_online = ret;
 }
 
 static void acpi_cpufreq_boost_exit(void)
 {
-   if (msrs) {
-   unregister_cpu_notifier(&boost_nb);
+   if (!msrs)
+   return;
 
-   msrs_free(msrs);
-   msrs = NULL;
-   }
+   if (acpi_cpufreq_online >= 0)
+   cpuhp_remove_state_nocalls(acpi_cpufreq_online);
+
+   msrs_free(msrs);
+   msrs = NULL;
 }
 
 static int __init acpi_cpufreq_init(void)
-- 
2.10.2
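
Note on the callback model: a minimal userspace sketch of the idea behind the
conversion above, assuming a much simpler model than the real hotplug core.
One registration installs an online and a down_prep callback, and the online
callback is run for every CPU that is already up. All names here
(fake_setup_state(), fake_cpu_up(), fake_cpu_down(), demo_online(),
demo_down_prep(), cpu_boost, global_boost) are hypothetical, and error
rollback is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_FAKE_CPUS 4

typedef int (*cpu_cb)(unsigned int cpu);

struct fake_hp_state {
	cpu_cb online;		/* run when a CPU comes up */
	cpu_cb down_prep;	/* run before a CPU goes down */
};

static bool cpu_is_online[NR_FAKE_CPUS] = { true, true, false, false };
static struct fake_hp_state state;

/* Demo payload: a per-CPU copy of a global boost flag, -1 = never set. */
static int global_boost = 1;
static int cpu_boost[NR_FAKE_CPUS] = { -1, -1, -1, -1 };

static int demo_online(unsigned int cpu)
{
	cpu_boost[cpu] = global_boost;	/* keep in sync with global state */
	return 0;
}

static int demo_down_prep(unsigned int cpu)
{
	cpu_boost[cpu] = 1;		/* never block the others from boosting */
	return 0;
}

/* Install the callbacks and run "online" on every CPU that is already up. */
static int fake_setup_state(cpu_cb online, cpu_cb down_prep)
{
	state.online = online;
	state.down_prep = down_prep;

	for (unsigned int cpu = 0; cpu < NR_FAKE_CPUS; cpu++) {
		int ret;

		if (!cpu_is_online[cpu])
			continue;
		ret = online(cpu);
		if (ret < 0)
			return ret;
	}
	return 0;
}

static int fake_cpu_up(unsigned int cpu)
{
	cpu_is_online[cpu] = true;
	return state.online ? state.online(cpu) : 0;
}

static int fake_cpu_down(unsigned int cpu)
{
	int ret = state.down_prep ? state.down_prep(cpu) : 0;

	if (ret < 0)
		return ret;
	cpu_is_online[cpu] = false;
	return 0;
}
```

Compared with a single notifier that switches on an action code, each
direction gets its own small function, and the initial pass over the online
CPUs comes with the registration instead of being open-coded.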

[PATCH 07/22] mm/vmstat: Drop get_online_cpus() from init_cpu_node_state/vmstat_cpu_dead()

2016-11-26 Thread Sebastian Andrzej Siewior

Both functions are already called with protection against CPU hotplug, so
the get_online_cpus() / put_online_cpus() pair can be dropped.

Cc: Andrew Morton 
Cc: Vlastimil Babka 
Cc: Mel Gorman 
Cc: Johannes Weiner 
Cc: linux...@kvack.org
Signed-off-by: Sebastian Andrzej Siewior 
---
 mm/vmstat.c | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/mm/vmstat.c b/mm/vmstat.c
index 604f26a4f696..0b63ffb5c407 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1722,24 +1722,19 @@ static void __init init_cpu_node_state(void)
 {
int cpu;
 
-   get_online_cpus();
for_each_online_cpu(cpu)
node_set_state(cpu_to_node(cpu), N_CPU);
-   put_online_cpus();
 }
 
 static void vmstat_cpu_dead(int node)
 {
int cpu;
 
-   get_online_cpus();
for_each_online_cpu(cpu)
if (cpu_to_node(cpu) == node)
-   goto end;
+   return;
 
node_clear_state(node, N_CPU);
-end:
-   put_online_cpus();
 }
 
 /*
-- 
2.10.2

[PATCH 08/22] mm/vmstat: Avoid on each online CPU loops

2016-11-26 Thread Sebastian Andrzej Siewior

Both iterations over online cpus can be replaced by the proper node
specific functions.

Cc: Andrew Morton 
Cc: Vlastimil Babka 
Cc: Mel Gorman 
Cc: Johannes Weiner 
Cc: linux...@kvack.org
Signed-off-by: Sebastian Andrzej Siewior 
---
 mm/vmstat.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/mm/vmstat.c b/mm/vmstat.c
index 0b63ffb5c407..b96dcec7e7d7 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1720,19 +1720,19 @@ static void __init start_shepherd_timer(void)
 
 static void __init init_cpu_node_state(void)
 {
-   int cpu;
+   int node;
 
-   for_each_online_cpu(cpu)
-   node_set_state(cpu_to_node(cpu), N_CPU);
+   for_each_online_node(node)
+   node_set_state(node, N_CPU);
 }
 
 static void vmstat_cpu_dead(int node)
 {
-   int cpu;
+   const struct cpumask *node_cpus;
 
-   for_each_online_cpu(cpu)
-   if (cpu_to_node(cpu) == node)
-   return;
+   node_cpus = cpumask_of_node(node);
+   if (cpumask_weight(node_cpus) > 0)
+   return;
 
node_clear_state(node, N_CPU);
 }
-- 
2.10.2
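
Note on the per-node check: a minimal userspace sketch of the idea in
vmstat_cpu_dead() above, where the node state is cleared only once the node's
CPU mask is empty. It assumes the per-node mask contains only CPUs that are
still up. The names node_cpus, node_has_cpu, mask_weight() and
fake_cpu_dead() are hypothetical, and a 64-bit integer stands in for a
cpumask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_FAKE_NODES 2

/* Bit n set means simulated CPU n belongs to the node and is still up. */
static uint64_t node_cpus[NR_FAKE_NODES] = { 0x0f, 0xf0 };
static bool node_has_cpu[NR_FAKE_NODES] = { true, true };

/* Count the set bits in the mask. */
static unsigned int mask_weight(uint64_t mask)
{
	unsigned int n = 0;

	while (mask) {
		mask &= mask - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/* Drop a CPU from its node; clear the node state when it was the last one. */
static void fake_cpu_dead(unsigned int cpu, int node)
{
	node_cpus[node] &= ~(UINT64_C(1) << cpu);

	if (mask_weight(node_cpus[node]) > 0)
		return;

	node_has_cpu[node] = false;
}
```

The check looks only at the CPUs of one node, so it does not need to walk
every online CPU in the system.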

[PATCH 12/22] mm/zswap: Convert pool to hotplug state machine

2016-11-26 Thread Sebastian Andrzej Siewior

Install the callbacks via the state machine. Multi state is used to address the
per-pool notifier. Upon adding the instance, the callback is invoked for all
online CPUs, so the manual init can go.

Cc: Seth Jennings 
Cc: linux...@kvack.org
Signed-off-by: Sebastian Andrzej Siewior 
---
 include/linux/cpuhotplug.h |  1 +
 mm/zswap.c | 99 --
 2 files changed, 35 insertions(+), 65 deletions(-)

diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index 62f51a4e8676..c7d0d76ef0ee 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -66,6 +66,7 @@ enum cpuhp_state {
CPUHP_TRACE_RB_PREPARE,
CPUHP_MM_ZS_PREPARE,
CPUHP_MM_ZSWP_MEM_PREPARE,
+   CPUHP_MM_ZSWP_POOL_PREPARE,
CPUHP_TIMERS_DEAD,
CPUHP_NOTF_ERR_INJ_PREPARE,
CPUHP_MIPS_SOC_PREPARE,
diff --git a/mm/zswap.c b/mm/zswap.c
index b13aa5706348..067a0d62f318 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -118,7 +118,7 @@ struct zswap_pool {
struct kref kref;
struct list_head list;
struct work_struct work;
-   struct notifier_block notifier;
+   struct hlist_node node;
char tfm_name[CRYPTO_MAX_ALG_NAME];
 };
 
@@ -376,77 +376,34 @@ static int zswap_dstmem_dead(unsigned int cpu)
return 0;
 }
 
-static int __zswap_cpu_comp_notifier(struct zswap_pool *pool,
-unsigned long action, unsigned long cpu)
+static int zswap_cpu_comp_prepare(unsigned int cpu, struct hlist_node *node)
 {
+   struct zswap_pool *pool = hlist_entry(node, struct zswap_pool, node);
struct crypto_comp *tfm;
 
-   switch (action) {
-   case CPU_UP_PREPARE:
-   if (WARN_ON(*per_cpu_ptr(pool->tfm, cpu)))
-   break;
-   tfm = crypto_alloc_comp(pool->tfm_name, 0, 0);
-   if (IS_ERR_OR_NULL(tfm)) {
-   pr_err("could not alloc crypto comp %s : %ld\n",
-  pool->tfm_name, PTR_ERR(tfm));
-   return NOTIFY_BAD;
-   }
-   *per_cpu_ptr(pool->tfm, cpu) = tfm;
-   break;
-   case CPU_DEAD:
-   case CPU_UP_CANCELED:
-   tfm = *per_cpu_ptr(pool->tfm, cpu);
-   if (!IS_ERR_OR_NULL(tfm))
-   crypto_free_comp(tfm);
-   *per_cpu_ptr(pool->tfm, cpu) = NULL;
-   break;
-   default:
-   break;
+   if (WARN_ON(*per_cpu_ptr(pool->tfm, cpu)))
+   return 0;
+
+   tfm = crypto_alloc_comp(pool->tfm_name, 0, 0);
+   if (IS_ERR_OR_NULL(tfm)) {
+   pr_err("could not alloc crypto comp %s : %ld\n",
+  pool->tfm_name, PTR_ERR(tfm));
+   return -ENOMEM;
}
-   return NOTIFY_OK;
-}
-
-static int zswap_cpu_comp_notifier(struct notifier_block *nb,
-  unsigned long action, void *pcpu)
-{
-   unsigned long cpu = (unsigned long)pcpu;
-   struct zswap_pool *pool = container_of(nb, typeof(*pool), notifier);
-
-   return __zswap_cpu_comp_notifier(pool, action, cpu);
-}
-
-static int zswap_cpu_comp_init(struct zswap_pool *pool)
-{
-   unsigned long cpu;
-
-   memset(&pool->notifier, 0, sizeof(pool->notifier));
-   pool->notifier.notifier_call = zswap_cpu_comp_notifier;
-
-   cpu_notifier_register_begin();
-   for_each_online_cpu(cpu)
-   if (__zswap_cpu_comp_notifier(pool, CPU_UP_PREPARE, cpu) ==
-   NOTIFY_BAD)
-   goto cleanup;
-   __register_cpu_notifier(&pool->notifier);
-   cpu_notifier_register_done();
+   *per_cpu_ptr(pool->tfm, cpu) = tfm;
return 0;
-
-cleanup:
-   for_each_online_cpu(cpu)
-   __zswap_cpu_comp_notifier(pool, CPU_UP_CANCELED, cpu);
-   cpu_notifier_register_done();
-   return -ENOMEM;
 }
 
-static void zswap_cpu_comp_destroy(struct zswap_pool *pool)
+static int zswap_cpu_comp_dead(unsigned int cpu, struct hlist_node *node)
 {
-   unsigned long cpu;
+   struct zswap_pool *pool = hlist_entry(node, struct zswap_pool, node);
+   struct crypto_comp *tfm;
 
-   cpu_notifier_register_begin();
-   for_each_online_cpu(cpu)
-   __zswap_cpu_comp_notifier(pool, CPU_UP_CANCELED, cpu);
-   __unregister_cpu_notifier(&pool->notifier);
-   cpu_notifier_register_done();
+   tfm = *per_cpu_ptr(pool->tfm, cpu);
+   if (!IS_ERR_OR_NULL(tfm))
+   crypto_free_comp(tfm);
+   *per_cpu_ptr(pool->tfm, cpu) = NULL;
+   return 0;
 }
 
 /*
@@ -527,6 +484,7 @@ static struct zswap_pool *zswap_pool_create(char *type, 
char *compressor)
struct zswap_pool *pool;
char name[38]; /* 'zswap' + 32 char (max) num + \0 */
gfp_t gfp = __GFP_NORETRY | __GFP_NOWARN | __GFP_KSWAPD_RECLAIM;
+
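
Note on the multi-instance model: a minimal userspace sketch of the pattern
used by zswap_cpu_comp_prepare() above, where one callback serves many pools
and finds its pool from the embedded list node. It assumes a far simpler
model than the real hotplug core. The names fake_node, fake_pool,
fake_entry(), fake_prepare(), fake_add_instance() and fake_cpu_up() are
hypothetical, and an integer flag stands in for the per-CPU compression
context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_FAKE_CPUS 4

struct fake_node {
	struct fake_node *next;
};

struct fake_pool {
	int tfm_ready[NR_FAKE_CPUS];	/* stand-in for per-CPU state */
	struct fake_node node;		/* links the pool into the state */
};

/* Recover the containing structure from a pointer to its member. */
#define fake_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static bool fake_online[NR_FAKE_CPUS] = { true, true, false, false };
static struct fake_node *fake_instances;

/* Shared callback: set up the per-CPU state of the pool owning "node". */
static int fake_prepare(unsigned int cpu, struct fake_node *node)
{
	struct fake_pool *pool = fake_entry(node, struct fake_pool, node);

	pool->tfm_ready[cpu] = 1;
	return 0;
}

/* Add one instance and run the callback for every CPU that is already up. */
static int fake_add_instance(struct fake_node *node)
{
	for (unsigned int cpu = 0; cpu < NR_FAKE_CPUS; cpu++) {
		int ret;

		if (!fake_online[cpu])
			continue;
		ret = fake_prepare(cpu, node);
		if (ret < 0)
			return ret;
	}
	node->next = fake_instances;
	fake_instances = node;
	return 0;
}

/* A CPU coming up runs the callback once per registered instance. */
static int fake_cpu_up(unsigned int cpu)
{
	fake_online[cpu] = true;
	for (struct fake_node *n = fake_instances; n; n = n->next) {
		int ret = fake_prepare(cpu, n);

		if (ret < 0)
			return ret;
	}
	return 0;
}
```

Because adding an instance already runs the callback for the CPUs that are
up, a pool does not need its own loop over online CPUs at creation time.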

[PATCH 06/22] tracing/rb: Convert to hotplug state machine

2016-11-26 Thread Sebastian Andrzej Siewior

Install the callbacks via the state machine. The notifier in struct
ring_buffer is replaced by the multi instance interface.  Upon
__ring_buffer_alloc() invocation, cpuhp_state_add_instance() will invoke
the trace_rb_cpu_prepare() on each CPU.

This callback may now fail. This means __ring_buffer_alloc() will fail and
clean up (as before), and during a CPU up event this failure will prevent
the CPU from coming up.

Cc: Steven Rostedt 
Cc: Ingo Molnar 
Signed-off-by: Sebastian Andrzej Siewior 
---
 include/linux/cpuhotplug.h  |   1 +
 include/linux/ring_buffer.h |   6 ++
 kernel/trace/ring_buffer.c  | 133 +++-
 kernel/trace/trace.c|  15 -
 4 files changed, 65 insertions(+), 90 deletions(-)

diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index e3771fb959c0..18bcfeb2463e 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -62,6 +62,7 @@ enum cpuhp_state {
CPUHP_TOPOLOGY_PREPARE,
CPUHP_NET_IUCV_PREPARE,
CPUHP_ARM_BL_PREPARE,
+   CPUHP_TRACE_RB_PREPARE,
CPUHP_TIMERS_DEAD,
CPUHP_NOTF_ERR_INJ_PREPARE,
CPUHP_MIPS_SOC_PREPARE,
diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index 4acc552e9279..b6d4568795a7 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -198,4 +198,10 @@ enum ring_buffer_flags {
RB_FL_OVERWRITE = 1 << 0,
 };
 
+#ifdef CONFIG_RING_BUFFER
+int trace_rb_cpu_prepare(unsigned int cpu, struct hlist_node *node);
+#else
+#define trace_rb_cpu_prepare   NULL
+#endif
+
 #endif /* _LINUX_RING_BUFFER_H */
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 9c143739b8d7..a7a055f167c7 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -479,9 +479,7 @@ struct ring_buffer {
 
struct ring_buffer_per_cpu  **buffers;
 
-#ifdef CONFIG_HOTPLUG_CPU
-   struct notifier_block   cpu_notify;
-#endif
+   struct hlist_node   node;
u64 (*clock)(void);
 
struct rb_irq_work  irq_work;
@@ -1274,11 +1272,6 @@ static void rb_free_cpu_buffer(struct 
ring_buffer_per_cpu *cpu_buffer)
kfree(cpu_buffer);
 }
 
-#ifdef CONFIG_HOTPLUG_CPU
-static int rb_cpu_notify(struct notifier_block *self,
-unsigned long action, void *hcpu);
-#endif
-
 /**
  * __ring_buffer_alloc - allocate a new ring_buffer
  * @size: the size in bytes per cpu that is needed.
@@ -1296,6 +1289,7 @@ struct ring_buffer *__ring_buffer_alloc(unsigned long 
size, unsigned flags,
long nr_pages;
int bsize;
int cpu;
+   int ret;
 
/* keep it in its own cache line */
buffer = kzalloc(ALIGN(sizeof(*buffer), cache_line_size()),
@@ -1318,17 +1312,6 @@ struct ring_buffer *__ring_buffer_alloc(unsigned long 
size, unsigned flags,
if (nr_pages < 2)
nr_pages = 2;
 
-   /*
-* In case of non-hotplug cpu, if the ring-buffer is allocated
-* in early initcall, it will not be notified of secondary cpus.
-* In that off case, we need to allocate for all possible cpus.
-*/
-#ifdef CONFIG_HOTPLUG_CPU
-   cpu_notifier_register_begin();
-   cpumask_copy(buffer->cpumask, cpu_online_mask);
-#else
-   cpumask_copy(buffer->cpumask, cpu_possible_mask);
-#endif
buffer->cpus = nr_cpu_ids;
 
bsize = sizeof(void *) * nr_cpu_ids;
@@ -1337,19 +1320,15 @@ struct ring_buffer *__ring_buffer_alloc(unsigned long 
size, unsigned flags,
if (!buffer->buffers)
goto fail_free_cpumask;
 
-   for_each_buffer_cpu(buffer, cpu) {
-   buffer->buffers[cpu] =
-   rb_allocate_cpu_buffer(buffer, nr_pages, cpu);
-   if (!buffer->buffers[cpu])
-   goto fail_free_buffers;
-   }
+   cpu = raw_smp_processor_id();
+   cpumask_set_cpu(cpu, buffer->cpumask);
+   buffer->buffers[cpu] = rb_allocate_cpu_buffer(buffer, nr_pages, cpu);
+   if (!buffer->buffers[cpu])
+   goto fail_free_buffers;
 
-#ifdef CONFIG_HOTPLUG_CPU
-   buffer->cpu_notify.notifier_call = rb_cpu_notify;
-   buffer->cpu_notify.priority = 0;
-   __register_cpu_notifier(&buffer->cpu_notify);
-   cpu_notifier_register_done();
-#endif
+   ret = cpuhp_state_add_instance(CPUHP_TRACE_RB_PREPARE, &buffer->node);
+   if (ret < 0)
+   goto fail_free_buffers;
 
mutex_init(&buffer->mutex);
 
@@ -1364,9 +1343,6 @@ struct ring_buffer *__ring_buffer_alloc(unsigned long 
size, unsigned flags,
 
  fail_free_cpumask:
free_cpumask_var(buffer->cpumask);
-#ifdef CONFIG_HOTPLUG_CPU
-   cpu_notifier_register_done();
-#endif
 
  fail_free_buffer:
kfree(buffer);
@@ -1383,18 +1359,11 @@ ring_buffer_free(struct ring_buffer *buffer)
 {
int cpu;

[PATCH 07/22] mm/vmstat: Drop get_online_cpus() from init_cpu_node_state/vmstat_cpu_dead()

2016-11-26 Thread Sebastian Andrzej Siewior

Both functions are called with protection against cpu hotplug already so
*_online_cpus() could be dropped.

Cc: Andrew Morton 
Cc: Vlastimil Babka 
Cc: Mel Gorman 
Cc: Johannes Weiner 
Cc: linux...@kvack.org
Signed-off-by: Sebastian Andrzej Siewior 
---
 mm/vmstat.c | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/mm/vmstat.c b/mm/vmstat.c
index 604f26a4f696..0b63ffb5c407 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1722,24 +1722,19 @@ static void __init init_cpu_node_state(void)
 {
int cpu;
 
-   get_online_cpus();
for_each_online_cpu(cpu)
node_set_state(cpu_to_node(cpu), N_CPU);
-   put_online_cpus();
 }
 
 static void vmstat_cpu_dead(int node)
 {
int cpu;
 
-   get_online_cpus();
for_each_online_cpu(cpu)
if (cpu_to_node(cpu) == node)
-   goto end;
+   return;
 
node_clear_state(node, N_CPU);
-end:
-   put_online_cpus();
 }
 
 /*
-- 
2.10.2

[PATCH 08/22] mm/vmstat: Avoid on each online CPU loops

2016-11-26 Thread Sebastian Andrzej Siewior

Both iterations over online CPUs can be replaced by the proper
node-specific functions.

Cc: Andrew Morton 
Cc: Vlastimil Babka 
Cc: Mel Gorman 
Cc: Johannes Weiner 
Cc: linux...@kvack.org
Signed-off-by: Sebastian Andrzej Siewior 
---
 mm/vmstat.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/mm/vmstat.c b/mm/vmstat.c
index 0b63ffb5c407..b96dcec7e7d7 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1720,19 +1720,19 @@ static void __init start_shepherd_timer(void)
 
 static void __init init_cpu_node_state(void)
 {
-   int cpu;
+   int node;
 
-   for_each_online_cpu(cpu)
-   node_set_state(cpu_to_node(cpu), N_CPU);
+   for_each_online_node(node)
+   node_set_state(node, N_CPU);
 }
 
 static void vmstat_cpu_dead(int node)
 {
-   int cpu;
+   const struct cpumask *node_cpus;
 
-   for_each_online_cpu(cpu)
-   if (cpu_to_node(cpu) == node)
-   return;
+   node_cpus = cpumask_of_node(node);
+   if (cpumask_weight(node_cpus) > 0)
+   return;
 
node_clear_state(node, N_CPU);
 }
-- 
2.10.2
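
The change to vmstat_cpu_dead() can be pictured with a small userspace
model. This is only a sketch: model_cpumask_t, the fixed
four-CPUs-per-node topology and every model_* helper are assumptions made
for illustration, not kernel interfaces, and the sketch assumes that the
per-node mask contains online CPUs only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NR_CPUS       64
#define MODEL_CPUS_PER_NODE 4	/* assumed fixed topology, illustration only */
#define MODEL_NR_NODES      (MODEL_NR_CPUS / MODEL_CPUS_PER_NODE)

/* Bit n set means CPU n is online (hypothetical stand-in for a cpumask). */
typedef uint64_t model_cpumask_t;

static int model_cpu_to_node(int cpu)
{
	return cpu / MODEL_CPUS_PER_NODE;
}

/* Online CPUs that belong to @node. */
static model_cpumask_t model_cpumask_of_node(model_cpumask_t online, int node)
{
	model_cpumask_t node_bits = ((model_cpumask_t)1 << MODEL_CPUS_PER_NODE) - 1;

	assert(node >= 0 && node < MODEL_NR_NODES);
	return online & (node_bits << (node * MODEL_CPUS_PER_NODE));
}

/* Number of set bits in @mask. */
static unsigned int model_cpumask_weight(model_cpumask_t mask)
{
	unsigned int weight = 0;

	while (mask) {
		mask &= mask - 1;	/* clear the lowest set bit */
		weight++;
	}
	return weight;
}

/* Old shape: walk every online CPU and compare its node. */
static bool model_node_has_cpu_loop(model_cpumask_t online, int node)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (((online >> cpu) & 1) && model_cpu_to_node(cpu) == node)
			return true;
	return false;
}

/* New shape: ask the node's own mask whether any CPU is left. */
static bool model_node_has_cpu_mask(model_cpumask_t online, int node)
{
	return model_cpumask_weight(model_cpumask_of_node(online, node)) > 0;
}
```

Both helpers give the same answer for any online mask in this model; the
mask form simply does not need to visit CPUs that belong to other nodes.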

[PATCH 12/22] mm/zswap: Convert pool to hotplug state machine

2016-11-26 Thread Sebastian Andrzej Siewior

Install the callbacks via the state machine. Multi state is used to address
the per-pool notifier. Upon adding the instance, the callback is invoked for
all online CPUs, so the manual init can go.

Cc: Seth Jennings 
Cc: linux...@kvack.org
Signed-off-by: Sebastian Andrzej Siewior 
---
 include/linux/cpuhotplug.h |  1 +
 mm/zswap.c | 99 --
 2 files changed, 35 insertions(+), 65 deletions(-)

diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index 62f51a4e8676..c7d0d76ef0ee 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -66,6 +66,7 @@ enum cpuhp_state {
CPUHP_TRACE_RB_PREPARE,
CPUHP_MM_ZS_PREPARE,
CPUHP_MM_ZSWP_MEM_PREPARE,
+   CPUHP_MM_ZSWP_POOL_PREPARE,
CPUHP_TIMERS_DEAD,
CPUHP_NOTF_ERR_INJ_PREPARE,
CPUHP_MIPS_SOC_PREPARE,
diff --git a/mm/zswap.c b/mm/zswap.c
index b13aa5706348..067a0d62f318 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -118,7 +118,7 @@ struct zswap_pool {
struct kref kref;
struct list_head list;
struct work_struct work;
-   struct notifier_block notifier;
+   struct hlist_node node;
char tfm_name[CRYPTO_MAX_ALG_NAME];
 };
 
@@ -376,77 +376,34 @@ static int zswap_dstmem_dead(unsigned int cpu)
return 0;
 }
 
-static int __zswap_cpu_comp_notifier(struct zswap_pool *pool,
-unsigned long action, unsigned long cpu)
+static int zswap_cpu_comp_prepare(unsigned int cpu, struct hlist_node *node)
 {
+   struct zswap_pool *pool = hlist_entry(node, struct zswap_pool, node);
struct crypto_comp *tfm;
 
-   switch (action) {
-   case CPU_UP_PREPARE:
-   if (WARN_ON(*per_cpu_ptr(pool->tfm, cpu)))
-   break;
-   tfm = crypto_alloc_comp(pool->tfm_name, 0, 0);
-   if (IS_ERR_OR_NULL(tfm)) {
-   pr_err("could not alloc crypto comp %s : %ld\n",
-  pool->tfm_name, PTR_ERR(tfm));
-   return NOTIFY_BAD;
-   }
-   *per_cpu_ptr(pool->tfm, cpu) = tfm;
-   break;
-   case CPU_DEAD:
-   case CPU_UP_CANCELED:
-   tfm = *per_cpu_ptr(pool->tfm, cpu);
-   if (!IS_ERR_OR_NULL(tfm))
-   crypto_free_comp(tfm);
-   *per_cpu_ptr(pool->tfm, cpu) = NULL;
-   break;
-   default:
-   break;
+   if (WARN_ON(*per_cpu_ptr(pool->tfm, cpu)))
+   return 0;
+
+   tfm = crypto_alloc_comp(pool->tfm_name, 0, 0);
+   if (IS_ERR_OR_NULL(tfm)) {
+   pr_err("could not alloc crypto comp %s : %ld\n",
+  pool->tfm_name, PTR_ERR(tfm));
+   return -ENOMEM;
}
-   return NOTIFY_OK;
-}
-
-static int zswap_cpu_comp_notifier(struct notifier_block *nb,
-  unsigned long action, void *pcpu)
-{
-   unsigned long cpu = (unsigned long)pcpu;
-   struct zswap_pool *pool = container_of(nb, typeof(*pool), notifier);
-
-   return __zswap_cpu_comp_notifier(pool, action, cpu);
-}
-
-static int zswap_cpu_comp_init(struct zswap_pool *pool)
-{
-   unsigned long cpu;
-
-   memset(&pool->notifier, 0, sizeof(pool->notifier));
-   pool->notifier.notifier_call = zswap_cpu_comp_notifier;
-
-   cpu_notifier_register_begin();
-   for_each_online_cpu(cpu)
-   if (__zswap_cpu_comp_notifier(pool, CPU_UP_PREPARE, cpu) ==
-   NOTIFY_BAD)
-   goto cleanup;
-   __register_cpu_notifier(&pool->notifier);
-   cpu_notifier_register_done();
+   *per_cpu_ptr(pool->tfm, cpu) = tfm;
return 0;
-
-cleanup:
-   for_each_online_cpu(cpu)
-   __zswap_cpu_comp_notifier(pool, CPU_UP_CANCELED, cpu);
-   cpu_notifier_register_done();
-   return -ENOMEM;
 }
 
-static void zswap_cpu_comp_destroy(struct zswap_pool *pool)
+static int zswap_cpu_comp_dead(unsigned int cpu, struct hlist_node *node)
 {
-   unsigned long cpu;
+   struct zswap_pool *pool = hlist_entry(node, struct zswap_pool, node);
+   struct crypto_comp *tfm;
 
-   cpu_notifier_register_begin();
-   for_each_online_cpu(cpu)
-   __zswap_cpu_comp_notifier(pool, CPU_UP_CANCELED, cpu);
-   __unregister_cpu_notifier(&pool->notifier);
-   cpu_notifier_register_done();
+   tfm = *per_cpu_ptr(pool->tfm, cpu);
+   if (!IS_ERR_OR_NULL(tfm))
+   crypto_free_comp(tfm);
+   *per_cpu_ptr(pool->tfm, cpu) = NULL;
+   return 0;
 }
 
 /*
@@ -527,6 +484,7 @@ static struct zswap_pool *zswap_pool_create(char *type, 
char *compressor)
struct zswap_pool *pool;
char name[38]; /* 'zswap' + 32 char (max) num + \0 */
gfp_t gfp = __GFP_NORETRY | __GFP_NOWARN | __GFP_KSWAPD_RECLAIM;
+   int ret;
 
pool =
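
The multi-instance idea described in the commit message (one state, one
shared callback, many pools) can be sketched in userspace as follows. All
model_* names are hypothetical; struct model_node merely plays the role
of the hlist_node embedded in each pool, and the sketch does not claim to
mirror the hotplug core's real bookkeeping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_CPUS 4

/* Hypothetical list hook, playing the role of struct hlist_node. */
struct model_node {
	struct model_node *next;
};

/* Recover the object that embeds @ptr as @member (container_of idiom). */
#define model_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct model_pool {
	int tfm[MODEL_NR_CPUS];		/* nonzero: per-CPU context exists */
	struct model_node node;		/* links the pool into the state */
};

static bool model_cpu_online[MODEL_NR_CPUS] = { true, true, false, false };
static struct model_node *model_instances;	/* instances of the one state */
static struct model_pool model_pool_a, model_pool_b;	/* sample pools */

/* Per-instance prepare callback: one shared function, many pools. */
static int model_pool_prepare(unsigned int cpu, struct model_node *node)
{
	struct model_pool *pool = model_entry(node, struct model_pool, node);

	assert(cpu < MODEL_NR_CPUS);
	pool->tfm[cpu] = 1;
	return 0;
}

/*
 * Add an instance and run the callback for it on every online CPU.
 * The model callback cannot fail, so no rollback is shown here.
 */
static int model_state_add_instance(struct model_node *node)
{
	for (unsigned int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		int ret;

		if (!model_cpu_online[cpu])
			continue;
		ret = model_pool_prepare(cpu, node);
		if (ret)
			return ret;
	}
	node->next = model_instances;
	model_instances = node;
	return 0;
}

/* A CPU coming up runs the callback once per registered instance. */
static int model_cpu_up(unsigned int cpu)
{
	assert(cpu < MODEL_NR_CPUS);
	for (struct model_node *n = model_instances; n; n = n->next) {
		int ret = model_pool_prepare(cpu, n);

		if (ret)
			return ret;
	}
	model_cpu_online[cpu] = true;
	return 0;
}
```

Adding model_pool_a.node prepares CPUs 0 and 1 for that pool only; a later
model_cpu_up(2) then prepares CPU 2 for every pool that has been added.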

[PATCH 06/22] tracing/rb: Convert to hotplug state machine

2016-11-26 Thread Sebastian Andrzej Siewior

Install the callbacks via the state machine. The notifier in struct
ring_buffer is replaced by the multi instance interface.  Upon
__ring_buffer_alloc() invocation, cpuhp_state_add_instance() will invoke
the trace_rb_cpu_prepare() on each CPU.

This callback may now fail. This means __ring_buffer_alloc() will fail and
clean up (as before), and during a CPU up event this failure will prevent
the CPU from coming up.

Cc: Steven Rostedt 
Cc: Ingo Molnar 
Signed-off-by: Sebastian Andrzej Siewior 
---
 include/linux/cpuhotplug.h  |   1 +
 include/linux/ring_buffer.h |   6 ++
 kernel/trace/ring_buffer.c  | 133 +++-
 kernel/trace/trace.c|  15 -
 4 files changed, 65 insertions(+), 90 deletions(-)

diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index e3771fb959c0..18bcfeb2463e 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -62,6 +62,7 @@ enum cpuhp_state {
CPUHP_TOPOLOGY_PREPARE,
CPUHP_NET_IUCV_PREPARE,
CPUHP_ARM_BL_PREPARE,
+   CPUHP_TRACE_RB_PREPARE,
CPUHP_TIMERS_DEAD,
CPUHP_NOTF_ERR_INJ_PREPARE,
CPUHP_MIPS_SOC_PREPARE,
diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index 4acc552e9279..b6d4568795a7 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -198,4 +198,10 @@ enum ring_buffer_flags {
RB_FL_OVERWRITE = 1 << 0,
 };
 
+#ifdef CONFIG_RING_BUFFER
+int trace_rb_cpu_prepare(unsigned int cpu, struct hlist_node *node);
+#else
+#define trace_rb_cpu_prepare   NULL
+#endif
+
 #endif /* _LINUX_RING_BUFFER_H */
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 9c143739b8d7..a7a055f167c7 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -479,9 +479,7 @@ struct ring_buffer {
 
struct ring_buffer_per_cpu  **buffers;
 
-#ifdef CONFIG_HOTPLUG_CPU
-   struct notifier_block   cpu_notify;
-#endif
+   struct hlist_node   node;
u64 (*clock)(void);
 
struct rb_irq_work  irq_work;
@@ -1274,11 +1272,6 @@ static void rb_free_cpu_buffer(struct 
ring_buffer_per_cpu *cpu_buffer)
kfree(cpu_buffer);
 }
 
-#ifdef CONFIG_HOTPLUG_CPU
-static int rb_cpu_notify(struct notifier_block *self,
-unsigned long action, void *hcpu);
-#endif
-
 /**
  * __ring_buffer_alloc - allocate a new ring_buffer
  * @size: the size in bytes per cpu that is needed.
@@ -1296,6 +1289,7 @@ struct ring_buffer *__ring_buffer_alloc(unsigned long 
size, unsigned flags,
long nr_pages;
int bsize;
int cpu;
+   int ret;
 
/* keep it in its own cache line */
buffer = kzalloc(ALIGN(sizeof(*buffer), cache_line_size()),
@@ -1318,17 +1312,6 @@ struct ring_buffer *__ring_buffer_alloc(unsigned long 
size, unsigned flags,
if (nr_pages < 2)
nr_pages = 2;
 
-   /*
-* In case of non-hotplug cpu, if the ring-buffer is allocated
-* in early initcall, it will not be notified of secondary cpus.
-* In that off case, we need to allocate for all possible cpus.
-*/
-#ifdef CONFIG_HOTPLUG_CPU
-   cpu_notifier_register_begin();
-   cpumask_copy(buffer->cpumask, cpu_online_mask);
-#else
-   cpumask_copy(buffer->cpumask, cpu_possible_mask);
-#endif
buffer->cpus = nr_cpu_ids;
 
bsize = sizeof(void *) * nr_cpu_ids;
@@ -1337,19 +1320,15 @@ struct ring_buffer *__ring_buffer_alloc(unsigned long 
size, unsigned flags,
if (!buffer->buffers)
goto fail_free_cpumask;
 
-   for_each_buffer_cpu(buffer, cpu) {
-   buffer->buffers[cpu] =
-   rb_allocate_cpu_buffer(buffer, nr_pages, cpu);
-   if (!buffer->buffers[cpu])
-   goto fail_free_buffers;
-   }
+   cpu = raw_smp_processor_id();
+   cpumask_set_cpu(cpu, buffer->cpumask);
+   buffer->buffers[cpu] = rb_allocate_cpu_buffer(buffer, nr_pages, cpu);
+   if (!buffer->buffers[cpu])
+   goto fail_free_buffers;
 
-#ifdef CONFIG_HOTPLUG_CPU
-   buffer->cpu_notify.notifier_call = rb_cpu_notify;
-   buffer->cpu_notify.priority = 0;
-   __register_cpu_notifier(&buffer->cpu_notify);
-   cpu_notifier_register_done();
-#endif
+   ret = cpuhp_state_add_instance(CPUHP_TRACE_RB_PREPARE, &buffer->node);
+   if (ret < 0)
+   goto fail_free_buffers;
 
mutex_init(&buffer->mutex);
 
@@ -1364,9 +1343,6 @@ struct ring_buffer *__ring_buffer_alloc(unsigned long 
size, unsigned flags,
 
  fail_free_cpumask:
free_cpumask_var(buffer->cpumask);
-#ifdef CONFIG_HOTPLUG_CPU
-   cpu_notifier_register_done();
-#endif
 
  fail_free_buffer:
kfree(buffer);
@@ -1383,18 +1359,11 @@ ring_buffer_free(struct ring_buffer *buffer)
 {
int cpu;
 
-#ifdef CONFIG_HOTPLUG_CPU
-   cpu_notifier_register_begin();

[PATCH 10/22] mm/zsmalloc: Convert to hotplug state machine

2016-11-26 Thread Sebastian Andrzej Siewior

Install the callbacks via the state machine and let the core invoke
the callbacks on the already online CPUs.

Cc: Minchan Kim 
Cc: Nitin Gupta 
Cc: Sergey Senozhatsky 
CC: linux...@kvack.org
Signed-off-by: Sebastian Andrzej Siewior 
---
 include/linux/cpuhotplug.h |  1 +
 mm/zsmalloc.c  | 67 +-
 2 files changed, 14 insertions(+), 54 deletions(-)

diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index 4ebd1bc27f8d..9f29dd996088 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -64,6 +64,7 @@ enum cpuhp_state {
CPUHP_NET_IUCV_PREPARE,
CPUHP_ARM_BL_PREPARE,
CPUHP_TRACE_RB_PREPARE,
+   CPUHP_MM_ZS_PREPARE,
CPUHP_TIMERS_DEAD,
CPUHP_NOTF_ERR_INJ_PREPARE,
CPUHP_MIPS_SOC_PREPARE,
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index b0bc023d25c5..9cc3c0b2c2c1 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -1284,61 +1284,21 @@ static void __zs_unmap_object(struct mapping_area *area,
 
 #endif /* CONFIG_PGTABLE_MAPPING */
 
-static int zs_cpu_notifier(struct notifier_block *nb, unsigned long action,
-   void *pcpu)
+static int zs_cpu_prepare(unsigned int cpu)
 {
-   int ret, cpu = (long)pcpu;
struct mapping_area *area;
 
-   switch (action) {
-   case CPU_UP_PREPARE:
-   area = &per_cpu(zs_map_area, cpu);
-   ret = __zs_cpu_up(area);
-   if (ret)
-   return notifier_from_errno(ret);
-   break;
-   case CPU_DEAD:
-   case CPU_UP_CANCELED:
-   area = &per_cpu(zs_map_area, cpu);
-   __zs_cpu_down(area);
-   break;
-   }
-
-   return NOTIFY_OK;
+   area = &per_cpu(zs_map_area, cpu);
+   return __zs_cpu_up(area);
 }
 
-static struct notifier_block zs_cpu_nb = {
-   .notifier_call = zs_cpu_notifier
-};
-
-static int zs_register_cpu_notifier(void)
+static int zs_cpu_dead(unsigned int cpu)
 {
-   int cpu, uninitialized_var(ret);
+   struct mapping_area *area;
 
-   cpu_notifier_register_begin();
-
-   __register_cpu_notifier(&zs_cpu_nb);
-   for_each_online_cpu(cpu) {
-   ret = zs_cpu_notifier(NULL, CPU_UP_PREPARE, (void *)(long)cpu);
-   if (notifier_to_errno(ret))
-   break;
-   }
-
-   cpu_notifier_register_done();
-   return notifier_to_errno(ret);
-}
-
-static void zs_unregister_cpu_notifier(void)
-{
-   int cpu;
-
-   cpu_notifier_register_begin();
-
-   for_each_online_cpu(cpu)
-   zs_cpu_notifier(NULL, CPU_DEAD, (void *)(long)cpu);
-   __unregister_cpu_notifier(&zs_cpu_nb);
-
-   cpu_notifier_register_done();
+   area = &per_cpu(zs_map_area, cpu);
+   __zs_cpu_down(area);
+   return 0;
 }
 
 static void __init init_zs_size_classes(void)
@@ -2534,10 +2494,10 @@ static int __init zs_init(void)
if (ret)
goto out;
 
-   ret = zs_register_cpu_notifier();
-
+   ret = cpuhp_setup_state(CPUHP_MM_ZS_PREPARE, "mm/zsmalloc:prepare",
+   zs_cpu_prepare, zs_cpu_dead);
if (ret)
-   goto notifier_fail;
+   goto hp_setup_fail;
 
init_zs_size_classes();
 
@@ -2549,8 +2509,7 @@ static int __init zs_init(void)
 
return 0;
 
-notifier_fail:
-   zs_unregister_cpu_notifier();
+hp_setup_fail:
zsmalloc_unmount();
 out:
return ret;
@@ -2562,7 +2521,7 @@ static void __exit zs_exit(void)
zpool_unregister_driver(&zs_zpool_driver);
 #endif
zsmalloc_unmount();
-   zs_unregister_cpu_notifier();
+   cpuhp_remove_state(CPUHP_MM_ZS_PREPARE);
 
zs_stat_exit();
 }
-- 
2.10.2
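
The reason the error path in zs_init() no longer needs an explicit
unregister step can be illustrated with a userspace model of "install the
callbacks and let the core invoke them on the already online CPUs". This
is a hedged sketch only: model_setup_state(), model_remove_state() and
the model_fail_cpu test knob are hypothetical names, and the rollback
shown is a simplification of what the hotplug core does.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_CPUS 4

typedef int (*model_cpu_cb)(unsigned int cpu);

static bool model_cpu_online[MODEL_NR_CPUS] = { true, true, true, false };
static bool model_area_ready[MODEL_NR_CPUS];
static int model_fail_cpu = -1;		/* test knob: CPU whose prepare fails */
static model_cpu_cb model_teardown;	/* teardown of the installed state */

static int model_area_prepare(unsigned int cpu)
{
	assert(cpu < MODEL_NR_CPUS);
	if ((int)cpu == model_fail_cpu)
		return -ENOMEM;
	model_area_ready[cpu] = true;
	return 0;
}

static int model_area_dead(unsigned int cpu)
{
	assert(cpu < MODEL_NR_CPUS);
	model_area_ready[cpu] = false;
	return 0;
}

/* Install callbacks; run @startup on online CPUs, undo on failure. */
static int model_setup_state(model_cpu_cb startup, model_cpu_cb teardown)
{
	for (unsigned int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		int ret;

		if (!model_cpu_online[cpu] || !startup)
			continue;
		ret = startup(cpu);
		if (!ret)
			continue;
		/* Roll back the CPUs that were already prepared. */
		while (cpu-- > 0) {
			if (model_cpu_online[cpu] && teardown)
				teardown(cpu);
		}
		return ret;
	}
	model_teardown = teardown;
	return 0;
}

/* Remove the state: run the teardown callback on every online CPU. */
static void model_remove_state(void)
{
	for (unsigned int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		if (model_cpu_online[cpu] && model_teardown)
			model_teardown(cpu);
	}
	model_teardown = NULL;
}
```

In this model a failed setup leaves nothing behind, so the caller only
has to undo the steps it performed before calling model_setup_state().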

[PATCH 13/22] iommu/vt-d: Convert to hotplug state machine

2016-11-26 Thread Sebastian Andrzej Siewior

From: Anna-Maria Gleixner 

Install the callbacks via the state machine.

Cc: David Woodhouse 
Cc: Joerg Roedel 
Cc: io...@lists.linux-foundation.org
Signed-off-by: Anna-Maria Gleixner 
Signed-off-by: Sebastian Andrzej Siewior 
---
 drivers/iommu/intel-iommu.c | 24 ++--
 include/linux/cpuhotplug.h  |  1 +
 2 files changed, 7 insertions(+), 18 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index a4407eabf0e6..fd7962560e56 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -4665,25 +4665,13 @@ static void free_all_cpu_cached_iovas(unsigned int cpu)
}
 }
 
-static int intel_iommu_cpu_notifier(struct notifier_block *nfb,
-   unsigned long action, void *v)
+static int intel_iommu_cpu_dead(unsigned int cpu)
 {
-   unsigned int cpu = (unsigned long)v;
-
-   switch (action) {
-   case CPU_DEAD:
-   case CPU_DEAD_FROZEN:
-   free_all_cpu_cached_iovas(cpu);
-   flush_unmaps_timeout(cpu);
-   break;
-   }
-   return NOTIFY_OK;
+   free_all_cpu_cached_iovas(cpu);
+   flush_unmaps_timeout(cpu);
+   return 0;
 }
 
-static struct notifier_block intel_iommu_cpu_nb = {
-   .notifier_call = intel_iommu_cpu_notifier,
-};
-
 static ssize_t intel_iommu_show_version(struct device *dev,
struct device_attribute *attr,
char *buf)
@@ -4832,8 +4820,8 @@ int __init intel_iommu_init(void)
bus_register_notifier(&pci_bus_type, &device_nb);
if (si_domain && !hw_pass_through)
register_memory_notifier(&intel_iommu_memory_nb);
-   register_hotcpu_notifier(&intel_iommu_cpu_nb);
-
+   cpuhp_setup_state(CPUHP_IOMMU_INTEL_DEAD, "iommu/intel:dead", NULL,
+ intel_iommu_cpu_dead);
intel_iommu_enabled = 1;
 
return 0;
diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index c7d0d76ef0ee..853f8176594d 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -40,6 +40,7 @@ enum cpuhp_state {
CPUHP_PAGE_ALLOC_DEAD,
CPUHP_NET_DEV_DEAD,
CPUHP_PCI_XGENE_DEAD,
+   CPUHP_IOMMU_INTEL_DEAD,
CPUHP_WORKQUEUE_PREP,
CPUHP_POWER_NUMA_PREPARE,
CPUHP_HRTIMERS_PREPARE,
-- 
2.10.2

[PATCH 20/22] soc/fsl/qbman: Convert to hotplug state machine

2016-11-26 Thread Sebastian Andrzej Siewior

Install the callbacks via the state machine.

Cc: Claudiu Manoil 
Cc: Scott Wood 
Cc: Roy Pledge 
Signed-off-by: Sebastian Andrzej Siewior 
---
 drivers/soc/fsl/qbman/qman_portal.c | 39 -
 1 file changed, 12 insertions(+), 27 deletions(-)

diff --git a/drivers/soc/fsl/qbman/qman_portal.c 
b/drivers/soc/fsl/qbman/qman_portal.c
index 148614388fca..d068e4820f49 100644
--- a/drivers/soc/fsl/qbman/qman_portal.c
+++ b/drivers/soc/fsl/qbman/qman_portal.c
@@ -179,7 +179,7 @@ static void qman_portal_update_sdest(const struct 
qm_portal_config *pcfg,
qman_set_sdest(pcfg->channel, cpu);
 }
 
-static void qman_offline_cpu(unsigned int cpu)
+static int qman_offline_cpu(unsigned int cpu)
 {
struct qman_portal *p;
const struct qm_portal_config *pcfg;
@@ -192,9 +192,10 @@ static void qman_offline_cpu(unsigned int cpu)
qman_portal_update_sdest(pcfg, 0);
}
}
+   return 0;
 }
 
-static void qman_online_cpu(unsigned int cpu)
+static int qman_online_cpu(unsigned int cpu)
 {
struct qman_portal *p;
const struct qm_portal_config *pcfg;
@@ -207,31 +208,9 @@ static void qman_online_cpu(unsigned int cpu)
qman_portal_update_sdest(pcfg, cpu);
}
}
+   return 0;
 }
 
-static int qman_hotplug_cpu_callback(struct notifier_block *nfb,
-unsigned long action, void *hcpu)
-{
-   unsigned int cpu = (unsigned long)hcpu;
-
-   switch (action) {
-   case CPU_ONLINE:
-   case CPU_ONLINE_FROZEN:
-   qman_online_cpu(cpu);
-   break;
-   case CPU_DOWN_PREPARE:
-   case CPU_DOWN_PREPARE_FROZEN:
-   qman_offline_cpu(cpu);
-   default:
-   break;
-   }
-   return NOTIFY_OK;
-}
-
-static struct notifier_block qman_hotplug_cpu_notifier = {
-   .notifier_call = qman_hotplug_cpu_callback,
-};
-
 static int qman_portal_probe(struct platform_device *pdev)
 {
struct device *dev = &pdev->dev;
@@ -346,8 +325,14 @@ static int __init qman_portal_driver_register(struct 
platform_driver *drv)
if (ret < 0)
return ret;
 
-   register_hotcpu_notifier(&qman_hotplug_cpu_notifier);
-
+   ret = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
+   "soc/qman_portal:online",
+   qman_online_cpu, qman_offline_cpu);
+   if (ret < 0) {
+   pr_err("qman: failed to register hotplug callbacks.\n");
+   platform_driver_unregister(drv);
+   return ret;
+   }
return 0;
 }
 
-- 
2.10.2
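
The "ret < 0" check above suggests that a dynamic state request returns a
positive state number on success. A rough userspace model of that
behaviour, and of the "nocalls" variant skipping CPUs that are already
online, might look like this. MODEL_DYN_FIRST, MODEL_DYN_SLOTS and all
model_* functions are made-up names for illustration; the real slot
range and bookkeeping live in the hotplug core.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_CPUS   4
#define MODEL_DYN_FIRST 100	/* made-up number of the first dynamic slot */
#define MODEL_DYN_SLOTS 2	/* made-up size of the dynamic range */

typedef int (*model_cpu_cb)(unsigned int cpu);

struct model_dyn_state {
	const char *name;	/* NULL: slot is free */
	model_cpu_cb online;
	model_cpu_cb offline;
};

static struct model_dyn_state model_dyn[MODEL_DYN_SLOTS];
static int model_portal_cpu = -1;	/* CPU the pretend portal targets */

static int model_portal_online(unsigned int cpu)
{
	model_portal_cpu = (int)cpu;
	return 0;
}

static int model_portal_offline(unsigned int cpu)
{
	(void)cpu;
	model_portal_cpu = 0;	/* fall back to CPU 0, as the driver does */
	return 0;
}

/*
 * Reserve a free dynamic slot without calling @online for CPUs that are
 * already up. Returns the state number (> 0) or -ENOSPC.
 */
static int model_setup_state_nocalls(const char *name, model_cpu_cb online,
				     model_cpu_cb offline)
{
	assert(name);
	for (int i = 0; i < MODEL_DYN_SLOTS; i++) {
		if (model_dyn[i].name)
			continue;
		model_dyn[i].name = name;
		model_dyn[i].online = online;
		model_dyn[i].offline = offline;
		return MODEL_DYN_FIRST + i;
	}
	return -ENOSPC;
}

/* Later hotplug events walk the registered slots. */
static void model_cpu_event(unsigned int cpu, bool up)
{
	assert(cpu < MODEL_NR_CPUS);
	for (int i = 0; i < MODEL_DYN_SLOTS; i++) {
		model_cpu_cb cb = up ? model_dyn[i].online : model_dyn[i].offline;

		if (model_dyn[i].name && cb)
			cb(cpu);
	}
}
```

A caller that wants to remove a dynamic state later has to keep the
returned number, because it is not known at compile time.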

[PATCH 11/22] mm/zswap: Convert dst-mem to hotplug state machine

2016-11-26 Thread Sebastian Andrzej Siewior

Install the callbacks via the state machine and let the core invoke
the callbacks on the already online CPUs.

Cc: Seth Jennings 
Cc: linux...@kvack.org
Signed-off-by: Sebastian Andrzej Siewior 
---
 include/linux/cpuhotplug.h |  1 +
 mm/zswap.c | 75 +++---
 2 files changed, 19 insertions(+), 57 deletions(-)

diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index 9f29dd996088..62f51a4e8676 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -65,6 +65,7 @@ enum cpuhp_state {
CPUHP_ARM_BL_PREPARE,
CPUHP_TRACE_RB_PREPARE,
CPUHP_MM_ZS_PREPARE,
+   CPUHP_MM_ZSWP_MEM_PREPARE,
CPUHP_TIMERS_DEAD,
CPUHP_NOTF_ERR_INJ_PREPARE,
CPUHP_MIPS_SOC_PREPARE,
diff --git a/mm/zswap.c b/mm/zswap.c
index 275b22cc8df4..b13aa5706348 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -352,70 +352,28 @@ static struct zswap_entry *zswap_entry_find_get(struct 
rb_root *root,
 **/
 static DEFINE_PER_CPU(u8 *, zswap_dstmem);
 
-static int __zswap_cpu_dstmem_notifier(unsigned long action, unsigned long cpu)
+static int zswap_dstmem_prepare(unsigned int cpu)
 {
u8 *dst;
 
-   switch (action) {
-   case CPU_UP_PREPARE:
-   dst = kmalloc_node(PAGE_SIZE * 2, GFP_KERNEL, cpu_to_node(cpu));
-   if (!dst) {
-   pr_err("can't allocate compressor buffer\n");
-   return NOTIFY_BAD;
-   }
-   per_cpu(zswap_dstmem, cpu) = dst;
-   break;
-   case CPU_DEAD:
-   case CPU_UP_CANCELED:
-   dst = per_cpu(zswap_dstmem, cpu);
-   kfree(dst);
-   per_cpu(zswap_dstmem, cpu) = NULL;
-   break;
-   default:
-   break;
+   dst = kmalloc_node(PAGE_SIZE * 2, GFP_KERNEL, cpu_to_node(cpu));
+   if (!dst) {
+   pr_err("can't allocate compressor buffer\n");
+   return -ENOMEM;
}
-   return NOTIFY_OK;
-}
-
-static int zswap_cpu_dstmem_notifier(struct notifier_block *nb,
-unsigned long action, void *pcpu)
-{
-   return __zswap_cpu_dstmem_notifier(action, (unsigned long)pcpu);
-}
-
-static struct notifier_block zswap_dstmem_notifier = {
-   .notifier_call =zswap_cpu_dstmem_notifier,
-};
-
-static int __init zswap_cpu_dstmem_init(void)
-{
-   unsigned long cpu;
-
-   cpu_notifier_register_begin();
-   for_each_online_cpu(cpu)
-   if (__zswap_cpu_dstmem_notifier(CPU_UP_PREPARE, cpu) ==
-   NOTIFY_BAD)
-   goto cleanup;
-   __register_cpu_notifier(&zswap_dstmem_notifier);
-   cpu_notifier_register_done();
+   per_cpu(zswap_dstmem, cpu) = dst;
return 0;
-
-cleanup:
-   for_each_online_cpu(cpu)
-   __zswap_cpu_dstmem_notifier(CPU_UP_CANCELED, cpu);
-   cpu_notifier_register_done();
-   return -ENOMEM;
 }
 
-static void zswap_cpu_dstmem_destroy(void)
+static int zswap_dstmem_dead(unsigned int cpu)
 {
-   unsigned long cpu;
+   u8 *dst;
 
-   cpu_notifier_register_begin();
-   for_each_online_cpu(cpu)
-   __zswap_cpu_dstmem_notifier(CPU_UP_CANCELED, cpu);
-   __unregister_cpu_notifier(&zswap_dstmem_notifier);
-   cpu_notifier_register_done();
+   dst = per_cpu(zswap_dstmem, cpu);
+   kfree(dst);
+   per_cpu(zswap_dstmem, cpu) = NULL;
+
+   return 0;
 }
 
 static int __zswap_cpu_comp_notifier(struct zswap_pool *pool,
@@ -1238,6 +1196,7 @@ static void __exit zswap_debugfs_exit(void) { }
 static int __init init_zswap(void)
 {
struct zswap_pool *pool;
+   int ret;
 
zswap_init_started = true;
 
@@ -1246,7 +1205,9 @@ static int __init init_zswap(void)
goto cache_fail;
}
 
-   if (zswap_cpu_dstmem_init()) {
+   ret = cpuhp_setup_state(CPUHP_MM_ZSWP_MEM_PREPARE, "mm/zswap:prepare",
+   zswap_dstmem_prepare, zswap_dstmem_dead);
+   if (ret) {
pr_err("dstmem alloc failed\n");
goto dstmem_fail;
}
@@ -1267,7 +1228,7 @@ static int __init init_zswap(void)
return 0;
 
 pool_fail:
-   zswap_cpu_dstmem_destroy();
+   cpuhp_remove_state(CPUHP_MM_ZSWP_MEM_PREPARE);
 dstmem_fail:
zswap_entry_cache_destroy();
 cache_fail:
-- 
2.10.2
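
The label ordering in init_zswap() (pool_fail, dstmem_fail, cache_fail)
follows the usual reverse-order unwind idiom. A minimal userspace sketch
of that idiom, with hypothetical model_* names and boolean flags standing
in for the real resources:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum model_step {
	MODEL_STEP_NONE,	/* nothing fails */
	MODEL_STEP_CACHE,
	MODEL_STEP_DSTMEM,
	MODEL_STEP_POOL,
};

static bool model_cache_ready, model_dstmem_ready, model_pool_ready;

/* Pretend to acquire one resource; fail if @step is the chosen one. */
static int model_step(enum model_step step, enum model_step fail_at,
		      bool *ready)
{
	assert(ready);
	if (step == fail_at)
		return -ENOMEM;
	*ready = true;
	return 0;
}

/* Bring up three resources in order; on failure release in reverse. */
static int model_init(enum model_step fail_at)
{
	int ret;

	ret = model_step(MODEL_STEP_CACHE, fail_at, &model_cache_ready);
	if (ret)
		goto cache_fail;

	ret = model_step(MODEL_STEP_DSTMEM, fail_at, &model_dstmem_ready);
	if (ret)
		goto dstmem_fail;

	ret = model_step(MODEL_STEP_POOL, fail_at, &model_pool_ready);
	if (ret)
		goto pool_fail;

	return 0;

pool_fail:
	model_dstmem_ready = false;	/* cpuhp_remove_state() in the patch */
dstmem_fail:
	model_cache_ready = false;	/* zswap_entry_cache_destroy() */
cache_fail:
	return ret;
}
```

Each label undoes exactly the steps that succeeded before the failing
one, which is why a step that cleans up after itself on failure only
needs its removal call on the labels above its own.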

[PATCH 15/22] arm64/cpuinfo: Make hotplug notifier symmetric

2016-11-26 Thread Sebastian Andrzej Siewior

From: Anna-Maria Gleixner 

There is no requirement to keep the sysfs files around until the CPU is
completely dead. Remove them during the DOWN_PREPARE notification. This is
a preparatory patch for converting to the hotplug state machine.

Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: linux-arm-ker...@lists.infradead.org
Signed-off-by: Anna-Maria Gleixner 
Signed-off-by: Sebastian Andrzej Siewior 
---
 arch/arm64/kernel/cpuinfo.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/cpuinfo.c b/arch/arm64/kernel/cpuinfo.c
index b3d5b3e8fbcb..19aad7041e14 100644
--- a/arch/arm64/kernel/cpuinfo.c
+++ b/arch/arm64/kernel/cpuinfo.c
@@ -272,9 +272,10 @@ static int cpuid_callback(struct notifier_block *nb,
 
switch (action & ~CPU_TASKS_FROZEN) {
case CPU_ONLINE:
+   case CPU_DOWN_FAILED:
rc = cpuid_add_regs(cpu);
break;
-   case CPU_DEAD:
+   case CPU_DOWN_PREPARE:
rc = cpuid_remove_regs(cpu);
break;
}
-- 
2.10.2
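
What "symmetric" means here can be shown with a small userspace model of
the callback after this patch. The MODEL_CPU_* values and the
model_regs_present[] array are assumptions of this sketch; only the
pairing of events mirrors the diff above.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative event codes; the values are assumptions of this sketch. */
enum {
	MODEL_CPU_ONLINE       = 0x02,
	MODEL_CPU_DOWN_PREPARE = 0x05,
	MODEL_CPU_DOWN_FAILED  = 0x06,
	MODEL_CPU_DEAD         = 0x07,
	MODEL_CPU_TASKS_FROZEN = 0x10,	/* flag OR-ed in during suspend */
};

#define MODEL_NR_CPUS 4

static bool model_regs_present[MODEL_NR_CPUS];	/* pretend sysfs files */

/* Symmetric handling: add on ONLINE/DOWN_FAILED, remove on DOWN_PREPARE. */
static void model_cpuid_callback(unsigned int action, unsigned int cpu)
{
	assert(cpu < MODEL_NR_CPUS);

	switch (action & ~(unsigned int)MODEL_CPU_TASKS_FROZEN) {
	case MODEL_CPU_ONLINE:
	case MODEL_CPU_DOWN_FAILED:
		model_regs_present[cpu] = true;
		break;
	case MODEL_CPU_DOWN_PREPARE:
		model_regs_present[cpu] = false;
		break;
	default:
		break;	/* DEAD no longer matters for these files */
	}
}
```

Because every "remove" now has a matching "add" on the same level of the
up/down sequence, the two halves map directly onto one online callback
and one offline callback.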

Re: [RFC] kernel/sysctl.c: return -EINVAL when write invalid val to ulong type sysctl

2016-11-26 Thread subashab


On 2016-11-26 02:13, Yisheng Xie wrote:
I tried to echo an invalid value to an unsigned long type sysctl on 4.9.0-rc6:

   linux:~# cat /proc/sys/vm/user_reserve_kbytes
   131072
   linux:~# echo -1 > /proc/sys/vm/user_reserve_kbytes
   linux:~# cat /proc/sys/vm/user_reserve_kbytes
   131072

The write is rejected and the value is not written to user_reserve_kbytes.
However, the user does not know this until checking the value again.

Would it be more suitable to return -EINVAL when an invalid value is echoed
to an unsigned long type sysctl, so that the user knows what happened
without checking the value once more?
This is what the int type sysctls do:
   linux:~#cat /proc/sys/kernel/sysctl_writes_strict
   1
   linux:~# echo 3 > /proc/sys/kernel/sysctl_writes_strict
   bash: echo: write error: Invalid argument

--
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 706309f..40e9285 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -2485,10 +2485,14 @@ static int __do_proc_doulongvec_minmax(void *data, struct ctl_table *table, int
 sizeof(proc_wspace_sep), NULL);
if (err)
break;
-   if (neg)
-   continue;
-   if ((min && val < *min) || (max && val > *max))
-   continue;
+   if (neg) {
+   err = -EINVAL;
+   break;
+   }
+   if ((min && val < *min) || (max && val > *max)) {
+   err = -EINVAL;
+   break;
+   }
*i = val;
} else {
val = convdiv * (*i) / convmul;


Agreed, this should behave like proc_douintvec:

root@vm:~# echo 8192 > /proc/sys/net/core/xfrm_aevent_rseqth
root@vm:~# cat /proc/sys/net/core/xfrm_aevent_rseqth
8192
root@vm:~# echo -1 > /proc/sys/net/core/xfrm_aevent_rseqth
-bash: echo: write error: Invalid argument
root@vm:~# cat /proc/sys/net/core/xfrm_aevent_rseqth
8192

--
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a 
Linux Foundation Collaborative Project
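
The behaviour asked for in this thread (fail the write with -EINVAL
instead of silently skipping the value) can be sketched in userspace C.
model_write_ulong() is a hypothetical helper, not the sysctl code, and it
takes plain bounds where the kernel uses optional min/max pointers.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Sample target, initialised to the value shown in the transcript. */
static unsigned long model_user_reserve_kbytes = 131072;

/*
 * Parse @str into *@out, accepting only values in [min, max].
 * Returns 0 on success or -EINVAL; *@out is left untouched on failure.
 */
static int model_write_ulong(const char *str, unsigned long min,
			     unsigned long max, unsigned long *out)
{
	unsigned long val;
	char *end;

	assert(str && out);

	while (isspace((unsigned char)*str))
		str++;
	/* strtoul() would silently wrap "-1" to ULONG_MAX, so reject it. */
	if (*str == '-' || !isdigit((unsigned char)*str))
		return -EINVAL;

	errno = 0;
	val = strtoul(str, &end, 10);
	if (errno == ERANGE)
		return -EINVAL;
	while (isspace((unsigned char)*end))
		end++;
	if (*end != '\0')
		return -EINVAL;
	if (val < min || val > max)
		return -EINVAL;

	*out = val;
	return 0;
}
```

With this shape the caller sees the error immediately and the stored
value stays at its previous setting, which matches the proc_douintvec
transcript above.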

[PATCH 14/22] mm/compaction: Convert to hotplug state machine

2016-11-26 Thread Sebastian Andrzej Siewior

From: Anna-Maria Gleixner 

Install the callbacks via the state machine. If the hotplug init fails, no
threads are spawned.

Cc: Andrew Morton 
Cc: Vlastimil Babka 
Cc: Michal Hocko 
Cc: Mel Gorman 
Cc: linux...@kvack.org
Signed-off-by: Anna-Maria Gleixner 
Signed-off-by: Sebastian Andrzej Siewior 
---
 mm/compaction.c | 31 ++-
 1 file changed, 18 insertions(+), 13 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 0409a4ad6ea1..0d37192d9423 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -2043,33 +2043,38 @@ void kcompactd_stop(int nid)
  * away, we get changed to run anywhere: as the first one comes back,
  * restore their cpu bindings.
  */
-static int cpu_callback(struct notifier_block *nfb, unsigned long action,
-   void *hcpu)
+static int kcompactd_cpu_online(unsigned int cpu)
 {
int nid;
 
-   if (action == CPU_ONLINE || action == CPU_ONLINE_FROZEN) {
-   for_each_node_state(nid, N_MEMORY) {
-   pg_data_t *pgdat = NODE_DATA(nid);
-   const struct cpumask *mask;
+   for_each_node_state(nid, N_MEMORY) {
+   pg_data_t *pgdat = NODE_DATA(nid);
+   const struct cpumask *mask;
 
-   mask = cpumask_of_node(pgdat->node_id);
+   mask = cpumask_of_node(pgdat->node_id);
 
-   if (cpumask_any_and(cpu_online_mask, mask) < nr_cpu_ids)
-   /* One of our CPUs online: restore mask */
-   set_cpus_allowed_ptr(pgdat->kcompactd, mask);
-   }
+   if (cpumask_any_and(cpu_online_mask, mask) < nr_cpu_ids)
+   /* One of our CPUs online: restore mask */
+   set_cpus_allowed_ptr(pgdat->kcompactd, mask);
}
-   return NOTIFY_OK;
+   return 0;
 }
 
 static int __init kcompactd_init(void)
 {
int nid;
+   int ret;
+
+   ret = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
+   "mm/compaction:online",
+   kcompactd_cpu_online, NULL);
+   if (ret < 0) {
+   pr_err("kcompactd: failed to register hotplug callbacks.\n");
+   return ret;
+   }
 
for_each_node_state(nid, N_MEMORY)
kcompactd_run(nid);
-   hotcpu_notifier(cpu_callback, 0);
return 0;
 }
 subsys_initcall(kcompactd_init)
-- 
2.10.2
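
The "< nr_cpu_ids" test in kcompactd_cpu_online() relies on the
convention that a mask search returns an out-of-range index when nothing
is found. A small userspace sketch of that convention, using a 64-bit
word as a stand-in for a cpumask (all model_* names are hypothetical):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Index of the first CPU set in both masks, or @nr_cpu_ids if the
 * intersection is empty ("out of range" means "none").
 */
static unsigned int model_cpumask_any_and(uint64_t a, uint64_t b,
					  unsigned int nr_cpu_ids)
{
	uint64_t both = a & b;

	assert(nr_cpu_ids <= 64);	/* one uint64_t holds the whole mask */
	for (unsigned int cpu = 0; cpu < nr_cpu_ids; cpu++)
		if ((both >> cpu) & 1)
			return cpu;
	return nr_cpu_ids;
}

/* Decide whether a node-bound thread may be re-pinned to its node. */
static bool model_should_restore_affinity(uint64_t online, uint64_t node_cpus,
					  unsigned int nr_cpu_ids)
{
	return model_cpumask_any_and(online, node_cpus, nr_cpu_ids) < nr_cpu_ids;
}
```

For example, with CPUs 0-3 online and a node owning CPUs 4-5, the search
returns nr_cpu_ids and the affinity is left alone until one of the
node's CPUs comes back.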

[PATCH 20/22] soc/fsl/qbman: Convert to hotplug state machine

2016-11-26 Thread Sebastian Andrzej Siewior

Install the callbacks via the state machine.

Cc: Claudiu Manoil 
Cc: Scott Wood 
Cc: Roy Pledge 
Signed-off-by: Sebastian Andrzej Siewior 
---
 drivers/soc/fsl/qbman/qman_portal.c | 39 -
 1 file changed, 12 insertions(+), 27 deletions(-)

diff --git a/drivers/soc/fsl/qbman/qman_portal.c b/drivers/soc/fsl/qbman/qman_portal.c
index 148614388fca..d068e4820f49 100644
--- a/drivers/soc/fsl/qbman/qman_portal.c
+++ b/drivers/soc/fsl/qbman/qman_portal.c
@@ -179,7 +179,7 @@ static void qman_portal_update_sdest(const struct qm_portal_config *pcfg,
qman_set_sdest(pcfg->channel, cpu);
 }
 
-static void qman_offline_cpu(unsigned int cpu)
+static int qman_offline_cpu(unsigned int cpu)
 {
struct qman_portal *p;
const struct qm_portal_config *pcfg;
@@ -192,9 +192,10 @@ static void qman_offline_cpu(unsigned int cpu)
qman_portal_update_sdest(pcfg, 0);
}
}
+   return 0;
 }
 
-static void qman_online_cpu(unsigned int cpu)
+static int qman_online_cpu(unsigned int cpu)
 {
struct qman_portal *p;
const struct qm_portal_config *pcfg;
@@ -207,31 +208,9 @@ static void qman_online_cpu(unsigned int cpu)
qman_portal_update_sdest(pcfg, cpu);
}
}
+   return 0;
 }
 
-static int qman_hotplug_cpu_callback(struct notifier_block *nfb,
-unsigned long action, void *hcpu)
-{
-   unsigned int cpu = (unsigned long)hcpu;
-
-   switch (action) {
-   case CPU_ONLINE:
-   case CPU_ONLINE_FROZEN:
-   qman_online_cpu(cpu);
-   break;
-   case CPU_DOWN_PREPARE:
-   case CPU_DOWN_PREPARE_FROZEN:
-   qman_offline_cpu(cpu);
-   default:
-   break;
-   }
-   return NOTIFY_OK;
-}
-
-static struct notifier_block qman_hotplug_cpu_notifier = {
-   .notifier_call = qman_hotplug_cpu_callback,
-};
-
 static int qman_portal_probe(struct platform_device *pdev)
 {
struct device *dev = &pdev->dev;
@@ -346,8 +325,14 @@ static int __init qman_portal_driver_register(struct platform_driver *drv)
if (ret < 0)
return ret;
 
-   register_hotcpu_notifier(&qman_hotplug_cpu_notifier);
-
+   ret = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
+   "soc/qman_portal:online",
+   qman_online_cpu, qman_offline_cpu);
+   if (ret < 0) {
+   pr_err("qman: failed to register hotplug callbacks.\n");
+   platform_driver_unregister(drv);
+   return ret;
+   }
return 0;
 }
 
-- 
2.10.2

[PATCH 11/22] mm/zswap: Convert dst-mem to hotplug state machine

2016-11-26 Thread Sebastian Andrzej Siewior

Install the callbacks via the state machine and let the core invoke
the callbacks on the already online CPUs.

Cc: Seth Jennings 
Cc: linux...@kvack.org
Signed-off-by: Sebastian Andrzej Siewior 
---
 include/linux/cpuhotplug.h |  1 +
 mm/zswap.c | 75 +++---
 2 files changed, 19 insertions(+), 57 deletions(-)

diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index 9f29dd996088..62f51a4e8676 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -65,6 +65,7 @@ enum cpuhp_state {
CPUHP_ARM_BL_PREPARE,
CPUHP_TRACE_RB_PREPARE,
CPUHP_MM_ZS_PREPARE,
+   CPUHP_MM_ZSWP_MEM_PREPARE,
CPUHP_TIMERS_DEAD,
CPUHP_NOTF_ERR_INJ_PREPARE,
CPUHP_MIPS_SOC_PREPARE,
diff --git a/mm/zswap.c b/mm/zswap.c
index 275b22cc8df4..b13aa5706348 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -352,70 +352,28 @@ static struct zswap_entry *zswap_entry_find_get(struct rb_root *root,
 **/
 static DEFINE_PER_CPU(u8 *, zswap_dstmem);
 
-static int __zswap_cpu_dstmem_notifier(unsigned long action, unsigned long cpu)
+static int zswap_dstmem_prepare(unsigned int cpu)
 {
u8 *dst;
 
-   switch (action) {
-   case CPU_UP_PREPARE:
-   dst = kmalloc_node(PAGE_SIZE * 2, GFP_KERNEL, cpu_to_node(cpu));
-   if (!dst) {
-   pr_err("can't allocate compressor buffer\n");
-   return NOTIFY_BAD;
-   }
-   per_cpu(zswap_dstmem, cpu) = dst;
-   break;
-   case CPU_DEAD:
-   case CPU_UP_CANCELED:
-   dst = per_cpu(zswap_dstmem, cpu);
-   kfree(dst);
-   per_cpu(zswap_dstmem, cpu) = NULL;
-   break;
-   default:
-   break;
+   dst = kmalloc_node(PAGE_SIZE * 2, GFP_KERNEL, cpu_to_node(cpu));
+   if (!dst) {
+   pr_err("can't allocate compressor buffer\n");
+   return -ENOMEM;
}
-   return NOTIFY_OK;
-}
-
-static int zswap_cpu_dstmem_notifier(struct notifier_block *nb,
-unsigned long action, void *pcpu)
-{
-   return __zswap_cpu_dstmem_notifier(action, (unsigned long)pcpu);
-}
-
-static struct notifier_block zswap_dstmem_notifier = {
-   .notifier_call =zswap_cpu_dstmem_notifier,
-};
-
-static int __init zswap_cpu_dstmem_init(void)
-{
-   unsigned long cpu;
-
-   cpu_notifier_register_begin();
-   for_each_online_cpu(cpu)
-   if (__zswap_cpu_dstmem_notifier(CPU_UP_PREPARE, cpu) ==
-   NOTIFY_BAD)
-   goto cleanup;
-   __register_cpu_notifier(&zswap_dstmem_notifier);
-   cpu_notifier_register_done();
+   per_cpu(zswap_dstmem, cpu) = dst;
return 0;
-
-cleanup:
-   for_each_online_cpu(cpu)
-   __zswap_cpu_dstmem_notifier(CPU_UP_CANCELED, cpu);
-   cpu_notifier_register_done();
-   return -ENOMEM;
 }
 
-static void zswap_cpu_dstmem_destroy(void)
+static int zswap_dstmem_dead(unsigned int cpu)
 {
-   unsigned long cpu;
+   u8 *dst;
 
-   cpu_notifier_register_begin();
-   for_each_online_cpu(cpu)
-   __zswap_cpu_dstmem_notifier(CPU_UP_CANCELED, cpu);
-   __unregister_cpu_notifier(&zswap_dstmem_notifier);
-   cpu_notifier_register_done();
+   dst = per_cpu(zswap_dstmem, cpu);
+   kfree(dst);
+   per_cpu(zswap_dstmem, cpu) = NULL;
+
+   return 0;
 }
 
 static int __zswap_cpu_comp_notifier(struct zswap_pool *pool,
@@ -1238,6 +1196,7 @@ static void __exit zswap_debugfs_exit(void) { }
 static int __init init_zswap(void)
 {
struct zswap_pool *pool;
+   int ret;
 
zswap_init_started = true;
 
@@ -1246,7 +1205,9 @@ static int __init init_zswap(void)
goto cache_fail;
}
 
-   if (zswap_cpu_dstmem_init()) {
+   ret = cpuhp_setup_state(CPUHP_MM_ZSWP_MEM_PREPARE, "mm/zswap:prepare",
+   zswap_dstmem_prepare, zswap_dstmem_dead);
+   if (ret) {
pr_err("dstmem alloc failed\n");
goto dstmem_fail;
}
@@ -1267,7 +1228,7 @@ static int __init init_zswap(void)
return 0;
 
 pool_fail:
-   zswap_cpu_dstmem_destroy();
+   cpuhp_remove_state(CPUHP_MM_ZSWP_MEM_PREPARE);
 dstmem_fail:
zswap_entry_cache_destroy();
 cache_fail:
-- 
2.10.2
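
For readers following the zswap conversion above: cpuhp_setup_state() invokes the startup callback on every CPU that is already online and, if one of them fails, calls the teardown callback on the CPUs that were already set up. That is why the open-coded for_each_online_cpu() loops and the cleanup label could go away. The following is only a userspace sketch of that behaviour, not kernel code; model_setup_state(), model_prepare(), model_dead(), run_model(), count_buffers() and the fixed NR_CPUS are all names made up for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define NR_CPUS 4

typedef int (*cpu_cb)(unsigned int cpu);

static void *dstmem[NR_CPUS];   /* stands in for the per-CPU zswap_dstmem */
static int fail_on_cpu = -1;    /* test knob: CPU whose allocation "fails" */

/* Startup callback: allocate the buffer for one CPU. */
static int model_prepare(unsigned int cpu)
{
    if ((int)cpu == fail_on_cpu)
        return -ENOMEM;
    dstmem[cpu] = malloc(64);
    return dstmem[cpu] ? 0 : -ENOMEM;
}

/* Teardown callback: free the buffer of one CPU. */
static int model_dead(unsigned int cpu)
{
    free(dstmem[cpu]);
    dstmem[cpu] = NULL;
    return 0;
}

/* Hypothetical model: run startup on each online CPU, roll back on error. */
int model_setup_state(cpu_cb startup, cpu_cb teardown, unsigned int online)
{
    unsigned int cpu;
    int ret;

    assert(online <= NR_CPUS);
    for (cpu = 0; cpu < online; cpu++) {
        ret = startup ? startup(cpu) : 0;
        if (ret) {
            /* Undo the CPUs that were set up before the failure. */
            while (cpu-- > 0)
                if (teardown)
                    teardown(cpu);
            return ret;
        }
    }
    return 0;
}

/* Number of CPUs that currently own a buffer. */
int count_buffers(void)
{
    int n = 0;

    for (unsigned int cpu = 0; cpu < NR_CPUS; cpu++)
        n += dstmem[cpu] != NULL;
    return n;
}

/* Register the two callbacks with an optional injected failure. */
int run_model(int failing_cpu)
{
    fail_on_cpu = failing_cpu;
    return model_setup_state(model_prepare, model_dead, NR_CPUS);
}
```

With a failure injected on CPU 2, run_model(2) returns -ENOMEM and no buffer stays allocated, which mirrors why init_zswap() only needs to check a single return value.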

[PATCH 15/22] arm64/cpuinfo: Make hotplug notifier symmetric

2016-11-26 Thread Sebastian Andrzej Siewior

From: Anna-Maria Gleixner 

There is no requirement to keep the sysfs files around until the CPU is
completely dead. Remove them during the DOWN_PREPARE notification. This is
a preparatory patch for converting to the hotplug state machine.

Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: linux-arm-ker...@lists.infradead.org
Signed-off-by: Anna-Maria Gleixner 
Signed-off-by: Sebastian Andrzej Siewior 
---
 arch/arm64/kernel/cpuinfo.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/cpuinfo.c b/arch/arm64/kernel/cpuinfo.c
index b3d5b3e8fbcb..19aad7041e14 100644
--- a/arch/arm64/kernel/cpuinfo.c
+++ b/arch/arm64/kernel/cpuinfo.c
@@ -272,9 +272,10 @@ static int cpuid_callback(struct notifier_block *nb,
 
switch (action & ~CPU_TASKS_FROZEN) {
case CPU_ONLINE:
+   case CPU_DOWN_FAILED:
rc = cpuid_add_regs(cpu);
break;
-   case CPU_DEAD:
+   case CPU_DOWN_PREPARE:
rc = cpuid_remove_regs(cpu);
break;
}
-- 
2.10.2
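
As a rough illustration of what "symmetric" means in the patch above: the sysfs files are added on ONLINE and on DOWN_FAILED, and removed on DOWN_PREPARE, so that add and remove form a pair that a single online/offline state can later replace. The sketch below is a userspace model only; the MODEL_* actions, model_cpuid_callback() and model_regs_visible() are invented names and do not exist in the kernel.

```c
#include <assert.h>

#define MODEL_NR_CPUS 4

/* Stand-ins for the CPU_* notifier actions handled in cpuid_callback(). */
enum model_action {
    MODEL_CPU_ONLINE,
    MODEL_CPU_DOWN_PREPARE,
    MODEL_CPU_DOWN_FAILED,
    MODEL_CPU_DEAD,
};

static int regs_visible[MODEL_NR_CPUS]; /* 1 while the files would exist */

static int model_add_regs(unsigned int cpu)
{
    regs_visible[cpu] = 1;
    return 0;
}

static int model_remove_regs(unsigned int cpu)
{
    regs_visible[cpu] = 0;
    return 0;
}

/* Symmetric dispatch: every removal has a matching way to add again. */
int model_cpuid_callback(enum model_action action, unsigned int cpu)
{
    assert(cpu < MODEL_NR_CPUS);

    switch (action) {
    case MODEL_CPU_ONLINE:
    case MODEL_CPU_DOWN_FAILED:     /* offlining was aborted: restore */
        return model_add_regs(cpu);
    case MODEL_CPU_DOWN_PREPARE:    /* remove before the CPU goes away */
        return model_remove_regs(cpu);
    default:                        /* DEAD no longer does anything */
        return 0;
    }
}

int model_regs_visible(unsigned int cpu)
{
    assert(cpu < MODEL_NR_CPUS);
    return regs_visible[cpu];
}
```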

Re: [RFC] kernel/sysctl.c: return -EINVAL when write invalid val to ulong type sysctl

2016-11-26 Thread subashab


On 2016-11-26 02:13, Yisheng Xie wrote:
I tried to echo an invalid value to an unsigned long type sysctl on 
4.9.0-rc6:

   linux:~# cat /proc/sys/vm/user_reserve_kbytes
   131072
   linux:~# echo -1 > /proc/sys/vm/user_reserve_kbytes
   linux:~# cat /proc/sys/vm/user_reserve_kbytes
   131072

The write is rejected and the value is not stored in
user_reserve_kbytes, but echo reports no error, so the user does not
find out until checking the value again.

Would it be more suitable to return -EINVAL when an invalid value is
written to an unsigned long type sysctl, so that the user knows what
happened without reading the value back?
This is what the int type sysctls already do:
   linux:~#cat /proc/sys/kernel/sysctl_writes_strict
   1
   linux:~# echo 3 > /proc/sys/kernel/sysctl_writes_strict
   bash: echo: write error: Invalid argument

--
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 706309f..40e9285 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -2485,10 +2485,14 @@ static int __do_proc_doulongvec_minmax(void *data, struct ctl_table *table, int
 sizeof(proc_wspace_sep), NULL);

if (err)
break;
-   if (neg)
-   continue;
-   if ((min && val < *min) || (max && val > *max))
-   continue;
+   if (neg) {
+   err = -EINVAL;
+   break;
+   }
+   if ((min && val < *min) || (max && val > *max)) {
+   err = -EINVAL;
+   break;
+   }
*i = val;
} else {
val = convdiv * (*i) / convmul;


Agree, this should be similar to proc_douintvec

root@vm:~# echo 8192 > /proc/sys/net/core/xfrm_aevent_rseqth
root@vm:~# cat /proc/sys/net/core/xfrm_aevent_rseqth
8192
root@vm:~# echo -1 > /proc/sys/net/core/xfrm_aevent_rseqth
-bash: echo: write error: Invalid argument
root@vm:~# cat /proc/sys/net/core/xfrm_aevent_rseqth
8192
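
To make the requested behaviour concrete, here is a small userspace sketch of strict parsing for an unsigned long value: negative or out-of-range input yields -EINVAL and the stored value is left untouched. This is not the kernel's __do_proc_doulongvec_minmax(); parse_ulong_strict() is a hypothetical helper written only for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse one unsigned long from s.
 * min and max are optional bounds (NULL means "no bound").
 * *out is written only when the input is accepted.
 */
int parse_ulong_strict(const char *s, const unsigned long *min,
                       const unsigned long *max, unsigned long *out)
{
    char *end;
    unsigned long val;

    assert(s != NULL && out != NULL);

    while (isspace((unsigned char)*s))
        s++;
    if (*s == '-')              /* strtoul() would wrap a negative value */
        return -EINVAL;

    errno = 0;
    val = strtoul(s, &end, 10);
    if (end == s || errno == ERANGE)
        return -EINVAL;

    while (isspace((unsigned char)*end))    /* allow a trailing newline */
        end++;
    if (*end != '\0')
        return -EINVAL;

    if ((min && val < *min) || (max && val > *max))
        return -EINVAL;

    *out = val;
    return 0;
}
```

The caller sees the error immediately, much like the "write error: Invalid argument" in the transcript above, instead of having to read the value back.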

--
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a 
Linux Foundation Collaborative Project

[PATCH 14/22] mm/compaction: Convert to hotplug state machine

2016-11-26 Thread Sebastian Andrzej Siewior

From: Anna-Maria Gleixner 

Install the callbacks via the state machine. Should the hotplug init fail then
no threads are spawned.

Cc: Andrew Morton 
Cc: Vlastimil Babka 
Cc: Michal Hocko 
Cc: Mel Gorman 
Cc: linux...@kvack.org
Signed-off-by: Anna-Maria Gleixner 
Signed-off-by: Sebastian Andrzej Siewior 
---
 mm/compaction.c | 31 ++-
 1 file changed, 18 insertions(+), 13 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 0409a4ad6ea1..0d37192d9423 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -2043,33 +2043,38 @@ void kcompactd_stop(int nid)
  * away, we get changed to run anywhere: as the first one comes back,
  * restore their cpu bindings.
  */
-static int cpu_callback(struct notifier_block *nfb, unsigned long action,
-   void *hcpu)
+static int kcompactd_cpu_online(unsigned int cpu)
 {
int nid;
 
-   if (action == CPU_ONLINE || action == CPU_ONLINE_FROZEN) {
-   for_each_node_state(nid, N_MEMORY) {
-   pg_data_t *pgdat = NODE_DATA(nid);
-   const struct cpumask *mask;
+   for_each_node_state(nid, N_MEMORY) {
+   pg_data_t *pgdat = NODE_DATA(nid);
+   const struct cpumask *mask;
 
-   mask = cpumask_of_node(pgdat->node_id);
+   mask = cpumask_of_node(pgdat->node_id);
 
-   if (cpumask_any_and(cpu_online_mask, mask) < nr_cpu_ids)
-   /* One of our CPUs online: restore mask */
-   set_cpus_allowed_ptr(pgdat->kcompactd, mask);
-   }
+   if (cpumask_any_and(cpu_online_mask, mask) < nr_cpu_ids)
+   /* One of our CPUs online: restore mask */
+   set_cpus_allowed_ptr(pgdat->kcompactd, mask);
}
-   return NOTIFY_OK;
+   return 0;
 }
 
 static int __init kcompactd_init(void)
 {
int nid;
+   int ret;
+
+   ret = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
+   "mm/compaction:online",
+   kcompactd_cpu_online, NULL);
+   if (ret < 0) {
+   pr_err("kcompactd: failed to register hotplug callbacks.\n");
+   return ret;
+   }
 
for_each_node_state(nid, N_MEMORY)
kcompactd_run(nid);
-   hotcpu_notifier(cpu_callback, 0);
return 0;
 }
 subsys_initcall(kcompactd_init)
-- 
2.10.2

[PATCH 16/22] arm64/cpuinfo: Convert to hotplug state machine

2016-11-26 Thread Sebastian Andrzej Siewior

From: Anna-Maria Gleixner 

Install the callbacks via the state machine and let the core invoke
the callbacks on the already online CPUs.

Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: linux-arm-ker...@lists.infradead.org
Signed-off-by: Anna-Maria Gleixner 
Signed-off-by: Sebastian Andrzej Siewior 
---
 arch/arm64/kernel/cpuinfo.c | 37 +
 1 file changed, 9 insertions(+), 28 deletions(-)

diff --git a/arch/arm64/kernel/cpuinfo.c b/arch/arm64/kernel/cpuinfo.c
index 19aad7041e14..7b7be71e87bf 100644
--- a/arch/arm64/kernel/cpuinfo.c
+++ b/arch/arm64/kernel/cpuinfo.c
@@ -227,7 +227,7 @@ static struct attribute_group cpuregs_attr_group = {
.name = "identification"
 };
 
-static int cpuid_add_regs(int cpu)
+static int cpuid_cpu_online(unsigned int cpu)
 {
int rc;
struct device *dev;
@@ -248,7 +248,7 @@ static int cpuid_add_regs(int cpu)
return rc;
 }
 
-static int cpuid_remove_regs(int cpu)
+static int cpuid_cpu_offline(unsigned int cpu)
 {
struct device *dev;
struct cpuinfo_arm64 *info = &per_cpu(cpu_data, cpu);
@@ -264,41 +264,22 @@ static int cpuid_remove_regs(int cpu)
return 0;
 }
 
-static int cpuid_callback(struct notifier_block *nb,
-unsigned long action, void *hcpu)
-{
-   int rc = 0;
-   unsigned long cpu = (unsigned long)hcpu;
-
-   switch (action & ~CPU_TASKS_FROZEN) {
-   case CPU_ONLINE:
-   case CPU_DOWN_FAILED:
-   rc = cpuid_add_regs(cpu);
-   break;
-   case CPU_DOWN_PREPARE:
-   rc = cpuid_remove_regs(cpu);
-   break;
-   }
-
-   return notifier_from_errno(rc);
-}
-
 static int __init cpuinfo_regs_init(void)
 {
-   int cpu;
-
-   cpu_notifier_register_begin();
+   int cpu, ret;
 
for_each_possible_cpu(cpu) {
struct cpuinfo_arm64 *info = &per_cpu(cpu_data, cpu);
 
kobject_init(&info->kobj, &cpuregs_kobj_type);
-   if (cpu_online(cpu))
-   cpuid_add_regs(cpu);
}
-   __hotcpu_notifier(cpuid_callback, 0);
 
-   cpu_notifier_register_done();
+   ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "arm64/cpuinfo:online",
+   cpuid_cpu_online, cpuid_cpu_offline);
+   if (ret < 0) {
+   pr_err("cpuinfo: failed to register hotplug callbacks.\n");
+   return ret;
+   }
return 0;
 }
 static void cpuinfo_detect_icache_policy(struct cpuinfo_arm64 *info)
-- 
2.10.2

[PATCH 18/22] zram: Convert to hotplug state machine

2016-11-26 Thread Sebastian Andrzej Siewior

From: Anna-Maria Gleixner 

Install the callbacks via the state machine with multi instance support and let
the core invoke the callbacks on the already online CPUs.

Cc: Minchan Kim 
Cc: Nitin Gupta 
Cc: Sergey Senozhatsky 
[bigeasy: wire up the multi instance stuff]
Signed-off-by: Anna-Maria Gleixner 
Signed-off-by: Sebastian Andrzej Siewior 
Signed-off-by: Thomas Gleixner 
---
 drivers/block/zram/zcomp.c| 76 ++-
 drivers/block/zram/zcomp.h|  5 +--
 drivers/block/zram/zram_drv.c |  9 +
 include/linux/cpuhotplug.h|  1 +
 4 files changed, 38 insertions(+), 53 deletions(-)

diff --git a/drivers/block/zram/zcomp.c b/drivers/block/zram/zcomp.c
index 4b5cd3a7b2b6..12046f4f00e4 100644
--- a/drivers/block/zram/zcomp.c
+++ b/drivers/block/zram/zcomp.c
@@ -160,82 +160,56 @@ int zcomp_decompress(struct zcomp_strm *zstrm,
dst, &dst_len);
 }
 
-static int __zcomp_cpu_notifier(struct zcomp *comp,
-   unsigned long action, unsigned long cpu)
+int zcomp_cpu_up_prepare(unsigned int cpu, struct hlist_node *node)
 {
+   struct zcomp *comp = hlist_entry(node, struct zcomp, node);
struct zcomp_strm *zstrm;
 
-   switch (action) {
-   case CPU_UP_PREPARE:
-   if (WARN_ON(*per_cpu_ptr(comp->stream, cpu)))
-   break;
-   zstrm = zcomp_strm_alloc(comp);
-   if (IS_ERR_OR_NULL(zstrm)) {
-   pr_err("Can't allocate a compression stream\n");
-   return NOTIFY_BAD;
-   }
-   *per_cpu_ptr(comp->stream, cpu) = zstrm;
-   break;
-   case CPU_DEAD:
-   case CPU_UP_CANCELED:
-   zstrm = *per_cpu_ptr(comp->stream, cpu);
-   if (!IS_ERR_OR_NULL(zstrm))
-   zcomp_strm_free(zstrm);
-   *per_cpu_ptr(comp->stream, cpu) = NULL;
-   break;
-   default:
-   break;
+   if (WARN_ON(*per_cpu_ptr(comp->stream, cpu)))
+   return 0;
+
+   zstrm = zcomp_strm_alloc(comp);
+   if (IS_ERR_OR_NULL(zstrm)) {
+   pr_err("Can't allocate a compression stream\n");
+   return -ENOMEM;
}
-   return NOTIFY_OK;
+   *per_cpu_ptr(comp->stream, cpu) = zstrm;
+   return 0;
 }
 
-static int zcomp_cpu_notifier(struct notifier_block *nb,
-   unsigned long action, void *pcpu)
+int zcomp_cpu_dead(unsigned int cpu, struct hlist_node *node)
 {
-   unsigned long cpu = (unsigned long)pcpu;
-   struct zcomp *comp = container_of(nb, typeof(*comp), notifier);
+   struct zcomp *comp = hlist_entry(node, struct zcomp, node);
+   struct zcomp_strm *zstrm;
 
-   return __zcomp_cpu_notifier(comp, action, cpu);
+   zstrm = *per_cpu_ptr(comp->stream, cpu);
+   if (!IS_ERR_OR_NULL(zstrm))
+   zcomp_strm_free(zstrm);
+   *per_cpu_ptr(comp->stream, cpu) = NULL;
+   return 0;
 }
 
 static int zcomp_init(struct zcomp *comp)
 {
-   unsigned long cpu;
int ret;
 
-   comp->notifier.notifier_call = zcomp_cpu_notifier;
-
comp->stream = alloc_percpu(struct zcomp_strm *);
if (!comp->stream)
return -ENOMEM;
 
-   cpu_notifier_register_begin();
-   for_each_online_cpu(cpu) {
-   ret = __zcomp_cpu_notifier(comp, CPU_UP_PREPARE, cpu);
-   if (ret == NOTIFY_BAD)
-   goto cleanup;
-   }
-   __register_cpu_notifier(&comp->notifier);
-   cpu_notifier_register_done();
+   ret = cpuhp_state_add_instance(CPUHP_ZCOMP_PREPARE, &comp->node);
+   if (ret < 0)
+   goto cleanup;
return 0;
 
 cleanup:
-   for_each_online_cpu(cpu)
-   __zcomp_cpu_notifier(comp, CPU_UP_CANCELED, cpu);
-   cpu_notifier_register_done();
-   return -ENOMEM;
+   free_percpu(comp->stream);
+   return ret;
 }
 
 void zcomp_destroy(struct zcomp *comp)
 {
-   unsigned long cpu;
-
-   cpu_notifier_register_begin();
-   for_each_online_cpu(cpu)
-   __zcomp_cpu_notifier(comp, CPU_UP_CANCELED, cpu);
-   __unregister_cpu_notifier(&comp->notifier);
-   cpu_notifier_register_done();
-
+   cpuhp_state_remove_instance(CPUHP_ZCOMP_PREPARE, &comp->node);
free_percpu(comp->stream);
kfree(comp);
 }
diff --git a/drivers/block/zram/zcomp.h b/drivers/block/zram/zcomp.h
index 478cac2ed465..41c1002a7d7d 100644
--- a/drivers/block/zram/zcomp.h
+++ b/drivers/block/zram/zcomp.h
@@ -19,11 +19,12 @@ struct zcomp_strm {
 /* dynamic per-device compression frontend */
 struct zcomp {
struct zcomp_strm * __percpu *stream;
-   struct notifier_block notifier;
-
const char *name;
+   struct hlist_node node;
 };
 
+int zcomp_cpu_up_prepare(unsigned int cpu, struct hlist_node *node);
+int zcomp_cpu_dead(unsigned int cpu, struct hlist_node *node);
 ssize_t zcomp_available_show(const char *comp, char *buf);
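
A note on the "multi instance" wording in the zram patch above: each zcomp embeds an hlist_node, and the callbacks receive that node and map it back to the enclosing zcomp with hlist_entry(). Adding an instance runs the startup callback for that instance on the CPUs that are already online. The following userspace sketch models this under simplifying assumptions; model_node, model_zcomp, model_entry(), model_cpu_up_prepare() and model_add_instance() are illustrative names, not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NR_CPUS 4

/* Minimal stand-in for struct hlist_node. */
struct model_node {
    struct model_node *next;
};

/* Stand-in for struct zcomp: one embedded node per instance. */
struct model_zcomp {
    int stream_ready[MODEL_NR_CPUS];
    const char *name;
    struct model_node node;
};

/* Same idea as hlist_entry()/container_of(): node pointer -> owner. */
#define model_entry(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Per-instance startup callback: set up the stream for one CPU. */
int model_cpu_up_prepare(unsigned int cpu, struct model_node *node)
{
    struct model_zcomp *comp = model_entry(node, struct model_zcomp, node);

    assert(cpu < MODEL_NR_CPUS);
    comp->stream_ready[cpu] = 1;
    return 0;
}

/* Run the callback for this instance on online CPUs, then link it in. */
int model_add_instance(struct model_node **head, struct model_node *node,
                       unsigned int online_cpus)
{
    unsigned int cpu;
    int ret;

    assert(online_cpus <= MODEL_NR_CPUS);
    for (cpu = 0; cpu < online_cpus; cpu++) {
        ret = model_cpu_up_prepare(cpu, node);
        if (ret)
            return ret;
    }
    node->next = *head;
    *head = node;
    return 0;
}
```

Only the instance being added is touched, so two compression backends registered at different times do not interfere with each other's per-CPU streams.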

[PATCH 04/22] idle/intel: Convert to hotplug state machine

2016-11-26 Thread Sebastian Andrzej Siewior

Install the callbacks via the state machine and let the core invoke the
callbacks on the already online CPUs.

The two smp_call_function_single() invocations in intel_idle_cpu_init() have
been removed because intel_idle_cpu_init() is now invoked via the hotplug
callback which runs on the target CPU. The IRQ-off calling convention for
auto_demotion_disable() and c1e_promotion_disable() has not been preserved
because only those two modify the MSR during CPU initialization.

Cc: Len Brown 
Cc: linux...@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior 
---
 drivers/idle/intel_idle.c | 106 ++
 1 file changed, 42 insertions(+), 64 deletions(-)

diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index f53b42a78186..d9631db1b4f5 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -98,8 +98,6 @@ static int intel_idle(struct cpuidle_device *dev,
struct cpuidle_driver *drv, int index);
 static void intel_idle_freeze(struct cpuidle_device *dev,
  struct cpuidle_driver *drv, int index);
-static int intel_idle_cpu_init(int cpu);
-
 static struct cpuidle_state *cpuidle_state_table;
 
 /*
@@ -907,50 +905,15 @@ static void intel_idle_freeze(struct cpuidle_device *dev,
mwait_idle_with_hints(eax, ecx);
 }
 
-static void __setup_broadcast_timer(void *arg)
+static void __setup_broadcast_timer(bool on)
 {
-   unsigned long on = (unsigned long)arg;
-
if (on)
tick_broadcast_enable();
else
tick_broadcast_disable();
 }
 
-static int cpu_hotplug_notify(struct notifier_block *n,
- unsigned long action, void *hcpu)
-{
-   int hotcpu = (unsigned long)hcpu;
-   struct cpuidle_device *dev;
-
-   switch (action & ~CPU_TASKS_FROZEN) {
-   case CPU_ONLINE:
-
-   if (lapic_timer_reliable_states != LAPIC_TIMER_ALWAYS_RELIABLE)
-   __setup_broadcast_timer((void *)true);
-
-   /*
-* Some systems can hotplug a cpu at runtime after
-* the kernel has booted, we have to initialize the
-* driver in this case
-*/
-   dev = per_cpu_ptr(intel_idle_cpuidle_devices, hotcpu);
-   if (dev->registered)
-   break;
-
-   if (intel_idle_cpu_init(hotcpu))
-   return NOTIFY_BAD;
-
-   break;
-   }
-   return NOTIFY_OK;
-}
-
-static struct notifier_block cpu_hotplug_notifier = {
-   .notifier_call = cpu_hotplug_notify,
-};
-
-static void auto_demotion_disable(void *dummy)
+static void auto_demotion_disable(void)
 {
unsigned long long msr_bits;
 
@@ -958,7 +921,7 @@ static void auto_demotion_disable(void *dummy)
msr_bits &= ~(icpu->auto_demotion_disable_flags);
wrmsrl(MSR_NHM_SNB_PKG_CST_CFG_CTL, msr_bits);
 }
-static void c1e_promotion_disable(void *dummy)
+static void c1e_promotion_disable(void)
 {
unsigned long long msr_bits;
 
@@ -1372,12 +1335,11 @@ static void __init intel_idle_cpuidle_driver_init(void)
  * allocate, initialize, register cpuidle_devices
  * @cpu: cpu/core to initialize
  */
-static int intel_idle_cpu_init(int cpu)
+static int intel_idle_cpu_init(unsigned int cpu)
 {
struct cpuidle_device *dev;
 
dev = per_cpu_ptr(intel_idle_cpuidle_devices, cpu);
-
dev->cpu = cpu;
 
if (cpuidle_register_device(dev)) {
@@ -1386,17 +1348,38 @@ static int intel_idle_cpu_init(int cpu)
}
 
if (icpu->auto_demotion_disable_flags)
-   smp_call_function_single(cpu, auto_demotion_disable, NULL, 1);
+   auto_demotion_disable();
 
if (icpu->disable_promotion_to_c1e)
-   smp_call_function_single(cpu, c1e_promotion_disable, NULL, 1);
+   c1e_promotion_disable();
 
return 0;
 }
 
+static int intel_idle_cpu_online(unsigned int cpu)
+{
+   struct cpuidle_device *dev;
+
+   if (lapic_timer_reliable_states != LAPIC_TIMER_ALWAYS_RELIABLE)
+   __setup_broadcast_timer(true);
+
+   /*
+* Some systems can hotplug a cpu at runtime after
+* the kernel has booted, we have to initialize the
+* driver in this case
+*/
+   dev = per_cpu_ptr(intel_idle_cpuidle_devices, cpu);
+   if (!dev->registered)
+   return intel_idle_cpu_init(cpu);
+
+   return 0;
+}
+
+static enum cpuhp_state hp_online;
+
 static int __init intel_idle_init(void)
 {
-   int retval, i;
+   int retval;
 
/* Do not load intel_idle at all for now if idle= is passed */
if (boot_option_idle_override != IDLE_NO_OVERRIDE)
@@ -1416,35 +1399,30 @@ static int __init intel_idle_init(void)
struct cpuidle_driver *drv = cpuidle_get_driver();
printk(KERN_DEBUG PREFIX

[PATCH 09/22] mm/vmstat: Convert to hotplug state machine

2016-11-26 Thread Sebastian Andrzej Siewior

Install the callbacks via the state machine, but do not invoke them as we
can initialize the node state without calling the callbacks on all online
CPUs.

start_shepherd_timer() is now called outside the get_online_cpus() block
which is safe as it only operates on cpu possible mask.

Cc: Andrew Morton 
Cc: Vlastimil Babka 
Cc: Mel Gorman 
Cc: Johannes Weiner 
Cc: linux...@kvack.org
Signed-off-by: Sebastian Andrzej Siewior 
Signed-off-by: Thomas Gleixner 
---
 include/linux/cpuhotplug.h |  1 +
 mm/vmstat.c| 76 +-
 2 files changed, 36 insertions(+), 41 deletions(-)

diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index 18bcfeb2463e..4ebd1bc27f8d 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -20,6 +20,7 @@ enum cpuhp_state {
CPUHP_VIRT_NET_DEAD,
CPUHP_SLUB_DEAD,
CPUHP_MM_WRITEBACK_DEAD,
+   CPUHP_MM_VMSTAT_DEAD,
CPUHP_SOFTIRQ_DEAD,
CPUHP_NET_MVNETA_DEAD,
CPUHP_CPUIDLE_DEAD,
diff --git a/mm/vmstat.c b/mm/vmstat.c
index b96dcec7e7d7..dfe3cb9f2c36 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1726,64 +1726,58 @@ static void __init init_cpu_node_state(void)
node_set_state(node, N_CPU);
 }
 
-static void vmstat_cpu_dead(int node)
+static int vmstat_cpu_online(unsigned int cpu)
+{
+   refresh_zone_stat_thresholds();
+   node_set_state(cpu_to_node(cpu), N_CPU);
+   return 0;
+}
+
+static int vmstat_cpu_down_prep(unsigned int cpu)
+{
+   cancel_delayed_work_sync(&per_cpu(vmstat_work, cpu));
+   return 0;
+}
+
+static int vmstat_cpu_dead(unsigned int cpu)
 {
const struct cpumask *node_cpus;
+   int node;
 
+   node = cpu_to_node(cpu);
+
+   refresh_zone_stat_thresholds();
node_cpus = cpumask_of_node(node);
if (cpumask_weight(node_cpus) > 0)
-   return;
+   return 0;
 
node_clear_state(node, N_CPU);
+   return 0;
 }
 
-/*
- * Use the cpu notifier to insure that the thresholds are recalculated
- * when necessary.
- */
-static int vmstat_cpuup_callback(struct notifier_block *nfb,
-   unsigned long action,
-   void *hcpu)
-{
-   long cpu = (long)hcpu;
-
-   switch (action) {
-   case CPU_ONLINE:
-   case CPU_ONLINE_FROZEN:
-   refresh_zone_stat_thresholds();
-   node_set_state(cpu_to_node(cpu), N_CPU);
-   break;
-   case CPU_DOWN_PREPARE:
-   case CPU_DOWN_PREPARE_FROZEN:
-   cancel_delayed_work_sync(&per_cpu(vmstat_work, cpu));
-   break;
-   case CPU_DOWN_FAILED:
-   case CPU_DOWN_FAILED_FROZEN:
-   break;
-   case CPU_DEAD:
-   case CPU_DEAD_FROZEN:
-   refresh_zone_stat_thresholds();
-   vmstat_cpu_dead(cpu_to_node(cpu));
-   break;
-   default:
-   break;
-   }
-   return NOTIFY_OK;
-}
-
-static struct notifier_block vmstat_notifier =
-   { &vmstat_cpuup_callback, NULL, 0 };
 #endif
 
 static int __init setup_vmstat(void)
 {
 #ifdef CONFIG_SMP
-   cpu_notifier_register_begin();
-   __register_cpu_notifier(&vmstat_notifier);
+   int ret;
+
+   ret = cpuhp_setup_state_nocalls(CPUHP_MM_VMSTAT_DEAD, "mm/vmstat:dead",
+   NULL, vmstat_cpu_dead);
+   if (ret < 0)
+   pr_err("vmstat: failed to register 'dead' hotplug state\n");
+
+   ret = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN, "mm/vmstat:online",
+   vmstat_cpu_online,
+   vmstat_cpu_down_prep);
+   if (ret < 0)
+   pr_err("vmstat: failed to register 'online' hotplug state\n");
+
+   get_online_cpus();
init_cpu_node_state();
+   put_online_cpus();
 
start_shepherd_timer();
-   cpu_notifier_register_done();
 #endif
 #ifdef CONFIG_PROC_FS
proc_create("buddyinfo", S_IRUGO, NULL, &fragmentation_file_operations);
-- 
2.10.2
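
As a reading aid for the vmstat_cpu_dead() callback in the patch above: it
clears N_CPU for a node only once that node has no CPUs left. The following is
a minimal userspace sketch of that bookkeeping, not kernel code. All model_*
names, the fixed CPU-to-node table and the folding of the online-mask update
into the callbacks are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS  4
#define MODEL_NR_NODES 2

/* Hypothetical topology: CPUs 0-1 sit on node 0, CPUs 2-3 on node 1. */
static const int model_cpu_to_node[MODEL_NR_CPUS] = { 0, 0, 1, 1 };

static bool model_cpu_online[MODEL_NR_CPUS];
static bool model_node_has_cpu[MODEL_NR_NODES];	/* stands in for N_CPU */

/* Number of online CPUs on a node, in the spirit of cpumask_weight(). */
static int model_node_weight(int node)
{
	int cpu, weight = 0;

	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (model_cpu_online[cpu] && model_cpu_to_node[cpu] == node)
			weight++;
	return weight;
}

/* Online callback: the node now has at least one CPU. */
static int model_vmstat_cpu_online(unsigned int cpu)
{
	assert(cpu < MODEL_NR_CPUS);
	/* Assumption: the real hotplug core updates the online mask itself. */
	model_cpu_online[cpu] = true;
	model_node_has_cpu[model_cpu_to_node[cpu]] = true;
	return 0;
}

/* Dead callback: clear the node flag only when its last CPU is gone. */
static int model_vmstat_cpu_dead(unsigned int cpu)
{
	int node;

	assert(cpu < MODEL_NR_CPUS);
	node = model_cpu_to_node[cpu];
	model_cpu_online[cpu] = false;
	if (model_node_weight(node) > 0)
		return 0;

	model_node_has_cpu[node] = false;
	return 0;
}
```

Taking CPU 0 down in this model leaves node 0 marked as long as CPU 1 is still
online; only the second removal clears the flag.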

[PATCH 19/22] soc/fsl/qbman: Convert to hotplug state machine

2016-11-26 Thread Sebastian Andrzej Siewior

Install the callbacks via the state machine.

Cc: Roy Pledge 
Cc: Scott Wood 
Cc: Claudiu Manoil 
Signed-off-by: Sebastian Andrzej Siewior 
---
 drivers/soc/fsl/qbman/bman_portal.c | 46 +
 1 file changed, 16 insertions(+), 30 deletions(-)

diff --git a/drivers/soc/fsl/qbman/bman_portal.c b/drivers/soc/fsl/qbman/bman_portal.c
index 6579cc18811a..986f64690e6e 100644
--- a/drivers/soc/fsl/qbman/bman_portal.c
+++ b/drivers/soc/fsl/qbman/bman_portal.c
@@ -53,58 +53,38 @@ static struct bman_portal *init_pcfg(struct bm_portal_config *pcfg)
return p;
 }
 
-static void bman_offline_cpu(unsigned int cpu)
+static int bman_offline_cpu(unsigned int cpu)
 {
struct bman_portal *p = affine_bportals[cpu];
const struct bm_portal_config *pcfg;
 
if (!p)
-   return;
+   return 0;
 
pcfg = bman_get_bm_portal_config(p);
if (!pcfg)
-   return;
+   return 0;
 
irq_set_affinity(pcfg->irq, cpumask_of(0));
+   return 0;
 }
 
-static void bman_online_cpu(unsigned int cpu)
+static int bman_online_cpu(unsigned int cpu)
 {
struct bman_portal *p = affine_bportals[cpu];
const struct bm_portal_config *pcfg;
 
if (!p)
-   return;
+   return 0;
 
pcfg = bman_get_bm_portal_config(p);
if (!pcfg)
-   return;
+   return 0;
 
irq_set_affinity(pcfg->irq, cpumask_of(cpu));
+   return 0;
 }
 
-static int bman_hotplug_cpu_callback(struct notifier_block *nfb,
-unsigned long action, void *hcpu)
-{
-   unsigned int cpu = (unsigned long)hcpu;
-
-   switch (action) {
-   case CPU_ONLINE:
-   case CPU_ONLINE_FROZEN:
-   bman_online_cpu(cpu);
-   break;
-   case CPU_DOWN_PREPARE:
-   case CPU_DOWN_PREPARE_FROZEN:
-   bman_offline_cpu(cpu);
-   }
-
-   return NOTIFY_OK;
-}
-
-static struct notifier_block bman_hotplug_cpu_notifier = {
-   .notifier_call = bman_hotplug_cpu_callback,
-};
-
 static int bman_portal_probe(struct platform_device *pdev)
 {
struct device *dev = &pdev->dev;
@@ -210,8 +190,14 @@ static int __init bman_portal_driver_register(struct platform_driver *drv)
if (ret < 0)
return ret;
 
-   register_hotcpu_notifier(&bman_hotplug_cpu_notifier);
-
+   ret = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
+   "soc/qbman_portal:online",
+   bman_online_cpu, bman_offline_cpu);
+   if (ret < 0) {
+   pr_err("bman: failed to register hotplug callbacks.\n");
+   platform_driver_unregister(drv);
+   return ret;
+   }
return 0;
 }
 
-- 
2.10.2

[PATCH 21/22] staging/lustre/libcfs: Convert to hotplug state machine

2016-11-26 Thread Sebastian Andrzej Siewior

From: Anna-Maria Gleixner 

Install the callbacks via the state machine.
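
A note on the online state used in the patch below: it is requested as
CPUHP_AP_ONLINE_DYN, and the value returned by the setup call is kept in
lustre_cpu_online so that cfs_cpu_fini() can remove exactly that state later.
The following is a minimal userspace sketch of such a dynamic slot allocator,
assuming first-free allocation and -ENOSPC on exhaustion. The model_* names and
the numeric base are hypothetical; only the "+ 30" range mirrors the
CPUHP_AP_ONLINE_DYN_END definition visible in the cpuhotplug.h context of this
series.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical numeric base; the real enum value is not stated here. */
#define MODEL_ONLINE_DYN	100
#define MODEL_ONLINE_DYN_END	(MODEL_ONLINE_DYN + 30)
#define MODEL_DYN_SLOTS		(MODEL_ONLINE_DYN_END - MODEL_ONLINE_DYN + 1)

static bool model_slot_used[MODEL_DYN_SLOTS];

/* Reserve the first free dynamic slot; returns its state number or -ENOSPC. */
static int model_setup_dyn_state(void)
{
	int i;

	for (i = 0; i < MODEL_DYN_SLOTS; i++) {
		if (!model_slot_used[i]) {
			model_slot_used[i] = true;
			return MODEL_ONLINE_DYN + i;
		}
	}
	return -ENOSPC;
}

/* Release a slot that an earlier model_setup_dyn_state() call returned. */
static void model_remove_state(int state)
{
	assert(state >= MODEL_ONLINE_DYN && state <= MODEL_ONLINE_DYN_END);
	model_slot_used[state - MODEL_ONLINE_DYN] = false;
}
```

Because the caller cannot know in advance which slot it gets, the returned
number has to be stored, which is what the lustre_cpu_online variable is for.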

Cc: Oleg Drokin 
Cc: Andreas Dilger 
Cc: James Simmons 
Cc: Greg Kroah-Hartman 
Cc: lustre-de...@lists.lustre.org
Cc: de...@driverdev.osuosl.org
Signed-off-by: Anna-Maria Gleixner 
Signed-off-by: Sebastian Andrzej Siewior 
---
 .../staging/lustre/lnet/libcfs/linux/linux-cpu.c   | 89 --
 include/linux/cpuhotplug.h |  1 +
 2 files changed, 50 insertions(+), 40 deletions(-)

diff --git a/drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c b/drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c
index e8b1a61420de..a75113ab2903 100644
--- a/drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c
+++ b/drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c
@@ -73,6 +73,9 @@ struct cfs_cpt_data {
 };
 
 static struct cfs_cpt_data cpt_data;
+#ifdef CONFIG_HOTPLUG_CPU
+static enum cpuhp_state lustre_cpu_online;
+#endif
 
 static void
 cfs_node_to_cpumask(int node, cpumask_t *mask)
@@ -942,48 +945,38 @@ cfs_cpt_table_create_pattern(char *pattern)
 }
 
 #ifdef CONFIG_HOTPLUG_CPU
-static int
-cfs_cpu_notify(struct notifier_block *self, unsigned long action, void *hcpu)
+
+static void cfs_cpu_incr_cpt_version(void)
 {
-   unsigned int  cpu = (unsigned long)hcpu;
-   bool warn;
-
-   switch (action) {
-   case CPU_DEAD:
-   case CPU_DEAD_FROZEN:
-   case CPU_ONLINE:
-   case CPU_ONLINE_FROZEN:
-   spin_lock(&cpt_data.cpt_lock);
-   cpt_data.cpt_version++;
-   spin_unlock(&cpt_data.cpt_lock);
-   /* Fall through */
-   default:
-   if (action != CPU_DEAD && action != CPU_DEAD_FROZEN) {
-   CDEBUG(D_INFO, "CPU changed [cpu %u action %lx]\n",
-  cpu, action);
-   break;
-   }
-
-   mutex_lock(&cpt_data.cpt_mutex);
-   /* if all HTs in a core are offline, it may break affinity */
-   cpumask_copy(cpt_data.cpt_cpumask,
-topology_sibling_cpumask(cpu));
-   warn = cpumask_any_and(cpt_data.cpt_cpumask,
-  cpu_online_mask) >= nr_cpu_ids;
-   mutex_unlock(&cpt_data.cpt_mutex);
-   CDEBUG(warn ? D_WARNING : D_INFO,
-  "Lustre: can't support CPU plug-out well now, performance and stability could be impacted [CPU %u action: %lx]\n",
-  cpu, action);
-   }
-
-   return NOTIFY_OK;
+   spin_lock(&cpt_data.cpt_lock);
+   cpt_data.cpt_version++;
+   spin_unlock(&cpt_data.cpt_lock);
 }
 
-static struct notifier_block cfs_cpu_notifier = {
-   .notifier_call  = cfs_cpu_notify,
-   .priority   = 0
-};
+static int cfs_cpu_online(unsigned int cpu)
+{
+   cfs_cpu_incr_cpt_version();
+   return 0;
+}
 
+static int cfs_cpu_dead(unsigned int cpu)
+{
+   bool warn;
+   int next;
+
+   cfs_cpu_incr_cpt_version();
+
+   mutex_lock(&cpt_data.cpt_mutex);
+   /* if all HTs in a core are offline, it may break affinity */
+   cpumask_copy(cpt_data.cpt_cpumask, topology_sibling_cpumask(cpu));
+   next = cpumask_any_and(cpt_data.cpt_cpumask, cpu_online_mask);
+   warn = next >= nr_cpu_ids;
+   mutex_unlock(&cpt_data.cpt_mutex);
+   CDEBUG(warn ? D_WARNING : D_INFO,
+  "Lustre: can't support CPU plug-out well now, performance and stability could be impacted [CPU %u]\n",
+  cpu);
+   return 0;
+}
 #endif
 
 void
@@ -993,7 +986,9 @@ cfs_cpu_fini(void)
cfs_cpt_table_free(cfs_cpt_table);
 
 #ifdef CONFIG_HOTPLUG_CPU
-   unregister_hotcpu_notifier(&cfs_cpu_notifier);
+   if (lustre_cpu_online)
+   cpuhp_remove_state_nocalls(lustre_cpu_online);
+   cpuhp_remove_state_nocalls(CPUHP_LUSTRE_CFS_DEAD);
 #endif
if (cpt_data.cpt_cpumask)
LIBCFS_FREE(cpt_data.cpt_cpumask, cpumask_size());
@@ -1002,6 +997,10 @@ cfs_cpu_fini(void)
 int
 cfs_cpu_init(void)
 {
+#ifdef CONFIG_HOTPLUG_CPU
+   int ret;
+#endif
+
LASSERT(!cfs_cpt_table);
 
memset(&cpt_data, 0, sizeof(cpt_data));
@@ -1016,7 +1015,17 @@ cfs_cpu_init(void)
mutex_init(&cpt_data.cpt_mutex);
 
 #ifdef CONFIG_HOTPLUG_CPU
-   register_hotcpu_notifier(&cfs_cpu_notifier);
+   ret = cpuhp_setup_state_nocalls(CPUHP_LUSTRE_CFS_DEAD,
+   "staging/lustre/cfe:dead", NULL,
+   cfs_cpu_dead);
+   if (ret < 0)
+   goto failed;
+   ret = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
+   "staging/lustre/cfe:online",
+   cfs_cpu_online, NULL);
+   if (ret < 0)
+   goto failed;
+   lustre_cpu_online = ret;
 #endif
 
if (*cpu_pattern != 0) {
diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index

[PATCH 04/22] idle/intel: Convert to hotplug state machine

2016-11-26 Thread Sebastian Andrzej Siewior

Install the callbacks via the state machine and let the core invoke the
callbacks on the already online CPUs.

The two smp_call_function_single() invocations in intel_idle_cpu_init() have
been removed because intel_idle_cpu_init() is now invoked via the hotplug
callback which runs on the target CPU. The IRQ-off calling convention for
auto_demotion_disable() and c1e_promotion_disable() has not been preserved
because only those two modify the MSR during CPU initialization.
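
To illustrate why the cross-CPU calls become unnecessary, here is a minimal
userspace sketch, not kernel code. It assumes a per-CPU register array and a
"current CPU" variable standing in for real execution context; the model_*
names and the flag value are hypothetical. The point it tries to show is that a
helper which only touches the executing CPU's register can be called directly
once the callback is guaranteed to run on the target CPU.

```c
#include <assert.h>

#define MODEL_NR_CPUS		4
#define MODEL_DEMOTION_FLAGS	0x3ULL	/* hypothetical flag bits */

static unsigned long long model_msr[MODEL_NR_CPUS];	/* one register per CPU */
static unsigned int model_current_cpu;			/* CPU we "execute" on */

/* Like the MSR helpers in the patch: touches only the executing CPU. */
static void model_auto_demotion_disable(void)
{
	model_msr[model_current_cpu] &= ~MODEL_DEMOTION_FLAGS;
}

/* Online callback; relies on already running on the CPU being set up. */
static int model_idle_cpu_online(unsigned int cpu)
{
	assert(model_current_cpu == cpu);
	model_auto_demotion_disable();
	return 0;
}

/* Stand-in for the hotplug core invoking a callback on the target CPU. */
static int model_invoke_on_target(unsigned int cpu,
				  int (*startup)(unsigned int))
{
	unsigned int prev = model_current_cpu;
	int ret;

	assert(cpu < MODEL_NR_CPUS);
	model_current_cpu = cpu;
	ret = startup(cpu);
	model_current_cpu = prev;
	return ret;
}
```

Invoking the callback for CPU 2 in this model changes only CPU 2's register,
without any explicit "run this on CPU 2" step inside the callback.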

Cc: Len Brown 
Cc: linux...@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior 
---
 drivers/idle/intel_idle.c | 106 ++
 1 file changed, 42 insertions(+), 64 deletions(-)

diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index f53b42a78186..d9631db1b4f5 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -98,8 +98,6 @@ static int intel_idle(struct cpuidle_device *dev,
struct cpuidle_driver *drv, int index);
 static void intel_idle_freeze(struct cpuidle_device *dev,
  struct cpuidle_driver *drv, int index);
-static int intel_idle_cpu_init(int cpu);
-
 static struct cpuidle_state *cpuidle_state_table;
 
 /*
@@ -907,50 +905,15 @@ static void intel_idle_freeze(struct cpuidle_device *dev,
mwait_idle_with_hints(eax, ecx);
 }
 
-static void __setup_broadcast_timer(void *arg)
+static void __setup_broadcast_timer(bool on)
 {
-   unsigned long on = (unsigned long)arg;
-
if (on)
tick_broadcast_enable();
else
tick_broadcast_disable();
 }
 
-static int cpu_hotplug_notify(struct notifier_block *n,
- unsigned long action, void *hcpu)
-{
-   int hotcpu = (unsigned long)hcpu;
-   struct cpuidle_device *dev;
-
-   switch (action & ~CPU_TASKS_FROZEN) {
-   case CPU_ONLINE:
-
-   if (lapic_timer_reliable_states != LAPIC_TIMER_ALWAYS_RELIABLE)
-   __setup_broadcast_timer((void *)true);
-
-   /*
-* Some systems can hotplug a cpu at runtime after
-* the kernel has booted, we have to initialize the
-* driver in this case
-*/
-   dev = per_cpu_ptr(intel_idle_cpuidle_devices, hotcpu);
-   if (dev->registered)
-   break;
-
-   if (intel_idle_cpu_init(hotcpu))
-   return NOTIFY_BAD;
-
-   break;
-   }
-   return NOTIFY_OK;
-}
-
-static struct notifier_block cpu_hotplug_notifier = {
-   .notifier_call = cpu_hotplug_notify,
-};
-
-static void auto_demotion_disable(void *dummy)
+static void auto_demotion_disable(void)
 {
unsigned long long msr_bits;
 
@@ -958,7 +921,7 @@ static void auto_demotion_disable(void *dummy)
msr_bits &= ~(icpu->auto_demotion_disable_flags);
wrmsrl(MSR_NHM_SNB_PKG_CST_CFG_CTL, msr_bits);
 }
-static void c1e_promotion_disable(void *dummy)
+static void c1e_promotion_disable(void)
 {
unsigned long long msr_bits;
 
@@ -1372,12 +1335,11 @@ static void __init intel_idle_cpuidle_driver_init(void)
  * allocate, initialize, register cpuidle_devices
  * @cpu: cpu/core to initialize
  */
-static int intel_idle_cpu_init(int cpu)
+static int intel_idle_cpu_init(unsigned int cpu)
 {
struct cpuidle_device *dev;
 
dev = per_cpu_ptr(intel_idle_cpuidle_devices, cpu);
-
dev->cpu = cpu;
 
if (cpuidle_register_device(dev)) {
@@ -1386,17 +1348,38 @@ static int intel_idle_cpu_init(int cpu)
}
 
if (icpu->auto_demotion_disable_flags)
-   smp_call_function_single(cpu, auto_demotion_disable, NULL, 1);
+   auto_demotion_disable();
 
if (icpu->disable_promotion_to_c1e)
-   smp_call_function_single(cpu, c1e_promotion_disable, NULL, 1);
+   c1e_promotion_disable();
 
return 0;
 }
 
+static int intel_idle_cpu_online(unsigned int cpu)
+{
+   struct cpuidle_device *dev;
+
+   if (lapic_timer_reliable_states != LAPIC_TIMER_ALWAYS_RELIABLE)
+   __setup_broadcast_timer(true);
+
+   /*
+* Some systems can hotplug a cpu at runtime after
+* the kernel has booted, we have to initialize the
+* driver in this case
+*/
+   dev = per_cpu_ptr(intel_idle_cpuidle_devices, cpu);
+   if (!dev->registered)
+   return intel_idle_cpu_init(cpu);
+
+   return 0;
+}
+
+static enum cpuhp_state hp_online;
+
 static int __init intel_idle_init(void)
 {
-   int retval, i;
+   int retval;
 
/* Do not load intel_idle at all for now if idle= is passed */
if (boot_option_idle_override != IDLE_NO_OVERRIDE)
@@ -1416,35 +1399,30 @@ static int __init intel_idle_init(void)
struct cpuidle_driver *drv = cpuidle_get_driver();
printk(KERN_DEBUG PREFIX "intel_idle yielding to %s",

[PATCH 17/22] KVM/PPC/Book3S HV: Convert to hotplug state machine

2016-11-26 Thread Sebastian Andrzej Siewior

From: Anna-Maria Gleixner 

Install the callbacks via the state machine.

Cc: Alexander Graf 
Cc: Paolo Bonzini 
Cc: "Radim Krčmář" 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: kvm-...@vger.kernel.org
Cc: k...@vger.kernel.org
Cc: linuxppc-...@lists.ozlabs.org
Signed-off-by: Anna-Maria Gleixner 
Signed-off-by: Sebastian Andrzej Siewior 
---
 arch/powerpc/kvm/book3s_hv.c | 48 ++--
 include/linux/cpuhotplug.h   |  1 +
 2 files changed, 12 insertions(+), 37 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 3686471be32b..39ef1f4a7b02 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -2254,12 +2254,12 @@ static void post_guest_process(struct kvmppc_vcore *vc, bool is_master)
  * enter the guest. Only do this if it is the primary thread of the
  * core (not if a subcore) that is entering the guest.
  */
-static inline void kvmppc_clear_host_core(int cpu)
+static inline int kvmppc_clear_host_core(unsigned int cpu)
 {
int core;
 
if (!kvmppc_host_rm_ops_hv || cpu_thread_in_core(cpu))
-   return;
+   return 0;
/*
 * Memory barrier can be omitted here as we will do a smp_wmb()
 * later in kvmppc_start_thread and we need ensure that state is
@@ -2267,6 +2267,7 @@ static inline void kvmppc_clear_host_core(int cpu)
 */
core = cpu >> threads_shift;
kvmppc_host_rm_ops_hv->rm_core[core].rm_state.in_host = 0;
+   return 0;
 }
 
 /*
@@ -2274,12 +2275,12 @@ static inline void kvmppc_clear_host_core(int cpu)
  * Only need to do this if it is the primary thread of the core that is
  * exiting.
  */
-static inline void kvmppc_set_host_core(int cpu)
+static inline int kvmppc_set_host_core(unsigned int cpu)
 {
int core;
 
if (!kvmppc_host_rm_ops_hv || cpu_thread_in_core(cpu))
-   return;
+   return 0;
 
/*
 * Memory barrier can be omitted here because we do a spin_unlock
@@ -2287,6 +2288,7 @@ static inline void kvmppc_set_host_core(int cpu)
 */
core = cpu >> threads_shift;
kvmppc_host_rm_ops_hv->rm_core[core].rm_state.in_host = 1;
+   return 0;
 }
 
 /*
@@ -3094,36 +3096,6 @@ static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu)
 }
 
 #ifdef CONFIG_KVM_XICS
-static int kvmppc_cpu_notify(struct notifier_block *self, unsigned long action,
-   void *hcpu)
-{
-   unsigned long cpu = (long)hcpu;
-
-   switch (action) {
-   case CPU_UP_PREPARE:
-   case CPU_UP_PREPARE_FROZEN:
-   kvmppc_set_host_core(cpu);
-   break;
-
-#ifdef CONFIG_HOTPLUG_CPU
-   case CPU_DEAD:
-   case CPU_DEAD_FROZEN:
-   case CPU_UP_CANCELED:
-   case CPU_UP_CANCELED_FROZEN:
-   kvmppc_clear_host_core(cpu);
-   break;
-#endif
-   default:
-   break;
-   }
-
-   return NOTIFY_OK;
-}
-
-static struct notifier_block kvmppc_cpu_notifier = {
-   .notifier_call = kvmppc_cpu_notify,
-};
-
 /*
  * Allocate a per-core structure for managing state about which cores are
  * running in the host versus the guest and for exchanging data between
@@ -3185,15 +3157,17 @@ void kvmppc_alloc_host_rm_ops(void)
return;
}
 
-   register_cpu_notifier(&kvmppc_cpu_notifier);
-
+   cpuhp_setup_state_nocalls(CPUHP_KVM_PPC_BOOK3S_PREPARE,
+ "ppc/kvm_book3s:prepare",
+ kvmppc_set_host_core,
+ kvmppc_clear_host_core);
put_online_cpus();
 }
 
 void kvmppc_free_host_rm_ops(void)
 {
if (kvmppc_host_rm_ops_hv) {
-   unregister_cpu_notifier(&kvmppc_cpu_notifier);
+   cpuhp_remove_state_nocalls(CPUHP_KVM_PPC_BOOK3S_PREPARE);
kfree(kvmppc_host_rm_ops_hv->rm_core);
kfree(kvmppc_host_rm_ops_hv);
kvmppc_host_rm_ops_hv = NULL;
diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index 853f8176594d..71c6822dd5be 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -68,6 +68,7 @@ enum cpuhp_state {
CPUHP_MM_ZS_PREPARE,
CPUHP_MM_ZSWP_MEM_PREPARE,
CPUHP_MM_ZSWP_POOL_PREPARE,
+   CPUHP_KVM_PPC_BOOK3S_PREPARE,
CPUHP_TIMERS_DEAD,
CPUHP_NOTF_ERR_INJ_PREPARE,
CPUHP_MIPS_SOC_PREPARE,
-- 
2.10.2

[PATCH 22/22] Remove obsolete cpu hotplug register / unregister functions

2016-11-26 Thread Sebastian Andrzej Siewior

Users of
hotcpu_notifier()
cpu_notifier()
__hotcpu_notifier()
__cpu_notifier()
register_hotcpu_notifier()
register_cpu_notifier()
__register_hotcpu_notifier()
__register_cpu_notifier()
unregister_hotcpu_notifier()
unregister_cpu_notifier()
__unregister_hotcpu_notifier()
__unregister_cpu_notifier()

are no longer in tree, so remove them.
While at it, some unused CPU states are removed here, too. This includes the
compatibility wrappers. The remaining states (like CPU_ONLINE) are used by the
SMP bring-up code to mark the current state.

Signed-off-by: Sebastian Andrzej Siewior 
Signed-off-by: Thomas Gleixner 
---
 include/linux/cpu.h|  48 
 include/linux/cpuhotplug.h |   2 -
 kernel/cpu.c   | 139 +
 3 files changed, 1 insertion(+), 188 deletions(-)

diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index b886dc17f2f3..d510e06b377f 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -57,9 +57,6 @@ struct notifier_block;
 
 #define CPU_ONLINE 0x0002 /* CPU (unsigned)v is up */
 #define CPU_UP_PREPARE 0x0003 /* CPU (unsigned)v coming up */
-#define CPU_UP_CANCELED 0x0004 /* CPU (unsigned)v NOT coming up */
-#define CPU_DOWN_PREPARE   0x0005 /* CPU (unsigned)v going down */
-#define CPU_DOWN_FAILED 0x0006 /* CPU (unsigned)v NOT going down */
 #define CPU_DEAD   0x0007 /* CPU (unsigned)v dead */
 #define CPU_POST_DEAD  0x0009 /* CPU (unsigned)v dead, cpu_hotplug
* lock is dropped */
@@ -134,33 +131,9 @@ void notify_cpu_starting(unsigned int cpu);
 extern void cpu_maps_update_begin(void);
 extern void cpu_maps_update_done(void);
 
-#define cpu_notifier_register_begin cpu_maps_update_begin
-#define cpu_notifier_register_done cpu_maps_update_done
-
 #else  /* CONFIG_SMP */
 #define cpuhp_tasks_frozen 0
 
-#define cpu_notifier(fn, pri)  do { (void)(fn); } while (0)
-#define __cpu_notifier(fn, pri) do { (void)(fn); } while (0)
-
-static inline int register_cpu_notifier(struct notifier_block *nb)
-{
-   return 0;
-}
-
-static inline int __register_cpu_notifier(struct notifier_block *nb)
-{
-   return 0;
-}
-
-static inline void unregister_cpu_notifier(struct notifier_block *nb)
-{
-}
-
-static inline void __unregister_cpu_notifier(struct notifier_block *nb)
-{
-}
-
 static inline void cpu_maps_update_begin(void)
 {
 }
@@ -169,14 +142,6 @@ static inline void cpu_maps_update_done(void)
 {
 }
 
-static inline void cpu_notifier_register_begin(void)
-{
-}
-
-static inline void cpu_notifier_register_done(void)
-{
-}
-
 #endif /* CONFIG_SMP */
 extern struct bus_type cpu_subsys;
 
@@ -189,12 +154,6 @@ extern void get_online_cpus(void);
 extern void put_online_cpus(void);
 extern void cpu_hotplug_disable(void);
 extern void cpu_hotplug_enable(void);
-#define hotcpu_notifier(fn, pri)   cpu_notifier(fn, pri)
-#define __hotcpu_notifier(fn, pri) __cpu_notifier(fn, pri)
-#define register_hotcpu_notifier(nb)   register_cpu_notifier(nb)
-#define __register_hotcpu_notifier(nb) __register_cpu_notifier(nb)
-#define unregister_hotcpu_notifier(nb) unregister_cpu_notifier(nb)
-#define __unregister_hotcpu_notifier(nb)   __unregister_cpu_notifier(nb)
 void clear_tasks_mm_cpumask(int cpu);
 int cpu_down(unsigned int cpu);
 
@@ -206,13 +165,6 @@ static inline void cpu_hotplug_done(void) {}
 #define put_online_cpus()  do { } while (0)
 #define cpu_hotplug_disable()  do { } while (0)
 #define cpu_hotplug_enable()   do { } while (0)
-#define hotcpu_notifier(fn, pri)   do { (void)(fn); } while (0)
-#define __hotcpu_notifier(fn, pri) do { (void)(fn); } while (0)
-/* These aren't inline functions due to a GCC bug. */
-#define register_hotcpu_notifier(nb)   ({ (void)(nb); 0; })
-#define __register_hotcpu_notifier(nb) ({ (void)(nb); 0; })
-#define unregister_hotcpu_notifier(nb) ({ (void)(nb); })
-#define __unregister_hotcpu_notifier(nb)   ({ (void)(nb); })
 #endif /* CONFIG_HOTPLUG_CPU */
 
 #ifdef CONFIG_PM_SLEEP_SMP
diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index 141c3be242d1..42287b4a32f3 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -57,7 +57,6 @@ enum cpuhp_state {
CPUHP_POWERPC_MMU_CTX_PREPARE,
CPUHP_XEN_PREPARE,
CPUHP_XEN_EVTCHN_PREPARE,
-   CPUHP_NOTIFY_PREPARE,
CPUHP_ARM_SHMOBILE_SCU_PREPARE,
CPUHP_SH_SH3X_PREPARE,
CPUHP_BLK_MQ_PREPARE,
@@ -142,7 +141,6 @@ enum cpuhp_state {
CPUHP_AP_PERF_ARM_L2X0_ONLINE,
CPUHP_AP_WORKQUEUE_ONLINE,
CPUHP_AP_RCUTREE_ONLINE,
-   CPUHP_AP_NOTIFY_ONLINE,
CPUHP_AP_ONLINE_DYN,
CPUHP_AP_ONLINE_DYN_END = CPUHP_AP_ONLINE_DYN + 30,
CPUHP_AP_X86_HPET_ONLINE,

[PATCH 05/22] oprofile/nmi timer: Convert to hotplug state machine

2016-11-26 Thread Sebastian Andrzej Siewior

Install the callbacks via the state machine and let the core invoke
the callbacks on the already online CPUs.
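
The open-coded for_each_online_cpu() loop that this patch drops is what the
setup call takes over. A minimal userspace sketch of that idea follows, not
kernel code. The model_* names and the fixed online map are hypothetical, and
the rollback on a failing startup callback reflects my understanding of the
hotplug core rather than anything stated in this patch.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4

typedef int (*model_cb)(unsigned int cpu);

/* Hypothetical machine: CPUs 0-2 are online, CPU 3 is offline. */
static const bool model_online[MODEL_NR_CPUS] = { true, true, true, false };
static bool model_started[MODEL_NR_CPUS];

/*
 * Install a state. With invoke set, run startup on every online CPU and
 * undo the CPUs already handled if one of them fails.
 */
static int model_setup_state(model_cb startup, model_cb teardown, bool invoke)
{
	unsigned int cpu, done;
	int ret;

	if (!invoke || !startup)
		return 0;

	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		if (!model_online[cpu])
			continue;
		ret = startup(cpu);
		if (!ret)
			continue;
		/* Roll back every online CPU before the failing one. */
		for (done = 0; done < cpu; done++)
			if (model_online[done] && teardown)
				teardown(done);
		return ret;
	}
	return 0;
}

static int model_timer_start(unsigned int cpu)
{
	assert(cpu < MODEL_NR_CPUS);
	model_started[cpu] = true;
	return 0;
}

static int model_timer_stop(unsigned int cpu)
{
	assert(cpu < MODEL_NR_CPUS);
	model_started[cpu] = false;
	return 0;
}

/* Fails on CPU 2 to exercise the rollback path. */
static int model_timer_start_fail2(unsigned int cpu)
{
	if (cpu == 2)
		return -1;
	return model_timer_start(cpu);
}
```

With invoke set to false the model behaves like the _nocalls variant used
elsewhere in this series: the callbacks are installed but nothing runs until
the next hotplug event.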

Cc: Robert Richter 
Cc: oprofile-l...@lists.sf.net
Signed-off-by: Sebastian Andrzej Siewior 
---
 drivers/oprofile/nmi_timer_int.c | 58 +---
 1 file changed, 19 insertions(+), 39 deletions(-)

diff --git a/drivers/oprofile/nmi_timer_int.c b/drivers/oprofile/nmi_timer_int.c
index 9559829fb234..e65a576e4032 100644
--- a/drivers/oprofile/nmi_timer_int.c
+++ b/drivers/oprofile/nmi_timer_int.c
@@ -59,25 +59,16 @@ static void nmi_timer_stop_cpu(int cpu)
perf_event_disable(event);
 }
 
-static int nmi_timer_cpu_notifier(struct notifier_block *b, unsigned long action,
- void *data)
+static int nmi_timer_cpu_online(unsigned int cpu)
 {
-   int cpu = (unsigned long)data;
-   switch (action) {
-   case CPU_DOWN_FAILED:
-   case CPU_ONLINE:
-   nmi_timer_start_cpu(cpu);
-   break;
-   case CPU_DOWN_PREPARE:
-   nmi_timer_stop_cpu(cpu);
-   break;
-   }
-   return NOTIFY_DONE;
+   nmi_timer_start_cpu(cpu);
+   return 0;
+}
+static int nmi_timer_cpu_predown(unsigned int cpu)
+{
+   nmi_timer_stop_cpu(cpu);
+   return 0;
 }
-
-static struct notifier_block nmi_timer_cpu_nb = {
-   .notifier_call = nmi_timer_cpu_notifier
-};
 
 static int nmi_timer_start(void)
 {
@@ -103,13 +94,14 @@ static void nmi_timer_stop(void)
put_online_cpus();
 }
 
+static enum cpuhp_state hp_online;
+
 static void nmi_timer_shutdown(void)
 {
struct perf_event *event;
int cpu;
 
-   cpu_notifier_register_begin();
-   __unregister_cpu_notifier(&nmi_timer_cpu_nb);
+   cpuhp_remove_state(hp_online);
for_each_possible_cpu(cpu) {
event = per_cpu(nmi_timer_events, cpu);
if (!event)
@@ -118,13 +110,11 @@ static void nmi_timer_shutdown(void)
per_cpu(nmi_timer_events, cpu) = NULL;
perf_event_release_kernel(event);
}
-
-   cpu_notifier_register_done();
 }
 
 static int nmi_timer_setup(void)
 {
-   int cpu, err;
+   int err;
u64 period;
 
/* clock cycles per tick: */
@@ -132,24 +122,14 @@ static int nmi_timer_setup(void)
do_div(period, HZ);
nmi_timer_attr.sample_period = period;
 
-   cpu_notifier_register_begin();
-   err = __register_cpu_notifier(&nmi_timer_cpu_nb);
-   if (err)
-   goto out;
-
-   /* can't attach events to offline cpus: */
-   for_each_online_cpu(cpu) {
-   err = nmi_timer_start_cpu(cpu);
-   if (err) {
-   cpu_notifier_register_done();
-   nmi_timer_shutdown();
-   return err;
-   }
+   err = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "oprofile/nmi:online",
+   nmi_timer_cpu_online, nmi_timer_cpu_predown);
+   if (err < 0) {
+   nmi_timer_shutdown();
+   return err;
}
-
-out:
-   cpu_notifier_register_done();
-   return err;
+   hp_online = err;
+   return 0;
 }
 
 int __init op_nmi_timer_init(struct oprofile_operations *ops)
-- 
2.10.2
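
The conversion above leans on two properties of cpuhp_setup_state() with
CPUHP_AP_ONLINE_DYN: the core runs the startup callback on every CPU that is
already online, and a successful call returns the allocated state number,
which is why the patch stores the return value in hp_online. The following is
a minimal userspace sketch of that behaviour, not kernel code. Every toy_*
name, the slot count, the base value 100 and the fixed online mask are
assumptions made for illustration only.

```c
/* Userspace toy model of dynamic hotplug state registration; NOT kernel code. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define TOY_NR_CPUS   4
#define TOY_DYN_FIRST 100	/* hypothetical base of the dynamic range */
#define TOY_DYN_SLOTS 8

typedef int (*toy_cb_t)(unsigned int cpu);

struct toy_state {
	bool in_use;
	toy_cb_t startup;
	toy_cb_t teardown;
};

static struct toy_state toy_states[TOY_DYN_SLOTS];
/* Assumed topology: CPU 2 is offline, the others are online. */
static bool toy_cpu_online[TOY_NR_CPUS] = { true, true, false, true };

/*
 * Reserve a free dynamic slot and run startup() on every online CPU.
 * If a callback fails, teardown() runs on the CPUs that were already
 * started and the error is returned. On success the allocated state
 * number (a positive value) is returned.
 */
static int toy_setup_state(toy_cb_t startup, toy_cb_t teardown)
{
	unsigned int cpu, undo;
	int slot, ret;

	for (slot = 0; slot < TOY_DYN_SLOTS; slot++)
		if (!toy_states[slot].in_use)
			break;
	if (slot == TOY_DYN_SLOTS)
		return -ENOSPC;

	for (cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
		if (!toy_cpu_online[cpu])
			continue;
		ret = startup(cpu);
		if (ret) {
			/* Roll back the CPUs handled before the failing one. */
			for (undo = 0; undo < cpu; undo++)
				if (toy_cpu_online[undo] && teardown)
					teardown(undo);
			return ret;
		}
	}

	toy_states[slot] = (struct toy_state){ true, startup, teardown };
	return TOY_DYN_FIRST + slot;
}

/* Run teardown() on every online CPU and release the slot. */
static void toy_remove_state(int state)
{
	struct toy_state *st;
	unsigned int cpu;

	assert(state >= TOY_DYN_FIRST && state < TOY_DYN_FIRST + TOY_DYN_SLOTS);
	st = &toy_states[state - TOY_DYN_FIRST];
	for (cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		if (toy_cpu_online[cpu] && st->teardown)
			st->teardown(cpu);
	st->in_use = false;
}

/* Stand-ins for nmi_timer_cpu_online() and nmi_timer_cpu_predown(). */
static int toy_started[TOY_NR_CPUS];
static int toy_fail_cpu = -1;	/* set to a CPU number to inject a failure */

static int toy_online(unsigned int cpu)
{
	if ((int)cpu == toy_fail_cpu)
		return -EIO;
	toy_started[cpu]++;
	return 0;
}

static int toy_predown(unsigned int cpu)
{
	toy_started[cpu]--;
	return 0;
}
```

In this model a caller checks the result the same way nmi_timer_setup() does:
a negative value is an error, anything else is the state number to hand to the
remove function later. The explicit for_each_online_cpu() loop of the old code
has no counterpart in the caller because the registration step itself walks
the online CPUs.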

[PATCH 03/22] idle/intel: Remove superfluous SMP function call

2016-11-26 Thread Sebastian Andrzej Siewior

From: Anna-Maria Gleixner 

Since commit 1cf4f629d9d2 ("cpu/hotplug: Move online calls to
hotplugged cpu") the CPU_ONLINE and CPU_DOWN_PREPARE notifiers are
always run on the hot plugged CPU, and as of commit 3b9d6da67e11
("cpu/hotplug: Fix rollback during error-out in __cpu_disable()") the
CPU_DOWN_FAILED notifier also runs on the hot plugged CPU. This patch
converts the SMP function calls into direct calls.

smp_call_function_single() executes the function with interrupts
disabled. The direct call does not preserve this calling convention,
which is acceptable because tick_broadcast_enable() and
tick_broadcast_disable() handle interrupts themselves.

Cc: Len Brown 
Cc: linux...@vger.kernel.org
Signed-off-by: Anna-Maria Gleixner 
Signed-off-by: Sebastian Andrzej Siewior 
---
 drivers/idle/intel_idle.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index 4466a2f969d7..f53b42a78186 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -927,8 +927,7 @@ static int cpu_hotplug_notify(struct notifier_block *n,
case CPU_ONLINE:
 
if (lapic_timer_reliable_states != LAPIC_TIMER_ALWAYS_RELIABLE)
-   smp_call_function_single(hotcpu, __setup_broadcast_timer,
-(void *)true, 1);
+   __setup_broadcast_timer((void *)true);
 
/*
 * Some systems can hotplug a cpu at runtime after
-- 
2.10.2
