date:20180321

Re: [PATCH 3/3] i2c: mux: pca9541: prepare for PCA9641 support

2018-03-21 Thread Vladimir Zapolskiy

On 03/21/2018 03:19 AM, Guenter Roeck wrote:
> On 03/20/2018 04:17 PM, Vladimir Zapolskiy wrote:
>> Hi Peter, Ken,
>>
>> On 03/20/2018 11:32 AM, Peter Rosin wrote:
>>> Make the arbitrate and release_bus implementation chip specific.
>>>
>>
>> by chance I took a look at the original implementation done by Ken, and
>> I would say that this 3/3 change is an overkill as a too generic one.
>> Is there any next observable extension? And do two abstracted (*arbitrate)
>> and (*release_bus) cover it well? Probably no.
>>
>> At first it would be simpler to add a new chip id field into struct pca9541
>> (struct rename would be needed of course), and do a selection of specific
>> pca9x41_arbitrate() and pca9x41_release_bus() depending on it:
>>
> 
> FWIW, I very much prefer Peter's code. I think it is much cleaner.

Peter's code is generic, but it makes the change about three times longer in
lines of code, and the follow-up pca9641 change on top of it will be larger as
well, because a generalization has to be maintained.

My main concern is whether such generalization is really needed in the driver.
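
To make the trade-off concrete, here is a minimal userspace sketch of the two
dispatch styles under discussion. Every name in it (chip_ops, demo_ops, the
demo_* functions and the dummy return codes) is hypothetical and not taken
from the driver; it only contrasts a per-chip ops table with a chip id that is
checked at each call.

```c
/* Hypothetical chip identifiers; the real driver may name these differently. */
enum chip_id { CHIP_PCA9541, CHIP_PCA9641 };

/* Stand-ins for the chip-specific arbitration routines (dummy return codes). */
static int demo_pca9541_arbitrate(void) { return 9541; }
static int demo_pca9641_arbitrate(void) { return 9641; }

/* Style A: a per-chip operations table, indexed by chip id. */
struct chip_ops {
	int (*arbitrate)(void);
};

static const struct chip_ops demo_ops[] = {
	[CHIP_PCA9541] = { .arbitrate = demo_pca9541_arbitrate },
	[CHIP_PCA9641] = { .arbitrate = demo_pca9641_arbitrate },
};

static int arbitrate_via_ops(enum chip_id id)
{
	return demo_ops[id].arbitrate();
}

/* Style B: a chip id kept in the driver data, branched on at each call. */
static int arbitrate_via_id(enum chip_id id)
{
	switch (id) {
	case CHIP_PCA9641:
		return demo_pca9641_arbitrate();
	case CHIP_PCA9541:
	default:
		return demo_pca9541_arbitrate();
	}
}
```

Both styles reach the same routine for a given chip. The difference is where
the per-chip knowledge lives: in a table that grows one entry per chip, or in
a switch that grows one case per chip at every call site.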

--
With best wishes,
Vladimir

RE: [PATCH 14/15] x86/fsgsbase/64: Support legacy behavior when FS/GS updated by ptracer

2018-03-21 Thread Metzger, Markus T

> -----Original Message-----
> From: Andy Lutomirski [mailto:l...@kernel.org]
> Sent: 21 March 2018 01:47

Hello Andy,

> I retract this particular comment.  But I still think that all this
> complexity needs to be more clearly justified.  My objection to the old
> approach wasn't that I thought it was obviously wrong -- I thought that
> someone needed to survey existing ptrace() users and see if anyone needed
> the fancier code that you're adding.  Did you find something that needs
> this fancy code?

There are 3 cases:
- only FS changed, e.g. "p $fs = ..."
- only FS_BASE changed, e.g. "p $fs_base = ..."
- both change, e.g. "p foo()" when restoring the original register state on
  return from the inferior call

The ptracer may use SETREGS in all 3 cases, even though only a single
register changed.

For case 1, it might make sense to change FS_BASE as a side-effect.
For case 2, we'd only want to change FS_BASE and leave FS.
For case 3, we'd want both FS and FS_BASE to be set to the ptracer-provided
values.
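
As a rough illustration only, the three cases above could be told apart by
comparing the values written via SETREGS with the current ones. This is a
userspace sketch under that assumption; the enum and function names are
hypothetical and this is not what the kernel patch implements.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical classification of what a SETREGS call changed. */
enum fs_update {
	FS_NONE,          /* neither value changed */
	FS_SELECTOR_ONLY, /* case 1: only FS changed */
	FS_BASE_ONLY,     /* case 2: only FS_BASE changed */
	FS_BOTH           /* case 3: both changed */
};

/* Compare the current selector/base pair with the ptracer-provided pair. */
static enum fs_update classify_fs_update(uint16_t cur_fs, uint64_t cur_base,
					 uint16_t req_fs, uint64_t req_base)
{
	bool sel_changed = cur_fs != req_fs;
	bool base_changed = cur_base != req_base;

	if (sel_changed && base_changed)
		return FS_BOTH;
	if (sel_changed)
		return FS_SELECTOR_ONLY;
	if (base_changed)
		return FS_BASE_ONLY;
	return FS_NONE;
}
```

With such a classification, case 1 could reload FS_BASE from the new selector
as a side-effect, case 2 would write only FS_BASE, and case 3 would write both
ptracer-provided values.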

Does that make sense?

Thanks,
Markus.


Re: [PATCH 3/3] i2c: mux: pca9541: prepare for PCA9641 support

2018-03-21 Thread Vladimir Zapolskiy

On 03/20/2018 11:32 AM, Peter Rosin wrote:
> Make the arbitrate and release_bus implementation chip specific.
> 
> Signed-off-by: Peter Rosin 

Reviewed-by: Vladimir Zapolskiy 


The change is good and correct; it is just too extensive IMHO.

--
With best wishes,
Vladimir

[PATCH net-next 00/11] fix some bugs for HNS3 driver

2018-03-21 Thread Peng Li

This patchset fixes some bugs in the HNS3 driver:
[Patch 1/11 - 5/11] fix various bugs reported by the HiSilicon test team.
[Patch 6/11 - 7/11] fix bugs in the self-adaptive interrupt coalescing
function.
[Patch 8/11 - 11/11] fix bugs in ethtool_ops.get_link_ksettings.

Fuyun Liang (7):
  net: hns3: reallocate tx/rx buffer after changing mtu
  net: hns3: change GL update rate
  net: hns3: change the time interval of int_gl calculating
  net: hns3: fix for getting wrong link mode problem
  net: hns3: add get_link support to VF
  net: hns3: add querying speed and duplex support to VF
  net: hns3: fix for not returning problem in get_link_ksettings when
    phy exists

Peng Li (2):
  net: hns3: fix the VF queue reset flow error
  net: hns3: increase the max time for IMP handle command

Yunsheng Lin (2):
  net: hns3: fix for vlan table lost problem when resetting
  net: hns3: export pci table of hclge and hclgevf to userspace

 drivers/net/ethernet/hisilicon/hns3/hnae3.h|   4 +
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.c|  80 ---
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.h|   6 +
 drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c | 107 +++---
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h |   4 +-
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c| 158 +++--
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.h|  10 ++
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c |  19 ++-
 .../ethernet/hisilicon/hns3/hns3vf/hclgevf_cmd.h   |   2 +-
 .../ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c  |  42 +-
 .../ethernet/hisilicon/hns3/hns3vf/hclgevf_main.h  |   4 +
 .../ethernet/hisilicon/hns3/hns3vf/hclgevf_mbx.c   |   5 +
 12 files changed, 309 insertions(+), 132 deletions(-)

-- 
2.9.3

[PATCH net-next 09/11] net: hns3: add get_link support to VF

2018-03-21 Thread Peng Li

From: Fuyun Liang 

This patch adds ethtool_ops.get_link support to VF.

Signed-off-by: Fuyun Liang 
Signed-off-by: Peng Li 
---
 drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c| 1 +
 drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c | 8 
 2 files changed, 9 insertions(+)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
index 502f347..513d8d6 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
@@ -1053,6 +1053,7 @@ static const struct ethtool_ops hns3vf_ethtool_ops = {
.get_channels = hns3_get_channels,
.get_coalesce = hns3_get_coalesce,
.set_coalesce = hns3_set_coalesce,
+   .get_link = hns3_get_link,
 };
 
 static const struct ethtool_ops hns3_ethtool_ops = {
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
index 14b0e26..f917a1e 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
@@ -1468,6 +1468,13 @@ static void hclgevf_get_tqps_and_rss_info(struct hnae3_handle *handle,
*max_rss_size = hdev->rss_size_max;
 }
 
+static int hclgevf_get_status(struct hnae3_handle *handle)
+{
+   struct hclgevf_dev *hdev = hclgevf_ae_get_hdev(handle);
+
+   return hdev->hw.mac.link;
+}
+
 static const struct hnae3_ae_ops hclgevf_ops = {
.init_ae_dev = hclgevf_init_ae_dev,
.uninit_ae_dev = hclgevf_uninit_ae_dev,
@@ -1500,6 +1507,7 @@ static const struct hnae3_ae_ops hclgevf_ops = {
.set_vlan_filter = hclgevf_set_vlan_filter,
.get_channels = hclgevf_get_channels,
.get_tqps_and_rss_info = hclgevf_get_tqps_and_rss_info,
+   .get_status = hclgevf_get_status,
 };
 
 static struct hnae3_ae_algo ae_algovf = {
-- 
2.9.3

[PATCH net-next 11/11] net: hns3: fix for not returning problem in get_link_ksettings when phy exists

2018-03-21 Thread Peng Li

From: Fuyun Liang 

When a phy exists, phy_ethtool_ksettings_get() is enough to get the link
ksettings, so get_link_ksettings can return directly after calling it.

Signed-off-by: Fuyun Liang 
Signed-off-by: Peng Li 
---
 drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
index 513d8d6..9d07116 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
@@ -569,9 +569,13 @@ static int hns3_get_link_ksettings(struct net_device *netdev,
return -EOPNOTSUPP;
 
/* 1.auto_neg & speed & duplex from cmd */
-   if (netdev->phydev)
+   if (netdev->phydev) {
phy_ethtool_ksettings_get(netdev->phydev, cmd);
-   else if (h->ae_algo->ops->get_ksettings_an_result)
+
+   return 0;
+   }
+
+   if (h->ae_algo->ops->get_ksettings_an_result)
h->ae_algo->ops->get_ksettings_an_result(h,
 &cmd->base.autoneg,
 &cmd->base.speed,
-- 
2.9.3

[PATCH net-next 04/11] net: hns3: export pci table of hclge and hclgevf to userspace

2018-03-21 Thread Peng Li

From: Yunsheng Lin 

No module depends on a symbol of hclge or hclgevf, but hns_enet needs
them to provide the ops it runs on. When the hns3 driver has to be
auto-loaded, the auto load fails because hclge or hclgevf is not
loaded.

hns_enet already exports its pci table, so this patch exports the pci
tables of the hclge and hclgevf modules too.

Signed-off-by: Yunsheng Lin 
Signed-off-by: Peng Li 
---
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c   | 2 ++
 drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index 588f231..869e98a 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -55,6 +55,8 @@ static const struct pci_device_id ae_algo_pci_tbl[] = {
{0, }
 };
 
+MODULE_DEVICE_TABLE(pci, ae_algo_pci_tbl);
+
 static const char hns3_nic_test_strs[][ETH_GSTRING_LEN] = {
"MacLoopback test",
"Serdes Loopback test",
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
index c96cf03..14b0e26 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
@@ -18,6 +18,8 @@ static const struct pci_device_id ae_algovf_pci_tbl[] = {
{0, }
 };
 
+MODULE_DEVICE_TABLE(pci, ae_algovf_pci_tbl);
+
 static inline struct hclgevf_dev *hclgevf_ae_get_hdev(
struct hnae3_handle *handle)
 {
-- 
2.9.3

RE: [PATCH v2 1/2] dma-mapping: move dma configuration to bus infrastructure

2018-03-21 Thread Bharat Bhushan



> -----Original Message-----
> From: Nipun Gupta
> Sent: Wednesday, March 21, 2018 12:25 PM
> To: robin.mur...@arm.com; h...@lst.de; li...@armlinux.org.uk;
> gre...@linuxfoundation.org; m.szyprow...@samsung.com
> Cc: bhelg...@google.com; zaj...@gmail.com; andy.gr...@linaro.org;
> david.br...@linaro.org; dan.j.willi...@intel.com; vinod.k...@intel.com;
> thierry.red...@gmail.com; robh...@kernel.org; frowand.l...@gmail.com;
> jarkko.sakki...@linux.intel.com; rafael.j.wyso...@intel.com;
> dmitry.torok...@gmail.com; jo...@kernel.org; msucha...@suse.de; linux-
> ker...@vger.kernel.org; io...@lists.linux-foundation.org; linux-
> wirel...@vger.kernel.org; linux-arm-...@vger.kernel.org; linux-
> s...@vger.kernel.org; dmaeng...@vger.kernel.org; dri-
> de...@lists.freedesktop.org; linux-te...@vger.kernel.org;
> devicet...@vger.kernel.org; linux-...@vger.kernel.org; Bharat Bhushan
> ; Leo Li ; Nipun Gupta
> 
> Subject: [PATCH v2 1/2] dma-mapping: move dma configuration to bus
> infrastructure
> 
> It's bus specific aspect to map a given device on the bus and relevant
> firmware description of its DMA configuration.
> So, this change introduces '/dma_configure/' as bus callback giving
> flexibility to busses for implementing its own dma configuration function.
> 
> The change eases the addition of new busses w.r.t. adding the dma
> configuration functionality.
> 
> This patch also updates the PCI, Platform, ACPI and host1x bus to use new
> introduced callbacks.
> 
> Suggested-by: Christoph Hellwig 
> Signed-off-by: Nipun Gupta 
> ---
>  - The patches are based on the comments on:
>https://patchwork.kernel.org/patch/10259087/
> 
> Changes in v2:
>   - Do not have dma_deconfigure callback
>   - Have '/dma_common_configure/' API to provide a common DMA
> configuration which can be used by busses if it suits them.
>   - Platform and ACPI bus to use '/dma_common_configure/' in
> '/dma_configure/' callback.
>   - Updated commit message
>   - Updated pci_dma_configure API with changes suggested by Robin
> 
>  drivers/amba/bus.c  |  7 +++
>  drivers/base/dma-mapping.c  | 35 +++
>  drivers/base/platform.c |  6 ++
>  drivers/gpu/host1x/bus.c|  9 +
>  drivers/pci/pci-driver.c| 32 
>  include/linux/device.h  |  4 
>  include/linux/dma-mapping.h |  1 +
>  7 files changed, 74 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/amba/bus.c b/drivers/amba/bus.c
> index 594c228..2fa1e8b 100644
> --- a/drivers/amba/bus.c
> +++ b/drivers/amba/bus.c
> @@ -20,6 +20,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
> 
>  #include 
> 
> @@ -171,6 +172,11 @@ static int amba_pm_runtime_resume(struct device *dev)
>  }
>  #endif /* CONFIG_PM */
>
> +static int amba_dma_configure(struct device *dev)
> +{
> + return dma_common_configure(dev);
> +}
> +
>  static const struct dev_pm_ops amba_pm = {
>   .suspend= pm_generic_suspend,
>   .resume = pm_generic_resume,
> @@ -194,6 +200,7 @@ struct bus_type amba_bustype = {
>   .dev_groups = amba_dev_groups,
>   .match  = amba_match,
>   .uevent = amba_uevent,
> + .dma_configure  = amba_dma_configure,
>   .pm = &amba_pm,
>   .force_dma  = true,
>  };
> diff --git a/drivers/base/dma-mapping.c b/drivers/base/dma-mapping.c
> index 3b11835..48f9af0 100644
> --- a/drivers/base/dma-mapping.c
> +++ b/drivers/base/dma-mapping.c
> @@ -331,38 +331,33 @@ void dma_common_free_remap(void *cpu_addr, size_t size, unsigned long vm_flags)
>  #endif
> 
>  /*
> - * Common configuration to enable DMA API use for a device
> + * Common configuration to enable DMA API use for a device.
> + * A bus can use this function in its 'dma_configure' callback, if
> + * suitable for the bus.
>   */
> -#include 
> -
> -int dma_configure(struct device *dev)
> +int dma_common_configure(struct device *dev)
>  {
> - struct device *bridge = NULL, *dma_dev = dev;
>   enum dev_dma_attr attr;
>   int ret = 0;
> 
> - if (dev_is_pci(dev)) {
> - bridge = pci_get_host_bridge_device(to_pci_dev(dev));
> - dma_dev = bridge;
> - if (IS_ENABLED(CONFIG_OF) && dma_dev->parent &&
> - dma_dev->parent->of_node)
> - dma_dev = dma_dev->parent;
> - }
> -
> - if (dma_dev->of_node) {
> - ret = of_dma_configure(dev, dma_dev->of_node);
> - } else if (has_acpi_companion(dma_dev)) {
> - attr = acpi_get_dma_attr(to_acpi_device_node(dma_dev->fwnode));
> + if (dev->of_node) {
> + ret = of_dma_configure(dev, dev->of_node);
> + } else if (has_acpi_companion(dev)) {
> + attr = acpi_get_dma_attr(to_acpi_device_node(dev->fwnode));
>   if (attr != DEV_DMA_NOT_SUPPORTED)
>   ret = acpi_dma_configure(dev, attr);
>   }
> 
> - if (bridge)
> - pci_put_host_

[PATCH net-next 03/11] net: hns3: fix for vlan table lost problem when resetting

2018-03-21 Thread Peng Li

From: Yunsheng Lin 

The vlan table in hardware is cleared after a PF/Core/IMP/Global
reset, which causes vlan tagged packets not to be received.

This patch fixes it by restoring the vlan table after reset.
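
The idea can be sketched in plain userspace C: keep a software bitmap of the
active vlan ids next to the hardware table and replay it after a reset. The
demo_* names, the bool array standing in for the hardware table and the helper
macros are assumptions made for this illustration; the patch itself uses the
kernel bitmap helpers (set_bit, clear_bit, for_each_set_bit).

```c
#include <limits.h>
#include <stdbool.h>
#include <string.h>

#define DEMO_VLAN_N_VID    4096
#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define DEMO_BITS_TO_LONGS(n) \
	(((n) + DEMO_BITS_PER_LONG - 1) / DEMO_BITS_PER_LONG)

struct demo_nic {
	/* Software shadow of the configured vlan ids, one bit per vid. */
	unsigned long active_vlans[DEMO_BITS_TO_LONGS(DEMO_VLAN_N_VID)];
	/* Stand-in for the hardware vlan filter table. */
	bool hw_table[DEMO_VLAN_N_VID];
};

static void demo_vlan_add(struct demo_nic *nic, unsigned int vid)
{
	nic->hw_table[vid] = true;
	nic->active_vlans[vid / DEMO_BITS_PER_LONG] |=
		1UL << (vid % DEMO_BITS_PER_LONG);
}

static void demo_vlan_kill(struct demo_nic *nic, unsigned int vid)
{
	nic->hw_table[vid] = false;
	nic->active_vlans[vid / DEMO_BITS_PER_LONG] &=
		~(1UL << (vid % DEMO_BITS_PER_LONG));
}

/* A reset wipes the hardware table but leaves the software bitmap alone. */
static void demo_hw_reset(struct demo_nic *nic)
{
	memset(nic->hw_table, 0, sizeof(nic->hw_table));
}

/* Replay every vid recorded in the bitmap; returns how many were restored. */
static unsigned int demo_restore_vlans(struct demo_nic *nic)
{
	unsigned int vid, restored = 0;

	for (vid = 0; vid < DEMO_VLAN_N_VID; vid++) {
		if (nic->active_vlans[vid / DEMO_BITS_PER_LONG] &
		    (1UL << (vid % DEMO_BITS_PER_LONG))) {
			nic->hw_table[vid] = true;
			restored++;
		}
	}
	return restored;
}
```

The bitmap costs 4096 bits per device, which is small compared with losing
tagged traffic after every reset.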

Signed-off-by: Yunsheng Lin 
Signed-off-by: Peng Li 
---
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 26 +
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.h |  3 +++
 2 files changed, 29 insertions(+)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
index 94f0b92..f700ec1 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
@@ -1404,11 +1404,15 @@ static int hns3_vlan_rx_add_vid(struct net_device *netdev,
__be16 proto, u16 vid)
 {
struct hnae3_handle *h = hns3_get_handle(netdev);
+   struct hns3_nic_priv *priv = netdev_priv(netdev);
int ret = -EIO;
 
if (h->ae_algo->ops->set_vlan_filter)
ret = h->ae_algo->ops->set_vlan_filter(h, proto, vid, false);
 
+   if (!ret)
+   set_bit(vid, priv->active_vlans);
+
return ret;
 }
 
@@ -1416,14 +1420,32 @@ static int hns3_vlan_rx_kill_vid(struct net_device *netdev,
 __be16 proto, u16 vid)
 {
struct hnae3_handle *h = hns3_get_handle(netdev);
+   struct hns3_nic_priv *priv = netdev_priv(netdev);
int ret = -EIO;
 
if (h->ae_algo->ops->set_vlan_filter)
ret = h->ae_algo->ops->set_vlan_filter(h, proto, vid, true);
 
+   if (!ret)
+   clear_bit(vid, priv->active_vlans);
+
return ret;
 }
 
+static void hns3_restore_vlan(struct net_device *netdev)
+{
+   struct hns3_nic_priv *priv = netdev_priv(netdev);
+   u16 vid;
+   int ret;
+
+   for_each_set_bit(vid, priv->active_vlans, VLAN_N_VID) {
+   ret = hns3_vlan_rx_add_vid(netdev, htons(ETH_P_8021Q), vid);
+   if (ret)
+   netdev_warn(netdev, "Restore vlan: %d filter, ret:%d\n",
+   vid, ret);
+   }
+}
+
 static int hns3_ndo_set_vf_vlan(struct net_device *netdev, int vf, u16 vlan,
u8 qos, __be16 vlan_proto)
 {
@@ -3341,6 +3363,10 @@ static int hns3_reset_notify_init_enet(struct hnae3_handle *handle)
hns3_nic_set_rx_mode(netdev);
hns3_recover_hw_addr(netdev);
 
+   /* Hardware table is only clear when pf resets */
+   if (!(handle->flags & HNAE3_SUPPORT_VF))
+   hns3_restore_vlan(netdev);
+
/* Carrier off reporting is important to ethtool even BEFORE open */
netif_carrier_off(netdev);
 
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h
index a5f4550..c313780 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h
@@ -10,6 +10,8 @@
 #ifndef __HNS3_ENET_H
 #define __HNS3_ENET_H
 
+#include 
+
 #include "hnae3.h"
 
 extern const char hns3_driver_version[];
@@ -539,6 +541,7 @@ struct hns3_nic_priv {
struct notifier_block notifier_block;
/* Vxlan/Geneve information */
struct hns3_udp_tunnel udp_tnl[HNS3_UDP_TNL_MAX];
+   unsigned long active_vlans[BITS_TO_LONGS(VLAN_N_VID)];
 };
 
 union l3_hdr_info {
-- 
2.9.3

[PATCH net-next 01/11] net: hns3: reallocate tx/rx buffer after changing mtu

2018-03-21 Thread Peng Li

From: Fuyun Liang 

When changing the mtu, the max frame size also will be changed. The tx
buffer size and the rx buffer size to be allocated are determined by max
frame size. So when max frame size is changed, the tx buffer and rx buffer
need to be reallocated.

When the tc_num is changed, the tx buffer and rx buffer need to be
reallocated too. So calling set_mtu and buffer_alloc separately is better.
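
A small userspace sketch of the split described above: one function derives
the max frame size from the mtu, a second one sizes the buffers from it, and
the mtu entry point calls both in order. The demo_* names, the minimum mtu of
68, the single vlan tag in the frame size and the "2 * max frame" buffer rule
are all assumptions made for illustration, not values taken from the driver.

```c
#define DEMO_ETH_HLEN    14 /* Ethernet header */
#define DEMO_ETH_FCS_LEN  4 /* frame check sequence */
#define DEMO_VLAN_HLEN    4 /* one vlan tag (assumed) */
#define DEMO_MIN_MTU     68 /* assumed lower bound */

struct demo_dev {
	int max_frame_size;
	int rx_buf_size;
	int buffer_allocs; /* counts how often buffers were (re)allocated */
};

/* Step 1: program the MAC with the frame size derived from the mtu. */
static int demo_set_mac_mtu(struct demo_dev *dev, int new_mtu)
{
	if (new_mtu < DEMO_MIN_MTU)
		return -1;

	dev->max_frame_size = new_mtu + DEMO_ETH_HLEN + DEMO_ETH_FCS_LEN +
			      DEMO_VLAN_HLEN;
	return 0;
}

/* Step 2: size the buffers from the current max frame size. */
static int demo_buffer_alloc(struct demo_dev *dev)
{
	dev->rx_buf_size = 2 * dev->max_frame_size; /* made-up sizing rule */
	dev->buffer_allocs++;
	return 0;
}

/* The mtu entry point: change the frame size, then reallocate buffers. */
static int demo_set_mtu(struct demo_dev *dev, int new_mtu)
{
	int ret = demo_set_mac_mtu(dev, new_mtu);

	if (ret)
		return ret;

	return demo_buffer_alloc(dev);
}
```

Because demo_buffer_alloc() is separate, another trigger such as a tc_num
change could call it without touching the mtu path.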

Signed-off-by: Fuyun Liang 
Signed-off-by: Peng Li 
---
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c| 36 +-
 1 file changed, 22 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index d70619b..e110c65 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -4772,11 +4772,9 @@ static int hclge_en_hw_strip_rxvtag(struct hnae3_handle *handle, bool enable)
return hclge_set_vlan_rx_offload_cfg(vport);
 }
 
-static int hclge_set_mtu(struct hnae3_handle *handle, int new_mtu)
+static int hclge_set_mac_mtu(struct hclge_dev *hdev, int new_mtu)
 {
-   struct hclge_vport *vport = hclge_get_vport(handle);
struct hclge_config_max_frm_size_cmd *req;
-   struct hclge_dev *hdev = vport->back;
struct hclge_desc desc;
int max_frm_size;
int ret;
@@ -4805,6 +4803,27 @@ static int hclge_set_mtu(struct hnae3_handle *handle, int new_mtu)
return 0;
 }
 
+static int hclge_set_mtu(struct hnae3_handle *handle, int new_mtu)
+{
+   struct hclge_vport *vport = hclge_get_vport(handle);
+   struct hclge_dev *hdev = vport->back;
+   int ret;
+
+   ret = hclge_set_mac_mtu(hdev, new_mtu);
+   if (ret) {
+   dev_err(&hdev->pdev->dev,
+   "Change mtu fail, ret =%d\n", ret);
+   return ret;
+   }
+
+   ret = hclge_buffer_alloc(hdev);
+   if (ret)
+   dev_err(&hdev->pdev->dev,
+   "Allocate buffer fail, ret =%d\n", ret);
+
+   return ret;
+}
+
 static int hclge_send_reset_tqp_cmd(struct hclge_dev *hdev, u16 queue_id,
bool enable)
 {
@@ -5392,11 +5411,6 @@ static int hclge_init_ae_dev(struct hnae3_ae_dev *ae_dev)
dev_err(&pdev->dev, "Mac init error, ret = %d\n", ret);
return ret;
}
-   ret = hclge_buffer_alloc(hdev);
-   if (ret) {
-   dev_err(&pdev->dev, "Buffer allocate fail, ret =%d\n", ret);
-   return  ret;
-   }
 
ret = hclge_config_tso(hdev, HCLGE_TSO_MSS_MIN, HCLGE_TSO_MSS_MAX);
if (ret) {
@@ -5503,12 +5517,6 @@ static int hclge_reset_ae_dev(struct hnae3_ae_dev *ae_dev)
return ret;
}
 
-   ret = hclge_buffer_alloc(hdev);
-   if (ret) {
-   dev_err(&pdev->dev, "Buffer allocate fail, ret =%d\n", ret);
-   return ret;
-   }
-
ret = hclge_config_tso(hdev, HCLGE_TSO_MSS_MIN, HCLGE_TSO_MSS_MAX);
if (ret) {
dev_err(&pdev->dev, "Enable tso fail, ret =%d\n", ret);
-- 
2.9.3

[PATCH net-next 08/11] net: hns3: fix for getting wrong link mode problem

2018-03-21 Thread Peng Li

From: Fuyun Liang 

hns3_get_link_ksettings returns a fixed link mode, which is
unreasonable.

This patch fixes it by adding some related functions to get link
mode from hardware.

Signed-off-by: Fuyun Liang 
Signed-off-by: Peng Li 
---
 drivers/net/ethernet/hisilicon/hns3/hnae3.h|  4 +
 drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c | 98 ++
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h |  2 +
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c| 83 ++
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.h|  9 ++
 5 files changed, 107 insertions(+), 89 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hnae3.h b/drivers/net/ethernet/hisilicon/hns3/hnae3.h
index 70441d2..9daa88d 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hnae3.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hnae3.h
@@ -411,6 +411,10 @@ struct hnae3_ae_ops {
 u32 *flowctrl_adv);
int (*set_led_id)(struct hnae3_handle *handle,
  enum ethtool_phys_id_state status);
+   void (*get_link_mode)(struct hnae3_handle *handle,
+ unsigned long *supported,
+ unsigned long *advertising);
+   void (*get_port_type)(struct hnae3_handle *handle, u8 *port_type);
 };
 
 struct hnae3_dcb_ops {
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
index 2db127c..502f347 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
@@ -74,19 +74,6 @@ struct hns3_link_mode_mapping {
u32 ethtool_link_mode;
 };
 
-static const struct hns3_link_mode_mapping hns3_lm_map[] = {
-   {HNS3_LM_FIBRE_BIT, ETHTOOL_LINK_MODE_FIBRE_BIT},
-   {HNS3_LM_AUTONEG_BIT, ETHTOOL_LINK_MODE_Autoneg_BIT},
-   {HNS3_LM_TP_BIT, ETHTOOL_LINK_MODE_TP_BIT},
-   {HNS3_LM_PAUSE_BIT, ETHTOOL_LINK_MODE_Pause_BIT},
-   {HNS3_LM_BACKPLANE_BIT, ETHTOOL_LINK_MODE_Backplane_BIT},
-   {HNS3_LM_10BASET_HALF_BIT, ETHTOOL_LINK_MODE_10baseT_Half_BIT},
-   {HNS3_LM_10BASET_FULL_BIT, ETHTOOL_LINK_MODE_10baseT_Full_BIT},
-   {HNS3_LM_100BASET_HALF_BIT, ETHTOOL_LINK_MODE_100baseT_Half_BIT},
-   {HNS3_LM_100BASET_FULL_BIT, ETHTOOL_LINK_MODE_100baseT_Full_BIT},
-   {HNS3_LM_1000BASET_FULL_BIT, ETHTOOL_LINK_MODE_1000baseT_Full_BIT},
-};
-
 static int hns3_lp_setup(struct net_device *ndev, enum hnae3_loop loop)
 {
struct hnae3_handle *h = hns3_get_handle(ndev);
@@ -365,24 +352,6 @@ static void hns3_self_test(struct net_device *ndev,
dev_open(ndev);
 }
 
-static void hns3_driv_to_eth_caps(u32 caps, struct ethtool_link_ksettings *cmd,
- bool is_advertised)
-{
-   int i;
-
-   for (i = 0; i < ARRAY_SIZE(hns3_lm_map); i++) {
-   if (!(caps & hns3_lm_map[i].hns3_link_mode))
-   continue;
-
-   if (is_advertised)
-   __set_bit(hns3_lm_map[i].ethtool_link_mode,
- cmd->link_modes.advertising);
-   else
-   __set_bit(hns3_lm_map[i].ethtool_link_mode,
- cmd->link_modes.supported);
-   }
-}
-
 static int hns3_get_sset_count(struct net_device *netdev, int stringset)
 {
struct hnae3_handle *h = hns3_get_handle(netdev);
@@ -594,9 +563,6 @@ static int hns3_get_link_ksettings(struct net_device *netdev,
 {
struct hnae3_handle *h = hns3_get_handle(netdev);
u32 flowctrl_adv = 0;
-   u32 supported_caps;
-   u32 advertised_caps;
-   u8 media_type = HNAE3_MEDIA_TYPE_UNKNOWN;
u8 link_stat;
 
if (!h->ae_algo || !h->ae_algo->ops)
@@ -619,62 +585,16 @@ static int hns3_get_link_ksettings(struct net_device *netdev,
cmd->base.duplex = DUPLEX_UNKNOWN;
}
 
-   /* 2.media_type get from bios parameter block */
-   if (h->ae_algo->ops->get_media_type) {
-   h->ae_algo->ops->get_media_type(h, &media_type);
+   /* 2.get link mode and port type*/
+   if (h->ae_algo->ops->get_link_mode)
+   h->ae_algo->ops->get_link_mode(h,
+  cmd->link_modes.supported,
+  cmd->link_modes.advertising);
 
-   switch (media_type) {
-   case HNAE3_MEDIA_TYPE_FIBER:
-   cmd->base.port = PORT_FIBRE;
-   supported_caps = HNS3_LM_FIBRE_BIT |
-HNS3_LM_AUTONEG_BIT |
-HNS3_LM_PAUSE_BIT |
-HNS3_LM_1000BASET_FULL_BIT;
-
-   advertised_caps = supported_caps;
-   break;
-   case HNAE3_MEDIA_TYPE_COPPER:
-   cmd->base.port = PORT_TP;
-

[PATCH net-next 07/11] net: hns3: change the time interval of int_gl calculating

2018-03-21 Thread Peng Li

From: Fuyun Liang 

Since we change the update rate of int_gl from every interrupt to every
one hundred interrupts, the old way to get time interval by int_gl value
is not accurate. This patch calculates the time interval using the jiffies
value.
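
The calculation can be modelled in userspace as follows: convert the tick
delta since the last update to milliseconds, then divide the accumulated
counters by it. DEMO_HZ, the demo_* names and the plain 64-bit division are
assumptions for this sketch; the patch uses jiffies_to_msecs() and do_div().

```c
#include <stdint.h>

#define DEMO_HZ 250 /* assumed tick rate; the kernel HZ is a build option */

/* Convert a tick count to milliseconds, like jiffies_to_msecs() would. */
static uint32_t demo_ticks_to_msecs(unsigned long ticks)
{
	return (uint32_t)((uint64_t)ticks * 1000 / DEMO_HZ);
}

/*
 * Compute per-millisecond rates since the last update.
 * Returns 0 on success, or -1 when there is no usable interval yet
 * (no previous timestamp, or less than one millisecond elapsed).
 */
static int demo_rates_per_msec(unsigned long now, unsigned long last,
			       uint64_t total_packets, uint64_t total_bytes,
			       uint64_t *packets_per_msec,
			       uint64_t *bytes_per_msec)
{
	uint32_t time_passed_ms;

	if (!last)
		return -1;

	/* Unsigned subtraction stays correct across a counter wrap. */
	time_passed_ms = demo_ticks_to_msecs(now - last);
	if (!time_passed_ms)
		return -1;

	*packets_per_msec = total_packets / time_passed_ms;
	*bytes_per_msec = total_bytes / time_passed_ms;
	return 0;
}
```

With DEMO_HZ at 250, a delta of 25 ticks is 100 ms, so 5000 packets in that
window give 50 packets per millisecond.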

Signed-off-by: Fuyun Liang 
Signed-off-by: Peng Li 
---
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 46 -
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.h |  1 +
 2 files changed, 30 insertions(+), 17 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
index e7cf7b4..0b4a676 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
@@ -2406,15 +2406,15 @@ int hns3_clean_rx_ring(
 
 static bool hns3_get_new_int_gl(struct hns3_enet_ring_group *ring_group)
 {
-#define HNS3_RX_ULTRA_PACKET_RATE 4
+   struct hns3_enet_tqp_vector *tqp_vector =
+   ring_group->ring->tqp_vector;
enum hns3_flow_level_range new_flow_level;
-   struct hns3_enet_tqp_vector *tqp_vector;
-   int packets_per_secs;
-   int bytes_per_usecs;
+   int packets_per_msecs;
+   int bytes_per_msecs;
+   u32 time_passed_ms;
u16 new_int_gl;
-   int usecs;
 
-   if (!ring_group->coal.int_gl)
+   if (!ring_group->coal.int_gl || !tqp_vector->last_jiffies)
return false;
 
if (ring_group->total_packets == 0) {
@@ -2431,33 +2431,44 @@ static bool hns3_get_new_int_gl(struct hns3_enet_ring_group *ring_group)
 */
new_flow_level = ring_group->coal.flow_level;
new_int_gl = ring_group->coal.int_gl;
-   tqp_vector = ring_group->ring->tqp_vector;
-   usecs = (ring_group->coal.int_gl << 1);
-   bytes_per_usecs = ring_group->total_bytes / usecs;
-   /* 100 microseconds */
-   packets_per_secs = ring_group->total_packets * 100 / usecs;
+   time_passed_ms =
+   jiffies_to_msecs(jiffies - tqp_vector->last_jiffies);
+
+   if (!time_passed_ms)
+   return false;
+
+   do_div(ring_group->total_packets, time_passed_ms);
+   packets_per_msecs = ring_group->total_packets;
+
+   do_div(ring_group->total_bytes, time_passed_ms);
+   bytes_per_msecs = ring_group->total_bytes;
+
+#define HNS3_RX_LOW_BYTE_RATE 1
+#define HNS3_RX_MID_BYTE_RATE 2
 
switch (new_flow_level) {
case HNS3_FLOW_LOW:
-   if (bytes_per_usecs > 10)
+   if (bytes_per_msecs > HNS3_RX_LOW_BYTE_RATE)
new_flow_level = HNS3_FLOW_MID;
break;
case HNS3_FLOW_MID:
-   if (bytes_per_usecs > 20)
+   if (bytes_per_msecs > HNS3_RX_MID_BYTE_RATE)
new_flow_level = HNS3_FLOW_HIGH;
-   else if (bytes_per_usecs <= 10)
+   else if (bytes_per_msecs <= HNS3_RX_LOW_BYTE_RATE)
new_flow_level = HNS3_FLOW_LOW;
break;
case HNS3_FLOW_HIGH:
case HNS3_FLOW_ULTRA:
default:
-   if (bytes_per_usecs <= 20)
+   if (bytes_per_msecs <= HNS3_RX_MID_BYTE_RATE)
new_flow_level = HNS3_FLOW_MID;
break;
}
 
-   if ((packets_per_secs > HNS3_RX_ULTRA_PACKET_RATE) &&
-   (&tqp_vector->rx_group == ring_group))
+#define HNS3_RX_ULTRA_PACKET_RATE 40
+
+   if (packets_per_msecs > HNS3_RX_ULTRA_PACKET_RATE &&
+   &tqp_vector->rx_group == ring_group)
new_flow_level = HNS3_FLOW_ULTRA;
 
switch (new_flow_level) {
@@ -2512,6 +2523,7 @@ static void hns3_update_new_int_gl(struct hns3_enet_tqp_vector *tqp_vector)
   tx_group->coal.int_gl);
}
 
+   tqp_vector->last_jiffies = jiffies;
tqp_vector->int_adapt_down = HNS3_INT_ADAPT_DOWN_START;
 }
 
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h
index 2fe870b..39daa01 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h
@@ -499,6 +499,7 @@ struct hns3_enet_tqp_vector {
 
/* when 0 should adjust interrupt coalesce parameter */
u8 int_adapt_down;
+   unsigned long last_jiffies;
 } cacheline_internodealigned_in_smp;
 
 enum hns3_udp_tnl_type {
-- 
2.9.3

[PATCH net-next 10/11] net: hns3: add querying speed and duplex support to VF

2018-03-21 Thread Peng Li

From: Fuyun Liang 

This patch adds VF support for querying speed and duplex via
"ethtool ethX".
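
The PF side of this patch packs the link information into an 8-byte message:
link status in bytes 0-1, speed in bytes 2-5 and duplex in bytes 6-7. The
following userspace sketch shows that layout with a matching unpack helper.
The demo_* names are hypothetical, and the sketch assumes both ends use the
same byte order, since memcpy() copies the host representation as is.

```c
#include <stdint.h>
#include <string.h>

#define DEMO_LINK_MSG_LEN 8

/* Pack status (bytes 0-1), speed (bytes 2-5) and duplex (bytes 6-7). */
static void demo_pack_link_info(uint8_t msg[DEMO_LINK_MSG_LEN],
				uint16_t link_status, uint32_t speed,
				uint16_t duplex)
{
	memcpy(&msg[0], &link_status, sizeof(link_status));
	memcpy(&msg[2], &speed, sizeof(speed));
	memcpy(&msg[6], &duplex, sizeof(duplex));
}

/* Read the three fields back from the same offsets. */
static void demo_unpack_link_info(const uint8_t msg[DEMO_LINK_MSG_LEN],
				  uint16_t *link_status, uint32_t *speed,
				  uint16_t *duplex)
{
	memcpy(link_status, &msg[0], sizeof(*link_status));
	memcpy(speed, &msg[2], sizeof(*speed));
	memcpy(duplex, &msg[6], sizeof(*duplex));
}
```

Using memcpy() for the 32-bit speed avoids an unaligned access, because
offset 2 is not 4-byte aligned.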

Signed-off-by: Fuyun Liang 
Signed-off-by: Peng Li 
---
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c |  8 ++--
 .../ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c  | 22 ++
 .../ethernet/hisilicon/hns3/hns3vf/hclgevf_main.h  |  4 
 .../ethernet/hisilicon/hns3/hns3vf/hclgevf_mbx.c   |  5 +
 4 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c
index cef14e7..949da0c 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c
@@ -309,16 +309,20 @@ static int hclge_get_link_info(struct hclge_vport *vport,
 {
struct hclge_dev *hdev = vport->back;
u16 link_status;
-   u8 msg_data[2];
+   u8 msg_data[8];
u8 dest_vfid;
+   u16 duplex;
 
/* mac.link can only be 0 or 1 */
link_status = (u16)hdev->hw.mac.link;
+   duplex = hdev->hw.mac.duplex;
memcpy(&msg_data[0], &link_status, sizeof(u16));
+   memcpy(&msg_data[2], &hdev->hw.mac.speed, sizeof(u32));
+   memcpy(&msg_data[6], &duplex, sizeof(u16));
dest_vfid = mbx_req->mbx_src_vfid;
 
/* send this requested info to VF */
-   return hclge_send_mbx_msg(vport, msg_data, sizeof(u8),
+   return hclge_send_mbx_msg(vport, msg_data, sizeof(msg_data),
  HCLGE_MBX_LINK_STAT_CHANGE, dest_vfid);
 }
 
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
index f917a1e..906dfa3 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
@@ -1475,6 +1475,27 @@ static int hclgevf_get_status(struct hnae3_handle *handle)
return hdev->hw.mac.link;
 }
 
+static void hclgevf_get_ksettings_an_result(struct hnae3_handle *handle,
+   u8 *auto_neg, u32 *speed,
+   u8 *duplex)
+{
+   struct hclgevf_dev *hdev = hclgevf_ae_get_hdev(handle);
+
+   if (speed)
+   *speed = hdev->hw.mac.speed;
+   if (duplex)
+   *duplex = hdev->hw.mac.duplex;
+   if (auto_neg)
+   *auto_neg = AUTONEG_DISABLE;
+}
+
+void hclgevf_update_speed_duplex(struct hclgevf_dev *hdev, u32 speed,
+u8 duplex)
+{
+   hdev->hw.mac.speed = speed;
+   hdev->hw.mac.duplex = duplex;
+}
+
 static const struct hnae3_ae_ops hclgevf_ops = {
.init_ae_dev = hclgevf_init_ae_dev,
.uninit_ae_dev = hclgevf_uninit_ae_dev,
@@ -1508,6 +1529,7 @@ static const struct hnae3_ae_ops hclgevf_ops = {
.get_channels = hclgevf_get_channels,
.get_tqps_and_rss_info = hclgevf_get_tqps_and_rss_info,
.get_status = hclgevf_get_status,
+   .get_ksettings_an_result = hclgevf_get_ksettings_an_result,
 };
 
 static struct hnae3_ae_algo ae_algovf = {
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.h b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.h
index a63bee4..0eaea06 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.h
@@ -61,6 +61,8 @@ enum hclgevf_states {
 struct hclgevf_mac {
u8 mac_addr[ETH_ALEN];
int link;
+   u8 duplex;
+   u32 speed;
 };
 
 struct hclgevf_hw {
@@ -161,4 +163,6 @@ int hclgevf_send_mbx_msg(struct hclgevf_dev *hdev, u16 code, u16 subcode,
 u8 *resp_data, u16 resp_len);
 void hclgevf_mbx_handler(struct hclgevf_dev *hdev);
 void hclgevf_update_link_status(struct hclgevf_dev *hdev, int link_state);
+void hclgevf_update_speed_duplex(struct hclgevf_dev *hdev, u32 speed,
+u8 duplex);
 #endif
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_mbx.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_mbx.c
index 9768f71..a63ed3a 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_mbx.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_mbx.c
@@ -133,6 +133,8 @@ void hclgevf_mbx_handler(struct hclgevf_dev *hdev)
struct hclgevf_cmq_ring *crq;
struct hclgevf_desc *desc;
u16 link_status, flag;
+   u32 speed;
+   u8 duplex;
u8 *temp;
int i;
 
@@ -164,9 +166,12 @@ void hclgevf_mbx_handler(struct hclgevf_dev *hdev)
break;
case HCLGE_MBX_LINK_STAT_CHANGE:
link_status = le16_to_cpu(req->msg[1]);
+   memcpy(&speed, &req->msg[2], sizeof(speed));
+   duplex = (u8)le16_to_cpu(req->msg[4]);
 
/* update upper layer with new link link status */

[PATCH net-next 05/11] net: hns3: increase the max time for IMP handle command

2018-03-21 Thread Peng Li

The IMP may need more time to handle some commands, such as reset.
This patch increases the maximum command timeout.

The driver checks the IMP result every microsecond and leaves the loop
as soon as it gets a valid result, so not every command needs the full timeout.
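
The early-exit behaviour described above can be sketched roughly as follows.
This is a stand-alone illustration, not the hns3 code: struct fake_hw,
fake_hw_done() and poll_cmd_done() are hypothetical names, and the hardware
is simulated. The only point is that the timeout is an upper bound and the
loop is left as soon as the result is ready.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the hardware: becomes ready after N polls. */
struct fake_hw {
	uint32_t polls_until_ready;
	uint32_t polls_seen;
};

static bool fake_hw_done(struct fake_hw *hw)
{
	return ++hw->polls_seen >= hw->polls_until_ready;
}

/*
 * Poll once per (simulated) microsecond, for at most max_us polls.
 * Returns the number of polls used, or -1 if the bound was reached
 * without the hardware reporting completion.
 */
static int poll_cmd_done(struct fake_hw *hw, uint32_t max_us)
{
	uint32_t waited;

	for (waited = 1; waited <= max_us; waited++) {
		if (fake_hw_done(hw))
			return (int)waited;	/* early exit */
		/* a real driver would delay about 1 us here */
	}
	return -1;
}
```

With a generous bound, a command that completes after a handful of polls
costs only those polls; the bound matters only for slow commands.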

Signed-off-by: Peng Li 
---
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h   | 2 +-
 drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_cmd.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h
index 3fd10a6..aae4abe 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h
@@ -12,7 +12,7 @@
 #include 
 #include 
 
-#define HCLGE_CMDQ_TX_TIMEOUT  1000
+#define HCLGE_CMDQ_TX_TIMEOUT  3
 
 struct hclge_dev;
 struct hclge_desc {
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_cmd.h 
b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_cmd.h
index 2caca93..621c6cb 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_cmd.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_cmd.h
@@ -7,7 +7,7 @@
 #include 
 #include "hnae3.h"
 
-#define HCLGEVF_CMDQ_TX_TIMEOUT200
+#define HCLGEVF_CMDQ_TX_TIMEOUT3
 #define HCLGEVF_CMDQ_RX_INVLD_B0
 #define HCLGEVF_CMDQ_RX_OUTVLD_B   1
 
-- 
2.9.3

[PATCH net-next 06/11] net: hns3: change GL update rate

2018-03-21 Thread Peng Li

From: Fuyun Liang 

The interrupt coalescing self-adaptive function updates the int_gl on every
interrupt. That update rate is too fast to arrive at a better new GL value.
This patch changes the GL update rate to once every one hundred interrupts.
The GL update rate is defined by HNS3_INT_ADAPT_DOWN_START.
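
A minimal sketch of the countdown, assuming nothing beyond what the diff
shows. struct vector_state, vector_init() and vector_on_interrupt() are
hypothetical names and the GL recalculation itself is replaced by a counter.
Note that, written this way, ADAPT_DOWN_START calls are skipped and the call
after them performs the update.

```c
#include <stdbool.h>

#define ADAPT_DOWN_START 100	/* mirrors HNS3_INT_ADAPT_DOWN_START */

/* Hypothetical per-vector state; only the countdown is modelled. */
struct vector_state {
	int adapt_down;		/* calls left to skip before the next update */
	unsigned int updates;	/* how many times the GL was recomputed */
};

static void vector_init(struct vector_state *v)
{
	v->adapt_down = ADAPT_DOWN_START;
	v->updates = 0;
}

/* Called once per interrupt; returns true when an update actually ran. */
static bool vector_on_interrupt(struct vector_state *v)
{
	if (v->adapt_down > 0) {
		v->adapt_down--;
		return false;
	}

	v->updates++;		/* placeholder for the real GL recalculation */
	v->adapt_down = ADAPT_DOWN_START;
	return true;
}
```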

Signed-off-by: Fuyun Liang 
Signed-off-by: Peng Li 
---
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 8 
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.h | 2 ++
 2 files changed, 10 insertions(+)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
index f700ec1..e7cf7b4 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
@@ -214,6 +214,7 @@ static void hns3_vector_gl_rl_init(struct 
hns3_enet_tqp_vector *tqp_vector,
/* Default: disable RL */
h->kinfo.int_rl_setting = 0;
 
+   tqp_vector->int_adapt_down = HNS3_INT_ADAPT_DOWN_START;
tqp_vector->rx_group.coal.flow_level = HNS3_FLOW_LOW;
tqp_vector->tx_group.coal.flow_level = HNS3_FLOW_LOW;
 }
@@ -2492,6 +2493,11 @@ static void hns3_update_new_int_gl(struct 
hns3_enet_tqp_vector *tqp_vector)
struct hns3_enet_ring_group *tx_group = &tqp_vector->tx_group;
bool rx_update, tx_update;
 
+   if (tqp_vector->int_adapt_down > 0) {
+   tqp_vector->int_adapt_down--;
+   return;
+   }
+
if (rx_group->coal.gl_adapt_enable) {
rx_update = hns3_get_new_int_gl(rx_group);
if (rx_update)
@@ -2505,6 +2511,8 @@ static void hns3_update_new_int_gl(struct 
hns3_enet_tqp_vector *tqp_vector)
hns3_set_vector_coalesce_tx_gl(tqp_vector,
   tx_group->coal.int_gl);
}
+
+   tqp_vector->int_adapt_down = HNS3_INT_ADAPT_DOWN_START;
 }
 
 static int hns3_nic_common_poll(struct napi_struct *napi, int budget)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h 
b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h
index c313780..2fe870b 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h
@@ -462,6 +462,8 @@ enum hns3_link_mode_bits {
 #define HNS3_INT_RL_MAX0x00EC
 #define HNS3_INT_RL_ENABLE_MASK0x40
 
+#define HNS3_INT_ADAPT_DOWN_START  100
+
 struct hns3_enet_coalesce {
u16 int_gl;
u8 gl_adapt_enable;
-- 
2.9.3

[PATCH net-next 02/11] net: hns3: fix the VF queue reset flow error

2018-03-21 Thread Peng Li

The VF queue reset flow is different from the PF queue reset flow.
The VF driver should stop the VF queue first, then send a message to the PF,
and the PF does the reset. The PF should send a response to the VF after
it completes the queue reset; the VF can initialize the queue hw
after it gets the response.
This patch fixes the VF queue reset flow to follow these steps.
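
The ordering above can be modelled as a tiny state machine. This is only a
sketch to make the sequence explicit: the mailbox is reduced to a direct
function call, and enum q_state, struct vf_queue, pf_handle_queue_reset()
and vf_reset_queue() are hypothetical names, not driver symbols.

```c
#include <stdbool.h>

enum q_state {
	Q_RUNNING,
	Q_STOPPED,	/* VF has stopped the queue */
	Q_RESET_DONE,	/* PF has finished the hardware reset */
	Q_READY,	/* VF has re-initialised the queue */
};

struct vf_queue {
	enum q_state state;
	bool resp_received;
};

/* PF side: reset only a stopped queue, then answer the VF. */
static int pf_handle_queue_reset(struct vf_queue *q)
{
	if (q->state != Q_STOPPED)
		return -1;
	q->state = Q_RESET_DONE;
	q->resp_received = true;	/* models the response message */
	return 0;
}

/* VF side: stop, ask the PF, wait for the response, then re-init. */
static int vf_reset_queue(struct vf_queue *q)
{
	q->state = Q_STOPPED;			/* 1. stop the queue first   */
	q->resp_received = false;

	if (pf_handle_queue_reset(q))		/* 2. mailbox request to PF  */
		return -1;
	if (!q->resp_received)			/* 3. need the PF's response */
		return -1;

	q->state = Q_READY;			/* 4. initialise queue hw    */
	return 0;
}
```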

Signed-off-by: Peng Li 
---
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c| 37 ++
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.h|  1 +
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c | 11 ---
 .../ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c  | 10 --
 4 files changed, 53 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index e110c65..588f231 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -4926,6 +4926,43 @@ void hclge_reset_tqp(struct hnae3_handle *handle, u16 
queue_id)
}
 }
 
+void hclge_reset_vf_queue(struct hclge_vport *vport, u16 queue_id)
+{
+   struct hclge_dev *hdev = vport->back;
+   int reset_try_times = 0;
+   int reset_status;
+   u16 queue_gid;
+   int ret;
+
+   queue_gid = hclge_covert_handle_qid_global(&vport->nic, queue_id);
+
+   ret = hclge_send_reset_tqp_cmd(hdev, queue_gid, true);
+   if (ret) {
+   dev_warn(&hdev->pdev->dev,
+"Send reset tqp cmd fail, ret = %d\n", ret);
+   return;
+   }
+
+   reset_try_times = 0;
+   while (reset_try_times++ < HCLGE_TQP_RESET_TRY_TIMES) {
+   /* Wait for tqp hw reset */
+   msleep(20);
+   reset_status = hclge_get_reset_status(hdev, queue_gid);
+   if (reset_status)
+   break;
+   }
+
+   if (reset_try_times >= HCLGE_TQP_RESET_TRY_TIMES) {
+   dev_warn(&hdev->pdev->dev, "Reset TQP fail\n");
+   return;
+   }
+
+   ret = hclge_send_reset_tqp_cmd(hdev, queue_gid, false);
+   if (ret)
+   dev_warn(&hdev->pdev->dev,
+"Deassert the soft reset fail, ret = %d\n", ret);
+}
+
 static u32 hclge_get_fw_version(struct hnae3_handle *handle)
 {
struct hclge_vport *vport = hclge_get_vport(handle);
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h
index 7bff6ef..edbcb73 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h
@@ -646,5 +646,6 @@ void hclge_rss_indir_init_cfg(struct hclge_dev *hdev);
 
 void hclge_mbx_handler(struct hclge_dev *hdev);
 void hclge_reset_tqp(struct hnae3_handle *handle, u16 queue_id);
+void hclge_reset_vf_queue(struct hclge_vport *vport, u16 queue_id);
 int hclge_cfg_flowctrl(struct hclge_dev *hdev);
 #endif
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c
index 4a49a6b..cef14e7 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c
@@ -322,14 +322,17 @@ static int hclge_get_link_info(struct hclge_vport *vport,
  HCLGE_MBX_LINK_STAT_CHANGE, dest_vfid);
 }
 
-static void hclge_reset_vf_queue(struct hclge_vport *vport,
-struct hclge_mbx_vf_to_pf_cmd *mbx_req)
+static void hclge_mbx_reset_vf_queue(struct hclge_vport *vport,
+struct hclge_mbx_vf_to_pf_cmd *mbx_req)
 {
u16 queue_id;
 
memcpy(&queue_id, &mbx_req->msg[2], sizeof(queue_id));
 
-   hclge_reset_tqp(&vport->nic, queue_id);
+   hclge_reset_vf_queue(vport, queue_id);
+
+   /* send response msg to VF after queue reset complete*/
+   hclge_gen_resp_to_vf(vport, mbx_req, 0, NULL, 0);
 }
 
 void hclge_mbx_handler(struct hclge_dev *hdev)
@@ -407,7 +410,7 @@ void hclge_mbx_handler(struct hclge_dev *hdev)
ret);
break;
case HCLGE_MBX_QUEUE_RESET:
-   hclge_reset_vf_queue(vport, req);
+   hclge_mbx_reset_vf_queue(vport, req);
break;
default:
dev_err(&hdev->pdev->dev,
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
index 2337025..c96cf03 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
@@ -817,11 +817,17 @@ static void hclgevf_reset_tqp(struct hnae3_handle 
*handle, u16 queue_id)
 {
struct hclgevf_dev *hdev = hclgevf_ae_get_hdev(handle);
u8 msg_data[2];
+   int ret;
 
memcpy(&msg_da

Re: [Bug 199003] console stalled, cause Hard LOCKUP.

2018-03-21 Thread Sergey Senozhatsky

On (03/20/18 09:34), bugzilla-dae...@bugzilla.kernel.org wrote:
[..]
> Thanks very much.
> commit e480af09c49736848f749a43dff2c902104f6691 avoided the NMI watchdog
> trigger.

Hm, okay... But "touch_nmi_watchdog() everywhere printk/console-related"
is not exactly where I wanted us to be.

By the way e480af09c49736848f749a43dff2c902104f6691 is from 2006.
Are you sure you meant exactly that commit? What kernel do you use?

Are you saying that none of Steven's patches helped on your setups?

> And this patch may avoid long time blocking:
> https://lkml.org/lkml/2018/3/8/584
> 
> We've tested it for several days.

Hm, printk_deferred is a bit dangerous; it moves console_unlock() to
IRQ. So you can still have the problem of stuck CPUs, it's just that now
you shut up the watchdog. Did you test Steven's patches?

A tricky part about printk_deferred() is that it does not use the hand off
mechanism. And even more...  What we have with the "printk vs printk"
scenario

CPU0                    CPU1            ...             CPUN

printk                  printk
 console_unlock          hand off                       printk
                          console_unlock                 hand off
                                                          console_unlock

turns into a good old "one CPU prints it all" when we have the "printk vs
printk_deferred" case, because printk_deferred just log_store()s the messages
and then _maybe_ grabs the console_sem from IRQ and invokes
console_unlock().

So it's something like this

CPU0                    CPU1            ...             CPUN

printk                  printk_deferred
 console_unlock                                         printk_deferred
 console_unlock
 console_unlock
...                     ...                             ...
                        printk_deferred                 printk_deferred
 console_unlock
 console_unlock

// offtopic  "I can has printk_kthread?"

You now touch_nmi_watchdog() from the console driver [well... at least this
is what e480af09c4973 is doing, but I'm not sure I see how come you didn't
have it applied], so that's why you don't see hard lockups on that CPU0. But
your printing CPU can still get stuck, which will defer RCUs on that CPU, etc.
etc. etc. So I'd say that those two approaches

printk_deferred + touch_nmi_watchdog

combined can do quite some harm. One thing is for sure - they don't really fix
any problems.

-ss

RE: [PATCH v2 1/2] dma-mapping: move dma configuration to bus infrastructure

2018-03-21 Thread Nipun Gupta



> -Original Message-
> From: Bharat Bhushan
> Sent: Wednesday, March 21, 2018 12:49

> >
> > +int dma_configure(struct device *dev)
> > +{
> > +   if (dev->bus->dma_configure)
> > +   return dev->bus->dma_configure(dev);
> 
> What if dma_common_configure() is called in case "bus->dma_configure" is
> not defined?
> 
> Thanks
> -Bharat

I think it is cleaner for the bus to call dma_common_configure() explicitly
rather than having it called implicitly, but Robin/Christoph can comment
better on this.
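
To make the two options concrete, here is a rough userspace sketch. The
structures are cut-down stand-ins, platform_dma_configure() and
dma_configure_with_fallback() are hypothetical, and returning 0 when the
hook is missing is an assumption (the quoted hunk does not show that path).

```c
#include <stddef.h>

struct device;

struct bus_type {
	const char *name;
	int (*dma_configure)(struct device *dev);	/* optional hook */
};

struct device {
	const struct bus_type *bus;
	int configured_by;	/* 0 = nobody, 1 = common helper */
};

/* Stand-in for the common helper discussed in the thread. */
static int dma_common_configure(struct device *dev)
{
	dev->configured_by = 1;
	return 0;
}

/* Explicit style: a bus opts in by naming the common helper itself. */
static int platform_dma_configure(struct device *dev)
{
	return dma_common_configure(dev);
}

/* Core entry point as quoted above: dispatch only, no implicit fallback. */
static int dma_configure(struct device *dev)
{
	if (dev->bus->dma_configure)
		return dev->bus->dma_configure(dev);
	return 0;	/* assumption: nothing to do without a hook */
}

/* Alternative raised in the question: fall back when the hook is absent. */
static int dma_configure_with_fallback(struct device *dev)
{
	if (dev->bus->dma_configure)
		return dev->bus->dma_configure(dev);
	return dma_common_configure(dev);
}
```

With the explicit style, a bus without the hook simply gets no DMA
configuration; with the fallback, every such bus silently gets the common
one, which is the implicit behaviour I would rather avoid.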

Thanks,
Nipun

Re: [PATCH RFC 2/2] virtio_ring: support packed ring

2018-03-21 Thread Tiwei Bie

On Fri, Mar 16, 2018 at 07:36:47PM +0800, Jason Wang wrote:
> On 2018年03月16日 18:04, Tiwei Bie wrote:
> > On Fri, Mar 16, 2018 at 04:34:28PM +0800, Jason Wang wrote:
> > > On 2018年03月16日 15:40, Tiwei Bie wrote:
> > > > On Fri, Mar 16, 2018 at 02:44:12PM +0800, Jason Wang wrote:
> > > > > On 2018年03月16日 14:10, Tiwei Bie wrote:
> > > > > > On Fri, Mar 16, 2018 at 12:03:25PM +0800, Jason Wang wrote:
> > > > > > > On 2018年02月23日 19:18, Tiwei Bie wrote:
> > > > > > > > Signed-off-by: Tiwei Bie 
> > > > > > > > ---
> > > > > > > >  drivers/virtio/virtio_ring.c | 699 
> > > > > > > > +--
> > > > > > > >  include/linux/virtio_ring.h  |   8 +-
> > > > > > > >  2 files changed, 618 insertions(+), 89 deletions(-)
[...]
> > > @@ -1096,17 +1599,21 @@ struct virtqueue *vring_create_virtqueue(
> > > > > > > > if (!queue) {
> > > > > > > > /* Try to get a single page. You are my only 
> > > > > > > > hope! */
> > > > > > > > -   queue = vring_alloc_queue(vdev, vring_size(num, 
> > > > > > > > vring_align),
> > > > > > > > +   queue = vring_alloc_queue(vdev, 
> > > > > > > > __vring_size(num, vring_align,
> > > > > > > > +
> > > > > > > > packed),
> > > > > > > >   &dma_addr, 
> > > > > > > > GFP_KERNEL|__GFP_ZERO);
> > > > > > > > }
> > > > > > > > if (!queue)
> > > > > > > > return NULL;
> > > > > > > > -   queue_size_in_bytes = vring_size(num, vring_align);
> > > > > > > > -   vring_init(&vring, num, queue, vring_align);
> > > > > > > > +   queue_size_in_bytes = __vring_size(num, vring_align, 
> > > > > > > > packed);
> > > > > > > > +   if (packed)
> > > > > > > > +   vring_packed_init(&vring.vring_packed, num, 
> > > > > > > > queue, vring_align);
> > > > > > > > +   else
> > > > > > > > +   vring_init(&vring.vring_split, num, queue, 
> > > > > > > > vring_align);
> > > > > > > Let's rename vring_init to vring_init_split() like other helpers?
> > > > > > The vring_init() is a public API in 
> > > > > > include/uapi/linux/virtio_ring.h.
> > > > > > I don't think we can rename it.
> > > > > I see, then this need more thoughts to unify the API.
> > > > My thought is to keep the old API as is, and introduce
> > > > new types and helpers for packed ring.
> > > I admit it's not a fault of this patch. But we'd better think of this in 
> > > the
> > > future, consider we may have new kinds of ring.
> > > 
> > > > More details can be found in this patch:
> > > > https://lkml.org/lkml/2018/2/23/243
> > > > (PS. The type which has bit fields is just for reference,
> > > >and will be changed in next version.)
> > > > 
> > > > Do you have any other suggestions?
> > > No.
> > Hmm.. Sorry, I didn't describe my question well.
> > I mean do you have any suggestions about the API
> > design for packed ring in uapi header? Currently
> > I introduced below two new helpers:
> > 
> > static inline void vring_packed_init(struct vring_packed *vr, unsigned int 
> > num,
> >  void *p, unsigned long align);
> > static inline unsigned vring_packed_size(unsigned int num, unsigned long 
> > align);
> > 
> > When new rings are introduced in the future, above
> > helpers can't be reused. Maybe we should make the
> > helpers be able to determine the ring type?
> 
> Let's wait for Michael's comment here. Generally, I fail to understand why
> vring_init() become a part of uapi. Git grep shows the only use cases are
> virtio_test/vringh_test.

Thank you very much for the review on this patch!
I'll send out a new version ASAP to address these
comments. :)

Best regards,
Tiwei Bie

[PATCH 1/9] aio: don't print the page size at boot time

2018-03-21 Thread Christoph Hellwig

The page size is in no way related to the aio code, and printing it in
the (debug) dmesg at every boot serves no purpose.

Signed-off-by: Christoph Hellwig 
Acked-by: Jeff Moyer 
Reviewed-by: Darrick J. Wong 
---
 fs/aio.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index a062d75109cb..03d59593912d 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -264,9 +264,6 @@ static int __init aio_setup(void)
 
kiocb_cachep = KMEM_CACHE(aio_kiocb, SLAB_HWCACHE_ALIGN|SLAB_PANIC);
kioctx_cachep = KMEM_CACHE(kioctx,SLAB_HWCACHE_ALIGN|SLAB_PANIC);
-
-   pr_debug("sizeof(struct page) = %zu\n", sizeof(struct page));
-
return 0;
 }
 __initcall(aio_setup);
-- 
2.14.2

[PATCH 3/9] aio: refactor read/write iocb setup

2018-03-21 Thread Christoph Hellwig

Don't reference the kiocb structure from the common aio code, and move
any use of it into helpers specific to the read/write path.  This is in
preparation for aio_poll support that wants to use the space for different
fields.
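
The shape of the refactoring, reduced to a userspace sketch: the
type-specific structure lives in an anonymous union, only the read/write
helper knows about the rw member, and common code works on the outer
request. struct kiocb_like, struct poll_like and struct aio_req are
hypothetical stand-ins, and container_of() here is a simplified version
without the kernel's type checking.

```c
#include <stddef.h>

/* Userspace equivalent of the kernel's container_of(), without type checks. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct kiocb_like {		/* hypothetical stand-in for struct kiocb */
	long pos;
};

struct poll_like {		/* hypothetical future user of the same space */
	int events;
};

struct aio_req {
	union {
		struct kiocb_like rw;
		struct poll_like  poll;
	};
	int id;			/* common field, valid for every request type */
};

/* Read/write-specific helper: the only place that knows about ->rw. */
static struct aio_req *req_from_rw(struct kiocb_like *rw)
{
	return container_of(rw, struct aio_req, rw);
}

/* Common code works on struct aio_req and never touches the union. */
static int req_id(const struct aio_req *req)
{
	return req->id;
}
```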

Signed-off-by: Christoph Hellwig 
Acked-by: Jeff Moyer 
Reviewed-by: Darrick J. Wong 
---
 fs/aio.c | 171 ---
 1 file changed, 97 insertions(+), 74 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 41fc8ce6bc7f..6295fc00f104 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -170,7 +170,9 @@ struct kioctx {
 #define KIOCB_CANCELLED((void *) (~0ULL))
 
 struct aio_kiocb {
-   struct kiocbcommon;
+   union {
+   struct kiocbrw;
+   };
 
struct kioctx   *ki_ctx;
kiocb_cancel_fn *ki_cancel;
@@ -549,7 +551,7 @@ static int aio_setup_ring(struct kioctx *ctx, unsigned int 
nr_events)
 
 void kiocb_set_cancel_fn(struct kiocb *iocb, kiocb_cancel_fn *cancel)
 {
-   struct aio_kiocb *req = container_of(iocb, struct aio_kiocb, common);
+   struct aio_kiocb *req = container_of(iocb, struct aio_kiocb, rw);
struct kioctx *ctx = req->ki_ctx;
unsigned long flags;
 
@@ -582,7 +584,7 @@ static int kiocb_cancel(struct aio_kiocb *kiocb)
cancel = cmpxchg(&kiocb->ki_cancel, old, KIOCB_CANCELLED);
} while (cancel != old);
 
-   return cancel(&kiocb->common);
+   return cancel(&kiocb->rw);
 }
 
 static void free_ioctx(struct work_struct *work)
@@ -1040,15 +1042,6 @@ static inline struct aio_kiocb *aio_get_req(struct 
kioctx *ctx)
return NULL;
 }
 
-static void kiocb_free(struct aio_kiocb *req)
-{
-   if (req->common.ki_filp)
-   fput(req->common.ki_filp);
-   if (req->ki_eventfd != NULL)
-   eventfd_ctx_put(req->ki_eventfd);
-   kmem_cache_free(kiocb_cachep, req);
-}
-
 static struct kioctx *lookup_ioctx(unsigned long ctx_id)
 {
struct aio_ring __user *ring  = (void __user *)ctx_id;
@@ -1079,29 +1072,14 @@ static struct kioctx *lookup_ioctx(unsigned long ctx_id)
 /* aio_complete
  * Called when the io request on the given iocb is complete.
  */
-static void aio_complete(struct kiocb *kiocb, long res, long res2)
+static void aio_complete(struct aio_kiocb *iocb, long res, long res2)
 {
-   struct aio_kiocb *iocb = container_of(kiocb, struct aio_kiocb, common);
struct kioctx   *ctx = iocb->ki_ctx;
struct aio_ring *ring;
struct io_event *ev_page, *event;
unsigned tail, pos, head;
unsigned long   flags;
 
-   BUG_ON(is_sync_kiocb(kiocb));
-
-   if (kiocb->ki_flags & IOCB_WRITE) {
-   struct file *file = kiocb->ki_filp;
-
-   /*
-* Tell lockdep we inherited freeze protection from submission
-* thread.
-*/
-   if (S_ISREG(file_inode(file)->i_mode))
-   __sb_writers_acquired(file_inode(file)->i_sb, 
SB_FREEZE_WRITE);
-   file_end_write(file);
-   }
-
if (iocb->ki_list.next) {
unsigned long flags;
 
@@ -1163,11 +1141,12 @@ static void aio_complete(struct kiocb *kiocb, long res, 
long res2)
 * eventfd. The eventfd_signal() function is safe to be called
 * from IRQ context.
 */
-   if (iocb->ki_eventfd != NULL)
+   if (iocb->ki_eventfd) {
eventfd_signal(iocb->ki_eventfd, 1);
+   eventfd_ctx_put(iocb->ki_eventfd);
+   }
 
-   /* everything turned out well, dispose of the aiocb. */
-   kiocb_free(iocb);
+   kmem_cache_free(kiocb_cachep, iocb);
 
/*
 * We have to order our ring_info tail store above and test
@@ -1430,6 +1409,47 @@ SYSCALL_DEFINE1(io_destroy, aio_context_t, ctx)
return -EINVAL;
 }
 
+static void aio_complete_rw(struct kiocb *kiocb, long res, long res2)
+{
+   struct aio_kiocb *iocb = container_of(kiocb, struct aio_kiocb, rw);
+
+   WARN_ON_ONCE(is_sync_kiocb(kiocb));
+
+   if (kiocb->ki_flags & IOCB_WRITE) {
+   struct inode *inode = file_inode(kiocb->ki_filp);
+
+   /*
+* Tell lockdep we inherited freeze protection from submission
+* thread.
+*/
+   if (S_ISREG(inode->i_mode))
+   __sb_writers_acquired(inode->i_sb, SB_FREEZE_WRITE);
+   file_end_write(kiocb->ki_filp);
+   }
+
+   fput(kiocb->ki_filp);
+   aio_complete(iocb, res, res2);
+}
+
+static int aio_prep_rw(struct kiocb *req, struct iocb *iocb)
+{
+   int ret;
+
+   req->ki_filp = fget(iocb->aio_fildes);
+   if (unlikely(!req->ki_filp))
+   return -EBADF;
+   req->ki_complete = aio_complete_rw;
+   req->ki_pos = iocb->aio_offset;
+   req->ki_flags = iocb_flags(req->ki_filp);
+   if (iocb->aio_flags & IOCB_FLAG_RESFD)
+

[PATCH 7/9] aio: add delayed cancel support

2018-03-21 Thread Christoph Hellwig

The upcoming aio poll support would like to be able to complete the
iocb inline from the cancellation context, but that would cause
a lock order reversal.  Add support for optionally moving the cancelation
outside the context lock to avoid this reversal.
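
A toy model of the idea, assuming a singly linked list and a boolean in
place of ctx->ctx_lock. struct req, struct ctx, req_cancel() and
cancel_all() are hypothetical; the sketch only shows that requests flagged
for delayed cancel are marked under the lock and cancelled after it is
dropped.

```c
#include <stdbool.h>
#include <stddef.h>

#define REQ_DELAYED_CANCEL	(1u << 0)
#define REQ_CANCELLED		(1u << 1)

struct req {
	struct req *next;
	unsigned int flags;
	bool cancelled_under_lock;	/* recorded by the cancel callback */
	bool cancel_ran;
};

struct ctx {
	bool locked;		/* toy stand-in for the context lock */
	struct req *active;	/* singly linked list of active requests */
};

static void req_cancel(struct ctx *ctx, struct req *r)
{
	r->cancel_ran = true;
	r->cancelled_under_lock = ctx->locked;
}

static void cancel_all(struct ctx *ctx)
{
	struct req *deferred = NULL, **tail = &deferred, *r;

	ctx->locked = true;
	while ((r = ctx->active) != NULL) {
		ctx->active = r->next;
		r->next = NULL;
		if (r->flags & REQ_DELAYED_CANCEL) {
			/* only mark it now, cancel once the lock is gone */
			r->flags |= REQ_CANCELLED;
			*tail = r;
			tail = &r->next;
		} else {
			req_cancel(ctx, r);
		}
	}
	ctx->locked = false;

	/* second pass runs without the lock held */
	while ((r = deferred) != NULL) {
		deferred = r->next;
		r->next = NULL;
		req_cancel(ctx, r);
	}
}
```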

Signed-off-by: Christoph Hellwig 
Acked-by: Jeff Moyer 
---
 fs/aio.c | 49 ++---
 1 file changed, 38 insertions(+), 11 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 0b6394b4e528..9d7d6e4cde87 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -170,6 +170,10 @@ struct aio_kiocb {
struct list_headki_list;/* the aio core uses this
 * for cancellation */
 
+   unsigned intflags;  /* protected by ctx->ctx_lock */
+#define AIO_IOCB_DELAYED_CANCEL(1 << 0)
+#define AIO_IOCB_CANCELLED (1 << 1)
+
/*
 * If the aio_resfd field of the userspace iocb is not zero,
 * this is the underlying eventfd context to deliver events to.
@@ -536,9 +540,9 @@ static int aio_setup_ring(struct kioctx *ctx, unsigned int 
nr_events)
 #define AIO_EVENTS_FIRST_PAGE  ((PAGE_SIZE - sizeof(struct aio_ring)) / 
sizeof(struct io_event))
 #define AIO_EVENTS_OFFSET  (AIO_EVENTS_PER_PAGE - AIO_EVENTS_FIRST_PAGE)
 
-void kiocb_set_cancel_fn(struct kiocb *iocb, kiocb_cancel_fn *cancel)
+static void __kiocb_set_cancel_fn(struct aio_kiocb *req,
+   kiocb_cancel_fn *cancel, unsigned int iocb_flags)
 {
-   struct aio_kiocb *req = container_of(iocb, struct aio_kiocb, rw);
struct kioctx *ctx = req->ki_ctx;
unsigned long flags;
 
@@ -548,8 +552,15 @@ void kiocb_set_cancel_fn(struct kiocb *iocb, 
kiocb_cancel_fn *cancel)
spin_lock_irqsave(&ctx->ctx_lock, flags);
list_add_tail(&req->ki_list, &ctx->active_reqs);
req->ki_cancel = cancel;
+   req->flags |= iocb_flags;
spin_unlock_irqrestore(&ctx->ctx_lock, flags);
 }
+
+void kiocb_set_cancel_fn(struct kiocb *iocb, kiocb_cancel_fn *cancel)
+{
+   return __kiocb_set_cancel_fn(container_of(iocb, struct aio_kiocb, rw),
+   cancel, 0);
+}
 EXPORT_SYMBOL(kiocb_set_cancel_fn);
 
 /*
@@ -603,17 +614,27 @@ static void free_ioctx_users(struct percpu_ref *ref)
 {
struct kioctx *ctx = container_of(ref, struct kioctx, users);
struct aio_kiocb *req;
+   LIST_HEAD(list);
 
spin_lock_irq(&ctx->ctx_lock);
-
while (!list_empty(&ctx->active_reqs)) {
req = list_first_entry(&ctx->active_reqs,
   struct aio_kiocb, ki_list);
-   kiocb_cancel(req);
-   }
 
+   if (req->flags & AIO_IOCB_DELAYED_CANCEL) {
+   req->flags |= AIO_IOCB_CANCELLED;
+   list_move_tail(&req->ki_list, &list);
+   } else {
+   kiocb_cancel(req);
+   }
+   }
spin_unlock_irq(&ctx->ctx_lock);
 
+   while (!list_empty(&list)) {
+   req = list_first_entry(&list, struct aio_kiocb, ki_list);
+   kiocb_cancel(req);
+   }
+
percpu_ref_kill(&ctx->reqs);
percpu_ref_put(&ctx->reqs);
 }
@@ -1785,15 +1806,22 @@ SYSCALL_DEFINE3(io_cancel, aio_context_t, ctx_id, 
struct iocb __user *, iocb,
if (unlikely(!ctx))
return -EINVAL;
 
-   spin_lock_irq(&ctx->ctx_lock);
+   ret = -EINVAL;
 
+   spin_lock_irq(&ctx->ctx_lock);
kiocb = lookup_kiocb(ctx, iocb, key);
+   if (kiocb) {
+   if (kiocb->flags & AIO_IOCB_DELAYED_CANCEL) {
+   kiocb->flags |= AIO_IOCB_CANCELLED;
+   } else {
+   ret = kiocb_cancel(kiocb);
+   kiocb = NULL;
+   }
+   }
+   spin_unlock_irq(&ctx->ctx_lock);
+
if (kiocb)
ret = kiocb_cancel(kiocb);
-   else
-   ret = -EINVAL;
-
-   spin_unlock_irq(&ctx->ctx_lock);
 
if (!ret) {
/*
@@ -1805,7 +1833,6 @@ SYSCALL_DEFINE3(io_cancel, aio_context_t, ctx_id, struct 
iocb __user *, iocb,
}
 
percpu_ref_put(&ctx->users);
-
return ret;
 }
 
-- 
2.14.2

[PATCH 6/9] aio: delete iocbs from the active_reqs list in kiocb_cancel

2018-03-21 Thread Christoph Hellwig

Once we cancel an iocb there is no reason to keep it on the active_reqs
list, given that the list is only used to look for cancelation candidates.

Signed-off-by: Christoph Hellwig 
Acked-by: Jeff Moyer 
---
 fs/aio.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 2d40cf5dd4ec..0b6394b4e528 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -561,6 +561,8 @@ static int kiocb_cancel(struct aio_kiocb *kiocb)
 {
kiocb_cancel_fn *cancel = kiocb->ki_cancel;
 
+   list_del_init(&kiocb->ki_list);
+
if (!cancel)
return -EINVAL;
kiocb->ki_cancel = NULL;
@@ -607,8 +609,6 @@ static void free_ioctx_users(struct percpu_ref *ref)
while (!list_empty(&ctx->active_reqs)) {
req = list_first_entry(&ctx->active_reqs,
   struct aio_kiocb, ki_list);
-
-   list_del_init(&req->ki_list);
kiocb_cancel(req);
}
 
-- 
2.14.2

[PATCH 9/9] aio: implement IOCB_CMD_FSYNC and IOCB_CMD_FDSYNC

2018-03-21 Thread Christoph Hellwig

Simple workqueue offload for now, but prepared for adding a real aio_fsync
method if the need arises.  Based on an earlier patch from Dave Chinner.
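
The offload pattern, sketched in userspace under heavy simplification: a
toy LIFO queue stands in for the kernel workqueue, fake_sync() stands in
for vfs_fsync(), and struct work, struct work_queue, struct fsync_req and
struct aio_req are hypothetical. Submission only queues the work; the sync
and the completion happen later from the worker.

```c
#include <stdbool.h>
#include <stddef.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Toy deferred-work queue standing in for the kernel workqueue. */
struct work {
	void (*fn)(struct work *w);
	struct work *next;
};

struct work_queue {
	struct work *head;
};

static void queue_work_item(struct work_queue *q, struct work *w)
{
	w->next = q->head;
	q->head = w;
}

static void run_queued_work(struct work_queue *q)
{
	struct work *w;

	while ((w = q->head) != NULL) {
		q->head = w->next;
		w->fn(w);
	}
}

struct fsync_req {
	struct work work;
	bool datasync;
};

struct aio_req {
	union {
		struct fsync_req fsync;
	};
	bool done;
	long res;
};

/* Stand-in for vfs_fsync(): pretend both sync flavours succeed. */
static int fake_sync(bool datasync)
{
	(void)datasync;
	return 0;
}

/* Runs later, from the worker: sync, then complete the request. */
static void fsync_work_fn(struct work *w)
{
	struct fsync_req *req = container_of(w, struct fsync_req, work);
	struct aio_req *aio = container_of(req, struct aio_req, fsync);

	aio->res = fake_sync(req->datasync);
	aio->done = true;
}

/* Submission only queues the work and returns immediately. */
static void submit_fsync(struct work_queue *q, struct aio_req *aio,
			 bool datasync)
{
	aio->done = false;
	aio->fsync.datasync = datasync;
	aio->fsync.work.fn = fsync_work_fn;
	queue_work_item(q, &aio->fsync.work);
}
```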

Signed-off-by: Christoph Hellwig 
---
 fs/aio.c | 50 ++
 1 file changed, 50 insertions(+)

diff --git a/fs/aio.c b/fs/aio.c
index da87cbf7c67a..79d3eb3d2dd9 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -156,9 +156,16 @@ struct kioctx {
unsignedid;
 };
 
+struct fsync_iocb {
+   struct work_struct  work;
+   struct file *file;
+   booldatasync;
+};
+
 struct aio_kiocb {
union {
struct kiocbrw;
+   struct fsync_iocb   fsync;
};
 
struct kioctx   *ki_ctx;
@@ -1565,6 +1572,43 @@ static ssize_t aio_write(struct kiocb *req, struct iocb 
*iocb, bool vectored,
return ret;
 }
 
+static void aio_fsync_work(struct work_struct *work)
+{
+   struct fsync_iocb *req = container_of(work, struct fsync_iocb, work);
+   int ret;
+
+   ret = vfs_fsync(req->file, req->datasync);
+   fput(req->file);
+   aio_complete(container_of(req, struct aio_kiocb, fsync), ret, 0);
+}
+
+static int aio_fsync(struct fsync_iocb *req, struct iocb *iocb, bool datasync)
+{
+   int ret;
+
+   if (iocb->aio_buf)
+   return -EINVAL;
+   if (iocb->aio_offset || iocb->aio_nbytes || iocb->aio_rw_flags)
+   return -EINVAL;
+
+   req->file = fget(iocb->aio_fildes);
+   if (unlikely(!req->file))
+   return -EBADF;
+
+   ret = -EINVAL;
+   if (!req->file->f_op->fsync)
+   goto out_fput;
+
+   req->datasync = datasync;
+   INIT_WORK(&req->work, aio_fsync_work);
+   schedule_work(&req->work);
+   return -EIOCBQUEUED;
+out_fput:
+   if (unlikely(ret && ret != -EIOCBQUEUED))
+   fput(req->file);
+   return ret;
+}
+
 static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
 struct iocb *iocb, bool compat)
 {
@@ -1628,6 +1672,12 @@ static int io_submit_one(struct kioctx *ctx, struct iocb 
__user *user_iocb,
case IOCB_CMD_PWRITEV:
ret = aio_write(&req->rw, iocb, true, compat);
break;
+   case IOCB_CMD_FSYNC:
+   ret = aio_fsync(&req->fsync, iocb, false);
+   break;
+   case IOCB_CMD_FDSYNC:
+   ret = aio_fsync(&req->fsync, iocb, true);
+   break;
default:
pr_debug("invalid aio operation %d\n", iocb->aio_lio_opcode);
ret = -EINVAL;
-- 
2.14.2

[PATCH 8/9] aio: implement io_pgetevents

2018-03-21 Thread Christoph Hellwig

This is the io_getevents equivalent of ppoll/pselect. It allows signals
and aio completions to be mixed properly (especially with IOCB_CMD_POLL)
by atomically executing the following sequence:

sigset_t origmask;

pthread_sigmask(SIG_SETMASK, &sigmask, &origmask);
ret = io_getevents(ctx, min_nr, nr, events, timeout);
pthread_sigmask(SIG_SETMASK, &origmask, NULL);

Note that unlike many other signal related calls we do not pass a sigmask
size, as that would get us to 7 arguments, which aren't easily supported
by the syscall infrastructure.  It seems a lot less painful to just add a
new syscall variant in the unlikely case we're going to increase the
sigset size.
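
For illustration only: the syscall body in this patch reads its sixth
argument into a small struct with sigmask and sigsetsize fields and rejects
a wrong size. The userspace sketch below mirrors just that check. struct
aio_sigset_sketch and check_sigset_arg() are hypothetical names, and the
expected size is passed in as a parameter because the kernel's sigset size
is generally not the same as the C library's sizeof(sigset_t).

```c
#include <errno.h>
#include <stddef.h>

/* Illustrative copy of the bundle; the field names follow the patch. */
struct aio_sigset_sketch {
	const void *sigmask;	/* user pointer to the mask, may be NULL */
	size_t sigsetsize;	/* size the caller believes the mask has */
};

/*
 * A NULL bundle or NULL mask means "leave the signal mask alone";
 * a mask with the wrong size is rejected.
 * Returns 1 if a mask should be installed, 0 if not, -EINVAL on bad size.
 */
static int check_sigset_arg(const struct aio_sigset_sketch *usig,
			    size_t kernel_sigset_size)
{
	if (!usig || !usig->sigmask)
		return 0;
	if (usig->sigsetsize != kernel_sigset_size)
		return -EINVAL;
	return 1;
}
```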

Signed-off-by: Christoph Hellwig 
---
 arch/x86/entry/syscalls/syscall_32.tbl |   1 +
 arch/x86/entry/syscalls/syscall_64.tbl |   1 +
 fs/aio.c   | 114 ++---
 include/linux/compat.h |   7 ++
 include/linux/syscalls.h   |   6 ++
 include/uapi/asm-generic/unistd.h  |   4 +-
 include/uapi/linux/aio_abi.h   |   6 ++
 kernel/sys_ni.c|   2 +
 8 files changed, 130 insertions(+), 11 deletions(-)

diff --git a/arch/x86/entry/syscalls/syscall_32.tbl 
b/arch/x86/entry/syscalls/syscall_32.tbl
index 2a5e99cff859..c1018580ddaa 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -391,3 +391,4 @@
 382i386pkey_free   sys_pkey_free
 383i386statx   sys_statx
 384i386arch_prctl  sys_arch_prctl  
compat_sys_arch_prctl
+385i386io_pgetevents   sys_io_pgetevents   
compat_sys_io_pgetevents
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl 
b/arch/x86/entry/syscalls/syscall_64.tbl
index 5aef183e2f85..e995cd2b4e65 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -339,6 +339,7 @@
 330common  pkey_alloc  sys_pkey_alloc
 331common  pkey_free   sys_pkey_free
 332common  statx   sys_statx
+333common  io_pgetevents   sys_io_pgetevents
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/fs/aio.c b/fs/aio.c
index 9d7d6e4cde87..da87cbf7c67a 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1291,10 +1291,6 @@ static long read_events(struct kioctx *ctx, long min_nr, 
long nr,
wait_event_interruptible_hrtimeout(ctx->wait,
aio_read_events(ctx, min_nr, nr, event, &ret),
until);
-
-   if (!ret && signal_pending(current))
-   ret = -EINTR;
-
return ret;
 }
 
@@ -1874,13 +1870,60 @@ SYSCALL_DEFINE5(io_getevents, aio_context_t, ctx_id,
struct timespec __user *, timeout)
 {
struct timespec64   ts;
+   int ret;
+
+   if (timeout && unlikely(get_timespec64(&ts, timeout)))
+   return -EFAULT;
+
+   ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &ts : NULL);
+   if (!ret && signal_pending(current))
+   ret = -EINTR;
+   return ret;
+}
+
+SYSCALL_DEFINE6(io_pgetevents,
+   aio_context_t, ctx_id,
+   long, min_nr,
+   long, nr,
+   struct io_event __user *, events,
+   struct timespec __user *, timeout,
+   const struct __aio_sigset __user *, usig)
+{
+   struct __aio_sigset ksig = { NULL, };
+   sigset_tksigmask, sigsaved;
+   struct timespec64   ts;
+   int ret;
+
+   if (timeout && unlikely(get_timespec64(&ts, timeout)))
+   return -EFAULT;
 
-   if (timeout) {
-   if (unlikely(get_timespec64(&ts, timeout)))
+   if (usig && copy_from_user(&ksig, usig, sizeof(ksig)))
+   return -EFAULT;
+
+   if (ksig.sigmask) {
+   if (ksig.sigsetsize != sizeof(sigset_t))
+   return -EINVAL;
+   if (copy_from_user(&ksigmask, ksig.sigmask, sizeof(ksigmask)))
return -EFAULT;
+   sigdelsetmask(&ksigmask, sigmask(SIGKILL) | sigmask(SIGSTOP));
+   sigprocmask(SIG_SETMASK, &ksigmask, &sigsaved);
+   }
+
+   ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &ts : NULL);
+   if (signal_pending(current)) {
+   if (ksig.sigmask) {
+   current->saved_sigmask = sigsaved;
+   set_restore_sigmask();
+   }
+
+   if (!ret)
+   ret = -ERESTARTNOHAND;
+   } else {
+   if (ksig.sigmask)
+   sigprocmask(SIG_SETMASK, &sigsaved, NULL);
}
 
-   return do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &ts : 
NULL);
+   return ret;
 }
 
 #ifdef CONFIG_COMPAT
@@ -1891,13

[PATCH 5/9] aio: simplify cancellation

2018-03-21 Thread Christoph Hellwig

With the current aio code there is no need for the magic KIOCB_CANCELLED
value, as a cancelation just kicks the driver to queue the completion
ASAP, with all actual completion handling done in another thread. Given
that both the completion path and cancelation take the context lock there
is no need for magic cmpxchg loops either.

Signed-off-by: Christoph Hellwig 
Acked-by: Jeff Moyer 
---
 fs/aio.c | 37 +
 1 file changed, 9 insertions(+), 28 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index c32c315f05b5..2d40cf5dd4ec 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -156,19 +156,6 @@ struct kioctx {
unsigned    id;
 };
 
-/*
- * We use ki_cancel == KIOCB_CANCELLED to indicate that a kiocb has been either
- * cancelled or completed (this makes a certain amount of sense because
- * successful cancellation - io_cancel() - does deliver the completion to
- * userspace).
- *
- * And since most things don't implement kiocb cancellation and we'd really like
- * kiocb completion to be lockless when possible, we use ki_cancel to
- * synchronize cancellation and completion - we only set it to KIOCB_CANCELLED
- * with xchg() or cmpxchg(), see batch_complete_aio() and kiocb_cancel().
- */
-#define KIOCB_CANCELLED    ((void *) (~0ULL))
-
 struct aio_kiocb {
union {
struct kiocb    rw;
@@ -565,24 +552,18 @@ void kiocb_set_cancel_fn(struct kiocb *iocb, kiocb_cancel_fn *cancel)
 }
 EXPORT_SYMBOL(kiocb_set_cancel_fn);
 
+/*
+ * Only cancel if there was a ki_cancel function to start with, and we
+ * are the one who managed to clear it (to protect against simultaneous
+ * cancel calls).
+ */
 static int kiocb_cancel(struct aio_kiocb *kiocb)
 {
-   kiocb_cancel_fn *old, *cancel;
-
-   /*
-* Don't want to set kiocb->ki_cancel = KIOCB_CANCELLED unless it
-* actually has a cancel function, hence the cmpxchg()
-*/
-
-   cancel = READ_ONCE(kiocb->ki_cancel);
-   do {
-   if (!cancel || cancel == KIOCB_CANCELLED)
-   return -EINVAL;
-
-   old = cancel;
-   cancel = cmpxchg(&kiocb->ki_cancel, old, KIOCB_CANCELLED);
-   } while (cancel != old);
+   kiocb_cancel_fn *cancel = kiocb->ki_cancel;
 
+   if (!cancel)
+   return -EINVAL;
+   kiocb->ki_cancel = NULL;
return cancel(&kiocb->rw);
 }
 
-- 
2.14.2
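
The locking argument in the commit message can be illustrated with a minimal userspace sketch. It assumes, as the message states, that cancellation and completion both run under the same lock; `struct req`, `req_cancel()`, `demo_cancel()` and the pthread mutex standing in for the context lock are hypothetical names, not the kernel code.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

struct req;
typedef int (*cancel_fn)(struct req *);

struct req {
	cancel_fn cancel;	/* NULL: no cancel method, or already claimed */
	int cancelled;		/* how often the cancel method actually ran */
};

/* Stand-in for the per-context lock (an assumption of this sketch). */
static pthread_mutex_t ctx_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical driver cancel method. */
static int demo_cancel(struct req *r)
{
	r->cancelled++;
	return 0;
}

/*
 * Caller holds ctx_lock, so a plain read-and-clear is enough to make
 * sure only one caller ever invokes the cancel method.
 */
static int req_cancel_locked(struct req *r)
{
	cancel_fn cancel = r->cancel;

	if (!cancel)
		return -EINVAL;
	r->cancel = NULL;
	return cancel(r);
}

static int req_cancel(struct req *r)
{
	int ret;

	pthread_mutex_lock(&ctx_lock);
	ret = req_cancel_locked(r);
	pthread_mutex_unlock(&ctx_lock);
	return ret;
}
```

A second `req_cancel()` on the same request returns `-EINVAL` because the first call already cleared the pointer; no sentinel value or cmpxchg loop is involved.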

Re: [PATCH 1/5 v4] add compression algorithm zBeWalgo

2018-03-21 Thread Benjamin Warnke

Hi Philippe,


> On 20.03.2018 at 17:30, Philippe Ombredanne wrote:
> 
> Hi Benjamin,
> 
> On Tue, Mar 20, 2018 at 7:04 AM, Benjamin Warnke
> <4bwar...@informatik.uni-hamburg.de> wrote:
>> zBeWalgo is a completely new algorithm - Currently it is not published
>> somewhere else right now, googleing it would not show up any results. The
>> following section describes how the algorithm works.
> 
> 
> 
>> diff --git a/lib/zbewalgo/zbewalgo.c b/lib/zbewalgo/zbewalgo.c
>> new file mode 100644
>> index 0..ef922bc27
>> --- /dev/null
>> +++ b/lib/zbewalgo/zbewalgo.c
>> @@ -0,0 +1,723 @@
>> +/*
>> + * Copyright (c) 2018 Benjamin Warnke <4bwar...@informatik.uni-hamburg.de>
>> + *
>> + * This program is free software; you can redistribute it and/or modify it
>> + * under the terms of the GNU General Public License version 2 as published by
>> + * the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful, but WITHOUT
>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
>> + * more details.
>> + *
>> + * You should have received a copy of the GNU General Public License along with
>> + * this program.
>> + *
> 
> Would you mind using SPDX ids [1] instead of this fine boilerplate
> here and throughout your patches?

Ok, I will use 

/* SPDX-License-Identifier: GPL-2.0 */
/*
 * Copyright (c) 2018 Benjamin Warnke <4bwar...@informatik.uni-hamburg.de>
...

at the top of my files instead of that boilerplate text. And

MODULE_LICENSE("GPL");

at the bottom of the module-files.


> 
> 
>> +MODULE_LICENSE("GPL");
>> +MODULE_DESCRIPTION("zBeWalgo Compression Algorithm");
> 
> Here your MODULE_LICENSE does not match your top level license. See
> module.h [2] for a description of values: GPL would mean "GNU Public
> License v2 or later" whereas your top level license (best expressed
> with SPDX) would mean GPL-2.0 and no other version. To avoid
> confusion, you would need to state the same thing in the
> MODULE_LICENSE and your SPDX tags.

I used the file "crypto/lz4.c" - since it is a compression algorithm too - as
an example of how to format the licensing text.
Unfortunately, that file has the same 'error'.
I fixed this error in all of my files in all patches.

Cordially
Benjamin Warnke

[PATCH 4/9] aio: sanitize ki_list handling

2018-03-21 Thread Christoph Hellwig

Instead of handcoded non-null checks always initialize ki_list to an
empty list and use list_empty / list_empty_careful on it.  While we're
at it also error out on a double call to kiocb_set_cancel_fn instead
of ignoring it.

Signed-off-by: Christoph Hellwig 
Acked-by: Jeff Moyer 
Reviewed-by: Darrick J. Wong 
---
 fs/aio.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 6295fc00f104..c32c315f05b5 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -555,13 +555,12 @@ void kiocb_set_cancel_fn(struct kiocb *iocb, kiocb_cancel_fn *cancel)
struct kioctx *ctx = req->ki_ctx;
unsigned long flags;
 
-   spin_lock_irqsave(&ctx->ctx_lock, flags);
-
-   if (!req->ki_list.next)
-   list_add(&req->ki_list, &ctx->active_reqs);
+   if (WARN_ON_ONCE(!list_empty(&req->ki_list)))
+   return;
 
+   spin_lock_irqsave(&ctx->ctx_lock, flags);
+   list_add_tail(&req->ki_list, &ctx->active_reqs);
req->ki_cancel = cancel;
-
spin_unlock_irqrestore(&ctx->ctx_lock, flags);
 }
 EXPORT_SYMBOL(kiocb_set_cancel_fn);
@@ -1034,7 +1033,7 @@ static inline struct aio_kiocb *aio_get_req(struct kioctx *ctx)
goto out_put;
 
percpu_ref_get(&ctx->reqs);
-
+   INIT_LIST_HEAD(&req->ki_list);
req->ki_ctx = ctx;
return req;
 out_put:
@@ -1080,7 +1079,7 @@ static void aio_complete(struct aio_kiocb *iocb, long res, long res2)
unsigned tail, pos, head;
unsigned long   flags;
 
-   if (iocb->ki_list.next) {
+   if (!list_empty_careful(&iocb->ki_list)) {
unsigned long flags;
 
spin_lock_irqsave(&ctx->ctx_lock, flags);
-- 
2.14.2
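
The "always initialize, then test with list_empty" idea from the commit message can be sketched in plain userspace C. This is a simplified stand-in for the kernel's list helpers, not the kernel implementation; `init_list_head()`, `list_is_empty()`, `list_add_tail_node()`, `list_del_init_node()` and `req_register()` are names made up for this sketch.

```c
#include <stdbool.h>

struct list_head {
	struct list_head *next, *prev;
};

/* An initialized, unlinked node points at itself. */
static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static bool list_is_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_add_tail_node(struct list_head *n, struct list_head *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Unlink and re-initialize, so the "is it queued?" test keeps working. */
static void list_del_init_node(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	init_list_head(n);
}

struct req {
	struct list_head ki_list;
};

/* Refuse a second registration instead of silently ignoring it. */
static bool req_register(struct req *r, struct list_head *active)
{
	if (!list_is_empty(&r->ki_list))
		return false;
	list_add_tail_node(&r->ki_list, active);
	return true;
}
```

Because an unlinked node always points at itself, "not on any list" is a well-defined state and no open-coded NULL test on `next` is needed.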

[PATCH 2/9] aio: remove an outdated comment in aio_complete

2018-03-21 Thread Christoph Hellwig

These days we don't treat sync iocbs special in the aio completion code as
they never use it.  Remove the old comment, and move the BUG_ON for a sync
iocb to the top of the function.

Signed-off-by: Christoph Hellwig 
Acked-by: Jeff Moyer 
Reviewed-by: Darrick J. Wong 
---
 fs/aio.c | 11 ++-
 1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 03d59593912d..41fc8ce6bc7f 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1088,6 +1088,8 @@ static void aio_complete(struct kiocb *kiocb, long res, long res2)
unsigned tail, pos, head;
unsigned long   flags;
 
+   BUG_ON(is_sync_kiocb(kiocb));
+
if (kiocb->ki_flags & IOCB_WRITE) {
struct file *file = kiocb->ki_filp;
 
@@ -1100,15 +1102,6 @@ static void aio_complete(struct kiocb *kiocb, long res, long res2)
file_end_write(file);
}
 
-   /*
-* Special case handling for sync iocbs:
-*  - events go directly into the iocb for fast handling
-*  - the sync task with the iocb in its stack holds the single iocb
-*ref, no other paths have a way to get another ref
-*  - the sync task helpfully left a reference to itself in the iocb
-*/
-   BUG_ON(is_sync_kiocb(kiocb));
-
if (iocb->ki_list.next) {
unsigned long flags;
 
-- 
2.14.2

Re: [PATCH 2/3] i2c: mux: pca9541: namespace cleanup

2018-03-21 Thread Peter Rosin

On 2018-03-21 07:54, Vladimir Zapolskiy wrote:
> Hi Peter,
> 
> On 03/21/2018 07:53 AM, Peter Rosin wrote:
>> On 2018-03-21 00:24, Vladimir Zapolskiy wrote:
>>> Hi Peter,
>>>
>>> On 03/20/2018 11:31 AM, Peter Rosin wrote:
>>>> In preparation for PCA9641 support, convert the mybus and busoff macros
>>>> to functions, and in the process prefix them with pca9541_. Also prefix
>>>> remaining chip specific macros with PCA9541_.
>>>>
>>>> Signed-off-by: Peter Rosin
>>>> ---
>>>>  drivers/i2c/muxes/i2c-mux-pca9541.c | 26 +++---
>>>>  1 file changed, 19 insertions(+), 7 deletions(-)
>>>>
>>>> diff --git a/drivers/i2c/muxes/i2c-mux-pca9541.c b/drivers/i2c/muxes/i2c-mux-pca9541.c
>>>> index ad168125d23d..47685eb4e0e9 100644
>>>> --- a/drivers/i2c/muxes/i2c-mux-pca9541.c
>>>> +++ b/drivers/i2c/muxes/i2c-mux-pca9541.c
>>>> @@ -59,10 +59,8 @@
>>>>  #define PCA9541_ISTAT_MYTEST  BIT(6)
>>>>  #define PCA9541_ISTAT_NMYTEST BIT(7)
>>>>
>>>> -#define BUSON (PCA9541_CTL_BUSON | PCA9541_CTL_NBUSON)
>>>> -#define MYBUS (PCA9541_CTL_MYBUS | PCA9541_CTL_NMYBUS)
>>>> -#define mybus(x)  (!((x) & MYBUS) || ((x) & MYBUS) == MYBUS)
>>>> -#define busoff(x) (!((x) & BUSON) || ((x) & BUSON) == BUSON)
>>>> +#define PCA9541_BUSON (PCA9541_CTL_BUSON | PCA9541_CTL_NBUSON)
>>>> +#define PCA9541_MYBUS (PCA9541_CTL_MYBUS | PCA9541_CTL_NMYBUS)
>>>>
>>>>  /* arbitration timeouts, in jiffies */
>>>>  #define ARB_TIMEOUT   (HZ / 8)    /* 125 ms until forcing bus ownership */
>>>> @@ -93,6 +91,20 @@ static const struct of_device_id pca9541_of_match[] = {
>>>>  MODULE_DEVICE_TABLE(of, pca9541_of_match);
>>>>  #endif
>>>>
>>>> +static int pca9541_mybus(int ctl)
>>>
>>> static inline?
>>
>> No, "inline" is only used in header files in the kernel. 
> 
> No, it is an incorrect statement, you should be aware of that.

Yeah, that was sloppy wording on my part. Let's say I meant useful
instead of used. My point is that inline is quite useless (in a C
file), the compiler will do its thing anyway. Rhetorical question:
what is the point of having both noinline and __always_inline?
Because plain old inline is overridden by the compiler, just like
the register keyword.

>> The compiler is free to inline whatever function it likes anyway, and
>> in this case we do not know better than the compiler. We don't care
> 
> That's a candidate case, when we could know better than the compiler.

Could we? Maybe for specific compilers and architectures, but
probably not for all cases. And the future is in the cards etc. And
we don't actually know even for current compilers. Also, quoting
Documentation/process/4.Coding

More recent compilers take an increasingly active role in deciding
whether a given function should actually be inlined or not.  So the
liberal placement of "inline" keywords may not just be excessive; it
could also be irrelevant.

> But "don't care" argument is still valid :)

Yes :-)

Cheers,
Peter
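
For reference, the macro-to-function conversion under discussion can be sketched as plain C. The function bodies are derived from the `mybus()` and `busoff()` macros quoted above. The bit positions of the four `PCA9541_CTL_*` flags are an assumption of this sketch, since the quoted hunk does not show them, and the local `BIT()` definition only stands in for the kernel macro.

```c
#define BIT(n)			(1 << (n))

/* Assumed bit positions; not shown in the quoted hunk. */
#define PCA9541_CTL_MYBUS	BIT(0)
#define PCA9541_CTL_NMYBUS	BIT(1)
#define PCA9541_CTL_BUSON	BIT(2)
#define PCA9541_CTL_NBUSON	BIT(3)

#define PCA9541_BUSON	(PCA9541_CTL_BUSON | PCA9541_CTL_NBUSON)
#define PCA9541_MYBUS	(PCA9541_CTL_MYBUS | PCA9541_CTL_NMYBUS)

/* Non-zero when the two MYBUS bits agree (both clear or both set). */
static int pca9541_mybus(int ctl)
{
	if (!(ctl & PCA9541_MYBUS))
		return 1;
	return (ctl & PCA9541_MYBUS) == PCA9541_MYBUS;
}

/* Non-zero when the two BUSON bits agree (both clear or both set). */
static int pca9541_busoff(int ctl)
{
	if (!(ctl & PCA9541_BUSON))
		return 1;
	return (ctl & PCA9541_BUSON) == PCA9541_BUSON;
}
```

Whether such small static helpers end up inlined is left to the compiler here, which is the position argued in the reply above.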

io_pgetevents & aio fsync

2018-03-21 Thread Christoph Hellwig

Hi all,

this patch adds workqueue based fsync offload.  Versions of this
patch have been floating around for a couple of years, but we now
have a user with seastar used by ScyllaDB (who sponsored this
work) that really wants this in addition to the aio poll support.
More details are in the patch itself.

Because the iocb types have been defined since day one (and probably
were supported by RHEL3) libaio already supports these calls as-is.

This also pulls in the aio cleanups and io_pgetevents support previously
submitted and reviewed as part of the aio poll series.  The aio poll
series will be resubmitted on top of this series.

A git tree is available here:

git://git.infradead.org/users/hch/vfs.git aio-fsync.2

Gitweb:

http://git.infradead.org/users/hch/vfs.git/shortlog/refs/heads/aio-fsync.2
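
The signal-mask argument checks that the io_pgetevents patch in this series performs before waiting can be mirrored in userspace as a rough sketch. `prepare_wait_mask()` is a hypothetical helper, and it uses the libc `sigset_t`, whose size differs from the kernel's, so it only illustrates the order of the checks: reject a wrong `sigsetsize`, copy the mask, then make sure SIGKILL and SIGSTOP can never be blocked.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stddef.h>

/*
 * Returns 1 if *out now holds a usable temporary mask, 0 if the caller
 * supplied no mask, and -EINVAL if sigsetsize does not match.
 */
static int prepare_wait_mask(const sigset_t *user, size_t sigsetsize,
			     sigset_t *out)
{
	if (!user)
		return 0;	/* no mask supplied, *out is left untouched */
	if (sigsetsize != sizeof(sigset_t))
		return -EINVAL;

	*out = *user;
	/* These two signals must always stay deliverable. */
	sigdelset(out, SIGKILL);
	sigdelset(out, SIGSTOP);
	return 1;
}
```

A caller would install the resulting mask for the duration of the wait and restore the previous one afterwards, which is the part the syscall does atomically on the caller's behalf.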

Re: [PATCH RFC 2/2] virtio_ring: support packed ring

2018-03-21 Thread Tiwei Bie

On Fri, Mar 16, 2018 at 04:30:02PM +0200, Michael S. Tsirkin wrote:
> On Fri, Mar 16, 2018 at 07:36:47PM +0800, Jason Wang wrote:
> > > > @@ -1096,17 +1599,21 @@ struct virtqueue *vring_create_virtqueue(
> > > > > > > > >   if (!queue) {
> > > > > > > > >   /* Try to get a single page. You are my only 
> > > > > > > > > hope! */
> > > > > > > > > - queue = vring_alloc_queue(vdev, vring_size(num, 
> > > > > > > > > vring_align),
> > > > > > > > > + queue = vring_alloc_queue(vdev, 
> > > > > > > > > __vring_size(num, vring_align,
> > > > > > > > > +  
> > > > > > > > > packed),
> > > > > > > > > &dma_addr, 
> > > > > > > > > GFP_KERNEL|__GFP_ZERO);
> > > > > > > > >   }
> > > > > > > > >   if (!queue)
> > > > > > > > >   return NULL;
> > > > > > > > > - queue_size_in_bytes = vring_size(num, vring_align);
> > > > > > > > > - vring_init(&vring, num, queue, vring_align);
> > > > > > > > > + queue_size_in_bytes = __vring_size(num, vring_align, 
> > > > > > > > > packed);
> > > > > > > > > + if (packed)
> > > > > > > > > + vring_packed_init(&vring.vring_packed, num, 
> > > > > > > > > queue, vring_align);
> > > > > > > > > + else
> > > > > > > > > + vring_init(&vring.vring_split, num, queue, 
> > > > > > > > > vring_align);
> > > > > > > > Let's rename vring_init to vring_init_split() like other 
> > > > > > > > helpers?
> > > > > > > The vring_init() is a public API in 
> > > > > > > include/uapi/linux/virtio_ring.h.
> > > > > > > I don't think we can rename it.
> > > > > > I see, then this need more thoughts to unify the API.
> > > > > My thought is to keep the old API as is, and introduce
> > > > > new types and helpers for packed ring.
> > > > I admit it's not a fault of this patch. But we'd better think of this 
> > > > in the
> > > > future, consider we may have new kinds of ring.
> > > > 
> > > > > More details can be found in this patch:
> > > > > https://lkml.org/lkml/2018/2/23/243
> > > > > (PS. The type which has bit fields is just for reference,
> > > > >and will be changed in next version.)
> > > > > 
> > > > > Do you have any other suggestions?
> > > > No.
> > > Hmm.. Sorry, I didn't describe my question well.
> > > I mean do you have any suggestions about the API
> > > design for packed ring in uapi header? Currently
> > > I introduced below two new helpers:
> > > 
> > > static inline void vring_packed_init(struct vring_packed *vr, unsigned 
> > > int num,
> > >void *p, unsigned long align);
> > > static inline unsigned vring_packed_size(unsigned int num, unsigned long 
> > > align);
> > > 
> > > When new rings are introduced in the future, above
> > > helpers can't be reused. Maybe we should make the
> > > helpers be able to determine the ring type?
> > 
> > Let's wait for Michael's comment here. Generally, I fail to understand why
> > vring_init() become a part of uapi. Git grep shows the only use cases are
> > virtio_test/vringh_test.
> > 
> > Thanks
> 
> For init - I think it's a mistake that stems from lguest which sometimes
> made it less than obvious which code is where.  I don't see a reason to
> add to it.

Got it! I'll move vring_packed_init() out of uapi. Many thanks! :)

Best regards,
Tiwei Bie

> 
> -- 
> MST

Re: [PATCH 15/19] csky: Build infrastructure

2018-03-21 Thread Arnd Bergmann

On Tue, Mar 20, 2018 at 9:13 PM, Guo Ren  wrote:
> Hi arnd,
>
> On Mon, Mar 19, 2018 at 11:45:23PM +0800, Arnd Bergmann wrote:
>> Does your architecture provide a reliable high-resolution clocksource?
>> If yes, you
>> could use that for the delay, rather than a calibrated loop.
> Currently, all boards have clocksource drivers and the resolution depends on
> the SOC.
> I'll try to remove it.

If the clocksource depends on a driver rather than a feature of the
architecture, this may not be worth optimizing though, so maybe leave
it as it is for now.

>> Usually the kernel should allow multiple CPU types to be selected
>> together, or ask for a "minimum architecture" level to be selected
>> by allow newer cores to be used as a superset.
> No, I need to keep them separate.

Can you explain? What is it that makes them all incompatible?

>> > +config CPU_TLB_SIZE
>> > +   int
>> > +   default "128"   if(CPU_CK610 || CPU_CK807 || CPU_CK810)
>> > +   default "1024"  if(CPU_CK860)
>> > +
>> > +config L1_CACHE_SHIFT
>> > +   int
>> > +   default "4" if(CPU_CK610)
>> > +   default "5" if(CPU_CK807 || CPU_CK810)
>> > +   default "6" if(CPU_CK860)
>>
>> I think you then need to reverse the order of the list here: When e.g. CK860
>> and CK810 are both enabled, L1_CACHE_SHIFT should be the largest
>> possible size.
> No, I use L1_CACHE_SHIFT to determine the size of cache_line.
> When I flush cache for a range of memory, I need the size to loop flush cache 
> line.

This is still relatively easy to fix, you just need a cpu specific loop
that uses the actual line size rather than the maximum size.
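
A rough sketch of such a loop, assuming the line size is known per CPU at run time and is a power of two. `flush_range()`, `flush_line_fn` and `flush_line_nop()` are hypothetical names; the no-op callback stands in for whatever instruction the CPU uses to flush one line.

```c
#include <stddef.h>
#include <stdint.h>

typedef void (*flush_line_fn)(uintptr_t addr);

/* Placeholder for the CPU-specific "flush one cache line" operation. */
static void flush_line_nop(uintptr_t addr)
{
	(void)addr;
}

/*
 * Flush every cache line overlapping [start, end), stepping by the
 * actual line size of the running CPU instead of a compile-time maximum.
 * Returns the number of lines touched.
 */
static size_t flush_range(uintptr_t start, uintptr_t end, size_t line_size,
			  flush_line_fn flush)
{
	uintptr_t addr = start & ~((uintptr_t)line_size - 1);
	size_t lines = 0;

	for (; addr < end; addr += line_size) {
		flush(addr);
		lines++;
	}
	return lines;
}
```

With this shape, a compile-time L1_CACHE_SHIFT only has to be an upper bound for alignment purposes, while the flush loop itself uses the real line size.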

>> > +config SSEG0_BASE
>> > +   hex "Direct mapping physical address"
>> > +   default 0x0
>> > +   help
>> > + There are MSAx regs can be used to change the base physical 
>> > address
>> > + of direct mapping. The default base physical address is 0x0.
>> > +
>> > +config RAM_BASE
>> > +   hex "DRAM base address offset from SSEG0_BASE, it must be the same 
>> > with dts memory."
>> > +   default 0x0800
>>
>> To allow one kernel to run on multiple boards, it's better to detect
>> these two at runtime.
> CK-CPUs have a mips-like direct-mapping, and I use the macros to calculate 
> the virtual-addr
> in headers.

On many architectures, we detect the offsets at boot time and pass
them as variables. On ARM, we go as far as patching the kernel at boot
time to have constant offsets, but usually it's not worth the effort.

>> > +config CSKY_NR_IRQS
>> > +   int "NR_IRQS to max virtual interrupt numbers of the whole system"
>> > +   range 64 8192
>> > +   default "128"
>> > +endmenu
>>
>> This should no longer be needed, with the IRQ domain code, any number
>> of interrupts
>> can be used without noticeable overhead.
> Not I use it, some of our users need it to expand the GPIO irqs. Because
> they don't use irq domain code properly. I move it to Kconfig.debug, OK?

It sounds like your GPIO driver should get fixed to use irq domains right,
it should not be too hard. The number of GPIOs is typically a compile
time constant today, but we also try to turn it into a dynamic allocation
that we have for IRQs on most targets.

>> > +config CSKY_BUILTIN_DTB
>> > +   bool "Use kernel builtin dtb"
>> > +   default n
>> > +
>> > +config CSKY_BUILTIN_DTB_NAME
>> > +   string "kernel builtin dtb name"
>> > +   depends on CSKY_BUILTIN_DTB
>>
>> It's generally better not to use a builtin dtb, but use the bootloader
>> to pass a dtb.
>>
>> If you need to support existing bootloaders, the best way is to allow
>> appending the dtb to the kernel.
> Most of our boards use bootloader to pass the dtb, but Hangzhou
> Nationalchip want dtb compiled in the vmlinux. So I keep it in
> Kconfig.debug.

What I meant here is that you can get the same behavior by
appending the dtb to the kernel rather than linking it into the
kernel. The reason for preferring the appended one is that you
can more easily use the same kernel binary across boards with
different bootloaders.

>> > +ifeq ($(VERSION)_$(PATCHLEVEL), 4_9)
>> > +COMPAT_KERNEL_4_9 = -DCOMPAT_KERNEL_4_9
>> > +endif
>>
>> Should not be needed
> May I keep it? It's a very internal macro for arch/csky and I can
> maintain the linux-4.9 together.

I'd say it's better to get rid of it for the upstream port, more importantly
getting rid of the code that checks for this symbol. Usually what happens
with version checks like this one is that they get out of sync quickly
as a new kernel version does things differently and diverges more
from the old release you were comparing against. In device drivers,
we tend to remove all those checks.

>> -fno-tree-dse?
> This is from "gcc-4.5 compile linux-4.7" and it will cause wrong code without
> -fno-tree-dse for list.h. Now we use gcc-6.3, so I will try to remove it.

You can also use the cc-ifversion Makefile macro to apply it on
the old compiler. That way you can still use gc

Re: [PATCH] [RFC] drm: rcar-du: keep temporary dtb files around during build

2018-03-21 Thread Arnd Bergmann

On Tue, Mar 20, 2018 at 9:15 PM, Laurent Pinchart
 wrote:
> Hi Arnd,
>
> On Friday, 16 March 2018 10:25:25 EET Arnd Bergmann wrote:
>> On Fri, Mar 16, 2018 at 2:39 AM,   wrote:
>> > On Thursday, March 15, 2018 8:37 AM, Arnd Bergmann wrote:
>> >> The *.dtb and *.dtb.S files get removed by 'make' during the build
>> >> process,
>> >> and later seem to be missed during the 'modpost' stage:
>> >>
>> >> rm drivers/gpu/drm/rcar-du/rcar_du_of_lvds_r8a7795.dtb
>> >> drivers/gpu/drm/rcar-du/rcar_du_of_lvds_r8a7791.dtb
>> >> drivers/gpu/drm/rcar-du/rcar_du_of_lvds_r8a7791.dtb.S
>> >> drivers/gpu/drm/rcar-du/rcar_du_of_lvds_r8a7795.dtb.S
>> >> drivers/gpu/drm/rcar-du/rcar_du_of_lvds_r8a7790.dtb.S
>> >> drivers/gpu/drm/rcar-du/rcar_du_of_lvds_r8a7793.dtb
>> >> drivers/gpu/drm/rcar-du/rcar_du_of_lvds_r8a7796.dtb
>> >> drivers/gpu/drm/rcar-du/rcar_du_of_lvds_r8a7790.dtb
>> >> drivers/gpu/drm/rcar-du/rcar_du_of_lvds_r8a7796.dtb.S
>> >> drivers/gpu/drm/rcar-du/rcar_du_of_lvds_r8a7793.dtb.S
>> >> WARNING: could not open
>> >> drivers/gpu/drm/rcar-du/rcar_du_of_lvds_r8a7790.dtb.S: No such file or
>> >> directory
>> >>
>> >> As a workaround, this adds all those files to the 'extra-y' target list,
>> >> but that's really ugly. Any ideas for a better fix?
>> >
>> > Does this work for you (untested, but the way it is done in
>> > drivers/of/unittest-data/Makefile):
>> >
>> > .PRECIOUS: \
>> >
>> > $(obj)/%.dtb.S \
>> > $(obj)/%.dtb
>>
>> Yes, that works and looks much better than my version.
>
> Thank you for your patch, and sorry for breaking the build. Do you plan to
> submit a new version based on Frank's approach ?

I'm currently at Linaro Connect and won't be able to send a tested patch
before mid next week. If you want it earlier, feel free to apply that patch
with my original description and 'Reported-by: Arnd Bergmann '.

Arnd

aio poll and a new in-kernel poll API V6

2018-03-21 Thread Christoph Hellwig

Hi all,

this series adds support for the IOCB_CMD_POLL operation to poll for the
readiness of file descriptors using the aio subsystem.  The API is based
on patches that existed in RHAS2.1 and RHEL3, which means it already is
supported by libaio.  To implement the poll support efficiently new
methods to poll are introduced in struct file_operations:  get_poll_head
and poll_mask.  The first one returns a wait_queue_head to wait on
(lifetime is bound by the file), and the second does a non-blocking
check for the POLL* events.  This allows aio poll to work without
any additional context switches, unlike epoll.

This series sits on top of the aio-fsync series that also includes
support for io_pgetevents.

The changes were sponsored by Scylladb, and improve performance
of the seastar framework up to 10%, while also removing the need
for a privileged SCHED_FIFO epoll listener thread.

git://git.infradead.org/users/hch/vfs.git aio-poll.6

Gitweb:

http://git.infradead.org/users/hch/vfs.git/shortlog/refs/heads/aio-poll.6

Libaio changes:

https://pagure.io/libaio.git io-poll

Seastar changes (not updated for the new io_pgetevents ABI yet):

https://github.com/avikivity/seastar/commits/aio

Changes since V5:
 - small changelog updates
 - rebased on top of the aio-fsync changes

Changes since V4:
 - rebased on top of Linux 4.16-rc4

Changes since V3:
 - remove the pre-sleep ->poll_mask call in vfs_poll,
   allow ->get_poll_head to return POLL* values.

Changes since V2:
 - removed a double initialization
 - new vfs_get_poll_head helper
 - document that ->get_poll_head can return NULL
 - call ->poll_mask before sleeping
 - various ACKs
 - add conversion of random to ->poll_mask
 - add conversion of af_alg to ->poll_mask
 - lacking ->poll_mask support now returns -EINVAL for IOCB_CMD_POLL
 - reshuffled the series so that prep patches and everything not
   requiring the new in-kernel poll API is in the beginning

Changes since V1:
 - handle the NULL ->poll case in vfs_poll
 - dropped the file argument to the ->poll_mask socket operation
 - replace the ->pre_poll socket operation with ->get_poll_head as
   in the file operations
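
The split described in the cover letter, one method that returns the wait queue and one that does a non-blocking readiness check, can be modelled in userspace roughly as follows. Everything here is a simplified stand-in: the `demo_*` names, `poll_mask_t`, the `waiters` counter and the return convention of `demo_poll_once()` are assumptions of this sketch and not the kernel API.

```c
#include <stddef.h>

typedef unsigned int poll_mask_t;
#define DEMO_POLLIN	0x1u
#define DEMO_POLLOUT	0x4u

struct wait_queue_head {
	int waiters;	/* stand-in for a real list of waiters */
};

struct file;

struct file_ops {
	struct wait_queue_head *(*get_poll_head)(struct file *f,
						 poll_mask_t events);
	poll_mask_t (*poll_mask)(struct file *f, poll_mask_t events);
};

struct file {
	const struct file_ops *ops;
	struct wait_queue_head wq;
	poll_mask_t ready;	/* events currently signalled */
};

static struct wait_queue_head *demo_get_poll_head(struct file *f,
						  poll_mask_t events)
{
	(void)events;
	return &f->wq;
}

static poll_mask_t demo_poll_mask(struct file *f, poll_mask_t events)
{
	return f->ready & events;
}

static const struct file_ops demo_ops = {
	demo_get_poll_head,
	demo_poll_mask,
};

/*
 * One poll attempt: returns -1 if the file lacks the two methods,
 * otherwise 0 with *out set to the ready events. If nothing is ready,
 * the caller is recorded on the wait queue instead of sleeping here.
 */
static int demo_poll_once(struct file *f, poll_mask_t events,
			  poll_mask_t *out)
{
	struct wait_queue_head *head;

	if (!f->ops->get_poll_head || !f->ops->poll_mask)
		return -1;
	head = f->ops->get_poll_head(f, events);
	if (!head)
		return -1;

	*out = f->ops->poll_mask(f, events);
	if (!*out)
		head->waiters++;
	return 0;
}
```

Keeping the readiness check free of any waiting logic is what lets a caller such as aio poll run it directly from a wakeup callback.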

[PATCH 03/28] fs: update documentation to mention __poll_t

2018-03-21 Thread Christoph Hellwig

Signed-off-by: Christoph Hellwig 
---
 Documentation/filesystems/Locking | 2 +-
 Documentation/filesystems/vfs.txt | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index 75d2d57e2c44..220bba28f72b 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -439,7 +439,7 @@ prototypes:
ssize_t (*read_iter) (struct kiocb *, struct iov_iter *);
ssize_t (*write_iter) (struct kiocb *, struct iov_iter *);
int (*iterate) (struct file *, struct dir_context *);
-   unsigned int (*poll) (struct file *, struct poll_table_struct *);
+   __poll_t (*poll) (struct file *, struct poll_table_struct *);
long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
int (*mmap) (struct file *, struct vm_area_struct *);
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index 5fd325df59e2..f608180ad59d 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -856,7 +856,7 @@ struct file_operations {
ssize_t (*read_iter) (struct kiocb *, struct iov_iter *);
ssize_t (*write_iter) (struct kiocb *, struct iov_iter *);
int (*iterate) (struct file *, struct dir_context *);
-   unsigned int (*poll) (struct file *, struct poll_table_struct *);
+   __poll_t (*poll) (struct file *, struct poll_table_struct *);
long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
int (*mmap) (struct file *, struct vm_area_struct *);
-- 
2.14.2

[PATCH 14/28] net/atm: convert to ->poll_mask

2018-03-21 Thread Christoph Hellwig

Signed-off-by: Christoph Hellwig 
---
 net/atm/common.c | 11 +++
 net/atm/common.h |  2 +-
 net/atm/pvc.c|  2 +-
 net/atm/svc.c|  2 +-
 4 files changed, 6 insertions(+), 11 deletions(-)

diff --git a/net/atm/common.c b/net/atm/common.c
index fc78a0508ae1..1f2af59935db 100644
--- a/net/atm/common.c
+++ b/net/atm/common.c
@@ -648,16 +648,11 @@ int vcc_sendmsg(struct socket *sock, struct msghdr *m, size_t size)
return error;
 }
 
-__poll_t vcc_poll(struct file *file, struct socket *sock, poll_table *wait)
+__poll_t vcc_poll_mask(struct socket *sock, __poll_t events)
 {
struct sock *sk = sock->sk;
-   struct atm_vcc *vcc;
-   __poll_t mask;
-
-   sock_poll_wait(file, sk_sleep(sk), wait);
-   mask = 0;
-
-   vcc = ATM_SD(sock);
+   struct atm_vcc *vcc = ATM_SD(sock);
+   __poll_t mask = 0;
 
/* exceptional events */
if (sk->sk_err)
diff --git a/net/atm/common.h b/net/atm/common.h
index 5850649068bb..526796ad230f 100644
--- a/net/atm/common.h
+++ b/net/atm/common.h
@@ -17,7 +17,7 @@ int vcc_connect(struct socket *sock, int itf, short vpi, int vci);
 int vcc_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
int flags);
 int vcc_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len);
-__poll_t vcc_poll(struct file *file, struct socket *sock, poll_table *wait);
+__poll_t vcc_poll_mask(struct socket *sock, __poll_t events);
 int vcc_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg);
 int vcc_compat_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg);
 int vcc_setsockopt(struct socket *sock, int level, int optname,
diff --git a/net/atm/pvc.c b/net/atm/pvc.c
index e1140b3bdcaa..930651c5e77c 100644
--- a/net/atm/pvc.c
+++ b/net/atm/pvc.c
@@ -114,7 +114,7 @@ static const struct proto_ops pvc_proto_ops = {
.socketpair =   sock_no_socketpair,
.accept =   sock_no_accept,
.getname =  pvc_getname,
-   .poll = vcc_poll,
+   .poll_mask = vcc_poll_mask,
.ioctl = vcc_ioctl,
 #ifdef CONFIG_COMPAT
.compat_ioctl = vcc_compat_ioctl,
diff --git a/net/atm/svc.c b/net/atm/svc.c
index c458adcbc177..ad0e6ffb9cfe 100644
--- a/net/atm/svc.c
+++ b/net/atm/svc.c
@@ -637,7 +637,7 @@ static const struct proto_ops svc_proto_ops = {
.socketpair =   sock_no_socketpair,
.accept =   svc_accept,
.getname =  svc_getname,
-   .poll = vcc_poll,
+   .poll_mask = vcc_poll_mask,
.ioctl = svc_ioctl,
 #ifdef CONFIG_COMPAT
.compat_ioctl = svc_compat_ioctl,
-- 
2.14.2

[PATCH 08/28] net: add support for ->poll_mask in proto_ops

2018-03-21 Thread Christoph Hellwig

The socket file operations still implement ->poll until all protocols are
switched over.

Signed-off-by: Christoph Hellwig 
---
 include/linux/net.h |  3 +++
 net/socket.c| 51 ++-
 2 files changed, 49 insertions(+), 5 deletions(-)

diff --git a/include/linux/net.h b/include/linux/net.h
index 91216b16feb7..ce3d4dacb51e 100644
--- a/include/linux/net.h
+++ b/include/linux/net.h
@@ -147,6 +147,9 @@ struct proto_ops {
int (*getname)   (struct socket *sock,
  struct sockaddr *addr,
  int *sockaddr_len, int peer);
+   struct wait_queue_head *(*get_poll_head)(struct socket *sock,
+ __poll_t events);
+   __poll_t (*poll_mask) (struct socket *sock, __poll_t events);
__poll_t (*poll)  (struct file *file, struct socket *sock,
  struct poll_table_struct *wait);
int (*ioctl) (struct socket *sock, unsigned int cmd,
diff --git a/net/socket.c b/net/socket.c
index 3f859a07641a..ceb69ddcd7bd 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -118,8 +118,10 @@ static ssize_t sock_write_iter(struct kiocb *iocb, struct iov_iter *from);
 static int sock_mmap(struct file *file, struct vm_area_struct *vma);
 
 static int sock_close(struct inode *inode, struct file *file);
-static __poll_t sock_poll(struct file *file,
- struct poll_table_struct *wait);
+static struct wait_queue_head *sock_get_poll_head(struct file *file,
+   __poll_t events);
+static __poll_t sock_poll_mask(struct file *file, __poll_t);
+static __poll_t sock_poll(struct file *file, struct poll_table_struct *wait);
 static long sock_ioctl(struct file *file, unsigned int cmd, unsigned long arg);
 #ifdef CONFIG_COMPAT
 static long compat_sock_ioctl(struct file *file,
@@ -142,6 +144,8 @@ static const struct file_operations socket_file_ops = {
.llseek =   no_llseek,
.read_iter = sock_read_iter,
.write_iter =   sock_write_iter,
+   .get_poll_head = sock_get_poll_head,
+   .poll_mask = sock_poll_mask,
.poll = sock_poll,
.unlocked_ioctl = sock_ioctl,
 #ifdef CONFIG_COMPAT
@@ -1114,14 +1118,51 @@ int sock_create_lite(int family, int type, int protocol, struct socket **res)
 }
 EXPORT_SYMBOL(sock_create_lite);
 
+static struct wait_queue_head *sock_get_poll_head(struct file *file,
+   __poll_t events)
+{
+   struct socket *sock = file->private_data;
+
+   if (!sock->ops->poll_mask)
+   return NULL;
+   if (sock->ops->get_poll_head)
+   return sock->ops->get_poll_head(sock, events);
+
+   sock_poll_busy_loop(sock, events);
+   return sk_sleep(sock->sk);
+}
+
+static __poll_t sock_poll_mask(struct file *file, __poll_t events)
+{
+   struct socket *sock = file->private_data;
+
+   /*
+* We need to be sure we are in sync with the socket flags modification.
+*
+* This memory barrier is paired in the wq_has_sleeper.
+*/
+   smp_mb();
+
+   /* this socket can poll_ll so tell the system call */
+   return sock->ops->poll_mask(sock, events) |
+   (sk_can_busy_loop(sock->sk) ? POLL_BUSY_LOOP : 0);
+}
+
 /* No kernel lock held - perfect */
 static __poll_t sock_poll(struct file *file, poll_table *wait)
 {
struct socket *sock = file->private_data;
-   __poll_t events = poll_requested_events(wait);
+   __poll_t events = poll_requested_events(wait), mask = 0;
 
-   sock_poll_busy_loop(sock, events);
-   return sock->ops->poll(file, sock, wait) | sock_poll_busy_flag(sock);
+   if (sock->ops->poll) {
+   sock_poll_busy_loop(sock, events);
+   mask = sock->ops->poll(file, sock, wait);
+   } else if (sock->ops->poll_mask) {
+   sock_poll_wait(file, sock_get_poll_head(file, events), wait);
+   mask = sock->ops->poll_mask(sock, events);
+   }
+
+   return mask | sock_poll_busy_flag(sock);
 }
 
 static int sock_mmap(struct file *file, struct vm_area_struct *vma)
-- 
2.14.2

[PATCH 10/28] net/tcp: convert to ->poll_mask

2018-03-21 Thread Christoph Hellwig

Signed-off-by: Christoph Hellwig 
---
 include/net/tcp.h   |  4 ++--
 net/ipv4/af_inet.c  |  3 ++-
 net/ipv4/tcp.c  | 31 ++-
 net/ipv6/af_inet6.c |  3 ++-
 4 files changed, 20 insertions(+), 21 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index e3fc667f9ac2..fb52f93d556c 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -387,8 +387,8 @@ bool tcp_peer_is_proven(struct request_sock *req, struct dst_entry *dst);
 void tcp_close(struct sock *sk, long timeout);
 void tcp_init_sock(struct sock *sk);
 void tcp_init_transfer(struct sock *sk, int bpf_op);
-__poll_t tcp_poll(struct file *file, struct socket *sock,
- struct poll_table_struct *wait);
+struct wait_queue_head *tcp_get_poll_head(struct socket *sock, __poll_t events);
+__poll_t tcp_poll_mask(struct socket *sock, __poll_t events);
 int tcp_getsockopt(struct sock *sk, int level, int optname,
   char __user *optval, int __user *optlen);
 int tcp_setsockopt(struct sock *sk, int level, int optname,
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index e4329e161943..ec32cc263b18 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -952,7 +952,8 @@ const struct proto_ops inet_stream_ops = {
.socketpair= sock_no_socketpair,
.accept= inet_accept,
.getname   = inet_getname,
-   .poll  = tcp_poll,
+   .get_poll_head = tcp_get_poll_head,
+   .poll_mask = tcp_poll_mask,
.ioctl = inet_ioctl,
.listen= inet_listen,
.shutdown  = inet_shutdown,
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 48636aee23c3..ad8e281066a0 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -484,33 +484,30 @@ static void tcp_tx_timestamp(struct sock *sk, u16 tsflags)
}
 }
 
+struct wait_queue_head *tcp_get_poll_head(struct socket *sock, __poll_t events)
+{
+   sock_poll_busy_loop(sock, events);
+   sock_rps_record_flow(sock->sk);
+   return sk_sleep(sock->sk);
+}
+EXPORT_SYMBOL(tcp_get_poll_head);
+
 /*
- * Wait for a TCP event.
- *
- * Note that we don't need to lock the socket, as the upper poll layers
- * take care of normal races (between the test and the event) and we don't
- * go look at any of the socket buffers directly.
+ * Socket is not locked. We are protected from async events by poll logic and
+ * correct handling of state changes made by other threads is impossible in
+ * any case.
  */
-__poll_t tcp_poll(struct file *file, struct socket *sock, poll_table *wait)
+__poll_t tcp_poll_mask(struct socket *sock, __poll_t events)
 {
-   __poll_t mask;
struct sock *sk = sock->sk;
const struct tcp_sock *tp = tcp_sk(sk);
+   __poll_t mask = 0;
int state;
 
-   sock_poll_wait(file, sk_sleep(sk), wait);
-
state = inet_sk_state_load(sk);
if (state == TCP_LISTEN)
return inet_csk_listen_poll(sk);
 
-   /* Socket is not locked. We are protected from async events
-* by poll logic and correct handling of state changes
-* made by other threads is impossible in any case.
-*/
-
-   mask = 0;
-
/*
 * EPOLLHUP is certainly not done right. But poll() doesn't
 * have a notion of HUP in just one direction, and for a
@@ -591,7 +588,7 @@ __poll_t tcp_poll(struct file *file, struct socket *sock, poll_table *wait)
 
return mask;
 }
-EXPORT_SYMBOL(tcp_poll);
+EXPORT_SYMBOL(tcp_poll_mask);
 
 int tcp_ioctl(struct sock *sk, int cmd, unsigned long arg)
 {
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 416917719a6f..c470549d6ef9 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -547,7 +547,8 @@ const struct proto_ops inet6_stream_ops = {
.socketpair= sock_no_socketpair,/* a do nothing */
.accept= inet_accept,   /* ok   */
.getname   = inet6_getname,
-   .poll  = tcp_poll,  /* ok   */
+   .get_poll_head = tcp_get_poll_head,
+   .poll_mask = tcp_poll_mask, /* ok   */
.ioctl = inet6_ioctl,   /* must change  */
.listen= inet_listen,   /* ok   */
.shutdown  = inet_shutdown, /* ok   */
-- 
2.14.2

[PATCH 17/28] net/sctp: convert to ->poll_mask

2018-03-21 Thread Christoph Hellwig

Signed-off-by: Christoph Hellwig 
---
 include/net/sctp/sctp.h | 3 +--
 net/sctp/ipv6.c | 2 +-
 net/sctp/protocol.c | 2 +-
 net/sctp/socket.c   | 4 +---
 4 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
index f7ae6b0a21d0..37abd5ba4a3f 100644
--- a/include/net/sctp/sctp.h
+++ b/include/net/sctp/sctp.h
@@ -107,8 +107,7 @@ int sctp_backlog_rcv(struct sock *sk, struct sk_buff *skb);
 int sctp_inet_listen(struct socket *sock, int backlog);
 void sctp_write_space(struct sock *sk);
 void sctp_data_ready(struct sock *sk);
-__poll_t sctp_poll(struct file *file, struct socket *sock,
-   poll_table *wait);
+__poll_t sctp_poll_mask(struct socket *sock, __poll_t events);
 void sctp_sock_rfree(struct sk_buff *skb);
 void sctp_copy_sock(struct sock *newsk, struct sock *sk,
struct sctp_association *asoc);
diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c
index e35d4f73d2df..6b0b8fc5b75a 100644
--- a/net/sctp/ipv6.c
+++ b/net/sctp/ipv6.c
@@ -976,7 +976,7 @@ static const struct proto_ops inet6_seqpacket_ops = {
.socketpair= sock_no_socketpair,
.accept= inet_accept,
.getname   = sctp_getname,
-   .poll  = sctp_poll,
+   .poll_mask = sctp_poll_mask,
.ioctl = inet6_ioctl,
.listen= sctp_inet_listen,
.shutdown  = inet_shutdown,
diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
index 91813e686c67..20c544890e80 100644
--- a/net/sctp/protocol.c
+++ b/net/sctp/protocol.c
@@ -1024,7 +1024,7 @@ static const struct proto_ops inet_seqpacket_ops = {
.socketpair= sock_no_socketpair,
.accept= inet_accept,
.getname   = inet_getname,  /* Semantics are different.  */
-   .poll  = sctp_poll,
+   .poll_mask = sctp_poll_mask,
.ioctl = inet_ioctl,
.listen= sctp_inet_listen,
.shutdown  = inet_shutdown, /* Looks harmless.  */
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index bf271f8c2dc9..097454740929 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -7587,14 +7587,12 @@ int sctp_inet_listen(struct socket *sock, int backlog)
  * here, again, by modeling the current TCP/UDP code.  We don't have
  * a good way to test with it yet.
  */
-__poll_t sctp_poll(struct file *file, struct socket *sock, poll_table *wait)
+__poll_t sctp_poll_mask(struct socket *sock, __poll_t events)
 {
struct sock *sk = sock->sk;
struct sctp_sock *sp = sctp_sk(sk);
__poll_t mask;
 
-   poll_wait(file, sk_sleep(sk), wait);
-
sock_rps_record_flow(sk);
 
/* A TCP-style listening socket becomes readable when the accept queue
-- 
2.14.2

[PATCH 22/28] net/iucv: convert to ->poll_mask

2018-03-21 Thread Christoph Hellwig

Signed-off-by: Christoph Hellwig 
---
 include/net/iucv/af_iucv.h | 2 --
 net/iucv/af_iucv.c | 7 ++-
 2 files changed, 2 insertions(+), 7 deletions(-)

diff --git a/include/net/iucv/af_iucv.h b/include/net/iucv/af_iucv.h
index f4c21b5a1242..b0eaeb02d46d 100644
--- a/include/net/iucv/af_iucv.h
+++ b/include/net/iucv/af_iucv.h
@@ -153,8 +153,6 @@ struct iucv_sock_list {
atomic_t  autobind_name;
 };
 
-__poll_t iucv_sock_poll(struct file *file, struct socket *sock,
-   poll_table *wait);
 void iucv_sock_link(struct iucv_sock_list *l, struct sock *s);
 void iucv_sock_unlink(struct iucv_sock_list *l, struct sock *s);
 void iucv_accept_enqueue(struct sock *parent, struct sock *sk);
diff --git a/net/iucv/af_iucv.c b/net/iucv/af_iucv.c
index 1e8cc7bcbca3..539a312dc481 100644
--- a/net/iucv/af_iucv.c
+++ b/net/iucv/af_iucv.c
@@ -1489,14 +1489,11 @@ static inline __poll_t iucv_accept_poll(struct sock *parent)
return 0;
 }
 
-__poll_t iucv_sock_poll(struct file *file, struct socket *sock,
-   poll_table *wait)
+static __poll_t iucv_sock_poll_mask(struct socket *sock, __poll_t events)
 {
struct sock *sk = sock->sk;
__poll_t mask = 0;
 
-   sock_poll_wait(file, sk_sleep(sk), wait);
-
if (sk->sk_state == IUCV_LISTEN)
return iucv_accept_poll(sk);
 
@@ -2389,7 +2386,7 @@ static const struct proto_ops iucv_sock_ops = {
.getname= iucv_sock_getname,
.sendmsg= iucv_sock_sendmsg,
.recvmsg= iucv_sock_recvmsg,
-   .poll   = iucv_sock_poll,
+   .poll_mask  = iucv_sock_poll_mask,
.ioctl  = sock_no_ioctl,
.mmap   = sock_no_mmap,
.socketpair = sock_no_socketpair,
-- 
2.14.2

[PATCH 24/28] crypto: af_alg: convert to ->poll_mask

2018-03-21 Thread Christoph Hellwig

Signed-off-by: Christoph Hellwig 
---
 crypto/af_alg.c | 13 +++--
 crypto/algif_aead.c |  4 ++--
 crypto/algif_skcipher.c |  4 ++--
 include/crypto/if_alg.h |  3 +--
 4 files changed, 8 insertions(+), 16 deletions(-)

diff --git a/crypto/af_alg.c b/crypto/af_alg.c
index 50d75de539f5..330aef1cd08b 100644
--- a/crypto/af_alg.c
+++ b/crypto/af_alg.c
@@ -1060,19 +1060,12 @@ void af_alg_async_cb(struct crypto_async_request *_req, int err)
 }
 EXPORT_SYMBOL_GPL(af_alg_async_cb);
 
-/**
- * af_alg_poll - poll system call handler
- */
-__poll_t af_alg_poll(struct file *file, struct socket *sock,
-poll_table *wait)
+__poll_t af_alg_poll_mask(struct socket *sock, __poll_t events)
 {
struct sock *sk = sock->sk;
struct alg_sock *ask = alg_sk(sk);
struct af_alg_ctx *ctx = ask->private;
-   __poll_t mask;
-
-   sock_poll_wait(file, sk_sleep(sk), wait);
-   mask = 0;
+   __poll_t mask = 0;
 
if (!ctx->more || ctx->used)
mask |= EPOLLIN | EPOLLRDNORM;
@@ -1082,7 +1075,7 @@ __poll_t af_alg_poll(struct file *file, struct socket *sock,
 
return mask;
 }
-EXPORT_SYMBOL_GPL(af_alg_poll);
+EXPORT_SYMBOL_GPL(af_alg_poll_mask);
 
 /**
  * af_alg_alloc_areq - allocate struct af_alg_async_req
diff --git a/crypto/algif_aead.c b/crypto/algif_aead.c
index 4b07edd5a9ff..330cf9f2b767 100644
--- a/crypto/algif_aead.c
+++ b/crypto/algif_aead.c
@@ -375,7 +375,7 @@ static struct proto_ops algif_aead_ops = {
.sendmsg=   aead_sendmsg,
.sendpage   =   af_alg_sendpage,
.recvmsg=   aead_recvmsg,
-   .poll   =   af_alg_poll,
+   .poll_mask  =   af_alg_poll_mask,
 };
 
 static int aead_check_key(struct socket *sock)
@@ -471,7 +471,7 @@ static struct proto_ops algif_aead_ops_nokey = {
.sendmsg=   aead_sendmsg_nokey,
.sendpage   =   aead_sendpage_nokey,
.recvmsg=   aead_recvmsg_nokey,
-   .poll   =   af_alg_poll,
+   .poll_mask  =   af_alg_poll_mask,
 };
 
 static void *aead_bind(const char *name, u32 type, u32 mask)
diff --git a/crypto/algif_skcipher.c b/crypto/algif_skcipher.c
index c4e885df4564..15cf3c5222e0 100644
--- a/crypto/algif_skcipher.c
+++ b/crypto/algif_skcipher.c
@@ -205,7 +205,7 @@ static struct proto_ops algif_skcipher_ops = {
.sendmsg=   skcipher_sendmsg,
.sendpage   =   af_alg_sendpage,
.recvmsg=   skcipher_recvmsg,
-   .poll   =   af_alg_poll,
+   .poll_mask  =   af_alg_poll_mask,
 };
 
 static int skcipher_check_key(struct socket *sock)
@@ -301,7 +301,7 @@ static struct proto_ops algif_skcipher_ops_nokey = {
.sendmsg=   skcipher_sendmsg_nokey,
.sendpage   =   skcipher_sendpage_nokey,
.recvmsg=   skcipher_recvmsg_nokey,
-   .poll   =   af_alg_poll,
+   .poll_mask  =   af_alg_poll_mask,
 };
 
 static void *skcipher_bind(const char *name, u32 type, u32 mask)
diff --git a/include/crypto/if_alg.h b/include/crypto/if_alg.h
index 482461d8931d..cc414db9da0a 100644
--- a/include/crypto/if_alg.h
+++ b/include/crypto/if_alg.h
@@ -245,8 +245,7 @@ ssize_t af_alg_sendpage(struct socket *sock, struct page *page,
int offset, size_t size, int flags);
 void af_alg_free_resources(struct af_alg_async_req *areq);
 void af_alg_async_cb(struct crypto_async_request *_req, int err);
-__poll_t af_alg_poll(struct file *file, struct socket *sock,
-poll_table *wait);
+__poll_t af_alg_poll_mask(struct socket *sock, __poll_t events);
 struct af_alg_async_req *af_alg_alloc_areq(struct sock *sk,
   unsigned int areqlen);
 int af_alg_get_rsgl(struct sock *sk, struct msghdr *msg, int flags,
-- 
2.14.2

[PATCH 26/28] eventfd: switch to ->poll_mask

2018-03-21 Thread Christoph Hellwig

Signed-off-by: Christoph Hellwig 
---
 fs/eventfd.c | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/fs/eventfd.c b/fs/eventfd.c
index 012f5bd46dfa..d70b4907f978 100644
--- a/fs/eventfd.c
+++ b/fs/eventfd.c
@@ -101,14 +101,20 @@ static int eventfd_release(struct inode *inode, struct file *file)
return 0;
 }
 
-static __poll_t eventfd_poll(struct file *file, poll_table *wait)
+static struct wait_queue_head *
+eventfd_get_poll_head(struct file *file, __poll_t events)
+{
+   struct eventfd_ctx *ctx = file->private_data;
+
+   return &ctx->wqh;
+}
+
+static __poll_t eventfd_poll_mask(struct file *file, __poll_t eventmask)
 {
struct eventfd_ctx *ctx = file->private_data;
__poll_t events = 0;
u64 count;
 
-   poll_wait(file, &ctx->wqh, wait);
-
/*
 * All writes to ctx->count occur within ctx->wqh.lock.  This read
 * can be done outside ctx->wqh.lock because we know that poll_wait
@@ -305,7 +311,8 @@ static const struct file_operations eventfd_fops = {
.show_fdinfo= eventfd_show_fdinfo,
 #endif
.release= eventfd_release,
-   .poll   = eventfd_poll,
+   .get_poll_head  = eventfd_get_poll_head,
+   .poll_mask  = eventfd_poll_mask,
.read   = eventfd_read,
.write  = eventfd_write,
.llseek = noop_llseek,
-- 
2.14.2

[PATCH 28/28] random: convert to ->poll_mask

2018-03-21 Thread Christoph Hellwig

The big change is that random_read_wait and random_write_wait are merged
into a single waitqueue that uses keyed wakeups.  Because wait_event_*
doesn't know about that, this will lead to occasional spurious wakeups
in _random_read and add_hwgenerator_randomness, but wait_event_* is
designed to handle these and we are not in a hot path there.
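
To illustrate why only the wait_event_* callers see spurious wakeups on the
merged queue, here is a minimal userspace sketch in C.  It is a toy model and
not kernel code: struct waiter, wake_keyed() and the KEY_* constants are
hypothetical names.  Poll waiters register a key mask and are skipped when the
wake key does not match; plain sleepers carry no key, so every wakeup reaches
them and they have to recheck their condition.

```c
#include <assert.h>
#include <stddef.h>

#define KEY_IN  0x1u	/* stands in for POLLIN */
#define KEY_OUT 0x4u	/* stands in for POLLOUT */

/* Hypothetical waiter; key == 0 models a wait_event_* sleeper with no filter. */
struct waiter {
	unsigned int key;
	int wakeups;
};

/* Deliver one keyed wakeup to the shared queue; returns the number woken. */
static int wake_keyed(struct waiter *q, size_t n, unsigned int key)
{
	int woken = 0;

	assert(key != 0);
	for (size_t i = 0; i < n; i++) {
		if (q[i].key && !(q[i].key & key))
			continue;	/* keyed waiter, wrong event: stays asleep */
		q[i].wakeups++;
		woken++;
	}
	return woken;
}
```

With a queue holding one KEY_IN poller, one KEY_OUT poller and one unkeyed
sleeper, wake_keyed(q, 3, KEY_OUT) skips the KEY_IN poller but still wakes the
unkeyed sleeper, which would then recheck its condition and go back to sleep
if it does not hold.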

Signed-off-by: Christoph Hellwig 
---
 drivers/char/random.c | 27 +++
 1 file changed, 15 insertions(+), 12 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index e5b3d3ba4660..840d80b64431 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -401,8 +401,7 @@ static struct poolinfo {
 /*
  * Static global variables
  */
-static DECLARE_WAIT_QUEUE_HEAD(random_read_wait);
-static DECLARE_WAIT_QUEUE_HEAD(random_write_wait);
+static DECLARE_WAIT_QUEUE_HEAD(random_wait);
 static struct fasync_struct *fasync;
 
 static DEFINE_SPINLOCK(random_ready_list_lock);
@@ -710,7 +709,7 @@ static void credit_entropy_bits(struct entropy_store *r, int nbits)
 
/* should we wake readers? */
if (entropy_bits >= random_read_wakeup_bits) {
-   wake_up_interruptible(&random_read_wait);
+   wake_up_interruptible_poll(&random_wait, POLLIN);
kill_fasync(&fasync, SIGIO, POLL_IN);
}
/* If the input pool is getting full, send some
@@ -1293,7 +1292,7 @@ static size_t account(struct entropy_store *r, size_t nbytes, int min,
trace_debit_entropy(r->name, 8 * ibytes);
if (ibytes &&
(r->entropy_count >> ENTROPY_SHIFT) < random_write_wakeup_bits) {
-   wake_up_interruptible(&random_write_wait);
+   wake_up_interruptible_poll(&random_wait, POLLOUT);
kill_fasync(&fasync, SIGIO, POLL_OUT);
}
 
@@ -1748,7 +1747,7 @@ _random_read(int nonblock, char __user *buf, size_t nbytes)
if (nonblock)
return -EAGAIN;
 
-   wait_event_interruptible(random_read_wait,
+   wait_event_interruptible(random_wait,
ENTROPY_BITS(&input_pool) >=
random_read_wakeup_bits);
if (signal_pending(current))
@@ -1784,14 +1783,17 @@ urandom_read(struct file *file, char __user *buf, size_t nbytes, loff_t *ppos)
return ret;
 }
 
+static struct wait_queue_head *
+random_get_poll_head(struct file *file, __poll_t events)
+{
+   return &random_wait;
+}
+
 static __poll_t
-random_poll(struct file *file, poll_table * wait)
+random_poll_mask(struct file *file, __poll_t events)
 {
-   __poll_t mask;
+   __poll_t mask = 0;
 
-   poll_wait(file, &random_read_wait, wait);
-   poll_wait(file, &random_write_wait, wait);
-   mask = 0;
if (ENTROPY_BITS(&input_pool) >= random_read_wakeup_bits)
mask |= EPOLLIN | EPOLLRDNORM;
if (ENTROPY_BITS(&input_pool) < random_write_wakeup_bits)
@@ -1890,7 +1892,8 @@ static int random_fasync(int fd, struct file *filp, int on)
 const struct file_operations random_fops = {
.read  = random_read,
.write = random_write,
-   .poll  = random_poll,
+   .get_poll_head  = random_get_poll_head,
+   .poll_mask  = random_poll_mask,
.unlocked_ioctl = random_ioctl,
.fasync = random_fasync,
.llseek = noop_llseek,
@@ -2223,7 +2226,7 @@ void add_hwgenerator_randomness(const char *buffer, size_t count,
 * We'll be woken up again once below random_write_wakeup_thresh,
 * or when the calling thread is about to terminate.
 */
-   wait_event_interruptible(random_write_wait, kthread_should_stop() ||
+   wait_event_interruptible(random_wait, kthread_should_stop() ||
ENTROPY_BITS(&input_pool) <= random_write_wakeup_bits);
mix_pool_bytes(poolp, buffer, count);
credit_entropy_bits(poolp, entropy);
-- 
2.14.2

[PATCH 27/28] timerfd: convert to ->poll_mask

2018-03-21 Thread Christoph Hellwig

Signed-off-by: Christoph Hellwig 
---
 fs/timerfd.c | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/fs/timerfd.c b/fs/timerfd.c
index cdad49da3ff7..d84a2bee4f82 100644
--- a/fs/timerfd.c
+++ b/fs/timerfd.c
@@ -226,21 +226,20 @@ static int timerfd_release(struct inode *inode, struct file *file)
kfree_rcu(ctx, rcu);
return 0;
 }
-
-static __poll_t timerfd_poll(struct file *file, poll_table *wait)
+   
+static struct wait_queue_head *timerfd_get_poll_head(struct file *file,
+   __poll_t eventmask)
 {
struct timerfd_ctx *ctx = file->private_data;
-   __poll_t events = 0;
-   unsigned long flags;
 
-   poll_wait(file, &ctx->wqh, wait);
+   return &ctx->wqh;
+}
 
-   spin_lock_irqsave(&ctx->wqh.lock, flags);
-   if (ctx->ticks)
-   events |= EPOLLIN;
-   spin_unlock_irqrestore(&ctx->wqh.lock, flags);
+static __poll_t timerfd_poll_mask(struct file *file, __poll_t eventmask)
+{
+   struct timerfd_ctx *ctx = file->private_data;
 
-   return events;
+   return ctx->ticks ? EPOLLIN : 0;
 }
 
 static ssize_t timerfd_read(struct file *file, char __user *buf, size_t count,
@@ -364,7 +363,8 @@ static long timerfd_ioctl(struct file *file, unsigned int cmd, unsigned long arg
 
 static const struct file_operations timerfd_fops = {
.release= timerfd_release,
-   .poll   = timerfd_poll,
+   .get_poll_head  = timerfd_get_poll_head,
+   .poll_mask  = timerfd_poll_mask,
.read   = timerfd_read,
.llseek = noop_llseek,
.show_fdinfo= timerfd_show,
-- 
2.14.2

[PATCH 25/28] pipe: convert to ->poll_mask

2018-03-21 Thread Christoph Hellwig

Signed-off-by: Christoph Hellwig 
---
 fs/pipe.c | 22 +-
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/fs/pipe.c b/fs/pipe.c
index 7b1954caf388..81937590ea0a 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -509,19 +509,22 @@ static long pipe_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
}
 }
 
-/* No kernel lock held - fine */
-static __poll_t
-pipe_poll(struct file *filp, poll_table *wait)
+static struct wait_queue_head *
+pipe_get_poll_head(struct file *filp, __poll_t events)
 {
-   __poll_t mask;
struct pipe_inode_info *pipe = filp->private_data;
-   int nrbufs;
 
-   poll_wait(filp, &pipe->wait, wait);
+   return &pipe->wait;
+}
+
+/* No kernel lock held - fine */
+static __poll_t pipe_poll_mask(struct file *filp, __poll_t events)
+{
+   struct pipe_inode_info *pipe = filp->private_data;
+   int nrbufs = pipe->nrbufs;
+   __poll_t mask = 0;
 
/* Reading only -- no need for acquiring the semaphore.  */
-   nrbufs = pipe->nrbufs;
-   mask = 0;
if (filp->f_mode & FMODE_READ) {
mask = (nrbufs > 0) ? EPOLLIN | EPOLLRDNORM : 0;
if (!pipe->writers && filp->f_version != pipe->w_counter)
@@ -1015,7 +1018,8 @@ const struct file_operations pipefifo_fops = {
.llseek = no_llseek,
.read_iter  = pipe_read,
.write_iter = pipe_write,
-   .poll   = pipe_poll,
+   .get_poll_head  = pipe_get_poll_head,
+   .poll_mask  = pipe_poll_mask,
.unlocked_ioctl = pipe_ioctl,
.release= pipe_release,
.fasync = pipe_fasync,
-- 
2.14.2

[PATCH 23/28] net/rxrpc: convert to ->poll_mask

2018-03-21 Thread Christoph Hellwig

Signed-off-by: Christoph Hellwig 
---
 net/rxrpc/af_rxrpc.c | 10 +++---
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c
index 0c9c18aa7c77..d2440d5c3ce8 100644
--- a/net/rxrpc/af_rxrpc.c
+++ b/net/rxrpc/af_rxrpc.c
@@ -729,15 +729,11 @@ static int rxrpc_getsockopt(struct socket *sock, int level, int optname,
 /*
  * permit an RxRPC socket to be polled
  */
-static __poll_t rxrpc_poll(struct file *file, struct socket *sock,
-  poll_table *wait)
+static __poll_t rxrpc_poll_mask(struct socket *sock, __poll_t events)
 {
struct sock *sk = sock->sk;
struct rxrpc_sock *rx = rxrpc_sk(sk);
-   __poll_t mask;
-
-   sock_poll_wait(file, sk_sleep(sk), wait);
-   mask = 0;
+   __poll_t mask = 0;
 
/* the socket is readable if there are any messages waiting on the Rx
 * queue */
@@ -940,7 +936,7 @@ static const struct proto_ops rxrpc_rpc_ops = {
.socketpair = sock_no_socketpair,
.accept = sock_no_accept,
.getname= sock_no_getname,
-   .poll   = rxrpc_poll,
+   .poll_mask  = rxrpc_poll_mask,
.ioctl  = sock_no_ioctl,
.listen = rxrpc_listen,
.shutdown   = rxrpc_shutdown,
-- 
2.14.2

[PATCH 21/28] net/phonet: convert to ->poll_mask

2018-03-21 Thread Christoph Hellwig

Signed-off-by: Christoph Hellwig 
---
 net/phonet/socket.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/net/phonet/socket.c b/net/phonet/socket.c
index 28d981512f5f..70ac4539d5b7 100644
--- a/net/phonet/socket.c
+++ b/net/phonet/socket.c
@@ -341,15 +341,12 @@ static int pn_socket_getname(struct socket *sock, struct sockaddr *addr,
return 0;
 }
 
-static __poll_t pn_socket_poll(struct file *file, struct socket *sock,
-   poll_table *wait)
+static __poll_t pn_socket_poll_mask(struct socket *sock, __poll_t events)
 {
struct sock *sk = sock->sk;
struct pep_sock *pn = pep_sk(sk);
__poll_t mask = 0;
 
-   poll_wait(file, sk_sleep(sk), wait);
-
if (sk->sk_state == TCP_CLOSE)
return EPOLLERR;
if (!skb_queue_empty(&sk->sk_receive_queue))
@@ -474,7 +471,7 @@ const struct proto_ops phonet_stream_ops = {
.socketpair = sock_no_socketpair,
.accept = pn_socket_accept,
.getname= pn_socket_getname,
-   .poll   = pn_socket_poll,
+   .poll_mask  = pn_socket_poll_mask,
.ioctl  = pn_socket_ioctl,
.listen = pn_socket_listen,
.shutdown   = sock_no_shutdown,
-- 
2.14.2

[PATCH 20/28] net/nfc: convert to ->poll_mask

2018-03-21 Thread Christoph Hellwig

Signed-off-by: Christoph Hellwig 
---
 net/nfc/llcp_sock.c | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/net/nfc/llcp_sock.c b/net/nfc/llcp_sock.c
index 376040092142..b6010750e634 100644
--- a/net/nfc/llcp_sock.c
+++ b/net/nfc/llcp_sock.c
@@ -549,16 +549,13 @@ static inline __poll_t llcp_accept_poll(struct sock *parent)
return 0;
 }
 
-static __poll_t llcp_sock_poll(struct file *file, struct socket *sock,
-  poll_table *wait)
+static __poll_t llcp_sock_poll_mask(struct socket *sock, __poll_t events)
 {
struct sock *sk = sock->sk;
__poll_t mask = 0;
 
pr_debug("%p\n", sk);
 
-   sock_poll_wait(file, sk_sleep(sk), wait);
-
if (sk->sk_state == LLCP_LISTEN)
return llcp_accept_poll(sk);
 
@@ -900,7 +897,7 @@ static const struct proto_ops llcp_sock_ops = {
.socketpair = sock_no_socketpair,
.accept = llcp_sock_accept,
.getname= llcp_sock_getname,
-   .poll   = llcp_sock_poll,
+   .poll_mask  = llcp_sock_poll_mask,
.ioctl  = sock_no_ioctl,
.listen = llcp_sock_listen,
.shutdown   = sock_no_shutdown,
@@ -920,7 +917,7 @@ static const struct proto_ops llcp_rawsock_ops = {
.socketpair = sock_no_socketpair,
.accept = sock_no_accept,
.getname= llcp_sock_getname,
-   .poll   = llcp_sock_poll,
+   .poll_mask  = llcp_sock_poll_mask,
.ioctl  = sock_no_ioctl,
.listen = sock_no_listen,
.shutdown   = sock_no_shutdown,
-- 
2.14.2

[PATCH 19/28] net/caif: convert to ->poll_mask

2018-03-21 Thread Christoph Hellwig

Signed-off-by: Christoph Hellwig 
---
 net/caif/caif_socket.c | 12 
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/net/caif/caif_socket.c b/net/caif/caif_socket.c
index a6fb1b3bcad9..c7991867d622 100644
--- a/net/caif/caif_socket.c
+++ b/net/caif/caif_socket.c
@@ -934,15 +934,11 @@ static int caif_release(struct socket *sock)
 }
 
 /* Copied from af_unix.c:unix_poll(), added CAIF tx_flow handling */
-static __poll_t caif_poll(struct file *file,
- struct socket *sock, poll_table *wait)
+static __poll_t caif_poll_mask(struct socket *sock, __poll_t events)
 {
struct sock *sk = sock->sk;
-   __poll_t mask;
struct caifsock *cf_sk = container_of(sk, struct caifsock, sk);
-
-   sock_poll_wait(file, sk_sleep(sk), wait);
-   mask = 0;
+   __poll_t mask = 0;
 
/* exceptional events? */
if (sk->sk_err)
@@ -976,7 +972,7 @@ static const struct proto_ops caif_seqpacket_ops = {
.socketpair = sock_no_socketpair,
.accept = sock_no_accept,
.getname = sock_no_getname,
-   .poll = caif_poll,
+   .poll_mask = caif_poll_mask,
.ioctl = sock_no_ioctl,
.listen = sock_no_listen,
.shutdown = sock_no_shutdown,
@@ -997,7 +993,7 @@ static const struct proto_ops caif_stream_ops = {
.socketpair = sock_no_socketpair,
.accept = sock_no_accept,
.getname = sock_no_getname,
-   .poll = caif_poll,
+   .poll_mask = caif_poll_mask,
.ioctl = sock_no_ioctl,
.listen = sock_no_listen,
.shutdown = sock_no_shutdown,
-- 
2.14.2

[PATCH 18/28] net/bluetooth: convert to ->poll_mask

2018-03-21 Thread Christoph Hellwig

Signed-off-by: Christoph Hellwig 
---
 include/net/bluetooth/bluetooth.h | 2 +-
 net/bluetooth/af_bluetooth.c  | 7 ++-
 net/bluetooth/l2cap_sock.c| 2 +-
 net/bluetooth/rfcomm/sock.c   | 2 +-
 net/bluetooth/sco.c   | 2 +-
 5 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/include/net/bluetooth/bluetooth.h b/include/net/bluetooth/bluetooth.h
index ec9d6bc65855..53ce8176c313 100644
--- a/include/net/bluetooth/bluetooth.h
+++ b/include/net/bluetooth/bluetooth.h
@@ -271,7 +271,7 @@ int  bt_sock_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
 int flags);
 int  bt_sock_stream_recvmsg(struct socket *sock, struct msghdr *msg,
size_t len, int flags);
-__poll_t bt_sock_poll(struct file *file, struct socket *sock, poll_table *wait);
+__poll_t bt_sock_poll_mask(struct socket *sock, __poll_t events);
 int  bt_sock_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg);
 int  bt_sock_wait_state(struct sock *sk, int state, unsigned long timeo);
 int  bt_sock_wait_ready(struct sock *sk, unsigned long flags);
diff --git a/net/bluetooth/af_bluetooth.c b/net/bluetooth/af_bluetooth.c
index 84d92a077834..80033a7e1de2 100644
--- a/net/bluetooth/af_bluetooth.c
+++ b/net/bluetooth/af_bluetooth.c
@@ -437,16 +437,13 @@ static inline __poll_t bt_accept_poll(struct sock *parent)
return 0;
 }
 
-__poll_t bt_sock_poll(struct file *file, struct socket *sock,
- poll_table *wait)
+__poll_t bt_sock_poll_mask(struct socket *sock, __poll_t events)
 {
struct sock *sk = sock->sk;
__poll_t mask = 0;
 
BT_DBG("sock %p, sk %p", sock, sk);
 
-   poll_wait(file, sk_sleep(sk), wait);
-
if (sk->sk_state == BT_LISTEN)
return bt_accept_poll(sk);
 
@@ -478,7 +475,7 @@ __poll_t bt_sock_poll(struct file *file, struct socket *sock,
 
return mask;
 }
-EXPORT_SYMBOL(bt_sock_poll);
+EXPORT_SYMBOL(bt_sock_poll_mask);
 
 int bt_sock_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg)
 {
diff --git a/net/bluetooth/l2cap_sock.c b/net/bluetooth/l2cap_sock.c
index 67a8642f57ea..d20b33daa80f 100644
--- a/net/bluetooth/l2cap_sock.c
+++ b/net/bluetooth/l2cap_sock.c
@@ -1654,7 +1654,7 @@ static const struct proto_ops l2cap_sock_ops = {
.getname= l2cap_sock_getname,
.sendmsg= l2cap_sock_sendmsg,
.recvmsg= l2cap_sock_recvmsg,
-   .poll   = bt_sock_poll,
+   .poll_mask  = bt_sock_poll_mask,
.ioctl  = bt_sock_ioctl,
.mmap   = sock_no_mmap,
.socketpair = sock_no_socketpair,
diff --git a/net/bluetooth/rfcomm/sock.c b/net/bluetooth/rfcomm/sock.c
index 1aaccf637479..b4dc96481d92 100644
--- a/net/bluetooth/rfcomm/sock.c
+++ b/net/bluetooth/rfcomm/sock.c
@@ -1049,7 +1049,7 @@ static const struct proto_ops rfcomm_sock_ops = {
.setsockopt = rfcomm_sock_setsockopt,
.getsockopt = rfcomm_sock_getsockopt,
.ioctl  = rfcomm_sock_ioctl,
-   .poll   = bt_sock_poll,
+   .poll_mask  = bt_sock_poll_mask,
.socketpair = sock_no_socketpair,
.mmap   = sock_no_mmap
 };
diff --git a/net/bluetooth/sco.c b/net/bluetooth/sco.c
index 08df57665e1f..b2bf5c767b3e 100644
--- a/net/bluetooth/sco.c
+++ b/net/bluetooth/sco.c
@@ -1198,7 +1198,7 @@ static const struct proto_ops sco_sock_ops = {
.getname= sco_sock_getname,
.sendmsg= sco_sock_sendmsg,
.recvmsg= sco_sock_recvmsg,
-   .poll   = bt_sock_poll,
+   .poll_mask  = bt_sock_poll_mask,
.ioctl  = bt_sock_ioctl,
.mmap   = sock_no_mmap,
.socketpair = sock_no_socketpair,
-- 
2.14.2

Re: [RFC PATCH v2 2/4] mm/__free_one_page: skip merge for order-0 page unless compaction failed

2018-03-21 Thread Aaron Lu

On Tue, Mar 20, 2018 at 10:59:16PM -0700, Figo.zhang wrote:
> 2018-03-20 21:53 GMT-07:00 Aaron Lu :
> 
> > On Tue, Mar 20, 2018 at 09:21:33PM -0700, Figo.zhang wrote:
> > > suppose that in free_one_page() will try to merge to high order anytime ,
> > > but now in your patch,
> > > those merge has postponed when system in low memory status, it is very easy
> > > let system trigger
> > > low memory state and get poor performance.
> >
> > Merge or not merge, the size of free memory is not affected.
> >
> 
> yes, the total free memory is not impact, but will influence the higher
> order allocation.

Yes, that's correct.
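
A toy illustration of the point agreed on above, assuming a hypothetical
two-level free list (struct freelist, can_alloc_order1() and merge_buddies()
are made-up names, not kernel API): skipping the merge leaves the number of
free pages unchanged, but an order-1 request cannot be served until the two
order-0 buddies have been merged.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical free-list summary: counts of free blocks per order. */
struct freelist {
	unsigned int order0;	/* single free pages */
	unsigned int order1;	/* free blocks of two contiguous pages */
};

/* Total free memory in pages, independent of how it is grouped. */
static unsigned int free_pages(const struct freelist *fl)
{
	return fl->order0 + 2 * fl->order1;
}

/* An order-1 request needs an already merged two-page block. */
static bool can_alloc_order1(const struct freelist *fl)
{
	return fl->order1 > 0;
}

/* Merge one pair of order-0 pages; assumes the two pages really are buddies. */
static void merge_buddies(struct freelist *fl)
{
	assert(fl->order0 >= 2);
	fl->order0 -= 2;
	fl->order1 += 1;
}
```

Starting from two unmerged order-0 buddies, free_pages() reports 2 both before
and after merge_buddies(), while can_alloc_order1() only becomes true after
the merge.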

Re: [RFC PATCH 0/3] kernel: add support for 256-bit IO access

2018-03-21 Thread Ingo Molnar

So I poked around a bit and I'm having second thoughts:

* Linus Torvalds  wrote:

> On Tue, Mar 20, 2018 at 1:26 AM, Ingo Molnar  wrote:
> >
> > So assuming the target driver will only load on modern FPUs I *think* it should
> > actually be possible to do something like (pseudocode):
> >
> > vmovdqa %ymm0, 40(%rsp)
> > vmovdqa %ymm1, 80(%rsp)
> >
> > ...
> > # use ymm0 and ymm1
> > ...
> >
> > vmovdqa 80(%rsp), %ymm1
> > vmovdqa 40(%rsp), %ymm0
> >
> > ... without using the heavy XSAVE/XRSTOR instructions.
> 
> No. The above is buggy. It may *work*, but it won't work in the long run.
> 
> Pretty much every single vector extension has traditionally meant that
> touching "old" registers breaks any new register use. Even if you save
> the registers carefully like in your example code, it will do magic
> and random things to the "future extended" version.

This should be relatively straightforward to solve via a proper CPU features 
check: for example by only patching in the AVX routines for 'known compatible' 
fpu_kernel_xstate_size values. Future extensions of register width will extend
the XSAVE area.

It's not fool-proof: in theory there could be semantic extensions to the vector 
unit that do not increase the size of the context - but the normal pattern is to 
increase the number of XINUSE bits and bump up the maximum context area size.

If that's a worry then an even safer compatibility check would be to explicitly 
list CPU models - we do track them pretty accurately anyway these days, mostly due 
to perf PMU support defaulting to a safe but dumb variant if a CPU model is not 
specifically listed.

That method, although more maintenance-intense, should be pretty fool-proof 
AFAICS.
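
As a rough sketch of the two gating strategies described above (hypothetical
names throughout; known_xstate_sizes, known_models and both helpers are
illustrations, not existing kernel symbols): the first check accepts only
context sizes that were explicitly vetted, the second additionally requires a
listed CPU model.  The only size listed here is the 832-byte x87+SSE+AVX
context reported in the boot log further down in this mail; the model numbers
are placeholders.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Context sizes assumed to be vetted; 832 is the x87+SSE+AVX context size. */
static const unsigned int known_xstate_sizes[] = { 832 };

/* Placeholder model numbers, for illustration only. */
static const unsigned int known_models[] = { 0x2a, 0x3a };

static bool in_list(const unsigned int *list, size_t n, unsigned int v)
{
	assert(list != NULL);
	for (size_t i = 0; i < n; i++)
		if (list[i] == v)
			return true;
	return false;
}

/* Looser check: only the XSAVE context size must be known. */
static bool avx_ok_by_size(unsigned int xstate_size)
{
	return in_list(known_xstate_sizes,
		       sizeof(known_xstate_sizes) / sizeof(known_xstate_sizes[0]),
		       xstate_size);
}

/* Stricter check: the CPU model must be listed as well. */
static bool avx_ok_by_model(unsigned int model, unsigned int xstate_size)
{
	return avx_ok_by_size(xstate_size) &&
	       in_list(known_models,
		       sizeof(known_models) / sizeof(known_models[0]), model);
}
```

Either helper defaults to "not allowed" for anything unlisted, which mirrors
the safe-but-dumb fallback mentioned for perf PMU support.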

> So I absolutely *refuse* to have anything to do with the vector unit.
> You can only touch it in the kernel if you own it entirely (ie that
> "kernel_fpu_begin()/_end()" thing). Anything else is absolutely
> guaranteed to cause problems down the line.
> 
> And even if you ignore that "maintenance problems down the line" issue
> ("we can fix them when they happen") I don't want to see games like
> this, because I'm pretty sure it breaks the optimized xsave by tagging
> the state as being dirty.

So I added a bit of instrumentation and the current state of things is that on 
64-bit x86 every single task has an initialized FPU, every task has the exact 
same, fully filled in xfeatures (XINUSE) value:

 [root@galatea ~]# grep -h fpu /proc/*/task/*/fpu | sort | uniq -c
504 x86/fpu: initialized :1
504 x86/fpu: xfeatures_mask  :7

So our latest FPU model is *really* simple and user-space should not be able to 
observe any changes in the XINUSE bits of the XSAVE header, because (at least for 
the basic vector CPU features) all bits are maxed out all the time.

Note that this is with an AVX (256-bit) supporting CPU:

[0.00] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[0.00] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
[0.00] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format.
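
The numbers in this boot log are self-consistent, which a few lines of C can
confirm.  The legacy FXSAVE region is 512 bytes and the XSAVE header is 64
bytes, so the first extended component starts at offset 576; adding the
256-byte AVX component (the upper 128 bits of 16 YMM registers) gives the
reported 832-byte context.  The constant and function names below are
invented for this sketch.

```c
#include <assert.h>

enum {
	LEGACY_AREA  = 512,	/* x87 + SSE state in FXSAVE layout */
	XSAVE_HEADER = 64,	/* XSAVE header following the legacy area */
	YMM_REGS     = 16,	/* YMM0-YMM15 in 64-bit mode */
	YMM_HI_BYTES = 16,	/* upper 128 bits of each YMM register */
};

/* Compile-time check against the xstate_offset[2] value in the log. */
static_assert(LEGACY_AREA + XSAVE_HEADER == 576, "AVX component offset");

static unsigned int avx_offset(void)
{
	return LEGACY_AREA + XSAVE_HEADER;
}

static unsigned int avx_size(void)
{
	return YMM_REGS * YMM_HI_BYTES;
}

/* Size of a 'standard' format context with features 0x7 enabled. */
static unsigned int context_size(void)
{
	return avx_offset() + avx_size();
}
```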

But note that it probably wouldn't make sense to make use of XINUSE optimizations 
on most systems for the AVX space, as glibc will use the highest-bitness vector 
ops for its regular memcpy(), and every user task makes use of memcpy.

It does make sense for some of the more optional XSAVE based features such as 
pkeys. But I don't have any newer Intel system with a wider xsave feature set 
to 
check.

> So no. Don't use vector stuff in the kernel. It's not worth the pain.

That might still be true, but still I'm torn:

 - Broad areas of user-space have seamlessly integrated vector ops and are using
   them whenever they can find an excuse to use them.

 - The vector registers are fundamentally callee-saved, so in most synchronous
   calls the vector unit registers are unused. Asynchronous interruptions of
   context (interrupts, faults, preemption, etc.) can still use them as well,
   as long as they save/restore register contents.

So other than Intel not making it particularly easy to make a forwards compatible
vector register granular save/restore pattern (but see above for how we could
handle that) for asynchronous contexts, I don't see too many other
complications.

Thanks,

Ingo

[PATCH 15/28] net/vmw_vsock: convert to ->poll_mask

2018-03-21 Thread Christoph Hellwig

Signed-off-by: Christoph Hellwig 
---
 net/vmw_vsock/af_vsock.c | 19 ++-
 1 file changed, 6 insertions(+), 13 deletions(-)

diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index e0fc84daed94..b9210329bda8 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -850,18 +850,11 @@ static int vsock_shutdown(struct socket *sock, int mode)
return err;
 }
 
-static __poll_t vsock_poll(struct file *file, struct socket *sock,
-  poll_table *wait)
+static __poll_t vsock_poll_mask(struct socket *sock, __poll_t events)
 {
-   struct sock *sk;
-   __poll_t mask;
-   struct vsock_sock *vsk;
-
-   sk = sock->sk;
-   vsk = vsock_sk(sk);
-
-   poll_wait(file, sk_sleep(sk), wait);
-   mask = 0;
+   struct sock *sk = sock->sk;
+   struct vsock_sock *vsk = vsock_sk(sk);
+   __poll_t mask = 0;
 
if (sk->sk_err)
/* Signify that there has been an error on this socket. */
@@ -1091,7 +1084,7 @@ static const struct proto_ops vsock_dgram_ops = {
.socketpair = sock_no_socketpair,
.accept = sock_no_accept,
.getname = vsock_getname,
-   .poll = vsock_poll,
+   .poll_mask = vsock_poll_mask,
.ioctl = sock_no_ioctl,
.listen = sock_no_listen,
.shutdown = vsock_shutdown,
@@ -1849,7 +1842,7 @@ static const struct proto_ops vsock_stream_ops = {
.socketpair = sock_no_socketpair,
.accept = vsock_accept,
.getname = vsock_getname,
-   .poll = vsock_poll,
+   .poll_mask = vsock_poll_mask,
.ioctl = sock_no_ioctl,
.listen = vsock_listen,
.shutdown = vsock_shutdown,
-- 
2.14.2

[PATCH 16/28] net/tipc: convert to ->poll_mask

2018-03-21 Thread Christoph Hellwig

Signed-off-by: Christoph Hellwig 
---
 net/tipc/socket.c | 14 +-
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 7dfa9fc99ec3..e9c6f185db74 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -695,10 +695,9 @@ static int tipc_getname(struct socket *sock, struct sockaddr *uaddr,
 }
 
 /**
- * tipc_poll - read and possibly block on pollmask
+ * tipc_poll - read pollmask
  * @file: file structure associated with the socket
  * @sock: socket for which to calculate the poll bits
- * @wait: ???
  *
  * Returns pollmask value
  *
@@ -712,15 +711,12 @@ static int tipc_getname(struct socket *sock, struct sockaddr *uaddr,
  * imply that the operation will succeed, merely that it should be performed
  * and will not block.
  */
-static __poll_t tipc_poll(struct file *file, struct socket *sock,
- poll_table *wait)
+static __poll_t tipc_poll_mask(struct socket *sock, __poll_t events)
 {
struct sock *sk = sock->sk;
struct tipc_sock *tsk = tipc_sk(sk);
__poll_t revents = 0;
 
-   sock_poll_wait(file, sk_sleep(sk), wait);
-
if (sk->sk_shutdown & RCV_SHUTDOWN)
revents |= EPOLLRDHUP | EPOLLIN | EPOLLRDNORM;
if (sk->sk_shutdown == SHUTDOWN_MASK)
@@ -3020,7 +3016,7 @@ static const struct proto_ops msg_ops = {
.socketpair = tipc_socketpair,
.accept = sock_no_accept,
.getname= tipc_getname,
-   .poll   = tipc_poll,
+   .poll_mask  = tipc_poll_mask,
.ioctl  = tipc_ioctl,
.listen = sock_no_listen,
.shutdown   = tipc_shutdown,
@@ -3041,7 +3037,7 @@ static const struct proto_ops packet_ops = {
.socketpair = tipc_socketpair,
.accept = tipc_accept,
.getname= tipc_getname,
-   .poll   = tipc_poll,
+   .poll_mask  = tipc_poll_mask,
.ioctl  = tipc_ioctl,
.listen = tipc_listen,
.shutdown   = tipc_shutdown,
@@ -3062,7 +3058,7 @@ static const struct proto_ops stream_ops = {
.socketpair = tipc_socketpair,
.accept = tipc_accept,
.getname= tipc_getname,
-   .poll   = tipc_poll,
+   .poll_mask  = tipc_poll_mask,
.ioctl  = tipc_ioctl,
.listen = tipc_listen,
.shutdown   = tipc_shutdown,
-- 
2.14.2

[PATCH 13/28] net/dccp: convert to ->poll_mask

2018-03-21 Thread Christoph Hellwig

Signed-off-by: Christoph Hellwig 
---
 net/dccp/dccp.h  |  3 +--
 net/dccp/ipv4.c  |  2 +-
 net/dccp/ipv6.c  |  2 +-
 net/dccp/proto.c | 13 ++---
 4 files changed, 5 insertions(+), 15 deletions(-)

diff --git a/net/dccp/dccp.h b/net/dccp/dccp.h
index f91e3816806b..0ea2ee56ac1b 100644
--- a/net/dccp/dccp.h
+++ b/net/dccp/dccp.h
@@ -316,8 +316,7 @@ int dccp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int nonblock,
 int flags, int *addr_len);
 void dccp_shutdown(struct sock *sk, int how);
 int inet_dccp_listen(struct socket *sock, int backlog);
-__poll_t dccp_poll(struct file *file, struct socket *sock,
-  poll_table *wait);
+__poll_t dccp_poll_mask(struct socket *sock, __poll_t events);
 int dccp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len);
 void dccp_req_err(struct sock *sk, u64 seq);
 
diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
index e65fcb45c3f6..e8476f319efd 100644
--- a/net/dccp/ipv4.c
+++ b/net/dccp/ipv4.c
@@ -983,7 +983,7 @@ static const struct proto_ops inet_dccp_ops = {
.accept= inet_accept,
.getname   = inet_getname,
/* FIXME: work on tcp_poll to rename it to inet_csk_poll */
-   .poll  = dccp_poll,
+   .poll_mask = dccp_poll_mask,
.ioctl = inet_ioctl,
/* FIXME: work on inet_listen to rename it to sock_common_listen */
.listen= inet_dccp_listen,
diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c
index 5df7857fc0f3..f0aac8e4b888 100644
--- a/net/dccp/ipv6.c
+++ b/net/dccp/ipv6.c
@@ -1069,7 +1069,7 @@ static const struct proto_ops inet6_dccp_ops = {
.socketpair= sock_no_socketpair,
.accept= inet_accept,
.getname   = inet6_getname,
-   .poll  = dccp_poll,
+   .poll_mask = dccp_poll_mask,
.ioctl = inet6_ioctl,
.listen= inet_dccp_listen,
.shutdown  = inet_shutdown,
diff --git a/net/dccp/proto.c b/net/dccp/proto.c
index 15bdc002d90c..26816032a7c2 100644
--- a/net/dccp/proto.c
+++ b/net/dccp/proto.c
@@ -314,20 +314,11 @@ int dccp_disconnect(struct sock *sk, int flags)
 
 EXPORT_SYMBOL_GPL(dccp_disconnect);
 
-/*
- * Wait for a DCCP event.
- *
- * Note that we don't need to lock the socket, as the upper poll layers
- * take care of normal races (between the test and the event) and we don't
- * go look at any of the socket buffers directly.
- */
-__poll_t dccp_poll(struct file *file, struct socket *sock,
-  poll_table *wait)
+__poll_t dccp_poll_mask(struct socket *sock, __poll_t events)
 {
__poll_t mask;
struct sock *sk = sock->sk;
 
-   sock_poll_wait(file, sk_sleep(sk), wait);
if (sk->sk_state == DCCP_LISTEN)
return inet_csk_listen_poll(sk);
 
@@ -369,7 +360,7 @@ __poll_t dccp_poll(struct file *file, struct socket *sock,
return mask;
 }
 
-EXPORT_SYMBOL_GPL(dccp_poll);
+EXPORT_SYMBOL_GPL(dccp_poll_mask);
 
 int dccp_ioctl(struct sock *sk, int cmd, unsigned long arg)
 {
-- 
2.14.2

[PATCH 09/28] net: remove sock_no_poll

2018-03-21 Thread Christoph Hellwig

Now that sock_poll handles a NULL ->poll or ->poll_mask there is no need
for a stub.

Signed-off-by: Christoph Hellwig 
---
 crypto/af_alg.c | 1 -
 crypto/algif_hash.c | 2 --
 crypto/algif_rng.c  | 1 -
 drivers/isdn/mISDN/socket.c | 1 -
 drivers/net/ppp/pptp.c  | 1 -
 include/net/sock.h  | 2 --
 net/bluetooth/bnep/sock.c   | 1 -
 net/bluetooth/cmtp/sock.c   | 1 -
 net/bluetooth/hidp/sock.c   | 1 -
 net/core/sock.c | 6 --
 10 files changed, 17 deletions(-)

diff --git a/crypto/af_alg.c b/crypto/af_alg.c
index c49766b03165..50d75de539f5 100644
--- a/crypto/af_alg.c
+++ b/crypto/af_alg.c
@@ -347,7 +347,6 @@ static const struct proto_ops alg_proto_ops = {
.sendpage   =   sock_no_sendpage,
.sendmsg=   sock_no_sendmsg,
.recvmsg=   sock_no_recvmsg,
-   .poll   =   sock_no_poll,
 
.bind   =   alg_bind,
.release=   af_alg_release,
diff --git a/crypto/algif_hash.c b/crypto/algif_hash.c
index 6c9b1927a520..bfcf595fd8f9 100644
--- a/crypto/algif_hash.c
+++ b/crypto/algif_hash.c
@@ -288,7 +288,6 @@ static struct proto_ops algif_hash_ops = {
.mmap   =   sock_no_mmap,
.bind   =   sock_no_bind,
.setsockopt =   sock_no_setsockopt,
-   .poll   =   sock_no_poll,
 
.release=   af_alg_release,
.sendmsg=   hash_sendmsg,
@@ -396,7 +395,6 @@ static struct proto_ops algif_hash_ops_nokey = {
.mmap   =   sock_no_mmap,
.bind   =   sock_no_bind,
.setsockopt =   sock_no_setsockopt,
-   .poll   =   sock_no_poll,
 
.release=   af_alg_release,
.sendmsg=   hash_sendmsg_nokey,
diff --git a/crypto/algif_rng.c b/crypto/algif_rng.c
index 150c2b6480ed..22df3799a17b 100644
--- a/crypto/algif_rng.c
+++ b/crypto/algif_rng.c
@@ -106,7 +106,6 @@ static struct proto_ops algif_rng_ops = {
.bind   =   sock_no_bind,
.accept =   sock_no_accept,
.setsockopt =   sock_no_setsockopt,
-   .poll   =   sock_no_poll,
.sendmsg=   sock_no_sendmsg,
.sendpage   =   sock_no_sendpage,
 
diff --git a/drivers/isdn/mISDN/socket.c b/drivers/isdn/mISDN/socket.c
index c5603d1a07d6..c84270e16bdd 100644
--- a/drivers/isdn/mISDN/socket.c
+++ b/drivers/isdn/mISDN/socket.c
@@ -746,7 +746,6 @@ static const struct proto_ops base_sock_ops = {
.getname= sock_no_getname,
.sendmsg= sock_no_sendmsg,
.recvmsg= sock_no_recvmsg,
-   .poll   = sock_no_poll,
.listen = sock_no_listen,
.shutdown   = sock_no_shutdown,
.setsockopt = sock_no_setsockopt,
diff --git a/drivers/net/ppp/pptp.c b/drivers/net/ppp/pptp.c
index 6dde9a0cfe76..87f892f1d0fe 100644
--- a/drivers/net/ppp/pptp.c
+++ b/drivers/net/ppp/pptp.c
@@ -627,7 +627,6 @@ static const struct proto_ops pptp_ops = {
.socketpair = sock_no_socketpair,
.accept = sock_no_accept,
.getname= pptp_getname,
-   .poll   = sock_no_poll,
.listen = sock_no_listen,
.shutdown   = sock_no_shutdown,
.setsockopt = sock_no_setsockopt,
diff --git a/include/net/sock.h b/include/net/sock.h
index 169c92afcafa..d9249fe65859 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1585,8 +1585,6 @@ int sock_no_connect(struct socket *, struct sockaddr *, int, int);
 int sock_no_socketpair(struct socket *, struct socket *);
 int sock_no_accept(struct socket *, struct socket *, int, bool);
 int sock_no_getname(struct socket *, struct sockaddr *, int *, int);
-__poll_t sock_no_poll(struct file *, struct socket *,
- struct poll_table_struct *);
 int sock_no_ioctl(struct socket *, unsigned int, unsigned long);
 int sock_no_listen(struct socket *, int);
 int sock_no_shutdown(struct socket *, int);
diff --git a/net/bluetooth/bnep/sock.c b/net/bluetooth/bnep/sock.c
index b5116fa9835e..00deacdcb51c 100644
--- a/net/bluetooth/bnep/sock.c
+++ b/net/bluetooth/bnep/sock.c
@@ -175,7 +175,6 @@ static const struct proto_ops bnep_sock_ops = {
.getname= sock_no_getname,
.sendmsg= sock_no_sendmsg,
.recvmsg= sock_no_recvmsg,
-   .poll   = sock_no_poll,
.listen = sock_no_listen,
.shutdown   = sock_no_shutdown,
.setsockopt = sock_no_setsockopt,
diff --git a/net/bluetooth/cmtp/sock.c b/net/bluetooth/cmtp/sock.c
index ce86a7bae844..e08f28fadd65 100644
--- a/net/bluetooth/cmtp/sock.c
+++ b/net/bluetooth/cmtp/sock.c
@@ -178,7 +178,6 @@ static const struct proto_ops cmtp_sock_ops = {
.getname= sock_no_getname,
.sendmsg= sock_no_sendmsg,
.recvmsg= sock_no_recvmsg,

[PATCH 12/28] net: convert datagram_poll users tp ->poll_mask

2018-03-21 Thread Christoph Hellwig

Signed-off-by: Christoph Hellwig 
---
 drivers/isdn/mISDN/socket.c|  2 +-
 drivers/net/ppp/pppoe.c|  2 +-
 drivers/staging/ipx/af_ipx.c   |  2 +-
 drivers/staging/irda/net/af_irda.c |  6 +++---
 include/linux/skbuff.h |  3 +--
 include/net/udp.h  |  2 +-
 net/appletalk/ddp.c|  2 +-
 net/ax25/af_ax25.c |  2 +-
 net/bluetooth/hci_sock.c   |  2 +-
 net/can/bcm.c  |  2 +-
 net/can/raw.c  |  2 +-
 net/core/datagram.c| 13 -
 net/decnet/af_decnet.c |  6 +++---
 net/ieee802154/socket.c|  4 ++--
 net/ipv4/af_inet.c |  6 +++---
 net/ipv4/udp.c | 10 +-
 net/ipv6/af_inet6.c|  2 +-
 net/ipv6/raw.c |  4 ++--
 net/kcm/kcmsock.c  |  4 ++--
 net/key/af_key.c   |  2 +-
 net/l2tp/l2tp_ip.c |  2 +-
 net/l2tp/l2tp_ip6.c|  2 +-
 net/l2tp/l2tp_ppp.c|  2 +-
 net/llc/af_llc.c   |  2 +-
 net/netlink/af_netlink.c   |  2 +-
 net/netrom/af_netrom.c |  2 +-
 net/nfc/rawsock.c  |  4 ++--
 net/packet/af_packet.c |  9 -
 net/phonet/socket.c|  2 +-
 net/qrtr/qrtr.c|  2 +-
 net/rose/af_rose.c |  2 +-
 net/x25/af_x25.c   |  2 +-
 32 files changed, 52 insertions(+), 59 deletions(-)

diff --git a/drivers/isdn/mISDN/socket.c b/drivers/isdn/mISDN/socket.c
index c84270e16bdd..61d6e4c9e7d1 100644
--- a/drivers/isdn/mISDN/socket.c
+++ b/drivers/isdn/mISDN/socket.c
@@ -589,7 +589,7 @@ static const struct proto_ops data_sock_ops = {
.getname= data_sock_getname,
.sendmsg= mISDN_sock_sendmsg,
.recvmsg= mISDN_sock_recvmsg,
-   .poll   = datagram_poll,
+   .poll_mask  = datagram_poll_mask,
.listen = sock_no_listen,
.shutdown   = sock_no_shutdown,
.setsockopt = data_sock_setsockopt,
diff --git a/drivers/net/ppp/pppoe.c b/drivers/net/ppp/pppoe.c
index 5aa59f41bf8c..8c311e626884 100644
--- a/drivers/net/ppp/pppoe.c
+++ b/drivers/net/ppp/pppoe.c
@@ -1120,7 +1120,7 @@ static const struct proto_ops pppoe_ops = {
.socketpair = sock_no_socketpair,
.accept = sock_no_accept,
.getname= pppoe_getname,
-   .poll   = datagram_poll,
+   .poll_mask  = datagram_poll_mask,
.listen = sock_no_listen,
.shutdown   = sock_no_shutdown,
.setsockopt = sock_no_setsockopt,
diff --git a/drivers/staging/ipx/af_ipx.c b/drivers/staging/ipx/af_ipx.c
index d21a9d128d3e..3373f7f67d35 100644
--- a/drivers/staging/ipx/af_ipx.c
+++ b/drivers/staging/ipx/af_ipx.c
@@ -1967,7 +1967,7 @@ static const struct proto_ops ipx_dgram_ops = {
.socketpair = sock_no_socketpair,
.accept = sock_no_accept,
.getname= ipx_getname,
-   .poll   = datagram_poll,
+   .poll_mask  = datagram_poll_mask,
.ioctl  = ipx_ioctl,
 #ifdef CONFIG_COMPAT
.compat_ioctl   = ipx_compat_ioctl,
diff --git a/drivers/staging/irda/net/af_irda.c b/drivers/staging/irda/net/af_irda.c
index 2f1e9ab3d6d0..77659b1c40ba 100644
--- a/drivers/staging/irda/net/af_irda.c
+++ b/drivers/staging/irda/net/af_irda.c
@@ -2600,7 +2600,7 @@ static const struct proto_ops irda_seqpacket_ops = {
.socketpair =   sock_no_socketpair,
.accept =   irda_accept,
.getname =  irda_getname,
-   .poll = datagram_poll,
+   .poll_mask =datagram_poll_mask,
.ioctl =irda_ioctl,
 #ifdef CONFIG_COMPAT
.compat_ioctl = irda_compat_ioctl,
@@ -2624,7 +2624,7 @@ static const struct proto_ops irda_dgram_ops = {
.socketpair =   sock_no_socketpair,
.accept =   irda_accept,
.getname =  irda_getname,
-   .poll = datagram_poll,
+   .poll_mask =datagram_poll_mask,
.ioctl =irda_ioctl,
 #ifdef CONFIG_COMPAT
.compat_ioctl = irda_compat_ioctl,
@@ -2649,7 +2649,7 @@ static const struct proto_ops irda_ultra_ops = {
.socketpair =   sock_no_socketpair,
.accept =   sock_no_accept,
.getname =  irda_getname,
-   .poll = datagram_poll,
+   .poll_mask =datagram_poll_mask,
.ioctl =irda_ioctl,
 #ifdef CONFIG_COMPAT
.compat_ioctl = irda_compat_ioctl,
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index ddf77cf4ff2d..1ac027bd33ec 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -3246,8 +3246,7 @@ struct sk_buff *__skb_recv_datagram(struct sock *sk, unsigned flags,
int *peeked, int *off, int *err);
 struct sk_buff *skb_recv_datagram(struct

[PATCH 06/28] aio: implement IOCB_CMD_POLL

2018-03-21 Thread Christoph Hellwig

Simple one-shot poll through the io_submit() interface.  To poll for
a file descriptor the application should submit an iocb of type
IOCB_CMD_POLL.  It will poll the fd for the events specified in the
first 32 bits of the aio_buf field of the iocb.

Unlike poll or epoll without EPOLLONESHOT this interface always works
in one-shot mode, that is, once the iocb is completed it will have to be
resubmitted.

Signed-off-by: Christoph Hellwig 
Acked-by: Jeff Moyer 
---
 fs/aio.c | 102 ++-
 include/uapi/linux/aio_abi.h |   6 +--
 2 files changed, 103 insertions(+), 5 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 79d3eb3d2dd9..38b408129697 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -5,6 +5,7 @@
  * Implements an efficient asynchronous io interface.
  *
  * Copyright 2000, 2001, 2002 Red Hat, Inc.  All Rights Reserved.
+ * Copyright 2018 Christoph Hellwig.
  *
  * See ../COPYING for licensing terms.
  */
@@ -162,10 +163,18 @@ struct fsync_iocb {
booldatasync;
 };
 
+struct poll_iocb {
+   struct file *file;
+   __poll_tevents;
+   struct wait_queue_head  *head;
+   struct wait_queue_entry wait;
+};
+
 struct aio_kiocb {
union {
struct kiocbrw;
struct fsync_iocb   fsync;
+   struct poll_iocbpoll;
};
 
struct kioctx   *ki_ctx;
@@ -1590,7 +1599,6 @@ static int aio_fsync(struct fsync_iocb *req, struct iocb *iocb, bool datasync)
return -EINVAL;
if (iocb->aio_offset || iocb->aio_nbytes || iocb->aio_rw_flags)
return -EINVAL;
-
req->file = fget(iocb->aio_fildes);
if (unlikely(!req->file))
return -EBADF;
@@ -1609,6 +1617,96 @@ static int aio_fsync(struct fsync_iocb *req, struct iocb *iocb, bool datasync)
return ret;
 }
 
+static void __aio_complete_poll(struct poll_iocb *req, __poll_t mask)
+{
+   fput(req->file);
+   aio_complete(container_of(req, struct aio_kiocb, poll),
+   mangle_poll(mask), 0);
+}
+
+static void aio_complete_poll(struct poll_iocb *req, __poll_t mask)
+{
+   struct aio_kiocb *iocb = container_of(req, struct aio_kiocb, poll);
+
+   if (!(iocb->flags & AIO_IOCB_CANCELLED))
+   __aio_complete_poll(req, mask);
+}
+
+static int aio_poll_cancel(struct kiocb *rw)
+{
+   struct aio_kiocb *iocb = container_of(rw, struct aio_kiocb, rw);
+
+   remove_wait_queue(iocb->poll.head, &iocb->poll.wait);
+   __aio_complete_poll(&iocb->poll, 0); /* no events to report */
+   return 0;
+}
+
+static int aio_poll_wake(struct wait_queue_entry *wait, unsigned mode, int sync,
+   void *key)
+{
+   struct poll_iocb *req = container_of(wait, struct poll_iocb, wait);
+   struct file *file = req->file;
+   __poll_t mask = key_to_poll(key);
+
+   assert_spin_locked(&req->head->lock);
+
+   /* for instances that support it check for an event match first: */
+   if (mask && !(mask & req->events))
+   return 0;
+
+   mask = vfs_poll_mask(file, req->events);
+   if (!mask)
+   return 0;
+
+   __remove_wait_queue(req->head, &req->wait);
+   aio_complete_poll(req, mask);
+   return 1;
+}
+
+static ssize_t aio_poll(struct aio_kiocb *aiocb, struct iocb *iocb)
+{
+   struct poll_iocb *req = &aiocb->poll;
+   unsigned long flags;
+   __poll_t mask;
+
+   /* reject any unknown events outside the normal event mask. */
+   if ((u16)iocb->aio_buf != iocb->aio_buf)
+   return -EINVAL;
+   /* reject fields that are not defined for poll */
+   if (iocb->aio_offset || iocb->aio_nbytes || iocb->aio_rw_flags)
+   return -EINVAL;
+
+   req->events = demangle_poll(iocb->aio_buf) | POLLERR | POLLHUP;
+   req->file = fget(iocb->aio_fildes);
+   if (unlikely(!req->file))
+   return -EBADF;
+
+   req->head = vfs_get_poll_head(req->file, req->events);
+   if (!req->head) {
+   fput(req->file);
+   return -EINVAL; /* same as no support for IOCB_CMD_POLL */
+   }
+   if (IS_ERR(req->head)) {
+   mask = PTR_TO_POLL(req->head);
+   goto done;
+   }
+
+   init_waitqueue_func_entry(&req->wait, aio_poll_wake);
+
+   spin_lock_irqsave(&req->head->lock, flags);
+   mask = vfs_poll_mask(req->file, req->events);
+   if (!mask) {
+   __kiocb_set_cancel_fn(aiocb, aio_poll_cancel,
+   AIO_IOCB_DELAYED_CANCEL);
+   __add_wait_queue(req->head, &req->wait);
+   }
+   spin_unlock_irqrestore(&req->head->lock, flags);
+done:
+   if (mask)
+   aio_complete_poll(req, mask);
+   return -EIOCBQUEUED;
+}
+
 static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_
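
From the application side, the interface described in the commit message of
this patch comes down to filling in an iocb. A minimal sketch, assuming uapi
headers that already define IOCB_CMD_POLL; the helper name prep_poll_iocb is
made up for illustration:

```c
#include <assert.h>
#include <poll.h>
#include <stdint.h>
#include <string.h>
#include <linux/aio_abi.h>

/*
 * Prepare a one-shot poll request for fd.
 *
 * aio_poll() in the patch rejects requests where aio_offset, aio_nbytes
 * or aio_rw_flags are non-zero, and where aio_buf does not fit in 16 bits,
 * so everything is zeroed first and the events are passed as a 16-bit
 * poll(2) style mask.
 */
static void prep_poll_iocb(struct iocb *cb, int fd, unsigned short events,
			   uint64_t user_data)
{
	assert(cb != NULL);

	memset(cb, 0, sizeof(*cb));
	cb->aio_fildes = fd;
	cb->aio_lio_opcode = IOCB_CMD_POLL;
	cb->aio_buf = events;		/* e.g. POLLIN | POLLOUT */
	cb->aio_data = user_data;	/* returned in io_event.data */
}
```

The prepared iocb would then be handed to io_submit() like any other request,
and the completion event carries the returned poll mask in its result field.
Because the interface is one-shot, the application has to call io_submit()
again for every further notification it wants.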

[PATCH 11/28] net/unix: convert to ->poll_mask

2018-03-21 Thread Christoph Hellwig

Signed-off-by: Christoph Hellwig 
---
 net/unix/af_unix.c | 30 +++---
 1 file changed, 11 insertions(+), 19 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 2d465bdeccbc..619c6921dd46 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -638,9 +638,8 @@ static int unix_stream_connect(struct socket *, struct sockaddr *,
 static int unix_socketpair(struct socket *, struct socket *);
 static int unix_accept(struct socket *, struct socket *, int, bool);
 static int unix_getname(struct socket *, struct sockaddr *, int *, int);
-static __poll_t unix_poll(struct file *, struct socket *, poll_table *);
-static __poll_t unix_dgram_poll(struct file *, struct socket *,
-   poll_table *);
+static __poll_t unix_poll_mask(struct socket *, __poll_t);
+static __poll_t unix_dgram_poll_mask(struct socket *, __poll_t);
 static int unix_ioctl(struct socket *, unsigned int, unsigned long);
 static int unix_shutdown(struct socket *, int);
 static int unix_stream_sendmsg(struct socket *, struct msghdr *, size_t);
@@ -681,7 +680,7 @@ static const struct proto_ops unix_stream_ops = {
.socketpair =   unix_socketpair,
.accept =   unix_accept,
.getname =  unix_getname,
-   .poll = unix_poll,
+   .poll_mask =unix_poll_mask,
.ioctl =unix_ioctl,
.listen =   unix_listen,
.shutdown = unix_shutdown,
@@ -704,7 +703,7 @@ static const struct proto_ops unix_dgram_ops = {
.socketpair =   unix_socketpair,
.accept =   sock_no_accept,
.getname =  unix_getname,
-   .poll = unix_dgram_poll,
+   .poll_mask =unix_dgram_poll_mask,
.ioctl =unix_ioctl,
.listen =   sock_no_listen,
.shutdown = unix_shutdown,
@@ -726,7 +725,7 @@ static const struct proto_ops unix_seqpacket_ops = {
.socketpair =   unix_socketpair,
.accept =   unix_accept,
.getname =  unix_getname,
-   .poll = unix_dgram_poll,
+   .poll_mask =unix_dgram_poll_mask,
.ioctl =unix_ioctl,
.listen =   unix_listen,
.shutdown = unix_shutdown,
@@ -2640,13 +2639,10 @@ static int unix_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg)
return err;
 }
 
-static __poll_t unix_poll(struct file *file, struct socket *sock, poll_table *wait)
+static __poll_t unix_poll_mask(struct socket *sock, __poll_t events)
 {
struct sock *sk = sock->sk;
-   __poll_t mask;
-
-   sock_poll_wait(file, sk_sleep(sk), wait);
-   mask = 0;
+   __poll_t mask = 0;
 
/* exceptional events? */
if (sk->sk_err)
@@ -2675,15 +2671,11 @@ static __poll_t unix_poll(struct file *file, struct socket *sock, poll_table *wa
return mask;
 }
 
-static __poll_t unix_dgram_poll(struct file *file, struct socket *sock,
-   poll_table *wait)
+static __poll_t unix_dgram_poll_mask(struct socket *sock, __poll_t events)
 {
struct sock *sk = sock->sk, *other;
-   unsigned int writable;
-   __poll_t mask;
-
-   sock_poll_wait(file, sk_sleep(sk), wait);
-   mask = 0;
+   int writable;
+   __poll_t mask = 0;
 
/* exceptional events? */
if (sk->sk_err || !skb_queue_empty(&sk->sk_error_queue))
@@ -2709,7 +2701,7 @@ static __poll_t unix_dgram_poll(struct file *file, struct socket *sock,
}
 
/* No write status requested, avoid expensive OUT tests. */
-   if (!(poll_requested_events(wait) & (EPOLLWRBAND|EPOLLWRNORM|EPOLLOUT)))
+   if (!(events & (EPOLLWRBAND|EPOLLWRNORM|EPOLLOUT)))
return mask;
 
writable = unix_writable(sk);
-- 
2.14.2
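
The last hunk of the patch above shows what the events argument is for: a
->poll_mask instance can look at the requested events directly and skip work
nobody asked for. A small user-space model of that pattern, not kernel code;
struct model_sock and model_poll_mask are made-up names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/epoll.h>

/* Toy stand-in for a socket; writable_checks counts how often the
 * (notionally expensive) writability test was actually run. */
struct model_sock {
	bool readable;
	bool writable;
	unsigned int writable_checks;
};

static unsigned int model_poll_mask(struct model_sock *sk, unsigned int events)
{
	unsigned int mask = 0;

	assert(sk != NULL);

	if (sk->readable)
		mask |= EPOLLIN | EPOLLRDNORM;

	/* No write status requested, avoid the expensive OUT test. */
	if (!(events & (EPOLLWRBAND | EPOLLWRNORM | EPOLLOUT)))
		return mask;

	sk->writable_checks++;
	if (sk->writable)
		mask |= EPOLLOUT | EPOLLWRNORM | EPOLLWRBAND;

	return mask;
}
```

A caller that only asks for EPOLLIN never triggers the writability test,
which mirrors what the old code derived from poll_requested_events(wait).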

[PATCH 07/28] net: refactor socket_poll

2018-03-21 Thread Christoph Hellwig

Factor out two busy poll related helpers for later reuse, and remove
a comment that isn't very helpful, especially with the __poll_t
annotations in place.

Signed-off-by: Christoph Hellwig 
---
 include/net/busy_poll.h | 15 +++
 net/socket.c| 21 -
 2 files changed, 19 insertions(+), 17 deletions(-)

diff --git a/include/net/busy_poll.h b/include/net/busy_poll.h
index 71c72a939bf8..c5187438af38 100644
--- a/include/net/busy_poll.h
+++ b/include/net/busy_poll.h
@@ -121,6 +121,21 @@ static inline void sk_busy_loop(struct sock *sk, int nonblock)
 #endif
 }
 
+static inline void sock_poll_busy_loop(struct socket *sock, __poll_t events)
+{
+   if (sk_can_busy_loop(sock->sk) &&
+   events && (events & POLL_BUSY_LOOP)) {
+   /* once, only if requested by syscall */
+   sk_busy_loop(sock->sk, 1);
+   }
+}
+
+/* if this socket can poll_ll, tell the system call */
+static inline __poll_t sock_poll_busy_flag(struct socket *sock)
+{
+   return sk_can_busy_loop(sock->sk) ? POLL_BUSY_LOOP : 0;
+}
+
 /* used in the NIC receive handler to mark the skb */
 static inline void skb_mark_napi_id(struct sk_buff *skb,
struct napi_struct *napi)
diff --git a/net/socket.c b/net/socket.c
index a93c99b518ca..3f859a07641a 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -1117,24 +1117,11 @@ EXPORT_SYMBOL(sock_create_lite);
 /* No kernel lock held - perfect */
 static __poll_t sock_poll(struct file *file, poll_table *wait)
 {
-   __poll_t busy_flag = 0;
-   struct socket *sock;
-
-   /*
-*  We can't return errors to poll, so it's either yes or no.
-*/
-   sock = file->private_data;
-
-   if (sk_can_busy_loop(sock->sk)) {
-   /* this socket can poll_ll so tell the system call */
-   busy_flag = POLL_BUSY_LOOP;
-
-   /* once, only if requested by syscall */
-   if (wait && (wait->_key & POLL_BUSY_LOOP))
-   sk_busy_loop(sock->sk, 1);
-   }
+   struct socket *sock = file->private_data;
+   __poll_t events = poll_requested_events(wait);
 
-   return busy_flag | sock->ops->poll(file, sock, wait);
+   sock_poll_busy_loop(sock, events);
+   return sock->ops->poll(file, sock, wait) | sock_poll_busy_flag(sock);
 }
 
 static int sock_mmap(struct file *file, struct vm_area_struct *vma)
-- 
2.14.2

[PATCH 05/28] fs: introduce new ->get_poll_head and ->poll_mask methods

2018-03-21 Thread Christoph Hellwig

->get_poll_head returns the waitqueue that the poll operation is going
to sleep on.  Note that this means we can only use a single waitqueue
for the poll, unlike some current drivers that use two waitqueues for
different events.  But now that we have keyed wakeups and heavily use
those for poll there aren't that many good reasons left to keep the
multiple waitqueues, and if there are any, ->poll is still around; such a
driver just won't support aio poll.

Signed-off-by: Christoph Hellwig 
---
 Documentation/filesystems/Locking |  7 ++-
 Documentation/filesystems/vfs.txt | 13 +
 fs/select.c   | 28 
 include/linux/fs.h|  2 ++
 include/linux/poll.h  | 27 +++
 5 files changed, 72 insertions(+), 5 deletions(-)

diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index 220bba28f72b..6d227f9d7bd9 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -440,6 +440,8 @@ prototypes:
ssize_t (*write_iter) (struct kiocb *, struct iov_iter *);
int (*iterate) (struct file *, struct dir_context *);
__poll_t (*poll) (struct file *, struct poll_table_struct *);
+   struct wait_queue_head * (*get_poll_head)(struct file *, __poll_t);
+   __poll_t (*poll_mask) (struct file *, __poll_t);
long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
int (*mmap) (struct file *, struct vm_area_struct *);
@@ -470,7 +472,7 @@ prototypes:
 };
 
 locking rules:
-   All may block.
+   All except for ->poll_mask may block.
 
 ->llseek() locking has moved from llseek to the individual llseek
 implementations.  If your fs is not using generic_file_llseek, you
@@ -498,6 +500,9 @@ in sys_read() and friends.
 the lease within the individual filesystem to record the result of the
 operation
 
+->poll_mask can be called with or without the waitqueue lock for the waitqueue
+returned from ->get_poll_head.
+
 --- dquot_operations ---
 prototypes:
int (*write_dquot) (struct dquot *);
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index f608180ad59d..50ee13563271 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -857,6 +857,8 @@ struct file_operations {
ssize_t (*write_iter) (struct kiocb *, struct iov_iter *);
int (*iterate) (struct file *, struct dir_context *);
__poll_t (*poll) (struct file *, struct poll_table_struct *);
+   struct wait_queue_head * (*get_poll_head)(struct file *, __poll_t);
+   __poll_t (*poll_mask) (struct file *, __poll_t);
long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
int (*mmap) (struct file *, struct vm_area_struct *);
@@ -901,6 +903,17 @@ otherwise noted.
activity on this file and (optionally) go to sleep until there
is activity. Called by the select(2) and poll(2) system calls
 
+  get_poll_head: Returns the struct wait_queue_head that poll, select,
+  epoll or aio poll should wait on in case this instance only has single
+  waitqueue.  Can return NULL to indicate polling is not supported,
+  or a POLL* value using the POLL_TO_PTR helper in case a grave error
+  occured and ->poll_mask shall not be called.
+
+  poll_mask: return the mask of POLL* values describing the file descriptor
+  state.  Called either before going to sleep on the waitqueue returned by
+  get_poll_head, or after it has been woken.  If ->get_poll_head and
+  ->poll_mask are implemented ->poll does not need to be implement.
+
   unlocked_ioctl: called by the ioctl(2) system call.
 
   compat_ioctl: called by the ioctl(2) system call when 32 bit system calls
diff --git a/fs/select.c b/fs/select.c
index ba91103707ea..cc270d7f6192 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -34,6 +34,34 @@
 
 #include 
 
+__poll_t vfs_poll(struct file *file, struct poll_table_struct *pt)
+{
+   unsigned int events = poll_requested_events(pt);
+   struct wait_queue_head *head;
+
+   if (unlikely(!file_can_poll(file)))
+   return DEFAULT_POLLMASK;
+
+   if (file->f_op->poll)
+   return file->f_op->poll(file, pt);
+
+   /*
+* Only get the poll head and do the first mask check if we are actually
+* going to sleep on this file:
+*/
+   if (pt && pt->_qproc) {
+   head = vfs_get_poll_head(file, events);
+   if (!head)
+   return DEFAULT_POLLMASK;
+   if (IS_ERR(head))
+   return PTR_TO_POLL(head);
+
+   pt->_qproc(file, head, pt);
+   }
+
+   return file->f_op->poll_mask(file, events);
+}
+EXPORT_SYMBOL_G
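
The dispatch order in the vfs_poll() hunk above (legacy ->poll first, then
->get_poll_head plus ->poll_mask) can be modelled in user space. This is a
simplified sketch, not the kernel implementation: all model_* and example_*
names are invented, the will_sleep flag stands in for "pt && pt->_qproc", the
file_can_poll() condition is an assumption, and the error-pointer path
(IS_ERR/PTR_TO_POLL) is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/epoll.h>

struct model_waitq { int nr_waiters; };
struct model_file;

struct model_fops {
	unsigned int (*poll)(struct model_file *f, int will_sleep);
	struct model_waitq *(*get_poll_head)(struct model_file *f,
					     unsigned int events);
	unsigned int (*poll_mask)(struct model_file *f, unsigned int events);
};

struct model_file {
	const struct model_fops *ops;
	struct model_waitq wq;
	unsigned int ready;	/* events currently pending */
};

#define MODEL_DEFAULT_POLLMASK \
	(EPOLLIN | EPOLLOUT | EPOLLRDNORM | EPOLLWRNORM)

/* Assumed condition: either the old method or both new ones exist. */
static int model_file_can_poll(const struct model_file *f)
{
	return f->ops->poll || (f->ops->get_poll_head && f->ops->poll_mask);
}

static unsigned int model_vfs_poll(struct model_file *f, unsigned int events,
				   int will_sleep)
{
	assert(f != NULL && f->ops != NULL);

	if (!model_file_can_poll(f))
		return MODEL_DEFAULT_POLLMASK;

	if (f->ops->poll)		/* legacy method still wins */
		return f->ops->poll(f, will_sleep);

	/* Only fetch the wait queue if the caller is going to sleep. */
	if (will_sleep) {
		struct model_waitq *head = f->ops->get_poll_head(f, events);

		if (!head)
			return MODEL_DEFAULT_POLLMASK;
		head->nr_waiters++;	/* stands in for pt->_qproc() */
	}

	return f->ops->poll_mask(f, events);
}

/* Example instance with a single wait queue and no legacy ->poll. */
static struct model_waitq *example_get_poll_head(struct model_file *f,
						 unsigned int events)
{
	(void)events;
	return &f->wq;
}

static unsigned int example_poll_mask(struct model_file *f,
				      unsigned int events)
{
	(void)events;
	return f->ready;
}

static const struct model_fops example_fops = {
	.poll = NULL,
	.get_poll_head = example_get_poll_head,
	.poll_mask = example_poll_mask,
};

static const struct model_fops no_poll_fops = { NULL, NULL, NULL };
```

In this model a sleeping caller is registered on the wait queue exactly once
per call, while a non-sleeping caller only gets the current mask.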

Re: [PATCH 1/1] lz4: Implement lz4 with dynamic offset length.

2018-03-21 Thread Sergey Senozhatsky

On (03/21/18 10:10), Maninder Singh wrote:
[..]
> +static struct crypto_alg alg_lz4_dyn = {
> + .cra_name   = "lz4_dyn",
> + .cra_flags  = CRYPTO_ALG_TYPE_COMPRESS,
> + .cra_ctxsize= sizeof(struct lz4_ctx),
> + .cra_module = THIS_MODULE,
> + .cra_list   = LIST_HEAD_INIT(alg_lz4_dyn.cra_list),
> + .cra_init   = lz4_init,
> + .cra_exit   = lz4_exit,
> + .cra_u  = { .compress = {
> + .coa_compress   = lz4_compress_crypto_dynamic,
> + .coa_decompress = lz4_decompress_crypto_dynamic } }
> +};

[..]

> diff --git a/drivers/block/zram/zcomp.c b/drivers/block/zram/zcomp.c
> index 4ed0a78..5bc5aab 100644
> --- a/drivers/block/zram/zcomp.c
> +++ b/drivers/block/zram/zcomp.c
> @@ -17,11 +17,15 @@
>  #include 
>  
>  #include "zcomp.h"
> +#define KB   (1 << 10)
>  
>  static const char * const backends[] = {
>   "lzo",
>  #if IS_ENABLED(CONFIG_CRYPTO_LZ4)
>   "lz4",
> +#if (PAGE_SIZE < (32 * KB))
> + "lz4_dyn",
> +#endif

This is not the list of supported algorithms. It's the list of
recommended algorithms. You can configure zram to use any of the
algorithms that are available and known to the Crypto API, including
lz4_dyn on PAGE_SIZE > 32K systems.

-ss

[PATCH 04/28] fs: add new vfs_poll and file_can_poll helpers

2018-03-21 Thread Christoph Hellwig

These abstract out calls to the poll method in preparation for changes
in how we poll.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Darrick J. Wong 
---
 drivers/staging/comedi/drivers/serial2002.c |  4 ++--
 drivers/vfio/virqfd.c   |  2 +-
 drivers/vhost/vhost.c   |  2 +-
 fs/eventpoll.c  |  5 ++---
 fs/select.c | 23 ---
 include/linux/poll.h| 12 
 mm/memcontrol.c |  2 +-
 net/9p/trans_fd.c   | 18 --
 virt/kvm/eventfd.c  |  2 +-
 9 files changed, 32 insertions(+), 38 deletions(-)

diff --git a/drivers/staging/comedi/drivers/serial2002.c b/drivers/staging/comedi/drivers/serial2002.c
index b3f3b4a201af..5471b2212a62 100644
--- a/drivers/staging/comedi/drivers/serial2002.c
+++ b/drivers/staging/comedi/drivers/serial2002.c
@@ -113,7 +113,7 @@ static void serial2002_tty_read_poll_wait(struct file *f, 
int timeout)
long elapsed;
__poll_t mask;
 
-   mask = f->f_op->poll(f, &table.pt);
+   mask = vfs_poll(f, &table.pt);
if (mask & (EPOLLRDNORM | EPOLLRDBAND | EPOLLIN |
EPOLLHUP | EPOLLERR)) {
break;
@@ -136,7 +136,7 @@ static int serial2002_tty_read(struct file *f, int timeout)
 
result = -1;
if (!IS_ERR(f)) {
-   if (f->f_op->poll) {
+   if (file_can_poll(f)) {
serial2002_tty_read_poll_wait(f, timeout);
 
if (kernel_read(f, &ch, 1, &pos) == 1)
diff --git a/drivers/vfio/virqfd.c b/drivers/vfio/virqfd.c
index 085700f1be10..2a1be859ee71 100644
--- a/drivers/vfio/virqfd.c
+++ b/drivers/vfio/virqfd.c
@@ -166,7 +166,7 @@ int vfio_virqfd_enable(void *opaque,
init_waitqueue_func_entry(&virqfd->wait, virqfd_wakeup);
init_poll_funcptr(&virqfd->pt, virqfd_ptable_queue_proc);
 
-   events = irqfd.file->f_op->poll(irqfd.file, &virqfd->pt);
+   events = vfs_poll(irqfd.file, &virqfd->pt);
 
/*
 * Check if there was an event already pending on the eventfd
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 1b3e8d2d5c8b..4d27e288bb1d 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -208,7 +208,7 @@ int vhost_poll_start(struct vhost_poll *poll, struct file 
*file)
if (poll->wqh)
return 0;
 
-   mask = file->f_op->poll(file, &poll->table);
+   mask = vfs_poll(file, &poll->table);
if (mask)
vhost_poll_wakeup(&poll->wait, 0, 0, poll_to_key(mask));
if (mask & EPOLLERR) {
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 0f3494ed3ed0..2bebae5a38cf 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -884,8 +884,7 @@ static __poll_t ep_item_poll(const struct epitem *epi, 
poll_table *pt,
 
pt->_key = epi->event.events;
if (!is_file_epoll(epi->ffd.file))
-   return epi->ffd.file->f_op->poll(epi->ffd.file, pt) &
-  epi->event.events;
+   return vfs_poll(epi->ffd.file, pt) & epi->event.events;
 
ep = epi->ffd.file->private_data;
poll_wait(epi->ffd.file, &ep->poll_wait, pt);
@@ -2020,7 +2019,7 @@ SYSCALL_DEFINE4(epoll_ctl, int, epfd, int, op, int, fd,
 
/* The target file descriptor must support poll */
error = -EPERM;
-   if (!tf.file->f_op->poll)
+   if (!file_can_poll(tf.file))
goto error_tgt_fput;
 
/* Check if EPOLLWAKEUP is allowed */
diff --git a/fs/select.c b/fs/select.c
index c6c504a814f9..ba91103707ea 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -502,14 +502,10 @@ static int do_select(int n, fd_set_bits *fds, struct 
timespec64 *end_time)
continue;
f = fdget(i);
if (f.file) {
-   const struct file_operations *f_op;
-   f_op = f.file->f_op;
-   mask = DEFAULT_POLLMASK;
-   if (f_op->poll) {
-   wait_key_set(wait, in, out,
-bit, busy_flag);
-   mask = (*f_op->poll)(f.file, 
wait);
-   }
+   wait_key_set(wait, in, out, bit,
+busy_flag);
+   mask = vfs_poll(f.file, wait);
+
fdput(f);
if ((mask & POLLIN_SET) && (in & bit)) {
res_in |= bit;
@@ -819,13 +81

[PATCH 02/28] fs: cleanup do_pollfd

2018-03-21 Thread Christoph Hellwig

Use straight-line code with failure handling gotos instead of a lot
of nested conditionals.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Darrick J. Wong 
---
 fs/select.c | 48 +++-
 1 file changed, 23 insertions(+), 25 deletions(-)

diff --git a/fs/select.c b/fs/select.c
index 686de7b3a1db..c6c504a814f9 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -806,34 +806,32 @@ static inline __poll_t do_pollfd(struct pollfd *pollfd, 
poll_table *pwait,
 bool *can_busy_poll,
 __poll_t busy_flag)
 {
-   __poll_t mask;
-   int fd;
-
-   mask = 0;
-   fd = pollfd->fd;
-   if (fd >= 0) {
-   struct fd f = fdget(fd);
-   mask = EPOLLNVAL;
-   if (f.file) {
-   /* userland u16 ->events contains POLL... bitmap */
-   __poll_t filter = demangle_poll(pollfd->events) |
-   EPOLLERR | EPOLLHUP;
-   mask = DEFAULT_POLLMASK;
-   if (f.file->f_op->poll) {
-   pwait->_key = filter;
-   pwait->_key |= busy_flag;
-   mask = f.file->f_op->poll(f.file, pwait);
-   if (mask & busy_flag)
-   *can_busy_poll = true;
-   }
-   /* Mask out unneeded events. */
-   mask &= filter;
-   fdput(f);
-   }
+   int fd = pollfd->fd;
+   __poll_t mask = 0, filter;
+   struct fd f;
+
+   if (fd < 0)
+   goto out;
+   mask = EPOLLNVAL;
+   f = fdget(fd);
+   if (!f.file)
+   goto out;
+
+   /* userland u16 ->events contains POLL... bitmap */
+   filter = demangle_poll(pollfd->events) | EPOLLERR | EPOLLHUP;
+   mask = DEFAULT_POLLMASK;
+   if (f.file->f_op->poll) {
+   pwait->_key = filter | busy_flag;
+   mask = f.file->f_op->poll(f.file, pwait);
+   if (mask & busy_flag)
+   *can_busy_poll = true;
}
+   mask &= filter; /* Mask out unneeded events. */
+   fdput(f);
+
+out:
/* ... and so does ->revents */
pollfd->revents = mangle_poll(mask);
-
return mask;
 }
 
-- 
2.14.2

[PATCH 01/28] fs: unexport poll_schedule_timeout

2018-03-21 Thread Christoph Hellwig

No users outside of select.c.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Darrick J. Wong 
---
 fs/select.c  | 3 +--
 include/linux/poll.h | 2 --
 2 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/fs/select.c b/fs/select.c
index b6c36254028a..686de7b3a1db 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -233,7 +233,7 @@ static void __pollwait(struct file *filp, wait_queue_head_t 
*wait_address,
add_wait_queue(wait_address, &entry->wait);
 }
 
-int poll_schedule_timeout(struct poll_wqueues *pwq, int state,
+static int poll_schedule_timeout(struct poll_wqueues *pwq, int state,
  ktime_t *expires, unsigned long slack)
 {
int rc = -EINTR;
@@ -258,7 +258,6 @@ int poll_schedule_timeout(struct poll_wqueues *pwq, int 
state,
 
return rc;
 }
-EXPORT_SYMBOL(poll_schedule_timeout);
 
 /**
  * poll_select_set_timeout - helper function to setup the timeout value
diff --git a/include/linux/poll.h b/include/linux/poll.h
index f45ebd017eaa..a3576da63377 100644
--- a/include/linux/poll.h
+++ b/include/linux/poll.h
@@ -96,8 +96,6 @@ struct poll_wqueues {
 
 extern void poll_initwait(struct poll_wqueues *pwq);
 extern void poll_freewait(struct poll_wqueues *pwq);
-extern int poll_schedule_timeout(struct poll_wqueues *pwq, int state,
-ktime_t *expires, unsigned long slack);
 extern u64 select_estimate_accuracy(struct timespec64 *tv);
 
 #define MAX_INT64_SECONDS (((s64)(~((u64)0)>>1)/HZ)-1)
-- 
2.14.2

Re: [RFC PATCH v2 2/4] mm/__free_one_page: skip merge for order-0 page unless compaction failed

2018-03-21 Thread Vlastimil Babka

On 03/20/2018 03:11 PM, Aaron Lu wrote:
> On Tue, Mar 20, 2018 at 12:45:50PM +0100, Vlastimil Babka wrote:
>> But why, with all the prefetching in place?
> 
> The prefetch is just for its order 0 buddy, if merge happens, then its
> order 1 buddy will also be checked and on and on, so the cache misses
> are much more in merge mode.

I see.

>> Not thrilled about such disruptive change in the name of a
>> microbenchmark :/ Shouldn't normally the pcplists hide the overhead?
> 
> Sadly, with the default pcp count, it didn't avoid the lock contention.
> We can of course increase pcp->count to a large enough value to avoid
> entering buddy and thus avoid zone->lock contention, but that would
> require admin to manually change the value on a per-machine per-workload
> basis I believe.

Well, anyone who really cares about performance has to invest some time
to tuning anyway, I believe?

>> If not, wouldn't it make more sense to turn zone->lock into a range lock?
> 
> Not familiar with range lock, will need to take a look at it, thanks for
> the pointer.

The suggestion was rather quick and not well thought-out. Range lock
itself is insufficient - for merging/splitting buddies it's ok for
working with struct pages because the candidate buddies are within a
MAX_ORDER range. But the freelists contain pages from the whole zone.

>>
>>> A new document file called "struct_page_field" is added to explain
>>> the newly reused field in "struct page".
>>
>> Sounds rather ad-hoc for a single field, I'd rather document it via
>> comments.
> 
> Dave would like to have a document to explain all those "struct page"
> fields that are repurposed under different scenarios and this is the
> very start of the document :-)

Oh, I see.

> I probably should have explained the intent of the document more.
> 
> Thanks for taking a look at this.
> 
>>> Suggested-by: Dave Hansen 
>>> Signed-off-by: Aaron Lu 
>>> ---
>>>  Documentation/vm/struct_page_field |  5 +++
>>>  include/linux/mm_types.h   |  1 +
>>>  mm/compaction.c| 13 +-
>>>  mm/internal.h  | 27 
>>>  mm/page_alloc.c| 89 
>>> +-
>>>  5 files changed, 122 insertions(+), 13 deletions(-)
>>>  create mode 100644 Documentation/vm/struct_page_field
>>>

[PATCH V2 3/4] clk: add managed version of clk_bulk_get_all

2018-03-21 Thread Dong Aisheng

This patch introduces the managed version of clk_bulk_get_all.

Cc: Michael Turquette 
Cc: Stephen Boyd 
Signed-off-by: Dong Aisheng 
---
v1->v2:
 * new patch
---
 drivers/clk/clk-devres.c | 24 
 include/linux/clk.h  | 23 +++
 2 files changed, 47 insertions(+)

diff --git a/drivers/clk/clk-devres.c b/drivers/clk/clk-devres.c
index d854e26..6d3ca5e 100644
--- a/drivers/clk/clk-devres.c
+++ b/drivers/clk/clk-devres.c
@@ -70,6 +70,30 @@ int __must_check devm_clk_bulk_get(struct device *dev, int 
num_clks,
 }
 EXPORT_SYMBOL_GPL(devm_clk_bulk_get);
 
+int __must_check devm_clk_bulk_get_all(struct device *dev,
+  struct clk_bulk_data **clks)
+{
+   struct clk_bulk_devres *devres;
+   int ret;
+
+   devres = devres_alloc(devm_clk_bulk_release,
+ sizeof(*devres), GFP_KERNEL);
+   if (!devres)
+   return -ENOMEM;
+
+   ret = clk_bulk_get_all(dev, clks);
+   if (ret > 0) {
+   devres->clks = *clks;
+   devres->num_clks = ret;
+   devres_add(dev, devres);
+   } else {
+   devres_free(devres);
+   }
+
+   return ret;
+}
+EXPORT_SYMBOL_GPL(devm_clk_bulk_get_all);
+
 static int devm_clk_match(struct device *dev, void *res, void *data)
 {
struct clk **c = res;
diff --git a/include/linux/clk.h b/include/linux/clk.h
index a76fdff..fe48e01 100644
--- a/include/linux/clk.h
+++ b/include/linux/clk.h
@@ -313,6 +313,22 @@ int __must_check clk_bulk_get_all(struct device *dev,
  */
 int __must_check devm_clk_bulk_get(struct device *dev, int num_clks,
   struct clk_bulk_data *clks);
+/**
+ * devm_clk_bulk_get_all - managed get multiple clk consumers
+ * @dev: device for clock "consumer"
+ * @clks: pointer to the clk_bulk_data table of consumer
+ *
+ * Returns a positive value for the number of clocks obtained while the
+ * clock references are stored in the clk_bulk_data table in @clks field.
+ * Returns 0 if there're none and a negative value if something failed.
+ *
+ * This helper function allows drivers to get several clk
+ * consumers in one operation with management, the clks will
+ * automatically be freed when the device is unbound.
+ */
+
+int __must_check devm_clk_bulk_get_all(struct device *dev,
+  struct clk_bulk_data **clks);
 
 /**
  * devm_clk_get - lookup and obtain a managed reference to a clock producer.
@@ -658,6 +674,13 @@ static inline int __must_check devm_clk_bulk_get(struct 
device *dev, int num_clk
return 0;
 }
 
+static inline int __must_check devm_clk_bulk_get_all(struct device *dev,
+struct clk_bulk_data **clks)
+{
+
+   return 0;
+}
+
 static inline struct clk *devm_get_clk_from_child(struct device *dev,
struct device_node *np, const char *con_id)
 {
-- 
2.7.4

[PATCH V2 1/4] clk: bulk: add of_clk_bulk_get()

2018-03-21 Thread Dong Aisheng

'clock-names' property is optional in DT, so of_clk_bulk_get() is
introduced here to handle this for DT users without 'clock-names'
specified. Later clk_bulk_get_all() will be implemented on top of
it and this API will be kept private until someone proves they need
it because they don't have a struct device pointer.

Cc: Stephen Boyd 
Cc: Michael Turquette 
Cc: Russell King 
Reported-by: Shawn Guo 
Signed-off-by: Dong Aisheng 
---
 drivers/clk/clk-bulk.c | 32 
 1 file changed, 32 insertions(+)

diff --git a/drivers/clk/clk-bulk.c b/drivers/clk/clk-bulk.c
index 4c10456..4b357b2 100644
--- a/drivers/clk/clk-bulk.c
+++ b/drivers/clk/clk-bulk.c
@@ -19,6 +19,38 @@
 #include 
 #include 
 #include 
+#include 
+
+#if defined(CONFIG_OF) && defined(CONFIG_COMMON_CLK)
+static int __must_check of_clk_bulk_get(struct device_node *np, int num_clks,
+   struct clk_bulk_data *clks)
+{
+   int ret;
+   int i;
+
+   for (i = 0; i < num_clks; i++)
+   clks[i].clk = NULL;
+
+   for (i = 0; i < num_clks; i++) {
+   clks[i].clk = of_clk_get(np, i);
+   if (IS_ERR(clks[i].clk)) {
+   ret = PTR_ERR(clks[i].clk);
+   pr_err("%pOF: Failed to get clk index: %d ret: %d\n",
+  np, i, ret);
+   clks[i].clk = NULL;
+   goto err;
+   }
+   }
+
+   return 0;
+
+err:
+   clk_bulk_put(i, clks);
+
+   return ret;
+}
+EXPORT_SYMBOL(of_clk_bulk_get);
+#endif
 
 void clk_bulk_put(int num_clks, struct clk_bulk_data *clks)
 {
-- 
2.7.4

[PATCH V2 4/4] video: simplefb: switch to use clk_bulk API to simplify clock operations

2018-03-21 Thread Dong Aisheng

Switch to the clk_bulk API to simplify clock operations.

Cc: Hans de Goede 
Cc: Bartlomiej Zolnierkiewicz 
Cc: linux-fb...@vger.kernel.org
Cc: Masahiro Yamada 
Cc: Stephen Boyd 
Signed-off-by: Dong Aisheng 
---
v1->v2:
 * switch to clk_bulk_get_all from of_clk_bulk_get_all
---
 drivers/video/fbdev/simplefb.c | 69 --
 1 file changed, 13 insertions(+), 56 deletions(-)

diff --git a/drivers/video/fbdev/simplefb.c b/drivers/video/fbdev/simplefb.c
index a3c44ec..3c8124e 100644
--- a/drivers/video/fbdev/simplefb.c
+++ b/drivers/video/fbdev/simplefb.c
@@ -182,7 +182,7 @@ struct simplefb_par {
 #if defined CONFIG_OF && defined CONFIG_COMMON_CLK
bool clks_enabled;
unsigned int clk_count;
-   struct clk **clks;
+   struct clk_bulk_data *clks;
 #endif
 #if defined CONFIG_OF && defined CONFIG_REGULATOR
bool regulators_enabled;
@@ -214,37 +214,13 @@ static int simplefb_clocks_get(struct simplefb_par *par,
   struct platform_device *pdev)
 {
struct device_node *np = pdev->dev.of_node;
-   struct clk *clock;
-   int i;
 
if (dev_get_platdata(&pdev->dev) || !np)
return 0;
 
-   par->clk_count = of_clk_get_parent_count(np);
-   if (!par->clk_count)
-   return 0;
-
-   par->clks = kcalloc(par->clk_count, sizeof(struct clk *), GFP_KERNEL);
-   if (!par->clks)
-   return -ENOMEM;
-
-   for (i = 0; i < par->clk_count; i++) {
-   clock = of_clk_get(np, i);
-   if (IS_ERR(clock)) {
-   if (PTR_ERR(clock) == -EPROBE_DEFER) {
-   while (--i >= 0) {
-   if (par->clks[i])
-   clk_put(par->clks[i]);
-   }
-   kfree(par->clks);
-   return -EPROBE_DEFER;
-   }
-   dev_err(&pdev->dev, "%s: clock %d not found: %ld\n",
-   __func__, i, PTR_ERR(clock));
-   continue;
-   }
-   par->clks[i] = clock;
-   }
+   par->clk_count = clk_bulk_get_all(&pdev->dev, &par->clks);
+   if ((par->clk_count < 0) && (par->clk_count == -EPROBE_DEFER))
+   return -EPROBE_DEFER;
 
return 0;
 }
@@ -252,45 +228,26 @@ static int simplefb_clocks_get(struct simplefb_par *par,
 static void simplefb_clocks_enable(struct simplefb_par *par,
   struct platform_device *pdev)
 {
-   int i, ret;
+   int ret;
+
+   ret = clk_bulk_prepare_enable(par->clk_count, par->clks);
+   if (ret)
+   dev_warn(&pdev->dev, "failed to enable clocks\n");
 
-   for (i = 0; i < par->clk_count; i++) {
-   if (par->clks[i]) {
-   ret = clk_prepare_enable(par->clks[i]);
-   if (ret) {
-   dev_err(&pdev->dev,
-   "%s: failed to enable clock %d: %d\n",
-   __func__, i, ret);
-   clk_put(par->clks[i]);
-   par->clks[i] = NULL;
-   }
-   }
-   }
par->clks_enabled = true;
 }
 
 static void simplefb_clocks_destroy(struct simplefb_par *par)
 {
-   int i;
-
-   if (!par->clks)
-   return;
+   if (par->clks_enabled)
+   clk_bulk_disable_unprepare(par->clk_count, par->clks);
 
-   for (i = 0; i < par->clk_count; i++) {
-   if (par->clks[i]) {
-   if (par->clks_enabled)
-   clk_disable_unprepare(par->clks[i]);
-   clk_put(par->clks[i]);
-   }
-   }
-
-   kfree(par->clks);
+   clk_bulk_put_all(par->clk_count, par->clks);
 }
 #else
 static int simplefb_clocks_get(struct simplefb_par *par,
struct platform_device *pdev) { return 0; }
-static void simplefb_clocks_enable(struct simplefb_par *par,
-   struct platform_device *pdev) { }
+static int simplefb_clocks_enable(struct simplefb_par *par) { }
 static void simplefb_clocks_destroy(struct simplefb_par *par) { }
 #endif
 
-- 
2.7.4

[PATCH V2 2/4] clk: add new APIs to operate on all available clocks

2018-03-21 Thread Dong Aisheng

This patch introduces the of_clk_bulk_get_all and clk_bulk_x_all APIs
for users who just want to handle all available clocks from the device
tree without needing to know detailed clock information such as clock
numbers and names. This is useful when writing generic drivers that
need to handle the clock part.

Cc: Stephen Boyd 
Cc: Masahiro Yamada 
Signed-off-by: Dong Aisheng 
---
v1->v2:
 * make of_clk_bulk_get_all private
 * add clk_bulk_get/put_all
---
 drivers/clk/clk-bulk.c | 57 ++
 include/linux/clk.h| 42 -
 2 files changed, 98 insertions(+), 1 deletion(-)

diff --git a/drivers/clk/clk-bulk.c b/drivers/clk/clk-bulk.c
index 4b357b2..3293c6b 100644
--- a/drivers/clk/clk-bulk.c
+++ b/drivers/clk/clk-bulk.c
@@ -17,9 +17,11 @@
  */
 
 #include 
+#include 
 #include 
 #include 
 #include 
+#include 
 
 #if defined(CONFIG_OF) && defined(CONFIG_COMMON_CLK)
 static int __must_check of_clk_bulk_get(struct device_node *np, int num_clks,
@@ -50,6 +52,38 @@ static int __must_check of_clk_bulk_get(struct device_node 
*np, int num_clks,
return ret;
 }
 EXPORT_SYMBOL(of_clk_bulk_get);
+
+static int __must_check of_clk_bulk_get_all(struct device_node *np,
+   struct clk_bulk_data **clks)
+{
+   struct clk_bulk_data *clk_bulk;
+   int num_clks;
+   int ret;
+
+   num_clks = of_clk_get_parent_count(np);
+   if (!num_clks)
+   return 0;
+
+   clk_bulk = kcalloc(num_clks, sizeof(*clk_bulk), GFP_KERNEL);
+   if (!clk_bulk)
+   return -ENOMEM;
+
+   ret = of_clk_bulk_get(np, num_clks, clk_bulk);
+   if (ret) {
+   kfree(clk_bulk);
+   return ret;
+   }
+
+   *clks = clk_bulk;
+
+   return num_clks;
+}
+#else
+static int __must_check of_clk_bulk_get_all(struct device_node *np,
+   struct clk_bulk_data **clks)
+{
+   return -ENOENT;
+}
 #endif
 
 void clk_bulk_put(int num_clks, struct clk_bulk_data *clks)
@@ -90,6 +124,29 @@ int __must_check clk_bulk_get(struct device *dev, int 
num_clks,
 }
 EXPORT_SYMBOL(clk_bulk_get);
 
+void clk_bulk_put_all(int num_clks, struct clk_bulk_data *clks)
+{
+   if (IS_ERR_OR_NULL(clks))
+   return;
+
+   clk_bulk_put(num_clks, clks);
+
+   kfree(clks);
+}
+EXPORT_SYMBOL(clk_bulk_put_all);
+
+int __must_check clk_bulk_get_all(struct device *dev,
+ struct clk_bulk_data **clks)
+{
+   struct device_node *np = dev_of_node(dev);
+
+   if (!np)
+   return 0;
+
+   return of_clk_bulk_get_all(np, clks);
+}
+EXPORT_SYMBOL(clk_bulk_get_all);
+
 #ifdef CONFIG_HAVE_CLK_PREPARE
 
 /**
diff --git a/include/linux/clk.h b/include/linux/clk.h
index 0dbd088..a76fdff 100644
--- a/include/linux/clk.h
+++ b/include/linux/clk.h
@@ -279,7 +279,26 @@ struct clk *clk_get(struct device *dev, const char *id);
  */
 int __must_check clk_bulk_get(struct device *dev, int num_clks,
  struct clk_bulk_data *clks);
-
+/**
+ * clk_bulk_get_all - lookup and obtain all available references to clock
+ *   producer.
+ * @dev: device for clock "consumer"
+ * @clks: pointer to the clk_bulk_data table of consumer
+ *
+ * This helper function allows drivers to get all clk consumers in one
+ * operation. If any of the clk cannot be acquired then any clks
+ * that were obtained will be freed before returning to the caller.
+ *
+ * Returns a positive value for the number of clocks obtained while the
+ * clock references are stored in the clk_bulk_data table in @clks field.
+ * Returns 0 if there're none and a negative value if something failed.
+ *
+ * Drivers must assume that the clock source is not enabled.
+ *
+ * clk_bulk_get_all should not be called from within interrupt context.
+ */
+int __must_check clk_bulk_get_all(struct device *dev,
+ struct clk_bulk_data **clks);
 /**
  * devm_clk_bulk_get - managed get multiple clk consumers
  * @dev: device for clock "consumer"
@@ -455,6 +474,19 @@ void clk_put(struct clk *clk);
 void clk_bulk_put(int num_clks, struct clk_bulk_data *clks);
 
 /**
+ * clk_bulk_put_all - "free" all the clock source
+ * @num_clks: the number of clk_bulk_data
+ * @clks: the clk_bulk_data table of consumer
+ *
+ * Note: drivers must ensure that all clk_bulk_enable calls made on this
+ * clock source are balanced by clk_bulk_disable calls prior to calling
+ * this function.
+ *
+ * clk_bulk_put_all should not be called from within interrupt context.
+ */
+void clk_bulk_put_all(int num_clks, struct clk_bulk_data *clks);
+
+/**
  * devm_clk_put- "free" a managed clock source
  * @dev: device used to acquire the clock
  * @clk: clock source acquired with devm_clk_get()
@@ -609,6 +641,12 @@ static inline int __must_check clk_bulk_get(struct device 
*dev, int num_clks,
return 0;
 }
 
+static inline int _

Re: [PATCH RESEND] rpmsg: Add driver_override device attribute for rpmsg_device

2018-03-21 Thread Anup Patel

On Mon, Mar 19, 2018 at 4:17 AM, Bjorn Andersson
 wrote:
> On Wed 10 Jan 05:17 PST 2018, Anup Patel wrote:
>
>> This patch adds "driver_override" device attribute for rpmsg_device which
>> will allow users to explicitly specify the rpmsg_driver to be used via
>> sysfs entry.
>>
>> The "driver_override" device attribute implemented here is very similar
>> to "driver_override" implemented for platform, pci, and amba bus types.
>>
>> One important use-case of "driver_override" device attribute is to force
>> use of rpmsg_chrdev driver for certain rpmsg_device instances.
>>
>
> I assume you mean specifically for the case where you want to prevent
> some kernel driver to probe for some given channel?

Yes, there are a few use-cases where we want to prevent a kernel
driver from binding and instead access the rpmsg device from user
space using the rpmsg_chrdev driver.

>
> The intention with rpmsg_char is that you through the rpmsg_ctrlX
> interface create and destroy endpoints dynamically, so you wouldn't need
> to use this mechanism to bind some specific channel to rpmsg_char.
>
>
> That said, this does make sense for completeness sake.
>
> [..]
>> diff --git a/drivers/rpmsg/rpmsg_core.c b/drivers/rpmsg/rpmsg_core.c
>> index dffa3aa..9a25e42 100644
>> --- a/drivers/rpmsg/rpmsg_core.c
>> +++ b/drivers/rpmsg/rpmsg_core.c
>> @@ -321,11 +321,11 @@ struct device *rpmsg_find_device(struct device *parent,
>>  }
>>  EXPORT_SYMBOL(rpmsg_find_device);
>>
>> -/* sysfs show configuration fields */
>> +/* sysfs configuration fields */
>>  #define rpmsg_show_attr(field, path, format_string)  \
>>  static ssize_t  
>>  \
>>  field##_show(struct device *dev, \
>> - struct device_attribute *attr, char *buf)   \
>> +  struct device_attribute *attr, char *buf)  \
>
> Seems unnecessary.

OK, I will drop these changes.

>
>>  {\
>>   struct rpmsg_device *rpdev = to_rpmsg_device(dev);  \
>>   \
>> @@ -333,11 +333,52 @@ field##_show(struct device *dev,   
>>  \
>>  }\
>>  static DEVICE_ATTR_RO(field);
>>
>> +#define rpmsg_string_attr(field, path)  
>>  \
>
> "path" is an odd name for these, I think it's a "member".
>
>> +static ssize_t  
>>  \
>> +field##_store(struct device *dev,\
>> +   struct device_attribute *attr, const char *buf, size_t sz)\
>
> field##_store(struct device *dev, struct device_attribute *attr,\
>   const char *buf, size_t sz)   \
>
> Is prettier

OK, I will update this.

>
>> +{\
>> + struct rpmsg_device *rpdev = to_rpmsg_device(dev);  \
>> + char *new, *old, *cp;   \
>> + \
>> + new = kstrndup(buf, sz, GFP_KERNEL);\
>> + if (!new)   \
>> + return -ENOMEM; \
>> + \
>> + cp = strchr(new, '\n'); \
>> + if (cp) \
>> + *cp = '\0'; \
>
> I prefer
>
> new[strcspn(new, "\n")] = '\0';

Sure, I will update this.
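
For reference, a minimal userspace sketch of the strcspn() idiom suggested
above. The helper name strip_newline() is made up for illustration and is
not part of the patch. strcspn() returns the index of the first '\n', or
the string length when there is none, so the assignment is safe either way.

```c
#include <string.h>

/* Hypothetical helper: cut the string at the first newline, if any. */
static void strip_newline(char *s)
{
	/*
	 * strcspn() yields the offset of the first '\n', or strlen(s)
	 * when no newline is present; in the latter case this just
	 * rewrites the existing terminator.
	 */
	s[strcspn(s, "\n")] = '\0';
}
```

Compared with the strchr() variant, this needs no NULL check and no
temporary pointer.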

Thanks,
Anup

Re: [PATCH] mtd: devices: check mtd_device_register() return code

2018-03-21 Thread Miquel Raynal

Hi Arushi,

On Wed, 21 Mar 2018 11:07:09 +0530, Arushi Singhal
 wrote:

> stfsm_probe() misses error handling of mtd_device_register().
> 
> Signed-off-by: Arushi Singhal 
> ---
>  drivers/mtd/devices/st_spi_fsm.c | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/mtd/devices/st_spi_fsm.c b/drivers/mtd/devices/st_spi_fsm.c
> index 7bc29d7..e1aa4f8 100644
> --- a/drivers/mtd/devices/st_spi_fsm.c
> +++ b/drivers/mtd/devices/st_spi_fsm.c
> @@ -2125,7 +2125,13 @@ static int stfsm_probe(struct platform_device *pdev)
>   (long long)fsm->mtd.size, (long long)(fsm->mtd.size >> 20),
>   fsm->mtd.erasesize, (fsm->mtd.erasesize >> 10));
>  
> - return mtd_device_register(&fsm->mtd, NULL, 0);
> + ret = mtd_device_register(&fsm->mtd, NULL, 0);
> + if (ret) {
> + pr_err("Failed to register device\n");
> + return ret;
> + }
> +
> + return 0;

I don't think this brings anything. However, if you want to fix
something you should jump below on error to disable the clock instead
of returning 'ret' directly.
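
A rough sketch of the error path described above, with stub functions
standing in for the kernel calls. All fake_* names and the -5 error value
are assumptions for illustration; this is not the st_spi_fsm code.

```c
/* Stubs standing in for the real kernel calls; illustrative only. */
static int clk_enabled;

static int fake_clk_prepare_enable(void)
{
	clk_enabled = 1;
	return 0;
}

static void fake_clk_disable_unprepare(void)
{
	clk_enabled = 0;
}

static int fake_mtd_device_register(int fail)
{
	return fail ? -5 : 0;
}

static int fake_probe(int fail_register)
{
	int ret;

	ret = fake_clk_prepare_enable();
	if (ret)
		return ret;

	ret = fake_mtd_device_register(fail_register);
	if (ret)
		goto err_clk_unprepare;	/* undo the clock enable on failure */

	return 0;

err_clk_unprepare:
	fake_clk_disable_unprepare();
	return ret;
}
```

The point is only that a failed registration leaves the clock disabled
again, which a direct 'return ret' would not do.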

>  
>  err_clk_unprepare:
>   clk_disable_unprepare(fsm->clk);

Thanks,
Miquèl

-- 
Miquel Raynal, Bootlin (formerly Free Electrons)
Embedded Linux and Kernel engineering
https://bootlin.com

Re: [PATCH] pinctrl: armada-37xx: Add edge both type gpio irq support

2018-03-21 Thread Linus Walleij

On Tue, Mar 20, 2018 at 10:56 PM, Uwe Kleine-König
 wrote:

> Maybe I'm wrong, but I wonder if there could be a set of helper
> functions provided by the gpio core that helps implementing this
> software simulation of IRQ_TYPE_EDGE_BOTH reliably (i.e. as good as
> possible in software) to prevent common mistakes.
>
> First draft:
>
>         disable_irq_nosync(...);
>         level = gpio_get(...);
> retry:
>         if (level)
>                 configure_for_falling_edge();
>         else
>                 configure_for_raising_edge();
>         postlevel = gpio_get(...);
>
>         if (level != postlevel) {
>                 mark_irq_pending(); /* something like desc->istate |= IRQS_PENDING */
>                 level = postlevel;
>                 goto retry;
>         }
>
>         enable_irq(...); /* this resends the irq */
>
> I think this only loses an event if there is an edge between gpio_get
> and the configure_for_${some}_edge and another before postlevel = ...
> that make the two events invisible. But I think this is okish, as a
> short spike might also be missed by a hw-edge-detector. And compared to
> the current code there should be no way to end in a state where we
> configured for raising edge and the level is already high.

This is looking good compared to the solutions people have hacked up.
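
The draft above can be modelled in plain C to check the retry logic. This
is only a sketch: struct fake_gpio, fake_gpio_get() and rearm_both_edges()
are hypothetical names, the GPIO is simulated by an array of samples, and
the irq disable/enable steps are left out.

```c
#include <stdbool.h>

enum edge { EDGE_FALLING, EDGE_RISING };

/* Simulated line: successive reads return successive samples. */
struct fake_gpio {
	const bool *samples;
	int nsamples;
	int pos;
	enum edge armed;	/* edge the "hardware" is configured for */
	bool pending;		/* set when a level change was seen while re-arming */
};

static bool fake_gpio_get(struct fake_gpio *g)
{
	bool level = g->samples[g->pos];

	/* Stay on the last sample once the array is exhausted. */
	if (g->pos < g->nsamples - 1)
		g->pos++;
	return level;
}

/* Arm for the opposite edge of the current level, retrying on a race. */
static void rearm_both_edges(struct fake_gpio *g)
{
	bool level = fake_gpio_get(g);

	for (;;) {
		bool postlevel;

		g->armed = level ? EDGE_FALLING : EDGE_RISING;
		postlevel = fake_gpio_get(g);
		if (postlevel == level)
			break;

		/* The line toggled while we were configuring: note it, retry. */
		g->pending = true;
		level = postlevel;
	}
}
```

With samples high, low, low the loop ends armed for a rising edge with the
pending flag set, so it never finishes configured for an edge that the
current level has already passed.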

> When the gpio toggles quickly this might keep the cpu busy in an endless
> loop, but such a sequence would also block a controller that can trigger
> on both edges in hardware. Not sure if breaking the loop at some point
> is sensible anyhow. Also calling the irq handlers would be beneficial,
> but I don't know if/how this works without (more) racing.

What would make sense (if you want a perfect solution) is to enforce
some reasonable debouncing on double edges.

That may seem hard to do since not all HW has debounce.

In the past I had the idea to implement also generic debounce with a timer
in gpiolib, so that gpiod_set_debounce() would never fail, so in effect
to factor the code from drivers/input/keyboard/gpio_keys.c
over to gpiolib so they don't need a fallback at all, and then with
double edges, enforce some debouncing based on HZ.

At one point I tried to bring the debounce code over from the
input driver, but I hit some snag, I don't remember what though.
An optional per-gpiod timer can be created in struct gpio_desc
when needed.

> A similar approach would be great to have to "simulate" level sensitive
> irqs if the hardware only implements edge logic (which affects
> armada-37xx, too, which annoys me).

Yes that would be neat too...

Yours,
Linus Walleij

Re: [PATCH RESEND] rpmsg: virtio_rpmsg_bus: fix rpmsg_probe() for virtio-mmio transport

2018-03-21 Thread Anup Patel

On Mon, Mar 19, 2018 at 4:17 AM, Bjorn Andersson
 wrote:
> On Wed 10 Jan 05:16 PST 2018, Anup Patel wrote:
>> diff --git a/drivers/rpmsg/virtio_rpmsg_bus.c b/drivers/rpmsg/virtio_rpmsg_bus.c
> [..]
>> @@ -924,9 +925,16 @@ static int rpmsg_probe(struct virtio_device *vdev)
>>total_buf_space, &vrp->bufs_dma,
>>GFP_KERNEL);
>>   if (!bufs_va) {
>> - err = -ENOMEM;
>> - goto vqs_del;
>> - }
>> + bufs_va = dma_alloc_coherent(vdev->dev.parent,
>> +  total_buf_space, &vrp->bufs_dma,
>> +  GFP_KERNEL);
>> + if (!bufs_va) {
>> + err = -ENOMEM;
>> + goto vqs_del;
>> + } else
>> + vrp->bufs_dev = vdev->dev.parent;
>> + } else
>> + vrp->bufs_dev = vdev->dev.parent->parent;
>
> I really don't fancy the idea of us allocating on behalf of our
> grandparent here, as you show it's not certain that our grandparent is
> what someone originally expected it to be.
>
> With the purpose of being able to control these allocations there is an
> ongoing discussion related to this, which I believe will result in this
> being changed to at least vdev->dev.parent..
>
>
> I do expect that this discussion will be brought up during Linaro
> Connect the coming week.
>

Currently, rpmsg_probe() is broken for virtio-mmio transport
hence I sent this patch as a stable fix.

In general, I am fine if we are eventually going towards
vdev->dev.parent usage.

Regards,
Anup

Re: [PATCH v8 7/9] pinctrl: madera: Add DT bindings for Cirrus Logic Madera codecs

2018-03-21 Thread Linus Walleij

On Mon, Feb 26, 2018 at 2:05 PM, Richard Fitzgerald
 wrote:

> This is the binding description of the pinctrl driver for Cirrus Logic
> Madera codecs. The binding uses the generic pinctrl binding so the main
> purpose here is to describe the device-specific names for groups and
> functions.
>
> Signed-off-by: Richard Fitzgerald 
> Acked-by: Rob Herring 

Reviewed-by: Linus Walleij 

Tell me if you want me to just merge this patch to get your patch stack
smaller.

Yours,
Linus Walleij

Re: [PATCH v2] spi: rspi: use correct enum for DMA transfer direction

2018-03-21 Thread Geert Uytterhoeven

Hi Stefan,

On Mon, Mar 19, 2018 at 11:16 PM, Stefan Agner  wrote:
> Use enum dma_transfer_direction as required by dmaengine_prep_slave_sg
> instead of enum dma_data_direction. This won't change behavior in
> practice as the enum values are equivalent.

Thanks for catching!
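
To illustrate why the change is behavior-neutral, here is a small userspace
sketch. The two enums mirror the kernel definitions as far as I recall them
from dma-direction.h and dmaengine.h; treat the copies here as an assumption
and check the headers. prep_slave() is a hypothetical stand-in, not the
dmaengine API.

```c
/* Userspace copies of the two kernel enums, for illustration only. */
enum dma_data_direction {
	DMA_BIDIRECTIONAL = 0,
	DMA_TO_DEVICE = 1,
	DMA_FROM_DEVICE = 2,
	DMA_NONE = 3,
};

enum dma_transfer_direction {
	DMA_MEM_TO_MEM,
	DMA_MEM_TO_DEV,
	DMA_DEV_TO_MEM,
	DMA_DEV_TO_DEV,
	DMA_TRANS_NONE,
};

/* Hypothetical stand-in for a call that expects the dmaengine enum. */
static int prep_slave(enum dma_transfer_direction dir)
{
	return (int)dir;
}
```

Passing DMA_FROM_DEVICE where DMA_DEV_TO_MEM is expected yields the same
integer, which is why only the compiler complains.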

BTW, spi-sh-msiof has the same issue. Will send a fix.

> This fixes two warnings when building with clang:
>   drivers/spi/spi-rspi.c:538:26: warning: implicit conversion from enumeration
>   type 'enum dma_data_direction' to different enumeration type
>   'enum dma_transfer_direction' [-Wenum-conversion]
> rx->sgl, rx->nents, DMA_FROM_DEVICE,
> ^~~
>   drivers/spi/spi-rspi.c:558:26: warning: implicit conversion from enumeration
>   type 'enum dma_data_direction' to different enumeration type
>   'enum dma_transfer_direction' [-Wenum-conversion]
> tx->sgl, tx->nents, DMA_TO_DEVICE,
> ^
>
> Signed-off-by: Stefan Agner 

JFTR, as it's already applied
Reviewed-by: Geert Uytterhoeven 

Gr{oetje,eeting}s,

Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

Re: [PATCH v8 8/9] pinctrl: madera: Add driver for Cirrus Logic Madera codecs

2018-03-21 Thread Linus Walleij

On Mon, Feb 26, 2018 at 2:05 PM, Richard Fitzgerald
 wrote:

> These codecs have a variable number of I/O lines each of which
> is individually selectable to a wide range of possible functions.
>
> The functionality is slightly different from the traditional muxed
> GPIO since most of the functions can be mapped to any pin (and even
> the same function to multiple pins). Most pins have a dedicated
> "alternate" function that is only available on that pin. The
> alternate functions are usually a group of signals, though it is
> not always necessary to enable the full group, depending on the
> alternate function and how it is to be used. The mapping between
> alternate functions and GPIO pins varies between codecs depending
> on the number of alternate functions and available pins.
>
> Signed-off-by: Richard Fitzgerald 
> Reviewed-by: Linus Walleij 
> ---
> There are some minor changes since LinusW last reviewed this patch but as
> they are trivial I have carried forward Linus's Reviewed-by:
> - SPDX license headers
> - can now build it as a module
> - avoided a minor checkpatch warning about an unnecessary else {} in
> madera_get_group_name()

It's fine. Keep my Review tag.

Yours,
Linus Walleij

Re: [PATCH 3/3] i2c: mux: pca9541: prepare for PCA9641 support

2018-03-21 Thread Peter Rosin

On 2018-03-21 08:01, Vladimir Zapolskiy wrote:
> On 03/21/2018 03:19 AM, Guenter Roeck wrote:
>> On 03/20/2018 04:17 PM, Vladimir Zapolskiy wrote:
>>> Hi Peter, Ken,
>>>
>>> On 03/20/2018 11:32 AM, Peter Rosin wrote:
>>>> Make the arbitrate and release_bus implementation chip specific.
>>>>
>>>
>>> by chance I took a look at the original implementation done by Ken, and
>>> I would say that this 3/3 change is an overkill as a too generic one.
>>> Is there any next observable extension? And do two abstracted (*arbitrate)
>>> and (*release_bus) cover it well? Probably no.
>>>
>>> At first it would be simpler to add a new chip id field into struct pca9541
>>> (struct rename would be needed of course), and do a selection of specific
>>> pca9x41_arbitrate() and pca9x41_release_bus() depending on it:
>>>
>>
>> FWIW, I very much prefer Peter's code. I think it is much cleaner.
> 
> Peter's code is generic, and it makes the change about 3 times longer in lines
> of code, and the following pca9641 change on top of it will be larger as well,
> because generalization requires service.
> 
> My main concern is that if such generalization is really needed in the driver.

I just did a comparison of what would happen if I took the same shortcuts
you did, and I got 18 new lines and 3 changed lines (and some moved lines
that could have been a separate patch). You have 12 new lines and 5 changed
lines.

So, the big difference is that I add the of_match_device call while you
do not. So, it looks like you are comparing apples and oranges. Do you
have a reason for not calling of_match_device? Or were you punting that
for the patch adding PCA9641 support? That's odd, because the point of
the patch is to prepare for smooth addition of that support.

Also, I think my code allows adding support for PCA9641 with only new
lines, while your version requires changing of code.

So, I'm rejecting your arguments that your patch is significantly simpler.
And while I'm obviously a tad bit biased, I do agree with Guenter that
my structure is cleaner.

Cheers,
Peter
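
For readers following the structural argument, here is a minimal userspace sketch of the "chip-specific ops selected through a match table" idea discussed above. It is not the driver code: the stub functions, `struct chip_ops`, `match_table` and `lookup_ops()` are hypothetical names, and the "nxp,pca9641" string is an assumption about what a later patch might use. The point it illustrates is that supporting a second chip then only needs new table entries and new functions, with no change to existing code paths.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-chip operations, mirroring the (*arbitrate) and
 * (*release_bus) hooks discussed in the thread. */
struct chip_ops {
	int (*arbitrate)(int bus_state);
	int (*release_bus)(int bus_state);
};

/* Stubs standing in for real register accesses (illustrative only). */
static int pca9541_arbitrate_stub(int s) { return s | 0x1; }
static int pca9541_release_stub(int s)   { return s & ~0x1; }
static int pca9641_arbitrate_stub(int s) { return s | 0x2; }
static int pca9641_release_stub(int s)   { return s & ~0x2; }

static const struct chip_ops pca9541_ops = {
	.arbitrate = pca9541_arbitrate_stub,
	.release_bus = pca9541_release_stub,
};

static const struct chip_ops pca9641_ops = {
	.arbitrate = pca9641_arbitrate_stub,
	.release_bus = pca9641_release_stub,
};

/* Userspace stand-in for an OF match table: compatible string -> ops. */
struct match_entry {
	const char *compatible;
	const struct chip_ops *ops;
};

static const struct match_entry match_table[] = {
	{ "nxp,pca9541", &pca9541_ops },
	{ "nxp,pca9641", &pca9641_ops },	/* assumed string */
};

/* Return the ops for a compatible string, or NULL if it is unknown. */
static const struct chip_ops *lookup_ops(const char *compatible)
{
	size_t i;

	for (i = 0; i < sizeof(match_table) / sizeof(match_table[0]); i++)
		if (strcmp(match_table[i].compatible, compatible) == 0)
			return match_table[i].ops;
	return NULL;
}
```

In the real driver the lookup would presumably go through of_match_device() and the match data, as mentioned above; the table here only models that selection step.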

[PATCH 2/4] mm/memblock: introduce memblock_search_pfn_regions()

2018-03-21 Thread Jia He

This API prepares for further optimization of early_pfn_valid().

Signed-off-by: Jia He 
---
 include/linux/memblock.h |  2 ++
 mm/memblock.c            | 12 ++++++++++++
 2 files changed, 14 insertions(+)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 9471db4..5f46956 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -203,6 +203,8 @@ void __next_mem_pfn_range(int *idx, int nid, unsigned long *out_start_pfn,
 i >= 0; __next_mem_pfn_range(&i, nid, p_start, p_end, p_nid))
 #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
 
+int memblock_search_pfn_regions(unsigned long pfn);
+
 unsigned long memblock_next_valid_pfn(unsigned long pfn, int *last_idx);
 /**
  * for_each_free_mem_range - iterate through free memblock areas
diff --git a/mm/memblock.c b/mm/memblock.c
index a9e8da4..f50fe5b 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1659,6 +1659,18 @@ static int __init_memblock memblock_search(struct memblock_type *type, phys_addr
return -1;
 }
 
+/* search memblock with the input pfn, return the region idx */
+int __init_memblock memblock_search_pfn_regions(unsigned long pfn)
+{
+   struct memblock_type *type = &memblock.memory;
+   int mid = memblock_search(type, PFN_PHYS(pfn));
+
+   if (mid == -1)
+   return -1;
+
+   return mid;
+}
+
 bool __init memblock_is_reserved(phys_addr_t addr)
 {
return memblock_search(&memblock.reserved, addr) != -1;
-- 
2.7.4

[PATCH 1/4] mm: page_alloc: reduce unnecessary binary search in memblock_next_valid_pfn()

2018-03-21 Thread Jia He

Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
where possible") optimized the loop in memmap_init_zone(). But there is
still some room for improvement. E.g. if pfn and pfn+1 are in the same
memblock region, we can simply pfn++ instead of doing the binary search
in memblock_next_valid_pfn.

Signed-off-by: Jia He 
---
 include/linux/memblock.h |  3 +--
 mm/memblock.c            | 23 +++++++++++++++++++----
 mm/page_alloc.c          |  3 ++-
 3 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index b7aa3ff..9471db4 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -203,8 +203,7 @@ void __next_mem_pfn_range(int *idx, int nid, unsigned long *out_start_pfn,
 i >= 0; __next_mem_pfn_range(&i, nid, p_start, p_end, p_nid))
 #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
 
-unsigned long memblock_next_valid_pfn(unsigned long pfn);
-
+unsigned long memblock_next_valid_pfn(unsigned long pfn, int *last_idx);
 /**
  * for_each_free_mem_range - iterate through free memblock areas
  * @i: u64 used as loop variable
diff --git a/mm/memblock.c b/mm/memblock.c
index c87924d..a9e8da4 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1133,13 +1133,26 @@ int __init_memblock memblock_set_node(phys_addr_t base, phys_addr_t size,
 }
 #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
 
-unsigned long __init_memblock memblock_next_valid_pfn(unsigned long pfn)
+unsigned long __init_memblock memblock_next_valid_pfn(unsigned long pfn,
+   int *last_idx)
 {
struct memblock_type *type = &memblock.memory;
unsigned int right = type->cnt;
unsigned int mid, left = 0;
+   unsigned long start_pfn, end_pfn;
phys_addr_t addr = PFN_PHYS(++pfn);
 
+   /* fast path, return pfn+1 if next pfn is in the same region */
+   if (*last_idx != -1) {
+   start_pfn = PFN_DOWN(type->regions[*last_idx].base);
+   end_pfn = PFN_DOWN(type->regions[*last_idx].base +
+   type->regions[*last_idx].size);
+
+   if (pfn < end_pfn && pfn > start_pfn)
+   return pfn;
+   }
+
+   /* slow path, do the binary search */
do {
mid = (right + left) / 2;
 
@@ -1149,15 +1162,17 @@ unsigned long __init_memblock memblock_next_valid_pfn(unsigned long pfn)
  type->regions[mid].size))
left = mid + 1;
else {
-   /* addr is within the region, so pfn is valid */
+   *last_idx = mid;
return pfn;
}
} while (left < right);
 
if (right == type->cnt)
return -1UL;
-   else
-   return PHYS_PFN(type->regions[right].base);
+
+   *last_idx = right;
+
+   return PHYS_PFN(type->regions[*last_idx].base);
 }
 
 static phys_addr_t __init memblock_alloc_range_nid(phys_addr_t size,
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3899209..f28c62c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5456,6 +5456,7 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
unsigned long end_pfn = start_pfn + size;
pg_data_t *pgdat = NODE_DATA(nid);
unsigned long pfn;
+   int idx = -1;
unsigned long nr_initialised = 0;
struct page *page;
 #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
@@ -5487,7 +5488,7 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 * end_pfn), such that we hit a valid pfn (or end_pfn)
 * on our next iteration of the loop.
 */
-   pfn = memblock_next_valid_pfn(pfn) - 1;
+   pfn = memblock_next_valid_pfn(pfn, &idx) - 1;
 #endif
continue;
}
-- 
2.7.4
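
The fast-path idea in the patch above can be sketched in plain userspace C as follows. This is only an illustration under stated assumptions: `struct region`, `region_search()` and `region_lookup_cached()` are hypothetical names, regions are expressed directly in pfns with an exclusive end, and the sketch uses an inclusive lower bound (`pfn >= start_pfn`), which is not identical to the comparison in the patch. It shows how remembering the last matching region index lets consecutive pfns skip the binary search.

```c
#include <assert.h>
#include <stddef.h>

/* A sorted, non-overlapping range of valid pfns: [start_pfn, end_pfn). */
struct region {
	unsigned long start_pfn;
	unsigned long end_pfn;	/* exclusive */
};

/* Slow path: binary search for the region containing pfn.
 * Returns the region index, or -1 if pfn lies in a hole. */
static int region_search(const struct region *r, int cnt, unsigned long pfn)
{
	int left = 0, right = cnt;

	while (left < right) {
		int mid = left + (right - left) / 2;

		if (pfn < r[mid].start_pfn)
			right = mid;
		else if (pfn >= r[mid].end_pfn)
			left = mid + 1;
		else
			return mid;
	}
	return -1;
}

/* Fast path first: if pfn is still inside the region found last time,
 * reuse that index; otherwise fall back to the binary search and
 * update the cache (which becomes -1 when pfn is in a hole). */
static int region_lookup_cached(const struct region *r, int cnt,
				unsigned long pfn, int *last_idx)
{
	if (*last_idx >= 0 && *last_idx < cnt &&
	    pfn >= r[*last_idx].start_pfn && pfn < r[*last_idx].end_pfn)
		return *last_idx;

	*last_idx = region_search(r, cnt, pfn);
	return *last_idx;
}
```

A caller walking pfns in increasing order would keep one `int idx = -1;` across the loop, in the same way memmap_init_zone() keeps `idx` in the patch.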

[PATCH RFC 0/4] optimize memblock_next_valid_pfn() and early_pfn_valid()

2018-03-21 Thread Jia He

Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
where possible") tried to optimize the loop in memmap_init_zone(). But
there is still some room for improvement.

Patch 1 optimizes memblock_next_valid_pfn().
Patches 2-4 optimize early_pfn_valid(); I had to split this into parts
because the changes span several subsystems.

I verified that the pfn stepping sequence in memmap_init() is the same as
before. As for the performance improvement, with this set applied the time
spent in memmap_init() is reduced from 41313 us to 24345 us on my
armv8a server (QDF2400 with 96G memory).

The memblock region information for my server follows.
[   86.956758] Zone ranges:
[   86.959452]   DMA  [mem 0x0020-0x]
[   86.966041]   Normal   [mem 0x0001-0x0017]
[   86.972631] Movable zone start for each node
[   86.977179] Early memory node ranges
[   86.980985]   node   0: [mem 0x0020-0x0021]
[   86.987666]   node   0: [mem 0x0082-0x0307]
[   86.994348]   node   0: [mem 0x0308-0x0308]
[   87.001029]   node   0: [mem 0x0309-0x031f]
[   87.007710]   node   0: [mem 0x0320-0x033f]
[   87.014392]   node   0: [mem 0x0341-0x0563]
[   87.021073]   node   0: [mem 0x0564-0x0567]
[   87.027754]   node   0: [mem 0x0568-0x056d]
[   87.034435]   node   0: [mem 0x056e-0x086f]
[   87.041117]   node   0: [mem 0x0870-0x0871]
[   87.047798]   node   0: [mem 0x0872-0x0894]
[   87.054479]   node   0: [mem 0x0895-0x08ba]
[   87.061161]   node   0: [mem 0x08bb-0x08bc]
[   87.067842]   node   0: [mem 0x08bd-0x08c4]
[   87.074524]   node   0: [mem 0x08c5-0x08e2]
[   87.081205]   node   0: [mem 0x08e3-0x08e4]
[   87.087886]   node   0: [mem 0x08e5-0x08fc]
[   87.094568]   node   0: [mem 0x08fd-0x0910]
[   87.101249]   node   0: [mem 0x0911-0x092e]
[   87.107930]   node   0: [mem 0x092f-0x0930]
[   87.114612]   node   0: [mem 0x0931-0x0963]
[   87.121293]   node   0: [mem 0x0964-0x0e61]
[   87.127975]   node   0: [mem 0x0e62-0x0e64]
[   87.134657]   node   0: [mem 0x0e65-0x0fff]
[   87.141338]   node   0: [mem 0x1080-0x17fe]
[   87.148019]   node   0: [mem 0x1c00-0x1c00]
[   87.154701]   node   0: [mem 0x1c01-0x1c7f]
[   87.161383]   node   0: [mem 0x1c81-0x7efb]
[   87.168064]   node   0: [mem 0x7efc-0x7efd]
[   87.174746]   node   0: [mem 0x7efe-0x7efe]
[   87.181427]   node   0: [mem 0x7eff-0x7eff]
[   87.188108]   node   0: [mem 0x7f00-0x0017]
[   87.194791] Initmem setup node 0 [mem 0x0020-0x0017]

Without this patchset:
[  117.106153] Initmem setup node 0 [mem 0x0020-0x0017]
[  117.113677] before memmap_init
[  117.118195] after  memmap_init
>>> memmap_init takes 4518 us
[  117.121446] before memmap_init
[  117.154992] after  memmap_init
>>> memmap_init takes 33546 us
[  117.158241] before memmap_init
[  117.161490] after  memmap_init
>>> memmap_init takes 3249 us
>>> totally takes 41313 us

With this patchset:
[   87.194791] Initmem setup node 0 [mem 0x0020-0x0017]
[   87.202314] before memmap_init
[   87.206164] after  memmap_init
>>> memmap_init takes 3850 us
[   87.209416] before memmap_init
[   87.226662] after  memmap_init
>>> memmap_init takes 17246 us
[   87.229911] before memmap_init
[   87.233160] after  memmap_init
>>> memmap_init takes 3249 us
>>> totally takes 24345 us

Jia He (4):
  mm: page_alloc: reduce unnecessary binary search in memblock_next_valid_pfn()
  mm/memblock: introduce memblock_search_pfn_regions()
  arm64: introduce pfn_valid_region()
  mm: page_alloc: reduce unnecessary binary search in early_pfn_valid()

 arch/arm64/include/asm/page.h    |  3 ++-
 arch/arm64/mm/init.c             | 19 ++++++++++++++++++-
 arch/x86/include/asm/mmzone_32.h |  2 +-
 include/linux/memblock.h         |  3 ++-
 include/linux/mmzone.h           | 12 +++++++++---
 mm/memblock.c                    | 35 +++++++++++++++++++++++++++++++----
 mm/page_alloc.c                  |  5 +++--
 7 files changed, 66 insertions(+), 13 deletions(-)

-- 
2.7.4
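
As a quick cross-check of the timings quoted in the cover letter above, the small helper below recomputes the totals and the relative saving. The function names are hypothetical and integer arithmetic is used so the results are exact; the three per-call figures are taken from the logs above.

```c
#include <assert.h>

/* Sum of the three memmap_init() invocations, in microseconds. */
static unsigned long sum3(unsigned long a, unsigned long b, unsigned long c)
{
	return a + b + c;
}

/* Reduction from 'before' to 'after' in parts per thousand, rounded
 * down. Returns 0 if there is no reduction or 'before' is zero. */
static unsigned long reduction_permille(unsigned long before,
					unsigned long after)
{
	if (before == 0 || after > before)
		return 0;
	return (before - after) * 1000UL / before;
}
```

With the numbers above, sum3(4518, 33546, 3249) gives 41313 and sum3(3850, 17246, 3249) gives 24345, matching the reported totals. reduction_permille(41313, 24345) is 410, i.e. roughly a 41% reduction overall, and reduction_permille(33546, 17246) is 485 for the largest of the three calls.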

[PATCH 3/4] arm64: introduce pfn_valid_region()

2018-03-21 Thread Jia He

This prepares for further optimization of early_pfn_valid()
on arm64.

Signed-off-by: Jia He 
---
 arch/arm64/include/asm/page.h |  3 ++-
 arch/arm64/mm/init.c          | 19 ++++++++++++++++++-
 2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h
index 60d02c8..da2cba3 100644
--- a/arch/arm64/include/asm/page.h
+++ b/arch/arm64/include/asm/page.h
@@ -38,7 +38,8 @@ extern void clear_page(void *to);
 typedef struct page *pgtable_t;
 
 #ifdef CONFIG_HAVE_ARCH_PFN_VALID
-extern int pfn_valid(unsigned long);
+extern int pfn_valid(unsigned long pfn);
+extern int pfn_valid_region(unsigned long pfn, int *last_idx);
 #endif
 
 #include 
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 00e7b90..1d9842e 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -290,7 +290,24 @@ int pfn_valid(unsigned long pfn)
return memblock_is_map_memory(pfn << PAGE_SHIFT);
 }
 EXPORT_SYMBOL(pfn_valid);
-#endif
+
+int pfn_valid_region(unsigned long pfn, int *last_idx)
+{
+   struct memblock_type *type = &memblock.memory;
+
+   if (*last_idx != -1 && pfn < PFN_DOWN(type->regions[*last_idx].base
+   + type->regions[*last_idx].size))
+   return !memblock_is_nomap(&memblock.memory.regions[*last_idx]);
+
+   *last_idx = memblock_search_pfn_regions(pfn);
+
+   if (*last_idx == -1)
+   return false;
+
+   return !memblock_is_nomap(&memblock.memory.regions[*last_idx]);
+}
+EXPORT_SYMBOL(pfn_valid_region);
+#endif /*CONFIG_HAVE_ARCH_PFN_VALID*/
 
 #ifndef CONFIG_SPARSEMEM
 static void __init arm64_memory_present(void)
-- 
2.7.4

[PATCH 4/4] mm: page_alloc: reduce unnecessary binary search in early_pfn_valid()

2018-03-21 Thread Jia He

Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
where possible") optimized the loop in memmap_init_zone(). But there is
still some room for improvement. E.g. in early_pfn_valid(), we can record
the last returned memblock region index and check whether pfn++ is still in
the same region.

Currently it only improves the performance on arm64 and has no impact on
other arches.

Signed-off-by: Jia He 
---
 arch/x86/include/asm/mmzone_32.h |  2 +-
 include/linux/mmzone.h           | 12 +++++++++---
 mm/page_alloc.c                  |  2 +-
 3 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/mmzone_32.h b/arch/x86/include/asm/mmzone_32.h
index 73d8dd1..329d3ba 100644
--- a/arch/x86/include/asm/mmzone_32.h
+++ b/arch/x86/include/asm/mmzone_32.h
@@ -49,7 +49,7 @@ static inline int pfn_valid(int pfn)
return 0;
 }
 
-#define early_pfn_valid(pfn)   pfn_valid((pfn))
+#define early_pfn_valid(pfn, last_region_idx)  pfn_valid((pfn))
 
 #endif /* CONFIG_DISCONTIGMEM */
 
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index d797716..3a686af 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1267,9 +1267,15 @@ static inline int pfn_present(unsigned long pfn)
 })
 #else
 #define pfn_to_nid(pfn)(0)
-#endif
+#endif /*CONFIG_NUMA*/
+
+#ifdef CONFIG_HAVE_ARCH_PFN_VALID
+#define early_pfn_valid(pfn, last_region_idx) \
+   pfn_valid_region(pfn, last_region_idx)
+#else
+#define early_pfn_valid(pfn, last_region_idx)  pfn_valid(pfn)
+#endif /*CONFIG_HAVE_ARCH_PFN_VALID*/
 
-#define early_pfn_valid(pfn)   pfn_valid(pfn)
 void sparse_init(void);
 #else
 #define sparse_init()  do {} while (0)
@@ -1288,7 +1294,7 @@ struct mminit_pfnnid_cache {
 };
 
 #ifndef early_pfn_valid
-#define early_pfn_valid(pfn)   (1)
+#define early_pfn_valid(pfn, last_region_idx)  (1)
 #endif
 
 void memory_present(int nid, unsigned long start, unsigned long end);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f28c62c..215dc92 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5481,7 +5481,7 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
if (context != MEMMAP_EARLY)
goto not_early;
 
-   if (!early_pfn_valid(pfn)) {
+   if (!early_pfn_valid(pfn, &idx)) {
 #ifdef CONFIG_HAVE_MEMBLOCK
/*
 * Skip to the pfn preceding the next valid one (or
-- 
2.7.4

RE: Contact

2018-03-21 Thread Thomas Stein

Dear Sir or Madam,

after visiting your homepage, we would like to present an offer of products
that will enable you to significantly increase sales of your products and
services.

The company databases are divided into target groups that are interesting
and relevant to you.

The new catalog contains 187,764 Swiss companies and provides data such as:
company name, company address, contact details of the owner or manager,
e-mail address, telephone number, fax number, industry, etc.

***
1. Switzerland 2018 ( 187 764 ) - 149 EUR ( until 21.03.2018 )
***

http://www.gc-schweiz.net/?page=catalog


The possible uses of the databases are practically unlimited, and with the
programs we have developed for the personalized sending of offers and the
like by e-mail or fax, you can use them to run effective and secure
advertising campaigns.

Please take a look at the further details, without obligation, on our
website:

http://www.gc-schweiz.net/?page=catalog

Kind regards,
Thomas Stein.

Re: [PATCH] mm/hugetlb: prevent hugetlb VMA to be misaligned

2018-03-21 Thread Laurent Dufour

On 20/03/2018 22:26, Mike Kravetz wrote:
> On 03/20/2018 10:25 AM, Laurent Dufour wrote:
>> When running the sampler detailed below, the kernel, if built with the VM
>> debug option turned on (as many distro do), is panicing with the following
>> message :
>> kernel BUG at /build/linux-jWa1Fv/linux-4.15.0/mm/hugetlb.c:3310!
>> Oops: Exception in kernel mode, sig: 5 [#1]
>> LE SMP NR_CPUS=2048 NUMA PowerNV
>> Modules linked in: kcm nfc af_alg caif_socket caif phonet fcrypt
>>  8<--8<--8<--8< snip 8<--8<--8<--8<
>> CPU: 18 PID: 43243 Comm: trinity-subchil Tainted: G C  E
>> 4.15.0-10-generic #11-Ubuntu
>> NIP:  c036e764 LR: c036ee48 CTR: 0009
>> REGS: c03fbcdcf810 TRAP: 0700   Tainted: G C  E
>> (4.15.0-10-generic)
>> MSR:  90029033   CR: 2400  XER:
>> 2004
>> CFAR: c036ee44 SOFTE: 1
>> GPR00: c036ee48 c03fbcdcfa90 c16ea600 c03fbcdcfc40
>> GPR04: c03fd9858950 7115e4e0 7115e4e1 
>> GPR08: 0010 0001  
>> GPR12: 2000 c7a2c600 0fe3985954d0 7115e4e0
>> GPR16:    
>> GPR20: 0fe398595a94 a6fc c03fd9858950 00018554
>> GPR24: c03fdcd84500 c19acd00 7115e4e1 c03fbcdcfc40
>> GPR28: 0020 7115e4e0 c03fbc9ac600 c03fd9858950
>> NIP [c036e764] __unmap_hugepage_range+0xa4/0x760
>> LR [c036ee48] __unmap_hugepage_range_final+0x28/0x50
>> Call Trace:
>> [c03fbcdcfa90] [7115e4e0] 0x7115e4e0 (unreliable)
>> [c03fbcdcfb50] [c036ee48]
>> __unmap_hugepage_range_final+0x28/0x50
>> [c03fbcdcfb80] [c033497c] unmap_single_vma+0x11c/0x190
>> [c03fbcdcfbd0] [c0334e14] unmap_vmas+0x94/0x140
>> [c03fbcdcfc20] [c034265c] exit_mmap+0x9c/0x1d0
>> [c03fbcdcfce0] [c0105448] mmput+0xa8/0x1d0
>> [c03fbcdcfd10] [c010fad0] do_exit+0x360/0xc80
>> [c03fbcdcfdd0] [c01104c0] do_group_exit+0x60/0x100
>> [c03fbcdcfe10] [c0110584] SyS_exit_group+0x24/0x30
>> [c03fbcdcfe30] [c000b184] system_call+0x58/0x6c
>> Instruction dump:
>> 552907fe e94a0028 e94a0408 eb2a0018 81590008 7f9c5036 0b09 e9390010
>> 7d2948f8 7d2a2838 0b0a 7d293038 <0b09> e9230086 2fa9 419e0468
>> ---[ end trace ee88f958a1c62605 ]---
>>
>> The panic is due to a VMA pointing to a hugetlb area while the
>> vma->vm_start or vma->vm_end field are not aligned to the huge page
>> boundaries. The sampler is just unmapping a part of the hugetlb area,
>> leading to 2 VMAs which are not well aligned.  The same could be achieved
>> by calling madvise() situation, as it is when running:
>> stress-ng --shm-sysv 1
>>
>> The hugetlb code is assuming that the VMA will be well aligned when it is
>> unmapped, so we must prevent such a VMA to be split or shrink to a
>> misaligned address.
>>
>> This patch is preventing this by checking the new VMA's boundaries when a
>> VMA is modified by calling vma_adjust().
>>
>> If this patch is applied, stable should be Cced.
> 
> Thanks Laurent!
> 
> This bug was introduced by 31383c6865a5.  Dan's changes for 31383c6865a5
> seem pretty straight forward.  It simply replaces an explicit check when
> splitting a vma to a new vm_ops split callout.  Unfortunately, mappings
> created via shmget/shmat have their vm_ops replaced.  Therefore, this
> split callout is never made.
> 
> The shm vm_ops do indirectly call the original vm_ops routines as needed.
> Therefore, I would suggest a patch something like the following instead.
> If we move forward with the patch, we should include Laurent's BUG output
> and perhaps test program in the commit message.

Hi Mike,

That's definitely smarter! I missed the new split() vm op...

Cheers,
Laurent.

linux-next: build failure after merge of the akpm tree

2018-03-21 Thread Stephen Rothwell

Hi Andrew,

After merging the akpm tree, today's linux-next build (powerpc
allyesconfig) failed like this:

drivers/base/firmware_loader/fallback.c: In function 'map_fw_priv_pages':
drivers/base/firmware_loader/fallback.c:232:2: error: implicit declaration of 
function 'vunmap'; did you mean 'kunmap'? 
[-Werror=implicit-function-declaration]
  vunmap(fw_priv->data);
  ^~
  kunmap
drivers/base/firmware_loader/fallback.c:233:18: error: implicit declaration of 
function 'vmap'; did you mean 'kmap'? [-Werror=implicit-function-declaration]
  fw_priv->data = vmap(fw_priv->pages, fw_priv->nr_pages, 0,
  ^~~~
  kmap
drivers/base/firmware_loader/fallback.c:233:16: warning: assignment makes 
pointer from integer without a cast [-Wint-conversion]
  fw_priv->data = vmap(fw_priv->pages, fw_priv->nr_pages, 0,
^
drivers/base/firmware_loader/fallback.c: In function 'firmware_loading_store':
drivers/base/firmware_loader/fallback.c:274:4: error: implicit declaration of 
function 'vfree'; did you mean 'kvfree'? [-Werror=implicit-function-declaration]
vfree(fw_priv->pages);
^
kvfree
drivers/base/firmware_loader/fallback.c: In function 'fw_realloc_pages':
drivers/base/firmware_loader/fallback.c:405:15: error: implicit declaration of 
function 'vmalloc'; did you mean 'kvmalloc'? 
[-Werror=implicit-function-declaration]
   new_pages = vmalloc(new_array_size * sizeof(void *));
   ^~~
   kvmalloc
drivers/base/firmware_loader/fallback.c:405:13: warning: assignment makes 
pointer from integer without a cast [-Wint-conversion]
   new_pages = vmalloc(new_array_size * sizeof(void *));
 ^

Maybe caused by patch

  "headers: untangle kmemleak.h from mm.h"

Anyway this file should explicitly include linux/vmalloc.h since it uses
stuff in there, so I have added the following patch for today (I think
this could just be applied to the driver-core tree ...):

From: Stephen Rothwell 
Date: Wed, 21 Mar 2018 19:06:35 +1100
Subject: [PATCH] firmware: explicitly include vmalloc.h

After some other include file changes, fixes:

drivers/base/firmware_loader/fallback.c: In function 'map_fw_priv_pages':
drivers/base/firmware_loader/fallback.c:232:2: error: implicit declaration of 
function 'vunmap'; did you mean 'kunmap'? 
[-Werror=implicit-function-declaration]
  vunmap(fw_priv->data);
  ^~
  kunmap
drivers/base/firmware_loader/fallback.c:233:18: error: implicit declaration of 
function 'vmap'; did you mean 'kmap'? [-Werror=implicit-function-declaration]
  fw_priv->data = vmap(fw_priv->pages, fw_priv->nr_pages, 0,
  ^~~~
  kmap
drivers/base/firmware_loader/fallback.c:233:16: warning: assignment makes 
pointer from integer without a cast [-Wint-conversion]
  fw_priv->data = vmap(fw_priv->pages, fw_priv->nr_pages, 0,
^
drivers/base/firmware_loader/fallback.c: In function 'firmware_loading_store':
drivers/base/firmware_loader/fallback.c:274:4: error: implicit declaration of 
function 'vfree'; did you mean 'kvfree'? [-Werror=implicit-function-declaration]
vfree(fw_priv->pages);
^
kvfree
drivers/base/firmware_loader/fallback.c: In function 'fw_realloc_pages':
drivers/base/firmware_loader/fallback.c:405:15: error: implicit declaration of 
function 'vmalloc'; did you mean 'kvmalloc'? 
[-Werror=implicit-function-declaration]
   new_pages = vmalloc(new_array_size * sizeof(void *));
   ^~~
   kvmalloc
drivers/base/firmware_loader/fallback.c:405:13: warning: assignment makes 
pointer from integer without a cast [-Wint-conversion]
   new_pages = vmalloc(new_array_size * sizeof(void *));
 ^

Signed-off-by: Stephen Rothwell 
---
 drivers/base/firmware_loader/fallback.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/base/firmware_loader/fallback.c b/drivers/base/firmware_loader/fallback.c
index 0a8ec7fec585..d231bbcb95d7 100644
--- a/drivers/base/firmware_loader/fallback.c
+++ b/drivers/base/firmware_loader/fallback.c
@@ -8,6 +8,7 @@
 #include 
 #include 
 #include 
+#include <linux/vmalloc.h>
 
 #include "fallback.h"
 #include "firmware.h"
-- 
2.16.1

-- 
Cheers,
Stephen Rothwell



Re: [Outreachy kernel] [PATCH] drm/qxl: Replace drm_gem_object_reference/unreference() with _get/put()

2018-03-21 Thread Daniel Vetter

On Tue, Mar 20, 2018 at 11:29:27AM -0700, Santha Meena Ramamoorthy wrote:
> Replace drm_gem_object_reference/unreference function with *_get/put()
> suffixes, because it is shorter and consistent with the kernel
> kref_get/put() functions. The following Coccinelle script was used:
> 
> @@
> expression e;
> @@
> 
> (
> -drm_gem_object_reference(e);
> +drm_gem_object_get(e);
> |
> -drm_gem_object_unreference(e);
> +drm_gem_object_put(e);
> |
> -drm_gem_object_unreference_unlocked(e);
> +drm_gem_object_put_unlocked(e);
> )
> 
> Signed-off-by: Santha Meena Ramamoorthy 

lgtm, thanks for your patch. Applied to drm-misc-next.
-Daniel

> ---
>  drivers/gpu/drm/qxl/qxl_display.c | 4 ++--
>  drivers/gpu/drm/qxl/qxl_dumb.c| 2 +-
>  drivers/gpu/drm/qxl/qxl_fb.c  | 6 +++---
>  drivers/gpu/drm/qxl/qxl_gem.c | 2 +-
>  drivers/gpu/drm/qxl/qxl_ioctl.c   | 4 ++--
>  drivers/gpu/drm/qxl/qxl_object.c  | 6 +++---
>  6 files changed, 12 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/gpu/drm/qxl/qxl_display.c 
> b/drivers/gpu/drm/qxl/qxl_display.c
> index 9a9214a..ecb35ed 100644
> --- a/drivers/gpu/drm/qxl/qxl_display.c
> +++ b/drivers/gpu/drm/qxl/qxl_display.c
> @@ -309,7 +309,7 @@ void qxl_user_framebuffer_destroy(struct drm_framebuffer 
> *fb)
>   struct qxl_bo *bo = gem_to_qxl_bo(qxl_fb->obj);
>  
>   WARN_ON(bo->shadow);
> - drm_gem_object_unreference_unlocked(qxl_fb->obj);
> + drm_gem_object_put_unlocked(qxl_fb->obj);
>   drm_framebuffer_cleanup(fb);
>   kfree(qxl_fb);
>  }
> @@ -1215,7 +1215,7 @@ qxl_user_framebuffer_create(struct drm_device *dev,
>   ret = qxl_framebuffer_init(dev, qxl_fb, mode_cmd, obj, &qxl_fb_funcs);
>   if (ret) {
>   kfree(qxl_fb);
> - drm_gem_object_unreference_unlocked(obj);
> + drm_gem_object_put_unlocked(obj);
>   return NULL;
>   }
>  
> diff --git a/drivers/gpu/drm/qxl/qxl_dumb.c b/drivers/gpu/drm/qxl/qxl_dumb.c
> index 11085ab..c666b89 100644
> --- a/drivers/gpu/drm/qxl/qxl_dumb.c
> +++ b/drivers/gpu/drm/qxl/qxl_dumb.c
> @@ -82,6 +82,6 @@ int qxl_mode_dumb_mmap(struct drm_file *file_priv,
>   return -ENOENT;
>   qobj = gem_to_qxl_bo(gobj);
>   *offset_p = qxl_bo_mmap_offset(qobj);
> - drm_gem_object_unreference_unlocked(gobj);
> + drm_gem_object_put_unlocked(gobj);
>   return 0;
>  }
> diff --git a/drivers/gpu/drm/qxl/qxl_fb.c b/drivers/gpu/drm/qxl/qxl_fb.c
> index 23af3e3..3388914 100644
> --- a/drivers/gpu/drm/qxl/qxl_fb.c
> +++ b/drivers/gpu/drm/qxl/qxl_fb.c
> @@ -95,7 +95,7 @@ static void qxlfb_destroy_pinned_object(struct 
> drm_gem_object *gobj)
>   qxl_bo_kunmap(qbo);
>   qxl_bo_unpin(qbo);
>  
> - drm_gem_object_unreference_unlocked(gobj);
> + drm_gem_object_put_unlocked(gobj);
>  }
>  
>  int qxl_get_handle_for_primary_fb(struct qxl_device *qdev,
> @@ -316,11 +316,11 @@ static int qxlfb_create(struct qxl_fbdev *qfbdev,
>   qxl_bo_unpin(qbo);
>   }
>   if (fb && ret) {
> - drm_gem_object_unreference_unlocked(gobj);
> + drm_gem_object_put_unlocked(gobj);
>   drm_framebuffer_cleanup(fb);
>   kfree(fb);
>   }
> - drm_gem_object_unreference_unlocked(gobj);
> + drm_gem_object_put_unlocked(gobj);
>   return ret;
>  }
>  
> diff --git a/drivers/gpu/drm/qxl/qxl_gem.c b/drivers/gpu/drm/qxl/qxl_gem.c
> index 85f5467..f5c1e78 100644
> --- a/drivers/gpu/drm/qxl/qxl_gem.c
> +++ b/drivers/gpu/drm/qxl/qxl_gem.c
> @@ -98,7 +98,7 @@ int qxl_gem_object_create_with_handle(struct qxl_device 
> *qdev,
>   return r;
>   /* drop reference from allocate - handle holds it now */
>   *qobj = gem_to_qxl_bo(gobj);
> - drm_gem_object_unreference_unlocked(gobj);
> + drm_gem_object_put_unlocked(gobj);
>   return 0;
>  }
>  
> diff --git a/drivers/gpu/drm/qxl/qxl_ioctl.c b/drivers/gpu/drm/qxl/qxl_ioctl.c
> index e8c0b10..e238a1a 100644
> --- a/drivers/gpu/drm/qxl/qxl_ioctl.c
> +++ b/drivers/gpu/drm/qxl/qxl_ioctl.c
> @@ -121,7 +121,7 @@ static int qxlhw_handle_to_bo(struct drm_file *file_priv, 
> uint64_t handle,
>   qobj = gem_to_qxl_bo(gobj);
>  
>   ret = qxl_release_list_add(release, qobj);
> - drm_gem_object_unreference_unlocked(gobj);
> + drm_gem_object_put_unlocked(gobj);
>   if (ret)
>   return ret;
>  
> @@ -343,7 +343,7 @@ static int qxl_update_area_ioctl(struct drm_device *dev, 
> void *data,
>   qxl_bo_unreserve(qobj);
>  
>  out:
> - drm_gem_object_unreference_unlocked(gobj);
> + drm_gem_object_put_unlocked(gobj);
>   return ret;
>  }
>  
> diff --git a/drivers/gpu/drm/qxl/qxl_object.c 
> b/drivers/gpu/drm/qxl/qxl_object.c
> index f6b80fe..e9fb0ab 100644
> --- a/drivers/gpu/drm/qxl/qxl_object.c
> +++ b/drivers/gpu/drm/qxl/qxl_object.c
> @@ -211,13 +211,13 @@ void qxl_bo_unref(struct qxl_bo **bo)
>   if ((*bo) == NULL)
>   return;
>  
> - drm_gem_object_unreference_unlocke

Re: [PATCH] mm/hugetlb: prevent hugetlb VMA to be misaligned

2018-03-21 Thread Laurent Dufour

On 20/03/2018 22:35, Mike Kravetz wrote:
> On 03/20/2018 02:26 PM, Mike Kravetz wrote:
>> Thanks Laurent!
>>
>> This bug was introduced by 31383c6865a5.  Dan's changes for 31383c6865a5
>> seem pretty straight forward.  It simply replaces an explicit check when
>> splitting a vma to a new vm_ops split callout.  Unfortunately, mappings
>> created via shmget/shmat have their vm_ops replaced.  Therefore, this
>> split callout is never made.
>>
>> The shm vm_ops do indirectly call the original vm_ops routines as needed.
>> Therefore, I would suggest a patch something like the following instead.
>> If we move forward with the patch, we should include Laurent's BUG output
>> and perhaps test program in the commit message.
> 
> Sorry, patch in previous mail was a mess
> 
> From 7a19414319c7937fd2757c27f936258f16c1f61d Mon Sep 17 00:00:00 2001
> From: Mike Kravetz 
> Date: Tue, 20 Mar 2018 13:56:57 -0700
> Subject: [PATCH] shm: add split function to shm_vm_ops
> 
> The split function was added to vm_operations_struct to determine
> if a mapping can be split.  This was mostly for device-dax and
> hugetlbfs mappings which have specific alignment constraints.
> 
> mappings initiated via shmget/shmat have their original vm_ops
> overwritten with shm_vm_ops.  shm_vm_ops functions will call back
> to the original vm_ops if needed.  Add such a split function.

FWIW,
Reviewed-by: Laurent Dufour 
Tested-by: Laurent Dufour 

> Fixes: 31383c6865a5 ("mm, hugetlbfs: introduce ->split() to 
> vm_operations_struct)
> Reported by: Laurent Dufour 
> Signed-off-by: Mike Kravetz 
> ---
>  ipc/shm.c | 12 
>  1 file changed, 12 insertions(+)
> 
> diff --git a/ipc/shm.c b/ipc/shm.c
> index 7acda23430aa..50e88fc060b1 100644
> --- a/ipc/shm.c
> +++ b/ipc/shm.c
> @@ -386,6 +386,17 @@ static int shm_fault(struct vm_fault *vmf)
>   return sfd->vm_ops->fault(vmf);
>  }
> 
> +static int shm_split(struct vm_area_struct *vma, unsigned long addr)
> +{
> + struct file *file = vma->vm_file;
> + struct shm_file_data *sfd = shm_file_data(file);
> +
> + if (sfd->vm_ops && sfd->vm_ops->split)
> + return sfd->vm_ops->split(vma, addr);
> +
> + return 0;
> +}
> +
>  #ifdef CONFIG_NUMA
>  static int shm_set_policy(struct vm_area_struct *vma, struct mempolicy *new)
>  {
> @@ -510,6 +521,7 @@ static const struct vm_operations_struct shm_vm_ops = {
>   .open   = shm_open, /* callback for a new vm-area open */
>   .close  = shm_close,/* callback for when the vm-area is released */
>   .fault  = shm_fault,
> + .split  = shm_split,
>  #if defined(CONFIG_NUMA)
>   .set_policy = shm_set_policy,
>   .get_policy = shm_get_policy,
>
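
For readers of the archive, the delegation pattern in the patch above can be sketched in userspace as follows. This is a minimal illustration, not the ipc/shm.c code: struct area, struct ops, wrap_split(), huge_split(), try_split() and the 2 MiB HUGE_SIZE are hypothetical stand-ins for the vma, vm_operations_struct, shm_split(), the hugetlbfs split callback and the caller that splits a vma.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-ins for the kernel types. */
struct area;

struct ops {
	int (*split)(struct area *a, unsigned long addr);
};

struct area {
	const struct ops *ops;       /* ops the caller sees (the wrapper) */
	const struct ops *inner_ops; /* original ops saved by the wrapper */
};

#define HUGE_SIZE (2UL << 20)	/* assumed 2 MiB huge page */

/* Original split: refuse addresses not aligned to the huge page size. */
static int huge_split(struct area *a, unsigned long addr)
{
	(void)a;
	return (addr & (HUGE_SIZE - 1)) ? -EINVAL : 0;
}

/* Wrapper split: forward to the original ops if they provide one. */
static int wrap_split(struct area *a, unsigned long addr)
{
	assert(a != NULL);
	if (a->inner_ops && a->inner_ops->split)
		return a->inner_ops->split(a, addr);
	return 0;
}

static const struct ops huge_ops = { .split = huge_split };
static const struct ops wrap_ops = { .split = wrap_split };
static const struct ops wrap_ops_no_split = { .split = NULL };

/* What a caller that wants to split the area at addr would do. */
static int try_split(struct area *a, unsigned long addr)
{
	if (a->ops && a->ops->split)
		return a->ops->split(a, addr);
	return 0;	/* no callback: a split is allowed anywhere */
}
```

With wrap_ops_no_split installed, try_split() never reaches huge_split() and a misaligned split is accepted, which is the situation described in the report. With wrap_ops installed, the alignment check of the original ops is applied again.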

Re: [PATCH 1/5] bus: arm-cci: use asm unreachable

2018-03-21 Thread Stefan Agner

On 21.03.2018 00:30, Russell King - ARM Linux wrote:
> On Wed, Mar 21, 2018 at 12:02:02AM +0100, Stefan Agner wrote:
>> Mixing asm and C code is not recommended in a naked function by
>> gcc and leads to an error when using clang:
>>   drivers/bus/arm-cci.c:2107:2: error: non-ASM statement in naked
>>   function is not supported
>> unreachable();
>> ^
>>
>> Instead of using the unreachable() macro use the assembler variant
>> ASM_UNREACHABLE.  This will no longer emit __builtin_unreachable(),
>> but since the function is naked and its return type is void it seems
>> not to have adverse effects.
> 
> I think that unreachable() there is rather silly - this function
> *does* return, and the comments say as much.  Just delete the silly
> "unreachable()", there's no need to put an ASM_UNREACHABLE in there.
> 
> The function is not declared as not returning, and nothing in this
> file uses it anyway - it's called from the mcpm code, which also
> _does_ expect this function to return (if it doesn't, then we're
> basically saying the CPU that called it is dead.)
> 

Hm, that makes sense. Will just drop unreachable() in the next revision.

Thanks for reviewing!

--
Stefan

>>
>> Signed-off-by: Stefan Agner 
>> ---
>>  drivers/bus/arm-cci.c | 3 +--
>>  1 file changed, 1 insertion(+), 2 deletions(-)
>>
>> diff --git a/drivers/bus/arm-cci.c b/drivers/bus/arm-cci.c
>> index 5426c04fe24b..ee9da86fec47 100644
>> --- a/drivers/bus/arm-cci.c
>> +++ b/drivers/bus/arm-cci.c
>> @@ -2084,6 +2084,7 @@ asmlinkage void __naked cci_enable_port_for_self(void)
>>
>>  "   mov r0, #0 \n"
>>  "   bx  lr \n"
>> +ASM_UNREACHABLE
>>
>>  "   .align  2 \n"
>>  "5: .word   cpu_port - . \n"
>> @@ -2103,8 +2104,6 @@ asmlinkage void __naked cci_enable_port_for_self(void)
>>  [sizeof_struct_cpu_port] "i" (sizeof(struct cpu_port)),
>>  [sizeof_struct_ace_port] "i" (sizeof(struct cci_ace_port)),
>>  [offsetof_port_phys] "i" (offsetof(struct cci_ace_port, phys)) );
>> -
>> -unreachable();
>>  }
>>
>>  /**
>> --
>> 2.16.2
>>

Re: [patch] mm, thp: do not cause memcg oom for thp

2018-03-21 Thread Michal Hocko

On Tue 20-03-18 13:25:23, David Rientjes wrote:
> On Tue, 20 Mar 2018, Michal Hocko wrote:
> 
> > > Commit 2516035499b9 ("mm, thp: remove __GFP_NORETRY from khugepaged and
> > > madvised allocations") changed the page allocator to no longer detect thp
> > > allocations based on __GFP_NORETRY.
> > > 
> > > It did not, however, modify the mem cgroup try_charge() path to avoid oom
> > > kill for either khugepaged collapsing or thp faulting.  It is never
> > > expected to oom kill a process to allocate a hugepage for thp; reclaim is
> > > governed by the thp defrag mode and MADV_HUGEPAGE, but allocations (and
> > > charging) should fallback instead of oom killing processes.
> > 
> > For some reason I thought that the charging path simply bails out for
> > costly orders - effectively the same thing as for the global OOM killer.
> > But we do not. Is there any reason to not do that though? Why don't we
> > simply do
> > 
> 
> I'm not sure of the expectation of high-order memcg charging without 
> __GFP_NORETRY,

It should be semantically compatible with the allocation path.

> I only know that khugepaged can now cause memcg oom kills 
> when trying to collapse memory, and then subsequently found that the same 
> situation exists for faulting instead of falling back to small pages.

And that is clearly a bug because page allocator doesn't oom kill while
the memcg charge does for the same gfp flag. That should be fixed.

> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index d1a917b5b7b7..08accbcd1a18 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -1493,7 +1493,7 @@ static void memcg_oom_recover(struct mem_cgroup 
> > *memcg)
> >  
> >  static void mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask, int order)
> >  {
> > -   if (!current->memcg_may_oom)
> > +   if (!current->memcg_may_oom || order > PAGE_ALLOC_COSTLY_ORDER)
> > return;
> > /*
> >  * We are in the middle of the charge context here, so we
> 
> That may make sense as an additional patch, but for thp allocations we 
> don't want to retry reclaim nr_retries times anyway; we want the old 
> behavior of __GFP_NORETRY before commit 2516035499b9.

Why? Allocation and the charge path should use the same gfp mask unless
there is a strong reason for it. If you have one then please mention it
in the changelog.

> So the above would be a follow-up patch that wouldn't replace mine.

Unless there is a strong reason to use different gfp mask for the
allocation and the charge then your fix is actually wrong.
-- 
Michal Hocko
SUSE Labs
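
The effect of the one-line check proposed in the diff quoted above can be sketched in userspace like this. It is only an illustration under stated assumptions: charge_may_oom() is a hypothetical helper that returns a decision instead of invoking the OOM killer, and THP_ORDER assumes 2 MiB huge pages on 4 KiB base pages. PAGE_ALLOC_COSTLY_ORDER uses the kernel's value of 3.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_ALLOC_COSTLY_ORDER 3	/* same value as in the kernel */
#define THP_ORDER 9			/* assumed: 2 MiB huge page, 4 KiB base pages */

/*
 * Hypothetical helper: may a failed charge of this order go on to the
 * memcg OOM path?  Mirrors the condition in the proposed diff.
 */
static bool charge_may_oom(bool task_may_oom, int order)
{
	assert(order >= 0);
	if (!task_may_oom || order > PAGE_ALLOC_COSTLY_ORDER)
		return false;
	return true;
}
```

Under these assumptions a THP charge (order 9) bails out before the OOM path, while charges up to order 3 behave as before.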

Re: [PATCH 0/1] cover-letter/lz4: Implement lz4 with dynamic offset length.

2018-03-21 Thread Sergey Senozhatsky

CC-ing Nick

Nick, can you take a look?

Message-IDs:
lkml.kernel.org/r/1521607242-3968-1-git-send-email-maninder...@samsung.com
lkml.kernel.org/r/1521607242-3968-2-git-send-email-maninder...@samsung.com

-ss

linux-next: Tree for Mar 21

2018-03-21 Thread Stephen Rothwell

Hi all,

Changes since 20180320:

The arm64 tree gained a conflict against the asm-generic tree.

The powerpc tree gained a conflict against the asm-generic tree.

The sparc-next tree gained a conflict against the arm64 tree.

The vfs tree still had its build failure for which I reverted a commit.

The sound-asoc tree gained a build failure, so I used the version from
next-20180320.

The selinux tree gained a conflict against the security tree.

The ftrace tree gained conflicts against the jc_docs tree.

The userns tree gained conflicts against the fuse tree.

The akpm tree gained a build failure for which I applied a patch.

Non-merge commits (relative to Linus' tree): 8707
 11212 files changed, 403350 insertions(+), 742901 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log
files in the Next directory.  Between each merge, the tree was built
with a ppc64_defconfig for powerpc, an allmodconfig for x86_64, a
multi_v7_defconfig for arm and a native build of tools/perf. After
the final fixups (if any), I do an x86_64 modules_install followed by
builds for x86_64 allnoconfig, powerpc allnoconfig (32 and 64 bit),
ppc44x_defconfig, allyesconfig and pseries_le_defconfig and i386, sparc
and sparc64 defconfig. And finally, a simple boot test of the powerpc
pseries_le_defconfig kernel in qemu (with and without kvm enabled).

Below is a summary of the state of the merge.

I am currently merging 261 trees (counting Linus' and 44 trees of bug
fix patches pending for the current merge release).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (1b5f3ba415fe Merge branch 'for-4.16-fixes' of 
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup)
Merging fixes/master (7928b2cbe55b Linux 4.16-rc1)
Merging kbuild-current/fixes (55fe6da9efba kbuild: Handle builtin dtb file 
names containing hyphens)
Merging arc-current/for-curr (661e50bc8532 Linux 4.16-rc4)
Merging arm-current/fixes (091f02483df7 ARM: net: bpf: clarify tail_call index)
Merging arm64-fixes/for-next/fixes (e21da1c99200 arm64: Relax 
ARM_SMCCC_ARCH_WORKAROUND_1 discovery)
Merging m68k-current/for-linus (2334b1ac1235 MAINTAINERS: Add NuBus subsystem 
entry)
Merging metag-fixes/fixes (b884a190afce metag/usercopy: Add missing fixups)
Merging powerpc-fixes/fixes (e4b79900222b powerpc/64s: Fix NULL 
AT_BASE_PLATFORM when using DT CPU features)
Merging sparc/master (9c548bb5823d sparc64: Oracle DAX driver depends on 
SPARC64)
Merging fscrypt-current/for-stable (ae64f9bd1d36 Linux 4.15-rc2)
Merging net/master (5f2fb802eee1 ipv6: old_dport should be a __be16 in 
__ip6_datagram_connect())
Merging bpf/master (b6b76dd62c56 error-injection: Fix to prohibit jump 
optimization)
Merging ipsec/master (f8a554b4aa96 vti6: Fix dev->max_mtu setting)
Merging netfilter/master (467697d289e7 netfilter: nf_tables: add missing 
netlink attrs to policies)
Merging ipvs/master (f7fb77fc1235 netfilter: nft_compat: check extension hook 
mask only if set)
Merging wireless-drivers/master (9b9322db5c5a brcmfmac: Fix check for ISO3166 
code)
Merging mac80211/master (a069215cf598 net: fec: Fix unbalanced PM runtime calls)
Merging rdma-fixes/for-rc (80cf79ae4f68 RDMA/verbs: Remove restrack entry from 
XRCD structure)
Merging sound-current/for-linus (a6618f4aedb2 ALSA: usb-audio: Fix parsing 
descriptor of UAC2 processing unit)
Merging pci-current/for-linus (fc110ebdd014 PCI: dwc: Fix enumeration end when 
reaching root subordinate)
Merging driver-core.current/driver-core-linus (0c8efd610b58 Linux 4.16-rc5)
Merging tty.current/tty-linus (c698ca527893 Linux 4.16-rc6)
Merging usb.current/usb-linus (c698ca527893 Linux 4.16-rc6)
Merging usb-gadget-fixes/fixes (c6ba5084ce0d usb: gadget: udc: renesas_usb3: 
add binging for r8a77965)
Merging usb-serial-fixes/usb-linus (86d71233b615 USB: serial: ftdi_sio: add 
support for Harman FirmwareHubEmulator)
Merging usb-chipidea-fixes/ci-for-usb-stable (964728f9f407 USB: chipidea: msm: 
fix ulpi-node lookup)
Merging phy/fixes (59fba0869aca phy

Re: [PATCH 1/1] mm/page_owner: fix recursion bug after changing skip entries

2018-03-21 Thread Vlastimil Babka

On 03/21/2018 05:37 AM, Maninder Singh wrote:
> This patch fixes "5f48f0bd4e368425db4424b9afd1bd251d32367a".
> (mm, page_owner: skip unnecessary stack_trace entries)
> 
> Because if we skip first two entries then logic of checking count
> value as 2 for recursion is broken and code will go in one depth
> recursion.
> 
> so we need to check only one call of _RET_IP(__set_page_owner)
> while checking for recursion.
> 
> Current Backtrace while checking for recursion:-
> 
> (save_stack) from (__set_page_owner)  // (But recursion returns true here)
> (__set_page_owner)   from (get_page_from_freelist)
> (get_page_from_freelist) from (__alloc_pages_nodemask)
> (__alloc_pages_nodemask) from (depot_save_stack)
> (depot_save_stack)   from (save_stack)   // recursion should return true here
> (save_stack) from (__set_page_owner)
> (__set_page_owner)   from (get_page_from_freelist)
> (get_page_from_freelist) from (__alloc_pages_nodemask+)
> (__alloc_pages_nodemask) from (depot_save_stack)
> (depot_save_stack)   from (save_stack)
> (save_stack) from (__set_page_owner)
> (__set_page_owner)   from (get_page_from_freelist)
> 
> Correct Backtrace with fix:
> 
> (save_stack) from (__set_page_owner) // recursion returned true here
> (__set_page_owner)   from (get_page_from_freelist)
> (get_page_from_freelist) from (__alloc_pages_nodemask+)
> (__alloc_pages_nodemask) from (depot_save_stack)
> (depot_save_stack)   from (save_stack)
> (save_stack) from (__set_page_owner)
> (__set_page_owner)   from (get_page_from_freelist)
> 
> Signed-off-by: Maninder Singh 
> Signed-off-by: Vaneet Narang 
Fixes: 5f48f0bd4e36 ("mm, page_owner: skip unnecessary stack_trace entries")

Good catch.
Acked-by: Vlastimil Babka 

> ---
>  mm/page_owner.c |6 +++---
>  1 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/page_owner.c b/mm/page_owner.c
> index 8592543..46ab1c4 100644
> --- a/mm/page_owner.c
> +++ b/mm/page_owner.c
> @@ -123,13 +123,13 @@ void __reset_page_owner(struct page *page, unsigned int 
> order)
>  static inline bool check_recursive_alloc(struct stack_trace *trace,
>   unsigned long ip)
>  {
> - int i, count;
> + int i;
>  
>   if (!trace->nr_entries)
>   return false;
>  
> - for (i = 0, count = 0; i < trace->nr_entries; i++) {
> - if (trace->entries[i] == ip && ++count == 2)
> + for (i = 0; i < trace->nr_entries; i++) {
> + if (trace->entries[i] == ip)
>   return true;
>   }
>  
>
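
To illustrate why counting to two stops working once the leading entries are skipped, here is a minimal userspace sketch. It is not the mm/page_owner.c code: struct trace, seen_twice() and seen_once() are hypothetical stand-ins for struct stack_trace and for the old and new loop in check_recursive_alloc(), and any addresses passed in are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct stack_trace. */
struct trace {
	unsigned int nr_entries;
	const unsigned long *entries;
};

/* Old logic: report recursion only when ip shows up a second time. */
static bool seen_twice(const struct trace *t, unsigned long ip)
{
	unsigned int i, count = 0;

	assert(t != NULL);
	for (i = 0; i < t->nr_entries; i++)
		if (t->entries[i] == ip && ++count == 2)
			return true;
	return false;
}

/* New logic: a single occurrence of ip is enough. */
static bool seen_once(const struct trace *t, unsigned long ip)
{
	unsigned int i;

	assert(t != NULL);
	for (i = 0; i < t->nr_entries; i++)
		if (t->entries[i] == ip)
			return true;
	return false;
}
```

If the saved trace no longer contains the innermost frames, the first nested allocation has ip in it exactly once: seen_twice() misses it and allows one more level of recursion, while seen_once() reports it immediately.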

[PATCH] dmaengine: edma: Check the memory allocation for the memcpy dma device

2018-03-21 Thread Peter Ujfalusi

If the allocation fails then disable the memcpy support.

Signed-off-by: Peter Ujfalusi 
---
 drivers/dma/edma.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/dma/edma.c b/drivers/dma/edma.c
index 5b197473106b..519e69e81fca 100644
--- a/drivers/dma/edma.c
+++ b/drivers/dma/edma.c
@@ -1899,6 +1899,11 @@ static void edma_dma_init(struct edma_cc *ecc, bool 
legacy_mode)
 
if (memcpy_channels) {
m_ddev = devm_kzalloc(ecc->dev, sizeof(*m_ddev), GFP_KERNEL);
+   if (!m_ddev) {
+   dev_warn(ecc->dev, "memcpy is disabled due to OoM\n");
+   memcpy_channels = NULL;
+   goto ch_setup;
+   }
ecc->dma_memcpy = m_ddev;
 
dma_cap_zero(m_ddev->cap_mask);
@@ -1926,6 +1931,7 @@ static void edma_dma_init(struct edma_cc *ecc, bool 
legacy_mode)
dev_info(ecc->dev, "memcpy is disabled\n");
}
 
+ch_setup:
for (i = 0; i < ecc->num_channels; i++) {
struct edma_chan *echan = &ecc->slave_chans[i];
echan->ch_num = EDMA_CTLR_CHAN(ecc->id, i);
-- 
Peter

Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki

Re: [PATCH RESEND v2 1/2] drm/xen-front: Add support for Xen PV display frontend

2018-03-21 Thread Oleksandr Andrushchenko


On 03/20/2018 03:47 PM, Daniel Vetter wrote:

On Tue, Mar 20, 2018 at 01:58:01PM +0200, Oleksandr Andrushchenko wrote:

On 03/19/2018 05:28 PM, Daniel Vetter wrote:

There should be no difference between immediate removal and delayed
removal of the drm_device from the xenbus pov. The lifetimes of the
front-end (drm_device) and backend (the xen bus thing) are entirely
decoupled:

Well, they are not decoupled for simplicity of handling,
please see below

So for case 2 you only have 1 case:

- drm_dev_unplug
- tear down the entire xenbus backend completely
- all xenbus access will be caught with drm_dev_enter/exit (well right
now drm_dev_is_unplugged) checks, including any access to your private
drm_device data
- once drm_device->open_count == 0 the core will tear down the
drm_device instance and call your optional drm_driver->release
callback.

So past drm_dev_unplug the drm_device is in zombie state and the only
thing that will happen is a) it rejects all ioctls and anything else
userspace might ask it to do and b) gets released once the last
userspace reference is gone.

I have re-worked the driver with this in mind [1]
So, I now use drm_dev_unplug and destroy the DRM device
on drm_driver.release.
In context of unplug work I also merged xen_drm_front_drv.c and
xen_drm_front.c as these are too coupled together now.

Could you please take a look and tell me if this is what you mean?

If the backend comes up again, you create a _new_ drm_device instance
(while the other one is still in the process of eventually getting
released).

We only have a single xenbus instance, so this way I'll need
to handle list of such zombies. For that reason I prefer to
wait until the DRM device is destroyed, telling the backend
to hold on until then (via going into XenbusStateReconfiguring state).

Why exactly do you need to keep track of your drm_devices from the xenbus?
Once unplugged, there should be no connection with the "hw" for your
device, in neither direction. Maybe I need to look again, but this still
smells funny and not like something you should ever do.

Ok, probably new reworked code will make things cleaner and answer
your concerns. I also removed some obsolete stuff, e.g. platform device,
so this path became even cleaner now ;)

Another drawback of such approach is that I'll have different
minors at run-time, e.g. card0, card1, etc.
For software which has /dev/dri/card0 hardcoded it may be a problem.
But this is minor, IMO

Fix userspace :-)

But yeah unlikely this is a problem, hotplugging is fairly old thing.


In short, your driver code should never have a need to look at
drm_device->open_count. I hope this explains it a bit better.
-Daniel


Yes, you are correct: at [1] I am not touching drm_device->open_count
anymore and everything just happens synchronously
[1] https://github.com/andr2000/linux/commits/drm_tip_pv_drm_v3

Please just resend, makes it easier to comment inline.

I need to wait for Xen community reviewers before resending, so this
is why I hoped you can take a look before that, so I have a chance to
address more of your comments in v4

-Daniel

Thank you,
Oleksandr

Re: [PATCH v9 0/4] fuse: mounts from non-init user namespaces

2018-03-21 Thread Miklos Szeredi

On Tue, Mar 20, 2018 at 7:27 PM, Eric W. Biederman
 wrote:
> Miklos Szeredi  writes:

>> I did just one modification to "fuse: Fail all requests with invalid
>> uids or gids": instead of zeroing out the context for the nofail case,
>> continue to use the "_munged" variants. I don't think this hurts and
>> is better for backward compatibility (I guess the only relevant use
>> would be for debugging output, but we don't want to regress even for
>> that if not necessary)
>
> Hmm...
>
> The thing is the failure doesn't come in the difference between the
> _munged and the normal variants.  The difference between
> munged and non-munged variants is how they handled failure ((uid16_t)-2)
> aka 0xfffe for munged and -1 for the non-munged case.
>
> The failures are introduced by changing &init_user_ns to fc->user_ns.

Right.

> The operations in question are iop->flush and fuse_force_forget (on an
> error).   I don't know what value having ids on those paths will do
> they are operations that must succeed, and they should not change the
> on-disk ids.  I was thinking saying the most privileged id was asking
> for the operation would seem to make sense.

I don't think anybody should actually *care* about the id's in flush,
but I'd still not change the current behavior for change's sake.

>
> With the munged variants we will get (uid16_t)-2 aka 0xfffe aka
> nobody asking for the operation if things don't map.  In practice
> the don't map case is new.
>
> Since the id's should not be looked at anyway I don't see it makes
> much difference which ids we use so the munged case seems at least
> plausible.
>
> It might be better to use the non-munged variant and do:
> 	if (req->in.h.uid == (uid_t)-1)
> 		req->in.h.uid = 0;
> 	if (req->in.h.gid == (gid_t)-1)
> 		req->in.h.gid = 0;
>
> That might be less surprising to userspace.  As I don't think the
> unmapped case has ever occurred in practice yet.

Right, that would work too, but I don't think it actually matters, so
unless you can think of an actual security issue arising from using
the munged variants, I'd just leave it as it is.

Thanks,
Miklos
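
The three behaviours discussed in this thread for an id that has no mapping can be compared with a small userspace sketch. It is an illustration only: struct id_map and the map_id*() helpers are hypothetical and handle a single mapping range. They model the non-munged result of (uid_t)-1, the munged result of 0xfffe (65534), and the "report 0 when unmapped" alternative quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INVALID_ID  ((uint32_t)-1)	/* result of the non-munged variant */
#define OVERFLOW_ID ((uint32_t)0xfffe)	/* 65534, result of the munged variant */

/* Hypothetical single-range map: kernel ids [lower_first, lower_first + count)
 * appear inside the namespace as [first, first + count). */
struct id_map {
	uint32_t first;
	uint32_t lower_first;
	uint32_t count;
};

/* Non-munged: unmapped ids come back as INVALID_ID. */
static uint32_t map_id(const struct id_map *m, uint32_t kid)
{
	assert(m != NULL);
	if (kid >= m->lower_first && kid - m->lower_first < m->count)
		return m->first + (kid - m->lower_first);
	return INVALID_ID;
}

/* Munged: unmapped ids come back as the overflow id. */
static uint32_t map_id_munged(const struct id_map *m, uint32_t kid)
{
	uint32_t id = map_id(m, kid);

	return id == INVALID_ID ? OVERFLOW_ID : id;
}

/* Alternative from the quoted mail: non-munged, then 0 when unmapped. */
static uint32_t map_id_or_zero(const struct id_map *m, uint32_t kid)
{
	uint32_t id = map_id(m, kid);

	return id == INVALID_ID ? 0 : id;
}
```

For ids that do map, all three helpers agree; they differ only in what the userspace server would see for an unmapped caller.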

Re: [PATCH] mm/hugetlb: prevent hugetlb VMA to be misaligned

2018-03-21 Thread Michal Hocko

On Tue 20-03-18 14:35:28, Mike Kravetz wrote:
> On 03/20/2018 02:26 PM, Mike Kravetz wrote:
> > Thanks Laurent!
> > 
> > This bug was introduced by 31383c6865a5.  Dan's changes for 31383c6865a5
> > seem pretty straight forward.  It simply replaces an explicit check when
> > splitting a vma to a new vm_ops split callout.  Unfortunately, mappings
> > created via shmget/shmat have their vm_ops replaced.  Therefore, this
> > split callout is never made.
> > 
> > The shm vm_ops do indirectly call the original vm_ops routines as needed.
> > Therefore, I would suggest a patch something like the following instead.
> > If we move forward with the patch, we should include Laurent's BUG output
> > and perhaps test program in the commit message.
> 
> Sorry, patch in previous mail was a mess
> 
> From 7a19414319c7937fd2757c27f936258f16c1f61d Mon Sep 17 00:00:00 2001
> From: Mike Kravetz 
> Date: Tue, 20 Mar 2018 13:56:57 -0700
> Subject: [PATCH] shm: add split function to shm_vm_ops
> 
> The split function was added to vm_operations_struct to determine
> if a mapping can be split.  This was mostly for device-dax and
> hugetlbfs mappings which have specific alignment constraints.
> 
> mappings initiated via shmget/shmat have their original vm_ops
> overwritten with shm_vm_ops.  shm_vm_ops functions will call back
> to the original vm_ops if needed.  Add such a split function.
> 
> Fixes: 31383c6865a5 ("mm, hugetlbfs: introduce ->split() to vm_operations_struct")
> Reported-by: Laurent Dufour
> Signed-off-by: Mike Kravetz 

Yes this looks much better than the original hugetlb specific code in
the generic vma code.

Please add the original VM_BUG_ON report to the changelog

Cc: stable
Acked-by: Michal Hocko 

> ---
>  ipc/shm.c | 12 
>  1 file changed, 12 insertions(+)
> 
> diff --git a/ipc/shm.c b/ipc/shm.c
> index 7acda23430aa..50e88fc060b1 100644
> --- a/ipc/shm.c
> +++ b/ipc/shm.c
> @@ -386,6 +386,17 @@ static int shm_fault(struct vm_fault *vmf)
>   return sfd->vm_ops->fault(vmf);
>  }
>  
> +static int shm_split(struct vm_area_struct *vma, unsigned long addr)
> +{
> + struct file *file = vma->vm_file;
> + struct shm_file_data *sfd = shm_file_data(file);
> +
> + if (sfd->vm_ops && sfd->vm_ops->split)
> + return sfd->vm_ops->split(vma, addr);
> +
> + return 0;
> +}
> +
>  #ifdef CONFIG_NUMA
>  static int shm_set_policy(struct vm_area_struct *vma, struct mempolicy *new)
>  {
> @@ -510,6 +521,7 @@ static const struct vm_operations_struct shm_vm_ops = {
>   .open   = shm_open, /* callback for a new vm-area open */
>   .close  = shm_close,/* callback for when the vm-area is released */
>   .fault  = shm_fault,
> + .split  = shm_split,
>  #if defined(CONFIG_NUMA)
>   .set_policy = shm_set_policy,
>   .get_policy = shm_get_policy,
> -- 
> 2.13.6

-- 
Michal Hocko
SUSE Labs

Re: [PATCH 3/5] ARM: trusted_foundations: do not use naked function

2018-03-21 Thread Stefan Agner

On 21.03.2018 00:13, Russell King - ARM Linux wrote:
> On Wed, Mar 21, 2018 at 12:02:04AM +0100, Stefan Agner wrote:
>> As documented in GCC naked functions should only use Basic asm
>> syntax. The Extended asm or mixture of Basic asm and "C" code is
>> not guaranteed. Currently this works because it was hard coded
>> to follow and check GCC behavior for arguments and register
>> placement.
> 
> Those checks have nothing to do with that at all.  The whole point of
> __asmeq() is to catch situations where you use register variables,
> specifying which register you want them in, and GCC then ends up
> passing them to assembly code in some other random register(s).
> 
> This was found with older GCCs, and the problem was fixed.  It has
> nothing to do with naked functions per se.
> 

Ok, will reword that part to something like:

As documented in GCC naked functions should only use Basic asm
syntax. The Extended asm or mixture of Basic asm and "C" code cannot
be depended upon.

Furthermore with clang using parameters in Extended asm in a
naked function is not supported:
...

> In fact, as you're introducing further register variables, these
> checks become more important to have than they were with the
> previous code.

Ok I see, so I definitely have to leave them in.

You generally agree with the change otherwise?

--
Stefan

[PATCH ghak21 V4 1/2] audit: remove path param from link denied function

2018-03-21 Thread Richard Guy Briggs

In commit 45b578fe4c3cade6f4ca1fc934ce199afd857edc
("audit: link denied should not directly generate PATH record")
the need for the struct path *link parameter was removed.
Remove the now useless struct path argument.

Signed-off-by: Richard Guy Briggs 
---
 fs/namei.c| 4 ++--
 include/linux/audit.h | 6 ++
 kernel/audit.c| 3 +--
 3 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 9cc91fb..e3682bb 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -945,7 +945,7 @@ static inline int may_follow_link(struct nameidata *nd)
if (nd->flags & LOOKUP_RCU)
return -ECHILD;
 
-   audit_log_link_denied("follow_link", &nd->stack[0].link);
+   audit_log_link_denied("follow_link");
return -EACCES;
 }
 
@@ -1011,7 +1011,7 @@ static int may_linkat(struct path *link)
if (safe_hardlink_source(inode) || inode_owner_or_capable(inode))
return 0;
 
-   audit_log_link_denied("linkat", link);
+   audit_log_link_denied("linkat");
return -EPERM;
 }
 
diff --git a/include/linux/audit.h b/include/linux/audit.h
index af410d9..75d5b03 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -146,8 +146,7 @@ extern void audit_log_d_path(struct audit_buffer *ab,
 const struct path *path);
 extern void audit_log_key(struct audit_buffer *ab,
   char *key);
-extern void audit_log_link_denied(const char *operation,
- const struct path *link);
+extern void audit_log_link_denied(const char *operation);
 extern void audit_log_lost(const char *message);
 
 extern int audit_log_task_context(struct audit_buffer *ab);
@@ -194,8 +193,7 @@ static inline void audit_log_d_path(struct audit_buffer *ab,
 { }
 static inline void audit_log_key(struct audit_buffer *ab, char *key)
 { }
-static inline void audit_log_link_denied(const char *string,
-const struct path *link)
+static inline void audit_log_link_denied(const char *string)
 { }
 static inline int audit_log_task_context(struct audit_buffer *ab)
 {
diff --git a/kernel/audit.c b/kernel/audit.c
index 3f2f143..e8bf8d7 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -2308,9 +2308,8 @@ void audit_log_task_info(struct audit_buffer *ab, struct 
task_struct *tsk)
 /**
  * audit_log_link_denied - report a link restriction denial
  * @operation: specific link operation
- * @link: the path that triggered the restriction
  */
-void audit_log_link_denied(const char *operation, const struct path *link)
+void audit_log_link_denied(const char *operation)
 {
struct audit_buffer *ab;
 
-- 
1.8.3.1
