RE: [PATCH net v2] xen-netback: fix race condition on XenBus disconnect

2017-03-06 Thread Paul Durrant
> -----Original Message-----
> From: Igor Druzhinin [mailto:igor.druzhi...@citrix.com]
> Sent: 03 March 2017 20:23
> To: netdev@vger.kernel.org; xen-de...@lists.xenproject.org
> Cc: Paul Durrant ; jgr...@suse.com; Wei Liu
> ; Igor Druzhinin 
> Subject: [PATCH net v2] xen-netback: fix race condition on XenBus
> disconnect
> 
> In some cases during XenBus disconnect event handling and subsequent
> queue resource release there may be some TX handlers active on
> other processors. Use RCU in order to synchronize with them.
> 
> Signed-off-by: Igor Druzhinin 
> ---
> v2:
>  * Add protection for xenvif_get_ethtool_stats
>  * Additional comments and fixes
> ---
>  drivers/net/xen-netback/interface.c | 29 ++---
>  drivers/net/xen-netback/netback.c   |  2 +-
>  drivers/net/xen-netback/xenbus.c| 20 ++--
>  3 files changed, 33 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-
> netback/interface.c
> index a2d32676..266b7cd 100644
> --- a/drivers/net/xen-netback/interface.c
> +++ b/drivers/net/xen-netback/interface.c
> @@ -164,13 +164,17 @@ static int xenvif_start_xmit(struct sk_buff *skb,
> struct net_device *dev)
>  {
>   struct xenvif *vif = netdev_priv(dev);
>   struct xenvif_queue *queue = NULL;
> - unsigned int num_queues = vif->num_queues;
> + unsigned int num_queues;
>   u16 index;
>   struct xenvif_rx_cb *cb;
> 
>   BUG_ON(skb->dev != dev);
> 
> - /* Drop the packet if queues are not set up */
> + /* Drop the packet if queues are not set up.
> +  * This handler should be called inside an RCU read section
> +  * so we don't need to enter it here explicitly.
> +  */
> + num_queues = rcu_dereference(vif)->num_queues;
>   if (num_queues < 1)
>   goto drop;
> 
> @@ -221,18 +225,21 @@ static struct net_device_stats
> *xenvif_get_stats(struct net_device *dev)
>  {
>   struct xenvif *vif = netdev_priv(dev);
>   struct xenvif_queue *queue = NULL;
> + unsigned int num_queues;
>   u64 rx_bytes = 0;
>   u64 rx_packets = 0;
>   u64 tx_bytes = 0;
>   u64 tx_packets = 0;
>   unsigned int index;
> 
> - spin_lock(&vif->lock);
> - if (vif->queues == NULL)
> + rcu_read_lock();
> +
> + num_queues = rcu_dereference(vif)->num_queues;
> + if (num_queues < 1)
>   goto out;

Is this if clause worth it? All it does is jump over the for loop, which would 
not be executed anyway, since the initial test (0 < 0) would fail.
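
To illustrate the point, here is a minimal userspace sketch (not the netback code; struct queue_stats and sum_rx_bytes() are hypothetical names) showing that a loop bounded by an unsigned count of zero never runs its body, so a separate "num_queues < 1" guard in front of it adds nothing:

```c
#include <stddef.h>

/* Hypothetical stand-in for the per-queue counters in the patch. */
struct queue_stats {
	unsigned long rx_bytes;
};

/* Sum rx_bytes over num_queues entries without an explicit
 * "num_queues < 1" early exit.
 */
static unsigned long sum_rx_bytes(const struct queue_stats *queues,
				  unsigned int num_queues)
{
	unsigned long total = 0;
	unsigned int index;

	/* With num_queues == 0 the first test (0 < 0) already fails,
	 * so the body never runs and queues is never dereferenced.
	 */
	for (index = 0; index < num_queues; ++index)
		total += queues[index].rx_bytes;

	return total;
}
```

Calling sum_rx_bytes(NULL, 0) is safe under this sketch because the pointer is only touched inside the loop body.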

> 
>   /* Aggregate tx and rx stats from each queue */
> - for (index = 0; index < vif->num_queues; ++index) {
> + for (index = 0; index < num_queues; ++index) {
>   queue = &vif->queues[index];
>   rx_bytes += queue->stats.rx_bytes;
>   rx_packets += queue->stats.rx_packets;
> @@ -241,7 +248,7 @@ static struct net_device_stats
> *xenvif_get_stats(struct net_device *dev)
>   }
> 
>  out:
> - spin_unlock(&vif->lock);
> + rcu_read_unlock();
> 
>   vif->dev->stats.rx_bytes = rx_bytes;
>   vif->dev->stats.rx_packets = rx_packets;
> @@ -377,10 +384,16 @@ static void xenvif_get_ethtool_stats(struct
> net_device *dev,
>struct ethtool_stats *stats, u64 * data)
>  {
>   struct xenvif *vif = netdev_priv(dev);
> - unsigned int num_queues = vif->num_queues;
> + unsigned int num_queues;
>   int i;
>   unsigned int queue_index;
> 
> + rcu_read_lock();
> +
> + num_queues = rcu_dereference(vif)->num_queues;
> + if (num_queues < 1)
> + goto out;
> +

You have introduced a semantic change with the above if clause. The 
xenvif_stats array was previously zeroed if num_queues < 1. It appears that 
ethtool does actually allocate a zeroed array to pass in here, but I wonder 
whether it is still safer to have this function zero it anyway. 
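
A minimal userspace sketch of the "zero it anyway" option, assuming a flat [queue][stat] layout; NUM_STATS and get_stats() are hypothetical names and this is not the driver function. Clearing the output first means the result no longer depends on what the caller passed in when there are no queues:

```c
#include <stddef.h>
#include <string.h>

#define NUM_STATS 3 /* hypothetical; stands in for ARRAY_SIZE(xenvif_stats) */

/* Aggregate per-queue counters into data[]. per_queue is a flat array
 * laid out as [queue][stat]. data[] is cleared up front, so it holds
 * zeros when num_queues is 0 whatever the caller put there.
 */
static void get_stats(const unsigned long *per_queue,
		      unsigned int num_queues, unsigned long long *data)
{
	unsigned int i, q;

	memset(data, 0, NUM_STATS * sizeof(*data));

	for (i = 0; i < NUM_STATS; i++)
		for (q = 0; q < num_queues; q++)
			data[i] += per_queue[q * NUM_STATS + i];
}
```

With an early "goto out" placed before the clearing step, data[] would instead keep whatever the caller supplied, which is the semantic change noted above.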

>   for (i = 0; i < ARRAY_SIZE(xenvif_stats); i++) {
>   unsigned long accum = 0;
>   for (queue_index = 0; queue_index < num_queues;
> ++queue_index) {
> @@ -389,6 +402,8 @@ static void xenvif_get_ethtool_stats(struct
> net_device *dev,
>   }
>   data[i] = accum;
>   }
> +out:
> + rcu_read_unlock();
>  }
> 
>  static void xenvif_get_strings(struct net_device *dev, u32 stringset, u8 *
> data)
> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-
> netback/netback.c
> index f9bcf4a..62fa74d 100644
> --- a/drivers/net/xen-netback/netback.c
> +++ b/drivers/net/xen-netback/netback.c
> @@ -214,7 +214,7 @@ static void xenvif_fatal_tx_err(struct xenvif *vif)
>   netdev_err(vif->dev, "fatal error; disabling device\n");
>   vif->disabled = true;
>   /* Disable the vif from queue 0's kthread */
> - if (vif->queues)
> + if (vif->num_queues > 0)

num_queues is unsigned so this check should not be > 0. It would be better 
simply to do:

if (vif->num_queue

Re: [PATCH 07/26] brcmsmac: reduce stack size with KASAN

2017-03-06 Thread Arend Van Spriel
On 2-3-2017 17:38, Arnd Bergmann wrote:
> The wlc_phy_table_write_nphy/wlc_phy_table_read_nphy functions always put an 
> object
> on the stack, which will each require a redzone with KASAN and lead to 
> possible
> stack overflow:
> 
> drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c: In function 
> 'wlc_phy_workarounds_nphy':
> drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c:17135:1: 
> warning: the frame size of 6312 bytes is larger than 1000 bytes 
> [-Wframe-larger-than=]

Looks like this warning text ended up in the wrong commit message. Got
me confused for a sec :-p

> This marks the two functions as noinline_for_kasan, avoiding the problem 
> entirely.

Frankly I seriously dislike annotating code for the sake of some
(dynamic) memory analyzer. To me the whole thing seems rather
unnecessary. If the code passes the 2048 stack limit without KASAN it
would seem the limit with KASAN should be such that no warning is given.
I suspect that it is rather difficult to predict the additional size of
the instrumentation code and on some systems there might be a real issue
with increased stack usage.

Regards,
Arend

> Signed-off-by: Arnd Bergmann 
> ---
>  drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c 
> b/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c
> index b3aab2fe96eb..42dc8e1f483d 100644
> --- a/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c
> +++ b/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c
> @@ -14157,7 +14157,7 @@ static void wlc_phy_bphy_init_nphy(struct brcms_phy 
> *pi)
>   write_phy_reg(pi, NPHY_TO_BPHY_OFF + BPHY_STEP, 0x668);
>  }
>  
> -void
> +noinline_for_kasan void
>  wlc_phy_table_write_nphy(struct brcms_phy *pi, u32 id, u32 len, u32 offset,
>u32 width, const void *data)
>  {
> @@ -14171,7 +14171,7 @@ wlc_phy_table_write_nphy(struct brcms_phy *pi, u32 
> id, u32 len, u32 offset,
>   wlc_phy_write_table_nphy(pi, &tbl);
>  }
>  
> -void
> +noinline_for_kasan void
>  wlc_phy_table_read_nphy(struct brcms_phy *pi, u32 id, u32 len, u32 offset,
>   u32 width, void *data)
>  {
> 


4.11-rc1 regression: e1000e "BUG at drivers/pci/msi.c" on unplugged suspend+resume

2017-03-06 Thread Bjørn Mork
This is new with v4.11-rc1, so I strongly suspect commit 7e54d9d063fa
("e1000e: driver trying to free already-free irq"), which looks more
than suspicious in this context.  Haven't had time to test a revert
yet.  Just wanted to give an advance warning in case this isn't known.


Suspending and resuming my laptop with the ethernet unplugged results
in:

------------[ cut here ]------------
WARNING: CPU: 1 PID: 2086 at drivers/pci/msi.c:1052 
__pci_enable_msi_range+0x3c8/0x420
Modules linked in: rfcomm xt_multiport iptable_filter 8021q garp mrp stp llc 
tun ctr ccm cmac bnep nls_utf8 nls_cp437 vfat fat qcserial usb_wwan arc4 
mei_wdt intel_rapl cdc_mbim cdc_wdm cdc_ncm usbnet mii usbserial 
x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass 
crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc uvcvideo btusb btrtl 
btbcm videobuf2_vmalloc videobuf2_memops snd_hda_codec_hdmi btintel 
videobuf2_v4l2 videobuf2_core bluetooth videodev snd_hda_codec_conexant 
snd_hda_codec_generic iwlmvm mac80211 efi_pstore snd_hda_intel iwlwifi 
snd_hda_codec snd_hwdep aesni_intel aes_x86_64 crypto_simd glue_helper cryptd 
snd_hda_core evdev serio_raw snd_pcm efivars cfg80211 iTCO_wdt 
iTCO_vendor_support snd_timer mei_me mei thinkpad_acpi wmi nvram snd soundcore 
ac
 rfkill battery i915 intel_gtt i2c_algo_bit drm_kms_helper syscopyarea 
sysfillrect sysimgblt video fb_sys_fops tpm_crb drm intel_pch_thermal button 
tpm_tis tpm_tis_core tpm sunrpc efivarfs ip_tables x_tables autofs4 ext4 crc16 
jbd2 fscrypto mbcache crc32c_generic intel_ishtp_hid hid rtsx_pci_sdmmc 
mmc_core crc32c_intel psmouse i2c_i801 e1000e ptp pps_core xhci_pci xhci_hcd 
nvme nvme_core usbcore rtsx_pci mfd_core intel_ish_ipc intel_ishtp thermal
CPU: 1 PID: 2086 Comm: kworker/u8:38 Not tainted 4.11.0-rc1 #443
Hardware name: LENOVO 20FB006AMN/20FB006AMN, BIOS N1FET47W (1.21 ) 11/28/2016
Workqueue: events_unbound async_run_entry_fn
Call Trace:
 dump_stack+0x67/0x92
 __warn+0xd1/0xf0
 warn_slowpath_null+0x1d/0x20
 __pci_enable_msi_range+0x3c8/0x420
 ? e1000_get_phy_info_82577+0x30/0x170 [e1000e]
 pci_enable_msi+0x1a/0x30
 e1000e_set_interrupt_capability+0x3c/0x120 [e1000e]
 e1000e_pm_thaw+0x22/0x60 [e1000e]
 e1000e_pm_resume+0x25/0x30 [e1000e]
 pci_pm_resume+0x64/0xa0
 dpm_run_callback+0xb9/0x2f0
 ? pci_pm_thaw+0x90/0x90
 device_resume+0x87/0x190
 async_resume+0x1d/0x50
 async_run_entry_fn+0x39/0x170
 process_one_work+0x1fe/0x6d0
 ? process_one_work+0x17f/0x6d0
 worker_thread+0x69/0x4c0
 kthread+0x12b/0x160
 ? process_one_work+0x6d0/0x6d0
 ? kthread_create_on_node+0x60/0x60
 ret_from_fork+0x2e/0x40
---[ end trace 103a4ba3722e184f ]---
e1000e :00:1f.6 eth0: Failed to initialize MSI interrupts.  Falling back to 
legacy interrupts.


followed by


------------[ cut here ]------------
kernel BUG at drivers/pci/msi.c:893!
invalid opcode:  [#1] SMP
Modules linked in: rfcomm xt_multiport iptable_filter 8021q garp mrp stp llc 
tun ctr ccm cmac bnep nls_utf8 nls_cp437 vfat fat qcserial usb_wwan arc4 
mei_wdt intel_rapl cdc_mbim cdc_wdm cdc_ncm usbnet mii usbserial 
x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass 
crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc uvcvideo btusb btrtl 
btbcm videobuf2_vmalloc videobuf2_memops snd_hda_codec_hdmi btintel 
videobuf2_v4l2 videobuf2_core bluetooth videodev snd_hda_codec_conexant 
snd_hda_codec_generic iwlmvm mac80211 efi_pstore snd_hda_intel iwlwifi 
snd_hda_codec snd_hwdep aesni_intel aes_x86_64 crypto_simd glue_helper cryptd 
snd_hda_core evdev serio_raw snd_pcm efivars cfg80211 iTCO_wdt 
iTCO_vendor_support snd_timer mei_me mei thinkpad_acpi wmi nvram snd soundcore 
ac
 rfkill battery i915 intel_gtt i2c_algo_bit drm_kms_helper syscopyarea 
sysfillrect sysimgblt video fb_sys_fops tpm_crb drm intel_pch_thermal button 
tpm_tis tpm_tis_core tpm sunrpc efivarfs ip_tables x_tables autofs4 ext4 crc16 
jbd2 fscrypto mbcache crc32c_generic intel_ishtp_hid hid rtsx_pci_sdmmc 
mmc_core crc32c_intel psmouse i2c_i801 e1000e ptp pps_core xhci_pci xhci_hcd 
nvme nvme_core usbcore rtsx_pci mfd_core intel_ish_ipc intel_ishtp thermal
CPU: 3 PID: 545 Comm: NetworkManager Tainted: GW   4.11.0-rc1 #443
Hardware name: LENOVO 20FB006AMN/20FB006AMN, BIOS N1FET47W (1.21 ) 11/28/2016
task: 98efa6452380 task.stack: b475426e
RIP: 0010:pci_msi_shutdown+0x11c/0x130
RSP: 0018:b475426e36a8 EFLAGS: 00010246
RAX: 98efaddfd440 RBX: 98efaddfd000 RCX: 
RDX: 98efaddfd440 RSI:  RDI: 98efaddfd000
RBP: b475426e36c8 R08: 98efaab74000 R09: 98efaab74000
R10: 8d784c80 R11:  R12: 98efaab74000
R13: ffea R14: 98efaab74000 R15: 
FS:  7f25df4c7e40() GS:98efb0c0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 7f609181eab4 CR3: 00042645d000 CR4: 001406e0
DR0:  DR1:  DR2: 0

Re: [PATCH net-next RFC 2/4] virtio-net: transmit napi

2017-03-06 Thread Jason Wang



On 2017-03-03 22:39, Willem de Bruijn wrote:

From: Willem de Bruijn 

Convert virtio-net to a standard napi tx completion path. This enables
better TCP pacing using TCP small queues and increases single stream
throughput.

The virtio-net driver currently cleans tx descriptors on transmission
of new packets in ndo_start_xmit. Latency depends on new traffic, so
is unbounded. To avoid deadlock when a socket reaches its snd limit,
packets are orphaned on transmission. This breaks socket backpressure,
including TSQ.

Napi increases the number of interrupts generated compared to the
current model, which keeps interrupts disabled as long as the ring
has enough free descriptors. Keep tx napi optional for now. Follow-on
patches will reduce the interrupt cost.

Signed-off-by: Willem de Bruijn 
---
  drivers/net/virtio_net.c | 73 
  1 file changed, 61 insertions(+), 12 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 8c21e9a4adc7..9a9031640179 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -33,6 +33,8 @@
  static int napi_weight = NAPI_POLL_WEIGHT;
  module_param(napi_weight, int, 0444);
  
+static int napi_tx_weight = NAPI_POLL_WEIGHT;

+


Maybe we should use module_param for this? Or in the future, use 
tx-frames-irq for a per-device configuration.


Thanks


Re: [PATCH net] bpf: add get_next_key callback to LPM map

2017-03-06 Thread Daniel Borkmann

On 03/05/2017 06:41 PM, Alexei Starovoitov wrote:

map_get_next_key callback is mandatory. Supply dummy handler.

Fixes: b95a5c4db09b ("bpf: add a longest prefix match trie map implementation")
Reported-by: Dmitry Vyukov 
Signed-off-by: Alexei Starovoitov 


Acked-by: Daniel Borkmann 


Re: [PATCH net-next RFC 3/4] vhost: interrupt coalescing support

2017-03-06 Thread Jason Wang



On 2017-03-03 22:39, Willem de Bruijn wrote:

+void vhost_signal(struct vhost_dev *dev, struct vhost_virtqueue *vq);
+static enum hrtimer_restart vhost_coalesce_timer(struct hrtimer *timer)
+{
+   struct vhost_virtqueue *vq =
+   container_of(timer, struct vhost_virtqueue, ctimer);
+
+   if (mutex_trylock(&vq->mutex)) {
+   vq->coalesce_frames = vq->max_coalesce_frames;
+   vhost_signal(vq->dev, vq);
+   mutex_unlock(&vq->mutex);
+   }
+
+   /* TODO: restart if lock failed and not held by handle_tx */
+   return HRTIMER_NORESTART;
+}
+


Then we may lose an interrupt forever if there is no new tx request? I
believe we need e.g. vhost_poll_queue() here.
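
As a rough userspace model of the concern (every name below is hypothetical and none of it is the vhost API), the timer path can hand the signal over to deferred work when the trylock fails, instead of dropping it:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a virtqueue; not a vhost structure. */
struct vq_model {
	atomic_flag lock;        /* stands in for vq->mutex */
	atomic_bool work_queued; /* stands in for work queued to a worker */
	unsigned int signals;    /* signals delivered; written under lock */
};

static void vq_init(struct vq_model *vq)
{
	atomic_flag_clear(&vq->lock);
	atomic_init(&vq->work_queued, false);
	vq->signals = 0;
}

static bool vq_trylock(struct vq_model *vq)
{
	return !atomic_flag_test_and_set(&vq->lock);
}

static void vq_unlock(struct vq_model *vq)
{
	atomic_flag_clear(&vq->lock);
}

/* Timer expiry: signal now if the lock is free, otherwise queue work
 * so that the signal is delivered later rather than lost.
 */
static void timer_fired(struct vq_model *vq)
{
	if (vq_trylock(vq)) {
		vq->signals++;
		vq_unlock(vq);
	} else {
		atomic_store(&vq->work_queued, true);
	}
}

/* Worker: deliver a deferred signal, waiting for the lock if needed. */
static void run_queued_work(struct vq_model *vq)
{
	if (!atomic_exchange(&vq->work_queued, false))
		return;

	while (!vq_trylock(vq))
		; /* spin; stands in for a blocking mutex_lock() */
	vq->signals++;
	vq_unlock(vq);
}
```

In this model a timer that fires while the lock is held leaves work_queued set, and the next run_queued_work() call delivers the signal even if no further tx request arrives.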


Thanks


Re: [PATCH 09/26] brcmsmac: split up wlc_phy_workarounds_nphy

2017-03-06 Thread Arend Van Spriel
On 2-3-2017 17:38, Arnd Bergmann wrote:
> The stack consumption in this driver is still relatively high, with one
> remaining warning if the warning level is lowered to 1536 bytes:
> 
> drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c:17135:1: error: 
> the frame size of 1880 bytes is larger than 1536 bytes 
> [-Werror=frame-larger-than=]
> 
> The affected function is actually a collection of three separate 
> implementations,
> and each of them is fairly large by itself. Splitting them up is done easily
> and improves readability at the same time.
> 
> I'm leaving the original indentation to make the review easier.

Thanks ;-)

Acked-by: Arend van Spriel 
> Signed-off-by: Arnd Bergmann 
> ---
>  .../broadcom/brcm80211/brcmsmac/phy/phy_n.c| 178 
> -
>  1 file changed, 104 insertions(+), 74 deletions(-)
> 
> diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c 
> b/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c
> index 48a4df488d75..d76c092bb6b4 100644
> --- a/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c
> +++ b/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c
> @@ -16061,52 +16061,8 @@ static void wlc_phy_workarounds_nphy_gainctrl(struct 
> brcms_phy *pi)
>   }
>  }
>  
> -static void wlc_phy_workarounds_nphy(struct brcms_phy *pi)
> +static void wlc_phy_workarounds_nphy_rev7(struct brcms_phy *pi)
>  {
> - static const u8 rfseq_rx2tx_events[] = {
> - NPHY_RFSEQ_CMD_NOP,
> - NPHY_RFSEQ_CMD_RXG_FBW,
> - NPHY_RFSEQ_CMD_TR_SWITCH,
> - NPHY_RFSEQ_CMD_CLR_HIQ_DIS,
> - NPHY_RFSEQ_CMD_RXPD_TXPD,
> - NPHY_RFSEQ_CMD_TX_GAIN,
> - NPHY_RFSEQ_CMD_EXT_PA
> - };
> - u8 rfseq_rx2tx_dlys[] = { 8, 6, 6, 2, 4, 60, 1 };
> - static const u8 rfseq_tx2rx_events[] = {
> - NPHY_RFSEQ_CMD_NOP,
> - NPHY_RFSEQ_CMD_EXT_PA,
> - NPHY_RFSEQ_CMD_TX_GAIN,
> - NPHY_RFSEQ_CMD_RXPD_TXPD,
> - NPHY_RFSEQ_CMD_TR_SWITCH,
> - NPHY_RFSEQ_CMD_RXG_FBW,
> - NPHY_RFSEQ_CMD_CLR_HIQ_DIS
> - };
> - static const u8 rfseq_tx2rx_dlys[] = { 8, 6, 2, 4, 4, 6, 1 };
> - static const u8 rfseq_tx2rx_events_rev3[] = {
> - NPHY_REV3_RFSEQ_CMD_EXT_PA,
> - NPHY_REV3_RFSEQ_CMD_INT_PA_PU,
> - NPHY_REV3_RFSEQ_CMD_TX_GAIN,
> - NPHY_REV3_RFSEQ_CMD_RXPD_TXPD,
> - NPHY_REV3_RFSEQ_CMD_TR_SWITCH,
> - NPHY_REV3_RFSEQ_CMD_RXG_FBW,
> - NPHY_REV3_RFSEQ_CMD_CLR_HIQ_DIS,
> - NPHY_REV3_RFSEQ_CMD_END
> - };
> - static const u8 rfseq_tx2rx_dlys_rev3[] = { 8, 4, 2, 2, 4, 4, 6, 1 };
> - u8 rfseq_rx2tx_events_rev3[] = {
> - NPHY_REV3_RFSEQ_CMD_NOP,
> - NPHY_REV3_RFSEQ_CMD_RXG_FBW,
> - NPHY_REV3_RFSEQ_CMD_TR_SWITCH,
> - NPHY_REV3_RFSEQ_CMD_CLR_HIQ_DIS,
> - NPHY_REV3_RFSEQ_CMD_RXPD_TXPD,
> - NPHY_REV3_RFSEQ_CMD_TX_GAIN,
> - NPHY_REV3_RFSEQ_CMD_INT_PA_PU,
> - NPHY_REV3_RFSEQ_CMD_EXT_PA,
> - NPHY_REV3_RFSEQ_CMD_END
> - };
> - u8 rfseq_rx2tx_dlys_rev3[] = { 8, 6, 6, 4, 4, 18, 42, 1, 1 };
> -
>   static const u8 rfseq_rx2tx_events_rev3_ipa[] = {
>   NPHY_REV3_RFSEQ_CMD_NOP,
>   NPHY_REV3_RFSEQ_CMD_RXG_FBW,
> @@ -16120,29 +16076,15 @@ static void wlc_phy_workarounds_nphy(struct 
> brcms_phy *pi)
>   };
>   static const u8 rfseq_rx2tx_dlys_rev3_ipa[] = { 8, 6, 6, 4, 4, 16, 43, 
> 1, 1 };
>   static const u16 rfseq_rx2tx_dacbufpu_rev7[] = { 0x10f, 0x10f };
> -
> - s16 alpha0, alpha1, alpha2;
> - s16 beta0, beta1, beta2;
> - u32 leg_data_weights, ht_data_weights, nss1_data_weights,
> - stbc_data_weights;
> + u32 leg_data_weights;
>   u8 chan_freq_range = 0;
>   static const u16 dac_control = 0x0002;
>   u16 aux_adc_vmid_rev7_core0[] = { 0x8e, 0x96, 0x96, 0x96 };
>   u16 aux_adc_vmid_rev7_core1[] = { 0x8f, 0x9f, 0x9f, 0x96 };
> - u16 aux_adc_vmid_rev4[] = { 0xa2, 0xb4, 0xb4, 0x89 };
> - u16 aux_adc_vmid_rev3[] = { 0xa2, 0xb4, 0xb4, 0x89 };
> - u16 *aux_adc_vmid;
>   u16 aux_adc_gain_rev7[] = { 0x02, 0x02, 0x02, 0x02 };
> - u16 aux_adc_gain_rev4[] = { 0x02, 0x02, 0x02, 0x00 };
> - u16 aux_adc_gain_rev3[] = { 0x02, 0x02, 0x02, 0x00 };
> - u16 *aux_adc_gain;
> - static const u16 sk_adc_vmid[] = { 0xb4, 0xb4, 0xb4, 0x24 };
> - static const u16 sk_adc_gain[] = { 0x02, 0x02, 0x02, 0x02 };
>   s32 min_nvar_val = 0x18d;
>   s32 min_nvar_offset_6mbps = 20;
>   u8 pdetrange;
> - u8 triso;
> - u16 regval;
>   u16 afectrl_adc_ctrl1_rev7 = 0x20;
>   u16 afectrl_adc_ctrl2_rev7 = 0x0;
>   u16 rfseq_rx2tx_lpf_h_hpc_rev7 = 0x77;
> @@ -16171,17 +16113,6 @@ static void wlc_phy_workarounds_nphy(struct 
> brcms_phy *pi)
>   u16 freq;
>   int coreNum;

[PATCH iproute2 net-next] devlink: Add json and pretty options to help and man

2017-03-06 Thread Roi Dayan
While at it also fixed missing double dash for long opts.

Signed-off-by: Roi Dayan 
---
 devlink/devlink.c  |  2 +-
 man/man8/devlink.8 | 14 --
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/devlink/devlink.c b/devlink/devlink.c
index c357580..e90226e 100644
--- a/devlink/devlink.c
+++ b/devlink/devlink.c
@@ -2470,7 +2470,7 @@ static void help(void)
 {
pr_err("Usage: devlink [ OPTIONS ] OBJECT { COMMAND | help }\n"
   "where  OBJECT := { dev | port | sb | monitor }\n"
-  "   OPTIONS := { -V[ersion] | -n[no-nice-names] }\n");
+  "   OPTIONS := { -V[ersion] | -n[no-nice-names] | -j[json] | 
-p[pretty] }\n");
 }
 
 static int dl_cmd(struct dl *dl)
diff --git a/man/man8/devlink.8 b/man/man8/devlink.8
index cf0563b..a480766 100644
--- a/man/man8/devlink.8
+++ b/man/man8/devlink.8
@@ -20,19 +20,29 @@ devlink \- Devlink tool
 .IR OPTIONS " := { "
 \fB\-V\fR[\fIersion\fR] |
 \fB\-n\fR[\fIno-nice-names\fR] }
+\fB\-j\fR[\fIjson\fR] }
+\fB\-p\fR[\fIpretty\fR] }
 
 .SH OPTIONS
 
 .TP
-.BR "\-V" , " -Version"
+.BR "\-V" , " --Version"
 Print the version of the
 .B devlink
 utility and exit.
 
 .TP
-.BR "\-n" , " -no-nice-names"
+.BR "\-n" , " --no-nice-names"
 Turn off printing out nice names, for example netdevice ifnames instead of 
devlink port identification.
 
+.TP
+.BR "\-j" , " --json"
+Generate JSON output.
+
+.TP
+.BR "\-p" , " --pretty"
+When combined with -j generate a pretty JSON output.
+
 .SS
 .I OBJECT
 
-- 
2.7.4



Re: [PATCH 08/26] brcmsmac: make some local variables 'static const' to reduce stack size

2017-03-06 Thread Arend Van Spriel
On 2-3-2017 17:38, Arnd Bergmann wrote:
> With KASAN and a couple of other patches applied, this driver is one
> of the few remaining ones that actually use more than 2048 bytes of
> kernel stack:
> 
> broadcom/brcm80211/brcmsmac/phy/phy_n.c: In function 
> 'wlc_phy_workarounds_nphy_gainctrl':
> broadcom/brcm80211/brcmsmac/phy/phy_n.c:16065:1: warning: the frame size of 
> 3264 bytes is larger than 2048 bytes [-Wframe-larger-than=]
> broadcom/brcm80211/brcmsmac/phy/phy_n.c: In function 
> 'wlc_phy_workarounds_nphy':
> broadcom/brcm80211/brcmsmac/phy/phy_n.c:17138:1: warning: the frame size of 
> 2864 bytes is larger than 2048 bytes [-Wframe-larger-than=]
> 
> Here, I'm reducing the stack size by marking as many local variables as
> 'static const' as I can without changing the actual code.
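
For readers unfamiliar with the technique, a small standalone sketch (function names are hypothetical; the table values are borrowed from lna1G_gain_db_rev7 in the diff below). An automatic array is, in principle, rebuilt in the function's stack frame on each call, while a 'static const' array exists once in read-only data and takes no frame space; an optimizing compiler may narrow the difference for small tables:

```c
#include <stddef.h>

/* Automatic array: the initializer is conceptually copied into this
 * function's stack frame every time it is called.
 */
static int sum_gains_stack(void)
{
	signed char gains[] = { 9, 14, 19, 24 };
	int sum = 0;
	size_t i;

	for (i = 0; i < sizeof(gains) / sizeof(gains[0]); i++)
		sum += gains[i];
	return sum;
}

/* static const array: one copy with static storage duration, so the
 * table itself does not occupy space in the stack frame.
 */
static int sum_gains_static(void)
{
	static const signed char gains[] = { 9, 14, 19, 24 };
	int sum = 0;
	size_t i;

	for (i = 0; i < sizeof(gains) / sizeof(gains[0]); i++)
		sum += gains[i];
	return sum;
}
```

Both functions return the same value; only where the table is stored differs. Pointers that refer to such tables then need to be const-qualified as well, which is why the patch also changes the pointer declarations and the wlc_phy_set_rfseq_nphy() prototype.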

Acked-by: Arend van Spriel 
> Signed-off-by: Arnd Bergmann 
> ---
>  .../broadcom/brcm80211/brcmsmac/phy/phy_n.c| 197 
> ++---
>  1 file changed, 97 insertions(+), 100 deletions(-)
> 
> diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c 
> b/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c
> index 42dc8e1f483d..48a4df488d75 100644
> --- a/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c
> +++ b/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c
> @@ -14764,8 +14764,8 @@ static void 
> wlc_phy_ipa_restore_tx_digi_filts_nphy(struct brcms_phy *pi)
>  }
>  
>  static void
> -wlc_phy_set_rfseq_nphy(struct brcms_phy *pi, u8 cmd, u8 *events, u8 *dlys,
> -u8 len)
> +wlc_phy_set_rfseq_nphy(struct brcms_phy *pi, u8 cmd, const u8 *events,
> +const u8 *dlys, u8 len)
>  {
>   u32 t1_offset, t2_offset;
>   u8 ctr;
> @@ -15240,16 +15240,16 @@ static void 
> wlc_phy_workarounds_nphy_gainctrl_2057_rev5(struct brcms_phy *pi)
>  static void wlc_phy_workarounds_nphy_gainctrl_2057_rev6(struct brcms_phy *pi)
>  {
>   u16 currband;
> - s8 lna1G_gain_db_rev7[] = { 9, 14, 19, 24 };
> - s8 *lna1_gain_db = NULL;
> - s8 *lna1_gain_db_2 = NULL;
> - s8 *lna2_gain_db = NULL;
> - s8 tiaA_gain_db_rev7[] = { -9, -6, -3, 0, 3, 3, 3, 3, 3, 3 };
> - s8 *tia_gain_db;
> - s8 tiaA_gainbits_rev7[] = { 0, 1, 2, 3, 4, 4, 4, 4, 4, 4 };
> - s8 *tia_gainbits;
> - u16 rfseqA_init_gain_rev7[] = { 0x624f, 0x624f };
> - u16 *rfseq_init_gain;
> + static const s8 lna1G_gain_db_rev7[] = { 9, 14, 19, 24 };
> + const s8 *lna1_gain_db = NULL;
> + const s8 *lna1_gain_db_2 = NULL;
> + const s8 *lna2_gain_db = NULL;
> + static const s8 tiaA_gain_db_rev7[] = { -9, -6, -3, 0, 3, 3, 3, 3, 3, 3 
> };
> + const s8 *tia_gain_db;
> + static const s8 tiaA_gainbits_rev7[] = { 0, 1, 2, 3, 4, 4, 4, 4, 4, 4 };
> + const s8 *tia_gainbits;
> + static const u16 rfseqA_init_gain_rev7[] = { 0x624f, 0x624f };
> + const u16 *rfseq_init_gain;
>   u16 init_gaincode;
>   u16 clip1hi_gaincode;
>   u16 clip1md_gaincode = 0;
> @@ -15310,10 +15310,9 @@ static void 
> wlc_phy_workarounds_nphy_gainctrl_2057_rev6(struct brcms_phy *pi)
>  
>   if ((freq <= 5080) || (freq == 5825)) {
>  
> - s8 lna1A_gain_db_rev7[] = { 11, 16, 20, 24 };
> - s8 lna1A_gain_db_2_rev7[] = {
> - 11, 17, 22, 25};
> - s8 lna2A_gain_db_rev7[] = { -1, 6, 10, 14 };
> + static const s8 lna1A_gain_db_rev7[] = { 11, 
> 16, 20, 24 };
> + static const s8 lna1A_gain_db_2_rev7[] = { 11, 
> 17, 22, 25};
> + static const s8 lna2A_gain_db_rev7[] = { -1, 6, 
> 10, 14 };
>  
>   crsminu_th = 0x3e;
>   lna1_gain_db = lna1A_gain_db_rev7;
> @@ -15321,10 +15320,9 @@ static void 
> wlc_phy_workarounds_nphy_gainctrl_2057_rev6(struct brcms_phy *pi)
>   lna2_gain_db = lna2A_gain_db_rev7;
>   } else if ((freq >= 5500) && (freq <= 5700)) {
>  
> - s8 lna1A_gain_db_rev7[] = { 11, 17, 21, 25 };
> - s8 lna1A_gain_db_2_rev7[] = {
> - 12, 18, 22, 26};
> - s8 lna2A_gain_db_rev7[] = { 1, 8, 12, 16 };
> + static const s8 lna1A_gain_db_rev7[] = { 11, 
> 17, 21, 25 };
> + static const s8 lna1A_gain_db_2_rev7[] = { 12, 
> 18, 22, 26};
> + static const s8 lna2A_gain_db_rev7[] = { 1, 8, 
> 12, 16 };
>  
>   crsminu_th = 0x45;
>   clip1md_gaincode_B = 0x14;
> @@ -15335,10 +15333,9 @@ static void 
> wlc_phy_workarounds_nphy_gainctrl_2057_rev6(struct brcms_phy *pi)
>   lna2_gain_db = lna2A_gain_db_rev7;
>   } else {
>  
> - 

Re: [PATCH net-next RFC 4/4] virtio-net: clean tx descriptors from rx napi

2017-03-06 Thread Jason Wang



On 2017-03-03 22:39, Willem de Bruijn wrote:

From: Willem de Bruijn 

Amortize the cost of virtual interrupts by doing both rx and tx work
on reception of a receive interrupt. Together with VIRTIO_F_EVENT_IDX and
vhost interrupt moderation, this suppresses most explicit tx
completion interrupts for bidirectional workloads.

Signed-off-by: Willem de Bruijn 
---
  drivers/net/virtio_net.c | 19 +++
  1 file changed, 19 insertions(+)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 9a9031640179..21c575127d50 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1031,6 +1031,23 @@ static int virtnet_receive(struct receive_queue *rq, int 
budget)
return received;
  }
  
+static unsigned int free_old_xmit_skbs(struct send_queue *sq, int budget);

+
+static void virtnet_poll_cleantx(struct receive_queue *rq)
+{
+   struct virtnet_info *vi = rq->vq->vdev->priv;
+   unsigned int index = vq2rxq(rq->vq);
+   struct send_queue *sq = &vi->sq[index];
+   struct netdev_queue *txq = netdev_get_tx_queue(vi->dev, index);
+
+   __netif_tx_lock(txq, smp_processor_id());
+   free_old_xmit_skbs(sq, sq->napi.weight);
+   __netif_tx_unlock(txq);


Should we check the tx napi weight here? Or is this treated as an
independent optimization?



+
+   if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
+   netif_wake_subqueue(vi->dev, vq2txq(sq->vq));
+}
+
  static int virtnet_poll(struct napi_struct *napi, int budget)
  {
struct receive_queue *rq =
@@ -1039,6 +1056,8 @@ static int virtnet_poll(struct napi_struct *napi, int 
budget)
  
  	received = virtnet_receive(rq, budget);
  
+	virtnet_poll_cleantx(rq);

+


Better to do this before virtnet_receive(), considering that refill may
allocate memory for rx buffers.


Btw, if this proves to be more efficient, in the future we may
consider:


1) use a single interrupt for both rx and tx
2) use a single napi to handle both rx and tx

Thanks


/* Out of packets? */
if (received < budget)
virtqueue_napi_complete(napi, rq->vq, received);




Re: [PATCH 10/26] brcmsmac: reindent split functions

2017-03-06 Thread Arend Van Spriel
On 2-3-2017 17:38, Arnd Bergmann wrote:
> In the previous commit I left the indentation alone to help reviewing
> the patch, this one now runs the three new functions through 'indent -kr -8'
> with some manual fixups to avoid silliness.
> 
> No changes other than whitespace are intended here.

Acked-by: Arend van Spriel 
> Signed-off-by: Arnd Bergmann 
> ---
>  .../broadcom/brcm80211/brcmsmac/phy/phy_n.c| 1507 
> +---
>  1 file changed, 697 insertions(+), 810 deletions(-)
> 
> diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c 
> b/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c
> index d76c092bb6b4..9b39789c673d 100644
> --- a/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c
> +++ b/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c
> @@ -16074,7 +16074,8 @@ static void wlc_phy_workarounds_nphy_rev7(struct 
> brcms_phy *pi)
>   NPHY_REV3_RFSEQ_CMD_INT_PA_PU,
>   NPHY_REV3_RFSEQ_CMD_END
>   };
> - static const u8 rfseq_rx2tx_dlys_rev3_ipa[] = { 8, 6, 6, 4, 4, 16, 43, 
> 1, 1 };
> + static const u8 rfseq_rx2tx_dlys_rev3_ipa[] =
> + { 8, 6, 6, 4, 4, 16, 43, 1, 1 };
>   static const u16 rfseq_rx2tx_dacbufpu_rev7[] = { 0x10f, 0x10f };
>   u32 leg_data_weights;
>   u8 chan_freq_range = 0;
> @@ -16114,526 +16115,452 @@ static void wlc_phy_workarounds_nphy_rev7(struct 
> brcms_phy *pi)
>   int coreNum;
>  
>  
> - if (NREV_IS(pi->pubpi.phy_rev, 7)) {
> - mod_phy_reg(pi, 0x221, (0x1 << 4), (1 << 4));
> -
> - mod_phy_reg(pi, 0x160, (0x7f << 0), (32 << 0));
> - mod_phy_reg(pi, 0x160, (0x7f << 8), (39 << 8));
> - mod_phy_reg(pi, 0x161, (0x7f << 0), (46 << 0));
> - mod_phy_reg(pi, 0x161, (0x7f << 8), (51 << 8));
> - mod_phy_reg(pi, 0x162, (0x7f << 0), (55 << 0));
> - mod_phy_reg(pi, 0x162, (0x7f << 8), (58 << 8));
> - mod_phy_reg(pi, 0x163, (0x7f << 0), (60 << 0));
> - mod_phy_reg(pi, 0x163, (0x7f << 8), (62 << 8));
> - mod_phy_reg(pi, 0x164, (0x7f << 0), (62 << 0));
> - mod_phy_reg(pi, 0x164, (0x7f << 8), (63 << 8));
> - mod_phy_reg(pi, 0x165, (0x7f << 0), (63 << 0));
> - mod_phy_reg(pi, 0x165, (0x7f << 8), (64 << 8));
> - mod_phy_reg(pi, 0x166, (0x7f << 0), (64 << 0));
> - mod_phy_reg(pi, 0x166, (0x7f << 8), (64 << 8));
> - mod_phy_reg(pi, 0x167, (0x7f << 0), (64 << 0));
> - mod_phy_reg(pi, 0x167, (0x7f << 8), (64 << 8));
> - }
> -
> - if (NREV_LE(pi->pubpi.phy_rev, 8)) {
> - write_phy_reg(pi, 0x23f, 0x1b0);
> - write_phy_reg(pi, 0x240, 0x1b0);
> - }
> + if (NREV_IS(pi->pubpi.phy_rev, 7)) {
> + mod_phy_reg(pi, 0x221, (0x1 << 4), (1 << 4));
> +
> + mod_phy_reg(pi, 0x160, (0x7f << 0), (32 << 0));
> + mod_phy_reg(pi, 0x160, (0x7f << 8), (39 << 8));
> + mod_phy_reg(pi, 0x161, (0x7f << 0), (46 << 0));
> + mod_phy_reg(pi, 0x161, (0x7f << 8), (51 << 8));
> + mod_phy_reg(pi, 0x162, (0x7f << 0), (55 << 0));
> + mod_phy_reg(pi, 0x162, (0x7f << 8), (58 << 8));
> + mod_phy_reg(pi, 0x163, (0x7f << 0), (60 << 0));
> + mod_phy_reg(pi, 0x163, (0x7f << 8), (62 << 8));
> + mod_phy_reg(pi, 0x164, (0x7f << 0), (62 << 0));
> + mod_phy_reg(pi, 0x164, (0x7f << 8), (63 << 8));
> + mod_phy_reg(pi, 0x165, (0x7f << 0), (63 << 0));
> + mod_phy_reg(pi, 0x165, (0x7f << 8), (64 << 8));
> + mod_phy_reg(pi, 0x166, (0x7f << 0), (64 << 0));
> + mod_phy_reg(pi, 0x166, (0x7f << 8), (64 << 8));
> + mod_phy_reg(pi, 0x167, (0x7f << 0), (64 << 0));
> + mod_phy_reg(pi, 0x167, (0x7f << 8), (64 << 8));
> + }
>  
> - if (NREV_GE(pi->pubpi.phy_rev, 8))
> - mod_phy_reg(pi, 0xbd, (0xff << 0), (114 << 0));
> + if (NREV_LE(pi->pubpi.phy_rev, 8)) {
> + write_phy_reg(pi, 0x23f, 0x1b0);
> + write_phy_reg(pi, 0x240, 0x1b0);
> + }
>  
> - wlc_phy_table_write_nphy(pi, NPHY_TBL_ID_AFECTRL, 1, 0x00, 16,
> -  &dac_control);
> - wlc_phy_table_write_nphy(pi, NPHY_TBL_ID_AFECTRL, 1, 0x10, 16,
> -  &dac_control);
> + if (NREV_GE(pi->pubpi.phy_rev, 8))
> + mod_phy_reg(pi, 0xbd, (0xff << 0), (114 << 0));
>  
> - wlc_phy_table_read_nphy(pi, NPHY_TBL_ID_CMPMETRICDATAWEIGHTTBL,
> - 1, 0, 32, &leg_data_weights);
> - leg_data_weights = leg_data_weights & 0xff;
> - wl

Re: crypto: deadlock between crypto_alg_sem/rtnl_mutex/genl_mutex

2017-03-06 Thread Dmitry Vyukov
On Sun, Mar 5, 2017 at 6:36 PM, Dmitry Vyukov  wrote:
> On Sun, Mar 5, 2017 at 4:08 PM, Dmitry Vyukov  wrote:
>> Hello,
>>
>> I am getting the following deadlock reports while running syzkaller
>> fuzzer on net-next/8d70eeb84ab277377c017af6a21d0a337025dede:
>>
>> ==
>> [ INFO: possible circular locking dependency detected ]
>> 4.10.0+ #5 Not tainted
>> ---
>> syz-executor6/6143 is trying to acquire lock:
>>  (nlk->cb_mutex){+.+.+.}, at: []
>> __netlink_dump_start+0xf4/0x760 net/netlink/af_netlink.c:2187
>>
>> but task is already holding lock:
>>  (crypto_alg_sem){+.}, at: []
>> crypto_user_rcv_msg+0x136/0x4f0 crypto/crypto_user.c:507
>>
>> which lock already depends on the new lock.
>>
>>
>> the existing dependency chain (in reverse order) is:
>>
>> -> #4 (crypto_alg_sem){+.}:
>>validate_chain kernel/locking/lockdep.c:2267 [inline]
>>__lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3340
>>lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3755
>>down_read+0x9b/0x150 kernel/locking/rwsem.c:23
>>crypto_alg_lookup+0x23/0x50 crypto/api.c:199
>>crypto_larval_lookup.part.10+0x9a/0x3b0 crypto/api.c:217
>>crypto_larval_lookup crypto/api.c:211 [inline]
>>crypto_alg_mod_lookup+0x77/0x1b0 crypto/api.c:270
>>crypto_alloc_base+0x50/0x1e0 crypto/api.c:416
>>crypto_alloc_cipher include/linux/crypto.h:1407 [inline]
>>tcp_fastopen_reset_cipher+0xc2/0x2e0 net/ipv4/tcp_fastopen.c:48
>>tcp_fastopen_init_key_once+0x114/0x120 net/ipv4/tcp_fastopen.c:29
>>do_tcp_setsockopt.isra.36+0x140a/0x20a0 net/ipv4/tcp.c:2684
>>tcp_setsockopt+0xb0/0xd0 net/ipv4/tcp.c:2733
>>sock_common_setsockopt+0x95/0xd0 net/core/sock.c:2731
>>SYSC_setsockopt net/socket.c:1786 [inline]
>>SyS_setsockopt+0x25c/0x390 net/socket.c:1765
>>entry_SYSCALL_64_fastpath+0x1f/0xc2
>>
>> -> #3 (sk_lock-AF_INET){+.+.+.}:
>>validate_chain kernel/locking/lockdep.c:2267 [inline]
>>__lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3340
>>lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3755
>>lock_sock_nested+0xcb/0x120 net/core/sock.c:2536
>>lock_sock include/net/sock.h:1460 [inline]
>>rds_tcp_listen_stop+0x57/0x140 net/rds/tcp_listen.c:284
>>rds_tcp_kill_sock net/rds/tcp.c:529 [inline]
>>rds_tcp_dev_event+0x383/0xc50 net/rds/tcp.c:568
>>notifier_call_chain+0x1b5/0x2b0 kernel/notifier.c:93
>>__raw_notifier_call_chain kernel/notifier.c:394 [inline]
>>raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401
>>call_netdevice_notifiers_info+0x51/0x90 net/core/dev.c:1646
>>call_netdevice_notifiers net/core/dev.c:1662 [inline]
>>netdev_run_todo+0x3b2/0xa30 net/core/dev.c:7530
>>rtnl_unlock+0xe/0x10 net/core/rtnetlink.c:104
>>default_device_exit_batch+0x504/0x620 net/core/dev.c:8334
>>ops_exit_list.isra.6+0x100/0x150 net/core/net_namespace.c:144
>>cleanup_net+0x551/0xa90 net/core/net_namespace.c:463
>>process_one_work+0xbd0/0x1c10 kernel/workqueue.c:2096
>>worker_thread+0x223/0x1990 kernel/workqueue.c:2230
>>kthread+0x326/0x3f0 kernel/kthread.c:229
>>ret_from_fork+0x31/0x40 arch/x86/entry/entry_64.S:430
>>
>> -> #2 (rtnl_mutex){+.+.+.}:
>>validate_chain kernel/locking/lockdep.c:2267 [inline]
>>__lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3340
>>lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3755
>>__mutex_lock_common kernel/locking/mutex.c:756 [inline]
>>__mutex_lock+0x172/0x1730 kernel/locking/mutex.c:893
>>mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:908
>>rtnl_lock+0x17/0x20 net/core/rtnetlink.c:70
>>tipc_nl_bearer_dump+0x3ef/0x720 net/tipc/bearer.c:774
>>genl_lock_dumpit+0x68/0x90 net/netlink/genetlink.c:479
>>netlink_dump+0x54d/0xd40 net/netlink/af_netlink.c:2127
>>__netlink_dump_start+0x4e5/0x760 net/netlink/af_netlink.c:2217
>>genl_family_rcv_msg+0xd9d/0x1040 net/netlink/genetlink.c:546
>>genl_rcv_msg+0xa6/0x140 net/netlink/genetlink.c:620
>>netlink_rcv_skb+0x2ab/0x390 net/netlink/af_netlink.c:2298
>>genl_rcv+0x28/0x40 net/netlink/genetlink.c:631
>>netlink_unicast_kernel net/netlink/af_netlink.c:1231 [inline]
>>netlink_unicast+0x514/0x730 net/netlink/af_netlink.c:1257
>>netlink_sendmsg+0xa9f/0xe50 net/netlink/af_netlink.c:1803
>>sock_sendmsg_nosec net/socket.c:633 [inline]
>>sock_sendmsg+0xca/0x110 net/socket.c:643
>>sock_write_iter+0x326/0x600 net/socket.c:846
>>call_write_iter include/linux/fs.h:1733 [inline]
>>new_sync_write fs/read_write.c:497 [inline]
>>__vfs_write+0x483/0x740 fs/read_write.c:510
>>vfs_write+0x187/0x530 f

Re: [PATCH 07/26] brcmsmac: reduce stack size with KASAN

2017-03-06 Thread Arnd Bergmann
On Mon, Mar 6, 2017 at 10:16 AM, Arend Van Spriel
 wrote:
> On 2-3-2017 17:38, Arnd Bergmann wrote:
>> The wlc_phy_table_write_nphy/wlc_phy_table_read_nphy functions always put an 
>> object
>> on the stack, which will each require a redzone with KASAN and lead to 
>> possible
>> stack overflow:
>>
>> drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c: In function 
>> 'wlc_phy_workarounds_nphy':
>> drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c:17135:1: 
>> warning: the frame size of 6312 bytes is larger than 1000 bytes 
>> [-Wframe-larger-than=]
>
> Looks like this warning text ended up in the wrong commit message. Got
> me confused for a sec :-p

What's wrong about the warning?

>> This marks the two functions as noinline_for_kasan, avoiding the problem 
>> entirely.
>
> Frankly I seriously dislike annotating code for the sake of some
> (dynamic) memory analyzer. To me the whole thing seems rather
> unnecessary. If the code passes the 2048 stack limit without KASAN it
> would seem the limit with KASAN should be such that no warning is given.
> I suspect that it is rather difficult to predict the additional size of
> the instrumentation code and on some systems there might be a real issue
> with increased stack usage.

The frame sizes don't normally change that much. There are a couple of
drivers like brcmsmac that repeatedly call an inline function which has
a local variable that it passes by reference to an extern function.

While normally those variables share a stack location, KASAN forces
each instance to its own location and adds (in this case) 80 bytes of
redzone around it to detect out-of-bounds access.
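
The pattern described above can be sketched in userspace C. This is only an illustration under stated assumptions: table_write(), write_reg_inline(), write_reg_noinline() and the two workarounds_*() callers are hypothetical stand-ins, not brcmsmac code, and __attribute__((noinline)) is the plain GCC/Clang spelling rather than the noinline_for_kasan annotation from the patch.

```c
#include <stdint.h>

/* Running total so the effect of each write is observable. */
static uint32_t written_total;

/* Stand-in for an out-of-line routine that takes a pointer to the
 * caller's local variable (hypothetical, not the driver's function). */
__attribute__((noinline))
void table_write(const uint32_t *data)
{
	written_total += *data;
}

/* Inlinable helper: once inlined, every call site places its own
 * 'tmp' in the caller's frame. Without instrumentation the compiler
 * may let those locals share one slot; with redzones each instance
 * is expected to need its own space. */
static inline void write_reg_inline(uint32_t val)
{
	uint32_t tmp = val;

	table_write(&tmp);
}

/* Same helper kept out of line: 'tmp' lives in this function's own
 * small frame, so the caller's frame does not grow per call site. */
static __attribute__((noinline)) void write_reg_noinline(uint32_t val)
{
	uint32_t tmp = val;

	table_write(&tmp);
}

/* Caller using the inlinable helper three times. */
uint32_t workarounds_inline(void)
{
	written_total = 0;
	write_reg_inline(1);
	write_reg_inline(2);
	write_reg_inline(3);
	return written_total;
}

/* Caller using the out-of-line helper three times. */
uint32_t workarounds_noinline(void)
{
	written_total = 0;
	write_reg_noinline(1);
	write_reg_noinline(2);
	write_reg_noinline(3);
	return written_total;
}
```

Both callers compute the same result; any difference would only be expected to show up in the callers' frame sizes, for example when comparing instrumented builds with -Wframe-larger-than enabled.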

While most drivers are fine with a 1500 byte warning limit, increasing
the limit to 7kb would silence brcmsmac (unless more registers
are accessed from wlc_phy_workarounds_nphy) but also risk a
stack overflow to go unnoticed.

Arnd


Re: net: BUG in unix_notinflight

2017-03-06 Thread Dmitry Vyukov
On Sat, Nov 26, 2016 at 7:05 PM, Dmitry Vyukov  wrote:
> Hello,
>
> I am hitting the following BUG while running syzkaller fuzzer:
>
> kernel BUG at net/unix/garbage.c:149!
> invalid opcode:  [#1] SMP DEBUG_PAGEALLOC KASAN
> Dumping ftrace buffer:
>(ftrace buffer empty)
> Modules linked in:
> CPU: 0 PID: 23491 Comm: syz-executor Not tainted 4.9.0-rc5+ #41
> Hardware name: Google Google Compute Engine/Google Compute Engine,
> BIOS Google 01/01/2011
> task: 8801c16b06c0 task.stack: 8801c2928000
> RIP: 0010:[]  []
> unix_notinflight+0x3b4/0x490 net/unix/garbage.c:149
> RSP: 0018:8801c292ea40  EFLAGS: 00010297
> RAX: 8801c16b06c0 RBX: 110038525d4a RCX: dc00
> RDX:  RSI: 110038525d4e RDI: 8a6e9d84
> RBP: 8801c292eb18 R08:  R09: 
> R10: cdca594876e035a1 R11: 0005 R12: 110038525d4e
> R13: 899156e0 R14: 8801c292eaf0 R15: 88018b7cd780
> FS:  7f10420fa700() GS:8801d980() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 2000a000 CR3: 0001c2ecc000 CR4: 001406f0
> DR0:  DR1: 0400 DR2: 
> DR3:  DR6: 0ff0 DR7: 0600
> Stack:
>  dc00 88019f036970 41b58ab3 894c5120
>  8717e840 8801c16b06c0 88018b7cdcf0 894c51e2
>  81576d50   1100
> Call Trace:
>  [] unix_detach_fds.isra.19+0xff/0x170 
> net/unix/af_unix.c:1487
>  [] unix_destruct_scm+0xf9/0x210 net/unix/af_unix.c:1496
>  [] skb_release_head_state+0x101/0x200 net/core/skbuff.c:655
>  [] skb_release_all+0x1a/0x60 net/core/skbuff.c:668
>  [] __kfree_skb+0x1a/0x30 net/core/skbuff.c:684
>  [] kfree_skb+0x184/0x570 net/core/skbuff.c:705
>  [] unix_release_sock+0x5b5/0xbd0 net/unix/af_unix.c:559
>  [] unix_release+0x49/0x90 net/unix/af_unix.c:836
>  [] sock_release+0x92/0x1f0 net/socket.c:570
>  [] sock_close+0x1b/0x20 net/socket.c:1017
>  [] __fput+0x34e/0x910 fs/file_table.c:208
>  [] fput+0x1a/0x20 fs/file_table.c:244
>  [] task_work_run+0x1a0/0x280 kernel/task_work.c:116
>  [< inline >] exit_task_work include/linux/task_work.h:21
>  [] do_exit+0x183a/0x2640 kernel/exit.c:828
>  [] do_group_exit+0x14e/0x420 kernel/exit.c:931
>  [] get_signal+0x663/0x1880 kernel/signal.c:2307
>  [] do_signal+0xc5/0x2190 arch/x86/kernel/signal.c:807
>  [] exit_to_usermode_loop+0x1ea/0x2d0
> arch/x86/entry/common.c:156
>  [< inline >] prepare_exit_to_usermode arch/x86/entry/common.c:190
>  [] syscall_return_slowpath+0x4d3/0x570
> arch/x86/entry/common.c:259
>  [] entry_SYSCALL_64_fastpath+0xc4/0xc6
> Code: df 49 89 87 70 05 00 00 41 c6 04 14 f8 48 89 f9 48 c1 e9 03 80
> 3c 11 00 75 64 49 89 87 78 05 00 00 e9 65 ff ff ff e8 ac 94 56 fa <0f>
> 0b 48 89 d7 48 89 95 30 ff ff ff e8 bb 22 87 fa 48 8b 95 30
> RIP  [] unix_notinflight+0x3b4/0x490 net/unix/garbage.c:149
>  RSP 
> ---[ end trace 4cbbd52674b68dab ]---
>
>
> On commit 16ae16c6e5616c084168740990fc508bda6655d4 (Nov 24).
> Unfortunately this is not reproducible outside of syzkaller.
> But easily reproducible with syzkaller. If you need to reproduce it,
> follow instructions described here:
> https://github.com/google/syzkaller/wiki/How-to-execute-syzkaller-programs
> With the following as the program:
>
> mmap(&(0x7f00/0xdd5000)=nil, (0xdd5000), 0x3, 0x32,
> 0x, 0x0)
> socketpair$unix(0x1, 0x5, 0x0, &(0x7fdc7000-0x8)={0x0, 0x0})
> sendmmsg$unix(r1,
> &(0x7fdbf000-0xa8)=[{&(0x7fdbe000)=@file={0x1, ""}, 0x2,
> &(0x7fdbe000)=[], 0x0, &(0x7fdc4000)=[@rights={0x20, 0x1, 0x1,
> [r0, r0, r0, r1]}, @rights={0x14, 0x1, 0x1, [0x]},
> @cred={0x20, 0x1, 0x2, 0x0, 0x0, 0x0}, @cred={0x20, 0x1, 0x2, 0x0,
> 0x0, 0x0}, @cred={0x20, 0x1, 0x2, 0x0, 0x0, 0x0}], 0x5, 0x800},
> {&(0x7fdbf000-0x7d)=@file={0x1, ""}, 0x2, &(0x7fdbe000)=[],
> 0x0, &(0x7fdbf000-0x80)=[@rights={0x20, 0x1, 0x1,
> [0x, r1, 0x, r0]}, @cred={0x20, 0x1,
> 0x2, 0x0, 0x0, 0x0}, @cred={0x20, 0x1, 0x2, 0x0, 0x0, 0x0},
> @cred={0x20, 0x1, 0x2, 0x0, 0x0, 0x0}], 0x4, 0x4},
> {&(0x7fdbf000-0x8)=@abs={0x0, 0x0, 0x8}, 0x8,
> &(0x7fdbe000)=[{&(0x7fdc-0x27)="", 0x0},
> {&(0x7fdc1000-0xb0)="", 0x0}, {&(0x7fdc2000-0xc4)="", 0x0},
> {&(0x7fdc2000)="", 0x0}, {&(0x7fdc3000)="", 0x0}], 0x5,
> &(0x7fdbe000)=[@cred={0x20, 0x1, 0x2, 0x0, 0x0, 0x0},
> @rights={0x14, 0x1, 0x1, [r1]}, @cred={0x20, 0x1, 0x2, 0x0, 0x0, 0x0},
> @cred={0x20, 0x1, 0x2, 0x0, 0x0, 0x0}], 0x4, 0x4}], 0x3, 0x800)
> dup3(r1, r0, 0x8)
> close(r1)



Now with a nice single-threaded C reproducer!

// autogenerated by syzkaller (http://github.com/google/syzkaller)
#define _GNU_SOURCE
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 

void test()
{
  long r[54];
  m

Re: [PATCH iproute2 net-next] devlink: Add json and pretty options to help and man

2017-03-06 Thread Jiri Pirko
Mon, Mar 06, 2017 at 10:06:18AM CET, r...@mellanox.com wrote:
>While at it also fixed missing double dash for long opts.
>
>Signed-off-by: Roi Dayan 

Acked-by: Jiri Pirko 


netlink: GPF in netlink_unicast

2017-03-06 Thread Dmitry Vyukov
Hello,

I've got the following crash while running syzkaller fuzzer on
net-next/8d70eeb84ab277377c017af6a21d0a337025dede:

kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault:  [#1] SMP KASAN
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 0 PID: 883 Comm: kauditd Not tainted 4.10.0+ #6
Hardware name: Google Google Compute Engine/Google Compute Engine,
BIOS Google 01/01/2011
task: 8801d79f0240 task.stack: 8801d7a2
RIP: 0010:sock_sndtimeo include/net/sock.h:2162 [inline]
RIP: 0010:netlink_unicast+0xdd/0x730 net/netlink/af_netlink.c:1249
RSP: 0018:8801d7a27c38 EFLAGS: 00010206
RAX: 0056 RBX: 8801d7a27cd0 RCX: 
RDX:  RSI:  RDI: 02b0
RBP: 8801d7a27cf8 R08: ed00385cf286 R09: ed00385cf286
R10: 0006 R11: ed00385cf285 R12: 
R13: dc00 R14: 8801c2fc3c80 R15: 014000c0
FS:  () GS:8801dbe0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 20cfd000 CR3: 0001c758f000 CR4: 001406f0
Call Trace:
 kauditd_send_unicast_skb+0x3c/0x70 kernel/audit.c:482
 kauditd_thread+0x174/0xb00 kernel/audit.c:599
 kthread+0x326/0x3f0 kernel/kthread.c:229
 ret_from_fork+0x31/0x40 arch/x86/entry/entry_64.S:430
Code: 44 89 fe e8 56 15 ff ff 8b 8d 70 ff ff ff 49 89 c6 31 c0 85 c9
75 27 e8 b2 b2 f4 fd 49 8d bc 24 b0 02 00 00 48 89 f8 48 c1 e8 03 <42>
80 3c 28 00 0f 85 37 06 00 00 49 8b 84 24 b0 02 00 00 4c 8d
RIP: sock_sndtimeo include/net/sock.h:2162 [inline] RSP: 8801d7a27c38
RIP: netlink_unicast+0xdd/0x730 net/netlink/af_netlink.c:1249 RSP:
8801d7a27c38
---[ end trace ad1bba9d457430b6 ]---
Kernel panic - not syncing: Fatal exception


This is not reproducible and seems to be caused by an elusive race.
However, looking at the code I don't see any proper protection of
audit_sock (other than the if (!audit_pid) which is obviously not
enough to protect against races).
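
A minimal userspace sketch of the check-then-use problem described above, assuming C11 atomics. All names here (fake_sock, shared_sock, send_timeout_or) are hypothetical and this is not the audit code; it only shows loading the shared pointer once and testing that local copy instead of a separate flag. It does not address lifetime: if the object can be freed concurrently, a reference count or RCU-style deferral would be needed as well.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a socket with a send timeout field. */
struct fake_sock {
	long sndtimeo;
};

/* Shared pointer that another thread may set or clear at any time. */
_Atomic(struct fake_sock *) shared_sock;

/* Returns the socket's timeout, or 'fallback' when no socket is set. */
long send_timeout_or(long fallback)
{
	/* Load once; every later use refers to this snapshot, so the
	 * NULL check and the dereference cannot disagree. */
	struct fake_sock *sk = atomic_load(&shared_sock);

	if (sk == NULL)
		return fallback;
	return sk->sndtimeo;
}
```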


Re: [PATCH v2] can: m_can: enable transmission of FD frame on latest version

2017-03-06 Thread Marc Kleine-Budde
On 03/06/2017 03:21 AM, Wenyou Yang wrote:
> Enables the transmission of CAN FD frames on M_CAN IP core >= v3.1.x
> and with the bit rate switching.
> 
> Tested on M_CAN IP 3.1.0 (CREL = 0x31040730) of SAMA5D2 SoC.

Does this patch work still with the old version of the silicon?

Marc

-- 
Pengutronix e.K.  | Marc Kleine-Budde   |
Industrial Linux Solutions| Phone: +49-231-2826-924 |
Vertretung West/Dortmund  | Fax:   +49-5121-206917- |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |





Re: [PATCH 1/2] net: sched: make default fifo qdiscs appear in the dump

2017-03-06 Thread Jiri Kosina
On Wed, 1 Mar 2017, David Miller wrote:

> > @@ -1066,6 +1066,7 @@ hfsc_change_class(struct Qdisc *sch, u32 classid, u32 
> > parentid,
> >   &pfifo_qdisc_ops, classid);
> > if (cl->qdisc == NULL)
> > cl->qdisc = &noop_qdisc;
> > +   qdisc_hash_add(cl->qdisc, true);
> > INIT_LIST_HEAD(&cl->children);
> > cl->vt_tree = RB_ROOT;
> > cl->cf_tree = RB_ROOT;
> > @@ -1425,6 +1426,7 @@ hfsc_init_qdisc(struct Qdisc *sch, struct nlattr *opt)
> >   sch->handle);
> > if (q->root.qdisc == NULL)
> > q->root.qdisc = &noop_qdisc;
> > +   qdisc_hash_add(q->root.qdisc, true);
> > INIT_LIST_HEAD(&q->root.children);
> > q->root.vt_tree = RB_ROOT;
> > q->root.cf_tree = RB_ROOT;
> 
> I'm not so sure it is legal to potentially pass &noop_qdisc into
> qdisc_hash_add().

Ah, right you are, thanks. The complete fix is not super trivial, as it 
needs some more surgery to tc_dump_qdisc_root(), tc_dump_tclass_root() and 
qdisc_match_from_root() (see 69012ae42 for some details).

There are two options:

- this gets fixed in two phases, in first everything *but* noop qdisc gets 
  dumped (in the "give me everything" dump) and later we finalize it by
  teaching the above functions about noop_qdisc as well

- I extend this patchset to handle noop qdisc from the very beginning; 
  I am unlikely to find time for this during coming weeks though. But OTOH
  this whole thing is very low priority anyway
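
As a rough illustration of the first option, a guard that skips the shared placeholder could look like the userspace mock below. Everything in it (struct qdisc, noop_qdisc, hash_add(), hash_add_unless_noop(), hash_entries) is a simplified hypothetical stand-in, not the kernel's definitions, and it says nothing about the changes still needed in the dump functions.

```c
#include <stddef.h>

/* Simplified mock of a qdisc; only tracks whether it was hashed. */
struct qdisc {
	int hashed;
};

/* Shared placeholder instance that must never be put into the hash. */
struct qdisc noop_qdisc;

/* Number of entries currently in the mock hash. */
int hash_entries;

/* Mock of the unconditional add. */
static void hash_add(struct qdisc *q)
{
	q->hashed = 1;
	hash_entries++;
}

/* Adds 'q' unless it is the shared placeholder.
 * Returns 1 if it was added, 0 if it was skipped. */
int hash_add_unless_noop(struct qdisc *q)
{
	if (q == &noop_qdisc)
		return 0;
	hash_add(q);
	return 1;
}
```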

What do you think?

Thanks,

-- 
Jiri Kosina
SUSE Labs



Re: [PATCH 07/26] brcmsmac: reduce stack size with KASAN

2017-03-06 Thread Arend Van Spriel
On 6-3-2017 11:38, Arnd Bergmann wrote:
> On Mon, Mar 6, 2017 at 10:16 AM, Arend Van Spriel
>  wrote:
>> On 2-3-2017 17:38, Arnd Bergmann wrote:
>>> The wlc_phy_table_write_nphy/wlc_phy_table_read_nphy functions always put 
>>> an object
>>> on the stack, which will each require a redzone with KASAN and lead to 
>>> possible
>>> stack overflow:
>>>
>>> drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c: In function 
>>> 'wlc_phy_workarounds_nphy':
>>> drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c:17135:1: 
>>> warning: the frame size of 6312 bytes is larger than 1000 bytes 
>>> [-Wframe-larger-than=]
>>
>> Looks like this warning text ended up in the wrong commit message. Got
>> me confused for a sec :-p
> 
> What's wrong about the warning?

The warning is about the function 'wlc_phy_workarounds_nphy' (see PATCH
9/26) and not about wlc_phy_table_write_nphy/wlc_phy_table_read_nphy
functions.

>>> This marks the two functions as noinline_for_kasan, avoiding the problem 
>>> entirely.
>>
>> Frankly I seriously dislike annotating code for the sake of some
>> (dynamic) memory analyzer. To me the whole thing seems rather
>> unnecessary. If the code passes the 2048 stack limit without KASAN it
>> would seem the limit with KASAN should be such that no warning is given.
>> I suspect that it is rather difficult to predict the additional size of
>> the instrumentation code and on some systems there might be a real issue
>> with increased stack usage.
> 
> The frame sizes don't normally change that much. There are a couple of
> drivers like brcmsmac that repeatedly call an inline function which has
> a local variable that it passes by reference to an extern function.
> 
> While normally those variables share a stack location, KASAN forces
> each instance to its own location and adds (in this case) 80 bytes of
> redzone around it to detect out-of-bounds access.
> 
> While most drivers are fine with a 1500 byte warning limit, increasing
> the limit to 7kb would silence brcmsmac (unless more registers
> are accessed from wlc_phy_workarounds_nphy) but also risk a
> stack overflow to go unnoticed.

Given the amount of local variables maybe just tag the functions with
noinline instead.

Regards,
Arend


Re: [PATCH 07/26] brcmsmac: reduce stack size with KASAN

2017-03-06 Thread Arnd Bergmann
On Mon, Mar 6, 2017 at 12:02 PM, Arend Van Spriel
 wrote:
> On 6-3-2017 11:38, Arnd Bergmann wrote:
>> On Mon, Mar 6, 2017 at 10:16 AM, Arend Van Spriel
>>  wrote:
>>> On 2-3-2017 17:38, Arnd Bergmann wrote:
>>>> The wlc_phy_table_write_nphy/wlc_phy_table_read_nphy functions always put
>>>> an object
>>>> on the stack, which will each require a redzone with KASAN and lead to
>>>> possible
>>>> stack overflow:
>>>>
>>>> drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c: In function
>>>> 'wlc_phy_workarounds_nphy':
>>>> drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c:17135:1:
>>>> warning: the frame size of 6312 bytes is larger than 1000 bytes
>>>> [-Wframe-larger-than=]
>>>
>>> Looks like this warning text ended up in the wrong commit message. Got
>>> me confused for a sec :-p
>>
>> What's wrong about the warning?
>
> The warning is about the function 'wlc_phy_workarounds_nphy' (see PATCH
> 9/26) and not about wlc_phy_table_write_nphy/wlc_phy_table_read_nphy
> functions.

The warning only shows up for wlc_phy_workarounds_nphy, and we have to
fix both issues to get the size down enough. If we split it up without
uninlining the register access functions, we end up with two or three smaller
functions that still exceed the limit.

>>>> This marks the two functions as noinline_for_kasan, avoiding the problem
>>>> entirely.
>>>
>>> Frankly I seriously dislike annotating code for the sake of some
>>> (dynamic) memory analyzer. To me the whole thing seems rather
>>> unnecessary. If the code passes the 2048 stack limit without KASAN it
>>> would seem the limit with KASAN should be such that no warning is given.
>>> I suspect that it is rather difficult to predict the additional size of
>>> the instrumentation code and on some systems there might be a real issue
>>> with increased stack usage.
>>
>> The frame sizes don't normally change that much. There are a couple of
>> drivers like brcmsmac that repeatedly call an inline function which has
>> a local variable that it passes by reference to an extern function.
>>
>> While normally those variables share a stack location, KASAN forces
>> each instance to its own location and adds (in this case) 80 bytes of
>> redzone around it to detect out-of-bounds access.
>>
>> While most drivers are fine with a 1500 byte warning limit, increasing
>> the limit to 7kb would silence brcmsmac (unless more registers
>> are accessed from wlc_phy_workarounds_nphy) but also risk a
>> stack overflow to go unnoticed.
>
> Given the amount of local variables maybe just tag the functions with
> noinline instead.

But that would result in less efficient object code without KASAN,
as inlining these by default is a good idea when the stack variables
all get folded.

 Arnd


Re: [PATCH 07/26] brcmsmac: reduce stack size with KASAN

2017-03-06 Thread Arnd Bergmann
On Mon, Mar 6, 2017 at 12:16 PM, Arnd Bergmann  wrote:
> On Mon, Mar 6, 2017 at 12:02 PM, Arend Van Spriel
>  wrote:
>> On 6-3-2017 11:38, Arnd Bergmann wrote:
>>> On Mon, Mar 6, 2017 at 10:16 AM, Arend Van Spriel
>>>  wrote:

>> Given the amount of local variables maybe just tag the functions with
>> noinline instead.
>
> But that would result in less efficient object code without KASAN,
> as inlining these by default is a good idea when the stack variables
> all get folded.

Note that David Laight already suggested renaming noinline_for_kasan
to noinline_if_stackbloat, which makes it a little more obvious what
is going on. Would that address your concern as well?

Arnd


Re: [4.9.13] brcmf use-after-free on resume

2017-03-06 Thread Arend Van Spriel
+ linux-wireless

On 6-3-2017 8:04, Daniel J Blueman wrote:
> When resuming from suspend with a BCM43602 on Ubuntu 16.04 with
> 4.9.13, we see use after free [1].
> 
> We see the struct cfg80211_ops is accessed in the resume path, after
> it was previously freed:
> 
> (gdb) list *(brcmf_cfg80211_attach+0x10b)
> 0x1d77b is in brcmf_cfg80211_attach
> (drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c:6861).
> 6856brcmf_err("ndev is invalid\n");
> 6857return NULL;
> 6858}
> 6859
> 6860ops = kmemdup(&brcmf_cfg80211_ops, sizeof(*ops), GFP_KERNEL);
> 6861if (!ops)
> 6862return NULL;
> 6863
> 6864ifp = netdev_priv(ndev);
> 6865#ifdef CONFIG_PM
> 
> (gdb) list *(wiphy_resume+0x591)
> 0xb751 is in wiphy_resume (net/wireless/sysfs.c:133).
> 128int ret = 0;
> 129
> 130/* Age scan results with time spent in suspend */
> 131cfg80211_bss_age(rdev, get_seconds() - rdev->suspend_at);
> 132
> 133if (rdev->ops->resume) {
> 134rtnl_lock();
> 135if (rdev->wiphy.registered)
> 136ret = rdev_resume(rdev);
> 137rtnl_unlock();
> 
> I'm unsure if this relates to the ordering of callbacks processed by
> dpm_run_callback.

The problem is that our driver can not access the device as it has been
powered off during suspend. So in the resume we cleanup everything
calling wiphy_unregister() and wiphy_free(). This means the rdev in
wiphy_resume() above is already freed. Not sure how to handle this
properly. Probably we should do a proper rebind.

Regards,
Arend

> Thanks,
>   Daniel
> 
> -- [1]
> 
> BUG: KASAN: use-after-free in wiphy_resume+0x591/0x5a0 [cfg80211] at
> addr 8803fefebb30
> Read of size 8 by task kworker/u16:15/3066
> CPU: 0 PID: 3066 Comm: kworker/u16:15 Not tainted 4.9.13-debug+ #7
> Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
> Workqueue: events_unbound async_run_entry_fn
>  8803bffdf9d8 880db6e1 88042740ef00 8803fefebb28
>  8803bffdfa00 87a4d941 8803bffdfa98 8803fefebb20
>  88042740ef00 8803bffdfa88 87a4dbda 8803fb132360
> Call Trace:
>  [] dump_stack+0x85/0xc4
>  [] kasan_object_err+0x21/0x70
>  [] kasan_report_error+0x1fa/0x500
>  [] ? trace_hardirqs_on_caller+0x3fe/0x580
>  [] ? cfg80211_bss_age+0x9a/0xc0 [cfg80211]
>  [] ? trace_hardirqs_on+0xd/0x10
>  [] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
>  [] __asan_report_load8_noabort+0x61/0x70
>  [] ? wiphy_suspend+0xbb0/0xc70 [cfg80211]
>  [] ? wiphy_resume+0x591/0x5a0 [cfg80211]
>  [] wiphy_resume+0x591/0x5a0 [cfg80211]
>  [] ? wiphy_suspend+0xc70/0xc70 [cfg80211]
>  [] dpm_run_callback+0x6e/0x4f0
>  [] device_resume+0x1c2/0x670
>  [] async_resume+0x1d/0x50
>  [] async_run_entry_fn+0xfe/0x610
>  [] process_one_work+0x716/0x1a50
>  [] ? process_one_work+0x679/0x1a50
>  [] ? _raw_spin_unlock_irq+0x3d/0x60
>  [] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
>  [] worker_thread+0xe0/0x1460
>  [] ? process_one_work+0x1a50/0x1a50
>  [] kthread+0x222/0x2e0
>  [] ? kthread_park+0x80/0x80
>  [] ? kthread_park+0x80/0x80
>  [] ? kthread_park+0x80/0x80
>  [] ret_from_fork+0x2a/0x40
> Object at 8803fefebb28, in cache kmalloc-1024 size: 1024
> Allocated:
> PID = 431
>  save_stack_trace+0x1b/0x20
>  save_stack+0x46/0xd0
>  kasan_kmalloc+0xad/0xe0
>  kasan_slab_alloc+0x12/0x20
>  __kmalloc_track_caller+0x134/0x360
>  kmemdup+0x20/0x50
>  brcmf_cfg80211_attach+0x10b/0x3a90 [brcmfmac]
>  brcmf_bus_start+0x19a/0x9a0 [brcmfmac]
>  brcmf_pcie_setup+0x1f1a/0x3680 [brcmfmac]
>  brcmf_fw_request_nvram_done+0x44c/0x11b0 [brcmfmac]
>  request_firmware_work_func+0x135/0x280
>  process_one_work+0x716/0x1a50
>  worker_thread+0xe0/0x1460
>  kthread+0x222/0x2e0
>  ret_from_fork+0x2a/0x40
> Freed:
> PID = 3101
>  save_stack_trace+0x1b/0x20
>  save_stack+0x46/0xd0
>  kasan_slab_free+0x71/0xb0
>  kfree+0xe8/0x2e0
>  brcmf_cfg80211_detach+0x62/0xf0 [brcmfmac]
>  brcmf_detach+0x14a/0x2b0 [brcmfmac]
>  brcmf_pcie_remove+0x140/0x5d0 [brcmfmac]
>  brcmf_pcie_pm_leave_D3+0x198/0x2e0 [brcmfmac]
>  pci_pm_resume+0x186/0x220
>  dpm_run_callback+0x6e/0x4f0
>  device_resume+0x1c2/0x670
>  async_resume+0x1d/0x50
>  async_run_entry_fn+0xfe/0x610
>  process_one_work+0x716/0x1a50
>  worker_thread+0xe0/0x1460
>  kthread+0x222/0x2e0
>  ret_from_fork+0x2a/0x40
> Memory state around the buggy address:
>  8803fefeba00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>  8803fefeba80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>> 8803fefebb00: fc fc fc fc fc fb fb fb fb fb fb fb fb fb fb fb
>^
>  8803fefebb80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>  8803fefebc00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> 


Re: 4.11-rc1 regression: e1000e "BUG at drivers/pci/msi.c" on unplugged suspend+resume

2017-03-06 Thread Bjørn Mork
Bjørn Mork  writes:

> This is new with v4.11-rc1, so I strongly suspect commit 7e54d9d063fa
> ("e1000e: driver trying to free already-free irq"), which looks more
> than suspicious in this context.  Haven't had time to test a revert
> yet.  Just wanted to give an advance warning in case this isn't known.

Now tested.  I can confirm that reverting commit 7e54d9d063fa ("e1000e:
driver trying to free already-free irq") fixes the issue.

Further testing also shows that "netif running" is irrelevant.  The BUG
happens consistently on every system resume, regardless of the e1000e
link state.  Which sort of indicates that this change to the driver's
freeze callback wasn't tested with system suspend.  Which seems odd?

Well, whatever.  Please revert commit 7e54d9d063fa.



Bjørn


[PATCH iproute2 master] bpf: test for valid type in bpf_get_work_dir

2017-03-06 Thread Daniel Borkmann
Jan-Erik reported that an assertion in bpf_prog_to_subdir() failed when
type was BPF_PROG_TYPE_UNSPEC, which is only used in bpf_init_env()
to auto-mount and cache the bpf fs mount point.

Therefore, make sure when bpf_init_env() is called multiple times
(f.e. eBPF classifier with eBPF action attached) and bpf_mnt_cached
is set already that the type is also valid. In bpf_init_env(), we're
only interested in the mount point and not a type-specific subdir.

Fixes: e42256699cac ("bpf: make tc's bpf loader generic and move into lib")
Reported-by: Jan-Erik Rediger 
Signed-off-by: Daniel Borkmann 
---
 lib/bpf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/bpf.c b/lib/bpf.c
index 211c3d1..04ee1ab 100644
--- a/lib/bpf.c
+++ b/lib/bpf.c
@@ -596,7 +596,7 @@ static const char *bpf_get_work_dir(enum bpf_prog_type type)
if (bpf_mnt_cached) {
const char *out = mnt;
 
-   if (out) {
+   if (out && type) {
snprintf(bpf_tmp, sizeof(bpf_tmp), "%s%s/",
 out, bpf_prog_to_subdir(type));
out = bpf_tmp;
-- 
1.9.3
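
The effect of the one-line change can be sketched as a standalone function. This is a simplified illustration under assumptions: the enum values, the "tc" subdirectory string, prog_to_subdir() and work_dir_cached() are hypothetical stand-ins for the lib/bpf.c code, and the mount point is assumed to be passed in with a trailing slash.

```c
#include <stdio.h>

/* Hypothetical stand-ins for the program types; 0 means "unspecified". */
enum prog_type {
	PROG_UNSPEC = 0,
	PROG_CLS,
	PROG_ACT,
};

/* Maps a concrete type to its subdirectory. Returns NULL for anything
 * else (the function discussed above asserts instead). */
static const char *prog_to_subdir(enum prog_type type)
{
	switch (type) {
	case PROG_CLS:
	case PROG_ACT:
		return "tc";
	default:
		return NULL;
	}
}

/* Cached-mount branch only: with an unspecified type the bare mount
 * point is returned and the subdir lookup is never reached. */
const char *work_dir_cached(const char *mnt, enum prog_type type,
			    char *buf, size_t len)
{
	const char *out = mnt;

	if (out && type) {
		snprintf(buf, len, "%s%s/", out, prog_to_subdir(type));
		out = buf;
	}
	return out;
}
```

With the extra type check, a caller that only wants the mount point can pass the unspecified type and gets the cached path back unchanged.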



Re: [PATCH] mac80211: Use setup_timer instead of init_timer

2017-03-06 Thread Jiri Slaby
On 03/06/2017, 01:25 PM, Johannes Berg wrote:
> On Fri, 2017-03-03 at 13:45 +0100, Jiri Slaby wrote:
>> From: Ondřej Lysoněk 
>>
>> Use setup_timer() and setup_deferrable_timer() to set the data and
>> function timer fields. It makes the code cleaner and will allow for
>> easier change of the timer struct internals.
> 
> Btw, I suspect you generated this with coccinelle and didn't put enough
> "..." there, because you missed one in mesh_path_new() :)

Not really. This is one of the assignments for students I lead, so this is
done by hand every end of winter semester (Note the From line.)

> Care to send a patch for that one too?

I am just a forwarder, he received this request too, so you can try to
persuade him :).

thanks,
-- 
js
suse labs


Re: [PATCH] mac80211: Use setup_timer instead of init_timer

2017-03-06 Thread Johannes Berg

> Not really. This is one of assignments for students I lead, so this
> is done by hand every end of winter semester (Note the From line.)

You really should teach them about coccinelle then :-)

> > Care to send a patch for that one too?
> 
> I am just a forwarder, he received this request too, so you can try
> to persuade him :).

Hah ok. I thought you actually cared about the end result ;)

johannes


Re: [PATCH 1/4] net: thunderx: Fix IOMMU translation faults

2017-03-06 Thread Robin Murphy
On 04/03/17 05:54, Sunil Kovvuri wrote:
> On Fri, Mar 3, 2017 at 11:26 PM, David Miller  wrote:
>> From: sunil.kovv...@gmail.com
>> Date: Fri,  3 Mar 2017 16:17:47 +0530
>>
>>> @@ -1643,6 +1650,9 @@ static int nicvf_probe(struct pci_dev *pdev, const 
>>> struct pci_device_id *ent)
>>>   if (!pass1_silicon(nic->pdev))
>>>   nic->hw_tso = true;
>>>
>>> + /* Check if we are attached to IOMMU */
>>> + nic->iommu_domain = iommu_get_domain_for_dev(dev);
>>
>> This function is not universally available.
> 
> Even if CONFIG_IOMMU_API is not enabled, it will return NULL and will be okay.
> http://lxr.free-electrons.com/source/include/linux/iommu.h#L400
> 
>>
>> This looks very hackish to me anyways, how all of this stuff is supposed
>> to work is that you simply use the DMA interfaces unconditionally and
>> whatever is behind the operations takes care of everything.
>>
>> Doing it conditionally in the driver with all of this special IOMMU
>> domain et al. knowledge makes no sense to me at all.
>>
>> I don't see other drivers doing stuff like this at all, so if you're
>> going to handle this in a unique way like this you better write
>> several paragraphs in your commit message explaining why this weird
>> crap is necessary.
> 
> I already tried to explain in the commit message that HW anyway takes care
> of data coherency, so calling DMA interfaces when there is no IOMMU will
> only result in performance drop.
> 
> We are seeing a 0.75Mpps drop with IP forwarding rate due to that.
> Hence I have restricted calling DMA interfaces to only when IOMMU is enabled.

What's 0.07Mpps as a percentage of baseline? On a correctly configured
coherent arm64 system, in the absence of an IOMMU, dma_map_*() is
essentially just virt_to_phys() behind a function call or two, so I'd be
interested to know where any non-trivial overhead might be coming from.

Robin.




Re: [PATCH 1/4] net: thunderx: Fix IOMMU translation faults

2017-03-06 Thread Sunil Kovvuri
>>
>> We are seeing a 0.75Mpps drop with IP forwarding rate due to that.
>> Hence I have restricted calling DMA interfaces to only when IOMMU is enabled.
>
> What's 0.07Mpps as a percentage of baseline? On a correctly configured
> coherent arm64 system, in the absence of an IOMMU, dma_map_*() is
> essentially just virt_to_phys() behind a function call or two, so I'd be
> interested to know where any non-trivial overhead might be coming from.

It's a 5% drop and yes, the device is configured as coherent.
And the drop is due to additional function calls.

Thanks,
Sunil.


Re: [PATCH] 4.9.13 brcmfmac: fix use-after-free on resume

2017-03-06 Thread Arend Van Spriel
+ linux-wireless

On 6-3-2017 8:14, Daniel J Blueman wrote:
> KASAN reported 'struct wireless_dev wdev' was read after being freed.
> Fix by freeing after the access.

I would rather like to see the KASAN report, because something is off
here. This function is called with wdev as a parameter so how can it be
accessed after free here? brcmf_remove_interface() does not free the
wdev nor the brcmf_cfg80211_vif instance which contains the wdev.

Regards,
Arend

> Signed-off-by: Daniel J Blueman 
> 
> diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/p2p.c
> b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/p2p.c
> index de19c7c..aa0f470 100644
> --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/p2p.c
> +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/p2p.c
> @@ -2288,12 +2288,13 @@ int brcmf_p2p_del_vif(struct wiphy *wiphy,
> struct wireless_dev *wdev)
> else
> err = 0;
> }
> -   brcmf_remove_interface(vif->ifp, true);
> 
> -   brcmf_cfg80211_arm_vif_event(cfg, NULL);
> if (vif->wdev.iftype != NL80211_IFTYPE_P2P_DEVICE)
> p2p->bss_idx[P2PAPI_BSSCFG_CONNECTION].vif = NULL;
> 
> +   brcmf_remove_interface(vif->ifp, true);
> +   brcmf_cfg80211_arm_vif_event(cfg, NULL);
> +
> return err;
>  }
> 


Re: [PATCH] mac80211: Use setup_timer instead of init_timer

2017-03-06 Thread Johannes Berg
On Mon, 2017-03-06 at 13:25 +0100, Johannes Berg wrote:
> On Fri, 2017-03-03 at 13:45 +0100, Jiri Slaby wrote:
> > From: Ondřej Lysoněk 
> > 
> > Use setup_timer() and setup_deferrable_timer() to set the data and
> > function timer fields. It makes the code cleaner and will allow for
> > easier change of the timer struct internals.
> 
> Btw, I suspect you generated this with coccinelle and didn't put
> enough
> "..." there, because you missed one in mesh_path_new() :)
> 
and perhaps mesh_plink_timer_set()?

johannes


Re: [PATCH] mac80211: Use setup_timer instead of init_timer

2017-03-06 Thread Johannes Berg
On Fri, 2017-03-03 at 13:45 +0100, Jiri Slaby wrote:
> From: Ondřej Lysoněk 
> 
> Use setup_timer() and setup_deferrable_timer() to set the data and
> function timer fields. It makes the code cleaner and will allow for
> easier change of the timer struct internals.

Applied.

johannes


Re: [PATCH] mac80211: Use setup_timer instead of init_timer

2017-03-06 Thread Johannes Berg
On Fri, 2017-03-03 at 13:45 +0100, Jiri Slaby wrote:
> From: Ondřej Lysoněk 
> 
> Use setup_timer() and setup_deferrable_timer() to set the data and
> function timer fields. It makes the code cleaner and will allow for
> easier change of the timer struct internals.

Btw, I suspect you generated this with coccinelle and didn't put enough
"..." there, because you missed one in mesh_path_new() :)

Care to send a patch for that one too?

johannes


Re: [Patch net] ipv6: reorder icmpv6_init() and ip6_mr_init()

2017-03-06 Thread Andrey Konovalov
On Sun, Mar 5, 2017 at 9:34 PM, Cong Wang  wrote:
> Andrey reported the following kernel crash:
>
> kasan: GPF could be caused by NULL-ptr deref or user memory access
> general protection fault:  [#1] SMP KASAN
> Dumping ftrace buffer:
>(ftrace buffer empty)
> Modules linked in:
> CPU: 0 PID: 14446 Comm: syz-executor6 Not tainted 4.10.0+ #82
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> task: 88001f311700 task.stack: 88001f6e8000
> RIP: 0010:ip6mr_sk_done+0x15a/0x3d0 net/ipv6/ip6mr.c:1618
> RSP: 0018:88001f6ef418 EFLAGS: 00010202
> RAX: dc00 RBX: 110003edde8c RCX: c900043ee000
> RDX: 0004 RSI: 83e3b3f8 RDI: 0020
> RBP: 88001f6ef508 R08: fbfff0dcc5d8 R09: 
> R10: 86e62ec0 R11:  R12: 
> R13:  R14: 88001f6ef4e0 R15: 8800380a0040
> FS:  7f7a52cec700() GS:88003ec0() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 0061c500 CR3: 1f1ae000 CR4: 06f0
> DR0: 2000 DR1: 2000 DR2: 
> DR3:  DR6: 0ff0 DR7: 0600
> Call Trace:
>  rawv6_close+0x4c/0x80 net/ipv6/raw.c:1217
>  inet_release+0xed/0x1c0 net/ipv4/af_inet.c:425
>  inet6_release+0x50/0x70 net/ipv6/af_inet6.c:432
>  sock_release+0x8d/0x1e0 net/socket.c:597
>  __sock_create+0x39d/0x880 net/socket.c:1226
>  sock_create_kern+0x3f/0x50 net/socket.c:1243
>  inet_ctl_sock_create+0xbb/0x280 net/ipv4/af_inet.c:1526
>  icmpv6_sk_init+0x163/0x500 net/ipv6/icmp.c:954
>  ops_init+0x10a/0x550 net/core/net_namespace.c:115
>  setup_net+0x261/0x660 net/core/net_namespace.c:291
>  copy_net_ns+0x27e/0x540 net/core/net_namespace.c:396
> 9pnet_virtio: no channels available for device ./file1
>  create_new_namespaces+0x437/0x9b0 kernel/nsproxy.c:106
>  unshare_nsproxy_namespaces+0xae/0x1e0 kernel/nsproxy.c:205
>  SYSC_unshare kernel/fork.c:2281 [inline]
>  SyS_unshare+0x64e/0x1000 kernel/fork.c:2231
>  entry_SYSCALL_64_fastpath+0x1f/0xc2
>
> This is because net->ipv6.mr6_tables is not initialized at that point,
> ip6mr_rules_init() is not called yet, therefore on the error path when
> we iterate the list, we trigger this oops. Fix this by reordering
> ip6mr_rules_init() before icmpv6_sk_init().
>
> Reported-by: Andrey Konovalov 
> Signed-off-by: Cong Wang 
> ---
>  net/ipv6/af_inet6.c | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
> index 04db406..a9a9553 100644
> --- a/net/ipv6/af_inet6.c
> +++ b/net/ipv6/af_inet6.c
> @@ -920,12 +920,12 @@ static int __init inet6_init(void)
> err = register_pernet_subsys(&inet6_net_ops);
> if (err)
> goto register_pernet_fail;
> -   err = icmpv6_init();
> -   if (err)
> -   goto icmp_fail;
> err = ip6_mr_init();
> if (err)
> goto ipmr_fail;
> +   err = icmpv6_init();
> +   if (err)
> +   goto icmp_fail;
> err = ndisc_init();
> if (err)
> goto ndisc_fail;
> @@ -1061,10 +1061,10 @@ static int __init inet6_init(void)
> ndisc_cleanup();
>  ndisc_fail:
> ip6_mr_cleanup();
> -ipmr_fail:
> -   icmpv6_cleanup();
>  icmp_fail:
> unregister_pernet_subsys(&inet6_net_ops);
> +ipmr_fail:
> +   icmpv6_cleanup();
>  register_pernet_fail:
> sock_unregister(PF_INET6);
> rtnl_unregister_all(PF_INET6);
> --
> 2.5.5
>

Thanks!


Re: [E1000-devel] jitter / latency reduction

2017-03-06 Thread Leonardo Amaral - Listas
2017-03-03 20:52 GMT-03:00 Mahmood Qazen :
>
> this week I read a presentation by Jesse and towards the end it asks if we
> can help.


Hello,

Can you please share this presentation? I'm interested in this subject too.

Thanks!


Leonardo Amaral
about.me/leonardo.amaral


[PATCH] net: ibm: emac: fix regression caused by emac_dt_phy_probe()

2017-03-06 Thread Christian Lamparter
Julian Margetson reported a panic on his SAM460EX with Kernel 4.11-rc1:
| Unable to handle kernel paging request for data at address 0x0014
| Oops: Kernel access of bad area, sig: 11 [#1]
| PREEMPT
| Canyonlands
| Modules linked in:
| CPU: 0 PID: 1 Comm: swapper Not tainted [...]
| task: ea838000 task.stack: ea836000
| NIP: c0599f5c LR: c0599dd8 CTR: 
| REGS: ea837c80 TRAP: 0300   Not tainted [...]
| MSR: 00029000 
|  CR: 24371242  XER: 2000
| DEAR: 0014 ESR: 
| GPR00: c0599ce8 ea837d30 ea838000 c0e52dcc c0d56ffb [...]
| NIP [c0599f5c] emac_probe+0xfb4/0x1304
| LR [c0599dd8] emac_probe+0xe30/0x1304
| Call Trace:
| [ea837d30] [c0599ce8] emac_probe+0xd40/0x1304 (unreliable)
| [ea837d80] [c0533504] platform_drv_probe+0x48/0x90
| [ea837da0] [c0531c14] driver_probe_device+0x15c/0x2c4
| [ea837dd0] [c0531e04] __driver_attach+0x88/0xb0
| ---[ end trace ... ]---

The problem is caused by emac_dt_phy_probe() returning success (0)
for existing device-tree configurations that do not specify a
"phy-handle" property. This caused the code to skip the existing
phy probe and setup, which left essential phy-related
data structures uninitialized.

This patch also removes the unused variable in emac_dt_phy_connect().

Fixes: a577ca6badb5261d ("net: emac: add support for device-tree based PHY discovery and setup")
Reported-by: Julian Margetson 
Signed-off-by: Christian Lamparter 
---
 drivers/net/ethernet/ibm/emac/core.c | 25 +
 1 file changed, 17 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/ibm/emac/core.c b/drivers/net/ethernet/ibm/emac/core.c
index 275c2e2349ad..c44036d5761a 100644
--- a/drivers/net/ethernet/ibm/emac/core.c
+++ b/drivers/net/ethernet/ibm/emac/core.c
@@ -2589,8 +2589,6 @@ static int emac_dt_mdio_probe(struct emac_instance *dev)
 static int emac_dt_phy_connect(struct emac_instance *dev,
   struct device_node *phy_handle)
 {
-   int res;
-
dev->phy.def = devm_kzalloc(&dev->ofdev->dev, sizeof(*dev->phy.def),
GFP_KERNEL);
if (!dev->phy.def)
@@ -2617,7 +2615,7 @@ static int emac_dt_phy_probe(struct emac_instance *dev)
 {
struct device_node *np = dev->ofdev->dev.of_node;
struct device_node *phy_handle;
-   int res = 0;
+   int res = 1;
 
phy_handle = of_parse_phandle(np, "phy-handle", 0);
 
@@ -2714,13 +2712,24 @@ static int emac_init_phy(struct emac_instance *dev)
if (emac_has_feature(dev, EMAC_FTR_HAS_RGMII)) {
int res = emac_dt_phy_probe(dev);
 
-   mutex_unlock(&emac_phy_map_lock);
-   if (!res)
+   switch (res) {
+   case 1:
+   /* No phy-handle property configured.
+* Continue with the existing phy probe
+* and setup code.
+*/
+   break;
+
+   case 0:
+   mutex_unlock(&emac_phy_map_lock);
goto init_phy;
 
-   dev_err(&dev->ofdev->dev, "failed to attach dt phy (%d).\n",
-   res);
-   return res;
+   default:
+   mutex_unlock(&emac_phy_map_lock);
+   dev_err(&dev->ofdev->dev, "failed to attach dt phy (%d).\n",
+   res);
+   return res;
+   }
}
 
if (dev->phy_address != 0x)
-- 
2.11.0
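
For anyone skimming the thread, the return-value convention this patch
relies on can be sketched in plain userspace C. This is only an
illustration, not the driver code: sketch_dt_phy_probe(),
sketch_init_phy() and enum sketch_path are hypothetical names, and the
two bool parameters stand in for the device-tree lookup and the PHY
attach step. The only thing taken from the patch is the convention
itself: 1 means no "phy-handle" (fall through to the existing probe),
0 means the DT PHY was attached, and a negative value is an error.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for emac_dt_phy_probe().
 * Returns 1 if there is no "phy-handle" property, 0 if the DT PHY
 * was attached, or a negative errno if attaching failed.
 */
int sketch_dt_phy_probe(bool has_phy_handle, bool attach_ok)
{
	if (!has_phy_handle)
		return 1;

	return attach_ok ? 0 : -ENODEV;
}

enum sketch_path {
	SKETCH_PATH_LEGACY_PROBE,	/* continue with the existing probe */
	SKETCH_PATH_DT_PHY,		/* DT PHY attached, skip legacy probe */
	SKETCH_PATH_ERROR,		/* give up and report the error */
};

/* Hypothetical caller mirroring the switch added to emac_init_phy(). */
enum sketch_path sketch_init_phy(bool has_phy_handle, bool attach_ok)
{
	int res = sketch_dt_phy_probe(has_phy_handle, attach_ok);

	switch (res) {
	case 1:
		return SKETCH_PATH_LEGACY_PROBE;
	case 0:
		return SKETCH_PATH_DT_PHY;
	default:
		return SKETCH_PATH_ERROR;
	}
}
```

The point of the three-way result is that "nothing to do here" is kept
distinct from "done", so older device trees still reach the original
probe path.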



Re: [Patch net] bonding: use ETH_MAX_MTU as max mtu

2017-03-06 Thread Jarod Wilson

On 2017-03-02 3:24 PM, Cong Wang wrote:

This restores the ability of setting bond device's mtu to 9000.

Fixes: 91572088e3fd ("net: use core MTU range checking in core net infra")
Reported-by: daz...@gmail.com
Reported-by: Brad Campbell 
Cc: Jarod Wilson 
Signed-off-by: Cong Wang 


Apologies, I'm a bit late to the party, direct CC didn't land in inbox 
because of duplicate suppression or perhaps a greedy mail filtering rule 
(*grumble*)... Too late to ack, but yeah, that's necessary. I *think* 
the team driver may also require the same treatment. It calls 
ether_setup() without anything setting max_mtu as well.


--
Jarod Wilson
ja...@redhat.com


Re: [4.9.13] use after free in ipv4_mtu

2017-03-06 Thread Eric Dumazet
On Mon, 2017-03-06 at 14:33 +0800, Daniel J Blueman wrote:
> On 2 March 2017 at 21:28, Eric Dumazet  wrote:
> > On Thu, 2017-03-02 at 05:08 -0800, Eric Dumazet wrote:
> >
> >> Thanks for the report !
> >>
> >> This patch should solve this precise issue, but we need more work.
> >>
> >> We need to audit all __sk_dst_get() and make sure they are inside an
> >> rcu_read_lock()/rcu_read_unlock() section.
> >>
> >> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> >> index 
> >> 22548b5f05cbe5a655e0c53df2d31c5cc2e8a702..517963e1cb6eb9d70fcd71f44262813c3378759f
> >>  100644
> >> --- a/net/ipv4/tcp_output.c
> >> +++ b/net/ipv4/tcp_output.c
> >> @@ -1459,7 +1459,7 @@ EXPORT_SYMBOL(tcp_sync_mss);
> >>  unsigned int tcp_current_mss(struct sock *sk)
> >>  {
> >>   const struct tcp_sock *tp = tcp_sk(sk);
> >> - const struct dst_entry *dst = __sk_dst_get(sk);
> >> + const struct dst_entry *dst;
> >>   u32 mss_now;
> >>   unsigned int header_len;
> >>   struct tcp_out_options opts;
> >> @@ -1467,11 +1467,14 @@ unsigned int tcp_current_mss(struct sock *sk)
> >>
> >>   mss_now = tp->mss_cache;
> >>
> >> + rcu_read_lock();
> >> + dst = __sk_dst_get(sk);
> >>   if (dst) {
> >>   u32 mtu = dst_mtu(dst);
> >>   if (mtu != inet_csk(sk)->icsk_pmtu_cookie)
> >>   mss_now = tcp_sync_mss(sk, mtu);
> >>   }
> >> + rcu_read_unlock();
> >>
> >>   header_len = tcp_established_options(sk, NULL, &opts, &md5) +
> >>sizeof(struct tcphdr);
> >
> > Normally TCP sockets sk_dst_cache can only be changed if the thread
> > doing the change owns the socket.
> >
> > I have not yet understood which point was breaking the rule.
> 
> Great work Eric! I have been unable to reproduce the KASAN warning
> with this patch.
> 
> Reported-by: Daniel J Blueman 
> Tested-by: Daniel J Blueman 
> 
> I do change the network queueing discipline and related at runtime [1]
> which may be triggering this, though I did think I saw the KASAN
> report only after resuming from suspend. rf(un)kill and other tweaking
> may have been involved too.
> 
> Thanks,
>   Dan
> 
> [1] /etc/sysctl.d/90-tcp.conf
> 
> net.core.default_qdisc = fq_codel
> net.ipv4.tcp_congestion_control = bbr
> net.ipv4.tcp_slow_start_after_idle = 0
> net.ipv4.tcp_ecn = 1

Thanks Daniel, but this bandaid patch should not be needed.

Somehow another point in the stack is at fault and needs to be
identified.

Otherwise we'll keep adding workarounds.

Since net-next is soon to be re-opened, I will submit patches adding
more lockdep assisted checks.





[PATCH net] team: use ETH_MAX_MTU as max mtu

2017-03-06 Thread Jarod Wilson
This restores the ability to set a team device's mtu to anything higher
than 1500. Similar to the reported issue with bonding, the team driver
calls ether_setup(), which sets an initial max_mtu of 1500, while the
underlying hardware can handle something much larger. Just set it to
ETH_MAX_MTU to support all possible values, and the limitations of the
underlying devices will prevent setting anything too large.

Fixes: 91572088e3fd ("net: use core MTU range checking in core net infra")
CC: Cong Wang 
CC: Jiri Pirko 
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson 
---
 drivers/net/team/team.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
index 4a24b5d15f5a..1b52520715ae 100644
--- a/drivers/net/team/team.c
+++ b/drivers/net/team/team.c
@@ -2072,6 +2072,7 @@ static int team_dev_type_check_change(struct net_device 
*dev,
 static void team_setup(struct net_device *dev)
 {
ether_setup(dev);
+   dev->max_mtu = ETH_MAX_MTU;
 
dev->netdev_ops = &team_netdev_ops;
dev->ethtool_ops = &team_ethtool_ops;
-- 
2.11.0
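
To make the failure mode concrete, here is a rough userspace sketch of
the range check described above. It is an approximation under stated
assumptions, not the core networking code: struct sketch_netdev,
sketch_ether_setup() and sketch_set_mtu() are hypothetical names, and
the constants mirror ETH_MIN_MTU (68), ETH_DATA_LEN (1500) and
ETH_MAX_MTU (0xFFFF). The assumed rule is that a new MTU is rejected
when it is below min_mtu, or above max_mtu when max_mtu is non-zero.

```c
#include <errno.h>

#define SKETCH_ETH_MIN_MTU	68
#define SKETCH_ETH_DATA_LEN	1500
#define SKETCH_ETH_MAX_MTU	0xFFFFU

/* Hypothetical, heavily reduced stand-in for struct net_device. */
struct sketch_netdev {
	unsigned int mtu;
	unsigned int min_mtu;
	unsigned int max_mtu;
};

/* Mirrors the defaults ether_setup() is described as applying. */
void sketch_ether_setup(struct sketch_netdev *dev)
{
	dev->mtu = SKETCH_ETH_DATA_LEN;
	dev->min_mtu = SKETCH_ETH_MIN_MTU;
	dev->max_mtu = SKETCH_ETH_DATA_LEN;
}

/* Assumed shape of the core range check: reject values outside
 * [min_mtu, max_mtu]; a max_mtu of 0 means "no upper bound".
 */
int sketch_set_mtu(struct sketch_netdev *dev, unsigned int new_mtu)
{
	if (new_mtu < dev->min_mtu)
		return -EINVAL;
	if (dev->max_mtu > 0 && new_mtu > dev->max_mtu)
		return -EINVAL;

	dev->mtu = new_mtu;
	return 0;
}
```

With only the ether_setup() defaults, a request for 9000 fails; after
raising max_mtu to the ETH_MAX_MTU equivalent, the same request
succeeds and any real limit is left to the underlying devices.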



Re: [PATCH net] team: use ETH_MAX_MTU as max mtu

2017-03-06 Thread Jiri Pirko
Mon, Mar 06, 2017 at 02:48:58PM CET, ja...@redhat.com wrote:
>This restores the ability to set a team device's mtu to anything higher
>than 1500. Similar to the reported issue with bonding, the team driver
>calls ether_setup(), which sets an initial max_mtu of 1500, while the
>underlying hardware can handle something much larger. Just set it to
>ETH_MAX_MTU to support all possible values, and the limitations of the
>underlying devices will prevent setting anything too large.
>
>Fixes: 91572088e3fd ("net: use core MTU range checking in core net infra")
>CC: Cong Wang 
>CC: Jiri Pirko 
>CC: netdev@vger.kernel.org
>Signed-off-by: Jarod Wilson 

Acked-by: Jiri Pirko 


Re: [Patch net] bonding: use ETH_MAX_MTU as max mtu

2017-03-06 Thread Jarod Wilson

On 2017-03-06 8:40 AM, Jiri Pirko wrote:

Mon, Mar 06, 2017 at 02:36:47PM CET, ja...@redhat.com wrote:

On 2017-03-02 3:24 PM, Cong Wang wrote:

This restores the ability of setting bond device's mtu to 9000.

Fixes: 91572088e3fd ("net: use core MTU range checking in core net infra")
Reported-by: daz...@gmail.com
Reported-by: Brad Campbell 
Cc: Jarod Wilson 
Signed-off-by: Cong Wang 


Apologies, I'm a bit late to the party, direct CC didn't land in inbox
because of duplicate suppression or perhaps a greedy mail filtering rule
(*grumble*)... Too late to ack, but yeah, that's necessary. I *think* the
team driver may also require the same treatment. It calls ether_setup()
without anything setting max_mtu as well.


Jarod, could you please send the fix? Thanks.


Done.

--
Jarod Wilson
ja...@redhat.com


Re: [Patch net] bonding: use ETH_MAX_MTU as max mtu

2017-03-06 Thread Jiri Pirko
Mon, Mar 06, 2017 at 02:36:47PM CET, ja...@redhat.com wrote:
>On 2017-03-02 3:24 PM, Cong Wang wrote:
>> This restores the ability of setting bond device's mtu to 9000.
>> 
>> Fixes: 91572088e3fd ("net: use core MTU range checking in core net infra")
>> Reported-by: daz...@gmail.com
>> Reported-by: Brad Campbell 
>> Cc: Jarod Wilson 
>> Signed-off-by: Cong Wang 
>
>Apologies, I'm a bit late to the party, direct CC didn't land in inbox
>because of duplicate suppression or perhaps a greedy mail filtering rule
>(*grumble*)... Too late to ack, but yeah, that's necessary. I *think* the
>team driver may also require the same treatment. It calls ether_setup()
>without anything setting max_mtu as well.

Jarod, could you please send the fix? Thanks.


Re: [PATCH net v2] xen-netback: fix race condition on XenBus disconnect

2017-03-06 Thread Igor Druzhinin
On 06/03/17 08:58, Paul Durrant wrote:
>> -Original Message-
>> From: Igor Druzhinin [mailto:igor.druzhi...@citrix.com]
>> Sent: 03 March 2017 20:23
>> To: netdev@vger.kernel.org; xen-de...@lists.xenproject.org
>> Cc: Paul Durrant ; jgr...@suse.com; Wei Liu
>> ; Igor Druzhinin 
>> Subject: [PATCH net v2] xen-netback: fix race condition on XenBus
>> disconnect
>>
>> In some cases during XenBus disconnect event handling and subsequent
>> queue resource release there may be some TX handlers active on
>> other processors. Use RCU in order to synchronize with them.
>>
>> Signed-off-by: Igor Druzhinin 
>> ---
>> v2:
>>  * Add protection for xenvif_get_ethtool_stats
>>  * Additional comments and fixes
>> ---
>>  drivers/net/xen-netback/interface.c | 29 ++---
>>  drivers/net/xen-netback/netback.c   |  2 +-
>>  drivers/net/xen-netback/xenbus.c| 20 ++--
>>  3 files changed, 33 insertions(+), 18 deletions(-)
>>
>> diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-
>> netback/interface.c
>> index a2d32676..266b7cd 100644
>> --- a/drivers/net/xen-netback/interface.c
>> +++ b/drivers/net/xen-netback/interface.c
>> @@ -164,13 +164,17 @@ static int xenvif_start_xmit(struct sk_buff *skb,
>> struct net_device *dev)
>>  {
>>  struct xenvif *vif = netdev_priv(dev);
>>  struct xenvif_queue *queue = NULL;
>> -unsigned int num_queues = vif->num_queues;
>> +unsigned int num_queues;
>>  u16 index;
>>  struct xenvif_rx_cb *cb;
>>
>>  BUG_ON(skb->dev != dev);
>>
>> -/* Drop the packet if queues are not set up */
>> +/* Drop the packet if queues are not set up.
>> + * This handler should be called inside an RCU read section
>> + * so we don't need to enter it here explicitly.
>> + */
>> +num_queues = rcu_dereference(vif)->num_queues;
>>  if (num_queues < 1)
>>  goto drop;
>>
>> @@ -221,18 +225,21 @@ static struct net_device_stats
>> *xenvif_get_stats(struct net_device *dev)
>>  {
>>  struct xenvif *vif = netdev_priv(dev);
>>  struct xenvif_queue *queue = NULL;
>> +unsigned int num_queues;
>>  u64 rx_bytes = 0;
>>  u64 rx_packets = 0;
>>  u64 tx_bytes = 0;
>>  u64 tx_packets = 0;
>>  unsigned int index;
>>
>> -spin_lock(&vif->lock);
>> -if (vif->queues == NULL)
>> +rcu_read_lock();
>> +
>> +num_queues = rcu_dereference(vif)->num_queues;
>> +if (num_queues < 1)
>>  goto out;
> 
> Is this if clause worth it? All it does is jump over the for loop, which 
> would not be executed anyway, since the initial test (0 < 0) would fail.

Probably not needed here, but it does make it consistent with other
similar checks across the file. Just looks more descriptive.

> 
>>
>>  /* Aggregate tx and rx stats from each queue */
>> -for (index = 0; index < vif->num_queues; ++index) {
>> +for (index = 0; index < num_queues; ++index) {
>>  queue = &vif->queues[index];
>>  rx_bytes += queue->stats.rx_bytes;
>>  rx_packets += queue->stats.rx_packets;
>> @@ -241,7 +248,7 @@ static struct net_device_stats
>> *xenvif_get_stats(struct net_device *dev)
>>  }
>>
>>  out:
>> -spin_unlock(&vif->lock);
>> +rcu_read_unlock();
>>
>>  vif->dev->stats.rx_bytes = rx_bytes;
>>  vif->dev->stats.rx_packets = rx_packets;
>> @@ -377,10 +384,16 @@ static void xenvif_get_ethtool_stats(struct
>> net_device *dev,
>>   struct ethtool_stats *stats, u64 * data)
>>  {
>>  struct xenvif *vif = netdev_priv(dev);
>> -unsigned int num_queues = vif->num_queues;
>> +unsigned int num_queues;
>>  int i;
>>  unsigned int queue_index;
>>
>> +rcu_read_lock();
>> +
>> +num_queues = rcu_dereference(vif)->num_queues;
>> +if (num_queues < 1)
>> +goto out;
>> +
> 
> You have introduced a semantic change with the above if clause. The 
> xenvif_stats array was previously zeroed if num_queues < 1. It appears that 
> ethtool does actually allocate a zeroed array to pass in here, but I wonder 
> whether it is still safer to have this function zero it anyway. 

Agree. Should at least zero out data array before exiting.

> 
>>  for (i = 0; i < ARRAY_SIZE(xenvif_stats); i++) {
>>  unsigned long accum = 0;
>>  for (queue_index = 0; queue_index < num_queues;
>> ++queue_index) {
>> @@ -389,6 +402,8 @@ static void xenvif_get_ethtool_stats(struct
>> net_device *dev,
>>  }
>>  data[i] = accum;
>>  }
>> +out:
>> +rcu_read_unlock();
>>  }
>>
>>  static void xenvif_get_strings(struct net_device *dev, u32 stringset, u8 *
>> data)
>> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-
>> netback/netback.c
>> index f9bcf4a..62fa74d 100644
>> --- a/drivers/net/xen-netback/netback.c
>> +++ b/drivers/net/xen-netback/netback.c
>> @@ -214,7 +214,7 @@ static void xenvif_fatal_tx_err(struct xenvif *vif)
>>  net

[PATCH 24/29] drivers: convert iblock_req.pending from atomic_t to refcount_t

2017-03-06 Thread Elena Reshetova
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This helps avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova 
Signed-off-by: Hans Liljestrand 
Signed-off-by: Kees Cook 
Signed-off-by: David Windsor 
---
 drivers/target/target_core_iblock.c | 12 ++--
 drivers/target/target_core_iblock.h |  3 ++-
 2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/drivers/target/target_core_iblock.c b/drivers/target/target_core_iblock.c
index d316ed5..bb069eb 100644
--- a/drivers/target/target_core_iblock.c
+++ b/drivers/target/target_core_iblock.c
@@ -279,7 +279,7 @@ static void iblock_complete_cmd(struct se_cmd *cmd)
struct iblock_req *ibr = cmd->priv;
u8 status;
 
-   if (!atomic_dec_and_test(&ibr->pending))
+   if (!refcount_dec_and_test(&ibr->pending))
return;
 
if (atomic_read(&ibr->ib_bio_err_cnt))
@@ -487,7 +487,7 @@ iblock_execute_write_same(struct se_cmd *cmd)
bio_list_init(&list);
bio_list_add(&list, bio);
 
-   atomic_set(&ibr->pending, 1);
+   refcount_set(&ibr->pending, 1);
 
while (sectors) {
while (bio_add_page(bio, sg_page(sg), sg->length, sg->offset)
@@ -498,7 +498,7 @@ iblock_execute_write_same(struct se_cmd *cmd)
if (!bio)
goto fail_put_bios;
 
-   atomic_inc(&ibr->pending);
+   refcount_inc(&ibr->pending);
bio_list_add(&list, bio);
}
 
@@ -706,7 +706,7 @@ iblock_execute_rw(struct se_cmd *cmd, struct scatterlist 
*sgl, u32 sgl_nents,
cmd->priv = ibr;
 
if (!sgl_nents) {
-   atomic_set(&ibr->pending, 1);
+   refcount_set(&ibr->pending, 1);
iblock_complete_cmd(cmd);
return 0;
}
@@ -719,7 +719,7 @@ iblock_execute_rw(struct se_cmd *cmd, struct scatterlist 
*sgl, u32 sgl_nents,
bio_list_init(&list);
bio_list_add(&list, bio);
 
-   atomic_set(&ibr->pending, 2);
+   refcount_set(&ibr->pending, 2);
bio_cnt = 1;
 
for_each_sg(sgl, sg, sgl_nents, i) {
@@ -740,7 +740,7 @@ iblock_execute_rw(struct se_cmd *cmd, struct scatterlist 
*sgl, u32 sgl_nents,
if (!bio)
goto fail_put_bios;
 
-   atomic_inc(&ibr->pending);
+   refcount_inc(&ibr->pending);
bio_list_add(&list, bio);
bio_cnt++;
}
diff --git a/drivers/target/target_core_iblock.h b/drivers/target/target_core_iblock.h
index 718d3fc..f2a5797 100644
--- a/drivers/target/target_core_iblock.h
+++ b/drivers/target/target_core_iblock.h
@@ -2,6 +2,7 @@
 #define TARGET_CORE_IBLOCK_H
 
 #include 
+#include 
 #include 
 
 #define IBLOCK_VERSION "4.0"
@@ -10,7 +11,7 @@
 #define IBLOCK_LBA_SHIFT   9
 
 struct iblock_req {
-   atomic_t pending;
+   refcount_t pending;
atomic_t ib_bio_err_cnt;
 } cacheline_aligned;
 
-- 
2.7.4
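
As background for the refcount_t conversions in this series, the
difference from a bare atomic_t can be sketched in userspace with C11
atomics. This is a simplified model under stated assumptions, not the
kernel implementation: the sketch_refcount_* names are hypothetical,
saturation at UINT_MAX and the refusal to increment from zero are the
behaviours being illustrated, and the kernel's warnings are left out.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model of a saturating reference counter. */
struct sketch_refcount {
	atomic_uint refs;
};

void sketch_refcount_set(struct sketch_refcount *r, unsigned int n)
{
	atomic_store(&r->refs, n);
}

unsigned int sketch_refcount_read(struct sketch_refcount *r)
{
	return atomic_load(&r->refs);
}

/* Take a reference unless the object is already dead (count == 0).
 * A saturated counter stays pinned at UINT_MAX instead of wrapping.
 */
bool sketch_refcount_inc_not_zero(struct sketch_refcount *r)
{
	unsigned int old = atomic_load(&r->refs);

	for (;;) {
		if (old == 0)
			return false;
		if (old == UINT_MAX)
			return true;
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&r->refs, &old, old + 1))
			return true;
	}
}

/* Drop a reference; returns true only for the final put.
 * A saturated or already-zero counter is left untouched, so the
 * object is leaked rather than freed while still in use.
 */
bool sketch_refcount_dec_and_test(struct sketch_refcount *r)
{
	unsigned int old = atomic_load(&r->refs);

	for (;;) {
		if (old == UINT_MAX || old == 0)
			return false;
		if (atomic_compare_exchange_weak(&r->refs, &old, old - 1))
			return old == 1;
	}
}
```

The trade-off being modelled is that an overflowed counter leaks the
object instead of wrapping to a small value and freeing it early,
which is what turns an overflow into a use-after-free with atomic_t.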



[PATCH 02/29] drivers, firewire: convert fw_node.ref_count from atomic_t to refcount_t

2017-03-06 Thread Elena Reshetova
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This helps avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova 
Signed-off-by: Hans Liljestrand 
Signed-off-by: Kees Cook 
Signed-off-by: David Windsor 
---
 drivers/firewire/core-topology.c | 2 +-
 drivers/firewire/core.h  | 8 
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/firewire/core-topology.c b/drivers/firewire/core-topology.c
index 0de8350..939d259 100644
--- a/drivers/firewire/core-topology.c
+++ b/drivers/firewire/core-topology.c
@@ -124,7 +124,7 @@ static struct fw_node *fw_node_create(u32 sid, int 
port_count, int color)
node->initiated_reset = SELF_ID_PHY_INITIATOR(sid);
node->port_count = port_count;
 
-   atomic_set(&node->ref_count, 1);
+   refcount_set(&node->ref_count, 1);
INIT_LIST_HEAD(&node->link);
 
return node;
diff --git a/drivers/firewire/core.h b/drivers/firewire/core.h
index e1480ff6..c07962e 100644
--- a/drivers/firewire/core.h
+++ b/drivers/firewire/core.h
@@ -12,7 +12,7 @@
 #include 
 #include 
 
-#include 
+#include 
 
 struct device;
 struct fw_card;
@@ -184,7 +184,7 @@ struct fw_node {
 * local node to this node. */
u8 max_depth:4; /* Maximum depth to any leaf node */
u8 max_hops:4;  /* Max hops in this sub tree */
-   atomic_t ref_count;
+   refcount_t ref_count;
 
/* For serializing node topology into a list. */
struct list_head link;
@@ -197,14 +197,14 @@ struct fw_node {
 
 static inline struct fw_node *fw_node_get(struct fw_node *node)
 {
-   atomic_inc(&node->ref_count);
+   refcount_inc(&node->ref_count);
 
return node;
 }
 
 static inline void fw_node_put(struct fw_node *node)
 {
-   if (atomic_dec_and_test(&node->ref_count))
+   if (refcount_dec_and_test(&node->ref_count))
kfree(node);
 }
 
-- 
2.7.4



[PATCH 26/29] drivers, usb: convert dev_data.count from atomic_t to refcount_t

2017-03-06 Thread Elena Reshetova
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This helps avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova 
Signed-off-by: Hans Liljestrand 
Signed-off-by: Kees Cook 
Signed-off-by: David Windsor 
---
 drivers/usb/gadget/legacy/inode.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/usb/gadget/legacy/inode.c b/drivers/usb/gadget/legacy/inode.c
index 79a2d8f..81d76f3 100644
--- a/drivers/usb/gadget/legacy/inode.c
+++ b/drivers/usb/gadget/legacy/inode.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -114,7 +115,7 @@ enum ep0_state {
 
 struct dev_data {
spinlock_t  lock;
-   atomic_tcount;
+   refcount_t  count;
enum ep0_state  state;  /* P: lock */
struct usb_gadgetfs_event   event [N_EVENT];
unsignedev_next;
@@ -150,12 +151,12 @@ struct dev_data {
 
 static inline void get_dev (struct dev_data *data)
 {
-   atomic_inc (&data->count);
+   refcount_inc (&data->count);
 }
 
 static void put_dev (struct dev_data *data)
 {
-   if (likely (!atomic_dec_and_test (&data->count)))
+   if (likely (!refcount_dec_and_test (&data->count)))
return;
/* needs no more cleanup */
BUG_ON (waitqueue_active (&data->wait));
@@ -170,7 +171,7 @@ static struct dev_data *dev_new (void)
if (!dev)
return NULL;
dev->state = STATE_DEV_DISABLED;
-   atomic_set (&dev->count, 1);
+   refcount_set (&dev->count, 1);
spin_lock_init (&dev->lock);
INIT_LIST_HEAD (&dev->epfiles);
init_waitqueue_head (&dev->wait);
-- 
2.7.4



[PATCH 10/29] drivers, md: convert stripe_head.count from atomic_t to refcount_t

2017-03-06 Thread Elena Reshetova
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This helps avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova 
Signed-off-by: Hans Liljestrand 
Signed-off-by: Kees Cook 
Signed-off-by: David Windsor 
---
 drivers/md/raid5-cache.c |  8 +++---
 drivers/md/raid5.c   | 66 
 drivers/md/raid5.h   |  3 ++-
 3 files changed, 39 insertions(+), 38 deletions(-)

diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
index 3f307be..6c05e12 100644
--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -979,7 +979,7 @@ int r5l_write_stripe(struct r5l_log *log, struct 
stripe_head *sh)
 * don't delay.
 */
clear_bit(STRIPE_DELAYED, &sh->state);
-   atomic_inc(&sh->count);
+   refcount_inc(&sh->count);
 
mutex_lock(&log->io_mutex);
/* meta + data */
@@ -1321,7 +1321,7 @@ static void r5c_flush_stripe(struct r5conf *conf, struct 
stripe_head *sh)
assert_spin_locked(&conf->device_lock);
 
list_del_init(&sh->lru);
-   atomic_inc(&sh->count);
+   refcount_inc(&sh->count);
 
set_bit(STRIPE_HANDLE, &sh->state);
atomic_inc(&conf->active_stripes);
@@ -1424,7 +1424,7 @@ static void r5c_do_reclaim(struct r5conf *conf)
 */
if (!list_empty(&sh->lru) &&
!test_bit(STRIPE_HANDLE, &sh->state) &&
-   atomic_read(&sh->count) == 0) {
+   refcount_read(&sh->count) == 0) {
r5c_flush_stripe(conf, sh);
if (count++ >= R5C_RECLAIM_STRIPE_GROUP)
break;
@@ -2650,7 +2650,7 @@ r5c_cache_data(struct r5l_log *log, struct stripe_head 
*sh,
 * don't delay.
 */
clear_bit(STRIPE_DELAYED, &sh->state);
-   atomic_inc(&sh->count);
+   refcount_inc(&sh->count);
 
mutex_lock(&log->io_mutex);
/* meta + data */
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 2ce23b0..30c96a8 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -296,7 +296,7 @@ static void do_release_stripe(struct r5conf *conf, struct 
stripe_head *sh,
 static void __release_stripe(struct r5conf *conf, struct stripe_head *sh,
 struct list_head *temp_inactive_list)
 {
-   if (atomic_dec_and_test(&sh->count))
+   if (refcount_dec_and_test(&sh->count))
do_release_stripe(conf, sh, temp_inactive_list);
 }
 
@@ -388,7 +388,7 @@ void raid5_release_stripe(struct stripe_head *sh)
 
/* Avoid release_list until the last reference.
 */
-   if (atomic_add_unless(&sh->count, -1, 1))
+   if (refcount_dec_not_one(&sh->count))
return;
 
if (unlikely(!conf->mddev->thread) ||
@@ -401,7 +401,7 @@ void raid5_release_stripe(struct stripe_head *sh)
 slow_path:
local_irq_save(flags);
/* we are ok here if STRIPE_ON_RELEASE_LIST is set or not */
-   if (atomic_dec_and_lock(&sh->count, &conf->device_lock)) {
+   if (refcount_dec_and_lock(&sh->count, &conf->device_lock)) {
INIT_LIST_HEAD(&list);
hash = sh->hash_lock_index;
do_release_stripe(conf, sh, &list);
@@ -491,7 +491,7 @@ static void init_stripe(struct stripe_head *sh, sector_t 
sector, int previous)
struct r5conf *conf = sh->raid_conf;
int i, seq;
 
-   BUG_ON(atomic_read(&sh->count) != 0);
+   BUG_ON(refcount_read(&sh->count) != 0);
BUG_ON(test_bit(STRIPE_HANDLE, &sh->state));
BUG_ON(stripe_operations_active(sh));
BUG_ON(sh->batch_head);
@@ -668,11 +668,11 @@ raid5_get_active_stripe(struct r5conf *conf, sector_t 
sector,
  &conf->cache_state);
} else {
init_stripe(sh, sector, previous);
-   atomic_inc(&sh->count);
+   refcount_inc(&sh->count);
}
-   } else if (!atomic_inc_not_zero(&sh->count)) {
+   } else if (!refcount_inc_not_zero(&sh->count)) {
spin_lock(&conf->device_lock);
-   if (!atomic_read(&sh->count)) {
+   if (!refcount_read(&sh->count)) {
if (!test_bit(STRIPE_HANDLE, &sh->state))
atomic_inc(&conf->active_stripes);
BUG_ON(list_empty(&sh->lru) &&
@@ -688,7 +688,7 @@ raid5_get_active_stripe(struct r5conf *conf, sector_t 
sector,
sh->group = NULL;
}
}
-   atomic_inc(&sh->count);
+

[PATCH 08/29] drivers, md: convert mddev.active from atomic_t to refcount_t

2017-03-06 Thread Elena Reshetova
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This helps avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova 
Signed-off-by: Hans Liljestrand 
Signed-off-by: Kees Cook 
Signed-off-by: David Windsor 
---
 drivers/md/md.c | 6 +++---
 drivers/md/md.h | 3 ++-
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 985374f..94c8ebf 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -449,7 +449,7 @@ EXPORT_SYMBOL(md_unplug);
 
 static inline struct mddev *mddev_get(struct mddev *mddev)
 {
-   atomic_inc(&mddev->active);
+   refcount_inc(&mddev->active);
return mddev;
 }
 
@@ -459,7 +459,7 @@ static void mddev_put(struct mddev *mddev)
 {
struct bio_set *bs = NULL;
 
-   if (!atomic_dec_and_lock(&mddev->active, &all_mddevs_lock))
+   if (!refcount_dec_and_lock(&mddev->active, &all_mddevs_lock))
return;
if (!mddev->raid_disks && list_empty(&mddev->disks) &&
mddev->ctime == 0 && !mddev->hold_active) {
@@ -495,7 +495,7 @@ void mddev_init(struct mddev *mddev)
INIT_LIST_HEAD(&mddev->all_mddevs);
setup_timer(&mddev->safemode_timer, md_safemode_timeout,
(unsigned long) mddev);
-   atomic_set(&mddev->active, 1);
+   refcount_set(&mddev->active, 1);
atomic_set(&mddev->openers, 0);
atomic_set(&mddev->active_io, 0);
spin_lock_init(&mddev->lock);
diff --git a/drivers/md/md.h b/drivers/md/md.h
index b8859cb..4811663 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -360,7 +361,7 @@ struct mddev {
 */
struct mutex open_mutex;
struct mutex reconfig_mutex;
-   atomic_t active; /* general refcount */
+   refcount_t active; /* general refcount */
atomic_t openers; /* number of active opens */
 
int changed; /* True if we might need to
-- 
2.7.4



[PATCH 00/29] drivers, misc refcount conversions

2017-03-06 Thread Elena Reshetova
This series replaces atomic_t reference counters in various drivers
with the new refcount_t type and API (see include/linux/refcount.h).
Doing so prevents intentional or accidental underflows or overflows
that can lead to use-after-free vulnerabilities.
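
To illustrate the difference from a plain atomic counter, here is a minimal user-space sketch of a saturating counter. It is not the kernel's implementation: demo_ref_t, DEMO_REF_SATURATED and the demo_ref_* helpers are hypothetical names, and the choice of UINT_MAX as the sticky value is an assumption made for the example.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for a reference counter; not the kernel type. */
typedef struct {
	atomic_uint refs;
} demo_ref_t;

#define DEMO_REF_SATURATED UINT_MAX

void demo_ref_set(demo_ref_t *r, unsigned int n)
{
	assert(r != NULL);
	atomic_store(&r->refs, n);
}

unsigned int demo_ref_read(demo_ref_t *r)
{
	return atomic_load(&r->refs);
}

/* Increment, but stick at the saturation value instead of wrapping to 0. */
void demo_ref_inc(demo_ref_t *r)
{
	unsigned int old = atomic_load(&r->refs);

	do {
		if (old == DEMO_REF_SATURATED)
			return; /* saturated: leak the object rather than wrap */
	} while (!atomic_compare_exchange_weak(&r->refs, &old, old + 1));
}

/* Decrement; returns true only when the count drops from 1 to 0. */
bool demo_ref_dec_and_test(demo_ref_t *r)
{
	unsigned int old = atomic_load(&r->refs);

	do {
		if (old == DEMO_REF_SATURATED)
			return false; /* saturated counters never reach zero */
		if (old == 0)
			return false; /* underflow attempt: refuse rather than wrap */
	} while (!atomic_compare_exchange_weak(&r->refs, &old, old - 1));

	return old == 1;
}
```

With a plain unsigned atomic, one increment past the maximum wraps to 0 and the next put frees an object that is still in use. In this sketch the counter instead stays pinned, trading a possible memory leak for the use-after-free.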

The patches below are fully independent and can be cherry-picked separately*.
Since we are converting all kernel subsystems in the same fashion, resulting
in about 300 patches, we have to group them in some way to keep the
submissions manageable. Please excuse the long cc list.

*with the exception of the media/vb2-related patches that depend on
vb2_vmarea_handler.refcount conversions.

Not run-time tested beyond booting and using a kernel with the refcount
conversions for my daily work.

If there are no objections to these patches,
I think they can go via Greg's drivers tree, as he suggested before.

Elena Reshetova (29):
  drivers, block: convert xen_blkif.refcnt from atomic_t to refcount_t
  drivers, firewire: convert fw_node.ref_count from atomic_t to
refcount_t
  drivers, char: convert vma_data.refcnt from atomic_t to refcount_t
  drivers, connector: convert cn_callback_entry.refcnt from atomic_t to
refcount_t
  drivers, md, bcache: convert cached_dev.count from atomic_t to
refcount_t
  drivers, md: convert dm_cache_metadata.ref_count from atomic_t to
refcount_t
  drivers, md: convert dm_dev_internal.count from atomic_t to refcount_t
  drivers, md: convert mddev.active from atomic_t to refcount_t
  drivers, md: convert table_device.count from atomic_t to refcount_t
  drivers, md: convert stripe_head.count from atomic_t to refcount_t
  drivers, media: convert cx88_core.refcount from atomic_t to refcount_t
  drivers, media: convert s2255_dev.num_channels from atomic_t to
refcount_t
  drivers, media: convert vb2_vmarea_handler.refcount from atomic_t to
refcount_t
  drivers, media: convert vb2_dc_buf.refcount from atomic_t to
refcount_t
  drivers, media: convert vb2_dma_sg_buf.refcount from atomic_t to
refcount_t
  drivers, media: convert vb2_vmalloc_buf.refcount from atomic_t to
refcount_t
  drivers, pci: convert hv_pci_dev.refs from atomic_t to refcount_t
  drivers, s390: convert urdev.ref_count from atomic_t to refcount_t
  drivers, s390: convert lcs_reply.refcnt from atomic_t to refcount_t
  drivers, s390: convert qeth_reply.refcnt from atomic_t to refcount_t
  drivers, s390: convert fc_fcp_pkt.ref_cnt from atomic_t to refcount_t
  drivers, scsi: convert iscsi_task.refcount from atomic_t to refcount_t
  drivers: convert vme_user_vma_priv.refcnt from atomic_t to refcount_t
  drivers: convert iblock_req.pending from atomic_t to refcount_t
  drivers, usb: convert ffs_data.ref from atomic_t to refcount_t
  drivers, usb: convert dev_data.count from atomic_t to refcount_t
  drivers, usb: convert ep_data.count from atomic_t to refcount_t
  drivers: convert sbd_duart.map_guard from atomic_t to refcount_t
  drivers, xen: convert grant_map.users from atomic_t to refcount_t

 drivers/block/xen-blkback/common.h |  7 +--
 drivers/block/xen-blkback/xenbus.c |  2 +-
 drivers/char/mspec.c   |  9 ++--
 drivers/connector/cn_queue.c   |  4 +-
 drivers/connector/connector.c  |  2 +-
 drivers/firewire/core-topology.c   |  2 +-
 drivers/firewire/core.h|  8 ++--
 drivers/md/bcache/bcache.h |  7 +--
 drivers/md/bcache/super.c  |  6 +--
 drivers/md/bcache/writeback.h  |  2 +-
 drivers/md/dm-cache-metadata.c |  9 ++--
 drivers/md/dm-table.c  |  6 +--
 drivers/md/dm.c| 12 +++--
 drivers/md/dm.h|  3 +-
 drivers/md/md.c|  6 +--
 drivers/md/md.h|  3 +-
 drivers/md/raid5-cache.c   |  8 ++--
 drivers/md/raid5.c | 66 +-
 drivers/md/raid5.h |  3 +-
 drivers/media/pci/cx88/cx88-cards.c|  2 +-
 drivers/media/pci/cx88/cx88-core.c |  4 +-
 drivers/media/pci/cx88/cx88.h  |  3 +-
 drivers/media/usb/s2255/s2255drv.c | 21 
 drivers/media/v4l2-core/videobuf2-dma-contig.c | 11 +++--
 drivers/media/v4l2-core/videobuf2-dma-sg.c | 11 +++--
 drivers/media/v4l2-core/videobuf2-memops.c |  6 +--
 drivers/media/v4l2-core/videobuf2-vmalloc.c| 11 +++--
 drivers/pci/host/pci-hyperv.c  |  9 ++--
 drivers/s390/char/vmur.c   |  8 ++--
 drivers/s390/char/vmur.h   |  4 +-
 drivers/s390/net/lcs.c |  8 ++--
 drivers/s390/net/lcs.h |  3 +-
 drivers/s390/net/qeth_core.h   |  3 +-
 drivers/s390/net/qeth_core_main.c  |  8 ++--
 drivers/

[PATCH 12/29] drivers, media: convert s2255_dev.num_channels from atomic_t to refcount_t

2017-03-06 Thread Elena Reshetova
The refcount_t type and its corresponding API should be
used instead of atomic_t when a variable is used as
a reference counter. This helps avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova 
Signed-off-by: Hans Liljestrand 
Signed-off-by: Kees Cook 
Signed-off-by: David Windsor 
---
 drivers/media/usb/s2255/s2255drv.c | 21 +++--
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/drivers/media/usb/s2255/s2255drv.c 
b/drivers/media/usb/s2255/s2255drv.c
index a9d4484..2b4b009 100644
--- a/drivers/media/usb/s2255/s2255drv.c
+++ b/drivers/media/usb/s2255/s2255drv.c
@@ -36,6 +36,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -256,7 +257,7 @@ struct s2255_vc {
 struct s2255_dev {
struct s2255_vc vc[MAX_CHANNELS];
struct v4l2_device  v4l2_dev;
-   atomic_t num_channels;
+   refcount_t num_channels;
int frames;
struct mutex lock;   /* channels[].vdev.lock */
struct mutex cmdlock; /* protects cmdbuf */
@@ -1581,11 +1582,11 @@ static void s2255_video_device_release(struct 
video_device *vdev)
container_of(vdev, struct s2255_vc, vdev);
 
dprintk(dev, 4, "%s, chnls: %d\n", __func__,
-   atomic_read(&dev->num_channels));
+   refcount_read(&dev->num_channels));
 
v4l2_ctrl_handler_free(&vc->hdl);
 
-   if (atomic_dec_and_test(&dev->num_channels))
+   if (refcount_dec_and_test(&dev->num_channels))
s2255_destroy(dev);
return;
 }
@@ -1688,7 +1689,7 @@ static int s2255_probe_v4l(struct s2255_dev *dev)
"failed to register video device!\n");
break;
}
-   atomic_inc(&dev->num_channels);
+   refcount_set(&dev->num_channels, 1);
v4l2_info(&dev->v4l2_dev, "V4L2 device registered as %s\n",
  video_device_node_name(&vc->vdev));
 
@@ -1696,11 +1697,11 @@ static int s2255_probe_v4l(struct s2255_dev *dev)
pr_info("Sensoray 2255 V4L driver Revision: %s\n",
S2255_VERSION);
/* if no channels registered, return error and probe will fail*/
-   if (atomic_read(&dev->num_channels) == 0) {
+   if (refcount_read(&dev->num_channels) == 0) {
v4l2_device_unregister(&dev->v4l2_dev);
return ret;
}
-   if (atomic_read(&dev->num_channels) != MAX_CHANNELS)
+   if (refcount_read(&dev->num_channels) != MAX_CHANNELS)
pr_warn("s2255: Not all channels available.\n");
return 0;
 }
@@ -2248,7 +2249,7 @@ static int s2255_probe(struct usb_interface *interface,
goto errorFWDATA1;
}
 
-   atomic_set(&dev->num_channels, 0);
+   refcount_set(&dev->num_channels, 0);
dev->pid = id->idProduct;
dev->fw_data = kzalloc(sizeof(struct s2255_fw), GFP_KERNEL);
if (!dev->fw_data)
@@ -2368,12 +2369,12 @@ static void s2255_disconnect(struct usb_interface 
*interface)
 {
struct s2255_dev *dev = to_s2255_dev(usb_get_intfdata(interface));
int i;
-   int channels = atomic_read(&dev->num_channels);
+   int channels = refcount_read(&dev->num_channels);
mutex_lock(&dev->lock);
v4l2_device_disconnect(&dev->v4l2_dev);
mutex_unlock(&dev->lock);
/*see comments in the uvc_driver.c usb disconnect function */
-   atomic_inc(&dev->num_channels);
+   refcount_inc(&dev->num_channels);
/* unregister each video device. */
for (i = 0; i < channels; i++)
video_unregister_device(&dev->vc[i].vdev);
@@ -2386,7 +2387,7 @@ static void s2255_disconnect(struct usb_interface 
*interface)
dev->vc[i].vidstatus_ready = 1;
wake_up(&dev->vc[i].wait_vidstatus);
}
-   if (atomic_dec_and_test(&dev->num_channels))
+   if (refcount_dec_and_test(&dev->num_channels))
s2255_destroy(dev);
dev_info(&interface->dev, "%s\n", __func__);
 }
-- 
2.7.4



[PATCH 11/29] drivers, media: convert cx88_core.refcount from atomic_t to refcount_t

2017-03-06 Thread Elena Reshetova
The refcount_t type and its corresponding API should be
used instead of atomic_t when a variable is used as
a reference counter. This helps avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova 
Signed-off-by: Hans Liljestrand 
Signed-off-by: Kees Cook 
Signed-off-by: David Windsor 
---
 drivers/media/pci/cx88/cx88-cards.c | 2 +-
 drivers/media/pci/cx88/cx88-core.c  | 4 ++--
 drivers/media/pci/cx88/cx88.h   | 3 ++-
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/media/pci/cx88/cx88-cards.c 
b/drivers/media/pci/cx88/cx88-cards.c
index cdfbde2..7fc5f5f 100644
--- a/drivers/media/pci/cx88/cx88-cards.c
+++ b/drivers/media/pci/cx88/cx88-cards.c
@@ -3670,7 +3670,7 @@ struct cx88_core *cx88_core_create(struct pci_dev *pci, 
int nr)
if (!core)
return NULL;
 
-   atomic_inc(&core->refcount);
+   refcount_set(&core->refcount, 1);
core->pci_bus  = pci->bus->number;
core->pci_slot = PCI_SLOT(pci->devfn);
core->pci_irqmask = PCI_INT_RISC_RD_BERRINT | PCI_INT_RISC_WR_BERRINT |
diff --git a/drivers/media/pci/cx88/cx88-core.c 
b/drivers/media/pci/cx88/cx88-core.c
index 973a9cd4..8bfa5b7 100644
--- a/drivers/media/pci/cx88/cx88-core.c
+++ b/drivers/media/pci/cx88/cx88-core.c
@@ -1052,7 +1052,7 @@ struct cx88_core *cx88_core_get(struct pci_dev *pci)
mutex_unlock(&devlist);
return NULL;
}
-   atomic_inc(&core->refcount);
+   refcount_inc(&core->refcount);
mutex_unlock(&devlist);
return core;
}
@@ -1073,7 +1073,7 @@ void cx88_core_put(struct cx88_core *core, struct pci_dev 
*pci)
release_mem_region(pci_resource_start(pci, 0),
   pci_resource_len(pci, 0));
 
-   if (!atomic_dec_and_test(&core->refcount))
+   if (!refcount_dec_and_test(&core->refcount))
return;
 
mutex_lock(&devlist);
diff --git a/drivers/media/pci/cx88/cx88.h b/drivers/media/pci/cx88/cx88.h
index 115414c..16c1313 100644
--- a/drivers/media/pci/cx88/cx88.h
+++ b/drivers/media/pci/cx88/cx88.h
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -339,7 +340,7 @@ struct cx8802_dev;
 
 struct cx88_core {
struct list_head   devlist;
-   atomic_t   refcount;
+   refcount_t   refcount;
 
/* board name */
int nr;
-- 
2.7.4



[PATCH 06/29] drivers, md: convert dm_cache_metadata.ref_count from atomic_t to refcount_t

2017-03-06 Thread Elena Reshetova
The refcount_t type and its corresponding API should be
used instead of atomic_t when a variable is used as
a reference counter. This helps avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova 
Signed-off-by: Hans Liljestrand 
Signed-off-by: Kees Cook 
Signed-off-by: David Windsor 
---
 drivers/md/dm-cache-metadata.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/md/dm-cache-metadata.c b/drivers/md/dm-cache-metadata.c
index e4c2c1a..6d26e71 100644
--- a/drivers/md/dm-cache-metadata.c
+++ b/drivers/md/dm-cache-metadata.c
@@ -13,6 +13,7 @@
 #include "persistent-data/dm-transaction-manager.h"
 
 #include 
+#include 
 
 /**/
 
@@ -102,7 +103,7 @@ struct cache_disk_superblock {
 } __packed;
 
 struct dm_cache_metadata {
-   atomic_t ref_count;
+   refcount_t ref_count;
struct list_head list;
 
unsigned version;
@@ -756,7 +757,7 @@ static struct dm_cache_metadata *metadata_open(struct 
block_device *bdev,
}
 
cmd->version = metadata_version;
-   atomic_set(&cmd->ref_count, 1);
+   refcount_set(&cmd->ref_count, 1);
init_rwsem(&cmd->root_lock);
cmd->bdev = bdev;
cmd->data_block_size = data_block_size;
@@ -794,7 +795,7 @@ static struct dm_cache_metadata *lookup(struct block_device 
*bdev)
 
list_for_each_entry(cmd, &table, list)
if (cmd->bdev == bdev) {
-   atomic_inc(&cmd->ref_count);
+   refcount_inc(&cmd->ref_count);
return cmd;
}
 
@@ -865,7 +866,7 @@ struct dm_cache_metadata *dm_cache_metadata_open(struct 
block_device *bdev,
 
 void dm_cache_metadata_close(struct dm_cache_metadata *cmd)
 {
-   if (atomic_dec_and_test(&cmd->ref_count)) {
+   if (refcount_dec_and_test(&cmd->ref_count)) {
mutex_lock(&table_lock);
list_del(&cmd->list);
mutex_unlock(&table_lock);
-- 
2.7.4



[PATCH 22/29] drivers, scsi: convert iscsi_task.refcount from atomic_t to refcount_t

2017-03-06 Thread Elena Reshetova
The refcount_t type and its corresponding API should be
used instead of atomic_t when a variable is used as
a reference counter. This helps avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova 
Signed-off-by: Hans Liljestrand 
Signed-off-by: Kees Cook 
Signed-off-by: David Windsor 
---
 drivers/scsi/libiscsi.c| 8 
 drivers/scsi/qedi/qedi_iscsi.c | 2 +-
 include/scsi/libiscsi.h| 3 ++-
 3 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
index 834d121..7eb1d2c 100644
--- a/drivers/scsi/libiscsi.c
+++ b/drivers/scsi/libiscsi.c
@@ -516,13 +516,13 @@ static void iscsi_free_task(struct iscsi_task *task)
 
 void __iscsi_get_task(struct iscsi_task *task)
 {
-   atomic_inc(&task->refcount);
+   refcount_inc(&task->refcount);
 }
 EXPORT_SYMBOL_GPL(__iscsi_get_task);
 
 void __iscsi_put_task(struct iscsi_task *task)
 {
-   if (atomic_dec_and_test(&task->refcount))
+   if (refcount_dec_and_test(&task->refcount))
iscsi_free_task(task);
 }
 EXPORT_SYMBOL_GPL(__iscsi_put_task);
@@ -744,7 +744,7 @@ __iscsi_conn_send_pdu(struct iscsi_conn *conn, struct 
iscsi_hdr *hdr,
 * released by the lld when it has transmitted the task for
 * pdus we do not expect a response for.
 */
-   atomic_set(&task->refcount, 1);
+   refcount_set(&task->refcount, 1);
task->conn = conn;
task->sc = NULL;
INIT_LIST_HEAD(&task->running);
@@ -1616,7 +1616,7 @@ static inline struct iscsi_task *iscsi_alloc_task(struct 
iscsi_conn *conn,
sc->SCp.phase = conn->session->age;
sc->SCp.ptr = (char *) task;
 
-   atomic_set(&task->refcount, 1);
+   refcount_set(&task->refcount, 1);
task->state = ISCSI_TASK_PENDING;
task->conn = conn;
task->sc = sc;
diff --git a/drivers/scsi/qedi/qedi_iscsi.c b/drivers/scsi/qedi/qedi_iscsi.c
index b9f79d3..3895bd5 100644
--- a/drivers/scsi/qedi/qedi_iscsi.c
+++ b/drivers/scsi/qedi/qedi_iscsi.c
@@ -1372,7 +1372,7 @@ static void qedi_cleanup_task(struct iscsi_task *task)
 {
if (!task->sc || task->state == ISCSI_TASK_PENDING) {
QEDI_INFO(NULL, QEDI_LOG_IO, "Returning ref_cnt=%d\n",
- atomic_read(&task->refcount));
+ refcount_read(&task->refcount));
return;
}
 
diff --git a/include/scsi/libiscsi.h b/include/scsi/libiscsi.h
index b0e275d..24d74b5 100644
--- a/include/scsi/libiscsi.h
+++ b/include/scsi/libiscsi.h
@@ -29,6 +29,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -139,7 +140,7 @@ struct iscsi_task {
 
/* state set/tested under session->lock */
int state;
-   atomic_t refcount;
+   refcount_t refcount;
struct list_head running; /* running cmd list */
void *dd_data; /* driver/transport data */
 };
-- 
2.7.4



[PATCH 05/29] drivers, md, bcache: convert cached_dev.count from atomic_t to refcount_t

2017-03-06 Thread Elena Reshetova
The refcount_t type and its corresponding API should be
used instead of atomic_t when a variable is used as
a reference counter. This helps avoid accidental
refcounter overflows that might lead to use-after-free
situations.
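
cached_dev_get() in this patch takes a reference only when the count is still non-zero. A minimal user-space sketch of that "increment unless zero" step, using C11 atomics, might look like the following; demo_inc_not_zero is a hypothetical helper written for illustration and is not the kernel's refcount_inc_not_zero().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: take a reference only if the object is still live.
 * Returns false, leaving the count untouched, when it is already zero.
 */
bool demo_inc_not_zero(atomic_uint *refs)
{
	unsigned int old;

	assert(refs != NULL);
	old = atomic_load(refs);
	do {
		if (old == 0)
			return false; /* last reference already dropped */
	} while (!atomic_compare_exchange_weak(refs, &old, old + 1));

	return true;
}
```

The compare-and-exchange loop matters here: a separate read followed by an unconditional increment could revive an object whose last reference was dropped between the two steps.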

Signed-off-by: Elena Reshetova 
Signed-off-by: Hans Liljestrand 
Signed-off-by: Kees Cook 
Signed-off-by: David Windsor 
---
 drivers/md/bcache/bcache.h| 7 ---
 drivers/md/bcache/super.c | 6 +++---
 drivers/md/bcache/writeback.h | 2 +-
 3 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
index c3ea03c..de2be28 100644
--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -184,6 +184,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -299,7 +300,7 @@ struct cached_dev {
struct semaphore sb_write_mutex;
 
/* Refcount on the cache set. Always nonzero when we're caching. */
-   atomic_t count;
+   refcount_t count;
struct work_struct  detach;
 
/*
@@ -805,13 +806,13 @@ do {  
\
 
 static inline void cached_dev_put(struct cached_dev *dc)
 {
-   if (atomic_dec_and_test(&dc->count))
+   if (refcount_dec_and_test(&dc->count))
schedule_work(&dc->detach);
 }
 
 static inline bool cached_dev_get(struct cached_dev *dc)
 {
-   if (!atomic_inc_not_zero(&dc->count))
+   if (!refcount_inc_not_zero(&dc->count))
return false;
 
/* Paired with the mb in cached_dev_attach */
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 85e3f21..cc36ce4 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -891,7 +891,7 @@ static void cached_dev_detach_finish(struct work_struct *w)
closure_init_stack(&cl);
 
BUG_ON(!test_bit(BCACHE_DEV_DETACHING, &dc->disk.flags));
-   BUG_ON(atomic_read(&dc->count));
+   BUG_ON(refcount_read(&dc->count));
 
mutex_lock(&bch_register_lock);
 
@@ -1018,7 +1018,7 @@ int bch_cached_dev_attach(struct cached_dev *dc, struct 
cache_set *c)
 * dc->c must be set before dc->count != 0 - paired with the mb in
 * cached_dev_get()
 */
-   atomic_set(&dc->count, 1);
+   refcount_set(&dc->count, 1);
 
/* Block writeback thread, but spawn it */
down_write(&dc->writeback_lock);
@@ -1030,7 +1030,7 @@ int bch_cached_dev_attach(struct cached_dev *dc, struct 
cache_set *c)
if (BDEV_STATE(&dc->sb) == BDEV_STATE_DIRTY) {
bch_sectors_dirty_init(dc);
atomic_set(&dc->has_dirty, 1);
-   atomic_inc(&dc->count);
+   refcount_inc(&dc->count);
bch_writeback_queue(dc);
}
 
diff --git a/drivers/md/bcache/writeback.h b/drivers/md/bcache/writeback.h
index 629bd1a..5bac1b0 100644
--- a/drivers/md/bcache/writeback.h
+++ b/drivers/md/bcache/writeback.h
@@ -70,7 +70,7 @@ static inline void bch_writeback_add(struct cached_dev *dc)
 {
if (!atomic_read(&dc->has_dirty) &&
!atomic_xchg(&dc->has_dirty, 1)) {
-   atomic_inc(&dc->count);
+   refcount_inc(&dc->count);
 
if (BDEV_STATE(&dc->sb) != BDEV_STATE_DIRTY) {
SET_BDEV_STATE(&dc->sb, BDEV_STATE_DIRTY);
-- 
2.7.4



[PATCH 28/29] drivers: convert sbd_duart.map_guard from atomic_t to refcount_t

2017-03-06 Thread Elena Reshetova
The refcount_t type and its corresponding API should be
used instead of atomic_t when a variable is used as
a reference counter. This helps avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova 
Signed-off-by: Hans Liljestrand 
Signed-off-by: Kees Cook 
Signed-off-by: David Windsor 
---
 drivers/tty/serial/sb1250-duart.c | 18 +++---
 1 file changed, 7 insertions(+), 11 deletions(-)

diff --git a/drivers/tty/serial/sb1250-duart.c 
b/drivers/tty/serial/sb1250-duart.c
index 771f361..041625c 100644
--- a/drivers/tty/serial/sb1250-duart.c
+++ b/drivers/tty/serial/sb1250-duart.c
@@ -41,7 +41,7 @@
 #include 
 #include 
 
-#include 
+#include 
 #include 
 #include 
 
@@ -103,7 +103,7 @@ struct sbd_port {
 struct sbd_duart {
struct sbd_port sport[2];
unsigned long   mapctrl;
-   atomic_t map_guard;
+   refcount_t map_guard;
 };
 
 #define to_sport(uport) container_of(uport, struct sbd_port, port)
@@ -654,15 +654,13 @@ static void sbd_release_port(struct uart_port *uport)
 {
struct sbd_port *sport = to_sport(uport);
struct sbd_duart *duart = sport->duart;
-   int map_guard;
 
iounmap(sport->memctrl);
sport->memctrl = NULL;
iounmap(uport->membase);
uport->membase = NULL;
 
-   map_guard = atomic_add_return(-1, &duart->map_guard);
-   if (!map_guard)
+   if (refcount_dec_and_test(&duart->map_guard))
release_mem_region(duart->mapctrl, DUART_CHANREG_SPACING);
release_mem_region(uport->mapbase, DUART_CHANREG_SPACING);
 }
@@ -698,7 +696,6 @@ static int sbd_request_port(struct uart_port *uport)
 {
const char *err = KERN_ERR "sbd: Unable to reserve MMIO resource\n";
struct sbd_duart *duart = to_sport(uport)->duart;
-   int map_guard;
int ret = 0;
 
if (!request_mem_region(uport->mapbase, DUART_CHANREG_SPACING,
@@ -706,11 +703,11 @@ static int sbd_request_port(struct uart_port *uport)
printk(err);
return -EBUSY;
}
-   map_guard = atomic_add_return(1, &duart->map_guard);
-   if (map_guard == 1) {
+   refcount_inc(&duart->map_guard);
+   if (refcount_read(&duart->map_guard) == 1) {
if (!request_mem_region(duart->mapctrl, DUART_CHANREG_SPACING,
"sb1250-duart")) {
-   atomic_add(-1, &duart->map_guard);
+   refcount_dec(&duart->map_guard);
printk(err);
ret = -EBUSY;
}
@@ -718,8 +715,7 @@ static int sbd_request_port(struct uart_port *uport)
if (!ret) {
ret = sbd_map_port(uport);
if (ret) {
-   map_guard = atomic_add_return(-1, &duart->map_guard);
-   if (!map_guard)
+   if (refcount_dec_and_test(&duart->map_guard))
release_mem_region(duart->mapctrl,
   DUART_CHANREG_SPACING);
}
-- 
2.7.4



[PATCH 04/29] drivers, connector: convert cn_callback_entry.refcnt from atomic_t to refcount_t

2017-03-06 Thread Elena Reshetova
The refcount_t type and its corresponding API should be
used instead of atomic_t when a variable is used as
a reference counter. This helps avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova 
Signed-off-by: Hans Liljestrand 
Signed-off-by: Kees Cook 
Signed-off-by: David Windsor 
---
 drivers/connector/cn_queue.c  | 4 ++--
 drivers/connector/connector.c | 2 +-
 include/linux/connector.h | 4 ++--
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/connector/cn_queue.c b/drivers/connector/cn_queue.c
index 1f8bf05..9c54fdf 100644
--- a/drivers/connector/cn_queue.c
+++ b/drivers/connector/cn_queue.c
@@ -45,7 +45,7 @@ cn_queue_alloc_callback_entry(struct cn_queue_dev *dev, const 
char *name,
return NULL;
}
 
-   atomic_set(&cbq->refcnt, 1);
+   refcount_set(&cbq->refcnt, 1);
 
atomic_inc(&dev->refcnt);
cbq->pdev = dev;
@@ -58,7 +58,7 @@ cn_queue_alloc_callback_entry(struct cn_queue_dev *dev, const 
char *name,
 
 void cn_queue_release_callback(struct cn_callback_entry *cbq)
 {
-   if (!atomic_dec_and_test(&cbq->refcnt))
+   if (!refcount_dec_and_test(&cbq->refcnt))
return;
 
atomic_dec(&cbq->pdev->refcnt);
diff --git a/drivers/connector/connector.c b/drivers/connector/connector.c
index 25693b0..8615594b 100644
--- a/drivers/connector/connector.c
+++ b/drivers/connector/connector.c
@@ -157,7 +157,7 @@ static int cn_call_callback(struct sk_buff *skb)
spin_lock_bh(&dev->cbdev->queue_lock);
list_for_each_entry(i, &dev->cbdev->queue_list, callback_entry) {
if (cn_cb_equal(&i->id.id, &msg->id)) {
-   atomic_inc(&i->refcnt);
+   refcount_inc(&i->refcnt);
cbq = i;
break;
}
diff --git a/include/linux/connector.h b/include/linux/connector.h
index f8fe863..032102b 100644
--- a/include/linux/connector.h
+++ b/include/linux/connector.h
@@ -22,7 +22,7 @@
 #define __CONNECTOR_H
 
 
-#include 
+#include 
 
 #include 
 #include 
@@ -49,7 +49,7 @@ struct cn_callback_id {
 
 struct cn_callback_entry {
struct list_head callback_entry;
-   atomic_t refcnt;
+   refcount_t refcnt;
struct cn_queue_dev *pdev;
 
struct cn_callback_id id;
-- 
2.7.4



[PATCH 16/29] drivers, media: convert vb2_vmalloc_buf.refcount from atomic_t to refcount_t

2017-03-06 Thread Elena Reshetova
The refcount_t type and its corresponding API should be
used instead of atomic_t when a variable is used as
a reference counter. This helps avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova 
Signed-off-by: Hans Liljestrand 
Signed-off-by: Kees Cook 
Signed-off-by: David Windsor 
---
 drivers/media/v4l2-core/videobuf2-vmalloc.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/media/v4l2-core/videobuf2-vmalloc.c 
b/drivers/media/v4l2-core/videobuf2-vmalloc.c
index 3f77814..f83253a 100644
--- a/drivers/media/v4l2-core/videobuf2-vmalloc.c
+++ b/drivers/media/v4l2-core/videobuf2-vmalloc.c
@@ -13,6 +13,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -26,7 +27,7 @@ struct vb2_vmalloc_buf {
struct frame_vector *vec;
enum dma_data_direction dma_dir;
unsigned long   size;
-   atomic_t refcount;
+   refcount_t refcount;
struct vb2_vmarea_handler   handler;
struct dma_buf  *dbuf;
 };
@@ -56,7 +57,7 @@ static void *vb2_vmalloc_alloc(struct device *dev, unsigned 
long attrs,
return ERR_PTR(-ENOMEM);
}
 
-   atomic_inc(&buf->refcount);
+   refcount_set(&buf->refcount, 1);
return buf;
 }
 
@@ -64,7 +65,7 @@ static void vb2_vmalloc_put(void *buf_priv)
 {
struct vb2_vmalloc_buf *buf = buf_priv;
 
-   if (atomic_dec_and_test(&buf->refcount)) {
+   if (refcount_dec_and_test(&buf->refcount)) {
vfree(buf->vaddr);
kfree(buf);
}
@@ -161,7 +162,7 @@ static void *vb2_vmalloc_vaddr(void *buf_priv)
 static unsigned int vb2_vmalloc_num_users(void *buf_priv)
 {
struct vb2_vmalloc_buf *buf = buf_priv;
-   return atomic_read(&buf->refcount);
+   return refcount_read(&buf->refcount);
 }
 
 static int vb2_vmalloc_mmap(void *buf_priv, struct vm_area_struct *vma)
@@ -368,7 +369,7 @@ static struct dma_buf *vb2_vmalloc_get_dmabuf(void 
*buf_priv, unsigned long flag
return NULL;
 
/* dmabuf keeps reference to vb2 buffer */
-   atomic_inc(&buf->refcount);
+   refcount_inc(&buf->refcount);
 
return dbuf;
 }
-- 
2.7.4



[PATCH 13/29] drivers, media: convert vb2_vmarea_handler.refcount from atomic_t to refcount_t

2017-03-06 Thread Elena Reshetova
The refcount_t type and its corresponding API should be
used instead of atomic_t when a variable is used as
a reference counter. This helps avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova 
Signed-off-by: Hans Liljestrand 
Signed-off-by: Kees Cook 
Signed-off-by: David Windsor 
---
 drivers/media/v4l2-core/videobuf2-memops.c | 6 +++---
 include/media/videobuf2-memops.h   | 3 ++-
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/media/v4l2-core/videobuf2-memops.c 
b/drivers/media/v4l2-core/videobuf2-memops.c
index 1cd322e..4bb8424 100644
--- a/drivers/media/v4l2-core/videobuf2-memops.c
+++ b/drivers/media/v4l2-core/videobuf2-memops.c
@@ -96,10 +96,10 @@ static void vb2_common_vm_open(struct vm_area_struct *vma)
struct vb2_vmarea_handler *h = vma->vm_private_data;
 
pr_debug("%s: %p, refcount: %d, vma: %08lx-%08lx\n",
-  __func__, h, atomic_read(h->refcount), vma->vm_start,
+  __func__, h, refcount_read(h->refcount), vma->vm_start,
   vma->vm_end);
 
-   atomic_inc(h->refcount);
+   refcount_inc(h->refcount);
 }
 
 /**
@@ -114,7 +114,7 @@ static void vb2_common_vm_close(struct vm_area_struct *vma)
struct vb2_vmarea_handler *h = vma->vm_private_data;
 
pr_debug("%s: %p, refcount: %d, vma: %08lx-%08lx\n",
-  __func__, h, atomic_read(h->refcount), vma->vm_start,
+  __func__, h, refcount_read(h->refcount), vma->vm_start,
   vma->vm_end);
 
h->put(h->arg);
diff --git a/include/media/videobuf2-memops.h b/include/media/videobuf2-memops.h
index 36565c7a..a6ed091 100644
--- a/include/media/videobuf2-memops.h
+++ b/include/media/videobuf2-memops.h
@@ -16,6 +16,7 @@
 
 #include 
 #include 
+#include 
 
 /**
  * struct vb2_vmarea_handler - common vma refcount tracking handler
@@ -25,7 +26,7 @@
  * @arg:   argument for @put callback
  */
 struct vb2_vmarea_handler {
-   atomic_t *refcount;
+   refcount_t *refcount;
void (*put)(void *arg);
void *arg;
 };
-- 
2.7.4



[PATCH 14/29] drivers, media: convert vb2_dc_buf.refcount from atomic_t to refcount_t

2017-03-06 Thread Elena Reshetova
The refcount_t type and its corresponding API should be
used instead of atomic_t when a variable is used as
a reference counter. This helps avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova 
Signed-off-by: Hans Liljestrand 
Signed-off-by: Kees Cook 
Signed-off-by: David Windsor 
---
 drivers/media/v4l2-core/videobuf2-dma-contig.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/media/v4l2-core/videobuf2-dma-contig.c 
b/drivers/media/v4l2-core/videobuf2-dma-contig.c
index fb6a177..d29a07f 100644
--- a/drivers/media/v4l2-core/videobuf2-dma-contig.c
+++ b/drivers/media/v4l2-core/videobuf2-dma-contig.c
@@ -12,6 +12,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -34,7 +35,7 @@ struct vb2_dc_buf {
 
/* MMAP related */
struct vb2_vmarea_handler   handler;
-   atomic_t refcount;
+   refcount_t refcount;
struct sg_table *sgt_base;
 
/* DMABUF related */
@@ -86,7 +87,7 @@ static unsigned int vb2_dc_num_users(void *buf_priv)
 {
struct vb2_dc_buf *buf = buf_priv;
 
-   return atomic_read(&buf->refcount);
+   return refcount_read(&buf->refcount);
 }
 
 static void vb2_dc_prepare(void *buf_priv)
@@ -122,7 +123,7 @@ static void vb2_dc_put(void *buf_priv)
 {
struct vb2_dc_buf *buf = buf_priv;
 
-   if (!atomic_dec_and_test(&buf->refcount))
+   if (!refcount_dec_and_test(&buf->refcount))
return;
 
if (buf->sgt_base) {
@@ -170,7 +171,7 @@ static void *vb2_dc_alloc(struct device *dev, unsigned long 
attrs,
buf->handler.put = vb2_dc_put;
buf->handler.arg = buf;
 
-   atomic_inc(&buf->refcount);
+   refcount_set(&buf->refcount, 1);
 
return buf;
 }
@@ -407,7 +408,7 @@ static struct dma_buf *vb2_dc_get_dmabuf(void *buf_priv, 
unsigned long flags)
return NULL;
 
/* dmabuf keeps reference to vb2 buffer */
-   atomic_inc(&buf->refcount);
+   refcount_inc(&buf->refcount);
 
return dbuf;
 }
-- 
2.7.4



[PATCH 09/29] drivers, md: convert table_device.count from atomic_t to refcount_t

2017-03-06 Thread Elena Reshetova
The refcount_t type and its corresponding API should be
used instead of atomic_t when a variable is used as
a reference counter. This helps avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova 
Signed-off-by: Hans Liljestrand 
Signed-off-by: Kees Cook 
Signed-off-by: David Windsor 
---
 drivers/md/dm.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 9f37d7f..cba91c3 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define DM_MSG_PREFIX "core"
 
@@ -96,7 +97,7 @@ struct dm_md_mempools {
 
 struct table_device {
struct list_head list;
-   atomic_t count;
+   refcount_t count;
struct dm_dev dm_dev;
 };
 
@@ -680,10 +681,11 @@ int dm_get_table_device(struct mapped_device *md, dev_t 
dev, fmode_t mode,
 
format_dev_t(td->dm_dev.name, dev);
 
-   atomic_set(&td->count, 0);
+   refcount_set(&td->count, 1);
list_add(&td->list, &md->table_devices);
+   } else {
+   refcount_inc(&td->count);
}
-   atomic_inc(&td->count);
mutex_unlock(&md->table_devices_lock);
 
*result = &td->dm_dev;
@@ -696,7 +698,7 @@ void dm_put_table_device(struct mapped_device *md, struct 
dm_dev *d)
struct table_device *td = container_of(d, struct table_device, dm_dev);
 
mutex_lock(&md->table_devices_lock);
-   if (atomic_dec_and_test(&td->count)) {
+   if (refcount_dec_and_test(&td->count)) {
close_table_device(td, md);
list_del(&td->list);
kfree(td);
@@ -713,7 +715,7 @@ static void free_table_devices(struct list_head *devices)
struct table_device *td = list_entry(tmp, struct table_device, 
list);
 
DMWARN("dm_destroy: %s still exists with %d references",
-  td->dm_dev.name, atomic_read(&td->count));
+  td->dm_dev.name, refcount_read(&td->count));
kfree(td);
}
 }
-- 
2.7.4
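
Editorial note, not part of the patch: the diff above changes the
initialisation from "set to 0, then increment" to "set to 1 for a new
entry, increment only an existing one". The likely reason is that a
checked reference counter treats an increment from zero as a bug and
saturates instead of wrapping. The following is a minimal userspace
sketch of that idea using C11 atomics. The sketch_* names are
hypothetical and this is not the kernel's refcount_t implementation.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a checked reference counter. */
struct sketch_ref {
	atomic_uint refs;
};

static void sketch_ref_set(struct sketch_ref *r, unsigned int n)
{
	atomic_store(&r->refs, n);
}

static unsigned int sketch_ref_read(struct sketch_ref *r)
{
	return atomic_load(&r->refs);
}

/*
 * Take a reference. Returns false without changing the count if it is
 * already zero, since a zero count means the object may be going away.
 */
static bool sketch_ref_inc(struct sketch_ref *r)
{
	unsigned int old = atomic_load(&r->refs);

	while (old != 0) {
		if (old == UINT_MAX)
			return true;	/* saturated: stay pinned, never wrap to 0 */
		if (atomic_compare_exchange_weak(&r->refs, &old, old + 1))
			return true;
		/* CAS failed: 'old' now holds the current value, retry. */
	}
	return false;
}

/*
 * Same shape as the converted get path: a freshly created entry starts
 * at 1, an existing entry is incremented.
 */
static bool sketch_get(struct sketch_ref *r, bool is_new)
{
	if (is_new) {
		sketch_ref_set(r, 1);
		return true;
	}
	return sketch_ref_inc(r);
}
```

With a counter like this, the old "set to 0, then increment" sequence
would be rejected at the increment, which is why the creation path has
to start the count at 1.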



[PATCH 23/29] drivers: convert vme_user_vma_priv.refcnt from atomic_t to refcount_t

2017-03-06 Thread Elena Reshetova
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This helps avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova 
Signed-off-by: Hans Liljestrand 
Signed-off-by: Kees Cook 
Signed-off-by: David Windsor 
---
 drivers/staging/vme/devices/vme_user.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/staging/vme/devices/vme_user.c 
b/drivers/staging/vme/devices/vme_user.c
index 69e9a770..a3d4610 100644
--- a/drivers/staging/vme/devices/vme_user.c
+++ b/drivers/staging/vme/devices/vme_user.c
@@ -17,7 +17,7 @@
 
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
-#include 
+#include 
 #include 
 #include 
 #include 
@@ -118,7 +118,7 @@ static const int type[VME_DEVS] = { MASTER_MINOR,   
MASTER_MINOR,
 
 struct vme_user_vma_priv {
unsigned int minor;
-   atomic_t refcnt;
+   refcount_t refcnt;
 };
 
 static ssize_t resource_to_user(int minor, char __user *buf, size_t count,
@@ -430,7 +430,7 @@ static void vme_user_vm_open(struct vm_area_struct *vma)
 {
struct vme_user_vma_priv *vma_priv = vma->vm_private_data;
 
-   atomic_inc(&vma_priv->refcnt);
+   refcount_inc(&vma_priv->refcnt);
 }
 
 static void vme_user_vm_close(struct vm_area_struct *vma)
@@ -438,7 +438,7 @@ static void vme_user_vm_close(struct vm_area_struct *vma)
struct vme_user_vma_priv *vma_priv = vma->vm_private_data;
unsigned int minor = vma_priv->minor;
 
-   if (!atomic_dec_and_test(&vma_priv->refcnt))
+   if (!refcount_dec_and_test(&vma_priv->refcnt))
return;
 
mutex_lock(&image[minor].mutex);
@@ -473,7 +473,7 @@ static int vme_user_master_mmap(unsigned int minor, struct 
vm_area_struct *vma)
}
 
vma_priv->minor = minor;
-   atomic_set(&vma_priv->refcnt, 1);
+   refcount_set(&vma_priv->refcnt, 1);
vma->vm_ops = &vme_user_vm_ops;
vma->vm_private_data = vma_priv;
 
-- 
2.7.4



[PATCH 15/29] drivers, media: convert vb2_dma_sg_buf.refcount from atomic_t to refcount_t

2017-03-06 Thread Elena Reshetova
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This helps avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova 
Signed-off-by: Hans Liljestrand 
Signed-off-by: Kees Cook 
Signed-off-by: David Windsor 
---
 drivers/media/v4l2-core/videobuf2-dma-sg.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/media/v4l2-core/videobuf2-dma-sg.c 
b/drivers/media/v4l2-core/videobuf2-dma-sg.c
index ecff8f4..29fde1a 100644
--- a/drivers/media/v4l2-core/videobuf2-dma-sg.c
+++ b/drivers/media/v4l2-core/videobuf2-dma-sg.c
@@ -12,6 +12,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -46,7 +47,7 @@ struct vb2_dma_sg_buf {
struct sg_table *dma_sgt;
size_t  size;
unsigned int num_pages;
-   atomic_t refcount;
+   refcount_t  refcount;
struct vb2_vmarea_handler   handler;
 
struct dma_buf_attachment   *db_attach;
@@ -150,7 +151,7 @@ static void *vb2_dma_sg_alloc(struct device *dev, unsigned 
long dma_attrs,
buf->handler.put = vb2_dma_sg_put;
buf->handler.arg = buf;
 
-   atomic_inc(&buf->refcount);
+   refcount_set(&buf->refcount, 1);
 
dprintk(1, "%s: Allocated buffer of %d pages\n",
__func__, buf->num_pages);
@@ -176,7 +177,7 @@ static void vb2_dma_sg_put(void *buf_priv)
struct sg_table *sgt = &buf->sg_table;
int i = buf->num_pages;
 
-   if (atomic_dec_and_test(&buf->refcount)) {
+   if (refcount_dec_and_test(&buf->refcount)) {
dprintk(1, "%s: Freeing buffer of %d pages\n", __func__,
buf->num_pages);
dma_unmap_sg_attrs(buf->dev, sgt->sgl, sgt->orig_nents,
@@ -320,7 +321,7 @@ static unsigned int vb2_dma_sg_num_users(void *buf_priv)
 {
struct vb2_dma_sg_buf *buf = buf_priv;
 
-   return atomic_read(&buf->refcount);
+   return refcount_read(&buf->refcount);
 }
 
 static int vb2_dma_sg_mmap(void *buf_priv, struct vm_area_struct *vma)
@@ -530,7 +531,7 @@ static struct dma_buf *vb2_dma_sg_get_dmabuf(void 
*buf_priv, unsigned long flags
return NULL;
 
/* dmabuf keeps reference to vb2 buffer */
-   atomic_inc(&buf->refcount);
+   refcount_inc(&buf->refcount);
 
return dbuf;
 }
-- 
2.7.4



[PATCH 27/29] drivers, usb: convert ep_data.count from atomic_t to refcount_t

2017-03-06 Thread Elena Reshetova
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This helps avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova 
Signed-off-by: Hans Liljestrand 
Signed-off-by: Kees Cook 
Signed-off-by: David Windsor 
---
 drivers/usb/gadget/legacy/inode.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/usb/gadget/legacy/inode.c 
b/drivers/usb/gadget/legacy/inode.c
index 81d76f3..d21a5f8 100644
--- a/drivers/usb/gadget/legacy/inode.c
+++ b/drivers/usb/gadget/legacy/inode.c
@@ -191,7 +191,7 @@ enum ep_state {
 struct ep_data {
struct mutex lock;
enum ep_state   state;
-   atomic_t count;
+   refcount_t  count;
struct dev_data *dev;
/* must hold dev->lock before accessing ep or req */
struct usb_ep   *ep;
@@ -206,12 +206,12 @@ struct ep_data {
 
 static inline void get_ep (struct ep_data *data)
 {
-   atomic_inc (&data->count);
+   refcount_inc (&data->count);
 }
 
 static void put_ep (struct ep_data *data)
 {
-   if (likely (!atomic_dec_and_test (&data->count)))
+   if (likely (!refcount_dec_and_test (&data->count)))
return;
put_dev (data->dev);
/* needs no more cleanup */
@@ -1562,7 +1562,7 @@ static int activate_ep_files (struct dev_data *dev)
init_waitqueue_head (&data->wait);
 
strncpy (data->name, ep->name, sizeof (data->name) - 1);
-   atomic_set (&data->count, 1);
+   refcount_set (&data->count, 1);
data->dev = dev;
get_dev (dev);
 
-- 
2.7.4



[PATCH 17/29] drivers, pci: convert hv_pci_dev.refs from atomic_t to refcount_t

2017-03-06 Thread Elena Reshetova
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This helps avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova 
Signed-off-by: Hans Liljestrand 
Signed-off-by: Kees Cook 
Signed-off-by: David Windsor 
---
 drivers/pci/host/pci-hyperv.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/host/pci-hyperv.c b/drivers/pci/host/pci-hyperv.c
index cd114c6..870deed 100644
--- a/drivers/pci/host/pci-hyperv.c
+++ b/drivers/pci/host/pci-hyperv.c
@@ -56,6 +56,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 /*
@@ -421,7 +422,7 @@ enum hv_pcidev_ref_reason {
 struct hv_pci_dev {
/* List protected by pci_rescan_remove_lock */
struct list_head list_entry;
-   atomic_t refs;
+   refcount_t refs;
enum hv_pcichild_state state;
struct pci_function_description desc;
bool reported_missing;
@@ -1254,13 +1255,13 @@ static void q_resource_requirements(void *context, 
struct pci_response *resp,
 static void get_pcichild(struct hv_pci_dev *hpdev,
enum hv_pcidev_ref_reason reason)
 {
-   atomic_inc(&hpdev->refs);
+   refcount_inc(&hpdev->refs);
 }
 
 static void put_pcichild(struct hv_pci_dev *hpdev,
enum hv_pcidev_ref_reason reason)
 {
-   if (atomic_dec_and_test(&hpdev->refs))
+   if (refcount_dec_and_test(&hpdev->refs))
kfree(hpdev);
 }
 
@@ -1314,7 +1315,7 @@ static struct hv_pci_dev *new_pcichild_device(struct 
hv_pcibus_device *hbus,
wait_for_completion(&comp_pkt.host_event);
 
hpdev->desc = *desc;
-   get_pcichild(hpdev, hv_pcidev_ref_initial);
+   refcount_set(&hpdev->refs, 1);
get_pcichild(hpdev, hv_pcidev_ref_childlist);
spin_lock_irqsave(&hbus->device_list_lock, flags);
list_add_tail(&hpdev->list_entry, &hbus->children);
-- 
2.7.4



[PATCH 19/29] drivers, s390: convert lcs_reply.refcnt from atomic_t to refcount_t

2017-03-06 Thread Elena Reshetova
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This helps avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova 
Signed-off-by: Hans Liljestrand 
Signed-off-by: Kees Cook 
Signed-off-by: David Windsor 
---
 drivers/s390/net/lcs.c | 8 +++-
 drivers/s390/net/lcs.h | 3 ++-
 2 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/s390/net/lcs.c b/drivers/s390/net/lcs.c
index 211b31d..18dc787 100644
--- a/drivers/s390/net/lcs.c
+++ b/drivers/s390/net/lcs.c
@@ -774,15 +774,13 @@ lcs_get_lancmd(struct lcs_card *card, int count)
 static void
 lcs_get_reply(struct lcs_reply *reply)
 {
-   WARN_ON(atomic_read(&reply->refcnt) <= 0);
-   atomic_inc(&reply->refcnt);
+   refcount_inc(&reply->refcnt);
 }
 
 static void
 lcs_put_reply(struct lcs_reply *reply)
 {
-WARN_ON(atomic_read(&reply->refcnt) <= 0);
-if (atomic_dec_and_test(&reply->refcnt)) {
+if (refcount_dec_and_test(&reply->refcnt)) {
kfree(reply);
}
 
@@ -798,7 +796,7 @@ lcs_alloc_reply(struct lcs_cmd *cmd)
reply = kzalloc(sizeof(struct lcs_reply), GFP_ATOMIC);
if (!reply)
return NULL;
-   atomic_set(&reply->refcnt,1);
+   refcount_set(&reply->refcnt,1);
reply->sequence_no = cmd->sequence_no;
reply->received = 0;
reply->rc = 0;
diff --git a/drivers/s390/net/lcs.h b/drivers/s390/net/lcs.h
index 150fcb4..3802f4f 100644
--- a/drivers/s390/net/lcs.h
+++ b/drivers/s390/net/lcs.h
@@ -4,6 +4,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #define LCS_DBF_TEXT(level, name, text) \
@@ -270,7 +271,7 @@ struct lcs_buffer {
 struct lcs_reply {
struct list_head list;
__u16 sequence_no;
-   atomic_t refcnt;
+   refcount_t refcnt;
/* Callback for completion notification. */
void (*callback)(struct lcs_card *, struct lcs_cmd *);
wait_queue_head_t wait_q;
-- 
2.7.4
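
Editorial note, not part of the patch: the converted get/put helpers
above drop their explicit WARN_ON(... <= 0) checks, presumably because
a checked counter detects those conditions itself. The following is a
minimal userspace sketch of a "decrement and test" that refuses to
underflow. The sketch_cnt names are hypothetical and the code is not the
kernel implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a checked reference counter. */
struct sketch_cnt {
	atomic_uint refs;
};

/*
 * Drop one reference. Returns true only for the caller that drops the
 * last reference, so exactly one caller frees the object. If the count
 * is already zero, nothing is decremented, *underflow is set, and the
 * function returns false so the caller does not free a second time.
 */
static bool sketch_cnt_dec_and_test(struct sketch_cnt *c, bool *underflow)
{
	unsigned int old = atomic_load(&c->refs);

	*underflow = false;
	for (;;) {
		if (old == 0) {
			*underflow = true;
			return false;
		}
		/* On success 'old' keeps the value we replaced. */
		if (atomic_compare_exchange_weak(&c->refs, &old, old - 1))
			return old == 1;
	}
}
```

Because the check sits inside the primitive, callers such as a put
helper do not need a separate read-and-warn step before the decrement.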



[PATCH 21/29] drivers, s390: convert fc_fcp_pkt.ref_cnt from atomic_t to refcount_t

2017-03-06 Thread Elena Reshetova
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This helps avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova 
Signed-off-by: Hans Liljestrand 
Signed-off-by: Kees Cook 
Signed-off-by: David Windsor 
---
 drivers/scsi/libfc/fc_fcp.c | 6 +++---
 include/scsi/libfc.h| 3 ++-
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/libfc/fc_fcp.c b/drivers/scsi/libfc/fc_fcp.c
index 0e67621..a808e8e 100644
--- a/drivers/scsi/libfc/fc_fcp.c
+++ b/drivers/scsi/libfc/fc_fcp.c
@@ -154,7 +154,7 @@ static struct fc_fcp_pkt *fc_fcp_pkt_alloc(struct fc_lport 
*lport, gfp_t gfp)
memset(fsp, 0, sizeof(*fsp));
fsp->lp = lport;
fsp->xfer_ddp = FC_XID_UNKNOWN;
-   atomic_set(&fsp->ref_cnt, 1);
+   refcount_set(&fsp->ref_cnt, 1);
init_timer(&fsp->timer);
fsp->timer.data = (unsigned long)fsp;
INIT_LIST_HEAD(&fsp->list);
@@ -175,7 +175,7 @@ static struct fc_fcp_pkt *fc_fcp_pkt_alloc(struct fc_lport 
*lport, gfp_t gfp)
  */
 static void fc_fcp_pkt_release(struct fc_fcp_pkt *fsp)
 {
-   if (atomic_dec_and_test(&fsp->ref_cnt)) {
+   if (refcount_dec_and_test(&fsp->ref_cnt)) {
struct fc_fcp_internal *si = fc_get_scsi_internal(fsp->lp);
 
mempool_free(fsp, si->scsi_pkt_pool);
@@ -188,7 +188,7 @@ static void fc_fcp_pkt_release(struct fc_fcp_pkt *fsp)
  */
 static void fc_fcp_pkt_hold(struct fc_fcp_pkt *fsp)
 {
-   atomic_inc(&fsp->ref_cnt);
+   refcount_inc(&fsp->ref_cnt);
 }
 
 /**
diff --git a/include/scsi/libfc.h b/include/scsi/libfc.h
index da5033d..2109844 100644
--- a/include/scsi/libfc.h
+++ b/include/scsi/libfc.h
@@ -23,6 +23,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -321,7 +322,7 @@ struct fc_seq_els_data {
  */
 struct fc_fcp_pkt {
spinlock_t scsi_pkt_lock;
-   atomic_t  ref_cnt;
+   refcount_t ref_cnt;
 
/* SCSI command and data transfer information */
u32   data_len;
-- 
2.7.4



[PATCH 20/29] drivers, s390: convert qeth_reply.refcnt from atomic_t to refcount_t

2017-03-06 Thread Elena Reshetova
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This helps avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova 
Signed-off-by: Hans Liljestrand 
Signed-off-by: Kees Cook 
Signed-off-by: David Windsor 
---
 drivers/s390/net/qeth_core.h  | 3 ++-
 drivers/s390/net/qeth_core_main.c | 8 +++-
 2 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/s390/net/qeth_core.h b/drivers/s390/net/qeth_core.h
index e7addea..e2c81d21 100644
--- a/drivers/s390/net/qeth_core.h
+++ b/drivers/s390/net/qeth_core.h
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -641,7 +642,7 @@ struct qeth_reply {
int rc;
void *param;
struct qeth_card *card;
-   atomic_t refcnt;
+   refcount_t refcnt;
 };
 
 
diff --git a/drivers/s390/net/qeth_core_main.c 
b/drivers/s390/net/qeth_core_main.c
index 315d8a2..a2bf13f 100644
--- a/drivers/s390/net/qeth_core_main.c
+++ b/drivers/s390/net/qeth_core_main.c
@@ -555,7 +555,7 @@ static struct qeth_reply *qeth_alloc_reply(struct qeth_card 
*card)
 
reply = kzalloc(sizeof(struct qeth_reply), GFP_ATOMIC);
if (reply) {
-   atomic_set(&reply->refcnt, 1);
+   refcount_set(&reply->refcnt, 1);
atomic_set(&reply->received, 0);
reply->card = card;
}
@@ -564,14 +564,12 @@ static struct qeth_reply *qeth_alloc_reply(struct 
qeth_card *card)
 
 static void qeth_get_reply(struct qeth_reply *reply)
 {
-   WARN_ON(atomic_read(&reply->refcnt) <= 0);
-   atomic_inc(&reply->refcnt);
+   refcount_inc(&reply->refcnt);
 }
 
 static void qeth_put_reply(struct qeth_reply *reply)
 {
-   WARN_ON(atomic_read(&reply->refcnt) <= 0);
-   if (atomic_dec_and_test(&reply->refcnt))
+   if (refcount_dec_and_test(&reply->refcnt))
kfree(reply);
 }
 
-- 
2.7.4



[PATCH 29/29] drivers, xen: convert grant_map.users from atomic_t to refcount_t

2017-03-06 Thread Elena Reshetova
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This helps avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova 
Signed-off-by: Hans Liljestrand 
Signed-off-by: Kees Cook 
Signed-off-by: David Windsor 
---
 drivers/xen/gntdev.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index 2ef2b61..b183cb2 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -35,6 +35,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -85,7 +86,7 @@ struct grant_map {
int index;
int count;
int flags;
-   atomic_t users;
+   refcount_t users;
struct unmap_notify notify;
struct ioctl_gntdev_grant_ref *grants;
struct gnttab_map_grant_ref   *map_ops;
@@ -165,7 +166,7 @@ static struct grant_map *gntdev_alloc_map(struct 
gntdev_priv *priv, int count)
 
add->index = 0;
add->count = count;
-   atomic_set(&add->users, 1);
+   refcount_set(&add->users, 1);
 
return add;
 
@@ -211,7 +212,7 @@ static void gntdev_put_map(struct gntdev_priv *priv, struct 
grant_map *map)
if (!map)
return;
 
-   if (!atomic_dec_and_test(&map->users))
+   if (!refcount_dec_and_test(&map->users))
return;
 
atomic_sub(map->count, &pages_mapped);
@@ -399,7 +400,7 @@ static void gntdev_vma_open(struct vm_area_struct *vma)
struct grant_map *map = vma->vm_private_data;
 
pr_debug("gntdev_vma_open %p\n", vma);
-   atomic_inc(&map->users);
+   refcount_inc(&map->users);
 }
 
 static void gntdev_vma_close(struct vm_area_struct *vma)
@@ -1003,7 +1004,7 @@ static int gntdev_mmap(struct file *flip, struct 
vm_area_struct *vma)
goto unlock_out;
}
 
-   atomic_inc(&map->users);
+   refcount_inc(&map->users);
 
vma->vm_ops = &gntdev_vmops;
 
-- 
2.7.4



[PATCH 25/29] drivers, usb: convert ffs_data.ref from atomic_t to refcount_t

2017-03-06 Thread Elena Reshetova
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This helps avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova 
Signed-off-by: Hans Liljestrand 
Signed-off-by: Kees Cook 
Signed-off-by: David Windsor 
---
 drivers/usb/gadget/function/f_fs.c | 8 
 drivers/usb/gadget/function/u_fs.h | 3 ++-
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/usb/gadget/function/f_fs.c 
b/drivers/usb/gadget/function/f_fs.c
index 87fccf6..3cdeb91 100644
--- a/drivers/usb/gadget/function/f_fs.c
+++ b/drivers/usb/gadget/function/f_fs.c
@@ -1570,14 +1570,14 @@ static void ffs_data_get(struct ffs_data *ffs)
 {
ENTER();
 
-   atomic_inc(&ffs->ref);
+   refcount_inc(&ffs->ref);
 }
 
 static void ffs_data_opened(struct ffs_data *ffs)
 {
ENTER();
 
-   atomic_inc(&ffs->ref);
+   refcount_inc(&ffs->ref);
if (atomic_add_return(1, &ffs->opened) == 1 &&
ffs->state == FFS_DEACTIVATED) {
ffs->state = FFS_CLOSING;
@@ -1589,7 +1589,7 @@ static void ffs_data_put(struct ffs_data *ffs)
 {
ENTER();
 
-   if (unlikely(atomic_dec_and_test(&ffs->ref))) {
+   if (unlikely(refcount_dec_and_test(&ffs->ref))) {
pr_info("%s(): freeing\n", __func__);
ffs_data_clear(ffs);
BUG_ON(waitqueue_active(&ffs->ev.waitq) ||
@@ -1634,7 +1634,7 @@ static struct ffs_data *ffs_data_new(void)
 
ENTER();
 
-   atomic_set(&ffs->ref, 1);
+   refcount_set(&ffs->ref, 1);
atomic_set(&ffs->opened, 0);
ffs->state = FFS_READ_DESCRIPTORS;
mutex_init(&ffs->mutex);
diff --git a/drivers/usb/gadget/function/u_fs.h 
b/drivers/usb/gadget/function/u_fs.h
index 4b69694..abfca48 100644
--- a/drivers/usb/gadget/function/u_fs.h
+++ b/drivers/usb/gadget/function/u_fs.h
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifdef VERBOSE_DEBUG
 #ifndef pr_vdebug
@@ -177,7 +178,7 @@ struct ffs_data {
struct completion   ep0req_completion;  /* P: mutex */
 
/* reference counter */
-   atomic_t ref;
+   refcount_t  ref;
/* how many files are opened (EP0 and others) */
atomic_t opened;
 
-- 
2.7.4



Re: [PATCH net 1/7] bnx2x: prevent crash when accessing PTP with interface down

2017-03-06 Thread Michal Schmidt

On 5.3.2017 at 10:43, Mintz, Yuval wrote:

It is possible to crash the kernel by accessing a PTP device while its
associated bnx2x interface is down. Before the interface is brought up, the
timecounter is not initialized, so accessing it results in NULL dereference.

Fix it by checking if the interface is up.

Use -ENETDOWN as the error code when the interface is down.
 -EFAULT in bnx2x_ptp_adjfreq() did not seem right.

Tested using phc_ctl get/set/adj/freq commands.

Signed-off-by: Michal Schmidt 


While I have no objections to the patch contents, does it even make
sense to try adjusting frequencies on a DOWNed interface?
Wouldn't it make more sense to check this in the calling context
instead?


The caller does not know. A PTP device is not necessarily associated 
with a net device.


Michal



Re: [PATCH net 5/7] bnx2x: do not rollback VF MAC/VLAN filters we did not configure

2017-03-06 Thread Michal Schmidt

On 5.3.2017 at 11:13, Mintz, Yuval wrote:

On failure to configure a VF MAC/VLAN filter we should not attempt to
rollback filters that we failed to configure with -EEXIST.


Is this theoretical or did you actually manage to hit it?
If so, did it involve non-Linux VFs?

Asking because Linux VFs don't actually send multiple vlan/umac configurations
via the same request, and with a single filter per message you're not expected
to ever do a rollback.


This one is theoretical, found by reading the code, not actually hitting 
the rollback case.


Michal



[PATCH 07/29] drivers, md: convert dm_dev_internal.count from atomic_t to refcount_t

2017-03-06 Thread Elena Reshetova
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This helps avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova 
Signed-off-by: Hans Liljestrand 
Signed-off-by: Kees Cook 
Signed-off-by: David Windsor 
---
 drivers/md/dm-table.c | 6 +++---
 drivers/md/dm.h   | 3 ++-
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 3ad16d9..d2e2741 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -416,15 +416,15 @@ int dm_get_device(struct dm_target *ti, const char *path, 
fmode_t mode,
return r;
}
 
-   atomic_set(&dd->count, 0);
+   refcount_set(&dd->count, 1);
list_add(&dd->list, &t->devices);
 
} else if (dd->dm_dev->mode != (mode | dd->dm_dev->mode)) {
r = upgrade_mode(dd, mode, t->md);
if (r)
return r;
+   refcount_inc(&dd->count);
}
-   atomic_inc(&dd->count);
 
*result = dd->dm_dev;
return 0;
@@ -478,7 +478,7 @@ void dm_put_device(struct dm_target *ti, struct dm_dev *d)
   dm_device_name(ti->table->md), d->name);
return;
}
-   if (atomic_dec_and_test(&dd->count)) {
+   if (refcount_dec_and_test(&dd->count)) {
dm_put_table_device(ti->table->md, d);
list_del(&dd->list);
kfree(dd);
diff --git a/drivers/md/dm.h b/drivers/md/dm.h
index f298b01..63b8142 100644
--- a/drivers/md/dm.h
+++ b/drivers/md/dm.h
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "dm-stats.h"
 
@@ -38,7 +39,7 @@
  */
 struct dm_dev_internal {
struct list_head list;
-   atomic_t count;
+   refcount_t count;
struct dm_dev *dm_dev;
 };
 
-- 
2.7.4



[PATCH 01/29] drivers, block: convert xen_blkif.refcnt from atomic_t to refcount_t

2017-03-06 Thread Elena Reshetova
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This helps avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova 
Signed-off-by: Hans Liljestrand 
Signed-off-by: Kees Cook 
Signed-off-by: David Windsor 
---
 drivers/block/xen-blkback/common.h | 7 ---
 drivers/block/xen-blkback/xenbus.c | 2 +-
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/block/xen-blkback/common.h 
b/drivers/block/xen-blkback/common.h
index dea61f6..2ccfd62 100644
--- a/drivers/block/xen-blkback/common.h
+++ b/drivers/block/xen-blkback/common.h
@@ -35,6 +35,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -333,7 +334,7 @@ struct xen_blkif {
struct xen_vbd  vbd;
/* Back pointer to the backend_info. */
struct backend_info *be;
-   atomic_t refcnt;
+   refcount_t  refcnt;
/* for barrier (drain) requests */
struct completion   drain_complete;
atomic_t drain;
@@ -386,10 +387,10 @@ struct pending_req {
 (_v)->bdev->bd_part->nr_sects : \
  get_capacity((_v)->bdev->bd_disk))
 
-#define xen_blkif_get(_b) (atomic_inc(&(_b)->refcnt))
+#define xen_blkif_get(_b) (refcount_inc(&(_b)->refcnt))
 #define xen_blkif_put(_b)  \
do {\
-   if (atomic_dec_and_test(&(_b)->refcnt)) \
+   if (refcount_dec_and_test(&(_b)->refcnt))   \
schedule_work(&(_b)->free_work);\
} while (0)
 
diff --git a/drivers/block/xen-blkback/xenbus.c 
b/drivers/block/xen-blkback/xenbus.c
index 8fe61b5..9f89be3 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -176,7 +176,7 @@ static struct xen_blkif *xen_blkif_alloc(domid_t domid)
return ERR_PTR(-ENOMEM);
 
blkif->domid = domid;
-   atomic_set(&blkif->refcnt, 1);
+   refcount_set(&blkif->refcnt, 1);
init_completion(&blkif->drain_complete);
INIT_WORK(&blkif->free_work, xen_blkif_deferred_free);
 
-- 
2.7.4



[PATCH 18/29] drivers, s390: convert urdev.ref_count from atomic_t to refcount_t

2017-03-06 Thread Elena Reshetova
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This helps avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova 
Signed-off-by: Hans Liljestrand 
Signed-off-by: Kees Cook 
Signed-off-by: David Windsor 
---
 drivers/s390/char/vmur.c | 8 
 drivers/s390/char/vmur.h | 4 +++-
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/s390/char/vmur.c b/drivers/s390/char/vmur.c
index 04aceb6..ced8151 100644
--- a/drivers/s390/char/vmur.c
+++ b/drivers/s390/char/vmur.c
@@ -110,7 +110,7 @@ static struct urdev *urdev_alloc(struct ccw_device *cdev)
mutex_init(&urd->io_mutex);
init_waitqueue_head(&urd->wait);
spin_lock_init(&urd->open_lock);
-   atomic_set(&urd->ref_count,  1);
+   refcount_set(&urd->ref_count,  1);
urd->cdev = cdev;
get_device(&cdev->dev);
return urd;
@@ -126,7 +126,7 @@ static void urdev_free(struct urdev *urd)
 
 static void urdev_get(struct urdev *urd)
 {
-   atomic_inc(&urd->ref_count);
+   refcount_inc(&urd->ref_count);
 }
 
 static struct urdev *urdev_get_from_cdev(struct ccw_device *cdev)
@@ -159,7 +159,7 @@ static struct urdev *urdev_get_from_devno(u16 devno)
 
 static void urdev_put(struct urdev *urd)
 {
-   if (atomic_dec_and_test(&urd->ref_count))
+   if (refcount_dec_and_test(&urd->ref_count))
urdev_free(urd);
 }
 
@@ -946,7 +946,7 @@ static int ur_set_offline_force(struct ccw_device *cdev, 
int force)
rc = -EBUSY;
goto fail_urdev_put;
}
-   if (!force && (atomic_read(&urd->ref_count) > 2)) {
+   if (!force && (refcount_read(&urd->ref_count) > 2)) {
/* There is still a user of urd (e.g. ur_open) */
TRACE("ur_set_offline: BUSY\n");
rc = -EBUSY;
diff --git a/drivers/s390/char/vmur.h b/drivers/s390/char/vmur.h
index fa320ad..35ea9d1 100644
--- a/drivers/s390/char/vmur.h
+++ b/drivers/s390/char/vmur.h
@@ -11,6 +11,8 @@
 #ifndef _VMUR_H_
 #define _VMUR_H_
 
+#include 
+
 #define DEV_CLASS_UR_I 0x20 /* diag210 unit record input device class */
 #define DEV_CLASS_UR_O 0x10 /* diag210 unit record output device class */
 /*
@@ -69,7 +71,7 @@ struct urdev {
size_t reclen;  /* Record length for *write* CCWs */
int class;  /* VM device class */
int io_request_rc;  /* return code from I/O request */
-   atomic_t ref_count; /* reference counter */
+   refcount_t ref_count;   /* reference counter */
wait_queue_head_t wait; /* wait queue to serialize open */
int open_flag;  /* "urdev is open" flag */
spinlock_t open_lock;   /* serialize critical sections */
-- 
2.7.4



[PATCH net 3/7 v2] bnx2x: fix possible overrun of VFPF multicast addresses array

2017-03-06 Thread Michal Schmidt

It is too late to check for the limit of the number of VF multicast
addresses after they have already been copied to the req->multicast[]
array, possibly overflowing it.

Do the check before copying.

Checking early also avoids having to (and forgetting to) unlock
vf2pf_mutex.

While we're looking at the error paths in the function, also return
an error code from it when the PF responds with an error, even though
the caller ignores it.

v2: Move the check before bnx2x_vfpf_prep() as suggested by Yuval.

Signed-off-by: Michal Schmidt 
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c | 22 ++
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c 
b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c
index bfae300..2b2ae92 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c
@@ -864,46 +864,44 @@ int bnx2x_vfpf_config_rss(struct bnx2x *bp,
 }
 
 int bnx2x_vfpf_set_mcast(struct net_device *dev)
 {
struct bnx2x *bp = netdev_priv(dev);
struct vfpf_set_q_filters_tlv *req = &bp->vf2pf_mbox->req.set_q_filters;
struct pfvf_general_resp_tlv *resp = &bp->vf2pf_mbox->resp.general_resp;
-   int rc, i = 0;
+   int rc = 0, i = 0;
struct netdev_hw_addr *ha;
 
 	if (bp->state != BNX2X_STATE_OPEN) {
DP(NETIF_MSG_IFUP, "state is %x, returning\n", bp->state);
return -EINVAL;
}
 
+	/* We support PFVF_MAX_MULTICAST_PER_VF mcast addresses tops */
+   if (netdev_mc_count(dev) > PFVF_MAX_MULTICAST_PER_VF) {
+   DP(NETIF_MSG_IFUP,
+  "VF supports not more than %d multicast MAC addresses\n",
+  PFVF_MAX_MULTICAST_PER_VF);
+   return -EINVAL;
+   }
+
/* clear mailbox and prep first tlv */
bnx2x_vfpf_prep(bp, &req->first_tlv, CHANNEL_TLV_SET_Q_FILTERS,
sizeof(*req));
 
 	/* Get Rx mode requested */
DP(NETIF_MSG_IFUP, "dev->flags = %x\n", dev->flags);
 
 	netdev_for_each_mc_addr(ha, dev) {
DP(NETIF_MSG_IFUP, "Adding mcast MAC: %pM\n",
   bnx2x_mc_addr(ha));
memcpy(req->multicast[i], bnx2x_mc_addr(ha), ETH_ALEN);
i++;
}
 
-	/* We support four PFVF_MAX_MULTICAST_PER_VF mcast
- * addresses tops
- */
-   if (i >= PFVF_MAX_MULTICAST_PER_VF) {
-   DP(NETIF_MSG_IFUP,
-  "VF supports not more than %d multicast MAC addresses\n",
-  PFVF_MAX_MULTICAST_PER_VF);
-   return -EINVAL;
-   }
-
req->n_multicast = i;
req->flags |= VFPF_SET_Q_FILTERS_MULTICAST_CHANGED;
req->vf_qid = 0;
 
 	/* add list termination tlv */
bnx2x_add_tlv(bp, req, req->first_tlv.tl.length, CHANNEL_TLV_LIST_END,
  sizeof(struct channel_list_end_tlv));
@@ -920,15 +918,15 @@ int bnx2x_vfpf_set_mcast(struct net_device *dev)
BNX2X_ERR("Set Rx mode/multicast failed: %d\n",
  resp->hdr.status);
rc = -EINVAL;
}
 out:
bnx2x_vfpf_finalize(bp, &req->first_tlv);
 
-	return 0;
+   return rc;
 }
 
 /* request pf to add a vlan for the vf */
 int bnx2x_vfpf_update_vlan(struct bnx2x *bp, u16 vid, u8 vf_qid, bool add)
 {
struct vfpf_set_q_filters_tlv *req = &bp->vf2pf_mbox->req.set_q_filters;
struct pfvf_general_resp_tlv *resp = &bp->vf2pf_mbox->resp.general_resp;
--
2.9.3
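
Editorial note, not part of the patch: the fix above moves the limit
check in front of the copy loop, so an oversized address list is
rejected before anything is written into the fixed-size array. The
following is a minimal standalone sketch of that ordering. All SKETCH_*
and sketch_* names and the limit of 4 are hypothetical and are not taken
from the driver.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_ETH_ALEN  6
#define SKETCH_MAX_MCAST 4	/* hypothetical array capacity */

struct sketch_req {
	unsigned char multicast[SKETCH_MAX_MCAST][SKETCH_ETH_ALEN];
	size_t n_multicast;
};

/*
 * Copy 'count' MAC addresses from the flat buffer 'addrs' (count *
 * SKETCH_ETH_ALEN bytes) into the request. The bound is checked first,
 * so the request is left untouched when the list is too long.
 * Returns 0 on success and -1 when 'count' exceeds the capacity.
 */
static int sketch_set_mcast(struct sketch_req *req,
			    const unsigned char *addrs, size_t count)
{
	size_t i;

	if (count > SKETCH_MAX_MCAST)
		return -1;

	for (i = 0; i < count; i++)
		memcpy(req->multicast[i], addrs + i * SKETCH_ETH_ALEN,
		       SKETCH_ETH_ALEN);
	req->n_multicast = count;
	return 0;
}
```

Checking after the loop, as the old code did, only reports the problem
once the out-of-bounds writes have already happened.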



Re: [PATCH v3 17/20] usb: gadget: pch_udc: Replace PCI pool old API

2017-03-06 Thread Felipe Balbi
Peter Senna Tschudin  writes:
> On Sun, Feb 26, 2017 at 08:24:22PM +0100, Romain Perier wrote:
>> The PCI pool API is deprecated. This commit replaces the old PCI pool
>> API with the appropriate functions from the DMA pool API.
>> 
> Reviewed-by: Peter Senna Tschudin 

Fine by me:

Acked-by: Felipe Balbi 

-- 
balbi




Re: [PATCH] 4.9.13 brcmfmac: fix use-after-free on resume

2017-03-06 Thread Daniel J Blueman
On 6 March 2017 at 21:00, Arend Van Spriel  wrote:
> + linux-wireless
>
> On 6-3-2017 8:14, Daniel J Blueman wrote:
>> KASAN reported 'struct wireless_dev wdev' was read after being freed.
>> Fix by freeing after the access.
>
> I would rather like to see the KASAN report, because something is off
> here. This function is called with wdev as a parameter so how can it be
> accessed after free here? brcmf_remove_interface() does not free the
> wdev nor the brcmf_cfg80211_vif instance which contains the wdev.
>
> Regards,
> Arend
>
>> Signed-off-by: Daniel J Blueman 
>>
>> diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/p2p.c
>> b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/p2p.c
>> index de19c7c..aa0f470 100644
>> --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/p2p.c
>> +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/p2p.c
>> @@ -2288,12 +2288,13 @@ int brcmf_p2p_del_vif(struct wiphy *wiphy,
>> struct wireless_dev *wdev)
>> else
>> err = 0;
>> }
>> -   brcmf_remove_interface(vif->ifp, true);
>>
>> -   brcmf_cfg80211_arm_vif_event(cfg, NULL);
>> if (vif->wdev.iftype != NL80211_IFTYPE_P2P_DEVICE)
>> p2p->bss_idx[P2PAPI_BSSCFG_CONNECTION].vif = NULL;
>>
>> +   brcmf_remove_interface(vif->ifp, true);
>> +   brcmf_cfg80211_arm_vif_event(cfg, NULL);
>> +
>> return err;
>>  }

Sure, https://quora.org/kernel/brcmfmac/dmesg.txt

vmlinux, cfg80211.o, brcmfmac.o and config are in the same path; this
is against v4.9.13 stock.

Thanks,
  Daniel
-- 
Daniel J Blueman


[PATCH net-next] net: ipv4: add support for ECMP hash policy choice

2017-03-06 Thread Nikolay Aleksandrov
This patch adds support for ECMP hash policy choice via a new sysctl
called fib_multipath_hash_policy and also adds support for L4 hashes.
The current values for fib_multipath_hash_policy are:
 0 - layer 3
 1 - layer 4 (new default)
If there's an skb hash already set and it matches the chosen policy then it
will be used instead of being calculated. The use of ICMP inner IP addresses
is removed, and we switch to an L4 default for better distribution.

Signed-off-by: Nikolay Aleksandrov 
---
I'm not happy with using an integer, but it produces the smallest churn.
Just let me know if you'd like to switch to a string sysctl.
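
As a rough illustration of the two policy values (not part of the patch),
the difference between a layer 3 and a layer 4 hash can be sketched in
userspace C. Everything here is an assumption made for the example:
toy_mix() is a stand-in mixer and not the kernel's jhash, and struct
toy_flow and toy_multipath_hash() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flow description, for illustration only. */
struct toy_flow {
	uint32_t saddr, daddr;	/* IPv4 source/destination address */
	uint16_t sport, dport;	/* L4 source/destination port */
	uint8_t  proto;		/* IP protocol number */
};

/* Stand-in mixer (NOT jhash): xor, odd multiply, xorshift. */
static uint32_t toy_mix(uint32_t h, uint32_t v)
{
	h ^= v;
	h *= 0x9e3779b1u;
	return h ^ (h >> 15);
}

/*
 * policy 0: hash over the addresses only (layer 3)
 * policy 1: hash over addresses, ports and protocol (layer 4)
 */
static uint32_t toy_multipath_hash(const struct toy_flow *f, int policy)
{
	uint32_t h;

	assert(f && (policy == 0 || policy == 1));

	h = toy_mix(0, f->saddr);
	h = toy_mix(h, f->daddr);
	if (policy == 1) {
		h = toy_mix(h, ((uint32_t)f->sport << 16) | f->dport);
		h = toy_mix(h, f->proto);
	}
	return h;
}
```

With policy 0, two connections between the same pair of hosts always hash
to the same value; with policy 1 the ports are mixed in, so they can be
spread over different nexthops.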

 Documentation/networking/ip-sysctl.txt |  8 +++
 include/net/ip_fib.h   | 14 ++---
 include/net/netns/ipv4.h   |  1 +
 include/net/route.h|  5 +-
 net/ipv4/fib_frontend.c|  3 ++
 net/ipv4/fib_semantics.c   | 11 ++--
 net/ipv4/icmp.c| 19 +--
 net/ipv4/route.c   | 93 ++
 net/ipv4/sysctl_net_ipv4.c |  9 
 9 files changed, 81 insertions(+), 82 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt 
b/Documentation/networking/ip-sysctl.txt
index fc73eeb7b3b8..15810ca7d8b0 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -73,6 +73,14 @@ fib_multipath_use_neigh - BOOLEAN
0 - disabled
1 - enabled
 
+fib_multipath_hash_policy - INTEGER
+   Controls which hash policy to use for multipath routes. Only valid
+   for kernels built with CONFIG_IP_ROUTE_MULTIPATH enabled.
+   Default: 1 (Layer 4)
+   Possible values:
+   0 - Layer 3
+   1 - Layer 4
+
 route/max_size - INTEGER
Maximum number of routes allowed in the kernel.  Increase
this when using large numbers of interfaces and/or routes.
diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 368bb4024b78..8ac9bec053c5 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -371,17 +371,13 @@ int fib_sync_down_dev(struct net_device *dev, unsigned 
long event, bool force);
 int fib_sync_down_addr(struct net_device *dev, __be32 local);
 int fib_sync_up(struct net_device *dev, unsigned int nh_flags);
 
-extern u32 fib_multipath_secret __read_mostly;
-
-static inline int fib_multipath_hash(__be32 saddr, __be32 daddr)
-{
-   return jhash_2words((__force u32)saddr, (__force u32)daddr,
-   fib_multipath_secret) >> 1;
-}
-
+#ifdef CONFIG_IP_ROUTE_MULTIPATH
+int fib_multipath_hash(const struct fib_info *fi, const struct flowi4 *fl4,
+  const struct sk_buff *skb);
+#endif
 void fib_select_multipath(struct fib_result *res, int hash);
 void fib_select_path(struct net *net, struct fib_result *res,
-struct flowi4 *fl4, int mp_hash);
+struct flowi4 *fl4);
 
 /* Exported by fib_trie.c */
 void fib_trie_init(void);
diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index 622d2da27135..70a1d4251790 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -152,6 +152,7 @@ struct netns_ipv4 {
 #endif
 #ifdef CONFIG_IP_ROUTE_MULTIPATH
int sysctl_fib_multipath_use_neigh;
+   int sysctl_fib_multipath_hash_policy;
 #endif
 
unsigned int fib_seq; /* protected by rtnl_mutex */
diff --git a/include/net/route.h b/include/net/route.h
index c0874c87c173..77a5c613a290 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -113,13 +113,12 @@ struct in_device;
 int ip_rt_init(void);
 void rt_cache_flush(struct net *net);
 void rt_flush_dev(struct net_device *dev);
-struct rtable *__ip_route_output_key_hash(struct net *, struct flowi4 *flp,
- int mp_hash);
+struct rtable *__ip_route_output_key_hash(struct net *net, struct flowi4 *flp);
 
 static inline struct rtable *__ip_route_output_key(struct net *net,
   struct flowi4 *flp)
 {
-   return __ip_route_output_key_hash(net, flp, -1);
+   return __ip_route_output_key_hash(net, flp);
 }
 
 struct rtable *ip_route_output_flow(struct net *, struct flowi4 *flp,
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 42bfd08109dd..bba87195cbf4 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -1233,6 +1233,9 @@ static int __net_init ip_fib_net_init(struct net *net)
/* Avoid false sharing : Use at least a full cache line */
size = max_t(size_t, size, L1_CACHE_BYTES);
 
+#ifdef CONFIG_IP_ROUTE_MULTIPATH
+   net->ipv4.sysctl_fib_multipath_hash_policy = 1;
+#endif
net->ipv4.fib_table_hash = kzalloc(size, GFP_KERNEL);
if (!net->ipv4.fib_table_hash)
return -ENOMEM;
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index 317026a39cfa..6601bd9744c9 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semant

[RFC PATCH net] net: Work around lockdep limitation in sockets that use sockets

2017-03-06 Thread David Howells
Lockdep issues a circular dependency warning when AFS issues an operation
through AF_RXRPC from a context in which the VFS/VM holds the mmap_sem.

The theory lockdep comes up with is as follows:

 (1) If the pagefault handler decides it needs to read pages from AFS, it
 calls AFS with mmap_sem held and AFS begins an AF_RXRPC call, but
 creating a call requires the socket lock:

mmap_sem must be taken before sk_lock-AF_RXRPC

 (2) afs_open_socket() opens an AF_RXRPC socket and binds it.  rxrpc_bind()
 binds the underlying UDP socket whilst holding its socket lock.
 inet_bind() takes its own socket lock:

sk_lock-AF_RXRPC must be taken before sk_lock-AF_INET

 (3) Reading from a TCP socket into a userspace buffer might cause a fault
 and thus cause the kernel to take the mmap_sem, but the TCP socket is
 locked whilst doing this:

sk_lock-AF_INET must be taken before mmap_sem

However, lockdep's theory is wrong in this instance because it deals only
with lock classes and not individual locks.  The AF_INET lock in (2) isn't
really equivalent to the AF_INET lock in (3) as the former deals with a
socket entirely internal to the kernel that never sees userspace.  This is
a limitation in the design of lockdep.

Fix the general case by:

 (1) Double up all the locking keys used in sockets so that one set are
 used if the socket is created by userspace and the other set is used
 if the socket is created by the kernel.

 (2) Store the kern parameter passed to sk_alloc() in a variable in the
 sock struct (sk_kern_sock).  This informs sock_lock_init(),
 sock_init_data() and sk_clone_lock() as to the lock keys to be used.

 Note that the child created by sk_clone_lock() inherits the parent's
 kern setting.

 (3) Add a 'kern' parameter to ->accept() that is analogous to the one
 passed in to ->create() that distinguishes whether kernel_accept() or
 sys_accept4() was the caller and can be passed to sk_alloc().

 Note that a lot of accept functions merely dequeue an already
 allocated socket.  I haven't touched these as the new socket already
 exists before we get the parameter.

 Note also that there are a couple of places where I've made the accepted
 socket unconditionally kernel-based:

irda_accept()
	rds_tcp_accept_one()
tcp_accept_from_sock()

 because they follow a sock_create_kern() and accept off of that.
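
The idea in steps (1) and (2) can be sketched in userspace C as follows.
This is only a model under stated assumptions, not the code in
net/core/sock.c: the toy_* names and TOY_AF_MAX are hypothetical, and a
"lock class" is modelled simply as the address of a key object.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_AF_MAX 4	/* hypothetical; stands in for the number of families */

/* One key object per lock class; a class is identified by the key address. */
struct toy_lock_class_key { char dummy; };

/* Doubled-up key tables: one set for userspace sockets, one for kernel ones. */
static struct toy_lock_class_key toy_user_keys[TOY_AF_MAX];
static struct toy_lock_class_key toy_kern_keys[TOY_AF_MAX];

struct toy_sock {
	int family;
	bool kern_sock;		/* remembered so later init/clone can pick keys */
	const struct toy_lock_class_key *lock_key;
};

/* Pick the lock class from (family, kern) instead of from family alone. */
static void toy_sock_lock_init(struct toy_sock *sk, int family, bool kern)
{
	assert(sk && family >= 0 && family < TOY_AF_MAX);

	sk->family = family;
	sk->kern_sock = kern;
	sk->lock_key = kern ? &toy_kern_keys[family] : &toy_user_keys[family];
}

/* A cloned child inherits the parent's kern setting, hence its lock class. */
static void toy_sock_clone(struct toy_sock *child, const struct toy_sock *parent)
{
	assert(child && parent);
	toy_sock_lock_init(child, parent->family, parent->kern_sock);
}
```

With this split, a kernel-internal socket and a userspace socket of the
same family no longer share a lock class, so a dependency chain recorded
on one is not attributed to the other.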

Whilst creating this, I noticed that lustre and ocfs don't create sockets
through sock_create_kern() and thus they aren't marked as for-kernel,
though they appear to be internal.  I wonder if these should do that so
that they use the new set of lock keys.

Signed-off-by: David Howells 
---

 crypto/af_alg.c   |9 +-
 crypto/algif_hash.c   |9 +-
 drivers/staging/lustre/lnet/lnet/lib-socket.c |4 -
 fs/dlm/lowcomms.c |2 
 fs/ocfs2/cluster/tcp.c|2 
 include/crypto/if_alg.h   |2 
 include/linux/net.h   |2 
 include/net/inet_common.h |3 -
 include/net/inet_connection_sock.h|2 
 include/net/sctp/structs.h|3 -
 include/net/sock.h|7 +-
 net/atm/svc.c |5 +
 net/ax25/af_ax25.c|3 -
 net/bluetooth/l2cap_sock.c|2 
 net/bluetooth/rfcomm/sock.c   |3 -
 net/bluetooth/sco.c   |2 
 net/core/sock.c   |  106 +
 net/decnet/af_decnet.c|5 +
 net/ipv4/af_inet.c|5 +
 net/ipv4/inet_connection_sock.c   |2 
 net/irda/af_irda.c|5 +
 net/iucv/af_iucv.c|2 
 net/llc/af_llc.c  |4 +
 net/netrom/af_netrom.c|3 -
 net/nfc/llcp_sock.c   |2 
 net/phonet/pep.c  |6 +
 net/phonet/socket.c   |4 -
 net/rds/tcp_listen.c  |2 
 net/rose/af_rose.c|3 -
 net/sctp/ipv6.c   |5 +
 net/sctp/protocol.c   |5 +
 net/sctp/socket.c |4 -
 net/smc/af_smc.c  |2 
 net/socket.c  |4 -
 net/tipc/socket.c |8 +-
 net/unix/af_unix.c|5 +
 net/vmw_vsock/af_vsock.c  |3 -
 net/x25/af_x25.c  |3 -
 38 files changed, 141 insertions(+), 107 deletions(-)

diff --git a/crypto/af_alg.c b/crypto/af

[PATCH 03/29] drivers, char: convert vma_data.refcnt from atomic_t to refcount_t

2017-03-06 Thread Elena Reshetova
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This helps avoid accidental
refcounter overflows that might lead to use-after-free
situations.
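
To illustrate the overflow concern, here is a simplified userspace model of
a saturating reference counter built on C11 atomics. It is a sketch under
assumptions and not the kernel's refcount_t implementation: the toy_* names
are hypothetical and the exact saturation and warning behaviour of the real
API may differ.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

struct toy_refcount { atomic_uint refs; };

static void toy_refcount_set(struct toy_refcount *r, unsigned int n)
{
	atomic_store(&r->refs, n);
}

/* Increment, but pin the counter at UINT_MAX instead of wrapping to 0. */
static void toy_refcount_inc(struct toy_refcount *r)
{
	unsigned int old = atomic_load(&r->refs);

	do {
		if (old == UINT_MAX)
			return;		/* saturated: stay pinned */
	} while (!atomic_compare_exchange_weak(&r->refs, &old, old + 1));
}

/* Decrement; return true only when the count drops to zero. */
static bool toy_refcount_dec_and_test(struct toy_refcount *r)
{
	unsigned int old = atomic_load(&r->refs);

	do {
		if (old == UINT_MAX)
			return false;	/* saturated: leak rather than free */
		assert(old != 0);	/* underflow would be a caller bug */
	} while (!atomic_compare_exchange_weak(&r->refs, &old, old - 1));

	return old == 1;
}
```

A plain wrapping counter that overflows eventually reaches zero again and
the object is freed while still in use; the saturating variant trades that
for a leak.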

Signed-off-by: Elena Reshetova 
Signed-off-by: Hans Liljestrand 
Signed-off-by: Kees Cook 
Signed-off-by: David Windsor 
---
 drivers/char/mspec.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/char/mspec.c b/drivers/char/mspec.c
index a9c2fa3..7b75669 100644
--- a/drivers/char/mspec.c
+++ b/drivers/char/mspec.c
@@ -43,6 +43,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -89,7 +90,7 @@ static int is_sn2;
  * protect in fork case where multiple tasks share the vma_data.
  */
 struct vma_data {
-   atomic_t refcnt;    /* Number of vmas sharing the data. */
+   refcount_t refcnt;  /* Number of vmas sharing the data. */
spinlock_t lock;    /* Serialize access to this structure. */
int count;  /* Number of pages allocated. */
enum mspec_page_type type; /* Type of pages allocated. */
@@ -144,7 +145,7 @@ mspec_open(struct vm_area_struct *vma)
struct vma_data *vdata;
 
vdata = vma->vm_private_data;
-   atomic_inc(&vdata->refcnt);
+   refcount_inc(&vdata->refcnt);
 }
 
 /*
@@ -162,7 +163,7 @@ mspec_close(struct vm_area_struct *vma)
 
vdata = vma->vm_private_data;
 
-   if (!atomic_dec_and_test(&vdata->refcnt))
+   if (!refcount_dec_and_test(&vdata->refcnt))
return;
 
last_index = (vdata->vm_end - vdata->vm_start) >> PAGE_SHIFT;
@@ -274,7 +275,7 @@ mspec_mmap(struct file *file, struct vm_area_struct *vma,
vdata->vm_end = vma->vm_end;
vdata->type = type;
spin_lock_init(&vdata->lock);
-   atomic_set(&vdata->refcnt, 1);
+   refcount_set(&vdata->refcnt, 1);
vma->vm_private_data = vdata;
 
vma->vm_flags |= VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP;
-- 
2.7.4



Re: [PATCH 21/29] drivers, s390: convert fc_fcp_pkt.ref_cnt from atomic_t to refcount_t

2017-03-06 Thread Johannes Thumshirn
On 03/06/2017 03:21 PM, Elena Reshetova wrote:
> refcount_t type and corresponding API should be
> used instead of atomic_t when the variable is used as
> a reference counter. This helps avoid accidental
> refcounter overflows that might lead to use-after-free
> situations.

The subject is wrong, should be something like "scsi: libfc convert
fc_fcp_pkt.ref_cnt from atomic_t to refcount_t" but not s390.

Other than that
Acked-by: Johannes Thumshirn 

-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850


[patch net-next 2/5] flow_dissector: Move MPLS dissection into a separate function

2017-03-06 Thread Jiri Pirko
From: Jiri Pirko 

Make the main flow_dissect function a bit smaller and move the MPLS
dissection into a separate function. Along with that, do the MPLS header
processing only in case the flow dissection user requires it.
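
For reference, the label test performed by the new helper can be sketched
in standalone C. The constants below are redefined with a TOY_ prefix and
are believed to match include/uapi/linux/mpls.h and RFC 6790 (entropy label
indicator 7); treat them and the toy_* function names as assumptions of
this example rather than as the patch itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <arpa/inet.h>	/* ntohl(), htonl() */

/* Label stack entry layout: label (20 bits) | TC (3) | S (1) | TTL (8). */
#define TOY_MPLS_LS_LABEL_MASK	0xFFFFF000u
#define TOY_MPLS_LS_LABEL_SHIFT	12
#define TOY_MPLS_LABEL_ENTROPY	7	/* entropy label indicator */

/* Extract the 20-bit label from a network-order label stack entry. */
static uint32_t toy_mpls_label(uint32_t entry_be)
{
	return (ntohl(entry_be) & TOY_MPLS_LS_LABEL_MASK) >>
	       TOY_MPLS_LS_LABEL_SHIFT;
}

/*
 * If the first entry is the entropy label indicator, store the label bits
 * of the second entry (still in network order) and return true.
 */
static bool toy_mpls_entropy(const uint32_t entries_be[2], uint32_t *keyid_be)
{
	assert(entries_be && keyid_be);

	if (toy_mpls_label(entries_be[0]) != TOY_MPLS_LABEL_ENTROPY)
		return false;

	*keyid_be = entries_be[1] & htonl(TOY_MPLS_LS_LABEL_MASK);
	return true;
}
```

Note that the stored key keeps the label in place (masked, not shifted) and
in network byte order, mirroring the masking done in the helper above.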

Signed-off-by: Jiri Pirko 
---
 net/core/flow_dissector.c | 56 ---
 1 file changed, 34 insertions(+), 22 deletions(-)

diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index d79fb8f..8d01298 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -119,6 +119,33 @@ enum flow_dissect_ret {
 };
 
 static enum flow_dissect_ret
+__skb_flow_dissect_mpls(const struct sk_buff *skb,
+   struct flow_dissector *flow_dissector,
+   void *target_container, void *data, int nhoff, int hlen)
+{
+   struct flow_dissector_key_keyid *key_keyid;
+   struct mpls_label *hdr, _hdr[2];
+
+   if (!dissector_uses_key(flow_dissector,
+   FLOW_DISSECTOR_KEY_MPLS_ENTROPY))
+   return FLOW_DISSECT_RET_OUT_GOOD;
+
+   hdr = __skb_header_pointer(skb, nhoff, sizeof(_hdr), data,
+  hlen, &_hdr);
+   if (!hdr)
+   return FLOW_DISSECT_RET_OUT_BAD;
+
+   if ((ntohl(hdr[0].entry) & MPLS_LS_LABEL_MASK) >>
+   MPLS_LS_LABEL_SHIFT == MPLS_LABEL_ENTROPY) {
+   key_keyid = skb_flow_dissector_target(flow_dissector,
+ 
FLOW_DISSECTOR_KEY_MPLS_ENTROPY,
+ target_container);
+   key_keyid->keyid = hdr[1].entry & htonl(MPLS_LS_LABEL_MASK);
+   }
+   return FLOW_DISSECT_RET_OUT_GOOD;
+}
+
+static enum flow_dissect_ret
 __skb_flow_dissect_arp(const struct sk_buff *skb,
   struct flow_dissector *flow_dissector,
   void *target_container, void *data, int nhoff, int hlen)
@@ -408,31 +435,16 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
}
 
case htons(ETH_P_MPLS_UC):
-   case htons(ETH_P_MPLS_MC): {
-   struct mpls_label *hdr, _hdr[2];
+   case htons(ETH_P_MPLS_MC):
 mpls:
-   hdr = __skb_header_pointer(skb, nhoff, sizeof(_hdr), data,
-  hlen, &_hdr);
-   if (!hdr)
-   goto out_bad;
-
-   if ((ntohl(hdr[0].entry) & MPLS_LS_LABEL_MASK) >>
-MPLS_LS_LABEL_SHIFT == MPLS_LABEL_ENTROPY) {
-   if (dissector_uses_key(flow_dissector,
-  
FLOW_DISSECTOR_KEY_MPLS_ENTROPY)) {
-   key_keyid = 
skb_flow_dissector_target(flow_dissector,
- 
FLOW_DISSECTOR_KEY_MPLS_ENTROPY,
- 
target_container);
-   key_keyid->keyid = hdr[1].entry &
-   htonl(MPLS_LS_LABEL_MASK);
-   }
-
+   switch (__skb_flow_dissect_mpls(skb, flow_dissector,
+   target_container, data,
+   nhoff, hlen)) {
+   case FLOW_DISSECT_RET_OUT_GOOD:
goto out_good;
+   case FLOW_DISSECT_RET_OUT_BAD:
+   goto out_bad;
}
-
-   goto out_good;
-   }
-
case htons(ETH_P_FCOE):
if ((hlen - nhoff) < FCOE_HEADER_LEN)
goto out_bad;
-- 
2.7.4



[patch net-next 0/5] make flow dissector great again

2017-03-06 Thread Jiri Pirko
From: Jiri Pirko 

This patchset follows up on the discussion about future extensions of the
flow dissector and tries to address the concerns raised there. Some parts
are cut out into sub-functions. Also, the processing of some protocols
(ARP, MPLS) is made dependent on the user actually requiring the dissected
values. This prepares the code for future extensions to dissect IPv6 ND
messages, TCP flags, etc.
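
The general shape of the refactoring, a per-protocol helper that returns a
status enum and skips parsing when nobody asked for the key, can be
sketched in standalone C. This is a simplified model with hypothetical
toy_* names, not the code in net/core/flow_dissector.c; the real helpers
take an skb and a target container.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_dissect_ret {
	TOY_RET_OUT_GOOD,
	TOY_RET_OUT_BAD,
};

#define TOY_KEY_ARP	(1u << 0)	/* hypothetical "user wants ARP" bit */

struct toy_dissector { unsigned int used_keys; };

/* Helper: parse only if the key is requested, report status via the enum. */
static enum toy_dissect_ret
toy_dissect_arp(const struct toy_dissector *d, const unsigned char *data,
		size_t len, unsigned char *op_out)
{
	assert(d && op_out);

	if (!(d->used_keys & TOY_KEY_ARP))
		return TOY_RET_OUT_GOOD;	/* nobody asked: skip the work */

	if (!data || len < 8)			/* fixed ARP header is 8 bytes */
		return TOY_RET_OUT_BAD;

	*op_out = data[7];	/* low byte of the big-endian opcode field */
	return TOY_RET_OUT_GOOD;
}

/* Caller: maps the helper's status onto its own good/bad exits. */
static bool toy_dissect(const struct toy_dissector *d,
			const unsigned char *data, size_t len,
			unsigned char *op_out)
{
	switch (toy_dissect_arp(d, data, len, op_out)) {
	case TOY_RET_OUT_GOOD:
		return true;
	case TOY_RET_OUT_BAD:
	default:
		return false;
	}
}
```

One consequence of checking the key first is that a truncated header is no
longer an error for users that never requested that key.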

Jiri Pirko (5):
  flow_dissector: Move ARP dissection into a separate function
  flow_dissector: Move MPLS dissection into a separate function
  flow_dissector: Fix GRE header error path
  flow_dissector: rename "proto again" goto label
  flow_dissector: Move GRE dissection into a separate function

 net/core/flow_dissector.c | 426 ++
 1 file changed, 238 insertions(+), 188 deletions(-)

-- 
2.7.4



[patch net-next 4/5] flow_dissector: rename "proto again" goto label

2017-03-06 Thread Jiri Pirko
From: Jiri Pirko 

Align with "ip_proto_again" label used in the same function and rename
vague "again" to "proto_again".

Signed-off-by: Jiri Pirko 
---
 net/core/flow_dissector.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index cefaf23..9120835 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -267,7 +267,7 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
memcpy(key_eth_addrs, &eth->h_dest, sizeof(*key_eth_addrs));
}
 
-again:
+proto_again:
switch (proto) {
case htons(ETH_P_IP): {
const struct iphdr *iph;
@@ -370,7 +370,7 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
proto = vlan->h_vlan_encapsulated_proto;
nhoff += sizeof(*vlan);
if (skip_vlan)
-   goto again;
+   goto proto_again;
}
 
skip_vlan = true;
@@ -393,7 +393,7 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
}
}
 
-   goto again;
+   goto proto_again;
}
case htons(ETH_P_PPP_SES): {
struct {
@@ -577,7 +577,7 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
if (flags & FLOW_DISSECTOR_F_STOP_AT_ENCAP)
goto out_good;
 
-   goto again;
+   goto proto_again;
}
case NEXTHDR_HOP:
case NEXTHDR_ROUTING:
-- 
2.7.4



[patch net-next 1/5] flow_dissector: Move ARP dissection into a separate function

2017-03-06 Thread Jiri Pirko
From: Jiri Pirko 

Make the main flow_dissect function a bit smaller and move the ARP
dissection into a separate function. Along with that, do the ARP header
processing only in case the flow dissection user requires it.

Signed-off-by: Jiri Pirko 
---
 net/core/flow_dissector.c | 120 ++
 1 file changed, 67 insertions(+), 53 deletions(-)

diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index c35aae1..d79fb8f 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -113,6 +113,66 @@ __be32 __skb_flow_get_ports(const struct sk_buff *skb, int 
thoff, u8 ip_proto,
 }
 EXPORT_SYMBOL(__skb_flow_get_ports);
 
+enum flow_dissect_ret {
+   FLOW_DISSECT_RET_OUT_GOOD,
+   FLOW_DISSECT_RET_OUT_BAD,
+};
+
+static enum flow_dissect_ret
+__skb_flow_dissect_arp(const struct sk_buff *skb,
+  struct flow_dissector *flow_dissector,
+  void *target_container, void *data, int nhoff, int hlen)
+{
+   struct flow_dissector_key_arp *key_arp;
+   struct {
+   unsigned char ar_sha[ETH_ALEN];
+   unsigned char ar_sip[4];
+   unsigned char ar_tha[ETH_ALEN];
+   unsigned char ar_tip[4];
+   } *arp_eth, _arp_eth;
+   const struct arphdr *arp;
+   struct arphdr *_arp;
+
+   if (!dissector_uses_key(flow_dissector, FLOW_DISSECTOR_KEY_ARP))
+   return FLOW_DISSECT_RET_OUT_GOOD;
+
+   arp = __skb_header_pointer(skb, nhoff, sizeof(_arp), data,
+  hlen, &_arp);
+   if (!arp)
+   return FLOW_DISSECT_RET_OUT_BAD;
+
+   if (arp->ar_hrd != htons(ARPHRD_ETHER) ||
+   arp->ar_pro != htons(ETH_P_IP) ||
+   arp->ar_hln != ETH_ALEN ||
+   arp->ar_pln != 4 ||
+   (arp->ar_op != htons(ARPOP_REPLY) &&
+arp->ar_op != htons(ARPOP_REQUEST)))
+   return FLOW_DISSECT_RET_OUT_BAD;
+
+   arp_eth = __skb_header_pointer(skb, nhoff + sizeof(_arp),
+  sizeof(_arp_eth), data,
+  hlen, &_arp_eth);
+   if (!arp_eth)
+   return FLOW_DISSECT_RET_OUT_BAD;
+
+   key_arp = skb_flow_dissector_target(flow_dissector,
+   FLOW_DISSECTOR_KEY_ARP,
+   target_container);
+
+   memcpy(&key_arp->sip, arp_eth->ar_sip, sizeof(key_arp->sip));
+   memcpy(&key_arp->tip, arp_eth->ar_tip, sizeof(key_arp->tip));
+
+   /* Only store the lower byte of the opcode;
+* this covers ARPOP_REPLY and ARPOP_REQUEST.
+*/
+   key_arp->op = ntohs(arp->ar_op) & 0xff;
+
+   ether_addr_copy(key_arp->sha, arp_eth->ar_sha);
+   ether_addr_copy(key_arp->tha, arp_eth->ar_tha);
+
+   return FLOW_DISSECT_RET_OUT_GOOD;
+}
+
 /**
  * __skb_flow_dissect - extract the flow_keys struct and return it
  * @skb: sk_buff to extract the flow from, can be NULL if the rest are 
specified
@@ -138,7 +198,6 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
struct flow_dissector_key_control *key_control;
struct flow_dissector_key_basic *key_basic;
struct flow_dissector_key_addrs *key_addrs;
-   struct flow_dissector_key_arp *key_arp;
struct flow_dissector_key_ports *key_ports;
struct flow_dissector_key_icmp *key_icmp;
struct flow_dissector_key_tags *key_tags;
@@ -382,60 +441,15 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
goto out_good;
 
case htons(ETH_P_ARP):
-   case htons(ETH_P_RARP): {
-   struct {
-   unsigned char ar_sha[ETH_ALEN];
-   unsigned char ar_sip[4];
-   unsigned char ar_tha[ETH_ALEN];
-   unsigned char ar_tip[4];
-   } *arp_eth, _arp_eth;
-   const struct arphdr *arp;
-   struct arphdr *_arp;
-
-   arp = __skb_header_pointer(skb, nhoff, sizeof(_arp), data,
-  hlen, &_arp);
-   if (!arp)
-   goto out_bad;
-
-   if (arp->ar_hrd != htons(ARPHRD_ETHER) ||
-   arp->ar_pro != htons(ETH_P_IP) ||
-   arp->ar_hln != ETH_ALEN ||
-   arp->ar_pln != 4 ||
-   (arp->ar_op != htons(ARPOP_REPLY) &&
-arp->ar_op != htons(ARPOP_REQUEST)))
-   goto out_bad;
-
-   arp_eth = __skb_header_pointer(skb, nhoff + sizeof(_arp),
-  sizeof(_arp_eth), data,
-  hlen,
-  &_arp_eth);
-   if (!arp_eth)
+   case htons(ETH_P_RARP):
+   switch (__skb_flow_dissect_arp(skb, flow_dissector,
+

[patch net-next 3/5] flow_dissector: Fix GRE header error path

2017-03-06 Thread Jiri Pirko
From: Jiri Pirko 

Currently, when an unexpected element appears in the GRE header, we break
so that the L4 ports are processed. But since the ports are then processed
unconditionally, random values will certainly be dissected. Fix this by
just bailing out in such situations.
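
The bail-out conditions can be modelled in standalone C as below. This is a
sketch under assumptions: flags and protocol are taken in host byte order,
the TOY_GRE_* values are believed to correspond to the kernel's GRE flag
bits, the function name is hypothetical, and the version 1 ack field and
PPP header handling are left out.

```c
#include <assert.h>
#include <stdint.h>

/* GRE flag bits, host byte order (assumed to mirror the kernel's values). */
#define TOY_GRE_CSUM		0x8000u
#define TOY_GRE_ROUTING		0x4000u
#define TOY_GRE_KEY		0x2000u
#define TOY_GRE_SEQ		0x1000u
#define TOY_GRE_VERSION		0x0007u
#define TOY_GRE_PROTO_PPP	0x880bu

/*
 * Return the offset of the data following the base header and the optional
 * checksum/key/sequence fields, or -1 when dissection should stop at GRE.
 */
static int toy_gre_payload_offset(uint16_t flags, uint16_t proto)
{
	unsigned int ver = flags & TOY_GRE_VERSION;
	int offset = 4;			/* base header: flags + protocol */

	if (flags & TOY_GRE_ROUTING)	/* only look inside GRE without routing */
		return -1;
	if (ver > 1)			/* only versions 0 and 1 */
		return -1;
	if (ver == 1 &&
	    !(proto == TOY_GRE_PROTO_PPP && (flags & TOY_GRE_KEY)))
		return -1;		/* version 1 must be PPTP */

	if (flags & TOY_GRE_CSUM)
		offset += 4;		/* checksum + reserved1 */
	if (flags & TOY_GRE_KEY)
		offset += 4;
	if (flags & TOY_GRE_SEQ)
		offset += 4;

	assert(offset >= 4 && offset <= 16);
	return offset;
}
```

Each -1 return corresponds to one of the three places where the patch
replaces "break" with "goto out_good", so nothing after the GRE header is
interpreted as ports.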

Signed-off-by: Jiri Pirko 
---
 net/core/flow_dissector.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 8d01298..cefaf23 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -479,18 +479,18 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
 
/* Only look inside GRE without routing */
if (hdr->flags & GRE_ROUTING)
-   break;
+   goto out_good;
 
/* Only look inside GRE for version 0 and 1 */
gre_ver = ntohs(hdr->flags & GRE_VERSION);
if (gre_ver > 1)
-   break;
+   goto out_good;
 
proto = hdr->protocol;
if (gre_ver) {
/* Version1 must be PPTP, and check the flags */
if (!(proto == GRE_PROTO_PPP && (hdr->flags & GRE_KEY)))
-   break;
+   goto out_good;
}
 
offset += sizeof(struct gre_base_hdr);
-- 
2.7.4



[patch net-next 5/5] flow_dissector: Move GRE dissection into a separate function

2017-03-06 Thread Jiri Pirko
From: Jiri Pirko 

Make the main flow_dissect function a bit smaller and move the GRE
dissection into a separate function.

Signed-off-by: Jiri Pirko 
---
 net/core/flow_dissector.c | 244 +-
 1 file changed, 134 insertions(+), 110 deletions(-)

diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 9120835..5f3ae92 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -116,6 +116,7 @@ EXPORT_SYMBOL(__skb_flow_get_ports);
 enum flow_dissect_ret {
FLOW_DISSECT_RET_OUT_GOOD,
FLOW_DISSECT_RET_OUT_BAD,
+   FLOW_DISSECT_RET_OUT_PROTO_AGAIN,
 };
 
 static enum flow_dissect_ret
@@ -200,6 +201,128 @@ __skb_flow_dissect_arp(const struct sk_buff *skb,
return FLOW_DISSECT_RET_OUT_GOOD;
 }
 
+static enum flow_dissect_ret
+__skb_flow_dissect_gre(const struct sk_buff *skb,
+  struct flow_dissector_key_control *key_control,
+  struct flow_dissector *flow_dissector,
+  void *target_container, void *data,
+  __be16 *p_proto, int *p_nhoff, int *p_hlen,
+  unsigned int flags)
+{
+   struct flow_dissector_key_keyid *key_keyid;
+   struct gre_base_hdr *hdr, _hdr;
+   int offset = 0;
+   u16 gre_ver;
+
+   hdr = __skb_header_pointer(skb, *p_nhoff, sizeof(_hdr),
+  data, *p_hlen, &_hdr);
+   if (!hdr)
+   return FLOW_DISSECT_RET_OUT_BAD;
+
+   /* Only look inside GRE without routing */
+   if (hdr->flags & GRE_ROUTING)
+   return FLOW_DISSECT_RET_OUT_GOOD;
+
+   /* Only look inside GRE for version 0 and 1 */
+   gre_ver = ntohs(hdr->flags & GRE_VERSION);
+   if (gre_ver > 1)
+   return FLOW_DISSECT_RET_OUT_GOOD;
+
+   *p_proto = hdr->protocol;
+   if (gre_ver) {
+   /* Version1 must be PPTP, and check the flags */
+   if (!(*p_proto == GRE_PROTO_PPP && (hdr->flags & GRE_KEY)))
+   return FLOW_DISSECT_RET_OUT_GOOD;
+   }
+
+   offset += sizeof(struct gre_base_hdr);
+
+   if (hdr->flags & GRE_CSUM)
+   offset += sizeof(((struct gre_full_hdr *) 0)->csum) +
+ sizeof(((struct gre_full_hdr *) 0)->reserved1);
+
+   if (hdr->flags & GRE_KEY) {
+   const __be32 *keyid;
+   __be32 _keyid;
+
+   keyid = __skb_header_pointer(skb, *p_nhoff + offset,
+sizeof(_keyid),
+data, *p_hlen, &_keyid);
+   if (!keyid)
+   return FLOW_DISSECT_RET_OUT_BAD;
+
+   if (dissector_uses_key(flow_dissector,
+  FLOW_DISSECTOR_KEY_GRE_KEYID)) {
+   key_keyid = skb_flow_dissector_target(flow_dissector,
+ 
FLOW_DISSECTOR_KEY_GRE_KEYID,
+ target_container);
+   if (gre_ver == 0)
+   key_keyid->keyid = *keyid;
+   else
+   key_keyid->keyid = *keyid & GRE_PPTP_KEY_MASK;
+   }
+   offset += sizeof(((struct gre_full_hdr *) 0)->key);
+   }
+
+   if (hdr->flags & GRE_SEQ)
+   offset += sizeof(((struct pptp_gre_header *) 0)->seq);
+
+   if (gre_ver == 0) {
+   if (*p_proto == htons(ETH_P_TEB)) {
+   const struct ethhdr *eth;
+   struct ethhdr _eth;
+
+   eth = __skb_header_pointer(skb, *p_nhoff + offset,
+  sizeof(_eth),
+  data, *p_hlen, &_eth);
+   if (!eth)
+   return FLOW_DISSECT_RET_OUT_BAD;
+   *p_proto = eth->h_proto;
+   offset += sizeof(*eth);
+
+   /* Cap headers that we access via pointers at the
+* end of the Ethernet header as our maximum alignment
+* at that point is only 2 bytes.
+*/
+   if (NET_IP_ALIGN)
+   *p_hlen = *p_nhoff + offset;
+   }
+   } else { /* version 1, must be PPTP */
+   u8 _ppp_hdr[PPP_HDRLEN];
+   u8 *ppp_hdr;
+
+   if (hdr->flags & GRE_ACK)
+   offset += sizeof(((struct pptp_gre_header *) 0)->ack);
+
+   ppp_hdr = __skb_header_pointer(skb, *p_nhoff + offset,
+  sizeof(_ppp_hdr),
+  data, *p_hlen, _ppp_hdr);
+   if (!ppp_hdr)
+   return FLOW_DISSECT_RET_

Re: [patch net-next RFC 1/2] flow_dissector: Move ARP dissection into a separate function

2017-03-06 Thread Jiri Pirko
Tue, Feb 21, 2017 at 07:50:53PM CET, t...@herbertland.com wrote:
>On Tue, Feb 21, 2017 at 6:33 AM, Jiri Pirko  wrote:
>> From: Jiri Pirko 
>>
>> Make the main flow_dissect function a bit smaller and move the ARP
>> dissection into a separate function. Along with that, do the ARP header
>> processing only in case the flow dissection user requires it.
>>
>
>Acked-by: Tom Herbert 
>
>GRE might also be a good candidate to get its own function.
>

Submitted with GRE bits. Note that I left your ack and Simon's Reviewed-by
out, since I made some cosmetic changes since the RFC.

I would be glad if you both can check it again.

Thanks!


>
>> Signed-off-by: Jiri Pirko 
>> ---
>>  net/core/flow_dissector.c | 111 
>> --
>>  1 file changed, 59 insertions(+), 52 deletions(-)
>>
>> diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
>> index c35aae1..10dc5bb 100644
>> --- a/net/core/flow_dissector.c
>> +++ b/net/core/flow_dissector.c
>> @@ -113,6 +113,61 @@ __be32 __skb_flow_get_ports(const struct sk_buff *skb, 
>> int thoff, u8 ip_proto,
>>  }
>>  EXPORT_SYMBOL(__skb_flow_get_ports);
>>
>> +static bool __skb_flow_dissect_arp(const struct sk_buff *skb,
>> +  struct flow_dissector *flow_dissector,
>> +  void *target_container, void *data,
>> +  int nhoff, int hlen)
>> +{
>> +   struct flow_dissector_key_arp *key_arp;
>> +   struct {
>> +   unsigned char ar_sha[ETH_ALEN];
>> +   unsigned char ar_sip[4];
>> +   unsigned char ar_tha[ETH_ALEN];
>> +   unsigned char ar_tip[4];
>> +   } *arp_eth, _arp_eth;
>> +   const struct arphdr *arp;
>> +   struct arphdr *_arp;
>> +
>> +   if (!dissector_uses_key(flow_dissector, FLOW_DISSECTOR_KEY_ARP))
>> +   return true;
>> +
>> +   arp = __skb_header_pointer(skb, nhoff, sizeof(_arp), data,
>> +  hlen, &_arp);
>> +   if (!arp)
>> +   return false;
>> +
>> +   if (arp->ar_hrd != htons(ARPHRD_ETHER) ||
>> +   arp->ar_pro != htons(ETH_P_IP) ||
>> +   arp->ar_hln != ETH_ALEN ||
>> +   arp->ar_pln != 4 ||
>> +   (arp->ar_op != htons(ARPOP_REPLY) &&
>> +arp->ar_op != htons(ARPOP_REQUEST)))
>> +   return false;
>> +
>> +   arp_eth = __skb_header_pointer(skb, nhoff + sizeof(_arp),
>> +  sizeof(_arp_eth), data,
>> +  hlen, &_arp_eth);
>> +   if (!arp_eth)
>> +   return false;
>> +
>> +   key_arp = skb_flow_dissector_target(flow_dissector,
>> +   FLOW_DISSECTOR_KEY_ARP,
>> +   target_container);
>> +
>> +   memcpy(&key_arp->sip, arp_eth->ar_sip, sizeof(key_arp->sip));
>> +   memcpy(&key_arp->tip, arp_eth->ar_tip, sizeof(key_arp->tip));
>> +
>> +   /* Only store the lower byte of the opcode;
>> +* this covers ARPOP_REPLY and ARPOP_REQUEST.
>> +*/
>> +   key_arp->op = ntohs(arp->ar_op) & 0xff;
>> +
>> +   ether_addr_copy(key_arp->sha, arp_eth->ar_sha);
>> +   ether_addr_copy(key_arp->tha, arp_eth->ar_tha);
>> +
>> +   return true;
>> +}
>> +
>>  /**
>>   * __skb_flow_dissect - extract the flow_keys struct and return it
>>   * @skb: sk_buff to extract the flow from, can be NULL if the rest are 
>> specified
>> @@ -138,7 +193,6 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
>> struct flow_dissector_key_control *key_control;
>> struct flow_dissector_key_basic *key_basic;
>> struct flow_dissector_key_addrs *key_addrs;
>> -   struct flow_dissector_key_arp *key_arp;
>> struct flow_dissector_key_ports *key_ports;
>> struct flow_dissector_key_icmp *key_icmp;
>> struct flow_dissector_key_tags *key_tags;
>> @@ -382,59 +436,12 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
>> goto out_good;
>>
>> case htons(ETH_P_ARP):
>> -   case htons(ETH_P_RARP): {
>> -   struct {
>> -   unsigned char ar_sha[ETH_ALEN];
>> -   unsigned char ar_sip[4];
>> -   unsigned char ar_tha[ETH_ALEN];
>> -   unsigned char ar_tip[4];
>> -   } *arp_eth, _arp_eth;
>> -   const struct arphdr *arp;
>> -   struct arphdr *_arp;
>> -
>> -   arp = __skb_header_pointer(skb, nhoff, sizeof(_arp), data,
>> -  hlen, &_arp);
>> -   if (!arp)
>> -   goto out_bad;
>> -
>> -   if (arp->ar_hrd != htons(ARPHRD_ETHER) ||
>> -   arp->ar_pro != htons(ETH_P_IP) ||
>> -   arp->ar_hln != ETH_ALEN ||
>> -   arp->ar_pln != 4 ||
>> -   (arp->



Re: [PATCH net-next] net: ipv4: add support for ECMP hash policy choice

2017-03-06 Thread David Ahern
On 3/6/17 7:59 AM, Nikolay Aleksandrov wrote:
> diff --git a/include/net/route.h b/include/net/route.h
> index c0874c87c173..77a5c613a290 100644
> --- a/include/net/route.h
> +++ b/include/net/route.h
> @@ -113,13 +113,12 @@ struct in_device;
>  int ip_rt_init(void);
>  void rt_cache_flush(struct net *net);
>  void rt_flush_dev(struct net_device *dev);
> -struct rtable *__ip_route_output_key_hash(struct net *, struct flowi4 *flp,
> -   int mp_hash);
> +struct rtable *__ip_route_output_key_hash(struct net *net, struct flowi4 
> *flp);
>  
>  static inline struct rtable *__ip_route_output_key(struct net *net,
>  struct flowi4 *flp)
>  {
> - return __ip_route_output_key_hash(net, flp, -1);
> + return __ip_route_output_key_hash(net, flp);
>  }
>  
>  struct rtable *ip_route_output_flow(struct net *, struct flowi4 *flp,

The "_hash" variant was added by 79a131592dbb8. If the mp_hash arg is
removed, the "_hash" wrapper should be removed and go back to
__ip_route_output_key.


Re: [4.9.13] use after free in ipv4_mtu

2017-03-06 Thread Eric Dumazet
On Mon, 2017-03-06 at 05:45 -0800, Eric Dumazet wrote:
> On Mon, 2017-03-06 at 14:33 +0800, Daniel J Blueman wrote:

> > I do change the network queueing discipline and related at runtime [1]
> > which may be triggering this, though I did think I saw the KASAN
> > report only after resuming from suspend. rf(un)kill and other tweaking
> > may have been involved too.
> > 
> > Thanks,
> >   Dan
> > 
> > [1] /etc/sysctl.d/90-tcp.conf
> > 
> > net.core.default_qdisc = fq_codel
> > net.ipv4.tcp_congestion_control = bbr
> > net.ipv4.tcp_slow_start_after_idle = 0
> > net.ipv4.tcp_ecn = 1

BTW, fq_codel is not suitable for BBR.

Only fq contains the needed pacing for BBR.





Re: [PATCH 11/29] drivers, media: convert cx88_core.refcount from atomic_t to refcount_t

2017-03-06 Thread Sergei Shtylyov

Hello.

On 03/06/2017 05:20 PM, Elena Reshetova wrote:


refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova 
Signed-off-by: Hans Liljestrand 
Signed-off-by: Kees Cook 
Signed-off-by: David Windsor 

[...]

diff --git a/drivers/media/pci/cx88/cx88.h b/drivers/media/pci/cx88/cx88.h
index 115414c..16c1313 100644
--- a/drivers/media/pci/cx88/cx88.h
+++ b/drivers/media/pci/cx88/cx88.h
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 

 #include 
 #include 
@@ -339,7 +340,7 @@ struct cx8802_dev;

 struct cx88_core {
struct list_head   devlist;
-   atomic_t   refcount;
+   refcount_t   refcount;


   Could you please keep the name aligned with above and below?



/* board name */
int                nr;



MBR, Sergei
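
For readers following the motivation quoted above, here is a minimal
userspace sketch of why a dedicated reference-count type helps, assuming
C11 atomics. The names sat_ref, sat_ref_set, sat_ref_inc_not_zero and
sat_ref_dec_and_test are hypothetical, and this is not the kernel's
refcount_t implementation. It only illustrates the general idea of a
counter that saturates instead of wrapping back to zero.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model; NOT the kernel's refcount_t. */
struct sat_ref {
	atomic_uint count;
};

static void sat_ref_set(struct sat_ref *r, unsigned int n)
{
	atomic_store(&r->count, n);
}

/* Take a reference unless the object is already dead (count == 0). */
static bool sat_ref_inc_not_zero(struct sat_ref *r)
{
	unsigned int old = atomic_load(&r->count);

	do {
		if (old == 0)
			return false;	/* object already released */
		if (old == UINT_MAX)
			return true;	/* saturated: stay pinned, never wrap */
	} while (!atomic_compare_exchange_weak(&r->count, &old, old + 1));

	return true;
}

/* Drop a reference; returns true only when the last one was dropped. */
static bool sat_ref_dec_and_test(struct sat_ref *r)
{
	unsigned int old = atomic_load(&r->count);

	do {
		if (old == UINT_MAX)
			return false;	/* saturated: leak rather than free */
		if (old == 0)
			return false;	/* underflow: refuse to go negative */
	} while (!atomic_compare_exchange_weak(&r->count, &old, old - 1));

	return old == 1;
}
```

With a plain wrapping counter, enough increments would bring the value
back to zero and a later put could free an object that is still in use.
In this sketch the saturated counter trades that use-after-free for a
memory leak.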



Re: [PATCH 1/4] net: thunderx: Fix IOMMU translation faults

2017-03-06 Thread Robin Murphy
On 06/03/17 12:57, Sunil Kovvuri wrote:
>>>
>>> We are seeing a 0.07Mpps drop with IP forwarding rate due to that.
>>> Hence I have restricted calling DMA interfaces to only when IOMMU is 
>>> enabled.
>>
>> What's 0.07Mpps as a percentage of baseline? On a correctly configured
>> coherent arm64 system, in the absence of an IOMMU, dma_map_*() is
>> essentially just virt_to_phys() behind a function call or two, so I'd be
>> interested to know where any non-trivial overhead might be coming from.
> 
> It's a 5% drop and yes device is configured as coherent.
> And the drop is due to additional function calls.

OK, interesting - sounds like there's potential for some optimisation
there as well. AFAICS the callchain goes:

dma_map_single_attrs (inline)
- ops->map_page (__swiotlb_map_page)
  - swiotlb_map_page
- phys_to_dma (inline)
- dma_capable (inline)

Do you happen to have a breakdown of where the time goes? If it's mostly
just in the indirect branch our options are limited (I'm guessing
ThunderX doesn't have a particularly fancy branch predictor, if it's not
even got a data prefetcher), but if it's in the SWIOTLB code then
there's certainly room for improvement (which will hopefully tie in with
some DMA ops work I'm planning to do soon anyway).

Thanks,
Robin.
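
The shape of the call chain above can be modelled in userspace roughly
as follows. This is only a sketch under stated assumptions: the toy_*
names are hypothetical, the identity translation stands in for what
virt_to_phys()/phys_to_dma() would do on a coherent system without an
IOMMU, and none of this is the real arm64 DMA code. It shows an inline
wrapper that still pays for one indirect call through an ops table.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t toy_dma_addr_t;

/* Hypothetical stand-in for a DMA ops table with a single hook. */
struct toy_dma_ops {
	toy_dma_addr_t (*map_page)(const void *cpu_addr, size_t size);
};

/* Coherent, no-IOMMU case: the "mapping" is a plain address translation. */
static toy_dma_addr_t toy_direct_map(const void *cpu_addr, size_t size)
{
	(void)size;	/* no bounce buffering or cache maintenance modelled */
	return (toy_dma_addr_t)(uintptr_t)cpu_addr;
}

static const struct toy_dma_ops toy_direct_ops = {
	.map_page = toy_direct_map,
};

/* The inline wrapper disappears at compile time; the indirect call does not. */
static inline toy_dma_addr_t toy_dma_map_single(const struct toy_dma_ops *ops,
						const void *cpu_addr,
						size_t size)
{
	return ops->map_page(cpu_addr, size);
}
```

Even when the hook body is trivial, the compiler generally cannot inline
through the function pointer, which is why the cost of the indirect
branch is singled out above.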

> 
> Thanks,
> Sunil.
> 



Re: [PATCH 10/26] brcmsmac: reindent split functions

2017-03-06 Thread Kalle Valo
Arend Van Spriel  writes:

> On 2-3-2017 17:38, Arnd Bergmann wrote:
>> In the previous commit I left the indentation alone to help reviewing
>> the patch, this one now runs the three new functions through 'indent -kr -8'
>> with some manual fixups to avoid silliness.
>> 
>> No changes other than whitespace are intended here.
>
> Acked-by: Arend van Spriel 
>> Signed-off-by: Arnd Bergmann 
>> ---
>>  .../broadcom/brcm80211/brcmsmac/phy/phy_n.c| 1507 
>> +---
>>  1 file changed, 697 insertions(+), 810 deletions(-)
>> 

Arend, please edit your quotes. Leaving 1000 lines of unnecessary quotes
in your reply makes my use of patchwork horrible:

https://patchwork.kernel.org/patch/9601155/

-- 
Kalle Valo


Re: [PATCH 08/26] brcmsmac: make some local variables 'static const' to reduce stack size

2017-03-06 Thread Kalle Valo
Arend Van Spriel  writes:

> On 2-3-2017 17:38, Arnd Bergmann wrote:
>> With KASAN and a couple of other patches applied, this driver is one
>> of the few remaining ones that actually use more than 2048 bytes of
>> kernel stack:
>> 
>> broadcom/brcm80211/brcmsmac/phy/phy_n.c: In function 
>> 'wlc_phy_workarounds_nphy_gainctrl':
>> broadcom/brcm80211/brcmsmac/phy/phy_n.c:16065:1: warning: the frame size of 
>> 3264 bytes is larger than 2048 bytes [-Wframe-larger-than=]
>> broadcom/brcm80211/brcmsmac/phy/phy_n.c: In function 
>> 'wlc_phy_workarounds_nphy':
>> broadcom/brcm80211/brcmsmac/phy/phy_n.c:17138:1: warning: the frame size of 
>> 2864 bytes is larger than 2048 bytes [-Wframe-larger-than=]
>> 
>> Here, I'm reducing the stack size by marking as many local variables as
>> 'static const' as I can without changing the actual code.
>
> Acked-by: Arend van Spriel 

Arnd, via which tree are you planning to submit these? I'm not sure
what I should do with the wireless drivers patches from this series.

-- 
Kalle Valo
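
As an illustration of the 'static const' technique described in the
quoted commit message, a small standalone sketch follows. The function
names and table values are made up for the example and are not taken
from phy_n.c. Whether the first variant really lands on the stack
depends on the compiler and options, so treat this as a description of
the intent rather than a guarantee.

```c
#include <stddef.h>
#include <stdint.h>

/* Automatic storage: the table may be rebuilt in the stack frame per call. */
static uint32_t sum_gain_table_auto(void)
{
	const uint16_t gains[8] = {
		0x10, 0x20, 0x30, 0x40, 0x50, 0x60, 0x70, 0x80
	};
	uint32_t sum = 0;

	for (size_t i = 0; i < sizeof(gains) / sizeof(gains[0]); i++)
		sum += gains[i];
	return sum;
}

/* Static storage: one read-only copy, no per-call stack usage for it. */
static uint32_t sum_gain_table_static(void)
{
	static const uint16_t gains[8] = {
		0x10, 0x20, 0x30, 0x40, 0x50, 0x60, 0x70, 0x80
	};
	uint32_t sum = 0;

	for (size_t i = 0; i < sizeof(gains) / sizeof(gains[0]); i++)
		sum += gains[i];
	return sum;
}
```

Both functions compute the same result. The only intended difference is
where the table lives, which matters when a function carries many such
tables and the frame size is checked with -Wframe-larger-than=.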


Re: [PATCH net-next] net: ipv4: add support for ECMP hash policy choice

2017-03-06 Thread Nikolay Aleksandrov
On 06/03/17 18:24, David Ahern wrote:
> On 3/6/17 7:59 AM, Nikolay Aleksandrov wrote:
>> diff --git a/include/net/route.h b/include/net/route.h
>> index c0874c87c173..77a5c613a290 100644
>> --- a/include/net/route.h
>> +++ b/include/net/route.h
>> @@ -113,13 +113,12 @@ struct in_device;
>>  int ip_rt_init(void);
>>  void rt_cache_flush(struct net *net);
>>  void rt_flush_dev(struct net_device *dev);
>> -struct rtable *__ip_route_output_key_hash(struct net *, struct flowi4 *flp,
>> -  int mp_hash);
>> +struct rtable *__ip_route_output_key_hash(struct net *net, struct flowi4 
>> *flp);
>>  
>>  static inline struct rtable *__ip_route_output_key(struct net *net,
>> struct flowi4 *flp)
>>  {
>> -return __ip_route_output_key_hash(net, flp, -1);
>> +return __ip_route_output_key_hash(net, flp);
>>  }
>>  
>>  struct rtable *ip_route_output_flow(struct net *, struct flowi4 *flp,
> 
> The "_hash" variant was added by 79a131592dbb8. If the mp_hash arg is
> removed, the "_hash" wrapper should be removed and go back to
> __ip_route_output_key.
> 

Ah yes, I've missed that. :-) Will remove the _hash variant when posting v2.

Thanks,
 Nik



Re: [PATCH] bridge: Add support for IEEE 802.11 Proxy ARP for IPv6

2017-03-06 Thread Jouni Malinen
On Fri, Feb 24, 2017 at 11:55:37AM -0800, Stephen Hemminger wrote:
> The concept is fine.

Thanks for taking a look.

> Please add some comments to the code about what is happening and why.
> The proposed patch is too sparse and has no comments.

Sure, will do that for the next version.

> > +   skb = alloc_skb(hlen + sizeof(struct ipv6hdr) + sizeof(*msg) +
> > +   ndisc_opt_addr_space(dev,
> > +NDISC_NEIGHBOUR_ADVERTISEMENT) +
> > +   tlen, GFP_ATOMIC);
> > +   if (!skb)
> > +   return;
> 
> Why not netdev_alloc_skb which takes care of padding and setting skb->dev? 

This implementation in br_ndisc_send_na() was trying to follow
ndisc_send_na() design for the operations. If this function remains
(see below), I can clean this up further.

> Rather than doing copy/paste of the code to generate a ND message, it would
> be better to have one function in IPv6 code that handles that. That would keep
> from having to fix code in two places in the future. Is there some way
> to extend ndisc_send_na?

That was the original plan and adding the target_lladdr part would be
straightforward. The part that gets complex is in figuring out how to
use a foreign link layer source address (the MAC address on behalf of
which the local device is replying) in the outgoing NA when using the
IEEE 802.11/Hotspot 2.0 design.

ndisc_send_na() uses the full IPv6 stack for building the frame when
calling ndisc_send_skb(). dst_output() ends up sending this through
ip6_output(), I'd assume, and after building the IPv6 header, the local
MAC address of the outgoing interface gets assigned to the Ethernet
header. I'm not sure how to override that functionality in any clean
way. The dev_hard_header() call in the mostly copy-pasted version in
br_ndisc_send_na() followed by use of the custom
br_ndisc_send_na_finish() to call dev_queue_xmit(skb) was done to allow
the link layer source address to be modified.

The normal path in the net stack seemed to use dev_hard_header() with
saddr = NULL which maps to eth_header() saddr = NULL case to use device
source address. Either those would need to be somehow modified for this
special skb containing the NA with different source address requirement
or something after these calls would need to modify the frame to change
the source address.

Would you happen to know any convenient means for modifying the IPv6
stack behavior for ndisc_send_skb() cases conditionally to allow the
link layer source address to be modified while still being able to use
the existing IPv6 header and the Ethernet header construction function?

-- 
Jouni Malinen                                            PGP id EFC895FA
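
The saddr = NULL fallback discussed above can be modelled with a small
userspace sketch. The toy_* types and toy_eth_header() are hypothetical
simplifications, not the kernel's eth_header(), and byte-order
conversion of the protocol field is deliberately left out. The point is
only that a NULL source address selects the device's own MAC, so a
caller that needs a foreign source address has to pass it explicitly.

```c
#include <stdint.h>
#include <string.h>

#define TOY_ETH_ALEN 6

/* Hypothetical, trimmed-down device and Ethernet header. */
struct toy_netdev {
	uint8_t dev_addr[TOY_ETH_ALEN];
};

struct toy_ethhdr {
	uint8_t h_dest[TOY_ETH_ALEN];
	uint8_t h_source[TOY_ETH_ALEN];
	uint16_t h_proto;	/* kept in host order in this sketch */
};

static void toy_eth_header(struct toy_ethhdr *eth,
			   const struct toy_netdev *dev, uint16_t proto,
			   const uint8_t *daddr, const uint8_t *saddr)
{
	/* No explicit source given: fall back to the device address. */
	if (!saddr)
		saddr = dev->dev_addr;

	memcpy(eth->h_source, saddr, TOY_ETH_ALEN);
	memcpy(eth->h_dest, daddr, TOY_ETH_ALEN);
	eth->h_proto = proto;
}
```

In this model, a proxy reply would call toy_eth_header() with the MAC
address it is answering on behalf of as saddr, while the normal stack
path corresponds to passing NULL.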


Re: [PATCH] netfilter: remove redundant check on ret being non-zero

2017-03-06 Thread Pablo Neira Ayuso
On Tue, Feb 28, 2017 at 11:31:15AM +, Colin King wrote:
> From: Colin Ian King 
> 
> ret is initialized to zero and if it is set to non-zero in the
> xt_entry_foreach loop then we exit via the out_free label. Hence
> the check for ret being non-zero is redundant and can be removed.
> 
> Detected by CoverityScan, CID#1357132 ("Logically Dead Code")

Applied, thanks.


Re: [PATCH] netfilter: Use pr_cont where appropriate

2017-03-06 Thread Pablo Neira Ayuso
On Tue, Feb 28, 2017 at 02:09:24PM -0800, Joe Perches wrote:
> Logging output was changed when simple printks without KERN_CONT
> are now emitted on a new line and KERN_CONT is required to continue
> lines so use pr_cont.
> 
> Miscellanea:
> 
> o realign arguments
> o use print_hex_dump instead of a local variant

Applied, thanks Joe.


Re: [PATCH net-next RFC 3/4] vhost: interrupt coalescing support

2017-03-06 Thread Willem de Bruijn
On Mon, Mar 6, 2017 at 4:28 AM, Jason Wang  wrote:
>
>
> On 2017-03-03 22:39, Willem de Bruijn wrote:
>>
>> +void vhost_signal(struct vhost_dev *dev, struct vhost_virtqueue *vq);
>> +static enum hrtimer_restart vhost_coalesce_timer(struct hrtimer *timer)
>> +{
>> +   struct vhost_virtqueue *vq =
>> +   container_of(timer, struct vhost_virtqueue, ctimer);
>> +
>> +   if (mutex_trylock(&vq->mutex)) {
>> +   vq->coalesce_frames = vq->max_coalesce_frames;
>> +   vhost_signal(vq->dev, vq);
>> +   mutex_unlock(&vq->mutex);
>> +   }
>> +
>> +   /* TODO: restart if lock failed and not held by handle_tx */
>> +   return HRTIMER_NORESTART;
>> +}
>> +
>
>
> Then we may lose an interrupt forever if there is no new tx request? I
> believe we need e.g. vhost_poll_queue() here.

Absolutely, I need to fix this. The common case for failing to grab
the lock is competition with handle_tx. With careful coding we can
probably avoid scheduling another run with vhost_poll_queue in
the common case.

Your patch v7 cancels the pending hrtimer at the start of handle_tx.
I need to reintroduce that, and also only schedule a timer at the end
of handle_tx, not immediately when vq->coalesce_frames becomes
non-zero.
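
One possible shape for the "do not lose the interrupt when trylock
fails" fix discussed above, as a userspace sketch only. All toy_* names
are hypothetical, an atomic_flag stands in for vq->mutex, and the
retry_pending flag plus toy_worker_run() stand in for whatever deferred
mechanism (for example vhost_poll_queue()) the real patch would use.
This is not the vhost code.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct toy_vq {
	atomic_flag lock;		/* stand-in for vq->mutex */
	atomic_bool retry_pending;	/* set when the timer lost the race */
	unsigned int signalled;		/* number of guest notifications sent */
};

static void toy_vq_init(struct toy_vq *vq)
{
	atomic_flag_clear(&vq->lock);
	atomic_init(&vq->retry_pending, false);
	vq->signalled = 0;
}

static bool toy_vq_trylock(struct toy_vq *vq)
{
	return !atomic_flag_test_and_set(&vq->lock);
}

static void toy_vq_unlock(struct toy_vq *vq)
{
	atomic_flag_clear(&vq->lock);
}

/* Timer callback: signal if the lock is free, otherwise ask for a retry. */
static bool toy_coalesce_timer_fire(struct toy_vq *vq)
{
	if (!toy_vq_trylock(vq)) {
		atomic_store(&vq->retry_pending, true);
		return false;
	}
	vq->signalled++;
	toy_vq_unlock(vq);
	return true;
}

/* Deferred worker: runs later and redoes the signal if one was missed. */
static bool toy_worker_run(struct toy_vq *vq)
{
	if (!atomic_exchange(&vq->retry_pending, false))
		return false;
	return toy_coalesce_timer_fire(vq);
}
```

The sketch always records a retry when the lock is contended. Avoiding
that extra work in the common case, where the lock holder is handle_tx
and will signal anyway, is the part that needs the careful coding
mentioned above.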


Re: net: heap out-of-bounds in fib6_clean_node/rt6_fill_node/fib6_age/fib6_prune_clone

2017-03-06 Thread David Ahern
On 3/4/17 1:15 PM, Eric Dumazet wrote:
> On Sat, 2017-03-04 at 19:57 +0100, Dmitry Vyukov wrote:
>> On Fri, Mar 3, 2017 at 8:12 PM, David Ahern  wrote:
>>> On 3/3/17 6:39 AM, Dmitry Vyukov wrote:
>>>> I am getting heap out-of-bounds reports in
>>>> fib6_clean_node/rt6_fill_node/fib6_age/fib6_prune_clone while running
>>>> syzkaller fuzzer on 86292b33d4b79ee03e2f43ea0381ef85f077c760. They all
>>>> follow the same pattern: an object of size 216 is allocated from
>>>> ip_dst_cache slab, and then accessed at offset 272/276 within
>>>> fib6_walk. Looks like type confusion. Unfortunately this is not
>>>> reproducible.
>>>
>>> I'll take a look this weekend or Monday at the latest.
>>
>>
>> I've got some additional useful info on this. I think this is
>> use-after-free rather than out-of-bounds. I've collected stack where
>> the route was disposed with call_rcu, see the last "Disposed" stack.
>> The crash happens when cmpxchg in rt_cache_route replaces an existing
>> route. And that route seems to have some existing pointers to it
>> (rt->dst.rt6_next) which fib6_walk uses to get to it after its
>> deletion.
> 
> rt_cache_route() deals with IPv4 routes.
> 
> We somehow mix IPv4 and IPv6 dsts in IPv6 tree.
> 
> We need to add type safety at IPV6 route insertions to catch the
> offender.
> 

I've seen something like this before -- a rt was on the gc list but
still linked in the tables because of some reference.

Dmitry: you seem to have reproduced this a few times. Can you share how
to run whatever tests you are using?

