date:20150824

pull-request: can 2015-08-25

2015-08-24 Thread Marc Kleine-Budde

Hello David,

this is the updated pull request of one patch by me for the peak_usb driver. It
fixes the driver, so that non FD adapters don't provide CAN FD bittimings.

regards,
Marc

---

The following changes since commit b6df7d61c8776a882dd47ba4714d1445dd7ef2d9:

  net: bcmgenet: fix uncleaned dma flags (2015-08-23 23:00:41 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can.git tags/linux-can-fixes-for-4.2-20150825

for you to fetch changes up to 06b23f7fbbf26a025fd68395c7586949db586b47:

  can: pcan_usb: don't provide CAN FD bittimings by non-FD adapters (2015-08-25 08:50:00 +0200)


linux-can-fixes-for-4.2-20150825


Marc Kleine-Budde (1):
  can: pcan_usb: don't provide CAN FD bittimings by non-FD adapters

 drivers/net/can/usb/peak_usb/pcan_usb.c  | 24 +++
 drivers/net/can/usb/peak_usb/pcan_usb_core.c |  4 +-
 drivers/net/can/usb/peak_usb/pcan_usb_core.h |  4 +-
 drivers/net/can/usb/peak_usb/pcan_usb_fd.c   | 96 +++-
 drivers/net/can/usb/peak_usb/pcan_usb_pro.c  | 24 +++
 5 files changed, 82 insertions(+), 70 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] can: pcan_usb: don't provide CAN FD bittimings by non-FD adapters

2015-08-24 Thread Marc Kleine-Budde

The CAN FD data bittiming constants are provided via netlink only when there
are valid CAN FD constants available in priv->data_bittiming_const.

Due to the indirection of pointer assignments in the peak_usb driver the
priv->data_bittiming_const never becomes NULL - not even for non-FD adapters.

The data_bittiming_const points to zero'ed data which leads to this result
when running 'ip -details link show can0':

35: can0:  mtu 16 qdisc noop state DOWN mode DEFAULT group default qlen 10
link/can  promiscuity 0
can state STOPPED restart-ms 0
  pcan_usb: tseg1 1..16 tseg2 1..8 sjw 1..4 brp 1..64 brp-inc 1
  : dtseg1 0..0 dtseg2 0..0 dsjw 1..0 dbrp 0..0 dbrp-inc 0  <== BROKEN!
  clock 800

This patch changes the struct peak_usb_adapter::bittiming_const and struct
peak_usb_adapter::data_bittiming_const to pointers to fix the assignment
problems.
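
The difference between the two layouts can be sketched in plain C. This is a userspace illustration only, not the driver code: `timing_const`, `adapter_embedded` and `adapter_pointer` are hypothetical stand-ins for `struct can_bittiming_const` and `struct peak_usb_adapter`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct can_bittiming_const. */
struct timing_const {
	const char *name;
	unsigned int tseg1_max;
};

/* Old layout: the constants are embedded in the adapter description. */
struct adapter_embedded {
	struct timing_const bittiming_const;
	struct timing_const data_bittiming_const; /* all zero for non-FD */
};

/* New layout: the adapter only points at the constants. */
struct adapter_pointer {
	const struct timing_const *bittiming_const;
	const struct timing_const *data_bittiming_const; /* NULL for non-FD */
};

static const struct timing_const classic_const = {
	.name = "classic",
	.tseg1_max = 16,
};

/* A non-FD adapter in both layouts: the data constants are left out. */
static const struct adapter_embedded non_fd_embedded = {
	.bittiming_const = { .name = "classic", .tseg1_max = 16 },
};

static const struct adapter_pointer non_fd_pointer = {
	.bittiming_const = &classic_const,
};

/* The address of an embedded member is never NULL, even if it is zeroed. */
static bool embedded_reports_fd(const struct adapter_embedded *a)
{
	const struct timing_const *p = &a->data_bittiming_const;

	return p != NULL;
}

/* An omitted pointer member is zero-initialized and reads back as NULL. */
static bool pointer_reports_fd(const struct adapter_pointer *a)
{
	return a->data_bittiming_const != NULL;
}
```

With the embedded layout a NULL check on the derived pointer can never fail, which matches the zeroed `dtseg1 0..0` line shown above; with the pointer layout the same check distinguishes FD from non-FD adapters.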

Cc: linux-stable  # >= 4.0
Reported-by: Oliver Hartkopp 
Tested-by: Oliver Hartkopp 
Signed-off-by: Marc Kleine-Budde 
---
 drivers/net/can/usb/peak_usb/pcan_usb.c  | 24 +++
 drivers/net/can/usb/peak_usb/pcan_usb_core.c |  4 +-
 drivers/net/can/usb/peak_usb/pcan_usb_core.h |  4 +-
 drivers/net/can/usb/peak_usb/pcan_usb_fd.c   | 96 +++-
 drivers/net/can/usb/peak_usb/pcan_usb_pro.c  | 24 +++
 5 files changed, 82 insertions(+), 70 deletions(-)

diff --git a/drivers/net/can/usb/peak_usb/pcan_usb.c b/drivers/net/can/usb/peak_usb/pcan_usb.c
index 6b94007ae052..838545ce468d 100644
--- a/drivers/net/can/usb/peak_usb/pcan_usb.c
+++ b/drivers/net/can/usb/peak_usb/pcan_usb.c
@@ -854,6 +854,18 @@ static int pcan_usb_probe(struct usb_interface *intf)
 /*
  * describe the PCAN-USB adapter
  */
+static const struct can_bittiming_const pcan_usb_const = {
+   .name = "pcan_usb",
+   .tseg1_min = 1,
+   .tseg1_max = 16,
+   .tseg2_min = 1,
+   .tseg2_max = 8,
+   .sjw_max = 4,
+   .brp_min = 1,
+   .brp_max = 64,
+   .brp_inc = 1,
+};
+
 const struct peak_usb_adapter pcan_usb = {
.name = "PCAN-USB",
.device_id = PCAN_USB_PRODUCT_ID,
@@ -862,17 +874,7 @@ const struct peak_usb_adapter pcan_usb = {
.clock = {
.freq = PCAN_USB_CRYSTAL_HZ / 2 ,
},
-   .bittiming_const = {
-   .name = "pcan_usb",
-   .tseg1_min = 1,
-   .tseg1_max = 16,
-   .tseg2_min = 1,
-   .tseg2_max = 8,
-   .sjw_max = 4,
-   .brp_min = 1,
-   .brp_max = 64,
-   .brp_inc = 1,
-   },
+   .bittiming_const = &pcan_usb_const,
 
/* size of device private data */
.sizeof_dev_private = sizeof(struct pcan_usb),
diff --git a/drivers/net/can/usb/peak_usb/pcan_usb_core.c b/drivers/net/can/usb/peak_usb/pcan_usb_core.c
index 7921cff93a63..5a2e341a6d1e 100644
--- a/drivers/net/can/usb/peak_usb/pcan_usb_core.c
+++ b/drivers/net/can/usb/peak_usb/pcan_usb_core.c
@@ -792,9 +792,9 @@ static int peak_usb_create_dev(const struct peak_usb_adapter *peak_usb_adapter,
dev->ep_msg_out = peak_usb_adapter->ep_msg_out[ctrl_idx];
 
dev->can.clock = peak_usb_adapter->clock;
-   dev->can.bittiming_const = &peak_usb_adapter->bittiming_const;
+   dev->can.bittiming_const = peak_usb_adapter->bittiming_const;
dev->can.do_set_bittiming = peak_usb_set_bittiming;
-   dev->can.data_bittiming_const = &peak_usb_adapter->data_bittiming_const;
+   dev->can.data_bittiming_const = peak_usb_adapter->data_bittiming_const;
dev->can.do_set_data_bittiming = peak_usb_set_data_bittiming;
dev->can.do_set_mode = peak_usb_set_mode;
dev->can.do_get_berr_counter = peak_usb_adapter->do_get_berr_counter;
diff --git a/drivers/net/can/usb/peak_usb/pcan_usb_core.h b/drivers/net/can/usb/peak_usb/pcan_usb_core.h
index 9e624f05ad4d..506fe506c9d3 100644
--- a/drivers/net/can/usb/peak_usb/pcan_usb_core.h
+++ b/drivers/net/can/usb/peak_usb/pcan_usb_core.h
@@ -48,8 +48,8 @@ struct peak_usb_adapter {
u32 device_id;
u32 ctrlmode_supported;
struct can_clock clock;
-   const struct can_bittiming_const bittiming_const;
-   const struct can_bittiming_const data_bittiming_const;
+   const struct can_bittiming_const * const bittiming_const;
+   const struct can_bittiming_const * const data_bittiming_const;
unsigned int ctrl_count;
 
int (*intf_probe)(struct usb_interface *intf);
diff --git a/drivers/net/can/usb/peak_usb/pcan_usb_fd.c b/drivers/net/can/usb/peak_usb/pcan_usb_fd.c
index 09d14e70abd7..ce44a033f63b 100644
--- a/drivers/net/can/usb/peak_usb/pcan_usb_fd.c
+++ b/drivers/net/can/usb/peak_usb/pcan_usb_fd.c
@@ -990,6 +990,30 @@ static void pcan_usb_fd_free(struct peak_usb_device *dev)
 }
 
 /* describes the PCAN-USB FD adapter */
+static const struct can_bittiming_const pcan_usb_fd_const = {
+   .name = "pcan_usb_fd",
+   .tseg1_min = 1,
+   .tseg1_max = 64,
+   .t

Re: pull-request: can 2015-08-24

2015-08-24 Thread Marc Kleine-Budde

On 08/24/2015 11:20 AM, Marc Kleine-Budde wrote:
> Hello David,
> 
> this is a pull request of one patch by me for the peak_usb driver. It fixes the
> driver, so that non FD adapters don't provide CAN FD bittimings.

As there are some typos in the commit message I'll send an updated pull
request. David, please don't pull this one.

thanks,
Marc
-- 
Pengutronix e.K.  | Marc Kleine-Budde   |
Industrial Linux Solutions| Phone: +49-231-2826-924 |
Vertretung West/Dortmund  | Fax:   +49-5121-206917- |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |




Re: [net-next PATCH 1/3] drivers: net: cpsw: add am335x errata workarround for interrutps

2015-08-24 Thread Mugunthan V N

On Monday 24 August 2015 03:34 PM, Sekhar Nori wrote:
> Hi Mugunthan,
> 
> On Wednesday 12 August 2015 03:22 PM, Mugunthan V N wrote:
>> > +static const struct of_device_id cpsw_of_mtable[] = {
>> > +  { .compatible = "ti,cpsw", .data = &cpsw_devtype[CPSW], },
>> > +  { .compatible = "ti,am335x-cpsw", .data = &cpsw_devtype[AM335X_CPSW], },
>> > +  { .compatible = "ti,am4372-cpsw", .data = &cpsw_devtype[AM4372_CPSW], },
>> > +  { .compatible = "ti,dra7-cpsw", .data = &cpsw_devtype[DRA7_CPSW], },
> I do not see documentation added for these compatibles. Since the series
> is already applied, can you send additional patches adding documentation?

Will submit a patch ASAP

Regards
Mugunthan V N

Re: iproute2: Behavioural Bug?

2015-08-24 Thread Cong Wang

On Mon, Aug 24, 2015 at 10:14 PM, Akshat Kakkar  wrote:
> Dear Florian,
>
> There are two filters 15:2:2 and 15:2:3 and I have deleted only
> 15:2:3, so 15:2:2 will still be there and hence this condition
> "destroy proto tp when all filters are gone" should not be applicable
> over here.
>

Florian is correct, it _does_ look like this is caused by my patch,
I guess some check in u32_destroy() isn't correct.

It's late here, I will look into this tomorrow.

Thanks for the report anyway!

use after free again...

2015-08-24 Thread Cong Wang

Hi, Jiri,

In your commit 61adedf3e3f1d3f032c5a6a299978d91eff6d555 ("route: move
lwtunnel state to dst_entry"), how the hell could the following piece
be correct? :-/


@@ -264,6 +266,7 @@ again:
kfree(dst);
else
kmem_cache_free(dst->ops->kmem_cachep, dst);
+   lwtstate_put(dst->lwtstate);


There is clearly a kfree(dst) before dereferencing dst... And I got a
nice crash:

[   33.160081] general protection fault:  [#1] SMP DEBUG_PAGEALLOC
[   33.164285] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.2.0-rc7+ #166
[   33.164285] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[   33.164285] task: 88010656d280 ti: 88010657 task.ti:
88010657
[   33.164285] RIP: 0010:[]  []
dst_destroy+0xa6/0xef
[   33.164285] RSP: 0018:880107603e38  EFLAGS: 00010202
[   33.164285] RAX: 0001 RBX: 8800d225a000 RCX: 82250fd0
[   33.164285] RDX: 0001 RSI: 82250fd0 RDI: 6b6b6b6b6b6b6b6b
[   33.164285] RBP: 880107603e58 R08: 0001 R09: 0001
[   33.164285] R10: b530 R11: 880107609000 R12: 
[   33.164285] R13: 82343c40 R14:  R15: 8182fb4f
[   33.164285] FS:  () GS:88010760()
knlGS:
[   33.164285] CS:  0010 DS:  ES:  CR0: 8005003b
[   33.164285] CR2: 7fcabd9d3000 CR3: d7279000 CR4: 06e0
[   33.164285] Stack:
[   33.164285]  82250fd0 8801077d6f00 82253c40
8800d225a000
[   33.164285]  880107603e68 8182fb5d 880107603f08
810d795e
[   33.164285]  810d7648 880106574000 88010656d280
88010656d280
[   33.164285] Call Trace:
[   33.164285]  
[   33.164285]  [] dst_destroy_rcu+0xe/0x1d
[   33.164285]  [] rcu_process_callbacks+0x618/0x7eb
[   33.164285]  [] ? rcu_process_callbacks+0x302/0x7eb
[   33.164285]  [] ? dst_gc_task+0x1eb/0x1eb
[   33.164285]  [] __do_softirq+0x178/0x39f
[   33.164285]  [] irq_exit+0x41/0x95
[   33.164285]  [] smp_apic_timer_interrupt+0x34/0x40
[   33.164285]  [] apic_timer_interrupt+0x6d/0x80
[   33.164285]  
[   33.164285]  [] ? default_idle+0x21/0x32
[   33.164285]  [] ? default_idle+0x1f/0x32
[   33.164285]  [] arch_cpu_idle+0xf/0x11
[   33.164285]  [] default_idle_call+0x1f/0x21
[   33.164285]  [] cpu_startup_entry+0x1ad/0x273
[   33.164285]  [] start_secondary+0x135/0x156


I cooked a _quick_ patch to fix it. I can send it formally if it looks
good to you, if not, feel free to send a better fix before me.

diff --git a/net/core/dst.c b/net/core/dst.c
index 50dcdbb..477035e 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -262,11 +262,12 @@ again:
if (dst->dev)
dev_put(dst->dev);

+   lwtstate_put(dst->lwtstate);
+
if (dst->flags & DST_METADATA)
kfree(dst);
else
kmem_cache_free(dst->ops->kmem_cachep, dst);
-   lwtstate_put(dst->lwtstate);

dst = child;
if (dst) {
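
The ordering rule behind this fix can be shown with a small userspace sketch. It is not the kernel code: `struct state`, `struct entry`, `state_put()` and `entry_destroy()` are hypothetical stand-ins for `lwtstate`, `dst_entry`, `lwtstate_put()` and `dst_destroy()`.

```c
#include <stdlib.h>

/* Hypothetical refcounted state hanging off an entry. */
struct state {
	int refcnt;
};

/* Hypothetical container that owns a reference to the state. */
struct entry {
	struct state *state;
};

/* Drop one reference; tolerate entries without state. */
static void state_put(struct state *s)
{
	if (s)
		s->refcnt--;
}

/* Release everything reachable through the entry, then free the entry. */
static void entry_destroy(struct entry *e)
{
	state_put(e->state);	/* e is still valid here */
	free(e);		/* from here on e must not be dereferenced */
}
```

Swapping the two statements in `entry_destroy()` reproduces the bug pattern described above: the member is read from memory that has already been handed back to the allocator.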

Compile error: 'nf_skb_duplicated' undeclared (first use in this function)

2015-08-24 Thread Cong Wang

Hi,

I just got:

net/ipv4/netfilter/nf_dup_ipv4.c: In function ‘nf_dup_ipv4’:
net/ipv4/netfilter/nf_dup_ipv4.c:72:16: error: ‘nf_skb_duplicated’
undeclared (first use in this function)
  if (this_cpu_read(nf_skb_duplicated))
^
net/ipv4/netfilter/nf_dup_ipv4.c:72:16: note: each undeclared
identifier is reported only once for each function it appears in

And the following patch could fix it, but I haven't looked into it
yet, maybe some Kconfig symbol dependency issue too.

diff --git a/net/ipv4/netfilter/nf_dup_ipv4.c b/net/ipv4/netfilter/nf_dup_ipv4.c
index b5bb375..2d79e6e 100644
--- a/net/ipv4/netfilter/nf_dup_ipv4.c
+++ b/net/ipv4/netfilter/nf_dup_ipv4.c
@@ -13,6 +13,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
diff --git a/net/ipv6/netfilter/nf_dup_ipv6.c b/net/ipv6/netfilter/nf_dup_ipv6.c
index d8ab654..89c2624 100644
--- a/net/ipv6/netfilter/nf_dup_ipv6.c
+++ b/net/ipv6/netfilter/nf_dup_ipv6.c
@@ -11,6 +11,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 

Re: iproute2: Behavioural Bug?

2015-08-24 Thread Akshat Kakkar

Dear Florian,

There are two filters 15:2:2 and 15:2:3 and I have deleted only
15:2:3, so 15:2:2 will still be there and hence this condition
"destroy proto tp when all filters are gone" should not be applicable
over here.
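
The expectation stated here can be written down as a toy model. This is not the kernel's u32 classifier: `filter_table` and both helpers are hypothetical, and the table only counts filters per bucket.

```c
#include <stdbool.h>
#include <stddef.h>

#define NUM_BUCKETS 4

/* Toy model: each bucket only counts the filters it holds. */
struct filter_table {
	unsigned int filters[NUM_BUCKETS];
};

static bool table_is_empty(const struct filter_table *t)
{
	for (size_t i = 0; i < NUM_BUCKETS; i++)
		if (t->filters[i] != 0)
			return false;
	return true;
}

/*
 * Remove one filter from a bucket. Returns true only when that removal
 * left the whole table empty, i.e. when tearing it down would be justified.
 */
static bool delete_one_filter(struct filter_table *t, size_t bucket)
{
	if (bucket >= NUM_BUCKETS || t->filters[bucket] == 0)
		return false;
	t->filters[bucket]--;
	return table_is_empty(t);
}
```

In this model, deleting 15:2:3 while 15:2:2 is still present returns false, so nothing else would be torn down.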

On Tue, Aug 25, 2015 at 4:52 AM, Florian Westphal  wrote:
> Akshat Kakkar  wrote:
>
> [ CC Cong ]
>
>> When I am trying to delete a single tc filter (i.e. specifying its
>> handle), it is deleting all the
>> filters with the same priority/preference. i.e. it is ignoring the
>> handle specified.
>>
>> But, When I am doing similar activity in hashtable 800: it is deleting only the
>> specified filter, i.e. it is behaving as expected.
>>
>> I am unable to comprehend the reason for this difference in behaviour.
>>
>> In fact, in kernel 2.6.32 all is working as expected. However, in
>> kernel 3.1 and 4.1 it is having the behaviour as mentioned above.
>>
>> For example, following set of commands  create a hashtable 15: and add
>> 2 filters to it.
>>
>> tc filter add dev eth0 parent 1:0 prio 5 handle 15: protocol ip u32 divisor 256
>> tc filter add dev eth0 protocol ip parent 1: prio 5 handle 15:2:2 u32
>> ht 15:2: match ip src 10.0.0.2 flowid 1:10
>> tc filter add dev eth0 protocol ip parent 1: prio 5 handle 15:2:3 u32
>> ht 15:2: match ip src 10.0.0.3 flowid 1:10
>>
>> Now following command DELETES ALL THE FILTERS, though it should only
>> delete FILTER 15:2:3 !
>> tc filter del dev eth0 protocol ip parent 1: prio 5 handle 15:2:3 u32
>>
>> The output of tc filter show eth0 in this case is blank, as all filters are deleted.
>
> Happens since
>
> 1e052be69d045c8d0f82ff1116fd3e5a79661745
> ("net_sched: destroy proto tp when all filters are gone").

Re: [PATCH, net-next] r8169: On RTL 8101 series bit SYSErr is reserved.

2015-08-24 Thread Marian Corcodel

Maybe the entire program must be
rewritten due to multiple errors.

2015-08-24 21:38 GMT+03:00, David Miller :
> From: Corcodel Marian 
> Date: Mon, 24 Aug 2015 21:12:53 +0300
>
>> diff --git a/drivers/net/ethernet/realtek/r8169.c
>> b/drivers/net/ethernet/realtek/r8169.c
>> index 5693e65..32d2072 100644
>> --- a/drivers/net/ethernet/realtek/r8169.c
>> +++ b/drivers/net/ethernet/realtek/r8169.c
>> @@ -8256,6 +8256,14 @@ static int rtl_init_one(struct pci_dev *pdev, const
>> struct pci_device_id *ent)
>>  RTL_W8(Config1, RTL_R8(Config1) | PMEnable);
>>  RTL_W8(Config5, RTL_R8(Config5) & (BWF | MWF | UWF | LanWake |
>> PMEStatus));*/
>>  switch (tp->mac_version) {
>> +case RTL_GIGA_MAC_VER_07:
>> +case RTL_GIGA_MAC_VER_08:
>> +case RTL_GIGA_MAC_VER_09:
>> +case RTL_GIGA_MAC_VER_10:
>> +case RTL_GIGA_MAC_VER_13:
>> +case RTL_GIGA_MAC_VER_16:
>> +pci_write_config_word(pdev, PCI_COMMAND, ~PCI_COMMAND_SERR);
>
> You're writing all sorts of bits you definitely don't want to set here.
>
> Furthermore, there is no need to clear a bit that shouldn't be set
> in the first place.
>
> Your patches are really full of major errors, and unsuitable for
> upstream.
>
> Yes, all of them.
>
> So please stop posting your r8169 changes here, because if you don't
> care if your patches get included or not, then you should not be
> posting them here.  This isn't a place to just dump random patches,
> sorry.
>
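
The first objection can be illustrated with plain bit arithmetic. This is a userspace sketch with hypothetical helper names; only the value of `PCI_COMMAND_SERR` (0x100) is taken from the PCI command register layout. In a driver the read and write would go through `pci_read_config_word()` and `pci_write_config_word()`, and as noted above there may be no reason to touch the bit at all.

```c
#include <stdint.h>

/* SERR# enable bit in the PCI command register (bit 8). */
#define PCI_COMMAND_SERR 0x100

/* The value the quoted hunk writes: every bit except SERR is set. */
static uint16_t command_overwrite(void)
{
	return (uint16_t)~PCI_COMMAND_SERR;
}

/* Read-modify-write: clear only the SERR bit, keep the rest as read. */
static uint16_t command_clear_serr(uint16_t current)
{
	return current & (uint16_t)~PCI_COMMAND_SERR;
}
```

`command_overwrite()` yields 0xFEFF, which would switch on every other bit in the command register, whereas `command_clear_serr(0x0107)` yields 0x0007.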

Re: [RFC] netlink: netlink_ack send a capped message in case of error

2015-08-24 Thread David Miller

From: Pablo Neira Ayuso 
Date: Mon, 24 Aug 2015 20:56:37 +0200

> On Mon, Aug 24, 2015 at 10:08:22AM +0200, Christophe Ricard wrote:
>> Hi Scott,
>> 
>> I think I understand the potential limitation of my solution.
>> I saw something was proposed by Jiri Benc who pushed an additional flag to
>> tell if the payload can be ignored in case of an error.
>> http://patchwork.ozlabs.org/patch/290976/
>> 
>> Do you think this one is acceptable? I am not sure I understand David's
>> last comment.
> 
> I think David suggests something like the (completely untested)
> attached patch.

Yes, echo'ing the entire message back in an ACK is really pointless.

Especially since if the user really is interested in noticing ACKs
it can very easily keep the original request around and match on
sequence number, as Pablo's patch's commit message suggests.

We're stuck with the current behavior by default, but we can add the
new ACK feature to deal with the issue in the long term.
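
Matching on the sequence number, as mentioned above, could look roughly like the following on the sender side. This is a sketch under assumptions: the fixed-size table, `pending_request`, `remember_request()` and `match_ack()` are all hypothetical and not part of any netlink library.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_PENDING 8

/* Hypothetical record of a request the sender is still waiting on. */
struct pending_request {
	uint32_t seq;
	int in_use;
	const char *description;
};

static struct pending_request pending[MAX_PENDING];

/* Remember a request under its sequence number; returns 0 on success. */
static int remember_request(uint32_t seq, const char *description)
{
	for (size_t i = 0; i < MAX_PENDING; i++) {
		if (!pending[i].in_use) {
			pending[i].seq = seq;
			pending[i].description = description;
			pending[i].in_use = 1;
			return 0;
		}
	}
	return -1; /* table full */
}

/* Look up and release the request an ACK refers to, or NULL if unknown. */
static const char *match_ack(uint32_t ack_seq)
{
	for (size_t i = 0; i < MAX_PENDING; i++) {
		if (pending[i].in_use && pending[i].seq == ack_seq) {
			pending[i].in_use = 0;
			return pending[i].description;
		}
	}
	return NULL;
}
```

Because the sender already holds the original request, the ACK only needs to carry the sequence number and the error code for the two to be paired up.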

Re: [PATCH v3 net-next 5/8] geneve: Add support to collect tunnel metadata.

2015-08-24 Thread Pravin Shelar

On Mon, Aug 24, 2015 at 6:42 PM, Jesse Gross  wrote:
> On Mon, Aug 24, 2015 at 10:43 AM, Pravin B Shelar  wrote:
>> diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
>> index 0a6d974..c05bc13 100644
>> --- a/drivers/net/geneve.c
>> +++ b/drivers/net/geneve.c
>> @@ -141,10 +190,15 @@ drop:
>>  /* Setup stats when device is created */
>>  static int geneve_init(struct net_device *dev)
>>  {
>> +   struct geneve_dev *geneve = netdev_priv(dev);
>> +
>> dev->tstats = netdev_alloc_pcpu_stats(struct pcpu_sw_netstats);
>> if (!dev->tstats)
>> return -ENOMEM;
>>
>> +   if (geneve->collect_md)
>> +   dev->features |= NETIF_F_NETNS_LOCAL;
>
> I was going back and forth on whether this is the right thing to do.
> Is it any weirder to allow this than to move a normal tunnel device
> across namespaces?

Moving this device means moving all tunnels backed by this device
rather than a specific tunnel device. That's why it does not look right
to move such a device.

Re: [v2 03/11] soc/fsl: Introduce the DPAA BMan portal driver

2015-08-24 Thread Scott Wood

On Wed, Aug 12, 2015 at 04:14:49PM -0400, Roy Pledge wrote:
> diff --git a/drivers/soc/fsl/qbman/bman.c b/drivers/soc/fsl/qbman/bman.c
> index 9a500ce..d6e2204 100644
> --- a/drivers/soc/fsl/qbman/bman.c
> +++ b/drivers/soc/fsl/qbman/bman.c
> @@ -165,11 +165,11 @@ static struct bman *bm_create(void *regs)
>  
>  static inline u32 __bm_in(struct bman *bm, u32 offset)
>  {
> - return in_be32((void *)bm + offset);
> + return ioread32be((void *)bm + offset);
>  }
>  static inline void __bm_out(struct bman *bm, u32 offset, u32 val)
>  {
> - out_be32((void *)bm + offset, val);
> + iowrite32be(val, (void*) bm + offset);
>  }

Don't introduce a problem in one patch and then fix it in another.  What
does this change have to do with introducing the portal driver?

>  #define bm_in(reg)   __bm_in(bm, REG_##reg)
>  #define bm_out(reg, val) __bm_out(bm, REG_##reg, val)
> @@ -341,6 +341,7 @@ u32 bm_pool_free_buffers(u32 bpid)
>  {
>   return bm_in(POOL_CONTENT(bpid));
>  }
> +EXPORT_SYMBOL(bm_pool_free_buffers);

If you're exporting this (or even making it global), where's the
documentation?

> +/* BTW, the drivers (and h/w programming model) already obtain the required
> + * synchronisation for portal accesses via lwsync(), hwsync(), and
> + * data-dependencies. Use of barrier()s or other order-preserving primitives
> + * simply degrade performance. Hence the use of the __raw_*() interfaces, which
> + * simply ensure that the compiler treats the portal registers as volatile (ie.
> + * non-coherent). */

volatile does not mean "non-coherent".

Be careful with this regarding endian, e.g. on ARM we can run the CPU in
big or little endian on the same chip, and the raw accessors also
unfortunately bypass endian conversion.
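
What an endian-aware accessor has to guarantee can be sketched in userspace by assembling the value byte by byte. The helper names are hypothetical; in the kernel, `ioread32be()` and `iowrite32be()` (used in the hunk quoted earlier) perform the conversion.

```c
#include <stdint.h>

/* Read a big-endian 32-bit value regardless of the CPU's byte order. */
static uint32_t read_be32(const volatile uint8_t *reg)
{
	return ((uint32_t)reg[0] << 24) |
	       ((uint32_t)reg[1] << 16) |
	       ((uint32_t)reg[2] << 8) |
	       (uint32_t)reg[3];
}

/* Store a 32-bit value in big-endian byte order. */
static void write_be32(volatile uint8_t *reg, uint32_t val)
{
	reg[0] = (uint8_t)(val >> 24);
	reg[1] = (uint8_t)(val >> 16);
	reg[2] = (uint8_t)(val >> 8);
	reg[3] = (uint8_t)val;
}
```

A raw 32-bit load of the same four bytes returns a different number on a little-endian CPU than on a big-endian one, which is the hazard pointed out above.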

> +
> +/* Cache-inhibited register access. */
> +#define __bm_in(bm, o)   __raw_readl((bm)->addr_ci + (o))
> +#define __bm_out(bm, o, val) __raw_writel((val), (bm)->addr_ci + (o))
> +#define bm_in(reg)   __bm_in(&portal->addr, BM_REG_##reg)
> +#define bm_out(reg, val) __bm_out(&portal->addr, BM_REG_##reg, val)

Don't have multiple implementations of bm_in/out, with the same name,
where bm in both refers to "bman", but which have different functions.

> +/* Cache-enabled (index) register access */
> +#define __bm_cl_touch_ro(bm, o) dcbt_ro((bm)->addr_ce + (o))
> +#define __bm_cl_touch_rw(bm, o) dcbt_rw((bm)->addr_ce + (o))
> +#define __bm_cl_in(bm, o)__raw_readl((bm)->addr_ce + (o))
> +#define __bm_cl_out(bm, o, val) \
> + do { \
> + u32 *__tmpclout = (bm)->addr_ce + (o); \
> + __raw_writel((val), __tmpclout); \
> + dcbf(__tmpclout); \
> + } while (0)
> +#define __bm_cl_invalidate(bm, o) dcbi((bm)->addr_ce + (o))
> +#define bm_cl_touch_ro(reg) __bm_cl_touch_ro(&portal->addr, BM_CL_##reg##_CENA)
> +#define bm_cl_touch_rw(reg) __bm_cl_touch_rw(&portal->addr, BM_CL_##reg##_CENA)
> +#define bm_cl_in(reg) __bm_cl_in(&portal->addr, BM_CL_##reg##_CENA)
> +#define bm_cl_out(reg, val) __bm_cl_out(&portal->addr, BM_CL_##reg##_CENA, val)
> +#define bm_cl_invalidate(reg)\
> + __bm_cl_invalidate(&portal->addr, BM_CL_##reg##_CENA)

Define these using functions to operate on pointers, and pass the pointer
in without all the token-pasting.  Some extra explanation of the cache
manipulation would also be helpful.

> +/* --- RCR API --- */
> +
> +/* Bit-wise logic to wrap a ring pointer by clearing the "carry bit" */
> +#define RCR_CARRYCLEAR(p) \
> + (void *)((unsigned long)(p) & (~(unsigned long)(BM_RCR_SIZE << 6)))

This could be a function.

Where does 6 come from?  You use it again in the next function.  Please
define it symbolically.

> +
> +/* Bit-wise logic to convert a ring pointer to a ring index */
> +static inline u8 RCR_PTR2IDX(struct bm_rcr_entry *e)
> +{
> + return ((uintptr_t)e >> 6) & (BM_RCR_SIZE - 1);
> +}

This is a function, so don't use ALLCAPS.
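
Taken together, the last three comments point toward something like the sketch below: lower-case inline functions and a named shift. The values are assumptions for illustration only: `RCR_SIZE` of 8 and a 64-byte entry (hence a shift of 6) are not stated in the quoted patch, and the functions operate on `uintptr_t` to keep the example self-contained.

```c
#include <stdint.h>

#define RCR_SIZE        8u	/* assumed ring size, must be a power of two */
#define RCR_ENTRY_SHIFT 6u	/* assumed: log2 of a 64-byte ring entry */

/* Clear the "carry" bit that appears when a pointer runs past the ring end. */
static inline uintptr_t rcr_carryclear(uintptr_t p)
{
	return p & ~((uintptr_t)RCR_SIZE << RCR_ENTRY_SHIFT);
}

/* Convert a ring pointer to a ring index. */
static inline uint8_t rcr_ptr2idx(uintptr_t p)
{
	return (uint8_t)((p >> RCR_ENTRY_SHIFT) & (RCR_SIZE - 1));
}
```

The carry trick only works if the ring base has the `RCR_SIZE << RCR_ENTRY_SHIFT` bit clear, so the ring must be aligned to twice its size in bytes.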

> +/* Increment the 'cursor' ring pointer, taking 'vbit' into account */
> +static inline void RCR_INC(struct bm_rcr *rcr)
> +{
> + /* NB: this is odd-looking, but experiments show that it generates
> +  * fast code with essentially no branching overheads. We increment to
> +  * the next RCR pointer and handle overflow and 'vbit'. */
> + struct bm_rcr_entry *partial = rcr->cursor + 1;
> +
> + rcr->cursor = RCR_CARRYCLEAR(partial);
> + if (partial != rcr->cursor)
> + rcr->vbit ^= BM_RCR_VERB_VBIT;
> +}
> +
> +static inline int bm_rcr_init(struct bm_portal *portal, enum bm_rcr_pmode pmode,
> + __maybe_unused enum bm_rcr_cmode cmode)
> +{
> + /* This use of 'register', as well as all other occurrences, is because
> +  * it has been observed to generate much faster code with gcc than is
> +  * otherwise the case. */
> + register struct bm_rcr *rcr = &portal->rcr;

What version of GCC?  Normal optimization settings?

Has the seemingly excessive use of inlin

Re: [PATCH v3 net-next 5/8] geneve: Add support to collect tunnel metadata.

2015-08-24 Thread Jesse Gross

On Mon, Aug 24, 2015 at 10:43 AM, Pravin B Shelar  wrote:
> diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
> index 0a6d974..c05bc13 100644
> --- a/drivers/net/geneve.c
> +++ b/drivers/net/geneve.c
> @@ -141,10 +190,15 @@ drop:
>  /* Setup stats when device is created */
>  static int geneve_init(struct net_device *dev)
>  {
> +   struct geneve_dev *geneve = netdev_priv(dev);
> +
> dev->tstats = netdev_alloc_pcpu_stats(struct pcpu_sw_netstats);
> if (!dev->tstats)
> return -ENOMEM;
>
> +   if (geneve->collect_md)
> +   dev->features |= NETIF_F_NETNS_LOCAL;

I was going back and forth on whether this is the right thing to do.
Is it any weirder to allow this than to move a normal tunnel device
across namespaces?

[PATCH iproute2 v2] add support for brief output for link and addresses

2015-08-24 Thread Andy Gospodarek

This adds support for slightly less output than is normally provided by
'ip link show' and 'ip addr show'.  This is a bit better when you have a
host with lots of interfaces.  Sample output:

$ ip -br link show
lo   UNKNOWN  00:00:00:00:00:00  
p7p1 UP   08:00:27:9d:62:9f  
p8p1 DOWN 08:00:27:dc:d8:ca  
p9p1 UP   08:00:27:76:d9:75  
p7p1.100@p7p1UP   08:00:27:9d:62:9f  

$ ip -br -4 addr show
lo   UNKNOWN  127.0.0.1/8
p7p1 UP   70.0.0.1/24
p8p1 DOWN 80.0.0.1/24
p7p1.100@p7p1UP   200.0.0.1/24

$ ip -br -6 addr show
lo   UNKNOWN  ::1/128
p7p1 UP   7000::1/8 fe80::a00:27ff:fe9d:629f/64
p8p1 DOWN 8000::1/8
p9p1 UP   fe80::a00:27ff:fe76:d975/64
p7p1.100@p7p1UP   fe80::a00:27ff:fe9d:629f/64

$ ip -br  addr show p7p1
p7p1 UP   70.0.0.1/24 7000::1/8 fe80::a00:27ff:fe9d:629f/64

v2: Now with color support!
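
The column layout in the sample output can be approximated with fixed-width format specifiers. This is a sketch, not the code from the patch: `format_brief_line()` is a hypothetical helper, the 16-character name column is an assumption, and only the `%-7s` width followed by two spaces for the operstate column is taken from the diff below.

```c
#include <stdio.h>

/* Format one brief line: padded name, padded operstate, free-form tail. */
static int format_brief_line(char *buf, size_t len, const char *name,
			     const char *operstate, const char *tail)
{
	return snprintf(buf, len, "%-16s %-7s  %s", name, operstate, tail);
}
```

Left-justified padding keeps the state and address columns aligned no matter how long the interface name or state string is, up to the chosen widths.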

Signed-off-by: Andy Gospodarek 
---
 include/utils.h   |   1 +
 ip/ip.c   |   5 +-
 ip/ip_common.h|   3 +
 ip/ipaddress.c| 155 +++---
 ip/iplink.c   |   5 +-
 man/man8/ip-link.8.in |   3 +-
 6 files changed, 147 insertions(+), 25 deletions(-)

diff --git a/include/utils.h b/include/utils.h
index 0c57ccd..f77edeb 100644
--- a/include/utils.h
+++ b/include/utils.h
@@ -19,6 +19,7 @@ extern int show_details;
 extern int show_raw;
 extern int resolve_hosts;
 extern int oneline;
+extern int brief;
 extern int timestamp;
 extern int timestamp_short;
 extern const char * _SL_;
diff --git a/ip/ip.c b/ip/ip.c
index e75447e..eea00b8 100644
--- a/ip/ip.c
+++ b/ip/ip.c
@@ -32,6 +32,7 @@ int show_stats;
 int show_details;
 int resolve_hosts;
 int oneline;
+int brief;
 int timestamp;
 const char *_SL_;
 int force;
@@ -55,7 +56,7 @@ static void usage(void)
 "-h[uman-readable] | -iec |\n"
 "-f[amily] { inet | inet6 | ipx | dnet | mpls | bridge | 
link } |\n"
 "-4 | -6 | -I | -D | -B | -0 |\n"
-"-l[oops] { maximum-addr-flush-attempts } |\n"
+"-l[oops] { maximum-addr-flush-attempts } | -br[ief] |\n"
 "-o[neline] | -t[imestamp] | -ts[hort] | -b[atch] 
[filename] |\n"
 "-rc[vbuf] [size] | -n[etns] name | -a[ll] | -c[olor]}\n");
exit(-1);
@@ -250,6 +251,8 @@ int main(int argc, char **argv)
if (argc <= 1)
usage();
batch_file = argv[1];
+   } else if (matches(opt, "-brief") == 0) {
+   ++brief;
} else if (matches(opt, "-rcvbuf") == 0) {
unsigned int size;
 
diff --git a/ip/ip_common.h b/ip/ip_common.h
index f120f5b..f74face 100644
--- a/ip/ip_common.h
+++ b/ip/ip_common.h
@@ -2,6 +2,9 @@ extern int get_operstate(const char *name);
 extern int print_linkinfo(const struct sockaddr_nl *who,
  struct nlmsghdr *n,
  void *arg);
+extern int print_linkinfo_brief(const struct sockaddr_nl *who,
+   struct nlmsghdr *n,
+   void *arg);
 extern int print_addrinfo(const struct sockaddr_nl *who,
  struct nlmsghdr *n,
  void *arg);
diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index 13d9c46..bb44a55 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -125,7 +125,10 @@ static void print_link_flags(FILE *fp, unsigned flags, unsigned mdown)
fprintf(fp, "%x", flags);
if (mdown)
fprintf(fp, ",M-DOWN");
-   fprintf(fp, "> ");
+   if (brief)
+   fprintf(fp, ">");
+   else
+   fprintf(fp, "> ");
 }
 
 static const char *oper_states[] = {
@@ -138,13 +141,22 @@ static void print_operstate(FILE *f, __u8 state)
if (state >= sizeof(oper_states)/sizeof(oper_states[0]))
fprintf(f, "state %#x ", state);
else {
-   fprintf(f, "state ");
-   if (strcmp(oper_states[state], "UP") == 0)
-   color_fprintf(f, COLOR_OPERSTATE_UP, "%s ", oper_states[state]);
-   else if (strcmp(oper_states[state], "DOWN") == 0)
-   color_fprintf(f, COLOR_OPERSTATE_DOWN, "%s ", oper_states[state]);
-   else
-   fprintf(f, "%s ", oper_states[state]);
+   if (brief) {
+   if (strcmp(oper_states[state], "UP") == 0)
+   color_fprintf(f, COLOR_OPERSTATE_UP, "%-7s  ", oper_states[state]);
+   else if (strcmp(oper_states[state], "DOWN") == 0)
+   color_fprintf(f, COLOR_OPERSTATE_DOWN, "%-7s  ", oper_states[state]);
+   else
+

Re: [PATCH iproute2] add support for brief output for link and addresses

2015-08-24 Thread Andy Gospodarek

On Mon, Aug 24, 2015 at 02:08:29PM -0700, Stephen Hemminger wrote:
> On Mon, 24 Aug 2015 20:41:16 +
> Andy Gospodarek  wrote:
> 
> > This adds support for slightly less output than is normally provided by
> > 'ip link show' and 'ip addr show'.  This is a bit better when you have a
> > host with lots of interfaces.  Sample output:
> > 
> > $ ip -br link show
> > lo   UNKNOWN  00:00:00:00:00:00
> > p7p1 UP   08:00:27:9d:62:9f
> > p8p1 DOWN 08:00:27:dc:d8:ca
> > p9p1 UP   08:00:27:76:d9:75
> > p7p1.100@p7p1UP   08:00:27:9d:62:9f
> > 
> > $ ip -br -4 addr show
> > lo   UNKNOWN  127.0.0.1/8
> > p7p1 UP   70.0.0.1/24
> > p8p1 DOWN 80.0.0.1/24
> > p7p1.100@p7p1UP   200.0.0.1/24
> > 
> > $ ip -br -6 addr show
> > lo   UNKNOWN  ::1/128
> > p7p1 UP   7000::1/8 fe80::a00:27ff:fe9d:629f/64
> > p8p1 DOWN 8000::1/8
> > p9p1 UP   fe80::a00:27ff:fe76:d975/64
> > p7p1.100@p7p1UP   fe80::a00:27ff:fe9d:629f/64
> > 
> > $ ip -br  addr show p7p1
> > p7p1 UP   70.0.0.1/24 7000::1/8 fe80::a00:27ff:fe9d:629f/64
> > 
> > Signed-off-by: Andy Gospodarek 
> 
> Cool, we could colorize this as well :-)

Will do, v2 coming up!


[PATCHv5 net-next 00/10] OVS conntrack support

2015-08-24 Thread Joe Stringer

The goal of this series is to allow OVS to send packets through the Linux
kernel connection tracker, and subsequently match on fields populated by
conntrack.

This version addresses the feedback from v4, mostly minor fixes, including
shifting the conntrack init into the per-namespace functions rather than
per-datapath and ensuring the ct_mark/ct_label attributes are re-serialized
when userspace dumps the actions. Users attempting to specify actions that set
ct_labels with a length longer than the supported length will now get flow
rejections. This series also rebases against the latest conntrack zone changes.

This functionality is enabled through the CONFIG_OPENVSWITCH_CONNTRACK option.

The branch below has been updated with the corresponding userspace pieces:
https://github.com/joestringer/ovs dev/ct_20150818

Joe Stringer (10):
  openvswitch: Serialize acts with original netlink len
  openvswitch: Move MASKED* macros to datapath.h
  ipv6: Export nf_ct_frag6_gather()
  dst: Add __skb_dst_copy() variation
  openvswitch: Add conntrack action
  openvswitch: Allow matching on conntrack mark
  netfilter: Always export nf_connlabels_replace()
  netfilter: connlabels: Export setting connlabel length
  openvswitch: Allow matching on conntrack label
  openvswitch: Allow attaching helpers to ct action

 include/net/dst.h   |   9 +-
 include/net/netfilter/nf_conntrack_labels.h |   4 +
 include/uapi/linux/openvswitch.h|  58 +++
 net/ipv6/netfilter/nf_conntrack_reasm.c |   1 +
 net/netfilter/nf_conntrack_labels.c |  34 +-
 net/netfilter/xt_connlabel.c|  16 +-
 net/openvswitch/Kconfig |  11 +
 net/openvswitch/Makefile|   2 +
 net/openvswitch/actions.c   | 229 +++--
 net/openvswitch/conntrack.c | 723 
 net/openvswitch/conntrack.h |  78 +++
 net/openvswitch/datapath.c  |  86 +++-
 net/openvswitch/datapath.h  |  13 +
 net/openvswitch/flow.c  |   6 +-
 net/openvswitch/flow.h  |  11 +-
 net/openvswitch/flow_netlink.c  | 129 -
 net/openvswitch/flow_netlink.h  |  13 +-
 net/openvswitch/vport.c |   1 +
 18 files changed, 1317 insertions(+), 107 deletions(-)
 create mode 100644 net/openvswitch/conntrack.c
 create mode 100644 net/openvswitch/conntrack.h

-- 
2.1.4


[PATCHv5 net-next 04/10] dst: Add __skb_dst_copy() variation

2015-08-24 Thread Joe Stringer

This variation on skb_dst_copy() doesn't require two skbs.
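
As a rough illustration of why the split is useful, here is a userspace model of the two helpers. Everything prefixed with demo_ or DEMO_ is hypothetical and only mimics the kernel's skb/dst handling; the point is that the inner helper needs just the raw refdst word, so a caller that saved that word earlier no longer needs the original skb.

```c
#include <stdint.h>

/* Userspace model only; these types mimic, but are not, the kernel's. */
#define DEMO_DST_NOREF   ((uintptr_t)1)      /* low bit: no reference held */
#define DEMO_DST_PTRMASK (~DEMO_DST_NOREF)

struct demo_dst { int refcnt; };
struct demo_skb { uintptr_t refdst; };

static struct demo_dst *demo_skb_dst(const struct demo_skb *skb)
{
	return (struct demo_dst *)(skb->refdst & DEMO_DST_PTRMASK);
}

/* Copy from a raw refdst word; take a reference unless NOREF is set. */
static void demo__skb_dst_copy(struct demo_skb *nskb, uintptr_t refdst)
{
	nskb->refdst = refdst;
	if (!(nskb->refdst & DEMO_DST_NOREF) && demo_skb_dst(nskb))
		demo_skb_dst(nskb)->refcnt++;
}

/* The two-skb form becomes a thin wrapper, mirroring the patch. */
static void demo_skb_dst_copy(struct demo_skb *nskb,
			      const struct demo_skb *oskb)
{
	demo__skb_dst_copy(nskb, oskb->refdst);
}
```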

Signed-off-by: Joe Stringer 
Acked-by: Pravin B Shelar 
---
v4: Add ack.
v5: No change.
---
 include/net/dst.h | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/include/net/dst.h b/include/net/dst.h
index 0a9a723..6f282e7 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -286,13 +286,18 @@ static inline void skb_dst_drop(struct sk_buff *skb)
 	}
 }
 
-static inline void skb_dst_copy(struct sk_buff *nskb, const struct sk_buff *oskb)
+static inline void __skb_dst_copy(struct sk_buff *nskb, unsigned long refdst)
 {
-	nskb->_skb_refdst = oskb->_skb_refdst;
+	nskb->_skb_refdst = refdst;
 	if (!(nskb->_skb_refdst & SKB_DST_NOREF))
 		dst_clone(skb_dst(nskb));
 }
 
+static inline void skb_dst_copy(struct sk_buff *nskb, const struct sk_buff *oskb)
+{
+	__skb_dst_copy(nskb, oskb->_skb_refdst);
+}
+
 /**
  * skb_dst_force - makes sure skb dst is refcounted
  * @skb: buffer
-- 
2.1.4


[PATCHv5 net-next 01/10] openvswitch: Serialize acts with original netlink len

2015-08-24 Thread Joe Stringer

Previously, we used the kernel-internal netlink actions length to
calculate the size of messages to serialize back to userspace.
However, the sw_flow_actions may not be formatted exactly the same as the
actions on the wire, so store the original actions length when
de-serializing and re-use the original length when serializing.
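
A small userspace sketch of the size calculation may help here. The demo_ names are hypothetical, and the netlink framing constants (4-byte alignment, 4-byte attribute header) are re-derived locally rather than taken from kernel headers. It shows the reply size being computed from the length userspace originally sent, not from the length of the kernel-internal copy.

```c
#include <stddef.h>

/* Netlink attribute framing, re-derived for userspace illustration. */
#define DEMO_NLA_ALIGNTO 4
#define DEMO_NLA_ALIGN(len) \
	(((len) + DEMO_NLA_ALIGNTO - 1) & ~(size_t)(DEMO_NLA_ALIGNTO - 1))
#define DEMO_NLA_HDRLEN 4	/* u16 length + u16 type */

/* Bytes one attribute occupies on the wire, header and padding included. */
static size_t demo_nla_total_size(size_t payload)
{
	return DEMO_NLA_ALIGN(DEMO_NLA_HDRLEN + payload);
}

struct demo_flow_actions {
	size_t orig_len;		/* length as received from userspace */
	unsigned int actions_len;	/* length of the internal representation */
};

/* Size the reply from orig_len; actions_len may differ from it. */
static size_t demo_actions_msg_size(const struct demo_flow_actions *acts)
{
	return demo_nla_total_size(acts->orig_len);
}
```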

Signed-off-by: Joe Stringer 
Acked-by: Pravin B Shelar 
---
v2: No change.
v3: Preserve original length across buffer resize.
v4: Add ack.
v5: No change.
---
 net/openvswitch/datapath.c | 2 +-
 net/openvswitch/flow.h | 1 +
 net/openvswitch/flow_netlink.c | 2 ++
 3 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index ffe984f..d5b5473 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -713,7 +713,7 @@ static size_t ovs_flow_cmd_msg_size(const struct sw_flow_actions *acts,
 
 	/* OVS_FLOW_ATTR_ACTIONS */
 	if (should_fill_actions(ufid_flags))
-		len += nla_total_size(acts->actions_len);
+		len += nla_total_size(acts->orig_len);
 
 	return len
 		+ nla_total_size(sizeof(struct ovs_flow_stats)) /* OVS_FLOW_ATTR_STATS */
diff --git a/net/openvswitch/flow.h b/net/openvswitch/flow.h
index b62cdb3..082a87b 100644
--- a/net/openvswitch/flow.h
+++ b/net/openvswitch/flow.h
@@ -144,6 +144,7 @@ struct sw_flow_id {
 
 struct sw_flow_actions {
 	struct rcu_head rcu;
+	size_t orig_len;	/* From flow_cmd_new netlink actions size */
 	u32 actions_len;
 	struct nlattr actions[];
 };
diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
index 4e7a3f7..c182b28 100644
--- a/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -1619,6 +1619,7 @@ static struct nlattr *reserve_sfa_size(struct sw_flow_actions **sfa,
 
 	memcpy(acts->actions, (*sfa)->actions, (*sfa)->actions_len);
 	acts->actions_len = (*sfa)->actions_len;
+	acts->orig_len = (*sfa)->orig_len;
 	kfree(*sfa);
 	*sfa = acts;
 
@@ -2223,6 +2224,7 @@ int ovs_nla_copy_actions(const struct nlattr *attr,
 	if (IS_ERR(*sfa))
 		return PTR_ERR(*sfa);
 
+	(*sfa)->orig_len = nla_len(attr);
 	err = __ovs_nla_copy_actions(attr, key, 0, sfa, key->eth.type,
 				     key->eth.tci, log);
 	if (err)
-- 
2.1.4


[PATCHv5 net-next 03/10] ipv6: Export nf_ct_frag6_gather()

2015-08-24 Thread Joe Stringer

Signed-off-by: Joe Stringer 
Acked-by: Thomas Graf 
Acked-by: Pravin B Shelar 
---
v4: Add ack.
v5: No change.
---
 net/ipv6/netfilter/nf_conntrack_reasm.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c
index 6d02498..701cd2b 100644
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -633,6 +633,7 @@ ret_orig:
 	kfree_skb(clone);
 	return skb;
 }
+EXPORT_SYMBOL_GPL(nf_ct_frag6_gather);
 
 void nf_ct_frag6_consume_orig(struct sk_buff *skb)
 {
-- 
2.1.4


[PATCHv5 net-next 02/10] openvswitch: Move MASKED* macros to datapath.h

2015-08-24 Thread Joe Stringer

This will allow the ovs-conntrack code to reuse these macros.
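
For readers unfamiliar with the macros, here is a standalone sketch of what they compute. It assumes the renamed OVS_MASKED/OVS_SET_MASKED keep the bodies of the MASKED/SET_MASKED definitions removed from actions.c in this patch; demo_set_ttl() is a hypothetical helper, not kernel code.

```c
#include <stdint.h>

/* Assumed to match the removed definitions; 'KEY' must not have any
 * bits set outside of 'MASK'. */
#define OVS_MASKED(OLD, KEY, MASK) ((KEY) | ((OLD) & ~(MASK)))
#define OVS_SET_MASKED(OLD, KEY, MASK) ((OLD) = OVS_MASKED(OLD, KEY, MASK))

/* Replace only the masked bits of a TTL, keeping the rest of the old value. */
static uint8_t demo_set_ttl(uint8_t old_ttl, uint8_t new_ttl, uint8_t mask)
{
	return (uint8_t)OVS_MASKED(old_ttl, new_ttl, mask);
}
```

With a full mask the new value replaces the old one, and with a zero mask the old value is left untouched.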

Signed-off-by: Joe Stringer 
Acked-by: Thomas Graf 
Acked-by: Pravin B Shelar 
---
v4: Add ack.
v5: No change.
---
 net/openvswitch/actions.c  | 52 ++
 net/openvswitch/datapath.h |  4 
 2 files changed, 29 insertions(+), 27 deletions(-)

diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 4f42007..520438b 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -185,10 +185,6 @@ static int pop_mpls(struct sk_buff *skb, struct sw_flow_key *key,
return 0;
 }
 
-/* 'KEY' must not have any bits set outside of the 'MASK' */
-#define MASKED(OLD, KEY, MASK) ((KEY) | ((OLD) & ~(MASK)))
-#define SET_MASKED(OLD, KEY, MASK) ((OLD) = MASKED(OLD, KEY, MASK))
-
 static int set_mpls(struct sk_buff *skb, struct sw_flow_key *flow_key,
const __be32 *mpls_lse, const __be32 *mask)
 {
@@ -201,7 +197,7 @@ static int set_mpls(struct sk_buff *skb, struct sw_flow_key *flow_key,
return err;
 
stack = (__be32 *)skb_mpls_header(skb);
-   lse = MASKED(*stack, *mpls_lse, *mask);
+   lse = OVS_MASKED(*stack, *mpls_lse, *mask);
if (skb->ip_summed == CHECKSUM_COMPLETE) {
__be32 diff[] = { ~(*stack), lse };
 
@@ -244,9 +240,9 @@ static void ether_addr_copy_masked(u8 *dst_, const u8 *src_, const u8 *mask_)
const u16 *src = (const u16 *)src_;
const u16 *mask = (const u16 *)mask_;
 
-   SET_MASKED(dst[0], src[0], mask[0]);
-   SET_MASKED(dst[1], src[1], mask[1]);
-   SET_MASKED(dst[2], src[2], mask[2]);
+   OVS_SET_MASKED(dst[0], src[0], mask[0]);
+   OVS_SET_MASKED(dst[1], src[1], mask[1]);
+   OVS_SET_MASKED(dst[2], src[2], mask[2]);
 }
 
 static int set_eth_addr(struct sk_buff *skb, struct sw_flow_key *flow_key,
@@ -338,10 +334,10 @@ static void update_ipv6_checksum(struct sk_buff *skb, u8 l4_proto,
 static void mask_ipv6_addr(const __be32 old[4], const __be32 addr[4],
   const __be32 mask[4], __be32 masked[4])
 {
-   masked[0] = MASKED(old[0], addr[0], mask[0]);
-   masked[1] = MASKED(old[1], addr[1], mask[1]);
-   masked[2] = MASKED(old[2], addr[2], mask[2]);
-   masked[3] = MASKED(old[3], addr[3], mask[3]);
+   masked[0] = OVS_MASKED(old[0], addr[0], mask[0]);
+   masked[1] = OVS_MASKED(old[1], addr[1], mask[1]);
+   masked[2] = OVS_MASKED(old[2], addr[2], mask[2]);
+   masked[3] = OVS_MASKED(old[3], addr[3], mask[3]);
 }
 
 static void set_ipv6_addr(struct sk_buff *skb, u8 l4_proto,
@@ -358,15 +354,15 @@ static void set_ipv6_addr(struct sk_buff *skb, u8 l4_proto,
 static void set_ipv6_fl(struct ipv6hdr *nh, u32 fl, u32 mask)
 {
/* Bits 21-24 are always unmasked, so this retains their values. */
-   SET_MASKED(nh->flow_lbl[0], (u8)(fl >> 16), (u8)(mask >> 16));
-   SET_MASKED(nh->flow_lbl[1], (u8)(fl >> 8), (u8)(mask >> 8));
-   SET_MASKED(nh->flow_lbl[2], (u8)fl, (u8)mask);
+   OVS_SET_MASKED(nh->flow_lbl[0], (u8)(fl >> 16), (u8)(mask >> 16));
+   OVS_SET_MASKED(nh->flow_lbl[1], (u8)(fl >> 8), (u8)(mask >> 8));
+   OVS_SET_MASKED(nh->flow_lbl[2], (u8)fl, (u8)mask);
 }
 
 static void set_ip_ttl(struct sk_buff *skb, struct iphdr *nh, u8 new_ttl,
   u8 mask)
 {
-   new_ttl = MASKED(nh->ttl, new_ttl, mask);
+   new_ttl = OVS_MASKED(nh->ttl, new_ttl, mask);
 
csum_replace2(&nh->check, htons(nh->ttl << 8), htons(new_ttl << 8));
nh->ttl = new_ttl;
@@ -392,7 +388,7 @@ static int set_ipv4(struct sk_buff *skb, struct sw_flow_key *flow_key,
 * makes sense to check if the value actually changed.
 */
if (mask->ipv4_src) {
-   new_addr = MASKED(nh->saddr, key->ipv4_src, mask->ipv4_src);
+   new_addr = OVS_MASKED(nh->saddr, key->ipv4_src, mask->ipv4_src);
 
if (unlikely(new_addr != nh->saddr)) {
set_ip_addr(skb, nh, &nh->saddr, new_addr);
@@ -400,7 +396,7 @@ static int set_ipv4(struct sk_buff *skb, struct sw_flow_key *flow_key,
}
}
if (mask->ipv4_dst) {
-   new_addr = MASKED(nh->daddr, key->ipv4_dst, mask->ipv4_dst);
+   new_addr = OVS_MASKED(nh->daddr, key->ipv4_dst, mask->ipv4_dst);
 
if (unlikely(new_addr != nh->daddr)) {
set_ip_addr(skb, nh, &nh->daddr, new_addr);
@@ -488,7 +484,8 @@ static int set_ipv6(struct sk_buff *skb, struct sw_flow_key *flow_key,
*(__be32 *)nh & htonl(IPV6_FLOWINFO_FLOWLABEL);
}
if (mask->ipv6_hlimit) {
-   SET_MASKED(nh->hop_limit, key->ipv6_hlimit, mask->ipv6_hlimit);
+   OVS_SET_MASKED(nh->hop_limit, key->ipv6_hlimit,
+  mask->ipv6_hlimit);
flow_key->ip.ttl = nh->hop_limit;
}
return 0;
@@ -517,8 +514,8 @@ static int

Re: [PATCH net-next 09/13] vxlan: provide access function for vxlan socket address family

2015-08-24 Thread Rustad, Mark D

> On Aug 18, 2015, at 1:33 PM, Jiri Benc  wrote:
> 
> Signed-off-by: Jiri Benc 
> ---
> drivers/net/vxlan.c | 8 
> include/net/vxlan.h | 5 +
> 2 files changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
> index e4b8ab63d0fa..d5ca1d7e0b81 100644
> --- a/drivers/net/vxlan.c
> +++ b/drivers/net/vxlan.c
> @@ -236,7 +236,7 @@ static struct vxlan_sock *vxlan_find_sock(struct net *net, sa_family_t family,
> 
>   hlist_for_each_entry_rcu(vs, vs_head(net, port), hlist) {
>   if (inet_sk(vs->sock->sk)->inet_sport == port &&
> - inet_sk(vs->sock->sk)->sk.sk_family == family &&
> + vxlan_get_sk_family(vs) == family &&
>   vs->flags == flags)
>   return vs;
>   }
> @@ -625,7 +625,7 @@ static void vxlan_notify_add_rx_port(struct vxlan_sock *vs)
>   struct net_device *dev;
>   struct sock *sk = vs->sock->sk;
>   struct net *net = sock_net(sk);
> - sa_family_t sa_family = sk->sk_family;
> + sa_family_t sa_family = vxlan_get_sk_family(vs);
>   __be16 port = inet_sk(sk)->inet_sport;
>   int err;
> 
> @@ -650,7 +650,7 @@ static void vxlan_notify_del_rx_port(struct vxlan_sock *vs)
>   struct net_device *dev;
>   struct sock *sk = vs->sock->sk;
>   struct net *net = sock_net(sk);
> - sa_family_t sa_family = sk->sk_family;
> + sa_family_t sa_family = vxlan_get_sk_family(vs);
>   __be16 port = inet_sk(sk)->inet_sport;
> 
>   rcu_read_lock();
> @@ -2390,7 +2390,7 @@ void vxlan_get_rx_port(struct net_device *dev)
>   for (i = 0; i < PORT_HASH_SIZE; ++i) {
>   hlist_for_each_entry_rcu(vs, &vn->sock_list[i], hlist) {
>   port = inet_sk(vs->sock->sk)->inet_sport;
> - sa_family = vs->sock->sk->sk_family;
> + sa_family = vxlan_get_sk_family(vs);
>   dev->netdev_ops->ndo_add_vxlan_port(dev, sa_family,
>   port);
>   }
> diff --git a/include/net/vxlan.h b/include/net/vxlan.h
> index e4534f1b2d8c..43677e6b9c43 100644
> --- a/include/net/vxlan.h
> +++ b/include/net/vxlan.h
> @@ -241,3 +241,8 @@ static inline void vxlan_get_rx_port(struct net_device *netdev)
> }
> #endif
> #endif
> +
> +static inline unsigned short vxlan_get_sk_family(struct vxlan_sock *vs)
> +{
> + return vs->sock->sk->sk_family;
> +}

This causes build problems because vxlan_get_sk_family is not inside the #endif
protecting the file for multiple inclusion. Please put vxlan_get_sk_family
inside the last #endif.
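
To make the failure mode concrete, here is a single-file sketch that simulates including a header twice. The DEMO_VXLAN_H guard and the demo_ types are hypothetical stand-ins for the real vxlan.h contents. Because the helper sits inside the guard, the second copy is skipped; had it been placed after the #endif, the second copy would redefine it and the build would fail.

```c
/* First "inclusion" of a hypothetical header. */
#ifndef DEMO_VXLAN_H
#define DEMO_VXLAN_H

struct demo_sk { unsigned short sk_family; };
struct demo_socket { struct demo_sk *sk; };
struct demo_vxlan_sock { struct demo_socket *sock; };

static inline unsigned short
demo_get_sk_family(const struct demo_vxlan_sock *vs)
{
	return vs->sock->sk->sk_family;
}

#endif /* DEMO_VXLAN_H */

/* Second "inclusion": skipped entirely, since the guard is now defined. */
#ifndef DEMO_VXLAN_H
#define DEMO_VXLAN_H

struct demo_sk { unsigned short sk_family; };
struct demo_socket { struct demo_sk *sk; };
struct demo_vxlan_sock { struct demo_socket *sock; };

static inline unsigned short
demo_get_sk_family(const struct demo_vxlan_sock *vs)
{
	return vs->sock->sk->sk_family;
}

#endif /* DEMO_VXLAN_H */
```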

--
Mark Rustad, Networking Division, Intel Corporation




[PATCHv5 net-next 05/10] openvswitch: Add conntrack action

2015-08-24 Thread Joe Stringer

Expose the kernel connection tracker via OVS. Userspace components can
make use of the CT action to populate the connection state (ct_state)
field for a flow. This state can be subsequently matched.

Exposed connection states are OVS_CS_F_*:
- NEW (0x01) - Beginning of a new connection.
- ESTABLISHED (0x02) - Part of an existing connection.
- RELATED (0x04) - Related to an established connection.
- INVALID (0x20) - Could not track the connection for this packet.
- REPLY_DIR (0x40) - This packet is in the reply direction for the flow.
- TRACKED (0x80) - This packet has been sent through conntrack.

When the CT action is executed by itself, it will send the packet
through the connection tracker and populate the ct_state field with one
or more of the connection state flags above. The CT action will always
set the TRACKED bit.
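
As an illustration, the flag values listed above can be decoded as follows. The OVS_CS_F_* values come from this description; the short names and demo_ct_state_format() are hypothetical and exist only for this sketch.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Flag values as listed in the commit message. */
#define OVS_CS_F_NEW         0x01
#define OVS_CS_F_ESTABLISHED 0x02
#define OVS_CS_F_RELATED     0x04
#define OVS_CS_F_INVALID     0x20
#define OVS_CS_F_REPLY_DIR   0x40
#define OVS_CS_F_TRACKED     0x80

static const struct { uint8_t bit; const char *name; } demo_cs_names[] = {
	{ OVS_CS_F_NEW, "new" },       { OVS_CS_F_ESTABLISHED, "est" },
	{ OVS_CS_F_RELATED, "rel" },   { OVS_CS_F_INVALID, "inv" },
	{ OVS_CS_F_REPLY_DIR, "rpl" }, { OVS_CS_F_TRACKED, "trk" },
};

/* Write the names of all set flags into buf, separated by '|'. */
static void demo_ct_state_format(uint8_t state, char *buf, size_t len)
{
	size_t off = 0, i;

	if (len == 0)
		return;
	buf[0] = '\0';
	for (i = 0; i < sizeof(demo_cs_names) / sizeof(demo_cs_names[0]); i++) {
		int n;

		if (!(state & demo_cs_names[i].bit))
			continue;
		n = snprintf(buf + off, len - off, "%s%s",
			     off ? "|" : "", demo_cs_names[i].name);
		if (n < 0 || (size_t)n >= len - off)
			break;	/* output truncated; stop here */
		off += (size_t)n;
	}
}
```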

When the COMMIT flag is passed to the conntrack action, this specifies
that information about the connection should be stored. This allows
subsequent packets for the same (or related) connections to be
correlated with this connection. Sending subsequent packets for the
connection through conntrack allows the connection tracker to consider
the packets as ESTABLISHED, RELATED, and/or REPLY_DIR.

The CT action may optionally take a zone to track the flow within. This
allows connections with the same 5-tuple to be kept logically separate
from connections in other zones. If the zone is specified, then the
"ct_zone" match field will be subsequently populated with the zone id.
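
One way to picture the effect of zones is to treat the zone id as part of the lookup key. The sketch below is a simplified userspace model with hypothetical demo_ types, not the conntrack data structures: two connections with identical 5-tuples compare as different when their zones differ.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified 5-tuple; real conntrack tuples carry more detail. */
struct demo_tuple {
	uint32_t src, dst;
	uint16_t sport, dport;
	uint8_t proto;
};

struct demo_ct_key {
	struct demo_tuple tuple;
	uint16_t zone;
};

static bool demo_tuple_equal(const struct demo_tuple *a,
			     const struct demo_tuple *b)
{
	return a->src == b->src && a->dst == b->dst &&
	       a->sport == b->sport && a->dport == b->dport &&
	       a->proto == b->proto;
}

/* Connections match only if both the zone and the 5-tuple match. */
static bool demo_ct_key_equal(const struct demo_ct_key *a,
			      const struct demo_ct_key *b)
{
	return a->zone == b->zone && demo_tuple_equal(&a->tuple, &b->tuple);
}
```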

IP fragments are handled by transparently assembling them as part of the
CT action. The maximum received unit (MRU) size is tracked so that
refragmentation can occur during output.
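
The MRU bookkeeping can be sketched as follows. This is a deliberately simplified model with hypothetical demo_ names: lengths are plain byte counts and per-fragment headers are ignored, which the real code cannot do.

```c
#include <stddef.h>
#include <stdint.h>

struct demo_reasm {
	uint16_t mru;	/* largest fragment seen so far */
	size_t total;	/* reassembled length */
};

/* Account for one received fragment. */
static void demo_reasm_add(struct demo_reasm *r, uint16_t frag_len)
{
	if (frag_len > r->mru)
		r->mru = frag_len;
	r->total += frag_len;
}

/* Fragments needed on output if none may exceed the recorded MRU. */
static size_t demo_refrag_count(const struct demo_reasm *r)
{
	if (r->mru == 0)
		return 0;
	return (r->total + r->mru - 1) / r->mru;
}
```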

IP frag handling contributed by Andy Zhou.

Signed-off-by: Joe Stringer 
Signed-off-by: Justin Pettit 
Signed-off-by: Andy Zhou 
---
This can be tested with the corresponding userspace component here:
https://www.github.com/justinpettit/openvswitch conntrack

v2: Don't take references to devs or dsts in output path.
Shift ovs_ct_init()/ovs_ct_exit() into this patch
Handle output case where flow key is invalidated
Store the entire L2 header to apply to fragments
Various minor simplifications
Improve comments/logs
Style fixes
Rebase
v3: Clone dst in output, free final dst reference properly.
Handle CHECKSUM_COMPLETE after fragmentation
Restore L2 skb metadata after fragmentation
Make MRU types more consistent
Better cleanup in error paths
Fix sparse warnings
v4: Reject set_field actions for ct_state,ct_zone
Combine key->ct update from skb->nfct into a single function.
Minor documentation tweaks.
Simplify some codepaths.
v5: Fix ovs_ct_verify().
Don't take references on nf_conntrack_ipv[46]
Replace some #ifdefs with IS_ENABLED.
Remove unused functions.
Rebase.
---
 include/uapi/linux/openvswitch.h |  40 
 net/openvswitch/Kconfig  |  11 +
 net/openvswitch/Makefile |   2 +
 net/openvswitch/actions.c| 175 +++-
 net/openvswitch/conntrack.c  | 442 +++
 net/openvswitch/conntrack.h  |  70 +++
 net/openvswitch/datapath.c   |  66 --
 net/openvswitch/datapath.h   |   6 +
 net/openvswitch/flow.c   |   2 +
 net/openvswitch/flow.h   |   6 +
 net/openvswitch/flow_netlink.c   |  72 +--
 net/openvswitch/flow_netlink.h   |   4 +-
 net/openvswitch/vport.c  |   1 +
 13 files changed, 860 insertions(+), 37 deletions(-)
 create mode 100644 net/openvswitch/conntrack.c
 create mode 100644 net/openvswitch/conntrack.h

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index d6b8854..55f5997 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -164,6 +164,9 @@ enum ovs_packet_cmd {
  * %OVS_USERSPACE_ATTR_EGRESS_TUN_PORT attribute, which is sent only if the
  * output port is actually a tunnel port. Contains the output tunnel key
  * extracted from the packet as nested %OVS_TUNNEL_KEY_ATTR_* attributes.
+ * @OVS_PACKET_ATTR_MRU: Present for an %OVS_PACKET_CMD_ACTION and
+ * %OVS_PACKET_ATTR_USERSPACE action specify the Maximum received fragment
+ * size.
  *
  * These attributes follow the &struct ovs_header within the Generic Netlink
  * payload for %OVS_PACKET_* commands.
@@ -180,6 +183,7 @@ enum ovs_packet_attr {
OVS_PACKET_ATTR_UNUSED2,
OVS_PACKET_ATTR_PROBE,  /* Packet operation is a feature probe,
   error logging should be suppressed. */
+   OVS_PACKET_ATTR_MRU,/* Maximum received IP fragment size. */
__OVS_PACKET_ATTR_MAX
 };
 
@@ -319,6 +323,8 @@ enum ovs_key_attr {
OVS_KEY_ATTR_MPLS,  /* array of struct ovs_key_mpls.
 * The implementation may restrict
 * the accepted length of the array. */
+   OVS_KEY_ATTR_CT_STATE,  /* u8 bitmask of OVS_CS_

[PATCHv5 net-next 08/10] netfilter: connlabels: Export setting connlabel length

2015-08-24 Thread Joe Stringer

Add functions to change connlabel length into nf_conntrack_labels.c so
they may be reused by other modules like OVS and nftables without
needing to jump through xt_match_check() hoops.
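
The get/put pair added below follows a simple reference-counting pattern, modelled here in userspace. The demo_ names are hypothetical, the 16-byte cap is an assumption standing in for NF_CT_LABELS_MAX_SIZE, and the spinlock the patch takes around the updates is omitted.

```c
#include <errno.h>
#include <limits.h>
#include <stddef.h>

#define DEMO_LABELS_MAX_BYTES 16	/* assumed cap: 128 bits */
#define DEMO_BITS_PER_LONG (sizeof(long) * CHAR_BIT)
#define DEMO_BITS_TO_LONGS(n) \
	(((n) + DEMO_BITS_PER_LONG - 1) / DEMO_BITS_PER_LONG)

struct demo_net_ct {
	unsigned int labels_used;	/* number of active users */
	size_t label_words;		/* longs reserved per connection */
};

/* Register a user needing n_bits of label space; grow the width if needed. */
static int demo_connlabels_get(struct demo_net_ct *ct, unsigned int n_bits)
{
	size_t words;

	if (n_bits > DEMO_LABELS_MAX_BYTES * CHAR_BIT)
		return -ERANGE;

	words = DEMO_BITS_TO_LONGS(n_bits);
	/* The patch holds nf_connlabels_lock around this update. */
	ct->labels_used++;
	if (words > ct->label_words)
		ct->label_words = words;
	return 0;
}

/* Drop a user; the width resets only when the last user goes away. */
static void demo_connlabels_put(struct demo_net_ct *ct)
{
	ct->labels_used--;
	if (ct->labels_used == 0)
		ct->label_words = 0;
}
```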

Suggested-by: Florian Westphal 
Signed-off-by: Joe Stringer 
Acked-by: Florian Westphal 
Acked-by: Thomas Graf 
---
v2: Protect connlabel modification with spinlock.
Fix reference leak in error case.
Style fixups.
v3: No change.
v4-v5: Add acks.
---
 include/net/netfilter/nf_conntrack_labels.h |  4 
 net/netfilter/nf_conntrack_labels.c | 32 +
 net/netfilter/xt_connlabel.c| 16 ---
 3 files changed, 40 insertions(+), 12 deletions(-)

diff --git a/include/net/netfilter/nf_conntrack_labels.h b/include/net/netfilter/nf_conntrack_labels.h
index dec6336..7e2b1d0 100644
--- a/include/net/netfilter/nf_conntrack_labels.h
+++ b/include/net/netfilter/nf_conntrack_labels.h
@@ -54,7 +54,11 @@ int nf_connlabels_replace(struct nf_conn *ct,
 #ifdef CONFIG_NF_CONNTRACK_LABELS
 int nf_conntrack_labels_init(void);
 void nf_conntrack_labels_fini(void);
+int nf_connlabels_get(struct net *net, unsigned int n_bits);
+void nf_connlabels_put(struct net *net);
 #else
 static inline int nf_conntrack_labels_init(void) { return 0; }
 static inline void nf_conntrack_labels_fini(void) {}
+static inline int nf_connlabels_get(struct net *net, unsigned int n_bits) { return 0; }
+static inline void nf_connlabels_put(struct net *net) {}
 #endif
diff --git a/net/netfilter/nf_conntrack_labels.c b/net/netfilter/nf_conntrack_labels.c
index daa7c13..3ce5c31 100644
--- a/net/netfilter/nf_conntrack_labels.c
+++ b/net/netfilter/nf_conntrack_labels.c
@@ -14,6 +14,8 @@
 #include 
 #include 
 
+static spinlock_t nf_connlabels_lock;
+
 static unsigned int label_bits(const struct nf_conn_labels *l)
 {
unsigned int longs = l->words;
@@ -89,6 +91,35 @@ int nf_connlabels_replace(struct nf_conn *ct,
 }
 EXPORT_SYMBOL_GPL(nf_connlabels_replace);
 
+int nf_connlabels_get(struct net *net, unsigned int n_bits)
+{
+   size_t words;
+
+   if (n_bits > (NF_CT_LABELS_MAX_SIZE * BITS_PER_BYTE))
+   return -ERANGE;
+
+   words = BITS_TO_LONGS(n_bits);
+
+   spin_lock(&nf_connlabels_lock);
+   net->ct.labels_used++;
+   if (words > net->ct.label_words)
+   net->ct.label_words = words;
+   spin_unlock(&nf_connlabels_lock);
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(nf_connlabels_get);
+
+void nf_connlabels_put(struct net *net)
+{
+   spin_lock(&nf_connlabels_lock);
+   net->ct.labels_used--;
+   if (net->ct.labels_used == 0)
+   net->ct.label_words = 0;
+   spin_unlock(&nf_connlabels_lock);
+}
+EXPORT_SYMBOL_GPL(nf_connlabels_put);
+
 static struct nf_ct_ext_type labels_extend __read_mostly = {
.len= sizeof(struct nf_conn_labels),
.align  = __alignof__(struct nf_conn_labels),
@@ -97,6 +128,7 @@ static struct nf_ct_ext_type labels_extend __read_mostly = {
 
 int nf_conntrack_labels_init(void)
 {
+   spin_lock_init(&nf_connlabels_lock);
return nf_ct_extend_register(&labels_extend);
 }
 
diff --git a/net/netfilter/xt_connlabel.c b/net/netfilter/xt_connlabel.c
index 9f8719d..bb9cbeb 100644
--- a/net/netfilter/xt_connlabel.c
+++ b/net/netfilter/xt_connlabel.c
@@ -42,10 +42,6 @@ static int connlabel_mt_check(const struct xt_mtchk_param *par)
XT_CONNLABEL_OP_SET;
struct xt_connlabel_mtinfo *info = par->matchinfo;
int ret;
-   size_t words;
-
-   if (info->bit > XT_CONNLABEL_MAXBIT)
-   return -ERANGE;
 
if (info->options & ~options) {
pr_err("Unknown options in mask %x\n", info->options);
@@ -59,19 +55,15 @@ static int connlabel_mt_check(const struct xt_mtchk_param *par)
return ret;
}
 
-   par->net->ct.labels_used++;
-   words = BITS_TO_LONGS(info->bit+1);
-   if (words > par->net->ct.label_words)
-   par->net->ct.label_words = words;
-
+   ret = nf_connlabels_get(par->net, info->bit + 1);
+   if (ret < 0)
+   nf_ct_l3proto_module_put(par->family);
return ret;
 }
 
 static void connlabel_mt_destroy(const struct xt_mtdtor_param *par)
 {
-   par->net->ct.labels_used--;
-   if (par->net->ct.labels_used == 0)
-   par->net->ct.label_words = 0;
+   nf_connlabels_put(par->net);
nf_ct_l3proto_module_put(par->family);
 }
 
-- 
2.1.4


[PATCHv5 net-next 09/10] openvswitch: Allow matching on conntrack label

2015-08-24 Thread Joe Stringer

Allow matching and setting the ct_label field. As with ct_mark, this is
populated by executing the CT action. The label field may be modified by
specifying a label and mask nested under the CT action. It is stored as
metadata attached to the connection. Label modification occurs after
lookup, and will only persist when the conntrack entry is committed by
providing the COMMIT flag to the CT action. Labels are currently fixed
to 128 bits in size.
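
The masked label write can be sketched byte by byte. OVS_CT_LABEL_LEN is taken from the patch; struct demo_ct_label and demo_label_apply() are hypothetical stand-ins, and the real code goes through nf_connlabels_replace() rather than a plain loop.

```c
#include <stddef.h>
#include <stdint.h>

#define OVS_CT_LABEL_LEN 16	/* 128 bits, as in the patch */

struct demo_ct_label {
	uint8_t ct_label[OVS_CT_LABEL_LEN];
};

/* For each bit set in mask, copy that bit from value into cur;
 * bits outside the mask keep their current value. */
static void demo_label_apply(struct demo_ct_label *cur,
			     const struct demo_ct_label *value,
			     const struct demo_ct_label *mask)
{
	size_t i;

	for (i = 0; i < OVS_CT_LABEL_LEN; i++) {
		uint8_t m = mask->ct_label[i];

		cur->ct_label[i] = (uint8_t)((cur->ct_label[i] & ~m) |
					     (value->ct_label[i] & m));
	}
}
```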

Signed-off-by: Joe Stringer 
---
v2: Split out setting the connlabel size for the current namespace.
v3: No change.
v4: Only allow setting label via ct action.
Update documentation.
v5: Fix ovs_ct_verify().
Add label to ct action serialization.
Free label bit length/reference properly.
Configure OVS label length per-netns, not per-dp.
Reject ct actions with label length longer than supported.
Replace some #ifdefs with IS_ENABLED.
Rebase.
---
 include/uapi/linux/openvswitch.h |  10 
 net/openvswitch/actions.c|   1 +
 net/openvswitch/conntrack.c  | 123 ++-
 net/openvswitch/conntrack.h  |  11 +++-
 net/openvswitch/datapath.c   |  18 +++---
 net/openvswitch/datapath.h   |   3 +
 net/openvswitch/flow.c   |   4 +-
 net/openvswitch/flow.h   |   3 +-
 net/openvswitch/flow_netlink.c   |  50 +++-
 net/openvswitch/flow_netlink.h   |   9 +--
 10 files changed, 198 insertions(+), 34 deletions(-)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index 7a185b5..9d52058 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -326,6 +326,7 @@ enum ovs_key_attr {
OVS_KEY_ATTR_CT_STATE,  /* u8 bitmask of OVS_CS_F_* */
OVS_KEY_ATTR_CT_ZONE,   /* u16 connection tracking zone. */
OVS_KEY_ATTR_CT_MARK,   /* u32 connection tracking mark */
+   OVS_KEY_ATTR_CT_LABEL,  /* 16-octet connection tracking label */
 
 #ifdef __KERNEL__
OVS_KEY_ATTR_TUNNEL_INFO,  /* struct ip_tunnel_info */
@@ -438,6 +439,11 @@ struct ovs_key_nd {
__u8nd_tll[ETH_ALEN];
 };
 
+#define OVS_CT_LABEL_LEN   16
+struct ovs_key_ct_label {
+   __u8ct_label[OVS_CT_LABEL_LEN];
+};
+
 /* OVS_KEY_ATTR_CT_STATE flags */
 #define OVS_CS_F_NEW   0x01 /* Beginning of a new connection. */
 #define OVS_CS_F_ESTABLISHED   0x02 /* Part of an existing connection. */
@@ -617,12 +623,16 @@ struct ovs_action_hash {
  * @OVS_CT_ATTR_MARK: u32 value followed by u32 mask. For each bit set in the
  * mask, the corresponding bit in the value is copied to the connection
  * tracking mark field in the connection.
+ * @OVS_CT_ATTR_LABEL: %OVS_CT_LABEL_LEN value followed by %OVS_CT_LABEL_LEN
+ * mask. For each bit set in the mask, the corresponding bit in the value is
+ * copied to the connection tracking label field in the connection.
  */
 enum ovs_ct_attr {
OVS_CT_ATTR_UNSPEC,
OVS_CT_ATTR_FLAGS,  /* u8 bitmask of OVS_CT_F_*. */
OVS_CT_ATTR_ZONE,   /* u16 zone id. */
OVS_CT_ATTR_MARK,   /* mark to associate with this connection. */
+   OVS_CT_ATTR_LABEL,  /* label to associate with this connection. */
__OVS_CT_ATTR_MAX
 };
 
diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 9741d2c..736a113 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -969,6 +969,7 @@ static int execute_masked_set_action(struct sk_buff *skb,
case OVS_KEY_ATTR_CT_STATE:
case OVS_KEY_ATTR_CT_ZONE:
case OVS_KEY_ATTR_CT_MARK:
+   case OVS_KEY_ATTR_CT_LABEL:
err = -EINVAL;
break;
}
diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index daea29e..8cb0987 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -34,6 +35,12 @@ struct md_mark {
u32 mask;
 };
 
+/* Metadata label for masked write to conntrack label. */
+struct md_label {
+   struct ovs_key_ct_label value;
+   struct ovs_key_ct_label mask;
+};
+
 /* Conntrack action context for execution. */
 struct ovs_conntrack_info {
struct nf_conntrack_zone zone;
@@ -41,6 +48,7 @@ struct ovs_conntrack_info {
u32 flags;
u16 family;
struct md_mark mark;
+   struct md_label label;
 };
 
 static u16 key_to_nfproto(const struct sw_flow_key *key)
@@ -90,6 +98,24 @@ static u8 ovs_ct_get_state(enum ip_conntrack_info ctinfo)
return ct_state;
 }
 
+static void ovs_ct_get_label(const struct nf_conn *ct,
+struct ovs_key_ct_label *label)
+{
+   struct nf_conn_labels *cl = ct ? nf_ct_labels_find(ct) : NULL;
+
+   if (cl) {
+   size_t len = cl->words * sizeof(long);
+
+   if (len > OVS_CT_LABEL_LEN)
+   len = OVS_CT_LABEL_LEN;
+   else if (le

[PATCHv5 net-next 07/10] netfilter: Always export nf_connlabels_replace()

2015-08-24 Thread Joe Stringer

The following patches will reuse this code from OVS.

Signed-off-by: Joe Stringer 
Acked-by: Pravin B Shelar 
Acked-by: Thomas Graf 
---
v2-v4: No change.
v5: Add acks.
---
 net/netfilter/nf_conntrack_labels.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/net/netfilter/nf_conntrack_labels.c b/net/netfilter/nf_conntrack_labels.c
index bb53f12..daa7c13 100644
--- a/net/netfilter/nf_conntrack_labels.c
+++ b/net/netfilter/nf_conntrack_labels.c
@@ -48,7 +48,6 @@ int nf_connlabel_set(struct nf_conn *ct, u16 bit)
 }
 EXPORT_SYMBOL_GPL(nf_connlabel_set);
 
-#if IS_ENABLED(CONFIG_NF_CT_NETLINK)
 static void replace_u32(u32 *address, u32 mask, u32 new)
 {
 	u32 old, tmp;
@@ -89,7 +88,6 @@ int nf_connlabels_replace(struct nf_conn *ct,
 	return 0;
 }
 EXPORT_SYMBOL_GPL(nf_connlabels_replace);
-#endif
 
 static struct nf_ct_ext_type labels_extend __read_mostly = {
 	.len    = sizeof(struct nf_conn_labels),
-- 
2.1.4


[PATCHv5 net-next 10/10] openvswitch: Allow attaching helpers to ct action

2015-08-24 Thread Joe Stringer

Add support for using conntrack helpers to assist protocol detection.
The new OVS_CT_ATTR_HELPER attribute of the CT action specifies a helper
to be used for this connection. If no helper is specified, then helpers
will be automatically applied as per the sysctl configuration of
net.netfilter.nf_conntrack_helper.

The helper may be specified as part of the conntrack action, e.g.:
ct(helper=ftp). Initial packets for related connections should be
committed to allow later packets for the flow to be considered
established.

Example ovs-ofctl flows allowing FTP connections from ports 1->2:
in_port=1,tcp,action=ct(helper=ftp,commit),2
in_port=2,tcp,ct_state=-trk,action=ct(recirc)
in_port=2,tcp,ct_state=+trk-new+est,action=1
in_port=2,tcp,ct_state=+trk+rel,action=1
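
The ct_state matches in these flows can be read as a value/mask pair, assuming the usual OVS convention that "+flag" requires the bit to be set and "-flag" requires it to be clear. The sketch below uses the OVS_CS_F_* values from patch 05; struct demo_ct_match and demo_ct_state_matches() are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define OVS_CS_F_NEW         0x01
#define OVS_CS_F_ESTABLISHED 0x02
#define OVS_CS_F_RELATED     0x04
#define OVS_CS_F_TRACKED     0x80

/* mask: bits the flow cares about; value: required state of those bits. */
struct demo_ct_match {
	uint8_t value;
	uint8_t mask;
};

static bool demo_ct_state_matches(uint8_t state, struct demo_ct_match m)
{
	return (state & m.mask) == m.value;
}

/* ct_state=+trk-new+est */
static const struct demo_ct_match demo_trk_est = {
	.value = OVS_CS_F_TRACKED | OVS_CS_F_ESTABLISHED,
	.mask  = OVS_CS_F_TRACKED | OVS_CS_F_NEW | OVS_CS_F_ESTABLISHED,
};

/* ct_state=-trk */
static const struct demo_ct_match demo_untracked = {
	.value = 0,
	.mask  = OVS_CS_F_TRACKED,
};
```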

Signed-off-by: Joe Stringer 
---
v2-v3: No change.
v4: Change error code for unknown helper ENOENT->EINVAL.
v5: Fix rcu access of helpers.
Rebase.
---
 include/uapi/linux/openvswitch.h |   3 ++
 net/openvswitch/conntrack.c  | 109 ++-
 2 files changed, 110 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index 9d52058..32e07d8 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -626,6 +626,7 @@ struct ovs_action_hash {
  * @OVS_CT_ATTR_LABEL: %OVS_CT_LABEL_LEN value followed by %OVS_CT_LABEL_LEN
  * mask. For each bit set in the mask, the corresponding bit in the value is
  * copied to the connection tracking label field in the connection.
+ * @OVS_CT_ATTR_HELPER: variable length string defining conntrack ALG.
  */
 enum ovs_ct_attr {
OVS_CT_ATTR_UNSPEC,
@@ -633,6 +634,8 @@ enum ovs_ct_attr {
OVS_CT_ATTR_ZONE,   /* u16 zone id. */
OVS_CT_ATTR_MARK,   /* mark to associate with this connection. */
OVS_CT_ATTR_LABEL,  /* label to associate with this connection. */
+   OVS_CT_ATTR_HELPER, /* netlink helper to assist detection of
+  related connections. */
__OVS_CT_ATTR_MAX
 };
 
diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 8cb0987..ac6d1d2 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -43,6 +44,7 @@ struct md_label {
 
 /* Conntrack action context for execution. */
 struct ovs_conntrack_info {
+   struct nf_conntrack_helper *helper;
struct nf_conntrack_zone zone;
struct nf_conn *ct;
u32 flags;
@@ -213,6 +215,51 @@ static int ovs_ct_set_label(struct sk_buff *skb, struct sw_flow_key *key,
return 0;
 }
 
+/* 'skb' should already be pulled to nh_ofs. */
+static int ovs_ct_helper(struct sk_buff *skb, u16 proto)
+{
+   const struct nf_conntrack_helper *helper;
+   const struct nf_conn_help *help;
+   enum ip_conntrack_info ctinfo;
+   unsigned int protoff;
+   struct nf_conn *ct;
+
+   ct = nf_ct_get(skb, &ctinfo);
+   if (!ct || ctinfo == IP_CT_RELATED_REPLY)
+   return NF_ACCEPT;
+
+   help = nfct_help(ct);
+   if (!help)
+   return NF_ACCEPT;
+
+   helper = rcu_dereference(help->helper);
+   if (!helper)
+   return NF_ACCEPT;
+
+   switch (proto) {
+   case NFPROTO_IPV4:
+   protoff = ip_hdrlen(skb);
+   break;
+   case NFPROTO_IPV6: {
+   u8 nexthdr = ipv6_hdr(skb)->nexthdr;
+   __be16 frag_off;
+
+   protoff = ipv6_skip_exthdr(skb, sizeof(struct ipv6hdr),
+  &nexthdr, &frag_off);
+   if (protoff < 0 || (frag_off & htons(~0x7)) != 0) {
+   pr_debug("proto header not found\n");
+   return NF_ACCEPT;
+   }
+   break;
+   }
+   default:
+   WARN_ONCE(1, "helper invoked on non-IP family!");
+   return NF_DROP;
+   }
+
+   return helper->help(skb, protoff, ct, ctinfo);
+}
+
 static int handle_fragments(struct net *net, struct sw_flow_key *key,
u16 zone, struct sk_buff *skb)
 {
@@ -285,6 +332,13 @@ static bool skb_nfct_cached(const struct net *net, const struct sk_buff *skb,
return false;
if (!nf_ct_zone_equal_any(info->ct, nf_ct_zone(ct)))
return false;
+   if (info->helper) {
+   struct nf_conn_help *help;
+
+   help = nf_ct_ext_find(ct, NF_CT_EXT_HELPER);
+   if (help && rcu_access_pointer(help->helper) != info->helper)
+   return false;
+   }
 
return true;
 }
@@ -313,6 +367,11 @@ static int __ovs_ct_lookup(struct net *net, const struct sw_flow_key *key,
if (nf_conntrack_in(net, info->family, NF_INET_PRE_ROUTING,
skb) != NF_ACCEPT)

[PATCHv5 net-next 06/10] openvswitch: Allow matching on conntrack mark

2015-08-24 Thread Joe Stringer

Allow matching and setting the ct_mark field. As with ct_state and
ct_zone, this field is populated when the CT action is executed. To
write to this field, a value and mask can be specified as a nested
attribute under the CT action. This data is stored with the conntrack
entry, and the write is applied after the lookup occurs for the CT
action. The conntrack entry itself must be committed using the COMMIT
flag in the CT action flags for this change to persist.
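
As a rough illustration of the masked write described above, here is a
small standalone sketch. It is not code from this patch:
masked_mark_update() is a hypothetical name, and masking the value before
merging is an assumption made here so that the helper is safe for any
caller.

```c
#include <assert.h>   /* only needed for the checks in the usage note */
#include <stdint.h>

/* Hypothetical helper, not part of the patch: for each bit set in 'mask',
 * copy the corresponding bit of 'value' into the mark; leave all other
 * bits of 'old_mark' untouched.
 */
static uint32_t masked_mark_update(uint32_t old_mark, uint32_t value,
				   uint32_t mask)
{
	return (value & mask) | (old_mark & ~mask);
}
```

For example, masked_mark_update(0xffff0000, 0x000000ab, 0x000000ff) only
touches the low byte, and a mask of 0 leaves the mark unchanged.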

Signed-off-by: Justin Pettit 
Signed-off-by: Joe Stringer 
---
v1-v3: No change.
v4: Only allow setting conntrack mark via ct action.
Documentation tweaks.
v5: Rebase against conntrack zone changes.
Add ct_mark to ct action serialization
Replace some #ifdefs with IS_ENABLED.
---
 include/uapi/linux/openvswitch.h |  5 
 net/openvswitch/actions.c|  1 +
 net/openvswitch/conntrack.c  | 63 ++--
 net/openvswitch/conntrack.h  |  1 +
 net/openvswitch/flow.h   |  1 +
 net/openvswitch/flow_netlink.c   | 15 +-
 6 files changed, 82 insertions(+), 4 deletions(-)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index 55f5997..7a185b5 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -325,6 +325,7 @@ enum ovs_key_attr {
 * the accepted length of the array. */
OVS_KEY_ATTR_CT_STATE,  /* u8 bitmask of OVS_CS_F_* */
OVS_KEY_ATTR_CT_ZONE,   /* u16 connection tracking zone. */
+   OVS_KEY_ATTR_CT_MARK,   /* u32 connection tracking mark */
 
 #ifdef __KERNEL__
OVS_KEY_ATTR_TUNNEL_INFO,  /* struct ip_tunnel_info */
@@ -613,11 +614,15 @@ struct ovs_action_hash {
  * enum ovs_ct_attr - Attributes for %OVS_ACTION_ATTR_CT action.
  * @OVS_CT_ATTR_FLAGS: u32 connection tracking flags.
  * @OVS_CT_ATTR_ZONE: u16 connection tracking zone.
+ * @OVS_CT_ATTR_MARK: u32 value followed by u32 mask. For each bit set in the
+ * mask, the corresponding bit in the value is copied to the connection
+ * tracking mark field in the connection.
  */
 enum ovs_ct_attr {
OVS_CT_ATTR_UNSPEC,
OVS_CT_ATTR_FLAGS,  /* u8 bitmask of OVS_CT_F_*. */
OVS_CT_ATTR_ZONE,   /* u16 zone id. */
+   OVS_CT_ATTR_MARK,   /* mark to associate with this connection. */
__OVS_CT_ATTR_MAX
 };
 
diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 72ca2c4..9741d2c 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -968,6 +968,7 @@ static int execute_masked_set_action(struct sk_buff *skb,
 
case OVS_KEY_ATTR_CT_STATE:
case OVS_KEY_ATTR_CT_ZONE:
+   case OVS_KEY_ATTR_CT_MARK:
err = -EINVAL;
break;
}
diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 4b7c4d7..daea29e 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -28,12 +28,19 @@ struct ovs_ct_len_tbl {
size_t minlen;
 };
 
+/* Metadata mark for masked write to conntrack mark */
+struct md_mark {
+   u32 value;
+   u32 mask;
+};
+
 /* Conntrack action context for execution. */
 struct ovs_conntrack_info {
struct nf_conntrack_zone zone;
struct nf_conn *ct;
u32 flags;
u16 family;
+   struct md_mark mark;
 };
 
 static u16 key_to_nfproto(const struct sw_flow_key *key)
@@ -84,10 +91,12 @@ static u8 ovs_ct_get_state(enum ip_conntrack_info ctinfo)
 }
 
 static void __ovs_ct_update_key(struct sw_flow_key *key, u8 state,
-   const struct nf_conntrack_zone *zone)
+   const struct nf_conntrack_zone *zone,
+   const struct nf_conn *ct)
 {
key->ct.state = state;
key->ct.zone = zone->id;
+   key->ct.mark = ct ? ct->mark : 0;
 }
 
 /* Update 'key' based on skb->nfct. If 'post_ct' is true, then OVS has
@@ -110,7 +119,7 @@ static void ovs_ct_update_key(const struct sk_buff *skb,
} else if (post_ct) {
state = OVS_CS_F_TRACKED | OVS_CS_F_INVALID;
}
-   __ovs_ct_update_key(key, state, zone);
+   __ovs_ct_update_key(key, state, zone, ct);
 }
 
 void ovs_ct_fill_key(const struct sk_buff *skb, struct sw_flow_key *key)
@@ -118,6 +127,31 @@ void ovs_ct_fill_key(const struct sk_buff *skb, struct 
sw_flow_key *key)
ovs_ct_update_key(skb, key, false);
 }
 
+static int ovs_ct_set_mark(struct sk_buff *skb, struct sw_flow_key *key,
+  u32 ct_mark, u32 mask)
+{
+   enum ip_conntrack_info ctinfo;
+   struct nf_conn *ct;
+   u32 new_mark;
+
+   if (!IS_ENABLED(CONFIG_NF_CONNTRACK_MARK))
+   return -ENOTSUPP;
+
+   /* The connection could be invalid, in which case set_mark is no-op. */
+   ct = nf_ct_get(skb, &ctinfo);
+   if (!ct)
+   return 0;
+
+   new_mark = ct_mark | (ct->mark & ~(mask));
+   if (ct->mark !=

Re: Correct way to access MDIO bus - phy.c seems buggy

2015-08-24 Thread Florian Fainelli

On 24/08/15 17:09, Russell King - ARM Linux wrote:
> Hi,
> 
> While trying to track down instability in the FEC driver, I've come
> across this question: what is the correct way to access the MDIO bus?
> 
> Is it via:
> 
>   bus->write()
> 
> where 'bus' is a struct mii_bus, or should it be via mdiobus_write()?
> 
> What I'm seeing in the FEC driver is two threads trying to access the
> MDIO bus simultaneously - one thread trying to do a read, and another
> trying to do a write.  The result is far from pretty with the current
> mainline code, because we can end up re-initialising a spinlock while
> it's held by the fec interrupt handler.
> 
> I think the correct answer is that mdiobus_write() should be used,
> which makes drivers/net/phy/phy.c horribly buggy, as it bypasses the
> locking at the mdiobus level by doing this:

Right, the correct way is to use mdiobus_write(), which takes the bus mutex.

> 
> mmd_phy_indirect()
> {
>   bus->write(bus, addr, MII_MMD_CTRL, devad);
>   bus->write(bus, addr, MII_MMD_DATA, prtad);
>   bus->write(bus, addr, MII_MMD_CTRL, (devad | MII_MMD_CTRL_NOINCR));
> }
> 
> However, it's not as simple as that, because the whole set of writes
> need to be done atomically.  The mdio bus lock needs to be taken around
> the internals of phy_read_mmd_indirect() and phy_write_mmd_indirect().

Well, yes, the bus lock should be grabbed at the beginning and released
at the end of this function at the very least, good catch.

> 
> This bug can be provoked by running an ethtool command which accesses
> the phy in a tight loop on a SMP platform.  For example:
> 
>   while :; do ethtool --show-eee eth0; done
> 
> Patch will follow tomorrow.

The good thing is that you seem to have isolated the only cases where we do
not grab the MDIO bus mutex; the rest of the code, apart from
phy_{read,write}_mmd_indirect(), looks correct.
-- 
Florian
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [v2 02/11] soc/fsl: Introduce DPAA BMan device management driver

2015-08-24 Thread Scott Wood

On Wed, 2015-08-12 at 16:14 -0400, Roy Pledge wrote:
> From: Geoff Thorpe 
> 
> This driver enables the Freescale DPAA 1.0 Buffer Manager block. BMan
> is a hardware buffer pool manager that allows accelerators
> connected to the SoC datapath to acquire and release buffers during
> data processing.
> 
> Signed-off-by: Geoff Thorpe 
> Signed-off-by: Emil Medve 
> Signed-off-by: Roy Pledge 
> ---
>  drivers/soc/Kconfig   |1 +
>  drivers/soc/Makefile  |1 +
>  drivers/soc/fsl/Kconfig   |5 +
>  drivers/soc/fsl/Makefile  |3 +
>  drivers/soc/fsl/qbman/Kconfig |   25 ++
>  drivers/soc/fsl/qbman/Makefile|1 +
>  drivers/soc/fsl/qbman/bman.c  |  553 
> +
>  drivers/soc/fsl/qbman/bman_priv.h |   53 
>  drivers/soc/fsl/qbman/dpaa_sys.h  |   55 
>  9 files changed, 697 insertions(+)
>  create mode 100644 drivers/soc/fsl/Kconfig
>  create mode 100644 drivers/soc/fsl/Makefile
>  create mode 100644 drivers/soc/fsl/qbman/Kconfig
>  create mode 100644 drivers/soc/fsl/qbman/Makefile
>  create mode 100644 drivers/soc/fsl/qbman/bman.c
>  create mode 100644 drivers/soc/fsl/qbman/bman_priv.h
>  create mode 100644 drivers/soc/fsl/qbman/dpaa_sys.h
> 
> diff --git a/drivers/soc/Kconfig b/drivers/soc/Kconfig
> index 96ddecb..4e3c8f4 100644
> --- a/drivers/soc/Kconfig
> +++ b/drivers/soc/Kconfig
> @@ -1,6 +1,7 @@
>  menu "SOC (System On Chip) specific Drivers"
>  
>  source "drivers/soc/mediatek/Kconfig"
> +source "drivers/soc/fsl/Kconfig"
>  source "drivers/soc/qcom/Kconfig"
>  source "drivers/soc/sunxi/Kconfig"
>  source "drivers/soc/ti/Kconfig"
> diff --git a/drivers/soc/Makefile b/drivers/soc/Makefile
> index 7dc7c0d..7adcd97 100644
> --- a/drivers/soc/Makefile
> +++ b/drivers/soc/Makefile
> @@ -3,6 +3,7 @@
>  #
>  
>  obj-$(CONFIG_ARCH_MEDIATEK)  += mediatek/
> +obj-$(CONFIG_FSL_SOC)+= fsl/
>  obj-$(CONFIG_ARCH_QCOM)  += qcom/
>  obj-$(CONFIG_ARCH_SUNXI) += sunxi/
>  obj-$(CONFIG_ARCH_TEGRA) += tegra/
> diff --git a/drivers/soc/fsl/Kconfig b/drivers/soc/fsl/Kconfig
> new file mode 100644
> index 000..daa9c0d
> --- /dev/null
> +++ b/drivers/soc/fsl/Kconfig
> @@ -0,0 +1,5 @@
> +menu "Freescale SOC (System On Chip) specific Drivers"
> +
> +source "drivers/soc/fsl/qbman/Kconfig"
> +
> +endmenu
> diff --git a/drivers/soc/fsl/Makefile b/drivers/soc/fsl/Makefile
> new file mode 100644
> index 000..19e74bb
> --- /dev/null
> +++ b/drivers/soc/fsl/Makefile
> @@ -0,0 +1,3 @@
> +# Common
> +obj-$(CONFIG_FSL_DPA)+= qbman/
> +
> diff --git a/drivers/soc/fsl/qbman/Kconfig b/drivers/soc/fsl/qbman/Kconfig
> new file mode 100644
> index 000..be4ae01
> --- /dev/null
> +++ b/drivers/soc/fsl/qbman/Kconfig
> @@ -0,0 +1,25 @@
> +menuconfig FSL_DPA
> + bool "Freescale DPAA support"
> + depends on FSL_SOC || COMPILE_TEST
> + default n

Drop the COMPILE_TEST -- this driver still has PPCisms that will break the 
build elsewhere.

> + help
> + FSL Data-Path Acceleration Architecture drivers
> +
> + These are not the actual Ethernet driver(s)
> +
> +if FSL_DPA
> +
> +config FSL_DPA_CHECKING
> + bool "additional driver checking"
> + default n
> + help
> + Compiles in additional checks to sanity-check the drivers and
> + any use of it by other code. Not recommended for performance
> +
> +config FSL_BMAN
> + tristate "BMan device management"
> + default n
> + help
> + FSL DPAA BMan driver

Please describe here what BMan is and when it should be enabled.  Why isn't 
it always enabled when DPA is enabled?

> +endif # FSL_DPA
> diff --git a/drivers/soc/fsl/qbman/Makefile b/drivers/soc/fsl/qbman/Makefile
> new file mode 100644
> index 000..02014d9
> --- /dev/null
> +++ b/drivers/soc/fsl/qbman/Makefile
> @@ -0,0 +1 @@
> +obj-$(CONFIG_FSL_BMAN)   += bman.o
> diff --git a/drivers/soc/fsl/qbman/bman.c b/drivers/soc/fsl/qbman/bman.c
> new file mode 100644
> index 000..9a500ce
> --- /dev/null
> +++ b/drivers/soc/fsl/qbman/bman.c
> @@ -0,0 +1,553 @@
> +/* Copyright (c) 2009 - 2015 Freescale Semiconductor, Inc.
> + *
> + * Redistribution and use in source and binary forms, with or without
> + * modification, are permitted provided that the following conditions are 
> met:
> + * * Redistributions of source code must retain the above copyright
> + *notice, this list of conditions and the following disclaimer.
> + * * Redistributions in binary form must reproduce the above copyright
> + *notice, this list of conditions and the following disclaimer in the
> + *documentation and/or other materials provided with the distribution.
> + * * Neither the name of Freescale Semiconductor nor the
> + *names of its contributors may be used to endorse or promote products
> + *derived from this software without specific prior writte

Correct way to access MDIO bus - phy.c seems buggy

2015-08-24 Thread Russell King - ARM Linux

Hi,

While trying to track down instability in the FEC driver, I've come
across this question: what is the correct way to access the MDIO bus?

Is it via:

bus->write()

where 'bus' is a struct mii_bus, or should it be via mdiobus_write()?

What I'm seeing in the FEC driver is two threads trying to access the
MDIO bus simultaneously - one thread trying to do a read, and another
trying to do a write.  The result is far from pretty with the current
mainline code, because we can end up re-initialising a spinlock while
it's held by the fec interrupt handler.

I think the correct answer is that mdiobus_write() should be used,
which makes drivers/net/phy/phy.c horribly buggy, as it bypasses the
locking at the mdiobus level by doing this:

mmd_phy_indirect()
{
bus->write(bus, addr, MII_MMD_CTRL, devad);
bus->write(bus, addr, MII_MMD_DATA, prtad);
bus->write(bus, addr, MII_MMD_CTRL, (devad | MII_MMD_CTRL_NOINCR));
}

However, it's not as simple as that, because the whole set of writes
need to be done atomically.  The mdio bus lock needs to be taken around
the internals of phy_read_mmd_indirect() and phy_write_mmd_indirect().
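
To make the atomicity requirement concrete, here is a minimal userspace
model. It is only a sketch under stated assumptions: it is not kernel code,
struct model_bus and every model_* name are hypothetical, the lock is a
plain flag standing in for the bus mutex, and the register numbers are
illustrative. The point it shows is that the raw accessor takes no lock
itself, so the caller has to hold the lock across the whole write sequence.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; not the kernel's struct mii_bus. */
#define MODEL_MMD_CTRL        13      /* illustrative register numbers */
#define MODEL_MMD_DATA        14
#define MODEL_MMD_CTRL_NOINCR 0x4000

struct model_bus {
	int lock_held;          /* stands in for the bus mutex */
	int log_reg[8];         /* registers written, in order */
	int log_val[8];         /* values written, in order */
	size_t nwrites;
};

static void model_bus_lock(struct model_bus *bus)
{
	assert(!bus->lock_held);
	bus->lock_held = 1;
}

static void model_bus_unlock(struct model_bus *bus)
{
	assert(bus->lock_held);
	bus->lock_held = 0;
}

/* Raw accessor: like bus->write(), it does no locking of its own. */
static void model_bus_raw_write(struct model_bus *bus, int reg, int val)
{
	assert(bus->lock_held);         /* the caller must hold the lock */
	assert(bus->nwrites < 8);
	bus->log_reg[bus->nwrites] = reg;
	bus->log_val[bus->nwrites] = val;
	bus->nwrites++;
}

/* Select an MMD register: three writes that must not be interleaved. */
static void model_mmd_indirect(struct model_bus *bus, int prtad, int devad)
{
	model_bus_raw_write(bus, MODEL_MMD_CTRL, devad);
	model_bus_raw_write(bus, MODEL_MMD_DATA, prtad);
	model_bus_raw_write(bus, MODEL_MMD_CTRL,
			    devad | MODEL_MMD_CTRL_NOINCR);
}

/* The lock covers the select sequence and the final data write. */
static void model_write_mmd_indirect(struct model_bus *bus, int prtad,
				     int devad, int data)
{
	model_bus_lock(bus);
	model_mmd_indirect(bus, prtad, devad);
	model_bus_raw_write(bus, MODEL_MMD_DATA, data);
	model_bus_unlock(bus);
}
```

Calling model_write_mmd_indirect() on a zero-initialised struct model_bus
records four writes in order and leaves the lock released; calling
model_bus_raw_write() without the lock trips the assertion.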

This bug can be provoked by running an ethtool command which accesses
the phy in a tight loop on a SMP platform.  For example:

while :; do ethtool --show-eee eth0; done

Patch will follow tomorrow.

-- 
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.

Re: [PATCH v2 net-next] r8169: Add values missing in @get_stats64 from HW counters

2015-08-24 Thread Francois Romieu

Corinna Vinschen  :
> On Aug 22 13:23, Francois Romieu wrote:
[...]
> > Sorry, my english was really bad:
> > 
> > the code should propagate failure when rtl8169_reset_counters and
> > rtl8169_update_counters *simultaneously* fail.
> 
> Uhm... sorry, but that still doesn't answer the question.  As you can
> see in my patch, the initialization at open time is already encapsulated
> in a function rtl8169_init_counter_offsets. 

I have read your patch, I have already answered the question and I have
already said that it wasn't a showstopper.

-- 
Ueimor

Re: iproute2: Behavioural Bug?

2015-08-24 Thread Florian Westphal

Akshat Kakkar  wrote:

[ CC Cong ]

> When I am trying to delete a single tc filter (i.e. specifying its
> handle), it is deleting all the
> filters with the same priority/preference. i.e. it is ignoring the
> handle specified.
> 
> But when I do the same in hashtable 800:, it deletes only the
> specified filter, i.e. it behaves as expected.
> 
> I am unable to comprehend the reason for this difference in behaviour.
> 
> In fact, in kernel 2.6.32 everything works as expected. However, in
> kernels 3.1 and 4.1 the behaviour is as described above.
> 
> For example, the following set of commands creates hashtable 15: and adds
> 2 filters to it.
> 
> tc filter add dev eth0 parent 1:0 prio 5 handle 15: protocol ip u32 divisor 256
> tc filter add dev eth0 protocol ip parent 1: prio 5 handle 15:2:2 u32
> ht 15:2: match ip src 10.0.0.2 flowid 1:10
> tc filter add dev eth0 protocol ip parent 1: prio 5 handle 15:2:3 u32
> ht 15:2: match ip src 10.0.0.3 flowid 1:10
> 
> Now the following command DELETES ALL THE FILTERS, though it should only
> delete FILTER 15:2:3!
> tc filter del dev eth0 protocol ip parent 1: prio 5 handle 15:2:3 u32
> 
> The output of tc filter show eth0 in this case is blank, as all filters are
> deleted.

Happens since

1e052be69d045c8d0f82ff1116fd3e5a79661745
("net_sched: destroy proto tp when all filters are gone").

Re: [v2 00/11] Freescale DPAA QBMan Drivers

2015-08-24 Thread Scott Wood

On Wed, 2015-08-12 at 16:14 -0400, Roy Pledge wrote:
> The Freescale Data Path Acceleration Architecture (DPAA) is a set of 
> hardware components on specific QorIQ multicore processors. This 
> architecture provides the infrastructure to support simplified sharing of 
> networking interfaces and accelerators by multiple CPU cores and the 
> accelerators.
> 
> The Queue Manager (QMan) is a hardware queue management block that allows 
> software and accelerators on the datapath to enqueue and dequeue frames in 
> order to communicate.
> 
> The Buffer Manager (BMan) is a hardware buffer pool management block that 
> allows software and accelerators on the datapath to acquire and release 
> buffers in order to build frames.
> 
> This patch set introduces the QBMan driver code that configures and initializes 
> the QBMan hardware and provides APIs for software to use the frame queues 
> and buffer pools the blocks provide. These drivers provide the base 
> functionality for software to communicate with the other DPAA accelerators 
> on Freescale QorIQ processors.
> 
> Changes from v1:
>   - Cleanup Kconfig options
>   - Changed base QMan and BMan drivers to be built-in only.
>     Will add loadable support in a future patch

CONFIG_FSL_BMAN is tristate -- is it not expected to work if you select 'm'?

>   - Replace panic() call with WARN_ON()

panic() is still there.

> 
>   - Replaced PowerPC specific IO accessors with platform independent
>     versions

PowerPC accessors, and other PPC-specific things like cache flushing and 
memory barriers, are still there.

-Scott


Re: [PATCH v3 net-next 3/8] tunnel: introduce udp_tun_rx_dst()

2015-08-24 Thread Jesse Gross

On Mon, Aug 24, 2015 at 10:43 AM, Pravin B Shelar  wrote:
> Introduce function udp_tun_rx_dst() to initialize tunnel dst on
> receive path.
>
> Signed-off-by: Pravin B Shelar 

Reviewed-by: Jesse Gross 

Re: [PATCH v2] net: Fix RCU splat in af_key

2015-08-24 Thread David Miller

From: David Ahern 
Date: Mon, 24 Aug 2015 15:17:17 -0600

> Hit the following splat testing VRF change for ipsec:
 ...
> In pfkey_sendmsg the net mutex is taken and then pfkey_broadcast takes
> the RCU lock.
> 
> Since pfkey_broadcast takes the RCU lock the allocation argument is
> pointless since GFP_ATOMIC must be used between the rcu_read_{,un}lock.
> The one call outside of rcu can be done with GFP_KERNEL.
> 
> Fixes: 7f6b9dbd5afbd ("af_key: locking change")
> Signed-off-by: David Ahern 
> ---
> v2
> - removed allocation arg and hardcoded to GFP_ATOMIC during rcu locking

Applied, thanks David.

Re: [PATCH v2] net: Fix RCU splat in af_key

2015-08-24 Thread Eric Dumazet

On Mon, 2015-08-24 at 15:17 -0600, David Ahern wrote:
> Hit the following splat testing VRF change for ipsec:

> 
> In pfkey_sendmsg the net mutex is taken and then pfkey_broadcast takes
> the RCU lock.
> 
> Since pfkey_broadcast takes the RCU lock the allocation argument is
> pointless since GFP_ATOMIC must be used between the rcu_read_{,un}lock.
> The one call outside of rcu can be done with GFP_KERNEL.
> 
> Fixes: 7f6b9dbd5afbd ("af_key: locking change")
> Signed-off-by: David Ahern 
> ---
> v2
> - removed allocation arg and hardcoded to GFP_ATOMIC during rcu locking

Acked-by: Eric Dumazet 



[PATCH net-next] MAINTAINERS: update vmxnet3 driver maintainer

2015-08-24 Thread Shrikrishna Khare

Shreyas Bhatewara will no longer maintain the vmxnet3 driver. I am taking
over the role of vmxnet3 maintainer.

Signed-off-by: Shrikrishna Khare 
Signed-off-by: Shreyas Bhatewara 
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 4e6dcb6..2963a89 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -11053,7 +11053,7 @@ F:  drivers/input/mouse/vmmouse.c
 F: drivers/input/mouse/vmmouse.h
 
 VMWARE VMXNET3 ETHERNET DRIVER
-M: Shreyas Bhatewara 
+M: Shrikrishna Khare 
 M: "VMware, Inc." 
 L: netdev@vger.kernel.org
 S: Maintained
-- 
1.8.5.6


[PATCH v2] net: Fix RCU splat in af_key

2015-08-24 Thread David Ahern

Hit the following splat testing VRF change for ipsec:

[  113.475692] ===
[  113.476194] [ INFO: suspicious RCU usage. ]
[  113.476667] 4.2.0-rc6-1+deb7u2+clUNRELEASED #3.2.65-1+deb7u2+clUNRELEASED 
Not tainted
[  113.477545] ---
[  113.478013] /work/monster-14/dsa/kernel.git/include/linux/rcupdate.h:568 
Illegal context switch in RCU read-side critical section!
[  113.479288]
[  113.479288] other info that might help us debug this:
[  113.479288]
[  113.480207]
[  113.480207] rcu_scheduler_active = 1, debug_locks = 1
[  113.480931] 2 locks held by setkey/6829:
[  113.481371]  #0:  (&net->xfrm.xfrm_cfg_mutex){+.+.+.}, at: 
[] pfkey_sendmsg+0xfb/0x213
[  113.482509]  #1:  (rcu_read_lock){..}, at: [] 
rcu_read_lock+0x0/0x6e
[  113.483509]
[  113.483509] stack backtrace:
[  113.484041] CPU: 0 PID: 6829 Comm: setkey Not tainted 
4.2.0-rc6-1+deb7u2+clUNRELEASED #3.2.65-1+deb7u2+clUNRELEASED
[  113.485422] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
rel-1.7.5.1-0-g8936dbb-20141113_115728-nilsson.home.kraxel.org 04/01/2014
[  113.486845]  0001 88001d4c7a98 81518af2 
81086962
[  113.487732]  88001d538480 88001d4c7ac8 8107ae75 
8180a154
[  113.488628]  0b30  00d0 
88001d4c7ad8
[  113.489525] Call Trace:
[  113.489813]  [] dump_stack+0x4c/0x65
[  113.490389]  [] ? console_unlock+0x3d6/0x405
[  113.491039]  [] lockdep_rcu_suspicious+0xfa/0x103
[  113.491735]  [] rcu_preempt_sleep_check+0x45/0x47
[  113.492442]  [] ___might_sleep+0x19/0x1c8
[  113.493077]  [] __might_sleep+0x6c/0x82
[  113.493681]  [] 
cache_alloc_debugcheck_before.isra.50+0x1d/0x24
[  113.494508]  [] kmem_cache_alloc+0x31/0x18f
[  113.495149]  [] skb_clone+0x64/0x80
[  113.495712]  [] pfkey_broadcast_one+0x3d/0xff
[  113.496380]  [] pfkey_broadcast+0xb5/0x11e
[  113.497024]  [] pfkey_register+0x191/0x1b1
[  113.497653]  [] pfkey_process+0x162/0x17e
[  113.498274]  [] pfkey_sendmsg+0x109/0x213

In pfkey_sendmsg the net mutex is taken and then pfkey_broadcast takes
the RCU lock.

Since pfkey_broadcast takes the RCU lock the allocation argument is
pointless since GFP_ATOMIC must be used between the rcu_read_{,un}lock.
The one call outside of rcu can be done with GFP_KERNEL.

Fixes: 7f6b9dbd5afbd ("af_key: locking change")
Signed-off-by: David Ahern 
---
v2
- removed allocation arg and hardcoded to GFP_ATOMIC during rcu locking
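
For readers unfamiliar with the rule behind this fix, here is a small
userspace model of it. It is only a sketch under stated assumptions: every
model_* name is hypothetical, the counter stands in for lockdep's sleep
check, and malloc() stands in for the kernel allocator. It mirrors the
shape of the fixed pfkey_broadcast(): atomic allocations inside the RCU
read-side section, and an allocation that may sleep only after the section
has ended.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for kernel concepts; names are illustrative. */
enum model_gfp {
	MODEL_GFP_ATOMIC,       /* never sleeps */
	MODEL_GFP_KERNEL        /* may sleep */
};

static int model_rcu_depth;       /* read-side nesting depth */
static int model_illegal_sleeps;  /* sleeping allocations seen under RCU */

static void model_rcu_read_lock(void)
{
	model_rcu_depth++;
}

static void model_rcu_read_unlock(void)
{
	assert(model_rcu_depth > 0);
	model_rcu_depth--;
}

/* Counts a violation when a sleeping allocation happens under RCU. */
static void *model_alloc(size_t size, enum model_gfp flags)
{
	if (flags == MODEL_GFP_KERNEL && model_rcu_depth > 0)
		model_illegal_sleeps++;
	return malloc(size);
}

/* Atomic allocations inside the section, a sleeping one only after it. */
static void model_broadcast(int nlisteners, int have_one_sk)
{
	int i;

	model_rcu_read_lock();
	for (i = 0; i < nlisteners; i++)
		free(model_alloc(64, MODEL_GFP_ATOMIC));
	model_rcu_read_unlock();

	if (have_one_sk)
		free(model_alloc(64, MODEL_GFP_KERNEL));
}
```

Running model_broadcast() leaves model_illegal_sleeps at zero, whereas a
MODEL_GFP_KERNEL allocation between model_rcu_read_lock() and
model_rcu_read_unlock() increments it, which is the situation the splat
above reports.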

 net/key/af_key.c | 46 +++---
 1 file changed, 23 insertions(+), 23 deletions(-)

diff --git a/net/key/af_key.c b/net/key/af_key.c
index b397f0aa9005..83a70688784b 100644
--- a/net/key/af_key.c
+++ b/net/key/af_key.c
@@ -219,7 +219,7 @@ static int pfkey_broadcast_one(struct sk_buff *skb, struct 
sk_buff **skb2,
 #define BROADCAST_ONE  1
 #define BROADCAST_REGISTERED   2
 #define BROADCAST_PROMISC_ONLY 4
-static int pfkey_broadcast(struct sk_buff *skb, gfp_t allocation,
+static int pfkey_broadcast(struct sk_buff *skb,
   int broadcast_flags, struct sock *one_sk,
   struct net *net)
 {
@@ -244,7 +244,7 @@ static int pfkey_broadcast(struct sk_buff *skb, gfp_t 
allocation,
 * socket.
 */
if (pfk->promisc)
-   pfkey_broadcast_one(skb, &skb2, allocation, sk);
+   pfkey_broadcast_one(skb, &skb2, GFP_ATOMIC, sk);
 
/* the exact target will be processed later */
if (sk == one_sk)
@@ -259,7 +259,7 @@ static int pfkey_broadcast(struct sk_buff *skb, gfp_t 
allocation,
continue;
}
 
-   err2 = pfkey_broadcast_one(skb, &skb2, allocation, sk);
+   err2 = pfkey_broadcast_one(skb, &skb2, GFP_ATOMIC, sk);
 
/* Error is cleare after succecful sending to at least one
 * registered KM */
@@ -269,7 +269,7 @@ static int pfkey_broadcast(struct sk_buff *skb, gfp_t 
allocation,
rcu_read_unlock();
 
if (one_sk != NULL)
-   err = pfkey_broadcast_one(skb, &skb2, allocation, one_sk);
+   err = pfkey_broadcast_one(skb, &skb2, GFP_KERNEL, one_sk);
 
kfree_skb(skb2);
kfree_skb(skb);
@@ -292,7 +292,7 @@ static int pfkey_do_dump(struct pfkey_sock *pfk)
hdr = (struct sadb_msg *) pfk->dump.skb->data;
hdr->sadb_msg_seq = 0;
hdr->sadb_msg_errno = rc;
-   pfkey_broadcast(pfk->dump.skb, GFP_ATOMIC, BROADCAST_ONE,
+   pfkey_broadcast(pfk->dump.skb, BROADCAST_ONE,
&pfk->sk, sock_net(&pfk->sk));
pfk->dump.skb = NULL;
}
@@ -333,7 +333,7 @@ static int pfkey_error(const struct sadb_msg *orig, int 
err, struct sock *sk)
hdr->sadb_msg_len = (sizeof(struct sadb_msg) /
 sizeof

Re: [PATCH v3 00/22] FUJITSU Extended Socket network device driver

2015-08-24 Thread David Miller

From: Taku Izumi 
Date: Fri, 21 Aug 2015 17:28:00 +0900

> This patchset adds FUJITSU Extended Socket network device driver.
> Extended Socket network device is a shared memory based high-speed
> network interface between Extended Partitions of PRIMEQUEST 2000 E2
> series.
> 
> You can get some information about Extended Partition and Extended
> Socket by referring to the following manual.
> 
> http://globalsp.ts.fujitsu.com/dmsp/Publications/public/CA92344-0537.pdf
> 3.2.1 Extended Partitioning
> 3.2.2 Extended Socke
> 
> v2.2 -> v3:
>- Fix up according to David's comment (No functional change)

Series applied, thank you.

Re: [PATCH iproute2] add support for brief output for link and addresses

2015-08-24 Thread Stephen Hemminger

On Mon, 24 Aug 2015 20:41:16 +
Andy Gospodarek  wrote:

> This adds support for slightly less output than is normally provided by
> 'ip link show' and 'ip addr show'.  This is a bit better when you have a
> host with lots of interfaces.  Sample output:
> 
> $ ip -br link show
> lo   UNKNOWN  00:00:00:00:00:00  
> p7p1 UP   08:00:27:9d:62:9f  
> p8p1 DOWN 08:00:27:dc:d8:ca  
> p9p1 UP   08:00:27:76:d9:75  
> p7p1.100@p7p1UP   08:00:27:9d:62:9f  
> 
> $ ip -br -4 addr show
> lo   UNKNOWN  127.0.0.1/8
> p7p1 UP   70.0.0.1/24
> p8p1 DOWN 80.0.0.1/24
> p7p1.100@p7p1UP   200.0.0.1/24
> 
> $ ip -br -6 addr show
> lo   UNKNOWN  ::1/128
> p7p1 UP   7000::1/8 fe80::a00:27ff:fe9d:629f/64
> p8p1 DOWN 8000::1/8
> p9p1 UP   fe80::a00:27ff:fe76:d975/64
> p7p1.100@p7p1UP   fe80::a00:27ff:fe9d:629f/64
> 
> $ ip -br  addr show p7p1
> p7p1 UP   70.0.0.1/24 7000::1/8 fe80::a00:27ff:fe9d:629f/64
> 
> Signed-off-by: Andy Gospodarek 

Cool, we could colorize this as well :-)

Re: [PATCH 2/2] usbnet: Fix a race between usbnet_stop() and the BH

2015-08-24 Thread Bjørn Mork

Eugene Shatokhin  writes:

> The race may happen when a device (e.g. YOTA 4G LTE Modem) is
> unplugged while the system is downloading a large file from the Net.
>
> Hardware breakpoints and Kprobes with delays were used to confirm that
> the race does actually happen.
>
> The race is on skb_queue ('next' pointer) between usbnet_stop()
> and rx_complete(), which, in turn, calls usbnet_bh().
>
> Here is a part of the call stack with the code where the changes to the
> queue happen. The line numbers are for the kernel 4.1.0:
>
> *0 __skb_unlink (skbuff.h:1517)
> prev->next = next;
> *1 defer_bh (usbnet.c:430)
> spin_lock_irqsave(&list->lock, flags);
> old_state = entry->state;
> entry->state = state;
> __skb_unlink(skb, list);
> spin_unlock(&list->lock);
> spin_lock(&dev->done.lock);
> __skb_queue_tail(&dev->done, skb);
> if (dev->done.qlen == 1)
> tasklet_schedule(&dev->bh);
> spin_unlock_irqrestore(&dev->done.lock, flags);
> *2 rx_complete (usbnet.c:640)
> state = defer_bh(dev, skb, &dev->rxq, state);
>
> At the same time, the following code repeatedly checks if the queue is
> empty and reads these values concurrently with the above changes:
>
> *0  usbnet_terminate_urbs (usbnet.c:765)
> /* maybe wait for deletions to finish. */
> while (!skb_queue_empty(&dev->rxq)
> && !skb_queue_empty(&dev->txq)
> && !skb_queue_empty(&dev->done)) {
> schedule_timeout(msecs_to_jiffies(UNLINK_TIMEOUT_MS));
> set_current_state(TASK_UNINTERRUPTIBLE);
> netif_dbg(dev, ifdown, dev->net,
>   "waited for %d urb completions\n", temp);
> }
> *1  usbnet_stop (usbnet.c:806)
> if (!(info->flags & FLAG_AVOID_UNLINK_URBS))
> usbnet_terminate_urbs(dev);
>
> As a result, it is possible, for example, that the skb is removed from
> dev->rxq by __skb_unlink() before the check
> "!skb_queue_empty(&dev->rxq)" in usbnet_terminate_urbs() is made. It is
> also possible in this case that the skb is added to dev->done queue
> after "!skb_queue_empty(&dev->done)" is checked. So
> usbnet_terminate_urbs() may stop waiting and return while dev->done
> queue still has an item.

Exactly what problem will that result in?  The tasklet_kill() will wait
for the processing of the single element done queue, and everything will
be fine.  Or?
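
For reference, the window described in the quoted text can be modelled in
a few lines. This is a deterministic, single-threaded sketch and not
usbnet code: struct model_dev and the model_* functions are hypothetical,
and queue lengths stand in for the real sk_buff queues. It only shows that
an unlocked observer can see every queue empty between the two steps of
the deferral; it says nothing about whether that is harmful.

```c
#include <assert.h>

/* Hypothetical model: queue lengths stand in for the sk_buff queues. */
struct model_dev {
	int rxq_len;
	int txq_len;
	int done_len;
};

/* Step 1 of the deferral: the skb leaves rxq, as with __skb_unlink(). */
static void model_defer_unlink(struct model_dev *dev)
{
	assert(dev->rxq_len > 0);
	dev->rxq_len--;
}

/* Step 2 of the deferral: the skb is appended to the done queue. */
static void model_defer_enqueue(struct model_dev *dev)
{
	dev->done_len++;
}

/* Unlocked observer: returns 1 if every queue looks empty right now. */
static int model_all_queues_look_empty(const struct model_dev *dev)
{
	return dev->rxq_len == 0 && dev->txq_len == 0 && dev->done_len == 0;
}
```

Starting with one skb on rxq, model_all_queues_look_empty() returns 1
after model_defer_unlink() and 0 again after model_defer_enqueue().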


Bjørn


[PATCH iproute2] add support for brief output for link and addresses

2015-08-24 Thread Andy Gospodarek

This adds support for slightly less output than is normally provided by
'ip link show' and 'ip addr show'.  This is a bit better when you have a
host with lots of interfaces.  Sample output:

$ ip -br link show
lo   UNKNOWN  00:00:00:00:00:00  
p7p1 UP   08:00:27:9d:62:9f  
p8p1 DOWN 08:00:27:dc:d8:ca  
p9p1 UP   08:00:27:76:d9:75  
p7p1.100@p7p1UP   08:00:27:9d:62:9f  

$ ip -br -4 addr show
lo   UNKNOWN  127.0.0.1/8
p7p1 UP   70.0.0.1/24
p8p1 DOWN 80.0.0.1/24
p7p1.100@p7p1UP   200.0.0.1/24

$ ip -br -6 addr show
lo   UNKNOWN  ::1/128
p7p1 UP   7000::1/8 fe80::a00:27ff:fe9d:629f/64
p8p1 DOWN 8000::1/8
p9p1 UP   fe80::a00:27ff:fe76:d975/64
p7p1.100@p7p1UP   fe80::a00:27ff:fe9d:629f/64

$ ip -br  addr show p7p1
p7p1 UP   70.0.0.1/24 7000::1/8 fe80::a00:27ff:fe9d:629f/64

Signed-off-by: Andy Gospodarek 
---
 include/utils.h   |   1 +
 ip/ip.c   |   5 +-
 ip/ip_common.h|   3 +
 ip/ipaddress.c| 149 ++
 ip/iplink.c   |   5 +-
 man/man8/ip-link.8.in |   3 +-
 6 files changed, 141 insertions(+), 25 deletions(-)

diff --git a/include/utils.h b/include/utils.h
index 0c57ccd..f77edeb 100644
--- a/include/utils.h
+++ b/include/utils.h
@@ -19,6 +19,7 @@ extern int show_details;
 extern int show_raw;
 extern int resolve_hosts;
 extern int oneline;
+extern int brief;
 extern int timestamp;
 extern int timestamp_short;
 extern const char * _SL_;
diff --git a/ip/ip.c b/ip/ip.c
index e75447e..eea00b8 100644
--- a/ip/ip.c
+++ b/ip/ip.c
@@ -32,6 +32,7 @@ int show_stats;
 int show_details;
 int resolve_hosts;
 int oneline;
+int brief;
 int timestamp;
 const char *_SL_;
 int force;
@@ -55,7 +56,7 @@ static void usage(void)
 "-h[uman-readable] | -iec |\n"
 "-f[amily] { inet | inet6 | ipx | dnet | mpls | bridge | 
link } |\n"
 "-4 | -6 | -I | -D | -B | -0 |\n"
-"-l[oops] { maximum-addr-flush-attempts } |\n"
+"-l[oops] { maximum-addr-flush-attempts } | -br[ief] |\n"
 "-o[neline] | -t[imestamp] | -ts[hort] | -b[atch] 
[filename] |\n"
 "-rc[vbuf] [size] | -n[etns] name | -a[ll] | -c[olor]}\n");
exit(-1);
@@ -250,6 +251,8 @@ int main(int argc, char **argv)
if (argc <= 1)
usage();
batch_file = argv[1];
+   } else if (matches(opt, "-brief") == 0) {
+   ++brief;
} else if (matches(opt, "-rcvbuf") == 0) {
unsigned int size;
 
diff --git a/ip/ip_common.h b/ip/ip_common.h
index f120f5b..f74face 100644
--- a/ip/ip_common.h
+++ b/ip/ip_common.h
@@ -2,6 +2,9 @@ extern int get_operstate(const char *name);
 extern int print_linkinfo(const struct sockaddr_nl *who,
  struct nlmsghdr *n,
  void *arg);
+extern int print_linkinfo_brief(const struct sockaddr_nl *who,
+   struct nlmsghdr *n,
+   void *arg);
 extern int print_addrinfo(const struct sockaddr_nl *who,
  struct nlmsghdr *n,
  void *arg);
diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index 13d9c46..84b453f 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -125,7 +125,10 @@ static void print_link_flags(FILE *fp, unsigned flags, unsigned mdown)
fprintf(fp, "%x", flags);
if (mdown)
fprintf(fp, ",M-DOWN");
-   fprintf(fp, "> ");
+   if (brief)
+   fprintf(fp, ">");
+   else
+   fprintf(fp, "> ");
 }
 
 static const char *oper_states[] = {
@@ -138,13 +141,17 @@ static void print_operstate(FILE *f, __u8 state)
if (state >= sizeof(oper_states)/sizeof(oper_states[0]))
fprintf(f, "state %#x ", state);
else {
-   fprintf(f, "state ");
-   if (strcmp(oper_states[state], "UP") == 0)
-   color_fprintf(f, COLOR_OPERSTATE_UP, "%s ", oper_states[state]);
-   else if (strcmp(oper_states[state], "DOWN") == 0)
-   color_fprintf(f, COLOR_OPERSTATE_DOWN, "%s ", oper_states[state]);
-   else
-   fprintf(f, "%s ", oper_states[state]);
+   if (brief) {
+   fprintf(f, "%-7s  ", oper_states[state]);
+   } else {
+   fprintf(f, "state ");
+   if (strcmp(oper_states[state], "UP") == 0)
+   color_fprintf(f, COLOR_OPERSTATE_UP, "%s ", oper_states[state]);
+   else if (strcmp(oper_states[state], "DOWN") == 0)
+   col

RE: [PATCH net-next 2/2] lan78xx: update eee code

2015-08-24 Thread Woojung.Huh

Hi Florian,

Thanks for comments.
Will update to utilize phylib.

- Woojung

> -Original Message-
> From: Florian Fainelli [mailto:f.faine...@gmail.com]
> Sent: Friday, August 21, 2015 5:57 PM
> To: Woojung Huh - C21699; da...@davemloft.net
> Cc: netdev@vger.kernel.org
> Subject: Re: [PATCH net-next 2/2] lan78xx: update eee code
> 
> On 21/08/15 14:41, woojung@microchip.com wrote:
> > Patch to update EEE code.
> 
> This really deserves a better explanation of what is it that you are
> fixing here.
> 
> >
> > Signed-off-by: Woojung Huh 
> > ---
> >  drivers/net/usb/lan78xx.c | 44 ---
> -
> >  drivers/net/usb/lan78xx.h | 22 +++---
> >  2 files changed, 35 insertions(+), 31 deletions(-)
> >
> > diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c
> > index 4bcbf28..af102b0 100644
> > --- a/drivers/net/usb/lan78xx.c
> > +++ b/drivers/net/usb/lan78xx.c
> > @@ -1296,38 +1296,37 @@ static int lan78xx_get_eee(struct net_device *net, struct ethtool_eee *edata)
> > if (ret < 0)
> > return ret;
> >
> > +   buf = lan78xx_mmd_read(dev->net, dev->mii.phy_id,
> > +  PHY_MMD_DEV_7, PHY_EEE_ADVERTISEMENT);
> > +   adv = mmd_eee_adv_to_ethtool_adv_t(buf);
> > +   buf = lan78xx_mmd_read(dev->net, dev->mii.phy_id,
> > +  PHY_MMD_DEV_7, PHY_EEE_LP_ADVERTISEMENT);
> > +   lpadv = mmd_eee_adv_to_ethtool_adv_t(buf);
> 
> Considering your function signatures, it sounds like you should
> implement a libphy driver and you could get things like phy_init_eee()
> for free.
> 
> [snip]
> 
> > /* enable PHY interrupts */
> > ret = lan78xx_read_reg(dev, INT_EP_CTL, &buf);
> > buf |= INT_ENP_PHY_INT;
> > diff --git a/drivers/net/usb/lan78xx.h b/drivers/net/usb/lan78xx.h
> > index ae7562e..95e721b 100644
> > --- a/drivers/net/usb/lan78xx.h
> > +++ b/drivers/net/usb/lan78xx.h
> > @@ -1047,23 +1047,23 @@
> >  #define PHY_MMD_DEV_3  3
> >
> >  #define PHY_EEE_PCS_STATUS (0x1)
> > -#define PHY_EEE_PCS_STATUS_TX_LPI_RCVD_ ((WORD)0x0800)
> > -#define PHY_EEE_PCS_STATUS_RX_LPI_RCVD_ ((WORD)0x0400)
> > -#define PHY_EEE_PCS_STATUS_TX_LPI_IND_ ((WORD)0x0200)
> > -#define PHY_EEE_PCS_STATUS_RX_LPI_IND_ ((WORD)0x0100)
> > -#define PHY_EEE_PCS_STATUS_PCS_RCV_LNK_STS_ ((WORD)0x0004)
> > +#define PHY_EEE_PCS_STATUS_TX_LPI_RCVD_(0x0800)
> > +#define PHY_EEE_PCS_STATUS_RX_LPI_RCVD_(0x0400)
> > +#define PHY_EEE_PCS_STATUS_TX_LPI_IND_ (0x0200)
> > +#define PHY_EEE_PCS_STATUS_RX_LPI_IND_ (0x0100)
> > +#define PHY_EEE_PCS_STATUS_PCS_RCV_LNK_STS_(0x0004)
> 
> Can you look at updating include/uapi/linux/mdio.h with the missing
> registers for your use case instead of replicating this in a driver?
> --
> Florian

[PATCH 0/2] usbnet: Fix 2 problems in usbnet_stop()

2015-08-24 Thread Eugene Shatokhin

The following problems, found while investigating races in the usbnet module,
are fixed here:

1. EVENT_NO_RUNTIME_PM bit of dev->flags should be read before it is 
cleared by "dev->flags = 0". Thanks to Oliver Neukum for spotting this
problem and providing a fix.

2. A race on skb_queue between usbnet_stop() and usbnet_bh().

Compared to the combined patch I sent earlier 
("[PATCH] usbnet: Fix two races between usbnet_stop() and the BH"), this 
patch set has the following changes:

* The fix for handling of EVENT_NO_RUNTIME_PM is now in a separate patch.
* The fix for the race on dev->flags has been removed because the race is
not considered harmful.

Regards,
Eugene


RE: [PATCH v3 3/4] Add support for driver cross-timestamp to PTP_SYS_OFFSET ioctl

2015-08-24 Thread Hall, Christopher S

> -Original Message-
> From: Richard Cochran [mailto:richardcoch...@gmail.com]
> Sent: Sunday, August 23, 2015 4:26 AM
> To: Thomas Gleixner
> Cc: Hall, Christopher S; Kirsher, Jeffrey T; h...@zytor.com;
> mi...@redhat.com; john.stu...@linaro.org; x...@kernel.org; linux-
> ker...@vger.kernel.org; netdev@vger.kernel.org; intel-wired-
> l...@lists.osuosl.org; pet...@infradead.org
> Subject: Re: [PATCH v3 3/4] Add support for driver cross-timestamp to
> PTP_SYS_OFFSET ioctl
> 
> On Sun, Aug 23, 2015 at 10:15:00AM +0200, Thomas Gleixner wrote:
> > So why can't you take N samples from the synced hardware? It does not
> > make any sense to me to switch to the imprecise mode if nsamples > 1.
> 
> Ok, then I prefer to leave this "imprecise" method in place and ...
> 
> > You can also provide a new IOCTL PTP_SYS_OFFSET_PRECISE which returns
> > -ENOSYS if hardware timestamping is not available and avoid the whole
> > nsamples dance for the case where we can get precise timestamps.
> 
> have this for the new way.
> 
> By keeping the imprecise method, we will be able to run both methods
> on the new hardware.  That will help to quantify how imprecise the old
> method is.

This means: remove code changes from the PTP_SYS_OFFSET ioctl and call 
getsynctime64() from a new ioctl PTP_SYS_OFFSET_PRECISE.  Right?

And use the same type (struct ptp_sys_offset) for the new ioctl?  Or should a 
new simplified struct be used? Such as:

struct precise_ptp_sys_offset {
struct ptp_clock_time device;
struct ptp_clock_time system;
};

Does it make sense to keep the "cross-timestamp" capabilities flag as-is?
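
To make the simplified layout concrete, here is a minimal userspace sketch, assuming the two-field struct proposed above. The sketch_ types only mirror the field layout of struct ptp_clock_time from <linux/ptp_clock.h> (sec, nsec, reserved) so that the example builds on its own; the type and helper names are hypothetical and not part of any existing API.

```c
#include <stdint.h>

/* Stand-in with the same fields as struct ptp_clock_time. */
struct sketch_clock_time {
	int64_t sec;
	uint32_t nsec;
	uint32_t reserved;
};

/* Hypothetical mirror of the proposed struct precise_ptp_sys_offset:
 * one device timestamp and one system timestamp taken together. */
struct sketch_precise_offset {
	struct sketch_clock_time device;
	struct sketch_clock_time system;
};

/* Convert a timestamp to nanoseconds. */
static int64_t sketch_to_ns(const struct sketch_clock_time *t)
{
	return t->sec * 1000000000LL + (int64_t)t->nsec;
}

/* Offset of the device clock relative to the system clock, in ns. */
static int64_t sketch_offset_ns(const struct sketch_precise_offset *o)
{
	return sketch_to_ns(&o->device) - sketch_to_ns(&o->system);
}
```

With a single synchronized pair there is nothing to average or interpolate, which is the point of dropping the n_samples array in the precise case.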

> 
> Thanks,
> Richard


Re: [PATCH net-next] udp_offload: Allow device GRO without checksum-complete

2015-08-24 Thread Eric Dumazet

On Mon, 2015-08-24 at 12:34 -0700, Tom Herbert wrote:

> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index c0a15e7..1d91227 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -130,6 +130,9 @@ EXPORT_SYMBOL(sysctl_udp_wmem_min);
>  atomic_long_t udp_memory_allocated;
>  EXPORT_SYMBOL(udp_memory_allocated);
>  
> +int sysctl_udp_gro_nocsum_ok;
> +EXPORT_SYMBOL(sysctl_udp_gro_nocsum_ok);
> +

1) Why is this exported ?

2) I do not believe it is specific to UDP path.

  We could have the same sysctl for GRE or IPIP or XXX encaps ?




[PATCH 1/2] usbnet: Get EVENT_NO_RUNTIME_PM bit before it is cleared

2015-08-24 Thread Eugene Shatokhin

usbnet_stop() needs to check the EVENT_NO_RUNTIME_PM bit of dev->flags,
but the bit must be read before dev->flags is set to 0, which clears it.
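
The ordering problem can be reproduced in a few lines of plain C. This is a minimal userspace sketch, not the driver code: sketch_test_and_clear() is a non-atomic stand-in for the kernel's test_and_clear_bit(), and the bit value is an arbitrary assumption.

```c
#include <stdbool.h>

/* Hypothetical flag mask; the real bit number lives in the usbnet headers. */
enum { SKETCH_NO_RUNTIME_PM = 1 << 3 };

/* Non-atomic stand-in for test_and_clear_bit(): returns the old state of
 * the masked bit and clears it. */
static bool sketch_test_and_clear(unsigned long *flags, unsigned long mask)
{
	bool was_set = (*flags & mask) != 0;

	*flags &= ~mask;
	return was_set;
}

/* Old order: all flags are wiped first, so the later test always sees the
 * bit as clear. Returns true if manage_power() would be called. */
static bool sketch_stop_old(unsigned long *flags)
{
	*flags = 0;
	return !sketch_test_and_clear(flags, SKETCH_NO_RUNTIME_PM);
}

/* Fixed order: sample the bit first, then wipe the flags. */
static bool sketch_stop_fixed(unsigned long *flags)
{
	bool mpn = !sketch_test_and_clear(flags, SKETCH_NO_RUNTIME_PM);

	*flags = 0;
	return mpn;
}
```

With the bit set on entry, the old order still reports that manage_power() should run, while the fixed order does not.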

The problem was spotted and the fix was provided by
Oliver Neukum.

Signed-off-by: Eugene Shatokhin 
---
 drivers/net/usb/usbnet.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
index 3c86b10..e049857 100644
--- a/drivers/net/usb/usbnet.c
+++ b/drivers/net/usb/usbnet.c
@@ -778,7 +778,7 @@ int usbnet_stop (struct net_device *net)
 {
struct usbnet   *dev = netdev_priv(net);
struct driver_info  *info = dev->driver_info;
-   int retval, pm;
+   int retval, pm, mpn;
 
clear_bit(EVENT_DEV_OPEN, &dev->flags);
netif_stop_queue (net);
@@ -809,6 +809,8 @@ int usbnet_stop (struct net_device *net)
 
usbnet_purge_paused_rxq(dev);
 
+   mpn = !test_and_clear_bit(EVENT_NO_RUNTIME_PM, &dev->flags);
+
/* deferred work (task, timer, softirq) must also stop.
 * can't flush_scheduled_work() until we drop rtnl (later),
 * else workers could deadlock; so make workers a NOP.
@@ -819,8 +821,7 @@ int usbnet_stop (struct net_device *net)
if (!pm)
usb_autopm_put_interface(dev->intf);
 
-   if (info->manage_power &&
-   !test_and_clear_bit(EVENT_NO_RUNTIME_PM, &dev->flags))
+   if (info->manage_power && mpn)
info->manage_power(dev, 0);
else
usb_autopm_put_interface(dev->intf);
-- 
2.3.2


[PATCH 2/2] usbnet: Fix a race between usbnet_stop() and the BH

2015-08-24 Thread Eugene Shatokhin

The race may happen when a device (e.g. YOTA 4G LTE Modem) is
unplugged while the system is downloading a large file from the Net.

Hardware breakpoints and Kprobes with delays were used to confirm that
the race does actually happen.

The race is on skb_queue ('next' pointer) between usbnet_stop()
and rx_complete(), which, in turn, calls usbnet_bh().

Here is a part of the call stack with the code where the changes to the
queue happen. The line numbers are for the kernel 4.1.0:

*0 __skb_unlink (skbuff.h:1517)
prev->next = next;
*1 defer_bh (usbnet.c:430)
spin_lock_irqsave(&list->lock, flags);
old_state = entry->state;
entry->state = state;
__skb_unlink(skb, list);
spin_unlock(&list->lock);
spin_lock(&dev->done.lock);
__skb_queue_tail(&dev->done, skb);
if (dev->done.qlen == 1)
tasklet_schedule(&dev->bh);
spin_unlock_irqrestore(&dev->done.lock, flags);
*2 rx_complete (usbnet.c:640)
state = defer_bh(dev, skb, &dev->rxq, state);

At the same time, the following code repeatedly checks if the queue is
empty and reads these values concurrently with the above changes:

*0  usbnet_terminate_urbs (usbnet.c:765)
/* maybe wait for deletions to finish. */
while (!skb_queue_empty(&dev->rxq)
&& !skb_queue_empty(&dev->txq)
&& !skb_queue_empty(&dev->done)) {
schedule_timeout(msecs_to_jiffies(UNLINK_TIMEOUT_MS));
set_current_state(TASK_UNINTERRUPTIBLE);
netif_dbg(dev, ifdown, dev->net,
  "waited for %d urb completions\n", temp);
}
*1  usbnet_stop (usbnet.c:806)
if (!(info->flags & FLAG_AVOID_UNLINK_URBS))
usbnet_terminate_urbs(dev);

As a result, it is possible, for example, that the skb is removed from
dev->rxq by __skb_unlink() before the check
"!skb_queue_empty(&dev->rxq)" in usbnet_terminate_urbs() is made. It is
also possible in this case that the skb is added to dev->done queue
after "!skb_queue_empty(&dev->done)" is checked. So
usbnet_terminate_urbs() may stop waiting and return while dev->done
queue still has an item.

Locking in defer_bh() and usbnet_terminate_urbs() was revisited to avoid
this race.
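
One property of the original loop quoted above is easy to miss: the three emptiness checks are joined with &&, so the loop keeps waiting only while all three queues are non-empty and stops as soon as any one of them is observed empty. A minimal single-threaded sketch of that condition, next to an approximation of the per-queue waiting used in this patch (locking and concurrency are deliberately left out; struct sketch_queues and both helpers are hypothetical names):

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the three queue lengths. */
struct sketch_queues {
	size_t rxq;
	size_t txq;
	size_t done;
};

/* Original condition: continue waiting only if ALL queues are non-empty. */
static bool sketch_old_keeps_waiting(const struct sketch_queues *q)
{
	return q->rxq != 0 && q->txq != 0 && q->done != 0;
}

/* Approximation of the patched behaviour: each queue is waited on in turn,
 * so waiting continues while ANY queue is still non-empty. */
static bool sketch_new_keeps_waiting(const struct sketch_queues *q)
{
	return q->rxq != 0 || q->txq != 0 || q->done != 0;
}
```

For a snapshot such as rxq empty and done holding one skb, the original condition stops waiting while the per-queue form does not, which matches the outcome described above.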

Signed-off-by: Eugene Shatokhin 
---
 drivers/net/usb/usbnet.c | 39 ---
 1 file changed, 28 insertions(+), 11 deletions(-)

diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
index e049857..b4cf107 100644
--- a/drivers/net/usb/usbnet.c
+++ b/drivers/net/usb/usbnet.c
@@ -428,12 +428,18 @@ static enum skb_state defer_bh(struct usbnet *dev, struct sk_buff *skb,
old_state = entry->state;
entry->state = state;
__skb_unlink(skb, list);
-   spin_unlock(&list->lock);
-   spin_lock(&dev->done.lock);
+
+   /* defer_bh() is never called with list == &dev->done.
+* spin_lock_nested() tells lockdep that it is OK to take
+* dev->done.lock here with list->lock held.
+*/
+   spin_lock_nested(&dev->done.lock, SINGLE_DEPTH_NESTING);
+
__skb_queue_tail(&dev->done, skb);
if (dev->done.qlen == 1)
tasklet_schedule(&dev->bh);
-   spin_unlock_irqrestore(&dev->done.lock, flags);
+   spin_unlock(&dev->done.lock);
+   spin_unlock_irqrestore(&list->lock, flags);
return old_state;
 }
 
@@ -749,6 +755,20 @@ EXPORT_SYMBOL_GPL(usbnet_unlink_rx_urbs);
 
 /*-*/
 
+static void wait_skb_queue_empty(struct sk_buff_head *q)
+{
+   unsigned long flags;
+
+   spin_lock_irqsave(&q->lock, flags);
+   while (!skb_queue_empty(q)) {
+   spin_unlock_irqrestore(&q->lock, flags);
+   schedule_timeout(msecs_to_jiffies(UNLINK_TIMEOUT_MS));
+   set_current_state(TASK_UNINTERRUPTIBLE);
+   spin_lock_irqsave(&q->lock, flags);
+   }
+   spin_unlock_irqrestore(&q->lock, flags);
+}
+
 // precondition: never called in_interrupt
 static void usbnet_terminate_urbs(struct usbnet *dev)
 {
@@ -762,14 +782,11 @@ static void usbnet_terminate_urbs(struct usbnet *dev)
unlink_urbs(dev, &dev->rxq);
 
/* maybe wait for deletions to finish. */
-   while (!skb_queue_empty(&dev->rxq)
-   && !skb_queue_empty(&dev->txq)
-   && !skb_queue_empty(&dev->done)) {
-   schedule_timeout(msecs_to_jiffies(UNLINK_TIMEOUT_MS));
-   set_current_state(TASK_UNINTERRUPTIBLE);
-   netif_dbg(dev, ifdown, dev->net,
- "waited for %d urb completions\n", temp);
-   }
+   wait_skb_queue_empty(&dev->rxq);
+   wait_skb_queue_empty(&dev->txq);
+   wait_skb_queue_empty(&dev->done);
+   netif_dbg(dev, ifdown, dev->net,
+ "waited for %d urb completions\n", temp);
set_current_state(TASK_RUNNING);
remove_wait_queue(&dev->wait, &wait);
 }
-- 
2.3.2


[PATCH net-next] udp_offload: Allow device GRO without checksum-complete

2015-08-24 Thread Tom Herbert

This patch adds a sysctl which allows GRO for a UDP offload protocol
to be performed in the device NAPI. This potentially is a performance
improvement if the savings of doing GRO in device NAPI outweighs the
cost of performing the checksum. Note that performing the
checksum in device NAPI may negatively impact latency or throughput
of unrelated flows.
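
The resulting early-exit test in udp_gro_receive() can be read as a small predicate. The following is a standalone sketch of the condition in the udp_offload.c hunk of this patch, with the skb and NAPI_GRO_CB fields replaced by a hypothetical plain struct so that it builds outside the kernel:

```c
#include <stdbool.h>

/* Hypothetical flattened view of the fields the condition looks at. */
struct sketch_gro_state {
	bool udp_mark;      /* NAPI_GRO_CB(skb)->udp_mark */
	bool csum_partial;  /* skb->ip_summed == CHECKSUM_PARTIAL */
	unsigned csum_cnt;  /* NAPI_GRO_CB(skb)->csum_cnt */
	bool csum_valid;    /* NAPI_GRO_CB(skb)->csum_valid */
};

/* Returns true when GRO is skipped for this packet ("goto out").
 * nocsum_ok models the value of the new sysctl. */
static bool sketch_gro_skip(const struct sketch_gro_state *s, bool nocsum_ok)
{
	bool no_csum_info = !s->csum_partial &&
			    s->csum_cnt == 0 &&
			    !s->csum_valid;

	return s->udp_mark || (no_csum_info && !nocsum_ok);
}
```

With the sysctl at 0 the predicate reduces to the existing behaviour; setting it only changes the outcome for packets that arrive without any usable checksum state.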

Performance results for VXLAN are below. Allowing GRO in device
NAPI does show performance improvement over doing GRO at the VXLAN
interface, however this performance is still less than what we see
with UDP checksums enabled (or getting checksum complete from the
device).

Test results: Running one netperf TCP_STREAM over VXLAN.

No UDP checksum, enable sysctl to allow GRO at device (this patch)
  TX CPU: 1.71
  RX CPU: 1.14
  6174 Mbps

UDP checksums and remote checksum offload enabled
  TX CPU: 1.97%
  RX CPU: 1.55%
  7527 Mbps

UDP checksums enabled
  TX CPU: 1.22%
  RX CPU: 1.86%
  6539 Mbps

No UDP checksums, GRO enabled on VXLAN interface
  TX CPU: 0.95%
  RX CPU: 1.78%
  4393 Mbps

No UDP checksum, GRO disabled VXLAN interface
  TX CPU: 1.31%
  RX CPU: 2.38%
  3613 Mbps
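
For a rough sense of scale, the throughput figures above can be compared directly. A small sketch of the arithmetic (sketch_gain_percent() is a hypothetical helper; the Mbps values in the note below are the ones reported above):

```c
/* Relative change of new_mbps over base_mbps, in percent. */
static double sketch_gain_percent(double new_mbps, double base_mbps)
{
	return (new_mbps - base_mbps) / base_mbps * 100.0;
}
```

With the reported numbers, 6174 Mbps against 4393 Mbps (GRO on the VXLAN interface) is a gain of roughly 40%, against 3613 Mbps (GRO disabled) roughly 71%, and against 7527 Mbps (checksums plus remote checksum offload) a shortfall of roughly 18%.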

Signed-off-by: Tom Herbert 
---
 Documentation/networking/ip-sysctl.txt | 7 +++
 include/net/udp.h  | 1 +
 net/ipv4/sysctl_net_ipv4.c | 7 +++
 net/ipv4/udp.c | 3 +++
 net/ipv4/udp_offload.c | 7 ---
 5 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 46e88ed..d8563c08 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -711,6 +711,13 @@ udp_wmem_min - INTEGER
total pages of UDP sockets exceed udp_mem pressure. The unit is byte.
Default: 1 page
 
+udp_gro_nocsum_ok - BOOLEAN
+   If set, allow Generic Receive Offload (GRO) to be performed for UDP
+   offload protocols in the case that packets are being received
+   without an offloaded checksum. This implies that packets checksums
+   may be performed in the device NAPI routines which could negatively
+   impact unrelated flows.
+
 CIPSOv4 Variables:
 
 cipso_cache_enable - BOOLEAN
diff --git a/include/net/udp.h b/include/net/udp.h
index 6d4ed18..48eb6ae 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -103,6 +103,7 @@ extern atomic_long_t udp_memory_allocated;
 extern long sysctl_udp_mem[3];
 extern int sysctl_udp_rmem_min;
 extern int sysctl_udp_wmem_min;
+extern int sysctl_udp_gro_nocsum_ok;
 
 struct sk_buff;
 
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 0330ab2..65fea78 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -766,6 +766,13 @@ static struct ctl_table ipv4_table[] = {
.proc_handler   = proc_dointvec_minmax,
.extra1 = &one
},
+   {
+   .procname   = "udp_gro_nocsum_ok",
+   .data   = &sysctl_udp_gro_nocsum_ok,
+   .maxlen = sizeof(sysctl_udp_gro_nocsum_ok),
+   .mode   = 0644,
+   .proc_handler   = proc_dointvec_minmax,
+   },
{ }
 };
 
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index c0a15e7..1d91227 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -130,6 +130,9 @@ EXPORT_SYMBOL(sysctl_udp_wmem_min);
 atomic_long_t udp_memory_allocated;
 EXPORT_SYMBOL(udp_memory_allocated);
 
+int sysctl_udp_gro_nocsum_ok;
+EXPORT_SYMBOL(sysctl_udp_gro_nocsum_ok);
+
 #define MAX_UDP_PORTS 65536
 #define PORTS_PER_CHAIN (MAX_UDP_PORTS / UDP_HTABLE_SIZE_MIN)
 
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index f938616..1666f44 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -300,9 +300,10 @@ struct sk_buff **udp_gro_receive(struct sk_buff **head, struct sk_buff *skb,
int flush = 1;
 
if (NAPI_GRO_CB(skb)->udp_mark ||
-   (skb->ip_summed != CHECKSUM_PARTIAL &&
-NAPI_GRO_CB(skb)->csum_cnt == 0 &&
-!NAPI_GRO_CB(skb)->csum_valid))
+   ((skb->ip_summed != CHECKSUM_PARTIAL &&
+ NAPI_GRO_CB(skb)->csum_cnt == 0 &&
+ !NAPI_GRO_CB(skb)->csum_valid) &&
+ !sysctl_udp_gro_nocsum_ok))
goto out;
 
/* mark that this skb passed once through the udp gro layer */
-- 
1.8.1


Re: [PATCH net-next] 3c59x: Add BQL support for 3c59x ethernet driver.

2015-08-24 Thread David Miller

From: Loganaden Velvindron 
Date: Thu, 20 Aug 2015 19:22:18 -0700

> This BQL patch is based on work done by Tino Reichardt.
> 
> Tested on :05:00.0: 3Com PCI 3c905C Tornado at c9e6e000 by running
> Flent several times.
> 
> 
> Signed-off-by: Loganaden Velvindron 

Applied, thanks.

Re: iproute2: Behavioural Bug?

2015-08-24 Thread Vadim Kochan

On Mon, Aug 24, 2015 at 02:00:29PM +0530, Akshat Kakkar wrote:
> When I am trying to delete a single tc filter (i.e. specifying its
> handle), it is deleting all the
> filters with the same priority/preference. i.e. it is ignoring the
> handle specified.
> 
> But, When I am doing similar activity in hashtable 800: it is deleting only 
> the
> specified filter, i.e. it is behaving as expected.
> 
> I am unable to comprehend the reason for this difference in behaviour.
> 
> Infact, in kernel 2.6.32 all is working as expected. However, in
> kernel 3.1 and 4.1 it is having the behaviour as mentioned above.
> 
> For example, following set of commands  create a hashtable 15: and add
> 2 filters to it.
> 
> tc filter add dev eth0 parent 1:0 prio 5 handle 15: protocol ip u32 divisor 
> 256
> tc filter add dev eth0 protocol ip parent 1: prio 5 handle 15:2:2 u32
> ht 15:2: match ip src 10.0.0.2 flowid 1:10
> tc filter add dev eth0 protocol ip parent 1: prio 5 handle 15:2:3 u32
> ht 15:2: match ip src 10.0.0.3 flowid 1:10
> 
> Now following command DELETES ALL THE FILTERS, though it should only
> delete FILTER 15:2:3 !
> tc filter del dev eth0 protocol ip parent 1: prio 5 handle 15:2:3 u32
> 
> O/p of tc filter show eth0 in this case is blank, as all filters are deleted.
> 
> 
> However, similar commands when executed for hashtable 800: is deleting
> only the specified filter
> tc filter add dev eth0 protocol ip parent 1: prio 5 handle 800:0:2 u32
> ht 800:0: match ip src 10.0.0.2 flowid 1:10
> tc filter add dev eth0 protocol ip parent 1: prio 5 handle 800:0:3 u32
> ht 800:0: match ip src 10.0.0.3 flowid 1:10
> 
> tc filter del dev eth0 protocol ip parent 1: prio 5 handle 800:0:2 u32
> 
> Above mentioned command only deletes single filter.
> O/p of tc filter show eth0 in the 2nd case is
> 
> filter parent 1: protocol ip pref 5 u32
> filter parent 1: protocol ip pref 5 u32 fh 800: ht divisor 1
> filter parent 1: protocol ip pref 5 u32 fh 800::3 order 3 key ht 800
> bkt 0 flowid 1:10
>   match 0a03/ at 12

Hi,

That's what I got using this script, into which I copied your commands:

--
#!/bin/bash

DEV=dummy0

ip link del $DEV 2> /dev/null

ip link add dev $DEV type dummy

tc qdisc add dev $DEV root handle 1: htb

tc filter add dev $DEV parent 1:0 prio 5 handle 15: protocol ip u32 divisor 256
tc filter add dev $DEV protocol ip parent 1: prio 5 handle 15:2:2 u32 ht 15:2: match ip src 10.0.0.2 flowid 1:10
tc filter add dev $DEV protocol ip parent 1: prio 5 handle 15:2:3 u32 ht 15:2: match ip src 10.0.0.3 flowid 1:10

tc filter del dev $DEV protocol ip parent 1: prio 5 handle 15:2:3 u32

tc filter show dev $DEV
# -

Result is:

filter parent 1: protocol ip pref 5 u32 
filter parent 1: protocol ip pref 5 u32 fh 15: ht divisor 256 
filter parent 1: protocol ip pref 5 u32 fh 15:2:2 order 2 key ht 15 bkt 2 flowid 1:10
  match 0a02/ at 12
filter parent 1: protocol ip pref 5 u32 fh 800: ht divisor 1 
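
For reading the handles in the output above: as far as I understand the u32 classifier, a handle written HT:BKT:NODE is three hexadecimal fields packed into one 32-bit value, matching the TC_U32_HTID, TC_U32_HASH and TC_U32_NODE masks in <linux/pkt_cls.h>. A minimal sketch under that assumption (sketch_u32_handle() is a hypothetical helper, not an iproute2 function):

```c
#include <stdint.h>

/* Pack a u32 filter handle from its three parts, assuming the layout
 * 12 bits of hash table id, 8 bits of bucket, 12 bits of node id. */
static uint32_t sketch_u32_handle(uint32_t htid, uint32_t bucket, uint32_t node)
{
	return ((htid & 0xFFFu) << 20) |
	       ((bucket & 0xFFu) << 12) |
	       (node & 0xFFFu);
}
```

Under this layout the handle 15:2:3 from the delete command becomes 0x01502003, and 800:0:3 becomes 0x80000003, so the two filters differ in the table id as well as in the bucket.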

Some additional info:

# tc -V
tc utility, iproute2-ss150413

# uname -a
Linux angus-think 4.0.4-2-ARCH #1 SMP PREEMPT Fri May 22 03:05:23 UTC 2015 x86_64 GNU/Linux

Regards,
Vadim Kochan

[PATCH v4 11/11] mtrr: bury MTRR - unexport mtrr_add() and mtrr_del()

2015-08-24 Thread Luis R. Rodriguez

From: "Luis R. Rodriguez" 

The crusade to replace mtrr_add() with the architecture-agnostic
arch_phys_wc_add() is complete. This ensures that write-combining
implementations (PAT on x86) are taken advantage of instead of MTRR.
With the crusade done, hide direct MTRR access from drivers.

Update x86 documentation on MTRR to reflect the completion of the
phasing out of direct access to MTRR, also add a note on platform
firmware code use of MTRRs based on the obituary discussion of
MTRRs on Linux [0].

[0] http://lkml.kernel.org/r/1438991330.3109.196.ca...@hp.com

Cc: Toshi Kani 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: Borislav Petkov 
Cc: Dave Hansen 
Cc: Suresh Siddha 
Cc: Ingo Molnar 
Cc: Juergen Gross 
Cc: Daniel Vetter 
Cc: Andy Lutomirski 
Cc: Dave Airlie 
Cc: Antonino Daplas 
Cc: Jean-Christophe Plagniol-Villard 
Cc: Tomi Valkeinen 
Cc: Ville Syrjälä 
Cc: Mel Gorman 
Cc: Vlastimil Babka 
Cc: Davidlohr Bueso 
Cc: Doug Ledford 
Cc: Andy Walls 
Cc: x...@kernel.org
Cc: netdev@vger.kernel.org
Cc: linux-me...@vger.kernel.org
Cc: linux-fb...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Signed-off-by: Luis R. Rodriguez 
---
 Documentation/x86/mtrr.txt  | 20 
 arch/x86/kernel/cpu/mtrr/main.c |  2 --
 2 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/Documentation/x86/mtrr.txt b/Documentation/x86/mtrr.txt
index 860bc3adc223..8a0bdb6e7370 100644
--- a/Documentation/x86/mtrr.txt
+++ b/Documentation/x86/mtrr.txt
@@ -6,10 +6,22 @@ Luis R. Rodriguez  - April 9, 2015
 ===
 Phasing out MTRR use
 
-MTRR use is replaced on modern x86 hardware with PAT. Over time the only type
-of effective MTRR that is expected to be supported will be for write-combining.
-As MTRR use is phased out device drivers should use arch_phys_wc_add() to make
-MTRR effective on non-PAT systems while a no-op on PAT enabled systems.
+MTRR use is replaced on modern x86 hardware with PAT. Direct MTRR use by
+drivers on Linux is now completely phased out, device drivers should use
+arch_phys_wc_add() in combination with ioremap_wc() to make MTRR effective on
+non-PAT systems while a no-op but equally effective on PAT enabled systems.
+
+Even if Linux does not use MTRR directly some x86 platform firmware may still
+set up MTRRs early before booting the OS, they do this as some platform
+firmware may still have implemented access to MTRRs which would be controlled
+and handled by the platform firmware directly. An example of platform use of
+MTRR is through the use of SMI handlers, one case could be for fan control,
+the platform code would need uncachable access to some of its fan control
+registers. Such platform access does not need any Operating System MTRR code in
+place other than mtrr_type_lookup() to ensure any OS specific mapping requests
+are aligned with platform MTRR setup. If MTRRs are only set up by the platform
+firmware code though and the OS does not make any specific MTRR mapping
+requests mtrr_type_lookup() should always return MTRR_TYPE_INVALID.
 
 For details refer to Documentation/x86/pat.txt.
 
diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
index e7ed0d8ebacb..f891b4750f04 100644
--- a/arch/x86/kernel/cpu/mtrr/main.c
+++ b/arch/x86/kernel/cpu/mtrr/main.c
@@ -448,7 +448,6 @@ int mtrr_add(unsigned long base, unsigned long size, unsigned int type,
return mtrr_add_page(base >> PAGE_SHIFT, size >> PAGE_SHIFT, type,
 increment);
 }
-EXPORT_SYMBOL(mtrr_add);
 
 /**
  * mtrr_del_page - delete a memory type region
@@ -537,7 +536,6 @@ int mtrr_del(int reg, unsigned long base, unsigned long size)
return -EINVAL;
return mtrr_del_page(reg, base >> PAGE_SHIFT, size >> PAGE_SHIFT);
 }
-EXPORT_SYMBOL(mtrr_del);
 
 /**
  * arch_phys_wc_add - add a WC MTRR and handle errors if PAT is unavailable
-- 
2.4.3


Re: [PATCH net v2] sctp: start t5 timer only when peer.rwnd is 0 and local.state is SHUTDOWN_PENDING

2015-08-24 Thread Marcelo Ricardo Leitner

On Mon, Aug 24, 2015 at 02:36:59PM -0400, Vlad Yasevich wrote:
> On 08/24/2015 02:31 PM, Marcelo Ricardo Leitner wrote:
> > On Mon, Aug 24, 2015 at 02:13:38PM -0400, Vlad Yasevich wrote:
> >> On 08/23/2015 07:30 AM, Xin Long wrote:
> >>> When A sends data to B, then A calls close() and enters the
> >>> SHUTDOWN_PENDING state, if B neither claims its rwnd is 0 nor sends a
> >>> SACK for this data, A will keep retransmitting this data until the t5
> >>> timeout; the Max.Retrans limit no longer applies, which is bad.
> >>>
> >>> If B's rwnd is not 0, A should send an ABORT after Max.Retrans
> >>> retransmissions. Only when B's rwnd == 0 and A's retransmissions exceed
> >>> Max.Retrans should A start the t5 timer. That is also what commit
> >>> f8d960524 intends, but it lacks the condition peer.rwnd == 0.
> >>>
> >>> Fixes: f8d960524 ("sctp: Enforce retransmission limit during shutdown")
> >>> Signed-off-by: Xin Long 
> >>> ---
> >>>  net/sctp/sm_statefuns.c | 3 ++-
> >>>  1 file changed, 2 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c
> >>> index 3ee27b7..deb9eab 100644
> >>> --- a/net/sctp/sm_statefuns.c
> >>> +++ b/net/sctp/sm_statefuns.c
> >>> @@ -5412,7 +5412,8 @@ sctp_disposition_t sctp_sf_do_6_3_3_rtx(struct net *net,
> >>>   SCTP_INC_STATS(net, SCTP_MIB_T3_RTX_EXPIREDS);
> >>>  
> >>>   if (asoc->overall_error_count >= asoc->max_retrans) {
> >>> - if (asoc->state == SCTP_STATE_SHUTDOWN_PENDING) {
> >>> + if (!q->asoc->peer.rwnd &&
> >>> + asoc->state == SCTP_STATE_SHUTDOWN_PENDING) {
> >>>   /*
> >>>* We are here likely because the receiver had its rwnd
> >>>* closed for a while and we have not been able to
> >>>
> >>
> >> This may not work as expected.  peer.rwnd is the calculated peer window,
> >> but it also gets updated when we receive SACKs.  So there is no way to
> >> tell whether the current window is 0 because the peer told us, or because
> >> we sent enough data to make it 0 and the peer hasn't responded.
> > 
> > I'm not sure I follow you, Vlad. I don't think we care why we have a
> > zero window there, just whether we have one at that stage. Either
> > way, if it's a zero window, we will go through T5 and give it more time to
> > recover, but if it's not a zero window, I don't see a reason to enable T5.
> 
> No, these are 2 distinct instances.  In one instance, the peer is reachable
> and is able to communicate its 0 rwnd state to us.  Thus we are being nice
> and granting the peer more time to exit the 0 window state.
> 
> In the other instance, the peer is unreachable and we just happen to hit the
> 0-window condition based on some estimations of the peer window.  In this
> case, we should be subject to the Max.RTX and terminate the association sooner.
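
For reference, the check proposed in the patch can be modelled as a small decision function. This is a simplified sketch of the proposed condition only, with hypothetical names; it is not the code in sm_statefuns.c, and it inherits the limitation raised above: peer_rwnd here is the locally calculated value, so it cannot tell a window the peer advertised as 0 from one that merely reached 0 through unacknowledged data.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical outcomes of a T3-rtx expiry in this simplified model. */
enum sketch_action {
	SKETCH_RETRANSMIT,	/* below the retransmission limit */
	SKETCH_START_T5,	/* give a zero-window peer more time */
	SKETCH_ABORT		/* limit reached, terminate the association */
};

static enum sketch_action sketch_t3_expired(unsigned error_count,
					    unsigned max_retrans,
					    bool shutdown_pending,
					    uint32_t peer_rwnd)
{
	if (error_count < max_retrans)
		return SKETCH_RETRANSMIT;
	if (shutdown_pending && peer_rwnd == 0)
		return SKETCH_START_T5;
	return SKETCH_ABORT;
}
```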

Makes sense, we can do better in there. Thanks Vlad.

  Marcelo


Re: [RFC] netlink: netlink_ack send a capped message in case of error

2015-08-24 Thread Pablo Neira Ayuso

On Mon, Aug 24, 2015 at 10:08:22AM +0200, Christophe Ricard wrote:
> Hi Scott,
> 
> I think I understand the potential limitation of my solution.
> I saw something was proposed by Jiri Benc who pushed an additional flag to
> tell if the payload can be ignored in case of an error.
> http://patchwork.ozlabs.org/patch/290976/
> 
> Do you think this one is acceptable? I am not sure I understand David's
> last comment.

I think David suggests something like the (completely untested)
attached patch.

From 3aa0deafb5648427d154e26920d9d85f89dab190 Mon Sep 17 00:00:00 2001
From: Pablo Neira Ayuso 
Date: Mon, 24 Aug 2015 20:23:45 +0200
Subject: [PATCH RFC] netlink: add NETLINK_CAP_ACK socket option

Since commit c05cdb1b864f ("netlink: allow large data transfers from
user-space"), the kernel may fail to allocate the necessary room for the
acknowledgement message back to userspace. This patch introduces a new socket
option that trims off the payload of the original netlink message.

The netlink message header is still included, so the user can guess from the
sequence number what is the message that has triggered the acknowledgment.
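
The effect on the size of the acknowledgement can be sketched in isolation. This is a minimal model of the payload computation in netlink_ack() as changed by this patch; the function name and the plain size parameters are hypothetical and stand in for sizeof(struct nlmsgerr) and nlmsg_len(nlh):

```c
#include <stdbool.h>
#include <stddef.h>

/* Size of the ACK payload: always the error header, plus the original
 * request only when reporting an error and capping was not requested. */
static size_t sketch_ack_payload(size_t errmsg_size, size_t request_len,
				 int err, bool cap_ack)
{
	size_t payload = errmsg_size;

	if (err && !cap_ack)
		payload += request_len;
	return payload;
}
```

With the option set, the allocation for an error ACK no longer grows with the size of the request, which is what avoids the allocation failure for large messages.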

Signed-off-by: Pablo Neira Ayuso 
---
 include/uapi/linux/netlink.h |1 +
 net/netlink/af_netlink.c |   25 +++--
 2 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/netlink.h b/include/uapi/linux/netlink.h
index cf6a65c..6f3fe16 100644
--- a/include/uapi/linux/netlink.h
+++ b/include/uapi/linux/netlink.h
@@ -110,6 +110,7 @@ struct nlmsgerr {
 #define NETLINK_TX_RING			7
 #define NETLINK_LISTEN_ALL_NSID		8
 #define NETLINK_LIST_MEMBERSHIPS	9
+#define NETLINK_CAP_ACK			10
 
 struct nl_pktinfo {
 	__u32	group;
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 67d2104..baa5973 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -84,6 +84,7 @@ struct listeners {
 #define NETLINK_F_BROADCAST_SEND_ERROR	0x4
 #define NETLINK_F_RECV_NO_ENOBUFS	0x8
 #define NETLINK_F_LISTEN_ALL_NSID	0x10
+#define NETLINK_F_CAP_ACK		0x20
 
 static inline int netlink_is_kernel(struct sock *sk)
 {
@@ -2258,6 +2259,13 @@ static int netlink_setsockopt(struct socket *sock, int level, int optname,
 			nlk->flags &= ~NETLINK_F_LISTEN_ALL_NSID;
 		err = 0;
 		break;
+	case NETLINK_CAP_ACK:
+		if (val)
+			nlk->flags |= NETLINK_F_CAP_ACK;
+		else
+			nlk->flags &= ~NETLINK_F_CAP_ACK;
+		err = 0;
+		break;
 	default:
 		err = -ENOPROTOOPT;
 	}
@@ -2332,6 +2340,16 @@ static int netlink_getsockopt(struct socket *sock, int level, int optname,
 		netlink_table_ungrab();
 		break;
 	}
+	case NETLINK_CAP_ACK:
+		if (len < sizeof(int))
+			return -EINVAL;
+		len = sizeof(int);
+		val = nlk->flags & NETLINK_F_CAP_ACK ? 1 : 0;
+		if (put_user(len, optlen) ||
+		put_user(val, optval))
+			return -EFAULT;
+		err = 0;
+		break;
 	default:
 		err = -ENOPROTOOPT;
 	}
@@ -2869,13 +2887,16 @@ EXPORT_SYMBOL(__netlink_dump_start);
 
 void netlink_ack(struct sk_buff *in_skb, struct nlmsghdr *nlh, int err)
 {
+	struct netlink_sock *nlk = nlk_sk(in_skb->sk);
 	struct sk_buff *skb;
 	struct nlmsghdr *rep;
 	struct nlmsgerr *errmsg;
 	size_t payload = sizeof(*errmsg);
 
-	/* error messages get the original request appened */
-	if (err)
+	/* Error messages get the original request appended, unless the user
+	 * requests to cap the error message.
+	 */
+	if (!(nlk->flags & NETLINK_F_CAP_ACK) && err)
 		payload += nlmsg_len(nlh);
 
 	skb = netlink_alloc_skb(in_skb->sk, nlmsg_total_size(payload),
-- 
1.7.10.4
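
The size calculation changed in netlink_ack() can be sketched in isolation as
below. This is only a rough userspace model, not kernel code: the *_sketch
structs, SKETCH_F_CAP_ACK and ack_payload_len() are hypothetical stand-ins that
mirror the layout of struct nlmsghdr and struct nlmsgerr.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the uapi structs (16-byte message header,
 * 4-byte error code followed by the header of the offending request). */
struct nlmsghdr_sketch {
	uint32_t nlmsg_len;	/* total length, header included */
	uint16_t nlmsg_type;
	uint16_t nlmsg_flags;
	uint32_t nlmsg_seq;
	uint32_t nlmsg_pid;
};

struct nlmsgerr_sketch {
	int32_t error;
	struct nlmsghdr_sketch msg;
};

#define SKETCH_F_CAP_ACK 0x20	/* assumed to mirror NETLINK_F_CAP_ACK */

/* Payload size of the ACK: always the error struct, plus the payload of
 * the original request only when an error occurred and the socket did
 * not ask for a capped acknowledgement. */
static size_t ack_payload_len(unsigned int sk_flags, int err,
			      const struct nlmsghdr_sketch *nlh)
{
	size_t payload = sizeof(struct nlmsgerr_sketch);

	assert(nlh->nlmsg_len >= sizeof(*nlh));
	if (err && !(sk_flags & SKETCH_F_CAP_ACK))
		payload += nlh->nlmsg_len - sizeof(*nlh);
	return payload;
}
```

With the flag set, the reply size no longer depends on the size of the request,
which is the property the commit message relies on.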

Re: [PATCH v3 01/10] ss: rooted out ss type declarations for output formatters

2015-08-24 Thread Matthias Tafelmeier

[PGP-encrypted message body; contents not recoverable]

Re: [PATCH, net-next] r8169: On RTL 8101 series bit SYSErr is reserved.

2015-08-24 Thread David Miller

From: Corcodel Marian 
Date: Mon, 24 Aug 2015 21:12:53 +0300

> diff --git a/drivers/net/ethernet/realtek/r8169.c 
> b/drivers/net/ethernet/realtek/r8169.c
> index 5693e65..32d2072 100644
> --- a/drivers/net/ethernet/realtek/r8169.c
> +++ b/drivers/net/ethernet/realtek/r8169.c
> @@ -8256,6 +8256,14 @@ static int rtl_init_one(struct pci_dev *pdev, const 
> struct pci_device_id *ent)
>   RTL_W8(Config1, RTL_R8(Config1) | PMEnable);
>   RTL_W8(Config5, RTL_R8(Config5) & (BWF | MWF | UWF | LanWake | 
> PMEStatus));*/
>   switch (tp->mac_version) {
> + case RTL_GIGA_MAC_VER_07:
> + case RTL_GIGA_MAC_VER_08:
> + case RTL_GIGA_MAC_VER_09:
> + case RTL_GIGA_MAC_VER_10:
> + case RTL_GIGA_MAC_VER_13:
> + case RTL_GIGA_MAC_VER_16:
> + pci_write_config_word(pdev, PCI_COMMAND, ~PCI_COMMAND_SERR); 

You're writing all sorts of bits you definitely don't want to set here.

Furthermore, there is no need to clear a bit that shouldn't be set
in the first place.

Your patches are really full of major errors, and unsuitable for
upstream.

Yes, all of them.

So please stop posting your r8169 changes here, because if you don't
care if your patches get included or not, then you should not be
posting them here.  This isn't a place to just dump random patches,
sorry.
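
The first objection can be illustrated with plain integer arithmetic. The
sketch below is a hypothetical userspace model (blind_write_value() and
clear_serr() are made-up names); it only shows which bits end up in the value,
and leaves aside the second objection that the bit need not be cleared at all.

```c
#include <stdint.h>

#define PCI_COMMAND_SERR 0x100	/* SERR# enable bit of the PCI command register */

/* Value the posted hunk writes: every command bit set except SERR. */
static uint16_t blind_write_value(void)
{
	return (uint16_t)~PCI_COMMAND_SERR;
}

/* Read-modify-write: keep the current command bits, clear only SERR. */
static uint16_t clear_serr(uint16_t cmd)
{
	return (uint16_t)(cmd & ~PCI_COMMAND_SERR);
}
```

For a command register holding 0x0107, clear_serr() yields 0x0007, while the
blind write stores 0xfeff regardless of the previous contents.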
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH net v2] sctp: start t5 timer only when peer.rwnd is 0 and local.state is SHUTDOWN_PENDING

2015-08-24 Thread Vlad Yasevich

On 08/24/2015 02:31 PM, Marcelo Ricardo Leitner wrote:
> On Mon, Aug 24, 2015 at 02:13:38PM -0400, Vlad Yasevich wrote:
>> On 08/23/2015 07:30 AM, Xin Long wrote:
>>> When A sends data to B and then calls close(), A enters the SHUTDOWN_PENDING
>>> state. If B neither reports that its rwnd is 0 nor sends a SACK for this data,
>>> A keeps retransmitting the data until the T5 timer expires; the Max.Retrans
>>> limit no longer applies, which is bad.
>>>
>>> If B's rwnd is not 0, A should send an ABORT after Max.Retrans retransmissions.
>>> Only when B's rwnd == 0 and A's retransmissions exceed Max.Retrans should A
>>> start the T5 timer. That is what commit f8d960524 intends, but it lacks the
>>> peer.rwnd == 0 condition.
>>>
>>> Fixes: f8d960524 ("sctp: Enforce retransmission limit during shutdown")
>>> Signed-off-by: Xin Long 
>>> ---
>>>  net/sctp/sm_statefuns.c | 3 ++-
>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c
>>> index 3ee27b7..deb9eab 100644
>>> --- a/net/sctp/sm_statefuns.c
>>> +++ b/net/sctp/sm_statefuns.c
>>> @@ -5412,7 +5412,8 @@ sctp_disposition_t sctp_sf_do_6_3_3_rtx(struct net 
>>> *net,
>>> SCTP_INC_STATS(net, SCTP_MIB_T3_RTX_EXPIREDS);
>>>  
>>> if (asoc->overall_error_count >= asoc->max_retrans) {
>>> -   if (asoc->state == SCTP_STATE_SHUTDOWN_PENDING) {
>>> +   if (!q->asoc->peer.rwnd &&
>>> +   asoc->state == SCTP_STATE_SHUTDOWN_PENDING) {
>>> /*
>>>  * We are here likely because the receiver had its rwnd
>>>  * closed for a while and we have not been able to
>>>
>>
>> This may not work as expected.  peer.rwnd is the calculated peer window, but it
>> also gets updated when we receive SACKs.  So there is no way to tell whether
>> the current window is 0 because the peer told us, or because we sent enough data
>> to make it 0 and the peer hasn't responded.
> 
> I'm not sure I follow you, Vlad. I don't think we care why we have a
> zero window there, just whether we have one at that stage. Either
> way, if it's a zero window, we will go through T5 and give it more time to
> recover, but if it's not a zero window, I don't see a reason to enable T5.

No, these are 2 distinct instances.  In one instance, the peer is reachable and
is able to communicate its 0 rwnd state to us.  Thus we are being nice and granting
the peer more time to exit the 0 window state.

In the other instance, the peer is unreachable and we just happen to hit the
0-window condition based on some estimations of the peer window.  In this case, we
should be subject to Max.RTX and terminate the association sooner.

-vlad
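
The two instances can be told apart only if the sender records where the zero
came from. The sketch below is a much simplified, hypothetical model of the
sender's bookkeeping (struct peer_view, on_send(), on_sack() and the
advertised_zero field are illustrative names, not the kernel's), assuming the
estimate shrinks on every send and is refreshed from a_rwnd on every SACK.

```c
#include <stdint.h>

/* Hypothetical, much simplified model of the sender's view of the peer. */
struct peer_view {
	uint32_t rwnd;		/* running estimate of the peer window */
	uint32_t outstanding;	/* bytes sent but not yet acknowledged */
	int advertised_zero;	/* set only when a SACK carried a_rwnd == 0 */
};

/* Sending data shrinks the estimate without any input from the peer. */
static void on_send(struct peer_view *p, uint32_t len)
{
	p->rwnd = (len < p->rwnd) ? p->rwnd - len : 0;
	p->outstanding += len;
}

/* A SACK refreshes the estimate from the window the peer advertised. */
static void on_sack(struct peer_view *p, uint32_t a_rwnd, uint32_t acked)
{
	p->outstanding -= (acked < p->outstanding) ? acked : p->outstanding;
	p->rwnd = (a_rwnd > p->outstanding) ? a_rwnd - p->outstanding : 0;
	p->advertised_zero = (a_rwnd == 0);
}
```

After on_send() fills the window, rwnd is 0 even though the peer has said
nothing; testing rwnd alone cannot distinguish that from a peer that really
advertised a closed window.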

> 
>   Marcelo
> 


Re: [PATCH] usbnet: Fix two races between usbnet_stop() and the BH

2015-08-24 Thread David Miller

From: Alan Stern 
Date: Mon, 24 Aug 2015 14:06:15 -0400 (EDT)

> On Mon, 24 Aug 2015, David Miller wrote:
>> Atomic operations like clear_bit also will behave that way.
> 
> Are you certain about that?  I couldn't find any mention of it in
> Documentation/atomic_ops.txt.
> 
> In theory, an architecture could implement atomic bit operations using 
> a spinlock to insure atomicity.  I don't know if any architectures do 
> this, but if they do then the scenario above could arise.

Indeed, we do have platforms like 32-bit sparc and parisc that do this.

So, taking that into consideration, this is a bit unfortunate and on
such platforms we do have this problem.
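
A minimal single-threaded replay of the interleaving discussed in this thread,
assuming clear_bit() is a lock-protected read/modify/write while the other side
does a plain store that never takes that lock. The bit numbers and the function
name are hypothetical.

```c
#define EVENT_RX_KILL_BIT 10	/* hypothetical bit number */
#define OTHER_EVENT_BIT    3	/* hypothetical second event bit */

/* Replays the four steps in order on one thread and returns the final
 * value of the flags word. */
static unsigned long replay_lost_store(void)
{
	unsigned long flags = (1UL << EVENT_RX_KILL_BIT) |
			      (1UL << OTHER_EVENT_BIT);
	unsigned long tmp;

	tmp = flags;				/* clear_bit side: read */
	tmp &= ~(1UL << EVENT_RX_KILL_BIT);	/* clear_bit side: clear bit */
	flags = 0;				/* plain-store side: flags = 0 */
	flags = tmp;				/* clear_bit side: write back */

	return flags;
}
```

The final value still has the other event bit set, so the plain store of 0 is
lost. With a single atomic read-modify-write instruction the store cannot land
between the read and the write.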

Re: [PATCH net v2] sctp: start t5 timer only when peer.rwnd is 0 and local.state is SHUTDOWN_PENDING

2015-08-24 Thread Marcelo Ricardo Leitner

On Mon, Aug 24, 2015 at 02:13:38PM -0400, Vlad Yasevich wrote:
> On 08/23/2015 07:30 AM, Xin Long wrote:
> > When A sends data to B and then calls close(), A enters the SHUTDOWN_PENDING
> > state. If B neither reports that its rwnd is 0 nor sends a SACK for this data,
> > A keeps retransmitting the data until the T5 timer expires; the Max.Retrans
> > limit no longer applies, which is bad.
> > 
> > If B's rwnd is not 0, A should send an ABORT after Max.Retrans retransmissions.
> > Only when B's rwnd == 0 and A's retransmissions exceed Max.Retrans should A
> > start the T5 timer. That is what commit f8d960524 intends, but it lacks the
> > peer.rwnd == 0 condition.
> > 
> > Fixes: f8d960524 ("sctp: Enforce retransmission limit during shutdown")
> > Signed-off-by: Xin Long 
> > ---
> >  net/sctp/sm_statefuns.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c
> > index 3ee27b7..deb9eab 100644
> > --- a/net/sctp/sm_statefuns.c
> > +++ b/net/sctp/sm_statefuns.c
> > @@ -5412,7 +5412,8 @@ sctp_disposition_t sctp_sf_do_6_3_3_rtx(struct net 
> > *net,
> > SCTP_INC_STATS(net, SCTP_MIB_T3_RTX_EXPIREDS);
> >  
> > if (asoc->overall_error_count >= asoc->max_retrans) {
> > -   if (asoc->state == SCTP_STATE_SHUTDOWN_PENDING) {
> > +   if (!q->asoc->peer.rwnd &&
> > +   asoc->state == SCTP_STATE_SHUTDOWN_PENDING) {
> > /*
> >  * We are here likely because the receiver had its rwnd
> >  * closed for a while and we have not been able to
> > 
> 
> This may not work as expected.  peer.rwnd is the calculated peer window, but it
> also gets updated when we receive SACKs.  So there is no way to tell whether
> the current window is 0 because the peer told us, or because we sent enough data
> to make it 0 and the peer hasn't responded.

I'm not sure I follow you, Vlad. I don't think we care why we have a
zero window there, just whether we have one at that stage. Either
way, if it's a zero window, we will go through T5 and give it more time to
recover, but if it's not a zero window, I don't see a reason to enable T5.

  Marcelo


Re: [PATCH net] sctp: partial chunk should be drop without sending abort packet

2015-08-24 Thread Daniel Borkmann


On 08/24/2015 02:47 PM, Marcelo Ricardo Leitner wrote:
> On Mon, Aug 24, 2015 at 06:08:30PM +0800, Xin Long wrote:
>> as RFC 4960, 6.10 said, *if the receiver detects a partial chunk, it MUST drop
>> the chunk*, we should not send the abort. but if we put this discard to inside
>> state machine, it will send abort.
>>
>> so we just drop the partial chunk there, never let this chunk go into the state
>> machine.
>>
>> Signed-off-by: Xin Long
>> ---
>
> This is basically reverting a chunk of Daniel's and Vlad's 26b87c788100
> ("net: sctp: fix remote memory pressure from excessive queueing").
> Isn't it going to re-introduce the initial issue then?

Yes, seems so.

Re: [PATCH] usbnet: Fix two races between usbnet_stop() and the BH

2015-08-24 Thread Alan Stern

On Mon, 24 Aug 2015, Alan Stern wrote:

> On Mon, 24 Aug 2015, David Miller wrote:
> 
> > From: Eugene Shatokhin 
> > Date: Wed, 19 Aug 2015 14:59:01 +0300
> > 
> > > So the following might be possible, although unlikely:
> > > 
> > > CPU0 CPU1
> > >  clear_bit: read dev->flags
> > >  clear_bit: clear EVENT_RX_KILL in the read value
> > > 
> > > dev->flags=0;
> > > 
> > >  clear_bit: write updated dev->flags
> > > 
> > > As a result, dev->flags may become non-zero again.
> > 
> > Is this really possible?
> > 
> > Stores really are "atomic" in the sense that they do their update
> > in one indivisible operation.
> 
> Provided you use ACCESS_ONCE or WRITE_ONCE or whatever people like to 
> call it now.
> 
> > Atomic operations like clear_bit also will behave that way.
> 
> Are you certain about that?  I couldn't find any mention of it in
> Documentation/atomic_ops.txt.
> 
> In theory, an architecture could implement atomic bit operations using 
> a spinlock to insure atomicity.  I don't know if any architectures do 
> this, but if they do then the scenario above could arise.

Now that I see this in writing, I realize it's not possible after all.  
clear_bit() et al. will work with a single unsigned long, which doesn't
leave any place for spinlocks or other mechanisms.  I was thinking of 
atomic_t.

So never mind...

Alan Stern


[PATCH, net-next] r8169: On RTL 8101 series bit SYSErr is reserved.

2015-08-24 Thread Corcodel Marian

On RTL 8101 series bit SYSErr is reserved.

Signed-off-by: Corcodel Marian 

diff --git a/drivers/net/ethernet/realtek/r8169.c 
b/drivers/net/ethernet/realtek/r8169.c
index 5693e65..32d2072 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -8256,6 +8256,14 @@ static int rtl_init_one(struct pci_dev *pdev, const 
struct pci_device_id *ent)
RTL_W8(Config1, RTL_R8(Config1) | PMEnable);
RTL_W8(Config5, RTL_R8(Config5) & (BWF | MWF | UWF | LanWake | 
PMEStatus));*/
switch (tp->mac_version) {
+   case RTL_GIGA_MAC_VER_07:
+   case RTL_GIGA_MAC_VER_08:
+   case RTL_GIGA_MAC_VER_09:
+   case RTL_GIGA_MAC_VER_10:
+   case RTL_GIGA_MAC_VER_13:
+   case RTL_GIGA_MAC_VER_16:
+   pci_write_config_word(pdev, PCI_COMMAND, ~PCI_COMMAND_SERR); 
+   break;
case RTL_GIGA_MAC_VER_34:
case RTL_GIGA_MAC_VER_35:
case RTL_GIGA_MAC_VER_36:
-- 
2.1.4


Re: Low throughput in VMs using VxLAN

2015-08-24 Thread Vlad Yasevich

On 08/24/2015 12:19 PM, Santosh R wrote:
>  Hi,
> 
>Earlier I was seeing lower throughput in VMs using VxLan as GRO was
> not happening in VM.
> Tom Herbert suggested to use "vxlan: GRO support at tunnel layer" patch 
> series.
> With today's net-next (4.2.0-rc7) in host and VM, I could see GRO
> happening for vxlan, macvtap and virtual interface in VM.
> The throughput is still low between VMs (around 4Gbps compared to
> 9Gbps without VxLAN).
> Looks like the packet is getting segmented in Host and then GROed in VM.
> Is this an expected behaviour?

Currently yes.  I am working on adding GSO_TUNNEL and related checksum support
to virtio to eliminate this segmentation.

-vlad


> Is my below configuration correct?
> 
> Here is the configuration.
> eth (VM) - macvtap - vxlan - phy iface  <-> phy iface - vxlan -
> macvtap - (VM) eth
> 
> VM is started with
> # qemu-system-x86_64 -m 4096 -smp 4 -boot c  -device
> virtio-net-pci,netdev=hostnet0,id=net0,mac=C2:B2:CA:6F:BC:A4 -device
> e1000,netdev=tap0,mac=DE:AD:BE:EF:96:32 -netdev tap,id=hostnet0,fd=3
> 3<>/dev/tap18 -netdev tap,id=tap0,script=no  -drive
> file=/root/vdisk_rhel65.img
> 
> Here is the skb_segment count for 10 sec iperf receive test.
> host # ./funccount skb_segment
> Tracing "skb_segment"... Ctrl-C to end.
> ^C
> FUNC  COUNT
> skb_segment   58604
> 
> # ./functrace skb_segment
> ...
>  -0 [006] ..s. 17632.030126: skb_segment <-tcp_gso_segment
>  ksoftirqd/6-38[006] ..s. 17632.030177: skb_segment <-tcp_gso_segment
>  ksoftirqd/6-38[006] ..s. 17632.030223: skb_segment <-tcp_gso_segment
>  ksoftirqd/6-38[006] ..s. 17632.030269: skb_segment <-tcp_gso_segment
>  ksoftirqd/6-38[006] ..s. 17632.030298: skb_segment <-tcp_gso_segment
>  qemu-system-x86-5932  [006] ..s. 17632.030489: skb_segment <-tcp_gso_segment
>  qemu-system-x86-5932  [006] ..s. 17632.030507: skb_segment <-tcp_gso_segment
>  qemu-system-x86-5932  [006] ..s. 17632.030528: skb_segment <-tcp_gso_segment
>  qemu-system-x86-5932  [006] ..s. 17632.030550: skb_segment <-tcp_gso_segment
>  qemu-system-x86-5932  [006] ..s. 17632.030576: skb_segment <-tcp_gso_segment
>  qemu-system-x86-5932  [006] ..s1 17632.030759: skb_segment <-tcp_gso_segment
>  qemu-system-x86-5932  [006] ..s1 17632.030814: skb_segment <-tcp_gso_segment
> ..
> 
> # Physical interface
> 21:32:49.749263 IP 102.22.22.14.39561 > 102.22.22.12.otv: UDP, length 2870
> 21:32:49.749278 IP 102.22.22.14.39561 > 102.22.22.12.otv: UDP, length 9860
> 21:32:49.749326 IP 102.22.22.12.44214 > 102.22.22.14.otv: UDP, length 74
> 21:32:49.749333 IP 102.22.22.12.44214 > 102.22.22.14.otv: UDP, length 74
> 21:32:49.749340 IP 102.22.22.12.44214 > 102.22.22.14.otv: UDP, length 74
> 21:32:49.749405 IP 102.22.22.14.39561 > 102.22.22.12.otv: UDP, length 2870
> 21:32:49.749425 IP 102.22.22.14.39561 > 102.22.22.12.otv: UDP, length 11258
> 
> # VxLAN
> 21:32:49.749268 IP 102.44.44.14.60616 > 102.44.44.12.commplex-link:
> Flags [.], seq 25:2821, ack 1, win 111, options [nop,nop,TS val
> 15632994 ecr 13334931], length 2796
> 21:32:49.749281 IP 102.44.44.14.60616 > 102.44.44.12.commplex-link:
> Flags [.], seq 2821:12607, ack 1, win 111, options [nop,nop,TS val
> 15632994 ecr 13334931], length 9786
> 21:32:49.749322 IP 102.44.44.12.commplex-link > 102.44.44.14.60616:
> Flags [.], ack 2821, win 270, options [nop,nop,TS val 13334931 ecr
> 15632994], length 0
> 21:32:49.749331 IP 102.44.44.12.commplex-link > 102.44.44.14.60616:
> Flags [.], ack 7015, win 336, options [nop,nop,TS val 13334931 ecr
> 15632994], length 0
> 21:32:49.749336 IP 102.44.44.12.commplex-link > 102.44.44.14.60616:
> Flags [.], ack 12607, win 423, options [nop,nop,TS val 13334931 ecr
> 15632994], length 0
> 21:32:49.749411 IP 102.44.44.14.60616 > 102.44.44.12.commplex-link:
> Flags [.], seq 12607:15403, ack 1, win 111, options [nop,nop,TS val
> 15632994 ecr 13334931], length 2796
> 21:32:49.749429 IP 102.44.44.14.60616 > 102.44.44.12.commplex-link:
> Flags [P.], seq 15403:26587, ack 1, win 111, options [nop,nop,TS val
> 15632994 ecr 13334931], length 11184
> 
> # macvtap
> 2.44.44.14.60616 > 102.44.44.12.commplex-link: Flags [.], seq 25:2821,
> ack 1, win 111, options [nop,nop,TS val 15632994 ecr 13334931], length
> 2796
> 21:32:49.749281 IP 102.44.44.14.60616 > 102.44.44.12.commplex-link:
> Flags [.], seq 2821:12607, ack 1, win 111, options [nop,nop,TS val
> 15632994 ecr 13334931], length 9786
> 21:32:49.749321 IP 102.44.44.12.commplex-link > 102.44.44.14.60616:
> Flags [.], ack 2821, win 270, options [nop,nop,TS val 13334931 ecr
> 15632994], length 0
> 21:32:49.749330 IP 102.44.44.12.commplex-link > 102.44.44.14.60616:
> Flags [.], ack 7015, win 336, options [nop,nop,TS val 13334931 ecr
> 15632994], length 0
> 21:32:49.749335 IP 102.44.44.12.commplex-link > 102.44.44.14.60616:
> Flags [.], ack 12607, win 423, options [nop,nop,TS val 13334931 ecr
> 15632994], length 0
> 21:32:49.749411 IP 102.44

Re: [PATCH net v2] sctp: start t5 timer only when peer.rwnd is 0 and local.state is SHUTDOWN_PENDING

2015-08-24 Thread Vlad Yasevich

On 08/23/2015 07:30 AM, Xin Long wrote:
> When A sends data to B and then calls close(), A enters the SHUTDOWN_PENDING
> state. If B neither reports that its rwnd is 0 nor sends a SACK for this data,
> A keeps retransmitting the data until the T5 timer expires; the Max.Retrans
> limit no longer applies, which is bad.
> 
> If B's rwnd is not 0, A should send an ABORT after Max.Retrans retransmissions.
> Only when B's rwnd == 0 and A's retransmissions exceed Max.Retrans should A
> start the T5 timer. That is what commit f8d960524 intends, but it lacks the
> peer.rwnd == 0 condition.
> 
> Fixes: f8d960524 ("sctp: Enforce retransmission limit during shutdown")
> Signed-off-by: Xin Long 
> ---
>  net/sctp/sm_statefuns.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c
> index 3ee27b7..deb9eab 100644
> --- a/net/sctp/sm_statefuns.c
> +++ b/net/sctp/sm_statefuns.c
> @@ -5412,7 +5412,8 @@ sctp_disposition_t sctp_sf_do_6_3_3_rtx(struct net *net,
>   SCTP_INC_STATS(net, SCTP_MIB_T3_RTX_EXPIREDS);
>  
>   if (asoc->overall_error_count >= asoc->max_retrans) {
> - if (asoc->state == SCTP_STATE_SHUTDOWN_PENDING) {
> + if (!q->asoc->peer.rwnd &&
> + asoc->state == SCTP_STATE_SHUTDOWN_PENDING) {
>   /*
>* We are here likely because the receiver had its rwnd
>* closed for a while and we have not been able to
> 

This may not work as expected.  peer.rwnd is the calculated peer window, but it
also gets updated when we receive SACKs.  So there is no way to tell whether
the current window is 0 because the peer told us, or because we sent enough data
to make it 0 and the peer hasn't responded.

-vlad

Re: [PATCH] usbnet: Fix two races between usbnet_stop() and the BH

2015-08-24 Thread Eugene Shatokhin

On 24.08.2015 20:43, David Miller wrote:
> From: Eugene Shatokhin
> Date: Wed, 19 Aug 2015 14:59:01 +0300
>
>> So the following might be possible, although unlikely:
>>
>> CPU0                    CPU1
>>                         clear_bit: read dev->flags
>>                         clear_bit: clear EVENT_RX_KILL in the read value
>>
>> dev->flags=0;
>>
>>                         clear_bit: write updated dev->flags
>>
>> As a result, dev->flags may become non-zero again.
>
> Is this really possible?

On x86, it is not possible, so this is not a problem. Perhaps, for ARM
too. As for the other architectures supported by the kernel - not sure,
no common guarantees, it seems. Anyway, this is not a critical issue, I
agree.

OK, let us leave things as they are for this one and fix the rest.

> Stores really are "atomic" in the sense that they do their update
> in one indivisible operation.
>
> Atomic operations like clear_bit also will behave that way.
>
> If a clear_bit is in progress, the "dev->flags=0" store will not be
> able to grab the cache line exclusively until the clear_bit is done.
>
> So I think the above sequence of events is completely impossible.  Once
> a clear_bit starts, a write by another foreign agent on the bus is
> absolutely impossible to legally occur until the clear_bit completes.
>
> I think this is a non-issue.

Regards,
Eugene


Re: [PATCH] usbnet: Fix two races between usbnet_stop() and the BH

2015-08-24 Thread Alan Stern

On Mon, 24 Aug 2015, David Miller wrote:

> From: Eugene Shatokhin 
> Date: Wed, 19 Aug 2015 14:59:01 +0300
> 
> > So the following might be possible, although unlikely:
> > 
> > CPU0 CPU1
> >  clear_bit: read dev->flags
> >  clear_bit: clear EVENT_RX_KILL in the read value
> > 
> > dev->flags=0;
> > 
> >  clear_bit: write updated dev->flags
> > 
> > As a result, dev->flags may become non-zero again.
> 
> Is this really possible?
> 
> Stores really are "atomic" in the sense that they do their update
> in one indivisible operation.

Provided you use ACCESS_ONCE or WRITE_ONCE or whatever people like to 
call it now.

> Atomic operations like clear_bit also will behave that way.

Are you certain about that?  I couldn't find any mention of it in
Documentation/atomic_ops.txt.

In theory, an architecture could implement atomic bit operations using 
a spinlock to insure atomicity.  I don't know if any architectures do 
this, but if they do then the scenario above could arise.

Alan Stern


Re: [PATCH net] sctp: partial chunk should be drop without sending abort packet

2015-08-24 Thread Vlad Yasevich

On 08/24/2015 06:08 AM, Xin Long wrote:
> as RFC 4960, 6.10 said, *if the receiver detects a partial chunk, it MUST drop
> the chunk*, we should not send the abort. but if we put this discard to inside
> state machine, it will send abort.
> 

Actually, silently dropping this is _very_ bad.  The reason is that you've already
processed the leading chunks and may have queued a response...  Now, you
reach the end of the packet and find that the last chunk is partial.  You end up
dropping the packet, but still sending the responses.  This actually led to some
very interesting issues we were seeing.

It is better to terminate the association in this case.

-vlad
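
The ordering problem can be seen in a small chunk walker. The sketch below is a
hypothetical userspace model (count_complete_chunks() is a made-up name); it
assumes the standard chunk layout of a 1-byte type, 1-byte flags and a 2-byte
big-endian length that covers the header, with chunks padded to 4 bytes.

```c
#include <stddef.h>
#include <stdint.h>

#define CHUNK_HDR_LEN 4	/* type (1) + flags (1) + length (2) */

/* Walks the chunks in buf and returns how many are complete. *partial is
 * set to 1 when the walk stops at a chunk whose declared length is
 * malformed or runs past the end of the packet. */
static int count_complete_chunks(const uint8_t *buf, size_t len, int *partial)
{
	size_t off = 0;
	int n = 0;

	*partial = 0;
	while (off < len) {
		size_t clen, padded;

		if (len - off < CHUNK_HDR_LEN) {
			*partial = 1;
			break;
		}
		/* The length field is big-endian and covers the header. */
		clen = ((size_t)buf[off + 2] << 8) | buf[off + 3];
		padded = (clen + 3) & ~(size_t)3;
		if (clen < CHUNK_HDR_LEN || padded > len - off) {
			*partial = 1;
			break;
		}
		off += padded;
		n++;
	}
	return n;
}
```

When the walk reports a partial chunk, the return value is the number of
leading chunks that a chunk-at-a-time receiver has already handled by the time
the problem is noticed.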

> so we just drop the partial chunk there, never let this chunk go into the state
> machine.
> 
> Signed-off-by: Xin Long 
> ---
>  net/sctp/inqueue.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/net/sctp/inqueue.c b/net/sctp/inqueue.c
> index 7e8a16c..a22ca57 100644
> --- a/net/sctp/inqueue.c
> +++ b/net/sctp/inqueue.c
> @@ -183,9 +183,9 @@ struct sctp_chunk *sctp_inq_pop(struct sctp_inq *queue)
>   /* This is not a singleton */
>   chunk->singleton = 0;
>   } else if (chunk->chunk_end > skb_tail_pointer(chunk->skb)) {
> - /* Discard inside state machine. */
> - chunk->pdiscard = 1;
> - chunk->chunk_end = skb_tail_pointer(chunk->skb);
> + sctp_chunk_free(chunk);
> + chunk = queue->in_progress = NULL;
> + return NULL;
>   } else {
>   /* We are at the end of the packet, so mark the chunk
>* in case we need to send a SACK.
> 


Re: [PATCH net] sctp: asconf process should treat multiple address parameter as unrecognized parameter

2015-08-24 Thread Vlad Yasevich

On 08/24/2015 06:07 AM, Xin Long wrote:
> currently, when we sctp_walk_params() and encounter address parameters, we
> skip them; we do not care how many address parameters there are.
> 
> but the parameters of an ASCONF chunk should consist of one *Address Parameter*
> and one or more *ASCONF Parameters*.
> 
> so we will treat additional address parameters as unrecognized parameters and
> send an error cause to the peer.
> 
> Signed-off-by: Xin Long 
> ---
>  net/sctp/sm_make_chunk.c | 12 ++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
> index 06320c8..0ee5ca7 100644
> --- a/net/sctp/sm_make_chunk.c
> +++ b/net/sctp/sm_make_chunk.c
> @@ -3217,10 +3217,18 @@ struct sctp_chunk *sctp_process_asconf(struct 
> sctp_association *asoc,
>  
>   /* Process the TLVs contained within the ASCONF chunk. */
>   sctp_walk_params(param, addip, addip_hdr.params) {
> - /* Skip preceeding address parameters. */
> + /* Skip preceeding address parameters.
> +  * process multi-addrparam as unrecognized parameters
> +  */
>   if (param.p->type == SCTP_PARAM_IPV4_ADDRESS ||
> - param.p->type == SCTP_PARAM_IPV6_ADDRESS)
> + param.p->type == SCTP_PARAM_IPV6_ADDRESS) {
> + if(param.addr != addr_param) {
> + all_param_pass = false;
> + sctp_add_asconf_response(asconf_ack, 0,
> + SCTP_ERROR_UNKNOWN_PARAM, param.v);
> + }
>   continue;
> + }
>  

I think it would be much better to catch this in the validation stage.
If an implementation inserts multiple address parameters, we don't really know
which one we should be using.

-vlad
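
A validation-stage check could look roughly like the sketch below. It is a
hypothetical userspace model, not the kernel's verification code:
count_addr_params() and asconf_params_valid() are made-up names, and the walker
only looks at top-level TLVs (2-byte type, 2-byte length covering the header,
4-byte padding).

```c
#include <stddef.h>
#include <stdint.h>

#define PARAM_IPV4_ADDRESS 5	/* RFC 4960 parameter types */
#define PARAM_IPV6_ADDRESS 6
#define PARAM_HDR_LEN 4		/* type (2) + length (2) */

/* Returns the number of top-level address parameters in a TLV list, or
 * -1 if a parameter is malformed or truncated. */
static int count_addr_params(const uint8_t *buf, size_t len)
{
	size_t off = 0;
	int addrs = 0;

	while (off < len) {
		uint16_t type;
		size_t plen, padded;

		if (len - off < PARAM_HDR_LEN)
			return -1;
		type = (uint16_t)((buf[off] << 8) | buf[off + 1]);
		plen = ((size_t)buf[off + 2] << 8) | buf[off + 3];
		padded = (plen + 3) & ~(size_t)3;
		if (plen < PARAM_HDR_LEN || padded > len - off)
			return -1;
		if (type == PARAM_IPV4_ADDRESS || type == PARAM_IPV6_ADDRESS)
			addrs++;
		off += padded;
	}
	return addrs;
}

/* Accept the parameter list only if it carries exactly one address. */
static int asconf_params_valid(const uint8_t *buf, size_t len)
{
	return count_addr_params(buf, len) == 1;
}
```

Rejecting the chunk up front avoids having to decide, halfway through
processing, which of several address parameters was meant.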

>   err_code = sctp_process_asconf_param(asoc, asconf,
>param.addip);
> 


Re: [PATCH net-next 4/9] net: dsa: Allow configuration of CPU & DSA port speeds/duplex

2015-08-24 Thread Florian Fainelli

On 23/08/15 14:24, Andrew Lunn wrote:
>>> +   port_dn = cd->port_dn[port];
>>> +   if (of_phy_is_fixed_link(port_dn)) {
>>> +   ret = of_phy_register_fixed_link(port_dn);
>>> +   if (ret) {
>>> +   netdev_err(master,
>>> +  "failed to register fixed PHY\n");
>>> +   return ret;
>>> +   }
>>> +   phydev = of_phy_find_device(port_dn);
>>> +   genphy_config_init(phydev);
>>> +   genphy_read_status(phydev);
>>> +   if (ds->drv->adjust_link)
>>> +   ds->drv->adjust_link(ds, port, phydev);
>>
>> This is kind of a hack, because what you really need is just the link
>> parameters, but you cannot obtain such information without first
>> configuring the PHY up to a certain point in genphy_config_init(), and
>> then have genphy_read_status() copy these values in your phydev structure.
>>
>> Maybe we should really consider something like this after all:
>>
>> https://lkml.org/lkml/2015/8/5/490
> 
> Hi Florian
> 
> This half solves the problem. The nice thing about using the
> fixed_link, is that i can just call the adjust_link function with it.
> The fixed_phy_status cannot be passed directly to adjust_link. Some
> code refactoring or duplication would be needed.

Right, and using an adjust_link callback seems a little cleaner anyway
since you get an abstracted PHY device to work with.

>  
>> Or maybe, we should really introduce this "cpu" network device after all
>> with a dropping xmit function, such that we get ethtool counters to work
>> on it, and we can also attach it to a PHY device to configure link
>> parameters?
> 
> I keep humming and harring about this. I don't really like the idea of
> having an interface which you cannot send/receive packets. Yet it
> solves a number of problems like this, and gives you access to
> statistics and registers in the usual way.

Right that would be my primary motivation and use case as well.

> If we do it for the CPU
> port, we should also do it for the DSA ports. And we probably want the
> call for up to return -ENOSUP, just to make it clear it cannot be used
> for anything.

We should definitely start a separate thread for this, as there might
be real use cases that are not yet covered that would need a network
device.

Let's go ahead with your patch for now:

Reviewed-by: Florian Fainelli 
-- 
Florian

[PATCH v3 net-next 3/8] tunnel: introduce udp_tun_rx_dst()

2015-08-24 Thread Pravin B Shelar

Introduce function udp_tun_rx_dst() to initialize tunnel dst on
receive path.

Signed-off-by: Pravin B Shelar 
---
Rebased to support ipv6 tun-dst.
---
 drivers/net/vxlan.c|   29 ++--
 include/net/dst_metadata.h |   61 
 include/net/udp_tunnel.h   |4 +++
 net/ipv4/ip_gre.c  |   21 +++---
 net/ipv4/udp_tunnel.c  |   25 +-
 5 files changed, 97 insertions(+), 43 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 61b457b..5b4cf66 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1264,36 +1264,13 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct 
sk_buff *skb)
}
 
if (vxlan_collect_metadata(vs)) {
-   tun_dst = metadata_dst_alloc(sizeof(*md), GFP_ATOMIC);
+   tun_dst = udp_tun_rx_dst(skb, vxlan_get_sk_family(vs), 
TUNNEL_KEY,
+cpu_to_be64(vni >> 8), sizeof(*md));
+
if (!tun_dst)
goto drop;
 
info = &tun_dst->u.tun_info;
-   if (vxlan_get_sk_family(vs) == AF_INET) {
-   const struct iphdr *iph = ip_hdr(skb);
-
-   info->key.u.ipv4.src = iph->saddr;
-   info->key.u.ipv4.dst = iph->daddr;
-   info->key.tos = iph->tos;
-   info->key.ttl = iph->ttl;
-   } else {
-   const struct ipv6hdr *ip6h = ipv6_hdr(skb);
-
-   info->key.u.ipv6.src = ip6h->saddr;
-   info->key.u.ipv6.dst = ip6h->daddr;
-   info->key.tos = ipv6_get_dsfield(ip6h);
-   info->key.ttl = ip6h->hop_limit;
-   }
-
-   info->key.tp_src = udp_hdr(skb)->source;
-   info->key.tp_dst = udp_hdr(skb)->dest;
-
-   info->mode = IP_TUNNEL_INFO_RX;
-   info->key.tun_flags = TUNNEL_KEY;
-   info->key.tun_id = cpu_to_be64(vni >> 8);
-   if (udp_hdr(skb)->check != 0)
-   info->key.tun_flags |= TUNNEL_CSUM;
-
md = ip_tunnel_info_opts(info, sizeof(*md));
} else {
memset(md, 0, sizeof(*md));
diff --git a/include/net/dst_metadata.h b/include/net/dst_metadata.h
index 2cb52d5..60c0332 100644
--- a/include/net/dst_metadata.h
+++ b/include/net/dst_metadata.h
@@ -48,4 +48,65 @@ static inline bool skb_valid_dst(const struct sk_buff *skb)
 struct metadata_dst *metadata_dst_alloc(u8 optslen, gfp_t flags);
 struct metadata_dst __percpu *metadata_dst_alloc_percpu(u8 optslen, gfp_t 
flags);
 
+static inline struct metadata_dst *tun_rx_dst(__be16 flags,
+ __be64 tunnel_id, int md_size)
+{
+   struct metadata_dst *tun_dst;
+   struct ip_tunnel_info *info;
+
+   tun_dst = metadata_dst_alloc(md_size, GFP_ATOMIC);
+   if (!tun_dst)
+   return NULL;
+
+   info = &tun_dst->u.tun_info;
+   info->mode = IP_TUNNEL_INFO_RX;
+   info->key.tun_flags = flags;
+   info->key.tun_id = tunnel_id;
+   info->key.tp_src = 0;
+   info->key.tp_dst = 0;
+   return tun_dst;
+}
+
+static inline struct metadata_dst *ip_tun_rx_dst(struct sk_buff *skb,
+__be16 flags,
+__be64 tunnel_id,
+int md_size)
+{
+   const struct iphdr *iph = ip_hdr(skb);
+   struct metadata_dst *tun_dst;
+   struct ip_tunnel_info *info;
+
+   tun_dst = tun_rx_dst(flags, tunnel_id, md_size);
+   if (!tun_dst)
+   return NULL;
+
+   info = &tun_dst->u.tun_info;
+   info->key.u.ipv4.src = iph->saddr;
+   info->key.u.ipv4.dst = iph->daddr;
+   info->key.tos = iph->tos;
+   info->key.ttl = iph->ttl;
+   return tun_dst;
+}
+
+static inline struct metadata_dst *ipv6_tun_rx_dst(struct sk_buff *skb,
+__be16 flags,
+__be64 tunnel_id,
+int md_size)
+{
+   const struct ipv6hdr *ip6h = ipv6_hdr(skb);
+   struct metadata_dst *tun_dst;
+   struct ip_tunnel_info *info;
+
+   tun_dst = tun_rx_dst(flags, tunnel_id, md_size);
+   if (!tun_dst)
+   return NULL;
+
+   info = &tun_dst->u.tun_info;
+   info->key.u.ipv6.src = ip6h->saddr;
+   info->key.u.ipv6.dst = ip6h->daddr;
+   info->key.tos = ipv6_get_dsfield(ip6h);
+   info->key.ttl = ip6h->hop_limit;
+   return tun_dst;
+}
+
 #endif /* __NET_DST_METADATA_H */
diff --git a/include/net/udp_tunnel.h b/include/net/udp_tunnel.h
index c491c12..35041d0 100644
--- a/include/net/udp_tunnel.h
+++ b/include/net/udp_tunnel.h
@@ -93,6 +93,10 @@ int udp_tunnel6_xmit_skb(struct dst_

Re: [PATCH] usbnet: Fix two races between usbnet_stop() and the BH

2015-08-24 Thread David Miller

From: Eugene Shatokhin 
Date: Wed, 19 Aug 2015 14:59:01 +0300

> So the following might be possible, although unlikely:
> 
> CPU0 CPU1
>  clear_bit: read dev->flags
>  clear_bit: clear EVENT_RX_KILL in the read value
> 
> dev->flags=0;
> 
>  clear_bit: write updated dev->flags
> 
> As a result, dev->flags may become non-zero again.

Is this really possible?

Stores really are "atomic" in the sense that they do their update
in one indivisible operation.

Atomic operations like clear_bit also will behave that way.

If a clear_bit is in progress, the "dev->flags=0" store will not be
able to grab the cache line exclusively until the clear_bit is done.

So I think the above sequence of events is completely impossible.  Once
a clear_bit starts, a write by another foreign agent on the bus is
absolutely impossible to legally occur until the clear_bit completes.

I think this is a non-issue.
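
A userspace analogy of the argument above, written with C11 atomics rather
than the kernel's clear_bit(). The bit position and helper names are
assumptions made for illustration. The point it sketches is that an atomic
read-modify-write and an atomic store can only be observed in one order or
the other, never interleaved.

```c
#include <assert.h>
#include <stdatomic.h>

#define EVENT_RX_KILL_BIT 10	/* hypothetical bit position */

/* Clear one bit as a single indivisible read-modify-write. */
static inline void flags_clear_bit(atomic_ulong *flags, unsigned int bit)
{
	assert(bit < sizeof(unsigned long) * 8);
	atomic_fetch_and(flags, ~(1UL << bit));
}

/* Reset all flags with a single atomic store. */
static inline void flags_reset(atomic_ulong *flags)
{
	atomic_store(flags, 0UL);
}
```

Whichever of flags_clear_bit() and flags_reset() runs last, the word ends up
zero: the store cannot land between the load and the write-back of the
fetch-and.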

[PATCH v3 net-next 6/8] openvswitch: Use Geneve device.

2015-08-24 Thread Pravin B Shelar

With the help of tunnel metadata mode, OVS can directly use
Geneve devices to implement Geneve tunnels.
This patch removes all of the OVS-specific Geneve code
and makes OVS use a Geneve net_device. A basic geneve vport
is still there to handle compatibility with current
userspace applications.

Signed-off-by: Pravin B Shelar 
Reviewed-by: Jesse Gross 
---
 net/openvswitch/Kconfig|2 +-
 net/openvswitch/vport-geneve.c |  179 +++
 2 files changed, 33 insertions(+), 148 deletions(-)

diff --git a/net/openvswitch/Kconfig b/net/openvswitch/Kconfig
index 422dc05..87b98c0 100644
--- a/net/openvswitch/Kconfig
+++ b/net/openvswitch/Kconfig
@@ -59,7 +59,7 @@ config OPENVSWITCH_VXLAN
 config OPENVSWITCH_GENEVE
tristate "Open vSwitch Geneve tunneling support"
depends on OPENVSWITCH
-   depends on GENEVE_CORE
+   depends on GENEVE
default OPENVSWITCH
---help---
  If you say Y here, then the Open vSwitch will be able create geneve 
vport.
diff --git a/net/openvswitch/vport-geneve.c b/net/openvswitch/vport-geneve.c
index d01bd63..fa37c95 100644
--- a/net/openvswitch/vport-geneve.c
+++ b/net/openvswitch/vport-geneve.c
@@ -26,95 +26,44 @@
 
 #include "datapath.h"
 #include "vport.h"
+#include "vport-netdev.h"
 
 static struct vport_ops ovs_geneve_vport_ops;
-
 /**
  * struct geneve_port - Keeps track of open UDP ports
- * @gs: The socket created for this port number.
- * @name: vport name.
+ * @dst_port: destination port.
  */
 struct geneve_port {
-   struct geneve_sock *gs;
-   char name[IFNAMSIZ];
+   u16 port_no;
 };
 
-static LIST_HEAD(geneve_ports);
-
 static inline struct geneve_port *geneve_vport(const struct vport *vport)
 {
return vport_priv(vport);
 }
 
-/* Convert 64 bit tunnel ID to 24 bit VNI. */
-static void tunnel_id_to_vni(__be64 tun_id, __u8 *vni)
-{
-#ifdef __BIG_ENDIAN
-   vni[0] = (__force __u8)(tun_id >> 16);
-   vni[1] = (__force __u8)(tun_id >> 8);
-   vni[2] = (__force __u8)tun_id;
-#else
-   vni[0] = (__force __u8)((__force u64)tun_id >> 40);
-   vni[1] = (__force __u8)((__force u64)tun_id >> 48);
-   vni[2] = (__force __u8)((__force u64)tun_id >> 56);
-#endif
-}
-
-/* Convert 24 bit VNI to 64 bit tunnel ID. */
-static __be64 vni_to_tunnel_id(const __u8 *vni)
-{
-#ifdef __BIG_ENDIAN
-   return (vni[0] << 16) | (vni[1] << 8) | vni[2];
-#else
-   return (__force __be64)(((__force u64)vni[0] << 40) |
-   ((__force u64)vni[1] << 48) |
-   ((__force u64)vni[2] << 56));
-#endif
-}
-
-static void geneve_rcv(struct geneve_sock *gs, struct sk_buff *skb)
-{
-   struct vport *vport = gs->rcv_data;
-   struct genevehdr *geneveh = geneve_hdr(skb);
-   int opts_len;
-   struct ip_tunnel_info tun_info;
-   __be64 key;
-   __be16 flags;
-
-   opts_len = geneveh->opt_len * 4;
-
-   flags = TUNNEL_KEY | TUNNEL_GENEVE_OPT |
-   (udp_hdr(skb)->check != 0 ? TUNNEL_CSUM : 0) |
-   (geneveh->oam ? TUNNEL_OAM : 0) |
-   (geneveh->critical ? TUNNEL_CRIT_OPT : 0);
-
-   key = vni_to_tunnel_id(geneveh->vni);
-
-   ip_tunnel_info_init(&tun_info, ip_hdr(skb),
-   udp_hdr(skb)->source, udp_hdr(skb)->dest,
-   key, flags, geneveh->options, opts_len);
-
-   ovs_vport_receive(vport, skb, &tun_info);
-}
-
 static int geneve_get_options(const struct vport *vport,
  struct sk_buff *skb)
 {
struct geneve_port *geneve_port = geneve_vport(vport);
-   struct inet_sock *sk = inet_sk(geneve_port->gs->sock->sk);
 
-   if (nla_put_u16(skb, OVS_TUNNEL_ATTR_DST_PORT, ntohs(sk->inet_sport)))
+   if (nla_put_u16(skb, OVS_TUNNEL_ATTR_DST_PORT, geneve_port->port_no))
return -EMSGSIZE;
return 0;
 }
 
-static void geneve_tnl_destroy(struct vport *vport)
+static int geneve_get_egress_tun_info(struct vport *vport, struct sk_buff *skb,
+ struct ip_tunnel_info *egress_tun_info)
 {
struct geneve_port *geneve_port = geneve_vport(vport);
+   struct net *net = ovs_dp_get_net(vport->dp);
+   __be16 dport = htons(geneve_port->port_no);
+   __be16 sport = udp_flow_src_port(net, skb, 1, USHRT_MAX, true);
 
-   geneve_sock_release(geneve_port->gs);
-
-   ovs_vport_deferred_free(vport);
+   return ovs_tunnel_get_egress_info(egress_tun_info,
+ ovs_dp_get_net(vport->dp),
+ OVS_CB(skb)->egress_tun_info,
+ IPPROTO_UDP, skb->mark, sport, dport);
 }
 
 static struct vport *geneve_tnl_create(const struct vport_parms *parms)
@@ -122,11 +71,11 @@ static struct vport *geneve_tnl_create(const struct 
vport_parms *parms)
struct net *net = ovs_dp_get_net(parms->dp);
struct nlattr *options

[PATCH v3 net-next 1/8] geneve: Initialize ethernet address in device setup.

2015-08-24 Thread Pravin B Shelar

Signed-off-by: Pravin B Shelar 
Reviewed-by: Jesse Gross 
Acked-by: Thomas Graf 
---
 drivers/net/geneve.c |4 +---
 1 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index 897e1a3..95e9da0 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -297,6 +297,7 @@ static void geneve_setup(struct net_device *dev)
 
netif_keep_dst(dev);
dev->priv_flags |= IFF_LIVE_ADDR_CHANGE | IFF_NO_QUEUE;
+   eth_hw_addr_random(dev);
 }
 
 static const struct nla_policy geneve_policy[IFLA_GENEVE_MAX + 1] = {
@@ -364,9 +365,6 @@ static int geneve_newlink(struct net *net, struct 
net_device *dev,
return -EBUSY;
}
 
-   if (tb[IFLA_ADDRESS] == NULL)
-   eth_hw_addr_random(dev);
-
err = register_netdevice(dev);
if (err)
return err;
-- 
1.7.1
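
For readers unfamiliar with eth_hw_addr_random(): it gives the device a
random, locally administered, unicast MAC address. A minimal userspace
sketch of that address shape follows. The function name is hypothetical and
rand() stands in for the kernel's random source.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define ETH_ALEN 6

/* Fill addr with a random locally administered unicast MAC (sketch). */
void random_ether_addr_sketch(uint8_t addr[ETH_ALEN])
{
	int i;

	assert(addr != NULL);

	for (i = 0; i < ETH_ALEN; i++)
		addr[i] = (uint8_t)(rand() & 0xff);

	addr[0] &= 0xfe;	/* clear the multicast bit */
	addr[0] |= 0x02;	/* set the locally administered bit */
}
```

Doing this in the setup callback means every geneve device starts with a
valid address, whether or not IFLA_ADDRESS is supplied later.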

[PATCH v3 net-next 5/8] geneve: Add support to collect tunnel metadata.

2015-08-24 Thread Pravin B Shelar

This patch creates a new tunnel flag which enables
tunnel metadata collection on a given device. These devices
can be used by tunnel metadata based routing or by OVS.
The Geneve consolidation patch gets rid of collect_md_tun to
simplify tunnel lookup further.

Signed-off-by: Pravin B Shelar 
---
v2-v3:
Do not allow regular and metadata tunnel devices on same port.
---
 drivers/net/geneve.c |  360 --
 include/net/geneve.h |3 +
 include/uapi/linux/if_link.h |1 +
 3 files changed, 280 insertions(+), 84 deletions(-)

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index 0a6d974..c05bc13 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -36,6 +37,7 @@ MODULE_PARM_DESC(log_ecn_error, "Log packets received with 
corrupted ECN");
 struct geneve_net {
struct list_head  geneve_list;
struct hlist_head vni_list[VNI_HASH_SIZE];
+   struct geneve_dev __rcu *collect_md_tun;
 };
 
 /* Pseudo network device */
@@ -50,6 +52,7 @@ struct geneve_dev {
struct sockaddr_in remote;  /* IPv4 address for link partner */
struct list_head   next;/* geneve's per namespace list */
__be16 dst_port;
+   bool   collect_md;
 };
 
 static int geneve_net_id;
@@ -62,48 +65,95 @@ static inline __u32 geneve_net_vni_hash(u8 vni[3])
return hash_32(vnid, VNI_HASH_BITS);
 }
 
-/* geneve receive/decap routine */
-static void geneve_rx(struct geneve_sock *gs, struct sk_buff *skb)
+static __be64 vni_to_tunnel_id(const __u8 *vni)
+{
+#ifdef __BIG_ENDIAN
+   return (vni[0] << 16) | (vni[1] << 8) | vni[2];
+#else
+   return (__force __be64)(((__force u64)vni[0] << 40) |
+   ((__force u64)vni[1] << 48) |
+   ((__force u64)vni[2] << 56));
+#endif
+}
+
+static struct geneve_dev *geneve_lookup(struct geneve_net *gn,
+   struct geneve_sock *gs,
+   struct iphdr *iph,
+   struct genevehdr *gnvh)
 {
struct inet_sock *sk = inet_sk(gs->sock->sk);
-   struct genevehdr *gnvh = geneve_hdr(skb);
-   struct geneve_dev *dummy, *geneve = NULL;
-   struct geneve_net *gn;
-   struct iphdr *iph = NULL;
-   struct pcpu_sw_netstats *stats;
struct hlist_head *vni_list_head;
-   int err = 0;
+   struct geneve_dev *geneve;
__u32 hash;
 
-   iph = ip_hdr(skb); /* Still outer IP header... */
-
-   gn = gs->rcv_data;
+   geneve = rcu_dereference(gn->collect_md_tun);
+   if (geneve)
+   return geneve;
 
/* Find the device for this VNI */
hash = geneve_net_vni_hash(gnvh->vni);
vni_list_head = &gn->vni_list[hash];
-   hlist_for_each_entry_rcu(dummy, vni_list_head, hlist) {
-   if (!memcmp(gnvh->vni, dummy->vni, sizeof(dummy->vni)) &&
-   iph->saddr == dummy->remote.sin_addr.s_addr &&
-   sk->inet_sport == dummy->dst_port) {
-   geneve = dummy;
-   break;
+   hlist_for_each_entry_rcu(geneve, vni_list_head, hlist) {
+   if (!memcmp(gnvh->vni, geneve->vni, sizeof(geneve->vni)) &&
+   iph->saddr == geneve->remote.sin_addr.s_addr &&
+   sk->inet_sport == geneve->dst_port) {
+   return geneve;
}
}
+   return NULL;
+}
+
+/* geneve receive/decap routine */
+static void geneve_rx(struct geneve_sock *gs, struct sk_buff *skb)
+{
+   struct genevehdr *gnvh = geneve_hdr(skb);
+   struct metadata_dst *tun_dst = NULL;
+   struct geneve_dev *geneve = NULL;
+   struct pcpu_sw_netstats *stats;
+   struct geneve_net *gn;
+   struct iphdr *iph;
+   int err;
+
+   iph = ip_hdr(skb); /* Still outer IP header... */
+   gn = gs->rcv_data;
+   geneve = geneve_lookup(gn, gs, iph, gnvh);
if (!geneve)
goto drop;
 
-   /* Drop packets w/ critical options,
-* since we don't support any...
-*/
-   if (gnvh->critical)
-   goto drop;
+   if (ip_tunnel_collect_metadata() && geneve->collect_md) {
+   __be16 flags;
+   void *opts;
+
+   flags = TUNNEL_KEY | TUNNEL_GENEVE_OPT |
+   (gnvh->oam ? TUNNEL_OAM : 0) |
+   (gnvh->critical ? TUNNEL_CRIT_OPT : 0);
+
+   tun_dst = udp_tun_rx_dst(skb, AF_INET, flags,
+vni_to_tunnel_id(gnvh->vni),
+gnvh->opt_len * 4);
+   if (!tun_dst)
+   goto drop;
+
+   /* Update tunnel dst according to Geneve options. */
+   opts = ip_tunnel_info_opts(&tun_dst->u.tun_i

[PATCH v3 net-next 2/8] geneve: Use skb mark and protocol to lookup route.

2015-08-24 Thread Pravin B Shelar

On the packet transmit path, geneve needs to look up a route. This
patch improves the route lookup by using more parameters.

Signed-off-by: Pravin B Shelar 
Reviewed-by: Jesse Gross 
Acked-by: Thomas Graf 
---
 drivers/net/geneve.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index 95e9da0..3c5b2b1 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -202,6 +202,9 @@ static netdev_tx_t geneve_xmit(struct sk_buff *skb, struct 
net_device *dev)
memset(&fl4, 0, sizeof(fl4));
fl4.flowi4_tos = RT_TOS(tos);
fl4.daddr = geneve->remote.sin_addr.s_addr;
+   fl4.flowi4_mark = skb->mark;
+   fl4.flowi4_proto = IPPROTO_UDP;
+
rt = ip_route_output_key(geneve->net, &fl4);
if (IS_ERR(rt)) {
netdev_dbg(dev, "no route to %pI4\n", &fl4.daddr);
-- 
1.7.1

[PATCH v3 net-next 7/8] geneve: Consolidate Geneve functionality in single module.

2015-08-24 Thread Pravin B Shelar

The geneve_core module handles send and receive functionality
so that OVS could use the Geneve API. Now, with the use of
tunnel metadata mode, OVS can directly use a Geneve netdevice,
so there is no need for a separate module for Geneve. This
patch consolidates Geneve protocol processing in a single module.

Signed-off-by: Pravin B Shelar 
---
v2-v3:
- Fixed Kconfig dependency.
- unified geneve_build_skb()
- Fixed geneve_build_skb() error path.
---
 drivers/net/Kconfig|4 +-
 drivers/net/geneve.c   |  494 +++-
 include/net/geneve.h   |   34 
 net/ipv4/Kconfig   |   14 --
 net/ipv4/Makefile  |1 -
 net/ipv4/geneve_core.c |  447 ---
 6 files changed, 407 insertions(+), 587 deletions(-)
 delete mode 100644 net/ipv4/geneve_core.c

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index f503736..7727b8b 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -180,8 +180,8 @@ config VXLAN
  will be called vxlan.
 
 config GENEVE
-   tristate "Generic Network Virtualization Encapsulation netdev"
-   depends on INET && GENEVE_CORE
+   tristate "Generic Network Virtualization Encapsulation"
+   depends on INET && NET_UDP_TUNNEL
select NET_IP_TUNNEL
---help---
  This allows one to create geneve virtual interfaces that provide
diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index c05bc13..8eb875d 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define GENEVE_NETDEV_VER  "0.6"
 
@@ -33,13 +34,18 @@ static bool log_ecn_error = true;
 module_param(log_ecn_error, bool, 0644);
 MODULE_PARM_DESC(log_ecn_error, "Log packets received with corrupted ECN");
 
+#define GENEVE_VER 0
+#define GENEVE_BASE_HLEN (sizeof(struct udphdr) + sizeof(struct genevehdr))
+
 /* per-network namespace private data for this module */
 struct geneve_net {
-   struct list_head  geneve_list;
-   struct hlist_head vni_list[VNI_HASH_SIZE];
-   struct geneve_dev __rcu *collect_md_tun;
+   struct list_headgeneve_list;
+   struct hlist_head   vni_list[VNI_HASH_SIZE];
+   struct list_headsock_list;
 };
 
+static int geneve_net_id;
+
 /* Pseudo network device */
 struct geneve_dev {
struct hlist_node  hlist;   /* vni hash table */
@@ -55,7 +61,15 @@ struct geneve_dev {
bool   collect_md;
 };
 
-static int geneve_net_id;
+struct geneve_sock {
+   boolcollect_md;
+   struct geneve_net   *gn;
+   struct list_headlist;
+   struct socket   *sock;
+   struct rcu_head rcu;
+   int refcnt;
+   struct udp_offload  udp_offloads;
+};
 
 static inline __u32 geneve_net_vni_hash(u8 vni[3])
 {
@@ -76,51 +90,63 @@ static __be64 vni_to_tunnel_id(const __u8 *vni)
 #endif
 }
 
-static struct geneve_dev *geneve_lookup(struct geneve_net *gn,
-   struct geneve_sock *gs,
-   struct iphdr *iph,
-   struct genevehdr *gnvh)
+static struct geneve_dev *geneve_lookup(struct geneve_net *gn, __be16 port,
+   __be32 addr, u8 vni[])
 {
-   struct inet_sock *sk = inet_sk(gs->sock->sk);
struct hlist_head *vni_list_head;
struct geneve_dev *geneve;
__u32 hash;
 
-   geneve = rcu_dereference(gn->collect_md_tun);
-   if (geneve)
-   return geneve;
-
/* Find the device for this VNI */
-   hash = geneve_net_vni_hash(gnvh->vni);
+   hash = geneve_net_vni_hash(vni);
vni_list_head = &gn->vni_list[hash];
hlist_for_each_entry_rcu(geneve, vni_list_head, hlist) {
-   if (!memcmp(gnvh->vni, geneve->vni, sizeof(geneve->vni)) &&
-   iph->saddr == geneve->remote.sin_addr.s_addr &&
-   sk->inet_sport == geneve->dst_port) {
+   if (!memcmp(vni, geneve->vni, sizeof(geneve->vni)) &&
+   addr == geneve->remote.sin_addr.s_addr &&
+   port == geneve->dst_port) {
return geneve;
}
}
return NULL;
 }
 
+static inline struct genevehdr *geneve_hdr(const struct sk_buff *skb)
+{
+   return (struct genevehdr *)(udp_hdr(skb) + 1);
+}
+
 /* geneve receive/decap routine */
 static void geneve_rx(struct geneve_sock *gs, struct sk_buff *skb)
 {
+   struct inet_sock *sk = inet_sk(gs->sock->sk);
struct genevehdr *gnvh = geneve_hdr(skb);
+   struct geneve_net *gn = gs->gn;
struct metadata_dst *tun_dst = NULL;
struct geneve_dev *geneve = NULL;
struct pcpu_sw_netstats *stats;
-   struct geneve_net *gn;
struct iphdr *iph;
+   u8 *vni;
+   __be32 addr;
+   bool xnet;
int err;

[PATCH v3 net-next 8/8] geneve: Move device hash table to geneve socket.

2015-08-24 Thread Pravin B Shelar

This change simplifies Geneve Tunnel hash table management.

Signed-off-by: Pravin B Shelar 
Reviewed-by: Jesse Gross 
---
 drivers/net/geneve.c |   58 ++---
 1 files changed, 26 insertions(+), 32 deletions(-)

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index 9967f4c..8358d41 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -40,7 +40,6 @@ MODULE_PARM_DESC(log_ecn_error, "Log packets received with 
corrupted ECN");
 /* per-network namespace private data for this module */
 struct geneve_net {
struct list_headgeneve_list;
-   struct hlist_head   vni_list[VNI_HASH_SIZE];
struct list_headsock_list;
 };
 
@@ -63,12 +62,12 @@ struct geneve_dev {
 
 struct geneve_sock {
boolcollect_md;
-   struct geneve_net   *gn;
struct list_headlist;
struct socket   *sock;
struct rcu_head rcu;
int refcnt;
struct udp_offload  udp_offloads;
+   struct hlist_head   vni_list[VNI_HASH_SIZE];
 };
 
 static inline __u32 geneve_net_vni_hash(u8 vni[3])
@@ -90,7 +89,7 @@ static __be64 vni_to_tunnel_id(const __u8 *vni)
 #endif
 }
 
-static struct geneve_dev *geneve_lookup(struct geneve_net *gn, __be16 port,
+static struct geneve_dev *geneve_lookup(struct geneve_sock *gs,
__be32 addr, u8 vni[])
 {
struct hlist_head *vni_list_head;
@@ -99,13 +98,11 @@ static struct geneve_dev *geneve_lookup(struct geneve_net 
*gn, __be16 port,
 
/* Find the device for this VNI */
hash = geneve_net_vni_hash(vni);
-   vni_list_head = &gn->vni_list[hash];
+   vni_list_head = &gs->vni_list[hash];
hlist_for_each_entry_rcu(geneve, vni_list_head, hlist) {
if (!memcmp(vni, geneve->vni, sizeof(geneve->vni)) &&
-   addr == geneve->remote.sin_addr.s_addr &&
-   port == geneve->dst_port) {
+   addr == geneve->remote.sin_addr.s_addr)
return geneve;
-   }
}
return NULL;
 }
@@ -118,9 +115,7 @@ static inline struct genevehdr *geneve_hdr(const struct 
sk_buff *skb)
 /* geneve receive/decap routine */
 static void geneve_rx(struct geneve_sock *gs, struct sk_buff *skb)
 {
-   struct inet_sock *sk = inet_sk(gs->sock->sk);
struct genevehdr *gnvh = geneve_hdr(skb);
-   struct geneve_net *gn = gs->gn;
struct metadata_dst *tun_dst = NULL;
struct geneve_dev *geneve = NULL;
struct pcpu_sw_netstats *stats;
@@ -130,8 +125,6 @@ static void geneve_rx(struct geneve_sock *gs, struct 
sk_buff *skb)
bool xnet;
int err;
 
-   iph = ip_hdr(skb); /* Still outer IP header... */
-
if (gs->collect_md) {
static u8 zero_vni[3];
 
@@ -139,10 +132,11 @@ static void geneve_rx(struct geneve_sock *gs, struct 
sk_buff *skb)
addr = 0;
} else {
vni = gnvh->vni;
+   iph = ip_hdr(skb); /* Still outer IP header... */
addr = iph->saddr;
}
 
-   geneve = geneve_lookup(gn, sk->inet_sport, addr, vni);
+   geneve = geneve_lookup(gs, addr, vni);
if (!geneve)
goto drop;
 
@@ -419,6 +413,7 @@ static struct geneve_sock *geneve_socket_create(struct net 
*net, __be16 port,
struct geneve_sock *gs;
struct socket *sock;
struct udp_tunnel_sock_cfg tunnel_cfg;
+   int h;
 
gs = kzalloc(sizeof(*gs), GFP_KERNEL);
if (!gs)
@@ -432,7 +427,8 @@ static struct geneve_sock *geneve_socket_create(struct net 
*net, __be16 port,
 
gs->sock = sock;
gs->refcnt = 1;
-   gs->gn = gn;
+   for (h = 0; h < VNI_HASH_SIZE; ++h)
+   INIT_HLIST_HEAD(&gs->vni_list[h]);
 
/* Initialize the geneve udp offloads structure */
gs->udp_offloads.port = port;
@@ -446,7 +442,6 @@ static struct geneve_sock *geneve_socket_create(struct net 
*net, __be16 port,
tunnel_cfg.encap_rcv = geneve_udp_encap_recv;
tunnel_cfg.encap_destroy = NULL;
setup_udp_tunnel_sock(net, sock, &tunnel_cfg);
-
list_add(&gs->list, &gn->sock_list);
return gs;
 }
@@ -491,6 +486,7 @@ static int geneve_open(struct net_device *dev)
struct net *net = geneve->net;
struct geneve_net *gn = net_generic(net, geneve_net_id);
struct geneve_sock *gs;
+   __u32 hash;
 
gs = geneve_find_sock(gn, geneve->dst_port);
if (gs) {
@@ -505,14 +501,20 @@ static int geneve_open(struct net_device *dev)
 out:
gs->collect_md = geneve->collect_md;
geneve->sock = gs;
+
+   hash = geneve_net_vni_hash(geneve->vni);
+   hlist_add_head_rcu(&geneve->hlist, &gs->vni_list[hash]);
return 0;
 }
 
 static int geneve_stop(struct net_device *dev)
 {
struct geneve_dev *geneve = netdev_priv(dev

[PATCH v3 net-next 0/8] Geneve: Add support for tunnel metadata mode

2015-08-24 Thread Pravin B Shelar

The following patches add support for Geneve tunnel metadata
mode. OVS can make use of the Geneve net-device with the tunnel
metadata API from the kernel.

This also allows us to consolidate the Geneve implementation
from two kernel modules, geneve_core and geneve, into a single
geneve module. The geneve_core module was meant to share
Geneve encap and decap code between the Geneve netdevice and
the OVS Geneve tunnel implementation. Since OVS no longer
needs this API, the Geneve code can be consolidated into a
single geneve module.

v2-v3:
- make tunnel metadata device and regular device mutually exclusive.
- Fix Kconfig dependency for Geneve.
- Fix dst-port netlink encoding.
- drop changelink patch.

v1-v2:
- Replaced per hash table tunnel pointer (metadata enabled) with flag.
- Added support for changelink.
- Improve geneve device route lookup with more parameters.

Pravin B Shelar (8):
  geneve: Initialize ethernet address in device setup.
  geneve: Use skb mark and protocol to lookup route.
  tunnel: introduce udp_tun_rx_dst()
  geneve: Make dst-port configurable.
  geneve: Add support to collect tunnel metadata.
  openvswitch: Use Geneve device.
  geneve: Consolidate Geneve functionality in single module.
  geneve: Move device hash table to geneve socket.

 drivers/net/Kconfig|2 +-
 drivers/net/geneve.c   |  730 ++--
 drivers/net/vxlan.c|   29 +--
 include/net/dst_metadata.h |   61 
 include/net/geneve.h   |   35 +--
 include/net/udp_tunnel.h   |4 +
 include/uapi/linux/if_link.h   |2 +
 net/ipv4/Kconfig   |   14 -
 net/ipv4/Makefile  |1 -
 net/ipv4/geneve_core.c |  447 
 net/ipv4/ip_gre.c  |   20 +-
 net/ipv4/udp_tunnel.c  |   25 ++-
 net/openvswitch/Kconfig|2 +-
 net/openvswitch/vport-geneve.c |  179 ++
 14 files changed, 760 insertions(+), 791 deletions(-)
 delete mode 100644 net/ipv4/geneve_core.c
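
The series moves the VNI to tunnel ID conversion (vni_to_tunnel_id()) from
the OVS vport code into the geneve driver. A byte-oriented sketch of that
mapping is shown here as an illustration only: both #ifdef branches of the
kernel helper produce a 64-bit big-endian value whose three low-order bytes
are the VNI. The function names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Store a 24-bit VNI in the low-order bytes of an 8-byte big-endian ID. */
void vni_to_tunnel_id_bytes(const uint8_t vni[3], uint8_t id[8])
{
	assert(vni != NULL && id != NULL);

	memset(id, 0, 8);
	id[5] = vni[0];
	id[6] = vni[1];
	id[7] = vni[2];
}

/* Recover the 24-bit VNI from an 8-byte big-endian tunnel ID. */
void tunnel_id_bytes_to_vni(const uint8_t id[8], uint8_t vni[3])
{
	assert(vni != NULL && id != NULL);

	vni[0] = id[5];
	vni[1] = id[6];
	vni[2] = id[7];
}
```

Working on bytes sidesteps the endianness #ifdef at the cost of not
returning a scalar __be64, which is what the tunnel key actually stores.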

[PATCH v3 net-next 4/8] geneve: Make dst-port configurable.

2015-08-24 Thread Pravin B Shelar

Add a netlink interface to configure the Geneve UDP port number,
so that the user can configure it for a Geneve device.

Signed-off-by: Pravin B Shelar 
Reviewed-by: Jesse Gross 
---
Fixed dst-port netlink encoding
---
 drivers/net/geneve.c |   25 +
 include/uapi/linux/if_link.h |1 +
 2 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index 3c5b2b1..0a6d974 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -49,6 +49,7 @@ struct geneve_dev {
u8 tos; /* TOS override */
struct sockaddr_in remote;  /* IPv4 address for link partner */
struct list_head   next;/* geneve's per namespace list */
+   __be16 dst_port;
 };
 
 static int geneve_net_id;
@@ -64,6 +65,7 @@ static inline __u32 geneve_net_vni_hash(u8 vni[3])
 /* geneve receive/decap routine */
 static void geneve_rx(struct geneve_sock *gs, struct sk_buff *skb)
 {
+   struct inet_sock *sk = inet_sk(gs->sock->sk);
struct genevehdr *gnvh = geneve_hdr(skb);
struct geneve_dev *dummy, *geneve = NULL;
struct geneve_net *gn;
@@ -82,7 +84,8 @@ static void geneve_rx(struct geneve_sock *gs, struct sk_buff 
*skb)
vni_list_head = &gn->vni_list[hash];
hlist_for_each_entry_rcu(dummy, vni_list_head, hlist) {
if (!memcmp(gnvh->vni, dummy->vni, sizeof(dummy->vni)) &&
-   iph->saddr == dummy->remote.sin_addr.s_addr) {
+   iph->saddr == dummy->remote.sin_addr.s_addr &&
+   sk->inet_sport == dummy->dst_port) {
geneve = dummy;
break;
}
@@ -157,7 +160,7 @@ static int geneve_open(struct net_device *dev)
struct geneve_net *gn = net_generic(geneve->net, geneve_net_id);
struct geneve_sock *gs;
 
-   gs = geneve_sock_add(net, htons(GENEVE_UDP_PORT), geneve_rx, gn,
+   gs = geneve_sock_add(net, geneve->dst_port, geneve_rx, gn,
 false, false);
if (IS_ERR(gs))
return PTR_ERR(gs);
@@ -228,7 +231,7 @@ static netdev_tx_t geneve_xmit(struct sk_buff *skb, struct 
net_device *dev)
/* no need to handle local destination and encap bypass...yet... */
 
err = geneve_xmit_skb(gs, rt, skb, fl4.saddr, fl4.daddr,
- tos, ttl, 0, sport, htons(GENEVE_UDP_PORT), 0,
+ tos, ttl, 0, sport, geneve->dst_port, 0,
  geneve->vni, 0, NULL, false,
  !net_eq(geneve->net, dev_net(geneve->dev)));
if (err < 0)
@@ -308,6 +311,7 @@ static const struct nla_policy 
geneve_policy[IFLA_GENEVE_MAX + 1] = {
[IFLA_GENEVE_REMOTE]= { .len = FIELD_SIZEOF(struct iphdr, 
daddr) },
[IFLA_GENEVE_TTL]   = { .type = NLA_U8 },
[IFLA_GENEVE_TOS]   = { .type = NLA_U8 },
+   [IFLA_GENEVE_PORT]  = { .type = NLA_U16 },
 };
 
 static int geneve_validate(struct nlattr *tb[], struct nlattr *data[])
@@ -341,6 +345,7 @@ static int geneve_newlink(struct net *net, struct 
net_device *dev,
struct hlist_head *vni_list_head;
struct sockaddr_in remote;  /* IPv4 address for link partner */
__u32 vni, hash;
+   __be16 dst_port;
int err;
 
if (!data[IFLA_GENEVE_ID] || !data[IFLA_GENEVE_REMOTE])
@@ -359,13 +364,20 @@ static int geneve_newlink(struct net *net, struct 
net_device *dev,
if (IN_MULTICAST(ntohl(geneve->remote.sin_addr.s_addr)))
return -EINVAL;
 
+   if (data[IFLA_GENEVE_PORT])
+   dst_port = htons(nla_get_u16(data[IFLA_GENEVE_PORT]));
+   else
+   dst_port = htons(GENEVE_UDP_PORT);
+
remote = geneve->remote;
hash = geneve_net_vni_hash(geneve->vni);
vni_list_head = &gn->vni_list[hash];
hlist_for_each_entry_rcu(dummy, vni_list_head, hlist) {
if (!memcmp(geneve->vni, dummy->vni, sizeof(dummy->vni)) &&
-   !memcmp(&remote, &dummy->remote, sizeof(dummy->remote)))
+   !memcmp(&remote, &dummy->remote, sizeof(dummy->remote)) &&
+   dst_port == dummy->dst_port) {
return -EBUSY;
+   }
}
 
err = register_netdevice(dev);
@@ -378,6 +390,7 @@ static int geneve_newlink(struct net *net, struct 
net_device *dev,
if (data[IFLA_GENEVE_TOS])
geneve->tos = nla_get_u8(data[IFLA_GENEVE_TOS]);
 
+   geneve->dst_port = dst_port;
list_add(&geneve->next, &gn->geneve_list);
 
hlist_add_head_rcu(&geneve->hlist, &gn->vni_list[hash]);
@@ -402,6 +415,7 @@ static size_t geneve_get_size(const struct net_device *dev)
nla_total_size(sizeof(struct in_addr)) + /* IFLA_GENEVE_REMOTE 
*/
nla_total_size(sizeof(__u8)) +  /* IFLA_GENEVE_TTL */

Re: [PATCH net-next v3 0/2] ila: Precompute checksums

2015-08-24 Thread David Miller

From: Tom Herbert 
Date: Mon, 24 Aug 2015 09:45:40 -0700

> This patch set:
>  - Adds argument to LWT build_state that holds a pointer to the fib
>configuration being applied to the new route
>  - Adds support in ILA to precompute checksum difference for
>performance optimization
> 
> v2:
>  - Move return argument in build_state to end of arguments
> 
> v3:
>  - Update the signature for ip6_tun_build_state()

Applied, thanks.

Re: [PATCH net] sctp: asconf's process should verify address parameter is in the beginning

2015-08-24 Thread Vlad Yasevich

On 08/24/2015 06:07 AM, Xin Long wrote:
> In sctp_process_asconf(), we take the address parameter from the beginning of
> the addip params, but we never check that it is really there. If the address
> parameter is missing, the chunk can still pass sctp_verify_asconf() and is
> then handled by sctp_process_asconf(), which is not safe.
> 
> So add a check to sctp_verify_asconf() that the address parameter is at the
> beginning, and otherwise return false so that an abort is sent.
> 
> Signed-off-by: Xin Long 
> ---
>  net/sctp/sm_make_chunk.c | 8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
> index 0ee5ca7..a2a72d5 100644
> --- a/net/sctp/sm_make_chunk.c
> +++ b/net/sctp/sm_make_chunk.c
> @@ -3122,6 +3122,14 @@ bool sctp_verify_asconf(const struct sctp_association 
> *asoc,
>   union sctp_params param;
>   bool addr_param_seen = false;
>  
> + if(addr_param_needed){
> + /* Ensure the address parameter is in the beginning */
> + param.v = chunk->skb->data + sizeof(sctp_addiphdr_t);
> + if (param.p->type != SCTP_PARAM_IPV4_ADDRESS &&
> + param.p->type != SCTP_PARAM_IPV6_ADDRESS)
> + return false;
> + }
> +

Sorry, you can't do that directly without a lot more checks.  The parameter
may be only partial, or may not be there at all.  You'd end up looking
at the wrong memory.

A better way would be to set the addr_param_seen only when looking at
the first parameter (addip_hdr.params).
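
To make the "lot more checks" concrete, here is a minimal user-space sketch
of what has to hold before the type of the first parameter can be trusted.
It is not the kernel code and not the fix suggested above (which reuses the
bounds checking already done by sctp_walk_params()); the names param_hdr and
first_param_is_address() are hypothetical, and the type values 5 and 6 are
the IPv4/IPv6 address parameter types from RFC 4960.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PARAM_IPV4_ADDRESS 5 /* RFC 4960 parameter type */
#define PARAM_IPV6_ADDRESS 6 /* RFC 4960 parameter type */

/* Generic TLV parameter header, both fields in network byte order. */
struct param_hdr {
    uint16_t type;
    uint16_t length; /* covers the header plus the value */
};

/*
 * Return true only if 'params' starts with a complete IPv4 or IPv6
 * address parameter. 'avail' is the number of bytes that are really
 * present after the ADDIP header.
 */
static bool first_param_is_address(const uint8_t *params, size_t avail)
{
    struct param_hdr hdr;
    uint16_t type, length;

    assert(params != NULL);

    /* The header itself may be missing or truncated. */
    if (avail < sizeof(hdr))
        return false;

    memcpy(&hdr, params, sizeof(hdr)); /* avoids unaligned access */
    type = ntohs(hdr.type);
    length = ntohs(hdr.length);

    /* The advertised length must be sane and fit in what we received. */
    if (length < sizeof(hdr) || length > avail)
        return false;

    if (type == PARAM_IPV4_ADDRESS)
        return length == sizeof(hdr) + 4;
    if (type == PARAM_IPV6_ADDRESS)
        return length == sizeof(hdr) + 16;

    return false;
}
```

Only after the length checks pass is it safe to look at the type at all,
which is the point of the objection to the hunk quoted above.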

-vlad

>   sctp_walk_params(param, addip, addip_hdr.params) {
>   size_t length = ntohs(param.p->length);
>  
> 



Re: Low throughput in VMs using VxLAN

2015-08-24 Thread Rick Jones


On 08/24/2015 09:19 AM, Santosh R wrote:

  Hi,

Earlier I was seeing lower throughput in VMs using VxLan as GRO was
not happening in VM.
Tom Herbert suggested to use "vxlan: GRO support at tunnel layer" patch series.
With today's net-next (4.2.0-rc7) in host and VM, I could see GRO
happening for vxlan, macvtap and virtual interface in VM.
The throughput is still low between VMs (around 4Gbps compared to
9Gbps without VxLAN).


Out of curiosity, have you tried tweaking gro_flush_timeout
(gro_flush_interval?) for the VM's eth interface?  Say perhaps a value of
1000?  (I'm assuming the VM is using virtio_net.)  Does the behaviour
change if vhost-net is loaded into the host and used by the VM?


rick jones

For completeness, it would also be good to compare the likes of netperf 
TCP_RR between VxLAN and without.


Re: [PATCH net-next 6/9] dsa: mv88e6xxx: Set the RGMII delay based on phy interface

2015-08-24 Thread Florian Fainelli

On 23/08/15 14:10, Andrew Lunn wrote:
> On Sun, Aug 23, 2015 at 11:44:01AM -0700, Florian Fainelli wrote:
>> On 08/23/15 02:46, Andrew Lunn wrote:
>>> Some Marvell switches allow the RGMII Rx and Tx clock to be delayed
>>> when the port is using RGMII. Have the adjust_link function look at
>>> the phy interface type and enable this delay as requested.
>>>
>>> Signed-off-by: Andrew Lunn 
>>> ---
>>>  drivers/net/dsa/mv88e6xxx.c | 10 ++
>>>  drivers/net/dsa/mv88e6xxx.h |  2 ++
>>>  2 files changed, 12 insertions(+)
>>>
>>> diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
>>> index 7901db6503b4..f5af368751b2 100644
>>> --- a/drivers/net/dsa/mv88e6xxx.c
>>> +++ b/drivers/net/dsa/mv88e6xxx.c
>>> @@ -612,6 +612,16 @@ void mv88e6xxx_adjust_link(struct dsa_switch *ds, int 
>>> port,
>>> if (phydev->duplex == DUPLEX_FULL)
>>> reg |= PORT_PCS_CTRL_DUPLEX_FULL;
>>>  
>>> +   if ((mv88e6xxx_6352_family(ds) || mv88e6xxx_6351_family(ds)) &&
>>> +   (port >= ps->num_ports - 2)) {
>>
>> Are we positive that the last two ports of a switch are going to be
>> RGMII capable or is this something that should be moved to Device Tree /
>> platform data to account for different switch families? Maybe having a
>> bitmask of RGMII capable ports stored in "ps" would be good enough?
> 
> Hi Florian
> 
> For these two families, this is correct. And it is a property of the
> switch, not the board, so should not be in DT. Other families are
> different. Older ones are Fast Ethernet only. Some don't have any
> RGMII ports, etc. It could be that, with time, this condition gets
> messy, at which point a bitmask in ps would make sense. But is it
> justified now?

Sure, I think for now this patch is good as-is, I was mostly curious
whether the assumption about the last 2 ports of the switch being RGMII
would hold for a while, and it looks like it will. With that:

Reviewed-by: Florian Fainelli 
-- 
Florian

Re: [PATCH] usbnet: Fix two races between usbnet_stop() and the BH

2015-08-24 Thread Eugene Shatokhin


24.08.2015 16:29, Bjørn Mork wrote:

Eugene Shatokhin  writes:


19.08.2015 15:31, Bjørn Mork wrote:

Eugene Shatokhin  writes:


The problem is not in the reordering but rather in the fact that
"dev->flags = 0" is not necessarily atomic
w.r.t. "clear_bit(EVENT_RX_KILL, &dev->flags)", and vice versa.

So the following might be possible, although unlikely:

CPU0 CPU1
   clear_bit: read dev->flags
   clear_bit: clear EVENT_RX_KILL in the read value

dev->flags=0;

   clear_bit: write updated dev->flags

As a result, dev->flags may become non-zero again.
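
The interleaving can be replayed deterministically in user space. The sketch
below is only an illustration: it models clear_bit() as three separate steps
(read, modify, write), which is an assumption and not how every architecture
implements it, and the bit numbers are illustrative. The function name
replay_lost_update() is hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Illustrative bit numbers; the real ones are in include/linux/usb/usbnet.h. */
#define EVENT_RX_HALT 1
#define EVENT_RX_KILL 10

/*
 * Replay the interleaving above on a single thread. clear_bit() is
 * modelled as read, modify and write steps that another CPU can slip
 * in between. This is an assumption made for illustration only.
 */
static unsigned long replay_lost_update(unsigned long initial_flags)
{
    unsigned long flags = initial_flags; /* stands in for dev->flags */
    unsigned long tmp;

    assert(EVENT_RX_KILL < sizeof(flags) * CHAR_BIT);

    tmp = flags;                    /* CPU1: clear_bit: read dev->flags */
    tmp &= ~(1UL << EVENT_RX_KILL); /* CPU1: clear EVENT_RX_KILL in the copy */
    flags = 0;                      /* CPU0: dev->flags = 0; */
    flags = tmp;                    /* CPU1: clear_bit: write updated value */

    return flags;
}
```

Called with both EVENT_RX_HALT and EVENT_RX_KILL set, the function returns a
value that still has EVENT_RX_HALT set: the plain store of 0 is overwritten
by the late write-back.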


Ah, right.  Thanks for explaining.


I cannot prove yet that this is an impossible situation. If anyone
can, please explain. If so, this part of the patch will not be needed.


I wonder if we could simply move the dev->flags = 0 down a few lines to
fix both issues?  It doesn't seem to do anything useful except for
resetting the flags to a sane initial state after the device is down.

Stopping the tasklet rescheduling etc depends only on netif_running(),
which will be false when usbnet_stop is called.  There is no need to
touch dev->flags for this to happen.


That was one of the first ideas we discussed here. Unfortunately, it
is probably not so simple.

Setting dev->flags to 0 makes some delayed operations do nothing and,
among other things, not reschedule usbnet_bh().


Yes, but I believe that is merely a side effect.  You should never need
to clear multiple flags to get the desired behaviour.


As you can see in drivers/net/usb/usbnet.c, usbnet_bh() can be called
as a tasklet function and as a timer function in a number of
situations (look for the usage of dev->bh and dev->delay there).

netif_running() is indeed false when usbnet_stop() runs, usbnet_stop()
also disables Tx. This seems to be enough for many cases where
usbnet_bh() is scheduled, but I am not so sure about the remaining
ones, namely:

1. A work function, usbnet_deferred_kevent(), may reschedule
usbnet_bh(). Looks like the workqueue is only stopped in
usbnet_disconnect(), so a work item might be processed while
usbnet_stop() works. Setting dev->flags to 0 makes the work function
do nothing, by the way. See also the comment in usbnet_stop() about
this.

A work item may be placed on this workqueue in a number of ways, by
both the usbnet module and the mini-drivers. It is not easy to track
all these situations.


That's an understatement :)




2. rx_complete() and tx_complete() may schedule execution of
usbnet_bh() as a tasklet or a timer function. These two are URB
completion callbacks.

It seems, new Rx and Tx URBs cannot be submitted when usbnet_stop()
clears dev->flags, indeed. But it does not prevent the completion
handlers for the previously submitted URBs from running concurrently
with usbnet_stop(). The latter waits for them to complete (via
usbnet_terminate_urbs(dev)) but only if FLAG_AVOID_UNLINK_URBS is not
set in info->flags. rndis_wlan, however, sets this flag for a few
hardware models. So - no guarantees here as well.


FLAG_AVOID_UNLINK_URBS looks like it should be replaced by the newer
ability to keep the status urb active. I believe that must have been the
real reason for adding it, based on the commit message and the effect
the flag will have:

  commit 1487cd5e76337555737cbc55d7d83f41460d198f
  Author: Jussi Kivilinna 
  Date:   Thu Jul 30 19:41:20 2009 +0300

 usbnet: allow "minidriver" to prevent urb unlinking on usbnet_stop

 rndis_wlan devices freeze after running usbnet_stop several times. It appears
 that firmware freezes in state where it does not respond to any RNDIS commands
 and device have to be physically unplugged/replugged. This patch lets
 minidrivers to disable unlink_urbs on usbnet_stop through new info flag.

 Signed-off-by: Jussi Kivilinna 
 Cc: David Brownell 
 Signed-off-by: John W. Linville 



The rx urbs will not be resubmitted in any case, and there are of course
no tx urbs being submitted.  So the only effect of this flag is on the
status/interrupt urb, which I can imagine some RNDIS devices want
active all the time.

So FLAG_AVOID_UNLINK_URBS should probably be removed and replaced with calls
to usbnet_status_start() and usbnet_status_stop().  This will require
testing on some of the devices with the original firmware problem
however.

In any case: I do not think this flag should be considered when trying
to make usbnet_stop behaviour saner.  Its only purpose is to
deliberately break usbnet_stop by not actually stopping.



If someone could list the particular bits of dev->flags that should be
cleared to make sure no deferred call could reschedule usbnet_bh(),
etc... Well, it would be enough to clear these first and use
dev->flags = 0 later, after tasklet_kill() and del_timer_sync(). I
cannot point out these particular bits now.



I don't think any of the flags must be cleared.  The sequence

 dev_close(dev->net);
usbnet_terminate_urbs(d

[PATCH net-next v3 1/2] lwt: Add cfg argument to build_state

2015-08-24 Thread Tom Herbert

Add cfg and family arguments to lwt build state functions. cfg is a void
pointer and will either be a pointer to a fib_config or fib6_config
structure. The family parameter indicates which one (either AF_INET
or AF_INET6).

LWT encapsulation implementations may use the fib configuration to build
the LWT state.
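
As a rough user-space illustration of how a build_state implementation might
consume the two new arguments: the family value tells the callee which
structure the void pointer really refers to. The structures and the function
below are hypothetical, cut-down stand-ins and not the kernel's fib_config,
fib6_config or any real encap op.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h> /* AF_INET, AF_INET6 */

/* Hypothetical stand-ins for the kernel's fib_config and fib6_config. */
struct mock_fib_config {
    unsigned char dst_len;
};

struct mock_fib6_config {
    int dst_len;
};

/*
 * Hypothetical build_state-style helper: pick the right view of 'cfg'
 * based on 'family' and return the destination prefix length, or
 * -EINVAL for a family the encapsulation does not handle.
 */
static int mock_build_state(unsigned int family, const void *cfg)
{
    assert(cfg != NULL);

    switch (family) {
    case AF_INET: {
        const struct mock_fib_config *cfg4 = cfg;

        return cfg4->dst_len;
    }
    case AF_INET6: {
        const struct mock_fib6_config *cfg6 = cfg;

        return cfg6->dst_len;
    }
    default:
        return -EINVAL;
    }
}
```

An encapsulation that only makes sense for one family can reject the other
up front instead of casting.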

Signed-off-by: Tom Herbert 
---
 include/net/lwtunnel.h|  3 +++
 net/core/lwtunnel.c   |  5 +++--
 net/ipv4/fib_semantics.c  | 17 ++---
 net/ipv4/ip_tunnel_core.c |  2 ++
 net/ipv6/ila.c|  1 +
 net/ipv6/route.c  |  3 ++-
 net/mpls/mpls_iptunnel.c  |  1 +
 7 files changed, 22 insertions(+), 10 deletions(-)

diff --git a/include/net/lwtunnel.h b/include/net/lwtunnel.h
index 8434898..fce0e35 100644
--- a/include/net/lwtunnel.h
+++ b/include/net/lwtunnel.h
@@ -26,6 +26,7 @@ struct lwtunnel_state {
 
 struct lwtunnel_encap_ops {
int (*build_state)(struct net_device *dev, struct nlattr *encap,
+  unsigned int family, const void *cfg,
   struct lwtunnel_state **ts);
int (*output)(struct sock *sk, struct sk_buff *skb);
int (*input)(struct sk_buff *skb);
@@ -80,6 +81,7 @@ int lwtunnel_encap_del_ops(const struct lwtunnel_encap_ops 
*op,
   unsigned int num);
 int lwtunnel_build_state(struct net_device *dev, u16 encap_type,
 struct nlattr *encap,
+unsigned int family, const void *cfg,
 struct lwtunnel_state **lws);
 int lwtunnel_fill_encap(struct sk_buff *skb,
struct lwtunnel_state *lwtstate);
@@ -130,6 +132,7 @@ static inline int lwtunnel_encap_del_ops(const struct 
lwtunnel_encap_ops *op,
 
 static inline int lwtunnel_build_state(struct net_device *dev, u16 encap_type,
   struct nlattr *encap,
+  unsigned int family, const void *cfg,
   struct lwtunnel_state **lws)
 {
return -EOPNOTSUPP;
diff --git a/net/core/lwtunnel.c b/net/core/lwtunnel.c
index e924c2e..dfb1a9c 100644
--- a/net/core/lwtunnel.c
+++ b/net/core/lwtunnel.c
@@ -72,7 +72,8 @@ int lwtunnel_encap_del_ops(const struct lwtunnel_encap_ops 
*ops,
 EXPORT_SYMBOL(lwtunnel_encap_del_ops);
 
 int lwtunnel_build_state(struct net_device *dev, u16 encap_type,
-struct nlattr *encap, struct lwtunnel_state **lws)
+struct nlattr *encap, unsigned int family,
+const void *cfg, struct lwtunnel_state **lws)
 {
const struct lwtunnel_encap_ops *ops;
int ret = -EINVAL;
@@ -85,7 +86,7 @@ int lwtunnel_build_state(struct net_device *dev, u16 
encap_type,
rcu_read_lock();
ops = rcu_dereference(lwtun_encaps[encap_type]);
if (likely(ops && ops->build_state))
-   ret = ops->build_state(dev, encap, lws);
+   ret = ops->build_state(dev, encap, family, cfg, lws);
rcu_read_unlock();
 
return ret;
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index 01f1c7d..1b2d011 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -511,7 +511,8 @@ static int fib_get_nhs(struct fib_info *fi, struct 
rtnexthop *rtnh,
dev = __dev_get_by_index(net, 
cfg->fc_oif);
ret = lwtunnel_build_state(dev, nla_get_u16(
   nla_entype),
-  nla, &lwtstate);
+  nla,  AF_INET, cfg,
+  &lwtstate);
if (ret)
goto errout;
nexthop_nh->nh_lwtstate =
@@ -535,7 +536,8 @@ errout:
 
 static int fib_encap_match(struct net *net, u16 encap_type,
   struct nlattr *encap,
-  int oif, const struct fib_nh *nh)
+  int oif, const struct fib_nh *nh,
+  const struct fib_config *cfg)
 {
struct lwtunnel_state *lwtstate;
struct net_device *dev = NULL;
@@ -546,8 +548,8 @@ static int fib_encap_match(struct net *net, u16 encap_type,
 
if (oif)
dev = __dev_get_by_index(net, oif);
-   ret = lwtunnel_build_state(dev, encap_type,
-  encap, &lwtstate);
+   ret = lwtunnel_build_state(dev, encap_type, encap,
+  AF_INET, cfg, &lwtstate);
if (!ret) {
result = lwtunnel_cmp_encap(lwtstate, nh->nh_lwtstate);
lwtstate_free(lwtstate);
@@ -571,7 +573,7 @@ int fib_nh_match(struct fib_config *cfg, struct fib_info 
*fi)
if (cfg->fc_encap) {
if (fib_encap_ma

[PATCH net-next v3 2/2] ila: Precompute checksum difference for translations

2015-08-24 Thread Tom Herbert

In the ILA build state for LWT compute the checksum difference to apply
to transport checksums that include the IPv6 pseudo header. The
difference is between the route destination (from fib6_config) and the
locator to write.
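
The idea can be sketched in user space with plain 16-bit ones'-complement
arithmetic. This is only an illustration under simplifying assumptions: the
kernel code works on __wsum values via its own checksum helpers, while the
helper names below (fold16, sum_words, locator_csum_diff, apply_csum_diff)
are hypothetical and the locator is treated as four 16-bit words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into 16 bits with end-around carry. */
static uint16_t fold16(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Ones'-complement sum over n 16-bit words. */
static uint16_t sum_words(const uint16_t *w, size_t n)
{
    uint32_t sum = 0;
    size_t i;

    assert(w != NULL || n == 0);
    for (i = 0; i < n; i++)
        sum += w[i];
    return fold16(sum);
}

/*
 * Difference between the sums of two 64-bit locators (four words each):
 * sum(to) - sum(from) in ones'-complement arithmetic. This is the value
 * that can be computed once, when the route is set up.
 */
static uint16_t locator_csum_diff(const uint16_t from[4], const uint16_t to[4])
{
    return fold16((uint32_t)sum_words(to, 4) + (uint16_t)~sum_words(from, 4));
}

/*
 * Apply a precomputed difference to a transport checksum field. The
 * field holds the complement of the sum, so undo the complement, add
 * the difference, and complement again.
 */
static uint16_t apply_csum_diff(uint16_t check, uint16_t diff)
{
    return (uint16_t)~fold16((uint32_t)(uint16_t)~check + diff);
}
```

With the difference cached, rewriting the locator in a packet costs one
addition on the checksum field instead of a pass over both addresses, as
long as the packet's old locator matches the one the difference was computed
for.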

Signed-off-by: Tom Herbert 
---
 net/ipv6/ila.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/net/ipv6/ila.c b/net/ipv6/ila.c
index ffe4dca..678d2df 100644
--- a/net/ipv6/ila.c
+++ b/net/ipv6/ila.c
@@ -14,6 +14,8 @@
 
 struct ila_params {
__be64 locator;
+   __be64 locator_match;
+   __wsum csum_diff;
 };
 
 static inline struct ila_params *ila_params_lwtunnel(
@@ -33,6 +35,9 @@ static inline __wsum compute_csum_diff8(const __be32 *from, 
const __be32 *to)
 
 static inline __wsum get_csum_diff(struct ipv6hdr *ip6h, struct ila_params *p)
 {
+   if (*(__be64 *)&ip6h->daddr == p->locator_match)
+   return p->csum_diff;
+   else
return compute_csum_diff8((__be32 *)&ip6h->daddr,
  (__be32 *)&p->locator);
 }
@@ -130,8 +135,12 @@ static int ila_build_state(struct net_device *dev, struct 
nlattr *nla,
struct nlattr *tb[ILA_ATTR_MAX + 1];
size_t encap_len = sizeof(*p);
struct lwtunnel_state *newts;
+   const struct fib6_config *cfg6 = cfg;
int ret;
 
+   if (family != AF_INET6)
+   return -EINVAL;
+
ret = nla_parse_nested(tb, ILA_ATTR_MAX, nla,
   ila_nl_policy);
if (ret < 0)
@@ -149,6 +158,15 @@ static int ila_build_state(struct net_device *dev, struct 
nlattr *nla,
 
p->locator = (__force __be64)nla_get_u64(tb[ILA_ATTR_LOCATOR]);
 
+   if (cfg6->fc_dst_len > sizeof(__be64)) {
+   /* Precompute checksum difference for translation since we
+* know both the old locator and the new one.
+*/
+   p->locator_match = *(__be64 *)&cfg6->fc_dst;
+   p->csum_diff = compute_csum_diff8(
+   (__be32 *)&p->locator_match, (__be32 *)&p->locator);
+   }
+
newts->type = LWTUNNEL_ENCAP_ILA;
newts->flags |= LWTUNNEL_STATE_OUTPUT_REDIRECT |
LWTUNNEL_STATE_INPUT_REDIRECT;
-- 
1.8.1


[PATCH net-next v3 0/2] ila: Precompute checksums

2015-08-24 Thread Tom Herbert

This patch set:
 - Adds argument to LWT build_state that holds a pointer to the fib
   configuration being applied to the new route
 - Adds support in ILA to precompute checksum difference for
   performance optimization

v2:
 - Move return argument in build_state to end of arguments

v3:
 - Update the signature for ip6_tun_build_state()


Tom Herbert (2):
  lwt: Add cfg argument to build_state
  ila: Precompute checksum difference for translations

 include/net/lwtunnel.h|  3 +++
 net/core/lwtunnel.c   |  5 +++--
 net/ipv4/fib_semantics.c  | 17 ++---
 net/ipv4/ip_tunnel_core.c |  2 ++
 net/ipv6/ila.c| 19 +++
 net/ipv6/route.c  |  3 ++-
 net/mpls/mpls_iptunnel.c  |  1 +
 7 files changed, 40 insertions(+), 10 deletions(-)

-- 
1.8.1


Re: [PATCH net-next] vrf: rename the framework to mrf

2015-08-24 Thread Nicolas Dichtel

On 22/08/2015 19:47, David Miller wrote:

From: Nicolas Dichtel 
Date: Sat, 22 Aug 2015 18:10:20 +0200

This patch renames the recently added vrf driver. 'VRF' term is very
generic and there is no clear definition of it.
For example, someone may expect more isolation and use network namespaces
to implement VRF,

This is a ridiculous argument.

Does someone using VRF on a Cisco box expect Linux namespaces to be used?

Sorry, this is not going to get applied.

I spent some time today checking the threads on this topic on Quagga and on
netdev, and I dug into the commercial VRF-lite documentation. I finally agree
with you, let's drop this patch.

Nicolas

Re: [PATCH net-next 9/9] phy: fixed_phy: Set phy capabilities even when link is down

2015-08-24 Thread Andrew Lunn

On Sun, Aug 23, 2015 at 11:40:07AM -0700, Florian Fainelli wrote:
> On 08/23/15 02:47, Andrew Lunn wrote:
> > What features a phy supports is masked in genphy_config_init() by
> > looking at the PHY's BMSR register.
> > 
> > If the link is down, fixed_phy_update_regs() will only set the auto-
> > negotiation capable bit in BMSR. Thus genphy_config_init() comes to
> > the conclusion the PHY can only perform 10/Half, and masks out the
> > higher speed features. If however the link is up, BMSR is set to
> > indicate the speed the PHY is capable of auto-negotiating, and
> > genphy_config_init() does not mask out the high speed features.
> > 
> > To fix this, when the link is down, have fixed_phy_update_regs() leave
> > the link status and auto-negotiation complete bit unset, but set all
> > the other bits depending on the fixed phy speed.
> 
> This kind of reverts what Staas did in commit
> 868a4215be9a6d80548ccb74763b883dc99d32a2 ("net: phy: fixed_phy: handle
> link-down case"). When the link is down, it does not seem to me like we
> can rely on the previous speed and duplex parameters to be considered valid.
> 
> Your change does fix a valid use case though... humm.

Hi Florian

I took a look at Staas' fix, and read a bit about what the different
bits mean. I've reworked the patch. I now always set the local phy
capabilities in BMSR, but only set the negotiated speed and link
partner capabilities if the link is up. I also don't error out on
speed=0, unless the link is up.
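
Roughly, the BMSR part of that could look like the user-space sketch below.
It is not the reworked patch: the function name fixed_bmsr() is hypothetical,
the link partner and 1000BASE-T registers are left out, and the only things
taken from the kernel are the BMSR bit values from include/uapi/linux/mii.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* BMSR bit values as found in include/uapi/linux/mii.h */
#define BMSR_LSTATUS      0x0004 /* link status */
#define BMSR_ANEGCAPABLE  0x0008 /* able to do auto-negotiation */
#define BMSR_ANEGCOMPLETE 0x0020 /* auto-negotiation complete */
#define BMSR_ESTATEN      0x0100 /* extended status in register 15 */
#define BMSR_10HALF       0x0800
#define BMSR_10FULL       0x1000
#define BMSR_100HALF      0x2000
#define BMSR_100FULL      0x4000

/*
 * Hypothetical helper: build the BMSR value for a fixed PHY. The
 * capability bits depend only on the configured speed and duplex and
 * are set whether or not the link is up; the link status and
 * auto-negotiation complete bits are set only while the link is up.
 */
static uint16_t fixed_bmsr(int speed, bool full_duplex, bool link)
{
    uint16_t bmsr = BMSR_ANEGCAPABLE;

    assert(speed >= 0);

    switch (speed) {
    case 1000:
        bmsr |= BMSR_ESTATEN;
        break;
    case 100:
        bmsr |= full_duplex ? BMSR_100FULL : BMSR_100HALF;
        break;
    case 10:
        bmsr |= full_duplex ? BMSR_10FULL : BMSR_10HALF;
        break;
    default:
        break; /* e.g. speed 0: no speed capability bits */
    }

    if (link)
        bmsr |= BMSR_LSTATUS | BMSR_ANEGCOMPLETE;

    return bmsr;
}
```

With this split, a 100/Full fixed PHY reports BMSR_100FULL with the link down
as well, so genphy_config_init() would have no reason to mask out the higher
speed features.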

This works for my use case, and hopefully also Staas.

I will post the new version when we have come to a conclusion about
other open issues.

  Andrew

Low throughput in VMs using VxLAN

2015-08-24 Thread Santosh R

 Hi,

   Earlier I was seeing lower throughput in VMs using VxLan as GRO was
not happening in VM.
Tom Herbert suggested to use "vxlan: GRO support at tunnel layer" patch series.
With today's net-next (4.2.0-rc7) in host and VM, I could see GRO
happening for vxlan, macvtap and virtual interface in VM.
The throughput is still low between VMs (around 4Gbps compared to
9Gbps without VxLAN).
Looks like the packet is getting segmented in Host and then GROed in VM.
Is this an expected behaviour? Is my below configuration correct?

Here is the configuration.
eth (VM) - macvtap - vxlan - phy iface  <-> phy iface - vxlan -
macvtap - (VM) eth

VM is started with
# qemu-system-x86_64 -m 4096 -smp 4 -boot c  -device
virtio-net-pci,netdev=hostnet0,id=net0,mac=C2:B2:CA:6F:BC:A4 -device
e1000,netdev=tap0,mac=DE:AD:BE:EF:96:32 -netdev tap,id=hostnet0,fd=3
3<>/dev/tap18 -netdev tap,id=tap0,script=no  -drive
file=/root/vdisk_rhel65.img

Here is the skb_segment count for 10 sec iperf receive test.
host # ./funccount skb_segment
Tracing "skb_segment"... Ctrl-C to end.
^C
FUNC  COUNT
skb_segment   58604

# ./functrace skb_segment
...
 -0 [006] ..s. 17632.030126: skb_segment <-tcp_gso_segment
 ksoftirqd/6-38[006] ..s. 17632.030177: skb_segment <-tcp_gso_segment
 ksoftirqd/6-38[006] ..s. 17632.030223: skb_segment <-tcp_gso_segment
 ksoftirqd/6-38[006] ..s. 17632.030269: skb_segment <-tcp_gso_segment
 ksoftirqd/6-38[006] ..s. 17632.030298: skb_segment <-tcp_gso_segment
 qemu-system-x86-5932  [006] ..s. 17632.030489: skb_segment <-tcp_gso_segment
 qemu-system-x86-5932  [006] ..s. 17632.030507: skb_segment <-tcp_gso_segment
 qemu-system-x86-5932  [006] ..s. 17632.030528: skb_segment <-tcp_gso_segment
 qemu-system-x86-5932  [006] ..s. 17632.030550: skb_segment <-tcp_gso_segment
 qemu-system-x86-5932  [006] ..s. 17632.030576: skb_segment <-tcp_gso_segment
 qemu-system-x86-5932  [006] ..s1 17632.030759: skb_segment <-tcp_gso_segment
 qemu-system-x86-5932  [006] ..s1 17632.030814: skb_segment <-tcp_gso_segment
..

# Physical interface
21:32:49.749263 IP 102.22.22.14.39561 > 102.22.22.12.otv: UDP, length 2870
21:32:49.749278 IP 102.22.22.14.39561 > 102.22.22.12.otv: UDP, length 9860
21:32:49.749326 IP 102.22.22.12.44214 > 102.22.22.14.otv: UDP, length 74
21:32:49.749333 IP 102.22.22.12.44214 > 102.22.22.14.otv: UDP, length 74
21:32:49.749340 IP 102.22.22.12.44214 > 102.22.22.14.otv: UDP, length 74
21:32:49.749405 IP 102.22.22.14.39561 > 102.22.22.12.otv: UDP, length 2870
21:32:49.749425 IP 102.22.22.14.39561 > 102.22.22.12.otv: UDP, length 11258

# VxLAN
21:32:49.749268 IP 102.44.44.14.60616 > 102.44.44.12.commplex-link:
Flags [.], seq 25:2821, ack 1, win 111, options [nop,nop,TS val
15632994 ecr 13334931], length 2796
21:32:49.749281 IP 102.44.44.14.60616 > 102.44.44.12.commplex-link:
Flags [.], seq 2821:12607, ack 1, win 111, options [nop,nop,TS val
15632994 ecr 13334931], length 9786
21:32:49.749322 IP 102.44.44.12.commplex-link > 102.44.44.14.60616:
Flags [.], ack 2821, win 270, options [nop,nop,TS val 13334931 ecr
15632994], length 0
21:32:49.749331 IP 102.44.44.12.commplex-link > 102.44.44.14.60616:
Flags [.], ack 7015, win 336, options [nop,nop,TS val 13334931 ecr
15632994], length 0
21:32:49.749336 IP 102.44.44.12.commplex-link > 102.44.44.14.60616:
Flags [.], ack 12607, win 423, options [nop,nop,TS val 13334931 ecr
15632994], length 0
21:32:49.749411 IP 102.44.44.14.60616 > 102.44.44.12.commplex-link:
Flags [.], seq 12607:15403, ack 1, win 111, options [nop,nop,TS val
15632994 ecr 13334931], length 2796
21:32:49.749429 IP 102.44.44.14.60616 > 102.44.44.12.commplex-link:
Flags [P.], seq 15403:26587, ack 1, win 111, options [nop,nop,TS val
15632994 ecr 13334931], length 11184

# macvtap
2.44.44.14.60616 > 102.44.44.12.commplex-link: Flags [.], seq 25:2821,
ack 1, win 111, options [nop,nop,TS val 15632994 ecr 13334931], length
2796
21:32:49.749281 IP 102.44.44.14.60616 > 102.44.44.12.commplex-link:
Flags [.], seq 2821:12607, ack 1, win 111, options [nop,nop,TS val
15632994 ecr 13334931], length 9786
21:32:49.749321 IP 102.44.44.12.commplex-link > 102.44.44.14.60616:
Flags [.], ack 2821, win 270, options [nop,nop,TS val 13334931 ecr
15632994], length 0
21:32:49.749330 IP 102.44.44.12.commplex-link > 102.44.44.14.60616:
Flags [.], ack 7015, win 336, options [nop,nop,TS val 13334931 ecr
15632994], length 0
21:32:49.749335 IP 102.44.44.12.commplex-link > 102.44.44.14.60616:
Flags [.], ack 12607, win 423, options [nop,nop,TS val 13334931 ecr
15632994], length 0
21:32:49.749411 IP 102.44.44.14.60616 > 102.44.44.12.commplex-link:
Flags [.], seq 12607:15403, ack 1, win 111, options [nop,nop,TS val
15632994 ecr 13334931], length 2796
21:32:49.749429 IP 102.44.44.14.60616 > 102.44.44.12.commplex-link:
Flags [P.], seq 15403:26587, ack 1, win 111, options [nop,nop,TS val
15632994 ecr 13334931], length 11184

# VM interface
2:02:48.126327 IP 102.44.44.14

[no subject]

2015-08-24 Thread Koray Uçar


[no subject]

2015-08-24 Thread Koray Uçar

subscribe netdev

[PATCH] man ip-link: Add little explanations about VLAN qos map

2015-08-24 Thread Vadim Kochan

From: Vadim Kochan 

Add a little more info about how to set the priority manually with iptables,
and some clarifications about ingress/egress QoS mapping.

Signed-off-by: Vadim Kochan 
---
 man/man8/ip-link.8.in | 27 ---
 1 file changed, 24 insertions(+), 3 deletions(-)

diff --git a/man/man8/ip-link.8.in b/man/man8/ip-link.8.in
index b9137fb..2283b71 100644
--- a/man/man8/ip-link.8.in
+++ b/man/man8/ip-link.8.in
@@ -349,10 +349,30 @@ where  is the physical device to which VLAN 
device is bound.
 - specifies whether the VLAN device state is bound to the physical device 
state.
 
 .BI ingress-qos-map " QOS-MAP "
-- defines a mapping between priority code points on incoming frames.  The 
format is FROM:TO with multiple mappings separated by spaces.
+- defines a mapping of VLAN header prio field to the Linux internal packet
+priority on incoming frames. The format is FROM:TO with multiple mappings
+separated by spaces.
 
 .BI egress-qos-map " QOS-MAP "
-- the same as ingress-qos-map but for outgoing frames.
+- defines a mapping of Linux internal packet priority to VLAN header prio field
+but for outgoing frames. The format is the same as for ingress-qos-map.
+.in +4
+
+Linux packet priority can be set by
+.BR iptables "(8)":
+.in +4
+.sp
+.B iptables
+-t mangle -A POSTROUTING [...] -j CLASSIFY --set-class 0:4
+.sp
+.in -4
+and this "4" priority can be used in the egress qos mapping to set VLAN prio 
"5":
+.sp
+.in +4
+.B ip
+link set veth0.10 type vlan egress 4:5
+.in -4
+.in -4
 .in -8
 
 .TP
@@ -1090,7 +1110,8 @@ IEEE 802.15.4 device wpan0.
 .br
 .BR ip (8),
 .BR ip-netns (8),
-.BR ethtool (8)
+.BR ethtool (8),
+.BR iptables (8)
 
 .SH AUTHOR
 Original Manpage by Michail Litvak 
-- 
2.4.2


[patch net-next 1/3] mlxsw: Remove duplicate included header

2015-08-24 Thread Jiri Pirko

From: Ido Schimmel 

Signed-off-by: Ido Schimmel 
Signed-off-by: Jiri Pirko 
Signed-off-by: Elad Raz 
---
 drivers/net/ethernet/mellanox/mlxsw/core.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c 
b/drivers/net/ethernet/mellanox/mlxsw/core.c
index 09325b7..0415ff6 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.c
@@ -48,7 +48,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
-- 
1.9.3


[patch net-next 3/3] mlxsw: adjust log messages level in __mlxsw_emad_transmit

2015-08-24 Thread Jiri Pirko

From: Jiri Pirko 

When transmit fails, it is an error, not a warning.
Do not warn when timeout happens as that is handled by a counter.

Signed-off-by: Jiri Pirko 
Signed-off-by: Ido Schimmel 
Signed-off-by: Elad Raz 
---
 drivers/net/ethernet/mellanox/mlxsw/core.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c 
b/drivers/net/ethernet/mellanox/mlxsw/core.c
index 6ee3f45..dfafb83 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.c
@@ -382,8 +382,8 @@ static int __mlxsw_emad_transmit(struct mlxsw_core 
*mlxsw_core,
 
err = mlxsw_core_skb_transmit(mlxsw_core->driver_priv, skb, tx_info);
if (err) {
-   dev_warn(mlxsw_core->bus_info->dev, "Failed to transmit EMAD 
(tid=%llx)\n",
-mlxsw_core->emad.tid);
+   dev_err(mlxsw_core->bus_info->dev, "Failed to transmit EMAD 
(tid=%llx)\n",
+   mlxsw_core->emad.tid);
dev_kfree_skb(skb);
return err;
}
@@ -393,8 +393,8 @@ static int __mlxsw_emad_transmit(struct mlxsw_core 
*mlxsw_core,
 !(mlxsw_core->emad.trans_active),
 msecs_to_jiffies(MLXSW_EMAD_TIMEOUT_MS));
if (!ret) {
-   dev_warn(mlxsw_core->bus_info->dev, "EMAD timed-out (tid=%llx)\n",
-mlxsw_core->emad.tid);
+   dev_dbg(mlxsw_core->bus_info->dev, "EMAD timed-out (tid=%llx)\n",
+   mlxsw_core->emad.tid);
mlxsw_core->emad.trans_active = false;
mlxsw_core->emad.stats.timeouts++;
return -EIO;
-- 
1.9.3


[patch net-next 0/3] mlxsw: small driver update

2015-08-24 Thread Jiri Pirko

From: Jiri Pirko 

Ido Schimmel (1):
  mlxsw: Remove duplicate included header

Jiri Pirko (2):
  mlxsw: expose EMAD transactions statistics via debugfs
  mlxsw: adjust log messages level in __mlxsw_emad_transmit

 drivers/net/ethernet/mellanox/mlxsw/core.c | 60 ++
 1 file changed, 52 insertions(+), 8 deletions(-)

-- 
1.9.3


[patch net-next 2/3] mlxsw: expose EMAD transactions statistics via debugfs

2015-08-24 Thread Jiri Pirko

From: Jiri Pirko 

Signed-off-by: Jiri Pirko 
Signed-off-by: Ido Schimmel 
Signed-off-by: Elad Raz 
---
 drivers/net/ethernet/mellanox/mlxsw/core.c | 51 --
 1 file changed, 48 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c b/drivers/net/ethernet/mellanox/mlxsw/core.c
index 0415ff6..6ee3f45 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.c
@@ -98,6 +98,12 @@ struct mlxsw_core {
bool trans_active;
struct mutex lock; /* One EMAD transaction at a time. */
bool use_emad;
+   struct {
+   u64 trans;
+   u32 fails;
+   u32 retries;
+   u32 timeouts;
+   } stats;
} emad;
struct mlxsw_core_pcpu_stats __percpu *pcpu_stats;
struct dentry *dbg_dir;
@@ -390,6 +396,7 @@ static int __mlxsw_emad_transmit(struct mlxsw_core *mlxsw_core,
dev_warn(mlxsw_core->bus_info->dev, "EMAD timed-out (tid=%llx)\n",
 mlxsw_core->emad.tid);
mlxsw_core->emad.trans_active = false;
+   mlxsw_core->emad.stats.timeouts++;
return -EIO;
}
 
@@ -463,8 +470,10 @@ retry:
if (!err || err != -EAGAIN)
goto out;
}
-   if (n_retry++ < MLXSW_EMAD_MAX_RETRY)
+   if (n_retry++ < MLXSW_EMAD_MAX_RETRY) {
+   mlxsw_core->emad.stats.retries++;
goto retry;
+   }
 
 out:
dev_kfree_skb(skb);
@@ -671,6 +680,35 @@ static const struct file_operations mlxsw_core_rx_stats_dbg_ops = {
.llseek = seq_lseek
 };
 
+static int mlxsw_core_emad_stats_dbg_read(struct seq_file *file, void *data)
+{
+   struct mlxsw_core *mlxsw_core = file->private;
+
+   if (mutex_lock_interruptible(&mlxsw_core->emad.lock))
+   return -EINTR;
+   seq_printf(file, "transactions: %llu\n", mlxsw_core->emad.stats.trans);
+   seq_printf(file, "fails: %u\n", mlxsw_core->emad.stats.fails);
+   seq_printf(file, "retries: %u\n", mlxsw_core->emad.stats.retries);
+   seq_printf(file, "timeouts: %u\n", mlxsw_core->emad.stats.timeouts);
+   mutex_unlock(&mlxsw_core->emad.lock);
+   return 0;
+}
+
+static int mlxsw_core_emad_stats_dbg_open(struct inode *inode, struct file *f)
+{
+   struct mlxsw_core *mlxsw_core = inode->i_private;
+
+   return single_open(f, mlxsw_core_emad_stats_dbg_read, mlxsw_core);
+}
+
+static const struct file_operations mlxsw_core_emad_stats_dbg_ops = {
+   .owner = THIS_MODULE,
+   .open = mlxsw_core_emad_stats_dbg_open,
+   .release = single_release,
+   .read = seq_read,
+   .llseek = seq_lseek
+};
+
 static void mlxsw_core_buf_dump_dbg(struct mlxsw_core *mlxsw_core,
const char *buf, size_t size)
 {
@@ -768,6 +806,8 @@ static int mlxsw_core_debugfs_init(struct mlxsw_core *mlxsw_core)
mlxsw_core->dbg.psid_blob.size = sizeof(bus_info->psid);
debugfs_create_blob("psid", S_IRUGO, mlxsw_core->dbg_dir,
&mlxsw_core->dbg.psid_blob);
+   debugfs_create_file("emad_stats", S_IRUGO, mlxsw_core->dbg_dir,
+   mlxsw_core, &mlxsw_core_emad_stats_dbg_ops);
return 0;
 }
 
@@ -1107,8 +1147,10 @@ retry:
err = mlxsw_cmd_access_reg(mlxsw_core, in_mbox, out_mbox);
if (!err) {
err = mlxsw_emad_process_status(mlxsw_core, out_mbox);
-   if (err == -EAGAIN && n_retry++ < MLXSW_EMAD_MAX_RETRY)
+   if (err == -EAGAIN && n_retry++ < MLXSW_EMAD_MAX_RETRY) {
+   mlxsw_core->emad.stats.retries++;
goto retry;
+   }
}
 
if (!err)
@@ -1137,6 +1179,7 @@ static int mlxsw_core_reg_access(struct mlxsw_core *mlxsw_core,
return -EINTR;
}
 
+   mlxsw_core->emad.stats.trans++;
cur_tid = mlxsw_core->emad.tid;
dev_dbg(mlxsw_core->bus_info->dev, "Reg access (tid=%llx,reg_id=%x(%s),type=%s)\n",
cur_tid, reg->id, mlxsw_reg_id_str(reg->id),
@@ -1153,10 +1196,12 @@ static int mlxsw_core_reg_access(struct mlxsw_core *mlxsw_core,
err = mlxsw_core_reg_access_emad(mlxsw_core, reg,
 payload, type);
 
-   if (err)
+   if (err) {
dev_err(mlxsw_core->bus_info->dev, "Reg access failed (tid=%llx,reg_id=%x(%s),type=%s)\n",
cur_tid, reg->id, mlxsw_reg_id_str(reg->id),
mlxsw_core_reg_access_type_str(type));
+   mlxsw_core->emad.stats.fails++;
+   }
 
mutex_unlock(&mlxsw_core->emad.lock);
return err;
-- 
1.9.3


[PATCH] net-next: Fix warning while make xmldocs caused by skbuff.c

2015-08-24 Thread Masanari Iida

This patch fixes the following warnings:

.//net/core/skbuff.c:407: warning: No description found for parameter 'len'
.//net/core/skbuff.c:407: warning: Excess function parameter 'length' description in '__netdev_alloc_skb'
.//net/core/skbuff.c:476: warning: No description found for parameter 'len'
.//net/core/skbuff.c:476: warning: Excess function parameter 'length' description in '__napi_alloc_skb'

Signed-off-by: Masanari Iida 
---
 net/core/skbuff.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 7b84330..dad4dd3 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -392,7 +392,7 @@ EXPORT_SYMBOL(napi_alloc_frag);
 /**
  * __netdev_alloc_skb - allocate an skbuff for rx on a specific device
  * @dev: network device to receive on
- * @length: length to allocate
+ * @len: length to allocate
  * @gfp_mask: get_free_pages mask, passed to alloc_skb
  *
  * Allocate a new &sk_buff and assign it a usage count of one. The
@@ -461,7 +461,7 @@ EXPORT_SYMBOL(__netdev_alloc_skb);
 /**
  * __napi_alloc_skb - allocate skbuff for rx in a specific NAPI instance
  * @napi: napi instance this buffer was allocated for
- * @length: length to allocate
+ * @len: length to allocate
  * @gfp_mask: get_free_pages mask, passed to alloc_skb and alloc_pages
  *
  * Allocate a new sk_buff for use in NAPI receive.  This buffer will
-- 
2.5.0.234.gefc8a62


Re: [linux-sunxi] Re: [PATCH] net: sun4i-emac: Claim emac sram

2015-08-24 Thread Maxime Ripard

On Mon, Aug 24, 2015 at 11:17:43AM +0200, Hans de Goede wrote:
> Hi,
> 
> On 24-08-15 09:46, Maxime Ripard wrote:
> >Hi Hans,
> >
> >On Sun, Aug 23, 2015 at 08:31:38PM +0200, Hans de Goede wrote:
> >>Claim the emac sram ourselves, rather than relying on the bootloader
> >>having mapped the sram to the emac controller during boot.
> >>
> >>Signed-off-by: Hans de Goede 
> >>---
> >>  drivers/net/ethernet/allwinner/sun4i-emac.c | 13 +++--
> >>  1 file changed, 11 insertions(+), 2 deletions(-)
> >>
> >>diff --git a/drivers/net/ethernet/allwinner/sun4i-emac.c b/drivers/net/ethernet/allwinner/sun4i-emac.c
> >>index bab01c84..48ce83e 100644
> >>--- a/drivers/net/ethernet/allwinner/sun4i-emac.c
> >>+++ b/drivers/net/ethernet/allwinner/sun4i-emac.c
> >>@@ -28,6 +28,7 @@
> >>  #include 
> >>  #include 
> >>  #include 
> >>+#include 
> >>
> >>  #include "sun4i-emac.h"
> >>
> >>@@ -857,11 +858,17 @@ static int emac_probe(struct platform_device *pdev)
> >>
> >>clk_prepare_enable(db->clk);
> >>
> >>+   ret = sunxi_sram_claim(&pdev->dev);
> >>+   if (ret) {
> >>+   dev_err(&pdev->dev, "Error couldn't map SRAM to device\n");
> >>+   goto out;
> >
> >Shouldn't you disable your clock too?
> 
> You're right, but that is a pre-existing problem, iow an unrelated
> issue.
> 
> I've put a follow-up patch for this on my todo list.

Thanks.
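
For reference, a rough user-space sketch of the cleanup discussed above,
assuming a goto-based unwind in which a failed SRAM claim also disables the
clock that was enabled earlier. All fake_* names are hypothetical stand-ins
for illustration; they are not the driver's or the kernel's API.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the clock state; not the real clk API. */
static bool clk_enabled;

static void fake_clk_enable(void)  { clk_enabled = true; }
static void fake_clk_disable(void) { clk_enabled = false; }

/* Hypothetical stand-in for claiming the SRAM; fails when unavailable. */
static int fake_sram_claim(bool available)
{
	return available ? 0 : -EBUSY;
}

/* Lets a caller inspect the modelled clock state. */
bool fake_clk_is_enabled(void)
{
	return clk_enabled;
}

/*
 * Models a probe routine: enable the clock, then claim the SRAM.
 * On failure, undo the clock enable before returning the error.
 */
int fake_probe(bool sram_available)
{
	int ret;

	fake_clk_enable();

	ret = fake_sram_claim(sram_available);
	if (ret)
		goto out_clk_disable;

	return 0;

out_clk_disable:
	fake_clk_disable();
	return ret;
}
```

Calling fake_probe(false) returns an error and leaves the modelled clock
disabled; fake_probe(true) returns 0 and leaves it enabled.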

Maxime

-- 
Maxime Ripard, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com



Re: [PATCH] usbnet: Fix two races between usbnet_stop() and the BH

2015-08-24 Thread Bjørn Mork

Eugene Shatokhin  writes:

> On 19.08.2015 15:31, Bjørn Mork wrote:
>> Eugene Shatokhin  writes:
>>
>>> The problem is not in the reordering but rather in the fact that
>>> "dev->flags = 0" is not necessarily atomic
>>> w.r.t. "clear_bit(EVENT_RX_KILL, &dev->flags)", and vice versa.
>>>
>>> So the following might be possible, although unlikely:
>>>
>>> CPU0                  CPU1
>>>                       clear_bit: read dev->flags
>>>                       clear_bit: clear EVENT_RX_KILL in the read value
>>>
>>> dev->flags=0;
>>>
>>>                       clear_bit: write updated dev->flags
>>>
>>> As a result, dev->flags may become non-zero again.
>>
>> Ah, right.  Thanks for explaining.
>>
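
The interleaving quoted above can be replayed in user space. This is only a
minimal sketch, assuming clear_bit() behaves like a separate load, modify and
store that a plain store can slip between. The bit numbers and the function
name are hypothetical, and the code models the scenario as described rather
than the behaviour of any particular architecture.

```c
/* User-space model of the interleaving; not kernel code. */

/* Hypothetical bit numbers, chosen only for illustration. */
enum { EVENT_RX_KILL = 0, EVENT_OTHER = 2 };

/*
 * Replays the three steps of a non-atomic clear_bit() on CPU1 with the
 * plain "flags = 0" store from CPU0 landing in the middle.
 * Returns the final value of flags.
 */
unsigned long replay_lost_store(unsigned long flags)
{
	unsigned long tmp;

	tmp = flags;                     /* CPU1: clear_bit reads flags      */
	tmp &= ~(1UL << EVENT_RX_KILL);  /* CPU1: clears the bit in its copy */
	flags = 0;                       /* CPU0: dev->flags = 0             */
	flags = tmp;                     /* CPU1: writes back the stale copy */

	return flags;
}
```

Starting with EVENT_RX_KILL and one other bit set, the function returns a
value with the other bit still set, which is the "non-zero again" outcome
described in the quoted text.
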
>>> I cannot prove yet that this is an impossible situation. If anyone
>>> can, please explain. If so, this part of the patch will not be needed.
>>
>> I wonder if we could simply move the dev->flags = 0 down a few lines to
>> fix both issues?  It doesn't seem to do anything useful except for
>> resetting the flags to a sane initial state after the device is down.
>>
>> Stopping the tasklet rescheduling etc depends only on netif_running(),
>> which will be false when usbnet_stop is called.  There is no need to
>> touch dev->flags for this to happen.
>
> That was one of the first ideas we discussed here. Unfortunately, it
> is probably not so simple.
>
> Setting dev->flags to 0 makes some delayed operations do nothing and,
> among other things, not reschedule usbnet_bh().

Yes, but I believe that is merely a side effect.  You should never need
to clear multiple flags to get the desired behaviour.

> As you can see in drivers/net/usb/usbnet.c, usbnet_bh() can be called
> as a tasklet function and as a timer function in a number of
> situations (look for the usage of dev->bh and dev->delay there).
>
> netif_running() is indeed false when usbnet_stop() runs, usbnet_stop()
> also disables Tx. This seems to be enough for many cases where
> usbnet_bh() is scheduled, but I am not so sure about the remaining
> ones, namely:
>
> 1. A work function, usbnet_deferred_kevent(), may reschedule
> usbnet_bh(). Looks like the workqueue is only stopped in
> usbnet_disconnect(), so a work item might be processed while
> usbnet_stop() works. Setting dev->flags to 0 makes the work function
> do nothing, by the way. See also the comment in usbnet_stop() about
> this.
>
> A work item may be placed on this workqueue in a number of ways, by
> both the usbnet module and the mini-drivers. It is not too easy to track
> all these situations.

That's an understatement :)



> 2. rx_complete() and tx_complete() may schedule execution of
> usbnet_bh() as a tasklet or a timer function. These two are URB
> completion callbacks.
>
> It seems, new Rx and Tx URBs cannot be submitted when usbnet_stop()
> clears dev->flags, indeed. But it does not prevent the completion
> handlers for the previously submitted URBs from running concurrently
> with usbnet_stop(). The latter waits for them to complete (via
> usbnet_terminate_urbs(dev)) but only if FLAG_AVOID_UNLINK_URBS is not
> set in info->flags. rndis_wlan, however, sets this flag for a few
> hardware models. So - no guarantees here as well.

FLAG_AVOID_UNLINK_URBS looks like it should be replaced by the newer
ability to keep the status urb active. I believe that must have been the
real reason for adding it, based on the commit message and the effect
the flag will have:

 commit 1487cd5e76337555737cbc55d7d83f41460d198f
 Author: Jussi Kivilinna 
 Date:   Thu Jul 30 19:41:20 2009 +0300

usbnet: allow "minidriver" to prevent urb unlinking on usbnet_stop

rndis_wlan devices freeze after running usbnet_stop several times. It appears
that firmware freezes in state where it does not respond to any RNDIS commands
and device have to be physically unplugged/replugged. This patch lets
minidrivers to disable unlink_urbs on usbnet_stop through new info flag.

Signed-off-by: Jussi Kivilinna 
Cc: David Brownell 
Signed-off-by: John W. Linville 



The rx urbs will not be resubmitted in any case, and there are of course
no tx urbs being submitted.  So the only effect of this flag is on the
status/interrupt urb, which I can imagine some RNDIS devices want
active all the time.

So FLAG_AVOID_UNLINK_URBS should probably be removed and replaced by calls
to usbnet_status_start() and usbnet_status_stop().  This will require
testing on some of the devices with the original firmware problem,
however.

In any case: I do not think this flag should be considered when trying
to make usbnet_stop behaviour saner.  Its only purpose is to
deliberately break usbnet_stop by not actually stopping.


> If someone could list the particular bits of dev->flags that should be
> cleared to make sure no deferred call could reschedule usbnet_bh(),
> etc... Well, it would be enough to clear these first and use
> dev->flags = 0 later, after tasklet_kill() and del_timer_sync(). I
> cannot point out these particular bits now.


I don'
