Re: tunnels: Don't apply GRO to multiple layers of encapsulation.

2017-09-01 Thread Jesse Gross
On Thu, Aug 31, 2017 at 6:58 AM,   wrote:
> [ resend due to mail problems at my end ]
>
> Hi Jesse,
>
> The backport of fac8e0f579695a3ecbc4d3cac369139d7f819971,
> "tunnels: Don't apply GRO to multiple layers of encapsulation",
> to linux-4.1.y seems to have missed a line.
>
> The 4.1 commit is 066b300e5be43cb61697539e2a3a9aac5afb422f.
>
> The potentially missing line is:
>
> -   .gro_receive= ipv6_gro_receive,
> +   .gro_receive= sit_gro_receive,
>
>
> I am not experiencing any bad symptoms.  I simply noticed
> that the patch introduced a new function, sit_gro_receive,
> without introducing any users, and that same patch in
> linux-4.4.y does have a user.

Thanks for pointing that out. The line you mentioned should indeed be
there and seems to have been missed in the backport.

The backport was actually done by Sasha, not by me - would you mind
sending a patch to him or working with him to fix it?


Re: tunnels: Don't apply GRO to multiple layers of encapsulation.

2017-09-01 Thread Jesse Gross
On Thu, Aug 31, 2017 at 6:58 AM,   wrote:
> [ resend due to mail problems at my end ]
>
> Hi Jesse,
>
> The backport of fac8e0f579695a3ecbc4d3cac369139d7f819971,
> "tunnels: Don't apply GRO to multiple layers of encapsulation",
> to linux-4.1.y seems to have missed a line.
>
> The 4.1 commit is 066b300e5be43cb61697539e2a3a9aac5afb422f.
>
> The potentially missing line is:
>
> -   .gro_receive= ipv6_gro_receive,
> +   .gro_receive= sit_gro_receive,
>
>
> I am not experiencing any bad symptoms.  I simply noticed
> that the patch introduced a new function, sit_gro_receive,
> without introducing any users, and that same patch in
> linux-4.4.y does have a user.

Thanks for pointing that out. The line you mentioned should indeed be
there and seems to have been missed in the backport.

The backport was actually done by Sasha, not by me - would you mind
sending a patch to him or working with him to fix it?


Re: [PATCH] geneve: fix max_mtu setting

2016-06-27 Thread Jesse Gross
On Mon, Jun 27, 2016 at 6:27 PM, 严海双 <yanhaishu...@cmss.chinamobile.com> wrote:
>
> On Jun 28, 2016, at 12:10 AM, Jesse Gross <je...@kernel.org> wrote:
>
> On Sun, Jun 26, 2016 at 6:13 PM, Haishuang Yan
> <yanhaishu...@cmss.chinamobile.com> wrote:
>
>
> On Jun 26, 2016, at 8:35 PM, zhuyj <zyjzyj2...@gmail.com> wrote:
>
> +   if (geneve->remote.sa.sa_family == AF_INET)
> +   max_mtu -= sizeof(struct iphdr);
> +   else
> +   max_mtu -= sizeof(struct ipv6hdr);
>
> Sorry, if sa_family is not AF_NET, it is AF_INET6?
>
> There is a lot of macros in include/linux/socket.h.
>
> Zhu Yanjun
>
>
> There are only two enumerations AF_INET and AF_INET6 have been assigned in
> geneve_newlink:
>
>
> There's actually a third possibility: AF_UNSPEC, which is the default
> if neither remote type is specified. This is used by lightweight
> tunnels and should be able to work with either IPv4/v6. For the
> purposes of the MTU calculation this means that the IPv4 header size
> should be used to avoid disallowing potentially valid configurations.
>
>
> Yes, you’re right. Thanks for you advise. I will send a v2 commit like this:
>
>if (geneve->remote.sa.sa_family == AF_INET6)
>   max_mtu -= sizeof(struct ipv6hdr);
>else
>   max_mtu -= sizeof(struct iphdr);
>
> Is this ok?

Yes, that looks fine to me.


Re: [PATCH] geneve: fix max_mtu setting

2016-06-27 Thread Jesse Gross
On Mon, Jun 27, 2016 at 6:27 PM, 严海双  wrote:
>
> On Jun 28, 2016, at 12:10 AM, Jesse Gross  wrote:
>
> On Sun, Jun 26, 2016 at 6:13 PM, Haishuang Yan
>  wrote:
>
>
> On Jun 26, 2016, at 8:35 PM, zhuyj  wrote:
>
> +   if (geneve->remote.sa.sa_family == AF_INET)
> +   max_mtu -= sizeof(struct iphdr);
> +   else
> +   max_mtu -= sizeof(struct ipv6hdr);
>
> Sorry, if sa_family is not AF_NET, it is AF_INET6?
>
> There is a lot of macros in include/linux/socket.h.
>
> Zhu Yanjun
>
>
> There are only two enumerations AF_INET and AF_INET6 have been assigned in
> geneve_newlink:
>
>
> There's actually a third possibility: AF_UNSPEC, which is the default
> if neither remote type is specified. This is used by lightweight
> tunnels and should be able to work with either IPv4/v6. For the
> purposes of the MTU calculation this means that the IPv4 header size
> should be used to avoid disallowing potentially valid configurations.
>
>
> Yes, you’re right. Thanks for you advise. I will send a v2 commit like this:
>
>if (geneve->remote.sa.sa_family == AF_INET6)
>   max_mtu -= sizeof(struct ipv6hdr);
>else
>   max_mtu -= sizeof(struct iphdr);
>
> Is this ok?

Yes, that looks fine to me.


Re: [PATCH] geneve: fix max_mtu setting

2016-06-27 Thread Jesse Gross
On Sun, Jun 26, 2016 at 6:13 PM, 严海双  wrote:
>
>> On Jun 26, 2016, at 8:35 PM, zhuyj  wrote:
>>
>> +   if (geneve->remote.sa.sa_family == AF_INET)
>> +   max_mtu -= sizeof(struct iphdr);
>> +   else
>> +   max_mtu -= sizeof(struct ipv6hdr);
>>
>> Sorry, if sa_family is not AF_NET, it is AF_INET6?
>>
>> There is a lot of macros in include/linux/socket.h.
>>
>> Zhu Yanjun
>>
>
> There are only two enumerations AF_INET and AF_INET6 have been assigned in 
> geneve_newlink:

There's actually a third possibility: AF_UNSPEC, which is the default
if neither remote type is specified. This is used by lightweight
tunnels and should be able to work with either IPv4/v6. For the
purposes of the MTU calculation this means that the IPv4 header size
should be used to avoid disallowing potentially valid configurations.


Re: [PATCH] geneve: fix max_mtu setting

2016-06-27 Thread Jesse Gross
On Sun, Jun 26, 2016 at 6:13 PM, 严海双  wrote:
>
>> On Jun 26, 2016, at 8:35 PM, zhuyj  wrote:
>>
>> +   if (geneve->remote.sa.sa_family == AF_INET)
>> +   max_mtu -= sizeof(struct iphdr);
>> +   else
>> +   max_mtu -= sizeof(struct ipv6hdr);
>>
>> Sorry, if sa_family is not AF_NET, it is AF_INET6?
>>
>> There is a lot of macros in include/linux/socket.h.
>>
>> Zhu Yanjun
>>
>
> There are only two enumerations AF_INET and AF_INET6 have been assigned in 
> geneve_newlink:

There's actually a third possibility: AF_UNSPEC, which is the default
if neither remote type is specified. This is used by lightweight
tunnels and should be able to work with either IPv4/v6. For the
purposes of the MTU calculation this means that the IPv4 header size
should be used to avoid disallowing potentially valid configurations.


Re: [ovs-dev] [PATCH] openvswitch: reduce padding in struct sw_flow_key

2016-03-18 Thread Jesse Gross
On Fri, Mar 18, 2016 at 6:34 AM, Arnd Bergmann  wrote:
> This means it's still too large really, we just don't warn about it any more,
> and will get the warning again once another member is added. My patch is a
> band-aid at best, but more work is needed here. One problem is that
> ovs_flow_cmd_new and ovs_flow_cmd_set have two copies of struct sw_flow_key on
> the stack, one of them nested within sw_flow_mask. If we could reduce that to
> a single instance, the stack usage would be cut in half here.

I think it should be possible to reduce those two functions to only
use a single instance of the structure.

For new flows, we're already allocating the key structure on the heap,
it seems like we could just translate the key into that rather than to
intermediate location on the stack. And when setting flows, I'm not
sure that we really need the mask at all - we don't do anything with
it other than check it against the actions but we really should be
using the original mask for that rather than the new one.

It seems like it would be preferable to go down that route rather than
this patch, since as you say, this patch is really only suppressing
the warning and doesn't have a significant impact on the actual stack
consumption. Plus, the ordering of the flow layout affects how much
data needs to be compared during packet lookup, so this change will
likely be bad for forwarding performance.


Re: [ovs-dev] [PATCH] openvswitch: reduce padding in struct sw_flow_key

2016-03-18 Thread Jesse Gross
On Fri, Mar 18, 2016 at 6:34 AM, Arnd Bergmann  wrote:
> This means it's still too large really, we just don't warn about it any more,
> and will get the warning again once another member is added. My patch is a
> band-aid at best, but more work is needed here. One problem is that
> ovs_flow_cmd_new and ovs_flow_cmd_set have two copies of struct sw_flow_key on
> the stack, one of them nested within sw_flow_mask. If we could reduce that to
> a single instance, the stack usage would be cut in half here.

I think it should be possible to reduce those two functions to only
use a single instance of the structure.

For new flows, we're already allocating the key structure on the heap,
it seems like we could just translate the key into that rather than to
intermediate location on the stack. And when setting flows, I'm not
sure that we really need the mask at all - we don't do anything with
it other than check it against the actions but we really should be
using the original mask for that rather than the new one.

It seems like it would be preferable to go down that route rather than
this patch, since as you say, this patch is really only suppressing
the warning and doesn't have a significant impact on the actual stack
consumption. Plus, the ordering of the flow layout affects how much
data needs to be compared during packet lookup, so this change will
likely be bad for forwarding performance.


Re: [ovs-dev] [PATCH] ovs: do not allocate memory from offline numa node

2015-10-09 Thread Jesse Gross
On Fri, Oct 9, 2015 at 8:54 AM, Jarno Rajahalme  wrote:
>
> On Oct 8, 2015, at 4:03 PM, Jesse Gross  wrote:
>
> On Wed, Oct 7, 2015 at 10:47 AM, Jarno Rajahalme 
> wrote:
>
>
> On Oct 6, 2015, at 6:01 PM, Jesse Gross  wrote:
>
> On Mon, Oct 5, 2015 at 1:25 PM, Alexander Duyck
>  wrote:
>
> On 10/05/2015 06:59 AM, Vlastimil Babka wrote:
>
>
> On 10/02/2015 12:18 PM, Konstantin Khlebnikov wrote:
>
>
> When openvswitch tries allocate memory from offline numa node 0:
> stats = kmem_cache_alloc_node(flow_stats_cache, GFP_KERNEL | __GFP_ZERO,
> 0)
> It catches VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES || !node_online(nid))
> [ replaced with VM_WARN_ON(!node_online(nid)) recently ] in linux/gfp.h
> This patch disables numa affinity in this case.
>
> Signed-off-by: Konstantin Khlebnikov 
>
>
>
> ...
>
> diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
> index f2ea83ba4763..c7f74aab34b9 100644
> --- a/net/openvswitch/flow_table.c
> +++ b/net/openvswitch/flow_table.c
> @@ -93,7 +93,8 @@ struct sw_flow *ovs_flow_alloc(void)
>
> /* Initialize the default stat node. */
> stats = kmem_cache_alloc_node(flow_stats_cache,
> -  GFP_KERNEL | __GFP_ZERO, 0);
> +  GFP_KERNEL | __GFP_ZERO,
> +  node_online(0) ? 0 : NUMA_NO_NODE);
>
>
>
> Stupid question: can node 0 become offline between this check, and the
> VM_WARN_ON? :) BTW what kind of system has node 0 offline?
>
>
>
> Another question to ask would be is it possible for node 0 to be online, but
> be a memoryless node?
>
> I would say you are better off just making this call kmem_cache_alloc.  I
> don't see anything that indicates the memory has to come from node 0, so
> adding the extra overhead doesn't provide any value.
>
>
> I agree that this at least makes me wonder, though I actually have
> concerns in the opposite direction - I see assumptions about this
> being on node 0 in net/openvswitch/flow.c.
>
> Jarno, since you original wrote this code, can you take a look to see
> if everything still makes sense?
>
>
> We keep the pre-allocated stats node at array index 0, which is initially
> used by all CPUs, but if CPUs from multiple numa nodes start updating the
> stats, we allocate additional stats nodes (up to one per numa node), and the
> CPUs on node 0 keep using the preallocated entry. If stats cannot be
> allocated from CPUs local node, then those CPUs keep using the entry at
> index 0. Currently the code in net/openvswitch/flow.c will try to allocate
> the local memory repeatedly, which may not be optimal when there is no
> memory at the local node.
>
> Allocating the memory for the index 0 from other than node 0, as discussed
> here, just means that the CPUs on node 0 will keep on using non-local memory
> for stats. In a scenario where there are CPUs on two nodes (0, 1), but only
> the node 1 has memory, a shared flow entry will still end up having separate
> memory allocated for both nodes, but both of the nodes would be at node 1.
> However, there is still a high likelihood that the memory allocations would
> not share a cache line, which should prevent the nodes from invalidating
> each other’s caches. Based on this I do not see a problem relaxing the
> memory allocation for the default stats node. If node 0 has memory, however,
> it would be better to allocate the memory from node 0.
>
>
> Thanks for going through all of that.
>
> It seems like the question that is being raised is whether it actually
> makes sense to try to get the initial memory on node 0, especially
> since it seems to introduce some corner cases? Is there any reason why
> the flow is more likely to hit node 0 than a randomly chosen one?
> (Assuming that this is a multinode system, otherwise it's kind of a
> moot point.) We could have a separate pointer to the default allocated
> memory, so it wouldn't conflict with memory that was intentionally
> allocated for node 0.
>
>
> It would still be preferable to know from which node the default stats node
> was allocated, and store it in the appropriate pointer in the array. We
> could then add a new “default stats node index” that would be used to locate
> the node in the array of pointers we already have. That way we would avoid
> extra allocation and processing of the default stats node.

I agree, that sounds reasonable to me. Will you make that change?

Besides eliminating corner cases, it might help performance in some
cases too by avoiding stressing memory bandwidth on node 0.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ovs-dev] [PATCH] ovs: do not allocate memory from offline numa node

2015-10-09 Thread Jesse Gross
On Fri, Oct 9, 2015 at 8:54 AM, Jarno Rajahalme <jrajaha...@nicira.com> wrote:
>
> On Oct 8, 2015, at 4:03 PM, Jesse Gross <je...@nicira.com> wrote:
>
> On Wed, Oct 7, 2015 at 10:47 AM, Jarno Rajahalme <jrajaha...@nicira.com>
> wrote:
>
>
> On Oct 6, 2015, at 6:01 PM, Jesse Gross <je...@nicira.com> wrote:
>
> On Mon, Oct 5, 2015 at 1:25 PM, Alexander Duyck
> <alexander.du...@gmail.com> wrote:
>
> On 10/05/2015 06:59 AM, Vlastimil Babka wrote:
>
>
> On 10/02/2015 12:18 PM, Konstantin Khlebnikov wrote:
>
>
> When openvswitch tries allocate memory from offline numa node 0:
> stats = kmem_cache_alloc_node(flow_stats_cache, GFP_KERNEL | __GFP_ZERO,
> 0)
> It catches VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES || !node_online(nid))
> [ replaced with VM_WARN_ON(!node_online(nid)) recently ] in linux/gfp.h
> This patch disables numa affinity in this case.
>
> Signed-off-by: Konstantin Khlebnikov <khlebni...@yandex-team.ru>
>
>
>
> ...
>
> diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
> index f2ea83ba4763..c7f74aab34b9 100644
> --- a/net/openvswitch/flow_table.c
> +++ b/net/openvswitch/flow_table.c
> @@ -93,7 +93,8 @@ struct sw_flow *ovs_flow_alloc(void)
>
> /* Initialize the default stat node. */
> stats = kmem_cache_alloc_node(flow_stats_cache,
> -  GFP_KERNEL | __GFP_ZERO, 0);
> +  GFP_KERNEL | __GFP_ZERO,
> +  node_online(0) ? 0 : NUMA_NO_NODE);
>
>
>
> Stupid question: can node 0 become offline between this check, and the
> VM_WARN_ON? :) BTW what kind of system has node 0 offline?
>
>
>
> Another question to ask would be is it possible for node 0 to be online, but
> be a memoryless node?
>
> I would say you are better off just making this call kmem_cache_alloc.  I
> don't see anything that indicates the memory has to come from node 0, so
> adding the extra overhead doesn't provide any value.
>
>
> I agree that this at least makes me wonder, though I actually have
> concerns in the opposite direction - I see assumptions about this
> being on node 0 in net/openvswitch/flow.c.
>
> Jarno, since you original wrote this code, can you take a look to see
> if everything still makes sense?
>
>
> We keep the pre-allocated stats node at array index 0, which is initially
> used by all CPUs, but if CPUs from multiple numa nodes start updating the
> stats, we allocate additional stats nodes (up to one per numa node), and the
> CPUs on node 0 keep using the preallocated entry. If stats cannot be
> allocated from CPUs local node, then those CPUs keep using the entry at
> index 0. Currently the code in net/openvswitch/flow.c will try to allocate
> the local memory repeatedly, which may not be optimal when there is no
> memory at the local node.
>
> Allocating the memory for the index 0 from other than node 0, as discussed
> here, just means that the CPUs on node 0 will keep on using non-local memory
> for stats. In a scenario where there are CPUs on two nodes (0, 1), but only
> the node 1 has memory, a shared flow entry will still end up having separate
> memory allocated for both nodes, but both of the nodes would be at node 1.
> However, there is still a high likelihood that the memory allocations would
> not share a cache line, which should prevent the nodes from invalidating
> each other’s caches. Based on this I do not see a problem relaxing the
> memory allocation for the default stats node. If node 0 has memory, however,
> it would be better to allocate the memory from node 0.
>
>
> Thanks for going through all of that.
>
> It seems like the question that is being raised is whether it actually
> makes sense to try to get the initial memory on node 0, especially
> since it seems to introduce some corner cases? Is there any reason why
> the flow is more likely to hit node 0 than a randomly chosen one?
> (Assuming that this is a multinode system, otherwise it's kind of a
> moot point.) We could have a separate pointer to the default allocated
> memory, so it wouldn't conflict with memory that was intentionally
> allocated for node 0.
>
>
> It would still be preferable to know from which node the default stats node
> was allocated, and store it in the appropriate pointer in the array. We
> could then add a new “default stats node index” that would be used to locate
> the node in the array of pointers we already have. That way we would avoid
> extra allocation and processing of the default stats node.

I agree, that sounds reasonable to me. Will you make that change?

Besides eliminating corner cases, it might help performance in some
cases too by avoiding stressing memory bandwidth on node 0.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ovs-dev] [PATCH] ovs: do not allocate memory from offline numa node

2015-10-08 Thread Jesse Gross
On Wed, Oct 7, 2015 at 10:47 AM, Jarno Rajahalme  wrote:
>
>> On Oct 6, 2015, at 6:01 PM, Jesse Gross  wrote:
>>
>> On Mon, Oct 5, 2015 at 1:25 PM, Alexander Duyck
>>  wrote:
>>> On 10/05/2015 06:59 AM, Vlastimil Babka wrote:
>>>>
>>>> On 10/02/2015 12:18 PM, Konstantin Khlebnikov wrote:
>>>>>
>>>>> When openvswitch tries allocate memory from offline numa node 0:
>>>>> stats = kmem_cache_alloc_node(flow_stats_cache, GFP_KERNEL | __GFP_ZERO,
>>>>> 0)
>>>>> It catches VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES || !node_online(nid))
>>>>> [ replaced with VM_WARN_ON(!node_online(nid)) recently ] in linux/gfp.h
>>>>> This patch disables numa affinity in this case.
>>>>>
>>>>> Signed-off-by: Konstantin Khlebnikov 
>>>>
>>>>
>>>> ...
>>>>
>>>>> diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
>>>>> index f2ea83ba4763..c7f74aab34b9 100644
>>>>> --- a/net/openvswitch/flow_table.c
>>>>> +++ b/net/openvswitch/flow_table.c
>>>>> @@ -93,7 +93,8 @@ struct sw_flow *ovs_flow_alloc(void)
>>>>>
>>>>>  /* Initialize the default stat node. */
>>>>>  stats = kmem_cache_alloc_node(flow_stats_cache,
>>>>> -  GFP_KERNEL | __GFP_ZERO, 0);
>>>>> +  GFP_KERNEL | __GFP_ZERO,
>>>>> +  node_online(0) ? 0 : NUMA_NO_NODE);
>>>>
>>>>
>>>> Stupid question: can node 0 become offline between this check, and the
>>>> VM_WARN_ON? :) BTW what kind of system has node 0 offline?
>>>
>>>
>>> Another question to ask would be is it possible for node 0 to be online, but
>>> be a memoryless node?
>>>
>>> I would say you are better off just making this call kmem_cache_alloc.  I
>>> don't see anything that indicates the memory has to come from node 0, so
>>> adding the extra overhead doesn't provide any value.
>>
>> I agree that this at least makes me wonder, though I actually have
>> concerns in the opposite direction - I see assumptions about this
>> being on node 0 in net/openvswitch/flow.c.
>>
>> Jarno, since you original wrote this code, can you take a look to see
>> if everything still makes sense?
>
> We keep the pre-allocated stats node at array index 0, which is initially 
> used by all CPUs, but if CPUs from multiple numa nodes start updating the 
> stats, we allocate additional stats nodes (up to one per numa node), and the 
> CPUs on node 0 keep using the preallocated entry. If stats cannot be 
> allocated from CPUs local node, then those CPUs keep using the entry at index 
> 0. Currently the code in net/openvswitch/flow.c will try to allocate the 
> local memory repeatedly, which may not be optimal when there is no memory at 
> the local node.
>
> Allocating the memory for the index 0 from other than node 0, as discussed 
> here, just means that the CPUs on node 0 will keep on using non-local memory 
> for stats. In a scenario where there are CPUs on two nodes (0, 1), but only 
> the node 1 has memory, a shared flow entry will still end up having separate 
> memory allocated for both nodes, but both of the nodes would be at node 1. 
> However, there is still a high likelihood that the memory allocations would 
> not share a cache line, which should prevent the nodes from invalidating each 
> other’s caches. Based on this I do not see a problem relaxing the memory 
> allocation for the default stats node. If node 0 has memory, however, it 
> would be better to allocate the memory from node 0.

Thanks for going through all of that.

It seems like the question that is being raised is whether it actually
makes sense to try to get the initial memory on node 0, especially
since it seems to introduce some corner cases? Is there any reason why
the flow is more likely to hit node 0 than a randomly chosen one?
(Assuming that this is a multinode system, otherwise it's kind of a
moot point.) We could have a separate pointer to the default allocated
memory, so it wouldn't conflict with memory that was intentionally
allocated for node 0.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ovs-dev] [PATCH] ovs: do not allocate memory from offline numa node

2015-10-08 Thread Jesse Gross
On Wed, Oct 7, 2015 at 10:47 AM, Jarno Rajahalme <jrajaha...@nicira.com> wrote:
>
>> On Oct 6, 2015, at 6:01 PM, Jesse Gross <je...@nicira.com> wrote:
>>
>> On Mon, Oct 5, 2015 at 1:25 PM, Alexander Duyck
>> <alexander.du...@gmail.com> wrote:
>>> On 10/05/2015 06:59 AM, Vlastimil Babka wrote:
>>>>
>>>> On 10/02/2015 12:18 PM, Konstantin Khlebnikov wrote:
>>>>>
>>>>> When openvswitch tries allocate memory from offline numa node 0:
>>>>> stats = kmem_cache_alloc_node(flow_stats_cache, GFP_KERNEL | __GFP_ZERO,
>>>>> 0)
>>>>> It catches VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES || !node_online(nid))
>>>>> [ replaced with VM_WARN_ON(!node_online(nid)) recently ] in linux/gfp.h
>>>>> This patch disables numa affinity in this case.
>>>>>
>>>>> Signed-off-by: Konstantin Khlebnikov <khlebni...@yandex-team.ru>
>>>>
>>>>
>>>> ...
>>>>
>>>>> diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
>>>>> index f2ea83ba4763..c7f74aab34b9 100644
>>>>> --- a/net/openvswitch/flow_table.c
>>>>> +++ b/net/openvswitch/flow_table.c
>>>>> @@ -93,7 +93,8 @@ struct sw_flow *ovs_flow_alloc(void)
>>>>>
>>>>>  /* Initialize the default stat node. */
>>>>>  stats = kmem_cache_alloc_node(flow_stats_cache,
>>>>> -  GFP_KERNEL | __GFP_ZERO, 0);
>>>>> +  GFP_KERNEL | __GFP_ZERO,
>>>>> +  node_online(0) ? 0 : NUMA_NO_NODE);
>>>>
>>>>
>>>> Stupid question: can node 0 become offline between this check, and the
>>>> VM_WARN_ON? :) BTW what kind of system has node 0 offline?
>>>
>>>
>>> Another question to ask would be is it possible for node 0 to be online, but
>>> be a memoryless node?
>>>
>>> I would say you are better off just making this call kmem_cache_alloc.  I
>>> don't see anything that indicates the memory has to come from node 0, so
>>> adding the extra overhead doesn't provide any value.
>>
>> I agree that this at least makes me wonder, though I actually have
>> concerns in the opposite direction - I see assumptions about this
>> being on node 0 in net/openvswitch/flow.c.
>>
>> Jarno, since you original wrote this code, can you take a look to see
>> if everything still makes sense?
>
> We keep the pre-allocated stats node at array index 0, which is initially 
> used by all CPUs, but if CPUs from multiple numa nodes start updating the 
> stats, we allocate additional stats nodes (up to one per numa node), and the 
> CPUs on node 0 keep using the preallocated entry. If stats cannot be 
> allocated from CPUs local node, then those CPUs keep using the entry at index 
> 0. Currently the code in net/openvswitch/flow.c will try to allocate the 
> local memory repeatedly, which may not be optimal when there is no memory at 
> the local node.
>
> Allocating the memory for the index 0 from other than node 0, as discussed 
> here, just means that the CPUs on node 0 will keep on using non-local memory 
> for stats. In a scenario where there are CPUs on two nodes (0, 1), but only 
> the node 1 has memory, a shared flow entry will still end up having separate 
> memory allocated for both nodes, but both of the nodes would be at node 1. 
> However, there is still a high likelihood that the memory allocations would 
> not share a cache line, which should prevent the nodes from invalidating each 
> other’s caches. Based on this I do not see a problem relaxing the memory 
> allocation for the default stats node. If node 0 has memory, however, it 
> would be better to allocate the memory from node 0.

Thanks for going through all of that.

It seems like the question that is being raised is whether it actually
makes sense to try to get the initial memory on node 0, especially
since it seems to introduce some corner cases? Is there any reason why
the flow is more likely to hit node 0 than a randomly chosen one?
(Assuming that this is a multinode system, otherwise it's kind of a
moot point.) We could have a separate pointer to the default allocated
memory, so it wouldn't conflict with memory that was intentionally
allocated for node 0.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ovs-dev] [PATCH] ovs: do not allocate memory from offline numa node

2015-10-06 Thread Jesse Gross
On Mon, Oct 5, 2015 at 1:25 PM, Alexander Duyck
 wrote:
> On 10/05/2015 06:59 AM, Vlastimil Babka wrote:
>>
>> On 10/02/2015 12:18 PM, Konstantin Khlebnikov wrote:
>>>
>>> When openvswitch tries allocate memory from offline numa node 0:
>>> stats = kmem_cache_alloc_node(flow_stats_cache, GFP_KERNEL | __GFP_ZERO,
>>> 0)
>>> It catches VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES || !node_online(nid))
>>> [ replaced with VM_WARN_ON(!node_online(nid)) recently ] in linux/gfp.h
>>> This patch disables numa affinity in this case.
>>>
>>> Signed-off-by: Konstantin Khlebnikov 
>>
>>
>> ...
>>
>>> diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
>>> index f2ea83ba4763..c7f74aab34b9 100644
>>> --- a/net/openvswitch/flow_table.c
>>> +++ b/net/openvswitch/flow_table.c
>>> @@ -93,7 +93,8 @@ struct sw_flow *ovs_flow_alloc(void)
>>>
>>>   /* Initialize the default stat node. */
>>>   stats = kmem_cache_alloc_node(flow_stats_cache,
>>> -  GFP_KERNEL | __GFP_ZERO, 0);
>>> +  GFP_KERNEL | __GFP_ZERO,
>>> +  node_online(0) ? 0 : NUMA_NO_NODE);
>>
>>
>> Stupid question: can node 0 become offline between this check, and the
>> VM_WARN_ON? :) BTW what kind of system has node 0 offline?
>
>
> Another question to ask would be is it possible for node 0 to be online, but
> be a memoryless node?
>
> I would say you are better off just making this call kmem_cache_alloc.  I
> don't see anything that indicates the memory has to come from node 0, so
> adding the extra overhead doesn't provide any value.

I agree that this at least makes me wonder, though I actually have
concerns in the opposite direction - I see assumptions about this
being on node 0 in net/openvswitch/flow.c.

Jarno, since you original wrote this code, can you take a look to see
if everything still makes sense?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ovs-dev] [PATCH] ovs: do not allocate memory from offline numa node

2015-10-06 Thread Jesse Gross
On Mon, Oct 5, 2015 at 1:25 PM, Alexander Duyck
 wrote:
> On 10/05/2015 06:59 AM, Vlastimil Babka wrote:
>>
>> On 10/02/2015 12:18 PM, Konstantin Khlebnikov wrote:
>>>
>>> When openvswitch tries allocate memory from offline numa node 0:
>>> stats = kmem_cache_alloc_node(flow_stats_cache, GFP_KERNEL | __GFP_ZERO,
>>> 0)
>>> It catches VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES || !node_online(nid))
>>> [ replaced with VM_WARN_ON(!node_online(nid)) recently ] in linux/gfp.h
>>> This patch disables numa affinity in this case.
>>>
>>> Signed-off-by: Konstantin Khlebnikov 
>>
>>
>> ...
>>
>>> diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
>>> index f2ea83ba4763..c7f74aab34b9 100644
>>> --- a/net/openvswitch/flow_table.c
>>> +++ b/net/openvswitch/flow_table.c
>>> @@ -93,7 +93,8 @@ struct sw_flow *ovs_flow_alloc(void)
>>>
>>>   /* Initialize the default stat node. */
>>>   stats = kmem_cache_alloc_node(flow_stats_cache,
>>> -  GFP_KERNEL | __GFP_ZERO, 0);
>>> +  GFP_KERNEL | __GFP_ZERO,
>>> +  node_online(0) ? 0 : NUMA_NO_NODE);
>>
>>
>> Stupid question: can node 0 become offline between this check, and the
>> VM_WARN_ON? :) BTW what kind of system has node 0 offline?
>
>
> Another question to ask would be is it possible for node 0 to be online, but
> be a memoryless node?
>
> I would say you are better off just making this call kmem_cache_alloc.  I
> don't see anything that indicates the memory has to come from node 0, so
> adding the extra overhead doesn't provide any value.

I agree that this at least makes me wonder, though I actually have
concerns in the opposite direction - I see assumptions about this
being on node 0 in net/openvswitch/flow.c.

Jarno, since you original wrote this code, can you take a look to see
if everything still makes sense?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH net-next 1/9] openvswitch: Scrub packet in ovs_vport_receive()

2015-08-07 Thread Jesse Gross
On Tue, Aug 4, 2015 at 9:40 PM, Joe Stringer  wrote:
> On 1 August 2015 at 12:17, Thomas Graf  wrote:
>> On 07/31/15 at 10:51am, Joe Stringer wrote:
>>> On 31 July 2015 at 07:34, Hannes Frederic Sowa  wrote:
>>> > In general, this shouldn't be necessary as the packet should already be
>>> > scrubbed before they arrive here.
>>> >
>>> > Could you maybe add a WARN_ON and check how those skbs with conntrack
>>> > data traverse the stack? I also didn't understand why make it dependent
>>> > on the socket.
>>>
>>> OK, sure. One case I could think of is with an OVS internal port in
>>> another namespace, directly attached to the bridge. I'll have a play
>>> around with WARN_ON and see if I can come up with something more
>>> trimmed down.
>>
>> The OVS internal port will definitely pass through an unscrubbed
>> packet across namespaces. I think the proper thing to do would be
>> to scrub but conditionally keep metadata.
>
> It's only "unscrubbed" when receiving from local stack at the moment.
> Some pieces are cleared when handing towards the local stack, and
> there's no configuration for that behaviour. Presumably internal port
> transmit and receive should mirror each other?
>
> I don't have a specific use case either way. The remaining code for
> this series handles this case correctly, it's just a matter of what
> behaviour we're looking for. We could implement the flag as you say, I
> presume that userspace would need to specify this during vport
> creation and the default should work similar to the existing behaviour
> (ie, keep metadata). One thing that's not entirely clear to me is
> exactly which metadata should be represented by this flag and whether
> the single flag is expressive enough.

I would prefer not to have a flag as it seems unnecessarily
complicated (doubly so if we try to have multiple flags to express
different combinations). The use case for moving internal ports to
different namespaces is pretty narrow, so it seems like we can just
pick a set of metadata to keep and go with that. Mark seems the
primary one to me.

I also think that it would be better to use skb->dev to determine the
original namespace rather than the socket.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH net-next 1/9] openvswitch: Scrub packet in ovs_vport_receive()

2015-08-07 Thread Jesse Gross
On Tue, Aug 4, 2015 at 9:40 PM, Joe Stringer joestrin...@nicira.com wrote:
 On 1 August 2015 at 12:17, Thomas Graf tg...@suug.ch wrote:
 On 07/31/15 at 10:51am, Joe Stringer wrote:
 On 31 July 2015 at 07:34, Hannes Frederic Sowa han...@redhat.com wrote:
  In general, this shouldn't be necessary as the packet should already be
  scrubbed before they arrive here.
 
  Could you maybe add a WARN_ON and check how those skbs with conntrack
  data traverse the stack? I also didn't understand why make it dependent
  on the socket.

 OK, sure. One case I could think of is with an OVS internal port in
 another namespace, directly attached to the bridge. I'll have a play
 around with WARN_ON and see if I can come up with something more
 trimmed down.

 The OVS internal port will definitely pass through an unscrubbed
 packet across namespaces. I think the proper thing to do would be
 to scrub but conditionally keep metadata.

 It's only unscrubbed when receiving from local stack at the moment.
 Some pieces are cleared when handing towards the local stack, and
 there's no configuration for that behaviour. Presumably internal port
 transmit and receive should mirror each other?

 I don't have a specific use case either way. The remaining code for
 this series handles this case correctly, it's just a matter of what
 behaviour we're looking for. We could implement the flag as you say, I
 presume that userspace would need to specify this during vport
 creation and the default should work similar to the existing behaviour
 (ie, keep metadata). One thing that's not entirely clear to me is
 exactly which metadata should be represented by this flag and whether
 the single flag is expressive enough.

I would prefer not to have a flag as it seems unnecessarily
complicated (doubly so if we try to have multiple flags to express
different combinations). The use case for moving internal ports to
different namespaces is pretty narrow, so it seems like we can just
pick a set of metadata to keep and go with that. Mark seems the
primary one to me.

I also think that it would be better to use skb-dev to determine the
original namespace rather than the socket.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [linux-nics] [PATCHv4 net] i40e: Implement ndo_gso_check()

2015-01-14 Thread Jesse Gross
On Fri, Dec 26, 2014 at 3:58 PM, Jesse Gross  wrote:
> On Fri, Dec 5, 2014 at 2:12 PM, Jeff Kirsher
>  wrote:
>> On Fri, 2014-12-05 at 10:41 -0800, Joe Stringer wrote:
>>> ndo_gso_check() was recently introduced to allow NICs to report the
>>> offloading support that they have on a per-skb basis. Add an
>>> implementation for this driver which checks for IPIP, GRE, UDP
>>> tunnels.
>>>
>>> Signed-off-by: Joe Stringer 
>>> ---
>>> v4: Simplify the check to just do tunnel header length.
>>> Fix #define style issue.
>>> v3: Drop IPIP and GRE (no driver support even though hw supports it).
>>> Check for UDP outer protocol for UDP tunnels.
>>> v2: Expand to include IP in IP and IPv4/IPv6 inside GRE/UDP tunnels.
>>> Add MAX_INNER_LENGTH (as 80).
>>> ---
>>>  drivers/net/ethernet/intel/i40e/i40e_main.c |   12 
>>>  1 file changed, 12 insertions(+)
>>
>> Thanks Joe, I will update the patch in my queue with your latest
>> version.
>
> Jeff, as a heads-up, this patch and the corresponding one for fm10k
> will no longer apply now that the ndo has changed. This was changed by
> 5f35227e ("net: Generalize ndo_gso_check to ndo_features_check"). The
> update needed is trivial and is just due to the change in the function
> signature but I'm not sure where you are in the testing process with
> this.

Jeff - just wanted to check if you had seen this.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [linux-nics] [PATCHv4 net] i40e: Implement ndo_gso_check()

2015-01-14 Thread Jesse Gross
On Fri, Dec 26, 2014 at 3:58 PM, Jesse Gross je...@nicira.com wrote:
 On Fri, Dec 5, 2014 at 2:12 PM, Jeff Kirsher
 jeffrey.t.kirs...@intel.com wrote:
 On Fri, 2014-12-05 at 10:41 -0800, Joe Stringer wrote:
 ndo_gso_check() was recently introduced to allow NICs to report the
 offloading support that they have on a per-skb basis. Add an
 implementation for this driver which checks for IPIP, GRE, UDP
 tunnels.

 Signed-off-by: Joe Stringer joestrin...@nicira.com
 ---
 v4: Simplify the check to just do tunnel header length.
 Fix #define style issue.
 v3: Drop IPIP and GRE (no driver support even though hw supports it).
 Check for UDP outer protocol for UDP tunnels.
 v2: Expand to include IP in IP and IPv4/IPv6 inside GRE/UDP tunnels.
 Add MAX_INNER_LENGTH (as 80).
 ---
  drivers/net/ethernet/intel/i40e/i40e_main.c |   12 
  1 file changed, 12 insertions(+)

 Thanks Joe, I will update the patch in my queue with your latest
 version.

 Jeff, as a heads-up, this patch and the corresponding one for fm10k
 will no longer apply now that the ndo has changed. This was changed by
 5f35227e (net: Generalize ndo_gso_check to ndo_features_check). The
 update needed is trivial and is just due to the change in the function
 signature but I'm not sure where you are in the testing process with
 this.

Jeff - just wanted to check if you had seen this.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [linux-nics] [PATCHv4 net] i40e: Implement ndo_gso_check()

2014-12-26 Thread Jesse Gross
On Fri, Dec 5, 2014 at 2:12 PM, Jeff Kirsher
 wrote:
> On Fri, 2014-12-05 at 10:41 -0800, Joe Stringer wrote:
>> ndo_gso_check() was recently introduced to allow NICs to report the
>> offloading support that they have on a per-skb basis. Add an
>> implementation for this driver which checks for IPIP, GRE, UDP
>> tunnels.
>>
>> Signed-off-by: Joe Stringer 
>> ---
>> v4: Simplify the check to just do tunnel header length.
>> Fix #define style issue.
>> v3: Drop IPIP and GRE (no driver support even though hw supports it).
>> Check for UDP outer protocol for UDP tunnels.
>> v2: Expand to include IP in IP and IPv4/IPv6 inside GRE/UDP tunnels.
>> Add MAX_INNER_LENGTH (as 80).
>> ---
>>  drivers/net/ethernet/intel/i40e/i40e_main.c |   12 
>>  1 file changed, 12 insertions(+)
>
> Thanks Joe, I will update the patch in my queue with your latest
> version.

Jeff, as a heads-up, this patch and the corresponding one for fm10k
will no longer apply now that the ndo has changed. This was changed by
5f35227e ("net: Generalize ndo_gso_check to ndo_features_check"). The
update needed is trivial and is just due to the change in the function
signature but I'm not sure where you are in the testing process with
this.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [linux-nics] [PATCHv4 net] i40e: Implement ndo_gso_check()

2014-12-26 Thread Jesse Gross
On Fri, Dec 5, 2014 at 2:12 PM, Jeff Kirsher
jeffrey.t.kirs...@intel.com wrote:
 On Fri, 2014-12-05 at 10:41 -0800, Joe Stringer wrote:
 ndo_gso_check() was recently introduced to allow NICs to report the
 offloading support that they have on a per-skb basis. Add an
 implementation for this driver which checks for IPIP, GRE, UDP
 tunnels.

 Signed-off-by: Joe Stringer joestrin...@nicira.com
 ---
 v4: Simplify the check to just do tunnel header length.
 Fix #define style issue.
 v3: Drop IPIP and GRE (no driver support even though hw supports it).
 Check for UDP outer protocol for UDP tunnels.
 v2: Expand to include IP in IP and IPv4/IPv6 inside GRE/UDP tunnels.
 Add MAX_INNER_LENGTH (as 80).
 ---
  drivers/net/ethernet/intel/i40e/i40e_main.c |   12 
  1 file changed, 12 insertions(+)

 Thanks Joe, I will update the patch in my queue with your latest
 version.

Jeff, as a heads-up, this patch and the corresponding one for fm10k
will no longer apply now that the ndo has changed. This was changed by
5f35227e (net: Generalize ndo_gso_check to ndo_features_check). The
update needed is trivial and is just due to the change in the function
signature but I'm not sure where you are in the testing process with
this.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv2 net] i40e: Implement ndo_gso_check()

2014-12-05 Thread Jesse Gross
On Tue, Dec 2, 2014 at 10:26 AM, Jesse Gross  wrote:
> On Mon, Dec 1, 2014 at 4:09 PM, Tom Herbert  wrote:
>> On Mon, Dec 1, 2014 at 3:53 PM, Jesse Gross  wrote:
>>> On Mon, Dec 1, 2014 at 3:47 PM, Tom Herbert  wrote:
>>>> On Mon, Dec 1, 2014 at 3:35 PM, Joe Stringer  
>>>> wrote:
>>>>> On 21 November 2014 at 09:59, Joe Stringer  wrote:
>>>>>> On 20 November 2014 16:19, Jesse Gross  wrote:
>>>>>>> I don't know if we need to have the check at all for IPIP though -
>>>>>>> after all the driver doesn't expose support for it all (actually it
>>>>>>> doesn't expose GRE either). This raises kind of an interesting
>>>>>>> question about the checks though - it's pretty easy to add support to
>>>>>>> the driver for a new GSO type (and I imagine that people will be
>>>>>>> adding GRE soon) and forget to update the check.
>>>>>>
>>>>>> If the check is more conservative, then testing would show that it's
>>>>>> not working and lead people to figure out why (and update the check).
>>>>>
>>>>> More concretely, one suggestion would be something like following at
>>>>> the start of each gso_check():
>>>>>
>>>>> +   const int supported = SKB_GSO_TCPV4 | SKB_GSO_TCPV6 | 
>>>>> SKB_GSO_FCOE |
>>>>> + SKB_GSO_UDP | SKB_GSO_UDP_TUNNEL;
>>>>> +
>>>>> +   if (skb_shinfo(skb)->gso_type & ~supported)
>>>>> +   return false;
>>>>
>>>> This should already be handled by net_gso_ok.
>>>
>>> My original point wasn't so much that this isn't handled at the moment
>>> but that it's easy to add a supported GSO type but then forget to
>>> update this check - i.e. if a driver already supports UDP_TUNNEL and
>>> adds support for GRE with the same constraints. It seems not entirely
>>> ideal that this function is acting as a blacklist rather than a
>>> whitelist.
>>
>> Agreed, it would be nice to have all the checking logic in one place.
>> If all the drivers end up implementing ndo_gso_check then we could
>> potentially get rid of the GSO types as features. This probably
>> wouldn't be a bad thing since we already know that the features
>> mechanism doesn't scale (for instance there's no way to indicate that
>> certain combinations of GSO types are supported by a device).
>
> This crossed my mind and I agree that it's pretty clear that the
> features mechanism isn't scaling very well. Presumably, the logical
> extension of this is that each driver would have a function that looks
> at a packet and returns a set of offload operations that it can
> support rather than exposing a set of protocols. However, it seems
> like it would probably result in a bunch of duplicate code in each
> driver.

I think a possible middleground here is to convert ndo_gso_check() to
ndo_features_check(). This would behave similarly to
netif_skb_features() and give drivers an opportunity to knock out
features for a given packet. This would allow us to avoid duplicate
code in the immediate case of tunnels where we need to handle both GSO
and checksums and potentially enable wider usage in the future if it
makes sense.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv2 net] i40e: Implement ndo_gso_check()

2014-12-05 Thread Jesse Gross
On Tue, Dec 2, 2014 at 10:26 AM, Jesse Gross je...@nicira.com wrote:
 On Mon, Dec 1, 2014 at 4:09 PM, Tom Herbert therb...@google.com wrote:
 On Mon, Dec 1, 2014 at 3:53 PM, Jesse Gross je...@nicira.com wrote:
 On Mon, Dec 1, 2014 at 3:47 PM, Tom Herbert therb...@google.com wrote:
 On Mon, Dec 1, 2014 at 3:35 PM, Joe Stringer joestrin...@nicira.com 
 wrote:
 On 21 November 2014 at 09:59, Joe Stringer joestrin...@nicira.com wrote:
 On 20 November 2014 16:19, Jesse Gross je...@nicira.com wrote:
 I don't know if we need to have the check at all for IPIP though -
 after all the driver doesn't expose support for it all (actually it
 doesn't expose GRE either). This raises kind of an interesting
 question about the checks though - it's pretty easy to add support to
 the driver for a new GSO type (and I imagine that people will be
 adding GRE soon) and forget to update the check.

 If the check is more conservative, then testing would show that it's
 not working and lead people to figure out why (and update the check).

 More concretely, one suggestion would be something like following at
 the start of each gso_check():

 +   const int supported = SKB_GSO_TCPV4 | SKB_GSO_TCPV6 | 
 SKB_GSO_FCOE |
 + SKB_GSO_UDP | SKB_GSO_UDP_TUNNEL;
 +
 +   if (skb_shinfo(skb)-gso_type  ~supported)
 +   return false;

 This should already be handled by net_gso_ok.

 My original point wasn't so much that this isn't handled at the moment
 but that it's easy to add a supported GSO type but then forget to
 update this check - i.e. if a driver already supports UDP_TUNNEL and
 adds support for GRE with the same constraints. It seems not entirely
 ideal that this function is acting as a blacklist rather than a
 whitelist.

 Agreed, it would be nice to have all the checking logic in one place.
 If all the drivers end up implementing ndo_gso_check then we could
 potentially get rid of the GSO types as features. This probably
 wouldn't be a bad thing since we already know that the features
 mechanism doesn't scale (for instance there's no way to indicate that
 certain combinations of GSO types are supported by a device).

 This crossed my mind and I agree that it's pretty clear that the
 features mechanism isn't scaling very well. Presumably, the logical
 extension of this is that each driver would have a function that looks
 at a packet and returns a set of offload operations that it can
 support rather than exposing a set of protocols. However, it seems
 like it would probably result in a bunch of duplicate code in each
 driver.

I think a possible middleground here is to convert ndo_gso_check() to
ndo_features_check(). This would behave similarly to
netif_skb_features() and give drivers an opportunity to knock out
features for a given packet. This would allow us to avoid duplicate
code in the immediate case of tunnels where we need to handle both GSO
and checksums and potentially enable wider usage in the future if it
makes sense.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv2 net] i40e: Implement ndo_gso_check()

2014-12-02 Thread Jesse Gross
On Mon, Dec 1, 2014 at 4:09 PM, Tom Herbert  wrote:
> On Mon, Dec 1, 2014 at 3:53 PM, Jesse Gross  wrote:
>> On Mon, Dec 1, 2014 at 3:47 PM, Tom Herbert  wrote:
>>> On Mon, Dec 1, 2014 at 3:35 PM, Joe Stringer  wrote:
>>>> On 21 November 2014 at 09:59, Joe Stringer  wrote:
>>>>> On 20 November 2014 16:19, Jesse Gross  wrote:
>>>>>> I don't know if we need to have the check at all for IPIP though -
>>>>>> after all the driver doesn't expose support for it all (actually it
>>>>>> doesn't expose GRE either). This raises kind of an interesting
>>>>>> question about the checks though - it's pretty easy to add support to
>>>>>> the driver for a new GSO type (and I imagine that people will be
>>>>>> adding GRE soon) and forget to update the check.
>>>>>
>>>>> If the check is more conservative, then testing would show that it's
>>>>> not working and lead people to figure out why (and update the check).
>>>>
>>>> More concretely, one suggestion would be something like following at
>>>> the start of each gso_check():
>>>>
>>>> +   const int supported = SKB_GSO_TCPV4 | SKB_GSO_TCPV6 | SKB_GSO_FCOE 
>>>> |
>>>> + SKB_GSO_UDP | SKB_GSO_UDP_TUNNEL;
>>>> +
>>>> +   if (skb_shinfo(skb)->gso_type & ~supported)
>>>> +   return false;
>>>
>>> This should already be handled by net_gso_ok.
>>
>> My original point wasn't so much that this isn't handled at the moment
>> but that it's easy to add a supported GSO type but then forget to
>> update this check - i.e. if a driver already supports UDP_TUNNEL and
>> adds support for GRE with the same constraints. It seems not entirely
>> ideal that this function is acting as a blacklist rather than a
>> whitelist.
>
> Agreed, it would be nice to have all the checking logic in one place.
> If all the drivers end up implementing ndo_gso_check then we could
> potentially get rid of the GSO types as features. This probably
> wouldn't be a bad thing since we already know that the features
> mechanism doesn't scale (for instance there's no way to indicate that
> certain combinations of GSO types are supported by a device).

This crossed my mind and I agree that it's pretty clear that the
features mechanism isn't scaling very well. Presumably, the logical
extension of this is that each driver would have a function that looks
at a packet and returns a set of offload operations that it can
support rather than exposing a set of protocols. However, it seems
like it would probably result in a bunch of duplicate code in each
driver.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv2 net] i40e: Implement ndo_gso_check()

2014-12-02 Thread Jesse Gross
On Mon, Dec 1, 2014 at 4:09 PM, Tom Herbert therb...@google.com wrote:
 On Mon, Dec 1, 2014 at 3:53 PM, Jesse Gross je...@nicira.com wrote:
 On Mon, Dec 1, 2014 at 3:47 PM, Tom Herbert therb...@google.com wrote:
 On Mon, Dec 1, 2014 at 3:35 PM, Joe Stringer joestrin...@nicira.com wrote:
 On 21 November 2014 at 09:59, Joe Stringer joestrin...@nicira.com wrote:
 On 20 November 2014 16:19, Jesse Gross je...@nicira.com wrote:
 I don't know if we need to have the check at all for IPIP though -
 after all the driver doesn't expose support for it all (actually it
 doesn't expose GRE either). This raises kind of an interesting
 question about the checks though - it's pretty easy to add support to
 the driver for a new GSO type (and I imagine that people will be
 adding GRE soon) and forget to update the check.

 If the check is more conservative, then testing would show that it's
 not working and lead people to figure out why (and update the check).

 More concretely, one suggestion would be something like following at
 the start of each gso_check():

 +   const int supported = SKB_GSO_TCPV4 | SKB_GSO_TCPV6 | SKB_GSO_FCOE 
 |
 + SKB_GSO_UDP | SKB_GSO_UDP_TUNNEL;
 +
 +   if (skb_shinfo(skb)-gso_type  ~supported)
 +   return false;

 This should already be handled by net_gso_ok.

 My original point wasn't so much that this isn't handled at the moment
 but that it's easy to add a supported GSO type but then forget to
 update this check - i.e. if a driver already supports UDP_TUNNEL and
 adds support for GRE with the same constraints. It seems not entirely
 ideal that this function is acting as a blacklist rather than a
 whitelist.

 Agreed, it would be nice to have all the checking logic in one place.
 If all the drivers end up implementing ndo_gso_check then we could
 potentially get rid of the GSO types as features. This probably
 wouldn't be a bad thing since we already know that the features
 mechanism doesn't scale (for instance there's no way to indicate that
 certain combinations of GSO types are supported by a device).

This crossed my mind and I agree that it's pretty clear that the
features mechanism isn't scaling very well. Presumably, the logical
extension of this is that each driver would have a function that looks
at a packet and returns a set of offload operations that it can
support rather than exposing a set of protocols. However, it seems
like it would probably result in a bunch of duplicate code in each
driver.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv2 net] i40e: Implement ndo_gso_check()

2014-12-01 Thread Jesse Gross
On Mon, Dec 1, 2014 at 3:47 PM, Tom Herbert  wrote:
> On Mon, Dec 1, 2014 at 3:35 PM, Joe Stringer  wrote:
>> On 21 November 2014 at 09:59, Joe Stringer  wrote:
>>> On 20 November 2014 16:19, Jesse Gross  wrote:
>>>> I don't know if we need to have the check at all for IPIP though -
>>>> after all the driver doesn't expose support for it all (actually it
>>>> doesn't expose GRE either). This raises kind of an interesting
>>>> question about the checks though - it's pretty easy to add support to
>>>> the driver for a new GSO type (and I imagine that people will be
>>>> adding GRE soon) and forget to update the check.
>>>
>>> If the check is more conservative, then testing would show that it's
>>> not working and lead people to figure out why (and update the check).
>>
>> More concretely, one suggestion would be something like following at
>> the start of each gso_check():
>>
>> +   const int supported = SKB_GSO_TCPV4 | SKB_GSO_TCPV6 | SKB_GSO_FCOE |
>> + SKB_GSO_UDP | SKB_GSO_UDP_TUNNEL;
>> +
>> +   if (skb_shinfo(skb)->gso_type & ~supported)
>> +   return false;
>
> This should already be handled by net_gso_ok.

My original point wasn't so much that this isn't handled at the moment
but that it's easy to add a supported GSO type but then forget to
update this check - i.e. if a driver already supports UDP_TUNNEL and
adds support for GRE with the same constraints. It seems not entirely
ideal that this function is acting as a blacklist rather than a
whitelist.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv2 net] i40e: Implement ndo_gso_check()

2014-12-01 Thread Jesse Gross
On Mon, Dec 1, 2014 at 3:47 PM, Tom Herbert therb...@google.com wrote:
 On Mon, Dec 1, 2014 at 3:35 PM, Joe Stringer joestrin...@nicira.com wrote:
 On 21 November 2014 at 09:59, Joe Stringer joestrin...@nicira.com wrote:
 On 20 November 2014 16:19, Jesse Gross je...@nicira.com wrote:
 I don't know if we need to have the check at all for IPIP though -
 after all the driver doesn't expose support for it all (actually it
 doesn't expose GRE either). This raises kind of an interesting
 question about the checks though - it's pretty easy to add support to
 the driver for a new GSO type (and I imagine that people will be
 adding GRE soon) and forget to update the check.

 If the check is more conservative, then testing would show that it's
 not working and lead people to figure out why (and update the check).

 More concretely, one suggestion would be something like following at
 the start of each gso_check():

 +   const int supported = SKB_GSO_TCPV4 | SKB_GSO_TCPV6 | SKB_GSO_FCOE |
 + SKB_GSO_UDP | SKB_GSO_UDP_TUNNEL;
 +
 +   if (skb_shinfo(skb)-gso_type  ~supported)
 +   return false;

 This should already be handled by net_gso_ok.

My original point wasn't so much that this isn't handled at the moment
but that it's easy to add a supported GSO type but then forget to
update this check - i.e. if a driver already supports UDP_TUNNEL and
adds support for GRE with the same constraints. It seems not entirely
ideal that this function is acting as a blacklist rather than a
whitelist.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv2 net] i40e: Implement ndo_gso_check()

2014-11-20 Thread Jesse Gross
On Thu, Nov 20, 2014 at 3:11 PM, Joe Stringer  wrote:
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c 
> b/drivers/net/ethernet/intel/i40e/i40e_main.c
> index c3a7f4a..2b01c8d 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_main.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
> @@ -7447,6 +7447,36 @@ static int i40e_ndo_fdb_dump(struct sk_buff *skb,
>
>  #endif /* USE_DEFAULT_FDB_DEL_DUMP */
>  #endif /* HAVE_FDB_OPS */
> +static bool i40e_gso_check(struct sk_buff *skb, struct net_device *dev)
> +{
> +   if ((skb_shinfo(skb)->gso_type & SKB_GSO_IPIP) &&
> +   (skb->inner_protocol_type != ENCAP_TYPE_IPPROTO ||
> +skb->inner_protocol != htons(IPPROTO_IPIP))) {

I think this check on inner_protocol isn't really semantically correct
- that field is a union with inner_ipproto, so it would be more
correct to check against that and then you wouldn't need to use htons
either.

I don't know if we need to have the check at all for IPIP though -
after all the driver doesn't expose support for it all (actually it
doesn't expose GRE either). This raises kind of an interesting
question about the checks though - it's pretty easy to add support to
the driver for a new GSO type (and I imagine that people will be
adding GRE soon) and forget to update the check.

> +   if (skb_shinfo(skb)->gso_type & (SKB_GSO_GRE | SKB_GSO_UDP_TUNNEL)) {
> +   unsigned char *ihdr;
> +
> +   if (skb->inner_protocol_type != ENCAP_TYPE_ETHER)
> +   return false;

I guess if you want to be nice, it might be good to check the
equivalent IP protocol types as well.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH net 2/5] i40e: Implement ndo_gso_check()

2014-11-20 Thread Jesse Gross
On Thu, Nov 20, 2014 at 11:16 AM, Joe Stringer  wrote:
> On Tuesday, November 04, 2014 15:45:22 Jesse Gross wrote:
>> On Tue, Nov 4, 2014 at 1:56 PM, Joe Stringer  wrote:
>> > diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c
>> > b/drivers/net/ethernet/intel/i40e/i40e_main.c index c3a7f4a..21829b5
>> > 100644
>> > --- a/drivers/net/ethernet/intel/i40e/i40e_main.c
>> > +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
>> > +static bool i40e_gso_check(struct sk_buff *skb, struct net_device *dev)
>> > +{
>> > +   if ((skb_shinfo(skb)->gso_type & SKB_GSO_UDP_TUNNEL) &&
>> > +   (skb->inner_protocol_type != ENCAP_TYPE_ETHER ||
>> > +skb->inner_protocol != htons(ETH_P_TEB) ||
>> > +skb_inner_mac_header(skb) - skb_transport_header(skb) > 64))
>> > +   return false;
>>
>> I think it may be possible to even support a few more things here.
>> According to the datasheet here:
>> http://www.intel.com/content/dam/www/public/us/en/documents/datasheets/xl71
>> 0-10-40-controller-datasheet.pdf
>>
>> This can actually support 64 bytes beyond the tunnel header, which
>> would make for a total of 80 bytes. It looks like it can also support
>> IPv4 or IPv6 beyond just Ethernet as the encapsulated protocol.
>>
>> Intel guys, can you confirm that this is correct?
>
> I'm just respinning this for v4/6 beyond GRE/UDP tunnel and IPIP, and I found
> the description of max protocol parsing size of 480B (with individual header
> limit of 255B). I couldn't find where you get this 64/80 number or which
> headers it maps to. Could you (or one of the intel guys) expand on this?

The number that I gave was from the section on Geneve support (on page
708), which says that it can support up to 64 bytes of options (this
was also my understanding from previous conversations with Intel
guys). I searched for 480 byte limit and it seems like it for receive
instead of transmit, which could conceivably be different.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH net 2/5] i40e: Implement ndo_gso_check()

2014-11-20 Thread Jesse Gross
On Thu, Nov 20, 2014 at 11:16 AM, Joe Stringer joestrin...@nicira.com wrote:
 On Tuesday, November 04, 2014 15:45:22 Jesse Gross wrote:
 On Tue, Nov 4, 2014 at 1:56 PM, Joe Stringer joestrin...@nicira.com wrote:
  diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c
  b/drivers/net/ethernet/intel/i40e/i40e_main.c index c3a7f4a..21829b5
  100644
  --- a/drivers/net/ethernet/intel/i40e/i40e_main.c
  +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
  +static bool i40e_gso_check(struct sk_buff *skb, struct net_device *dev)
  +{
  +   if ((skb_shinfo(skb)-gso_type  SKB_GSO_UDP_TUNNEL) 
  +   (skb-inner_protocol_type != ENCAP_TYPE_ETHER ||
  +skb-inner_protocol != htons(ETH_P_TEB) ||
  +skb_inner_mac_header(skb) - skb_transport_header(skb)  64))
  +   return false;

 I think it may be possible to even support a few more things here.
 According to the datasheet here:
 http://www.intel.com/content/dam/www/public/us/en/documents/datasheets/xl71
 0-10-40-controller-datasheet.pdf

 This can actually support 64 bytes beyond the tunnel header, which
 would make for a total of 80 bytes. It looks like it can also support
 IPv4 or IPv6 beyond just Ethernet as the encapsulated protocol.

 Intel guys, can you confirm that this is correct?

 I'm just respinning this for v4/6 beyond GRE/UDP tunnel and IPIP, and I found
 the description of max protocol parsing size of 480B (with individual header
 limit of 255B). I couldn't find where you get this 64/80 number or which
 headers it maps to. Could you (or one of the intel guys) expand on this?

The number that I gave was from the section on Geneve support (on page
708), which says that it can support up to 64 bytes of options (this
was also my understanding from previous conversations with Intel
guys). I searched for 480 byte limit and it seems like it for receive
instead of transmit, which could conceivably be different.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv2 net] i40e: Implement ndo_gso_check()

2014-11-20 Thread Jesse Gross
On Thu, Nov 20, 2014 at 3:11 PM, Joe Stringer joestrin...@nicira.com wrote:
 diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c 
 b/drivers/net/ethernet/intel/i40e/i40e_main.c
 index c3a7f4a..2b01c8d 100644
 --- a/drivers/net/ethernet/intel/i40e/i40e_main.c
 +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
 @@ -7447,6 +7447,36 @@ static int i40e_ndo_fdb_dump(struct sk_buff *skb,

  #endif /* USE_DEFAULT_FDB_DEL_DUMP */
  #endif /* HAVE_FDB_OPS */
 +static bool i40e_gso_check(struct sk_buff *skb, struct net_device *dev)
 +{
 +   if ((skb_shinfo(skb)-gso_type  SKB_GSO_IPIP) 
 +   (skb-inner_protocol_type != ENCAP_TYPE_IPPROTO ||
 +skb-inner_protocol != htons(IPPROTO_IPIP))) {

I think this check on inner_protocol isn't really semantically correct
- that field is a union with inner_ipproto, so it would be more
correct to check against that and then you wouldn't need to use htons
either.

I don't know if we need to have the check at all for IPIP though -
after all the driver doesn't expose support for it all (actually it
doesn't expose GRE either). This raises kind of an interesting
question about the checks though - it's pretty easy to add support to
the driver for a new GSO type (and I imagine that people will be
adding GRE soon) and forget to update the check.

 +   if (skb_shinfo(skb)-gso_type  (SKB_GSO_GRE | SKB_GSO_UDP_TUNNEL)) {
 +   unsigned char *ihdr;
 +
 +   if (skb-inner_protocol_type != ENCAP_TYPE_ETHER)
 +   return false;

I guess if you want to be nice, it might be good to check the
equivalent IP protocol types as well.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH net 0/5] Implement ndo_gso_check() for vxlan nics

2014-11-06 Thread Jesse Gross
On Thu, Nov 6, 2014 at 8:06 AM, Tom Herbert  wrote:
> On Wed, Nov 5, 2014 at 10:16 PM, Sathya Perla  wrote:
>>> -Original Message-
>>> From: Tom Herbert [mailto:therb...@google.com]
>>>
>>> On Wed, Nov 5, 2014 at 6:15 PM, David Miller 
>>> wrote:
>>> > From: Joe Stringer 
>>> > Date: Wed, 5 Nov 2014 17:06:46 -0800
>>> >
>>> >> My impression was that the changes are more likely to be
>>> >> hardware-specific (like the i40e changes) rather than software-specific,
>>> >> like changes that might be integrated into the helper.
>>> >
>>> > I think there is more commonality amongst hardware capabilities,
>>> > and this is why I want the helper to play itself out.
>>> >
>>> >> That said, I can rework for one helper. The way I see it would be the
>>> >> same code as these patches, as "vxlan_gso_check(struct sk_buff *)" in
>>> >> drivers/net/vxlan.c which would be called from each driver. Is that what
>>> >> you had in mind?
>>> >
>>> > Yes.
>>>
>>> Note that this code is not VXLAN specific, it will also accept NVGRE
>>> and GRE/UDP with keyid and TEB. I imagine all these cases should be
>>> indistinguishable to the hardware so they probably just work (which
>>> would be cool!). It might be better to name and locate the helper
>>> function to reflect that.
>>
>> Tom, I'm confused as to how the value of (skb_inner_mac_header(skb) - 
>> skb_transport_header(skb))
>> would be the same for VxLAN and NVGRE encapsulated packets. Wouldn't this 
>> value be 16 for VxLAN
>> and 8 for NVGRE?
>>
> The inner headers are reset in iptunnel_handle_offloads. This is
> called in the xmit encapsulation functions (GRE, fou, VXLAN, etc.)
> before adding in encapsulation headers (skb_push), so the
> mac_inner_header will point to the encapsulation payload, i.e. the
> encapsulated packet. This should not change after being set, although
> inner network and inner transport can. The headers are only set on the
> first encapsulation, so with nested tunnels the inner headers point to
> the innermost encapsulated packet. Since VXLAN and NVGRE have same
> size of encapsulation (8 UDP + 8 header),   skb_inner_mac_header(skb)
> - skb_transport_header(skb) should always be 16.

Tom, NVGRE is not encapsulated in UDP and it is not 16 bytes.

http://tools.ietf.org/html/draft-sridharan-virtualization-nvgre-06
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH net 0/5] Implement ndo_gso_check() for vxlan nics

2014-11-06 Thread Jesse Gross
On Thu, Nov 6, 2014 at 8:06 AM, Tom Herbert therb...@google.com wrote:
 On Wed, Nov 5, 2014 at 10:16 PM, Sathya Perla sathya.pe...@emulex.com wrote:
 -Original Message-
 From: Tom Herbert [mailto:therb...@google.com]

 On Wed, Nov 5, 2014 at 6:15 PM, David Miller da...@davemloft.net
 wrote:
  From: Joe Stringer joestrin...@nicira.com
  Date: Wed, 5 Nov 2014 17:06:46 -0800
 
  My impression was that the changes are more likely to be
  hardware-specific (like the i40e changes) rather than software-specific,
  like changes that might be integrated into the helper.
 
  I think there is more commonality amongst hardware capabilities,
  and this is why I want the helper to play itself out.
 
  That said, I can rework for one helper. The way I see it would be the
  same code as these patches, as vxlan_gso_check(struct sk_buff *) in
  drivers/net/vxlan.c which would be called from each driver. Is that what
  you had in mind?
 
  Yes.

 Note that this code is not VXLAN specific, it will also accept NVGRE
 and GRE/UDP with keyid and TEB. I imagine all these cases should be
 indistinguishable to the hardware so they probably just work (which
 would be cool!). It might be better to name and locate the helper
 function to reflect that.

 Tom, I'm confused as to how the value of (skb_inner_mac_header(skb) - 
 skb_transport_header(skb))
 would be the same for VxLAN and NVGRE encapsulated packets. Wouldn't this 
 value be 16 for VxLAN
 and 8 for NVGRE?

 The inner headers are reset in iptunnel_handle_offloads. This is
 called in the xmit encapsulation functions (GRE, fou, VXLAN, etc.)
 before adding in encapsulation headers (skb_push), so the
 mac_inner_header will point to the encapsulation payload, i.e. the
 encapsulated packet. This should not change after being set, although
 inner network and inner transport can. The headers are only set on the
 first encapsulation, so with nested tunnels the inner headers point to
 the innermost encapsulated packet. Since VXLAN and NVGRE have same
 size of encapsulation (8 UDP + 8 header),   skb_inner_mac_header(skb)
 - skb_transport_header(skb) should always be 16.

Tom, NVGRE is not encapsulated in UDP and it is not 16 bytes.

http://tools.ietf.org/html/draft-sridharan-virtualization-nvgre-06
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH net 2/5] i40e: Implement ndo_gso_check()

2014-11-04 Thread Jesse Gross
On Tue, Nov 4, 2014 at 1:56 PM, Joe Stringer  wrote:
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c 
> b/drivers/net/ethernet/intel/i40e/i40e_main.c
> index c3a7f4a..21829b5 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_main.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
> +static bool i40e_gso_check(struct sk_buff *skb, struct net_device *dev)
> +{
> +   if ((skb_shinfo(skb)->gso_type & SKB_GSO_UDP_TUNNEL) &&
> +   (skb->inner_protocol_type != ENCAP_TYPE_ETHER ||
> +skb->inner_protocol != htons(ETH_P_TEB) ||
> +skb_inner_mac_header(skb) - skb_transport_header(skb) > 64))
> +   return false;

I think it may be possible to even support a few more things here.
According to the datasheet here:
http://www.intel.com/content/dam/www/public/us/en/documents/datasheets/xl710-10-40-controller-datasheet.pdf

This can actually support 64 bytes beyond the tunnel header, which
would make for a total of 80 bytes. It looks like it can also support
IPv4 or IPv6 beyond just Ethernet as the encapsulated protocol.

Intel guys, can you confirm that this is correct?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH net 2/5] i40e: Implement ndo_gso_check()

2014-11-04 Thread Jesse Gross
On Tue, Nov 4, 2014 at 1:56 PM, Joe Stringer joestrin...@nicira.com wrote:
 diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c 
 b/drivers/net/ethernet/intel/i40e/i40e_main.c
 index c3a7f4a..21829b5 100644
 --- a/drivers/net/ethernet/intel/i40e/i40e_main.c
 +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
 +static bool i40e_gso_check(struct sk_buff *skb, struct net_device *dev)
 +{
 +   if ((skb_shinfo(skb)-gso_type  SKB_GSO_UDP_TUNNEL) 
 +   (skb-inner_protocol_type != ENCAP_TYPE_ETHER ||
 +skb-inner_protocol != htons(ETH_P_TEB) ||
 +skb_inner_mac_header(skb) - skb_transport_header(skb)  64))
 +   return false;

I think it may be possible to even support a few more things here.
According to the datasheet here:
http://www.intel.com/content/dam/www/public/us/en/documents/datasheets/xl710-10-40-controller-datasheet.pdf

This can actually support 64 bytes beyond the tunnel header, which
would make for a total of 80 bytes. It looks like it can also support
IPv4 or IPv6 beyond just Ethernet as the encapsulated protocol.

Intel guys, can you confirm that this is correct?
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] openvswitch: supply a dummy err_handler of gre_cisco_protocol to prevent kernel crash

2014-04-08 Thread Jesse Gross
On Fri, Apr 4, 2014 at 9:20 PM, wei zhang  wrote:
> At 2014-04-05 07:05:59,"Jesse Gross"  wrote:
>>On Tue, Apr 1, 2014 at 5:23 PM, Wei Zhang  wrote:
>>>
>>> v2 -> v1: use the same logic of the gre_rcv() to distinguish which packet is
>>> intended to us!
>>
>>As a tip on kernel process: if you put the version information after
>>three dashes below the signed-off-by line then git will automatically
>>remove it when the final patch is applied.
>
> Thanks, should I modify it and send a v3 patch?
>
>>
>>> diff --git a/net/openvswitch/vport-gre.c b/net/openvswitch/vport-gre.c
>>> index a3d6951..f391df1 100644
>>> --- a/net/openvswitch/vport-gre.c
>>> +++ b/net/openvswitch/vport-gre.c
>>> @@ -110,6 +110,21 @@ static int gre_rcv(struct sk_buff *skb,
>>> return PACKET_RCVD;
>>>  }
>>>
>>> +/* Called with rcu_read_lock and BH disabled. */
>>> +static int gre_err(struct sk_buff *skb, u32 info,
>>> +  const struct tnl_ptk_info *tpi)
>>> +{
>>> +   struct ovs_net *ovs_net;
>>> +   struct vport *vport;
>>> +
>>> +   ovs_net = net_generic(dev_net(skb->dev), ovs_net_id);
>>> +   vport = rcu_dereference(ovs_net->vport_net.gre_vport);
>>> +   if (unlikely(!vport))
>>> +   return PACKET_REJECT;
>>> +   else
>>> +   return PACKET_RCVD;
>>
>>Sorry, I forgot to say this before - if we receive the packet then we
>>should also call consume_skb() on it.
>
> Maybe there is no need to call consume_skb()? The icmp_rcv() would
> call kfree_skb() for us. I also checked the ipgre_err(), it return
> PACKET_RCVD without call consume_skb() too.

Thanks, you are right. I applied your patch as is.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] openvswitch: supply a dummy err_handler of gre_cisco_protocol to prevent kernel crash

2014-04-08 Thread Jesse Gross
On Fri, Apr 4, 2014 at 9:20 PM, wei zhang asuka@163.com wrote:
 At 2014-04-05 07:05:59,Jesse Gross je...@nicira.com wrote:
On Tue, Apr 1, 2014 at 5:23 PM, Wei Zhang asuka@163.com wrote:

 v2 - v1: use the same logic of the gre_rcv() to distinguish which packet is
 intended to us!

As a tip on kernel process: if you put the version information after
three dashes below the signed-off-by line then git will automatically
remove it when the final patch is applied.

 Thanks, should I modify it and send a v3 patch?


 diff --git a/net/openvswitch/vport-gre.c b/net/openvswitch/vport-gre.c
 index a3d6951..f391df1 100644
 --- a/net/openvswitch/vport-gre.c
 +++ b/net/openvswitch/vport-gre.c
 @@ -110,6 +110,21 @@ static int gre_rcv(struct sk_buff *skb,
 return PACKET_RCVD;
  }

 +/* Called with rcu_read_lock and BH disabled. */
 +static int gre_err(struct sk_buff *skb, u32 info,
 +  const struct tnl_ptk_info *tpi)
 +{
 +   struct ovs_net *ovs_net;
 +   struct vport *vport;
 +
 +   ovs_net = net_generic(dev_net(skb-dev), ovs_net_id);
 +   vport = rcu_dereference(ovs_net-vport_net.gre_vport);
 +   if (unlikely(!vport))
 +   return PACKET_REJECT;
 +   else
 +   return PACKET_RCVD;

Sorry, I forgot to say this before - if we receive the packet then we
should also call consume_skb() on it.

 Maybe there is no need to call consume_skb()? The icmp_rcv() would
 call kfree_skb() for us. I also checked the ipgre_err(), it return
 PACKET_RCVD without call consume_skb() too.

Thanks, you are right. I applied your patch as is.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] openvswitch: supply a dummy err_handler of gre_cisco_protocol to prevent kernel crash

2014-04-04 Thread Jesse Gross
On Tue, Apr 1, 2014 at 5:23 PM, Wei Zhang  wrote:
> When use gre vport, openvswitch register a gre_cisco_protocol but does not
> supply a err_handler with it. The gre_cisco_err() in net/ipv4/gre_demux.c 
> expect
> err_handler be provided with the gre_cisco_protocol implementation, and call
> ->err_handler() without existence check, cause the kernel crash.
>
> This patch provide a err_handler to fix this bug.
>
> v2 -> v1: use the same logic of the gre_rcv() to distinguish which packet is
> intended to us!

As a tip on kernel process: if you put the version information after
three dashes below the signed-off-by line then git will automatically
remove it when the final patch is applied.

> diff --git a/net/openvswitch/vport-gre.c b/net/openvswitch/vport-gre.c
> index a3d6951..f391df1 100644
> --- a/net/openvswitch/vport-gre.c
> +++ b/net/openvswitch/vport-gre.c
> @@ -110,6 +110,21 @@ static int gre_rcv(struct sk_buff *skb,
> return PACKET_RCVD;
>  }
>
> +/* Called with rcu_read_lock and BH disabled. */
> +static int gre_err(struct sk_buff *skb, u32 info,
> +  const struct tnl_ptk_info *tpi)
> +{
> +   struct ovs_net *ovs_net;
> +   struct vport *vport;
> +
> +   ovs_net = net_generic(dev_net(skb->dev), ovs_net_id);
> +   vport = rcu_dereference(ovs_net->vport_net.gre_vport);
> +   if (unlikely(!vport))
> +   return PACKET_REJECT;
> +   else
> +   return PACKET_RCVD;

Sorry, I forgot to say this before - if we receive the packet then we
should also call consume_skb() on it.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] openvswitch: supply a dummy err_handler of gre_cisco_protocol to prevent kernel crash

2014-04-04 Thread Jesse Gross
On Tue, Apr 1, 2014 at 5:23 PM, Wei Zhang asuka@163.com wrote:
 When use gre vport, openvswitch register a gre_cisco_protocol but does not
 supply a err_handler with it. The gre_cisco_err() in net/ipv4/gre_demux.c 
 expect
 err_handler be provided with the gre_cisco_protocol implementation, and call
 -err_handler() without existence check, cause the kernel crash.

 This patch provide a err_handler to fix this bug.

 v2 - v1: use the same logic of the gre_rcv() to distinguish which packet is
 intended to us!

As a tip on kernel process: if you put the version information after
three dashes below the signed-off-by line then git will automatically
remove it when the final patch is applied.

 diff --git a/net/openvswitch/vport-gre.c b/net/openvswitch/vport-gre.c
 index a3d6951..f391df1 100644
 --- a/net/openvswitch/vport-gre.c
 +++ b/net/openvswitch/vport-gre.c
 @@ -110,6 +110,21 @@ static int gre_rcv(struct sk_buff *skb,
 return PACKET_RCVD;
  }

 +/* Called with rcu_read_lock and BH disabled. */
 +static int gre_err(struct sk_buff *skb, u32 info,
 +  const struct tnl_ptk_info *tpi)
 +{
 +   struct ovs_net *ovs_net;
 +   struct vport *vport;
 +
 +   ovs_net = net_generic(dev_net(skb-dev), ovs_net_id);
 +   vport = rcu_dereference(ovs_net-vport_net.gre_vport);
 +   if (unlikely(!vport))
 +   return PACKET_REJECT;
 +   else
 +   return PACKET_RCVD;

Sorry, I forgot to say this before - if we receive the packet then we
should also call consume_skb() on it.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] openvswitch: supply a dummy err_handler of gre_cisco_protocol to prevent kernel crash

2014-04-01 Thread Jesse Gross
On Tue, Apr 1, 2014 at 8:24 AM, wei zhang  wrote:
> At 2014-04-01 08:49:53,"Jesse Gross"  wrote:
>>On Sun, Mar 30, 2014 at 5:12 AM, wei zhang  wrote:
>>> At 2014-03-29 06:02:25,"Jesse Gross"  wrote:
>
>>> Maybe I misunderstand something? I think if we discard all packet pass to us
>>> when we use gre vport, new gre_cisco_protocol which has lower priority could
>>> not see the packet intended to it.
>>
>>That's true but in this case it would also not see any data packets,
>>so I don't think that situation would work well anyways.
>>
>>> I checked the implementation of the ipgre_err(), which has be called before
>>> the err_handler of gre vport. It use the the (local address, remote 
>>> address, key)
>>> to distinguish the packet which is realy intended to it, although it could 
>>> not
>>> always get the key from the icmp packet. Should we do as the same as it?
>>> I'm not sure this is feasible, any advice is appreciate.
>>
>>OVS does flow based matching rather than using a static set of
>>configuration parameters, so everything "matches" in some way
>>(although the result might be to drop).
>
> So the flow based match could dynamically determine by the ovs daemon, we 
> could
> not find out the belonging of the packet as far as we call ovs_dp_upcall(), 
> isn't it?

That's right - and since the OVS flow table always has a default
behavior (even if it is drop or send to controller) there's never a
packet that isn't considered to be destined to OVS once it is
received.

If this makes sense to you, would you mind submitting the patch you
had earlier formally with a commit message and signed off by line?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] openvswitch: supply a dummy err_handler of gre_cisco_protocol to prevent kernel crash

2014-04-01 Thread Jesse Gross
On Tue, Apr 1, 2014 at 8:24 AM, wei zhang asuka@163.com wrote:
 At 2014-04-01 08:49:53,Jesse Gross je...@nicira.com wrote:
On Sun, Mar 30, 2014 at 5:12 AM, wei zhang asuka@163.com wrote:
 At 2014-03-29 06:02:25,Jesse Gross je...@nicira.com wrote:

 Maybe I misunderstand something? I think if we discard all packet pass to us
 when we use gre vport, new gre_cisco_protocol which has lower priority could
 not see the packet intended to it.

That's true but in this case it would also not see any data packets,
so I don't think that situation would work well anyways.

 I checked the implementation of the ipgre_err(), which has be called before
 the err_handler of gre vport. It use the the (local address, remote 
 address, key)
 to distinguish the packet which is realy intended to it, although it could 
 not
 always get the key from the icmp packet. Should we do as the same as it?
 I'm not sure this is feasible, any advice is appreciate.

OVS does flow based matching rather than using a static set of
configuration parameters, so everything matches in some way
(although the result might be to drop).

 So the flow based match could dynamically determine by the ovs daemon, we 
 could
 not find out the belonging of the packet as far as we call ovs_dp_upcall(), 
 isn't it?

That's right - and since the OVS flow table always has a default
behavior (even if it is drop or send to controller) there's never a
packet that isn't considered to be destined to OVS once it is
received.

If this makes sense to you, would you mind submitting the patch you
had earlier formally with a commit message and signed off by line?
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] openvswitch: supply a dummy err_handler of gre_cisco_protocol to prevent kernel crash

2014-03-31 Thread Jesse Gross
On Sun, Mar 30, 2014 at 5:12 AM, wei zhang  wrote:
> At 2014-03-29 06:02:25,"Jesse Gross"  wrote:
>
>>I'm not sure that rejecting all ICMP packets is the correct thing do
>>here since it means that we could pass them onto a later caller even
>>though they are intended for us. We should probably use the same logic
>>as for receiving packets and just discard them here.
>
> Thank you very much for your advice, did you mean this logic?

Yes, that's what I was thinking.

[...]

> Maybe I misunderstand something? I think if we discard all packet pass to us
> when we use gre vport, new gre_cisco_protocol which has lower priority could
> not see the packet intended to it.

That's true but in this case it would also not see any data packets,
so I don't think that situation would work well anyways.

> I checked the implementation of the ipgre_err(), which has be called before
> the err_handler of gre vport. It use the the (local address, remote address, 
> key)
> to distinguish the packet which is realy intended to it, although it could not
> always get the key from the icmp packet. Should we do as the same as it?
> I'm not sure this is feasible, any advice is appreciate.

OVS does flow based matching rather than using a static set of
configuration parameters, so everything "matches" in some way
(although the result might be to drop). This generally means that OVS
is the receiver of last resort and nothing currently has a lower
priority. That actually means the difference between the patches is
somewhat academic but it seems more robust for the logic to be
consistent.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] openvswitch: supply a dummy err_handler of gre_cisco_protocol to prevent kernel crash

2014-03-31 Thread Jesse Gross
On Sun, Mar 30, 2014 at 5:12 AM, wei zhang asuka@163.com wrote:
 At 2014-03-29 06:02:25,Jesse Gross je...@nicira.com wrote:

I'm not sure that rejecting all ICMP packets is the correct thing do
here since it means that we could pass them onto a later caller even
though they are intended for us. We should probably use the same logic
as for receiving packets and just discard them here.

 Thank you very much for your advice, did you mean this logic?

Yes, that's what I was thinking.

[...]

 Maybe I misunderstand something? I think if we discard all packet pass to us
 when we use gre vport, new gre_cisco_protocol which has lower priority could
 not see the packet intended to it.

That's true but in this case it would also not see any data packets,
so I don't think that situation would work well anyways.

 I checked the implementation of the ipgre_err(), which has be called before
 the err_handler of gre vport. It use the the (local address, remote address, 
 key)
 to distinguish the packet which is realy intended to it, although it could not
 always get the key from the icmp packet. Should we do as the same as it?
 I'm not sure this is feasible, any advice is appreciate.

OVS does flow based matching rather than using a static set of
configuration parameters, so everything matches in some way
(although the result might be to drop). This generally means that OVS
is the receiver of last resort and nothing currently has a lower
priority. That actually means the difference between the patches is
somewhat academic but it seems more robust for the logic to be
consistent.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] net/openvswitch: Use with RCU_INIT_POINTER(x, NULL) in vport-gre.c

2014-03-28 Thread Jesse Gross
On Sun, Mar 23, 2014 at 12:22 PM, Monam Agarwal
 wrote:
> This patch replaces rcu_assign_pointer(x, NULL) with RCU_INIT_POINTER(x, NULL)
>
> The rcu_assign_pointer() ensures that the initialization of a structure
> is carried out before storing a pointer to that structure.
> And in the case of the NULL pointer, there is no structure to initialize.
> So, rcu_assign_pointer(p, NULL) can be safely converted to 
> RCU_INIT_POINTER(p, NULL)
>
> Signed-off-by: Monam Agarwal 

Applied.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] openvswitch: supply a dummy err_handler of gre_cisco_protocol to prevent kernel crash

2014-03-28 Thread Jesse Gross
On Thu, Mar 27, 2014 at 2:56 PM, Wei Zhang  wrote:
> diff --git a/net/openvswitch/vport-gre.c b/net/openvswitch/vport-gre.c
> index a3d6951..d64c639 100644
> --- a/net/openvswitch/vport-gre.c
> +++ b/net/openvswitch/vport-gre.c
> @@ -110,6 +110,12 @@ static int gre_rcv(struct sk_buff *skb,
> return PACKET_RCVD;
>  }
>
> +static int gre_dummy_err(struct sk_buff *skb, u32 info,
> +const struct tnl_ptk_info *tpi)
> +{
> +   return PACKET_REJECT;
> +}

I'm not sure that rejecting all ICMP packets is the correct thing do
here since it means that we could pass them onto a later caller even
though they are intended for us. We should probably use the same logic
as for receiving packets and just discard them here.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] openvswitch: supply a dummy err_handler of gre_cisco_protocol to prevent kernel crash

2014-03-28 Thread Jesse Gross
On Thu, Mar 27, 2014 at 2:56 PM, Wei Zhang asuka@163.com wrote:
 diff --git a/net/openvswitch/vport-gre.c b/net/openvswitch/vport-gre.c
 index a3d6951..d64c639 100644
 --- a/net/openvswitch/vport-gre.c
 +++ b/net/openvswitch/vport-gre.c
 @@ -110,6 +110,12 @@ static int gre_rcv(struct sk_buff *skb,
 return PACKET_RCVD;
  }

 +static int gre_dummy_err(struct sk_buff *skb, u32 info,
 +const struct tnl_ptk_info *tpi)
 +{
 +   return PACKET_REJECT;
 +}

I'm not sure that rejecting all ICMP packets is the correct thing do
here since it means that we could pass them onto a later caller even
though they are intended for us. We should probably use the same logic
as for receiving packets and just discard them here.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] net/openvswitch: Use with RCU_INIT_POINTER(x, NULL) in vport-gre.c

2014-03-28 Thread Jesse Gross
On Sun, Mar 23, 2014 at 12:22 PM, Monam Agarwal
monamagarwal...@gmail.com wrote:
 This patch replaces rcu_assign_pointer(x, NULL) with RCU_INIT_POINTER(x, NULL)

 The rcu_assign_pointer() ensures that the initialization of a structure
 is carried out before storing a pointer to that structure.
 And in the case of the NULL pointer, there is no structure to initialize.
 So, rcu_assign_pointer(p, NULL) can be safely converted to 
 RCU_INIT_POINTER(p, NULL)

 Signed-off-by: Monam Agarwal monamagarwal...@gmail.com

Applied.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ovs-discuss] Linus GIT Head OOPs reproducable in open vswitch when running mininet topology

2014-02-03 Thread Jesse Gross
On Fri, Jan 31, 2014 at 10:18 AM, Thomas Glanzmann  wrote:
>> Do you know if this happens with an older kernel or with a simpler topology?
>
> No, I don't. I just verified that the Ubuntu Mininet uses the
> openvswitch kernel module from openvswitch and not the one that is
> shipped with the kernel. Ubuntu precise does not crash with the exact same
> topology.

The kernel from Precise doesn't call the function that is triggering
the problem, so it's not too surprising that it doesn't have the same
issue.

It's not clear that this is actually a bug in the OVS code since it
happens in a different function and that function accesses data that
OVS doesn't really touch. Do you know if this happens with the bridge?
Or can you try bisecting? Or use gdb to track down the faulting
address?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ovs-discuss] Linus GIT Head OOPs reproducable in open vswitch when running mininet topology

2014-02-03 Thread Jesse Gross
On Fri, Jan 31, 2014 at 10:18 AM, Thomas Glanzmann tho...@glanzmann.de wrote:
 Do you know if this happens with an older kernel or with a simpler topology?

 No, I don't. I just verified that the Ubuntu Mininet uses the
 openvswitch kernel module from openvswitch and not the one that is
 shipped with the kernel. Ubuntu precise does not crash with the exact same
 topology.

The kernel from Precise doesn't call the function that is triggering
the problem, so it's not too surprising that it doesn't have the same
issue.

It's not clear that this is actually a bug in the OVS code since it
happens in a different function and that function accesses data that
OVS doesn't really touch. Do you know if this happens with the bridge?
Or can you try bisecting? Or use gdb to track down the faulting
address?
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ovs-discuss] Linus GIT Head OOPs reproducable in open vswitch when running mininet topology

2014-01-31 Thread Jesse Gross
On Thu, Jan 30, 2014 at 6:33 PM, Thomas Glanzmann  wrote:
> Hello Jesse,
>
>> This looks like the kernel module included with upstream Linux instead
>> of from OVS git, is that correct?
>
> coorect.
>
>> Can you please describe what you are doing instead of just giving your 
>> script?
>
> I created 8 hosts. 2 hosts are connected two each switches. That gives
> me 4 switches which are connected using a ring topology. The reason for
> that is that I want to test the Layer2, Layer3 IPv4 and IPv6
> capabilities of OpenDayLight.

Do you know what type of devices are being attached to OVS (i.e. tap,
veth, etc.)?

Do you know if this happens with an older kernel or with a simpler topology?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ovs-discuss] Linus GIT Head OOPs reproducable in open vswitch when running mininet topology

2014-01-31 Thread Jesse Gross
On Thu, Jan 30, 2014 at 6:33 PM, Thomas Glanzmann tho...@glanzmann.de wrote:
 Hello Jesse,

 This looks like the kernel module included with upstream Linux instead
 of from OVS git, is that correct?

 coorect.

 Can you please describe what you are doing instead of just giving your 
 script?

 I created 8 hosts. 2 hosts are connected two each switches. That gives
 me 4 switches which are connected using a ring topology. The reason for
 that is that I want to test the Layer2, Layer3 IPv4 and IPv6
 capabilities of OpenDayLight.

Do you know what type of devices are being attached to OVS (i.e. tap,
veth, etc.)?

Do you know if this happens with an older kernel or with a simpler topology?
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ovs-discuss] Linus GIT Head OOPs reproducable in open vswitch when running mininet topology

2014-01-30 Thread Jesse Gross
On Thu, Jan 30, 2014 at 12:44 PM, Thomas Glanzmann  wrote:
> Hello,
> open vswitch git head with Linus tip OOPses for me reproducable when I
> load the following mininet topology:

This looks like the kernel module included with upstream Linux instead
of from OVS git, is that correct?

Can you please describe what you are doing instead of just giving your script?

It would also be helpful if you could use GDB to find out the source
of the faulting address.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ovs-discuss] Linus GIT Head OOPs reproducable in open vswitch when running mininet topology

2014-01-30 Thread Jesse Gross
On Thu, Jan 30, 2014 at 12:44 PM, Thomas Glanzmann tho...@glanzmann.de wrote:
 Hello,
 open vswitch git head with Linus tip OOPses for me reproducable when I
 load the following mininet topology:

This looks like the kernel module included with upstream Linux instead
of from OVS git, is that correct?

Can you please describe what you are doing instead of just giving your script?

It would also be helpful if you could use GDB to find out the source
of the faulting address.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [-next] openvswitch BUILD_BUG_ON failed

2013-09-04 Thread Jesse Gross
On Tue, Sep 3, 2013 at 11:55 PM, Geert Uytterhoeven
 wrote:
> On Tue, Sep 3, 2013 at 11:44 PM, Jesse Gross  wrote:
>> On Sat, Aug 31, 2013 at 5:11 AM, Geert Uytterhoeven
>>  wrote:
>>> On Fri, Aug 30, 2013 at 3:11 AM, Jesse Gross  wrote:
>>>> On Thu, Aug 29, 2013 at 3:10 PM, David Miller  wrote:
>>>>> From: Jesse Gross 
>>>>> Date: Thu, 29 Aug 2013 14:42:22 -0700
>>>>>
>>>>>> On Thu, Aug 29, 2013 at 2:21 PM, Geert Uytterhoeven
>>>>>>  wrote:
>>>>>>> However, I have some doubts about other alignment "enforcements":
>>>>>>>
>>>>>>> "__aligned(__alignof__(long))" makes the whole struct aligned to the
>>>>>>> alignment rule for "long":
>>>>>>>1. This is only 2 bytes on m68k, i.e. != sizeof(long).
>>>>>>>2. This is 4 bytes on many 32-bit platforms, which may be less than 
>>>>>>> the
>>>>>>>   default alignment for "__be64" (cfr. some members of struct
>>>>>>>   ovs_key_ipv4_tunnel), so this may make those 64-bit members 
>>>>>>> unaligned.
>>>>>>
>>>>>> Do any of those 32-bit architectures actually care about alignment of
>>>>>> 64 bit values? On 32-bit x86, a long is 32 bits but the alignment
>>>>>> requirement of __be64 is also 32 bit.
>>>>>
>>>>> All except x86-32 do, it is in fact the odd man out with respect to this
>>>>> issue.
>>>>
>>>> Thanks, good to know.
>>>>
>>>> Andy, do you want to modify your patch to just drop the alignment
>>>> specification as Geert suggested (but definitely keep the new build
>>>> assert that you added)? It's probably better to just send the patch to
>>>> netdev (against net-next) as well since you'll likely get better
>>>> comments there and we can fix this faster if you cut out the
>>>> middleman.
>>>
>>> Why do you want to keep the build asserts?
>>> Is this in-memory structure also transfered as-is over the network?
>>> If yes, you definitely want the padding.
>>
>> Well they caught this bug and really don't cost anything.
>>
>>> Nevertheless, as the struct contains u32 and even __be64 members, the
>>> size of the struct will always be a multiple of the alignment unit for
>>> 64-bit quantities (and thus also for long), as per the C standard.
>>> Hence the check
>>>
>>> BUILD_BUG_ON(sizeof(struct sw_flow_key) % __alignof__(long));
>>>
>>> will only catch bad compiler bugs or people adding __packed to the struct.
>>
>> It's possible that we might want to pack the structure in the future.
>> More generally though, the contents of the struct is really
>> independent of the alignment requirements here because we're accessing
>> it as an array of bytes in long-sized chunks so implicitly depending
>> on the size of the members is not that great.
>
> So you're accessing it as an array of bytes in long-sized chunks.
> What are you doing with this accessed data?
> Transfering over the network?
> Storing on disk?
> Then it must be portable across machines and architectures, right?

It's just an in-memory hash table lookup. No one else ever sees it.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [-next] openvswitch BUILD_BUG_ON failed

2013-09-04 Thread Jesse Gross
On Tue, Sep 3, 2013 at 11:55 PM, Geert Uytterhoeven
ge...@linux-m68k.org wrote:
 On Tue, Sep 3, 2013 at 11:44 PM, Jesse Gross je...@nicira.com wrote:
 On Sat, Aug 31, 2013 at 5:11 AM, Geert Uytterhoeven
 ge...@linux-m68k.org wrote:
 On Fri, Aug 30, 2013 at 3:11 AM, Jesse Gross je...@nicira.com wrote:
 On Thu, Aug 29, 2013 at 3:10 PM, David Miller da...@davemloft.net wrote:
 From: Jesse Gross je...@nicira.com
 Date: Thu, 29 Aug 2013 14:42:22 -0700

 On Thu, Aug 29, 2013 at 2:21 PM, Geert Uytterhoeven
 ge...@linux-m68k.org wrote:
 However, I have some doubts about other alignment enforcements:

 __aligned(__alignof__(long)) makes the whole struct aligned to the
 alignment rule for long:
1. This is only 2 bytes on m68k, i.e. != sizeof(long).
2. This is 4 bytes on many 32-bit platforms, which may be less than 
 the
   default alignment for __be64 (cfr. some members of struct
   ovs_key_ipv4_tunnel), so this may make those 64-bit members 
 unaligned.

 Do any of those 32-bit architectures actually care about alignment of
 64 bit values? On 32-bit x86, a long is 32 bits but the alignment
 requirement of __be64 is also 32 bit.

 All except x86-32 do, it is in fact the odd man out with respect to this
 issue.

 Thanks, good to know.

 Andy, do you want to modify your patch to just drop the alignment
 specification as Geert suggested (but definitely keep the new build
 assert that you added)? It's probably better to just send the patch to
 netdev (against net-next) as well since you'll likely get better
 comments there and we can fix this faster if you cut out the
 middleman.

 Why do you want to keep the build asserts?
 Is this in-memory structure also transfered as-is over the network?
 If yes, you definitely want the padding.

 Well they caught this bug and really don't cost anything.

 Nevertheless, as the struct contains u32 and even __be64 members, the
 size of the struct will always be a multiple of the alignment unit for
 64-bit quantities (and thus also for long), as per the C standard.
 Hence the check

 BUILD_BUG_ON(sizeof(struct sw_flow_key) % __alignof__(long));

 will only catch bad compiler bugs or people adding __packed to the struct.

 It's possible that we might want to pack the structure in the future.
 More generally though, the contents of the struct is really
 independent of the alignment requirements here because we're accessing
 it as an array of bytes in long-sized chunks so implicitly depending
 on the size of the members is not that great.

 So you're accessing it as an array of bytes in long-sized chunks.
 What are you doing with this accessed data?
 Transfering over the network?
 Storing on disk?
 Then it must be portable across machines and architectures, right?

It's just an in-memory hash table lookup. No one else ever sees it.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [-next] openvswitch BUILD_BUG_ON failed

2013-09-03 Thread Jesse Gross
On Sat, Aug 31, 2013 at 5:11 AM, Geert Uytterhoeven
 wrote:
> On Fri, Aug 30, 2013 at 3:11 AM, Jesse Gross  wrote:
>> On Thu, Aug 29, 2013 at 3:10 PM, David Miller  wrote:
>>> From: Jesse Gross 
>>> Date: Thu, 29 Aug 2013 14:42:22 -0700
>>>
>>>> On Thu, Aug 29, 2013 at 2:21 PM, Geert Uytterhoeven
>>>>  wrote:
>>>>> However, I have some doubts about other alignment "enforcements":
>>>>>
>>>>> "__aligned(__alignof__(long))" makes the whole struct aligned to the
>>>>> alignment rule for "long":
>>>>>1. This is only 2 bytes on m68k, i.e. != sizeof(long).
>>>>>2. This is 4 bytes on many 32-bit platforms, which may be less than the
>>>>>   default alignment for "__be64" (cfr. some members of struct
>>>>>   ovs_key_ipv4_tunnel), so this may make those 64-bit members 
>>>>> unaligned.
>>>>
>>>> Do any of those 32-bit architectures actually care about alignment of
>>>> 64 bit values? On 32-bit x86, a long is 32 bits but the alignment
>>>> requirement of __be64 is also 32 bit.
>>>
>>> All except x86-32 do, it is in fact the odd man out with respect to this
>>> issue.
>>
>> Thanks, good to know.
>>
>> Andy, do you want to modify your patch to just drop the alignment
>> specification as Geert suggested (but definitely keep the new build
>> assert that you added)? It's probably better to just send the patch to
>> netdev (against net-next) as well since you'll likely get better
>> comments there and we can fix this faster if you cut out the
>> middleman.
>
> Why do you want to keep the build asserts?
> Is this in-memory structure also transfered as-is over the network?
> If yes, you definitely want the padding.

Well they caught this bug and really don't cost anything.

> Nevertheless, as the struct contains u32 and even __be64 members, the
> size of the struct will always be a multiple of the alignment unit for
> 64-bit quantities (and thus also for long), as per the C standard.
> Hence the check
>
> BUILD_BUG_ON(sizeof(struct sw_flow_key) % __alignof__(long));
>
> will only catch bad compiler bugs or people adding __packed to the struct.

It's possible that we might want to pack the structure in the future.
More generally though, the contents of the struct is really
independent of the alignment requirements here because we're accessing
it as an array of bytes in long-sized chunks so implicitly depending
on the size of the members is not that great.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [-next] openvswitch BUILD_BUG_ON failed

2013-09-03 Thread Jesse Gross
On Sat, Aug 31, 2013 at 5:11 AM, Geert Uytterhoeven
ge...@linux-m68k.org wrote:
 On Fri, Aug 30, 2013 at 3:11 AM, Jesse Gross je...@nicira.com wrote:
 On Thu, Aug 29, 2013 at 3:10 PM, David Miller da...@davemloft.net wrote:
 From: Jesse Gross je...@nicira.com
 Date: Thu, 29 Aug 2013 14:42:22 -0700

 On Thu, Aug 29, 2013 at 2:21 PM, Geert Uytterhoeven
 ge...@linux-m68k.org wrote:
 However, I have some doubts about other alignment enforcements:

 __aligned(__alignof__(long)) makes the whole struct aligned to the
 alignment rule for long:
1. This is only 2 bytes on m68k, i.e. != sizeof(long).
2. This is 4 bytes on many 32-bit platforms, which may be less than the
   default alignment for __be64 (cfr. some members of struct
   ovs_key_ipv4_tunnel), so this may make those 64-bit members 
 unaligned.

 Do any of those 32-bit architectures actually care about alignment of
 64 bit values? On 32-bit x86, a long is 32 bits but the alignment
 requirement of __be64 is also 32 bit.

 All except x86-32 do, it is in fact the odd man out with respect to this
 issue.

 Thanks, good to know.

 Andy, do you want to modify your patch to just drop the alignment
 specification as Geert suggested (but definitely keep the new build
 assert that you added)? It's probably better to just send the patch to
 netdev (against net-next) as well since you'll likely get better
 comments there and we can fix this faster if you cut out the
 middleman.

 Why do you want to keep the build asserts?
 Is this in-memory structure also transfered as-is over the network?
 If yes, you definitely want the padding.

Well they caught this bug and really don't cost anything.

 Nevertheless, as the struct contains u32 and even __be64 members, the
 size of the struct will always be a multiple of the alignment unit for
 64-bit quantities (and thus also for long), as per the C standard.
 Hence the check

 BUILD_BUG_ON(sizeof(struct sw_flow_key) % __alignof__(long));

 will only catch bad compiler bugs or people adding __packed to the struct.

It's possible that we might want to pack the structure in the future.
More generally though, the contents of the struct is really
independent of the alignment requirements here because we're accessing
it as an array of bytes in long-sized chunks so implicitly depending
on the size of the members is not that great.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [-next] openvswitch BUILD_BUG_ON failed

2013-08-29 Thread Jesse Gross
On Thu, Aug 29, 2013 at 3:10 PM, David Miller  wrote:
> From: Jesse Gross 
> Date: Thu, 29 Aug 2013 14:42:22 -0700
>
>> On Thu, Aug 29, 2013 at 2:21 PM, Geert Uytterhoeven
>>  wrote:
>>> However, I have some doubts about other alignment "enforcements":
>>>
>>> "__aligned(__alignof__(long))" makes the whole struct aligned to the
>>> alignment rule for "long":
>>>1. This is only 2 bytes on m68k, i.e. != sizeof(long).
>>>2. This is 4 bytes on many 32-bit platforms, which may be less than the
>>>   default alignment for "__be64" (cfr. some members of struct
>>>   ovs_key_ipv4_tunnel), so this may make those 64-bit members unaligned.
>>
>> Do any of those 32-bit architectures actually care about alignment of
>> 64 bit values? On 32-bit x86, a long is 32 bits but the alignment
>> requirement of __be64 is also 32 bit.
>
> All except x86-32 do, it is in fact the odd man out with respect to this
> issue.

Thanks, good to know.

Andy, do you want to modify your patch to just drop the alignment
specification as Geert suggested (but definitely keep the new build
assert that you added)? It's probably better to just send the patch to
netdev (against net-next) as well since you'll likely get better
comments there and we can fix this faster if you cut out the
middleman.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [-next] openvswitch BUILD_BUG_ON failed

2013-08-29 Thread Jesse Gross
On Thu, Aug 29, 2013 at 2:21 PM, Geert Uytterhoeven
 wrote:
> However, I have some doubts about other alignment "enforcements":
>
> "__aligned(__alignof__(long))" makes the whole struct aligned to the
> alignment rule for "long":
>1. This is only 2 bytes on m68k, i.e. != sizeof(long).
>2. This is 4 bytes on many 32-bit platforms, which may be less than the
>   default alignment for "__be64" (cfr. some members of struct
>   ovs_key_ipv4_tunnel), so this may make those 64-bit members unaligned.

Do any of those 32-bit architectures actually care about alignment of
64 bit values? On 32-bit x86, a long is 32 bits but the alignment
requirement of __be64 is also 32 bit.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [-next] openvswitch BUILD_BUG_ON failed

2013-08-29 Thread Jesse Gross
On Thu, Aug 29, 2013 at 2:21 PM, Geert Uytterhoeven
ge...@linux-m68k.org wrote:
 However, I have some doubts about other alignment enforcements:

 __aligned(__alignof__(long)) makes the whole struct aligned to the
 alignment rule for long:
1. This is only 2 bytes on m68k, i.e. != sizeof(long).
2. This is 4 bytes on many 32-bit platforms, which may be less than the
   default alignment for __be64 (cfr. some members of struct
   ovs_key_ipv4_tunnel), so this may make those 64-bit members unaligned.

Do any of those 32-bit architectures actually care about alignment of
64 bit values? On 32-bit x86, a long is 32 bits but the alignment
requirement of __be64 is also 32 bit.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [-next] openvswitch BUILD_BUG_ON failed

2013-08-29 Thread Jesse Gross
On Thu, Aug 29, 2013 at 3:10 PM, David Miller da...@davemloft.net wrote:
 From: Jesse Gross je...@nicira.com
 Date: Thu, 29 Aug 2013 14:42:22 -0700

 On Thu, Aug 29, 2013 at 2:21 PM, Geert Uytterhoeven
 ge...@linux-m68k.org wrote:
 However, I have some doubts about other alignment enforcements:

 __aligned(__alignof__(long)) makes the whole struct aligned to the
 alignment rule for long:
1. This is only 2 bytes on m68k, i.e. != sizeof(long).
2. This is 4 bytes on many 32-bit platforms, which may be less than the
   default alignment for __be64 (cfr. some members of struct
   ovs_key_ipv4_tunnel), so this may make those 64-bit members unaligned.

 Do any of those 32-bit architectures actually care about alignment of
 64 bit values? On 32-bit x86, a long is 32 bits but the alignment
 requirement of __be64 is also 32 bit.

 All except x86-32 do, it is in fact the odd man out with respect to this
 issue.

Thanks, good to know.

Andy, do you want to modify your patch to just drop the alignment
specification as Geert suggested (but definitely keep the new build
assert that you added)? It's probably better to just send the patch to
netdev (against net-next) as well since you'll likely get better
comments there and we can fix this faster if you cut out the
middleman.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: linux-next: Tree for Jun 21 (netdev: openvswitch)

2013-06-21 Thread Jesse Gross
On Fri, Jun 21, 2013 at 8:22 AM, Randy Dunlap  wrote:
> On 06/21/13 01:17, Stephen Rothwell wrote:
>> Hi all,
>>
>> Happy solstice!
>>
>> Changes since 20130620:
>>
>
> when CONFIG_INET is not enabled:
>
>   CC  net/openvswitch/flow.o
> In file included from net/openvswitch/flow.c:43:0:
> include/net/ip_tunnels.h: In function 'tunnel_ip_select_ident':
> include/net/ip_tunnels.h:155:3: error: implicit declaration of function 
> '__ip_select_ident' [-Werror=implicit-function-declaration]

Thanks, I just sent out a couple of patches to fix config issues with
the recent tunneling series.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: linux-next: Tree for Jun 21 (netdev: openvswitch)

2013-06-21 Thread Jesse Gross
On Fri, Jun 21, 2013 at 8:22 AM, Randy Dunlap rdun...@infradead.org wrote:
 On 06/21/13 01:17, Stephen Rothwell wrote:
 Hi all,

 Happy solstice!

 Changes since 20130620:


 when CONFIG_INET is not enabled:

   CC  net/openvswitch/flow.o
 In file included from net/openvswitch/flow.c:43:0:
 include/net/ip_tunnels.h: In function 'tunnel_ip_select_ident':
 include/net/ip_tunnels.h:155:3: error: implicit declaration of function 
 '__ip_select_ident' [-Werror=implicit-function-declaration]

Thanks, I just sent out a couple of patches to fix config issues with
the recent tunneling series.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v4 4/9] net: openvswitch: use this_cpu_ptr per-cpu helper

2012-11-16 Thread Jesse Gross
On Fri, Nov 16, 2012 at 12:35 AM, Shan Wei  wrote:
> Shan Wei said, at 2012/11/13 9:52:
>> From: Shan Wei 
>>
>> just use more faster this_cpu_ptr instead of per_cpu_ptr(p, 
>> smp_processor_id());
>>
>>
>> Signed-off-by: Shan Wei 
>> Reviewed-by: Christoph Lameter 
>
> Jesse Gross,  would you like to pick it up to your tree?

Applied, thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v4 4/9] net: openvswitch: use this_cpu_ptr per-cpu helper

2012-11-16 Thread Jesse Gross
On Fri, Nov 16, 2012 at 12:35 AM, Shan Wei shanwe...@gmail.com wrote:
 Shan Wei said, at 2012/11/13 9:52:
 From: Shan Wei davids...@tencent.com

 just use more faster this_cpu_ptr instead of per_cpu_ptr(p, 
 smp_processor_id());


 Signed-off-by: Shan Wei davids...@tencent.com
 Reviewed-by: Christoph Lameter c...@linux.com

 Jesse Gross,  would you like to pick it up to your tree?

Applied, thanks.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 4/9] net: openvswitch: use this_cpu_ptr per-cpu helper

2012-11-01 Thread Jesse Gross
On Thu, Nov 1, 2012 at 7:33 AM, Christoph Lameter  wrote:
> On Thu, 1 Nov 2012, Shan Wei wrote:
>
>> But for different field in same per-cpu variable, how to guarantee n_missed
>> and n_hit are from same cpu?
>> this_cpu_read(dp->stats_percpu->n_missed);
>> [processor changed]
>> this_cpu_read(dp->stats_percpu->n_hit);
>
> What does current guarantee that? If it is guaranteed then you can use the
> __this_cpu_xxx ops.

Preemption is disabled in all of the places where writes are done and
all of the reads are from foreign CPUs.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 4/9] net: openvswitch: use this_cpu_ptr per-cpu helper

2012-11-01 Thread Jesse Gross
On Thu, Nov 1, 2012 at 7:33 AM, Christoph Lameter c...@linux.com wrote:
 On Thu, 1 Nov 2012, Shan Wei wrote:

 But for different field in same per-cpu variable, how to guarantee n_missed
 and n_hit are from same cpu?
 this_cpu_read(dp-stats_percpu-n_missed);
 [processor changed]
 this_cpu_read(dp-stats_percpu-n_hit);

 What does current guarantee that? If it is guaranteed then you can use the
 __this_cpu_xxx ops.

Preemption is disabled in all of the places where writes are done and
all of the reads are from foreign CPUs.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/