from:"Steffen Klassert"

Re: [PATCH] esp6: Simplify the calculation of variables

2021-04-14 Thread Steffen Klassert

On Tue, Apr 13, 2021 at 05:57:15PM +0800, Jiapeng Chong wrote:
> Fix the following coccicheck warnings:
> 
> ./net/ipv6/esp6_offload.c:321:32-34: WARNING !A || A && B is equivalent
> to !A || B.
> 
> Reported-by: Abaci Robot 
> Signed-off-by: Jiapeng Chong 
> ---
>  net/ipv6/esp6_offload.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/ipv6/esp6_offload.c b/net/ipv6/esp6_offload.c
> index 4af56af..40ed4fc 100644
> --- a/net/ipv6/esp6_offload.c
> +++ b/net/ipv6/esp6_offload.c
> @@ -318,7 +318,7 @@ static int esp6_xmit(struct xfrm_state *x, struct sk_buff 
> *skb,  netdev_features
>   esp.plen = esp.clen - skb->len - esp.tfclen;
>   esp.tailen = esp.tfclen + esp.plen + alen;
>  
> - if (!hw_offload || (hw_offload && !skb_is_gso(skb))) {
> + if (!hw_offload || !skb_is_gso(skb)) {
>   esp.nfrags = esp6_output_head(x, skb, );
>   if (esp.nfrags < 0)
>   return esp.nfrags;

That one is already in ipsec-next:

Commit f076835a8bf2aa6ea48f718e4506587c815ab99f
Author: Junlin Yang 
Date:   Thu Mar 11 10:07:56 2021 +0800

esp6: remove a duplicative condition

Fixes coccicheck warnings:
./net/ipv6/esp6_offload.c:319:32-34:
WARNING !A || A && B is equivalent to !A || B

Signed-off-by: Junlin Yang 
Signed-off-by: Steffen Klassert

Re: [PATCH] xfrm/compat: Cleanup WARN()s that can be user-triggered

2021-03-31 Thread Steffen Klassert

On Tue, Mar 30, 2021 at 12:25:06AM +0100, Dmitry Safonov wrote:
> Replace WARN_ONCE() that can be triggered from userspace with
> pr_warn_once(). Those still give user a hint what's the issue.
> 
> I've left WARN()s that are not possible to trigger with current
> code-base and that would mean that the code has issues:
> - relying on current compat_msg_min[type] <= xfrm_msg_min[type]
> - expected 4-byte padding size difference between
>   compat_msg_min[type] and xfrm_msg_min[type]
> - compat_policy[type].len <= xfrma_policy[type].len
> (for every type)
> 
> Reported-by: syzbot+834ffd1afc7212eb8...@syzkaller.appspotmail.com
> Fixes: 5f3eea6b7e8f ("xfrm/compat: Attach xfrm dumps to 64=>32 bit 
> translator")
> Cc: "David S. Miller" 
> Cc: Eric Dumazet 
> Cc: Herbert Xu 
> Cc: Jakub Kicinski 
> Cc: Steffen Klassert 
> Cc: net...@vger.kernel.org
> Cc: sta...@vger.kernel.org
> Signed-off-by: Dmitry Safonov 

Applied, thanks Dmitry!

Re: [PATCH net-next] ipv6: fix clang Wformat warning

2021-03-26 Thread Steffen Klassert

On Mon, Mar 22, 2021 at 12:56:49PM +0100, Arnd Bergmann wrote:
> From: Arnd Bergmann 
> 
> When building with 'make W=1', clang warns about a mismatched
> format string:
> 
> net/ipv6/ah6.c:710:4: error: format specifies type 'unsigned short' but the 
> argument has type 'int' [-Werror,-Wformat]
> aalg_desc->uinfo.auth.icv_fullbits/8);
> ^~~~
> include/linux/printk.h:375:34: note: expanded from macro 'pr_info'
> printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
> ~~~ ^~~
> net/ipv6/esp6.c:1153:5: error: format specifies type 'unsigned short' but the 
> argument has type 'int' [-Werror,-Wformat]
> aalg_desc->uinfo.auth.icv_fullbits / 8);
> ^~
> include/linux/printk.h:375:34: note: expanded from macro 'pr_info'
> printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
> ~~~ ^~~
> 
> Here, the result of dividing a 16-bit number by a 32-bit number
> produces a 32-bit result, which is printed as a 16-bit integer.
> 
> Change the %hu format to the normal %u, which has the same effect
> but avoids the warning.
> 
> Signed-off-by: Arnd Bergmann 

Applied to ipsec-next, thanks!

Re: [PATCH v1 0/2] net: xfrm: Use seqcount_spinlock_t

2021-03-23 Thread Steffen Klassert

On Tue, Mar 16, 2021 at 11:56:28AM +0100, Ahmed S. Darwish wrote:
> Hi,
> 
> This is a small series to trasform xfrm_state_hash_generation sequence
> counter to seqcount_spinlock_t, instead of plain seqcount_t.
> 
> In general, seqcount_LOCKNAME_t sequence counters allows to associate
> the lock used for write serialization with the seqcount. This enables
> lockdep to verify that the write serialization lock is always held
> before entering the seqcount write section.
> 
> If lockdep is disabled, this lock association is compiled out and has
> neither storage size nor runtime overhead.
> 
> The first patch is a general mainline fix, and has a Fixes tag.
> 
> Thanks,
> 
> 8<--
> 
> Ahmed S. Darwish (2):
>   net: xfrm: Localize sequence counter per network namespace
>   net: xfrm: Use sequence counter with associated spinlock

Applied to the ipsec tree, thanks a lot!

Re: [PATCH] esp6: remove a duplicative condition

2021-03-15 Thread Steffen Klassert

On Thu, Mar 11, 2021 at 10:07:56AM +0800, angkery wrote:
> From: Junlin Yang 
> 
> Fixes coccicheck warnings:
> ./net/ipv6/esp6_offload.c:319:32-34:
> WARNING !A || A && B is equivalent to !A || B
> 
> Signed-off-by: Junlin Yang 

Applied to ipsec-next, thanks!

Re: [PATCH] esp4: Simplify the calculation of variables

2021-03-15 Thread Steffen Klassert

On Mon, Mar 01, 2021 at 06:46:02PM +0800, Jiapeng Chong wrote:
> Fix the following coccicheck warnings:
> 
> ./net/ipv4/esp4.c:757:16-18: WARNING !A || A && B is equivalent to !A || B.
> 
> Reported-by: Abaci Robot 
> Signed-off-by: Jiapeng Chong 

Now applied to ipsec-next, thanks!

Re: [PATCH v2 1/1] xfrm: Use actual socket sk instead of skb socket for xfrm_output_resume

2021-03-04 Thread Steffen Klassert

On Tue, Mar 02, 2021 at 08:00:04AM +1300, Evan Nimmo wrote:
> A situation can occur where the interface bound to the sk is different
> to the interface bound to the sk attached to the skb. The interface
> bound to the sk is the correct one however this information is lost inside
> xfrm_output2 and instead the sk on the skb is used in xfrm_output_resume
> instead. This assumes that the sk bound interface and the bound interface
> attached to the sk within the skb are the same which can lead to lookup
> failures inside ip_route_me_harder resulting in the packet being dropped.
> 
> We have an l2tp v3 tunnel with ipsec protection. The tunnel is in the
> global VRF however we have an encapsulated dot1q tunnel interface that
> is within a different VRF. We also have a mangle rule that marks the 
> packets causing them to be processed inside ip_route_me_harder.
> 
> Prior to commit 31c70d5956fc ("l2tp: keep original skb ownership") this
> worked fine as the sk attached to the skb was changed from the dot1q
> encapsulated interface to the sk for the tunnel which meant the interface
> bound to the sk and the interface bound to the skb were identical.
> Commit 46d6c5ae953c ("netfilter: use actual socket sk rather than skb sk
> when routing harder") fixed some of these issues however a similar
> problem existed in the xfrm code.
> 
> Fixes: 31c70d5956fc ("l2tp: keep original skb ownership")
> 
> Signed-off-by: Evan Nimmo 

Applied, thanks Evan!

Re: [PATCH v2] xfrm: Fix incorrect types in assignment

2021-03-01 Thread Steffen Klassert

On Sat, Feb 20, 2021 at 11:18:23AM +0800, Yang Li wrote:
> Fix the following sparse warnings:
> net/xfrm/xfrm_policy.c:1303:22: warning: incorrect type in assignment
> (different address spaces)
> 
> Reported-by: Abaci Robot 
> Signed-off-by: Yang Li 

Please add a proper 'Fixes' tag so that it can be backported into
the stable trees.

Thanks!

Re: [PATCH] xfrm: Use actual socket sk instead of skb socket for xfrm_output_resume

2021-03-01 Thread Steffen Klassert

On Mon, Mar 01, 2021 at 05:02:08PM +1300, Evan Nimmo wrote:
> A situation can occur where the interface bound to the sk is different
> to the interface bound to the sk attached to the skb. The interface
> bound to the sk is the correct one however this information is lost inside
> xfrm_output2 and instead the sk on the skb is used in xfrm_output_resume
> instead. This assumes that the sk bound interface and the bound interface
> attached to the sk within the skb are the same which can lead to lookup
> failures inside ip_route_me_harder resulting in the packet being dropped.
> 
> We have an l2tp v3 tunnel with ipsec protection. The tunnel is in the
> global VRF however we have an encapsulated dot1q tunnel interface that
> is within a different VRF. We also have a mangle rule that marks the 
> packets causing them to be processed inside ip_route_me_harder.
> 
> Prior to commit 31c70d5956fc ("l2tp: keep original skb ownership") this
> worked fine as the sk attached to the skb was changed from the dot1q
> encapsulated interface to the sk for the tunnel which meant the interface
> bound to the sk and the interface bound to the skb were identical.
> Commit 46d6c5ae953c ("netfilter: use actual socket sk rather than skb sk
> when routing harder") fixed some of these issues however a similar
> problem existed in the xfrm code.
> 
> Signed-off-by: Evan Nimmo 

Please add a proper 'Fixes' tag so that it can be backported into
the stable trees.

Thanks!

Re: [PATCH net-next] xfrm: Return the correct errno code

2021-02-05 Thread Steffen Klassert

On Thu, Feb 04, 2021 at 03:42:54PM +0800, Zheng Yongjun wrote:
> When kalloc or kmemdup failed, should return ENOMEM rather than ENOBUF.
> 
> Signed-off-by: Zheng Yongjun 

Applied to ipsec-next, thanks!

Re: [PATCH] esp: Simplify the calculation of variables

2021-02-05 Thread Steffen Klassert

On Wed, Feb 03, 2021 at 10:44:30AM +0800, Jiapeng Chong wrote:
> Fix the following coccicheck warnings:
> 
> ./net/ipv6/esp6.c:791:16-18: WARNING !A || A && B is equivalent
> to !A || B.
> 
> Reported-by: Abaci Robot 
> Signed-off-by: Jiapeng Chong 

Applied to ipsec-next, thanks!

Re: [PATCH v2] net: Simplify the calculation of variables

2021-01-28 Thread Steffen Klassert

On Mon, Jan 25, 2021 at 02:41:46PM +0800, Jiapeng Zhong wrote:
> Fix the following coccicheck warnings:
> 
>  ./net/ipv4/esp4_offload.c:288:32-34: WARNING !A || A && B is
> equivalent to !A || B.
> 
> Reported-by: Abaci Robot 
> Signed-off-by: Jiapeng Zhong 

Patch applied, thanks!

Re: [PATCH net v3] udp: ipv4: manipulate network header of NATed UDP GRO fraglist

2021-01-26 Thread Steffen Klassert

On Thu, Jan 21, 2021 at 10:24:39PM +0900, Dongseok Yi wrote:
> UDP/IP header of UDP GROed frag_skbs are not updated even after NAT
> forwarding. Only the header of head_skb from ip_finish_output_gso ->
> skb_gso_segment is updated but following frag_skbs are not updated.
> 
> A call path skb_mac_gso_segment -> inet_gso_segment ->
> udp4_ufo_fragment -> __udp_gso_segment -> __udp_gso_segment_list
> does not try to update UDP/IP header of the segment list but copy
> only the MAC header.
> 
> Update port, addr and check of each skb of the segment list in
> __udp_gso_segment_list. It covers both SNAT and DNAT.
> 
> Fixes: 9fd1ff5d2ac7 (udp: Support UDP fraglist GRO/GSO.)
> Signed-off-by: Dongseok Yi 
> ---
> v1:
> Steffen Klassert said, there could be 2 options.
> https://lore.kernel.org/patchwork/patch/1362257/
> I was trying to write a quick fix, but it was not easy to forward
> segmented list. Currently, assuming DNAT only.
> 
> v2:
> Per Steffen Klassert request, moved the procedure from
> udp4_ufo_fragment to __udp_gso_segment_list and support SNAT.
> 
> v3:
> Per Steffen Klassert request, applied fast return by comparing seg
> and seg->next at the beginning of __udpv4_gso_segment_list_csum.
> 
> Fixed uh->dest = *newport and iph->daddr = *newip to
> *oldport = *newport and *oldip = *newip.
> 
>  include/net/udp.h  |  2 +-
>  net/ipv4/udp_offload.c | 72 
> ++
>  net/ipv6/udp_offload.c |  2 +-
>  3 files changed, 69 insertions(+), 7 deletions(-)
> 
> diff --git a/include/net/udp.h b/include/net/udp.h
> index 877832b..01351ba 100644
> --- a/include/net/udp.h
> +++ b/include/net/udp.h
> @@ -178,7 +178,7 @@ struct sk_buff *udp_gro_receive(struct list_head *head, 
> struct sk_buff *skb,
>  int udp_gro_complete(struct sk_buff *skb, int nhoff, udp_lookup_t lookup);
>  
>  struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
> -   netdev_features_t features);
> +   netdev_features_t features, bool is_ipv6);
>  
>  static inline struct udphdr *udp_gro_udphdr(struct sk_buff *skb)
>  {
> diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
> index ff39e94..43660cf 100644
> --- a/net/ipv4/udp_offload.c
> +++ b/net/ipv4/udp_offload.c
> @@ -187,8 +187,67 @@ struct sk_buff *skb_udp_tunnel_segment(struct sk_buff 
> *skb,
>  }
>  EXPORT_SYMBOL(skb_udp_tunnel_segment);
>  
> +static void __udpv4_gso_segment_csum(struct sk_buff *seg,
> +  __be32 *oldip, __be32 *newip,
> +  __be16 *oldport, __be16 *newport)
> +{
> + struct udphdr *uh;
> + struct iphdr *iph;
> +
> + if (*oldip == *newip && *oldport == *newport)
> + return;

This check is redundant as you check this already in
__udpv4_gso_segment_list_csum.

Looks ok otherwise.

> +
> + uh = udp_hdr(seg);
> + iph = ip_hdr(seg);
> +
> + if (uh->check) {
> + inet_proto_csum_replace4(>check, seg, *oldip, *newip,
> +  true);
> + inet_proto_csum_replace2(>check, seg, *oldport, *newport,
> +  false);
> + if (!uh->check)
> + uh->check = CSUM_MANGLED_0;
> + }
> + *oldport = *newport;
> +
> + csum_replace4(>check, *oldip, *newip);
> + *oldip = *newip;
> +}
> +
> +static struct sk_buff *__udpv4_gso_segment_list_csum(struct sk_buff *segs)
> +{
> + struct sk_buff *seg;
> + struct udphdr *uh, *uh2;
> + struct iphdr *iph, *iph2;
> +
> + seg = segs;
> + uh = udp_hdr(seg);
> + iph = ip_hdr(seg);
> +
> + if ((udp_hdr(seg)->dest == udp_hdr(seg->next)->dest) &&
> + (udp_hdr(seg)->source == udp_hdr(seg->next)->source) &&
> + (ip_hdr(seg)->daddr == ip_hdr(seg->next)->daddr) &&
> + (ip_hdr(seg)->saddr == ip_hdr(seg->next)->saddr))
> + return segs;
> +
> + while ((seg = seg->next)) {
> + uh2 = udp_hdr(seg);
> + iph2 = ip_hdr(seg);
> +
> + __udpv4_gso_segment_csum(seg,
> +  >saddr, >saddr,
> +  >source, >source);
> + __udpv4_gso_segment_csum(seg,
> +  >daddr, >daddr,
> +  >dest, >dest);
> + }
> +
> + return segs;
> +}
> +
>  static struct sk_buff *__udp_gso_segment_list(

Re: [PATCH net v3] udp: ipv4: manipulate network header of NATed UDP GRO fraglist

2021-01-26 Thread Steffen Klassert

On Tue, Jan 26, 2021 at 09:31:29AM +0900, Dongseok Yi wrote:
> On 1/25/21 9:45 PM, Steffen Klassert wrote:
> > On Thu, Jan 21, 2021 at 10:24:39PM +0900, Dongseok Yi wrote:
> > >
> > > +static void __udpv4_gso_segment_csum(struct sk_buff *seg,
> > > +  __be32 *oldip, __be32 *newip,
> > > +  __be16 *oldport, __be16 *newport)
> > > +{
> > > + struct udphdr *uh;
> > > + struct iphdr *iph;
> > > +
> > > + if (*oldip == *newip && *oldport == *newport)
> > > + return;
> > 
> > This check is redundant as you check this already in
> > __udpv4_gso_segment_list_csum.
> 
> When comes in __udpv4_gso_segment_csum, the condition would be
> SNAT or DNAT. I think we don't need to do the function if the
> condition is not met. I want to skip the function for SNAT checksum
> when DNAT only case. Is it better to remove the check?

Ok, so it can be seen as an optimization. It is ok as it is.

Acked-by: Steffen Klassert 

Thanks!

Re: [PATCH net v2] udp: ipv4: manipulate network header of NATed UDP GRO fraglist

2021-01-21 Thread Steffen Klassert

On Wed, Jan 20, 2021 at 03:55:42PM +0900, Dongseok Yi wrote:
> On 2021-01-18 22:27, Steffen Klassert wrote:
> > On Fri, Jan 15, 2021 at 10:20:35PM +0900, Dongseok Yi wrote:
> > > UDP/IP header of UDP GROed frag_skbs are not updated even after NAT
> > > forwarding. Only the header of head_skb from ip_finish_output_gso ->
> > > skb_gso_segment is updated but following frag_skbs are not updated.
> > >
> > > A call path skb_mac_gso_segment -> inet_gso_segment ->
> > > udp4_ufo_fragment -> __udp_gso_segment -> __udp_gso_segment_list
> > > does not try to update UDP/IP header of the segment list but copy
> > > only the MAC header.
> > >
> > > Update dport, daddr and checksums of each skb of the segment list
> > > in __udp_gso_segment_list. It covers both SNAT and DNAT.
> > >
> > > Fixes: 9fd1ff5d2ac7 (udp: Support UDP fraglist GRO/GSO.)
> > > Signed-off-by: Dongseok Yi 
> > > ---
> > > v1:
> > > Steffen Klassert said, there could be 2 options.
> > > https://lore.kernel.org/patchwork/patch/1362257/
> > > I was trying to write a quick fix, but it was not easy to forward
> > > segmented list. Currently, assuming DNAT only.
> > >
> > > v2:
> > > Per Steffen Klassert request, move the procedure from
> > > udp4_ufo_fragment to __udp_gso_segment_list and support SNAT.
> > >
> > > To Alexander Lobakin, I've checked your email late. Just use this
> > > patch as a reference. It support SNAT too, but does not support IPv6
> > > yet. I cannot make IPv6 header changes in __udp_gso_segment_list due
> > > to the file is in IPv4 directory.
> > >
> > >  include/net/udp.h  |  2 +-
> > >  net/ipv4/udp_offload.c | 62 
> > > ++
> > >  net/ipv6/udp_offload.c |  2 +-
> > >  3 files changed, 59 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/include/net/udp.h b/include/net/udp.h
> > > index 877832b..01351ba 100644
> > > --- a/include/net/udp.h
> > > +++ b/include/net/udp.h
> > > @@ -178,7 +178,7 @@ struct sk_buff *udp_gro_receive(struct list_head 
> > > *head, struct sk_buff *skb,
> > >  int udp_gro_complete(struct sk_buff *skb, int nhoff, udp_lookup_t 
> > > lookup);
> > >
> > >  struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
> > > -   netdev_features_t features);
> > > +   netdev_features_t features, bool is_ipv6);
> > >
> > >  static inline struct udphdr *udp_gro_udphdr(struct sk_buff *skb)
> > >  {
> > > diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
> > > index ff39e94..c532d3b 100644
> > > --- a/net/ipv4/udp_offload.c
> > > +++ b/net/ipv4/udp_offload.c
> > > @@ -187,8 +187,57 @@ struct sk_buff *skb_udp_tunnel_segment(struct 
> > > sk_buff *skb,
> > >  }
> > >  EXPORT_SYMBOL(skb_udp_tunnel_segment);
> > >
> > > +static void __udpv4_gso_segment_csum(struct sk_buff *seg,
> > > +  __be32 *oldip, __be32 *newip,
> > > +  __be16 *oldport, __be16 *newport)
> > > +{
> > > + struct udphdr *uh = udp_hdr(seg);
> > > + struct iphdr *iph = ip_hdr(seg);
> > > +
> > > + if (uh->check) {
> > > + inet_proto_csum_replace4(>check, seg, *oldip, *newip,
> > > +  true);
> > > + inet_proto_csum_replace2(>check, seg, *oldport, *newport,
> > > +  false);
> > > + if (!uh->check)
> > > + uh->check = CSUM_MANGLED_0;
> > > + }
> > > + uh->dest = *newport;
> > > +
> > > + csum_replace4(>check, *oldip, *newip);
> > > + iph->daddr = *newip;
> > > +}
> > 
> > Can't we just do the checksum recalculation for this case as it is done
> > with standard GRO?
> 
> If I understand standard GRO correctly, it calculates full checksum.
> Should we adopt the same method to UDP GRO fraglist? I did not find
> a simple method for the incremental checksum update.
> 
> > 
> > > +
> > > +static struct sk_buff *__udpv4_gso_segment_list_csum(struct sk_buff 
> > > *segs)
> > > +{
> > > + struct sk_buff *seg;
> > > + struct udphdr *uh, *uh2;
> > > + struct iphdr *iph, *iph2;
> > > +
> > > + seg = segs;
> >

Re: [PATCH net v2] udp: ipv4: manipulate network header of NATed UDP GRO fraglist

2021-01-18 Thread Steffen Klassert

On Fri, Jan 15, 2021 at 10:20:35PM +0900, Dongseok Yi wrote:
> UDP/IP header of UDP GROed frag_skbs are not updated even after NAT
> forwarding. Only the header of head_skb from ip_finish_output_gso ->
> skb_gso_segment is updated but following frag_skbs are not updated.
> 
> A call path skb_mac_gso_segment -> inet_gso_segment ->
> udp4_ufo_fragment -> __udp_gso_segment -> __udp_gso_segment_list
> does not try to update UDP/IP header of the segment list but copy
> only the MAC header.
> 
> Update dport, daddr and checksums of each skb of the segment list
> in __udp_gso_segment_list. It covers both SNAT and DNAT.
> 
> Fixes: 9fd1ff5d2ac7 (udp: Support UDP fraglist GRO/GSO.)
> Signed-off-by: Dongseok Yi 
> ---
> v1:
> Steffen Klassert said, there could be 2 options.
> https://lore.kernel.org/patchwork/patch/1362257/
> I was trying to write a quick fix, but it was not easy to forward
> segmented list. Currently, assuming DNAT only.
> 
> v2:
> Per Steffen Klassert request, move the procedure from
> udp4_ufo_fragment to __udp_gso_segment_list and support SNAT.
> 
> To Alexander Lobakin, I've checked your email late. Just use this
> patch as a reference. It support SNAT too, but does not support IPv6
> yet. I cannot make IPv6 header changes in __udp_gso_segment_list due
> to the file is in IPv4 directory.
> 
>  include/net/udp.h  |  2 +-
>  net/ipv4/udp_offload.c | 62 
> ++
>  net/ipv6/udp_offload.c |  2 +-
>  3 files changed, 59 insertions(+), 7 deletions(-)
> 
> diff --git a/include/net/udp.h b/include/net/udp.h
> index 877832b..01351ba 100644
> --- a/include/net/udp.h
> +++ b/include/net/udp.h
> @@ -178,7 +178,7 @@ struct sk_buff *udp_gro_receive(struct list_head *head, 
> struct sk_buff *skb,
>  int udp_gro_complete(struct sk_buff *skb, int nhoff, udp_lookup_t lookup);
>  
>  struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
> -   netdev_features_t features);
> +   netdev_features_t features, bool is_ipv6);
>  
>  static inline struct udphdr *udp_gro_udphdr(struct sk_buff *skb)
>  {
> diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
> index ff39e94..c532d3b 100644
> --- a/net/ipv4/udp_offload.c
> +++ b/net/ipv4/udp_offload.c
> @@ -187,8 +187,57 @@ struct sk_buff *skb_udp_tunnel_segment(struct sk_buff 
> *skb,
>  }
>  EXPORT_SYMBOL(skb_udp_tunnel_segment);
>  
> +static void __udpv4_gso_segment_csum(struct sk_buff *seg,
> +  __be32 *oldip, __be32 *newip,
> +  __be16 *oldport, __be16 *newport)
> +{
> + struct udphdr *uh = udp_hdr(seg);
> + struct iphdr *iph = ip_hdr(seg);
> +
> + if (uh->check) {
> + inet_proto_csum_replace4(>check, seg, *oldip, *newip,
> +  true);
> + inet_proto_csum_replace2(>check, seg, *oldport, *newport,
> +  false);
> + if (!uh->check)
> + uh->check = CSUM_MANGLED_0;
> + }
> + uh->dest = *newport;
> +
> + csum_replace4(>check, *oldip, *newip);
> + iph->daddr = *newip;
> +}

Can't we just do the checksum recalculation for this case as it is done
with standard GRO?

> +
> +static struct sk_buff *__udpv4_gso_segment_list_csum(struct sk_buff *segs)
> +{
> + struct sk_buff *seg;
> + struct udphdr *uh, *uh2;
> + struct iphdr *iph, *iph2;
> +
> + seg = segs;
> + uh = udp_hdr(seg);
> + iph = ip_hdr(seg);
> +
> + while ((seg = seg->next)) {
> + uh2 = udp_hdr(seg);
> + iph2 = ip_hdr(seg);
> +
> + if (uh->source != uh2->source || iph->saddr != iph2->saddr)
> + __udpv4_gso_segment_csum(seg,
> +  >saddr, >saddr,
> +  >source, >source);
> +
> + if (uh->dest != uh2->dest || iph->daddr != iph2->daddr)
> + __udpv4_gso_segment_csum(seg,
> +  >daddr, >daddr,
> +  >dest, >dest);
> + }

You don't need to check the addresses and ports of all packets in the
segment list. If the addresses and ports are equal for the first and
second packet in the list, then this also holds for the rest of the
packets.

Re: [PATCH net v2] udp: ipv4: manipulate network header of NATed UDP GRO fraglist

2021-01-18 Thread Steffen Klassert

On Mon, Jan 18, 2021 at 12:17:34PM +, Alexander Lobakin wrote:
> > From: Steffen Klassert 
> > Date: Mon, 18 Jan 2021 07:37:59 +0100
> > On Fri, Jan 15, 2021 at 05:12:33PM +, Alexander Lobakin wrote:
> >>
> >> I used another approach, tried to make fraglist GRO closer to plain
> >> in terms of checksummming, as it is confusing to me why GSO packet
> >> should have CHECKSUM_UNNECESSARY.
> >
> > This is intentional. With fraglist GRO, we don't mangle packets
> > in the standard (non NAT) case. So the checksum is still correct
> > after segmentation. That is one reason why it has good forwarding
> > performance when software segmentation is needed. Checksuming
> > touches the whole packet and has a lot of overhead, so it is
> > heplfull to avoid it whenever possible.
> >
> > We should find a way to do the checksum only when we really
> > need it. I.e. only if the headers of the head skb changed.
> 
> I suggest to do memcmp() between skb_network_header(skb) and
> skb_network_header(skb->frag_list) with the len of
> skb->data - skb_network_header(skb). This way we will detect changes
> in IPv4/IPv6 and UDP headers.

I thought about that too. Bbut with fraglist GRO, the length of
the packets can vary. Unlike standard GRO, there is no requirement
that the packets in the fraglist must be equal in length here. So
we can't compare the full headers. I think we need to test for
addresses and ports.

> If so, copy the full headers and fall back to the standard checksum,
> recalculation, else use the current path.

I agree that we should fallback to standard checksum recalculation
if the addresses or ports changed.

Re: [PATCH net v2] udp: ipv4: manipulate network header of NATed UDP GRO fraglist

2021-01-17 Thread Steffen Klassert

On Fri, Jan 15, 2021 at 05:12:33PM +, Alexander Lobakin wrote:
> From: Dongseok Yi 
> Date: Fri, 15 Jan 2021 22:20:35 +0900
> 
> > UDP/IP header of UDP GROed frag_skbs are not updated even after NAT
> > forwarding. Only the header of head_skb from ip_finish_output_gso ->
> > skb_gso_segment is updated but following frag_skbs are not updated.
> > 
> > A call path skb_mac_gso_segment -> inet_gso_segment ->
> > udp4_ufo_fragment -> __udp_gso_segment -> __udp_gso_segment_list
> > does not try to update UDP/IP header of the segment list but copy
> > only the MAC header.
> > 
> > Update dport, daddr and checksums of each skb of the segment list
> > in __udp_gso_segment_list. It covers both SNAT and DNAT.
> > 
> > Fixes: 9fd1ff5d2ac7 (udp: Support UDP fraglist GRO/GSO.)
> > Signed-off-by: Dongseok Yi 
> > ---
> > v1:
> > Steffen Klassert said, there could be 2 options.
> > https://lore.kernel.org/patchwork/patch/1362257/
> > I was trying to write a quick fix, but it was not easy to forward
> > segmented list. Currently, assuming DNAT only.
> > 
> > v2:
> > Per Steffen Klassert request, move the procedure from
> > udp4_ufo_fragment to __udp_gso_segment_list and support SNAT.
> > 
> > To Alexander Lobakin, I've checked your email late. Just use this
> > patch as a reference. It support SNAT too, but does not support IPv6
> > yet. I cannot make IPv6 header changes in __udp_gso_segment_list due
> > to the file is in IPv4 directory.
> 
> I used another approach, tried to make fraglist GRO closer to plain
> in terms of checksummming, as it is confusing to me why GSO packet
> should have CHECKSUM_UNNECESSARY.

This is intentional. With fraglist GRO, we don't mangle packets
in the standard (non NAT) case. So the checksum is still correct
after segmentation. That is one reason why it has good forwarding
performance when software segmentation is needed. Checksuming
touches the whole packet and has a lot of overhead, so it is
heplfull to avoid it whenever possible.

We should find a way to do the checksum only when we really
need it. I.e. only if the headers of the head skb changed.

Re: [PATCH net] udp: ipv4: manipulate network header of NATed UDP GRO fraglist

2021-01-15 Thread Steffen Klassert

On Fri, Jan 15, 2021 at 05:55:22PM +0900, Dongseok Yi wrote:
> On 2021-01-15 17:12, Steffen Klassert wrote:
> > On Fri, Jan 15, 2021 at 02:58:24PM +0900, Dongseok Yi wrote:
> > > UDP/IP header of UDP GROed frag_skbs are not updated even after NAT
> > > forwarding. Only the header of head_skb from ip_finish_output_gso ->
> > > skb_gso_segment is updated but following frag_skbs are not updated.
> > >
> > > A call path skb_mac_gso_segment -> inet_gso_segment ->
> > > udp4_ufo_fragment -> __udp_gso_segment -> __udp_gso_segment_list
> > > does not try to update UDP/IP header of the segment list.
> > 
> > We still need to find out why it works for Alexander, but not for you.
> > Different usecases?
> 
> This patch is not for
> https://lore.kernel.org/patchwork/patch/1364544/
> Alexander might want to call udp_gro_receive_segment even when
> !sk and ~NETIF_F_GRO_FRAGLIST.

Yes, I know. But he said that fraglist GRO + NAT works for him.
I want to find out why it works for him, but not for you.

> > 
> > I would not like to add this to a generic codepath. I think we can
> > relatively easy copy the full headers in skb_segment_list().
> 
> I tried to copy the full headers with the similar approach, but it
> copies length too. Can we keep the length of each skb of the fraglist?

Ah yes, good point.

Then maybe you can move your approach into __udp_gso_segment_list()
so that we dont touch generic code.

> 
> > 
> > I think about something like the (completely untested) patch below:
> > 
> > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > index f62cae3f75d8..63ae7f79fad7 100644
> > --- a/net/core/skbuff.c
> > +++ b/net/core/skbuff.c
> > @@ -3651,13 +3651,14 @@ struct sk_buff *skb_segment_list(struct sk_buff 
> > *skb,
> >  unsigned int offset)
> >  {
> > struct sk_buff *list_skb = skb_shinfo(skb)->frag_list;
> > +   unsigned int doffset = skb->data - skb_mac_header(skb);
> > unsigned int tnl_hlen = skb_tnl_header_len(skb);
> > unsigned int delta_truesize = 0;
> > unsigned int delta_len = 0;
> > struct sk_buff *tail = NULL;
> > struct sk_buff *nskb;
> > 
> > -   skb_push(skb, -skb_network_offset(skb) + offset);
> > +   skb_push(skb, doffset);
> > 
> > skb_shinfo(skb)->frag_list = NULL;
> > 
> > @@ -3675,7 +3676,7 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb,
> > delta_len += nskb->len;
> > delta_truesize += nskb->truesize;
> > 
> > -   skb_push(nskb, -skb_network_offset(nskb) + offset);
> > +   skb_push(nskb, doffset);
> > 
> > skb_release_head_state(nskb);
> >  __copy_skb_header(nskb, skb);
> > diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
> > index ff39e94781bf..1181398378b8 100644
> > --- a/net/ipv4/udp_offload.c
> > +++ b/net/ipv4/udp_offload.c
> > @@ -190,9 +190,22 @@ EXPORT_SYMBOL(skb_udp_tunnel_segment);
> >  static struct sk_buff *__udp_gso_segment_list(struct sk_buff *skb,
> >   netdev_features_t features)
> >  {
> > +   struct sk_buff *list_skb = skb_shinfo(skb)->frag_list;
> > unsigned int mss = skb_shinfo(skb)->gso_size;
> > +   unsigned int offset;
> > 
> > -   skb = skb_segment_list(skb, features, skb_mac_header_len(skb));
> > +   skb_headers_offset_update(list_skb, skb_headroom(list_skb) - 
> > skb_headroom(skb));
> > +
> > +   /* Check for header changes and copy the full header in that case. */
> > +   if ((udp_hdr(skb)->dest == udp_hdr(list_skb)->dest) &&
> > +   (udp_hdr(skb)->source == udp_hdr(list_skb)->source) &&
> > +   (ip_hdr(skb)->daddr == ip_hdr(list_skb)->daddr) &&
> > +   (ip_hdr(skb)->saddr == ip_hdr(list_skb)->saddr))
> > +   offset = skb_mac_header_len(skb);
> > +   else
> > +   offset = skb->data - skb_mac_header(skb);
> > +
> > +   skb = skb_segment_list(skb, features, offset);
> > if (IS_ERR(skb))
> > return skb;
> > 
> > 
> > After that you can apply the CSUM magic in __udp_gso_segment_list().
> 
> Sorry, I don't know CSUM magic well. Is it used for checksum
> incremental update too?

With that I meant the checksum updating you did in your patch.

Re: [PATCH net] udp: ipv4: manipulate network header of NATed UDP GRO fraglist

2021-01-15 Thread Steffen Klassert

On Fri, Jan 15, 2021 at 02:58:24PM +0900, Dongseok Yi wrote:
> UDP/IP header of UDP GROed frag_skbs are not updated even after NAT
> forwarding. Only the header of head_skb from ip_finish_output_gso ->
> skb_gso_segment is updated but following frag_skbs are not updated.
> 
> A call path skb_mac_gso_segment -> inet_gso_segment ->
> udp4_ufo_fragment -> __udp_gso_segment -> __udp_gso_segment_list
> does not try to update UDP/IP header of the segment list.

We still need to find out why it works for Alexander, but not for you.
Different usecases?

We copy only the MAC header in skb_segment_list(), so I think
this is a valid bug when NAT changed the UDP header.

> 
> Update dport, daddr and checksums of each skb of the segment list
> after __udp_gso_segment.
> 
> Fixes: 9fd1ff5d2ac7 (udp: Support UDP fraglist GRO/GSO.)
> Signed-off-by: Dongseok Yi 
> ---
> Steffen Klassert said, there could be 2 options.
> https://lore.kernel.org/patchwork/patch/1362257/
> 
> I was trying to write a quick fix, but it was not easy to forward
> segmented list. Currently, assuming DNAT only. Should we consider
> SNAT too?

If it is broken, then it is broken for both, so yes.

> 
>  net/ipv4/udp_offload.c | 45 +
>  1 file changed, 41 insertions(+), 4 deletions(-)
> 
> diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
> index ff39e94..7e24928 100644
> --- a/net/ipv4/udp_offload.c
> +++ b/net/ipv4/udp_offload.c
> @@ -309,10 +309,12 @@ static struct sk_buff *udp4_ufo_fragment(struct sk_buff 
> *skb,
>netdev_features_t features)
>  {
>   struct sk_buff *segs = ERR_PTR(-EINVAL);
> + struct sk_buff *seg;
>   unsigned int mss;
>   __wsum csum;
> - struct udphdr *uh;
> - struct iphdr *iph;
> + struct udphdr *uh, *uh2;
> + struct iphdr *iph, *iph2;
> + bool is_fraglist = false;
>  
>   if (skb->encapsulation &&
>   (skb_shinfo(skb)->gso_type &
> @@ -327,8 +329,43 @@ static struct sk_buff *udp4_ufo_fragment(struct sk_buff 
> *skb,
>   if (!pskb_may_pull(skb, sizeof(struct udphdr)))
>   goto out;
>  
> - if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4)
> - return __udp_gso_segment(skb, features);
> + if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4) {
> + if (skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST)
> + is_fraglist = true;
> +
> + segs = __udp_gso_segment(skb, features);
> + if (IS_ERR_OR_NULL(segs) || !is_fraglist)
> + return segs;
> +
> + seg = segs;
> + uh = udp_hdr(seg);
> + iph = ip_hdr(seg);
> +
> + while ((seg = seg->next)) {
> + uh2 = udp_hdr(seg);
> + iph2 = ip_hdr(seg);
> +
> + if (uh->dest == uh2->dest && iph->daddr == iph2->daddr)
> + continue;
> +
> + if (uh2->check) {
> + inet_proto_csum_replace4(>check, seg,
> +  iph2->daddr,
> +  iph->daddr, true);
> + inet_proto_csum_replace2(>check, seg,
> +  uh2->dest, uh->dest,
> +  false);
> + if (!uh2->check)
> + uh2->check = CSUM_MANGLED_0;
> + }
> + uh2->dest = uh->dest;
> +
> + csum_replace4(>check, iph2->daddr, iph->daddr);
> + iph2->daddr = iph->daddr;
> + }
> +
> + return segs;
> + }

I would not like to add this to a generic codepath. I think we can
relatively easy copy the full headers in skb_segment_list().

I think about something like the (completely untested) patch below:

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index f62cae3f75d8..63ae7f79fad7 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3651,13 +3651,14 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb,
 unsigned int offset)
 {
struct sk_buff *list_skb = skb_shinfo(skb)->frag_list;
+   unsigned int doffset = skb->data - skb_mac_header(skb);
unsigned int tnl_hlen = skb_tnl_header_len(skb);
unsigned int delta_truesize = 0;
unsigned int delta_len = 0;
struct sk_buff *tail = NULL;

Re: [RFC PATCH net] udp: check sk for UDP GRO fraglist

2021-01-11 Thread Steffen Klassert

On Mon, Jan 11, 2021 at 11:02:42AM +0900, Dongseok Yi wrote:
> On 2021-01-08 22:35, Steffen Klassert wrote:
> > On Fri, Jan 08, 2021 at 09:52:28PM +0900, Dongseok Yi wrote:
> > > It is a workaround patch.
> > >
> > > UDP/IP header of UDP GROed frag_skbs are not updated even after NAT
> > > forwarding. Only the header of head_skb from ip_finish_output_gso ->
> > > skb_gso_segment is updated but following frag_skbs are not updated.
> > >
> > > A call path skb_mac_gso_segment -> inet_gso_segment ->
> > > udp4_ufo_fragment -> __udp_gso_segment -> __udp_gso_segment_list
> > > does not try to update any UDP/IP header of the segment list.
> > >
> > > It might make sense because each skb of frag_skbs is converted to a
> > > list of regular packets. Header update with checksum calculation may
> > > be not needed for UDP GROed frag_skbs.
> > >
> > > But UDP GRO frag_list is started from udp_gro_receive, we don't know
> > > whether the skb will be NAT forwarded at that time. For workaround,
> > > try to get sock always when call udp4_gro_receive -> udp_gro_receive
> > > to check if the skb is for local.
> > >
> > > I'm still not sure if UDP GRO frag_list is really designed for local
> > > session only. Can kernel support NAT forward for UDP GRO frag_list?
> > > What am I missing?
> > 
> > The initial idea when I implemented this was to have a fast
> > forwarding path for UDP. So forwarding is a usecase, but NAT
> > is a problem, indeed. A quick fix could be to segment the
> > skb before it gets NAT forwarded. Alternatively we could
> > check for a header change in __udp_gso_segment_list and
> > update the header of the frag_skbs accordingly in that case.
> 
> Thank you for explaining.
> Can I think of it as a known issue?

No, it was not known before you reported it.

> I think we should have a fix
> because NAT can be triggered by user. Can I check the current status?
> Already planning a patch or a new patch should be written?

We have to do a new patch to fix that issue. If you want do
do so, go ahead.

Re: [RFC PATCH net] udp: check sk for UDP GRO fraglist

2021-01-08 Thread Steffen Klassert

On Fri, Jan 08, 2021 at 09:52:28PM +0900, Dongseok Yi wrote:
> It is a workaround patch.
> 
> UDP/IP header of UDP GROed frag_skbs are not updated even after NAT
> forwarding. Only the header of head_skb from ip_finish_output_gso ->
> skb_gso_segment is updated but following frag_skbs are not updated.
> 
> A call path skb_mac_gso_segment -> inet_gso_segment ->
> udp4_ufo_fragment -> __udp_gso_segment -> __udp_gso_segment_list
> does not try to update any UDP/IP header of the segment list.
> 
> It might make sense because each skb of frag_skbs is converted to a
> list of regular packets. Header update with checksum calculation may
> be not needed for UDP GROed frag_skbs.
> 
> But UDP GRO frag_list is started from udp_gro_receive, we don't know
> whether the skb will be NAT forwarded at that time. For workaround,
> try to get sock always when call udp4_gro_receive -> udp_gro_receive
> to check if the skb is for local.
> 
> I'm still not sure if UDP GRO frag_list is really designed for local
> session only. Can kernel support NAT forward for UDP GRO frag_list?
> What am I missing?

The initial idea when I implemented this was to have a fast
forwarding path for UDP. So forwarding is a usecase, but NAT
is a problem, indeed. A quick fix could be to segment the
skb before it gets NAT forwarded. Alternatively we could
check for a header change in __udp_gso_segment_list and
update the header of the frag_skbs accordingly in that case.

Re: [PATCH] selftests: xfrm: fix test return value override issue in xfrm_policy.sh

2021-01-05 Thread Steffen Klassert

On Wed, Dec 30, 2020 at 05:52:04PM +0800, Po-Hsu Lin wrote:
> When running this xfrm_policy.sh test script, even with some cases
> marked as FAIL, the overall test result will still be PASS:
> 
> $ sudo ./xfrm_policy.sh
> PASS: policy before exception matches
> FAIL: expected ping to .254 to fail (exceptions)
> PASS: direct policy matches (exceptions)
> PASS: policy matches (exceptions)
> FAIL: expected ping to .254 to fail (exceptions and block policies)
> PASS: direct policy matches (exceptions and block policies)
> PASS: policy matches (exceptions and block policies)
> FAIL: expected ping to .254 to fail (exceptions and block policies after 
> hresh changes)
> PASS: direct policy matches (exceptions and block policies after hresh 
> changes)
> PASS: policy matches (exceptions and block policies after hresh changes)
> FAIL: expected ping to .254 to fail (exceptions and block policies after 
> hthresh change in ns3)
> PASS: direct policy matches (exceptions and block policies after hthresh 
> change in ns3)
> PASS: policy matches (exceptions and block policies after hthresh change in 
> ns3)
> FAIL: expected ping to .254 to fail (exceptions and block policies after 
> htresh change to normal)
> PASS: direct policy matches (exceptions and block policies after htresh 
> change to normal)
> PASS: policy matches (exceptions and block policies after htresh change to 
> normal)
> PASS: policies with repeated htresh change
> $ echo $?
> 0
> 
> This is because the $lret in check_xfrm() is not a local variable.
> Therefore when a test failed in check_exceptions(), the non-zero $lret
> will later get reset to 0 when the next test calls check_xfrm().
> 
> With this fix, the final return value will be 1. Make it easier for
> testers to spot this failure.
> 
> Fixes: 39aa6928d462d0 ("xfrm: policy: fix netlink/pf_key policy lookups")
> Signed-off-by: Po-Hsu Lin 

Applied to the ipsec tree, thanks!

Re: linux-next: Signed-off-by missing for commit in the ipsec tree

2020-12-18 Thread Steffen Klassert

On Sat, Dec 19, 2020 at 02:26:09PM +1100, Stephen Rothwell wrote:
> Hi all,
> 
> Commit
> 
>   06148d3b3f2e ("xfrm: Fix oops in xfrm_replay_advance_bmp")
> 
> is missing a Signed-off-by from its committer.

My bad. I did a forced push to fix it.

Re: USGv6 Tunnel Mode Fragmentation Failures

2020-12-16 Thread Steffen Klassert

On Thu, Nov 26, 2020 at 09:21:39AM +, Marler, Jonathan wrote:
> We've found an issue while running the following USGv6 tests where the kernel 
> drops outgoing packets:
> 
> 5.3.11 Tunnel Mode: Fragmentation
> 5.4.11 Tunnel Mode: Fragmentation
> 
> During the test, an esp PING request is sent to our device under test, and we 
> send back a response that is rejected by our router with a "packet to big" 
> error.  This error is fine, it's part of the test.  This error packet then 
> sets the MTU to 1280 (which also happens to be the minimum MTU size allowed 
> by ipv6 and the kernel).
> 
> The issue comes up when this mtu is adjusted by the esp6_get_mtu function.  
> It adjusts it to a value below the 1280 threshold, which causes any packets 
> associated with this MTU to be dropped in the ipv6_setup_cork function.
> 
> We are running on kernel version 4.9.180, but this issue looks as if it would 
> still exist in the latest version except that esp6_get_mtu has been replaced 
> by xfrm_state_mtu.  By adding instrumentation, we found during the test the 
> mtu value of 1280 given by the "packet to big" response gets passed to the 
> esp6_get_mtu function, and the following values are what we logged in that 
> function:
> 
> mtu = 1280
> x->props.header_len = 56
> crypto_aead_authsize(aead) = 12
> net_adj = 0
> blocksize = 8
> 
> This causes the function to return an mtu size of 1206, causing packets 
> thereafter to be dropped because this falls below the IPV6_MIN_MTU threshold.
> 
> Our idea is to modify esp6_get_mtu to limit the minimum mtu to IPV6_MIN_MTU.  
> We're not certain this is the correct fix, and thought to check with the 
> maintainers of that file, and a few others who have modified that function.  
> Any help or guidance here is appreciated, thank you.
> 

We should not return a mtu below IPV6_MIN_MTU for IPv6 and not below
IPV6_MIN_MTU for IPv4. So the proposed fix looks correct to me.

Re: [PATCH] net: xfrm: fix memory leak in xfrm_user_policy()

2020-11-11 Thread Steffen Klassert

On Tue, Nov 10, 2020 at 09:14:43AM +0800, Yu Kuai wrote:
> if xfrm_get_translator() failed, xfrm_user_policy() return without
> freeing 'data', which is allocated in memdup_sockptr().
> 
> Fixes: 96392ee5a13b ("xfrm/compat: Translate 32-bit user_policy from sockptr")
> Reported-by: Hulk Robot 
> Signed-off-by: Yu Kuai 

Patch applied, thanks!

Re: [PATCH v2 0/3] xfrm/compat: syzbot-found fixes

2020-11-09 Thread Steffen Klassert

On Mon, Nov 02, 2020 at 04:14:44PM +, Dmitry Safonov wrote:
> v2: Added "Fixes" tags to the patches.
> 
> WARN_ON() for XFRMA_UNSPEC translation which likely no-one except
> syzkaller uses; properly zerofy tail-padding for 64-bit attribute;
> don't use __GFP_ZERO as the memory is initialized during translation.
> 
> Cc: Steffen Klassert 
> Cc: "David S. Miller" 
> Cc: Jakub Kicinski 
> Cc: Herbert Xu 
> Cc: Hillf Danton 
> Cc: net...@vger.kernel.org
> 
> Thanks,
>  Dmitry
> 
> Dmitry Safonov (3):
>   xfrm/compat: Translate by copying XFRMA_UNSPEC attribute
>   xfrm/compat: memset(0) 64-bit padding at right place
>   xfrm/compat: Don't allocate memory with __GFP_ZERO

Series applied, thanks a lot Dmitry!

Re: [PATCH] xfrm:fragmented ipv4 tunnel packets in inner interface

2020-11-09 Thread Steffen Klassert

On Thu, Nov 05, 2020 at 01:52:01PM +0900, Lorenzo Colitti wrote:
> On Tue, Sep 15, 2020 at 4:30 PM Steffen Klassert
>  wrote:
> > > In esp's tunnel mode,if inner interface is ipv4,outer is ipv4,one big
> > > packet which travels through tunnel will be fragmented with outer
> > > interface's mtu,peer server will remove tunnelled esp header and assemble
> > > them in big packet.After forwarding such packet to next endpoint,it will
> > > be dropped because of exceeding mtu or be returned ICMP(packet-too-big).
> >
> > What is the exact case where packets are dropped? Given that the packet
> > was fragmented (and reassembled), I'd assume the DF bit was not set. So
> > every router along the path is allowed to fragment again if needed.
> 
> In general, isn't it just suboptimal to rely on fragmentation if the
> sender already knows the packet is too big? That's why we have things
> like path MTU discovery (RFC 1191).

When we setup packets that are sent from a local socket, we take
MTU/PMTU info we have into account. So we don't create fragments in
that case.

When forwarding packets it is different. The router that can not
TX the packet because it exceeds the MTU of the sending interface
is responsible to either fragment (if DF is not set), or send a
PMTU notification (if DF is set). So if we are able to transmit
the packet, we do it.

> Fragmentation is generally
> expensive, increases the chance of packet loss, and has historically
> caused lots of security vulnerabilities. Also, in real world networks,
> fragments sometimes just don't work, either because intermediate
> routers don't fragment, or because firewalls drop the fragments due to
> security reasons.
> 
> While it's possible in theory to ask these operators to configure
> their routers to fragment packets, that may not result in the network
> being fixed, due to hardware constraints, security policy or other
> reasons.

We can not really do anything here. If a flow has no DF bit set
on the packets, we can not rely on PMTU information. If we have PMTU
info on the route, then we have it because some other flow (that has
DF bit set on the packets) triggered PMTU discovery. That means that
the PMTU information is reset when this flow (with DF set) stops
sending packets. So the other flow (with DF not set) will send
big packets again.

> Those operators may also be in a position to place
> requirements on devices that have to use their network. If the Linux
> stack does not work as is on these networks, then those devices will
> have to meet those requirements by making out-of-tree changes. It
> would be good to avoid that if there's a better solution (e.g., make
> this configurable via sysctl).

We should not try to workaround broken configurations, there are just
too many possibilities to configure a broken network.

Re: [PATCH 0/3] xfrm/compat: syzbot-found fixes

2020-11-02 Thread Steffen Klassert

On Fri, Oct 30, 2020 at 02:25:57AM +, Dmitry Safonov wrote:
> WARN_ON() for XFRMA_UNSPEC translation which likely no-one except
> syzkaller uses; properly zerofy tail-padding for 64-bit attribute;
> don't use __GFP_ZERO as the memory is initialized during translation.
> 
> Cc: Steffen Klassert 
> Cc: "David S. Miller" 
> Cc: Jakub Kicinski 
> Cc: Herbert Xu 
> Cc: Hillf Danton 
> Cc: net...@vger.kernel.org
> 
> Thanks,
>  Dmitry
> 
> Dmitry Safonov (3):
>   xfrm/compat: Translate by copying XFRMA_UNSPEC attribute
>   xfrm/compat: memset(0) 64-bit padding at right place
>   xfrm/compat: Don't allocate memory with __GFP_ZERO

Can you please add 'Fixes' tags to all the patches.

Thanks!

Re: WARNING in xfrm_alloc_compat

2020-10-28 Thread Steffen Klassert

Same here, Dmitry please look into it.

I guess we can just remove the WARN_ON() that
triggeres here.

On Mon, Oct 26, 2020 at 06:58:28AM -0700, syzbot wrote:
> Hello,
> 
> syzbot found the following issue on:
> 
> HEAD commit:f11901ed Merge tag 'xfs-5.10-merge-7' of git://git.kernel...
> git tree:   upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=17b3556450
> kernel config:  https://syzkaller.appspot.com/x/.config?x=fb79b5c2dc1e69e3
> dashboard link: https://syzkaller.appspot.com/bug?extid=a7e701c8385bd8543074
> compiler:   gcc (GCC) 10.1.0-syz 20200507
> 
> Unfortunately, I don't have any reproducer for this issue yet.
> 
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+a7e701c8385bd8543...@syzkaller.appspotmail.com
> 
> netlink: 404 bytes leftover after parsing attributes in process 
> `syz-executor.4'.
> [ cut here ]
> unsupported nla_type 0
> WARNING: CPU: 0 PID: 9953 at net/xfrm/xfrm_compat.c:279 xfrm_xlate64_attr 
> net/xfrm/xfrm_compat.c:279 [inline]
> WARNING: CPU: 0 PID: 9953 at net/xfrm/xfrm_compat.c:279 xfrm_xlate64 
> net/xfrm/xfrm_compat.c:300 [inline]
> WARNING: CPU: 0 PID: 9953 at net/xfrm/xfrm_compat.c:279 
> xfrm_alloc_compat+0xf39/0x10d0 net/xfrm/xfrm_compat.c:327
> Modules linked in:
> CPU: 0 PID: 9953 Comm: syz-executor.4 Not tainted 5.9.0-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS 
> Google 01/01/2011
> RIP: 0010:xfrm_xlate64_attr net/xfrm/xfrm_compat.c:279 [inline]
> RIP: 0010:xfrm_xlate64 net/xfrm/xfrm_compat.c:300 [inline]
> RIP: 0010:xfrm_alloc_compat+0xf39/0x10d0 net/xfrm/xfrm_compat.c:327
> Code: de e8 4b 68 d3 f9 84 db 0f 85 b0 f8 ff ff e8 2e 70 d3 f9 8b 74 24 08 48 
> c7 c7 40 b9 51 8a c6 05 f7 0d 3c 05 01 e8 b7 db 0e 01 <0f> 0b e9 8d f8 ff ff 
> e8 0b 70 d3 f9 8b 14 24 48 c7 c7 00 b9 51 8a
> RSP: 0018:c9000bb4f4b8 EFLAGS: 00010282
> RAX:  RBX:  RCX: 
> RDX: 0004 RSI: 8158cf25 RDI: f52001769e89
> RBP: 01a0 R08: 0001 R09: 8880b9e2005b
> R10:  R11:  R12: ffa1
> R13: 88802ed1d8f8 R14: 888014403c80 R15: 88801514fc80
> FS:  7f188bbe6700() GS:8880b9f0() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 0074b698 CR3: 1aabe000 CR4: 001506e0
> DR0:  DR1:  DR2: 
> DR3:  DR6: fffe0ff0 DR7: 0400
> Call Trace:
>  xfrm_alloc_userspi+0x66a/0xa30 net/xfrm/xfrm_user.c:1388
>  xfrm_user_rcv_msg+0x42f/0x8b0 net/xfrm/xfrm_user.c:2752
>  netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2494
>  xfrm_netlink_rcv+0x6b/0x90 net/xfrm/xfrm_user.c:2764
>  netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline]
>  netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1330
>  netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1919
>  sock_sendmsg_nosec net/socket.c:651 [inline]
>  sock_sendmsg+0xcf/0x120 net/socket.c:671
>  sys_sendmsg+0x6e8/0x810 net/socket.c:2353
>  ___sys_sendmsg+0xf3/0x170 net/socket.c:2407
>  __sys_sendmsg+0xe5/0x1b0 net/socket.c:2440
>  do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
>  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> RIP: 0033:0x45de59
> Code: 0d b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 
> 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 
> 83 db b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00
> RSP: 002b:7f188bbe5c78 EFLAGS: 0246 ORIG_RAX: 002e
> RAX: ffda RBX: 0002e640 RCX: 0045de59
> RDX:  RSI: 2100 RDI: 0003
> RBP: 0118bf60 R08:  R09: 
> R10:  R11: 0246 R12: 0118bf2c
> R13: 0169fb7f R14: 7f188bbe69c0 R15: 0118bf2c
> 
> 
> ---
> This report is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkal...@googlegroups.com.
> 
> syzbot will keep track of this issue. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

Re: KASAN: slab-out-of-bounds Write in xfrm_attr_cpy32

2020-10-28 Thread Steffen Klassert



Dimitry, you added this code, can you please look into
that?

Thanks!

On Wed, Oct 28, 2020 at 05:00:22PM +0800, Hillf Danton wrote:
> On Fri, 23 Oct 2020 01:38:23 -0700
> > Hello,
> > 
> > syzbot found the following issue on:
> > 
> > HEAD commit:c4d6fe73 Merge tag 'xarray-5.9' of git://git.infradead.org..
> > git tree:   upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=1117ff7850
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=5e8379456358b93c
> > dashboard link: https://syzkaller.appspot.com/bug?extid=c43831072e7df506a646
> > compiler:   gcc (GCC) 10.1.0-syz 20200507
> > userspace arch: i386
> > 
> > Unfortunately, I don't have any reproducer for this issue yet.
> > 
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: syzbot+c43831072e7df506a...@syzkaller.appspotmail.com
> > 
> > ==
> > BUG: KASAN: slab-out-of-bounds in memset include/linux/string.h:384 [inline]
> > BUG: KASAN: slab-out-of-bounds in xfrm_attr_cpy32+0x15a/0x1d0 
> > net/xfrm/xfrm_compat.c:393
> > Write of size 4 at addr 88801c57e6c0 by task syz-executor.0/9498
> > 
> > CPU: 1 PID: 9498 Comm: syz-executor.0 Not tainted 5.9.0-syzkaller #0
> > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
> > rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
> > Call Trace:
> >  __dump_stack lib/dump_stack.c:77 [inline]
> >  dump_stack+0x107/0x163 lib/dump_stack.c:118
> >  print_address_description.constprop.0.cold+0xae/0x497 mm/kasan/report.c:385
> >  __kasan_report mm/kasan/report.c:545 [inline]
> >  kasan_report.cold+0x1f/0x37 mm/kasan/report.c:562
> >  check_memory_region_inline mm/kasan/generic.c:186 [inline]
> >  check_memory_region+0x13d/0x180 mm/kasan/generic.c:192
> >  memset+0x20/0x40 mm/kasan/common.c:84
> >  memset include/linux/string.h:384 [inline]
> >  xfrm_attr_cpy32+0x15a/0x1d0 net/xfrm/xfrm_compat.c:393
> >  xfrm_xlate32_attr net/xfrm/xfrm_compat.c:426 [inline]
> >  xfrm_xlate32 net/xfrm/xfrm_compat.c:525 [inline]
> >  xfrm_user_rcv_msg_compat+0x76b/0x1040 net/xfrm/xfrm_compat.c:570
> >  xfrm_user_rcv_msg+0x55b/0x8b0 net/xfrm/xfrm_user.c:2714
> >  netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2494
> >  xfrm_netlink_rcv+0x6b/0x90 net/xfrm/xfrm_user.c:2764
> >  netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline]
> >  netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1330
> >  netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1919
> >  sock_sendmsg_nosec net/socket.c:651 [inline]
> >  sock_sendmsg+0xcf/0x120 net/socket.c:671
> >  sys_sendmsg+0x6e8/0x810 net/socket.c:2353
> >  ___sys_sendmsg+0xf3/0x170 net/socket.c:2407
> >  __sys_sendmsg+0xe5/0x1b0 net/socket.c:2440
> >  do_syscall_32_irqs_on arch/x86/entry/common.c:78 [inline]
> >  __do_fast_syscall_32+0x56/0x80 arch/x86/entry/common.c:137
> >  do_fast_syscall_32+0x2f/0x70 arch/x86/entry/common.c:160
> >  entry_SYSENTER_compat_after_hwframe+0x4d/0x5c
> > RIP: 0023:0xf7f86549
> > Code: 03 74 c0 01 10 05 03 74 b8 01 10 06 03 74 b4 01 10 07 03 74 b0 01 10 
> > 08 03 74 d8 01 00 00 00 00 00 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 
> > 90 90 90 eb 0d 90 90 90 90 90 90 90 90 90 90 90 90
> > RSP: 002b:f55800bc EFLAGS: 0296 ORIG_RAX: 0172
> > RAX: ffda RBX: 0003 RCX: 21c0
> > RDX:  RSI:  RDI: 
> > RBP:  R08:  R09: 
> > R10:  R11:  R12: 
> > R13:  R14:  R15: 
> > 
> > Allocated by task 9498:
> >  kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48
> >  kasan_set_track mm/kasan/common.c:56 [inline]
> >  __kasan_kmalloc.constprop.0+0xbf/0xd0 mm/kasan/common.c:461
> >  kmalloc_node include/linux/slab.h:577 [inline]
> >  kvmalloc_node+0x61/0xf0 mm/util.c:575
> >  kvmalloc include/linux/mm.h:765 [inline]
> >  xfrm_user_rcv_msg_compat+0x3cd/0x1040 net/xfrm/xfrm_compat.c:566
> >  xfrm_user_rcv_msg+0x55b/0x8b0 net/xfrm/xfrm_user.c:2714
> >  netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2494
> >  xfrm_netlink_rcv+0x6b/0x90 net/xfrm/xfrm_user.c:2764
> >  netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline]
> >  netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1330
> >  netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1919
> >  sock_sendmsg_nosec net/socket.c:651 [inline]
> >  sock_sendmsg+0xcf/0x120 net/socket.c:671
> >  sys_sendmsg+0x6e8/0x810 net/socket.c:2353
> >  ___sys_sendmsg+0xf3/0x170 net/socket.c:2407
> >  __sys_sendmsg+0xe5/0x1b0 net/socket.c:2440
> >  do_syscall_32_irqs_on arch/x86/entry/common.c:78 [inline]
> >  __do_fast_syscall_32+0x56/0x80 arch/x86/entry/common.c:137
> >  do_fast_syscall_32+0x2f/0x70 arch/x86/entry/common.c:160
> >  entry_SYSENTER_compat_after_hwframe+0x4d/0x5c
> > 
> > Last call_rcu():
> >

Re: [PATCH v2] net: xfrm: fix a race condition during allocing spi

2020-10-26 Thread Steffen Klassert

On Thu, Oct 22, 2020 at 06:01:27PM +0800, Zhuoliang Zhang wrote:
> From: zhuoliang zhang 
> 
> we found that the following race condition exists in
> xfrm_alloc_userspi flow:
> 
> user threadstate_hash_work thread
>    
> xfrm_alloc_userspi()
>  __find_acq_core()
>/*alloc new xfrm_state:x*/
>xfrm_state_alloc()
>/*schedule state_hash_work thread*/
>xfrm_hash_grow_check()xfrm_hash_resize()
>  xfrm_alloc_spi  /*hold lock*/
>   x->id.spi = htonl(spi) 
> spin_lock_bh(>xfrm.xfrm_state_lock)
>   /*waiting lock release*/ xfrm_hash_transfer()
>   spin_lock_bh(>xfrm.xfrm_state_lock)  /*add x into 
> hlist:net->xfrm.state_byspi*/
>   
> hlist_add_head_rcu(>byspi)
>  
> spin_unlock_bh(>xfrm.xfrm_state_lock)
> 
> /*add x into hlist:net->xfrm.state_byspi 2 times*/
> hlist_add_head_rcu(>byspi)
> 
> 1. a new state x is alloced in xfrm_state_alloc() and added into the bydst 
> hlist
> in  __find_acq_core() on the LHS;
> 2. on the RHS, state_hash_work thread travels the old bydst and tranfers 
> every xfrm_state
> (include x) into the new bydst hlist and new byspi hlist;
> 3. user thread on the LHS gets the lock and adds x into the new byspi hlist 
> again.
> 
> So the same xfrm_state (x) is added into the same list_hash
> (net->xfrm.state_byspi) 2 times that makes the list_hash become
> an inifite loop.
> 
> To fix the race, x->id.spi = htonl(spi) in the xfrm_alloc_spi() is moved
> to the back of spin_lock_bh, sothat state_hash_work thread no longer add x
> which id.spi is zero into the hash_list.
> 
> Fixes: f034b5d4efdf ("[XFRM]: Dynamic xfrm_state hash table sizing.")
> Signed-off-by: zhuoliang zhang 

Applied, thanks a lot!

One remark, please don't use base64 encoding when you send patches.
I had to hand edit your patch to get it applied.

Re: [PATCH v3 0/7] xfrm: Add compat layer

2020-09-28 Thread Steffen Klassert

On Mon, Sep 21, 2020 at 03:36:50PM +0100, Dmitry Safonov wrote:
> Changes since v2:
> - added struct xfrm_translator as API to register xfrm_compat.ko with
>   xfrm_state.ko. This allows compilation of translator as a loadable
>   module
> - fixed indention and collected reviewed-by (Johannes Berg)
> - moved boilerplate from commit messages into cover-letter (Steffen
>   Klassert)
> - found on KASAN build and fixed non-initialised stack variable usage
>   in the translator
> 
> The resulting v2/v3 diff can be found here:
> https://gist.github.com/0x7f454c46/8f68311dfa1f240959fdbe7c77ed2259
> 
> Patches as a .git branch:
> https://github.com/0x7f454c46/linux/tree/xfrm-compat-v3
> 
> Changes since v1:
> - reworked patches set to use translator
> - separated the compat layer into xfrm_compat.c,
>   compiled under XFRM_USER_COMPAT config
> - 32-bit messages now being sent in frag_list (like wext-core does)
> - instead of __packed add compat_u64 members in compat structures
> - selftest reworked to kselftest lib API
> - added netlink dump testing to the selftest
> 
> XFRM is disabled for compatible users because of the UABI difference.
> The difference is in structures paddings and in the result the size
> of netlink messages differ.
> 
> Possibility for compatible application to manage xfrm tunnels was
> disabled by: the commmit 19d7df69fdb2 ("xfrm: Refuse to insert 32 bit
> userspace socket policies on 64 bit systems") and the commit 74005991b78a
> ("xfrm: Do not parse 32bits compiled xfrm netlink msg on 64bits host").
> 
> This is my second attempt to resolve the xfrm/compat problem by adding
> the 64=>32 and 32=>64 bit translators those non-visibly to a user
> provide translation between compatible user and kernel.
> Previous attempt was to interrupt the message ABI according to a syscall
> by xfrm_user, which resulted in over-complicated code [1].
> 
> Florian Westphal provided the idea of translator and some draft patches
> in the discussion. In these patches, his idea is reused and some of his
> initial code is also present.
> 
> There were a couple of attempts to solve xfrm compat problem:
> https://lkml.org/lkml/2017/1/20/733
> https://patchwork.ozlabs.org/patch/44600/
> http://netdev.vger.kernel.narkive.com/2Gesykj6/patch-net-next-xfrm-correctly-parse-netlink-msg-from-32bits-ip-command-on-64bits-host
> 
> All the discussions end in the conclusion that xfrm should have a full
> compatible layer to correctly work with 32-bit applications on 64-bit
> kernels:
> https://lkml.org/lkml/2017/1/23/413
> https://patchwork.ozlabs.org/patch/433279/
> 
> In some recent lkml discussion, Linus said that it's worth to fix this
> problem and not giving people an excuse to stay on 32-bit kernel:
> https://lkml.org/lkml/2018/2/13/752
> 
> There is also an selftest for ipsec tunnels.
> It doesn't depend on any library and compat version can be easy
> build with: make CFLAGS=-m32 net/ipsec
> 
> [1]: https://lkml.kernel.org/r/20180726023144.31066-1-d...@arista.com
> 
> Cc: "David S. Miller" 
> Cc: Florian Westphal 
> Cc: Herbert Xu 
> Cc: Jakub Kicinski 
> Cc: Johannes Berg 
> Cc: Steffen Klassert 
> Cc: Stephen Suryaputra 
> Cc: Dmitry Safonov <0x7f454...@gmail.com>
> Cc: net...@vger.kernel.org
> 
> Dmitry Safonov (7):
>   xfrm: Provide API to register translator module
>   xfrm/compat: Add 64=>32-bit messages translator
>   xfrm/compat: Attach xfrm dumps to 64=>32 bit translator
>   netlink/compat: Append NLMSG_DONE/extack to frag_list
>   xfrm/compat: Add 32=>64-bit messages translator
>   xfrm/compat: Translate 32-bit user_policy from sockptr
>   selftest/net/xfrm: Add test for ipsec tunnel

Series applied to ipsec-next. Thanks a lot for your work Dmitry!

Re: [PATCH] xfrm: Use correct address family in xfrm_state_find

2020-09-27 Thread Steffen Klassert

On Fri, Sep 25, 2020 at 02:42:56PM +1000, Herbert Xu wrote:
> Resend with proper subject.
>  
> ---8<---
> The struct flowi must never be interpreted by itself as its size
> depends on the address family.  Therefore it must always be grouped
> with its original family value.
> 
> In this particular instance, the original family value is lost in
> the function xfrm_state_find.  Therefore we get a bogus read when
> it's coupled with the wrong family which would occur with inter-
> family xfrm states.
> 
> This patch fixes it by keeping the original family value.
> 
> Note that the same bug could potentially occur in LSM through
> the xfrm_state_pol_flow_match hook.  I checked the current code
> there and it seems to be safe for now as only secid is used which
> is part of struct flowi_common.  But that API should be changed
> so that so that we don't get new bugs in the future.  We could
> do that by replacing fl with just secid or adding a family field.
> 
> Reported-by: syzbot+577fbac3145a6eb2e...@syzkaller.appspotmail.com
> Fixes: 48b8d78315bf ("[XFRM]: State selection update to use inner...")
> Signed-off-by: Herbert Xu 

Applied, thanks a lot Herbert!

Re: KASAN: stack-out-of-bounds Read in xfrm_selector_match (2)

2020-09-24 Thread Steffen Klassert

On Thu, Sep 24, 2020 at 05:43:51PM +1000, Herbert Xu wrote:
> On Thu, Sep 24, 2020 at 09:40:26AM +0200, Steffen Klassert wrote:
> >
> > This is yet another ipv4 mapped ipv6 address with IPsec socket policy
> > combination bug, and I'm sure it is not the last one. We could fix this
> > one by adding another check to match the address family of the policy
> > and the SA selector, but maybe it is better to think about how this
> > should work at all.
> > 
> > We can have only one socket policy for each direction and that
> > policy accepts either ipv4 or ipv6. We treat this ipv4 mapped ipv6
> > address as ipv4 and pass it down the ipv4 stack, so this dual usage
> > will not work with a socket policy. Maybe we can require IPV6_V6ONLY
> > for sockets with policy attached. Thoughts?
> 
> I'm looking at the history of this and it used to work at the start
> because you'd always interpret the flow object with a family.  This
> appears to have been lost with 8444cf712c5f71845cba9dc30d8f530ff0d5ff83. 

I'm sure it can be fixed to work with either ipv4 or ipv6.
If I understand that right, it should be possible to talk
ipv4 and ipv6 through that socket, but the policy will
accept only one address family.

> I'm working on a fix.

Thanks!

Re: KASAN: stack-out-of-bounds Read in xfrm_selector_match (2)

2020-09-24 Thread Steffen Klassert

On Mon, Sep 21, 2020 at 07:56:20AM -0700, syzbot wrote:
> Hello,
> 
> syzbot found the following issue on:
> 
> HEAD commit:eb5f95f1 Merge tag 's390-5.9-6' of git://git.kernel.org/pu..
> git tree:   upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=13996ad590
> kernel config:  https://syzkaller.appspot.com/x/.config?x=ffe85b197a57c180
> dashboard link: https://syzkaller.appspot.com/bug?extid=577fbac3145a6eb2e7a5
> compiler:   gcc (GCC) 10.1.0-syz 20200507
> 
> Unfortunately, I don't have any reproducer for this issue yet.
> 
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+577fbac3145a6eb2e...@syzkaller.appspotmail.com
> 
> ==
> BUG: KASAN: stack-out-of-bounds in xfrm_flowi_dport include/net/xfrm.h:877 
> [inline]
> BUG: KASAN: stack-out-of-bounds in __xfrm6_selector_match 
> net/xfrm/xfrm_policy.c:216 [inline]
> BUG: KASAN: stack-out-of-bounds in xfrm_selector_match+0xf36/0xf60 
> net/xfrm/xfrm_policy.c:229
> Read of size 2 at addr c9001914f55c by task syz-executor.4/15633

This is yet another ipv4 mapped ipv6 address with IPsec socket policy
combination bug, and I'm sure it is not the last one. We could fix this
one by adding another check to match the address family of the policy
and the SA selector, but maybe it is better to think about how this
should work at all.

We can have only one socket policy for each direction and that
policy accepts either ipv4 or ipv6. We treat this ipv4 mapped ipv6
address as ipv4 and pass it down the ipv4 stack, so this dual usage
will not work with a socket policy. Maybe we can require IPV6_V6ONLY
for sockets with policy attached. Thoughts?

Re: [PATCH] xfrm:fragmented ipv4 tunnel packets in inner interface

2020-09-15 Thread Steffen Klassert

On Wed, Sep 09, 2020 at 02:26:13PM +0800, mtk81216 wrote:
> In esp's tunnel mode,if inner interface is ipv4,outer is ipv4,one big 
> packet which travels through tunnel will be fragmented with outer 
> interface's mtu,peer server will remove tunnelled esp header and assemble
> them in big packet.After forwarding such packet to next endpoint,it will 
> be dropped because of exceeding mtu or be returned ICMP(packet-too-big).

What is the exact case where packets are dropped? Given that the packet
was fragmented (and reassembled), I'd assume the DF bit was not set. So
every router along the path is allowed to fragment again if needed.

> When inner interface is ipv4,outer is ipv6,the flag of xfrm state in tunnel
> mode is af-unspec, thing is different.One big packet through tunnel will be
> fragmented with outer interface's mtu minus tunneled header, then two or 
> more less fragmented packets will be tunneled and transmitted in outer 
> interface,that is what xfrm6_output has done. If peer server receives such
> packets, it will forward successfully to next because length is valid.
> 
> This patch has followed up xfrm6_output's logic,which includes two changes,
> one is choosing suitable mtu value which considering innner/outer 
> interface's mtu and dst path, the other is if packet is too big, calling 
> ip_fragment first,then tunnelling fragmented packets in outer interface and
> transmitting finally.
> 
> Signed-off-by: mtk81216 

Please use your real name to sign off.

Re: KASAN: use-after-free Read in __xfrm6_tunnel_spi_lookup

2020-09-11 Thread Steffen Klassert

On Thu, Sep 10, 2020 at 10:09:50AM +0200, Dmitry Vyukov wrote:
> On Thu, Sep 10, 2020 at 10:08 AM B K Karthik  wrote:
> >
> > On Thu, Sep 10, 2020 at 1:32 PM Dmitry Vyukov  wrote:
> > >
> > > On Thu, Sep 10, 2020 at 9:20 AM Anant Thazhemadam
> > >  wrote:
> > > > Looks like this bug is no longer valid. I'm not sure which commit seems 
> > > > to have fixed it. Can this be marked as invalid or closed yet?
> > >
> > > You can see on the dashboard (or in mailing list archives) that B K
> > > Karthik tested a patch for this bug in July:
> > > https://syzkaller.appspot.com/bug?extid=72ff2fa98097767b5a27
> > >
> > > So perhaps that patch fixes it? Karthik, did you send it? Was it
> > > merged? Did the commit include the syzbot Reported-by tag?
> > >
> >
> > I did send it. I was taking a u32 spi value and casting it to a
> > pointer to an IP address. Steffen Klassert
> >  pointed out to me that the approach i
> > was looking at was completely wrong.
> > https://lkml.org/lkml/2020/7/27/361 is the conversation. hope this
> > helps.
> 
> +Steffen, was there any other fix merged for this?

I think that was already fixed before the sysbot report came in by
commit 8b404f46dd6a ("xfrm: interface: not xfrmi_ipv6/ipip_handler twice")

Re: [PATCH v2 1/6] xfrm/compat: Add 64=>32-bit messages translator

2020-09-07 Thread Steffen Klassert

On Wed, Aug 26, 2020 at 02:49:44AM +0100, Dmitry Safonov wrote:
> XFRM is disabled for compatible users because of the UABI difference.
> The difference is in structures paddings and in the result the size
> of netlink messages differ.
> 
> Possibility for compatible application to manage xfrm tunnels was
> disabled by: the commmit 19d7df69fdb2 ("xfrm: Refuse to insert 32 bit
> userspace socket policies on 64 bit systems") and the commit 74005991b78a
> ("xfrm: Do not parse 32bits compiled xfrm netlink msg on 64bits host").
> 
> This is my second attempt to resolve the xfrm/compat problem by adding
> the 64=>32 and 32=>64 bit translators those non-visibly to a user
> provide translation between compatible user and kernel.
> Previous attempt was to interrupt the message ABI according to a syscall
> by xfrm_user, which resulted in over-complicated code [1].
> 
> Florian Westphal provided the idea of translator and some draft patches
> in the discussion. In these patches, his idea is reused and some of his
> initial code is also present.

One comment on this. Looks like the above is the same in all
commit messages. Please provide that generic information
with the patch 0/n and remove it from the other patches.

Re: [PATCH v2 0/6] xfrm: Add compat layer

2020-09-07 Thread Steffen Klassert

On Wed, Aug 26, 2020 at 02:49:43AM +0100, Dmitry Safonov wrote:
> Changes since v1:
> - reworked patches set to use translator
> - separated the compat layer into xfrm_compat.c,
>   compiled under XFRM_USER_COMPAT config
> - 32-bit messages now being sent in frag_list (like wext-core does)
> - instead of __packed add compat_u64 members in compat structures
> - selftest reworked to kselftest lib API
> - added netlink dump testing to the selftest
> 
> XFRM is disabled for compatible users because of the UABI difference.
> The difference is in structures paddings and in the result the size
> of netlink messages differ.
> 
> Possibility for compatible application to manage xfrm tunnels was
> disabled by: the commmit 19d7df69fdb2 ("xfrm: Refuse to insert 32 bit
> userspace socket policies on 64 bit systems") and the commit 74005991b78a
> ("xfrm: Do not parse 32bits compiled xfrm netlink msg on 64bits host").
> 
> This is my second attempt to resolve the xfrm/compat problem by adding
> the 64=>32 and 32=>64 bit translators those non-visibly to a user
> provide translation between compatible user and kernel.
> Previous attempt was to interrupt the message ABI according to a syscall
> by xfrm_user, which resulted in over-complicated code [1].
> 
> Florian Westphal provided the idea of translator and some draft patches
> in the discussion. In these patches, his idea is reused and some of his
> initial code is also present.
> 
> There were a couple of attempts to solve xfrm compat problem:
> https://lkml.org/lkml/2017/1/20/733
> https://patchwork.ozlabs.org/patch/44600/
> http://netdev.vger.kernel.narkive.com/2Gesykj6/patch-net-next-xfrm-correctly-parse-netlink-msg-from-32bits-ip-command-on-64bits-host
> 
> All the discussions end in the conclusion that xfrm should have a full
> compatible layer to correctly work with 32-bit applications on 64-bit
> kernels:
> https://lkml.org/lkml/2017/1/23/413
> https://patchwork.ozlabs.org/patch/433279/
> 
> In some recent lkml discussion, Linus said that it's worth to fix this
> problem and not giving people an excuse to stay on 32-bit kernel:
> https://lkml.org/lkml/2018/2/13/752
> 
> There is also an selftest for ipsec tunnels.
> It doesn't depend on any library and compat version can be easy
> build with: make CFLAGS=-m32 net/ipsec
> 
> Patches as a .git branch:
> https://github.com/0x7f454c46/linux/tree/xfrm-compat-v2
> 
> [1]: https://lkml.kernel.org/r/20180726023144.31066-1-d...@arista.com

Thanks for the patches, looks good!

Please fix the issue reported from 'kernel test robot' and resend.

Thanks!

Re: [PATCH v2] padata: add another maintainer and another list

2020-08-29 Thread Steffen Klassert

On Thu, Aug 27, 2020 at 09:53:28PM -0400, Daniel Jordan wrote:
> At Steffen's request, I'll help maintain padata for the foreseeable
> future.
> 
> While at it, let's have patches go to lkml too since the code is now
> used outside of crypto.
> 
> Signed-off-by: Daniel Jordan 

Great, thanks a lot Daniel!

Acked-by: Steffen Klassert

Re: [PATCH] padata: add a reviewer

2020-08-27 Thread Steffen Klassert

On Wed, Aug 26, 2020 at 10:59:23AM -0400, Daniel Jordan wrote:
> I volunteer to review padata changes for the foreseeable future.
> 
> Signed-off-by: Daniel Jordan 
> Cc: Herbert Xu 
> Cc: Steffen Klassert 
> Cc: linux-cry...@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> ---
>  MAINTAINERS | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 3b186ade3597..1481d47cfd75 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -13024,6 +13024,7 @@ F:lib/packing.c
>  
>  PADATA PARALLEL EXECUTION MECHANISM
>  M:   Steffen Klassert 
> +R:   Daniel Jordan 

Please also consider to add yourself as one of the maintainers.

Otherwise:

Acked-by: Steffen Klassert 

Thanks!

Re: [PATCH net-next] ip_vti: Fix unused variable warning

2020-08-10 Thread Steffen Klassert

On Fri, Jul 31, 2020 at 02:49:52PM +0800, YueHaibing wrote:
> If CONFIG_INET_XFRM_TUNNEL is set but CONFIG_IPV6 is n,
> 
> net/ipv4/ip_vti.c:493:27: warning: 'vti_ipip6_handler' defined but not used 
> [-Wunused-variable]
> 
> Signed-off-by: YueHaibing 

Now applied to the ipsec tree, thanks!

Re: [PATCH net-next] ip_vti: Fix unused variable warning

2020-08-03 Thread Steffen Klassert

On Mon, Aug 03, 2020 at 03:13:49PM -0700, David Miller wrote:
> From: YueHaibing 
> Date: Fri, 31 Jul 2020 14:49:52 +0800
> 
> > If CONFIG_INET_XFRM_TUNNEL is set but CONFIG_IPV6 is n,
> > 
> > net/ipv4/ip_vti.c:493:27: warning: 'vti_ipip6_handler' defined but not used 
> > [-Wunused-variable]
> > 
> > Signed-off-by: YueHaibing 
> 
> Steffen, please pick this up if you haven't already.

I still have this one in my queue, it came in after
I did the the ipsec-next pull request last week.
Now the 5.8 release was inbetween, so it should go
to the ipsec tree. I'm waiting until I can backmerge
the offending patch into the ipsec tree and apply it
then.

Alternatively to speed things up, you can take it
directly into net-next before you do the pull request
to Linus. In case you prefer that:

Acked-by: Steffen Klassert

Re: [PATCH v2] net: ipv6: fix use-after-free Read in __xfrm6_tunnel_spi_lookup

2020-07-27 Thread Steffen Klassert

On Sat, Jul 25, 2020 at 10:35:12PM -0700, Cong Wang wrote:
> On Sat, Jul 25, 2020 at 8:09 PM B K Karthik  wrote:
> > @@ -103,10 +103,10 @@ static int __xfrm6_tunnel_spi_check(struct net *net, 
> > u32 spi)
> >  {
> > struct xfrm6_tunnel_net *xfrm6_tn = xfrm6_tunnel_pernet(net);
> > struct xfrm6_tunnel_spi *x6spi;
> > -   int index = xfrm6_tunnel_spi_hash_byspi(spi);
> > +   int index = xfrm6_tunnel_spi_hash_byaddr((const xfrm_address_t 
> > *)spi);
> >
> > hlist_for_each_entry(x6spi,
> > -_tn->spi_byspi[index],
> > +_tn->spi_byaddr[index],
> >  list_byspi) {
> > if (x6spi->spi == spi)
> 
> How did you convince yourself this is correct? This lookup is still
> using spi. :)

This can not be correct, it takes a u32 spi value and casts it to a
pointer to an IP address.

Re: [PATCH] net: xfrm: xfrm_policy.c: remove some unnecessary cases in decode_session6

2020-07-27 Thread Steffen Klassert

On Sat, Jul 25, 2020 at 07:19:49PM +0530, B K Karthik wrote:
> remove some unnecessary cases in decode_session6
> 
> Signed-off-by: B K Karthik 
> ---
>  net/xfrm/xfrm_policy.c | 4 
>  1 file changed, 4 deletions(-)
> 
> diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
> index 19c5e0fa3f44..e1c988a89382 100644
> --- a/net/xfrm/xfrm_policy.c
> +++ b/net/xfrm/xfrm_policy.c
> @@ -3449,10 +3449,6 @@ decode_session6(struct sk_buff *skb, struct flowi *fl, 
> bool reverse)
>   fl6->flowi6_proto = nexthdr;
>   return;
>  #endif
> - /* XXX Why are there these headers? */
> - case IPPROTO_AH:
> - case IPPROTO_ESP:
> - case IPPROTO_COMP:
>   default:
>   fl6->fl6_ipsec_spi = 0;
>   fl6->flowi6_proto = nexthdr;

IPv4 implements spi parsing for these protocols, IPv6
does not do it. Before you just remove something, you
should think about which is the correct behaviour and
then do it for both, IPv4 and IPv6.

Re: [PATCH v2] af_key: pfkey_dump needs parameter validation

2020-07-23 Thread Steffen Klassert

On Wed, Jul 22, 2020 at 04:00:53AM -0700, Mark Salyzyn wrote:
> In pfkey_dump() dplen and splen can both be specified to access the
> xfrm_address_t structure out of bounds in__xfrm_state_filter_match()
> when it calls addr_match() with the indexes.  Return EINVAL if either
> are out of range.
> 
> Signed-off-by: Mark Salyzyn 
> Cc: net...@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Cc: kernel-t...@android.com
> Cc: Steffen Klassert 
> Cc: Herbert Xu 
> Cc: "David S. Miller" 
> Cc: Jakub Kicinski 
> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")

Applied, thanks a lot!

Re: af_key: pfkey_dump needs parameter validation

2020-07-22 Thread Steffen Klassert

On Wed, Jul 22, 2020 at 03:20:59AM -0700, Mark Salyzyn wrote:
> On 7/22/20 2:33 AM, Steffen Klassert wrote:
> > On Tue, Jul 21, 2020 at 06:23:54AM -0700, Mark Salyzyn wrote:
> > > In pfkey_dump() dplen and splen can both be specified to access the
> > > xfrm_address_t structure out of bounds in__xfrm_state_filter_match()
> > > when it calls addr_match() with the indexes.  Return EINVAL if either
> > > are out of range.
> > > 
> > > Signed-off-by: Mark Salyzyn 
> > > Cc: net...@vger.kernel.org
> > > Cc: linux-kernel@vger.kernel.org
> > > Cc: kernel-t...@android.com
> > > ---
> > > Should be back ported to the stable queues because this is a out of
> > > bounds access.
> > Please do a v2 and add a proper 'Fixes' tag if this is a fix that
> > needs to be backported.
> > 
> > Thanks!
> 
> Confused because this code was never right? From 2008 there was a rewrite
> that instantiated this fragment of code so that it could handle
> continuations for overloaded receive queues, but it was not right before the
> adjustment.
> 
> Fixes: 83321d6b9872b94604e481a79dc2c8acbe4ece31 ("[AF_KEY]: Dump SA/SP
> entries non-atomically")
> 
> that is reaching back more than 12 years and the blame is poorly aimed
> AFAIK.

This is just that the stable team knows how far they need to backport
it. If this was never right, then the initial git commit is the right
one for the fixes tag e.g. 'Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")'

Re: af_key: pfkey_dump needs parameter validation

2020-07-22 Thread Steffen Klassert

On Tue, Jul 21, 2020 at 06:23:54AM -0700, Mark Salyzyn wrote:
> In pfkey_dump() dplen and splen can both be specified to access the
> xfrm_address_t structure out of bounds in__xfrm_state_filter_match()
> when it calls addr_match() with the indexes.  Return EINVAL if either
> are out of range.
> 
> Signed-off-by: Mark Salyzyn 
> Cc: net...@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Cc: kernel-t...@android.com
> ---
> Should be back ported to the stable queues because this is a out of
> bounds access.

Please do a v2 and add a proper 'Fixes' tag if this is a fix that
needs to be backported.

Thanks!

Re: mmotm 2020-07-20-19-06 uploaded (net/ipv6/ip6_vti.o)

2020-07-21 Thread Steffen Klassert

On Mon, Jul 20, 2020 at 11:09:34PM -0700, Randy Dunlap wrote:
> On 7/20/20 7:07 PM, Andrew Morton wrote:
> > The mm-of-the-moment snapshot 2020-07-20-19-06 has been uploaded to
> > 
> >http://www.ozlabs.org/~akpm/mmotm/
> > 
> > mmotm-readme.txt says
> > 
> > README for mm-of-the-moment:
> > 
> > http://www.ozlabs.org/~akpm/mmotm/
> > 
> > This is a snapshot of my -mm patch queue.  Uploaded at random hopefully
> > more than once a week.
> > 
> > You will need quilt to apply these patches to the latest Linus release (5.x
> > or 5.x-rcY).  The series file is in broken-out.tar.gz and is duplicated in
> > http://ozlabs.org/~akpm/mmotm/series
> > 
> 
> 
> on i386:
> 
> ld: net/ipv6/ip6_vti.o: in function `vti6_rcv_tunnel':
> ip6_vti.c:(.text+0x2d11): undefined reference to `xfrm6_tunnel_spi_lookup'

Thanks for the report!

I've applied a fix to ipsec-next just a minute ago.

Re: KASAN: slab-out-of-bounds Read in __xfrm6_tunnel_spi_lookup

2020-07-16 Thread Steffen Klassert

On Wed, Jul 15, 2020 at 05:18:36PM +0800, Xin Long wrote:
> Hi, Steffen,
> 
> I've confirmed the patchset I posted yesterday would fix this:
> 
> [PATCH ipsec-next 0/3] xfrm: not register one xfrm(6)_tunnel object twice

Thanks for the confirmation! That patchset is now applied
to ipsec-next.

Re: KASAN: slab-out-of-bounds Read in __xfrm6_tunnel_spi_lookup

2020-07-14 Thread Steffen Klassert

Xin,

this looks a bit like it was introduced with one of your recent
patches. Can you please look into that?

Thanks!

On Mon, Jul 13, 2020 at 03:04:16PM -0700, syzbot wrote:
> Hello,
> 
> syzbot found the following crash on:
> 
> HEAD commit:be978f8f Add linux-next specific files for 20200713
> git tree:   linux-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=1225f8c710
> kernel config:  https://syzkaller.appspot.com/x/.config?x=3fe4fccb94cbc1a6
> dashboard link: https://syzkaller.appspot.com/bug?extid=ea9832f8ae588deb0205
> compiler:   gcc (GCC) 10.1.0-syz 20200507
> syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=1727071310
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=126c0ffb10
> 
> Bisection is inconclusive: the first bad commit could be any of:
> 
> 08622869 ip6_vti: support IP6IP6 tunnel processing with .cb_handler
> e6ce6457 ip_vti: support IPIP6 tunnel processing
> 2ab110cb ip6_vti: support IP6IP tunnel processing
> 87e66b96 ip_vti: support IPIP tunnel processing with .cb_handler
> 86afc703 tunnel6: add tunnel6_input_afinfo for ipip and ipv6 tunnels
> d5a7a505 ipcomp: assign if_id to child tunnel from parent tunnel
> 6df2db5d tunnel4: add cb_handler to struct xfrm_tunnel
> d7b360c2 xfrm: interface: support IP6IP6 and IP6IP tunnels processing with 
> .cb_handler
> 1475ee0a xfrm: add is_ipip to struct xfrm_input_afinfo
> da9bbf05 xfrm: interface: support IPIP and IPIP6 tunnels processing with 
> .cb_handler
> 2d4c7986 Merge remote-tracking branch 'origin/testing'
> 428d2459 xfrm: introduce oseq-may-wrap flag
> bdf0acad Merge remote-tracking branch 'ipsec-next/master'
> 
> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=137de95d10
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+ea9832f8ae588deb0...@syzkaller.appspotmail.com
> 
> ==
> BUG: KASAN: slab-out-of-bounds in __xfrm6_tunnel_spi_lookup+0x3a9/0x3b0 
> net/ipv6/xfrm6_tunnel.c:79
> Read of size 8 at addr 88809a0d6b80 by task syz-executor016/7061
> CPU: 1 PID: 7061 Comm: syz-executor016 Not tainted 
> 5.8.0-rc4-next-20200713-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS 
> Google 01/01/2011
> Call Trace:
>  
>  __dump_stack lib/dump_stack.c:77 [inline]
>  dump_stack+0x18f/0x20d lib/dump_stack.c:118
>  print_address_description.constprop.0.cold+0xae/0x497 mm/kasan/report.c:383
>  __kasan_report mm/kasan/report.c:513 [inline]
>  kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530
>  __xfrm6_tunnel_spi_lookup+0x3a9/0x3b0 net/ipv6/xfrm6_tunnel.c:79
>  xfrm6_tunnel_spi_lookup+0x8a/0x1d0 net/ipv6/xfrm6_tunnel.c:95
>  xfrmi6_rcv_tunnel+0xb9/0x100 net/xfrm/xfrm_interface.c:810
>  tunnel46_rcv+0xef/0x2b0 net/ipv6/tunnel6.c:193
>  ip6_protocol_deliver_rcu+0x2e8/0x1670 net/ipv6/ip6_input.c:433
>  ip6_input_finish+0x7f/0x160 net/ipv6/ip6_input.c:474
>  NF_HOOK include/linux/netfilter.h:307 [inline]
>  NF_HOOK include/linux/netfilter.h:301 [inline]
>  ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:483
>  ip6_mc_input+0x411/0xea0 net/ipv6/ip6_input.c:577
>  dst_input include/net/dst.h:449 [inline]
>  ip6_rcv_finish net/ipv6/ip6_input.c:76 [inline]
>  NF_HOOK include/linux/netfilter.h:307 [inline]
>  NF_HOOK include/linux/netfilter.h:301 [inline]
>  ipv6_rcv+0x28e/0x3c0 net/ipv6/ip6_input.c:307
>  __netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5287
>  __netif_receive_skb+0x27/0x1c0 net/core/dev.c:5401
>  process_backlog+0x28d/0x7f0 net/core/dev.c:6245
>  napi_poll net/core/dev.c:6690 [inline]
>  net_rx_action+0x4a1/0xe80 net/core/dev.c:6760
>  __do_softirq+0x34c/0xa60 kernel/softirq.c:292
>  asm_call_on_stack+0xf/0x20 arch/x86/entry/entry_64.S:706
>  
>  __run_on_irqstack arch/x86/include/asm/irq_stack.h:22 [inline]
>  run_on_irqstack_cond arch/x86/include/asm/irq_stack.h:48 [inline]
>  do_softirq_own_stack+0x111/0x170 arch/x86/kernel/irq_64.c:77
>  do_softirq kernel/softirq.c:337 [inline]
>  do_softirq+0x16b/0x1e0 kernel/softirq.c:324
>  netif_rx_ni+0x3c5/0x650 net/core/dev.c:4836
>  dev_loopback_xmit+0x204/0x590 net/core/dev.c:3852
>  NF_HOOK include/linux/netfilter.h:307 [inline]
>  NF_HOOK include/linux/netfilter.h:301 [inline]
>  ip6_finish_output2+0x108f/0x17b0 net/ipv6/ip6_output.c:81
>  ip6_fragment+0xbdb/0x2490 net/ipv6/ip6_output.c:920
>  __ip6_finish_output net/ipv6/ip6_output.c:141 [inline]
>  __ip6_finish_output+0x578/0xab0 net/ipv6/ip6_output.c:128
>  ip6_finish_output+0x34/0x1f0 net/ipv6/ip6_output.c:153
>  NF_HOOK_COND include/linux/netfilter.h:296 [inline]
>  ip6_output+0x1db/0x520 net/ipv6/ip6_output.c:176
>  dst_output include/net/dst.h:443 [inline]
>  ip6_local_out+0xaf/0x1a0 net/ipv6/output_core.c:179
>  ip6_send_skb+0xb7/0x340 net/ipv6/ip6_output.c:1865
>  ip6_push_pending_frames+0xbd/0xe0 net/ipv6/ip6_output.c:1885
>  rawv6_push_pending_frames net/ipv6/raw.c:613 [inline]
>

Re: [PATCH net-next v2] xfrm: introduce oseq-may-wrap flag

2020-06-25 Thread Steffen Klassert

On Sat, May 30, 2020 at 02:39:12PM +0200, Petr Vaněk wrote:
> RFC 4303 in section 3.3.3 suggests to disable anti-replay for manually
> distributed ICVs in which case the sender does not need to monitor or
> reset the counter. However, the sender still increments the counter and
> when it reaches the maximum value, the counter rolls over back to zero.
> 
> This patch introduces new extra_flag XFRM_SA_XFLAG_OSEQ_MAY_WRAP which
> allows sequence number to cycle in outbound packets if set. This flag is
> used only in legacy and bmp code, because esn should not be negotiated
> if anti-replay is disabled (see note in 3.3.3 section).
> 
> Signed-off-by: Petr Vaněk 

Now applied to ipsec-next, thanks a lot!

Re: linux-next: manual merge of the ipsec-next tree with Linus' tree

2020-06-16 Thread Steffen Klassert

On Tue, Jun 16, 2020 at 07:39:30AM -0600, David Ahern wrote:
> > 
> > Indeed. I must have been looking at -net. Both -net and -net-next have
> > it conditional, so yes a fixup patch is needed.
> > 
> 
> I see that both net and net-next still have the conditional in xfrm_output:
> 
> #ifdef CONFIG_NETFILTER
> IPCB(skb)->flags |= IPSKB_XFRM_TRANSFORMED;
> #endif
> break;
> case AF_INET6:
> memset(IP6CB(skb), 0, sizeof(*IP6CB(skb)));
> 
> #ifdef CONFIG_NETFILTER
> IP6CB(skb)->flags |= IP6SKB_XFRM_TRANSFORMED;
> #endif
> 
> Did you submit the merge fix? If not, I can do it today.

I still have it in the ipsec tree, I'll do a pull request
this week. The fixup will go to the net tree then. It should
be already in linux-next.

Re: linux-next: manual merge of the ipsec-next tree with Linus' tree

2020-06-05 Thread Steffen Klassert

On Thu, Jun 04, 2020 at 06:44:10AM -0600, David Ahern wrote:
> On 6/4/20 12:41 AM, Steffen Klassert wrote:
> > On Wed, Jun 03, 2020 at 08:55:01PM -0600, David Ahern wrote:
> >> On 6/3/20 7:26 PM, Stephen Rothwell wrote:
> >>>
> >>> And now the net-next tree has been merged into Linus' tree without this 
> >>> fix :-(
> >>>
> >>
> >> I took a look earlier and I think it is fine. Some code was moved around
> >> in ipsec-next and I think the merge is good. I'll run the test cases
> >> later this week and double check. Thanks for the reminder
> > 
> > The setting of XFRM_TRANSFORMED moved to xfrm_output() and depends
> > on CONFIG_NETFILTER. So I think the fix is needed. After the merge
> > of the net tree today, I have both conflicting patches patches in
> > the ipsec tree. I'd apply the fix from Stephen unless you say
> > it is not needed.
> > 
> 
> Indeed. I must have been looking at -net. Both -net and -net-next have
> it conditional, so yes a fixup patch is needed.

The fixup patch from Stephen is now applied to the ipsec tree.

Re: linux-next: manual merge of the ipsec-next tree with Linus' tree

2020-06-04 Thread Steffen Klassert

On Wed, Jun 03, 2020 at 08:55:01PM -0600, David Ahern wrote:
> On 6/3/20 7:26 PM, Stephen Rothwell wrote:
> > 
> > And now the net-next tree has been merged into Linus' tree without this fix 
> > :-(
> > 
> 
> I took a look earlier and I think it is fine. Some code was moved around
> in ipsec-next and I think the merge is good. I'll run the test cases
> later this week and double check. Thanks for the reminder

The setting of XFRM_TRANSFORMED moved to xfrm_output() and depends
on CONFIG_NETFILTER. So I think the fix is needed. After the merge
of the net tree today, I have both conflicting patches patches in
the ipsec tree. I'd apply the fix from Stephen unless you say
it is not needed.

Re: [PATCH v2] xfrm: policy: Fix xfrm policy match

2020-05-19 Thread Steffen Klassert

On Fri, May 15, 2020 at 04:39:57PM +0800, Yuehaibing wrote:
> 
> Friendly ping...
> 
> Any plan for this issue?

There was still no consensus between you and Xin on how
to fix this issue. Once this happens, I consider applying
a fix.

Re: [PATCH] xfrm : lock input tasklet skb queue

2019-10-21 Thread Steffen Klassert

On Sun, Oct 20, 2019 at 08:46:10AM -0700, Tom Rix wrote:
> On PREEMPT_RT_FULL while running netperf, a corruption
> of the skb queue causes an oops.
> 
> This appears to be caused by a race condition here
> __skb_queue_tail(>queue, skb);
> tasklet_schedule(>tasklet);
> Where the queue is changed before the tasklet is locked by
> tasklet_schedule.
> 
> The fix is to use the skb queue lock.
> 
> Signed-off-by: Tom Rix 
> ---
>  net/xfrm/xfrm_input.c | 11 ++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c
> index 9b599ed66d97..226dead86828 100644
> --- a/net/xfrm/xfrm_input.c
> +++ b/net/xfrm/xfrm_input.c
> @@ -758,12 +758,16 @@ static void xfrm_trans_reinject(unsigned long data)
>  struct xfrm_trans_tasklet *trans = (void *)data;
>  struct sk_buff_head queue;
>  struct sk_buff *skb;
> +unsigned long flags;
> 
>  __skb_queue_head_init();
> +spin_lock_irqsave(>queue.lock, flags);
>  skb_queue_splice_init(>queue, );
> 
>  while ((skb = __skb_dequeue()))
>  XFRM_TRANS_SKB_CB(skb)->finish(dev_net(skb->dev), NULL, skb);
> +
> +spin_unlock_irqrestore(>queue.lock, flags);
>  }
> 
>  int xfrm_trans_queue(struct sk_buff *skb,
> @@ -771,15 +775,20 @@ int xfrm_trans_queue(struct sk_buff *skb,
> struct sk_buff *))
>  {
>  struct xfrm_trans_tasklet *trans;
> +unsigned long flags;
> 
>  trans = this_cpu_ptr(_trans_tasklet);
> +spin_lock_irqsave(>queue.lock, flags);

As you can see above 'trans' is per cpu, so a spinlock
is not needed here. Also this does not run in hard
interrupt context, so irqsave is also not needed.
I don't see how this can fix anything.

Can you please explain that race a bit more detailed?

Re: [BUG] net: xfrm: possible null-pointer dereferences in xfrm_policy()

2019-07-29 Thread Steffen Klassert

On Mon, Jul 29, 2019 at 11:43:49AM +0800, Jia-Ju Bai wrote:
> In xfrm_policy(), the while loop on lines 3802-3830 ends when dst->xfrm is
> NULL.

We don't have a xfrm_policy() function, and as said already the
line numbers does not help much as long as you don't say which
tree/branch this is and which commit is the head commit.

> Then, dst->xfrm is used on line 3840:
>     xfrm_state_mtu(dst->xfrm, mtu);
>     if (x->km.state != XFRM_STATE_VALID...)
>     aead = x->data;
> 
> Thus, possible null-pointer dereferences may occur.

I guess you refer to xfrm_bundle_ok(). The dst pointer
is reoaded after the loop, so the dereferenced pointer
is not the one that had NULL at dst->xfrm.

> 
> These bugs are found by a static analysis tool STCheck written by us.
> 
> I do not know how to correctly fix these bugs, so I only report them.

I'd suggest you to manually review the reports of your
tool and to fix the tool accordingly.

Re: [PATCH] net: key: af_key: Fix possible null-pointer dereferences in pfkey_send_policy_notify()

2019-07-27 Thread Steffen Klassert

On Fri, Jul 26, 2019 at 09:15:55PM +0100, Jeremy Sowden wrote:
> On 2019-07-26, at 11:45:14 +0200, Steffen Klassert wrote:
> > On Wed, Jul 24, 2019 at 05:35:09PM +0800, Jia-Ju Bai wrote:
> > >
> > > diff --git a/net/key/af_key.c b/net/key/af_key.c
> > > index b67ed3a8486c..ced54144d5fd 100644
> > > --- a/net/key/af_key.c
> > > +++ b/net/key/af_key.c
> > > @@ -3087,6 +3087,8 @@ static int pfkey_send_policy_notify(struct 
> > > xfrm_policy *xp, int dir, const struc
> > >   case XFRM_MSG_DELPOLICY:
> > >   case XFRM_MSG_NEWPOLICY:
> > >   case XFRM_MSG_UPDPOLICY:
> > > + if (!xp)
> > > + break;
> >
> > I think this can not happen. Who sends one of these notifications
> > without a pointer to the policy?
> 
> I had a quick grep and found two places where km_policy_notify is passed
> NULL as the policy:
> 
>   $ grep -rn '\   net/xfrm/xfrm_user.c:2154:  km_policy_notify(NULL, 0, );
>   net/key/af_key.c:2788:  km_policy_notify(NULL, 0, );
> 
> They occur in xfrm_flush_policy() and pfkey_spdflush() respectively.

Yes, but these two send a XFRM_MSG_FLUSHPOLICY notify.
This does not trigger the code that is changed here.

Re: [PATCH] net: key: af_key: Fix possible null-pointer dereferences in pfkey_send_policy_notify()

2019-07-26 Thread Steffen Klassert

On Wed, Jul 24, 2019 at 05:35:09PM +0800, Jia-Ju Bai wrote:
> In pfkey_send_policy_notify(), there is an if statement on line 3081 to
> check whether xp is NULL:
> if (xp && xp->type != XFRM_POLICY_TYPE_MAIN)
> 
> When xp is NULL, it is used by key_notify_policy() on line 3090:
> key_notify_policy(xp, ...)
> pfkey_xfrm_policy2msg_prep(xp) -- line 2211
> pfkey_xfrm_policy2msg_size(xp) -- line 2046
> for (i=0; ixfrm_nr; i++) -- line 2026
> t = xp->xfrm_vec + i; -- line 2027
> key_notify_policy(xp, ...)
> xp_net(xp) -- line 2231
> return read_pnet(>xp_net); -- line 534

Please don't quote random code lines, explain the
problem instead.

> 
> Thus, possible null-pointer dereferences may occur.
> 
> To fix these bugs, xp is checked before calling key_notify_policy().
> 
> These bugs are found by a static analysis tool STCheck written by us.
> 
> Signed-off-by: Jia-Ju Bai 
> ---
>  net/key/af_key.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/net/key/af_key.c b/net/key/af_key.c
> index b67ed3a8486c..ced54144d5fd 100644
> --- a/net/key/af_key.c
> +++ b/net/key/af_key.c
> @@ -3087,6 +3087,8 @@ static int pfkey_send_policy_notify(struct xfrm_policy 
> *xp, int dir, const struc
>   case XFRM_MSG_DELPOLICY:
>   case XFRM_MSG_NEWPOLICY:
>   case XFRM_MSG_UPDPOLICY:
> + if (!xp)
> + break;

I think this can not happen. Who sends one of these notifications
without a pointer to the policy?

Re: [PATCH] padata: use smp_mb in padata_reorder to avoid orphaned padata jobs

2019-07-12 Thread Steffen Klassert

On Fri, Jul 12, 2019 at 06:06:36PM +0800, Herbert Xu wrote:
> Daniel Jordan  wrote:
> >
> > CPU0 CPU1
> > 
> > padata_reorder   padata_do_serial
> >  LOAD reorder_objects  // 0
> >   INC reorder_objects  // 1
> >   padata_reorder
> > TRYLOCK pd->lock   // failed
> >  UNLOCK pd->lock
> 
> I think this can't happen because CPU1 won't call padata_reorder
> at all as it's the wrong CPU so it gets pushed back onto a work
> queue which will go back to CPU0.
> 
> Steffen, could you please take a look at this as there clearly
> is a problem here?

I'm currently travelling, will have a look at it when I'm back.

Re: [PATCH] [v2] ipsec: select crypto ciphers for xfrm_algo

2019-06-24 Thread Steffen Klassert

On Tue, Jun 18, 2019 at 01:22:13PM +0200, Arnd Bergmann wrote:
> kernelci.org reports failed builds on arc because of what looks
> like an old missed 'select' statement:
> 
> net/xfrm/xfrm_algo.o: In function `xfrm_probe_algs':
> xfrm_algo.c:(.text+0x1e8): undefined reference to `crypto_has_ahash'
> 
> I don't see this in randconfig builds on other architectures, but
> it's fairly clear we want to select the hash code for it, like we
> do for all its other users. As Herbert points out, CRYPTO_BLKCIPHER
> is also required even though it has not popped up in build tests.
> 
> Fixes: 17bc19702221 ("ipsec: Use skcipher and ahash when probing algorithms")
> Signed-off-by: Arnd Bergmann 

Applied, thanks a lot!

Re: [PATCH] af_key: Fix memory leak in key_notify_policy.

2019-06-14 Thread Steffen Klassert

On Fri, Jun 14, 2019 at 04:26:26PM +0800, Young Xiao wrote:
> We leak the allocated out_skb in case pfkey_xfrm_policy2msg() fails.
> Fix this by freeing it on error.
> 
> Signed-off-by: Young Xiao <92siuy...@gmail.com>
> ---
>  net/key/af_key.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/net/key/af_key.c b/net/key/af_key.c
> index 4af1e1d..ec414f6 100644
> --- a/net/key/af_key.c
> +++ b/net/key/af_key.c
> @@ -2443,6 +2443,7 @@ static int key_pol_get_resp(struct sock *sk, struct 
> xfrm_policy *xp, const struc
>   }
>   err = pfkey_xfrm_policy2msg(out_skb, xp, dir);
>   if (err < 0)
> + kfree_skb(out_skb);
>   goto out;

Did you test this?

You need to add braces, otherwise 'goto out' will happen unconditionally.

>  
>   out_hdr = (struct sadb_msg *) out_skb->data;
> @@ -2695,6 +2696,7 @@ static int dump_sp(struct xfrm_policy *xp, int dir, int 
> count, void *ptr)
>  
>   err = pfkey_xfrm_policy2msg(out_skb, xp, dir);
>   if (err < 0)
> + kfree_skb(out_skb);
>   return err;

Same here.

Re: [PATCH][next] xfrm: fix missing break on AF_INET6 case

2019-06-12 Thread Steffen Klassert

On Wed, Jun 12, 2019 at 11:36:24AM +0100, Colin King wrote:
> From: Colin Ian King 
> 
> It appears that there is a missing break statement for the AF_INET6 case
> that falls through to the default WARN_ONCE case. I don't think that is
> intentional. Fix this by adding in the missing break.
> 
> Addresses-Coverity: ("Missing break in switch")
> Fixes: 4c203b0454b5 ("xfrm: remove eth_proto value from xfrm_state_afinfo")
> Signed-off-by: Colin Ian King 

I have already a patch from Florian in queue to fix this.

Thanks anyway!

Re: [PATCH net] xfrm: Fix xfrm sel prefix length validation

2019-05-28 Thread Steffen Klassert

On Wed, May 22, 2019 at 11:17:00AM +0800, Herbert Xu wrote:
> On Tue, May 21, 2019 at 08:59:47PM +0530, Anirudh Gupta wrote:
> > Family of src/dst can be different from family of selector src/dst.
> > Use xfrm selector family to validate address prefix length,
> > while verifying new sa from userspace.
> > 
> > Validated patch with this command:
> > ip xfrm state add src 1.1.6.1 dst 1.1.6.2 proto esp spi 4260196 \
> > reqid 20004 mode tunnel aead "rfc4106(gcm(aes))" \
> > 0x01640001 128 \
> > sel src 1011:1:4::2/128 sel dst 1021:1:4::2/128 dev Port5
> > 
> > Fixes: 07bf7908950a ("xfrm: Validate address prefix lengths in the xfrm 
> > selector.")
> > Signed-off-by: Anirudh Gupta 
> 
> Acked-by: Herbert Xu 

Patch applied, thanks everyone!

Re: [PATCH] xfrm: Reset secpath in xfrm failure

2019-03-06 Thread Steffen Klassert

On Wed, Mar 06, 2019 at 04:33:08PM +0900, Myungho Jung wrote:
> In esp4_gro_receive() and esp6_gro_receive(), secpath can be allocated
> without adding xfrm state to xvec. Then, sp->xvec[sp->len - 1] would
> fail and result in dereferencing invalid pointer in esp4_gso_segment()
> and esp6_gso_segment(). Reset secpath if xfrm function returns error.
> 
> Reported-by: syzbot+b69368fd933c6c592...@syzkaller.appspotmail.com
> Signed-off-by: Myungho Jung 

The patch itself looks ok, but please add a 'Fixes' tag to
the commit message.

Thanks!

Re: [PATCH v2] net: xfrm: Add '_rcu' tag for rcu protected pointer in netns_xfrm

2019-03-06 Thread Steffen Klassert

On Mon, Mar 04, 2019 at 08:19:14PM -0500, Su Yanjun wrote:
> For rcu protected pointers, we'd better add '__rcu' for them.
> 
> Once added '__rcu' tag for rcu protected pointer, the sparse tool reports
> warnings.
> 
> net/xfrm/xfrm_user.c:1198:39: sparse:expected struct sock *sk
> net/xfrm/xfrm_user.c:1198:39: sparse:got struct sock [noderef]  
> *nlsk
> [...]
> 
> So introduce a new wrapper function of nlmsg_unicast  to handle type
> conversions.
> 
> No functional change.

While that was true for v1 of that patch, it is not
true for this version. This fixes a direct access
of a rcu protected socket. So please add a proper
'Fixes' tag.

Re: [PATCH] net: xfrm: Fix potential oops in xfrm_user_rcv_msg and array out of bounds

2019-03-04 Thread Steffen Klassert

On Tue, Mar 05, 2019 at 03:08:49PM +0800, Su Yanjun  
wrote:
> On 2019/3/5 14:49, Herbert Xu wrote:
> 
> > On Sun, Mar 03, 2019 at 10:47:39PM -0500, Su Yanjun wrote:
> > > When i review xfrm_user.c code, i found some potentical bug in it.
> > > 
> > > In xfrm_user_rcvmsg if type parameter from user space is set to
> > > XFRM_MSG_MAX or  XFRM_MSG_NEWSADINFO or XFRM_MSG_NEWSPDINFO. It will cause
> > > xfrm_user_rcv_msg  referring to null entry in xfrm_dispatch array.
> > > 
> > > Signed-off-by: Su Yanjun 
> > > ---
> > >   net/xfrm/xfrm_user.c | 4 +++-
> > >   1 file changed, 3 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
> > > index a131f9f..d832783 100644
> > > --- a/net/xfrm/xfrm_user.c
> > > +++ b/net/xfrm/xfrm_user.c
> > > @@ -2630,11 +2630,13 @@ static int xfrm_user_rcv_msg(struct sk_buff *skb, 
> > > struct nlmsghdr *nlh,
> > >   return -EOPNOTSUPP;
> > >   type = nlh->nlmsg_type;
> > > - if (type > XFRM_MSG_MAX)
> > > + if (type >= XFRM_MSG_MAX)
> > >   return -EINVAL;
> > Your patch is wrong.  Please check the definition of XFRM_MSG_MAX.
> 
> I see, thanks for your reply.
> 
>   type -= XFRM_MSG_BASE;
>   link = _dispatch[type];
> + if (!link)
> + return -EOPNOTSUPP;
> 
> Here **link** may refer to null entry for special types such as
> XFRM_MSG_MAX or  XFRM_MSG_NEWSADINFO or XFRM_MSG_NEWSPDINFO
> Am i miss something?

'link' is always a valid pointer into that array.

Re: [PATCH] xfrm: policy: Fix out-of-bound array accesses in __xfrm_policy_unlink

2019-03-04 Thread Steffen Klassert

On Thu, Feb 28, 2019 at 03:52:27PM +0800, Herbert Xu wrote:
> On Thu, Feb 28, 2019 at 03:18:59PM +0800, Yue Haibing wrote:
> > From: YueHaibing 
> > 
> > UBSAN report this:
> > 
> > UBSAN: Undefined behaviour in net/xfrm/xfrm_policy.c:1289:24
> > index 6 is out of range for type 'unsigned int [6]'
> > CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.4.162-514.55.6.9.x86_64+ #13
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 
> > 04/01/2014
> >   1466cf39b41b23c9 8801f6b07a58 81cb35f4
> >  41b58ab3 83230f9c 81cb34e0 8801f6b07a80
> >  8801f6b07a20 1466cf39b41b23c9 851706e0 8801f6b07ae8
> > Call Trace:
> >[] __dump_stack lib/dump_stack.c:15 [inline]
> >[] dump_stack+0x114/0x1a0 lib/dump_stack.c:51
> >  [] ubsan_epilogue+0x12/0x8f lib/ubsan.c:164
> >  [] __ubsan_handle_out_of_bounds+0x16e/0x1b2 
> > lib/ubsan.c:382
> >  [] __xfrm_policy_unlink+0x3dd/0x5b0 
> > net/xfrm/xfrm_policy.c:1289
> >  [] xfrm_policy_delete+0x52/0xb0 
> > net/xfrm/xfrm_policy.c:1309
> >  [] xfrm_policy_timer+0x30b/0x590 
> > net/xfrm/xfrm_policy.c:243
> >  [] call_timer_fn+0x237/0x990 kernel/time/timer.c:1144
> >  [] __run_timers kernel/time/timer.c:1218 [inline]
> >  [] run_timer_softirq+0x6ce/0xb80 kernel/time/timer.c:1401
> >  [] __do_softirq+0x299/0xe10 kernel/softirq.c:273
> >  [] invoke_softirq kernel/softirq.c:350 [inline]
> >  [] irq_exit+0x216/0x2c0 kernel/softirq.c:391
> >  [] exiting_irq arch/x86/include/asm/apic.h:652 [inline]
> >  [] smp_apic_timer_interrupt+0x8b/0xc0 
> > arch/x86/kernel/apic/apic.c:926
> >  [] apic_timer_interrupt+0xa5/0xb0 
> > arch/x86/entry/entry_64.S:735
> >[] ? native_safe_halt+0x6/0x10 
> > arch/x86/include/asm/irqflags.h:52
> >  [] arch_safe_halt arch/x86/include/asm/paravirt.h:111 
> > [inline]
> >  [] default_idle+0x27/0x430 arch/x86/kernel/process.c:446
> >  [] arch_cpu_idle+0x15/0x20 arch/x86/kernel/process.c:437
> >  [] default_idle_call+0x53/0x90 kernel/sched/idle.c:92
> >  [] cpuidle_idle_call kernel/sched/idle.c:156 [inline]
> >  [] cpu_idle_loop kernel/sched/idle.c:251 [inline]
> >  [] cpu_startup_entry+0x60d/0x9a0 kernel/sched/idle.c:299
> >  [] start_secondary+0x3c9/0x560 
> > arch/x86/kernel/smpboot.c:245
> > 
> > The issue is triggered as this:
> > 
> > xfrm_add_policy
> > -->verify_newpolicy_info  //check the index provided by user with 
> > XFRM_POLICY_MAX
> >   //In my case, the index is 0x6E6BB6, so it pass 
> > the check.
> > -->xfrm_policy_construct  //copy the user's policy and set 
> > xfrm_policy_timer
> > -->xfrm_policy_insert
> > --> __xfrm_policy_link //use the orgin dir, in my case is 2
> > --> xfrm_gen_index   //generate policy index, there is 0x6E6BB6
> > 
> > then xfrm_policy_timer be fired
> > 
> > xfrm_policy_timer
> >--> xfrm_policy_id2dir  //get dir from (policy index & 7), in my case is 
> > 6
> >--> xfrm_policy_delete
> >   --> __xfrm_policy_unlink //access policy_count[dir], trigger out of 
> > range access
> > 
> > Add xfrm_policy_id2dir check in verify_newpolicy_info, make sure the 
> > computed dir is
> > valid, to fix the issue.
> > 
> > Reported-by: Hulk Robot 
> > Fixes: e682adf021be ("xfrm: Try to honor policy index if it's supplied by 
> > user")
> > Signed-off-by: YueHaibing 
> > ---
> >  net/xfrm/xfrm_user.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> Acked-by: Herbert Xu 

Applied, thanks everyone!

Re: [PATCH v2] vti4: Fix a ipip packet processing bug in 'IPCOMP' virtual tunnel

2019-01-10 Thread Steffen Klassert

On Sun, Jan 06, 2019 at 09:31:20PM -0500, Su Yanjun wrote:
> Recently we run a network test over ipcomp virtual tunnel.We find that
> if a ipv4 packet needs fragment, then the peer can't receive
> it.
> 
> We deep into the code and find that when packet need fragment the smaller
> fragment will be encapsulated by ipip not ipcomp. So when the ipip packet
> goes into xfrm, it's skb->dev is not properly set. The ipv4 reassembly code
> always set skb'dev to the last fragment's dev. After ipv4 defrag processing,
> when the kernel rp_filter parameter is set, the skb will be drop by -EXDEV
> error.
> 
> This patch adds compatible support for the ipip process in ipcomp virtual 
> tunnel.
> 
> Signed-off-by: Su Yanjun 

Patch applied, thanks!

Re: [PATCH] vti4: Fix a ipip packet processing bug in 'IPCOMP' virtual tunnel

2019-01-04 Thread Steffen Klassert

On Fri, Jan 04, 2019 at 04:42:05PM +0800, Su Yanjun  
wrote:
> On 1/4/2019 3:43 PM, Steffen Klassert wrote:
> 
> > On Thu, Jan 03, 2019 at 07:48:41AM -0500, Su Yanjun wrote:
> > > Recently we run a network test over ipcomp virtual tunnel.We find that
> > > if a ipv4 packet needs fragment, then the peer can't receive
> > > it.
> > > 
> > > We deep into the code and find that when packet need fragment the smaller
> > > fragment will be encapsulated by ipip not ipcomp. So when the ipip packet
> > > goes into xfrm, it's skb->dev is not properly set. The ipv4 reassembly 
> > > code
> > > always set skb'dev to the last fragment's dev. After ipv4 defrag 
> > > processing,
> > > when the kernel rp_filter parameter is set, the skb will be drop by -EXDEV
> > > error.
> > Why not just leaving rp_filter disabled or in 'loose mode' if you use 
> > ipcomp?
> In my option rp_filter should not affect the ip_vti functionality.
> The root cause is the origin tunnel code doesn't update skb->dev.
> vti only cares about ipcomp, esp, ah packets.
> > > This patch adds compatible support for the ipip process in ipcomp virtual 
> > > tunnel.
> > > 
> > > Signed-off-by: Su Yanjun 
> > > ---
> > >   net/ipv4/ip_vti.c | 25 -
> > >   1 file changed, 24 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/net/ipv4/ip_vti.c b/net/ipv4/ip_vti.c
> > > index de31b30..63de2f6 100644
> > > --- a/net/ipv4/ip_vti.c
> > > +++ b/net/ipv4/ip_vti.c
> > > @@ -65,6 +65,9 @@ static int vti_input(struct sk_buff *skb, int nexthdr, 
> > > __be32 spi,
> > >   XFRM_TUNNEL_SKB_CB(skb)->tunnel.ip4 = tunnel;
> > > + if (iph->protocol == IPPROTO_IPIP)
> > > + skb->dev = tunnel->dev;
> > > +
> > >   return xfrm_input(skb, nexthdr, spi, encap_type);
> > >   }
> > > @@ -76,10 +79,15 @@ static int vti_input(struct sk_buff *skb, int 
> > > nexthdr, __be32 spi,
> > >   static int vti_rcv(struct sk_buff *skb)
> > >   {
> > > + __be32 spi = 0;
> > > + 
> > >   XFRM_SPI_SKB_CB(skb)->family = AF_INET;
> > >   XFRM_SPI_SKB_CB(skb)->daddroff = offsetof(struct iphdr, daddr);
> > > + 
> > > + if (ip_hdr(skb)->protocol == IPPROTO_IPIP)
> > > + spi = ip_hdr(skb)->saddr;
> > > - return vti_input(skb, ip_hdr(skb)->protocol, 0, 0);
> > > + return vti_input(skb, ip_hdr(skb)->protocol, spi, 0);
> > You use the src address as spi, how is this supposed to work?
> > 
> This code derives from xfrm4_tunnel and i just want the vti can handle ipip
> packet as xfrm4 tunnel does.

Ok, I see what it is. ipcomp needs an extra SA to handle the 'not
compressed' packets. This SA uses the src address as the spi.

So this is ok, but please add separate handlers for this case.

Re: [PATCH] vti4: Fix a ipip packet processing bug in 'IPCOMP' virtual tunnel

2019-01-03 Thread Steffen Klassert

On Thu, Jan 03, 2019 at 07:48:41AM -0500, Su Yanjun wrote:
> Recently we run a network test over ipcomp virtual tunnel.We find that
> if a ipv4 packet needs fragment, then the peer can't receive
> it.
> 
> We deep into the code and find that when packet need fragment the smaller
> fragment will be encapsulated by ipip not ipcomp. So when the ipip packet
> goes into xfrm, it's skb->dev is not properly set. The ipv4 reassembly code
> always set skb'dev to the last fragment's dev. After ipv4 defrag processing,
> when the kernel rp_filter parameter is set, the skb will be drop by -EXDEV
> error.

Why not just leaving rp_filter disabled or in 'loose mode' if you use ipcomp?

> 
> This patch adds compatible support for the ipip process in ipcomp virtual 
> tunnel.
> 
> Signed-off-by: Su Yanjun 
> ---
>  net/ipv4/ip_vti.c | 25 -
>  1 file changed, 24 insertions(+), 1 deletion(-)
> 
> diff --git a/net/ipv4/ip_vti.c b/net/ipv4/ip_vti.c
> index de31b30..63de2f6 100644
> --- a/net/ipv4/ip_vti.c
> +++ b/net/ipv4/ip_vti.c
> @@ -65,6 +65,9 @@ static int vti_input(struct sk_buff *skb, int nexthdr, 
> __be32 spi,
>  
>   XFRM_TUNNEL_SKB_CB(skb)->tunnel.ip4 = tunnel;
>  
> + if (iph->protocol == IPPROTO_IPIP)
> + skb->dev = tunnel->dev;
> +
>   return xfrm_input(skb, nexthdr, spi, encap_type);
>   }
>  
> @@ -76,10 +79,15 @@ static int vti_input(struct sk_buff *skb, int nexthdr, 
> __be32 spi,
>  
>  static int vti_rcv(struct sk_buff *skb)
>  {
> + __be32 spi = 0;
> + 
>   XFRM_SPI_SKB_CB(skb)->family = AF_INET;
>   XFRM_SPI_SKB_CB(skb)->daddroff = offsetof(struct iphdr, daddr);
> + 
> + if (ip_hdr(skb)->protocol == IPPROTO_IPIP)
> + spi = ip_hdr(skb)->saddr;
>  
> - return vti_input(skb, ip_hdr(skb)->protocol, 0, 0);
> + return vti_input(skb, ip_hdr(skb)->protocol, spi, 0);

You use the src address as spi, how is this supposed to work?

Re: [PATCH net-next] xfrm6_tunnel: Fix spi check in __xfrm6_tunnel_alloc_spi

2018-12-20 Thread Steffen Klassert

On Wed, Dec 19, 2018 at 02:45:09PM +0800, YueHaibing wrote:
> gcc warn this:
> 
> net/ipv6/xfrm6_tunnel.c:143 __xfrm6_tunnel_alloc_spi() warn:
>  always true condition '(spi <= 4294967295) => (0-u32max <= u32max)'
> 
> 'spi' is u32, which always not greater than XFRM6_TUNNEL_SPI_MAX
> because of wrap around. So the second forloop will never reach.
> 
> Signed-off-by: YueHaibing 

Also applied, thanks a lot!

Re: [PATCH] xfrm: clean an indentation issue, remove a space

2018-12-12 Thread Steffen Klassert

On Thu, Dec 06, 2018 at 05:52:28PM +, Colin King wrote:
> From: Colin Ian King 
> 
> Trivial fix to clean up indentation issue, remove an extraneous
> space.
> 
> Signed-off-by: Colin Ian King 

Applied, thanks!

Re: UBSAN: Undefined behaviour in ./include/net/xfrm.h

2018-07-19 Thread Steffen Klassert

On Fri, Jun 22, 2018 at 11:46:44PM +0800, air icy wrote:
> Hi,
> 
> static inline bool addr4_match(__be32 a1, __be32 a2, u8 prefixlen)
> {
> /* C99 6.5.7 (3): u32 << 32 is undefined behaviour */
> if (sizeof(long) == 4 && prefixlen == 0)
> return true;
> return !((a1 ^ a2) & htonl(~0UL << (32 - prefixlen)));
> }
> 
> 
> $ cat report0
> 
> UBSAN: Undefined behaviour in ./include/net/xfrm.h:894:23
> shift exponent -128 is negative

Looks like we don't validate the prefixlen of the address family
in the xfrm_selector.

> This bug can be repro, if you need the repro file, please tell me.

Can you send me your reproducer?

Re: UBSAN: Undefined behaviour in ./include/net/xfrm.h

2018-07-19 Thread Steffen Klassert

On Fri, Jun 22, 2018 at 11:46:44PM +0800, air icy wrote:
> Hi,
> 
> static inline bool addr4_match(__be32 a1, __be32 a2, u8 prefixlen)
> {
> /* C99 6.5.7 (3): u32 << 32 is undefined behaviour */
> if (sizeof(long) == 4 && prefixlen == 0)
> return true;
> return !((a1 ^ a2) & htonl(~0UL << (32 - prefixlen)));
> }
> 
> 
> $ cat report0
> 
> UBSAN: Undefined behaviour in ./include/net/xfrm.h:894:23
> shift exponent -128 is negative

Looks like we don't validate the prefixlen of the address family
in the xfrm_selector.

> This bug can be repro, if you need the repro file, please tell me.

Can you send me your reproducer?

Re: [PATCHv2] net/xfrm: Revert "[XFRM]: Do not add a state whose SPI is zero to the SPI hash."

2018-05-07 Thread Steffen Klassert

On Fri, May 04, 2018 at 02:20:09AM +0100, Dmitry Safonov wrote:
> This reverts commit 7b4dc3600e48 ("[XFRM]: Do not add a state whose SPI
> is zero to the SPI hash.").
> 
> Zero SPI is legal and defined for IPcomp.
> We shouldn't omit adding the state to SPI hash because it'll not be
> possible to delete or lookup for it afterward:
> __xfrm_state_insert() obviously doesn't add hash for zero
> SPI in xfrm.state_byspi, and xfrm_user_state_lookup() will fail as
> xfrm_state_lookup() does lookups by hash.
> 
> It also isn't possible to workaround from userspace as
> xfrm_id_proto_match() will be always true for ah/esp/comp protos.
> 
> v1 link: https://lkml.kernel.org/r/<20180502020220.2027-1-d...@arista.com>
> 
> Cc: Masahide NAKAMURA <na...@linux-ipv6.org>
> Cc: YOSHIFUJI Hideaki <yoshf...@linux-ipv6.org>
> Cc: Steffen Klassert <steffen.klass...@secunet.com>
> Cc: "David S. Miller" <da...@davemloft.net>
> Cc: net...@vger.kernel.org
> Suggested-by: Herbert Xu <herb...@gondor.apana.org.au>
> Signed-off-by: Dmitry Safonov <d...@arista.com>

This patch does much more than reverting the commit you mentioned.
It removes all the logic that is needed to handle larval SAs.

The result is a disaster, all connections that are negotiated
by an IKE deamon stop working.

On traffic triggered connections the IKE deamon inserts a new SA
for each packet that matches a policy.

Re: [PATCHv2] net/xfrm: Revert "[XFRM]: Do not add a state whose SPI is zero to the SPI hash."

2018-05-07 Thread Steffen Klassert

On Fri, May 04, 2018 at 02:20:09AM +0100, Dmitry Safonov wrote:
> This reverts commit 7b4dc3600e48 ("[XFRM]: Do not add a state whose SPI
> is zero to the SPI hash.").
> 
> Zero SPI is legal and defined for IPcomp.
> We shouldn't omit adding the state to SPI hash because it'll not be
> possible to delete or lookup for it afterward:
> __xfrm_state_insert() obviously doesn't add hash for zero
> SPI in xfrm.state_byspi, and xfrm_user_state_lookup() will fail as
> xfrm_state_lookup() does lookups by hash.
> 
> It also isn't possible to workaround from userspace as
> xfrm_id_proto_match() will be always true for ah/esp/comp protos.
> 
> v1 link: https://lkml.kernel.org/r/<20180502020220.2027-1-d...@arista.com>
> 
> Cc: Masahide NAKAMURA 
> Cc: YOSHIFUJI Hideaki 
> Cc: Steffen Klassert 
> Cc: "David S. Miller" 
> Cc: net...@vger.kernel.org
> Suggested-by: Herbert Xu 
> Signed-off-by: Dmitry Safonov 

This patch does much more than reverting the commit you mentioned.
It removes all the logic that is needed to handle larval SAs.

The result is a disaster, all connections that are negotiated
by an IKE deamon stop working.

On traffic triggered connections the IKE deamon inserts a new SA
for each packet that matches a policy.

Re: [PATCH v4 ipsec-next] xfrm: remove VLA usage in __xfrm6_sort()

2018-04-26 Thread Steffen Klassert

On Wed, Apr 25, 2018 at 04:58:52PM +0200, Stefano Brivio wrote:
> On Wed, 25 Apr 2018 07:46:39 -0700
> Kees Cook  wrote:
> 
> > In the quest to remove all stack VLA usage removed from the kernel[1],
> > just use XFRM_MAX_DEPTH as already done for the "class" array. In one
> > case, it'll do this loop up to 5, the other caller up to 6.
> > 
> > [1] https://lkml.org/lkml/2018/3/7/621
> > 
> > Co-developed-by: Andreas Christoforou 
> > Signed-off-by: Kees Cook 
> > ---
> > v4:
> > - actually remove memset(). :)
> > v3:
> > - adjust Subject and commit log (Steffen)
> > - use "= { }" instead of memset() (Stefano)
> > v2:
> > - use XFRM_MAX_DEPTH for "count" array (Steffen and Mathias).
> > ---
> 
> Acked-by: Stefano Brivio 

Applied to ipsec-next, thanks everyone!

Re: [PATCH v4 ipsec-next] xfrm: remove VLA usage in __xfrm6_sort()

2018-04-26 Thread Steffen Klassert

On Wed, Apr 25, 2018 at 04:58:52PM +0200, Stefano Brivio wrote:
> On Wed, 25 Apr 2018 07:46:39 -0700
> Kees Cook  wrote:
> 
> > In the quest to remove all stack VLA usage removed from the kernel[1],
> > just use XFRM_MAX_DEPTH as already done for the "class" array. In one
> > case, it'll do this loop up to 5, the other caller up to 6.
> > 
> > [1] https://lkml.org/lkml/2018/3/7/621
> > 
> > Co-developed-by: Andreas Christoforou 
> > Signed-off-by: Kees Cook 
> > ---
> > v4:
> > - actually remove memset(). :)
> > v3:
> > - adjust Subject and commit log (Steffen)
> > - use "= { }" instead of memset() (Stefano)
> > v2:
> > - use XFRM_MAX_DEPTH for "count" array (Steffen and Mathias).
> > ---
> 
> Acked-by: Stefano Brivio 

Applied to ipsec-next, thanks everyone!

Re: [PATCH v2 2/2] af_key: Use DIV_ROUND_UP() instead of open-coded equivalent

2018-04-09 Thread Steffen Klassert

On Sat, Apr 07, 2018 at 11:40:47AM -0400, Kevin Easton wrote:
> Several places use (x + 7) / 8 to convert from a number of bits to a number
> of bytes.  Replace those with DIV_ROUND_UP(x, 8) instead, for consistency
> with other parts of the same file.
> 
> Signed-off-by: Kevin Easton 

Please resubmit this one to ipsec-next after the
merge window. Thanks!

Re: [PATCH v2 2/2] af_key: Use DIV_ROUND_UP() instead of open-coded equivalent

2018-04-09 Thread Steffen Klassert

On Sat, Apr 07, 2018 at 11:40:47AM -0400, Kevin Easton wrote:
> Several places use (x + 7) / 8 to convert from a number of bits to a number
> of bytes.  Replace those with DIV_ROUND_UP(x, 8) instead, for consistency
> with other parts of the same file.
> 
> Signed-off-by: Kevin Easton 

Please resubmit this one to ipsec-next after the
merge window. Thanks!

Re: [PATCH v2 1/2] af_key: Always verify length of provided sadb_key

2018-04-09 Thread Steffen Klassert

On Sat, Apr 07, 2018 at 11:40:33AM -0400, Kevin Easton wrote:
> Key extensions (struct sadb_key) include a user-specified number of key
> bits.  The kernel uses that number to determine how much key data to copy
> out of the message in pfkey_msg2xfrm_state().
> 
> The length of the sadb_key message must be verified to be long enough,
> even in the case of SADB_X_AALG_NULL.  Furthermore, the sadb_key_len value
> must be long enough to include both the key data and the struct sadb_key
> itself.
> 
> Introduce a helper function verify_key_len(), and call it from
> parse_exthdrs() where other exthdr types are similarly checked for
> correctness.
> 
> Signed-off-by: Kevin Easton 
> Reported-by: 
> syzbot+5022a34ca5a3d49b84223653fab632dfb7b4c...@syzkaller.appspotmail.com

Applied to the ipsec tree, thanks Kevin!

Re: [PATCH v2 1/2] af_key: Always verify length of provided sadb_key

2018-04-09 Thread Steffen Klassert

On Sat, Apr 07, 2018 at 11:40:33AM -0400, Kevin Easton wrote:
> Key extensions (struct sadb_key) include a user-specified number of key
> bits.  The kernel uses that number to determine how much key data to copy
> out of the message in pfkey_msg2xfrm_state().
> 
> The length of the sadb_key message must be verified to be long enough,
> even in the case of SADB_X_AALG_NULL.  Furthermore, the sadb_key_len value
> must be long enough to include both the key data and the struct sadb_key
> itself.
> 
> Introduce a helper function verify_key_len(), and call it from
> parse_exthdrs() where other exthdr types are similarly checked for
> correctness.
> 
> Signed-off-by: Kevin Easton 
> Reported-by: 
> syzbot+5022a34ca5a3d49b84223653fab632dfb7b4c...@syzkaller.appspotmail.com

Applied to the ipsec tree, thanks Kevin!

Re: [PATCH v2 0/2] af_key: Fix for sadb_key memcpy read overrun

2018-04-09 Thread Steffen Klassert

On Sat, Apr 07, 2018 at 11:40:18AM -0400, Kevin Easton wrote:
> As found by syzbot, af_key does not properly validate the key length in
> sadb_key messages from userspace.  This can result in copying from beyond
> the end of the sadb_key part of the message, or indeed beyond the end of
> the entire packet.
> 
> Both these patches apply cleanly to ipsec-next.  Based on Steffen's
> feedback I have re-ordered them so that the fix only is in patch 1, which
> I would suggest is also a stable tree candidate, whereas patch 2 is a
> cleanup only.

I think here is some explanation needed. Usually bugfixes and cleanups
don't go to the same tree. On IPsec bugfixes go to the'ipsec' tree
while cleanups and new features go to the 'ipsec-next' tree. So
you need to split up your patchsets into patches that are targeted
to 'ipsec' and 'ipsec-next'. Aside from that, we are in 'merge window'
currently. This means that most maintainers don't accept patches to
their -next trees. If you have patches for a -next tree, wait until
the merge window is over (when v4.17-rc1 is released) and send them
then.

Re: [PATCH v2 0/2] af_key: Fix for sadb_key memcpy read overrun

2018-04-09 Thread Steffen Klassert

On Sat, Apr 07, 2018 at 11:40:18AM -0400, Kevin Easton wrote:
> As found by syzbot, af_key does not properly validate the key length in
> sadb_key messages from userspace.  This can result in copying from beyond
> the end of the sadb_key part of the message, or indeed beyond the end of
> the entire packet.
> 
> Both these patches apply cleanly to ipsec-next.  Based on Steffen's
> feedback I have re-ordered them so that the fix only is in patch 1, which
> I would suggest is also a stable tree candidate, whereas patch 2 is a
> cleanup only.

I think here is some explanation needed. Usually bugfixes and cleanups
don't go to the same tree. On IPsec bugfixes go to the'ipsec' tree
while cleanups and new features go to the 'ipsec-next' tree. So
you need to split up your patchsets into patches that are targeted
to 'ipsec' and 'ipsec-next'. Aside from that, we are in 'merge window'
currently. This means that most maintainers don't accept patches to
their -next trees. If you have patches for a -next tree, wait until
the merge window is over (when v4.17-rc1 is released) and send them
then.

Re: [PATCH 1/2] af_key: Use DIV_ROUND_UP() instead of open-coded equivalent

2018-04-05 Thread Steffen Klassert

On Wed, Mar 28, 2018 at 09:35:26PM -0400, Kevin Easton wrote:
> On Wed, Mar 28, 2018 at 07:59:25AM +0200, Steffen Klassert wrote:
> > On Mon, Mar 26, 2018 at 07:39:16AM -0400, Kevin Easton wrote:
> > > Several places use (x + 7) / 8 to convert from a number of bits to a 
> > > number
> > > of bytes.  Replace those with DIV_ROUND_UP(x, 8) instead, for consistency
> > > with other parts of the same file.
> > > 
> > > Signed-off-by: Kevin Easton <ke...@guarana.org>
> > 
> > Is this a fix or just a cleanup?
> > If it is just a cleanup, please resent it based on ipsec-next.
> 
> Hi Steffen,
> 
> This patch (1/2) is just a cleanup, but it's in preparation for patch 2/2
> which is a fix and modifies some of the same lines.

So please resend the fix without the cleanup, otherwise we can
get conflicts when backporting the fix to the stable trees.

Re: [PATCH 1/2] af_key: Use DIV_ROUND_UP() instead of open-coded equivalent

2018-04-05 Thread Steffen Klassert

On Wed, Mar 28, 2018 at 09:35:26PM -0400, Kevin Easton wrote:
> On Wed, Mar 28, 2018 at 07:59:25AM +0200, Steffen Klassert wrote:
> > On Mon, Mar 26, 2018 at 07:39:16AM -0400, Kevin Easton wrote:
> > > Several places use (x + 7) / 8 to convert from a number of bits to a 
> > > number
> > > of bytes.  Replace those with DIV_ROUND_UP(x, 8) instead, for consistency
> > > with other parts of the same file.
> > > 
> > > Signed-off-by: Kevin Easton 
> > 
> > Is this a fix or just a cleanup?
> > If it is just a cleanup, please resent it based on ipsec-next.
> 
> Hi Steffen,
> 
> This patch (1/2) is just a cleanup, but it's in preparation for patch 2/2
> which is a fix and modifies some of the same lines.

So please resend the fix without the cleanup, otherwise we can
get conflicts when backporting the fix to the stable trees.

Re: Regression in 4.16-rc7 - ipsec vpn broken

2018-03-29 Thread Steffen Klassert

Please always make sure to Cc net...@vger.kernel.org
on networking problems.

On Wed, Mar 28, 2018 at 10:21:32PM +, Derek Robson wrote:
> The ipsec VPN is broken in 4.16-rc7 and seem to have been broken in all of
> 4.15
> 
> connecting from an iphone seems to give a timeout.
> 
> 
> A bisect brings me to this commit as the one that is the issue.
> 
> commit: acf568ee859f098279eadf551612f103afdacb4e  (xfrm: Reinject
> transport-mode packets through tasklet)

I have a fix queued for this commit in the ipsec tree.

Can you please try if the patch below fixes your problems?

Thanks!

Subject: [PATCH] xfrm: Fix transport mode skb control buffer usage.

A recent commit introduced a new struct xfrm_trans_cb
that is used with the sk_buff control buffer. Unfortunately
it placed the structure in front of the control buffer and
overlooked that the IPv4/IPv6 control buffer is still needed
for some layer 4 protocols. As a result the IPv4/IPv6 control
buffer is overwritten with this structure. Fix this by setting
a apropriate header in front of the structure.

Fixes acf568ee859f ("xfrm: Reinject transport-mode packets ...")
Signed-off-by: Steffen Klassert <steffen.klass...@secunet.com>
---
 net/xfrm/xfrm_input.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c
index 1472c0857975..81788105c164 100644
--- a/net/xfrm/xfrm_input.c
+++ b/net/xfrm/xfrm_input.c
@@ -26,6 +26,12 @@ struct xfrm_trans_tasklet {
 };

 struct xfrm_trans_cb {
+   union {
+   struct inet_skb_parmh4;
+#if IS_ENABLED(CONFIG_IPV6)
+   struct inet6_skb_parm   h6;
+#endif
+   } header;
int (*finish)(struct net *net, struct sock *sk, struct sk_buff *skb);
 };

-- 
2.14.1

Re: Regression in 4.16-rc7 - ipsec vpn broken

2018-03-29 Thread Steffen Klassert

Please always make sure to Cc net...@vger.kernel.org
on networking problems.

On Wed, Mar 28, 2018 at 10:21:32PM +, Derek Robson wrote:
> The ipsec VPN is broken in 4.16-rc7 and seem to have been broken in all of
> 4.15
> 
> connecting from an iphone seems to give a timeout.
> 
> 
> A bisect brings me to this commit as the one that is the issue.
> 
> commit: acf568ee859f098279eadf551612f103afdacb4e  (xfrm: Reinject
> transport-mode packets through tasklet)

I have a fix queued for this commit in the ipsec tree.

Can you please try if the patch below fixes your problems?

Thanks!

Subject: [PATCH] xfrm: Fix transport mode skb control buffer usage.

A recent commit introduced a new struct xfrm_trans_cb
that is used with the sk_buff control buffer. Unfortunately
it placed the structure in front of the control buffer and
overlooked that the IPv4/IPv6 control buffer is still needed
for some layer 4 protocols. As a result the IPv4/IPv6 control
buffer is overwritten with this structure. Fix this by setting
a apropriate header in front of the structure.

Fixes acf568ee859f ("xfrm: Reinject transport-mode packets ...")
Signed-off-by: Steffen Klassert 
---
 net/xfrm/xfrm_input.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c
index 1472c0857975..81788105c164 100644
--- a/net/xfrm/xfrm_input.c
+++ b/net/xfrm/xfrm_input.c
@@ -26,6 +26,12 @@ struct xfrm_trans_tasklet {
 };

 struct xfrm_trans_cb {
+   union {
+   struct inet_skb_parmh4;
+#if IS_ENABLED(CONFIG_IPV6)
+   struct inet6_skb_parm   h6;
+#endif
+   } header;
int (*finish)(struct net *net, struct sock *sk, struct sk_buff *skb);
 };

-- 
2.14.1

Re: [PATCH 1/2] af_key: Use DIV_ROUND_UP() instead of open-coded equivalent

2018-03-28 Thread Steffen Klassert

On Mon, Mar 26, 2018 at 07:39:16AM -0400, Kevin Easton wrote:
> Several places use (x + 7) / 8 to convert from a number of bits to a number
> of bytes.  Replace those with DIV_ROUND_UP(x, 8) instead, for consistency
> with other parts of the same file.
> 
> Signed-off-by: Kevin Easton 

Is this a fix or just a cleanup?
If it is just a cleanup, please resent it based on ipsec-next.

Re: [PATCH 1/2] af_key: Use DIV_ROUND_UP() instead of open-coded equivalent

2018-03-28 Thread Steffen Klassert

On Mon, Mar 26, 2018 at 07:39:16AM -0400, Kevin Easton wrote:
> Several places use (x + 7) / 8 to convert from a number of bits to a number
> of bytes.  Replace those with DIV_ROUND_UP(x, 8) instead, for consistency
> with other parts of the same file.
> 
> Signed-off-by: Kevin Easton 

Is this a fix or just a cleanup?
If it is just a cleanup, please resent it based on ipsec-next.

Re: WARNING in kmalloc_slab (4)

2018-03-13 Thread Steffen Klassert

On Tue, Mar 13, 2018 at 12:33:02AM -0700, syzbot wrote:
> Hello,
> 
> syzbot hit the following crash on net-next commit
> f44b1886a5f876c87b5889df463ad7b97834ba37 (Fri Mar 9 18:10:06 2018 +)
> Merge branch 's390-qeth-next'
> 
> Unfortunately, I don't have any reproducer for this crash yet.
> Raw console output is attached.
> compiler: gcc (GCC) 7.1.1 20170620
> .config is attached.
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+6a7e7ed886bde4346...@syzkaller.appspotmail.com
> It will help syzbot understand when the bug is fixed. See footer for
> details.
> If you forward the report, please keep this part and the footer.
> 
> WARNING: CPU: 1 PID: 27333 at mm/slab_common.c:1012 kmalloc_slab+0x5d/0x70
> mm/slab_common.c:1012
> Kernel panic - not syncing: panic_on_warn set ...
> 
> syz-executor0: vmalloc: allocation failure: 17045651456 bytes,
> mode:0x14080c0(GFP_KERNEL|__GFP_ZERO), nodemask=(null)
> CPU: 1 PID: 27333 Comm: syz-executor2 Not tainted 4.16.0-rc4+ #260
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:17 [inline]
>  dump_stack+0x194/0x24d lib/dump_stack.c:53
>  panic+0x1e4/0x41c kernel/panic.c:183
> syz-executor0 cpuset=
>  __warn+0x1dc/0x200 kernel/panic.c:547
> /
>  mems_allowed=0
>  report_bug+0x211/0x2d0 lib/bug.c:184
>  fixup_bug.part.11+0x37/0x80 arch/x86/kernel/traps.c:178
>  fixup_bug arch/x86/kernel/traps.c:247 [inline]
>  do_error_trap+0x2d7/0x3e0 arch/x86/kernel/traps.c:296
>  do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315
>  invalid_op+0x1b/0x40 arch/x86/entry/entry_64.S:986
> RIP: 0010:kmalloc_slab+0x5d/0x70 mm/slab_common.c:1012
> RSP: 0018:8801ccfc72f0 EFLAGS: 00010246
> RAX:  RBX: 1018 RCX: 84ec4fc8
> RDX: 0ba7 RSI:  RDI: 1018
> RBP: 8801ccfc72f0 R08:  R09: 1100399f8e21
> R10: 8801ccfc7040 R11: 0001 R12: 0018
> R13: 8801ccfc7598 R14: 014080c0 R15: 8801aebaad80
>  __do_kmalloc mm/slab.c:3700 [inline]
>  __kmalloc+0x25/0x760 mm/slab.c:3714
>  kmalloc include/linux/slab.h:517 [inline]
>  kzalloc include/linux/slab.h:701 [inline]
>  xfrm_alloc_replay_state_esn net/xfrm/xfrm_user.c:442 [inline]

This is likely fixed with:

commit d97ca5d714a5334aecadadf696875da40f1fbf3e
xfrm_user: uncoditionally validate esn replay attribute struct

The patch is included in the ipsec pull request for the net
tree I've sent this morning.

Re: WARNING in kmalloc_slab (4)

2018-03-13 Thread Steffen Klassert

On Tue, Mar 13, 2018 at 12:33:02AM -0700, syzbot wrote:
> Hello,
> 
> syzbot hit the following crash on net-next commit
> f44b1886a5f876c87b5889df463ad7b97834ba37 (Fri Mar 9 18:10:06 2018 +)
> Merge branch 's390-qeth-next'
> 
> Unfortunately, I don't have any reproducer for this crash yet.
> Raw console output is attached.
> compiler: gcc (GCC) 7.1.1 20170620
> .config is attached.
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+6a7e7ed886bde4346...@syzkaller.appspotmail.com
> It will help syzbot understand when the bug is fixed. See footer for
> details.
> If you forward the report, please keep this part and the footer.
> 
> WARNING: CPU: 1 PID: 27333 at mm/slab_common.c:1012 kmalloc_slab+0x5d/0x70
> mm/slab_common.c:1012
> Kernel panic - not syncing: panic_on_warn set ...
> 
> syz-executor0: vmalloc: allocation failure: 17045651456 bytes,
> mode:0x14080c0(GFP_KERNEL|__GFP_ZERO), nodemask=(null)
> CPU: 1 PID: 27333 Comm: syz-executor2 Not tainted 4.16.0-rc4+ #260
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:17 [inline]
>  dump_stack+0x194/0x24d lib/dump_stack.c:53
>  panic+0x1e4/0x41c kernel/panic.c:183
> syz-executor0 cpuset=
>  __warn+0x1dc/0x200 kernel/panic.c:547
> /
>  mems_allowed=0
>  report_bug+0x211/0x2d0 lib/bug.c:184
>  fixup_bug.part.11+0x37/0x80 arch/x86/kernel/traps.c:178
>  fixup_bug arch/x86/kernel/traps.c:247 [inline]
>  do_error_trap+0x2d7/0x3e0 arch/x86/kernel/traps.c:296
>  do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315
>  invalid_op+0x1b/0x40 arch/x86/entry/entry_64.S:986
> RIP: 0010:kmalloc_slab+0x5d/0x70 mm/slab_common.c:1012
> RSP: 0018:8801ccfc72f0 EFLAGS: 00010246
> RAX:  RBX: 1018 RCX: 84ec4fc8
> RDX: 0ba7 RSI:  RDI: 1018
> RBP: 8801ccfc72f0 R08:  R09: 1100399f8e21
> R10: 8801ccfc7040 R11: 0001 R12: 0018
> R13: 8801ccfc7598 R14: 014080c0 R15: 8801aebaad80
>  __do_kmalloc mm/slab.c:3700 [inline]
>  __kmalloc+0x25/0x760 mm/slab.c:3714
>  kmalloc include/linux/slab.h:517 [inline]
>  kzalloc include/linux/slab.h:701 [inline]
>  xfrm_alloc_replay_state_esn net/xfrm/xfrm_user.c:442 [inline]

This is likely fixed with:

commit d97ca5d714a5334aecadadf696875da40f1fbf3e
xfrm_user: uncoditionally validate esn replay attribute struct

The patch is included in the ipsec pull request for the net
tree I've sent this morning.

Re: [PATCH v2 net] net: xfrm: use preempt-safe this_cpu_read() in ipcomp_alloc_tfms()

2018-03-13 Thread Steffen Klassert

On Wed, Mar 07, 2018 at 02:42:53PM -0800, Greg Hackmann wrote:
> f7c83bcbfaf5 ("net: xfrm: use __this_cpu_read per-cpu helper") added a
> __this_cpu_read() call inside ipcomp_alloc_tfms().
> 
> At the time, __this_cpu_read() required the caller to either not care
> about races or to handle preemption/interrupt issues.  3.15 tightened
> the rules around some per-cpu operations, and now __this_cpu_read()
> should never be used in a preemptible context.  On 3.15 and later, we
> need to use this_cpu_read() instead.
> 

...

> Signed-off-by: Greg Hackmann 

Patch applied, thanks!

Re: [PATCH v2 net] net: xfrm: use preempt-safe this_cpu_read() in ipcomp_alloc_tfms()

2018-03-13 Thread Steffen Klassert

On Wed, Mar 07, 2018 at 02:42:53PM -0800, Greg Hackmann wrote:
> f7c83bcbfaf5 ("net: xfrm: use __this_cpu_read per-cpu helper") added a
> __this_cpu_read() call inside ipcomp_alloc_tfms().
> 
> At the time, __this_cpu_read() required the caller to either not care
> about races or to handle preemption/interrupt issues.  3.15 tightened
> the rules around some per-cpu operations, and now __this_cpu_read()
> should never be used in a preemptible context.  On 3.15 and later, we
> need to use this_cpu_read() instead.
> 

...

> Signed-off-by: Greg Hackmann 

Patch applied, thanks!

Re: [PATCH v2] net: ipv6: xfrm6_state: remove VLA usage

2018-03-12 Thread Steffen Klassert

On Sat, Mar 10, 2018 at 07:26:44PM +0100, Stefano Brivio wrote:
> On Sat, 10 Mar 2018 09:18:46 -0800
> Kees Cook  wrote:
> 
> > On Sat, Mar 10, 2018 at 12:43 AM, Stefano Brivio  wrote:
> > > On Sat, 10 Mar 2018 09:40:44 +0200
> > > Andreas Christoforou  wrote:
> > >  
> > >> diff --git a/net/ipv6/xfrm6_state.c b/net/ipv6/xfrm6_state.c
> > >> index b15075a..270a53a 100644
> > >> --- a/net/ipv6/xfrm6_state.c
> > >> +++ b/net/ipv6/xfrm6_state.c
> > >> @@ -62,7 +62,7 @@ __xfrm6_sort(void **dst, void **src, int n, int 
> > >> (*cmp)(void *p), int maxclass)
> > >>  {
> > >>   int i;
> > >>   int class[XFRM_MAX_DEPTH];
> > >> - int count[maxclass];
> > >> + int count[XFRM_MAX_DEPTH];
> > >>
> > >>   memset(count, 0, sizeof(count));  
> > >
> > > Can you perhaps initialize 'count' instead of calling memset(), now?  
> > 
> > Do you mean:
> > 
> > int count[XFRM_MAX_DEPTH] = { };
> > 
> > instead of the memset()?
> 
> Yep.
> 
> > I thought the compiler would resolve these both to the same thing?
> 
> Yes, for all practical purposes. With gcc 7.3.0 for x86_64, starting
> from -O1, it's exactly the same. With e.g. gcc 4.4.7, even with -O3,
> they can be a bit different depending on context.
> 
> > The former looks better though! :)
> 
> Yep! :)

If Andreas does a v3 anyway, please also consider to trim the subject
line to something like:

xfrm: remove VLA usage in __xfrm6_sort()

Re: [PATCH v2] net: ipv6: xfrm6_state: remove VLA usage

2018-03-12 Thread Steffen Klassert

On Sat, Mar 10, 2018 at 07:26:44PM +0100, Stefano Brivio wrote:
> On Sat, 10 Mar 2018 09:18:46 -0800
> Kees Cook  wrote:
> 
> > On Sat, Mar 10, 2018 at 12:43 AM, Stefano Brivio  wrote:
> > > On Sat, 10 Mar 2018 09:40:44 +0200
> > > Andreas Christoforou  wrote:
> > >  
> > >> diff --git a/net/ipv6/xfrm6_state.c b/net/ipv6/xfrm6_state.c
> > >> index b15075a..270a53a 100644
> > >> --- a/net/ipv6/xfrm6_state.c
> > >> +++ b/net/ipv6/xfrm6_state.c
> > >> @@ -62,7 +62,7 @@ __xfrm6_sort(void **dst, void **src, int n, int 
> > >> (*cmp)(void *p), int maxclass)
> > >>  {
> > >>   int i;
> > >>   int class[XFRM_MAX_DEPTH];
> > >> - int count[maxclass];
> > >> + int count[XFRM_MAX_DEPTH];
> > >>
> > >>   memset(count, 0, sizeof(count));  
> > >
> > > Can you perhaps initialize 'count' instead of calling memset(), now?  
> > 
> > Do you mean:
> > 
> > int count[XFRM_MAX_DEPTH] = { };
> > 
> > instead of the memset()?
> 
> Yep.
> 
> > I thought the compiler would resolve these both to the same thing?
> 
> Yes, for all practical purposes. With gcc 7.3.0 for x86_64, starting
> from -O1, it's exactly the same. With e.g. gcc 4.4.7, even with -O3,
> they can be a bit different depending on context.
> 
> > The former looks better though! :)
> 
> Yep! :)

If Andreas does a v3 anyway, please also consider to trim the subject
line to something like:

xfrm: remove VLA usage in __xfrm6_sort()

Re: [PATCH] net: ipv6: xfrm6_state: remove VLA usage

2018-03-09 Thread Steffen Klassert

On Fri, Mar 09, 2018 at 01:49:07PM +0100, Mathias Krause wrote:
> On 9 March 2018 at 13:21, Andreas Christoforou
>  wrote:
> > The kernel would like to have all stack VLA usage removed[1].
> >
> > Signed-off-by: Andreas Christoforou 
> > ---
> >  net/ipv6/xfrm6_state.c | 8 +++-
> >  1 file changed, 7 insertions(+), 1 deletion(-)
> >
> > diff --git a/net/ipv6/xfrm6_state.c b/net/ipv6/xfrm6_state.c
> > index b15075a..45c0d98 100644
> > --- a/net/ipv6/xfrm6_state.c
> > +++ b/net/ipv6/xfrm6_state.c
> > @@ -62,7 +62,12 @@ __xfrm6_sort(void **dst, void **src, int n, int 
> > (*cmp)(void *p), int maxclass)
> >  {
> > int i;
> > int class[XFRM_MAX_DEPTH];
> > -   int count[maxclass];
> > +   int *count;
> > +
> > +   count = kcalloc(maxclass + 1, sizeof(*count), GFP_KERNEL);
> > +
> > +   if (!count)
> > +   return -ENOMEM;
> >
> > memset(count, 0, sizeof(count));
> >
> > @@ -80,6 +85,7 @@ __xfrm6_sort(void **dst, void **src, int n, int 
> > (*cmp)(void *p), int maxclass)
> > src[i] = NULL;
> > }
> >
> > +   kfree(count);
> > return 0;
> >  }
> 
> Instead of dynamically allocating and freeing memory here, shouldn't
> we just get rid of the maxclass parameter and use XFRM_MAX_DEPTH as
> size for the count[] array, too?

Right, that's the way to go. Aside from that, allocating
with GFP_KERNEL is definitely wrong here.

1 2 3 4 >

1 - 100 of 367 matches

Mail list logo