Re: [PATCHv3 1/2] capability: introduce sysctl for controlled user-ns capability whitelist

2017-12-30 Thread Michael Kerrisk (man-pages)
Hello Mahesh,

On 12/05/2017 11:31 PM, Mahesh Bandewar wrote:
> From: Mahesh Bandewar 
> 
> Add a sysctl variable kernel.controlled_userns_caps_whitelist. This
> takes input as capability mask expressed as two comma separated hex
> u32 words. The mask, however, is stored in kernel as kernel_cap_t type.

Just by the way, why is it not expressed as a 64 bit value? (The answer
to that question should I think be part of this commit message.)

> Any capabilities that are not part of this mask will be controlled and
> will not be allowed to processes in controlled user-ns.
> 
> Acked-by: Serge Hallyn 
> Signed-off-by: Mahesh Bandewar 
> ---
> v3:
>   Added couple of comments as requested by Serge Hallyn
> v2:
>   Rebase
> v1:
>   Initial submission
> 
>  Documentation/sysctl/kernel.txt | 21 ++
>  include/linux/capability.h  |  3 +++
>  kernel/capability.c | 47 
> +
>  kernel/sysctl.c |  5 +
>  4 files changed, 76 insertions(+)
> 
> diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
> index 694968c7523c..a1d39dbae847 100644
> --- a/Documentation/sysctl/kernel.txt
> +++ b/Documentation/sysctl/kernel.txt
> @@ -25,6 +25,7 @@ show up in /proc/sys/kernel:
>  - bootloader_version  [ X86 only ]
>  - callhome[ S390 only ]
>  - cap_last_cap
> +- controlled_userns_caps_whitelist
>  - core_pattern
>  - core_pipe_limit
>  - core_uses_pid
> @@ -187,6 +188,26 @@ CAP_LAST_CAP from the kernel.
>  
>  ==
>  
> +controlled_userns_caps_whitelist
> +
> +Capability mask that is whitelisted for "controlled" user namespaces.

How is a user-ns marked as "controlled"? Please clarify this.

> +Any capability that is missing from this mask will not be allowed to
> +any process that is attached to a controlled-userns. e.g. if CAP_NET_RAW
> +is not part of this mask, then processes running inside any controlled
> +userns's will not be allowed to perform action that needs CAP_NET_RAW
> +capability. However, processes that are attached to a parent user-ns
> +hierarchy that is *not* controlled and has CAP_NET_RAW can continue
> +performing those actions. User-namespaces are marked "controlled" at
> +the time of their creation based on the capabilities of the creator.
> +A process that does not have CAP_SYS_ADMIN will create user-namespaces
> +that are controlled.
> +
> +The value is expressed as two comma separated hex words (u32). This
> +sysctl is avaialble in init-ns and users with CAP_SYS_ADMIN in init-ns
> +are allowed to make changes.

Could you add here a shell session that demonstrates the use of these 
interfaces and how they allow/disallow capabilities. 

Is there a way that a process can see whether it is a controlled user-ns
vs an uncontrolled user-ns? I think it would be good to explain in this
doc patch.

Thanks,

Michael

> +==
> +
>  core_pattern:
>  
>  core_pattern is used to specify a core dumpfile pattern name.
> diff --git a/include/linux/capability.h b/include/linux/capability.h
> index f640dcbc880c..7d79a4689625 100644
> --- a/include/linux/capability.h
> +++ b/include/linux/capability.h
> @@ -14,6 +14,7 @@
>  #define _LINUX_CAPABILITY_H
>  
>  #include 
> +#include 
>  
>  
>  #define _KERNEL_CAPABILITY_VERSION _LINUX_CAPABILITY_VERSION_3
> @@ -248,6 +249,8 @@ extern bool ptracer_capable(struct task_struct *tsk, 
> struct user_namespace *ns);
>  
>  /* audit system wants to get cap info from files as well */
>  extern int get_vfs_caps_from_disk(const struct dentry *dentry, struct 
> cpu_vfs_cap_data *cpu_caps);
> +int proc_douserns_caps_whitelist(struct ctl_table *table, int write,
> +  void __user *buff, size_t *lenp, loff_t *ppos);
>  
>  extern int cap_convert_nscap(struct dentry *dentry, void **ivalue, size_t 
> size);
>  
> diff --git a/kernel/capability.c b/kernel/capability.c
> index 1e1c0236f55b..4a859b7d4902 100644
> --- a/kernel/capability.c
> +++ b/kernel/capability.c
> @@ -29,6 +29,8 @@ EXPORT_SYMBOL(__cap_empty_set);
>  
>  int file_caps_enabled = 1;
>  
> +kernel_cap_t controlled_userns_caps_whitelist = CAP_FULL_SET;
> +
>  static int __init file_caps_disable(char *str)
>  {
>   file_caps_enabled = 0;
> @@ -507,3 +509,48 @@ bool ptracer_capable(struct task_struct *tsk, struct 
> user_namespace *ns)
>   rcu_read_unlock();
>   return (ret == 0);
>  }
> +
> +/* Controlled-userns capabilities routines */
> +#ifdef CONFIG_SYSCTL
> +int proc_douserns_caps_whitelist(struct ctl_table *table, int write,
> +  void __user *buff, size_t *lenp, loff_t *ppos)
> +{
> + DECLARE_BITMAP(caps_bitmap, CAP_LAST_CAP);
> + struct ctl_table caps_table;
> + char tbuf[NAME_MAX];
> + int ret;
> +
> + ret = bitmap_from_u32array(caps_bitmap, CAP_LAST_CAP,
> +cont

Re: [PATCHv3 0/2] capability controlled user-namespaces

2017-12-30 Thread Michael Kerrisk (man-pages)
Hello Mahesh,

On 12/28/2017 01:45 AM, Mahesh Bandewar (महेश बंडेवार) wrote:
> On Wed, Dec 27, 2017 at 12:23 PM, Michael Kerrisk (man-pages)
>  wrote:
>> Hello Mahesh,
>>
>> On 27 December 2017 at 18:09, Mahesh Bandewar (महेश बंडेवार)
>>  wrote:
>>> Hello James,
>>>
>>> Seems like I missed your name to be added into the review of this
>>> patch series. Would you be willing be pull this into the security
>>> tree? Serge Hallyn has already ACKed it.
>>
>> We seem to have no formal documentation/specification of this feature.
>> I think that should be written up before this patch goes into
>> mainline...
>>
> absolutely. I have added enough information into the Documentation dir
> relevant to this feature (please look at the  individual patches),
> that could be used. I could help if needed.

Yes, but I think that the documentation is rather incomplete.
I'll also reply to the relevant Documentation thread.

See also some comments below about this commit message, which
should make things *much* easier for the reader.

>>> On Tue, Dec 5, 2017 at 2:30 PM, Mahesh Bandewar  wrote:
 From: Mahesh Bandewar 

 TL;DR version
 -
 Creating a sandbox environment with namespaces is challenging
 considering what these sandboxed processes can engage into. e.g.
 CVE-2017-6074, CVE-2017-7184, CVE-2017-7308 etc. just to name few.
 Current form of user-namespaces, however, if changed a bit can allow
 us to create a sandbox environment without locking down user-
 namespaces.

 Detailed version
 

 Problem
 ---
 User-namespaces in the current form have increased the attack surface as
 any process can acquire capabilities which are not available to them (by
 default) by performing combination of clone()/unshare()/setns() syscalls.

 #define _GNU_SOURCE
 #include 
 #include 
 #include 

 int main(int ac, char **av)
 {
 int sock = -1;

 printf("Attempting to open RAW socket before unshare()...\n");
 sock = socket(AF_INET6, SOCK_RAW, IPPROTO_RAW);
 if (sock < 0) {
 perror("socket() SOCK_RAW failed: ");
 } else {
 printf("Successfully opened RAW-Sock before unshare().\n");
 close(sock);
 sock = -1;
 }

 if (unshare(CLONE_NEWUSER | CLONE_NEWNET) < 0) {
 perror("unshare() failed: ");
 return 1;
 }

 printf("Attempting to open RAW socket after unshare()...\n");
 sock = socket(AF_INET6, SOCK_RAW, IPPROTO_RAW);
 if (sock < 0) {
 perror("socket() SOCK_RAW failed: ");
 } else {
 printf("Successfully opened RAW-Sock after unshare().\n");
 close(sock);
 sock = -1;
 }

 return 0;
 }

 The above example shows how easy it is to acquire NET_RAW capabilities
 and once acquired, these processes could take benefit of above mentioned
 or similar issues discovered/undiscovered with malicious intent.

But you do not actually describe what the problem is. I think
it's not sufficient to simply refer to some CVEs.
Your mail message/commit should clearly describe what the issue is,
rather than leave the reader to decipher a bunch of CVEs, and derive
your concerns from those CVEs.

 Note
 that this is just an example and the problem/solution is not limited
 to NET_RAW capability *only*.

 The easiest fix one can apply here is to lock-down user-namespaces which
 many of the distros do (i.e. don't allow users to create user namespaces),
 but unfortunately that prevents everyone from using them.

 Approach
 
 Introduce a notion of 'controlled' user-namespaces. Every process on
 the host is allowed to create user-namespaces (governed by the limit
 imposed by per-ns sysctl) however, mark user-namespaces created by
 sandboxed processes as 'controlled'. Use this 'mark' at the time of
 capability check in conjunction with a global capability whitelist.
 If the capability is not whitelisted, processes that belong to
 controlled user-namespaces will not be allowed.

 Once a user-ns is marked as 'controlled'; all its child user-
 namespaces are marked as 'controlled' too.

How is a user-ns marked as "controlled"? It is not clear at this point.
Please clarify this in your cover mail (and on the Documentation patch.)

 A global whitelist is list of capabilities governed by the
 sysctl which is available to (privileged) user in init-ns to modify

What "the sysctl? Please name it at this point. (This may be purely a 
language issue. Do you mean "...governed by *a* sysctl, 
[sysctl-name-inserted-here]"?)

 while it's applicable to all controlled user-name

Re: [patch net-next v2 00/10] Add support for resource abstraction

2017-12-30 Thread Andrew Lunn
> 1. Both dpipe and devlink resource are abstraction models for
> hardware entities, and as a result they true to provide generic objects.
> Each driver/ASIC should register his own and it absolutely proprietary
> implementation. There is absolutely NO industry standard here, the only
> thing that resembles a standard is that dpipe looks a bit like P4 only
> because its proved to be useful for describing packet forwarding
> pipelines. The host4 table is just a hardware process in the mellanox
> spectrum ASIC pipeline and it should not be part of ABI, sorry I clearly
> don't understand how this even came up.

Why should it not be part of the ABI? Are you saying it will change
from version to version? Are you trying to warning people not to write
scripts using it, because its meaning will change and what works now
will break in the next kernel version?

 Andrew


Re: [patch net-next v2 00/10] Add support for resource abstraction

2017-12-30 Thread Jiri Pirko
Sat, Dec 30, 2017 at 11:18:50AM CET, and...@lunn.ch wrote:
>> 1. Both dpipe and devlink resource are abstraction models for
>> hardware entities, and as a result they true to provide generic objects.
>> Each driver/ASIC should register his own and it absolutely proprietary
>> implementation. There is absolutely NO industry standard here, the only
>> thing that resembles a standard is that dpipe looks a bit like P4 only
>> because its proved to be useful for describing packet forwarding
>> pipelines. The host4 table is just a hardware process in the mellanox
>> spectrum ASIC pipeline and it should not be part of ABI, sorry I clearly
>> don't understand how this even came up.
>
>Why should it not be part of the ABI? Are you saying it will change
>from version to version? Are you trying to warning people not to write
>scripts using it, because its meaning will change and what works now
>will break in the next kernel version?

In my opinion it should not change. Unless there is a bug (like the one
DaveA found in mlxsw erif table). Existing tables and resources should
be only added. It is the driver's maintainer responsibility to not to
break user scripts.


Re: [PATCH iproute2 2/4] man: add more keywords to ip.8 short description

2017-12-30 Thread Luca Boccassi
On Fri, 2017-12-29 at 20:02 -0800, Stephen Hemminger wrote:
> On Fri, 29 Dec 2017 23:01:23 +0100
> Luca Boccassi  wrote:
> 
> > A Debian user suggested adding more network-related keywords to the
> > ip manpage, so that manpage-scraping and indexing software like
> > apropos can do a better job of categorizing the programs.
> > 
> > https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=877983
> > 
> > Suggested-by: Lynoure Braakman 
> > Signed-off-by: Luca Boccassi 
> > ---
> >  man/man8/ip.8 | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/man/man8/ip.8 b/man/man8/ip.8
> > index ae018fdf..94e64319 100644
> > --- a/man/man8/ip.8
> > +++ b/man/man8/ip.8
> > @@ -1,6 +1,6 @@
> >  .TH IP 8 "20 Dec 2011" "iproute2" "Linux"
> >  .SH NAME
> > -ip \- show / manipulate routing, devices, policy routing and
> > tunnels
> > +ip \- show / manipulate routing, network devices, policy routing,
> > interfaces (ethernet and more) and tunnels
> 
> Let's just keep it short and drop out policy routing (since that is
> part of routing anyway
> and drop the (ethernet and more).

Sure, done in v2

-- 
Kind regards,
Luca Boccassi

signature.asc
Description: This is a digitally signed message part


[PATCH iproute2 v2 1/4] man: drop references to Debian-specific paths

2017-12-30 Thread Luca Boccassi
Documentation should be distribution-agnostic - any specific quirks
should be handled by downstream maintainers, if necessary.
Remove mentions of Debian paths and package names.

Signed-off-by: Luca Boccassi 
---
 man/man8/lnstat.8 | 3 +--
 man/man8/ss.8 | 3 +--
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/man/man8/lnstat.8 b/man/man8/lnstat.8
index acd5f4a2..b98241bf 100644
--- a/man/man8/lnstat.8
+++ b/man/man8/lnstat.8
@@ -254,8 +254,7 @@ Number of hash table list traversals for output traffic. 
Deprecated since IP
 route cache removal, therefore always zero.
 
 .SH SEE ALSO
-.BR ip (8),
-and /usr/share/doc/iproute-doc/README.lnstat (package iproute-doc on Debian)
+.BR ip (8)
 .br
 .SH AUTHOR
 lnstat was written by Harald Welte .
diff --git a/man/man8/ss.8 b/man/man8/ss.8
index 0d526734..973afbe0 100644
--- a/man/man8/ss.8
+++ b/man/man8/ss.8
@@ -327,7 +327,7 @@ Read filter information from FILE.
 Each line of FILE is interpreted like single command line option. If FILE is - 
stdin is used.
 .TP
 .B FILTER := [ state STATE-FILTER ] [ EXPRESSION ]
-Please take a look at the official documentation (Debian package iproute-doc) 
for details regarding filters.
+Please take a look at the official documentation for details regarding filters.
 
 .SH STATE-FILTER
 
@@ -382,7 +382,6 @@ Find all local processes connected to X server.
 List all the tcp sockets in state FIN-WAIT-1 for our apache to network 
193.233.7/24 and look at their timers.
 .SH SEE ALSO
 .BR ip (8),
-.BR /usr/share/doc/iproute-doc/ss.html " (package iproute�doc)",
 .br
 .BR RFC " 793 "
 - https://tools.ietf.org/rfc/rfc793.txt (TCP states)
-- 
2.14.2



[PATCH iproute2 v2 3/4] man: ip-address: document 15-char limit for LABEL

2017-12-30 Thread Luca Boccassi
Trying to set a label longer than 15 characters returns an error:
 RTNETLINK answers: Numerical result out of range

Document the limit in the manpage.

Originally reported as a Debian bug:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=661886

Reported-by: Gabor Kiss 
Signed-off-by: Luca Boccassi 
---
 man/man8/ip-address.8.in | 1 +
 1 file changed, 1 insertion(+)

diff --git a/man/man8/ip-address.8.in b/man/man8/ip-address.8.in
index eaa179c6..7ebf0bc9 100644
--- a/man/man8/ip-address.8.in
+++ b/man/man8/ip-address.8.in
@@ -190,6 +190,7 @@ Each address may be tagged with a label string.
 In order to preserve compatibility with Linux-2.0 net aliases,
 this string must coincide with the name of the device or must be prefixed
 with the device name followed by colon.
+The maximum allowed total length of label is 15 characters.
 
 .TP
 .BI scope " SCOPE_VALUE"
-- 
2.14.2



[PATCH iproute2 v2 2/4] man: add more keywords to ip.8 short description

2017-12-30 Thread Luca Boccassi
A Debian user suggested adding more network-related keywords to the
ip manpage, so that manpage-scraping and indexing software like
apropos can do a better job of categorizing the programs.

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=877983

Suggested-by: Lynoure Braakman 
Signed-off-by: Luca Boccassi 
---
 man/man8/ip.8 | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/man/man8/ip.8 b/man/man8/ip.8
index ae018fdf..7f26582d 100644
--- a/man/man8/ip.8
+++ b/man/man8/ip.8
@@ -1,6 +1,6 @@
 .TH IP 8 "20 Dec 2011" "iproute2" "Linux"
 .SH NAME
-ip \- show / manipulate routing, devices, policy routing and tunnels
+ip \- show / manipulate routing, network devices, interfaces and tunnels
 .SH SYNOPSIS
 
 .ad l
-- 
2.14.2



[PATCH iproute2 v2 4/4] man: routel/routef: don't mention filesystem paths

2017-12-30 Thread Luca Boccassi
The filesytem paths to these scripts might be different on various
distros, so don't mention it in the manpages. It is not really useful
information anyway.

Originally submitted as Debian bug:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=561424

Reported-by: jida...@jidanni.org
Signed-off-by: Luca Boccassi 
---
 man/man8/routel.8 | 5 -
 1 file changed, 5 deletions(-)

diff --git a/man/man8/routel.8 b/man/man8/routel.8
index 82d580fb..2270eacb 100644
--- a/man/man8/routel.8
+++ b/man/man8/routel.8
@@ -17,11 +17,6 @@ The routel script will list routes in a format that some 
might consider easier t
 .br
 The routef script does not take any arguments and will simply flush the 
routing table down the drain. Beware! This means deleting all routes which will 
make your network unusable!
 
-.SH "FILES"
-.LP
-\fI/usr/bin/routef\fP
-.br
-\fI/usr/bin/routel\fP
 .SH "AUTHORS"
 .LP
 The routel script was written by Stephen R. van den Berg , 
1999/04/18 and donated to the public domain.
-- 
2.14.2



Re: [PATCH iproute2 3/4] man: ip-address: document 15-char limit for LABEL

2017-12-30 Thread Luca Boccassi
On Fri, 2017-12-29 at 20:04 -0800, Stephen Hemminger wrote:
> On Fri, 29 Dec 2017 23:01:24 +0100
> Luca Boccassi  wrote:
> 
> > Trying to set a label longer than 15 characters returns an error:
> >  RTNETLINK answers: Numerical result out of range
> > 
> > Document the limit in the manpage.
> > 
> > Originally reported as a Debian bug:
> > 
> > https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=661886
> > 
> > Reported-by: Gabor Kiss 
> > Signed-off-by: Luca Boccassi 
> > ---
> >  man/man8/ip-address.8.in | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/man/man8/ip-address.8.in b/man/man8/ip-address.8.in
> > index eaa179c6..e7f14533 100644
> > --- a/man/man8/ip-address.8.in
> > +++ b/man/man8/ip-address.8.in
> > @@ -190,6 +190,7 @@ Each address may be tagged with a label string.
> >  In order to preserve compatibility with Linux-2.0 net aliases,
> >  this string must coincide with the name of the device or must be
> > prefixed
> >  with the device name followed by colon.
> > +The maximum allowed length is 15 characters.
> 
> Since these are aliases, lets be precise:
> The maximum allowed total length of label is 15 characters.

Ok, done in v2, thanks

-- 
Kind regards,
Luca Boccassi

signature.asc
Description: This is a digitally signed message part


[PATCH] b43: Replace mdelay with msleep in b43_radio_2057_init_post

2017-12-30 Thread Jia-Ju Bai
b43_radio_2057_init_post is not called in an interrupt handler 
nor holding a spinlock.
The function mdelay in it can be replaced with msleep, to reduce busy wait.

Signed-off-by: Jia-Ju Bai 
---
 drivers/net/wireless/broadcom/b43/phy_n.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/broadcom/b43/phy_n.c 
b/drivers/net/wireless/broadcom/b43/phy_n.c
index a5557d7..5bc838e 100644
--- a/drivers/net/wireless/broadcom/b43/phy_n.c
+++ b/drivers/net/wireless/broadcom/b43/phy_n.c
@@ -1031,7 +1031,7 @@ static void b43_radio_2057_init_post(struct b43_wldev 
*dev)
 
b43_radio_set(dev, R2057_RFPLL_MISC_CAL_RESETN, 0x78);
b43_radio_set(dev, R2057_XTAL_CONFIG2, 0x80);
-   mdelay(2);
+   msleep(2);
b43_radio_mask(dev, R2057_RFPLL_MISC_CAL_RESETN, ~0x78);
b43_radio_mask(dev, R2057_XTAL_CONFIG2, ~0x80);
 
-- 
1.7.9.5



[PATCH] sky2: Replace mdelay with msleep in sky2_vpd_wait

2017-12-30 Thread Jia-Ju Bai
sky2_vpd_wait is not called in an interrupt handler nor holding a spinlock.
The function mdelay in it can be replaced with msleep, to reduce busy wait.

Signed-off-by: Jia-Ju Bai 
---
 drivers/net/ethernet/marvell/sky2.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/marvell/sky2.c 
b/drivers/net/ethernet/marvell/sky2.c
index 9efe177..9fe8530 100644
--- a/drivers/net/ethernet/marvell/sky2.c
+++ b/drivers/net/ethernet/marvell/sky2.c
@@ -4287,7 +4287,7 @@ static int sky2_vpd_wait(const struct sky2_hw *hw, int 
cap, u16 busy)
dev_err(&hw->pdev->dev, "VPD cycle timed out\n");
return -ETIMEDOUT;
}
-   mdelay(1);
+   msleep(1);
}
 
return 0;
-- 
1.7.9.5



Re: general protection fault in skb_segment

2017-12-30 Thread Willem de Bruijn
> So this is a packet socket writing something that apparently looks
> like an SCTP packet, is only 42 bytes long, but has GSO set in its
> virtio_net_hdr struct.
>
> It crashes in skb_segment seemingly on a NULL list_skb.
>
> (gdb) list *(skb_segment+0x2a4)
> 0x8167cc24 is in skb_segment (net/core/skbuff.c:3566).
> 3561if (hsize < 0)
> 3562hsize = 0;
> 3563if (hsize > len || !sg)
> 3564hsize = len;
> 3565
> 3566if (!hsize && i >= nfrags && skb_headlen(list_skb) &&
> 3567(skb_headlen(list_skb) == len || sg)) {
> 3568BUG_ON(skb_headlen(list_skb) > len);
> 3569
> 3570i = 0;

It appears to be a packet that consists only of an sctp header.
sctp_gso_segment pulls the header before calling skb_segment,
after which hsize == skb_headlen(head_skb) == 0 and nfrags == 0.

This check avoids the crash, but still triggers an skb_warn_bad_offload
on return in __skb_gso_segment

@@ -45,6 +45,13 @@ static struct sk_buff *sctp_gso_segment(struct sk_buff *skb,
struct sk_buff *segs = ERR_PTR(-EINVAL);
struct sctphdr *sh;

+   if (!skb_has_frag_list(skb))
+   goto out;

A gso packet shorter than mss should perhaps just be dropped. The stack
does not generate these. tcp_gso_segment does have a test

   if (unlikely(skb->len <= mss))
goto out;

but as mss is derived from gso_size, which a packet socket controls, this
may not be sufficient for this purpose.


v4.15-rc5 warning: suspicious RCU usage in net/wireless/util.c:778

2017-12-30 Thread Dominik Brodowski
On Fri, Dec 22, 2017 at 08:20:12AM +0100, Dominik Brodowski wrote:
> Dear all,
> 
> once the (wifi) link becomes ready, the following warning is emitted on
> mainline (v4.15-rc4-202-gead68f216110) on my notebook:

... and it is still present as of v4.15-rc5-149-g5aa90a845892

> [   22.770422] IPv6: ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
> 
> [   22.772364] =
> [   22.772369] WARNING: suspicious RCU usage
> [   22.772375] 4.15.0-rc4+ #5 Not tainted
> [   22.772380] -
> [   22.772386] /home/brodo/local/kernel/git/linux/net/wireless/util.c:778 
> suspicious rcu_dereference_check() usage!
> [   22.772391] 
> [   22.772397] 
> [   22.772402] 4 locks held by wpa_supplicant/774:
> [   22.772407]  #0:  (cb_lock){}, at: [<276dc3a0>] 
> genl_rcv+0x15/0x40
> [   22.772437]  #1:  (genl_mutex){+.+.}, at: [<24d83eb3>] 
> genl_rcv_msg+0x7a/0x90
> [   22.772463]  #2:  (rtnl_mutex){+.+.}, at: [<9de25a59>] 
> nl80211_pre_doit+0xe9/0x190
> [   22.772489]  #3:  (&wdev->mtx){+.+.}, at: [<89bf2cfd>] 
> nl80211_send_iface+0x317/0x8d0
> [   22.772516] 
> [   22.772522] CPU: 3 PID: 774 Comm: wpa_supplicant Not tainted 4.15.0-rc4+ #5
> [   22.772528] Hardware name: Dell Inc. XPS 13 9343/0TM99H, BIOS A11 
> 12/08/2016
> [   22.772532] Call Trace:
> [   22.772544]  dump_stack+0x67/0x95
> [   22.772553]  ieee80211_bss_get_ie+0x66/0x70
> [   22.772562]  nl80211_send_iface+0x344/0x8d0
> [   22.772585]  nl80211_get_interface+0x4b/0xa0
> [   22.772598]  genl_family_rcv_msg+0x32e/0x3f0
> [   22.772607]  ? preempt_count_sub+0x92/0xd0
> [   22.772645]  genl_rcv_msg+0x47/0x90
> [   22.772652]  ? genl_family_rcv_msg+0x3f0/0x3f0
> [   22.772661]  netlink_rcv_skb+0x8a/0x120
> [   22.772677]  genl_rcv+0x24/0x40
> [   22.772684]  netlink_unicast+0x174/0x1f0
> [   22.772698]  netlink_sendmsg+0x386/0x3d0
> [   22.772719]  sock_sendmsg+0x2d/0x40
> [   22.772728]  ___sys_sendmsg+0x2a7/0x300
> [   22.772748]  ? netlink_sendmsg+0x13d/0x3d0
> [   22.772791]  ? __sys_sendmsg+0x67/0xb0
> [   22.772797]  __sys_sendmsg+0x67/0xb0
> [   22.772822]  entry_SYSCALL_64_fastpath+0x18/0x85
> 
> This warning wasn't present in 4.15. Despite of it, networking seems to
> work fine. Nonetheless, the code seems to need a bugfix.

Thanks,
Dominik


Re: general protection fault in skb_segment

2017-12-30 Thread Xin Long
On Sat, Dec 30, 2017 at 7:54 PM, Willem de Bruijn
 wrote:
>> So this is a packet socket writing something that apparently looks
>> like an SCTP packet, is only 42 bytes long, but has GSO set in its
>> virtio_net_hdr struct.
>>
>> It crashes in skb_segment seemingly on a NULL list_skb.
>>
>> (gdb) list *(skb_segment+0x2a4)
>> 0x8167cc24 is in skb_segment (net/core/skbuff.c:3566).
>> 3561if (hsize < 0)
>> 3562hsize = 0;
>> 3563if (hsize > len || !sg)
>> 3564hsize = len;
>> 3565
>> 3566if (!hsize && i >= nfrags && skb_headlen(list_skb) &&
>> 3567(skb_headlen(list_skb) == len || sg)) {
>> 3568BUG_ON(skb_headlen(list_skb) > len);
>> 3569
>> 3570i = 0;
>
> It appears to be a packet that consists only of an sctp header.
> sctp_gso_segment pulls the header before calling skb_segment,
> after which hsize == skb_headlen(head_skb) == 0 and nfrags == 0.
>
> This check avoids the crash, but still triggers an skb_warn_bad_offload
> on return in __skb_gso_segment
>
> @@ -45,6 +45,13 @@ static struct sk_buff *sctp_gso_segment(struct sk_buff 
> *skb,
> struct sk_buff *segs = ERR_PTR(-EINVAL);
> struct sctphdr *sh;
>
> +   if (!skb_has_frag_list(skb))
> +   goto out;
>
Shouldn't this check be in skb_segment(), right after if (mss ==
GSO_BY_FRAGS), like
if (unlikely(mss == GSO_BY_FRAGS)) {
if (unlikely(!list_skb))
goto err;
len = list_skb->len;

as the fix of commit 3953c46c3ac7 ("sk_buff: allow segmenting based on
frag sizes").


[PATCH] qed: Use zeroing memory allocator than allocator/memset

2017-12-30 Thread Himanshu Jha
Use dma_zalloc_coherent and vzalloc for allocating zeroed
memory and remove unnecessary memset function.

Done using Coccinelle.
Generated-by: scripts/coccinelle/api/alloc/kzalloc-simple.cocci
0-day tested with no failures.

Suggested-by: Luis R. Rodriguez 
Signed-off-by: Himanshu Jha 
---
 drivers/net/ethernet/qlogic/qed/qed_cxt.c | 12 +---
 drivers/net/ethernet/qlogic/qed/qed_l2.c  |  3 +--
 2 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qed/qed_cxt.c 
b/drivers/net/ethernet/qlogic/qed/qed_cxt.c
index afd07ad..f0a55d2 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_cxt.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_cxt.c
@@ -1055,11 +1055,10 @@ static int qed_ilt_blk_alloc(struct qed_hwfn *p_hwfn,
u32 size;
 
size = min_t(u32, sz_left, p_blk->real_size_in_page);
-   p_virt = dma_alloc_coherent(&p_hwfn->cdev->pdev->dev,
-   size, &p_phys, GFP_KERNEL);
+   p_virt = dma_zalloc_coherent(&p_hwfn->cdev->pdev->dev, size,
+&p_phys, GFP_KERNEL);
if (!p_virt)
return -ENOMEM;
-   memset(p_virt, 0, size);
 
ilt_shadow[line].p_phys = p_phys;
ilt_shadow[line].p_virt = p_virt;
@@ -2303,14 +2302,13 @@ qed_cxt_dynamic_ilt_alloc(struct qed_hwfn *p_hwfn,
goto out0;
}
 
-   p_virt = dma_alloc_coherent(&p_hwfn->cdev->pdev->dev,
-   p_blk->real_size_in_page,
-   &p_phys, GFP_KERNEL);
+   p_virt = dma_zalloc_coherent(&p_hwfn->cdev->pdev->dev,
+p_blk->real_size_in_page, &p_phys,
+GFP_KERNEL);
if (!p_virt) {
rc = -ENOMEM;
goto out1;
}
-   memset(p_virt, 0, p_blk->real_size_in_page);
 
/* configuration of refTagMask to 0xF is required for RoCE DIF MR only,
 * to compensate for a HW bug, but it is configured even if DIF is not
diff --git a/drivers/net/ethernet/qlogic/qed/qed_l2.c 
b/drivers/net/ethernet/qlogic/qed/qed_l2.c
index 0853389..fd76b81 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_l2.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_l2.c
@@ -223,10 +223,9 @@ _qed_eth_queue_to_cid(struct qed_hwfn *p_hwfn,
struct qed_queue_cid *p_cid;
int rc;
 
-   p_cid = vmalloc(sizeof(*p_cid));
+   p_cid = vzalloc(sizeof(*p_cid));
if (!p_cid)
return NULL;
-   memset(p_cid, 0, sizeof(*p_cid));
 
p_cid->opaque_fid = opaque_fid;
p_cid->cid = cid;
-- 
2.7.4



[PATCH] ethernet/broadcom: Use zeroing memory allocator than allocator/memset

2017-12-30 Thread Himanshu Jha
Use dma_zalloc_coherent for allocating zeroed
memory and remove unnecessary memset function.

Done using Coccinelle.
Generated-by: scripts/coccinelle/api/alloc/kzalloc-simple.cocci
0-day tested with no failures.

Suggested-by: Luis R. Rodriguez 
Signed-off-by: Himanshu Jha 
---
 drivers/net/ethernet/broadcom/bcm63xx_enet.c  | 6 ++
 drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c | 5 ++---
 2 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bcm63xx_enet.c 
b/drivers/net/ethernet/broadcom/bcm63xx_enet.c
index 1fbbbab..14a59e5 100644
--- a/drivers/net/ethernet/broadcom/bcm63xx_enet.c
+++ b/drivers/net/ethernet/broadcom/bcm63xx_enet.c
@@ -2128,27 +2128,25 @@ static int bcm_enetsw_open(struct net_device *dev)
 
/* allocate rx dma ring */
size = priv->rx_ring_size * sizeof(struct bcm_enet_desc);
-   p = dma_alloc_coherent(kdev, size, &priv->rx_desc_dma, GFP_KERNEL);
+   p = dma_zalloc_coherent(kdev, size, &priv->rx_desc_dma, GFP_KERNEL);
if (!p) {
dev_err(kdev, "cannot allocate rx ring %u\n", size);
ret = -ENOMEM;
goto out_freeirq_tx;
}
 
-   memset(p, 0, size);
priv->rx_desc_alloc_size = size;
priv->rx_desc_cpu = p;
 
/* allocate tx dma ring */
size = priv->tx_ring_size * sizeof(struct bcm_enet_desc);
-   p = dma_alloc_coherent(kdev, size, &priv->tx_desc_dma, GFP_KERNEL);
+   p = dma_zalloc_coherent(kdev, size, &priv->tx_desc_dma, GFP_KERNEL);
if (!p) {
dev_err(kdev, "cannot allocate tx ring\n");
ret = -ENOMEM;
goto out_free_rx_ring;
}
 
-   memset(p, 0, size);
priv->tx_desc_alloc_size = size;
priv->tx_desc_cpu = p;
 
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c 
b/drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c
index fed37cd..3c746f2 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c
@@ -278,12 +278,11 @@ static int bnxt_hwrm_set_dcbx_app(struct bnxt *bp, struct 
dcb_app *app,
 
n = IEEE_8021QAZ_MAX_TCS;
data_len = sizeof(*data) + sizeof(*fw_app) * n;
-   data = dma_alloc_coherent(&bp->pdev->dev, data_len, &mapping,
- GFP_KERNEL);
+   data = dma_zalloc_coherent(&bp->pdev->dev, data_len, &mapping,
+  GFP_KERNEL);
if (!data)
return -ENOMEM;
 
-   memset(data, 0, data_len);
bnxt_hwrm_cmd_hdr_init(bp, &get, HWRM_FW_GET_STRUCTURED_DATA, -1, -1);
get.dest_data_addr = cpu_to_le64(mapping);
get.structure_id = cpu_to_le16(STRUCT_HDR_STRUCT_ID_DCBX_APP);
-- 
2.7.4



Re: [PATCH net-next v5 0/5] Introduce NETIF_F_GRO_HW

2017-12-30 Thread David Miller
From: Michael Chan 
Date: Fri, 29 Dec 2017 21:20:02 -0800

> I think the name GRO_HW is perfectly fine.  It is GRO aggregation done
> in hardware, and hardware providing extra information to the driver to
> setup the SKB just like GRO.  I don't know what better name to call it
> than GRO_HW.

Agreed.


[PATCH] brcmfmac: Use zeroing memory allocator than allocator/memset

2017-12-30 Thread Himanshu Jha
Use dma_zalloc_coherent for allocating zeroed
memory and remove unnecessary memset function.

Done using Coccinelle.
Generated-by: scripts/coccinelle/api/alloc/kzalloc-simple.cocci
0-day tested with no failures.

Suggested-by: Luis R. Rodriguez 
Signed-off-by: Himanshu Jha 
---
 .../net/wireless/broadcom/brcm80211/brcmfmac/pcie.c| 18 --
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/pcie.c 
b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/pcie.c
index 3c87157..bdef2ac 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/pcie.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/pcie.c
@@ -1251,14 +1251,13 @@ static int brcmf_pcie_init_scratchbuffers(struct 
brcmf_pciedev_info *devinfo)
u64 address;
u32 addr;
 
-   devinfo->shared.scratch = dma_alloc_coherent(&devinfo->pdev->dev,
-   BRCMF_DMA_D2H_SCRATCH_BUF_LEN,
-   &devinfo->shared.scratch_dmahandle, GFP_KERNEL);
+   devinfo->shared.scratch =
+   dma_zalloc_coherent(&devinfo->pdev->dev,
+   BRCMF_DMA_D2H_SCRATCH_BUF_LEN,
+   &devinfo->shared.scratch_dmahandle,
+   GFP_KERNEL);
if (!devinfo->shared.scratch)
goto fail;
 
-   memset(devinfo->shared.scratch, 0, BRCMF_DMA_D2H_SCRATCH_BUF_LEN);
-
addr = devinfo->shared.tcm_base_address +
   BRCMF_SHARED_DMA_SCRATCH_ADDR_OFFSET;
address = (u64)devinfo->shared.scratch_dmahandle;
@@ -1268,14 +1267,13 @@ static int brcmf_pcie_init_scratchbuffers(struct 
brcmf_pciedev_info *devinfo)
   BRCMF_SHARED_DMA_SCRATCH_LEN_OFFSET;
brcmf_pcie_write_tcm32(devinfo, addr, BRCMF_DMA_D2H_SCRATCH_BUF_LEN);
 
-   devinfo->shared.ringupd = dma_alloc_coherent(&devinfo->pdev->dev,
-   BRCMF_DMA_D2H_RINGUPD_BUF_LEN,
-   &devinfo->shared.ringupd_dmahandle, GFP_KERNEL);
+   devinfo->shared.ringupd =
+   dma_zalloc_coherent(&devinfo->pdev->dev,
+   BRCMF_DMA_D2H_RINGUPD_BUF_LEN,
+   &devinfo->shared.ringupd_dmahandle,
+   GFP_KERNEL);
if (!devinfo->shared.ringupd)
goto fail;
 
-   memset(devinfo->shared.ringupd, 0, BRCMF_DMA_D2H_RINGUPD_BUF_LEN);
-
addr = devinfo->shared.tcm_base_address +
   BRCMF_SHARED_DMA_RINGUPD_ADDR_OFFSET;
address = (u64)devinfo->shared.ringupd_dmahandle;
-- 
2.7.4



Re: [PATCH net-next 7/7] net: phy: convert read-modify-write to phy_modify()

2017-12-30 Thread Russell King - ARM Linux
Hi,

Unfortunately, I've found this afternoon that this patch causes a
regression for Marvell PHYs connected in RGMII mode - so please do
not apply this patch.  The remainder of the series is fine.

Thanks.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up


Re: [PATCH net-next 5/6] arm64: dts: marvell: mcbin: enable the fourth network interface

2017-12-30 Thread Marcin Wojtas
Hi Russell and Stefan,

2017-12-29 12:38 GMT+01:00 Russell King - ARM Linux :
> On Fri, Dec 29, 2017 at 12:12:15PM +0100, Marcin Wojtas wrote:
>> Hi Russell,
>>
>> I see that I misspelled your email address, hence the series remained 
>> unnoticed:
>> https://lkml.org/lkml/2017/12/18/216
>>
>> In terms of the phylink support, I think the most important are:
>> * 3/8
>> https://lkml.org/lkml/2017/12/18/211
>> * 7/8
>> https://lkml.org/lkml/2017/12/18/207
>>
>> I think the way of obtaining PHY fwnode and connecting it from the
>> latter patch could be incorporated to the phylink code. Although I
>> didn't get much feedback, the whole ACPI-handling of MDIO bus and the
>> PHYs touch ACPI specification and I expect it a slower to get merged.
>> Hence my idea is following:
>> * Send v2 with ACPI supporting link-irq only in mvpp2.c
>> * Extract MDIO bus handling for ACPI and propose PHY handling
>> modifications in phylink.
>>
>> This way we may push the two things forwards in more efficient way.
>> I'm looking forward to your opinion.
>
> Agreed - as we have very few users of phylink at the moment (they're
> mostly all in external trees) we can easily change the phylink
> interfaces.  The first step is solving the ACPI representation of the
> MDIO bus and attached devices, and until that is settled, not much can
> be done.
>
> However, it seems to me that the issues of adding ACPI to mvpp2 vs
> adding phylink to mvpp2 are two entirely separate problems that don't
> really conflict with each other - since the "phy" problem afflicts
> both.
>

Yes, I already split the series and will send first one right away. I
will be followed by MDIO bus / PHY handling proposal, including the
bits related to phylink. I'm looking forward to your opinion on that
once sent.

> However, I'm not sure what this "link-irq" thing is that you talk
> about (and I suspect it's one of the things that I've been trying for
> months to find out about from Antoine when he says that there's stuff
> that mvpp2 supports that phylink doesn't.)  So, I'm left to guess, and
> I guess it's the mvpp2-variant of mvneta's in-band autonegotiation.
> Continuing to guess from the mvpp2 phylink conversion patch, this mvpp2
> variant is selected by not providing a phy handle in DT, whereas
> mvneta's variant is selected using the ethernet-standard property
> 'managed = "in-band-status"'.

This my understanding of how the PP2 HW works in terms of signalling
the link interrupt:

The full in-band management, similar to mvneta is supported only in
the SGMII mode (1G, not sure how it looks like in 2.5G mode). Such
handling is not yet implemented in the mvpp2.c

10G:
The XGMII MAC (XLG) is capable of generating link status change
interrupt upon information provided from the reconciliation layer (RS)
of the interface.

2.5G/1G SGMII:
Apart from the in-band management, the MAC is also capable of
generating IRQ during link-status change.

1G RGMII:
I was a bit surprised, but checked on my own - the link change IRQ can
be generated here as well.

In addition to above the clause 22 PHYs can be automatically polled
via SMI bus and provide complete information about link status, speed,
etc., reflecting it directly in GMAC status registers. However, this
feature had to be disabled, in order not to conflict with SW PHY
management of the phylib.

Stefan, is above correct?

>
> If my guessing is correct, I have to wonder why mvpp2 invented a
> different way to represent this from mvneta?  This makes it much more
> difficult to convert mvpp2 to phylink, and it also makes it difficult
> to add SFP support ignoring the phylink issue (since there is no phy
> handle there either.)

Doesn't SFP require the fwnode handle to the sfp node? This is what I
understand at least from the phylink_register_sfp.

Anyway, once the phylink is introduced in mvpp2.c, its presence will
simply be detected by port->phylink pointer. In such case the link IRQ
will no be used. In longer perspective, link IRQ should be used only
by ACPI and once MDIO bus is supported in generic way in this world,
it could remain as the 'last resort' option.

Best regards,
Marcin


Re: [PATCH] sky2: Replace mdelay with msleep in sky2_vpd_wait

2017-12-30 Thread Stephen Hemminger
On Sat, 30 Dec 2017 19:09:47 +0800
Jia-Ju Bai  wrote:

> sky2_vpd_wait is not called in an interrupt handler nor holding a spinlock.
> The function mdelay in it can be replaced with msleep, to reduce busy wait.
> 
> Signed-off-by: Jia-Ju Bai 
> ---
>  drivers/net/ethernet/marvell/sky2.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/marvell/sky2.c 
> b/drivers/net/ethernet/marvell/sky2.c
> index 9efe177..9fe8530 100644
> --- a/drivers/net/ethernet/marvell/sky2.c
> +++ b/drivers/net/ethernet/marvell/sky2.c
> @@ -4287,7 +4287,7 @@ static int sky2_vpd_wait(const struct sky2_hw *hw, int 
> cap, u16 busy)
>   dev_err(&hw->pdev->dev, "VPD cycle timed out\n");
>   return -ETIMEDOUT;
>   }
> - mdelay(1);
> + msleep(1);
>   }
>  
>   return 0;

When sky2 was written mdelay was equivalent to current msleep.

Acked-by: Stephen Hemminger 


[PATCH V4 0/4] Add SELinux SCTP protocol support

2017-12-30 Thread Richard Haines
Note: Some conflicts are expected when merging with current net-next due to
Interleaving Data (I-DATA) sets of patches:
PATCH 2/4 - Where 'sctp_datachk_len(&asoc->stream)' has replaced
'sizeof(struct sctp_data_chunk)' in include/net/sctp/sctp.h, 
net/sctp/chunk.c and net/sctp/socket.c 
PATCH 3/4 - Where include/uapi/linux/sctp.h requires a fix to update the
#define SCTP_SENDMSG_CONNECT to a higher number.

These patches have been built on Fedora 27 with kernel 4.14.8 plus
the following userspace patches to enable testing:

1) Updates to libsepol 2.7 to support the sctp portcon statement.
   The patch is available from:
 http://arctic.selinuxproject.org/~rhaines/selinux-sctp/
 selinux-Add-support-for-the-SCTP-portcon-keyword.patch

2) Updates to the SELinux Test Suite adding SCTP tests. Please read the
   selinux-testsuite/README.sctp for details. The patch is available from:
 http://arctic.selinuxproject.org/~rhaines/selinux-sctp/
 selinux-testsuite-Add-SCTP-test-support.patch

3) Updates to lksctp-tools that show SELinux info in sctp_darn and
   sctp_test. It also contains a minor patch for test_1_to_1_connect.c
   as when CIPSO/CALIPSO configured, NetLabel returns a different error
   code for illegal addresses in test 5. The patch is available from:
 http://arctic.selinuxproject.org/~rhaines/selinux-sctp/
 lksctp-tools-Add-SELinux-support-to-sctp_test-and-sc.patch

All SCTP lksctp-tools/src/func_tests run correctly in enforcing mode.

All SCTP regression tests "./sctp-tests run" run correctly in enforcing
mode. These tests are obtained from: https://github.com/sctp/sctp-tests

The selinux-testsuite patch also adds remote tests (that need some manual
configuration). These are useful for testing CIPSO/CALIPSO over a network
with a number of categories to produce large ip option fields with various
message sizes forcing fragmentation etc..

Changes since RFC Patch:
Removed the NetLabel patch (was [RFC PATCH 4/5] netlabel: Add SCTP support)
as re-engineered. However this patchset will require the NetLabel
patch at [1] to fully run the SCTP selinux-testsuite.

V1 Changes:
PATCH 1/4
Remove unused parameter from security_sctp_assoc_request().
Reformat and update LSM-sctp.rst documentation.
PATCH 2/4
Add variables and RCU locks as requested in [2] to support IP options.
PATCH 3/4
Added security_sctp_assoc_request() hook to sctp_sf_do_unexpected_init()
and sctp_sf_do_5_2_4_dupcook().
Removed security_sctp_assoc_request() hook from sctp_sf_do_5_1C_ack() as
no longer required.
PATCH 4/4
Reformat and update SELinux-sctp.rst documentation.
Remove bindx and connectx permissions.
Rework selinux_socket_connect() and selinux_netlbl_socket_connect() to
utilise helpers for code reuse.
Add spinlock to selinux_sctp_assoc_request().
Remove unused parameter from security_sctp_assoc_request().
Use address->sa_family == AF_INET in *_bind and *_connect to ensure
correct address type.
Minor cleanups.

V2 Changes:
PATCH 4/4 - Remove spin lock from selinux_sctp_assoc_request()
PATCH 4/4 - Fix selinux_sctp_sk_clone() kbuild test robot catch [3]

V3 Changes:
PATCH 2/4 - Account for IP options length in sctp.h sctp_frag_point() by
Marcelo

V4 Changes:
PATCH 1/4 - Move specific SELinux descriptions from LSM-sctp.rst and
lsm_hooks.h to SELinux-sctp.rst in PATCH 4/4
PATCH 4/4 - Rename selinux_netlbl_sctp_socket_connect() to
selinux_netlbl_socket_connect_locked() and move description comments to
selinux_sctp_bind_connect()

[1] https://marc.info/?l=selinux&m=151061619115945&w=2
[2] https://marc.info/?l=selinux&m=150962470215797&w=2
[3] https://marc.info/?l=selinux&m=151198281817779&w=2


Richard Haines (4):
  security: Add support for SCTP security hooks
  sctp: Add ip option support
  sctp: Add LSM hooks
  selinux: Add SCTP support

 Documentation/security/LSM-sctp.rst | 175 
 Documentation/security/SELinux-sctp.rst | 157 ++
 include/linux/lsm_hooks.h   |  36 
 include/linux/security.h|  25 +++
 include/net/sctp/sctp.h |   4 +-
 include/net/sctp/structs.h  |  12 ++
 include/uapi/linux/sctp.h   |   1 +
 net/sctp/chunk.c|  13 +-
 net/sctp/ipv6.c |  42 -
 net/sctp/output.c   |   5 +-
 net/sctp/protocol.c |  36 
 net/sctp/sm_make_chunk.c|  12 ++
 net/sctp/sm_statefuns.c |  18 ++
 net/sctp/socket.c   |  70 +++-
 security/security.c |  22 +++
 security/selinux/hooks.c| 280 +---
 security/selinux/include/classmap.h |   2 +-
 security/selinux/include/netlabel.h |  21 ++-
 security/selinux/include/objsec.h   |   4 +
 security/selinux/netlabel.c | 138 ++--
 20 files changed, 1024 insertions(+), 49 deletions(-)
 create mode 100644 Documentation

[PATCH V4 3/4] sctp: Add LSM hooks

2017-12-30 Thread Richard Haines
Add security hooks to allow security modules to exercise access control
over SCTP.

Signed-off-by: Richard Haines 
---
 include/net/sctp/structs.h | 10 
 include/uapi/linux/sctp.h  |  1 +
 net/sctp/sm_make_chunk.c   | 12 +
 net/sctp/sm_statefuns.c| 18 ++
 net/sctp/socket.c  | 61 +-
 5 files changed, 101 insertions(+), 1 deletion(-)

diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index 9942ed5..2ca0a3f 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -1271,6 +1271,16 @@ struct sctp_endpoint {
  reconf_enable:1;
 
__u8  strreset_enable;
+
+   /* Security identifiers from incoming (INIT). These are set by
+* security_sctp_assoc_request(). These will only be used by
+* SCTP TCP type sockets and peeled off connections as they
+* cause a new socket to be generated. security_sctp_sk_clone()
+* will then plug these into the new socket.
+*/
+
+   u32 secid;
+   u32 peer_secid;
 };
 
 /* Recover the outter endpoint structure. */
diff --git a/include/uapi/linux/sctp.h b/include/uapi/linux/sctp.h
index cfe9712..cafac36 100644
--- a/include/uapi/linux/sctp.h
+++ b/include/uapi/linux/sctp.h
@@ -123,6 +123,7 @@ typedef __s32 sctp_assoc_t;
 #define SCTP_RESET_ASSOC   120
 #define SCTP_ADD_STREAMS   121
 #define SCTP_SOCKOPT_PEELOFF_FLAGS 122
+#define SCTP_SENDMSG_CONNECT   123
 
 /* PR-SCTP policies */
 #define SCTP_PR_SCTP_NONE  0x
diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
index 514465b..269fd3d 100644
--- a/net/sctp/sm_make_chunk.c
+++ b/net/sctp/sm_make_chunk.c
@@ -3054,6 +3054,12 @@ static __be16 sctp_process_asconf_param(struct 
sctp_association *asoc,
if (af->is_any(&addr))
memcpy(&addr, &asconf->source, sizeof(addr));
 
+   if (security_sctp_bind_connect(asoc->ep->base.sk,
+  SCTP_PARAM_ADD_IP,
+  (struct sockaddr *)&addr,
+  af->sockaddr_len))
+   return SCTP_ERROR_REQ_REFUSED;
+
/* ADDIP 4.3 D9) If an endpoint receives an ADD IP address
 * request and does not have the local resources to add this
 * new address to the association, it MUST return an Error
@@ -3120,6 +3126,12 @@ static __be16 sctp_process_asconf_param(struct 
sctp_association *asoc,
if (af->is_any(&addr))
memcpy(&addr.v4, sctp_source(asconf), sizeof(addr));
 
+   if (security_sctp_bind_connect(asoc->ep->base.sk,
+  SCTP_PARAM_SET_PRIMARY,
+  (struct sockaddr *)&addr,
+  af->sockaddr_len))
+   return SCTP_ERROR_REQ_REFUSED;
+
peer = sctp_assoc_lookup_paddr(asoc, &addr);
if (!peer)
return SCTP_ERROR_DNS_FAILED;
diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c
index 8f8ccde..a2dfc5a 100644
--- a/net/sctp/sm_statefuns.c
+++ b/net/sctp/sm_statefuns.c
@@ -318,6 +318,11 @@ enum sctp_disposition sctp_sf_do_5_1B_init(struct net *net,
struct sctp_packet *packet;
int len;
 
+   /* Update socket peer label if first association. */
+   if (security_sctp_assoc_request((struct sctp_endpoint *)ep,
+   chunk->skb))
+   return sctp_sf_pdiscard(net, ep, asoc, type, arg, commands);
+
/* 6.10 Bundling
 * An endpoint MUST NOT bundle INIT, INIT ACK or
 * SHUTDOWN COMPLETE with any other chunks.
@@ -905,6 +910,9 @@ enum sctp_disposition sctp_sf_do_5_1E_ca(struct net *net,
 */
sctp_add_cmd_sf(commands, SCTP_CMD_INIT_COUNTER_RESET, SCTP_NULL());
 
+   /* Set peer label for connection. */
+   security_inet_conn_established(ep->base.sk, chunk->skb);
+
/* RFC 2960 5.1 Normal Establishment of an Association
 *
 * E) Upon reception of the COOKIE ACK, endpoint "A" will move
@@ -1433,6 +1441,11 @@ static enum sctp_disposition sctp_sf_do_unexpected_init(
struct sctp_packet *packet;
int len;
 
+   /* Update socket peer label if first association. */
+   if (security_sctp_assoc_request((struct sctp_endpoint *)ep,
+   chunk->skb))
+   return sctp_sf_pdiscard(net, ep, asoc, type, arg, commands);
+
/* 6.10 Bundling
 * An endpoint MUST NOT bundle INIT, INIT ACK or
 * SHUTDOWN COMPLETE with any other chunks.
@@ -2103,6 +2116,11 @@ enum sctp_disposition sctp_sf_do_5_2_4_dupcook(
}
}
 
+   /* Update socket peer label if first association. */
+   if (security_sctp_assoc

[PATCH V4 2/4] sctp: Add ip option support

2017-12-30 Thread Richard Haines
Add ip option support to allow LSM security modules to utilise CIPSO/IPv4
and CALIPSO/IPv6 services.

Signed-off-by: Richard Haines 
---
 include/net/sctp/sctp.h|  4 +++-
 include/net/sctp/structs.h |  2 ++
 net/sctp/chunk.c   | 13 -
 net/sctp/ipv6.c| 42 +++---
 net/sctp/output.c  |  5 -
 net/sctp/protocol.c| 36 
 net/sctp/socket.c  |  9 +++--
 7 files changed, 95 insertions(+), 16 deletions(-)

diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
index d7d8cba..1b2f40a 100644
--- a/include/net/sctp/sctp.h
+++ b/include/net/sctp/sctp.h
@@ -436,9 +436,11 @@ static inline int sctp_list_single_entry(struct list_head 
*head)
 static inline int sctp_frag_point(const struct sctp_association *asoc, int 
pmtu)
 {
struct sctp_sock *sp = sctp_sk(asoc->base.sk);
+   struct sctp_af *af = sp->pf->af;
int frag = pmtu;
 
-   frag -= sp->pf->af->net_header_len;
+   frag -= af->ip_options_len(asoc->base.sk);
+   frag -= af->net_header_len;
frag -= sizeof(struct sctphdr) + sizeof(struct sctp_data_chunk);
 
if (asoc->user_frag)
diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index 0477945..9942ed5 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -461,6 +461,7 @@ struct sctp_af {
void(*ecn_capable)(struct sock *sk);
__u16   net_header_len;
int sockaddr_len;
+   int (*ip_options_len)(struct sock *sk);
sa_family_t sa_family;
struct list_head list;
 };
@@ -485,6 +486,7 @@ struct sctp_pf {
int (*addr_to_user)(struct sctp_sock *sk, union sctp_addr *addr);
void (*to_sk_saddr)(union sctp_addr *, struct sock *sk);
void (*to_sk_daddr)(union sctp_addr *, struct sock *sk);
+   void (*copy_ip_options)(struct sock *sk, struct sock *newsk);
struct sctp_af *af;
 };
 
diff --git a/net/sctp/chunk.c b/net/sctp/chunk.c
index 3afac27..9d130f4 100644
--- a/net/sctp/chunk.c
+++ b/net/sctp/chunk.c
@@ -153,7 +153,6 @@ static void sctp_datamsg_assign(struct sctp_datamsg *msg, 
struct sctp_chunk *chu
chunk->msg = msg;
 }
 
-
 /* A data chunk can have a maximum payload of (2^16 - 20).  Break
  * down any such message into smaller chunks.  Opportunistically, fragment
  * the chunks down to the current MTU constraints.  We may get refragmented
@@ -170,6 +169,8 @@ struct sctp_datamsg *sctp_datamsg_from_user(struct 
sctp_association *asoc,
struct list_head *pos, *temp;
struct sctp_chunk *chunk;
struct sctp_datamsg *msg;
+   struct sctp_sock *sp;
+   struct sctp_af *af;
int err;
 
msg = sctp_datamsg_new(GFP_KERNEL);
@@ -188,9 +189,12 @@ struct sctp_datamsg *sctp_datamsg_from_user(struct 
sctp_association *asoc,
/* This is the biggest possible DATA chunk that can fit into
 * the packet
 */
-   max_data = asoc->pathmtu -
-  sctp_sk(asoc->base.sk)->pf->af->net_header_len -
-  sizeof(struct sctphdr) - sizeof(struct sctp_data_chunk);
+   sp = sctp_sk(asoc->base.sk);
+   af = sp->pf->af;
+   max_data = asoc->pathmtu - af->net_header_len -
+  sizeof(struct sctphdr) - sizeof(struct sctp_data_chunk) -
+  af->ip_options_len(asoc->base.sk);
+
max_data = SCTP_TRUNC4(max_data);
 
/* If the the peer requested that we authenticate DATA chunks
@@ -210,7 +214,6 @@ struct sctp_datamsg *sctp_datamsg_from_user(struct 
sctp_association *asoc,
 
/* Set first_len and then account for possible bundles on first frag */
first_len = max_data;
-
/* Check to see if we have a pending SACK and try to let it be bundled
 * with this message.  Do this if we don't have any data queued already.
 * To check that, look at out_qlen and retransmit list.
diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c
index 3b18085..b06dc81 100644
--- a/net/sctp/ipv6.c
+++ b/net/sctp/ipv6.c
@@ -423,6 +423,38 @@ static void sctp_v6_copy_addrlist(struct list_head 
*addrlist,
rcu_read_unlock();
 }
 
+/* Copy over any ip options */
+static void sctp_v6_copy_ip_options(struct sock *sk, struct sock *newsk)
+{
+   struct ipv6_pinfo *newnp, *np = inet6_sk(sk);
+   struct ipv6_txoptions *opt;
+
+   newnp = inet6_sk(newsk);
+
+   rcu_read_lock();
+   opt = rcu_dereference(np->opt);
+   if (opt)
+   opt = ipv6_dup_options(newsk, opt);
+   RCU_INIT_POINTER(newnp->opt, opt);
+   rcu_read_unlock();
+}
+
+/* Account for the IP options */
+static int sctp_v6_ip_options_len(struct sock *sk)
+{
+   struct ipv6_pinfo *np = inet6_sk(sk);
+   struct ipv6_txoptions *opt;
+   int len = 0;
+
+   rcu_read_lock();
+   opt = rcu_dereference(np->opt);
+   if (opt)
+   len = opt

[PATCH V4 4/4] selinux: Add SCTP support

2017-12-30 Thread Richard Haines
The SELinux SCTP implementation is explained in:
Documentation/security/SELinux-sctp.rst

Signed-off-by: Richard Haines 
---
 Documentation/security/SELinux-sctp.rst | 157 ++
 security/selinux/hooks.c| 280 +---
 security/selinux/include/classmap.h |   2 +-
 security/selinux/include/netlabel.h |  21 ++-
 security/selinux/include/objsec.h   |   4 +
 security/selinux/netlabel.c | 138 ++--
 6 files changed, 570 insertions(+), 32 deletions(-)
 create mode 100644 Documentation/security/SELinux-sctp.rst

diff --git a/Documentation/security/SELinux-sctp.rst 
b/Documentation/security/SELinux-sctp.rst
new file mode 100644
index 000..2f66bf3
--- /dev/null
+++ b/Documentation/security/SELinux-sctp.rst
@@ -0,0 +1,157 @@
+SCTP SELinux Support
+=
+
+Security Hooks
+===
+
+``Documentation/security/LSM-sctp.rst`` describes the following SCTP security
+hooks with the SELinux specifics expanded below::
+
+security_sctp_assoc_request()
+security_sctp_bind_connect()
+security_sctp_sk_clone()
+security_inet_conn_established()
+
+
+security_sctp_assoc_request()
+-
+Passes the ``@ep`` and ``@chunk->skb`` of the association INIT packet to the
+security module. Returns 0 on success, error on failure.
+::
+
+@ep - pointer to sctp endpoint structure.
+@skb - pointer to skbuff of association packet.
+
+The security module performs the following operations:
+ IF this is the first association on ``@ep->base.sk``, then set the peer
+ sid to that in ``@skb``. This will ensure there is only one peer sid
+ assigned to ``@ep->base.sk`` that may support multiple associations.
+
+ ELSE validate the ``@ep->base.sk peer_sid`` against the ``@skb peer sid``
+ to determine whether the association should be allowed or denied.
+
+ Set the sctp ``@ep sid`` to socket's sid (from ``ep->base.sk``) with
+ MLS portion taken from ``@skb peer sid``. This will be used by SCTP
+ TCP style sockets and peeled off connections as they cause a new socket
+ to be generated.
+
+ If IP security options are configured (CIPSO/CALIPSO), then the ip
+ options are set on the socket.
+
+
+security_sctp_bind_connect()
+-
+Checks permissions required for ipv4/ipv6 addresses based on the ``@optname``
+as follows::
+
+  --
+  |   BIND Permission Checks   |
+  |   @optname | @address contains |
+  ||---|
+  | SCTP_SOCKOPT_BINDX_ADD | One or more ipv4 / ipv6 addresses |
+  | SCTP_PRIMARY_ADDR  | Single ipv4 or ipv6 address   |
+  | SCTP_SET_PEER_PRIMARY_ADDR | Single ipv4 or ipv6 address   |
+  --
+
+  --
+  | CONNECT Permission Checks  |
+  |   @optname | @address contains |
+  ||---|
+  | SCTP_SOCKOPT_CONNECTX  | One or more ipv4 / ipv6 addresses |
+  | SCTP_PARAM_ADD_IP  | One or more ipv4 / ipv6 addresses |
+  | SCTP_SENDMSG_CONNECT   | Single ipv4 or ipv6 address   |
+  | SCTP_PARAM_SET_PRIMARY | Single ipv4 or ipv6 address   |
+  --
+
+
+``Documentation/security/LSM-sctp.rst`` gives a summary of the ``@optname``
+entries and also describes ASCONF chunk processing when Dynamic Address
+Reconfiguration is enabled.
+
+
+security_sctp_sk_clone()
+-
+Called whenever a new socket is created by **accept**\(2) (i.e. a TCP style
+socket) or when a socket is 'peeled off' e.g userspace calls
+**sctp_peeloff**\(3). ``security_sctp_sk_clone()`` will set the new
+sockets sid and peer sid to that contained in the ``@ep sid`` and
+``@ep peer sid`` respectively.
+::
+
+@ep - pointer to current sctp endpoint structure.
+@sk - pointer to current sock structure.
+@sk - pointer to new sock structure.
+
+
+security_inet_conn_established()
+-
+Called when a COOKIE ACK is received where it sets the connection's peer sid
+to that in ``@skb``::
+
+@sk  - pointer to sock structure.
+@skb - pointer to skbuff of the COOKIE ACK packet.
+
+
+Policy Statements
+==
+The following class and permissions to support SCTP are available within the
+kernel::
+
+class sctp_socket inherits socket { node_bind }
+
+whenever the following policy capability is enabled::
+
+policycap extended_socket_class;
+
+SELinux SCTP support adds the ``name_connect`` permission for connecting
+to a specific port type and the ``associ

[PATCH V4 1/4] security: Add support for SCTP security hooks

2017-12-30 Thread Richard Haines
The SCTP security hooks are explained in:
Documentation/security/LSM-sctp.rst

Signed-off-by: Richard Haines 
---
 Documentation/security/LSM-sctp.rst | 175 
 include/linux/lsm_hooks.h   |  36 
 include/linux/security.h|  25 ++
 security/security.c |  22 +
 4 files changed, 258 insertions(+)
 create mode 100644 Documentation/security/LSM-sctp.rst

diff --git a/Documentation/security/LSM-sctp.rst 
b/Documentation/security/LSM-sctp.rst
new file mode 100644
index 000..6e5a392
--- /dev/null
+++ b/Documentation/security/LSM-sctp.rst
@@ -0,0 +1,175 @@
+SCTP LSM Support
+
+
+For security module support, three SCTP specific hooks have been implemented::
+
+security_sctp_assoc_request()
+security_sctp_bind_connect()
+security_sctp_sk_clone()
+
+Also the following security hook has been utilised::
+
+security_inet_conn_established()
+
+The usage of these hooks are described below with the SELinux implementation
+described in ``Documentation/security/SELinux-sctp.rst``
+
+
+security_sctp_assoc_request()
+-
+Passes the ``@ep`` and ``@chunk->skb`` of the association INIT packet to the
+security module. Returns 0 on success, error on failure.
+::
+
+@ep - pointer to sctp endpoint structure.
+@skb - pointer to skbuff of association packet.
+
+
+security_sctp_bind_connect()
+-
+Passes one or more ipv4/ipv6 addresses to the security module for validation
+based on the ``@optname`` that will result in either a bind or connect
+service as shown in the permission check tables below.
+Returns 0 on success, error on failure.
+::
+
+@sk  - Pointer to sock structure.
+@optname - Name of the option to validate.
+@address - One or more ipv4 / ipv6 addresses.
+@addrlen - The total length of address(s). This is calculated on each
+   ipv4 or ipv6 address using sizeof(struct sockaddr_in) or
+   sizeof(struct sockaddr_in6).
+
+  --
+  | BIND Type Checks   |
+  |   @optname | @address contains |
+  ||---|
+  | SCTP_SOCKOPT_BINDX_ADD | One or more ipv4 / ipv6 addresses |
+  | SCTP_PRIMARY_ADDR  | Single ipv4 or ipv6 address   |
+  | SCTP_SET_PEER_PRIMARY_ADDR | Single ipv4 or ipv6 address   |
+  --
+
+  --
+  |   CONNECT Type Checks  |
+  |   @optname | @address contains |
+  ||---|
+  | SCTP_SOCKOPT_CONNECTX  | One or more ipv4 / ipv6 addresses |
+  | SCTP_PARAM_ADD_IP  | One or more ipv4 / ipv6 addresses |
+  | SCTP_SENDMSG_CONNECT   | Single ipv4 or ipv6 address   |
+  | SCTP_PARAM_SET_PRIMARY | Single ipv4 or ipv6 address   |
+  --
+
+A summary of the ``@optname`` entries is as follows::
+
+SCTP_SOCKOPT_BINDX_ADD - Allows additional bind addresses to be
+ associated after (optionally) calling
+ bind(3).
+ sctp_bindx(3) adds a set of bind
+ addresses on a socket.
+
+SCTP_SOCKOPT_CONNECTX - Allows the allocation of multiple
+addresses for reaching a peer
+(multi-homed).
+sctp_connectx(3) initiates a connection
+on an SCTP socket using multiple
+destination addresses.
+
+SCTP_SENDMSG_CONNECT  - Initiate a connection that is generated by a
+sendmsg(2) or sctp_sendmsg(3) on a new asociation.
+
+SCTP_PRIMARY_ADDR - Set local primary address.
+
+SCTP_SET_PEER_PRIMARY_ADDR - Request peer sets address as
+ association primary.
+
+SCTP_PARAM_ADD_IP  - These are used when Dynamic Address
+SCTP_PARAM_SET_PRIMARY - Reconfiguration is enabled as explained below.
+
+
+To support Dynamic Address Reconfiguration the following parameters must be
+enabled on both endpoints (or use the appropriate **setsockopt**\(2))::
+
+/proc/sys/net/sctp/addip_enable
+/proc/sys/net/sctp/addip_noauth_enable
+
+then the following *_PARAM_*'s are sent to the peer in an
+ASCONF chunk when the corresponding ``@optname``'s are present::
+
+  @optname  ASCONF Parameter
+ ----
+SCTP_SOCKOPT_BINDX_ADD ->   SCTP_PARAM_ADD_IP
+SCTP_SET_PEER_P

Re: [patch net-next v2 00/10] Add support for resource abstraction

2017-12-30 Thread Andrew Lunn
> In my opinion it should not change. Unless there is a bug (like the one
> DaveA found in mlxsw erif table). Existing tables and resources should
> be only added. It is the driver's maintainer responsibility to not to
> break user scripts.

So we agree with is ABI. Great.

   Andrew


Re: [PATCH net-next 5/6] arm64: dts: marvell: mcbin: enable the fourth network interface

2017-12-30 Thread Russell King - ARM Linux
Hi Marcin,

On Sat, Dec 30, 2017 at 05:34:23PM +0100, Marcin Wojtas wrote:
> Yes, I already split the series and will send first one right away. I
> will be followed by MDIO bus / PHY handling proposal, including the
> bits related to phylink. I'm looking forward to your opinion on that
> once sent.

I'm looking forward to the patches. :)

> This my understanding of how the PP2 HW works in terms of signalling
> the link interrupt:
> 
> The full in-band management, similar to mvneta is supported only in
> the SGMII mode (1G, not sure how it looks like in 2.5G mode). Such
> handling is not yet implemented in the mvpp2.c
> 
> 10G:
> The XGMII MAC (XLG) is capable of generating link status change
> interrupt upon information provided from the reconciliation layer (RS)
> of the interface.
> 
> 2.5G/1G SGMII:
> Apart from the in-band management, the MAC is also capable of
> generating IRQ during link-status change.
> 
> 1G RGMII:
> I was a bit surprised, but checked on my own - the link change IRQ can
> be generated here as well.
> 
> In addition to above the clause 22 PHYs can be automatically polled
> via SMI bus and provide complete information about link status, speed,
> etc., reflecting it directly in GMAC status registers. However, this
> feature had to be disabled, in order not to conflict with SW PHY
> management of the phylib.
> 
> Stefan, is above correct?

This sounds very much like mvneta's 'managed = "in-band"' mode.

Having done some research earlier this month on the "2.5G SGMII" I have
a number of comments about this:

1. Beware of "SGMII" being used as a generic term for single lane serdes
   based ethernet. Marvell seem to use this for 802.3z BASE-X in their
   code, but it is not. SGMII is a modification of 802.3z BASE-X by
   Cisco.  This leads to some confusion!

2. For Cisco SGMII running at 2.5G, PHYs such as those from Xilinx do not
   support the speed bits, because the speed is defined to be 2.5G.  IOW,
   they do not support 250Mbps or 25Mbps speeds by data replication as is
   done with 100Mbps and 10Mbps over 1G SGMII.

3. There is also 1000BASE-X upclocked to 2.5G speeds, which mvneta and
   mvpp2 both support by appropriate configuration of the comphy.  I've
   already tested this with 4.3Mbps Fiberchannel SFPs between clearfog
   and mcbin - but needing devmem2 to reconfigure the clearfog comphy.

> > If my guessing is correct, I have to wonder why mvpp2 invented a
> > different way to represent this from mvneta?  This makes it much more
> > difficult to convert mvpp2 to phylink, and it also makes it difficult
> > to add SFP support ignoring the phylink issue (since there is no phy
> > handle there either.)
> 
> Doesn't SFP require the fwnode handle to the sfp node? This is what I
> understand at least from the phylink_register_sfp.

Yes, internally within phylink.  What I'm concerned about is the
following disparity between mvneta and mvpp2 - I'll try to explain it
more clearly with DT examples:

1.1. mvneta phy
ð {
phy = <&phy>;
phy-mode = "whatever";
};
1.2. mvneta fixed-link
ð {
fixed-link {
speed = <1000>;
full-duplex;
};
};
1.3. mvneta in-band
ð {
phy-mode = "sgmii";
managed = "in-band-status";
};
2.1. mvpp2 phy
ð {
phy = <&phy>;
phy-mode = "whatever";
};
2.2. mvpp2 fixed-link
ð {
fixed-link {
speed = <1000>;
full-duplex;
};
};
2.3. mvpp2 in-band (guess)
ð {
phy-mode = "sgmii";
};

In both cases, the representation for phy and fixed-link mode are the
same, but the in-band are different.  In mvneta in-band, the generic
"managed" property must be specified as specified by
Documentation/devicetree/bindings/net/ethernet.txt.  However, for mvpp2,
this mode is currently selected by omission of both a "phy" property and
a "fixed-link" sub-node/property - and that goes against the description
of the "managed" property in the ethernet.txt binding doc.

Phylink won't recognise the mvpp2's style of "in-band" because phylink,
being a piece of generic code, is written to follow the generic binding
documentation, rather than accomodating driver's individual quirks.

So, if what I think is correct (basically what I've said above) there is
a problem converting mvpp2 to use phylink - any existing DT files that
use the "2.3 mvpp2 in-band" method instantly break, and I think that's
what Antoine referred to when I picked out that the previous patches
avoided using phylink when there was no "phy" node present.

However, I haven't spotted anything using the 2.3 method, but it's not
that easy to find the lack of a property amongst the maze of .dts*
files - trying to track down which use mvpp2 and which do not specify
a phy or fixed-link

Re: [PATCH iproute2 v2 1/4] man: drop references to Debian-specific paths

2017-12-30 Thread Stephen Hemminger
On Sat, 30 Dec 2017 11:31:14 +0100
Luca Boccassi  wrote:

> Documentation should be distribution-agnostic - any specific quirks
> should be handled by downstream maintainers, if necessary.
> Remove mentions of Debian paths and package names.
> 
> Signed-off-by: Luca Boccassi 

Applied all of these


Re: [PATCH RESEND 1/3] net: Fix possible race in peernet2id_alloc()

2017-12-30 Thread Kirill Tkhai
On 29.12.2017 21:18, Eric W. Biederman wrote:
> Kirill Tkhai  writes:
> 
>> peernet2id_alloc() is racy without rtnl_lock() as atomic_read(&peer->count)
>> under net->nsid_lock does not guarantee, peer is alive:
>>
>> rcu_read_lock()
>> peernet2id_alloc()..
>>   spin_lock_bh(&net->nsid_lock)   ..
>>   atomic_read(&peer->count) == 1  ..
>>   ..  put_net()
>>   ..cleanup_net()
>>   ..  for_each_net(tmp)
>>   ..
>> spin_lock_bh(&tmp->nsid_lock)
>>   ..__peernet2id(tmp, net) 
>> == -1
>>   ....
>>   ....
>> __peernet2id_alloc(alloc == true)   ..
>>   ....
>> rcu_read_unlock()   ..
>> ..synchronize_rcu()
>> ..kmem_cache_free(net)
>>
>> After the above situation, net::netns_id contains id pointing to freed 
>> memory,
>> and any other dereferencing by the id will operate with this freed memory.
>>
>> Currently, peernet2id_alloc() is used under rtnl_lock() everywhere except
>> ovs_vport_cmd_fill_info(), and this race can't occur. But peernet2id_alloc()
>> is generic interface, and better we fix it before someone really starts
>> use it in wrong context.
> 
> Nacked-by: "Eric W. Biederman" 
> 
> I have already made a clear objection to the first unnecessary and
> confusing hunk.  Simply resending the muddle headed code doesn't make it
> better.

You provided comments on my changes and you asked couple of questions. I replied
your questions and explained, why it seems important to made the first hunk. 
Since
there were questions from you I interpreted the conversation is a discussion. 
Later
there was no an answer from you, and patchset status became not clear for me, 
and I
wrote about that. I had no an aim to disappoint you or ignore your position.

Thank you for the reply. Now the position is clear for me. I'll remove the first
hunk and resend the changed patchset like you suggested.

Kirill

>> Signed-off-by: Kirill Tkhai 
>> ---
>>  net/core/net_namespace.c |   23 +++
>>  1 file changed, 19 insertions(+), 4 deletions(-)
>>
>> diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
>> index 60a71be75aea..6a4eab438221 100644
>> --- a/net/core/net_namespace.c
>> +++ b/net/core/net_namespace.c
>> @@ -221,17 +221,32 @@ static void rtnl_net_notifyid(struct net *net, int 
>> cmd, int id);
>>   */
>>  int peernet2id_alloc(struct net *net, struct net *peer)
>>  {
>> -bool alloc;
>> +bool alloc = false, alive = false;
>>  int id;
>>  
>> -if (atomic_read(&net->count) == 0)
>> -return NETNSA_NSID_NOT_ASSIGNED;
>>  spin_lock_bh(&net->nsid_lock);
>> -alloc = atomic_read(&peer->count) == 0 ? false : true;
>> +/* Spinlock guarantees we never hash a peer to net->netns_ids
>> + * after idr_destroy(&net->netns_ids) occurs in cleanup_net().
>> + */
>> +if (atomic_read(&net->count) == 0) {
>> +id = NETNSA_NSID_NOT_ASSIGNED;
>> +goto unlock;
>> +}
>> +/*
>> + * When peer is obtained from RCU lists, we may race with
>> + * its cleanup. Check whether it's alive, and this guarantees
>> + * we never hash a peer back to net->netns_ids, after it has
>> + * just been idr_remove()'d from there in cleanup_net().
>> + */
>> +if (maybe_get_net(peer))
>> +alive = alloc = true;
>>  id = __peernet2id_alloc(net, peer, &alloc);
>> +unlock:
>>  spin_unlock_bh(&net->nsid_lock);
>>  if (alloc && id >= 0)
>>  rtnl_net_notifyid(net, RTM_NEWNSID, id);
>> +if (alive)
>> +put_net(peer);
>>  return id;
>>  }
>>  EXPORT_SYMBOL_GPL(peernet2id_alloc);



Re: [PATCH] b43: Replace mdelay with msleep in b43_radio_2057_init_post

2017-12-30 Thread Larry Finger

On 12/30/2017 05:08 AM, Jia-Ju Bai wrote:

b43_radio_2057_init_post is not called in an interrupt handler
nor holding a spinlock.
The function mdelay in it can be replaced with msleep, to reduce busy wait.

Signed-off-by: Jia-Ju Bai 


checkpatch.pl reports the following warning for this patch:

WARNING: msleep < 20ms can sleep for up to 20ms; see 
Documentation/timers/timers-howto.txt

#26: FILE: drivers/net/wireless/broadcom/b43/phy_n.c:1034:
+   msleep(2);

total: 0 errors, 1 warnings, 0 checks, 8 lines checked

Have you tested to verify that a sleep as long as 20 ms will not cause problems? 
The referenced document suggests a usleep_range() call.


In general, delay changes should never be proposed without testing.

Larry


[PATCH] orinoco: Delete an error message for a failed memory allocation in three functions

2017-12-30 Thread SF Markus Elfring
From: Markus Elfring 
Date: Sat, 30 Dec 2017 20:20:56 +0100

Omit an extra message for a memory allocation failure in these functions.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring 
---
 drivers/net/wireless/intersil/orinoco/main.c| 10 ++
 drivers/net/wireless/intersil/orinoco/orinoco_usb.c |  5 ++---
 2 files changed, 4 insertions(+), 11 deletions(-)

diff --git a/drivers/net/wireless/intersil/orinoco/main.c 
b/drivers/net/wireless/intersil/orinoco/main.c
index 28dac36d7c4c..0b214f92a503 100644
--- a/drivers/net/wireless/intersil/orinoco/main.c
+++ b/drivers/net/wireless/intersil/orinoco/main.c
@@ -785,11 +785,8 @@ static void orinoco_rx_monitor(struct net_device *dev, u16 
rxfid,
}
 
skb = dev_alloc_skb(hdrlen + datalen);
-   if (!skb) {
-   printk(KERN_WARNING "%s: Cannot allocate skb for monitor 
frame\n",
-  dev->name);
+   if (!skb)
goto update_stats;
-   }
 
/* Copy the 802.11 header to the skb */
skb_put_data(skb, &(desc->frame_ctl), hdrlen);
@@ -900,11 +897,8 @@ void __orinoco_ev_rx(struct net_device *dev, struct hermes 
*hw)
   packets from the card, which has an IO granularity of 16
   bits */
skb = dev_alloc_skb(length + ETH_HLEN + 2 + 1);
-   if (!skb) {
-   printk(KERN_WARNING "%s: Can't allocate skb for Rx\n",
-  dev->name);
+   if (!skb)
goto update_stats;
-   }
 
/* We'll prepend the header, so reserve space for it.  The worst
   case is no decapsulation, when 802.3 header is prepended and
diff --git a/drivers/net/wireless/intersil/orinoco/orinoco_usb.c 
b/drivers/net/wireless/intersil/orinoco/orinoco_usb.c
index 501180584b4b..8ef96a1c231f 100644
--- a/drivers/net/wireless/intersil/orinoco/orinoco_usb.c
+++ b/drivers/net/wireless/intersil/orinoco/orinoco_usb.c
@@ -805,10 +805,9 @@ static int ezusb_firmware_download(struct ezusb_priv 
*upriv,
int variant_offset;
 
fw_buffer = kmalloc(FW_BUF_SIZE, GFP_KERNEL);
-   if (!fw_buffer) {
-   printk(KERN_ERR PFX "Out of memory for firmware buffer.\n");
+   if (!fw_buffer)
return -ENOMEM;
-   }
+
/*
 * This byte is 1 and should be replaced with 0.  The offset is
 * 0x10AD in version 0.0.6.  The byte in question should follow
-- 
2.15.1



[PATCH] rds: fix use-after-free read in rds_find_bound

2017-12-30 Thread Santosh Shilimkar
socket buffer can get freed as part of sock_close
callback so before adding reference check underneath
socket validity.

Reported-by: syzbot+93a5839deb3555374...@syzkaller.appspotmail.com
Signed-off-by: Santosh Shilimkar 
---
 net/rds/bind.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/rds/bind.c b/net/rds/bind.c
index 75d43dc..8dec06e 100644
--- a/net/rds/bind.c
+++ b/net/rds/bind.c
@@ -61,7 +61,7 @@ struct rds_sock *rds_find_bound(__be32 addr, __be16 port)
struct rds_sock *rs;
 
rs = rhashtable_lookup_fast(&bind_hash_table, &key, ht_parms);
-   if (rs && !sock_flag(rds_rs_to_sk(rs), SOCK_DEAD))
+   if (rs && rds_rs_to_sk(rs) && !sock_flag(rds_rs_to_sk(rs), SOCK_DEAD))
rds_sock_addref(rs);
else
rs = NULL;
-- 
1.9.1



Re: KASAN: use-after-free Read in rds_find_bound

2017-12-30 Thread santosh.shilim...@oracle.com

On 12/30/17 1:17 AM, syzbot wrote:

Hello,

syzkaller hit the following crash on 
fba961ab29e5ffb055592442808bb0f7962e05da

git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/master
compiler: gcc (GCC) 7.1.1 20170620
.config is attached
Raw console output is attached.
Unfortunately, I don't have any reproducer for this bug yet.


IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+93a5839deb3555374...@syzkaller.appspotmail.com


Posted a fix[1] for above issue. Didn't test it but looks straight
forward.

Regards,
Santosh




[PATCH] wireless: airo: Delete an error message for a failed memory allocation in airo_networks_allocate()

2017-12-30 Thread SF Markus Elfring
From: Markus Elfring 
Date: Sat, 30 Dec 2017 20:48:44 +0100

Omit an extra message for a memory allocation failure in this function.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring 
---
 drivers/net/wireless/cisco/airo.c | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/drivers/net/wireless/cisco/airo.c 
b/drivers/net/wireless/cisco/airo.c
index 86e795de6760..a65d82d26eaa 100644
--- a/drivers/net/wireless/cisco/airo.c
+++ b/drivers/net/wireless/cisco/airo.c
@@ -2714,12 +2714,7 @@ static int airo_networks_allocate(struct airo_info *ai)
 
ai->networks = kcalloc(AIRO_MAX_NETWORK_COUNT, sizeof(BSSListElement),
   GFP_KERNEL);
-   if (!ai->networks) {
-   airo_print_warn("", "Out of memory allocating beacons");
-   return -ENOMEM;
-   }
-
-   return 0;
+   return ai->networks ? 0 : -ENOMEM;
 }
 
 static void airo_networks_free(struct airo_info *ai)
-- 
2.15.1



Re: iproute2 net-next

2017-12-30 Thread Daniel Borkmann
On 12/30/2017 05:00 AM, Stephen Hemminger wrote:
> On Fri, 29 Dec 2017 09:58:23 +0100
> Jiri Pirko  wrote:
>> Fri, Dec 29, 2017 at 12:46:31AM CET, dan...@iogearbox.net wrote:
>>> On 12/26/2017 10:35 AM, Leon Romanovsky wrote:  
 On Mon, Dec 25, 2017 at 10:14:26PM -0800, Stephen Hemminger wrote:  
> On Tue, 26 Dec 2017 06:47:43 +0200
> Leon Romanovsky  wrote:  
>> On Mon, Dec 25, 2017 at 10:49:19AM -0800, Stephen Hemminger wrote:  
>>> David Ahern has agreed to take over managing the net-next branch of 
>>> iproute2.
>>> The new location is:
>>>  
>>> https://git.kernel.org/pub/scm/linux/kernel/git/dsahern/iproute2-next.git/
>>>
>>> In the past, I have accepted new features into iproute2 master branch, 
>>> but
>>> am changing the policy so that outside of the merge window (up until 
>>> -rc1)
>>> new features will get put into net-next to get some more review and 
>>> testing
>>> time. This means that things like the proposed batch streaming mode will
>>> go through net-next.  
>>
>> Did you consider to create one shared repo for the iproute2 to allow
>> multiple committers workflow?  
>
> For now having separate trees is best, there is no need for multiple
> committers the load is very light.
>  
>> It will be much convenient for the users to have one place for
>> master/stable/net-next branches, instead of actually following two
>> different repositories.  
>
> If you are doing network development, you already need to deal with
> multiple repo's on the kernel side so there is no difference.  

 I agree with you that one extra "git remote add .." is not so huge and
 all people who develop for the netdev will do it. My concern is about
 Documentation and newcomers, who will have a hard time to find a right
 tree.  
>>>
>>> I guess it would certainly help to identify the official repo to rebase
>>> against much quicker if it would be under a common group on korg e.g.
>>>
>>>  * iproute2/iproute2.git - for current cycle
>>>  * iproute2/iproute2-next.git- for net-next bits
>>>
>>> and also be in line with other tooling (ethtool and others), even if
>>> not as high volume, but it would make it unambiguous right away from
>>> the other, private iproute2 repos on korg, imho. Just a thought.  
>>
>> +1
>>
>> I was about to suggest this. This is nice opportunity to do such change.
>>
>> Example, of such shared repo:
>> BPF: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/
>> Bluetooth: 
>> https://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next.git/
>> RDMA: https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/  
>
> Most of these are high volume or vendor silo'd which is not the case 
> here.  
>>> Cheers,
>>> Daniel  
> 
> Good news
> kup does support links so could make links from personal to iproute2 directory

That's nice indeed!

> Bad news
> kup won't allow me to make iproute2 directory right now. Will have to wait for
> Konstantin

Right, he also did set up the shared dir for bpf which was straight forward
though, so would be pretty much the same one-time procedure for iproute2.

Thanks,
Daniel


Re: [PATCH] rds: fix use-after-free read in rds_find_bound

2017-12-30 Thread Sowmini Varadhan
On (12/30/17 11:36), Santosh Shilimkar wrote:
> 
> socket buffer can get freed as part of sock_close
> callback so before adding reference check underneath
> socket validity.

I'm not sure I understand this fix-  

struct rds_sock is:
  struct rds_sock {
struct sock rs_sk;
 :
  }

How can  rs be non-null but rds_rs_to_sk() is null? (Note that
rds_rs_to_sk just returns &rs->rs_sk) so the changed line is 
identical to the original line.

> - if (rs && !sock_flag(rds_rs_to_sk(rs), SOCK_DEAD))
> + if (rs && rds_rs_to_sk(rs) && !sock_flag(rds_rs_to_sk(rs), SOCK_DEAD))

I think the real issue is refcount bug somewhere,

Was the syzbot test run with http://patchwork.ozlabs.org/patch/852492/
this sounds like that type of bug.

--Sowmini



[PATCH] wireless: b43: Delete an error message for a failed memory allocation in b43_sdio_probe()

2017-12-30 Thread SF Markus Elfring
From: Markus Elfring 
Date: Sat, 30 Dec 2017 21:23:47 +0100

Omit an extra message for a memory allocation failure in this function.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring 
---
 drivers/net/wireless/broadcom/b43/sdio.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/wireless/broadcom/b43/sdio.c 
b/drivers/net/wireless/broadcom/b43/sdio.c
index 59a521800694..5a6dbcf170f9 100644
--- a/drivers/net/wireless/broadcom/b43/sdio.c
+++ b/drivers/net/wireless/broadcom/b43/sdio.c
@@ -146,7 +146,6 @@ static int b43_sdio_probe(struct sdio_func *func,
sdio = kzalloc(sizeof(*sdio), GFP_KERNEL);
if (!sdio) {
error = -ENOMEM;
-   dev_err(&func->dev, "failed to allocate ssb bus\n");
goto err_disable_func;
}
error = ssb_bus_sdiobus_register(&sdio->ssb, func,
-- 
2.15.1



[PATCH bpf-next v4 2/3] libbpf: add error reporting in XDP

2017-12-30 Thread Eric Leblond
Parse netlink ext attribute to get the error message returned by
the card. Code is partially take from libnl.

Signed-off-by: Eric Leblond 
Acked-by: Alexei Starovoitov 
---
 tools/lib/bpf/Build|   2 +-
 tools/lib/bpf/bpf.c|  10 ++-
 tools/lib/bpf/nlattr.c | 187 +
 tools/lib/bpf/nlattr.h |  69 ++
 4 files changed, 266 insertions(+), 2 deletions(-)
 create mode 100644 tools/lib/bpf/nlattr.c
 create mode 100644 tools/lib/bpf/nlattr.h

diff --git a/tools/lib/bpf/Build b/tools/lib/bpf/Build
index d8749756352d..64c679d67109 100644
--- a/tools/lib/bpf/Build
+++ b/tools/lib/bpf/Build
@@ -1 +1 @@
-libbpf-y := libbpf.o bpf.o
+libbpf-y := libbpf.o bpf.o nlattr.o
diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index f00fba2edeae..ceb20c5cae3b 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -26,6 +26,7 @@
 #include 
 #include "bpf.h"
 #include "libbpf.h"
+#include "nlattr.h"
 #include 
 #include 
 #include 
@@ -436,6 +437,7 @@ int bpf_set_link_xdp_fd(int ifindex, int fd, __u32 flags)
struct nlmsghdr *nh;
struct nlmsgerr *err;
socklen_t addrlen;
+   int one;
 
memset(&sa, 0, sizeof(sa));
sa.nl_family = AF_NETLINK;
@@ -445,6 +447,11 @@ int bpf_set_link_xdp_fd(int ifindex, int fd, __u32 flags)
return -errno;
}
 
+   if (setsockopt(sock, SOL_NETLINK, NETLINK_EXT_ACK,
+  &one, sizeof(one)) < 0) {
+   fprintf(stderr, "Netlink error reporting not supported\n");
+   }
+
if (bind(sock, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
ret = -errno;
goto cleanup;
@@ -520,7 +527,8 @@ int bpf_set_link_xdp_fd(int ifindex, int fd, __u32 flags)
err = (struct nlmsgerr *)NLMSG_DATA(nh);
if (!err->error)
continue;
-   ret = err->error;
+   ret = -err->error;
+   nla_dump_errormsg(nh);
goto cleanup;
case NLMSG_DONE:
break;
diff --git a/tools/lib/bpf/nlattr.c b/tools/lib/bpf/nlattr.c
new file mode 100644
index ..4719434278b2
--- /dev/null
+++ b/tools/lib/bpf/nlattr.c
@@ -0,0 +1,187 @@
+// SPDX-License-Identifier: LGPL-2.1
+
+/*
+ * NETLINK  Netlink attributes
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation version 2.1
+ * of the License.
+ *
+ * Copyright (c) 2003-2013 Thomas Graf 
+ */
+
+#include 
+#include "nlattr.h"
+#include 
+#include 
+#include 
+
+static uint16_t nla_attr_minlen[NLA_TYPE_MAX+1] = {
+   [NLA_U8]= sizeof(uint8_t),
+   [NLA_U16]   = sizeof(uint16_t),
+   [NLA_U32]   = sizeof(uint32_t),
+   [NLA_U64]   = sizeof(uint64_t),
+   [NLA_STRING]= 1,
+   [NLA_FLAG]  = 0,
+};
+
+static int nla_len(const struct nlattr *nla)
+{
+   return nla->nla_len - NLA_HDRLEN;
+}
+
+static struct nlattr *nla_next(const struct nlattr *nla, int *remaining)
+{
+   int totlen = NLA_ALIGN(nla->nla_len);
+
+   *remaining -= totlen;
+   return (struct nlattr *) ((char *) nla + totlen);
+}
+
+static int nla_ok(const struct nlattr *nla, int remaining)
+{
+   return remaining >= sizeof(*nla) &&
+  nla->nla_len >= sizeof(*nla) &&
+  nla->nla_len <= remaining;
+}
+
+static void *nla_data(const struct nlattr *nla)
+{
+   return (char *) nla + NLA_HDRLEN;
+}
+
+static int nla_type(const struct nlattr *nla)
+{
+   return nla->nla_type & NLA_TYPE_MASK;
+}
+
+static int validate_nla(struct nlattr *nla, int maxtype,
+   struct nla_policy *policy)
+{
+   struct nla_policy *pt;
+   unsigned int minlen = 0;
+   int type = nla_type(nla);
+
+   if (type < 0 || type > maxtype)
+   return 0;
+
+   pt = &policy[type];
+
+   if (pt->type > NLA_TYPE_MAX)
+   return 0;
+
+   if (pt->minlen)
+   minlen = pt->minlen;
+   else if (pt->type != NLA_UNSPEC)
+   minlen = nla_attr_minlen[pt->type];
+
+   if (nla_len(nla) < minlen)
+   return -1;
+
+   if (pt->maxlen && nla_len(nla) > pt->maxlen)
+   return -1;
+
+   if (pt->type == NLA_STRING) {
+   char *data = nla_data(nla);
+   if (data[nla_len(nla) - 1] != '\0')
+   return -1;
+   }
+
+   return 0;
+}
+
+static inline int nlmsg_len(const struct nlmsghdr *nlh)
+{
+   return nlh->nlmsg_len - NLMSG_HDRLEN;
+}
+
+/**
+ * Create attribute index based on a stream of attributes.
+ * @arg tb Index array to be filled (maxtype+1 elements).
+ * @arg maxtypeMaximum attribute type expected and accepted.
+ * @arg head

[PATCH bpf-next v4 0/3] libbpf: add function to setup XDP

2017-12-30 Thread Eric Leblond
Hello,

This updated patchset address the remarks by Toshiaki Makita and
Philippe Ombredanne:
 - fixes on errno handling
 - correct usage of SPDX header

Best regards,
--
Eric Leblond 


[PATCH bpf-next v4 1/3] libbpf: add function to setup XDP

2017-12-30 Thread Eric Leblond
Most of the code is taken from set_link_xdp_fd() in bpf_load.c and
slightly modified to be library compliant.

Signed-off-by: Eric Leblond 
Acked-by: Alexei Starovoitov 
---
 tools/lib/bpf/bpf.c| 126 -
 tools/lib/bpf/libbpf.c |   2 +
 tools/lib/bpf/libbpf.h |   4 ++
 3 files changed, 130 insertions(+), 2 deletions(-)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 5128677e4117..f00fba2edeae 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -25,6 +25,16 @@
 #include 
 #include 
 #include "bpf.h"
+#include "libbpf.h"
+#include 
+#include 
+#include 
+
+#ifndef IFLA_XDP_MAX
+#define IFLA_XDP   43
+#define IFLA_XDP_FD1
+#define IFLA_XDP_FLAGS 3
+#endif
 
 /*
  * When building perf, unistd.h is overridden. __NR_bpf is
@@ -46,8 +56,6 @@
 # endif
 #endif
 
-#define min(x, y) ((x) < (y) ? (x) : (y))
-
 static inline __u64 ptr_to_u64(const void *ptr)
 {
return (__u64) (unsigned long) ptr;
@@ -413,3 +421,117 @@ int bpf_obj_get_info_by_fd(int prog_fd, void *info, __u32 
*info_len)
 
return err;
 }
+
+int bpf_set_link_xdp_fd(int ifindex, int fd, __u32 flags)
+{
+   struct sockaddr_nl sa;
+   int sock, seq = 0, len, ret = -1;
+   char buf[4096];
+   struct nlattr *nla, *nla_xdp;
+   struct {
+   struct nlmsghdr  nh;
+   struct ifinfomsg ifinfo;
+   char attrbuf[64];
+   } req;
+   struct nlmsghdr *nh;
+   struct nlmsgerr *err;
+   socklen_t addrlen;
+
+   memset(&sa, 0, sizeof(sa));
+   sa.nl_family = AF_NETLINK;
+
+   sock = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
+   if (sock < 0) {
+   return -errno;
+   }
+
+   if (bind(sock, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
+   ret = -errno;
+   goto cleanup;
+   }
+
+   addrlen = sizeof(sa);
+   if (getsockname(sock, (struct sockaddr *)&sa, &addrlen) < 0) {
+   ret = -errno;
+   goto cleanup;
+   }
+
+   if (addrlen != sizeof(sa)) {
+   ret = -LIBBPF_ERRNO__INTERNAL;
+   goto cleanup;
+   }
+
+   memset(&req, 0, sizeof(req));
+   req.nh.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg));
+   req.nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
+   req.nh.nlmsg_type = RTM_SETLINK;
+   req.nh.nlmsg_pid = 0;
+   req.nh.nlmsg_seq = ++seq;
+   req.ifinfo.ifi_family = AF_UNSPEC;
+   req.ifinfo.ifi_index = ifindex;
+
+   /* started nested attribute for XDP */
+   nla = (struct nlattr *)(((char *)&req)
+   + NLMSG_ALIGN(req.nh.nlmsg_len));
+   nla->nla_type = NLA_F_NESTED | IFLA_XDP;
+   nla->nla_len = NLA_HDRLEN;
+
+   /* add XDP fd */
+   nla_xdp = (struct nlattr *)((char *)nla + nla->nla_len);
+   nla_xdp->nla_type = IFLA_XDP_FD;
+   nla_xdp->nla_len = NLA_HDRLEN + sizeof(int);
+   memcpy((char *)nla_xdp + NLA_HDRLEN, &fd, sizeof(fd));
+   nla->nla_len += nla_xdp->nla_len;
+
+   /* if user passed in any flags, add those too */
+   if (flags) {
+   nla_xdp = (struct nlattr *)((char *)nla + nla->nla_len);
+   nla_xdp->nla_type = IFLA_XDP_FLAGS;
+   nla_xdp->nla_len = NLA_HDRLEN + sizeof(flags);
+   memcpy((char *)nla_xdp + NLA_HDRLEN, &flags, sizeof(flags));
+   nla->nla_len += nla_xdp->nla_len;
+   }
+
+   req.nh.nlmsg_len += NLA_ALIGN(nla->nla_len);
+
+   if (send(sock, &req, req.nh.nlmsg_len, 0) < 0) {
+   ret = -errno;
+   goto cleanup;
+   }
+
+   len = recv(sock, buf, sizeof(buf), 0);
+   if (len < 0) {
+   ret = -errno;
+   goto cleanup;
+   }
+
+   for (nh = (struct nlmsghdr *)buf; NLMSG_OK(nh, len);
+nh = NLMSG_NEXT(nh, len)) {
+   if (nh->nlmsg_pid != sa.nl_pid) {
+   ret = -LIBBPF_ERRNO__WRNGPID;
+   goto cleanup;
+   }
+   if (nh->nlmsg_seq != seq) {
+   ret = -LIBBPF_ERRNO__INVSEQ;
+   goto cleanup;
+   }
+   switch (nh->nlmsg_type) {
+   case NLMSG_ERROR:
+   err = (struct nlmsgerr *)NLMSG_DATA(nh);
+   if (!err->error)
+   continue;
+   ret = err->error;
+   goto cleanup;
+   case NLMSG_DONE:
+   break;
+   default:
+   break;
+   }
+   }
+
+   ret = 0;
+
+cleanup:
+   close(sock);
+   return ret;
+}
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index e9c4b7cabcf2..5fe8aaa2123e 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -106,6 +106,8 @@ static const char *libbpf_strerror_table[NR_ERRNO] = {
[ERRCODE_OFF

[PATCH bpf-next v4 3/3] libbpf: add missing SPDX-License-Identifier

2017-12-30 Thread Eric Leblond
Signed-off-by: Eric Leblond 
Acked-by: Alexei Starovoitov 
---
 tools/lib/bpf/bpf.c| 2 ++
 tools/lib/bpf/bpf.h| 2 ++
 tools/lib/bpf/libbpf.c | 2 ++
 tools/lib/bpf/libbpf.h | 2 ++
 4 files changed, 8 insertions(+)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index ceb20c5cae3b..ab8b2eb31273 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -1,3 +1,5 @@
+// SPDX-License-Identifier: LGPL-2.1
+
 /*
  * common eBPF ELF operations.
  *
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index 9f44c196931e..8d18fb73d7fb 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -1,3 +1,5 @@
+/* SPDX-License-Identifier: LGPL-2.1 */
+
 /*
  * common eBPF ELF operations.
  *
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 5fe8aaa2123e..924a8b8431ab 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -1,3 +1,5 @@
+// SPDX-License-Identifier: LGPL-2.1
+
 /*
  * Common eBPF ELF object loading operations.
  *
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index e42f96900318..f85906533cdd 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -1,3 +1,5 @@
+/* SPDX-License-Identifier: LGPL-2.1 */
+
 /*
  * Common eBPF ELF object loading operations.
  *
-- 
2.15.1



Re: [RFT net-next v3 0/5] dwmac-meson8b: RGMII clock fixes for Meson8b

2017-12-30 Thread Martin Blumenstingl
Hi Emiliano,

On Sat, Dec 30, 2017 at 12:00 AM, Emiliano Ingrassia
 wrote:
> Hi Jerome, Hi Martin,
>
> On Fri, Dec 29, 2017 at 07:04:23PM +0100, Jerome Brunet wrote:
>> On Fri, 2017-12-29 at 02:31 +0100, Emiliano Ingrassia wrote:
>> > Hi Martin, Hi Dave,
>> >
>> > On Thu, Dec 28, 2017 at 11:21:23PM +0100, Martin Blumenstingl wrote:
>> > > Hi Dave,
>> > >
>> > > please do not apply this series until it got a Tested-by from Emiliano.
>> > >
>> > >
>> > > Hi Emiliano,
>> > >
>> > > you reported [0] that you couldn't get dwmac-meson8b to work on your
>> > > Odroid-C1. With your findings (register dumps, clk_summary output, etc.)
>> > > I think I was able to find a fix: it consists of two patches (which you
>> > > find in this series)
>> > >
>> > > Unfortunately I don't have any Meson8b boards with RGMII PHY so I could
>> > > only partially test this (I could only check if the clocks were
>> > > calculated correctly when using a dummy 52394Hz input clock instead
>> > > of MPLL2).
>> > >
>> > > Could you please give this series a try and let me know about the
>> > > results?
>> > > You obviously still need your two "ARM: dts: meson8b" patches which
>> > > - add the amlogic,meson8b-dwmac" compatible to meson8b.dtsi
>> > > - enable Ethernet on the Odroid-C1
>> > >
>> > > When testing on Meson8b this also needs a fix for the MPLL clock driver:
>> > > "clk: meson: mpll: use 64-bit maths in params_from_rate", see:
>> > > https://patchwork.kernel.org/patch/10131677/
>> > >
>> > >
>> > > I have tested this myself on a Khadas VIM (GXL SoC, internal RMII PHY)
>> > > and a Khadas VIM2 (GXM SoC, external RGMII PHY). Both are still working
>> > > fine (so let's hope that this also fixes your Meson8b issue :)).
>> > >
>> > >
>> > > changes since v1 at [1]:
>> > > - changed the subject of the cover-letter to indicate that this is all
>> > >   about the RGMII clock
>> > > - added PATCH #1 which ensures that we don't unnecessarily change the
>> > >   parent clocks in RMII mode (and also makes the code easier to
>> > >   understand)
>> > > - changed subject of PATCH #2 (formerly PATCH #1) to state that this
>> > >   is about the RGMII clock
>> > > - added Jerome's Reviewed-by to PATCH #2 (formerly PATCH #1)
>> > > - replaced PATCH #3 (formerly PATCH #2) with one that sets
>> > >   CLK_SET_RATE_PARENT on the mux and thus re-configures the MPLL2 clock
>> > >   on Meson8b correctly
>> > >
>> > > changes since v2 at [2]:
>> > > - added PATCH #2 to make the following patch easier
>> > > - Emiliano reported that there's currently another bug in the
>> > >   dwmac-meson8b driver which prevents it from working with RGMII PHYs on
>> > >   Meson8b: bit 10 of the PRG_ETH0 register is configures a clock gate
>> > >   (instead of a divide by 5 or divide by 10 clock divider). This has not
>> > >   been visible on GXBB and later due to the input clock which always led
>> > >   to a selection of "divide by 10" (which is done internally in the IP
>> > >   block, but the bit actually means "enable RGMII clock output").
>> > >   PATCH #3 was added to address this issue.
>> > > - the commit message of PATCH #4 and #5 (formerly PATCH #2 and #3) were
>> > >   updated and the patch itself rebased because the m25_div clock was
>> > >   removed with the new PATCH #3 (so some of the statements were not
>> > >   valid anymore)
>> > >
>> >
>> > Here is the clk_summary relative to ethernet on Odroid-C1+
>> > with this new series applied:
>> >
>> > xtal112400 
>> >  0 0
>> >  sys_pll00  12  0 0
>> >   cpu_clk   00  12  0 0
>> >  vid_pll00   73200  0 0
>> >  fixed_pll  22  255000  0 0
>> >   mpll2 11   24701 
>> >  0 0
>> >c941.ethernet#m250_sel   11   24701  0 0
>> > c941.ethernet#m250_div  11   24701 
>> >  0 0
>> >  c941.ethernet#fixed_div10  112470  0 0
>> >   c941.ethernet#m25_en  112470 
>> >  0 0
>> >
>> > The ethernet prg0 register is set to 0x74A1 which should be correct with
>> > respect to the information contained in the S805 SoC manual.
>> > Actually, the ethernet is not yet fully functional.
>> > Trying to ping the board, I can see ARP request from host to board using
>> > tcpdump. However, the host can't see any response.
>>
>> If the rx path is ok-ish, I suppose the clock setting applied is good.
>> Maybe you could try to play with the tx delay (BIT 5-6 of the register) ?
>>
>
> Thanks for the suggestion. Finally the ethernet works correctly using 4
> ns as tx-delay.
> The clock summary is the same reported above. The prg0 ethernet register
> value is 0x74c1 as expected.
this is a

[PATCH 0/2] at76c50x-usb: Adjustments for two function implementations

2017-12-30 Thread SF Markus Elfring
From: Markus Elfring 
Date: Sat, 30 Dec 2017 22:01:23 +0100

Two update suggestions were taken into account
from static source code analysis.

Markus Elfring (2):
  Delete an error message for a failed memory allocation in at76_submit_rx_urb()
  Improve size determinations in at76_usbdfu_download()

 drivers/net/wireless/atmel/at76c50x-usb.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

-- 
2.15.1



[PATCH 1/2] at76c50x-usb: Delete an error message for a failed memory allocation in at76_submit_rx_urb()

2017-12-30 Thread SF Markus Elfring
From: Markus Elfring 
Date: Sat, 30 Dec 2017 21:50:12 +0100

Omit an extra message for a memory allocation failure in this function.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring 
---
 drivers/net/wireless/atmel/at76c50x-usb.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/net/wireless/atmel/at76c50x-usb.c 
b/drivers/net/wireless/atmel/at76c50x-usb.c
index ede89d4ffc88..2893d339b440 100644
--- a/drivers/net/wireless/atmel/at76c50x-usb.c
+++ b/drivers/net/wireless/atmel/at76c50x-usb.c
@@ -1223,8 +1223,6 @@ static int at76_submit_rx_urb(struct at76_priv *priv)
if (!skb) {
skb = dev_alloc_skb(sizeof(struct at76_rx_buffer));
if (!skb) {
-   wiphy_err(priv->hw->wiphy,
- "cannot allocate rx skbuff\n");
ret = -ENOMEM;
goto exit;
}
-- 
2.15.1



[PATCH 2/2] at76c50x-usb: Improve size determinations in at76_usbdfu_download()

2017-12-30 Thread SF Markus Elfring
From: Markus Elfring 
Date: Sat, 30 Dec 2017 21:56:56 +0100

Replace the specification of two data types by pointer dereferences
as the parameter for the operator "sizeof" to make the corresponding size
determination a bit safer according to the Linux coding style convention.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring 
---
 drivers/net/wireless/atmel/at76c50x-usb.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/wireless/atmel/at76c50x-usb.c 
b/drivers/net/wireless/atmel/at76c50x-usb.c
index 2893d339b440..6144d4a258ca 100644
--- a/drivers/net/wireless/atmel/at76c50x-usb.c
+++ b/drivers/net/wireless/atmel/at76c50x-usb.c
@@ -383,7 +383,7 @@ static int at76_usbdfu_download(struct usb_device *udev, u8 
*buf, u32 size,
return -EINVAL;
}
 
-   dfu_stat_buf = kmalloc(sizeof(struct dfu_status), GFP_KERNEL);
+   dfu_stat_buf = kmalloc(sizeof(*dfu_stat_buf), GFP_KERNEL);
if (!dfu_stat_buf) {
ret = -ENOMEM;
goto exit;
@@ -395,7 +395,7 @@ static int at76_usbdfu_download(struct usb_device *udev, u8 
*buf, u32 size,
goto exit;
}
 
-   dfu_state = kmalloc(sizeof(u8), GFP_KERNEL);
+   dfu_state = kmalloc(sizeof(*dfu_state), GFP_KERNEL);
if (!dfu_state) {
ret = -ENOMEM;
goto exit;
-- 
2.15.1



Re: [patch net-next v2 00/10] Add support for resource abstraction

2017-12-30 Thread David Ahern
On 12/28/17 1:21 AM, Yuval Mintz wrote:
> I think it goes the other way around. The dpipe tables are the ones that
> can be translated to functionality; The resources are internal and HW-specific
> representing the possible internal division of resources -
> but a given resource sn't necessarily mapped to a single networking feature.
> [It might be in some cases, but not in the general case]

This is what I am getting at -- a single resource /kvd/linear is used
for multiple networking features, and those networking features do map
to well known entities -- fdb entries, ACL entries, ipv4/v6 host
entries, LPM entries, etc.

Nothing about the output from devlink helps the user in any way to
understand how to change the resource values. Saying that these
resources, what they mean and how they are used is MLX proprietary and
is known only to MLX employees and those with MLX agreements is not
acceptable. Likewise, requiring some network admin to deep dive into the
mlxsw driver to piece together how kvd/linear (for example) is used is
not acceptable.

The cover letter touts "Many of the ASIC's internal resources are
limited and are shared between several hardware procedures. For example,
unified hash-based memory can be used for many lookup purposes, like FDB
and LPM. In many cases the user can provide a partitioning scheme for
such a resource in order to perform fine tuning for his application."

Great, now give the user some indication of how to do that. Is setting
/kvd/linear to 0 acceptable? If not, why? What functionality is lost?
(Apparently, everything [1].)

The dpipe tables list some correlation between the kvd resources and
tables but that is not a complete list and again there is nothing to
tell a user that it is only a partial list of how a kvd resource is
used. For example, it shows ipv4 host is in /kvd/hash_single and that is
all it shows. So if I have an ipv6 only deployment can I conclude that I
can set /kvd/hash_single to 0? Or the reverse, can I set hash_double to
0 for an ipv4 only deployment? From the limited information given, it is
reasonable for a user to assume yes and has to learn through trial and
error what can be done. [2]

-

[1] This is allowed by the current patch set and perhaps it should not be:

$ ip ro ls vrf vrf1101
unreachable default metric 8192
11.2.51.0/24 dev swp1s0.51 proto kernel scope link src 11.2.51.1 offload
11.3.51.0/24 dev swp1s1.51 proto kernel scope link src 11.3.51.1 offload
11.4.51.0/24 dev swp1s2.51 proto kernel scope link src 11.4.51.1 offload
11.5.51.0/24 dev swp1s3.51 proto kernel scope link src 11.5.51.1 offload
11.6.51.0/24 dev swp3s0.51 proto kernel scope link src 11.6.51.1 offload
11.7.51.0/24 dev swp3s1.51 proto kernel scope link src 11.7.51.1 offload
11.8.51.0/24 dev swp3s2.51 proto kernel scope link src 11.8.51.1 offload
11.9.51.0/24 dev swp3s3.51 proto kernel scope link src 11.9.51.1 offload

$ devlink resource set pci/:03:00.0 path /kvd/linear size 0
$ devlink reload pci/:03:00.0
$ ip ro ls vrf vrf1101
unreachable default metric 8192

[2] Same exact result for setting hash_double to 0:
$ ip ro ls vrf vrf1101
unreachable default metric 8192
11.2.51.0/24 dev swp1s0.51 proto kernel scope link src 11.2.51.1 offload
11.3.51.0/24 dev swp1s1.51 proto kernel scope link src 11.3.51.1 offload
11.4.51.0/24 dev swp1s2.51 proto kernel scope link src 11.4.51.1 offload
11.5.51.0/24 dev swp1s3.51 proto kernel scope link src 11.5.51.1 offload
11.6.51.0/24 dev swp3s0.51 proto kernel scope link src 11.6.51.1 offload
11.7.51.0/24 dev swp3s1.51 proto kernel scope link src 11.7.51.1 offload
11.8.51.0/24 dev swp3s2.51 proto kernel scope link src 11.8.51.1 offload
11.9.51.0/24 dev swp3s3.51 proto kernel scope link src 11.9.51.1 offload

$ devlink resource set pci/:03:00.0 path /kvd/hash_double size 0
$ devlink reload pci/:03:00.0
$ ip ro ls vrf vrf1101
unreachable default metric 8192


[PATCH] wan/fsl_ucc_hdlc: Delete an error message for a failed memory allocation in ucc_hdlc_probe()

2017-12-30 Thread SF Markus Elfring
From: Markus Elfring 
Date: Sat, 30 Dec 2017 22:25:44 +0100

Omit an extra message for a memory allocation failure in this function.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring 
---
 drivers/net/wan/fsl_ucc_hdlc.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/wan/fsl_ucc_hdlc.c b/drivers/net/wan/fsl_ucc_hdlc.c
index 33df76405b86..98f8be206bae 100644
--- a/drivers/net/wan/fsl_ucc_hdlc.c
+++ b/drivers/net/wan/fsl_ucc_hdlc.c
@@ -1082,7 +1082,6 @@ static int ucc_hdlc_probe(struct platform_device *pdev)
utdm = kzalloc(sizeof(*utdm), GFP_KERNEL);
if (!utdm) {
ret = -ENOMEM;
-   dev_err(&pdev->dev, "No mem to alloc ucc tdm data\n");
goto free_uhdlc_priv;
}
uhdlc_priv->utdm = utdm;
-- 
2.15.1



Re: [PATCH] rds: fix use-after-free read in rds_find_bound

2017-12-30 Thread santosh.shilim...@oracle.com

On 12/30/17 12:26 PM, Sowmini Varadhan wrote:

On (12/30/17 11:36), Santosh Shilimkar wrote:


socket buffer can get freed as part of sock_close
callback so before adding reference check underneath
socket validity.


I'm not sure I understand this fix-

struct rds_sock is:
   struct rds_sock {
 struct sock rs_sk;
  :
   }

How can  rs be non-null but rds_rs_to_sk() is null? (Note that
rds_rs_to_sk just returns &rs->rs_sk) so the changed line is
identical to the original line.


Well thats what the report says o.w flag test wouldn't have
been attempted.


-   if (rs && !sock_flag(rds_rs_to_sk(rs), SOCK_DEAD))
+   if (rs && rds_rs_to_sk(rs) && !sock_flag(rds_rs_to_sk(rs), SOCK_DEAD))


I think the real issue is refcount bug somewhere,


Thats what I thought as well initially but since the reported case,
the rs seems to be valid where as sk seems to be freed up as part of
sock_release callback.


Was the syzbot test run with http://patchwork.ozlabs.org/patch/852492/
this sounds like that type of bug.


That fix scenario, the rs don't get inserted in hash table and
in this particular bug, the lookup was successful so am not sure
if these two bugs are related.

But since bound address fix was still not part of the build
reproduced use after free bug, $subject fix can wait for next
reproduction. Unfortunately as per the report, there is no
reproducer for it to test if other fix fixes this issue.

Regards,
Santosh



Re: [PATCH] rds: fix use-after-free read in rds_find_bound

2017-12-30 Thread Sowmini Varadhan
On (12/30/17 13:37), santosh.shilim...@oracle.com wrote:
> Well thats what the report says o.w flag test wouldn't have
> been attempted.

the bug report says "use-after-free". 

It doesnt say that rds_rs_to_sk(rs) is null (if rds_rs_to_sk(rs) was null,
rs would also be null, please cscope struct rds_sock)

What the bug report says is 
" The buggy address belongs to the object at 8801c09a6080
  which belongs to the cache RDS of size 1472
 The buggy address is located 96 bytes inside of .."

96 is the offset of sk->sk_flags. so yes, there is a socket refcount
issue.

But the patch you sent (see next two lines) will not solve that.

> >>-   if (rs && !sock_flag(rds_rs_to_sk(rs), SOCK_DEAD))
> >>+   if (rs && rds_rs_to_sk(rs) && !sock_flag(rds_rs_to_sk(rs), SOCK_DEAD))

Sowmini>I think the real issue is refcount bug somewhere,

> Thats what I thought as well initially but since the reported case,
> the rs seems to be valid where as sk seems to be freed up as part of
> sock_release callback.

I dont understand the statement above- how can "rs be valid, and sk
be freed"?

rs_sk is embedded in the struct rds_sock, it is not a pointer. 

let's find and fix the refcount bug. See stack trace in commit comment.
The socket release is happening prematurely and existing WARN_ONs
are not catching it.

> >Was the syzbot test run with http://patchwork.ozlabs.org/patch/852492/
> >this sounds like that type of bug.

--Sowmini


Re: [PATCH v3 net-next 2/5] net: tracepoint: replace tcp_set_state tracepoint with inet_sock_set_state tracepoint

2017-12-30 Thread Brendan Gregg
On Tue, Dec 19, 2017 at 7:12 PM, Yafang Shao  wrote:
> As sk_state is a common field for struct sock, so the state
> transition tracepoint should not be a TCP specific feature.
> Currently it traces all AF_INET state transition, so I rename this
> tracepoint to inet_sock_set_state tracepoint with some minor changes and move 
> it
> into trace/events/sock.h.

The tcp:tcp_set_state probe is tcp_set_state(), so it's only going to
fire for TCP sessions. It's not broken, and we could add a
sctp:sctp_set_state as well. Replacing tcp:tcp_set_state with
inet_sk_set_state is feeling like we might be baking too much
implementation detail into the tracepoint API.

If we must have inet_sk_set_state, then must we also delete tcp:tcp_set_state?

Brendan


> We dont need to create a file named trace/events/inet_sock.h for this one 
> single
> tracepoint.
>
> Two helpers are introduced to trace sk_state transition
> - void inet_sk_state_store(struct sock *sk, int newstate);
> - void inet_sk_set_state(struct sock *sk, int state);
> As trace header should not be included in other header files,
> so they are defined in sock.c.
>
> The protocol such as SCTP maybe compiled as a ko, hence export
> inet_sk_set_state().
>[...]


Re: [PATCH V4 1/4] security: Add support for SCTP security hooks

2017-12-30 Thread Marcelo Ricardo Leitner
On Sat, Dec 30, 2017 at 05:19:26PM +, Richard Haines wrote:
> The SCTP security hooks are explained in:
> Documentation/security/LSM-sctp.rst
> 
> Signed-off-by: Richard Haines 

Reviewed-by: Marcelo Ricardo Leitner 

> ---
>  Documentation/security/LSM-sctp.rst | 175 
> 
>  include/linux/lsm_hooks.h   |  36 
>  include/linux/security.h|  25 ++
>  security/security.c |  22 +
>  4 files changed, 258 insertions(+)
>  create mode 100644 Documentation/security/LSM-sctp.rst
> 
> diff --git a/Documentation/security/LSM-sctp.rst 
> b/Documentation/security/LSM-sctp.rst
> new file mode 100644
> index 000..6e5a392
> --- /dev/null
> +++ b/Documentation/security/LSM-sctp.rst
> @@ -0,0 +1,175 @@
> +SCTP LSM Support
> +
> +
> +For security module support, three SCTP specific hooks have been 
> implemented::
> +
> +security_sctp_assoc_request()
> +security_sctp_bind_connect()
> +security_sctp_sk_clone()
> +
> +Also the following security hook has been utilised::
> +
> +security_inet_conn_established()
> +
> +The usage of these hooks are described below with the SELinux implementation
> +described in ``Documentation/security/SELinux-sctp.rst``
> +
> +
> +security_sctp_assoc_request()
> +-
> +Passes the ``@ep`` and ``@chunk->skb`` of the association INIT packet to the
> +security module. Returns 0 on success, error on failure.
> +::
> +
> +@ep - pointer to sctp endpoint structure.
> +@skb - pointer to skbuff of association packet.
> +
> +
> +security_sctp_bind_connect()
> +-
> +Passes one or more ipv4/ipv6 addresses to the security module for validation
> +based on the ``@optname`` that will result in either a bind or connect
> +service as shown in the permission check tables below.
> +Returns 0 on success, error on failure.
> +::
> +
> +@sk  - Pointer to sock structure.
> +@optname - Name of the option to validate.
> +@address - One or more ipv4 / ipv6 addresses.
> +@addrlen - The total length of address(s). This is calculated on each
> +   ipv4 or ipv6 address using sizeof(struct sockaddr_in) or
> +   sizeof(struct sockaddr_in6).
> +
> +  --
> +  | BIND Type Checks   |
> +  |   @optname | @address contains |
> +  ||---|
> +  | SCTP_SOCKOPT_BINDX_ADD | One or more ipv4 / ipv6 addresses |
> +  | SCTP_PRIMARY_ADDR  | Single ipv4 or ipv6 address   |
> +  | SCTP_SET_PEER_PRIMARY_ADDR | Single ipv4 or ipv6 address   |
> +  --
> +
> +  --
> +  |   CONNECT Type Checks  |
> +  |   @optname | @address contains |
> +  ||---|
> +  | SCTP_SOCKOPT_CONNECTX  | One or more ipv4 / ipv6 addresses |
> +  | SCTP_PARAM_ADD_IP  | One or more ipv4 / ipv6 addresses |
> +  | SCTP_SENDMSG_CONNECT   | Single ipv4 or ipv6 address   |
> +  | SCTP_PARAM_SET_PRIMARY | Single ipv4 or ipv6 address   |
> +  --
> +
> +A summary of the ``@optname`` entries is as follows::
> +
> +SCTP_SOCKOPT_BINDX_ADD - Allows additional bind addresses to be
> + associated after (optionally) calling
> + bind(3).
> + sctp_bindx(3) adds a set of bind
> + addresses on a socket.
> +
> +SCTP_SOCKOPT_CONNECTX - Allows the allocation of multiple
> +addresses for reaching a peer
> +(multi-homed).
> +sctp_connectx(3) initiates a connection
> +on an SCTP socket using multiple
> +destination addresses.
> +
> +SCTP_SENDMSG_CONNECT  - Initiate a connection that is generated by a
> +sendmsg(2) or sctp_sendmsg(3) on a new 
> asociation.
> +
> +SCTP_PRIMARY_ADDR - Set local primary address.
> +
> +SCTP_SET_PEER_PRIMARY_ADDR - Request peer sets address as
> + association primary.
> +
> +SCTP_PARAM_ADD_IP  - These are used when Dynamic Address
> +SCTP_PARAM_SET_PRIMARY - Reconfiguration is enabled as explained 
> below.
> +
> +
> +To support Dynamic Address Reconfiguration the following parameters must be
> +enabled on both endpoints (or use the appropriate **setsockopt**\(2))::
> +
> +/proc/sys/net/sctp/addip_enable
> +/proc/sys/net/sctp/a

Re: [PATCH V4 2/4] sctp: Add ip option support

2017-12-30 Thread Marcelo Ricardo Leitner
On Sat, Dec 30, 2017 at 05:19:50PM +, Richard Haines wrote:
> Add ip option support to allow LSM security modules to utilise CIPSO/IPv4
> and CALIPSO/IPv6 services.
> 
> Signed-off-by: Richard Haines 

Acked-by: Marcelo Ricardo Leitner 

> ---
>  include/net/sctp/sctp.h|  4 +++-
>  include/net/sctp/structs.h |  2 ++
>  net/sctp/chunk.c   | 13 -
>  net/sctp/ipv6.c| 42 +++---
>  net/sctp/output.c  |  5 -
>  net/sctp/protocol.c| 36 
>  net/sctp/socket.c  |  9 +++--
>  7 files changed, 95 insertions(+), 16 deletions(-)
> 
> diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
> index d7d8cba..1b2f40a 100644
> --- a/include/net/sctp/sctp.h
> +++ b/include/net/sctp/sctp.h
> @@ -436,9 +436,11 @@ static inline int sctp_list_single_entry(struct 
> list_head *head)
>  static inline int sctp_frag_point(const struct sctp_association *asoc, int 
> pmtu)
>  {
>   struct sctp_sock *sp = sctp_sk(asoc->base.sk);
> + struct sctp_af *af = sp->pf->af;
>   int frag = pmtu;
>  
> - frag -= sp->pf->af->net_header_len;
> + frag -= af->ip_options_len(asoc->base.sk);
> + frag -= af->net_header_len;
>   frag -= sizeof(struct sctphdr) + sizeof(struct sctp_data_chunk);
>  
>   if (asoc->user_frag)
> diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
> index 0477945..9942ed5 100644
> --- a/include/net/sctp/structs.h
> +++ b/include/net/sctp/structs.h
> @@ -461,6 +461,7 @@ struct sctp_af {
>   void(*ecn_capable)(struct sock *sk);
>   __u16   net_header_len;
>   int sockaddr_len;
> + int (*ip_options_len)(struct sock *sk);
>   sa_family_t sa_family;
>   struct list_head list;
>  };
> @@ -485,6 +486,7 @@ struct sctp_pf {
>   int (*addr_to_user)(struct sctp_sock *sk, union sctp_addr *addr);
>   void (*to_sk_saddr)(union sctp_addr *, struct sock *sk);
>   void (*to_sk_daddr)(union sctp_addr *, struct sock *sk);
> + void (*copy_ip_options)(struct sock *sk, struct sock *newsk);
>   struct sctp_af *af;
>  };
>  
> diff --git a/net/sctp/chunk.c b/net/sctp/chunk.c
> index 3afac27..9d130f4 100644
> --- a/net/sctp/chunk.c
> +++ b/net/sctp/chunk.c
> @@ -153,7 +153,6 @@ static void sctp_datamsg_assign(struct sctp_datamsg *msg, 
> struct sctp_chunk *chu
>   chunk->msg = msg;
>  }
>  
> -
>  /* A data chunk can have a maximum payload of (2^16 - 20).  Break
>   * down any such message into smaller chunks.  Opportunistically, fragment
>   * the chunks down to the current MTU constraints.  We may get refragmented
> @@ -170,6 +169,8 @@ struct sctp_datamsg *sctp_datamsg_from_user(struct 
> sctp_association *asoc,
>   struct list_head *pos, *temp;
>   struct sctp_chunk *chunk;
>   struct sctp_datamsg *msg;
> + struct sctp_sock *sp;
> + struct sctp_af *af;
>   int err;
>  
>   msg = sctp_datamsg_new(GFP_KERNEL);
> @@ -188,9 +189,12 @@ struct sctp_datamsg *sctp_datamsg_from_user(struct 
> sctp_association *asoc,
>   /* This is the biggest possible DATA chunk that can fit into
>* the packet
>*/
> - max_data = asoc->pathmtu -
> -sctp_sk(asoc->base.sk)->pf->af->net_header_len -
> -sizeof(struct sctphdr) - sizeof(struct sctp_data_chunk);
> + sp = sctp_sk(asoc->base.sk);
> + af = sp->pf->af;
> + max_data = asoc->pathmtu - af->net_header_len -
> +sizeof(struct sctphdr) - sizeof(struct sctp_data_chunk) -
> +af->ip_options_len(asoc->base.sk);
> +
>   max_data = SCTP_TRUNC4(max_data);
>  
>   /* If the the peer requested that we authenticate DATA chunks
> @@ -210,7 +214,6 @@ struct sctp_datamsg *sctp_datamsg_from_user(struct 
> sctp_association *asoc,
>  
>   /* Set first_len and then account for possible bundles on first frag */
>   first_len = max_data;
> -
>   /* Check to see if we have a pending SACK and try to let it be bundled
>* with this message.  Do this if we don't have any data queued already.
>* To check that, look at out_qlen and retransmit list.
> diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c
> index 3b18085..b06dc81 100644
> --- a/net/sctp/ipv6.c
> +++ b/net/sctp/ipv6.c
> @@ -423,6 +423,38 @@ static void sctp_v6_copy_addrlist(struct list_head 
> *addrlist,
>   rcu_read_unlock();
>  }
>  
> +/* Copy over any ip options */
> +static void sctp_v6_copy_ip_options(struct sock *sk, struct sock *newsk)
> +{
> + struct ipv6_pinfo *newnp, *np = inet6_sk(sk);
> + struct ipv6_txoptions *opt;
> +
> + newnp = inet6_sk(newsk);
> +
> + rcu_read_lock();
> + opt = rcu_dereference(np->opt);
> + if (opt)
> + opt = ipv6_dup_options(newsk, opt);
> + RCU_INIT_POINTER(newnp->opt, opt);
> + rcu_read_unlock();
> +}
> +
> +/* Account for the IP options */
> +static int sctp_v

Re: [PATCH V4 3/4] sctp: Add LSM hooks

2017-12-30 Thread Marcelo Ricardo Leitner
On Sat, Dec 30, 2017 at 05:20:13PM +, Richard Haines wrote:
> Add security hooks to allow security modules to exercise access control
> over SCTP.
> 
> Signed-off-by: Richard Haines 

Acked-by: Marcelo Ricardo Leitner 

> ---
>  include/net/sctp/structs.h | 10 
>  include/uapi/linux/sctp.h  |  1 +
>  net/sctp/sm_make_chunk.c   | 12 +
>  net/sctp/sm_statefuns.c| 18 ++
>  net/sctp/socket.c  | 61 
> +-
>  5 files changed, 101 insertions(+), 1 deletion(-)
> 
> diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
> index 9942ed5..2ca0a3f 100644
> --- a/include/net/sctp/structs.h
> +++ b/include/net/sctp/structs.h
> @@ -1271,6 +1271,16 @@ struct sctp_endpoint {
> reconf_enable:1;
>  
>   __u8  strreset_enable;
> +
> + /* Security identifiers from incoming (INIT). These are set by
> +  * security_sctp_assoc_request(). These will only be used by
> +  * SCTP TCP type sockets and peeled off connections as they
> +  * cause a new socket to be generated. security_sctp_sk_clone()
> +  * will then plug these into the new socket.
> +  */
> +
> + u32 secid;
> + u32 peer_secid;
>  };
>  
>  /* Recover the outter endpoint structure. */
> diff --git a/include/uapi/linux/sctp.h b/include/uapi/linux/sctp.h
> index cfe9712..cafac36 100644
> --- a/include/uapi/linux/sctp.h
> +++ b/include/uapi/linux/sctp.h
> @@ -123,6 +123,7 @@ typedef __s32 sctp_assoc_t;
>  #define SCTP_RESET_ASSOC 120
>  #define SCTP_ADD_STREAMS 121
>  #define SCTP_SOCKOPT_PEELOFF_FLAGS 122
> +#define SCTP_SENDMSG_CONNECT 123
>  
>  /* PR-SCTP policies */
>  #define SCTP_PR_SCTP_NONE0x
> diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
> index 514465b..269fd3d 100644
> --- a/net/sctp/sm_make_chunk.c
> +++ b/net/sctp/sm_make_chunk.c
> @@ -3054,6 +3054,12 @@ static __be16 sctp_process_asconf_param(struct 
> sctp_association *asoc,
>   if (af->is_any(&addr))
>   memcpy(&addr, &asconf->source, sizeof(addr));
>  
> + if (security_sctp_bind_connect(asoc->ep->base.sk,
> +SCTP_PARAM_ADD_IP,
> +(struct sockaddr *)&addr,
> +af->sockaddr_len))
> + return SCTP_ERROR_REQ_REFUSED;
> +
>   /* ADDIP 4.3 D9) If an endpoint receives an ADD IP address
>* request and does not have the local resources to add this
>* new address to the association, it MUST return an Error
> @@ -3120,6 +3126,12 @@ static __be16 sctp_process_asconf_param(struct 
> sctp_association *asoc,
>   if (af->is_any(&addr))
>   memcpy(&addr.v4, sctp_source(asconf), sizeof(addr));
>  
> + if (security_sctp_bind_connect(asoc->ep->base.sk,
> +SCTP_PARAM_SET_PRIMARY,
> +(struct sockaddr *)&addr,
> +af->sockaddr_len))
> + return SCTP_ERROR_REQ_REFUSED;
> +
>   peer = sctp_assoc_lookup_paddr(asoc, &addr);
>   if (!peer)
>   return SCTP_ERROR_DNS_FAILED;
> diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c
> index 8f8ccde..a2dfc5a 100644
> --- a/net/sctp/sm_statefuns.c
> +++ b/net/sctp/sm_statefuns.c
> @@ -318,6 +318,11 @@ enum sctp_disposition sctp_sf_do_5_1B_init(struct net 
> *net,
>   struct sctp_packet *packet;
>   int len;
>  
> + /* Update socket peer label if first association. */
> + if (security_sctp_assoc_request((struct sctp_endpoint *)ep,
> + chunk->skb))
> + return sctp_sf_pdiscard(net, ep, asoc, type, arg, commands);
> +
>   /* 6.10 Bundling
>* An endpoint MUST NOT bundle INIT, INIT ACK or
>* SHUTDOWN COMPLETE with any other chunks.
> @@ -905,6 +910,9 @@ enum sctp_disposition sctp_sf_do_5_1E_ca(struct net *net,
>*/
>   sctp_add_cmd_sf(commands, SCTP_CMD_INIT_COUNTER_RESET, SCTP_NULL());
>  
> + /* Set peer label for connection. */
> + security_inet_conn_established(ep->base.sk, chunk->skb);
> +
>   /* RFC 2960 5.1 Normal Establishment of an Association
>*
>* E) Upon reception of the COOKIE ACK, endpoint "A" will move
> @@ -1433,6 +1441,11 @@ static enum sctp_disposition 
> sctp_sf_do_unexpected_init(
>   struct sctp_packet *packet;
>   int len;
>  
> + /* Update socket peer label if first association. */
> + if (security_sctp_assoc_request((struct sctp_endpoint *)ep,
> + chunk->skb))
> + return sctp_sf_pdiscard(net, ep, asoc, type, arg, commands);
> +
>   /* 6.10 Bundling
>* An endpoint MUST NOT bundle INIT, INIT ACK or
>* SHUTDOWN COM

Re: [PATCH V4 4/4] selinux: Add SCTP support

2017-12-30 Thread Marcelo Ricardo Leitner
On Sat, Dec 30, 2017 at 05:20:35PM +, Richard Haines wrote:
> The SELinux SCTP implementation is explained in:
> Documentation/security/SELinux-sctp.rst
> 
> Signed-off-by: Richard Haines 

Reviewed-by: Marcelo Ricardo Leitner 

Thanks Richard.

> ---
>  Documentation/security/SELinux-sctp.rst | 157 ++
>  security/selinux/hooks.c| 280 
> +---
>  security/selinux/include/classmap.h |   2 +-
>  security/selinux/include/netlabel.h |  21 ++-
>  security/selinux/include/objsec.h   |   4 +
>  security/selinux/netlabel.c | 138 ++--
>  6 files changed, 570 insertions(+), 32 deletions(-)
>  create mode 100644 Documentation/security/SELinux-sctp.rst
> 
> diff --git a/Documentation/security/SELinux-sctp.rst 
> b/Documentation/security/SELinux-sctp.rst
> new file mode 100644
> index 000..2f66bf3
> --- /dev/null
> +++ b/Documentation/security/SELinux-sctp.rst
> @@ -0,0 +1,157 @@
> +SCTP SELinux Support
> +=
> +
> +Security Hooks
> +===
> +
> +``Documentation/security/LSM-sctp.rst`` describes the following SCTP security
> +hooks with the SELinux specifics expanded below::
> +
> +security_sctp_assoc_request()
> +security_sctp_bind_connect()
> +security_sctp_sk_clone()
> +security_inet_conn_established()
> +
> +
> +security_sctp_assoc_request()
> +-
> +Passes the ``@ep`` and ``@chunk->skb`` of the association INIT packet to the
> +security module. Returns 0 on success, error on failure.
> +::
> +
> +@ep - pointer to sctp endpoint structure.
> +@skb - pointer to skbuff of association packet.
> +
> +The security module performs the following operations:
> + IF this is the first association on ``@ep->base.sk``, then set the peer
> + sid to that in ``@skb``. This will ensure there is only one peer sid
> + assigned to ``@ep->base.sk`` that may support multiple associations.
> +
> + ELSE validate the ``@ep->base.sk peer_sid`` against the ``@skb peer 
> sid``
> + to determine whether the association should be allowed or denied.
> +
> + Set the sctp ``@ep sid`` to socket's sid (from ``ep->base.sk``) with
> + MLS portion taken from ``@skb peer sid``. This will be used by SCTP
> + TCP style sockets and peeled off connections as they cause a new socket
> + to be generated.
> +
> + If IP security options are configured (CIPSO/CALIPSO), then the ip
> + options are set on the socket.
> +
> +
> +security_sctp_bind_connect()
> +-
> +Checks permissions required for ipv4/ipv6 addresses based on the ``@optname``
> +as follows::
> +
> +  --
> +  |   BIND Permission Checks   |
> +  |   @optname | @address contains |
> +  ||---|
> +  | SCTP_SOCKOPT_BINDX_ADD | One or more ipv4 / ipv6 addresses |
> +  | SCTP_PRIMARY_ADDR  | Single ipv4 or ipv6 address   |
> +  | SCTP_SET_PEER_PRIMARY_ADDR | Single ipv4 or ipv6 address   |
> +  --
> +
> +  --
> +  | CONNECT Permission Checks  |
> +  |   @optname | @address contains |
> +  ||---|
> +  | SCTP_SOCKOPT_CONNECTX  | One or more ipv4 / ipv6 addresses |
> +  | SCTP_PARAM_ADD_IP  | One or more ipv4 / ipv6 addresses |
> +  | SCTP_SENDMSG_CONNECT   | Single ipv4 or ipv6 address   |
> +  | SCTP_PARAM_SET_PRIMARY | Single ipv4 or ipv6 address   |
> +  --
> +
> +
> +``Documentation/security/LSM-sctp.rst`` gives a summary of the ``@optname``
> +entries and also describes ASCONF chunk processing when Dynamic Address
> +Reconfiguration is enabled.
> +
> +
> +security_sctp_sk_clone()
> +-
> +Called whenever a new socket is created by **accept**\(2) (i.e. a TCP style
> +socket) or when a socket is 'peeled off' e.g userspace calls
> +**sctp_peeloff**\(3). ``security_sctp_sk_clone()`` will set the new
> +sockets sid and peer sid to that contained in the ``@ep sid`` and
> +``@ep peer sid`` respectively.
> +::
> +
> +@ep - pointer to current sctp endpoint structure.
> +@sk - pointer to current sock structure.
> +@sk - pointer to new sock structure.
> +
> +
> +security_inet_conn_established()
> +-
> +Called when a COOKIE ACK is received where it sets the connection's peer sid
> +to that in ``@skb``::
> +
> +@sk  - pointer to sock structure.
> +@skb - pointer to skbuff of the COOKIE ACK packet.
> +
> +
> +Policy Statements
> +===

Re: general protection fault in skb_segment

2017-12-30 Thread Marcelo Ricardo Leitner
On Sat, Dec 30, 2017 at 08:42:41AM +0100, Willem de Bruijn wrote:
> > syzkaller hit the following crash on
> > 37759fa6d0fa9e4d6036d19ac12f555bfc0aeafd
> > git://git.cmpxchg.org/linux-mmots.git/master
> > compiler: gcc (GCC) 7.1.1 20170620
> > .config is attached
> > Raw console output is attached.
> > C reproducer is attached
> > syzkaller reproducer is attached. See https://goo.gl/kgGztJ
> > for information about syzkaller reproducers
> 
> Reproduced with the C reproducer on v4.15-rc1 and mainline
> going back at least to v4.8, but not v4.7. SCTP GSO was
> introduced in v4.8-rc1, so a patch in this set is likely the starting
> point. Indeed crashes at 90017accff61 ("sctp: Add GSO support"),
> but not at 90017accff61~4.
> 
> The reproducer with its sandbox removed shows this invocation in strace -f
> 
> # strace -f ./repro2
> [... skipped ...]
> socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 3
> open("/dev/net/tun", O_RDONLY)  = 4
> fcntl(4, F_DUPFD, 3)= 5
> socket(PF_PACKET, SOCK_RAW|SOCK_CLOEXEC, 8) = 6
> ioctl(4, TUNSETIFF, 0x20e63000) = 0
> ioctl(3, SIOCSIFFLAGS, {ifr_name="syz0",
> ifr_flags=IFF_UP|IFF_PROMISC|IFF_ALLMULTI}) = 0
> setsockopt(6, SOL_PACKET, 0xf /* PACKET_??? */, [4096], 4) = 0
> ioctl(6, SIOCGIFINDEX, {ifr_name="syz0", ifr_index=24}) = 0
> bind(6, {sa_family=AF_PACKET, proto=, if24, pkttype=PACKET_HOST,
> addr(6)={1, aa00}, 20) = 0
> dup2(6, 5)  = 5
> write(5, "\0\201\1\0\350\367\0\0\3\0E\364\0 \0d\0\0\7\2042\342\0\0\0
> \177\0\0\1\0\t"..., 42
> 
> where 0xf in setsockopt is PACKET_VNET_HDR
> 
> So this is a packet socket writing something that apparently looks
> like an SCTP packet, is only 42 bytes long, but has GSO set in its
> virtio_net_hdr struct.
> 
> It crashes in skb_segment seemingly on a NULL list_skb.
> 
> (gdb) list *(skb_segment+0x2a4)
> 0x8167cc24 is in skb_segment (net/core/skbuff.c:3566).
> 3561if (hsize < 0)
> 3562hsize = 0;
> 3563if (hsize > len || !sg)
> 3564hsize = len;
> 3565
> 3566if (!hsize && i >= nfrags && skb_headlen(list_skb) &&
> 3567(skb_headlen(list_skb) == len || sg)) {
> 3568BUG_ON(skb_headlen(list_skb) > len);
> 3569
> 3570i = 0;
> 
> Likely there is a hidden assumption about SCTP GSO packets that does
> not hold for such packets generated by PF_PACKET.
> 
> SCTP GSO introduced the GSO_BY_FRAGS mss value, so the code
> takes a different path for SCTP packets generated by the SCTP stack.
> 
> PF_PACKET does not necessarily set gso_size to GSO_BY_FRAGS, so
> does not take the branch that requires list_skb to be non-zero here:
> 
> if (unlikely(mss == GSO_BY_FRAGS)) {
> len = list_skb->len;
> } else {
> len = head_skb->len - offset;
> if (len > mss)
> len = mss;
> }
> 
> hsize = skb_headlen(head_skb) - offset;
> if (hsize < 0)
> hsize = 0;
> if (hsize > len || !sg)
> hsize = len;
> 
> if (!hsize && i >= nfrags && skb_headlen(list_skb) &&
> (skb_headlen(list_skb) == len || sg)) {
> 
> Somewhat tangential, but any PF_PACKET socket can set this
> magic gso_size value in its virtio_net_hdr, so if it is assumed to
> be an SCTP GSO specific option, setting it for a TCP GSO packet
> may also cause unexpected results.

It seems virtio_net could use more sanity checks. When PACKET_VNET_HDR
is used, it will end up calling:
tpacket_rcv() {
...
if (do_vnet) {
if (virtio_net_hdr_from_skb(skb, h.raw + macoff -
sizeof(struct virtio_net_hdr),
vio_le(), true)) {
spin_lock(&sk->sk_receive_queue.lock);
goto drop_n_account;
}
}

and virtio_net_hdr_from_skb does:
if (skb_is_gso(skb)) {
...
if (sinfo->gso_type & SKB_GSO_TCPV4)
hdr->gso_type = VIRTIO_NET_HDR_GSO_TCPV4;
else if (sinfo->gso_type & SKB_GSO_TCPV6)
hdr->gso_type = VIRTIO_NET_HDR_GSO_TCPV6;
else
return -EINVAL;

Meaning that any gso_type other than TCP would be rejected, but this
SCTP one got through. Seems the header contains a sctp header, but the
gso_type set was actually pointing to TCP (otherwise it would have
been rejected). AFAICT if this packet had an ESP header, for example,
it could have hit esp4_gso_segment. Can you please confirm this?

I don't know of anywhere in the stack validating if the gso_type
matches the header that actually is in there.

Th

Re: general protection fault in skb_segment

2017-12-30 Thread Marcelo Ricardo Leitner
On Sat, Dec 30, 2017 at 10:52:20PM -0200, Marcelo Ricardo Leitner wrote:
> On Sat, Dec 30, 2017 at 08:42:41AM +0100, Willem de Bruijn wrote:
[...]
> > Somewhat tangential, but any PF_PACKET socket can set this
> > magic gso_size value in its virtio_net_hdr, so if it is assumed to
> > be an SCTP GSO specific option, setting it for a TCP GSO packet
> > may also cause unexpected results.
> 
> It seems virtio_net could use more sanity checks. When PACKET_VNET_HDR
> is used, it will end up calling:
> tpacket_rcv() {
> ...
> if (do_vnet) {
> if (virtio_net_hdr_from_skb(skb, h.raw + macoff -
> sizeof(struct virtio_net_hdr),
> vio_le(), true)) {
> spin_lock(&sk->sk_receive_queue.lock);
> goto drop_n_account;
> }
> }
> 
> and virtio_net_hdr_from_skb does:
> if (skb_is_gso(skb)) {
> ...
> if (sinfo->gso_type & SKB_GSO_TCPV4)
> hdr->gso_type = VIRTIO_NET_HDR_GSO_TCPV4;
> else if (sinfo->gso_type & SKB_GSO_TCPV6)
> hdr->gso_type = VIRTIO_NET_HDR_GSO_TCPV6;
> else
> return -EINVAL;
> 
> Meaning that any gso_type other than TCP would be rejected, but this
> SCTP one got through. Seems the header contains a sctp header, but the
> gso_type set was actually pointing to TCP (otherwise it would have
> been rejected). AFAICT if this packet had an ESP header, for example,
> it could have hit esp4_gso_segment. Can you please confirm this?

I added:
--- a/net/sctp/offload.c
+++ b/net/sctp/offload.c
@@ -44,6 +44,18 @@ static struct sk_buff *sctp_gso_segment(struct sk_buff *skb,
 {
struct sk_buff *segs = ERR_PTR(-EINVAL);
struct sctphdr *sh;
+   int fail = 0;
+
+   if (!(skb_shinfo(skb)->gso_type & SKB_GSO_SCTP)) {
+   printk("Bogus gso_type: %x\n", skb_shinfo(skb)->gso_type);
+   fail = 1;
+   }
+   if (skb_shinfo(skb)->gso_size != GSO_BY_FRAGS) {
+   printk("Bogus gso_size: %u\n", skb_shinfo(skb)->gso_size);
+   fail = 1;
+   }
+   if (fail)
+   goto out;
 
sh = sctp_hdr(skb);
if (!pskb_may_pull(skb, sizeof(*sh)))

and with the reproducer, got:
[   54.255469] Bogus gso_type: 7
[   54.258801] Bogus gso_size: 63464
[   54.262532] [ cut here ]
[   54.267703] syz0: caps=(0x080058c1, 0x) len=32 
data_len=0 gso_size=63464 gso_type=7 ip_summed0
[   54.279777] WARNING: CPU: 1 PID: 13005 at /root/linux/net/core/dev.c:2600 
skb_warn_bad_offload+0xd6/0xec

gso_type 7 = SKB_GSO_TCPV4 | SKB_GSO_DODGY | SKB_GSO_TCP_ECN
as the warn indicated too.

Once this gets to sctp_gso_segment, it's too late to avoid the
warning. Would be nice if we could somehow filter this earlier in the
process.

  Marcelo


Re: [PATCH v3 net-next 2/5] net: tracepoint: replace tcp_set_state tracepoint with inet_sock_set_state tracepoint

2017-12-30 Thread Yafang Shao
On Sun, Dec 31, 2017 at 6:33 AM, Brendan Gregg
 wrote:
> On Tue, Dec 19, 2017 at 7:12 PM, Yafang Shao  wrote:
>> As sk_state is a common field for struct sock, so the state
>> transition tracepoint should not be a TCP specific feature.
>> Currently it traces all AF_INET state transition, so I rename this
>> tracepoint to inet_sock_set_state tracepoint with some minor changes and 
>> move it
>> into trace/events/sock.h.
>
> The tcp:tcp_set_state probe is tcp_set_state(), so it's only going to
> fire for TCP sessions. It's not broken, and we could add a
> sctp:sctp_set_state as well. Replacing tcp:tcp_set_state with
> inet_sk_set_state is feeling like we might be baking too much
> implementation detail into the tracepoint API.
>
> If we must have inet_sk_set_state, then must we also delete tcp:tcp_set_state?
>

Hi Brendan,

The reason we have to make this change could be got from this mail
thread, https://patchwork.kernel.org/patch/10099243/ .

The original tcp:tcp_set_state probe doesn't traced all TCP state transitions.
There're some state transitions in inet_connection_sock.c and
inet_hashtables.c are missed.
So we have to place this probe into these two files to fix the issue.
But as inet_connection_sock.c and inet_hashtables.c are common files
for all IPv4 protocols, not only for TCP, so it is not proper to place
a tcp_ function in these two files.
That's why we decide to rename tcp:tcp_set_state probe to
sock:inet_sock_set_state.

Thanks
Yafang


Re: [PATCH net] ethtool: do not print warning for applications using legacy API

2017-12-30 Thread David Decotigny
Signed-off-by: David Decotigny 


On Fri, Dec 29, 2017 at 10:02 AM, Stephen Hemminger
 wrote:
> From: Stephen Hemminger 
>
> In kernel log ths message appears on every boot:
>  "warning: `NetworkChangeNo' uses legacy ethtool link settings API,
>   link modes are only partially reported"
>
> When ethtool link settings API changed, it started complaining about
> usages of old API. Ironically, the original patch was from google but
> the application using the legacy API is chrome.
>
> Linux ABI is fixed as much as possible. The kernel must not break it
> and should not complain about applications using legacy API's.
> This patch just removes the warning since using legacy API's
> in Linux is perfectly acceptable.
>
> Fixes: 3f1ac7a700d0 ("net: ethtool: add new ETHTOOL_xLINKSETTINGS API")
> Signed-off-by: Stephen Hemminger 
> ---
>  net/core/ethtool.c | 15 ++-
>  1 file changed, 2 insertions(+), 13 deletions(-)
>
> diff --git a/net/core/ethtool.c b/net/core/ethtool.c
> index f8fcf450a36e..8225416911ae 100644
> --- a/net/core/ethtool.c
> +++ b/net/core/ethtool.c
> @@ -770,15 +770,6 @@ static int ethtool_set_link_ksettings(struct net_device 
> *dev,
> return dev->ethtool_ops->set_link_ksettings(dev, &link_ksettings);
>  }
>
> -static void
> -warn_incomplete_ethtool_legacy_settings_conversion(const char *details)
> -{
> -   char name[sizeof(current->comm)];
> -
> -   pr_info_once("warning: `%s' uses legacy ethtool link settings API, 
> %s\n",
> -get_task_comm(name, current), details);
> -}
> -
>  /* Query device for its ethtool_cmd settings.
>   *
>   * Backward compatibility note: for compatibility with legacy ethtool,
> @@ -805,10 +796,8 @@ static int ethtool_get_settings(struct net_device *dev, 
> void __user *useraddr)
>&link_ksettings);
> if (err < 0)
> return err;
> -   if (!convert_link_ksettings_to_legacy_settings(&cmd,
> -  
> &link_ksettings))
> -   warn_incomplete_ethtool_legacy_settings_conversion(
> -   "link modes are only partially reported");
> +   convert_link_ksettings_to_legacy_settings(&cmd,
> + &link_ksettings);
>
> /* send a sensible cmd tag back to user */
> cmd.cmd = ETHTOOL_GSET;
> --
> 2.11.0
>


Re: [PATCH] rds: fix use-after-free read in rds_find_bound

2017-12-30 Thread santosh.shilim...@oracle.com

On 12/30/17 2:32 PM, Sowmini Varadhan wrote:

On (12/30/17 13:37), santosh.shilim...@oracle.com wrote:


[...]


Thats what I thought as well initially but since the reported case,
the rs seems to be valid where as sk seems to be freed up as part of
sock_release callback.


I dont understand the statement above- how can "rs be valid, and sk
be freed"?

rs_sk is embedded in the struct rds_sock, it is not a pointer.


I was going with order of evaluation of if () but you made good point.
rs_sk isn't a pointer so sk can't be null.


let's find and fix the refcount bug. See stack trace in commit comment.
The socket release is happening prematurely and existing WARN_ONs
are not catching it.


Right. This was loop transport in action so xmit will just flip
the direction with receive. And rds_recv_incoming() can race with 
socket_release. rds_find_bound() is suppose to add ref count on

socket for rds_recv_incoming() but by that time socket is DEAD &
freed by socket release callback.
And rds_release is suppose to be synced with rs_recv_lock. But
release callback is marking the sk orphan before syncing
up with receive path and updating the bind table. Probably it
can pushed down further after the socket clean up buut need
to think bit more.


diff --git a/net/rds/af_rds.c b/net/rds/af_rds.c
index b405f77..11e1426 100644
--- a/net/rds/af_rds.c
+++ b/net/rds/af_rds.c
@@ -65,7 +65,6 @@ static int rds_release(struct socket *sock)

rs = rds_sk_to_rs(sk);

-   sock_orphan(sk);
/* Note - rds_clear_recv_queue grabs rs_recv_lock, so
 * that ensures the recv path has completed messing
 * with the socket. */
@@ -85,6 +84,7 @@ static int rds_release(struct socket *sock)

rds_trans_put(rs->rs_transport);

+   sock_orphan(sk);
sock->sk = NULL;
sock_put(sk);
 out:


GREETINGS FROM MOHAMMED AHMED .

2017-12-30 Thread Mr Mohamad Ahmed
Hi Friend.

I am Mr. Mohammed Ahmed a banker in Bank of Africa Burkina Faso West
Africa, Please i want to transfer an abandoned sum of 13.5 millions
USD to your account.50% will be for you and 50% for me.

No risk involved. Contact me for more details along with your personal
information needed below.

1. Full name:.
2. Current Address:.
3. Phone.
4. Occupation:.
5. Age:
6. Country:
7. Sex
8. Your Passport or ID card or Driving License

Thanks.

Mr. Mohammed Ahmed.


Re: xfrm: Forbid state updates from changing encap type

2017-12-30 Thread Steffen Klassert
On Tue, Dec 26, 2017 at 05:34:44PM +1100, Herbert Xu wrote:
> Currently we allow state updates to competely replace the contents
> of x->encap.  This is bad because on the user side ESP only sets up
> header lengths depending on encap_type once when the state is first
> created.  This could result in the header lengths getting out of
> sync with the actual state configuration.
> 
> In practice key managers will never do a state update to change the
> encapsulation type.  Only the port numbers need to be changed as the
> peer NAT entry is updated.
> 
> Therefore this patch adds a check in xfrm_state_update to forbid
> any changes to the encap_type.
> 
> Signed-off-by: Herbert Xu 

Applied to the ipsec tree, thanks Herbert!


Re: [PATCH ipsec] xfrm: skip policies marked as dead while rehashing

2017-12-30 Thread Steffen Klassert
On Wed, Dec 27, 2017 at 11:25:45PM +0100, Florian Westphal wrote:
> syzkaller triggered following KASAN splat:
> 
> BUG: KASAN: slab-out-of-bounds in xfrm_hash_rebuild+0xdbe/0xf00 
> net/xfrm/xfrm_policy.c:618
> read of size 2 at addr 8801c8e92fe4 by task kworker/1:1/23 [..]
> Workqueue: events xfrm_hash_rebuild [..]
>  __asan_report_load2_noabort+0x14/0x20 mm/kasan/report.c:428
>  xfrm_hash_rebuild+0xdbe/0xf00 net/xfrm/xfrm_policy.c:618
>  process_one_work+0xbbf/0x1b10 kernel/workqueue.c:2112
>  worker_thread+0x223/0x1990 kernel/workqueue.c:2246 [..]
> 
> The reproducer triggers:
> 1016 if (error) {
> 1017 list_move_tail(&walk->walk.all, &x->all);
> 1018 goto out;
> 1019 }
> 
> in xfrm_policy_walk() via pfkey (it sets tiny rcv space, dump
> callback returns -ENOBUFS).
> 
> In this case, *walk is located the pfkey socket struct, so this socket
> becomes visible in the global policy list.
> 
> It looks like this is intentional -- phony walker has walk.dead set to 1
> and all other places skip such "policies".
> 
> Ccing original authors of the two commits that seem to expose this
> issue (first patch missed ->dead check, second patch adds pfkey
> sockets to policies dumper list).
> 
> Fixes: 880a6fab8f6ba5b ("xfrm: configure policy hash table thresholds by 
> netlink")
> Fixes: 12a169e7d8f4b1c ("ipsec: Put dumpers on the dump list")
> Cc: Herbert Xu 
> Cc: Timo Teras 
> Cc: Christophe Gouault 
> Reported-by: syzbot 
> 
> Signed-off-by: Florian Westphal 

Applied, thanks a lot!


Re: [PATCH] af_key: fix buffer overread in verify_address_len()

2017-12-30 Thread Steffen Klassert
On Fri, Dec 29, 2017 at 06:13:05PM -0600, Eric Biggers wrote:
> From: Eric Biggers 
> 
> If a message sent to a PF_KEY socket ended with one of the extensions
> that takes a 'struct sadb_address' but there were not enough bytes
> remaining in the message for the ->sa_family member of the 'struct
> sockaddr' which is supposed to follow, then verify_address_len() read
> past the end of the message, into uninitialized memory.  Fix it by
> returning -EINVAL in this case.
> 
> This bug was found using syzkaller with KMSAN.
> 
> Reproducer:
> 
>   #include 
>   #include 
>   #include 
> 
>   int main()
>   {
>   int sock = socket(PF_KEY, SOCK_RAW, PF_KEY_V2);
>   char buf[24] = { 0 };
>   struct sadb_msg *msg = (void *)buf;
>   struct sadb_address *addr = (void *)(msg + 1);
> 
>   msg->sadb_msg_version = PF_KEY_V2;
>   msg->sadb_msg_type = SADB_DELETE;
>   msg->sadb_msg_len = 3;
>   addr->sadb_address_len = 1;
>   addr->sadb_address_exttype = SADB_EXT_ADDRESS_SRC;
> 
>   write(sock, buf, 24);
>   }
> 
> Reported-by: Alexander Potapenko 
> Cc: sta...@vger.kernel.org
> Signed-off-by: Eric Biggers 

Applied to the ipsec tree, thanks Eric!


Re: [PATCH] af_key: fix buffer overread in parse_exthdrs()

2017-12-30 Thread Steffen Klassert
On Fri, Dec 29, 2017 at 06:15:23PM -0600, Eric Biggers wrote:
> From: Eric Biggers 
> 
> If a message sent to a PF_KEY socket ended with an incomplete extension
> header (fewer than 4 bytes remaining), then parse_exthdrs() read past
> the end of the message, into uninitialized memory.  Fix it by returning
> -EINVAL in this case.
> 
> Reproducer:
> 
>   #include 
>   #include 
>   #include 
> 
>   int main()
>   {
>   int sock = socket(PF_KEY, SOCK_RAW, PF_KEY_V2);
>   char buf[17] = { 0 };
>   struct sadb_msg *msg = (void *)buf;
> 
>   msg->sadb_msg_version = PF_KEY_V2;
>   msg->sadb_msg_type = SADB_DELETE;
>   msg->sadb_msg_len = 2;
> 
>   write(sock, buf, 17);
>   }
> 
> Cc: sta...@vger.kernel.org
> Signed-off-by: Eric Biggers 

Also applied to the ipsec tree, thanks a lot!