Re: [PATCH] staging: android: vsoc: Remove VSOC_WAIT_FOR_INCOMING_INTERRUPT

2019-02-11 Thread Alistair Strachan
On Mon, Feb 11, 2019 at 9:22 AM Todd Kjos  wrote:
>
> +Alistair Strachan
>
> On Mon, Feb 11, 2019 at 9:11 AM Greg KH  wrote:
> >
> > On Mon, Feb 11, 2019 at 10:15:18PM +0530, Souptick Joarder wrote:
> > > On Mon, Feb 11, 2019 at 9:27 PM Greg KH  
> > > wrote:
> > > >
> > > > On Mon, Feb 11, 2019 at 09:21:19PM +0530, Souptick Joarder wrote:
> > > > > On Mon, Feb 11, 2019 at 9:10 PM Greg KH  
> > > > > wrote:
> > > > > >
> > > > > > On Mon, Feb 11, 2019 at 08:56:02PM +0530, Souptick Joarder wrote:
> > > > > > > As mentioned in TODO list, Removed 
> > > > > > > VSOC_WAIT_FOR_INCOMING_INTERRUPT
> > > > > > > ioctl. This functionality has been superseded by the futex and is
> > > > > > > there for legacy reasons.
> > > > > > >
> > > > > > > Signed-off-by: Souptick Joarder 
> > > > > > > ---
> > > > > > >  drivers/staging/android/uapi/vsoc_shm.h | 7 ---
> > > > > > >  drivers/staging/android/vsoc.c  | 5 -
> > > > > > >  2 files changed, 12 deletions(-)
> > > > > >
> > > > > > So userspace is all fixed up now and this ioctl can be dropped?  Any
> > > > > > pointers to the userspace commit that did this?

The ioctl is still being used and cannot be removed.

> > > > >
> > > > > I am not sure about user space part.
> > > >
> > > > Then we can not just delete the ioctl :)
> > >
> > > Agree, but where to verify the user space commit ?
> > > Any pointer to the source code path ?

The userspace code using the Linux 'vsoc' staging driver can be cloned
from here:

https://android.googlesource.com/device/google/cuttlefish_common

> >
> > Please work with the android developers to solve this.  It should be in
> > AOSP "somewhere" :(

I'm working on documenting this better on source.android.com. Stay tuned.

> >
> > good luck,
> >
> > greg k-h


Re: [PATCH v2] media: uvcvideo: Fix 'type' check leading to overflow

2018-12-19 Thread Alistair Strachan
On Wed, Dec 19, 2018 at 12:16 AM Laurent Pinchart
 wrote:
>
> Hi Alistair,
>
> Thank you for the patch.
>
> On Wednesday, 19 December 2018 03:32:48 EET Alistair Strachan wrote:
> > From: Laurent Pinchart 
>
> Are you sure you don't want to keep authorship ? I've merely reviewed v1 and
> proposed an alternative implementation :-) Let me know what you would prefer
> and I'll apply this to my tree.

Whatever attribution you think is best works for me! Thank you for
picking up this patch.

> > When initially testing the Camera Terminal Descriptor wTerminalType
> > field (buffer[4]), no mask is used. Later in the function, the MSB is
> > overloaded to store the descriptor subtype, and so a mask of 0x7fff
> > is used to check the type.
> >
> > If a descriptor is specially crafted to set this overloaded bit in the
> > original wTerminalType field, the initial type check will fail (falling
> > through, without adjusting the buffer size), but the later type checks
> > will pass, assuming the buffer has been made suitably large, causing an
> > overflow.
> >
> > Avoid this problem by checking for the MSB in the wTerminalType field.
> > If the bit is set, assume the descriptor is bad, and abort parsing it.
> >
> > Originally reported here:
> > https://groups.google.com/forum/#!topic/syzkaller/Ot1fOE6v1d8
> > A similar (non-compiling) patch was provided at that time.
> >
> > Reported-by: syzbot 
> > Signed-off-by: Alistair Strachan 
> > Cc: Laurent Pinchart 
> > Cc: Mauro Carvalho Chehab 
> > Cc: linux-me...@vger.kernel.org
> > Cc: kernel-t...@android.com
> > ---
> > v2: Use an alternative fix suggested by Laurent
> >  drivers/media/usb/uvc/uvc_driver.c | 14 +++---
> >  1 file changed, 11 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/media/usb/uvc/uvc_driver.c
> > b/drivers/media/usb/uvc/uvc_driver.c index bc369a0934a3..7fde3ce642c4
> > 100644
> > --- a/drivers/media/usb/uvc/uvc_driver.c
> > +++ b/drivers/media/usb/uvc/uvc_driver.c
> > @@ -1065,11 +1065,19 @@ static int uvc_parse_standard_control(struct
> > uvc_device *dev, return -EINVAL;
> >   }
> >
> > - /* Make sure the terminal type MSB is not null, otherwise it
> > -  * could be confused with a unit.
> > + /*
> > +  * Reject invalid terminal types that would cause issues:
> > +  *
> > +  * - The high byte must be non-zero, otherwise it would be
> > +  *   confused with a unit.
> > +  *
> > +  * - Bit 15 must be 0, as we use it internally as a terminal
> > +  *   direction flag.
> > +  *
> > +  * Other unknown types are accepted.
> >*/
> >   type = get_unaligned_le16([4]);
> > - if ((type & 0xff00) == 0) {
> > + if ((type & 0x7f00) == 0 || (type & 0x8000) != 0) {
> >   uvc_trace(UVC_TRACE_DESCR, "device %d videocontrol "
> >   "interface %d INPUT_TERMINAL %d has invalid "
> >   "type 0x%04x, skipping\n", udev->devnum,
>
> --
> Regards,
>
> Laurent Pinchart
>
>
>


[PATCH v2] media: uvcvideo: Fix 'type' check leading to overflow

2018-12-18 Thread Alistair Strachan
From: Laurent Pinchart 

When initially testing the Camera Terminal Descriptor wTerminalType
field (buffer[4]), no mask is used. Later in the function, the MSB is
overloaded to store the descriptor subtype, and so a mask of 0x7fff
is used to check the type.

If a descriptor is specially crafted to set this overloaded bit in the
original wTerminalType field, the initial type check will fail (falling
through, without adjusting the buffer size), but the later type checks
will pass, assuming the buffer has been made suitably large, causing an
overflow.

Avoid this problem by checking for the MSB in the wTerminalType field.
If the bit is set, assume the descriptor is bad, and abort parsing it.

Originally reported here:
https://groups.google.com/forum/#!topic/syzkaller/Ot1fOE6v1d8
A similar (non-compiling) patch was provided at that time.

Reported-by: syzbot 
Signed-off-by: Alistair Strachan 
Cc: Laurent Pinchart 
Cc: Mauro Carvalho Chehab 
Cc: linux-me...@vger.kernel.org
Cc: kernel-t...@android.com
---
v2: Use an alternative fix suggested by Laurent
 drivers/media/usb/uvc/uvc_driver.c | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/media/usb/uvc/uvc_driver.c 
b/drivers/media/usb/uvc/uvc_driver.c
index bc369a0934a3..7fde3ce642c4 100644
--- a/drivers/media/usb/uvc/uvc_driver.c
+++ b/drivers/media/usb/uvc/uvc_driver.c
@@ -1065,11 +1065,19 @@ static int uvc_parse_standard_control(struct uvc_device 
*dev,
return -EINVAL;
}
 
-   /* Make sure the terminal type MSB is not null, otherwise it
-* could be confused with a unit.
+   /*
+* Reject invalid terminal types that would cause issues:
+*
+* - The high byte must be non-zero, otherwise it would be
+*   confused with a unit.
+*
+* - Bit 15 must be 0, as we use it internally as a terminal
+*   direction flag.
+*
+* Other unknown types are accepted.
 */
type = get_unaligned_le16([4]);
-   if ((type & 0xff00) == 0) {
+   if ((type & 0x7f00) == 0 || (type & 0x8000) != 0) {
uvc_trace(UVC_TRACE_DESCR, "device %d videocontrol "
"interface %d INPUT_TERMINAL %d has invalid "
"type 0x%04x, skipping\n", udev->devnum,
-- 
2.20.1.415.g653613c723-goog



Re: [PATCH] media: uvcvideo: Fix 'type' check leading to overflow

2018-12-18 Thread Alistair Strachan
Hi Laurent,

On Tue, Dec 18, 2018 at 1:42 AM Laurent Pinchart
 wrote:
>
> Hi Alistair,
>
> Thank you for the patch.
>
> On Monday, 17 December 2018 23:02:22 EET Alistair Strachan wrote:
> > When initially testing the Camera Terminal Descriptor wTerminalType
> > field (buffer[4]), no mask is used. Later in the function, the MSB is
> > overloaded to store the descriptor subtype, and so a mask of 0x7fff
> > is used to check the type.
> >
> > If a descriptor is specially crafted to set this overloaded bit in the
> > original wTerminalType field, the initial type check will fail (falling
> > through, without adjusting the buffer size), but the later type checks
> > will pass, assuming the buffer has been made suitably large, causing an
> > overflow.
> >
> > This problem could be resolved in a few different ways, but this fix
> > applies the same initial type check as used by UVC_ENTITY_TYPE (once we
> > have a 'term' structure.) Such crafted wTerminalType fields will then be
> > treated as *generic* Input Terminals, not as CAMERA or
> > MEDIA_TRANSPORT_INPUT, avoiding an overflow.
> >
> > Originally reported here:
> > https://groups.google.com/forum/#!topic/syzkaller/Ot1fOE6v1d8
> > A similar (non-compiling) patch was provided at that time.
> >
> > Reported-by: syzbot 
> > Signed-off-by: Alistair Strachan 
> > Cc: Laurent Pinchart 
> > Cc: Mauro Carvalho Chehab 
> > Cc: linux-me...@vger.kernel.org
> > Cc: kernel-t...@android.com
> > ---
> >  drivers/media/usb/uvc/uvc_driver.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/media/usb/uvc/uvc_driver.c
> > b/drivers/media/usb/uvc/uvc_driver.c index bc369a0934a3..279a967b8264
> > 100644
> > --- a/drivers/media/usb/uvc/uvc_driver.c
> > +++ b/drivers/media/usb/uvc/uvc_driver.c
> > @@ -1082,11 +1082,11 @@ static int uvc_parse_standard_control(struct
> > uvc_device *dev, p = 0;
> >   len = 8;
> >
> > - if (type == UVC_ITT_CAMERA) {
> > + if ((type & 0x7fff) == UVC_ITT_CAMERA) {
> >   n = buflen >= 15 ? buffer[14] : 0;
> >   len = 15;
> >
> > - } else if (type == UVC_ITT_MEDIA_TRANSPORT_INPUT) {
> > + } else if ((type & 0x7fff) == UVC_ITT_MEDIA_TRANSPORT_INPUT) {
> >   n = buflen >= 9 ? buffer[8] : 0;
> >   p = buflen >= 10 + n ? buffer[9+n] : 0;
> >   len = 10;
>
> How about rejecting invalid types instead ? Something along the lines of

This looks simpler/better to me. I just re-sent your version of the
change (build + runtime tested.) Thanks for reviewing.

> diff --git a/drivers/media/usb/uvc/uvc_driver.c 
> b/drivers/media/usb/uvc/uvc_driver.c
> index b62cbd800111..33a22c016456 100644
> --- a/drivers/media/usb/uvc/uvc_driver.c
> +++ b/drivers/media/usb/uvc/uvc_driver.c
> @@ -1106,11 +1106,19 @@ static int uvc_parse_standard_control(struct 
> uvc_device *dev,
> return -EINVAL;
> }
>
> -   /* Make sure the terminal type MSB is not null, otherwise it
> -* could be confused with a unit.
> +   /*
> +* Reject invalid terminal types that would cause issues:
> +*
> +* - The high byte must be non-zero, otherwise it would be
> +*   confused with a unit.
> +*
> +* - Bit 15 must be 0, as we use it internally as a terminal
> +*   direction flag.
> +*
> +* Other unknown types are accepted.
>  */
> type = get_unaligned_le16([4]);
> -   if ((type & 0xff00) == 0) {
> +   if ((type & 0x7f00) == 0 || (type & 0x8000) != 0) {
> uvc_trace(UVC_TRACE_DESCR, "device %d videocontrol "
> "interface %d INPUT_TERMINAL %d has invalid "
> "type 0x%04x, skipping\n", udev->devnum,
>
> --
> Regards,
>
> Laurent Pinchart
>
>
>


[PATCH] media: uvcvideo: Fix 'type' check leading to overflow

2018-12-17 Thread Alistair Strachan
When initially testing the Camera Terminal Descriptor wTerminalType
field (buffer[4]), no mask is used. Later in the function, the MSB is
overloaded to store the descriptor subtype, and so a mask of 0x7fff
is used to check the type.

If a descriptor is specially crafted to set this overloaded bit in the
original wTerminalType field, the initial type check will fail (falling
through, without adjusting the buffer size), but the later type checks
will pass, assuming the buffer has been made suitably large, causing an
overflow.

This problem could be resolved in a few different ways, but this fix
applies the same initial type check as used by UVC_ENTITY_TYPE (once we
have a 'term' structure.) Such crafted wTerminalType fields will then be
treated as *generic* Input Terminals, not as CAMERA or
MEDIA_TRANSPORT_INPUT, avoiding an overflow.

Originally reported here:
https://groups.google.com/forum/#!topic/syzkaller/Ot1fOE6v1d8
A similar (non-compiling) patch was provided at that time.

Reported-by: syzbot 
Signed-off-by: Alistair Strachan 
Cc: Laurent Pinchart 
Cc: Mauro Carvalho Chehab 
Cc: linux-me...@vger.kernel.org
Cc: kernel-t...@android.com
---
 drivers/media/usb/uvc/uvc_driver.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/media/usb/uvc/uvc_driver.c 
b/drivers/media/usb/uvc/uvc_driver.c
index bc369a0934a3..279a967b8264 100644
--- a/drivers/media/usb/uvc/uvc_driver.c
+++ b/drivers/media/usb/uvc/uvc_driver.c
@@ -1082,11 +1082,11 @@ static int uvc_parse_standard_control(struct uvc_device 
*dev,
p = 0;
len = 8;
 
-   if (type == UVC_ITT_CAMERA) {
+   if ((type & 0x7fff) == UVC_ITT_CAMERA) {
n = buflen >= 15 ? buffer[14] : 0;
len = 15;
 
-   } else if (type == UVC_ITT_MEDIA_TRANSPORT_INPUT) {
+   } else if ((type & 0x7fff) == UVC_ITT_MEDIA_TRANSPORT_INPUT) {
n = buflen >= 9 ? buffer[8] : 0;
p = buflen >= 10 + n ? buffer[9+n] : 0;
len = 10;
-- 
2.20.0.405.gbc1bbc6f85-goog



[tip:x86/urgent] x86/vdso: Pass --eh-frame-hdr to the linker

2018-12-15 Thread tip-bot for Alistair Strachan
Commit-ID:  cd01544a268ad8ee5b1dfe42c4393f1095f86879
Gitweb: https://git.kernel.org/tip/cd01544a268ad8ee5b1dfe42c4393f1095f86879
Author: Alistair Strachan 
AuthorDate: Fri, 14 Dec 2018 14:36:37 -0800
Committer:  Borislav Petkov 
CommitDate: Sat, 15 Dec 2018 11:37:51 +0100

x86/vdso: Pass --eh-frame-hdr to the linker

Commit

  379d98ddf413 ("x86: vdso: Use $LD instead of $CC to link")

accidentally broke unwinding from userspace, because ld would strip the
.eh_frame sections when linking.

Originally, the compiler would implicitly add --eh-frame-hdr when
invoking the linker, but when this Makefile was converted from invoking
ld via the compiler, to invoking it directly (like vmlinux does),
the flag was missed. (The EH_FRAME section is important for the VDSO
shared libraries, but not for vmlinux.)

Fix the problem by explicitly specifying --eh-frame-hdr, which restores
parity with the old method.

See relevant bug reports for additional info:

  https://bugzilla.kernel.org/show_bug.cgi?id=201741
  https://bugzilla.redhat.com/show_bug.cgi?id=1659295

Fixes: 379d98ddf413 ("x86: vdso: Use $LD instead of $CC to link")
Reported-by: Florian Weimer 
Reported-by: Carlos O'Donell 
Reported-by: "H. J. Lu" 
Signed-off-by: Alistair Strachan 
Signed-off-by: Borislav Petkov 
Tested-by: Laura Abbott 
Cc: Andy Lutomirski 
Cc: Carlos O'Donell 
Cc: "H. Peter Anvin" 
Cc: Ingo Molnar 
Cc: Joel Fernandes 
Cc: kernel-t...@android.com
Cc: Laura Abbott 
Cc: stable 
Cc: Thomas Gleixner 
Cc: X86 ML 
Link: https://lkml.kernel.org/r/20181214223637.35954-1-astrac...@google.com
---
 arch/x86/entry/vdso/Makefile | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index 0624bf2266fd..5bfe2243a08f 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -171,7 +171,8 @@ quiet_cmd_vdso = VDSO$@
 sh $(srctree)/$(src)/checkundef.sh '$(NM)' '$@'
 
 VDSO_LDFLAGS = -shared $(call ld-option, --hash-style=both) \
-   $(call ld-option, --build-id) -Bsymbolic
+   $(call ld-option, --build-id) $(call ld-option, --eh-frame-hdr) \
+   -Bsymbolic
 GCOV_PROFILE := n
 
 #


[PATCH v2] x86: vdso: Pass --eh-frame-hdr to ld

2018-12-14 Thread Alistair Strachan
Commit 379d98ddf413 ("x86: vdso: Use $LD instead of $CC to link")
accidentally broke unwinding from userspace, because ld would strip the
.eh_frame sections when linking.

Originally, the compiler would implicitly add --eh-frame-hdr when
invoking the linker, but when this Makefile was converted from invoking
ld via the compiler, to invoking it directly (like vmlinux does),
the flag was missed. (The EH_FRAME section is important for the VDSO
shared libraries, but not for vmlinux.)

Fix the problem by explicitly specifying --eh-frame-hdr, which restores
parity with the old method.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201741
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1659295
Reported-by: Florian Weimer 
Reported-by: Carlos O'Donell 
Reported-by: "H. J. Lu" 
Tested-by: Laura Abbott 
Fixes: 379d98ddf413 ("x86: vdso: Use $LD instead of $CC to link")
Cc: sta...@vger.kernel.org
Cc: Andy Lutomirski 
Cc: Thomas Gleixner 
Cc: "H. Peter Anvin" 
Cc: X86 ML 
Cc: Joel Fernandes 
Cc: kernel-t...@android.com
Signed-off-by: Alistair Strachan 
---
v2: Updated commit message, no changes to the code
 arch/x86/entry/vdso/Makefile | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index 141d415a8c80..c3d7ccd25381 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -171,7 +171,8 @@ quiet_cmd_vdso = VDSO$@
 sh $(srctree)/$(src)/checkundef.sh '$(NM)' '$@'
 
 VDSO_LDFLAGS = -shared $(call ld-option, --hash-style=both) \
-   $(call ld-option, --build-id) -Bsymbolic
+   $(call ld-option, --build-id) $(call ld-option, --eh-frame-hdr) \
+   -Bsymbolic
 GCOV_PROFILE := n
 
 #
-- 
2.20.0.405.gbc1bbc6f85-goog



[PATCH] x86: vdso: Pass --eh-frame-hdr to ld

2018-12-14 Thread Alistair Strachan
Commit 379d98ddf413 ("x86: vdso: Use $LD instead of $CC to link")
accidentally broke unwinding from userspace, because ld would strip the
.eh_frame sections when linking.

Originally, the compiler would implicitly add --eh-frame-hdr when
invoking the linker, but when this Makefile was converted from invoking
ld via the compiler, to invoking it directly (like vmlinux does),
the flag was missed. (The EH_FRAME section is important for the VDSO
shared libraries, but not for vmlinux.)

Fix the problem by explicitly specifying --eh-frame-hdr, which restores
parity with the old method.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201741
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1659295
Reported-by: Laura Abbott 
Fixes: 379d98ddf413 ("x86: vdso: Use $LD instead of $CC to link")
Cc: sta...@vger.kernel.org
Cc: Andy Lutomirski 
Cc: Thomas Gleixner 
Cc: "H. Peter Anvin" 
Cc: X86 ML 
Cc: Florian Weimer ,
Cc: Carlos O'Donell ,
Cc: "H. J. Lu" 
Cc: Joel Fernandes 
Cc: kernel-t...@android.com
Signed-off-by: Alistair Strachan 
---
 arch/x86/entry/vdso/Makefile | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index 141d415a8c80..c3d7ccd25381 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -171,7 +171,8 @@ quiet_cmd_vdso = VDSO$@
 sh $(srctree)/$(src)/checkundef.sh '$(NM)' '$@'
 
 VDSO_LDFLAGS = -shared $(call ld-option, --hash-style=both) \
-   $(call ld-option, --build-id) -Bsymbolic
+   $(call ld-option, --build-id) $(call ld-option, --eh-frame-hdr) \
+   -Bsymbolic
 GCOV_PROFILE := n
 
 #
-- 
2.20.0.405.gbc1bbc6f85-goog



Re: Unwinding regression with 379d98ddf413 ("x86: vdso: Use $LD instead of $CC to link")

2018-12-14 Thread Alistair Strachan
Hi Laura,

On Fri, Dec 14, 2018 at 1:48 PM Laura Abbott  wrote:
> Hi,
>
> There are two reports of a regression with unwinding with
> 379d98ddf413 ("x86: vdso: Use $LD instead of $CC to link")
>
> https://bugzilla.kernel.org/show_bug.cgi?id=201741
> https://bugzilla.redhat.com/show_bug.cgi?id=1659295
>
> It looks like the unwinding information has been
> stripped out. Any ideas?

Sorry about this. I missed a critical flag (--eh-frame-hdr) that gcc
specifies implicitly when converting the Makefile to invoke ld
directly. This went unnoticed because the linking of vmlinux (which
also uses ld directly) does not need this information, so it does not
specify the flag. Sending a verified fix now.

> Thanks,
> Laura


"x86/mm: Introduce the 'no5lvl' kernel parameter" broke SETUP_DTB?

2018-11-02 Thread Alistair Strachan
Hi Kirill,

I noticed that booting 4.19 in qemu while injecting a FDT using the
"-dtb /path/to/blob" feature might have been broken by your change
372fddf70904 ("x86/mm: Introduce the 'no5lvl' kernel parameter").

This manifests either as FDT corruption, which causes the setup code
to fail to unpack it (i.e. corruption of the device-tree structure),
or simply bad node data. If I make the below change, the problem goes
away:

diff --git a/arch/x86/boot/compressed/pgtable_64.c
b/arch/x86/boot/compressed/pgtable_64.c
index 8c5107545251..bfe5aca71254 100644
--- a/arch/x86/boot/compressed/pgtable_64.c
+++ b/arch/x86/boot/compressed/pgtable_64.c
@@ -40,7 +40,7 @@ struct paging_config paging_prepare(void *rmode)
unsigned long bios_start, ebda_start;

/* Initialize boot_params. Required for cmdline_find_option_bool(). */
-   boot_params = rmode;
+   //boot_params = rmode;

/*
 * Check if LA57 is desired and supported.

This can be reproduced with the system-root.dtb file in
https://android.googlesource.com/device/google/cuttlefish/+archive/master.tar.gz
using "qemu -dtb system-root.dtb -kernel /path/to/bzImage -drive
file=root.ext4" on x86_64_defconfig with CONFIG_OF_UNITTEST enabled.

If the FDT is unpacked successfully, the
/proc/device-tree/firmware/android/compatible file will exist, and
contain the string "android,firmware" instead of junk.

I'm still looking into the root cause for this, but I just wanted to
let you know.

Alistair.


"x86/mm: Introduce the 'no5lvl' kernel parameter" broke SETUP_DTB?

2018-11-02 Thread Alistair Strachan
Hi Kirill,

I noticed that booting 4.19 in qemu while injecting a FDT using the
"-dtb /path/to/blob" feature might have been broken by your change
372fddf70904 ("x86/mm: Introduce the 'no5lvl' kernel parameter").

This manifests either as FDT corruption, which causes the setup code
to fail to unpack it (i.e. corruption of the device-tree structure),
or simply bad node data. If I make the below change, the problem goes
away:

diff --git a/arch/x86/boot/compressed/pgtable_64.c
b/arch/x86/boot/compressed/pgtable_64.c
index 8c5107545251..bfe5aca71254 100644
--- a/arch/x86/boot/compressed/pgtable_64.c
+++ b/arch/x86/boot/compressed/pgtable_64.c
@@ -40,7 +40,7 @@ struct paging_config paging_prepare(void *rmode)
unsigned long bios_start, ebda_start;

/* Initialize boot_params. Required for cmdline_find_option_bool(). */
-   boot_params = rmode;
+   //boot_params = rmode;

/*
 * Check if LA57 is desired and supported.

This can be reproduced with the system-root.dtb file in
https://android.googlesource.com/device/google/cuttlefish/+archive/master.tar.gz
using "qemu -dtb system-root.dtb -kernel /path/to/bzImage -drive
file=root.ext4" on x86_64_defconfig with CONFIG_OF_UNITTEST enabled.

If the FDT is unpacked successfully, the
/proc/device-tree/firmware/android/compatible file will exist, and
contain the string "android,firmware" instead of junk.

I'm still looking into the root cause for this, but I just wanted to
let you know.

Alistair.


Re: [PATCH] Trivial numbering change in comments.

2018-08-28 Thread Alistair Strachan
On Tue, Aug 28, 2018 at 1:08 PM Ray Clinton  wrote:
> I'm trying to get my feet wet in kernel dev and while working on a patch
> I noticed in a lengthy comment block that the number "2" was skipped
> in a description of the steps of a protocol. This patch is simply correcting
> this. This is for 4.18.0.
>
> It is a very trivial patch, but this is my first actual try at one and
> thought I
> might start out small (and I don't think you can get smaller than a single
>  character change). Any and all advice (about this patch, email, or
> anything else) is very welcome.
>
> Thanks so much!
> Ray
>
> Signed-off-by: Ray Clinton 

Acked-by: Alistair Strachan 

> ---
>  drivers/staging/android/uapi/vsoc_shm.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/staging/android/uapi/vsoc_shm.h
> b/drivers/staging/android/uapi/vsoc_shm.h
> index 6291fb2..0301172 100644
> --- a/drivers/staging/android/uapi/vsoc_shm.h
> +++ b/drivers/staging/android/uapi/vsoc_shm.h
> @@ -92,7 +92,7 @@ struct fd_scoped_permission_arg {
>   *The transmitter can choose any appropriate hashing algorithm, including
>   *hash = futex_offset & ((1 << num_nodes_lg2) - 1)
>   *
> - * 3. The transmitter should atomically compare and swap futex_offset with 0
> + * 2. The transmitter should atomically compare and swap futex_offset with 0
>   *at hash. There are 3 possible outcomes
>   *  a. The swap fails because the futex_offset is already in the table.
>   * The transmitter should stop.
> --
> 2.7.4


Re: [PATCH] Trivial numbering change in comments.

2018-08-28 Thread Alistair Strachan
On Tue, Aug 28, 2018 at 1:08 PM Ray Clinton  wrote:
> I'm trying to get my feet wet in kernel dev and while working on a patch
> I noticed in a lengthy comment block that the number "2" was skipped
> in a description of the steps of a protocol. This patch is simply correcting
> this. This is for 4.18.0.
>
> It is a very trivial patch, but this is my first actual try at one and
> thought I
> might start out small (and I don't think you can get smaller than a single
>  character change). Any and all advice (about this patch, email, or
> anything else) is very welcome.
>
> Thanks so much!
> Ray
>
> Signed-off-by: Ray Clinton 

Acked-by: Alistair Strachan 

> ---
>  drivers/staging/android/uapi/vsoc_shm.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/staging/android/uapi/vsoc_shm.h
> b/drivers/staging/android/uapi/vsoc_shm.h
> index 6291fb2..0301172 100644
> --- a/drivers/staging/android/uapi/vsoc_shm.h
> +++ b/drivers/staging/android/uapi/vsoc_shm.h
> @@ -92,7 +92,7 @@ struct fd_scoped_permission_arg {
>   *The transmitter can choose any appropriate hashing algorithm, including
>   *hash = futex_offset & ((1 << num_nodes_lg2) - 1)
>   *
> - * 3. The transmitter should atomically compare and swap futex_offset with 0
> + * 2. The transmitter should atomically compare and swap futex_offset with 0
>   *at hash. There are 3 possible outcomes
>   *  a. The swap fails because the futex_offset is already in the table.
>   * The transmitter should stop.
> --
> 2.7.4


[tip:x86/vdso] x86: vdso: Use $LD instead of $CC to link

2018-08-05 Thread tip-bot for Alistair Strachan
Commit-ID:  379d98ddf41344273d9718556f761420f4dc80b3
Gitweb: https://git.kernel.org/tip/379d98ddf41344273d9718556f761420f4dc80b3
Author: Alistair Strachan 
AuthorDate: Fri, 3 Aug 2018 10:39:31 -0700
Committer:  Thomas Gleixner 
CommitDate: Sun, 5 Aug 2018 22:33:50 +0200

x86: vdso: Use $LD instead of $CC to link

The vdso{32,64}.so can fail to link with CC=clang when clang tries to find
a suitable GCC toolchain to link these libraries with.

/usr/bin/ld: arch/x86/entry/vdso/vclock_gettime.o:
  access beyond end of merged section (782)

This happens because the host environment leaked into the cross compiler
environment due to the way clang searches for suitable GCC toolchains.

Clang is a retargetable compiler, and each invocation of it must provide
--target= --gcc-toolchain= to allow it to find the
correct binutils for cross compilation. These flags had been added to
KBUILD_CFLAGS, but the vdso code uses CC and not KBUILD_CFLAGS (for various
reasons) which breaks clang's ability to find the correct linker when cross
compiling.

Most of the time this goes unnoticed because the host linker is new enough
to work anyway, or is incompatible and skipped, but this cannot be reliably
assumed.

This change alters the vdso makefile to just use LD directly, which
bypasses clang and thus the searching problem. The makefile will just use
${CROSS_COMPILE}ld instead, which is always what we want. This matches the
method used to link vmlinux.

This drops references to DISABLE_LTO; this option doesn't seem to be set
anywhere, and not knowing what its possible values are, it's not clear how
to convert it from CC to LD flag.

Signed-off-by: Alistair Strachan 
Signed-off-by: Thomas Gleixner 
Acked-by: Andy Lutomirski 
Cc: "H. Peter Anvin" 
Cc: Greg Kroah-Hartman 
Cc: kernel-t...@android.com
Cc: j...@joelfernandes.org
Cc: Andi Kleen 
Link: https://lkml.kernel.org/r/20180803173931.117515-1-astrac...@google.com

---
 arch/x86/entry/vdso/Makefile | 22 +-
 1 file changed, 9 insertions(+), 13 deletions(-)

diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index 261802b1cc50..42c6c1bea4f4 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -46,10 +46,8 @@ targets += $(vdso_img_sodbg) $(vdso_img-y:%=vdso%.so)
 
 CPPFLAGS_vdso.lds += -P -C
 
-VDSO_LDFLAGS_vdso.lds = -m64 -Wl,-soname=linux-vdso.so.1 \
-   -Wl,--no-undefined \
-   -Wl,-z,max-page-size=4096 -Wl,-z,common-page-size=4096 \
-   $(DISABLE_LTO)
+VDSO_LDFLAGS_vdso.lds = -m elf_x86_64 -soname linux-vdso.so.1 --no-undefined \
+   -z max-page-size=4096 -z common-page-size=4096
 
 $(obj)/vdso64.so.dbg: $(obj)/vdso.lds $(vobjs) FORCE
$(call if_changed,vdso)
@@ -95,10 +93,8 @@ CFLAGS_REMOVE_vvar.o = -pg
 #
 
 CPPFLAGS_vdsox32.lds = $(CPPFLAGS_vdso.lds)
-VDSO_LDFLAGS_vdsox32.lds = -Wl,-m,elf32_x86_64 \
-  -Wl,-soname=linux-vdso.so.1 \
-  -Wl,-z,max-page-size=4096 \
-  -Wl,-z,common-page-size=4096
+VDSO_LDFLAGS_vdsox32.lds = -m elf32_x86_64 -soname linux-vdso.so.1 \
+  -z max-page-size=4096 -z common-page-size=4096
 
 # x32-rebranded versions
 vobjx32s-y := $(vobjs-y:.o=-x32.o)
@@ -123,7 +119,7 @@ $(obj)/vdsox32.so.dbg: $(obj)/vdsox32.lds $(vobjx32s) FORCE
$(call if_changed,vdso)
 
 CPPFLAGS_vdso32.lds = $(CPPFLAGS_vdso.lds)
-VDSO_LDFLAGS_vdso32.lds = -m32 -Wl,-m,elf_i386 -Wl,-soname=linux-gate.so.1
+VDSO_LDFLAGS_vdso32.lds = -m elf_i386 -soname linux-gate.so.1
 
 targets += vdso32/vdso32.lds
 targets += vdso32/note.o vdso32/system_call.o vdso32/sigreturn.o
@@ -157,13 +153,13 @@ $(obj)/vdso32.so.dbg: FORCE \
 # The DSO images are built using a special linker script.
 #
 quiet_cmd_vdso = VDSO$@
-  cmd_vdso = $(CC) -nostdlib -o $@ \
+  cmd_vdso = $(LD) -nostdlib -o $@ \
   $(VDSO_LDFLAGS) $(VDSO_LDFLAGS_$(filter %.lds,$(^F))) \
-  -Wl,-T,$(filter %.lds,$^) $(filter %.o,$^) && \
+  -T $(filter %.lds,$^) $(filter %.o,$^) && \
 sh $(srctree)/$(src)/checkundef.sh '$(NM)' '$@'
 
-VDSO_LDFLAGS = -fPIC -shared $(call cc-ldoption, -Wl$(comma)--hash-style=both) 
\
-   $(call cc-ldoption, -Wl$(comma)--build-id) -Wl,-Bsymbolic $(LTO_CFLAGS)
+VDSO_LDFLAGS = -shared $(call ld-option, --hash-style=both) \
+   $(call ld-option, --build-id) -Bsymbolic
 GCOV_PROFILE := n
 
 #


[tip:x86/vdso] x86: vdso: Use $LD instead of $CC to link

2018-08-05 Thread tip-bot for Alistair Strachan
Commit-ID:  379d98ddf41344273d9718556f761420f4dc80b3
Gitweb: https://git.kernel.org/tip/379d98ddf41344273d9718556f761420f4dc80b3
Author: Alistair Strachan 
AuthorDate: Fri, 3 Aug 2018 10:39:31 -0700
Committer:  Thomas Gleixner 
CommitDate: Sun, 5 Aug 2018 22:33:50 +0200

x86: vdso: Use $LD instead of $CC to link

The vdso{32,64}.so can fail to link with CC=clang when clang tries to find
a suitable GCC toolchain to link these libraries with.

/usr/bin/ld: arch/x86/entry/vdso/vclock_gettime.o:
  access beyond end of merged section (782)

This happens because the host environment leaked into the cross compiler
environment due to the way clang searches for suitable GCC toolchains.

Clang is a retargetable compiler, and each invocation of it must provide
--target= --gcc-toolchain= to allow it to find the
correct binutils for cross compilation. These flags had been added to
KBUILD_CFLAGS, but the vdso code uses CC and not KBUILD_CFLAGS (for various
reasons) which breaks clang's ability to find the correct linker when cross
compiling.

Most of the time this goes unnoticed because the host linker is new enough
to work anyway, or is incompatible and skipped, but this cannot be reliably
assumed.

This change alters the vdso makefile to just use LD directly, which
bypasses clang and thus the searching problem. The makefile will just use
${CROSS_COMPILE}ld instead, which is always what we want. This matches the
method used to link vmlinux.

This drops references to DISABLE_LTO; this option doesn't seem to be set
anywhere, and not knowing what its possible values are, it's not clear how
to convert it from CC to LD flag.

Signed-off-by: Alistair Strachan 
Signed-off-by: Thomas Gleixner 
Acked-by: Andy Lutomirski 
Cc: "H. Peter Anvin" 
Cc: Greg Kroah-Hartman 
Cc: kernel-t...@android.com
Cc: j...@joelfernandes.org
Cc: Andi Kleen 
Link: https://lkml.kernel.org/r/20180803173931.117515-1-astrac...@google.com

---
 arch/x86/entry/vdso/Makefile | 22 +-
 1 file changed, 9 insertions(+), 13 deletions(-)

diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index 261802b1cc50..42c6c1bea4f4 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -46,10 +46,8 @@ targets += $(vdso_img_sodbg) $(vdso_img-y:%=vdso%.so)
 
 CPPFLAGS_vdso.lds += -P -C
 
-VDSO_LDFLAGS_vdso.lds = -m64 -Wl,-soname=linux-vdso.so.1 \
-   -Wl,--no-undefined \
-   -Wl,-z,max-page-size=4096 -Wl,-z,common-page-size=4096 \
-   $(DISABLE_LTO)
+VDSO_LDFLAGS_vdso.lds = -m elf_x86_64 -soname linux-vdso.so.1 --no-undefined \
+   -z max-page-size=4096 -z common-page-size=4096
 
 $(obj)/vdso64.so.dbg: $(obj)/vdso.lds $(vobjs) FORCE
$(call if_changed,vdso)
@@ -95,10 +93,8 @@ CFLAGS_REMOVE_vvar.o = -pg
 #
 
 CPPFLAGS_vdsox32.lds = $(CPPFLAGS_vdso.lds)
-VDSO_LDFLAGS_vdsox32.lds = -Wl,-m,elf32_x86_64 \
-  -Wl,-soname=linux-vdso.so.1 \
-  -Wl,-z,max-page-size=4096 \
-  -Wl,-z,common-page-size=4096
+VDSO_LDFLAGS_vdsox32.lds = -m elf32_x86_64 -soname linux-vdso.so.1 \
+  -z max-page-size=4096 -z common-page-size=4096
 
 # x32-rebranded versions
 vobjx32s-y := $(vobjs-y:.o=-x32.o)
@@ -123,7 +119,7 @@ $(obj)/vdsox32.so.dbg: $(obj)/vdsox32.lds $(vobjx32s) FORCE
$(call if_changed,vdso)
 
 CPPFLAGS_vdso32.lds = $(CPPFLAGS_vdso.lds)
-VDSO_LDFLAGS_vdso32.lds = -m32 -Wl,-m,elf_i386 -Wl,-soname=linux-gate.so.1
+VDSO_LDFLAGS_vdso32.lds = -m elf_i386 -soname linux-gate.so.1
 
 targets += vdso32/vdso32.lds
 targets += vdso32/note.o vdso32/system_call.o vdso32/sigreturn.o
@@ -157,13 +153,13 @@ $(obj)/vdso32.so.dbg: FORCE \
 # The DSO images are built using a special linker script.
 #
 quiet_cmd_vdso = VDSO$@
-  cmd_vdso = $(CC) -nostdlib -o $@ \
+  cmd_vdso = $(LD) -nostdlib -o $@ \
   $(VDSO_LDFLAGS) $(VDSO_LDFLAGS_$(filter %.lds,$(^F))) \
-  -Wl,-T,$(filter %.lds,$^) $(filter %.o,$^) && \
+  -T $(filter %.lds,$^) $(filter %.o,$^) && \
 sh $(srctree)/$(src)/checkundef.sh '$(NM)' '$@'
 
-VDSO_LDFLAGS = -fPIC -shared $(call cc-ldoption, -Wl$(comma)--hash-style=both) 
\
-   $(call cc-ldoption, -Wl$(comma)--build-id) -Wl,-Bsymbolic $(LTO_CFLAGS)
+VDSO_LDFLAGS = -shared $(call ld-option, --hash-style=both) \
+   $(call ld-option, --build-id) -Bsymbolic
 GCOV_PROFILE := n
 
 #


Re: native_save_fl() causes a warning

2018-08-03 Thread Alistair Strachan
On Fri, Aug 3, 2018 at 9:38 AM Nick Desaulniers  wrote:
>
> On Fri, Aug 3, 2018 at 6:10 AM Jean Delvare  wrote:
> >
> > Hi Nick,
> >
> > It seems that this linux kernel commit of yours:
> >
> > commit d0a8d9378d16eb3c69bd8e6d23779fbdbee3a8c7
> > Author: Nick Desaulniers
> > Date:   Thu Jun 21 09:23:24 2018 -0700
> >
> > x86/paravirt: Make native_save_fl() extern inline
> >
> > introduced a new warning (with W=1):
> >
> > ./arch/x86/include/asm/irqflags.h:16:29: warning: no previous prototype for 
> > ‘native_save_fl’ [-Wmissing-prototypes]
> >  extern inline unsigned long native_save_fl(void)
> >  ^
> >
> > Please fix it.
>
> Hi Jean, thanks for the report.  David Laight also reported this
> warning; he tested a patch I sent him overnight.
>
> Let me guess, you're using a version of GCC < 4.9?  It seems that GCC
> < 4.9 will produce that warning for extern inline functions without
> previous declarations.
>
> I'll add your Reported-By tag to the patch that I will send out in a
> few minutes.
>
> > Secondly, I am quite curious why you changed only native_save_fl() from
> > static inline to extern inline, when native_restore_fl(),
> > native_irq_disable() and native_irq_enable() are equally referenced by
> > address in arch/x86/kernel/paravirt.c and thus should suffer from the
> > same problem. Can you explain?
>
> This is a good point.  With native_save_fl, we were not able to boot
> the kernel at all.  Maybe this was called from the boot sequence
> (maybe Juergen knows more)?  It seems that the other functions aren't
> preventing us from booting, but maybe exercising other paths in
> paravirt would expose such an issue?  Or maybe paravirt doesn't have
> the same calling convention requirements for those functions?

The core issue these patches worked around was the automatic/heuristic
generation of stack guard code by clang, which ended up clobbering
%ecx/%rcx in a way not expected by the contract of the paravirt code.
The only function affected by this problem was native_save_fl(),
because only it has a C stack (and thus, has a stack that requires
guarding).

The other functions could have been converted at the same time, and
they will have to be converted if (down the line) somebody adds C
stack variables to them. But, for now, the patch series seems to
correctly work around this issue.

> Is there a standard testing procedure for paravirt? I'd be happy to
> try it to see if we can expose more things that should have the same
> cleanup.

This bug was so obvious you just enabled CONFIG_PARAVIRT, built with
CC=clang, and booted the x86_64 bzImage on any qemu. The minute the
paravirt alternatives were patched in, it exploded.

> --
> Thanks,
> ~Nick Desaulniers


Re: native_save_fl() causes a warning

2018-08-03 Thread Alistair Strachan
On Fri, Aug 3, 2018 at 9:38 AM Nick Desaulniers  wrote:
>
> On Fri, Aug 3, 2018 at 6:10 AM Jean Delvare  wrote:
> >
> > Hi Nick,
> >
> > It seems that this linux kernel commit of yours:
> >
> > commit d0a8d9378d16eb3c69bd8e6d23779fbdbee3a8c7
> > Author: Nick Desaulniers
> > Date:   Thu Jun 21 09:23:24 2018 -0700
> >
> > x86/paravirt: Make native_save_fl() extern inline
> >
> > introduced a new warning (with W=1):
> >
> > ./arch/x86/include/asm/irqflags.h:16:29: warning: no previous prototype for 
> > ‘native_save_fl’ [-Wmissing-prototypes]
> >  extern inline unsigned long native_save_fl(void)
> >  ^
> >
> > Please fix it.
>
> Hi Jean, thanks for the report.  David Laight also reported this
> warning; he tested a patch I sent him overnight.
>
> Let me guess, you're using a version of GCC < 4.9?  It seems that GCC
> < 4.9 will produce that warning for extern inline functions without
> previous declarations.
>
> I'll add your Reported-By tag to the patch that I will send out in a
> few minutes.
>
> > Secondly, I am quite curious why you changed only native_save_fl() from
> > static inline to extern inline, when native_restore_fl(),
> > native_irq_disable() and native_irq_enable() are equally referenced by
> > address in arch/x86/kernel/paravirt.c and thus should suffer from the
> > same problem. Can you explain?
>
> This is a good point.  With native_save_fl, we were not able to boot
> the kernel at all.  Maybe this was called from the boot sequence
> (maybe Juergen knows more)?  It seems that the other functions aren't
> preventing us from booting, but maybe exercising other paths in
> paravirt would expose such an issue?  Or maybe paravirt doesn't have
> the same calling convention requirements for those functions?

The core issue these patches worked around was the automatic/heuristic
generation of stack guard code by clang, which ended up clobbering
%ecx/%rcx in a way not expected by the contract of the paravirt code.
The only function affected by this problem was native_save_fl(),
because only it has a C stack (and thus, has a stack that requires
guarding).

The other functions could have been converted at the same time, and
they will have to be converted if (down the line) somebody adds C
stack variables to them. But, for now, the patch series seems to
correctly work around this issue.

> Is there a standard testing procedure for paravirt? I'd be happy to
> try it to see if we can expose more things that should have the same
> cleanup.

This bug was so obvious you just enabled CONFIG_PARAVIRT, built with
CC=clang, and booted the x86_64 bzImage on any qemu. The minute the
paravirt alternatives were patched in, it exploded.

> --
> Thanks,
> ~Nick Desaulniers


[PATCH v2] x86: vdso: Use $LD instead of $CC to link

2018-08-03 Thread Alistair Strachan
The vdso{32,64}.so can fail to link with CC=clang when clang tries to
find a suitable GCC toolchain to link these libraries with.

/usr/bin/ld: arch/x86/entry/vdso/vclock_gettime.o:
  access beyond end of merged section (782)

This happens because the host environment leaked into the cross
compiler environment due to the way clang searches for suitable GCC
toolchains.

Clang is a retargetable compiler, and each invocation of it must
provide --target= --gcc-toolchain= to allow it
to find the correct binutils for cross compilation. These flags had
been added to KBUILD_CFLAGS, but the vdso code uses CC and not
KBUILD_CFLAGS (for various reasons) which breaks clang's ability to
find the correct linker when cross compiling.

Most of the time this goes unnoticed because the host linker is new
enough to work anyway, or is incompatible and skipped, but this cannot
be reliably assumed.

This change alters the vdso makefile to just use LD directly, which
bypasses clang and thus the searching problem. The makefile will just
use ${CROSS_COMPILE}ld instead, which is always what we want. This
matches the method used to link vmlinux.

This change drops references to DISABLE_LTO; this option doesn't seem to
be set anywhere, and not knowing what its possible values are, it's not
clear how to convert it from CC to LD flag.

Cc: Andy Lutomirski 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: Greg Kroah-Hartman 
Cc: x...@kernel.org
Cc: kernel-t...@android.com
Cc: j...@joelfernandes.org
Signed-off-by: Alistair Strachan 
Acked-by: Andy Lutomirski 
---
v2: Updated changelog and rediffed
Supersedes "x86: vdso: Fix leaky vdso link with CC=clang"
 arch/x86/entry/vdso/Makefile | 22 +-
 1 file changed, 9 insertions(+), 13 deletions(-)

diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index 261802b1cc50..42c6c1bea4f4 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -46,10 +46,8 @@ targets += $(vdso_img_sodbg) $(vdso_img-y:%=vdso%.so)
 
 CPPFLAGS_vdso.lds += -P -C
 
-VDSO_LDFLAGS_vdso.lds = -m64 -Wl,-soname=linux-vdso.so.1 \
-   -Wl,--no-undefined \
-   -Wl,-z,max-page-size=4096 -Wl,-z,common-page-size=4096 \
-   $(DISABLE_LTO)
+VDSO_LDFLAGS_vdso.lds = -m elf_x86_64 -soname linux-vdso.so.1 --no-undefined \
+   -z max-page-size=4096 -z common-page-size=4096
 
 $(obj)/vdso64.so.dbg: $(obj)/vdso.lds $(vobjs) FORCE
$(call if_changed,vdso)
@@ -95,10 +93,8 @@ CFLAGS_REMOVE_vvar.o = -pg
 #
 
 CPPFLAGS_vdsox32.lds = $(CPPFLAGS_vdso.lds)
-VDSO_LDFLAGS_vdsox32.lds = -Wl,-m,elf32_x86_64 \
-  -Wl,-soname=linux-vdso.so.1 \
-  -Wl,-z,max-page-size=4096 \
-  -Wl,-z,common-page-size=4096
+VDSO_LDFLAGS_vdsox32.lds = -m elf32_x86_64 -soname linux-vdso.so.1 \
+  -z max-page-size=4096 -z common-page-size=4096
 
 # x32-rebranded versions
 vobjx32s-y := $(vobjs-y:.o=-x32.o)
@@ -123,7 +119,7 @@ $(obj)/vdsox32.so.dbg: $(obj)/vdsox32.lds $(vobjx32s) FORCE
$(call if_changed,vdso)
 
 CPPFLAGS_vdso32.lds = $(CPPFLAGS_vdso.lds)
-VDSO_LDFLAGS_vdso32.lds = -m32 -Wl,-m,elf_i386 -Wl,-soname=linux-gate.so.1
+VDSO_LDFLAGS_vdso32.lds = -m elf_i386 -soname linux-gate.so.1
 
 targets += vdso32/vdso32.lds
 targets += vdso32/note.o vdso32/system_call.o vdso32/sigreturn.o
@@ -157,13 +153,13 @@ $(obj)/vdso32.so.dbg: FORCE \
 # The DSO images are built using a special linker script.
 #
 quiet_cmd_vdso = VDSO$@
-  cmd_vdso = $(CC) -nostdlib -o $@ \
+  cmd_vdso = $(LD) -nostdlib -o $@ \
   $(VDSO_LDFLAGS) $(VDSO_LDFLAGS_$(filter %.lds,$(^F))) \
-  -Wl,-T,$(filter %.lds,$^) $(filter %.o,$^) && \
+  -T $(filter %.lds,$^) $(filter %.o,$^) && \
 sh $(srctree)/$(src)/checkundef.sh '$(NM)' '$@'
 
-VDSO_LDFLAGS = -fPIC -shared $(call cc-ldoption, -Wl$(comma)--hash-style=both) 
\
-   $(call cc-ldoption, -Wl$(comma)--build-id) -Wl,-Bsymbolic $(LTO_CFLAGS)
+VDSO_LDFLAGS = -shared $(call ld-option, --hash-style=both) \
+   $(call ld-option, --build-id) -Bsymbolic
 GCOV_PROFILE := n
 
 #


[PATCH v2] x86: vdso: Use $LD instead of $CC to link

2018-08-03 Thread Alistair Strachan
The vdso{32,64}.so can fail to link with CC=clang when clang tries to
find a suitable GCC toolchain to link these libraries with.

/usr/bin/ld: arch/x86/entry/vdso/vclock_gettime.o:
  access beyond end of merged section (782)

This happens because the host environment leaked into the cross
compiler environment due to the way clang searches for suitable GCC
toolchains.

Clang is a retargetable compiler, and each invocation of it must
provide --target= --gcc-toolchain= to allow it
to find the correct binutils for cross compilation. These flags had
been added to KBUILD_CFLAGS, but the vdso code uses CC and not
KBUILD_CFLAGS (for various reasons) which breaks clang's ability to
find the correct linker when cross compiling.

Most of the time this goes unnoticed because the host linker is new
enough to work anyway, or is incompatible and skipped, but this cannot
be reliably assumed.

This change alters the vdso makefile to just use LD directly, which
bypasses clang and thus the searching problem. The makefile will just
use ${CROSS_COMPILE}ld instead, which is always what we want. This
matches the method used to link vmlinux.

This change drops references to DISABLE_LTO; this option doesn't seem to
be set anywhere, and not knowing what its possible values are, it's not
clear how to convert it from CC to LD flag.

Cc: Andy Lutomirski 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: Greg Kroah-Hartman 
Cc: x...@kernel.org
Cc: kernel-t...@android.com
Cc: j...@joelfernandes.org
Signed-off-by: Alistair Strachan 
Acked-by: Andy Lutomirski 
---
v2: Updated changelog and rediffed
Supersedes "x86: vdso: Fix leaky vdso link with CC=clang"
 arch/x86/entry/vdso/Makefile | 22 +-
 1 file changed, 9 insertions(+), 13 deletions(-)

diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index 261802b1cc50..42c6c1bea4f4 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -46,10 +46,8 @@ targets += $(vdso_img_sodbg) $(vdso_img-y:%=vdso%.so)
 
 CPPFLAGS_vdso.lds += -P -C
 
-VDSO_LDFLAGS_vdso.lds = -m64 -Wl,-soname=linux-vdso.so.1 \
-   -Wl,--no-undefined \
-   -Wl,-z,max-page-size=4096 -Wl,-z,common-page-size=4096 \
-   $(DISABLE_LTO)
+VDSO_LDFLAGS_vdso.lds = -m elf_x86_64 -soname linux-vdso.so.1 --no-undefined \
+   -z max-page-size=4096 -z common-page-size=4096
 
 $(obj)/vdso64.so.dbg: $(obj)/vdso.lds $(vobjs) FORCE
$(call if_changed,vdso)
@@ -95,10 +93,8 @@ CFLAGS_REMOVE_vvar.o = -pg
 #
 
 CPPFLAGS_vdsox32.lds = $(CPPFLAGS_vdso.lds)
-VDSO_LDFLAGS_vdsox32.lds = -Wl,-m,elf32_x86_64 \
-  -Wl,-soname=linux-vdso.so.1 \
-  -Wl,-z,max-page-size=4096 \
-  -Wl,-z,common-page-size=4096
+VDSO_LDFLAGS_vdsox32.lds = -m elf32_x86_64 -soname linux-vdso.so.1 \
+  -z max-page-size=4096 -z common-page-size=4096
 
 # x32-rebranded versions
 vobjx32s-y := $(vobjs-y:.o=-x32.o)
@@ -123,7 +119,7 @@ $(obj)/vdsox32.so.dbg: $(obj)/vdsox32.lds $(vobjx32s) FORCE
$(call if_changed,vdso)
 
 CPPFLAGS_vdso32.lds = $(CPPFLAGS_vdso.lds)
-VDSO_LDFLAGS_vdso32.lds = -m32 -Wl,-m,elf_i386 -Wl,-soname=linux-gate.so.1
+VDSO_LDFLAGS_vdso32.lds = -m elf_i386 -soname linux-gate.so.1
 
 targets += vdso32/vdso32.lds
 targets += vdso32/note.o vdso32/system_call.o vdso32/sigreturn.o
@@ -157,13 +153,13 @@ $(obj)/vdso32.so.dbg: FORCE \
 # The DSO images are built using a special linker script.
 #
 quiet_cmd_vdso = VDSO$@
-  cmd_vdso = $(CC) -nostdlib -o $@ \
+  cmd_vdso = $(LD) -nostdlib -o $@ \
   $(VDSO_LDFLAGS) $(VDSO_LDFLAGS_$(filter %.lds,$(^F))) \
-  -Wl,-T,$(filter %.lds,$^) $(filter %.o,$^) && \
+  -T $(filter %.lds,$^) $(filter %.o,$^) && \
 sh $(srctree)/$(src)/checkundef.sh '$(NM)' '$@'
 
-VDSO_LDFLAGS = -fPIC -shared $(call cc-ldoption, -Wl$(comma)--hash-style=both) 
\
-   $(call cc-ldoption, -Wl$(comma)--build-id) -Wl,-Bsymbolic $(LTO_CFLAGS)
+VDSO_LDFLAGS = -shared $(call ld-option, --hash-style=both) \
+   $(call ld-option, --build-id) -Bsymbolic
 GCOV_PROFILE := n
 
 #


Re: [PATCH] x86: vdso: Use $LD instead of $CC to link

2018-08-03 Thread Alistair Strachan
On Fri, Aug 3, 2018 at 3:26 AM Thomas Gleixner  wrote:
>
> On Wed, 18 Jul 2018, Alistair Strachan wrote:
> >  export CPPFLAGS_vdso.lds += -P -C
> >
> > -VDSO_LDFLAGS_vdso.lds = -m64 -Wl,-soname=linux-vdso.so.1 \
> > - -Wl,--no-undefined \
> > - -Wl,-z,max-page-size=4096 
> > -Wl,-z,common-page-size=4096 \
> > - $(DISABLE_LTO)
> > +VDSO_LDFLAGS_vdso.lds = -m elf_x86_64 -soname linux-vdso.so.1 
> > --no-undefined \
> > + -z max-page-size=4096 -z common-page-size=4096
>
> Aside of the fact that it does not apply to upstream,

It seems to apply fine to Linus's tree (0585df4), but I'll send a v2
which is rediffed.

> why is this dropping
> the $(DISABLE_LTO) part?
>
> The changelog is utterly silent about that.

$ git grep DISABLE_LTO
arch/sparc/vdso/Makefile:KBUILD_CFLAGS += $(DISABLE_LTO)
arch/sparc/vdso/Makefile:   $(DISABLE_LTO)
arch/x86/entry/vdso/Makefile:KBUILD_CFLAGS += $(DISABLE_LTO)
arch/x86/entry/vdso/Makefile:   $(DISABLE_LTO)
kernel/Makefile:CFLAGS_sys_ni.o = $(DISABLE_LTO)
scripts/Makefile.build:cmd_cc_s_c   = $(CC) $(c_flags)
$(DISABLE_LTO) -fverbose-asm -S -o $@ $<

Looks like a dead option to me, but maybe somebody else knows better.
v2 will explain this removal.

> Thanks,
>
> tglx


Re: [PATCH] x86: vdso: Use $LD instead of $CC to link

2018-08-03 Thread Alistair Strachan
On Fri, Aug 3, 2018 at 3:26 AM Thomas Gleixner  wrote:
>
> On Wed, 18 Jul 2018, Alistair Strachan wrote:
> >  export CPPFLAGS_vdso.lds += -P -C
> >
> > -VDSO_LDFLAGS_vdso.lds = -m64 -Wl,-soname=linux-vdso.so.1 \
> > - -Wl,--no-undefined \
> > - -Wl,-z,max-page-size=4096 
> > -Wl,-z,common-page-size=4096 \
> > - $(DISABLE_LTO)
> > +VDSO_LDFLAGS_vdso.lds = -m elf_x86_64 -soname linux-vdso.so.1 
> > --no-undefined \
> > + -z max-page-size=4096 -z common-page-size=4096
>
> Aside of the fact that it does not apply to upstream,

It seems to apply fine to Linus's tree (0585df4), but I'll send a v2
which is rediffed.

> why is this dropping
> the $(DISABLE_LTO) part?
>
> The changelog is utterly silent about that.

$ git grep DISABLE_LTO
arch/sparc/vdso/Makefile:KBUILD_CFLAGS += $(DISABLE_LTO)
arch/sparc/vdso/Makefile:   $(DISABLE_LTO)
arch/x86/entry/vdso/Makefile:KBUILD_CFLAGS += $(DISABLE_LTO)
arch/x86/entry/vdso/Makefile:   $(DISABLE_LTO)
kernel/Makefile:CFLAGS_sys_ni.o = $(DISABLE_LTO)
scripts/Makefile.build:cmd_cc_s_c   = $(CC) $(c_flags)
$(DISABLE_LTO) -fverbose-asm -S -o $@ $<

Looks like a dead option to me, but maybe somebody else knows better.
v2 will explain this removal.

> Thanks,
>
> tglx


[PATCH] x86: vdso: Use $LD instead of $CC to link

2018-07-18 Thread Alistair Strachan
The vdso{32,64}.so can fail to link with CC=clang when clang tries to
find a suitable GCC toolchain to link these libraries with.

/usr/bin/ld: arch/x86/entry/vdso/vclock_gettime.o:
  access beyond end of merged section (782)

This happens because the host environment leaked into the cross
compiler environment due to the way clang searches for suitable GCC
toolchains.

Clang is a retargetable compiler, and each invocation of it must
provide --target= --gcc-toolchain= to allow it
to find the correct binutils for cross compilation. These flags had
been added to KBUILD_CFLAGS, but the vdso code uses CC and not
KBUILD_CFLAGS (for various reasons) which breaks clang's ability to
find the correct linker when cross compiling.

Most of the time this goes unnoticed because the host linker is new
enough to work anyway, or is incompatible and skipped, but this cannot
be reliably assumed.

This change alters the vdso makefile to just use LD directly, which
bypasses clang and thus the searching problem. The makefile will just
use ${CROSS_COMPILE}ld instead, which is always what we want. This
matches the method used to link vmlinux.

Cc: Andy Lutomirski 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: Greg Kroah-Hartman 
Cc: x...@kernel.org
Cc: kernel-t...@android.com
Cc: j...@joelfernandes.com
Signed-off-by: Alistair Strachan 
---
Supersedes "x86: vdso: Fix leaky vdso link with CC=clang"
 arch/x86/entry/vdso/Makefile | 22 +-
 1 file changed, 9 insertions(+), 13 deletions(-)

diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index 1943aebadede..896a55204386 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -48,10 +48,8 @@ targets += $(vdso_img_sodbg)
 
 export CPPFLAGS_vdso.lds += -P -C
 
-VDSO_LDFLAGS_vdso.lds = -m64 -Wl,-soname=linux-vdso.so.1 \
-   -Wl,--no-undefined \
-   -Wl,-z,max-page-size=4096 -Wl,-z,common-page-size=4096 \
-   $(DISABLE_LTO)
+VDSO_LDFLAGS_vdso.lds = -m elf_x86_64 -soname linux-vdso.so.1 --no-undefined \
+   -z max-page-size=4096 -z common-page-size=4096
 
 $(obj)/vdso64.so.dbg: $(src)/vdso.lds $(vobjs) FORCE
$(call if_changed,vdso)
@@ -97,10 +95,8 @@ CFLAGS_REMOVE_vvar.o = -pg
 #
 
 CPPFLAGS_vdsox32.lds = $(CPPFLAGS_vdso.lds)
-VDSO_LDFLAGS_vdsox32.lds = -Wl,-m,elf32_x86_64 \
-  -Wl,-soname=linux-vdso.so.1 \
-  -Wl,-z,max-page-size=4096 \
-  -Wl,-z,common-page-size=4096
+VDSO_LDFLAGS_vdsox32.lds = -m elf32_x86_64 -soname linux-vdso.so.1 \
+  -z max-page-size=4096 -z common-page-size=4096
 
 # 64-bit objects to re-brand as x32
 vobjs64-for-x32 := $(filter-out $(vobjs-nox32),$(vobjs-y))
@@ -128,7 +124,7 @@ $(obj)/vdsox32.so.dbg: $(src)/vdsox32.lds $(vobjx32s) FORCE
$(call if_changed,vdso)
 
 CPPFLAGS_vdso32.lds = $(CPPFLAGS_vdso.lds)
-VDSO_LDFLAGS_vdso32.lds = -m32 -Wl,-m,elf_i386 -Wl,-soname=linux-gate.so.1
+VDSO_LDFLAGS_vdso32.lds = -m elf_i386 -soname linux-gate.so.1
 
 targets += vdso32/vdso32.lds
 targets += vdso32/note.o vdso32/system_call.o vdso32/sigreturn.o
@@ -162,13 +158,13 @@ $(obj)/vdso32.so.dbg: FORCE \
 # The DSO images are built using a special linker script.
 #
 quiet_cmd_vdso = VDSO$@
-  cmd_vdso = $(CC) -nostdlib -o $@ \
+  cmd_vdso = $(LD) -nostdlib -o $@ \
   $(VDSO_LDFLAGS) $(VDSO_LDFLAGS_$(filter %.lds,$(^F))) \
-  -Wl,-T,$(filter %.lds,$^) $(filter %.o,$^) && \
+  -T $(filter %.lds,$^) $(filter %.o,$^) && \
 sh $(srctree)/$(src)/checkundef.sh '$(NM)' '$@'
 
-VDSO_LDFLAGS = -fPIC -shared $(call cc-ldoption, -Wl$(comma)--hash-style=both) 
\
-   $(call cc-ldoption, -Wl$(comma)--build-id) -Wl,-Bsymbolic $(LTO_CFLAGS)
+VDSO_LDFLAGS = -shared $(call ld-option, --hash-style=both) \
+   $(call ld-option, --build-id) -Bsymbolic
 GCOV_PROFILE := n
 
 #


[PATCH] x86: vdso: Use $LD instead of $CC to link

2018-07-18 Thread Alistair Strachan
The vdso{32,64}.so can fail to link with CC=clang when clang tries to
find a suitable GCC toolchain to link these libraries with.

/usr/bin/ld: arch/x86/entry/vdso/vclock_gettime.o:
  access beyond end of merged section (782)

This happens because the host environment leaked into the cross
compiler environment due to the way clang searches for suitable GCC
toolchains.

Clang is a retargetable compiler, and each invocation of it must
provide --target= --gcc-toolchain= to allow it
to find the correct binutils for cross compilation. These flags had
been added to KBUILD_CFLAGS, but the vdso code uses CC and not
KBUILD_CFLAGS (for various reasons) which breaks clang's ability to
find the correct linker when cross compiling.

Most of the time this goes unnoticed because the host linker is new
enough to work anyway, or is incompatible and skipped, but this cannot
be reliably assumed.

This change alters the vdso makefile to just use LD directly, which
bypasses clang and thus the searching problem. The makefile will just
use ${CROSS_COMPILE}ld instead, which is always what we want. This
matches the method used to link vmlinux.

Cc: Andy Lutomirski 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: Greg Kroah-Hartman 
Cc: x...@kernel.org
Cc: kernel-t...@android.com
Cc: j...@joelfernandes.com
Signed-off-by: Alistair Strachan 
---
Supersedes "x86: vdso: Fix leaky vdso link with CC=clang"
 arch/x86/entry/vdso/Makefile | 22 +-
 1 file changed, 9 insertions(+), 13 deletions(-)

diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index 1943aebadede..896a55204386 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -48,10 +48,8 @@ targets += $(vdso_img_sodbg)
 
 export CPPFLAGS_vdso.lds += -P -C
 
-VDSO_LDFLAGS_vdso.lds = -m64 -Wl,-soname=linux-vdso.so.1 \
-   -Wl,--no-undefined \
-   -Wl,-z,max-page-size=4096 -Wl,-z,common-page-size=4096 \
-   $(DISABLE_LTO)
+VDSO_LDFLAGS_vdso.lds = -m elf_x86_64 -soname linux-vdso.so.1 --no-undefined \
+   -z max-page-size=4096 -z common-page-size=4096
 
 $(obj)/vdso64.so.dbg: $(src)/vdso.lds $(vobjs) FORCE
$(call if_changed,vdso)
@@ -97,10 +95,8 @@ CFLAGS_REMOVE_vvar.o = -pg
 #
 
 CPPFLAGS_vdsox32.lds = $(CPPFLAGS_vdso.lds)
-VDSO_LDFLAGS_vdsox32.lds = -Wl,-m,elf32_x86_64 \
-  -Wl,-soname=linux-vdso.so.1 \
-  -Wl,-z,max-page-size=4096 \
-  -Wl,-z,common-page-size=4096
+VDSO_LDFLAGS_vdsox32.lds = -m elf32_x86_64 -soname linux-vdso.so.1 \
+  -z max-page-size=4096 -z common-page-size=4096
 
 # 64-bit objects to re-brand as x32
 vobjs64-for-x32 := $(filter-out $(vobjs-nox32),$(vobjs-y))
@@ -128,7 +124,7 @@ $(obj)/vdsox32.so.dbg: $(src)/vdsox32.lds $(vobjx32s) FORCE
$(call if_changed,vdso)
 
 CPPFLAGS_vdso32.lds = $(CPPFLAGS_vdso.lds)
-VDSO_LDFLAGS_vdso32.lds = -m32 -Wl,-m,elf_i386 -Wl,-soname=linux-gate.so.1
+VDSO_LDFLAGS_vdso32.lds = -m elf_i386 -soname linux-gate.so.1
 
 targets += vdso32/vdso32.lds
 targets += vdso32/note.o vdso32/system_call.o vdso32/sigreturn.o
@@ -162,13 +158,13 @@ $(obj)/vdso32.so.dbg: FORCE \
 # The DSO images are built using a special linker script.
 #
 quiet_cmd_vdso = VDSO$@
-  cmd_vdso = $(CC) -nostdlib -o $@ \
+  cmd_vdso = $(LD) -nostdlib -o $@ \
   $(VDSO_LDFLAGS) $(VDSO_LDFLAGS_$(filter %.lds,$(^F))) \
-  -Wl,-T,$(filter %.lds,$^) $(filter %.o,$^) && \
+  -T $(filter %.lds,$^) $(filter %.o,$^) && \
 sh $(srctree)/$(src)/checkundef.sh '$(NM)' '$@'
 
-VDSO_LDFLAGS = -fPIC -shared $(call cc-ldoption, -Wl$(comma)--hash-style=both) 
\
-   $(call cc-ldoption, -Wl$(comma)--build-id) -Wl,-Bsymbolic $(LTO_CFLAGS)
+VDSO_LDFLAGS = -shared $(call ld-option, --hash-style=both) \
+   $(call ld-option, --build-id) -Bsymbolic
 GCOV_PROFILE := n
 
 #


Re: WARNING: lock held when returning to user space in fuse_lock_inode

2018-07-17 Thread Alistair Strachan
On Tue, Jul 17, 2018 at 5:46 AM Miklos Szeredi  wrote:
> On Tue, Jul 17, 2018 at 1:36 PM, Dmitry Vyukov  wrote:
> > On Tue, Jul 17, 2018 at 1:14 PM, Miklos Szeredi  wrote:
> >> On Thu, Jul 12, 2018 at 5:49 PM, syzbot
> >>  wrote:
> >>> Hello,
> >>>
> >>> syzbot found the following crash on:
> >>>
> >>> HEAD commit:c25c74b7476e Merge tag 'trace-v4.18-rc3-2' of 
> >>> git://git.ke..
> >>> git tree:   upstream
> >>> console output: https://syzkaller.appspot.com/x/log.txt?x=177bcec240
> >>> kernel config:  https://syzkaller.appspot.com/x/.config?x=25856fac4e580aa7
> >>> dashboard link: 
> >>> https://syzkaller.appspot.com/bug?extid=3f7b29af1baa9d0a55be
> >>> compiler:   gcc (GCC) 8.0.1 20180413 (experimental)
> >>> syzkaller repro:https://syzkaller.appspot.com/x/repro.syz?x=13aa767840
> >>> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=1749267840
> >>>
> >>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> >>> Reported-by: syzbot+3f7b29af1baa9d0a5...@syzkaller.appspotmail.com
> >>>
> >>> random: sshd: uninitialized urandom read (32 bytes read)
> >>> random: sshd: uninitialized urandom read (32 bytes read)
> >>> random: sshd: uninitialized urandom read (32 bytes read)
> >>>
> >>> 
> >>> WARNING: lock held when returning to user space!
> >>> 4.18.0-rc4+ #143 Not tainted
> >>> 
> >>> syz-executor012/4539 is leaving the kernel with locks still held!
> >>> 1 lock held by syz-executor012/4539:
> >>>  #0: (ptrval) (>mutex){+.+.}, at: fuse_lock_inode+0xaf/0xe0
> >>> fs/fuse/inode.c:363
> >>
> >> False positive.
> >>
> >> fi->mutex is definitely not held by the acquiring task when returning
> >> to userspace.  Maybe syzkaller is confused by the fact that there are
> >> several interdependent tasks involved with fuse:  the one calling into
> >> fuse by doing something (looking up ./file0/file0) and the one that
> >> reads the fuse device (returning with the LOOKUP request for "file0").
> >> The second one will return with that lock held, but it's not the one
> >> that acquired it, so there's no bug at all here.
> >
> > Hi Miklos,
> >
> > syzkaller is unrelated here. That's what kernel self-detects and
> > prints. So either way there is something to fix in kernel here: either
> > fuse or lockdep.
> >
> > +Alistair did some analysis offline, hope you don't mind if I repost
> > your description:
> > ===
> > Just from reading the code, I think I can see how this happens. Fuse
> > is wrapping its inode mutex with a check for "parallel_dirops", which
> > is set up in process_init_reply(). The FUSE_PARALLEL_DIROPS appears to
> > always be set, in fuse_send_init(), but its initial state is to be
> > disabled. So if the mutex gets taken, and it'll never be unlocked if
> > the initial command is flushed by fuse_readdir()'s use of
> > fuse_lock_inode().
> > ===
>
> Ah, indeed.  Fix attached.

Looks good to me.

Tested-by: Alistair Strachan 

> Thanks,
> Miklos


Re: WARNING: lock held when returning to user space in fuse_lock_inode

2018-07-17 Thread Alistair Strachan
On Tue, Jul 17, 2018 at 5:46 AM Miklos Szeredi  wrote:
> On Tue, Jul 17, 2018 at 1:36 PM, Dmitry Vyukov  wrote:
> > On Tue, Jul 17, 2018 at 1:14 PM, Miklos Szeredi  wrote:
> >> On Thu, Jul 12, 2018 at 5:49 PM, syzbot
> >>  wrote:
> >>> Hello,
> >>>
> >>> syzbot found the following crash on:
> >>>
> >>> HEAD commit:c25c74b7476e Merge tag 'trace-v4.18-rc3-2' of 
> >>> git://git.ke..
> >>> git tree:   upstream
> >>> console output: https://syzkaller.appspot.com/x/log.txt?x=177bcec240
> >>> kernel config:  https://syzkaller.appspot.com/x/.config?x=25856fac4e580aa7
> >>> dashboard link: 
> >>> https://syzkaller.appspot.com/bug?extid=3f7b29af1baa9d0a55be
> >>> compiler:   gcc (GCC) 8.0.1 20180413 (experimental)
> >>> syzkaller repro:https://syzkaller.appspot.com/x/repro.syz?x=13aa767840
> >>> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=1749267840
> >>>
> >>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> >>> Reported-by: syzbot+3f7b29af1baa9d0a5...@syzkaller.appspotmail.com
> >>>
> >>> random: sshd: uninitialized urandom read (32 bytes read)
> >>> random: sshd: uninitialized urandom read (32 bytes read)
> >>> random: sshd: uninitialized urandom read (32 bytes read)
> >>>
> >>> 
> >>> WARNING: lock held when returning to user space!
> >>> 4.18.0-rc4+ #143 Not tainted
> >>> 
> >>> syz-executor012/4539 is leaving the kernel with locks still held!
> >>> 1 lock held by syz-executor012/4539:
> >>>  #0: (ptrval) (>mutex){+.+.}, at: fuse_lock_inode+0xaf/0xe0
> >>> fs/fuse/inode.c:363
> >>
> >> False positive.
> >>
> >> fi->mutex is definitely not held by the acquiring task when returning
> >> to userspace.  Maybe syzkaller is confused by the fact that there are
> >> several interdependent tasks involved with fuse:  the one calling into
> >> fuse by doing something (looking up ./file0/file0) and the one that
> >> reads the fuse device (returning with the LOOKUP request for "file0").
> >> The second one will return with that lock held, but it's not the one
> >> that acquired it, so there's no bug at all here.
> >
> > Hi Miklos,
> >
> > syzkaller is unrelated here. That's what kernel self-detects and
> > prints. So either way there is something to fix in kernel here: either
> > fuse or lockdep.
> >
> > +Alistair did some analysis offline, hope you don't mind if I repost
> > your description:
> > ===
> > Just from reading the code, I think I can see how this happens. Fuse
> > is wrapping its inode mutex with a check for "parallel_dirops", which
> > is set up in process_init_reply(). The FUSE_PARALLEL_DIROPS appears to
> > always be set, in fuse_send_init(), but its initial state is to be
> > disabled. So if the mutex gets taken, and it'll never be unlocked if
> > the initial command is flushed by fuse_readdir()'s use of
> > fuse_lock_inode().
> > ===
>
> Ah, indeed.  Fix attached.

Looks good to me.

Tested-by: Alistair Strachan 

> Thanks,
> Miklos


Re: [PATCH] x86: vdso: Fix leaky vdso link with CC=clang

2018-07-12 Thread Alistair Strachan
On Thu, Jul 12, 2018 at 4:20 PM Andy Lutomirski  wrote:
>
> > On Jul 12, 2018, at 3:06 PM, H. Peter Anvin  wrote:
> >
> >> On 07/12/18 13:37, Alistair Strachan wrote:
> >>> On Thu, Jul 12, 2018 at 1:25 PM H. Peter Anvin  wrote:
> >>>> On 07/12/18 13:10, Alistair Strachan wrote:
> >>>> The vdso{32,64}.so can fail to link with CC=clang when clang tries to
> >>>> find a suitable GCC toolchain to link these libraries with.
> >>>>
> >>>> /usr/bin/ld: arch/x86/entry/vdso/vclock_gettime.o:
> >>>>  access beyond end of merged section (782)
> >>>>
> >>>> This happens because the host environment leaked into the cross
> >>>> compiler environment due to the way clang searches for suitable GCC
> >>>> toolchains.
> >>>>
> >>>
> >>> Is this another clang bug that you want a workaround for in the kernel?
> >>
> >> Clang is a retargetable compiler (specified with --target=)
> >> and so it has a mechanism for searching for suitable binutils (from
> >> another "GCC toolchain") to perform assembly and linkage. This
> >> mechanism relies on both --target and --gcc-toolchain when
> >> cross-compiling, otherwise it will fall back to searching /usr.
> >>
> >> The --target and --gcc-toolchain flags are already specified correctly
> >> in the top level Makefile, but the vdso Makefile rolls its own linker
> >> flags and doesn't use KBUILD_CFLAGS. Therefore, these flags get
> >> incorrectly dropped from the vdso $CC link command line, and an
> >> inconsistency is created between the "GCC toolchain" used to generate
> >> the objects for the vdso, and the linker used to link them.
> >>
> >
> > It sounds like there needs to be a more fundamental symbol than
> > KBUILD_CFLAGS to contain these kinds of things.
>
> How about $(CC)?

I'm guessing, but I think this wasn't done originally because CC is
something the user might reasonably specify on the command line (the
other bit comes from CROSS_COMPILE), so doing this via CC would
require us to override the CC passed in on the command line. Not sure
how to do that, since the vdso makefile is executed with a submake, so
the usual "override CC := $(CC) something else" followed by "export
CC" doesn't work.


Re: [PATCH] x86: vdso: Fix leaky vdso link with CC=clang

2018-07-12 Thread Alistair Strachan
On Thu, Jul 12, 2018 at 4:20 PM Andy Lutomirski  wrote:
>
> > On Jul 12, 2018, at 3:06 PM, H. Peter Anvin  wrote:
> >
> >> On 07/12/18 13:37, Alistair Strachan wrote:
> >>> On Thu, Jul 12, 2018 at 1:25 PM H. Peter Anvin  wrote:
> >>>> On 07/12/18 13:10, Alistair Strachan wrote:
> >>>> The vdso{32,64}.so can fail to link with CC=clang when clang tries to
> >>>> find a suitable GCC toolchain to link these libraries with.
> >>>>
> >>>> /usr/bin/ld: arch/x86/entry/vdso/vclock_gettime.o:
> >>>>  access beyond end of merged section (782)
> >>>>
> >>>> This happens because the host environment leaked into the cross
> >>>> compiler environment due to the way clang searches for suitable GCC
> >>>> toolchains.
> >>>>
> >>>
> >>> Is this another clang bug that you want a workaround for in the kernel?
> >>
> >> Clang is a retargetable compiler (specified with --target=)
> >> and so it has a mechanism for searching for suitable binutils (from
> >> another "GCC toolchain") to perform assembly and linkage. This
> >> mechanism relies on both --target and --gcc-toolchain when
> >> cross-compiling, otherwise it will fall back to searching /usr.
> >>
> >> The --target and --gcc-toolchain flags are already specified correctly
> >> in the top level Makefile, but the vdso Makefile rolls its own linker
> >> flags and doesn't use KBUILD_CFLAGS. Therefore, these flags get
> >> incorrectly dropped from the vdso $CC link command line, and an
> >> inconsistency is created between the "GCC toolchain" used to generate
> >> the objects for the vdso, and the linker used to link them.
> >>
> >
> > It sounds like there needs to be a more fundamental symbol than
> > KBUILD_CFLAGS to contain these kinds of things.
>
> How about $(CC)?

I'm guessing, but I think this wasn't done originally because CC is
something the user might reasonably specify on the command line (the
other bit comes from CROSS_COMPILE), so doing this via CC would
require us to override the CC passed in on the command line. Not sure
how to do that, since the vdso makefile is executed with a submake, so
the usual "override CC := $(CC) something else" followed by "export
CC" doesn't work.


Re: [PATCH] x86: vdso: Fix leaky vdso link with CC=clang

2018-07-12 Thread Alistair Strachan
On Thu, Jul 12, 2018 at 1:25 PM H. Peter Anvin  wrote:
> On 07/12/18 13:10, Alistair Strachan wrote:
> > The vdso{32,64}.so can fail to link with CC=clang when clang tries to
> > find a suitable GCC toolchain to link these libraries with.
> >
> > /usr/bin/ld: arch/x86/entry/vdso/vclock_gettime.o:
> >   access beyond end of merged section (782)
> >
> > This happens because the host environment leaked into the cross
> > compiler environment due to the way clang searches for suitable GCC
> > toolchains.
> >
>
> Is this another clang bug that you want a workaround for in the kernel?

Clang is a retargetable compiler (specified with --target=)
and so it has a mechanism for searching for suitable binutils (from
another "GCC toolchain") to perform assembly and linkage. This
mechanism relies on both --target and --gcc-toolchain when
cross-compiling, otherwise it will fall back to searching /usr.

The --target and --gcc-toolchain flags are already specified correctly
in the top level Makefile, but the vdso Makefile rolls its own linker
flags and doesn't use KBUILD_CFLAGS. Therefore, these flags get
incorrectly dropped from the vdso $CC link command line, and an
inconsistency is created between the "GCC toolchain" used to generate
the objects for the vdso, and the linker used to link them.


Re: [PATCH] x86: vdso: Fix leaky vdso link with CC=clang

2018-07-12 Thread Alistair Strachan
On Thu, Jul 12, 2018 at 1:25 PM H. Peter Anvin  wrote:
> On 07/12/18 13:10, Alistair Strachan wrote:
> > The vdso{32,64}.so can fail to link with CC=clang when clang tries to
> > find a suitable GCC toolchain to link these libraries with.
> >
> > /usr/bin/ld: arch/x86/entry/vdso/vclock_gettime.o:
> >   access beyond end of merged section (782)
> >
> > This happens because the host environment leaked into the cross
> > compiler environment due to the way clang searches for suitable GCC
> > toolchains.
> >
>
> Is this another clang bug that you want a workaround for in the kernel?

Clang is a retargetable compiler (specified with --target=)
and so it has a mechanism for searching for suitable binutils (from
another "GCC toolchain") to perform assembly and linkage. This
mechanism relies on both --target and --gcc-toolchain when
cross-compiling, otherwise it will fall back to searching /usr.

The --target and --gcc-toolchain flags are already specified correctly
in the top level Makefile, but the vdso Makefile rolls its own linker
flags and doesn't use KBUILD_CFLAGS. Therefore, these flags get
incorrectly dropped from the vdso $CC link command line, and an
inconsistency is created between the "GCC toolchain" used to generate
the objects for the vdso, and the linker used to link them.


[PATCH] x86: vdso: Fix leaky vdso link with CC=clang

2018-07-12 Thread Alistair Strachan
The vdso{32,64}.so can fail to link with CC=clang when clang tries to
find a suitable GCC toolchain to link these libraries with.

/usr/bin/ld: arch/x86/entry/vdso/vclock_gettime.o:
  access beyond end of merged section (782)

This happens because the host environment leaked into the cross
compiler environment due to the way clang searches for suitable GCC
toolchains.

Most of the time this goes unnoticed because the host linker is new
enough to work anyway, but this cannot be reliably assumed.

Extract the needed --target and --gcc-toolchain flags added in the top
level Makefile from KBUILD_CFLAGS. These flags direct clang to the
appropriate toolchain to link the vdsos.

Cc: Andy Lutomirski 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: Greg Kroah-Hartman 
Cc: x...@kernel.org
Cc: kernel-t...@android.com
Cc: j...@joelfernandes.com
Signed-off-by: Alistair Strachan 
---
 arch/x86/entry/vdso/Makefile | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index 261802b1cc50..062ee94d2d26 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -163,7 +163,8 @@ quiet_cmd_vdso = VDSO$@
 sh $(srctree)/$(src)/checkundef.sh '$(NM)' '$@'
 
 VDSO_LDFLAGS = -fPIC -shared $(call cc-ldoption, -Wl$(comma)--hash-style=both) 
\
-   $(call cc-ldoption, -Wl$(comma)--build-id) -Wl,-Bsymbolic $(LTO_CFLAGS)
+   $(call cc-ldoption, -Wl$(comma)--build-id) -Wl,-Bsymbolic $(LTO_CFLAGS) 
\
+   $(filter --target=% --gcc-toolchain=%,$(KBUILD_CFLAGS))
 GCOV_PROFILE := n
 
 #


[PATCH] x86: vdso: Fix leaky vdso link with CC=clang

2018-07-12 Thread Alistair Strachan
The vdso{32,64}.so can fail to link with CC=clang when clang tries to
find a suitable GCC toolchain to link these libraries with.

/usr/bin/ld: arch/x86/entry/vdso/vclock_gettime.o:
  access beyond end of merged section (782)

This happens because the host environment leaked into the cross
compiler environment due to the way clang searches for suitable GCC
toolchains.

Most of the time this goes unnoticed because the host linker is new
enough to work anyway, but this cannot be reliably assumed.

Extract the needed --target and --gcc-toolchain flags added in the top
level Makefile from KBUILD_CFLAGS. These flags direct clang to the
appropriate toolchain to link the vdsos.

Cc: Andy Lutomirski 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: Greg Kroah-Hartman 
Cc: x...@kernel.org
Cc: kernel-t...@android.com
Cc: j...@joelfernandes.com
Signed-off-by: Alistair Strachan 
---
 arch/x86/entry/vdso/Makefile | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index 261802b1cc50..062ee94d2d26 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -163,7 +163,8 @@ quiet_cmd_vdso = VDSO$@
 sh $(srctree)/$(src)/checkundef.sh '$(NM)' '$@'
 
 VDSO_LDFLAGS = -fPIC -shared $(call cc-ldoption, -Wl$(comma)--hash-style=both) 
\
-   $(call cc-ldoption, -Wl$(comma)--build-id) -Wl,-Bsymbolic $(LTO_CFLAGS)
+   $(call cc-ldoption, -Wl$(comma)--build-id) -Wl,-Bsymbolic $(LTO_CFLAGS) 
\
+   $(filter --target=% --gcc-toolchain=%,$(KBUILD_CFLAGS))
 GCOV_PROFILE := n
 
 #


Re: [PATCH] staging: android/vsoc: stop using 'timespec'

2018-06-21 Thread Alistair Strachan
On Mon, Jun 18, 2018 at 8:09 AM Arnd Bergmann  wrote:
>
> The timespec structure suffers from the y2038 overflow and should not
> be used. This changes handle_vsoc_cond_wait() to use ktime_t directly.
>
> Signed-off-by: Arnd Bergmann 

Tested-by: Alistair Strachan 

> ---
>  drivers/staging/android/vsoc.c | 11 +--
>  1 file changed, 5 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/staging/android/vsoc.c b/drivers/staging/android/vsoc.c
> index 806beda1040b..22571abcaa4e 100644
> --- a/drivers/staging/android/vsoc.c
> +++ b/drivers/staging/android/vsoc.c
> @@ -405,7 +405,7 @@ static int handle_vsoc_cond_wait(struct file *filp, 
> struct vsoc_cond_wait *arg)
> int ret = 0;
> struct vsoc_device_region *region_p = vsoc_region_from_filep(filp);
> atomic_t *address = NULL;
> -   struct timespec ts;
> +   ktime_t wake_time;
>
> /* Ensure that the offset is aligned */
> if (arg->offset & (sizeof(uint32_t) - 1))
> @@ -433,14 +433,13 @@ static int handle_vsoc_cond_wait(struct file *filp, 
> struct vsoc_cond_wait *arg)
>  * We do things this way to flatten differences between 32 bit
>  * and 64 bit timespecs.
>  */
> -   ts.tv_sec = arg->wake_time_sec;
> -   ts.tv_nsec = arg->wake_time_nsec;
> -
> -   if (!timespec_valid())
> +   if (arg->wake_time_nsec >= NSEC_PER_SEC)
> return -EINVAL;
> +   wake_time = ktime_set(arg->wake_time_sec, 
> arg->wake_time_nsec);
> +
> hrtimer_init_on_stack(>timer, CLOCK_MONOTONIC,
>   HRTIMER_MODE_ABS);
> -   hrtimer_set_expires_range_ns(>timer, 
> timespec_to_ktime(ts),
> +   hrtimer_set_expires_range_ns(>timer, wake_time,
>  current->timer_slack_ns);
>
> hrtimer_init_sleeper(to, current);
> --
> 2.9.0
>


Re: [PATCH] staging: android/vsoc: stop using 'timespec'

2018-06-21 Thread Alistair Strachan
On Mon, Jun 18, 2018 at 8:09 AM Arnd Bergmann  wrote:
>
> The timespec structure suffers from the y2038 overflow and should not
> be used. This changes handle_vsoc_cond_wait() to use ktime_t directly.
>
> Signed-off-by: Arnd Bergmann 

Tested-by: Alistair Strachan 

> ---
>  drivers/staging/android/vsoc.c | 11 +--
>  1 file changed, 5 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/staging/android/vsoc.c b/drivers/staging/android/vsoc.c
> index 806beda1040b..22571abcaa4e 100644
> --- a/drivers/staging/android/vsoc.c
> +++ b/drivers/staging/android/vsoc.c
> @@ -405,7 +405,7 @@ static int handle_vsoc_cond_wait(struct file *filp, 
> struct vsoc_cond_wait *arg)
> int ret = 0;
> struct vsoc_device_region *region_p = vsoc_region_from_filep(filp);
> atomic_t *address = NULL;
> -   struct timespec ts;
> +   ktime_t wake_time;
>
> /* Ensure that the offset is aligned */
> if (arg->offset & (sizeof(uint32_t) - 1))
> @@ -433,14 +433,13 @@ static int handle_vsoc_cond_wait(struct file *filp, 
> struct vsoc_cond_wait *arg)
>  * We do things this way to flatten differences between 32 bit
>  * and 64 bit timespecs.
>  */
> -   ts.tv_sec = arg->wake_time_sec;
> -   ts.tv_nsec = arg->wake_time_nsec;
> -
> -   if (!timespec_valid())
> +   if (arg->wake_time_nsec >= NSEC_PER_SEC)
> return -EINVAL;
> +   wake_time = ktime_set(arg->wake_time_sec, 
> arg->wake_time_nsec);
> +
> hrtimer_init_on_stack(>timer, CLOCK_MONOTONIC,
>   HRTIMER_MODE_ABS);
> -   hrtimer_set_expires_range_ns(>timer, 
> timespec_to_ktime(ts),
> +   hrtimer_set_expires_range_ns(>timer, wake_time,
>  current->timer_slack_ns);
>
> hrtimer_init_sleeper(to, current);
> --
> 2.9.0
>


[PATCH 2/2 v2] staging: android: ashmem: Fix mmap size validation

2018-06-19 Thread Alistair Strachan
The ashmem driver did not check that the size/offset of the vma passed
to its .mmap() function was not larger than the ashmem object being
mapped. This could cause mmap() to succeed, even though accessing parts
of the mapping would later fail with a segmentation fault.

Ensure an error is returned by the ashmem_mmap() function if the vma
size is larger than the ashmem object size. This enables safer handling
of the problem in userspace.

Cc: Greg Kroah-Hartman 
Cc: Arve Hjønnevåg 
Cc: Todd Kjos 
Cc: Martijn Coenen 
Cc: de...@driverdev.osuosl.org
Cc: linux-kernel@vger.kernel.org
Cc: kernel-t...@android.com
Cc: Joel Fernandes 
Signed-off-by: Alistair Strachan 
---
v2: Removed unnecessary use of unlikely() macro

 drivers/staging/android/ashmem.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/staging/android/ashmem.c b/drivers/staging/android/ashmem.c
index c6386e4f5c9b..e392358ec244 100644
--- a/drivers/staging/android/ashmem.c
+++ b/drivers/staging/android/ashmem.c
@@ -366,6 +366,12 @@ static int ashmem_mmap(struct file *file, struct 
vm_area_struct *vma)
goto out;
}
 
+   /* requested mapping size larger than object size */
+   if (vma->vm_end - vma->vm_start > PAGE_ALIGN(asma->size)) {
+   ret = -EINVAL;
+   goto out;
+   }
+
/* requested protection bits must match our allowed protection mask */
if ((vma->vm_flags & ~calc_vm_prot_bits(asma->prot_mask, 0)) &
calc_vm_prot_bits(PROT_MASK, 0)) {


[PATCH 2/2 v2] staging: android: ashmem: Fix mmap size validation

2018-06-19 Thread Alistair Strachan
The ashmem driver did not check that the size/offset of the vma passed
to its .mmap() function was not larger than the ashmem object being
mapped. This could cause mmap() to succeed, even though accessing parts
of the mapping would later fail with a segmentation fault.

Ensure an error is returned by the ashmem_mmap() function if the vma
size is larger than the ashmem object size. This enables safer handling
of the problem in userspace.

Cc: Greg Kroah-Hartman 
Cc: Arve Hjønnevåg 
Cc: Todd Kjos 
Cc: Martijn Coenen 
Cc: de...@driverdev.osuosl.org
Cc: linux-kernel@vger.kernel.org
Cc: kernel-t...@android.com
Cc: Joel Fernandes 
Signed-off-by: Alistair Strachan 
---
v2: Removed unnecessary use of unlikely() macro

 drivers/staging/android/ashmem.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/staging/android/ashmem.c b/drivers/staging/android/ashmem.c
index c6386e4f5c9b..e392358ec244 100644
--- a/drivers/staging/android/ashmem.c
+++ b/drivers/staging/android/ashmem.c
@@ -366,6 +366,12 @@ static int ashmem_mmap(struct file *file, struct 
vm_area_struct *vma)
goto out;
}
 
+   /* requested mapping size larger than object size */
+   if (vma->vm_end - vma->vm_start > PAGE_ALIGN(asma->size)) {
+   ret = -EINVAL;
+   goto out;
+   }
+
/* requested protection bits must match our allowed protection mask */
if ((vma->vm_flags & ~calc_vm_prot_bits(asma->prot_mask, 0)) &
calc_vm_prot_bits(PROT_MASK, 0)) {


[PATCH 1/2] staging: android: ashmem: Remove use of unlikely()

2018-06-19 Thread Alistair Strachan
There is no speed difference, and it makes the code harder to read.

Cc: Greg Kroah-Hartman 
Cc: Arve Hjønnevåg 
Cc: Todd Kjos 
Cc: Martijn Coenen 
Cc: de...@driverdev.osuosl.org
Cc: linux-kernel@vger.kernel.org
Cc: kernel-t...@android.com
Cc: Joel Fernandes 
Suggested-by: Greg Kroah-Hartman 
Signed-off-by: Alistair Strachan 
---
 drivers/staging/android/ashmem.c | 34 
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/drivers/staging/android/ashmem.c b/drivers/staging/android/ashmem.c
index a1a0025b59e0..c6386e4f5c9b 100644
--- a/drivers/staging/android/ashmem.c
+++ b/drivers/staging/android/ashmem.c
@@ -178,7 +178,7 @@ static int range_alloc(struct ashmem_area *asma,
struct ashmem_range *range;
 
range = kmem_cache_zalloc(ashmem_range_cachep, GFP_KERNEL);
-   if (unlikely(!range))
+   if (!range)
return -ENOMEM;
 
range->asma = asma;
@@ -246,11 +246,11 @@ static int ashmem_open(struct inode *inode, struct file 
*file)
int ret;
 
ret = generic_file_open(inode, file);
-   if (unlikely(ret))
+   if (ret)
return ret;
 
asma = kmem_cache_zalloc(ashmem_area_cachep, GFP_KERNEL);
-   if (unlikely(!asma))
+   if (!asma)
return -ENOMEM;
 
INIT_LIST_HEAD(>unpinned_list);
@@ -361,14 +361,14 @@ static int ashmem_mmap(struct file *file, struct 
vm_area_struct *vma)
mutex_lock(_mutex);
 
/* user needs to SET_SIZE before mapping */
-   if (unlikely(!asma->size)) {
+   if (!asma->size) {
ret = -EINVAL;
goto out;
}
 
/* requested protection bits must match our allowed protection mask */
-   if (unlikely((vma->vm_flags & ~calc_vm_prot_bits(asma->prot_mask, 0)) &
-calc_vm_prot_bits(PROT_MASK, 0))) {
+   if ((vma->vm_flags & ~calc_vm_prot_bits(asma->prot_mask, 0)) &
+   calc_vm_prot_bits(PROT_MASK, 0)) {
ret = -EPERM;
goto out;
}
@@ -486,7 +486,7 @@ static int set_prot_mask(struct ashmem_area *asma, unsigned 
long prot)
mutex_lock(_mutex);
 
/* the user can only remove, not add, protection bits */
-   if (unlikely((asma->prot_mask & prot) != prot)) {
+   if ((asma->prot_mask & prot) != prot) {
ret = -EINVAL;
goto out;
}
@@ -524,7 +524,7 @@ static int set_name(struct ashmem_area *asma, void __user 
*name)
local_name[ASHMEM_NAME_LEN - 1] = '\0';
mutex_lock(_mutex);
/* cannot change an existing mapping's name */
-   if (unlikely(asma->file))
+   if (asma->file)
ret = -EINVAL;
else
strcpy(asma->name + ASHMEM_NAME_PREFIX_LEN, local_name);
@@ -563,7 +563,7 @@ static int get_name(struct ashmem_area *asma, void __user 
*name)
 * Now we are just copying from the stack variable to userland
 * No lock held
 */
-   if (unlikely(copy_to_user(name, local_name, len)))
+   if (copy_to_user(name, local_name, len))
ret = -EFAULT;
return ret;
 }
@@ -701,25 +701,25 @@ static int ashmem_pin_unpin(struct ashmem_area *asma, 
unsigned long cmd,
size_t pgstart, pgend;
int ret = -EINVAL;
 
-   if (unlikely(copy_from_user(, p, sizeof(pin
+   if (copy_from_user(, p, sizeof(pin)))
return -EFAULT;
 
mutex_lock(_mutex);
 
-   if (unlikely(!asma->file))
+   if (!asma->file)
goto out_unlock;
 
/* per custom, you can pass zero for len to mean "everything onward" */
if (!pin.len)
pin.len = PAGE_ALIGN(asma->size) - pin.offset;
 
-   if (unlikely((pin.offset | pin.len) & ~PAGE_MASK))
+   if ((pin.offset | pin.len) & ~PAGE_MASK)
goto out_unlock;
 
-   if (unlikely(((__u32)-1) - pin.offset < pin.len))
+   if (((__u32)-1) - pin.offset < pin.len)
goto out_unlock;
 
-   if (unlikely(PAGE_ALIGN(asma->size) < pin.offset + pin.len))
+   if (PAGE_ALIGN(asma->size) < pin.offset + pin.len)
goto out_unlock;
 
pgstart = pin.offset / PAGE_SIZE;
@@ -856,7 +856,7 @@ static int __init ashmem_init(void)
ashmem_area_cachep = kmem_cache_create("ashmem_area_cache",
   sizeof(struct ashmem_area),
   0, 0, NULL);
-   if (unlikely(!ashmem_area_cachep)) {
+   if (!ashmem_area_cachep) {
pr_err("failed to create slab cache\n");
goto out;
}
@@ -864,13 +864,13 @@ static int __init ashmem_init(void)
ashmem_range_cachep = kmem_cache_create("ashmem_range_cache",
 

[PATCH 1/2] staging: android: ashmem: Remove use of unlikely()

2018-06-19 Thread Alistair Strachan
There is no speed difference, and it makes the code harder to read.

Cc: Greg Kroah-Hartman 
Cc: Arve Hjønnevåg 
Cc: Todd Kjos 
Cc: Martijn Coenen 
Cc: de...@driverdev.osuosl.org
Cc: linux-kernel@vger.kernel.org
Cc: kernel-t...@android.com
Cc: Joel Fernandes 
Suggested-by: Greg Kroah-Hartman 
Signed-off-by: Alistair Strachan 
---
 drivers/staging/android/ashmem.c | 34 
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/drivers/staging/android/ashmem.c b/drivers/staging/android/ashmem.c
index a1a0025b59e0..c6386e4f5c9b 100644
--- a/drivers/staging/android/ashmem.c
+++ b/drivers/staging/android/ashmem.c
@@ -178,7 +178,7 @@ static int range_alloc(struct ashmem_area *asma,
struct ashmem_range *range;
 
range = kmem_cache_zalloc(ashmem_range_cachep, GFP_KERNEL);
-   if (unlikely(!range))
+   if (!range)
return -ENOMEM;
 
range->asma = asma;
@@ -246,11 +246,11 @@ static int ashmem_open(struct inode *inode, struct file 
*file)
int ret;
 
ret = generic_file_open(inode, file);
-   if (unlikely(ret))
+   if (ret)
return ret;
 
asma = kmem_cache_zalloc(ashmem_area_cachep, GFP_KERNEL);
-   if (unlikely(!asma))
+   if (!asma)
return -ENOMEM;
 
INIT_LIST_HEAD(>unpinned_list);
@@ -361,14 +361,14 @@ static int ashmem_mmap(struct file *file, struct 
vm_area_struct *vma)
mutex_lock(_mutex);
 
/* user needs to SET_SIZE before mapping */
-   if (unlikely(!asma->size)) {
+   if (!asma->size) {
ret = -EINVAL;
goto out;
}
 
/* requested protection bits must match our allowed protection mask */
-   if (unlikely((vma->vm_flags & ~calc_vm_prot_bits(asma->prot_mask, 0)) &
-calc_vm_prot_bits(PROT_MASK, 0))) {
+   if ((vma->vm_flags & ~calc_vm_prot_bits(asma->prot_mask, 0)) &
+   calc_vm_prot_bits(PROT_MASK, 0)) {
ret = -EPERM;
goto out;
}
@@ -486,7 +486,7 @@ static int set_prot_mask(struct ashmem_area *asma, unsigned 
long prot)
mutex_lock(_mutex);
 
/* the user can only remove, not add, protection bits */
-   if (unlikely((asma->prot_mask & prot) != prot)) {
+   if ((asma->prot_mask & prot) != prot) {
ret = -EINVAL;
goto out;
}
@@ -524,7 +524,7 @@ static int set_name(struct ashmem_area *asma, void __user 
*name)
local_name[ASHMEM_NAME_LEN - 1] = '\0';
mutex_lock(_mutex);
/* cannot change an existing mapping's name */
-   if (unlikely(asma->file))
+   if (asma->file)
ret = -EINVAL;
else
strcpy(asma->name + ASHMEM_NAME_PREFIX_LEN, local_name);
@@ -563,7 +563,7 @@ static int get_name(struct ashmem_area *asma, void __user 
*name)
 * Now we are just copying from the stack variable to userland
 * No lock held
 */
-   if (unlikely(copy_to_user(name, local_name, len)))
+   if (copy_to_user(name, local_name, len))
ret = -EFAULT;
return ret;
 }
@@ -701,25 +701,25 @@ static int ashmem_pin_unpin(struct ashmem_area *asma, 
unsigned long cmd,
size_t pgstart, pgend;
int ret = -EINVAL;
 
-   if (unlikely(copy_from_user(, p, sizeof(pin
+   if (copy_from_user(, p, sizeof(pin)))
return -EFAULT;
 
mutex_lock(_mutex);
 
-   if (unlikely(!asma->file))
+   if (!asma->file)
goto out_unlock;
 
/* per custom, you can pass zero for len to mean "everything onward" */
if (!pin.len)
pin.len = PAGE_ALIGN(asma->size) - pin.offset;
 
-   if (unlikely((pin.offset | pin.len) & ~PAGE_MASK))
+   if ((pin.offset | pin.len) & ~PAGE_MASK)
goto out_unlock;
 
-   if (unlikely(((__u32)-1) - pin.offset < pin.len))
+   if (((__u32)-1) - pin.offset < pin.len)
goto out_unlock;
 
-   if (unlikely(PAGE_ALIGN(asma->size) < pin.offset + pin.len))
+   if (PAGE_ALIGN(asma->size) < pin.offset + pin.len)
goto out_unlock;
 
pgstart = pin.offset / PAGE_SIZE;
@@ -856,7 +856,7 @@ static int __init ashmem_init(void)
ashmem_area_cachep = kmem_cache_create("ashmem_area_cache",
   sizeof(struct ashmem_area),
   0, 0, NULL);
-   if (unlikely(!ashmem_area_cachep)) {
+   if (!ashmem_area_cachep) {
pr_err("failed to create slab cache\n");
goto out;
}
@@ -864,13 +864,13 @@ static int __init ashmem_init(void)
ashmem_range_cachep = kmem_cache_create("ashmem_range_cache",
 

Re: [PATCH] staging: android: ashmem: Fix mmap size validation

2018-06-19 Thread Alistair Strachan
HI Greg,

On Tue, Jun 19, 2018 at 4:01 PM Greg Kroah-Hartman
 wrote:
>
> On Tue, Jun 19, 2018 at 03:24:44PM -0700, Alistair Strachan wrote:
> > The ashmem driver did not check that the size/offset of the vma passed
> > to its .mmap() function was not larger than the ashmem object being
> > mapped. This could cause mmap() to succeed, even though accessing parts
> > of the mapping would later fail with a segmentation fault.
> >
> > Ensure an error is returned by the ashmem_mmap() function if the vma
> > size is larger than the ashmem object size. This enables safer handling
> > of the problem in userspace.
>
> Is this going to break current users of this api as their original call
> was succeeding?

If the user created an ashmem region, set its size with
ASHMEM_SET_SIZE, then mmap()ed a larger size, but did not touch the
extra memory it just tried to mmap(), it could pass with the existing
driver. With this patch, it would fail at mmap() time. We're not aware
of users that rely on such behavior.

> >
> > Cc: Greg Kroah-Hartman 
> > Cc: Arve Hjønnevåg 
> > Cc: Todd Kjos 
> > Cc: Martijn Coenen 
> > Cc: de...@driverdev.osuosl.org
> > Cc: kernel-t...@android.com
> > Signed-off-by: Alistair Strachan 
> > ---
> >  drivers/staging/android/ashmem.c | 6 ++
> >  1 file changed, 6 insertions(+)
> >
> > diff --git a/drivers/staging/android/ashmem.c 
> > b/drivers/staging/android/ashmem.c
> > index a1a0025b59e0..1eeedb529a10 100644
> > --- a/drivers/staging/android/ashmem.c
> > +++ b/drivers/staging/android/ashmem.c
> > @@ -366,6 +366,12 @@ static int ashmem_mmap(struct file *file, struct 
> > vm_area_struct *vma)
> >   goto out;
> >   }
> >
> > + /* requested mapping size larger than object size */
> > + if (unlikely(vma->vm_end - vma->vm_start > PAGE_ALIGN(asma->size))) {
> > + ret = -EINVAL;
> > + goto out;
> > + }
>
> Unless you can measure the speed difference, never use likely/unlikely.
> The CPU and compiler almost always knows how to do this better than we
> do.  I know there are other checks like this in this function, those
> should also be fixed as well.

Sure. I'll split that out into another change and resend.

>
> thanks,
>
> greg k-h


Re: [PATCH] staging: android: ashmem: Fix mmap size validation

2018-06-19 Thread Alistair Strachan
HI Greg,

On Tue, Jun 19, 2018 at 4:01 PM Greg Kroah-Hartman
 wrote:
>
> On Tue, Jun 19, 2018 at 03:24:44PM -0700, Alistair Strachan wrote:
> > The ashmem driver did not check that the size/offset of the vma passed
> > to its .mmap() function was not larger than the ashmem object being
> > mapped. This could cause mmap() to succeed, even though accessing parts
> > of the mapping would later fail with a segmentation fault.
> >
> > Ensure an error is returned by the ashmem_mmap() function if the vma
> > size is larger than the ashmem object size. This enables safer handling
> > of the problem in userspace.
>
> Is this going to break current users of this api as their original call
> was succeeding?

If the user created an ashmem region, set its size with
ASHMEM_SET_SIZE, then mmap()ed a larger size, but did not touch the
extra memory it just tried to mmap(), it could pass with the existing
driver. With this patch, it would fail at mmap() time. We're not aware
of users that rely on such behavior.

> >
> > Cc: Greg Kroah-Hartman 
> > Cc: Arve Hjønnevåg 
> > Cc: Todd Kjos 
> > Cc: Martijn Coenen 
> > Cc: de...@driverdev.osuosl.org
> > Cc: kernel-t...@android.com
> > Signed-off-by: Alistair Strachan 
> > ---
> >  drivers/staging/android/ashmem.c | 6 ++
> >  1 file changed, 6 insertions(+)
> >
> > diff --git a/drivers/staging/android/ashmem.c 
> > b/drivers/staging/android/ashmem.c
> > index a1a0025b59e0..1eeedb529a10 100644
> > --- a/drivers/staging/android/ashmem.c
> > +++ b/drivers/staging/android/ashmem.c
> > @@ -366,6 +366,12 @@ static int ashmem_mmap(struct file *file, struct 
> > vm_area_struct *vma)
> >   goto out;
> >   }
> >
> > + /* requested mapping size larger than object size */
> > + if (unlikely(vma->vm_end - vma->vm_start > PAGE_ALIGN(asma->size))) {
> > + ret = -EINVAL;
> > + goto out;
> > + }
>
> Unless you can measure the speed difference, never use likely/unlikely.
> The CPU and compiler almost always knows how to do this better than we
> do.  I know there are other checks like this in this function, those
> should also be fixed as well.

Sure. I'll split that out into another change and resend.

>
> thanks,
>
> greg k-h


[PATCH] staging: android: ashmem: Fix mmap size validation

2018-06-19 Thread Alistair Strachan
The ashmem driver did not check that the size/offset of the vma passed
to its .mmap() function was not larger than the ashmem object being
mapped. This could cause mmap() to succeed, even though accessing parts
of the mapping would later fail with a segmentation fault.

Ensure an error is returned by the ashmem_mmap() function if the vma
size is larger than the ashmem object size. This enables safer handling
of the problem in userspace.

Cc: Greg Kroah-Hartman 
Cc: Arve Hjønnevåg 
Cc: Todd Kjos 
Cc: Martijn Coenen 
Cc: de...@driverdev.osuosl.org
Cc: kernel-t...@android.com
Signed-off-by: Alistair Strachan 
---
 drivers/staging/android/ashmem.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/staging/android/ashmem.c b/drivers/staging/android/ashmem.c
index a1a0025b59e0..1eeedb529a10 100644
--- a/drivers/staging/android/ashmem.c
+++ b/drivers/staging/android/ashmem.c
@@ -366,6 +366,12 @@ static int ashmem_mmap(struct file *file, struct 
vm_area_struct *vma)
goto out;
}
 
+   /* requested mapping size larger than object size */
+   if (unlikely(vma->vm_end - vma->vm_start > PAGE_ALIGN(asma->size))) {
+   ret = -EINVAL;
+   goto out;
+   }
+
/* requested protection bits must match our allowed protection mask */
if (unlikely((vma->vm_flags & ~calc_vm_prot_bits(asma->prot_mask, 0)) &
 calc_vm_prot_bits(PROT_MASK, 0))) {


[PATCH] staging: android: ashmem: Fix mmap size validation

2018-06-19 Thread Alistair Strachan
The ashmem driver did not check that the size/offset of the vma passed
to its .mmap() function was not larger than the ashmem object being
mapped. This could cause mmap() to succeed, even though accessing parts
of the mapping would later fail with a segmentation fault.

Ensure an error is returned by the ashmem_mmap() function if the vma
size is larger than the ashmem object size. This enables safer handling
of the problem in userspace.

Cc: Greg Kroah-Hartman 
Cc: Arve Hjønnevåg 
Cc: Todd Kjos 
Cc: Martijn Coenen 
Cc: de...@driverdev.osuosl.org
Cc: kernel-t...@android.com
Signed-off-by: Alistair Strachan 
---
 drivers/staging/android/ashmem.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/staging/android/ashmem.c b/drivers/staging/android/ashmem.c
index a1a0025b59e0..1eeedb529a10 100644
--- a/drivers/staging/android/ashmem.c
+++ b/drivers/staging/android/ashmem.c
@@ -366,6 +366,12 @@ static int ashmem_mmap(struct file *file, struct 
vm_area_struct *vma)
goto out;
}
 
+   /* requested mapping size larger than object size */
+   if (unlikely(vma->vm_end - vma->vm_start > PAGE_ALIGN(asma->size))) {
+   ret = -EINVAL;
+   goto out;
+   }
+
/* requested protection bits must match our allowed protection mask */
if (unlikely((vma->vm_flags & ~calc_vm_prot_bits(asma->prot_mask, 0)) &
 calc_vm_prot_bits(PROT_MASK, 0))) {


Re: [PATCH v2] proc: Simplify and fix proc by removing the kernel mount

2018-06-17 Thread Alistair Strachan
Hi Eric,

Thanks a lot for looking into this problem.

On Sat, Jun 16, 2018 at 7:55 PM Eric W. Biederman  wrote:
>
>
> Today there are three users of proc_mnt.
> - The legacy sysctl system call implementation.
> - The uml mconsole driver.
> - The process cleanup function proc_flush_task.
>
> The first two are slow path and essentially unused.  I expect soon we
> will be able to remove the legacy sysctl system call entirely.  To
> keep them working for now a new wrapper file_open_proc_is added to
> mount and unmount proc around file_open_root.  Which nicely removes
> the need for a always mounted proc instance for these cases.
>
> Handling proc_flush_task which is regularly used requires a little more
> work.  First I optimize proc_flush_task to do nothing where there is
> evidence that there are no entries in proc, by looking at pid->count.
> Then I carefully update proc_fill_super and proc_kill_sb to maintain a
> ns->proc_super pointer to the super block for proc.  This allows
> proc_flush_task to find the appropriate instance of proc via rcu.
>
> Once the appropriate instance of proc is found in proc_flush_task
> atomic_inc_not_zero is used to increase the s_active count ensuring
> proc_kill_sb will not be called, until the superblock is deactivated.
> This makes it safe to inspect the instance of proc and invalidate any
> dentries that mention the exiting task.
>
> The two extra atomics operations in exit are not my favorite but given
> that exit is already almost completely serialized with the task lock I
> do not expect this change will be measurable.
>
> The benefit for all of this change is that one of the most error prone
> and tricky parts of the pid namespace implementation, maintaining
> kernel mounts of proc is removed.
>
> In addition removing the unnecessary complexity of the kernel mount
> fixes a regression that caused the proc mount options to be ignored.
> Now that the initial mount of proc comes from userspace, those mount
> options are again honored.  This fixes Android's usage of the proc
> hidepid option.
>
> Reported-by: Alistair Strachan 
> Cc: sta...@vger.kernel.org
> Fixes: e94591d0d90c ("proc: Convert proc_mount to use mount_ns.")
> Signed-off-by: "Eric W. Biederman" 
> ---
>
> Alistair if you can test and confirm this fixes your issue I will add
> your tested by and send the fix to Linus.

I tested v2 with both UML and qemu-system-x86_64 / ARCH=x86_64 against
4.18-rc1, 4.14 and 4.9 and I couldn't break it. The hidepid problem is
resolved, and the mount flags can now only be specified on the first
userspace mount for that pid namespace.

Tested-by: Alistair Strachan 

> Since my earlier posting I have spot tested this.  Fixed a few bugs that
> showed up and verified my changes work.  So I think this is ready to go
> unless someone looks at this and in testing or code review spots a bug.

Agreed!

> Eric
>
>  arch/um/drivers/mconsole_kern.c |  4 ++--
>  fs/proc/base.c  | 36 ++---
>  fs/proc/inode.c |  5 -
>  fs/proc/root.c  | 28 ++---
>  include/linux/pid_namespace.h   |  3 +--
>  include/linux/proc_ns.h |  7 ++-
>  kernel/pid.c|  8 
>  kernel/pid_namespace.c  |  7 ---
>  kernel/sysctl_binary.c  |  5 ++---
>  9 files changed, 51 insertions(+), 52 deletions(-)
>
> diff --git a/arch/um/drivers/mconsole_kern.c b/arch/um/drivers/mconsole_kern.c
> index d5f9a2d1da1b..36af0e02d56b 100644
> --- a/arch/um/drivers/mconsole_kern.c
> +++ b/arch/um/drivers/mconsole_kern.c
> @@ -27,6 +27,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>
>  #include 
>  #include 
> @@ -124,7 +125,6 @@ void mconsole_log(struct mc_request *req)
>
>  void mconsole_proc(struct mc_request *req)
>  {
> -   struct vfsmount *mnt = task_active_pid_ns(current)->proc_mnt;
> char *buf;
> int len;
> struct file *file;
> @@ -135,7 +135,7 @@ void mconsole_proc(struct mc_request *req)
> ptr += strlen("proc");
> ptr = skip_spaces(ptr);
>
> -   file = file_open_root(mnt->mnt_root, mnt, ptr, O_RDONLY, 0);
> +   file = file_open_proc(ptr, O_RDONLY, 0);
> if (IS_ERR(file)) {
> mconsole_reply(req, "Failed to open file", 1, 0);
> printk(KERN_ERR "open /proc/%s: %ld\n", ptr, PTR_ERR(file));
> diff --git a/fs/proc/base.c b/fs/proc/base.c
> index 1b2ede6abcdf..cd7b68a64ed1 100644
> --- a/fs/proc/base.c
> +++ b/fs/proc/base.c
> @@ -3052,7 +3052,7 @@ static const struct inode_operations 
> proc_tgid_base_inode_operations = {
&g

Re: [PATCH v2] proc: Simplify and fix proc by removing the kernel mount

2018-06-17 Thread Alistair Strachan
Hi Eric,

Thanks a lot for looking into this problem.

On Sat, Jun 16, 2018 at 7:55 PM Eric W. Biederman  wrote:
>
>
> Today there are three users of proc_mnt.
> - The legacy sysctl system call implementation.
> - The uml mconsole driver.
> - The process cleanup function proc_flush_task.
>
> The first two are slow path and essentially unused.  I expect soon we
> will be able to remove the legacy sysctl system call entirely.  To
> keep them working for now a new wrapper file_open_proc_is added to
> mount and unmount proc around file_open_root.  Which nicely removes
> the need for a always mounted proc instance for these cases.
>
> Handling proc_flush_task which is regularly used requires a little more
> work.  First I optimize proc_flush_task to do nothing where there is
> evidence that there are no entries in proc, by looking at pid->count.
> Then I carefully update proc_fill_super and proc_kill_sb to maintain a
> ns->proc_super pointer to the super block for proc.  This allows
> proc_flush_task to find the appropriate instance of proc via rcu.
>
> Once the appropriate instance of proc is found in proc_flush_task
> atomic_inc_not_zero is used to increase the s_active count ensuring
> proc_kill_sb will not be called, until the superblock is deactivated.
> This makes it safe to inspect the instance of proc and invalidate any
> dentries that mention the exiting task.
>
> The two extra atomics operations in exit are not my favorite but given
> that exit is already almost completely serialized with the task lock I
> do not expect this change will be measurable.
>
> The benefit for all of this change is that one of the most error prone
> and tricky parts of the pid namespace implementation, maintaining
> kernel mounts of proc is removed.
>
> In addition removing the unnecessary complexity of the kernel mount
> fixes a regression that caused the proc mount options to be ignored.
> Now that the initial mount of proc comes from userspace, those mount
> options are again honored.  This fixes Android's usage of the proc
> hidepid option.
>
> Reported-by: Alistair Strachan 
> Cc: sta...@vger.kernel.org
> Fixes: e94591d0d90c ("proc: Convert proc_mount to use mount_ns.")
> Signed-off-by: "Eric W. Biederman" 
> ---
>
> Alistair if you can test and confirm this fixes your issue I will add
> your tested by and send the fix to Linus.

I tested v2 with both UML and qemu-system-x86_64 / ARCH=x86_64 against
4.18-rc1, 4.14 and 4.9 and I couldn't break it. The hidepid problem is
resolved, and the mount flags can now only be specified on the first
userspace mount for that pid namespace.

Tested-by: Alistair Strachan 

> Since my earlier posting I have spot tested this.  Fixed a few bugs that
> showed up and verified my changes work.  So I think this is ready to go
> unless someone looks at this and in testing or code review spots a bug.

Agreed!

> Eric
>
>  arch/um/drivers/mconsole_kern.c |  4 ++--
>  fs/proc/base.c  | 36 ++---
>  fs/proc/inode.c |  5 -
>  fs/proc/root.c  | 28 ++---
>  include/linux/pid_namespace.h   |  3 +--
>  include/linux/proc_ns.h |  7 ++-
>  kernel/pid.c|  8 
>  kernel/pid_namespace.c  |  7 ---
>  kernel/sysctl_binary.c  |  5 ++---
>  9 files changed, 51 insertions(+), 52 deletions(-)
>
> diff --git a/arch/um/drivers/mconsole_kern.c b/arch/um/drivers/mconsole_kern.c
> index d5f9a2d1da1b..36af0e02d56b 100644
> --- a/arch/um/drivers/mconsole_kern.c
> +++ b/arch/um/drivers/mconsole_kern.c
> @@ -27,6 +27,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>
>  #include 
>  #include 
> @@ -124,7 +125,6 @@ void mconsole_log(struct mc_request *req)
>
>  void mconsole_proc(struct mc_request *req)
>  {
> -   struct vfsmount *mnt = task_active_pid_ns(current)->proc_mnt;
> char *buf;
> int len;
> struct file *file;
> @@ -135,7 +135,7 @@ void mconsole_proc(struct mc_request *req)
> ptr += strlen("proc");
> ptr = skip_spaces(ptr);
>
> -   file = file_open_root(mnt->mnt_root, mnt, ptr, O_RDONLY, 0);
> +   file = file_open_proc(ptr, O_RDONLY, 0);
> if (IS_ERR(file)) {
> mconsole_reply(req, "Failed to open file", 1, 0);
> printk(KERN_ERR "open /proc/%s: %ld\n", ptr, PTR_ERR(file));
> diff --git a/fs/proc/base.c b/fs/proc/base.c
> index 1b2ede6abcdf..cd7b68a64ed1 100644
> --- a/fs/proc/base.c
> +++ b/fs/proc/base.c
> @@ -3052,7 +3052,7 @@ static const struct inode_operations 
> proc_tgid_base_inode_operations = {
&g

Re: [PATCH] proc: Fix parsing of mount parameters.

2018-06-12 Thread Alistair Strachan
On Mon, Jun 11, 2018 at 6:22 PM Eric W. Biederman  wrote:
>
> Alistair Strachan  writes:
>
> > In commit e94591d0d90c "proc: Convert proc_mount to use mount_ns"
> > the parsing of mount parameters for the proc filesystem was broken.
> >
> > The SB_KERNMOUNT for procfs happens via:
> >
> >   start_kernel()
> > rest_init()
> >   kernel_thread()
> > _do_fork()
> >copy_process()
> >  alloc_pid()
> >pid_ns_prepare_proc()
> >  kern_mount_data()
> >proc_mount()
> >  mount_ns()
> >
> > In mount_ns(), the kernel calls proc_fill_super() only if the superblock
> > has not previously been set up (i.e. the first mount reference),
> > regardless of SB_KERNMOUNT. Because the call to proc_parse_options() had
> > been moved inside here, and the SB_KERNMOUNT uses no mount options, the
> > option parser became a no-op.
> >
> > When userspace later mounted procfs with e.g. hidepid=2, the options
> > would be ignored.
> >
> > This change backs out a part of the original cleanup and parses the
> > procfs mount options at every mount call. Because the options currently
> > only update the pid_ns for the mount, they are applied for all mounts of
> > proc by that pid or childen of that pid, instantaneously. This is the
> > same behavior as the original code.
>
> Two years for a regression to be reported is a litte long.  I think that
> gets out of the kneejerk immediate fix or revert phase and into thinking
> a little bout about what makes sense in this code.

Android has been using hidepid=2 for a while, but most shipping
products were on 3.18 or 4.4 kernels. To us, it's a new problem.

> As we say with devpts there is a very real danger of someone mounting
> a second instance of proc in a chroot and causing problems by either
> strengthening or weakening the hid pid protections for the entire pid
> namespace.  If we go with your proposed change in behavior.

I guess my change does change the behavior, but it's just back to the
behavior which the kernel had for a good while (~v3.3 thru v4.7).

> Ordinary block device filesystems (like ext4) avoid this problem by
> allowing a second mount and by not parsing the mount options except
> on remount.  What proc currently does.

IMHO, they're not really comparable. You'll only get kernmounts of an
ext4 filesystem when finding rootfs, and in that case the user knows
about the mount and can see it in /proc/mounts, so they know to use -o
remount,.

Since the first mount (where the options might have been respected) is
*always* the kernmount done before init, with your change these mount
options for procfs will never be respected. As userspace didn't yet
mount /proc, it can't know /proc was already mounted, in order to know
to use a remount to re-parse the options. The behavior was changed in
a non-obvious way.

> So I think it can be reasonably argued that the change in behavior is
> was an unintentional fix.
>
> I can see an argument for failing the mount of proc if mount options
> are specified or if those mount options differ from the existing mount
> options.
>
> proc_remount's call of proc_parse_options is definitely buggy as it can
> partially succeed and change the pid namespace and return an error code.
> That is bad error handling.
>
> There may be an argument for making these options available in something
> other than a mount of proc.  As they are pid namespace wide.
>
> There may be an argument for multiple instances of proc so that it makes
> sense to process these options during an ordinary mount.
>
>
> Ultimately what I see is that this is a difficult area of semantics that
> there is at least a little room for improvement on, but it is not
> as simple as this proposed change.

An alternative fix might be to ignore the super setup if done from a
kernmount of procfs. IMO, this initial mount shouldn't be considered
the first reference, because it will not pass the mount options and
cannot be observed by userspace. Such a change looks complicated,
though, and it would only be relevant to procfs. It might be better to
roll back the cleanup and implement these semantics directly in the
procfs code.

> > Fixes: e94591d0d90c ("proc: Convert proc_mount to use mount_ns")
> > Signed-off-by: Alistair Strachan 
> > Cc: Seth Forshee 
> > Cc: Djalal Harouni 
> > Cc: "Eric W. Biederman" 
> > Cc: kernel-t...@android.com
> > Cc: linux-kernel@vger.kernel.org
> > ---
> >  fs/proc/inode.c| 4 
> >  fs/proc/internal.h | 1 -
> >  fs/proc/root.c | 5 -
> >  3 files changed, 4 insertions(+), 6 

Re: [PATCH] proc: Fix parsing of mount parameters.

2018-06-12 Thread Alistair Strachan
On Mon, Jun 11, 2018 at 6:22 PM Eric W. Biederman  wrote:
>
> Alistair Strachan  writes:
>
> > In commit e94591d0d90c "proc: Convert proc_mount to use mount_ns"
> > the parsing of mount parameters for the proc filesystem was broken.
> >
> > The SB_KERNMOUNT for procfs happens via:
> >
> >   start_kernel()
> > rest_init()
> >   kernel_thread()
> > _do_fork()
> >copy_process()
> >  alloc_pid()
> >pid_ns_prepare_proc()
> >  kern_mount_data()
> >proc_mount()
> >  mount_ns()
> >
> > In mount_ns(), the kernel calls proc_fill_super() only if the superblock
> > has not previously been set up (i.e. the first mount reference),
> > regardless of SB_KERNMOUNT. Because the call to proc_parse_options() had
> > been moved inside here, and the SB_KERNMOUNT uses no mount options, the
> > option parser became a no-op.
> >
> > When userspace later mounted procfs with e.g. hidepid=2, the options
> > would be ignored.
> >
> > This change backs out a part of the original cleanup and parses the
> > procfs mount options at every mount call. Because the options currently
> > only update the pid_ns for the mount, they are applied for all mounts of
> > proc by that pid or childen of that pid, instantaneously. This is the
> > same behavior as the original code.
>
> Two years for a regression to be reported is a litte long.  I think that
> gets out of the kneejerk immediate fix or revert phase and into thinking
> a little bout about what makes sense in this code.

Android has been using hidepid=2 for a while, but most shipping
products were on 3.18 or 4.4 kernels. To us, it's a new problem.

> As we say with devpts there is a very real danger of someone mounting
> a second instance of proc in a chroot and causing problems by either
> strengthening or weakening the hid pid protections for the entire pid
> namespace.  If we go with your proposed change in behavior.

I guess my change does change the behavior, but it's just back to the
behavior which the kernel had for a good while (~v3.3 thru v4.7).

> Ordinary block device filesystems (like ext4) avoid this problem by
> allowing a second mount and by not parsing the mount options except
> on remount.  What proc currently does.

IMHO, they're not really comparable. You'll only get kernmounts of an
ext4 filesystem when finding rootfs, and in that case the user knows
about the mount and can see it in /proc/mounts, so they know to use -o
remount,.

Since the first mount (where the options might have been respected) is
*always* the kernmount done before init, with your change these mount
options for procfs will never be respected. As userspace didn't yet
mount /proc, it can't know /proc was already mounted, in order to know
to use a remount to re-parse the options. The behavior was changed in
a non-obvious way.

> So I think it can be reasonably argued that the change in behavior is
> was an unintentional fix.
>
> I can see an argument for failing the mount of proc if mount options
> are specified or if those mount options differ from the existing mount
> options.
>
> proc_remount's call of proc_parse_options is definitely buggy as it can
> partially succeed and change the pid namespace and return an error code.
> That is bad error handling.
>
> There may be an argument for making these options available in something
> other than a mount of proc.  As they are pid namespace wide.
>
> There may be an argument for multiple instances of proc so that it makes
> sense to process these options during an ordinary mount.
>
>
> Ultimately what I see is that this is a difficult area of semantics that
> there is at least a little room for improvement on, but it is not
> as simple as this proposed change.

An alternative fix might be to ignore the super setup if done from a
kernmount of procfs. IMO, this initial mount shouldn't be considered
the first reference, because it will not pass the mount options and
cannot be observed by userspace. Such a change looks complicated,
though, and it would only be relevant to procfs. It might be better to
roll back the cleanup and implement these semantics directly in the
procfs code.

> > Fixes: e94591d0d90c ("proc: Convert proc_mount to use mount_ns")
> > Signed-off-by: Alistair Strachan 
> > Cc: Seth Forshee 
> > Cc: Djalal Harouni 
> > Cc: "Eric W. Biederman" 
> > Cc: kernel-t...@android.com
> > Cc: linux-kernel@vger.kernel.org
> > ---
> >  fs/proc/inode.c| 4 
> >  fs/proc/internal.h | 1 -
> >  fs/proc/root.c | 5 -
> >  3 files changed, 4 insertions(+), 6 

[PATCH] proc: Fix parsing of mount parameters.

2018-06-11 Thread Alistair Strachan
In commit e94591d0d90c "proc: Convert proc_mount to use mount_ns"
the parsing of mount parameters for the proc filesystem was broken.

The SB_KERNMOUNT for procfs happens via:

  start_kernel()
rest_init()
  kernel_thread()
_do_fork()
   copy_process()
 alloc_pid()
   pid_ns_prepare_proc()
 kern_mount_data()
   proc_mount()
 mount_ns()

In mount_ns(), the kernel calls proc_fill_super() only if the superblock
has not previously been set up (i.e. the first mount reference),
regardless of SB_KERNMOUNT. Because the call to proc_parse_options() had
been moved inside here, and the SB_KERNMOUNT uses no mount options, the
option parser became a no-op.

When userspace later mounted procfs with e.g. hidepid=2, the options
would be ignored.

This change backs out a part of the original cleanup and parses the
procfs mount options at every mount call. Because the options currently
only update the pid_ns for the mount, they are applied for all mounts of
proc by that pid or childen of that pid, instantaneously. This is the
same behavior as the original code.

Fixes: e94591d0d90c ("proc: Convert proc_mount to use mount_ns")
Signed-off-by: Alistair Strachan 
Cc: Seth Forshee 
Cc: Djalal Harouni 
Cc: "Eric W. Biederman" 
Cc: kernel-t...@android.com
Cc: linux-kernel@vger.kernel.org
---
 fs/proc/inode.c| 4 
 fs/proc/internal.h | 1 -
 fs/proc/root.c | 5 -
 3 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index 2cf3b74391ca..f348be0a 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -492,13 +492,9 @@ struct inode *proc_get_inode(struct super_block *sb, 
struct proc_dir_entry *de)
 
 int proc_fill_super(struct super_block *s, void *data, int silent)
 {
-   struct pid_namespace *ns = get_pid_ns(s->s_fs_info);
struct inode *root_inode;
int ret;
 
-   if (!proc_parse_options(data, ns))
-   return -EINVAL;
-
/* User space would break if executables or devices appear on proc */
s->s_iflags |= SB_I_USERNS_VISIBLE | SB_I_NOEXEC | SB_I_NODEV;
s->s_flags |= SB_NODIRATIME | SB_NOSUID | SB_NOEXEC;
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index 50cb22a08c2f..89b7e845b000 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -264,7 +264,6 @@ static inline void proc_tty_init(void) {}
  * root.c
  */
 extern struct proc_dir_entry proc_root;
-extern int proc_parse_options(char *options, struct pid_namespace *pid);
 
 extern void proc_self_init(void);
 extern int proc_remount(struct super_block *, int *, char *);
diff --git a/fs/proc/root.c b/fs/proc/root.c
index 61b7340b357a..d40676a5dd6c 100644
--- a/fs/proc/root.c
+++ b/fs/proc/root.c
@@ -36,7 +36,7 @@ static const match_table_t tokens = {
{Opt_err, NULL},
 };
 
-int proc_parse_options(char *options, struct pid_namespace *pid)
+static int proc_parse_options(char *options, struct pid_namespace *pid)
 {
char *p;
substring_t args[MAX_OPT_ARGS];
@@ -98,6 +98,9 @@ static struct dentry *proc_mount(struct file_system_type 
*fs_type,
ns = task_active_pid_ns(current);
}
 
+   if (!proc_parse_options(data, ns))
+   return ERR_PTR(-EINVAL);
+
return mount_ns(fs_type, flags, data, ns, ns->user_ns, proc_fill_super);
 }
 
-- 
2.18.0.rc1.242.g61856ae69a-goog



[PATCH] proc: Fix parsing of mount parameters.

2018-06-11 Thread Alistair Strachan
In commit e94591d0d90c "proc: Convert proc_mount to use mount_ns"
the parsing of mount parameters for the proc filesystem was broken.

The SB_KERNMOUNT for procfs happens via:

  start_kernel()
rest_init()
  kernel_thread()
_do_fork()
   copy_process()
 alloc_pid()
   pid_ns_prepare_proc()
 kern_mount_data()
   proc_mount()
 mount_ns()

In mount_ns(), the kernel calls proc_fill_super() only if the superblock
has not previously been set up (i.e. the first mount reference),
regardless of SB_KERNMOUNT. Because the call to proc_parse_options() had
been moved inside here, and the SB_KERNMOUNT uses no mount options, the
option parser became a no-op.

When userspace later mounted procfs with e.g. hidepid=2, the options
would be ignored.

This change backs out a part of the original cleanup and parses the
procfs mount options at every mount call. Because the options currently
only update the pid_ns for the mount, they are applied for all mounts of
proc by that pid or childen of that pid, instantaneously. This is the
same behavior as the original code.

Fixes: e94591d0d90c ("proc: Convert proc_mount to use mount_ns")
Signed-off-by: Alistair Strachan 
Cc: Seth Forshee 
Cc: Djalal Harouni 
Cc: "Eric W. Biederman" 
Cc: kernel-t...@android.com
Cc: linux-kernel@vger.kernel.org
---
 fs/proc/inode.c| 4 
 fs/proc/internal.h | 1 -
 fs/proc/root.c | 5 -
 3 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index 2cf3b74391ca..f348be0a 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -492,13 +492,9 @@ struct inode *proc_get_inode(struct super_block *sb, 
struct proc_dir_entry *de)
 
 int proc_fill_super(struct super_block *s, void *data, int silent)
 {
-   struct pid_namespace *ns = get_pid_ns(s->s_fs_info);
struct inode *root_inode;
int ret;
 
-   if (!proc_parse_options(data, ns))
-   return -EINVAL;
-
/* User space would break if executables or devices appear on proc */
s->s_iflags |= SB_I_USERNS_VISIBLE | SB_I_NOEXEC | SB_I_NODEV;
s->s_flags |= SB_NODIRATIME | SB_NOSUID | SB_NOEXEC;
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index 50cb22a08c2f..89b7e845b000 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -264,7 +264,6 @@ static inline void proc_tty_init(void) {}
  * root.c
  */
 extern struct proc_dir_entry proc_root;
-extern int proc_parse_options(char *options, struct pid_namespace *pid);
 
 extern void proc_self_init(void);
 extern int proc_remount(struct super_block *, int *, char *);
diff --git a/fs/proc/root.c b/fs/proc/root.c
index 61b7340b357a..d40676a5dd6c 100644
--- a/fs/proc/root.c
+++ b/fs/proc/root.c
@@ -36,7 +36,7 @@ static const match_table_t tokens = {
{Opt_err, NULL},
 };
 
-int proc_parse_options(char *options, struct pid_namespace *pid)
+static int proc_parse_options(char *options, struct pid_namespace *pid)
 {
char *p;
substring_t args[MAX_OPT_ARGS];
@@ -98,6 +98,9 @@ static struct dentry *proc_mount(struct file_system_type 
*fs_type,
ns = task_active_pid_ns(current);
}
 
+   if (!proc_parse_options(data, ns))
+   return ERR_PTR(-EINVAL);
+
return mount_ns(fs_type, flags, data, ns, ns->user_ns, proc_fill_super);
 }
 
-- 
2.18.0.rc1.242.g61856ae69a-goog



Re: [clang] stack protector and f1f029c7bf

2018-05-24 Thread Alistair Strachan
(Resent plain text)

On Thu, May 24, 2018 at 11:24 AM Nick Desaulniers 
wrote:

> On Thu, May 24, 2018 at 11:20 AM  wrote:
> > A stack canary on an *inlined* function? That's bound to break things
> elsewhere too sooner or later.

> But it's *not* inlined by GCC or Clang.

FWIW, GCC can also insert a stack guard in an out-of-lined inline function,
it just doesn't for this one. The -fstack-protector and
-fstack-protector-strong flags are heuristic and the heuristic does not
match between gcc and clang for this function. It is working on GCC purely
by chance.

> While the function is marked `static inline`, it's not in
> arch/x86/kernel/paravirt.o due to:
> arch/x86/kernel/paravirt.c:326

> 325 __visible struct pv_irq_ops pv_irq_ops = {

> 326 .save_fl = __PV_IS_CALLEE_SAVE(native_save_fl),

> see comparison of disassembly  attached in:
> https://bugs.llvm.org/attachment.cgi?id=20338
> --
> Thanks,
> ~Nick Desaulniers


Re: [clang] stack protector and f1f029c7bf

2018-05-24 Thread Alistair Strachan
(Resent plain text)

On Thu, May 24, 2018 at 11:24 AM Nick Desaulniers 
wrote:

> On Thu, May 24, 2018 at 11:20 AM  wrote:
> > A stack canary on an *inlined* function? That's bound to break things
> elsewhere too sooner or later.

> But it's *not* inlined by GCC or Clang.

FWIW, GCC can also insert a stack guard in an out-of-lined inline function,
it just doesn't for this one. The -fstack-protector and
-fstack-protector-strong flags are heuristic and the heuristic does not
match between gcc and clang for this function. It is working on GCC purely
by chance.

> While the function is marked `static inline`, it's not in
> arch/x86/kernel/paravirt.o due to:
> arch/x86/kernel/paravirt.c:326

> 325 __visible struct pv_irq_ops pv_irq_ops = {

> 326 .save_fl = __PV_IS_CALLEE_SAVE(native_save_fl),

> see comparison of disassembly  attached in:
> https://bugs.llvm.org/attachment.cgi?id=20338
> --
> Thanks,
> ~Nick Desaulniers


[PATCH 1/3] staging: Android: vsoc: Create wc kernel mapping for region shm.

2018-05-02 Thread Alistair Strachan
Map the region shm as write-combining instead of uncachable.

Cc: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Cc: Arve Hjønnevåg <a...@android.com>
Cc: Todd Kjos <tk...@android.com>
Cc: Martijn Coenen <m...@android.com>
Cc: Greg Hartman <ghart...@google.com>
Cc: de...@driverdev.osuosl.org
Cc: kernel-t...@android.com
Signed-off-by: Alistair Strachan <astrac...@google.com>
---
 drivers/staging/android/TODO   | 1 -
 drivers/staging/android/vsoc.c | 4 +---
 2 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/staging/android/TODO b/drivers/staging/android/TODO
index 2ea6f97b8f0f..ebd6ba3ae02e 100644
--- a/drivers/staging/android/TODO
+++ b/drivers/staging/android/TODO
@@ -18,7 +18,6 @@ vsoc.c, uapi/vsoc_shm.h
waiting threads. We should eventually use multiple queues and select the
queue based on the region.
  - Add debugfs support for examining the permissions of regions.
- - Use ioremap_wc instead of ioremap_nocache.
  - Remove VSOC_WAIT_FOR_INCOMING_INTERRUPT ioctl. This functionality has been
superseded by the futex and is there for legacy reasons.
 
diff --git a/drivers/staging/android/vsoc.c b/drivers/staging/android/vsoc.c
index 587c66d709b9..794137b7751f 100644
--- a/drivers/staging/android/vsoc.c
+++ b/drivers/staging/android/vsoc.c
@@ -802,9 +802,7 @@ static int vsoc_probe_device(struct pci_dev *pdev,
 
dev_info(>dev, "shared memory @ DMA %p size=0x%zx\n",
 (void *)vsoc_dev.shm_phys_start, vsoc_dev.shm_size);
-   /* TODO(ghartman): ioremap_wc should work here */
-   vsoc_dev.kernel_mapped_shm = ioremap_nocache(
-   vsoc_dev.shm_phys_start, vsoc_dev.shm_size);
+   vsoc_dev.kernel_mapped_shm = pci_iomap_wc(pdev, SHARED_MEMORY_BAR, 0);
if (!vsoc_dev.kernel_mapped_shm) {
dev_err(_dev.dev->dev, "cannot iomap region\n");
vsoc_remove_device(pdev);


[PATCH 1/3] staging: Android: vsoc: Create wc kernel mapping for region shm.

2018-05-02 Thread Alistair Strachan
Map the region shm as write-combining instead of uncachable.

Cc: Greg Kroah-Hartman 
Cc: Arve Hjønnevåg 
Cc: Todd Kjos 
Cc: Martijn Coenen 
Cc: Greg Hartman 
Cc: de...@driverdev.osuosl.org
Cc: kernel-t...@android.com
Signed-off-by: Alistair Strachan 
---
 drivers/staging/android/TODO   | 1 -
 drivers/staging/android/vsoc.c | 4 +---
 2 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/staging/android/TODO b/drivers/staging/android/TODO
index 2ea6f97b8f0f..ebd6ba3ae02e 100644
--- a/drivers/staging/android/TODO
+++ b/drivers/staging/android/TODO
@@ -18,7 +18,6 @@ vsoc.c, uapi/vsoc_shm.h
waiting threads. We should eventually use multiple queues and select the
queue based on the region.
  - Add debugfs support for examining the permissions of regions.
- - Use ioremap_wc instead of ioremap_nocache.
  - Remove VSOC_WAIT_FOR_INCOMING_INTERRUPT ioctl. This functionality has been
superseded by the futex and is there for legacy reasons.
 
diff --git a/drivers/staging/android/vsoc.c b/drivers/staging/android/vsoc.c
index 587c66d709b9..794137b7751f 100644
--- a/drivers/staging/android/vsoc.c
+++ b/drivers/staging/android/vsoc.c
@@ -802,9 +802,7 @@ static int vsoc_probe_device(struct pci_dev *pdev,
 
dev_info(>dev, "shared memory @ DMA %p size=0x%zx\n",
 (void *)vsoc_dev.shm_phys_start, vsoc_dev.shm_size);
-   /* TODO(ghartman): ioremap_wc should work here */
-   vsoc_dev.kernel_mapped_shm = ioremap_nocache(
-   vsoc_dev.shm_phys_start, vsoc_dev.shm_size);
+   vsoc_dev.kernel_mapped_shm = pci_iomap_wc(pdev, SHARED_MEMORY_BAR, 0);
if (!vsoc_dev.kernel_mapped_shm) {
dev_err(_dev.dev->dev, "cannot iomap region\n");
vsoc_remove_device(pdev);


[PATCH 2/3] staging: Android: vsoc: Fix a i386-randconfig warning.

2018-05-02 Thread Alistair Strachan
Fix "warning: cast to pointer from integer of different size" when
printing the region shm physical address. Use the %pa conversion
specifier and pass the resource by reference.

Cc: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Cc: Arve Hjønnevåg <a...@android.com>
Cc: Todd Kjos <tk...@android.com>
Cc: Martijn Coenen <m...@android.com>
Cc: Greg Hartman <ghart...@google.com>
Cc: de...@driverdev.osuosl.org
Cc: kernel-t...@android.com
Signed-off-by: Alistair Strachan <astrac...@google.com>
---
 drivers/staging/android/vsoc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/android/vsoc.c b/drivers/staging/android/vsoc.c
index 794137b7751f..3e6e4af7d6a1 100644
--- a/drivers/staging/android/vsoc.c
+++ b/drivers/staging/android/vsoc.c
@@ -800,8 +800,8 @@ static int vsoc_probe_device(struct pci_dev *pdev,
vsoc_dev.shm_phys_start = pci_resource_start(pdev, SHARED_MEMORY_BAR);
vsoc_dev.shm_size = pci_resource_len(pdev, SHARED_MEMORY_BAR);
 
-   dev_info(>dev, "shared memory @ DMA %p size=0x%zx\n",
-(void *)vsoc_dev.shm_phys_start, vsoc_dev.shm_size);
+   dev_info(>dev, "shared memory @ DMA %pa size=0x%zx\n",
+_dev.shm_phys_start, vsoc_dev.shm_size);
vsoc_dev.kernel_mapped_shm = pci_iomap_wc(pdev, SHARED_MEMORY_BAR, 0);
if (!vsoc_dev.kernel_mapped_shm) {
dev_err(_dev.dev->dev, "cannot iomap region\n");


[PATCH 2/3] staging: Android: vsoc: Fix a i386-randconfig warning.

2018-05-02 Thread Alistair Strachan
Fix "warning: cast to pointer from integer of different size" when
printing the region shm physical address. Use the %pa conversion
specifier and pass the resource by reference.

Cc: Greg Kroah-Hartman 
Cc: Arve Hjønnevåg 
Cc: Todd Kjos 
Cc: Martijn Coenen 
Cc: Greg Hartman 
Cc: de...@driverdev.osuosl.org
Cc: kernel-t...@android.com
Signed-off-by: Alistair Strachan 
---
 drivers/staging/android/vsoc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/android/vsoc.c b/drivers/staging/android/vsoc.c
index 794137b7751f..3e6e4af7d6a1 100644
--- a/drivers/staging/android/vsoc.c
+++ b/drivers/staging/android/vsoc.c
@@ -800,8 +800,8 @@ static int vsoc_probe_device(struct pci_dev *pdev,
vsoc_dev.shm_phys_start = pci_resource_start(pdev, SHARED_MEMORY_BAR);
vsoc_dev.shm_size = pci_resource_len(pdev, SHARED_MEMORY_BAR);
 
-   dev_info(>dev, "shared memory @ DMA %p size=0x%zx\n",
-(void *)vsoc_dev.shm_phys_start, vsoc_dev.shm_size);
+   dev_info(>dev, "shared memory @ DMA %pa size=0x%zx\n",
+_dev.shm_phys_start, vsoc_dev.shm_size);
vsoc_dev.kernel_mapped_shm = pci_iomap_wc(pdev, SHARED_MEMORY_BAR, 0);
if (!vsoc_dev.kernel_mapped_shm) {
dev_err(_dev.dev->dev, "cannot iomap region\n");


[PATCH 3/3] staging: Android: Fix sparse warnings in vsoc driver.

2018-05-02 Thread Alistair Strachan
Cc: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Cc: Arve Hjønnevåg <a...@android.com>
Cc: Todd Kjos <tk...@android.com>
Cc: Martijn Coenen <m...@android.com>
Cc: Greg Hartman <ghart...@google.com>
Cc: de...@driverdev.osuosl.org
Cc: kernel-t...@android.com
Signed-off-by: Alistair Strachan <astrac...@google.com>
---
 drivers/staging/android/vsoc.c | 100 -
 1 file changed, 49 insertions(+), 51 deletions(-)

diff --git a/drivers/staging/android/vsoc.c b/drivers/staging/android/vsoc.c
index 3e6e4af7d6a1..954ed2c5d807 100644
--- a/drivers/staging/android/vsoc.c
+++ b/drivers/staging/android/vsoc.c
@@ -81,8 +81,8 @@ struct vsoc_region_data {
atomic_t *incoming_signalled;
/* Flag indicating the guest has signalled the host. */
atomic_t *outgoing_signalled;
-   int irq_requested;
-   int device_created;
+   bool irq_requested;
+   bool device_created;
 };
 
 struct vsoc_device {
@@ -91,7 +91,7 @@ struct vsoc_device {
/* Physical address of SHARED_MEMORY_BAR. */
phys_addr_t shm_phys_start;
/* Kernel virtual address of SHARED_MEMORY_BAR. */
-   void *kernel_mapped_shm;
+   void __iomem *kernel_mapped_shm;
/* Size of the entire shared memory window in bytes. */
size_t shm_size;
/*
@@ -116,22 +116,23 @@ struct vsoc_device {
 * vsoc_region_data because the kernel deals with them as an array.
 */
struct msix_entry *msix_entries;
-   /*
-* Flags that indicate what we've initialzied. These are used to do an
-* orderly cleanup of the device.
-*/
-   char enabled_device;
-   char requested_regions;
-   char cdev_added;
-   char class_added;
-   char msix_enabled;
/* Mutex that protectes the permission list */
struct mutex mtx;
/* Major number assigned by the kernel */
int major;
-
+   /* Character device assigned by the kernel */
struct cdev cdev;
+   /* Device class assigned by the kernel */
struct class *class;
+   /*
+* Flags that indicate what we've initialized. These are used to do an
+* orderly cleanup of the device.
+*/
+   bool enabled_device;
+   bool requested_regions;
+   bool cdev_added;
+   bool class_added;
+   bool msix_enabled;
 };
 
 static struct vsoc_device vsoc_dev;
@@ -153,13 +154,13 @@ static long vsoc_ioctl(struct file *, unsigned int, 
unsigned long);
 static int vsoc_mmap(struct file *, struct vm_area_struct *);
 static int vsoc_open(struct inode *, struct file *);
 static int vsoc_release(struct inode *, struct file *);
-static ssize_t vsoc_read(struct file *, char *, size_t, loff_t *);
-static ssize_t vsoc_write(struct file *, const char *, size_t, loff_t *);
+static ssize_t vsoc_read(struct file *, char __user *, size_t, loff_t *);
+static ssize_t vsoc_write(struct file *, const char __user *, size_t, loff_t 
*);
 static loff_t vsoc_lseek(struct file *filp, loff_t offset, int origin);
 static int do_create_fd_scoped_permission(
struct vsoc_device_region *region_p,
struct fd_scoped_permission_node *np,
-   struct fd_scoped_permission_arg *__user arg);
+   struct fd_scoped_permission_arg __user *arg);
 static void do_destroy_fd_scoped_permission(
struct vsoc_device_region *owner_region_p,
struct fd_scoped_permission *perm);
@@ -198,7 +199,7 @@ inline int vsoc_validate_filep(struct file *filp)
 /* Converts from shared memory offset to virtual address */
 static inline void *shm_off_to_virtual_addr(__u32 offset)
 {
-   return vsoc_dev.kernel_mapped_shm + offset;
+   return (void __force *)vsoc_dev.kernel_mapped_shm + offset;
 }
 
 /* Converts from shared memory offset to physical address */
@@ -261,7 +262,7 @@ static struct pci_driver vsoc_pci_driver = {
 static int do_create_fd_scoped_permission(
struct vsoc_device_region *region_p,
struct fd_scoped_permission_node *np,
-   struct fd_scoped_permission_arg *__user arg)
+   struct fd_scoped_permission_arg __user *arg)
 {
struct file *managed_filp;
s32 managed_fd;
@@ -632,11 +633,11 @@ static long vsoc_ioctl(struct file *filp, unsigned int 
cmd, unsigned long arg)
return 0;
 }
 
-static ssize_t vsoc_read(struct file *filp, char *buffer, size_t len,
+static ssize_t vsoc_read(struct file *filp, char __user *buffer, size_t len,
 loff_t *poffset)
 {
__u32 area_off;
-   void *area_p;
+   const void *area_p;
ssize_t area_len;
int retval = vsoc_validate_filep(filp);
 
@@ -706,7 +707,7 @@ static loff_t vsoc_lseek(struct file *filp, loff_t offset, 
int origin)
return offset;
 }
 
-static ssize_t vsoc_write(struct file *filp, const char *buffer,
+static ssize_t vsoc_write(struct file *filp, const char __user *buffer,
  size_t len, loff_t

[PATCH 3/3] staging: Android: Fix sparse warnings in vsoc driver.

2018-05-02 Thread Alistair Strachan
Cc: Greg Kroah-Hartman 
Cc: Arve Hjønnevåg 
Cc: Todd Kjos 
Cc: Martijn Coenen 
Cc: Greg Hartman 
Cc: de...@driverdev.osuosl.org
Cc: kernel-t...@android.com
Signed-off-by: Alistair Strachan 
---
 drivers/staging/android/vsoc.c | 100 -
 1 file changed, 49 insertions(+), 51 deletions(-)

diff --git a/drivers/staging/android/vsoc.c b/drivers/staging/android/vsoc.c
index 3e6e4af7d6a1..954ed2c5d807 100644
--- a/drivers/staging/android/vsoc.c
+++ b/drivers/staging/android/vsoc.c
@@ -81,8 +81,8 @@ struct vsoc_region_data {
atomic_t *incoming_signalled;
/* Flag indicating the guest has signalled the host. */
atomic_t *outgoing_signalled;
-   int irq_requested;
-   int device_created;
+   bool irq_requested;
+   bool device_created;
 };
 
 struct vsoc_device {
@@ -91,7 +91,7 @@ struct vsoc_device {
/* Physical address of SHARED_MEMORY_BAR. */
phys_addr_t shm_phys_start;
/* Kernel virtual address of SHARED_MEMORY_BAR. */
-   void *kernel_mapped_shm;
+   void __iomem *kernel_mapped_shm;
/* Size of the entire shared memory window in bytes. */
size_t shm_size;
/*
@@ -116,22 +116,23 @@ struct vsoc_device {
 * vsoc_region_data because the kernel deals with them as an array.
 */
struct msix_entry *msix_entries;
-   /*
-* Flags that indicate what we've initialzied. These are used to do an
-* orderly cleanup of the device.
-*/
-   char enabled_device;
-   char requested_regions;
-   char cdev_added;
-   char class_added;
-   char msix_enabled;
/* Mutex that protectes the permission list */
struct mutex mtx;
/* Major number assigned by the kernel */
int major;
-
+   /* Character device assigned by the kernel */
struct cdev cdev;
+   /* Device class assigned by the kernel */
struct class *class;
+   /*
+* Flags that indicate what we've initialized. These are used to do an
+* orderly cleanup of the device.
+*/
+   bool enabled_device;
+   bool requested_regions;
+   bool cdev_added;
+   bool class_added;
+   bool msix_enabled;
 };
 
 static struct vsoc_device vsoc_dev;
@@ -153,13 +154,13 @@ static long vsoc_ioctl(struct file *, unsigned int, 
unsigned long);
 static int vsoc_mmap(struct file *, struct vm_area_struct *);
 static int vsoc_open(struct inode *, struct file *);
 static int vsoc_release(struct inode *, struct file *);
-static ssize_t vsoc_read(struct file *, char *, size_t, loff_t *);
-static ssize_t vsoc_write(struct file *, const char *, size_t, loff_t *);
+static ssize_t vsoc_read(struct file *, char __user *, size_t, loff_t *);
+static ssize_t vsoc_write(struct file *, const char __user *, size_t, loff_t 
*);
 static loff_t vsoc_lseek(struct file *filp, loff_t offset, int origin);
 static int do_create_fd_scoped_permission(
struct vsoc_device_region *region_p,
struct fd_scoped_permission_node *np,
-   struct fd_scoped_permission_arg *__user arg);
+   struct fd_scoped_permission_arg __user *arg);
 static void do_destroy_fd_scoped_permission(
struct vsoc_device_region *owner_region_p,
struct fd_scoped_permission *perm);
@@ -198,7 +199,7 @@ inline int vsoc_validate_filep(struct file *filp)
 /* Converts from shared memory offset to virtual address */
 static inline void *shm_off_to_virtual_addr(__u32 offset)
 {
-   return vsoc_dev.kernel_mapped_shm + offset;
+   return (void __force *)vsoc_dev.kernel_mapped_shm + offset;
 }
 
 /* Converts from shared memory offset to physical address */
@@ -261,7 +262,7 @@ static struct pci_driver vsoc_pci_driver = {
 static int do_create_fd_scoped_permission(
struct vsoc_device_region *region_p,
struct fd_scoped_permission_node *np,
-   struct fd_scoped_permission_arg *__user arg)
+   struct fd_scoped_permission_arg __user *arg)
 {
struct file *managed_filp;
s32 managed_fd;
@@ -632,11 +633,11 @@ static long vsoc_ioctl(struct file *filp, unsigned int 
cmd, unsigned long arg)
return 0;
 }
 
-static ssize_t vsoc_read(struct file *filp, char *buffer, size_t len,
+static ssize_t vsoc_read(struct file *filp, char __user *buffer, size_t len,
 loff_t *poffset)
 {
__u32 area_off;
-   void *area_p;
+   const void *area_p;
ssize_t area_len;
int retval = vsoc_validate_filep(filp);
 
@@ -706,7 +707,7 @@ static loff_t vsoc_lseek(struct file *filp, loff_t offset, 
int origin)
return offset;
 }
 
-static ssize_t vsoc_write(struct file *filp, const char *buffer,
+static ssize_t vsoc_write(struct file *filp, const char __user *buffer,
  size_t len, loff_t *poffset)
 {
__u32 area_off;
@@ -772,14 +773,14 @@ static int vsoc_probe_device(struct pci_dev *pdev,
pci_name(pdev), result

[PATCH v2] staging: Android: Add 'vsoc' driver for cuttlefish.

2018-04-12 Thread Alistair Strachan
From: Greg Hartman <ghart...@google.com>

The cuttlefish system is a virtual SoC architecture based on QEMU. It
uses the QEMU ivshmem feature to share memory regions between guest and
host with a custom protocol.

Cc: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Cc: Arve Hjønnevåg <a...@android.com>
Cc: Todd Kjos <tk...@android.com>
Cc: Martijn Coenen <m...@android.com>
Cc: Dan Carpenter <dan.carpen...@oracle.com>
Cc: de...@driverdev.osuosl.org
Cc: kernel-t...@android.com
Signed-off-by: Greg Hartman <ghart...@google.com>
[astrachan: rebased against 4.16, added TODO, fixed checkpatch issues]
Signed-off-by: Alistair Strachan <astrac...@google.com>
---
v2: addressed issues with class_create() failure handling and redundant
null pointer checks noticed by Dan Carpenter

 drivers/staging/android/Kconfig |9 +
 drivers/staging/android/Makefile|1 +
 drivers/staging/android/TODO|   10 +
 drivers/staging/android/uapi/vsoc_shm.h |  303 ++
 drivers/staging/android/vsoc.c  | 1169 +++
 5 files changed, 1492 insertions(+)
 create mode 100644 drivers/staging/android/uapi/vsoc_shm.h
 create mode 100644 drivers/staging/android/vsoc.c

diff --git a/drivers/staging/android/Kconfig b/drivers/staging/android/Kconfig
index 71a50b99caff..29d891355f7a 100644
--- a/drivers/staging/android/Kconfig
+++ b/drivers/staging/android/Kconfig
@@ -14,6 +14,15 @@ config ASHMEM
  It is, in theory, a good memory allocator for low-memory devices,
  because it can discard shared memory units when under memory pressure.
 
+config ANDROID_VSOC
+   tristate "Android Virtual SoC support"
+   default n
+   depends on PCI_MSI
+   ---help---
+ This option adds support for the Virtual SoC driver needed to boot
+ a 'cuttlefish' Android image inside QEmu. The driver interacts with
+ a QEmu ivshmem device. If built as a module, it will be called vsoc.
+
 source "drivers/staging/android/ion/Kconfig"
 
 endif # if ANDROID
diff --git a/drivers/staging/android/Makefile b/drivers/staging/android/Makefile
index 7cf1564a49a5..90e6154f11a4 100644
--- a/drivers/staging/android/Makefile
+++ b/drivers/staging/android/Makefile
@@ -3,3 +3,4 @@ ccflags-y += -I$(src)   # needed for trace 
events
 obj-y  += ion/
 
 obj-$(CONFIG_ASHMEM)   += ashmem.o
+obj-$(CONFIG_ANDROID_VSOC) += vsoc.o
diff --git a/drivers/staging/android/TODO b/drivers/staging/android/TODO
index 687e0eac85bf..6aab759efa48 100644
--- a/drivers/staging/android/TODO
+++ b/drivers/staging/android/TODO
@@ -11,5 +11,15 @@ ion/
  - Split /dev/ion up into multiple nodes (e.g. /dev/ion/heap0)
  - Better test framework (integration with VGEM was suggested)
 
+vsoc.c, uapi/vsoc_shm.h
+ - The current driver uses the same wait queue for all of the futexes in a
+   region. This will cause false wakeups in regions with a large number of
+   waiting threads. We should eventually use multiple queues and select the
+   queue based on the region.
+ - Add debugfs support for examining the permissions of regions.
+ - Use ioremap_wc instead of ioremap_nocache.
+ - Remove VSOC_WAIT_FOR_INCOMING_INTERRUPT ioctl. This functionality has been
+   superseded by the futex and is there for legacy reasons.
+
 Please send patches to Greg Kroah-Hartman <g...@kroah.com> and Cc:
 Arve Hjønnevåg <a...@android.com> and Riley Andrews <riandr...@android.com>
diff --git a/drivers/staging/android/uapi/vsoc_shm.h 
b/drivers/staging/android/uapi/vsoc_shm.h
new file mode 100644
index ..741b1387c25b
--- /dev/null
+++ b/drivers/staging/android/uapi/vsoc_shm.h
@@ -0,0 +1,303 @@
+/*
+ * Copyright (C) 2017 Google, Inc.
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#ifndef _UAPI_LINUX_VSOC_SHM_H
+#define _UAPI_LINUX_VSOC_SHM_H
+
+#include 
+
+/**
+ * A permission is a token that permits a receiver to read and/or write an area
+ * of memory within a Vsoc region.
+ *
+ * An fd_scoped permission grants both read and write access, and can be
+ * attached to a file description (see open(2)).
+ * Ownership of the area can then be shared by passing a file descriptor
+ * among processes.
+ *
+ * begin_offset and end_offset define the area of memory that is controlled by
+ * the permission. owner_offset points to a word, also in shared memory, that
+ * controls ownership of the area.
+ *
+ * ownership of the region expires when the assoc

[PATCH v2] staging: Android: Add 'vsoc' driver for cuttlefish.

2018-04-12 Thread Alistair Strachan
From: Greg Hartman 

The cuttlefish system is a virtual SoC architecture based on QEMU. It
uses the QEMU ivshmem feature to share memory regions between guest and
host with a custom protocol.

Cc: Greg Kroah-Hartman 
Cc: Arve Hjønnevåg 
Cc: Todd Kjos 
Cc: Martijn Coenen 
Cc: Dan Carpenter 
Cc: de...@driverdev.osuosl.org
Cc: kernel-t...@android.com
Signed-off-by: Greg Hartman 
[astrachan: rebased against 4.16, added TODO, fixed checkpatch issues]
Signed-off-by: Alistair Strachan 
---
v2: addressed issues with class_create() failure handling and redundant
null pointer checks noticed by Dan Carpenter

 drivers/staging/android/Kconfig |9 +
 drivers/staging/android/Makefile|1 +
 drivers/staging/android/TODO|   10 +
 drivers/staging/android/uapi/vsoc_shm.h |  303 ++
 drivers/staging/android/vsoc.c  | 1169 +++
 5 files changed, 1492 insertions(+)
 create mode 100644 drivers/staging/android/uapi/vsoc_shm.h
 create mode 100644 drivers/staging/android/vsoc.c

diff --git a/drivers/staging/android/Kconfig b/drivers/staging/android/Kconfig
index 71a50b99caff..29d891355f7a 100644
--- a/drivers/staging/android/Kconfig
+++ b/drivers/staging/android/Kconfig
@@ -14,6 +14,15 @@ config ASHMEM
  It is, in theory, a good memory allocator for low-memory devices,
  because it can discard shared memory units when under memory pressure.
 
+config ANDROID_VSOC
+   tristate "Android Virtual SoC support"
+   default n
+   depends on PCI_MSI
+   ---help---
+ This option adds support for the Virtual SoC driver needed to boot
+ a 'cuttlefish' Android image inside QEmu. The driver interacts with
+ a QEmu ivshmem device. If built as a module, it will be called vsoc.
+
 source "drivers/staging/android/ion/Kconfig"
 
 endif # if ANDROID
diff --git a/drivers/staging/android/Makefile b/drivers/staging/android/Makefile
index 7cf1564a49a5..90e6154f11a4 100644
--- a/drivers/staging/android/Makefile
+++ b/drivers/staging/android/Makefile
@@ -3,3 +3,4 @@ ccflags-y += -I$(src)   # needed for trace 
events
 obj-y  += ion/
 
 obj-$(CONFIG_ASHMEM)   += ashmem.o
+obj-$(CONFIG_ANDROID_VSOC) += vsoc.o
diff --git a/drivers/staging/android/TODO b/drivers/staging/android/TODO
index 687e0eac85bf..6aab759efa48 100644
--- a/drivers/staging/android/TODO
+++ b/drivers/staging/android/TODO
@@ -11,5 +11,15 @@ ion/
  - Split /dev/ion up into multiple nodes (e.g. /dev/ion/heap0)
  - Better test framework (integration with VGEM was suggested)
 
+vsoc.c, uapi/vsoc_shm.h
+ - The current driver uses the same wait queue for all of the futexes in a
+   region. This will cause false wakeups in regions with a large number of
+   waiting threads. We should eventually use multiple queues and select the
+   queue based on the region.
+ - Add debugfs support for examining the permissions of regions.
+ - Use ioremap_wc instead of ioremap_nocache.
+ - Remove VSOC_WAIT_FOR_INCOMING_INTERRUPT ioctl. This functionality has been
+   superseded by the futex and is there for legacy reasons.
+
 Please send patches to Greg Kroah-Hartman  and Cc:
 Arve Hjønnevåg  and Riley Andrews 
diff --git a/drivers/staging/android/uapi/vsoc_shm.h 
b/drivers/staging/android/uapi/vsoc_shm.h
new file mode 100644
index ..741b1387c25b
--- /dev/null
+++ b/drivers/staging/android/uapi/vsoc_shm.h
@@ -0,0 +1,303 @@
+/*
+ * Copyright (C) 2017 Google, Inc.
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#ifndef _UAPI_LINUX_VSOC_SHM_H
+#define _UAPI_LINUX_VSOC_SHM_H
+
+#include 
+
+/**
+ * A permission is a token that permits a receiver to read and/or write an area
+ * of memory within a Vsoc region.
+ *
+ * An fd_scoped permission grants both read and write access, and can be
+ * attached to a file description (see open(2)).
+ * Ownership of the area can then be shared by passing a file descriptor
+ * among processes.
+ *
+ * begin_offset and end_offset define the area of memory that is controlled by
+ * the permission. owner_offset points to a word, also in shared memory, that
+ * controls ownership of the area.
+ *
+ * ownership of the region expires when the associated file description is
+ * released.
+ *
+ * At most one permission can be attached to each file description.
+ *
+ * This is useful when implementing HALs like gralloc that scope and pass
+ * ownership of shared resources via file descriptors.
+ *
+ * The caller is responsibe f

[PATCH] staging: Android: Add 'vsoc' driver for cuttlefish.

2018-04-10 Thread Alistair Strachan
From: Greg Hartman <ghart...@google.com>

The cuttlefish system is a virtual SoC architecture based on QEMU. It
uses the QEMU ivshmem feature to share memory regions between guest and
host with a custom protocol.

Cc: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Cc: Arve Hjønnevåg <a...@android.com>
Cc: Todd Kjos <tk...@android.com>
Cc: Martijn Coenen <m...@android.com>
Cc: de...@driverdev.osuosl.org
Cc: kernel-t...@android.com
Signed-off-by: Greg Hartman <ghart...@google.com>
[astrachan: rebased against 4.16, added TODO, fixed checkpatch issues]
Signed-off-by: Alistair Strachan <astrac...@google.com>
---
 drivers/staging/android/Kconfig |9 +
 drivers/staging/android/Makefile|1 +
 drivers/staging/android/TODO|   10 +
 drivers/staging/android/uapi/vsoc_shm.h |  303 ++
 drivers/staging/android/vsoc.c  | 1167 +++
 5 files changed, 1490 insertions(+)
 create mode 100644 drivers/staging/android/uapi/vsoc_shm.h
 create mode 100644 drivers/staging/android/vsoc.c

diff --git a/drivers/staging/android/Kconfig b/drivers/staging/android/Kconfig
index 71a50b99caff..29d891355f7a 100644
--- a/drivers/staging/android/Kconfig
+++ b/drivers/staging/android/Kconfig
@@ -14,6 +14,15 @@ config ASHMEM
  It is, in theory, a good memory allocator for low-memory devices,
  because it can discard shared memory units when under memory pressure.
 
+config ANDROID_VSOC
+   tristate "Android Virtual SoC support"
+   default n
+   depends on PCI_MSI
+   ---help---
+ This option adds support for the Virtual SoC driver needed to boot
+ a 'cuttlefish' Android image inside QEmu. The driver interacts with
+ a QEmu ivshmem device. If built as a module, it will be called vsoc.
+
 source "drivers/staging/android/ion/Kconfig"
 
 endif # if ANDROID
diff --git a/drivers/staging/android/Makefile b/drivers/staging/android/Makefile
index 7cf1564a49a5..90e6154f11a4 100644
--- a/drivers/staging/android/Makefile
+++ b/drivers/staging/android/Makefile
@@ -3,3 +3,4 @@ ccflags-y += -I$(src)   # needed for trace 
events
 obj-y  += ion/
 
 obj-$(CONFIG_ASHMEM)   += ashmem.o
+obj-$(CONFIG_ANDROID_VSOC) += vsoc.o
diff --git a/drivers/staging/android/TODO b/drivers/staging/android/TODO
index 687e0eac85bf..6aab759efa48 100644
--- a/drivers/staging/android/TODO
+++ b/drivers/staging/android/TODO
@@ -11,5 +11,15 @@ ion/
  - Split /dev/ion up into multiple nodes (e.g. /dev/ion/heap0)
  - Better test framework (integration with VGEM was suggested)
 
+vsoc.c, uapi/vsoc_shm.h
+ - The current driver uses the same wait queue for all of the futexes in a
+   region. This will cause false wakeups in regions with a large number of
+   waiting threads. We should eventually use multiple queues and select the
+   queue based on the region.
+ - Add debugfs support for examining the permissions of regions.
+ - Use ioremap_wc instead of ioremap_nocache.
+ - Remove VSOC_WAIT_FOR_INCOMING_INTERRUPT ioctl. This functionality has been
+   superseded by the futex and is there for legacy reasons.
+
 Please send patches to Greg Kroah-Hartman <g...@kroah.com> and Cc:
 Arve Hjønnevåg <a...@android.com> and Riley Andrews <riandr...@android.com>
diff --git a/drivers/staging/android/uapi/vsoc_shm.h 
b/drivers/staging/android/uapi/vsoc_shm.h
new file mode 100644
index ..741b1387c25b
--- /dev/null
+++ b/drivers/staging/android/uapi/vsoc_shm.h
@@ -0,0 +1,303 @@
+/*
+ * Copyright (C) 2017 Google, Inc.
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#ifndef _UAPI_LINUX_VSOC_SHM_H
+#define _UAPI_LINUX_VSOC_SHM_H
+
+#include 
+
+/**
+ * A permission is a token that permits a receiver to read and/or write an area
+ * of memory within a Vsoc region.
+ *
+ * An fd_scoped permission grants both read and write access, and can be
+ * attached to a file description (see open(2)).
+ * Ownership of the area can then be shared by passing a file descriptor
+ * among processes.
+ *
+ * begin_offset and end_offset define the area of memory that is controlled by
+ * the permission. owner_offset points to a word, also in shared memory, that
+ * controls ownership of the area.
+ *
+ * ownership of the region expires when the associated file description is
+ * released.
+ *
+ * At most one permission can be attached to each file description.
+ *
+ * This is useful when implementing HALs like gr

[PATCH] staging: Android: Add 'vsoc' driver for cuttlefish.

2018-04-10 Thread Alistair Strachan
From: Greg Hartman 

The cuttlefish system is a virtual SoC architecture based on QEMU. It
uses the QEMU ivshmem feature to share memory regions between guest and
host with a custom protocol.

Cc: Greg Kroah-Hartman 
Cc: Arve Hjønnevåg 
Cc: Todd Kjos 
Cc: Martijn Coenen 
Cc: de...@driverdev.osuosl.org
Cc: kernel-t...@android.com
Signed-off-by: Greg Hartman 
[astrachan: rebased against 4.16, added TODO, fixed checkpatch issues]
Signed-off-by: Alistair Strachan 
---
 drivers/staging/android/Kconfig |9 +
 drivers/staging/android/Makefile|1 +
 drivers/staging/android/TODO|   10 +
 drivers/staging/android/uapi/vsoc_shm.h |  303 ++
 drivers/staging/android/vsoc.c  | 1167 +++
 5 files changed, 1490 insertions(+)
 create mode 100644 drivers/staging/android/uapi/vsoc_shm.h
 create mode 100644 drivers/staging/android/vsoc.c

diff --git a/drivers/staging/android/Kconfig b/drivers/staging/android/Kconfig
index 71a50b99caff..29d891355f7a 100644
--- a/drivers/staging/android/Kconfig
+++ b/drivers/staging/android/Kconfig
@@ -14,6 +14,15 @@ config ASHMEM
  It is, in theory, a good memory allocator for low-memory devices,
  because it can discard shared memory units when under memory pressure.
 
+config ANDROID_VSOC
+   tristate "Android Virtual SoC support"
+   default n
+   depends on PCI_MSI
+   ---help---
+ This option adds support for the Virtual SoC driver needed to boot
+ a 'cuttlefish' Android image inside QEmu. The driver interacts with
+ a QEmu ivshmem device. If built as a module, it will be called vsoc.
+
 source "drivers/staging/android/ion/Kconfig"
 
 endif # if ANDROID
diff --git a/drivers/staging/android/Makefile b/drivers/staging/android/Makefile
index 7cf1564a49a5..90e6154f11a4 100644
--- a/drivers/staging/android/Makefile
+++ b/drivers/staging/android/Makefile
@@ -3,3 +3,4 @@ ccflags-y += -I$(src)   # needed for trace 
events
 obj-y  += ion/
 
 obj-$(CONFIG_ASHMEM)   += ashmem.o
+obj-$(CONFIG_ANDROID_VSOC) += vsoc.o
diff --git a/drivers/staging/android/TODO b/drivers/staging/android/TODO
index 687e0eac85bf..6aab759efa48 100644
--- a/drivers/staging/android/TODO
+++ b/drivers/staging/android/TODO
@@ -11,5 +11,15 @@ ion/
  - Split /dev/ion up into multiple nodes (e.g. /dev/ion/heap0)
  - Better test framework (integration with VGEM was suggested)
 
+vsoc.c, uapi/vsoc_shm.h
+ - The current driver uses the same wait queue for all of the futexes in a
+   region. This will cause false wakeups in regions with a large number of
+   waiting threads. We should eventually use multiple queues and select the
+   queue based on the region.
+ - Add debugfs support for examining the permissions of regions.
+ - Use ioremap_wc instead of ioremap_nocache.
+ - Remove VSOC_WAIT_FOR_INCOMING_INTERRUPT ioctl. This functionality has been
+   superseded by the futex and is there for legacy reasons.
+
 Please send patches to Greg Kroah-Hartman  and Cc:
 Arve Hjønnevåg  and Riley Andrews 
diff --git a/drivers/staging/android/uapi/vsoc_shm.h 
b/drivers/staging/android/uapi/vsoc_shm.h
new file mode 100644
index ..741b1387c25b
--- /dev/null
+++ b/drivers/staging/android/uapi/vsoc_shm.h
@@ -0,0 +1,303 @@
+/*
+ * Copyright (C) 2017 Google, Inc.
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#ifndef _UAPI_LINUX_VSOC_SHM_H
+#define _UAPI_LINUX_VSOC_SHM_H
+
+#include 
+
+/**
+ * A permission is a token that permits a receiver to read and/or write an area
+ * of memory within a Vsoc region.
+ *
+ * An fd_scoped permission grants both read and write access, and can be
+ * attached to a file description (see open(2)).
+ * Ownership of the area can then be shared by passing a file descriptor
+ * among processes.
+ *
+ * begin_offset and end_offset define the area of memory that is controlled by
+ * the permission. owner_offset points to a word, also in shared memory, that
+ * controls ownership of the area.
+ *
+ * ownership of the region expires when the associated file description is
+ * released.
+ *
+ * At most one permission can be attached to each file description.
+ *
+ * This is useful when implementing HALs like gralloc that scope and pass
+ * ownership of shared resources via file descriptors.
+ *
+ * The caller is responsibe for doing any fencing.
+ *
+ * The calling process will normally identify a currently free area of
+ * memory. It will construct a proposed fd_scoped_perm

[PATCH] sync: Fix memory corruption in sync_timeline_signal().

2015-03-24 Thread Alistair Strachan
The android_fence_release() function checks for active sync points
by calling list_empty() on the list head embedded on the sync
point. However, it is only valid to use list_empty() on nodes that
have been initialized with INIT_LIST_HEAD() or list_del_init().

Because the list entry has likely been removed from the active list
by sync_timeline_signal(), there is a good chance that this
WARN_ON_ONCE() will be hit due to dangling pointers pointing at
freed memory (even though the sync drivers did nothing wrong)
and memory corruption will ensue as the list entry is removed for
a second time, corrupting the active list.

This problem can be reproduced quite easily with CONFIG_DEBUG_LIST=y
and fences with more than one sync point.

Signed-off-by: Alistair Strachan 
Cc: Maarten Lankhorst 
Cc: Greg Kroah-Hartman 
Cc: Colin Cross 
---
 drivers/staging/android/sync.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/android/sync.c b/drivers/staging/android/sync.c
index 7bdb62b..f83e00c 100644
--- a/drivers/staging/android/sync.c
+++ b/drivers/staging/android/sync.c
@@ -114,7 +114,7 @@ void sync_timeline_signal(struct sync_timeline *obj)
list_for_each_entry_safe(pt, next, >active_list_head,
 active_list) {
if (fence_is_signaled_locked(>base))
-   list_del(>active_list);
+   list_del_init(>active_list);
}
 
spin_unlock_irqrestore(>child_list_lock, flags);
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] sync: Fix memory corruption in sync_timeline_signal().

2015-03-24 Thread Alistair Strachan
The android_fence_release() function checks for active sync points
by calling list_empty() on the list head embedded on the sync
point. However, it is only valid to use list_empty() on nodes that
have been initialized with INIT_LIST_HEAD() or list_del_init().

Because the list entry has likely been removed from the active list
by sync_timeline_signal(), there is a good chance that this
WARN_ON_ONCE() will be hit due to dangling pointers pointing at
freed memory (even though the sync drivers did nothing wrong)
and memory corruption will ensue as the list entry is removed for
a second time, corrupting the active list.

This problem can be reproduced quite easily with CONFIG_DEBUG_LIST=y
and fences with more than one sync point.

Change-Id: Ie2a6bc1480bbcfdc14f9b385fca5a2b833effc05

Signed-off-by: Alistair Strachan 
Cc: Maarten Lankhorst 
Cc: Greg Kroah-Hartman 
Cc: Colin Cross 
---
 drivers/staging/android/sync.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/android/sync.c b/drivers/staging/android/sync.c
index 7bdb62b..f83e00c 100644
--- a/drivers/staging/android/sync.c
+++ b/drivers/staging/android/sync.c
@@ -114,7 +114,7 @@ void sync_timeline_signal(struct sync_timeline *obj)
list_for_each_entry_safe(pt, next, >active_list_head,
 active_list) {
if (fence_is_signaled_locked(>base))
-   list_del(>active_list);
+   list_del_init(>active_list);
}
 
spin_unlock_irqrestore(>child_list_lock, flags);
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] sync: Fix memory corruption in sync_timeline_signal().

2015-03-24 Thread Alistair Strachan
The android_fence_release() function checks for active sync points
by calling list_empty() on the list head embedded on the sync
point. However, it is only valid to use list_empty() on nodes that
have been initialized with INIT_LIST_HEAD() or list_del_init().

Because the list entry has likely been removed from the active list
by sync_timeline_signal(), there is a good chance that this
WARN_ON_ONCE() will be hit due to dangling pointers pointing at
freed memory (even though the sync drivers did nothing wrong)
and memory corruption will ensue as the list entry is removed for
a second time, corrupting the active list.

This problem can be reproduced quite easily with CONFIG_DEBUG_LIST=y
and fences with more than one sync point.

Signed-off-by: Alistair Strachan alistair.strac...@imgtec.com
Cc: Maarten Lankhorst maarten.lankho...@canonical.com
Cc: Greg Kroah-Hartman gre...@linuxfoundation.org
Cc: Colin Cross ccr...@google.com
---
 drivers/staging/android/sync.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/android/sync.c b/drivers/staging/android/sync.c
index 7bdb62b..f83e00c 100644
--- a/drivers/staging/android/sync.c
+++ b/drivers/staging/android/sync.c
@@ -114,7 +114,7 @@ void sync_timeline_signal(struct sync_timeline *obj)
list_for_each_entry_safe(pt, next, obj-active_list_head,
 active_list) {
if (fence_is_signaled_locked(pt-base))
-   list_del(pt-active_list);
+   list_del_init(pt-active_list);
}
 
spin_unlock_irqrestore(obj-child_list_lock, flags);
-- 
2.1.4

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] sync: Fix memory corruption in sync_timeline_signal().

2015-03-24 Thread Alistair Strachan
The android_fence_release() function checks for active sync points
by calling list_empty() on the list head embedded on the sync
point. However, it is only valid to use list_empty() on nodes that
have been initialized with INIT_LIST_HEAD() or list_del_init().

Because the list entry has likely been removed from the active list
by sync_timeline_signal(), there is a good chance that this
WARN_ON_ONCE() will be hit due to dangling pointers pointing at
freed memory (even though the sync drivers did nothing wrong)
and memory corruption will ensue as the list entry is removed for
a second time, corrupting the active list.

This problem can be reproduced quite easily with CONFIG_DEBUG_LIST=y
and fences with more than one sync point.

Change-Id: Ie2a6bc1480bbcfdc14f9b385fca5a2b833effc05

Signed-off-by: Alistair Strachan alistair.strac...@imgtec.com
Cc: Maarten Lankhorst maarten.lankho...@canonical.com
Cc: Greg Kroah-Hartman gre...@linuxfoundation.org
Cc: Colin Cross ccr...@google.com
---
 drivers/staging/android/sync.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/android/sync.c b/drivers/staging/android/sync.c
index 7bdb62b..f83e00c 100644
--- a/drivers/staging/android/sync.c
+++ b/drivers/staging/android/sync.c
@@ -114,7 +114,7 @@ void sync_timeline_signal(struct sync_timeline *obj)
list_for_each_entry_safe(pt, next, obj-active_list_head,
 active_list) {
if (fence_is_signaled_locked(pt-base))
-   list_del(pt-active_list);
+   list_del_init(pt-active_list);
}
 
spin_unlock_irqrestore(obj-child_list_lock, flags);
-- 
2.1.4

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/