Re: [RFC] fs: add userspace critical mounts event support

2016-09-02 Thread Dmitry Torokhov
On Fri, Sep 2, 2016 at 9:11 PM, Linus Torvalds
 wrote:
> On Fri, Sep 2, 2016 at 5:20 PM, Luis R. Rodriguez  wrote:
>>
>> Thoughts ?
>
> I really think this is a horrible hack.
>
> It's basically the kernel giving up, and relying on user space to give
> a single flag, and it's broken nasty crap.  Worse, it's broken nasty
> crap with a user interface, so we'll be stuck with it forever. Please
> no.

I agree that interface is bad, but I do believe we need something like this...

>
> What are the drivers that need this, and why can't those drivers just
> be fixed to not do braindead things?

Like what? Some devices do need to have firmware loaded so we know
their capabilities, so we really can't push the firmware loading into
"open". If your touch controller for some reason decided to crap over
it's nvram and comes in bootloader mode it is nice for it to slurp in
config data or firmware so use does not have to trigger update
manually. And while the controller is in bootloader mode we can't
create input device because we do not know what capabilities to
declare.

These devices we want to probe asynchronously and simply tell firmware
loader to wait for firmware to become available. The problem we do not
know when to give up, since we do not know where the firmware might
be. But userspace knows and can tel us.

Thanks.

-- 
Dmitry
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC] fs: add userspace critical mounts event support

2016-09-02 Thread Luis R. Rodriguez
kernel_read_file_from_path() can try to read a file from
the system's filesystem. This is typically done for firmware
for instance, which lives in /lib/firmware. One issue with
this is that the kernel cannot know for sure when the real
final /lib/firmare/ is ready, and even if you use initramfs
drivers are currently initialized *first* prior to the initramfs
kicking off. During init we run through all init calls first
(do_initcalls()) and finally the initramfs is processed via
prepare_namespace():

do_basic_setup() {
   ...
   driver_init();
   ...
   do_initcalls();
   ...
}

kernel_init_freeable() {
   ...
   do_basic_setup();
   ...
   if (sys_access((const char __user *) ramdisk_execute_command, 0) != 0) {
  ramdisk_execute_command = NULL;
  prepare_namespace();
   }
}

This leaves a possible race between loading drivers and any uses
of kernel_read_file_from_path(). Because pivot_root() can be used,
this allows userspace further possibilities in terms of defining
when a kernel critical filesystem should be ready by.

We define kernel critical filesystems as filesystems which the
kernel needs for kernel_read_file_from_path(). Since only userspace
can know when kernel critical filesystems are mounted and ready,
let userspace notify the kernel of this, and enable a new kernel
configuration which lets the kernel wait for this event before
enabling reads from kernel_read_file_from_path().

A default timeout of 10s is used for now. You can override this
through the kernel-parameters using critical_mounts_timeout_ms=T
where T is in ms. cat /sys/kernel/critical_mounts_timeout_ms the
current system value.

When userspace is ready it can simply:

  echo 1 > /sys/kernel/critical_mounts_ready

Signed-off-by: Luis R. Rodriguez 
---

Note, this still leaves the puzzle of the fact that initramfs may carry
some firmware, and some drivers may be OK in using firmware from there,
the wait stuff would just get in the way. To address this I think can
perhaps instead check *one* for the file, and if its present immediately
give it back, we'd only resort to the wait in cases of failure.

Another thing -- other than firmware we have:

security/integrity/ima/ima_fs.c:rc = kernel_read_file_from_path(path, 
, , 0, READING_POLICY);
sound/oss/sound_firmware.h: err = kernel_read_file_from_path((char *)fn, 
(void **)fp, ,

What paths are these? So we can document the current uses in the Kconfig
at least.

Thoughts ?

 Documentation/kernel-parameters.txt |  6 +++
 drivers/base/Kconfig| 48 +++
 fs/exec.c   |  3 ++
 include/linux/fs.h  |  8 
 kernel/ksysfs.c | 77 +
 5 files changed, 142 insertions(+)

diff --git a/Documentation/kernel-parameters.txt 
b/Documentation/kernel-parameters.txt
index 8ccacc44622a..1af89faa9fc9 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -849,6 +849,12 @@ bytes respectively. Such letter suffixes can also be 
entirely omitted.
It will be ignored when crashkernel=X,high is not used
or memory reserved is below 4G.
 
+   critical_mounts_timeout_ms=T[KNL] timeout in ms
+   Format: 
+   Use this to override the kernel's default timeout for
+   waiting for critical system mount points to become
+   available.
+
cryptomgr.notests
 [KNL] Disable crypto self-tests
 
diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
index 12b4f5551501..21576c0a4898 100644
--- a/drivers/base/Kconfig
+++ b/drivers/base/Kconfig
@@ -25,6 +25,54 @@ config UEVENT_HELPER_PATH
  via /proc/sys/kernel/hotplug or via /sys/kernel/uevent_helper
  later at runtime.
 
+config CRITICAL_MOUNTS_WAIT
+   bool "Enable waiting for critical-filesystems-ready notification"
+   default n
+   help
+ Kernel subsystems and device drivers often need to read files
+ from the filesystem, however in doing this races are possible at
+ bootup -- the subsystem requesting the file might look for it in /
+ early in boot, but if we haven't yet mounted the real root
+ filesystem we'll just tell the subsystem the file is not present and
+ it will fail. Furthermore what path to the filesystem is used varies
+ depending on the subsystem. To help the kernel we provide the option
+ to let the kernel wait for all critical filesystems to mounted and
+ ready before letting the kernel start trying to read files from the
+ systems' filesystem. Since pivot_root() can be used and therefore a
+ system might be configured to change its / filesystem at bootup as
+ many times as it wishes, only userspace can realy know exactly when
+ all critical filesystems are 

Re: [PATCH v2 2/5] firmware: annotate thou shalt not request fw on init or probe

2016-09-02 Thread Luis R. Rodriguez
On Thu, Aug 25, 2016 at 09:41:33PM +0200, Luis R. Rodriguez wrote:
> On Thu, Aug 25, 2016 at 01:05:44PM +0200, Daniel Vetter wrote:
> > On Wed, Aug 24, 2016 at 10:39 PM, Luis R. Rodriguez  
> > wrote:
> > Some users want a fully running gfx stack 2s after power-on. There's
> > not even enough time to load an uefi or vga driver first. i915
> > directly initializes the gpu from power-on state on those.
> 
> I see.. thanks.
> 
> > >> I think what would work is loading the different subsystems of the
> > >> driver in parallel (we already do that largely)
> > >
> > > Init level stuff is actually pretty synchronous, and in fact both
> > > init and probe are called serially. There are a few subsystems which
> > > have been doing things a bit differently, but these are exceptions.
> > >
> > > When you say we already do this largely, can you describe a bit more
> > > precisely what you mean ?
> > 
> > Oh, this isn't subsystems as in linux device/driver model, but
> > different parts within the driver. We fire up a bunch of struct work
> > to get various bits done asynchronously.
> 
> Thanks for the clarification.

<-- snip -->

> > One of the proposals which would have worked for us died a while back:
> > 
> > https://lkml.org/lkml/2014/6/15/47
> 
> Interesting, the problem stated here, which comes from the fact (as clarified
> above) the rootfs comes up only *after* we run do_initcalls() as such there 
> are
> possible theoretical races even with request_firmware_nowait() -- as such that
> justifies even more the SmPL grammar rule to include even 
> request_firmware_nowait()
> on probe / init as this patch has.
> 
> Anyway yeah that patch has good intentions but its wrong for a slew of 
> reasons.
> The problem is the same as discussed with Bjorn recently. The solution
> discussed is that since only userspace knows when the *real* rootfs is ready
> (because of slew of reasons) the best we can do is have an event issued by
> userspace to inform us of that.

OK I have something I think we can build upon for this. Will send RFC.

  Luis
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] doc: ioctl: Add some clarifications to botching-up-ioctls

2016-09-02 Thread Arnd Bergmann
On Friday, September 2, 2016 3:42:24 PM CEST Laura Abbott wrote:
> - The guide currently says to pad the structure to a multiple of
>   64-bits. This is not necessary in cases where the structure contains
>   no 64-bit types. Clarify this concept to avoid unnecessary padding.
> - When using __u64 to hold user pointers, blindly trying to do a cast to
>   a void __user * may generate a warning on 32-bit systems about a cast
>   from an integer to a pointer of different size. There is a macro to
>   deal with this which hides an ugly double cast. Add a reference to
>   this macro.
> 
> Signed-off-by: Laura Abbott 
> 

Looks good to me,

Acked-by: Arnd Bergmann 
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] doc: ioctl: Add some clarifications to botching-up-ioctls

2016-09-02 Thread Laura Abbott

- The guide currently says to pad the structure to a multiple of
  64-bits. This is not necessary in cases where the structure contains
  no 64-bit types. Clarify this concept to avoid unnecessary padding.
- When using __u64 to hold user pointers, blindly trying to do a cast to
  a void __user * may generate a warning on 32-bit systems about a cast
  from an integer to a pointer of different size. There is a macro to
  deal with this which hides an ugly double cast. Add a reference to
  this macro.

Signed-off-by: Laura Abbott 
---
I can split these into two separate patches if necessary but it seemed simpler
just to colapse them into a single patch.
---
 Documentation/ioctl/botching-up-ioctls.txt | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/Documentation/ioctl/botching-up-ioctls.txt 
b/Documentation/ioctl/botching-up-ioctls.txt
index cc30b14..36138c6 100644
--- a/Documentation/ioctl/botching-up-ioctls.txt
+++ b/Documentation/ioctl/botching-up-ioctls.txt
@@ -34,15 +34,18 @@ will need to add a a 32-bit compat layer:
64-bit platforms do. So we always need padding to the natural size to get
this right.
 
- * Pad the entire struct to a multiple of 64-bits - the structure size will
-   otherwise differ on 32-bit versus 64-bit. Having a different structure size
-   hurts when passing arrays of structures to the kernel, or if the kernel
-   checks the structure size, which e.g. the drm core does.
+ * Pad the entire struct to a multiple of 64-bits if the structure contains
+   64-bit types - the structure size will otherwise differ on 32-bit versus
+   64-bit. Having a different structure size hurts when passing arrays of
+   structures to the kernel, or if the kernel checks the structure size, which
+   e.g. the drm core does.
 
  * Pointers are __u64, cast from/to a uintprt_t on the userspace side and
from/to a void __user * in the kernel. Try really hard not to delay this
conversion or worse, fiddle the raw __u64 through your code since that
-   diminishes the checking tools like sparse can provide.
+   diminishes the checking tools like sparse can provide. The macro
+   u64_to_user_ptr can be used in the kernel to avoid warnings about integers
+   and pointres of different sizes.
 
 
 Basics
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 2/5] firmware: annotate thou shalt not request fw on init or probe

2016-09-02 Thread Luis R. Rodriguez
On Wed, Aug 24, 2016 at 10:17:38AM +0200, Gabriel Paubert wrote:
> On Tue, Aug 23, 2016 at 05:45:04PM -0700, mcg...@kernel.org wrote:
> 
> [snip]
> > ---
> >  Documentation/firmware_class/README|  20 
> >  drivers/base/Kconfig   |   2 +-
> >  .../request_firmware-avoid-init-probe-init.cocci   | 130 
> > +
> >  3 files changed, 151 insertions(+), 1 deletion(-)
> >  create mode 100644 
> > scripts/coccinelle/api/request_firmware-avoid-init-probe-init.cocci
> > 
> > diff --git a/Documentation/firmware_class/README 
> > b/Documentation/firmware_class/README
> > index cafdca8b3b15..056d1cb9d365 100644
> > --- a/Documentation/firmware_class/README
> > +++ b/Documentation/firmware_class/README
> > @@ -93,6 +93,26 @@
> > user contexts to request firmware asynchronously, but can't be called
> > in atomic contexts.
> >  
> > +Requirements:
> > +=
> > +
> > +You should avoid at all costs requesting firmware on both init and probe 
> > paths
> > +of your device driver. Reason for this is the complexity needed to ensure a
> > +firmware will be available for a driver early in boot through different
> > +build configurations. Consider built-in drivers needing firmware early, or
> > +consider a driver assuming it will only get firmware after pivot_root().
> > +
> > +Drivers that really need firmware early should use stuff the firmware in
> 
> Minor grammatical nit: s/use//

Fixed, thanks.

  Luis
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH v2 07/20] x86: Provide general kernel support for memory encryption

2016-09-02 Thread Borislav Petkov
On Mon, Aug 22, 2016 at 05:36:46PM -0500, Tom Lendacky wrote:
> Adding general kernel support for memory encryption includes:
> - Modify and create some page table macros to include the Secure Memory
>   Encryption (SME) memory encryption mask
> - Update kernel boot support to call an SME routine that checks for and
>   sets the SME capability (the SME routine will grow later and for now
>   is just a stub routine)
> - Update kernel boot support to call an SME routine that encrypts the
>   kernel (the SME routine will grow later and for now is just a stub
>   routine)
> - Provide an SME initialization routine to update the protection map with
>   the memory encryption mask so that it is used by default
> 
> Signed-off-by: Tom Lendacky 
> ---

...

> diff --git a/arch/x86/include/asm/mem_encrypt.h 
> b/arch/x86/include/asm/mem_encrypt.h
> index 747fc52..9f3e762 100644
> --- a/arch/x86/include/asm/mem_encrypt.h
> +++ b/arch/x86/include/asm/mem_encrypt.h
> @@ -15,12 +15,21 @@
>  
>  #ifndef __ASSEMBLY__
>  
> +#include 
> +
>  #ifdef CONFIG_AMD_MEM_ENCRYPT
>  
>  extern unsigned long sme_me_mask;
>  
>  u8 sme_get_me_loss(void);
>  
> +void __init sme_early_init(void);
> +
> +#define __sme_pa(x)  (__pa((x)) | sme_me_mask)
> +#define __sme_pa_nodebug(x)  (__pa_nodebug((x)) | sme_me_mask)
> +
> +#define __sme_va(x)  (__va((x) & ~sme_me_mask))

So I'm wondering: why not push the masking off of the SME mask into the
__va() macro instead of defining a specific __sme_va() one?

I mean, do you even see cases where __va() would need to have to
sme_mask left in the virtual address?

Because if not, you could mask it out in __va() so that all __va() users
get the "clean" va, without the enc bits.

Hmmm.

Btw, this patch is huuuge. It would be nice if you could split it, if
possible...

Thanks.

-- 
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v15 04/13] task_isolation: add initial support

2016-09-02 Thread Andy Lutomirski
On Sep 2, 2016 7:04 AM, "Chris Metcalf"  wrote:
>
> On 8/30/2016 3:50 PM, Andy Lutomirski wrote:
>>
>> On Tue, Aug 30, 2016 at 12:37 PM, Chris Metcalf  
>> wrote:
>>>
>>> On 8/30/2016 2:43 PM, Andy Lutomirski wrote:

 What if we did it the other way around: set a percpu flag saying
 "going quiescent; disallow new deferred work", then finish all
 existing work and return to userspace.  Then, on the next entry, clear
 that flag.  With the flag set, vmstat would just flush anything that
 it accumulates immediately, nothing would be added to the LRU list,
 etc.
>>>
>>>
>>> This is an interesting idea!
>>>
>>> However, there are a number of implementation ideas that make me
>>> worry that it might be a trickier approach overall.
>>>
>>> First, "on the next entry" hides a world of hurt in four simple words.
>>> Some platforms (arm64 and tile, that I'm familiar with) have a common
>>> chunk of code that always runs on every entry to the kernel.  It would
>>> not be too hard to poke at the assembly and make those platforms
>>> always run some task-isolation specific code on entry.  But x86 scares
>>> me - there seem to be a whole lot of ways to get into the kernel, and
>>> I'm not convinced there is a lot of shared macrology or whatever that
>>> would make it straightforward to intercept all of them.
>>
>> Just use the context tracking entry hook.  It's 100% reliable.  The
>> relevant x86 function is enter_from_user_mode(), but I would just hook
>> into user_exit() in the common code.  (This code is had better be
>> reliable, because context tracking depends on it, and, if context
>> tracking doesn't work on a given arch, then isolation isn't going to
>> work regardless.
>
>
> This looks a lot cleaner than last time I looked at the x86 code. So yes, I 
> think
> we could do an entry-point approach plausibly now.
>
> This is also good for when we want to look at deferring the kernel TLB flush,
> since it's the same mechanism that would be required for that.
>
>

There's at least one gotcha for the latter: NMIs aren't currently
guaranteed to go through context tracking.  Instead they use their own
RCU hooks.  Deferred TLB flushes can still be made to work, but a bit
more care will be needed.  I would probably approach it with an
additional NMI hook in the same places as rcu_nmi_enter() that does,
more or less:

if (need_tlb_flush) flush();

and then make sure that the normal exit hook looks like:

if (need_tlb_flush) {
  flush();
  barrier(); /* An NMI must not see !need_tlb_flush if the TLB hasn't
been flushed */
  flush the TLB;
}

>>> But it does seem like we are adding noticeable maintenance cost on
>>> the mainline kernel to support task isolation by doing this.  My guess
>>> is that it is easier to support the kind of "are you clean?" / "get clean"
>>> APIs for subsystems, rather than weaving a whole set of "stay clean"
>>> mechanism into each subsystem.
>>
>> My intuition is that it's the other way around.  For the mainline
>> kernel, having a nice clean well-integrated implementation is nicer
>> than having a bolted-on implementation that interacts in potentially
>> complicated ways.  Once quiescence support is in mainline, the size of
>> the diff or the degree to which it's scattered around is irrelevant
>> because it's not a diff any more.
>
>
> I'm not concerned with the size of the diff, just with the intrusiveness into
> the various subsystems.
>
> That said, code talks, so let me take a swing at doing it the way you suggest
> for vmstat/lru and we'll see what it looks like.

Thanks :)

>
>
>>> So to pop up a level, what is your actual concern about the existing
>>> "do it in a loop" model?  The macrology currently in use means there
>>> is zero cost if you don't configure TASK_ISOLATION, and the software
>>> maintenance cost seems low since the idioms used for task isolation
>>> in the loop are generally familiar to people reading that code.
>>
>> My concern is that it's not obvious to readers of the code that the
>> loop ever terminates.  It really ought to, but it's doing something
>> very odd.  Normally we can loop because we get scheduled out, but
>> actually blocking in the return-to-userspace path, especially blocking
>> on a condition that doesn't have a wakeup associated with it, is odd.
>
>
> True, although, comments :-)
>
> Regardless, though, this doesn't seem at all weird to me in the
> context of the vmstat and lru stuff, though.  It's exactly parallel to
> the fact that we loop around on checking need_resched and signal, and
> in some cases you could imagine multiple loops around when we schedule
> out and get a signal, so loop around again, and then another
> reschedule event happens during signal processing so we go around
> again, etc.  Eventually it settles down.  It's the same with the
> vmstat/lru stuff.

Only kind of.

When we say, effectively, while (need_resched()) schedule();, we're
not waiting for an 

Re: [PATCH] Fix how CSS files are added to the Sphinx HTML.

2016-09-02 Thread Eric Holscher
On Fri, Sep 2, 2016 at 12:38 AM, Markus Heiser
 wrote:
>
>
>
> >
> > Signed-off-by: Eric Holscher 
> > ---
> > Documentation/conf.py | 14 --
> > 1 file changed, 8 insertions(+), 6 deletions(-)
> >
> > diff --git a/Documentation/conf.py b/Documentation/conf.py
> > index 96b7aa6..1707193 100644
> > --- a/Documentation/conf.py
> > +++ b/Documentation/conf.py
> > @@ -179,12 +179,6 @@ except ImportError:
> >
> > html_static_path = ['sphinx-static']
> >
> > -html_context = {
> > -'css_files': [
> > -'_static/theme_overrides.css',
> > -],
> > -}
> > -
> > # Add any extra paths that contain custom files (such as robots.txt or
> > # .htaccess) here, relative to this directory. These files are copied
> > # directly to the root of the documentation.
> > @@ -419,3 +413,11 @@ pdf_documents = [
> > # line arguments.
> > kerneldoc_bin = '../scripts/kernel-doc'
> > kerneldoc_srctree = '..'
> > +
> > +
> > +def setup(app):
> > +"""
> > +This is a basic Sphinx extension that adds our CSS overrides.
> > +"""
> > +app.add_stylesheet('_static/theme_overrides.css')
>
> Hmm, this will not work, Since the '_static' folder is not needed
> to name here. As far as I know, with
>
>html_static_path = ['sphinx-static']
>
> everything from the 'sphinx-static' is copied to HTML builder's
> '_static' folder. And a (corrected)
>
>   app.add_stylesheet('theme_overrides.css')
>
> will insert a:
>
> 
>
> in the HTML header of the builder (the '_static' folder is added by the HTML
> builder).
>
> Anyway, let me say thanks ... and it would be great if you add
> more of your value to the ML ;-)

I feel silly now! I actually only tested it on Read the Docs, where I
set up a project building from my remote on GitHub. This had the
correct styles working again, but you're correct that it is actually
404'ing on the ``_static/_static/theme_override.css``.

I've updated my patch on my remote to have the correct line, and it's
working correctly now: http://kernel-mirror.readthedocs.io/en/latest/
-- it has the correct HTML:



I'm happy to resend the whole patch, or feel free to simply edit my
original with your suggested
``app.add_stylesheet('theme_overrides.css')`` line.

Cheers,
Eric

-- 
Eric Holscher
Maker of the internet residing in Portland, Oregon
http://ericholscher.com
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v15 04/13] task_isolation: add initial support

2016-09-02 Thread Peter Zijlstra
On Fri, Sep 02, 2016 at 10:03:52AM -0400, Chris Metcalf wrote:
> Any thoughts on the question of "just re-enter the loop" vs. 
> schedule_timeout()?

schedule_timeout() should only be used for things you do not have
control over, like things outside of the machine.

If you want to actually block running, use that completion you were
talking of.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v15 04/13] task_isolation: add initial support

2016-09-02 Thread Chris Metcalf

On 8/30/2016 3:50 PM, Andy Lutomirski wrote:

On Tue, Aug 30, 2016 at 12:37 PM, Chris Metcalf  wrote:

On 8/30/2016 2:43 PM, Andy Lutomirski wrote:

What if we did it the other way around: set a percpu flag saying
"going quiescent; disallow new deferred work", then finish all
existing work and return to userspace.  Then, on the next entry, clear
that flag.  With the flag set, vmstat would just flush anything that
it accumulates immediately, nothing would be added to the LRU list,
etc.


This is an interesting idea!

However, there are a number of implementation ideas that make me
worry that it might be a trickier approach overall.

First, "on the next entry" hides a world of hurt in four simple words.
Some platforms (arm64 and tile, that I'm familiar with) have a common
chunk of code that always runs on every entry to the kernel.  It would
not be too hard to poke at the assembly and make those platforms
always run some task-isolation specific code on entry.  But x86 scares
me - there seem to be a whole lot of ways to get into the kernel, and
I'm not convinced there is a lot of shared macrology or whatever that
would make it straightforward to intercept all of them.

Just use the context tracking entry hook.  It's 100% reliable.  The
relevant x86 function is enter_from_user_mode(), but I would just hook
into user_exit() in the common code.  (This code is had better be
reliable, because context tracking depends on it, and, if context
tracking doesn't work on a given arch, then isolation isn't going to
work regardless.


This looks a lot cleaner than last time I looked at the x86 code. So yes, I 
think
we could do an entry-point approach plausibly now.

This is also good for when we want to look at deferring the kernel TLB flush,
since it's the same mechanism that would be required for that.


But it does seem like we are adding noticeable maintenance cost on
the mainline kernel to support task isolation by doing this.  My guess
is that it is easier to support the kind of "are you clean?" / "get clean"
APIs for subsystems, rather than weaving a whole set of "stay clean"
mechanism into each subsystem.

My intuition is that it's the other way around.  For the mainline
kernel, having a nice clean well-integrated implementation is nicer
than having a bolted-on implementation that interacts in potentially
complicated ways.  Once quiescence support is in mainline, the size of
the diff or the degree to which it's scattered around is irrelevant
because it's not a diff any more.


I'm not concerned with the size of the diff, just with the intrusiveness into
the various subsystems.

That said, code talks, so let me take a swing at doing it the way you suggest
for vmstat/lru and we'll see what it looks like.


So to pop up a level, what is your actual concern about the existing
"do it in a loop" model?  The macrology currently in use means there
is zero cost if you don't configure TASK_ISOLATION, and the software
maintenance cost seems low since the idioms used for task isolation
in the loop are generally familiar to people reading that code.

My concern is that it's not obvious to readers of the code that the
loop ever terminates.  It really ought to, but it's doing something
very odd.  Normally we can loop because we get scheduled out, but
actually blocking in the return-to-userspace path, especially blocking
on a condition that doesn't have a wakeup associated with it, is odd.


True, although, comments :-)

Regardless, though, this doesn't seem at all weird to me in the
context of the vmstat and lru stuff, though.  It's exactly parallel to
the fact that we loop around on checking need_resched and signal, and
in some cases you could imagine multiple loops around when we schedule
out and get a signal, so loop around again, and then another
reschedule event happens during signal processing so we go around
again, etc.  Eventually it settles down.  It's the same with the
vmstat/lru stuff.


Also, this cond_resched stuff doesn't worry me too much at a
fundamental level -- if we're really going quiescent, shouldn't we be
able to arrange that there are no other schedulable tasks on the CPU
in question?

We aren't currently planning to enforce things in the scheduler, so if
the application affinitizes another task on top of an existing task
isolation task, by default the task isolation task just dies. (Unless
it's using NOSIG mode, in which case it just ends up stuck in the
kernel trying to wait out the dyntick until you either kill it, or
re-affinitize the offending task.)  But I'm reluctant to guarantee
every possible way that you might (perhaps briefly) have some
schedulable task, and the current approach seems pretty robust if that
sort of thing happens.

This kind of waiting out the dyntick scares me.  Why is there ever a
dyntick that you're waiting out?  If quiescence is to be a supported
mainline feature, shouldn't the scheduler be integrated well enough
with it that you don't need to wait 

Re: [PATCH] doc-rst:sphinx-extensions: add metadata parallel-safe

2016-09-02 Thread Mauro Carvalho Chehab
Em Thu, 1 Sep 2016 18:22:09 +0200
Markus Heiser  escreveu:

> Am 01.09.2016 um 16:29 schrieb Jani Nikula :
> 
> > On Thu, 01 Sep 2016, Jonathan Corbet  wrote:  
> >> On Wed, 24 Aug 2016 15:35:24 +0200
> >> Markus Heiser  wrote:
> >>   
> >>> With metadata "parallel_read_safe = True" a extension is marked as
> >>> save for "parallel reading of source". This is needed if you want
> >>> build in parallel with N processes. E.g.:
> >>> 
> >>>  make SPHINXOPTS=-j4 htmldocs  
> >> 
> >> A definite improvement; applied to the docs tree, thanks.  
> > 
> > The Sphinx docs say -jN "should be considered experimental" [1]. Any
> > idea *how* experimental that is, really? Could we add some -j by
> > default?  
> 
> My experience is, that parallel build is only strong on "reading
> input" and weak on "writing output". I can't see any rich performance
> increase on more than -j2 ... 
> 
> Mauro posted [2] his experience with -j8 compared to serial. He
> also compares -j8 to -j16:
> 
> > PS: on my server with 16 dual-thread Xeon CPU, the gain with a
> > bigger value for -j was not impressive. Got about the same time as
> > with -j8 or -j32 there.  
> 
> I guess he will get nearly the same results with -j2 ;)
> 
> If we want to add a -j default, I suggest -j2. 

Actually, here I got better results with -j4, on my NUC with
one quad-core, 8 threads Intel(R) Core(TM) i7-6770HQ CPU @ 2.60GHz
and m2SATA SSD disk. 

This is with -j2:

$ make DOCBOOKS="" SPHINXOPTS="-j2" SPHINXDIRS=media 
SPHINX_CONF="conf_nitpick.py" htmldocs 2>err

real0m46.568s
user1m0.676s
sys 0m3.019s

And this is with -j4:

$ make DOCBOOKS="" SPHINXOPTS="-j4" SPHINXDIRS=media 
SPHINX_CONF="conf_nitpick.py" htmldocs 2>err

real0m25.356s
user1m1.408s
sys 0m2.912s

Btw, this is the result on a dual octa-core Intel(R) Xeon(R) CPU E5-2670 0 
@ 2.60GHz, CPU (total of 32 threads), using PCIe SSD disks:

$ for i in $(seq 32 -1 1); do echo "with SPHINXOPTS= -j$i"; make 
cleandocs;/usr/bin/time --format="real %E nuser %U sys %S" make DOCBOOKS=""  
SPHINXOPTS="-j$i" SPHINXDIRS=media SPHINX_CONF="conf.py" SPHINXDIRS=media 
htmldocs >err; done
with SPHINXOPTS= -j32
make[2]: warning: jobserver unavailable: using -j1.  Add '+' to parent make 
rule.
real 0:23.05 nuser 98.77 sys 6.45
with SPHINXOPTS= -j31
make[2]: warning: jobserver unavailable: using -j1.  Add '+' to parent make 
rule.
real 0:22.81 nuser 97.80 sys 6.12
with SPHINXOPTS= -j30
make[2]: warning: jobserver unavailable: using -j1.  Add '+' to parent make 
rule.
real 0:23.16 nuser 97.68 sys 6.41
with SPHINXOPTS= -j29
make[2]: warning: jobserver unavailable: using -j1.  Add '+' to parent make 
rule.
real 0:22.74 nuser 98.02 sys 6.33
with SPHINXOPTS= -j28
make[2]: warning: jobserver unavailable: using -j1.  Add '+' to parent make 
rule.
real 0:23.18 nuser 95.75 sys 6.14
with SPHINXOPTS= -j27
make[2]: warning: jobserver unavailable: using -j1.  Add '+' to parent make 
rule.
real 0:22.67 nuser 96.66 sys 6.27
with SPHINXOPTS= -j26
make[2]: warning: jobserver unavailable: using -j1.  Add '+' to parent make 
rule.
real 0:22.66 nuser 95.93 sys 6.50
with SPHINXOPTS= -j25
make[2]: warning: jobserver unavailable: using -j1.  Add '+' to parent make 
rule.
real 0:23.71 nuser 95.43 sys 6.43
with SPHINXOPTS= -j24
make[2]: warning: jobserver unavailable: using -j1.  Add '+' to parent make 
rule.
real 0:23.71 nuser 94.27 sys 6.42
with SPHINXOPTS= -j23
make[2]: warning: jobserver unavailable: using -j1.  Add '+' to parent make 
rule.
real 0:23.46 nuser 93.85 sys 6.35
with SPHINXOPTS= -j22
make[2]: warning: jobserver unavailable: using -j1.  Add '+' to parent make 
rule.
real 0:23.28 nuser 91.52 sys 6.29
with SPHINXOPTS= -j21
make[2]: warning: jobserver unavailable: using -j1.  Add '+' to parent make 
rule.
real 0:23.93 nuser 90.37 sys 6.14
with SPHINXOPTS= -j20
make[2]: warning: jobserver unavailable: using -j1.  Add '+' to parent make 
rule.
real 0:24.88 nuser 91.40 sys 6.36
with SPHINXOPTS= -j19
make[2]: warning: jobserver unavailable: using -j1.  Add '+' to parent make 
rule.
real 0:24.00 nuser 89.68 sys 5.82
with SPHINXOPTS= -j18
make[2]: warning: jobserver unavailable: using -j1.  Add '+' to parent make 
rule.
real 0:23.73 nuser 89.68 sys 5.92
with SPHINXOPTS= -j17
make[2]: warning: jobserver unavailable: using -j1.  Add '+' to parent make 
rule.
real 0:23.75 nuser 87.85 sys 5.58
with SPHINXOPTS= -j16
make[2]: warning: jobserver unavailable: using -j1.  Add '+' to parent make 
rule.
real 0:24.54 nuser 87.87 sys 5.94
with SPHINXOPTS= -j15
make[2]: warning: jobserver unavailable: using -j1.  Add '+' to parent make 
rule.
real 0:25.45 nuser 88.25 sys 6.28
with SPHINXOPTS= -j14
make[2]: warning: jobserver unavailable: using -j1.  Add '+' to parent make 
rule.
real 0:23.79 nuser 87.23 sys 5.80
with SPHINXOPTS= -j13
make[2]: warning: jobserver unavailable: using -j1.  Add '+' to parent make 
rule.
real 0:24.15 

Re: [PATCH v4 2/4] drm: Add API for capturing frame CRCs

2016-09-02 Thread Emil Velikov
Hi Tomeu,

On 5 August 2016 at 11:45, Tomeu Vizoso  wrote:

> +#ifdef CONFIG_DEBUG_FS
> +   spin_lock_init(>crc.lock);
> +   init_waitqueue_head(>crc.wq);
> +   crtc->crc.source = kstrdup("auto", GFP_KERNEL);
Pedantic: kstrdup() can never fail ?

> +#endif
> +
> if (drm_core_check_feature(dev, DRIVER_ATOMIC)) {
> drm_object_attach_property(>base, config->prop_active, 
> 0);
> drm_object_attach_property(>base, config->prop_mode_id, 
> 0);
> @@ -764,6 +771,11 @@ void drm_crtc_cleanup(struct drm_crtc *crtc)
>  * the indices on the drm_crtc after us in the crtc_list.
>  */
>
> +#ifdef CONFIG_DEBUG_FS
> +   drm_debugfs_crtc_remove(crtc);
> +   kfree(crtc->crc.source);
> +#endif
> +
Worth adding these two as static inlines in the header ?

Things are a bit asymetrical here. drm_debugfs_crtc_add() is in
drm_minor_register(), so why isn't drm_debugfs_crtc_remove() in
drm_minor_free() ?



> -#endif /* CONFIG_DEBUG_FS */
> +int drm_debugfs_crtc_add(struct drm_crtc *crtc)
> +{
> +   struct drm_minor *minor = crtc->dev->primary;
> +   struct dentry *root;
> +   char *name;
> +
> +   name = kasprintf(GFP_KERNEL, "crtc-%d", crtc->index);
Mildly related: some userspace projects define convenience helpers
which you use in order to allocate the correct amount of space on
stack.
Is there such a set for the kernel ?

With those the above will become something like:
char name[5+DECIMAL_STR_MAX] = ksprintf("crtc-%d", crtc->index);



> --- /dev/null
> +++ b/drivers/gpu/drm/drm_debugfs_crc.c

> + *
> + * Authors:
> + *Eric Anholt 
> + *Keith Packard 
> + *
Wondering about these authorship lines - is the code copied/derived
from somewhere ? If so please mention where from.

> +/*
> + * 1 frame field of 10 chars plus a number of CRC fields of 10 chars each, 
> space
> + * separated and with a newline at the end.
> + */
> +#define LINE_LEN(VALUES_CNT)   (10 + 11 * VALUES_CNT + 1 + 1)
Hmm I seem to suck at math. The macro seems larger than expected (comment and
code wise) by 1, right ?

> +#define MAX_LINE_LEN   (LINE_LEN(DRM_MAX_CRC_NR))
> +
Worth using lowercase VALUES_CNT and less likely to clash macro names ?



> +   BUILD_BUG_ON_NOT_POWER_OF_2(DRM_CRC_ENTRIES_NR);
> +   crc->tail = (crc->tail + 1) & (DRM_CRC_ENTRIES_NR - 1);
It's strange that the kernel does not have a macro for this. But for the sake
of me I cannot find one.



> --- a/drivers/gpu/drm/drm_drv.c
> +++ b/drivers/gpu/drm/drm_drv.c
> @@ -192,6 +192,7 @@ static void drm_minor_free(struct drm_device *dev, 
> unsigned int type)
>  static int drm_minor_register(struct drm_device *dev, unsigned int type)
>  {
> struct drm_minor *minor;
> +   struct drm_crtc *crtc;
> unsigned long flags;
> int ret;
>
> @@ -207,6 +208,14 @@ static int drm_minor_register(struct drm_device *dev, 
> unsigned int type)
> return ret;
> }
>
> +   if (type == DRM_MINOR_LEGACY) {
Sidenote: If we did not use DRM_MINOR_LEGACY for render-only GPUs we would
save a couple of cycles here :-)

> +   drm_for_each_crtc(crtc, dev) {
> +   ret = drm_debugfs_crtc_add(crtc);
> +   if (ret)
> +   goto err_debugfs;
IMHO we don't want to bring down the whole setup if this fails, unlike
where we cannot create debug/dri all together.

Regardless: we seems to be missing the cleanup in the error path ?



> +#if defined(CONFIG_DEBUG_FS)
> +extern const struct file_operations drm_crc_control_fops;
> +extern const struct file_operations drm_crtc_crc_fops;
These seems to be named differently in the code. Namely:

drm_crtc_crc_control_fops
drm_crtc_crc_data_fops

Then again, you don't seem to use the outside of the file, so I'd make them
static and drop this hunk.



>  * before data structures are torndown.
>  */
> void (*early_unregister)(struct drm_crtc *crtc);
> +
> +   /**
> +* @set_crc_source:
> +*
> +* Changes the source of CRC checksums of frames at the request of
> +* userspace, typically for testing purposes. The sources available 
> are
> +* specific of each driver and a %NULL value indicates that CRC
> +* generation is to be switched off.
> +*
> +* When CRC generation is enabled, the driver should call
> +* drm_crtc_add_crc_entry() at each frame, providing any information
> +* that characterizes the frame contents in the crcN arguments, as
> +* provided from the configured source. Drivers should accept a "auto"
Very nicely written documentation. Just to make it more obvious - s/should/must/


> +/**
> + * struct drm_crtc_crc_entry - entry describing a frame's content
> + * @frame: number of the frame this CRC is about
> + * @crc: array of values that characterize the frame
> + */

Re: [PATCH V4] leds: trigger: Introduce an USB port trigger

2016-09-02 Thread Alan Stern
On Fri, 2 Sep 2016, Jacek Anaszewski wrote:

> >>> I'm pretty sure noone ever planned to have more than 1 trigger
> >>> assigned to a single LED. I just realized there will be a problem with
> >>> proposed solution: sysfs files conflict.

...

> >> Currently we support only triggers dedicated to specific type of
> >> devices. Even in case of triggers that don't expose custom sysfs
> >> attributes, registered with led_trigger_register_simple(), device
> >> drivers have to generate trigger event with dedicated function, e.g:
> >>
> >> - ledtrig-cpu: void ledtrig_cpu(enum cpu_led_event ledevt)
> >> - ledtrig-disk: void ledtrig_disk_activity(void)
> >> - ledtrig-mtd: void ledtrig_mtd_activity(void)
> >>
> >> If one wanted to have the LED notified by different type of devices,
> >> then they would have to implement a trigger that would exposed all
> >> required types of API.
> >>
> >> Unfortunately, there are many possible combinations of
> >> triggers and that doesn't sound sane to add a new one when someone
> >> will find it useful. Of course it would entail adding a call to the
> >> new trigger API in the drivers, which doesn't seem like something
> >> acceptable in the mainline.
> >
> > Maybe it would make more sense, in this case, to allow only three
> > possibilities for a USB port activity trigger.  Toggle the LED
> > whenever:
> >
> > There is activity on the specified port, or
> >
> > There is any activity on any port on the specified hub, or
> >
> > There is any USB activity on any port.
> >
> > That ought to cover most of the normal use cases, and it would be
> > simple enough to implement.
> 
> What would be the benefit of having a USB port activity trigger,
> for which we would be specifying the port to observe, but in the same
> time we would react on any activity on any port (cases 1 and 3)?

I meant these three cases to be mutually exclusive.  For a given LED,
you could have only one of those trigger types (like mentioned above,
only one trigger per LED).  For example, you might accept any one of:

echo usb1-4.2 >/sys/class/led/foo/trigger

echo hub1-4 >/sys/class/led/foo/trigger

echo usb >/sys/class/led/foo/trigger

Yes, it would be possible to have a port-specific trigger for one LED
and an overall USB activity trigger for another LED.  I don't know how
useful this would be -- you could probably imagine some unlikely
scenario.

The point is that doing things this way wouldn't require any API
violations, and it would allow users to do almost all of the things
they are likely to want.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v15 04/13] task_isolation: add initial support

2016-09-02 Thread Chris Metcalf

On 9/1/2016 6:06 AM, Peter Zijlstra wrote:

On Tue, Aug 30, 2016 at 11:32:16AM -0400, Chris Metcalf wrote:

On 8/30/2016 3:58 AM, Peter Zijlstra wrote:

What !? I really don't get this, what are you waiting for? Why is
rescheduling making things better.

We need to wait for the last dyntick to fire before we can return to
userspace.  There are plenty of options as to what we can do in the
meanwhile.

Why not keep your _TIF_TASK_ISOLATION_FOO flag set and re-enter the
loop?

I really don't see how setting TIF_NEED_RESCHED is helping anything.


Yes, I think I addressed that in an earlier reply to Frederic; and you're right,
I don't think TIF_NEED_RESCHED or schedule() are the way to go.

https://lkml.kernel.org/g/107bd666-dbcf-7fa5-ff9c-f79358899...@mellanox.com

Any thoughts on the question of "just re-enter the loop" vs. schedule_timeout()?

--
Chris Metcalf, Mellanox Technologies
http://www.mellanox.com

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH v2 05/20] x86: Add the Secure Memory Encryption cpu feature

2016-09-02 Thread Borislav Petkov
On Mon, Aug 22, 2016 at 05:36:22PM -0500, Tom Lendacky wrote:
> Update the cpu features to include identifying and reporting on the
> Secure Memory Encryption feature.
> 
> Signed-off-by: Tom Lendacky 
> ---
>  arch/x86/include/asm/cpufeature.h|7 +--
>  arch/x86/include/asm/cpufeatures.h   |5 -
>  arch/x86/include/asm/disabled-features.h |3 ++-
>  arch/x86/include/asm/required-features.h |3 ++-
>  arch/x86/kernel/cpu/scattered.c  |1 +
>  5 files changed, 14 insertions(+), 5 deletions(-)

...

> diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
> index 8cb57df..d86d9a5 100644
> --- a/arch/x86/kernel/cpu/scattered.c
> +++ b/arch/x86/kernel/cpu/scattered.c
> @@ -37,6 +37,7 @@ void init_scattered_cpuid_features(struct cpuinfo_x86 *c)
>   { X86_FEATURE_HW_PSTATE,CR_EDX, 7, 0x8007, 0 },
>   { X86_FEATURE_CPB,  CR_EDX, 9, 0x8007, 0 },
>   { X86_FEATURE_PROC_FEEDBACK,CR_EDX,11, 0x8007, 0 },
> + { X86_FEATURE_SME,  CR_EAX, 0, 0x801f, 0 },

If this is in scattered CPUID features, it doesn't need any of the
(snipped) changes above - you solely need to reuse one of the free
defines, i.e., something like this:

---
--- a/arch/x86/include/asm/cpufeatures.h2016-09-02 15:49:08.853374323 
+0200
+++ b/arch/x86/include/asm/cpufeatures.h2016-09-02 15:52:34.477365610 
+0200
@@ -100,7 +100,7 @@
 #define X86_FEATURE_XTOPOLOGY  ( 3*32+22) /* cpu topology enum extensions */
 #define X86_FEATURE_TSC_RELIABLE ( 3*32+23) /* TSC is known to be reliable */
 #define X86_FEATURE_NONSTOP_TSC( 3*32+24) /* TSC does not stop in C 
states */
-/* free, was #define X86_FEATURE_CLFLUSH_MONITOR ( 3*32+25) * "" clflush reqd 
with monitor */
+#define X86_FEATURE_SME( 3*32+25) /* Secure Memory Encryption 
*/
 #define X86_FEATURE_EXTD_APICID( 3*32+26) /* has extended APICID (8 
bits) */
 #define X86_FEATURE_AMD_DCM ( 3*32+27) /* multi-node processor */
 #define X86_FEATURE_APERFMPERF ( 3*32+28) /* APERFMPERF */
--- a/arch/x86/kernel/cpu/scattered.c   2016-09-02 15:48:52.753375005 +0200
+++ b/arch/x86/kernel/cpu/scattered.c   2016-09-02 15:51:32.437368239 +0200
@@ -37,6 +37,7 @@ void init_scattered_cpuid_features(struc
{ X86_FEATURE_HW_PSTATE,CR_EDX, 7, 0x8007, 0 },
{ X86_FEATURE_CPB,  CR_EDX, 9, 0x8007, 0 },
{ X86_FEATURE_PROC_FEEDBACK,CR_EDX,11, 0x8007, 0 },
+   { X86_FEATURE_SME,  CR_EAX, 0, 0x801f, 0 },
{ 0, 0, 0, 0, 0 }
};

-- 
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 12/18] arm64: ilp32: add sys_ilp32.c and a separate table (in entry.S) to use it

2016-09-02 Thread Yury Norov
On Fri, Sep 02, 2016 at 02:55:34PM +0200, Arnd Bergmann wrote:
> On Friday, September 2, 2016 6:46:19 PM CEST Bamvor Jian Zhang wrote:
> > diff --git a/arch/arm64/include/uapi/asm/unistd.h 
> > b/arch/arm64/include/uapi/asm/unistd.h
> > index 043d17a..78bea1d 100644
> > --- a/arch/arm64/include/uapi/asm/unistd.h
> > +++ b/arch/arm64/include/uapi/asm/unistd.h
> > @@ -16,4 +16,9 @@
> > 
> >  #define __ARCH_WANT_RENAMEAT
> > 
> > +/* We need to make sure it works for both userspace and 
> > kernel(sys_ilp32.c) */
> > +#if defined(__ILP32__) || defined(__SYSCALL_COMPAT)
> > +#define __ARCH_WANT_SYNC_FILE_RANGE2
> > +#endif
> > +
> >  #include 
> > diff --git a/arch/arm64/kernel/sys_ilp32.c b/arch/arm64/kernel/sys_ilp32.c
> > index 10fc0ca..13c9c9d 100644
> > --- a/arch/arm64/kernel/sys_ilp32.c
> > +++ b/arch/arm64/kernel/sys_ilp32.c
> > @@ -42,7 +42,7 @@
> >  #define compat_sys_pwrite64compat_sys_pwrite64_wrapper
> >  #define compat_sys_readahead   compat_sys_readahead_wrapper
> >  #define compat_sys_shmat   sys_shmat
> > -#define compat_sys_sync_file_range compat_sys_sync_file_range2_wrapper
> > +#define compat_sys_sync_file_range2compat_sys_sync_file_range2_wrapper
> >  #define compat_sys_truncate64  compat_sys_truncate64_wrapper
> >  #define sys_mmap2  compat_sys_mmap2_wrapper
> >  #define sys_ptrace compat_sys_ptrace
> > 
> 
> Looks good to me.
> 
>   Arnd

Thank you. I'll take it.

Yury.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 12/18] arm64: ilp32: add sys_ilp32.c and a separate table (in entry.S) to use it

2016-09-02 Thread Arnd Bergmann
On Friday, September 2, 2016 6:46:19 PM CEST Bamvor Jian Zhang wrote:
> diff --git a/arch/arm64/include/uapi/asm/unistd.h 
> b/arch/arm64/include/uapi/asm/unistd.h
> index 043d17a..78bea1d 100644
> --- a/arch/arm64/include/uapi/asm/unistd.h
> +++ b/arch/arm64/include/uapi/asm/unistd.h
> @@ -16,4 +16,9 @@
> 
>  #define __ARCH_WANT_RENAMEAT
> 
> +/* We need to make sure it works for both userspace and kernel(sys_ilp32.c) 
> */
> +#if defined(__ILP32__) || defined(__SYSCALL_COMPAT)
> +#define __ARCH_WANT_SYNC_FILE_RANGE2
> +#endif
> +
>  #include 
> diff --git a/arch/arm64/kernel/sys_ilp32.c b/arch/arm64/kernel/sys_ilp32.c
> index 10fc0ca..13c9c9d 100644
> --- a/arch/arm64/kernel/sys_ilp32.c
> +++ b/arch/arm64/kernel/sys_ilp32.c
> @@ -42,7 +42,7 @@
>  #define compat_sys_pwrite64compat_sys_pwrite64_wrapper
>  #define compat_sys_readahead   compat_sys_readahead_wrapper
>  #define compat_sys_shmat   sys_shmat
> -#define compat_sys_sync_file_range compat_sys_sync_file_range2_wrapper
> +#define compat_sys_sync_file_range2compat_sys_sync_file_range2_wrapper
>  #define compat_sys_truncate64  compat_sys_truncate64_wrapper
>  #define sys_mmap2  compat_sys_mmap2_wrapper
>  #define sys_ptrace compat_sys_ptrace
> 

Looks good to me.

Arnd

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH v2 03/20] x86: Secure Memory Encryption (SME) build enablement

2016-09-02 Thread Borislav Petkov
On Mon, Aug 22, 2016 at 05:35:59PM -0500, Tom Lendacky wrote:
> Provide the Kconfig support to build the SME support in the kernel.
> 
> Signed-off-by: Tom Lendacky 
> ---
>  arch/x86/Kconfig |9 +
>  1 file changed, 9 insertions(+)
> 
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index c580d8c..131f329 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -1357,6 +1357,15 @@ config X86_DIRECT_GBPAGES
> supports them), so don't confuse the user by printing
> that we have them enabled.
>  
> +config AMD_MEM_ENCRYPT
> + bool "Secure Memory Encryption support for AMD"

 "AMD Secure Memory Encryption support"

> + depends on X86_64 && CPU_SUP_AMD
> + ---help---
> +   Say yes to enable the encryption of system memory. This requires
> +   an AMD processor that supports Secure Memory Encryption (SME).
> +   The encryption of system memory is disabled by default but can be
> +   enabled with the mem_encrypt=on command line option.
> +
>  # Common NUMA Features
>  config NUMA
>   bool "Numa Memory Allocation and Scheduler Support"

-- 
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 12/18] arm64: ilp32: add sys_ilp32.c and a separate table (in entry.S) to use it

2016-09-02 Thread Bamvor Jian Zhang
Hi, Yury

On 08/17/2016 07:46 PM, Yury Norov wrote:
> From: Andrew Pinski 
> 
[...]
> diff --git a/arch/arm64/kernel/sys_ilp32.c b/arch/arm64/kernel/sys_ilp32.c
> new file mode 100644
> index 000..10fc0ca
> --- /dev/null
> +++ b/arch/arm64/kernel/sys_ilp32.c
> @@ -0,0 +1,86 @@
> +/*
> + * AArch64- ILP32 specific system calls implementation
> + *
> + * Copyright (C) 2016 Cavium Inc.
> + * Author: Andrew Pinski 
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see .
> + */
> +
> +#define __SYSCALL_COMPAT
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +/*
> + * Using aarch32 syscall handlerss where off_t is passed.
> + */
> +#define compat_sys_fadvise64_64  compat_sys_fadvise64_64_wrapper
> +#define compat_sys_fallocate compat_sys_fallocate_wrapper
> +#define compat_sys_fcntl64   sys_fcntl
> +#define compat_sys_ftruncate64   compat_sys_ftruncate64_wrapper
> +#define compat_sys_pread64   compat_sys_pread64_wrapper
> +#define compat_sys_pwrite64  compat_sys_pwrite64_wrapper
> +#define compat_sys_readahead compat_sys_readahead_wrapper
> +#define compat_sys_shmat sys_shmat
> +#define compat_sys_sync_file_range   compat_sys_sync_file_range2_wrapper
> +#define compat_sys_truncate64compat_sys_truncate64_wrapper
When we test sync_file_range with our own testcase, we found that the parameter
sequence is not correct for sync_file_range2 because glibc think kernel use
the sync_file_range. So, in this patch we force to sync_file_range2.

Is it make sense to you?

Regards

Bamvor

>From e730e4db23bca4dd0ff6bcca0bc4c04e5c13b5c7 Mon Sep 17 00:00:00 2001
From: Bamvor Jian Zhang 
Date: Sat, 27 Aug 2016 12:26:31 +0800
Subject: [PATCH] arm64:ilp32: force sync_file_range2

Define __ARCH_WANT_SYNC_FILE_RANGE2 in order to select correct
sync_file_range parameters sequence in glibc and kernel.

Tested-by: Jianguo Chen 
Signed-off-by: Bamvor Jian Zhang 
---
 arch/arm64/include/uapi/asm/unistd.h | 5 +
 arch/arm64/kernel/sys_ilp32.c| 2 +-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/uapi/asm/unistd.h 
b/arch/arm64/include/uapi/asm/unistd.h
index 043d17a..78bea1d 100644
--- a/arch/arm64/include/uapi/asm/unistd.h
+++ b/arch/arm64/include/uapi/asm/unistd.h
@@ -16,4 +16,9 @@

 #define __ARCH_WANT_RENAMEAT

+/* We need to make sure it works for both userspace and kernel(sys_ilp32.c) */
+#if defined(__ILP32__) || defined(__SYSCALL_COMPAT)
+#define __ARCH_WANT_SYNC_FILE_RANGE2
+#endif
+
 #include 
diff --git a/arch/arm64/kernel/sys_ilp32.c b/arch/arm64/kernel/sys_ilp32.c
index 10fc0ca..13c9c9d 100644
--- a/arch/arm64/kernel/sys_ilp32.c
+++ b/arch/arm64/kernel/sys_ilp32.c
@@ -42,7 +42,7 @@
 #define compat_sys_pwrite64compat_sys_pwrite64_wrapper
 #define compat_sys_readahead   compat_sys_readahead_wrapper
 #define compat_sys_shmat   sys_shmat
-#define compat_sys_sync_file_range compat_sys_sync_file_range2_wrapper
+#define compat_sys_sync_file_range2compat_sys_sync_file_range2_wrapper
 #define compat_sys_truncate64  compat_sys_truncate64_wrapper
 #define sys_mmap2  compat_sys_mmap2_wrapper
 #define sys_ptrace compat_sys_ptrace
-- 
1.8.4.5

> +#define sys_mmap2compat_sys_mmap2_wrapper
> +#define sys_ptrace   compat_sys_ptrace
> +
> +/*
> + * Use non-compat syscall handlers where rlimit, stat and statfs
> + * structure pointers are passed, as their layout is identical to LP64.
> + */
> +#define compat_sys_fstatfs64 sys_fstatfs
> +#define compat_sys_statfs64  sys_statfs
> +#define sys_fstat64  sys_newfstat
> +#define sys_fstatat64sys_newfstatat
> +#define compat_sys_getrlimit sys_getrlimit
> +#define compat_sys_setrlimit sys_setrlimit
> +
> +asmlinkage long compat_sys_fadvise64_64_wrapper(void);
> +asmlinkage long compat_sys_fallocate_wrapper(void);
> +asmlinkage long compat_sys_ftruncate64_wrapper(void);
> +asmlinkage long compat_sys_mmap2_wrapper(void);
> +asmlinkage long compat_sys_pread64_wrapper(void);
> +asmlinkage long 

Re: [RFC2 nowrap: PATCH v7 00/18] ILP32 for ARM64

2016-09-02 Thread Bamvor Jian Zhang
Base on the off-list discussion, the community care about the
performance regression of aarch64 LP64 and aarch32 after ILP32
is merged.

Given that there is not big open issue in ILP32 in kernel part, I try
to address this concern. It is reasonable that we should run lots of
testsuite(such as LKP) to ensure there is no performance regression.
But I am not expert of this, I started from test the lmbench for
aarch64 LP64 and compare the differnce between ILP32 enabled and
without ILP32 patches.

The branch I used is ilp32-4.8 on [1], compare the result between
two commit "d3746f1 arm64:ilp32: add ARM64_ILP32 to Kconfig"(defconfig
with CONFIG_ARM64_ILP32) and "3054de8 fiz set_personality by Catalin"
(defconfig).

The result show there is no big difference. Most of the difference is
less than 5%. Only two differnce more than 10%:
1.  Context switching 2p/16K 13.16%(ILP32 is bigger than No_ILP32.
smaller is better)
2.  *Local* Communication bandwidths: TCP -10.77%.(ILP32 is smaller than
No_ILP32. bigger is better).


If it is make sense to community, I could continue to do more that.

Thanks

Bamvor

[1] https://github.com/norov/linux.git
[2] The full result: (ILP32 - No_ILP32)/No_ILP32

 L M B E N C H  3 . 0   S U M M A R Y
 
 (Alpha software, do not distribute)

Basic system parameters
--
Host OS Description  Mhz  tlb  cache  mem   scal
 pages line   par   load
   bytes
- - ---  - - -- 
buildroot Linux 4.8.0-r A64_ILP32_diff_No_ILP32 102432   128 0.23% 1

Processor, Processes - times in microseconds - smaller is better
--
Host OS  Mhz null null  open slct sig  sig  fork exec sh
 call  I/O stat clos TCP  inst hndl proc proc proc
- -           
buildroot Linux 4.8.0-r 0.00% 0.00% 0.00% -3.03% -0.42% -1.96% 0.00% -0.67% 
2.29% -6.34% 0.85%

Basic integer operations - times in nanoseconds - smaller is better
---
Host OS  intgr intgr  intgr  intgr  intgr
  bit   addmuldivmod
- - -- -- -- -- --
buildroot Linux 4.8.0-r 0.00%  0.00%  0.00%  0.00%  0.00%

Basic uint64 operations - times in nanoseconds - smaller is better
--
Host OS int64  int64  int64  int64  int64
 bitaddmuldivmod
- - -- -- -- -- --
buildroot Linux 4.8.0-r  0.00%0.00%  0.00%  0.00%

Basic float operations - times in nanoseconds - smaller is better
-
Host OS  float  float  float  float
 addmuldivbogo
- - -- -- -- --
buildroot Linux 4.8.0-r 0.00%  0.00%  0.04%  0.00%

Basic double operations - times in nanoseconds - smaller is better
--
Host OS  double double double double
 addmuldivbogo
- - --  -- -- --
buildroot Linux 4.8.0-r 0.00%  0.00%0.00%  0.00%

Context switching - times in microseconds - smaller is better
-
Host OS  2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
 ctxsw  ctxsw  ctxsw ctxsw  ctxsw   ctxsw   ctxsw
- - -- -- -- -- -- --- ---
buildroot Linux 4.8.0-r  -6.00%  13.16% -1.83% 3.80%  9.94%  -6.17%   2.72%

*Local* Communication latencies in microseconds - smaller is better
-
Host OS 2p/0K  Pipe AF UDP  RPC/   TCP  RPC/ TCP
ctxsw   UNIX UDP TCP conn
- - - -  - - - - 
buildroot Linux 4.8.0-r -6.00% -4.08% 1.95% -5.02%4.87% 0.00%


File & VM system latencies in microseconds - smaller is better
---
Host OS   0K File  10K File MmapProt   Page   100fd
Create Delete Create Delete Latency Fault  Fault  selct
- - -- -- -- -- --- - --- -
buildroot 

Re: [RFC PATCH v2 01/20] x86: Documentation for AMD Secure Memory Encryption (SME)

2016-09-02 Thread Borislav Petkov
On Mon, Aug 22, 2016 at 05:35:39PM -0500, Tom Lendacky wrote:
> This patch adds a Documenation entry to decribe the AMD Secure Memory
> Encryption (SME) feature.
> 
> Signed-off-by: Tom Lendacky 
> ---
>  Documentation/x86/amd-memory-encryption.txt |   35 
> +++
>  1 file changed, 35 insertions(+)
>  create mode 100644 Documentation/x86/amd-memory-encryption.txt
> 
> diff --git a/Documentation/x86/amd-memory-encryption.txt 
> b/Documentation/x86/amd-memory-encryption.txt
> new file mode 100644
> index 000..f19c555
> --- /dev/null
> +++ b/Documentation/x86/amd-memory-encryption.txt
> @@ -0,0 +1,35 @@
> +Secure Memory Encryption (SME) is a feature found on AMD processors.
> +
> +SME provides the ability to mark individual pages of memory as encrypted 
> using
> +the standard x86 page tables.  A page that is marked encrpyted will be

s/encrpyted/encrypted/

> +automatically decrypted when read from DRAM and encrypted when written to
> +DRAM.  SME can therefore be used to protect the contents of DRAM from 
> physical
> +attacks on the system.
> +
> +Support for SME can be determined through the CPUID instruction. The CPUID
> +function 0x801f reports information related to SME:
> +
> + 0x801f[eax]:
> + Bit[0] indicates support for SME
> + 0x801f[ebx]:
> + Bit[5:0]  pagetable bit number used to enable memory encryption
> + Bit[11:6] reduction in physical address space, in bits, when
> +   memory encryption is enabled (this only affects system
> +   physical addresses, not guest physical addresses)
> +
> +If support for SME is present, MSR 0xc00100010 (SYS_CFG) can be used to
> +determine if SME is enabled and/or to enable memory encryption:
> +
> + 0xc0010010:
> + Bit[23]   0 = memory encryption features are disabled
> +   1 = memory encryption features are enabled
> +
> +Linux relies on BIOS to set this bit if BIOS has determined that the 
> reduction
> +in the physical address space as a result of enabling memory encryption (see
> +CPUID information above) will not conflict with the address space resource
> +requirements for the system.  If this bit is not set upon Linux startup then
> +Linux itself will not set it and memory encryption will not be possible.
> +
> +SME support is configurable in the kernel through the AMD_MEM_ENCRYPT config
> +option.

" ... is configurable through CONFIG_AMD_MEM_ENCRYPT."

> Additionally, the mem_encrypt=on command line parameter is required
> +to activate memory encryption.

I think you want to rewrite the logic here to say that people should use
the BIOS option and if none is present for whatever reason, resort to
the alternative "mem_encrypt=on" kernel command line option, no?

-- 
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Fix how CSS files are added to the Sphinx HTML.

2016-09-02 Thread Markus Heiser
Hi Eric, 

I'am happy to see you here ;-)

Am 02.09.2016 um 02:09 schrieb Eric Holscher :

> This uses the proper Sphinx add_stylesheet API for adding style overrides:
> 
> http://www.sphinx-doc.org/en/stable/extdev/appapi.html#sphinx.application.Sphinx.add_stylesheet
> 
> Previously,
> if a user had any other `css_files` defined by extensions,
> they would be overwritten by the `html_context` redefinition here.

I tested your patch only without a builder adding 'css_files', but taking
a look at sphinx sources shows me, that you are right: 'css_files' will
be overwritten, which is not good.

> 
> Signed-off-by: Eric Holscher 
> ---
> Documentation/conf.py | 14 --
> 1 file changed, 8 insertions(+), 6 deletions(-)
> 
> diff --git a/Documentation/conf.py b/Documentation/conf.py
> index 96b7aa6..1707193 100644
> --- a/Documentation/conf.py
> +++ b/Documentation/conf.py
> @@ -179,12 +179,6 @@ except ImportError:
> 
> html_static_path = ['sphinx-static']
> 
> -html_context = {
> -'css_files': [
> -'_static/theme_overrides.css',
> -],
> -}
> -
> # Add any extra paths that contain custom files (such as robots.txt or
> # .htaccess) here, relative to this directory. These files are copied
> # directly to the root of the documentation.
> @@ -419,3 +413,11 @@ pdf_documents = [
> # line arguments.
> kerneldoc_bin = '../scripts/kernel-doc'
> kerneldoc_srctree = '..'
> +
> +
> +def setup(app):
> +"""
> +This is a basic Sphinx extension that adds our CSS overrides.
> +"""
> +app.add_stylesheet('_static/theme_overrides.css')

Hmm, this will not work, Since the '_static' folder is not needed
to name here. As far as I know, with

   html_static_path = ['sphinx-static']

everything from the 'sphinx-static' is copied to HTML builder's
'_static' folder. And a (corrected)

  app.add_stylesheet('theme_overrides.css')

will insert a:



in the HTML header of the builder (the '_static' folder is added by the HTML
builder).

Anyway, let me say thanks ... and it would be great if you add 
more of your value to the ML ;-)

-- Markus --

> +
> -- 
> 2.6.3
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-doc" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Fix how CSS files are added to the Sphinx HTML.

2016-09-02 Thread Jani Nikula
On Fri, 02 Sep 2016, Eric Holscher  wrote:
> This uses the proper Sphinx add_stylesheet API for adding style overrides:
>
> http://www.sphinx-doc.org/en/stable/extdev/appapi.html#sphinx.application.Sphinx.add_stylesheet
>
> Previously,
> if a user had any other `css_files` defined by extensions,
> they would be overwritten by the `html_context` redefinition here.

Removing html_context makes sense, but we don't seem to actually have
_static/theme_overrides.css, or _static directory for that matter, so
could we just drop the setup function altogether?

BR,
Jani.

>
> Signed-off-by: Eric Holscher 
> ---
>  Documentation/conf.py | 14 --
>  1 file changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/Documentation/conf.py b/Documentation/conf.py
> index 96b7aa6..1707193 100644
> --- a/Documentation/conf.py
> +++ b/Documentation/conf.py
> @@ -179,12 +179,6 @@ except ImportError:
>  
>  html_static_path = ['sphinx-static']
>  
> -html_context = {
> -'css_files': [
> -'_static/theme_overrides.css',
> -],
> -}
> -
>  # Add any extra paths that contain custom files (such as robots.txt or
>  # .htaccess) here, relative to this directory. These files are copied
>  # directly to the root of the documentation.
> @@ -419,3 +413,11 @@ pdf_documents = [
>  # line arguments.
>  kerneldoc_bin = '../scripts/kernel-doc'
>  kerneldoc_srctree = '..'
> +
> +
> +def setup(app):
> +"""
> +This is a basic Sphinx extension that adds our CSS overrides.
> +"""
> +app.add_stylesheet('_static/theme_overrides.css')
> +

-- 
Jani Nikula, Intel Open Source Technology Center
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html