Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller

2016-10-05 Thread Leon Romanovsky
On Wed, Oct 05, 2016 at 08:37:35AM +0200, Christoph Hellwig wrote:
> FYI, the patches look fine to me:
>
> Acked-by: Christoph Hellwig 
>
> but we're past the merge window for 4.9 now unfortunately.

IMHO, it still can make it.

Thanks


signature.asc
Description: PGP signature


Re: [PATCHv12 0/3] rdmacg: IB/core: rdma controller support

2016-10-05 Thread Leon Romanovsky
On Wed, Aug 31, 2016 at 02:07:24PM +0530, Parav Pandit wrote:
> rdmacg: IB/core: rdma controller support
>
> Patch is generated and tested against below Doug's linux-rdma
> git tree.
>
> URL: git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma.git
> Branch: master
>
> Patchset is also compiled and tested against below Tejun's cgroup tree
> using cgroup v2 mode.
> URL: git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git
> Branch: master
>
> Overview:
> Currently user space applications can easily take away all the rdma
> device specific resources such as AH, CQ, QP, MR etc. Due to which other
> applications in other cgroup or kernel space ULPs may not even get chance
> to allocate any rdma resources. This results into service unavailibility.
>
> RDMA cgroup addresses this issue by allowing resource accounting,
> limit enforcement on per cgroup, per rdma device basis.
>
> RDMA uverbs layer will enforce limits on well defined RDMA verb
> resources without any HCA vendor device driver involvement.
>
> RDMA uverbs layer will not do limit enforcement of HCA hw vendor
> specific resources. Instead rdma cgroup provides set of APIs
> through which vendor specific drivers can do resource accounting
> by making use of rdma cgroup.

Hi Parav,
I want to propose an extension to the RDMA cgroup which can be done as
follow-up patches.

Let's add new global type, which will control whole HCA (for example in 
percentages). It will
allow natively define new objects without need to introduce them to the user.

This HCA share will be overwritten by specific UVERBS types which you
already defined.

What do you think?

Except this proposal,
Reviewed-by: Leon Romanovsky 

Thanks.


signature.asc
Description: PGP signature


Re: [RFC PATCH-tip v4 01/10] locking/osq: Make lock/unlock proper acquire/release barrier

2016-10-05 Thread Waiman Long

On 10/04/2016 03:06 PM, Davidlohr Bueso wrote:

On Thu, 18 Aug 2016, Waiman Long wrote:


The osq_lock() and osq_unlock() function may not provide the necessary
acquire and release barrier in some cases. This patch makes sure
that the proper barriers are provided when osq_lock() is successful
or when osq_unlock() is called.


But why do we need these guarantees given that osq is only used 
internally
for lock owner spinning situations? Leaking out of the critical region 
will
obviously be bad if using it as a full lock, but, as is, this can only 
hurt
performance of two of the most popular locks in the kernel -- although 
yes,

using smp_acquire__after_ctrl_dep is nicer for polling.


First of all, it is not obvious from the name osq_lock() that it is not 
an acquire barrier in some cases. We either need to clearly document it 
or has a variant name that indicate that, e.g. osq_lock_relaxed, for 
example.


Secondly, if we look at the use cases of osq_lock(), the additional 
latency (for non-x86 archs) only matters if the master lock is 
immediately available for acquisition after osq_lock() return. 
Otherwise, it will be hidden in the spinning loop for that master lock. 
So yes, there may be a slight performance hit in some cases, but 
certainly not always.


If you need tighter osq for rwsems, could it be refactored such that 
mutexes

do not take a hit?



Yes, we can certainly do that like splitting into 2 variants, one with 
acquire barrier guarantee and one without.




Suggested-by: Peter Zijlstra (Intel) 
Signed-off-by: Waiman Long 
---
kernel/locking/osq_lock.c |   24 ++--
1 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
index 05a3785..3da0b97 100644
--- a/kernel/locking/osq_lock.c
+++ b/kernel/locking/osq_lock.c
@@ -124,6 +124,11 @@ bool osq_lock(struct optimistic_spin_queue *lock)

cpu_relax_lowlatency();
}
+/*
+ * Add an acquire memory barrier for pairing with the release 
barrier

+ * in unlock.
+ */
+smp_acquire__after_ctrl_dep();
return true;

unqueue:
@@ -198,13 +203,20 @@ void osq_unlock(struct optimistic_spin_queue 
*lock)

 * Second most likely case.
 */
node = this_cpu_ptr(&osq_node);
-next = xchg(&node->next, NULL);
-if (next) {
-WRITE_ONCE(next->locked, 1);
+next = xchg_relaxed(&node->next, NULL);
+if (next)
+goto unlock;
+
+next = osq_wait_next(lock, node, NULL);
+if (unlikely(!next)) {
+/*
+ * In the unlikely event that the OSQ is empty, we need to
+ * provide a proper release barrier.
+ */
+smp_mb();
return;
}

-next = osq_wait_next(lock, node, NULL);
-if (next)
-WRITE_ONCE(next->locked, 1);
+unlock:
+smp_store_release(&next->locked, 1);
}


As well as for the smp_acquire__after_ctrl_dep comment you have above, 
this also
obviously pairs with the osq_lock's smp_load_acquire while backing out 
(unqueueing,
step A). Given the above, for this case we might also just rely on 
READ_ONCE(node->locked),
if we get the conditional wrong and miss the node becoming locked, all 
we do is another
iteration, and while there is a cmpxchg() there, it is mitigated with 
the ccas thingy.


Similar to osq_lock(), the current osq_unlock() does not have a release 
barrier guarantee. I think splitting into 2 variants - osq_unlock, 
osq_unlock_relaxed will help.


Cheers,
Longman
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


kernel-doc-rst-lint (was: Re: [PATCH 00/15] improve function-level documentation)

2016-10-05 Thread Jani Nikula
On Wed, 05 Oct 2016, Daniel Vetter  wrote:
> Jani Nikula has a patch with a scrip to make the one kernel-doc parser
> into a lint/checker pass over the entire kernel. I think that'd would
> be more robust instead of trying to approximate the real kerneldoc
> parser. Otoh that parser is a horror show of a perl/regex driven state
> machine ;-)
>
> Jani, can you pls digg out these patches? Can't find them right now ...

Expanding the massive Cc: with linux-doc list...

Here goes. It's a quick hack from months ago, but still seems to
somewhat work. At least for the kernel-doc parts. The reStructuredText
lint part isn't all that great, and doesn't have mapping to line numbers
like the Sphinx kernel-doc extension does. Anyway I'm happy how this
integrates with kernel build CHECK and C=1/C=2.

I guess Julia's goal is to automate the *fixing* of some of the error
classes from kernel-doc. Not sure how well this could be made to
integrate with any of that.

BR,
Jani.


>From 1244efa0f63a7b13795e8c37f81733a3c8bfc56a Mon Sep 17 00:00:00 2001
From: Jani Nikula 
Date: Tue, 31 May 2016 18:11:33 +0300
Subject: [PATCH] kernel-doc-rst-lint: add tool to check kernel-doc and rst
 correctness
Organization: Intel Finland Oy - BIC 0357606-4 - Westendinkatu 7, 02160 Espoo
Cc: Jani Nikula 

Simple kernel-doc and reStructuredText lint tool that can be used
independently and as a kernel build CHECK tool to validate kernel-doc
comments.

Independent usage:
$ kernel-doc-rst-lint FILE

Kernel CHECK usage:
$ make CHECK=scripts/kernel-doc-rst-lint C=1# (or C=2)

Depends on docutils and the rst-lint package
https://pypi.python.org/pypi/restructuredtext_lint

Signed-off-by: Jani Nikula 
---
 scripts/kernel-doc-rst-lint | 106 
 1 file changed, 106 insertions(+)
 create mode 100755 scripts/kernel-doc-rst-lint

diff --git a/scripts/kernel-doc-rst-lint b/scripts/kernel-doc-rst-lint
new file mode 100755
index ..7e0157679f83
--- /dev/null
+++ b/scripts/kernel-doc-rst-lint
@@ -0,0 +1,106 @@
+#!/usr/bin/env python
+# coding=utf-8
+#
+# Copyright © 2016 Intel Corporation
+#
+# Permission is hereby granted, free of charge, to any person obtaining a
+# copy of this software and associated documentation files (the "Software"),
+# to deal in the Software without restriction, including without limitation
+# the rights to use, copy, modify, merge, publish, distribute, sublicense,
+# and/or sell copies of the Software, and to permit persons to whom the
+# Software is furnished to do so, subject to the following conditions:
+#
+# The above copyright notice and this permission notice (including the next
+# paragraph) shall be included in all copies or substantial portions of the
+# Software.
+#
+# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+# IN THE SOFTWARE.
+#
+# Authors:
+#Jani Nikula 
+#
+# Simple kernel-doc and reStructuredText lint tool that can be used
+# independently and as a kernel build CHECK tool to validate kernel-doc
+# comments.
+#
+# Independent usage:
+# $ kernel-doc-rst-lint FILE
+#
+# Kernel CHECK usage:
+# $ make CHECK=scripts/kernel-doc-rst-lint C=1 # (or C=2)
+#
+# Depends on docutils and the rst-lint package
+# https://pypi.python.org/pypi/restructuredtext_lint
+#
+
+import os
+import subprocess
+import sys
+
+from docutils.parsers.rst import directives
+from docutils.parsers.rst import Directive
+from docutils.parsers.rst import roles
+from docutils import nodes, statemachine
+import restructuredtext_lint
+
+class DummyDirective(Directive):
+required_argument = 1
+optional_arguments = 0
+option_spec = { }
+has_content = True
+
+def run(self):
+return []
+
+# Fake the Sphinx C Domain directives and roles
+directives.register_directive('c:function', DummyDirective)
+directives.register_directive('c:type', DummyDirective)
+roles.register_generic_role('c:func', nodes.emphasis)
+roles.register_generic_role('c:type', nodes.emphasis)
+
+# We accept but ignore parameters to be compatible with how the kernel build
+# invokes CHECK.
+if len(sys.argv) < 2:
+sys.stderr.write('usage: kernel-doc-rst-lint [IGNORED OPTIONS] FILE\n');
+sys.exit(1)
+
+infile = sys.argv[len(sys.argv) - 1]
+cmd = ['scripts/kernel-doc', '-rst', infile]
+
+try:
+p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, 
universal_newlines=True)
+out, err = p.communicate()
+
+# python2 needs conversion to unicode.
+# python3 with universal_newlines=True returns strings.
+if sys.version_info.major < 3:
+out,

Re: [RFC PATCH-tip v4 01/10] locking/osq: Make lock/unlock proper acquire/release barrier

2016-10-05 Thread Waiman Long

On 10/05/2016 08:19 AM, Waiman Long wrote:

On 10/04/2016 03:06 PM, Davidlohr Bueso wrote:

On Thu, 18 Aug 2016, Waiman Long wrote:


The osq_lock() and osq_unlock() function may not provide the necessary
acquire and release barrier in some cases. This patch makes sure
that the proper barriers are provided when osq_lock() is successful
or when osq_unlock() is called.


But why do we need these guarantees given that osq is only used 
internally
for lock owner spinning situations? Leaking out of the critical 
region will
obviously be bad if using it as a full lock, but, as is, this can 
only hurt
performance of two of the most popular locks in the kernel -- 
although yes,

using smp_acquire__after_ctrl_dep is nicer for polling.


First of all, it is not obvious from the name osq_lock() that it is 
not an acquire barrier in some cases. We either need to clearly 
document it or has a variant name that indicate that, e.g. 
osq_lock_relaxed, for example.


Secondly, if we look at the use cases of osq_lock(), the additional 
latency (for non-x86 archs) only matters if the master lock is 
immediately available for acquisition after osq_lock() return. 
Otherwise, it will be hidden in the spinning loop for that master 
lock. So yes, there may be a slight performance hit in some cases, but 
certainly not always.


If you need tighter osq for rwsems, could it be refactored such that 
mutexes

do not take a hit?



Yes, we can certainly do that like splitting into 2 variants, one with 
acquire barrier guarantee and one without.




How about the following patch? Does that address your concern?

Cheers,
Longman

--[ Cut Here 
]--

[PATCH] locking/osq: Provide 2 variants of lock/unlock functions

Despite the lock/unlock names, the osq_lock() and osq_unlock()
functions were not proper acquire and release barriers. To clarify
the situation, two different variants are now provided:

 1) osq_lock/osq_unlock - they are proper acquire (if successful)
and release barriers respectively.

 2) osq_lock_relaxed/osq_unlock_relaxed - they do not provide the
acquire/release barrier guarantee.

Both the mutex and rwsem optimistic spinning codes are modified to
use the relaxed variants which will be faster in some architectures as
the proper memory barrier semantics are not needed for queuing purpose.

Signed-off-by: Waiman Long 
---
 include/linux/osq_lock.h|2 +
 kernel/locking/mutex.c  |6 +-
 kernel/locking/osq_lock.c   |   89 
+++---

 kernel/locking/rwsem-xadd.c |4 +-
 4 files changed, 81 insertions(+), 20 deletions(-)

diff --git a/include/linux/osq_lock.h b/include/linux/osq_lock.h
index 703ea5c..72357d0 100644
--- a/include/linux/osq_lock.h
+++ b/include/linux/osq_lock.h
@@ -30,7 +30,9 @@ static inline void osq_lock_init(struct 
optimistic_spin_queue *lock)

 }

 extern bool osq_lock(struct optimistic_spin_queue *lock);
+extern bool osq_lock_relaxed(struct optimistic_spin_queue *lock);
 extern void osq_unlock(struct optimistic_spin_queue *lock);
+extern void osq_unlock_relaxed(struct optimistic_spin_queue *lock);

 static inline bool osq_is_locked(struct optimistic_spin_queue *lock)
 {
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index a70b90d..b1bf1e0 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -316,7 +316,7 @@ static bool mutex_optimistic_spin(struct mutex *lock,
  * acquire the mutex all at once, the spinners need to take a
  * MCS (queued) lock first before spinning on the owner field.
  */
-if (!osq_lock(&lock->osq))
+if (!osq_lock_relaxed(&lock->osq))
 goto done;

 while (true) {
@@ -358,7 +358,7 @@ static bool mutex_optimistic_spin(struct mutex *lock,
 }

 mutex_set_owner(lock);
-osq_unlock(&lock->osq);
+osq_unlock_relaxed(&lock->osq);
 return true;
 }

@@ -380,7 +380,7 @@ static bool mutex_optimistic_spin(struct mutex *lock,
 cpu_relax_lowlatency();
 }

-osq_unlock(&lock->osq);
+osq_unlock_relaxed(&lock->osq);
 done:
 /*
  * If we fell out of the spin path because of need_resched(),
diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
index 05a3785..1e6823a 100644
--- a/kernel/locking/osq_lock.c
+++ b/kernel/locking/osq_lock.c
@@ -12,6 +12,23 @@
  */
 static DEFINE_PER_CPU_SHARED_ALIGNED(struct optimistic_spin_node, 
osq_node);


+enum mbtype {
+acquire,
+release,
+relaxed,
+};
+
+static __always_inline int
+_atomic_cmpxchg_(const enum mbtype barrier, atomic_t *v, int old, int new)
+{
+if (barrier == acquire)
+return atomic_cmpxchg_acquire(v, old, new);
+else if (barrier == release)
+return atomic_cmpxchg_release(v, old, new);
+else
+return atomic_cmpxchg_relaxed(v, old, new);
+}
+
 /*
  * We use the value 0 to represent "no CPU", thus the encoded value
  * will be 

Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller

2016-10-05 Thread Tejun Heo
Hello,

On Wed, Oct 05, 2016 at 02:22:57PM +0300, Leon Romanovsky wrote:
> On Wed, Oct 05, 2016 at 08:37:35AM +0200, Christoph Hellwig wrote:
> > FYI, the patches look fine to me:
> >
> > Acked-by: Christoph Hellwig 
> >
> > but we're past the merge window for 4.9 now unfortunately.
> 
> IMHO, it still can make it.

Most likely, we only have three / four days till rc1 opens, I think
it's too late.  Let's target the next one.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: kernel-doc-rst-lint (was: Re: [PATCH 00/15] improve function-level documentation)

2016-10-05 Thread Markus Heiser

Am 05.10.2016 um 16:04 schrieb Jani Nikula :

> On Wed, 05 Oct 2016, Daniel Vetter  wrote:
>> Jani Nikula has a patch with a scrip to make the one kernel-doc parser
>> into a lint/checker pass over the entire kernel. I think that'd would
>> be more robust instead of trying to approximate the real kerneldoc
>> parser. Otoh that parser is a horror show of a perl/regex driven state
>> machine ;-)
>> 
>> Jani, can you pls digg out these patches? Can't find them right now ...
> 
> Expanding the massive Cc: with linux-doc list...
> 
> Here goes. It's a quick hack from months ago, but still seems to
> somewhat work. At least for the kernel-doc parts. The reStructuredText
> lint part isn't all that great, and doesn't have mapping to line numbers
> like the Sphinx kernel-doc extension does. Anyway I'm happy how this
> integrates with kernel build CHECK and C=1/C=2.
> 
> I guess Julia's goal is to automate the *fixing* of some of the error
> classes from kernel-doc. Not sure how well this could be made to
> integrate with any of that.
> 
> BR,
> Jani.

Another lint alternative: 

 use the lint from the linuxdoc project 

install the linuxdoc package:

*  https://return42.github.io/linuxdoc/install.html

e.g.::

 pip install --user git+http://github.com/return42/linuxdoc.git

and run kernel-lintdoc with the file/folder to lint as argument / e.g.::

 kernel-lintdoc include/media/

-- Markus --


> 
> 
> From 1244efa0f63a7b13795e8c37f81733a3c8bfc56a Mon Sep 17 00:00:00 2001
> From: Jani Nikula 
> Date: Tue, 31 May 2016 18:11:33 +0300
> Subject: [PATCH] kernel-doc-rst-lint: add tool to check kernel-doc and rst
> correctness
> Organization: Intel Finland Oy - BIC 0357606-4 - Westendinkatu 7, 02160 Espoo
> Cc: Jani Nikula 
> 
> Simple kernel-doc and reStructuredText lint tool that can be used
> independently and as a kernel build CHECK tool to validate kernel-doc
> comments.
> 
> Independent usage:
> $ kernel-doc-rst-lint FILE
> 
> Kernel CHECK usage:
> $ make CHECK=scripts/kernel-doc-rst-lint C=1  # (or C=2)
> 
> Depends on docutils and the rst-lint package
> https://pypi.python.org/pypi/restructuredtext_lint
> 
> Signed-off-by: Jani Nikula 
> ---
> scripts/kernel-doc-rst-lint | 106 
> 1 file changed, 106 insertions(+)
> create mode 100755 scripts/kernel-doc-rst-lint
> 
> diff --git a/scripts/kernel-doc-rst-lint b/scripts/kernel-doc-rst-lint
> new file mode 100755
> index ..7e0157679f83
> --- /dev/null
> +++ b/scripts/kernel-doc-rst-lint
> @@ -0,0 +1,106 @@
> +#!/usr/bin/env python
> +# coding=utf-8
> +#
> +# Copyright © 2016 Intel Corporation
> +#
> +# Permission is hereby granted, free of charge, to any person obtaining a
> +# copy of this software and associated documentation files (the "Software"),
> +# to deal in the Software without restriction, including without limitation
> +# the rights to use, copy, modify, merge, publish, distribute, sublicense,
> +# and/or sell copies of the Software, and to permit persons to whom the
> +# Software is furnished to do so, subject to the following conditions:
> +#
> +# The above copyright notice and this permission notice (including the next
> +# paragraph) shall be included in all copies or substantial portions of the
> +# Software.
> +#
> +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> +# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> +# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER 
> DEALINGS
> +# IN THE SOFTWARE.
> +#
> +# Authors:
> +#Jani Nikula 
> +#
> +# Simple kernel-doc and reStructuredText lint tool that can be used
> +# independently and as a kernel build CHECK tool to validate kernel-doc
> +# comments.
> +#
> +# Independent usage:
> +# $ kernel-doc-rst-lint FILE
> +#
> +# Kernel CHECK usage:
> +# $ make CHECK=scripts/kernel-doc-rst-lint C=1   # (or C=2)
> +#
> +# Depends on docutils and the rst-lint package
> +# https://pypi.python.org/pypi/restructuredtext_lint
> +#
> +
> +import os
> +import subprocess
> +import sys
> +
> +from docutils.parsers.rst import directives
> +from docutils.parsers.rst import Directive
> +from docutils.parsers.rst import roles
> +from docutils import nodes, statemachine
> +import restructuredtext_lint
> +
> +class DummyDirective(Directive):
> +required_argument = 1
> +optional_arguments = 0
> +option_spec = { }
> +has_content = True
> +
> +def run(self):
> +return []
> +
> +# Fake the Sphinx C Domain directives and roles
> +directives.register_directive('c:function', DummyDirective)
> +directives.register_directive('c:type', DummyDirective)
> +roles.register_generic_role('c:func', nodes.emphasis)
> +roles.register_gene

Re: [RFC] fs: add userspace critical mounts event support

2016-10-05 Thread Luis R. Rodriguez
On Tue, Oct 04, 2016 at 05:32:22PM -0700, Linus Torvalds wrote:
> On Tue, Oct 4, 2016 at 5:24 PM, Luis R. Rodriguez  wrote:
> >
> > Note that the races are beyond firmware, so all
> > kernel_read_file_from_path() users, as such re-using such old /sys/
> > interafeces for firmware will not suffice to cover all ground now for
> > the same race for other possible users.
> 
> Blah blah blah.
> 
> The reason I've hated this whole discussion is that it's full of
> "let's re-architect everything", and then it has these horribly warty
> interfaces.

To be clear, kernel_read_file_from_path() was an agreed upon strategy
about 1 year ago at the Linux Security summit as we found different
kernel implementations for the same exact task, reading files from
the filesystem -- my point here was simply that acknowledging that the
race on early init and driver's init / probe for firmware is implicating
that the race is *also* possible for the other kernel-read-from-fs points.
Its not clear to me what your grudge here is other than the proposal
for a solution in this patch is not what we want.

> It's classic second-system syndrome.
> 
> Just do *one* thing, and do it well. Don't change anything else. Don't
> force existign drivers to use new interfaces. Don't over-architect,
> and don't do stupid interfaces.

If there is a race for the other users and we want to avoid wrapping
a solution for it to the other callers without doing any vetting for
correctness then so be it, but to disregard completely seems error-prone.
I accept that thinking about such other users may complicate a solution
for firmware and if you prefer we just separate the race solution for
both that's fine.

> If user-space mounts a new filesystem (or just unpacks files from a
> tar-file that has firmware images in it, for chissake), that is not
> some magical "critical mount event". The whole concept is just stupid.
> Is it a "mount event" when the user downloads a new firmware image
> from the internet?
> 
> HELL NO.

We've gotten passed that the original implementation proposed is not what we
want, let's move on.

> But what is equally stupid is to then dismiss simple models because
> some totally unrelated "beyond firmware" issue.

I have not heard back from the other stakeholders using
kernel_read_file_from_path() and possible races for them. You seem to suggest
to ignore those possible theoretical races in the name of a simple solution for
firmware. Fine.

> Anything that is "beyond firmware" shouldn't even be discussed, for
> chrissake! It has nothing what-so-ever to do with firmware loading. If
> there ends up being some common helper functions, and shared code,
> that *still* doesn't make it so.

My point was to raise the flag of the possible races on the other call sites
where we read files directly from the kernel, that's all, if we agree we really
don't care for that fine.

> Basic rules of thumb:
> 
>  (a) don't over-design
> 
>  (b) don't have stupid illogical interfaces
> 
>  (c) don't conflate different issues just because you think they may
> have shared code.
> 
>  (4) be consistent. Don't make up new interfaces, and most certainly
> do *NOT* dismiss something just because it's what we have done before.
> 
> That's it.

OK..

  Luis
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] fs: add userspace critical mounts event support

2016-10-05 Thread Luis R. Rodriguez
On Tue, Sep 13, 2016 at 09:38:17PM -0500, Rob Landley wrote:
> On 09/02/2016 07:20 PM, Luis R. Rodriguez wrote:
> > kernel_read_file_from_path() can try to read a file from
> > the system's filesystem. This is typically done for firmware
> > for instance, which lives in /lib/firmware. One issue with
> > this is that the kernel cannot know for sure when the real
> > final /lib/firmare/ is ready, and even if you use initramfs
> > drivers are currently initialized *first* prior to the initramfs
> > kicking off.
> 
> Why?

do_initcalls() is called prior to prepare_namespace(), other than that
we have no strict rules over where the real rootfs should be, and since
we have pivot_root() its up to userspace to decide when/how the real
rootfs goes. This and the fact that its also up to userspace to design
what files to place in initramfs of further rootfs -- only userspace
will know for sure when all firmware for all drivers is really ready.

> > During init we run through all init calls first
> > (do_initcalls()) and finally the initramfs is processed via
> > prepare_namespace():
> 
> What's the downside of moving initramfs cpio extraction earlier in the boot?

That would help users of initrafms, some folks seem to not want to use
initramfs, one of such users are that of the large firmwares for remote-proc
(Documentation/remoteproc.txt), we're talking about over 200 MiB for some
firmware for example.

> I did some shuffling around of those code to make initmpfs work, does
> anybody know why initramfs extraction _before_ we initialize drivers
> would be a bad thing?

No, but it seems sensible to me, if its done before do_initcalls()
that should resolve the race for initramfs users but -- so long
as the drivers that need firmware early are dumped into initramfs.
We have no assurances/warnings for this, but we can add such things
if we want them. This would not resolve the race for non-initramfs
users / pivot_root() changes.

  Luis
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] printk: introduce kptr_restrict level 3

2016-10-05 Thread william . c . roberts
From: William Roberts 

Some out-of-tree modules do not use %pK and just use %p, as it's
the common C paradigm for printing pointers. Because of this,
kptr_restrict has no affect on the output and thus, no way to
contain the kernel address leak.

Introduce kptr_restrict level 3 that causes the kernel to
treat %p as if it was %pK and thus always prints zeros.

Sample Output:
kptr_restrict == 2:
p: 604369f4
pK: 

kptr_restrict == 3:
p: 
pK: 

Signed-off-by: William Roberts 
---
 Documentation/sysctl/kernel.txt |   3 ++
 kernel/sysctl.c |   3 +-
 lib/vsprintf.c  | 107 
 3 files changed, 69 insertions(+), 44 deletions(-)

diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index ffab8b5..bca72a0 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -393,6 +393,9 @@ values to unprivileged users is a concern.
 When kptr_restrict is set to (2), kernel pointers printed using
 %pK will be replaced with 0's regardless of privileges.
 
+When kptr_restrict is set to (3), kernel pointers printed using
+%p and %pK will be replaced with 0's regardless of privileges.
+
 ==
 
 kstack_depth_to_print: (X86 only)
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index a377bfa..0d4e4af 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -128,6 +128,7 @@ static unsigned long one_ul = 1;
 static int one_hundred = 100;
 static int one_thousand = 1000;
 #ifdef CONFIG_PRINTK
+static int three = 3;
 static int ten_thousand = 1;
 #endif
 #ifdef CONFIG_PERF_EVENTS
@@ -847,7 +848,7 @@ static struct ctl_table kern_table[] = {
.mode   = 0644,
.proc_handler   = proc_dointvec_minmax_sysadmin,
.extra1 = &zero,
-   .extra2 = &two,
+   .extra2 = &three,
},
 #endif
{
diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index 0967771..371cfab 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -1472,6 +1472,56 @@ char *flags_string(char *buf, char *end, void 
*flags_ptr, const char *fmt)
 
 int kptr_restrict __read_mostly;
 
+static inline void *cleanse_pointer(void *ptr, struct printf_spec spec,
+   char *buf, char *end, int default_width)
+{
+   switch (kptr_restrict) {
+   case 0:
+   /* Always print %p values */
+   break;
+   case 1: {
+   const struct cred *cred;
+
+   /*
+* kptr_restrict==1 cannot be used in IRQ context
+* because its test for CAP_SYSLOG would be meaningless.
+*/
+   if (in_irq() || in_serving_softirq() || in_nmi()) {
+   if (spec.field_width == -1)
+   spec.field_width = default_width;
+   return string(buf, end, "pK-error", spec);
+   }
+
+   /*
+* Only print the real pointer value if the current
+* process has CAP_SYSLOG and is running with the
+* same credentials it started with. This is because
+* access to files is checked at open() time, but %p
+* checks permission at read() time. We don't want to
+* leak pointer values if a binary opens a file using
+* %pK and then elevates privileges before reading it.
+*/
+   cred = current_cred();
+   if (!has_capability_noaudit(current, CAP_SYSLOG) ||
+   !uid_eq(cred->euid, cred->uid) ||
+   !gid_eq(cred->egid, cred->gid))
+   ptr = NULL;
+   break;
+   }
+   case 2: /* restrict only %pK */
+   case 3: /* restrict all non-extensioned %p and %pK */
+   default:
+   ptr = NULL;
+   break;
+   }
+   return ptr;
+}
+
+static inline int kptr_restrict_always_cleanse(void)
+{
+   return kptr_restrict == 3;
+}
+
 /*
  * Show a '%p' thing.  A kernel extension is that the '%p' is followed
  * by an extra set of alphanumeric characters that are extended format
@@ -1569,6 +1619,9 @@ int kptr_restrict __read_mostly;
  * Note: The difference between 'S' and 'F' is that on ia64 and ppc64
  * function pointers are really function descriptors, which contain a
  * pointer to the real address.
+ *
+ * Note: That for kptr_restrict set to 3, %p and %pK have the same
+ * meaning.
  */
 static noinline_for_stack
 char *pointer(const char *fmt, char *buf, char *end, void *ptr,
@@ -1576,7 +1629,7 @@ char *pointer(const char *fmt, char *buf, char *end, void 
*ptr,
 {
const int default_width = 2 * sizeof(void *);
 
-   if (!ptr && *fmt != 'K') {
+   if (!ptr && *fmt != 'K' && !kptr_restrict_always_cleanse()) {
/*
 * Print (null) wit

Re: [RFC] fs: add userspace critical mounts event support

2016-10-05 Thread Linus Torvalds
On Wed, Oct 5, 2016 at 11:00 AM, Luis R. Rodriguez  wrote:
> On Tue, Sep 13, 2016 at 09:38:17PM -0500, Rob Landley wrote:
>
>> I did some shuffling around of those code to make initmpfs work, does
>> anybody know why initramfs extraction _before_ we initialize drivers
>> would be a bad thing?
>
> No, but it seems sensible to me, if its done before do_initcalls()
> that should resolve the race for initramfs users

initramfs should already be set up before drivers are. Exactly what is
it that has trouble right now?

The gating issue for initramfs is that technically the filesystem
setup needs to be done, which means that it currently ends up being
populated _fairly_ late in the initcall series, but certainly before
drivers. But since initramfs really only needs very limited filesystem
functionality, I assume Rob had few problems with just moving it
earlier.

Still, what kind of ordering issues did people have? What is it that
needs to load files even before driver init? Some crazy subsystem?

   Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] cinergyT2-core: don't do DMA on stack

2016-10-05 Thread Mauro Carvalho Chehab
The USB control messages require DMA to work. We cannot pass
a stack-allocated buffer, as it is not warranted that the
stack would be into a DMA enabled area.

Signed-off-by: Mauro Carvalho Chehab 
---

Added the fixups made by Johannes Stezenbach

 drivers/media/usb/dvb-usb/cinergyT2-core.c | 45 ++
 1 file changed, 27 insertions(+), 18 deletions(-)

diff --git a/drivers/media/usb/dvb-usb/cinergyT2-core.c 
b/drivers/media/usb/dvb-usb/cinergyT2-core.c
index 9fd1527494eb..8267e3777af6 100644
--- a/drivers/media/usb/dvb-usb/cinergyT2-core.c
+++ b/drivers/media/usb/dvb-usb/cinergyT2-core.c
@@ -41,6 +41,7 @@ DVB_DEFINE_MOD_OPT_ADAPTER_NR(adapter_nr);
 
 struct cinergyt2_state {
u8 rc_counter;
+   unsigned char data[64];
 };
 
 /* We are missing a release hook with usb_device data */
@@ -50,29 +51,36 @@ static struct dvb_usb_device_properties 
cinergyt2_properties;
 
 static int cinergyt2_streaming_ctrl(struct dvb_usb_adapter *adap, int enable)
 {
-   char buf[] = { CINERGYT2_EP1_CONTROL_STREAM_TRANSFER, enable ? 1 : 0 };
-   char result[64];
-   return dvb_usb_generic_rw(adap->dev, buf, sizeof(buf), result,
-   sizeof(result), 0);
+   struct dvb_usb_device *d = adap->dev;
+   struct cinergyt2_state *st = d->priv;
+
+   st->data[0] = CINERGYT2_EP1_CONTROL_STREAM_TRANSFER;
+   st->data[1] = enable ? 1 : 0;
+
+   return dvb_usb_generic_rw(d, st->data, 2, st->data, 64, 0);
 }
 
 static int cinergyt2_power_ctrl(struct dvb_usb_device *d, int enable)
 {
-   char buf[] = { CINERGYT2_EP1_SLEEP_MODE, enable ? 0 : 1 };
-   char state[3];
-   return dvb_usb_generic_rw(d, buf, sizeof(buf), state, sizeof(state), 0);
+   struct cinergyt2_state *st = d->priv;
+
+   st->data[0] = CINERGYT2_EP1_SLEEP_MODE;
+   st->data[1] = enable ? 1 : 0;
+
+   return dvb_usb_generic_rw(d, st->data, 2, st->data, 3, 0);
 }
 
 static int cinergyt2_frontend_attach(struct dvb_usb_adapter *adap)
 {
-   char query[] = { CINERGYT2_EP1_GET_FIRMWARE_VERSION };
-   char state[3];
+   struct dvb_usb_device *d = adap->dev;
+   struct cinergyt2_state *st = d->priv;
int ret;
 
adap->fe_adap[0].fe = cinergyt2_fe_attach(adap->dev);
 
-   ret = dvb_usb_generic_rw(adap->dev, query, sizeof(query), state,
-   sizeof(state), 0);
+   st->data[0] = CINERGYT2_EP1_GET_FIRMWARE_VERSION;
+
+   ret = dvb_usb_generic_rw(d, st->data, 1, st->data, 3, 0);
if (ret < 0) {
deb_rc("cinergyt2_power_ctrl() Failed to retrieve sleep "
"state info\n");
@@ -141,13 +149,14 @@ static int repeatable_keys[] = {
 static int cinergyt2_rc_query(struct dvb_usb_device *d, u32 *event, int *state)
 {
struct cinergyt2_state *st = d->priv;
-   u8 key[5] = {0, 0, 0, 0, 0}, cmd = CINERGYT2_EP1_GET_RC_EVENTS;
int i;
 
*state = REMOTE_NO_KEY_PRESSED;
 
-   dvb_usb_generic_rw(d, &cmd, 1, key, sizeof(key), 0);
-   if (key[4] == 0xff) {
+   st->data[0] = CINERGYT2_EP1_GET_RC_EVENTS;
+
+   dvb_usb_generic_rw(d, st->data, 1, st->data, 5, 0);
+   if (st->data[4] == 0xff) {
/* key repeat */
st->rc_counter++;
if (st->rc_counter > RC_REPEAT_DELAY) {
@@ -166,13 +175,13 @@ static int cinergyt2_rc_query(struct dvb_usb_device *d, 
u32 *event, int *state)
}
 
/* hack to pass checksum on the custom field */
-   key[2] = ~key[1];
-   dvb_usb_nec_rc_key_to_event(d, key, event, state);
-   if (key[0] != 0) {
+   st->data[2] = ~st->data[1];
+   dvb_usb_nec_rc_key_to_event(d, st->data, event, state);
+   if (st->data[0] != 0) {
if (*event != d->last_event)
st->rc_counter = 0;
 
-   deb_rc("key: %*ph\n", 5, key);
+   deb_rc("key: %*ph\n", 5, st->data);
}
return 0;
 }
-- 
2.7.4


--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] cinergyT2-core: don't do DMA on stack

2016-10-05 Thread Mauro Carvalho Chehab
Em Wed,  5 Oct 2016 15:54:18 -0300
Mauro Carvalho Chehab  escreveu:

> The USB control messages require DMA to work. We cannot pass
> a stack-allocated buffer, as it is not warranted that the
> stack would be into a DMA enabled area.

Please disregard this patch at linux-doc ML. It was sent to the wrong
list by mistake.

Regards,
Mauro

> 
> Signed-off-by: Mauro Carvalho Chehab 
> ---
> 
> Added the fixups made by Johannes Stezenbach
> 
>  drivers/media/usb/dvb-usb/cinergyT2-core.c | 45 
> ++
>  1 file changed, 27 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/media/usb/dvb-usb/cinergyT2-core.c 
> b/drivers/media/usb/dvb-usb/cinergyT2-core.c
> index 9fd1527494eb..8267e3777af6 100644
> --- a/drivers/media/usb/dvb-usb/cinergyT2-core.c
> +++ b/drivers/media/usb/dvb-usb/cinergyT2-core.c
> @@ -41,6 +41,7 @@ DVB_DEFINE_MOD_OPT_ADAPTER_NR(adapter_nr);
>  
>  struct cinergyt2_state {
>   u8 rc_counter;
> + unsigned char data[64];
>  };
>  
>  /* We are missing a release hook with usb_device data */
> @@ -50,29 +51,36 @@ static struct dvb_usb_device_properties 
> cinergyt2_properties;
>  
>  static int cinergyt2_streaming_ctrl(struct dvb_usb_adapter *adap, int enable)
>  {
> - char buf[] = { CINERGYT2_EP1_CONTROL_STREAM_TRANSFER, enable ? 1 : 0 };
> - char result[64];
> - return dvb_usb_generic_rw(adap->dev, buf, sizeof(buf), result,
> - sizeof(result), 0);
> + struct dvb_usb_device *d = adap->dev;
> + struct cinergyt2_state *st = d->priv;
> +
> + st->data[0] = CINERGYT2_EP1_CONTROL_STREAM_TRANSFER;
> + st->data[1] = enable ? 1 : 0;
> +
> + return dvb_usb_generic_rw(d, st->data, 2, st->data, 64, 0);
>  }
>  
>  static int cinergyt2_power_ctrl(struct dvb_usb_device *d, int enable)
>  {
> - char buf[] = { CINERGYT2_EP1_SLEEP_MODE, enable ? 0 : 1 };
> - char state[3];
> - return dvb_usb_generic_rw(d, buf, sizeof(buf), state, sizeof(state), 0);
> + struct cinergyt2_state *st = d->priv;
> +
> + st->data[0] = CINERGYT2_EP1_SLEEP_MODE;
> + st->data[1] = enable ? 1 : 0;
> +
> + return dvb_usb_generic_rw(d, st->data, 2, st->data, 3, 0);
>  }
>  
>  static int cinergyt2_frontend_attach(struct dvb_usb_adapter *adap)
>  {
> - char query[] = { CINERGYT2_EP1_GET_FIRMWARE_VERSION };
> - char state[3];
> + struct dvb_usb_device *d = adap->dev;
> + struct cinergyt2_state *st = d->priv;
>   int ret;
>  
>   adap->fe_adap[0].fe = cinergyt2_fe_attach(adap->dev);
>  
> - ret = dvb_usb_generic_rw(adap->dev, query, sizeof(query), state,
> - sizeof(state), 0);
> + st->data[0] = CINERGYT2_EP1_GET_FIRMWARE_VERSION;
> +
> + ret = dvb_usb_generic_rw(d, st->data, 1, st->data, 3, 0);
>   if (ret < 0) {
>   deb_rc("cinergyt2_power_ctrl() Failed to retrieve sleep "
>   "state info\n");
> @@ -141,13 +149,14 @@ static int repeatable_keys[] = {
>  static int cinergyt2_rc_query(struct dvb_usb_device *d, u32 *event, int 
> *state)
>  {
>   struct cinergyt2_state *st = d->priv;
> - u8 key[5] = {0, 0, 0, 0, 0}, cmd = CINERGYT2_EP1_GET_RC_EVENTS;
>   int i;
>  
>   *state = REMOTE_NO_KEY_PRESSED;
>  
> - dvb_usb_generic_rw(d, &cmd, 1, key, sizeof(key), 0);
> - if (key[4] == 0xff) {
> + st->data[0] = CINERGYT2_EP1_GET_RC_EVENTS;
> +
> + dvb_usb_generic_rw(d, st->data, 1, st->data, 5, 0);
> + if (st->data[4] == 0xff) {
>   /* key repeat */
>   st->rc_counter++;
>   if (st->rc_counter > RC_REPEAT_DELAY) {
> @@ -166,13 +175,13 @@ static int cinergyt2_rc_query(struct dvb_usb_device *d, 
> u32 *event, int *state)
>   }
>  
>   /* hack to pass checksum on the custom field */
> - key[2] = ~key[1];
> - dvb_usb_nec_rc_key_to_event(d, key, event, state);
> - if (key[0] != 0) {
> + st->data[2] = ~st->data[1];
> + dvb_usb_nec_rc_key_to_event(d, st->data, event, state);
> + if (st->data[0] != 0) {
>   if (*event != d->last_event)
>   st->rc_counter = 0;
>  
> - deb_rc("key: %*ph\n", 5, key);
> + deb_rc("key: %*ph\n", 5, st->data);
>   }
>   return 0;
>  }



Thanks,
Mauro
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] printk: introduce kptr_restrict level 3

2016-10-05 Thread Kees Cook
On Wed, Oct 5, 2016 at 11:04 AM,   wrote:
> From: William Roberts 
>
> Some out-of-tree modules do not use %pK and just use %p, as it's
> the common C paradigm for printing pointers. Because of this,
> kptr_restrict has no affect on the output and thus, no way to
> contain the kernel address leak.

Solving this is certainly a good idea -- I'm all for finding a solid solution.

> Introduce kptr_restrict level 3 that causes the kernel to
> treat %p as if it was %pK and thus always prints zeros.

I'm worried that this could break kernel internals where %p is being
used and not exposed to userspace. Maybe those situations don't
exist...

Regardless, I would rather do what Grsecurity has done in this area,
and whitelist known-safe values instead. For example, they have %pP
for approved pointers, and %pX for approved
dereference_function_descriptor() output. Everything else is censored
if it is a value in kernel memory and destined for a user-space memory
buffer:

if ((unsigned long)ptr > TASK_SIZE && *fmt != 'P' && *fmt !=
'X' && *fmt != 'K' && is_usercopy_object(buf)) {
printk(KERN_ALERT "grsec: kernel infoleak detected!
Please report this log to spen...@grsecurity.net.\n");
dump_stack();
ptr = NULL;
}

The "is_usercopy_object()" test is something we can add, which is
testing for a new SLAB flag that is used to mark slab caches as either
used by user-space or not, which is done also through whitelisting.
(For more details on this, see:
http://www.openwall.com/lists/kernel-hardening/2016/06/08/10)

Would you have time/interest to add the slab flags and
is_usercopy_object()? The hardened usercopy part of the slab
whitelisting can be separate, since it likely needs a different
usercopy interface to sanely integrate with upstream.

-Kees

-- 
Kees Cook
Nexus Security
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] fs: add userspace critical mounts event support

2016-10-05 Thread Luis R. Rodriguez
On Wed, Oct 05, 2016 at 11:08:06AM -0700, Linus Torvalds wrote:
> On Wed, Oct 5, 2016 at 11:00 AM, Luis R. Rodriguez  wrote:
> > On Tue, Sep 13, 2016 at 09:38:17PM -0500, Rob Landley wrote:
> >
> >> I did some shuffling around of those code to make initmpfs work, does
> >> anybody know why initramfs extraction _before_ we initialize drivers
> >> would be a bad thing?
> >
> > No, but it seems sensible to me, if its done before do_initcalls()
> > that should resolve the race for initramfs users
> 
> initramfs should already be set up before drivers are.

Actually you are right, the issue would only be for old initrd, for initramfs
we populate that via rootfs_initcall(populate_rootfs), so as long as drivers
in question use an init level beyond rootfs's we're good there.

> Exactly what is it that has trouble right now?

It would seem then that the only current stated race possible should
then be non-initramfs users. One example if very large firmware for
remote-proc, whereby an initramfs is just not practical or desirable.

> The gating issue for initramfs is that technically the filesystem
> setup needs to be done, which means that it currently ends up being
> populated _fairly_ late in the initcall series, but certainly before
> drivers. But since initramfs really only needs very limited filesystem
> functionality, I assume Rob had few problems with just moving it
> earlier.
> 
> Still, what kind of ordering issues did people have? What is it that
> needs to load files even before driver init? Some crazy subsystem?

No, I think this is just about non-initramfs users now, if we disregard
old initrd users. Bjorn, Marc, correct me if I'm wrong, as I think its
so far you both who have seemed to run into race issues and have then
ended up trying to look for hacks to address this race or considered using
the usermode helper (which we're trying to minimize users for). Daniel
seems to note a lot of video drivers use firmware on probe as well so
there's a potential issue for those users if they don't use initramfs.

  Luis
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: kernel-doc-rst-lint (was: Re: [PATCH 00/15] improve function-level documentation)

2016-10-05 Thread Julia Lawall


On Wed, 5 Oct 2016, Jani Nikula wrote:

> On Wed, 05 Oct 2016, Daniel Vetter  wrote:
> > Jani Nikula has a patch with a scrip to make the one kernel-doc parser
> > into a lint/checker pass over the entire kernel. I think that'd would
> > be more robust instead of trying to approximate the real kerneldoc
> > parser. Otoh that parser is a horror show of a perl/regex driven state
> > machine ;-)
> >
> > Jani, can you pls digg out these patches? Can't find them right now ...
>
> Expanding the massive Cc: with linux-doc list...
>
> Here goes. It's a quick hack from months ago, but still seems to
> somewhat work. At least for the kernel-doc parts. The reStructuredText
> lint part isn't all that great, and doesn't have mapping to line numbers
> like the Sphinx kernel-doc extension does. Anyway I'm happy how this
> integrates with kernel build CHECK and C=1/C=2.
>
> I guess Julia's goal is to automate the *fixing* of some of the error
> classes from kernel-doc. Not sure how well this could be made to
> integrate with any of that.

No, my work doesn't fix anything.  Coccinelle can't actually process
comments.  I just correlated the parsed comment with the function header.

julia

>
> BR,
> Jani.
>
>
> From 1244efa0f63a7b13795e8c37f81733a3c8bfc56a Mon Sep 17 00:00:00 2001
> From: Jani Nikula 
> Date: Tue, 31 May 2016 18:11:33 +0300
> Subject: [PATCH] kernel-doc-rst-lint: add tool to check kernel-doc and rst
>  correctness
> Organization: Intel Finland Oy - BIC 0357606-4 - Westendinkatu 7, 02160 Espoo
> Cc: Jani Nikula 
>
> Simple kernel-doc and reStructuredText lint tool that can be used
> independently and as a kernel build CHECK tool to validate kernel-doc
> comments.
>
> Independent usage:
> $ kernel-doc-rst-lint FILE
>
> Kernel CHECK usage:
> $ make CHECK=scripts/kernel-doc-rst-lint C=1  # (or C=2)
>
> Depends on docutils and the rst-lint package
> https://pypi.python.org/pypi/restructuredtext_lint
>
> Signed-off-by: Jani Nikula 
> ---
>  scripts/kernel-doc-rst-lint | 106 
> 
>  1 file changed, 106 insertions(+)
>  create mode 100755 scripts/kernel-doc-rst-lint
>
> diff --git a/scripts/kernel-doc-rst-lint b/scripts/kernel-doc-rst-lint
> new file mode 100755
> index ..7e0157679f83
> --- /dev/null
> +++ b/scripts/kernel-doc-rst-lint
> @@ -0,0 +1,106 @@
> +#!/usr/bin/env python
> +# coding=utf-8
> +#
> +# Copyright © 2016 Intel Corporation
> +#
> +# Permission is hereby granted, free of charge, to any person obtaining a
> +# copy of this software and associated documentation files (the "Software"),
> +# to deal in the Software without restriction, including without limitation
> +# the rights to use, copy, modify, merge, publish, distribute, sublicense,
> +# and/or sell copies of the Software, and to permit persons to whom the
> +# Software is furnished to do so, subject to the following conditions:
> +#
> +# The above copyright notice and this permission notice (including the next
> +# paragraph) shall be included in all copies or substantial portions of the
> +# Software.
> +#
> +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> +# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> +# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER 
> DEALINGS
> +# IN THE SOFTWARE.
> +#
> +# Authors:
> +#Jani Nikula 
> +#
> +# Simple kernel-doc and reStructuredText lint tool that can be used
> +# independently and as a kernel build CHECK tool to validate kernel-doc
> +# comments.
> +#
> +# Independent usage:
> +# $ kernel-doc-rst-lint FILE
> +#
> +# Kernel CHECK usage:
> +# $ make CHECK=scripts/kernel-doc-rst-lint C=1   # (or C=2)
> +#
> +# Depends on docutils and the rst-lint package
> +# https://pypi.python.org/pypi/restructuredtext_lint
> +#
> +
> +import os
> +import subprocess
> +import sys
> +
> +from docutils.parsers.rst import directives
> +from docutils.parsers.rst import Directive
> +from docutils.parsers.rst import roles
> +from docutils import nodes, statemachine
> +import restructuredtext_lint
> +
> +class DummyDirective(Directive):
> +required_argument = 1
> +optional_arguments = 0
> +option_spec = { }
> +has_content = True
> +
> +def run(self):
> +return []
> +
> +# Fake the Sphinx C Domain directives and roles
> +directives.register_directive('c:function', DummyDirective)
> +directives.register_directive('c:type', DummyDirective)
> +roles.register_generic_role('c:func', nodes.emphasis)
> +roles.register_generic_role('c:type', nodes.emphasis)
> +
> +# We accept but ignore parameters to be compatible with how the kernel build
> +# invokes CHECK.
> +if len(sys.argv) < 2:
> +sys.stderr.write('usage: kerne

Re: [PATCH] printk: introduce kptr_restrict level 3

2016-10-05 Thread Rasmus Villemoes
On Wed, Oct 05 2016, william.c.robe...@intel.com wrote:

> From: William Roberts 
>
> Some out-of-tree modules do not use %pK and just use %p, as it's
> the common C paradigm for printing pointers. Because of this,
> kptr_restrict has no affect on the output and thus, no way to
> contain the kernel address leak.
>
> Introduce kptr_restrict level 3 that causes the kernel to
> treat %p as if it was %pK and thus always prints zeros.
>
> Sample Output:
> kptr_restrict == 2:
> p: 604369f4
> pK: 
>
> kptr_restrict == 3:
> p: 
> pK: 
>
> Signed-off-by: William Roberts 
> ---
>  Documentation/sysctl/kernel.txt |   3 ++
>  kernel/sysctl.c |   3 +-
>  lib/vsprintf.c  | 107 
> 

That's a lot of changed lines. Why isn't this just

--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -1719,6 +1719,8 @@ char *pointer(const char *fmt, char *buf, char *end, void 
*ptr,
case 'G':
return flags_string(buf, end, ptr, fmt);
}
+   if (kptr_restrict == 3)
+   ptr = NULL;
spec.flags |= SMALL;
if (spec.field_width == -1) {
spec.field_width = default_width;

?
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH-tip v4 01/10] locking/osq: Make lock/unlock proper acquire/release barrier

2016-10-05 Thread Davidlohr Bueso

On Wed, 05 Oct 2016, Waiman Long wrote:


diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
index 05a3785..1e6823a 100644
--- a/kernel/locking/osq_lock.c
+++ b/kernel/locking/osq_lock.c
@@ -12,6 +12,23 @@
 */
static DEFINE_PER_CPU_SHARED_ALIGNED(struct optimistic_spin_node, 
osq_node);


+enum mbtype {
+acquire,
+release,
+relaxed,
+};


No, please.


+
+static __always_inline int
+_atomic_cmpxchg_(const enum mbtype barrier, atomic_t *v, int old, int new)
+{
+if (barrier == acquire)
+return atomic_cmpxchg_acquire(v, old, new);
+else if (barrier == release)
+return atomic_cmpxchg_release(v, old, new);
+else
+return atomic_cmpxchg_relaxed(v, old, new);
+}


Things like the above are icky. How about something like below? I'm not
crazy about it, but there are other similar macros, ie lockref. We still
provide the osq_lock/unlock to imply acquire/release and the new _relaxed
flavor, as I agree that should be the correct naming

While I have not touched osq_wait_next(), the following are impacted:

- node->locked is now completely without ordering for _relaxed() (currently
its under smp_load_acquire, which does not match and the race is harmless
to begin with as we just iterate again. For the acquire flavor, it is always
formed with ctr dep + smp_rmb().

- If osq_lock() fails we never guarantee any ordering.

What do you think?

Thanks,
Davidlohr


diff --git a/include/linux/osq_lock.h b/include/linux/osq_lock.h
index 703ea5c30a33..183ee51e6e54 100644
--- a/include/linux/osq_lock.h
+++ b/include/linux/osq_lock.h
@@ -29,9 +29,20 @@ static inline void osq_lock_init(struct 
optimistic_spin_queue *lock)
atomic_set(&lock->tail, OSQ_UNLOCKED_VAL);
}

+/*
+ * Versions of osq_lock/unlock that do not imply or guarantee (load)-ACQUIRE
+ * (store)-RELEASE barrier semantics.
+ *
+ * Note that a failed call to either osq_lock() or osq_lock_relaxed() does
+ * not imply barriers... we are next to block.
+ */
+extern bool osq_lock_relaxed(struct optimistic_spin_queue *lock);
+extern void osq_unlock_relaxed(struct optimistic_spin_queue *lock);
+
extern bool osq_lock(struct optimistic_spin_queue *lock);
extern void osq_unlock(struct optimistic_spin_queue *lock);

+
static inline bool osq_is_locked(struct optimistic_spin_queue *lock)
{
return atomic_read(&lock->tail) != OSQ_UNLOCKED_VAL;
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index a70b90db3909..b1bf1e057565 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -316,7 +316,7 @@ static bool mutex_optimistic_spin(struct mutex *lock,
 * acquire the mutex all at once, the spinners need to take a
 * MCS (queued) lock first before spinning on the owner field.
 */
-   if (!osq_lock(&lock->osq))
+   if (!osq_lock_relaxed(&lock->osq))
goto done;

while (true) {
@@ -358,7 +358,7 @@ static bool mutex_optimistic_spin(struct mutex *lock,
}

mutex_set_owner(lock);
-   osq_unlock(&lock->osq);
+   osq_unlock_relaxed(&lock->osq);
return true;
}

@@ -380,7 +380,7 @@ static bool mutex_optimistic_spin(struct mutex *lock,
cpu_relax_lowlatency();
}

-   osq_unlock(&lock->osq);
+   osq_unlock_relaxed(&lock->osq);
done:
/*
 * If we fell out of the spin path because of need_resched(),
diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
index 05a37857ab55..8c3d57698702 100644
--- a/kernel/locking/osq_lock.c
+++ b/kernel/locking/osq_lock.c
@@ -28,6 +28,17 @@ static inline struct optimistic_spin_node *decode_cpu(int 
encoded_cpu_val)
return per_cpu_ptr(&osq_node, cpu_nr);
}

+static inline void set_node_locked_release(struct optimistic_spin_node *node)
+{
+   smp_store_release(&node->locked, 1);
+}
+
+static inline void set_node_locked_relaxed(struct optimistic_spin_node *node)
+{
+   WRITE_ONCE(node->locked, 1);
+
+}
+
/*
 * Get a stable @node->next pointer, either for unlock() or unqueue() purposes.
 * Can return NULL in case we were the last queued and we updated @lock instead.
@@ -81,130 +92,140 @@ osq_wait_next(struct optimistic_spin_queue *lock,
return next;
}

-bool osq_lock(struct optimistic_spin_queue *lock)
-{
-   struct optimistic_spin_node *node = this_cpu_ptr(&osq_node);
-   struct optimistic_spin_node *prev, *next;
-   int curr = encode_cpu(smp_processor_id());
-   int old;
-
-   node->locked = 0;
-   node->next = NULL;
-   node->cpu = curr;
-
-   /*
-* We need both ACQUIRE (pairs with corresponding RELEASE in
-* unlock() uncontended, or fastpath) and RELEASE (to publish
-* the node fields we just initialised) semantics when updating
-* the lock tail.
-*/
-   old = atomic_xchg(&lock->tail, curr);
-   if (old == OSQ_UNLOCKED_VAL)
-