Re: [PATCH 2/2] x86/mtrr: Refactor PAT initialization code

2016-03-29 Thread Luis R. Rodriguez
On Tue, Mar 29, 2016 at 5:16 PM, Toshi Kani  wrote:
> On Tue, 2016-03-29 at 15:12 -0700, Luis R. Rodriguez wrote:
>> On Tue, Mar 29, 2016 at 2:46 PM, Toshi Kani  wrote:
>> > On Tue, 2016-03-29 at 10:14 -0700, Luis R. Rodriguez wrote:
>> > > On Fri, Mar 18, 2016 at 2:35 PM, Toshi Kani 
>> > > wrote:
>  :
>> > >
>> > > Do we really need UC for the fan?
>> >
>> > When you say "we", are you referring Xen guests?  Xen guests do not
>> > need to control the fan, so they do not need UC set in MTRRs.
>> >
>> > In general, yes, MMIO registers need UC when they need to be accessed.
>>
>> Curious, what does a BIOS do for fan control when MTRRs are disabled?
>
> You mean, when the kernel modified the MTRR setup and disabled them.

Nope, but the below is good to know!

I meant to ask about the case where an option exists that lets a user go in
and muck with BIOS settings to disable MTRR, and the user disables
MTRR. What would happen to fan control in such situations? I'd
imagine such cases allow for a system to exist with proper fan
control, and allow the kernel to boot without having to deal with the
pesky MTRRs at all, while PAT lives on, no?

> BIOS
> would assume the original setup and still access the registers.  This may
> lead to undefined behavior and may result in a system crash.
>
>> Also what if a BIOS just set MSR_MTRRdefType to uncacheable only?
>
> Many BIOSes actually set the default type to UC.

Thanks, I asked as I saw my BIOS uses write-back by default. Good to
know there are different strategies.

> MTRRs then cover regular memory with WB.

When you say regular memory, you mean everything else we see as RAM? I
was under the impression we'd only need MTRR for a special range of
memory, and it's up to the implementation how they are used. If you can use
MTRR to change the cache attribute for regular RAM, and if this is
actually a requirement when the default MTRR type is UC, then one way or
another a BIOS seems to always require MTRR, either for the UC setting for
fan control or WB for regular RAM. Is that right?
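
For reference, both the default type and the global enable bits live in
MSR_MTRRdefType. Below is a minimal userspace sketch of decoding a raw
value, assuming the IA32_MTRR_DEF_TYPE layout from the Intel SDM (type in
the low byte, FE in bit 10, E in bit 11). The struct and helper names are
made up for illustration and are not the kernel's code:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout of IA32_MTRR_DEF_TYPE (MSR 0x2FF), per the Intel SDM. */
#define SKETCH_DEF_TYPE_MASK  0xffu       /* bits 7:0: default memory type */
#define SKETCH_DEF_FIXED_EN   (1u << 10)  /* FE: fixed-range MTRRs enabled */
#define SKETCH_DEF_ENABLE     (1u << 11)  /* E: MTRRs enabled at all */

/* Hypothetical container for the decoded fields. */
struct mtrr_def_type {
	bool enabled;
	bool fixed_enabled;
	uint8_t type;
};

static struct mtrr_def_type decode_mtrr_def_type(uint64_t msr)
{
	struct mtrr_def_type d;

	d.enabled = (msr & SKETCH_DEF_ENABLE) != 0;
	d.fixed_enabled = (msr & SKETCH_DEF_FIXED_EN) != 0;
	d.type = (uint8_t)(msr & SKETCH_DEF_TYPE_MASK);
	return d;
}

/* Memory type encodings shared by MTRRs: 0 UC, 1 WC, 4 WT, 5 WP, 6 WB. */
static const char *mtrr_type_name(uint8_t type)
{
	switch (type) {
	case 0: return "UC";
	case 1: return "WC";
	case 4: return "WT";
	case 5: return "WP";
	case 6: return "WB";
	default: return "reserved";
	}
}
```

With this, a raw value of 0xc06 would decode as enabled, fixed ranges
enabled, default WB, while 0xc00 would be the "default UC" strategy
mentioned above.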

>> Wouldn't that help simplify the BIOS when systems are known as not
>> wanting to deal with reading MTRRs on the kernel front, even if its
>> just to read the setup ?
>
> Nope.
>
>> I'm trying to determine exactly why a BIOS cannot simply use an
>> alternative for what it needs for fan control and let the kernel live
>> without any MTRR code at run time as an option. Although the
>> documentation says that the same "procedure" is needed for PAT setup,
>> I see it possible to split the skeleton of the code and have each
>> piece of code live separately and compartmentalized, they'd just have
>> respective calls on the skeleton of the procedure.
>
> I agree that the MTRR rendezvous handler can be improved for PAT, but I do
> not see a compelling reason to make such change now.  With my fix, I think
> the code works reasonably for Xen.

Agreed, I don't think it's needed now; my questions are for future optimizations.

>> > > What is the default for PAT?
>> >
>> > There is no such thing as the default for PAT.
>> >
>> > > Can't
>> > > the same be used so that we say by default all ranges match what is
>> > > also the default by PAT? Would that really break fan control ? If we
>> > > have a match shouldn't we be able to not have to worry about MTRRs at
>> > > all in-kernel even on bare metal?
>> >
>> > We do not need to know about BIOS impl, such as fan control, etc.  The
>> > point is that if BIOS sets MTRRs, then the kernel keeps their setup.
>>
>> Right, if the kernel no longer uses it directly it seems like an
>> awful lot of code to keep updating simply for a BIOS requirement, I'm
>> trying to see if we can have the option to live without this
>> requirement.
>
> Please be aware of the hibernation case. I think this procedure involves
> setting MTRRs back to the original setup.

Eek, right, so best just disable them if we can.

>> > If (virtual) BIOS does not enable MTRRs, the kernel keeps them
>> > disabled.  We just need not to mess with the setup.
>>
>> Sure, thanks! I'm trying to see if we can have a similar option on bare
>> metal.
>>
>> > > Another option, which I've alluded to on the Xen thread is skipping
>> > > over the MTRR space from the e820 map. Is that not possible ? This
>> > > could be last resort... but which I'm hinting more for the Xen side
>> > > of things if we *really* need get_mtrr() on the Xen guest side of
>> > > things...
>> >
>> > There is no MTRR space in the e820 map since they are MSRs.  Since Xen
>> > guests disable MTRRs, I do not think you have any issue here...
>>
>> Xen seems to clip the e820 map given to a guest in certain MTRR
>> conditions, see init_e820(), this calls
>> machine_specific_memory_setup() which later clips MTRR if
>> mtrr_top_of_ram(). This is an Intel check that trims the e820 map if
>> MTRRs were found to be enabled and the default MTRR is not write-back.
>> If returns the address of the first non write-back variable MTRR, it
>> uses clip_to_limit() to 

[PATCH 03/11] perf tools: Make -f/--force option documentation consistent across tools

2016-03-29 Thread Arnaldo Carvalho de Melo
From: Jiri Olsa 

Signed-off-by: Jiri Olsa 
Cc: Andi Kleen 
Cc: David Ahern 
Cc: Namhyung Kim 
Cc: Peter Zijlstra 
Cc: Stephane Eranian 
Link: 
http://lkml.kernel.org/r/1458823940-24583-6-git-send-email-jo...@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/Documentation/perf-annotate.txt | 2 +-
 tools/perf/Documentation/perf-diff.txt | 2 +-
 tools/perf/Documentation/perf-report.txt   | 2 +-
 tools/perf/Documentation/perf-script.txt   | 4 ++++
 4 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/tools/perf/Documentation/perf-annotate.txt 
b/tools/perf/Documentation/perf-annotate.txt
index e9cd39a92dc2..778f54d4d0bd 100644
--- a/tools/perf/Documentation/perf-annotate.txt
+++ b/tools/perf/Documentation/perf-annotate.txt
@@ -33,7 +33,7 @@ OPTIONS
 
 -f::
 --force::
-Don't complain, do it.
+Don't do ownership validation.
 
 -v::
 --verbose::
diff --git a/tools/perf/Documentation/perf-diff.txt 
b/tools/perf/Documentation/perf-diff.txt
index d1deb573877f..3e9490b9c533 100644
--- a/tools/perf/Documentation/perf-diff.txt
+++ b/tools/perf/Documentation/perf-diff.txt
@@ -75,7 +75,7 @@ OPTIONS
 
 -f::
 --force::
-   Don't complain, do it.
+Don't do ownership validation.
 
 --symfs=::
 Look for files with symbols relative to this directory.
diff --git a/tools/perf/Documentation/perf-report.txt 
b/tools/perf/Documentation/perf-report.txt
index 12113992ac9d..496d42cdf02b 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -285,7 +285,7 @@ OPTIONS
 
 -f::
 --force::
-Don't complain, do it.
+Don't do ownership validation.
 
 --symfs=::
 Look for files with symbols relative to this directory.
diff --git a/tools/perf/Documentation/perf-script.txt 
b/tools/perf/Documentation/perf-script.txt
index 382ddfb45d1d..22ef3933342a 100644
--- a/tools/perf/Documentation/perf-script.txt
+++ b/tools/perf/Documentation/perf-script.txt
@@ -262,6 +262,10 @@ include::itrace.txt[]
 --ns::
Use 9 decimal places when displaying time (i.e. show the nanoseconds)
 
+-f::
+--force::
+   Don't do ownership validation.
+
 SEE ALSO
 
 linkperf:perf-record[1], linkperf:perf-script-perl[1],
-- 
2.5.5



[PATCH 05/11] perf config: Remove duplicated set_buildid_dir calls

2016-03-29 Thread Arnaldo Carvalho de Melo
From: Taeung Song 

Signed-off-by: Taeung Song 
Acked-by: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Peter Zijlstra 
Link: 
http://lkml.kernel.org/r/1459099340-16911-1-git-send-email-treeze.tae...@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/perf.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/tools/perf/perf.c b/tools/perf/perf.c
index aaee0a782747..7b2df2b46525 100644
--- a/tools/perf/perf.c
+++ b/tools/perf/perf.c
@@ -549,6 +549,7 @@ int main(int argc, const char **argv)
srandom(time(NULL));
 
perf_config(perf_default_config, NULL);
+   set_buildid_dir(NULL);
 
/* get debugfs/tracefs mount point from /proc/mounts */
tracing_path_mount();
@@ -572,7 +573,6 @@ int main(int argc, const char **argv)
}
if (!prefixcmp(cmd, "trace")) {
 #ifdef HAVE_LIBAUDIT_SUPPORT
-   set_buildid_dir(NULL);
setup_path();
argv[0] = "trace";
return cmd_trace(argc, argv, NULL);
@@ -587,7 +587,6 @@ int main(int argc, const char **argv)
argc--;
handle_options(&argv, &argc, NULL);
commit_pager_choice();
-   set_buildid_dir(NULL);
 
if (argc > 0) {
if (!prefixcmp(argv[0], "--"))
-- 
2.5.5



[PATCH 06/11] perf config: Rework buildid_dir_command_config to perf_buildid_config

2016-03-29 Thread Arnaldo Carvalho de Melo
From: Taeung Song 

To avoid calling perf_config() repeatedly, remove
buildid_dir_command_config() and add a new perf_buildid_config() that is
called from perf_default_config().

This works because perf_config() is already called with
perf_default_config in main().
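
As a rough standalone illustration of the dispatch described above (not
the perf code itself), the sketch below routes any "buildid."-prefixed
variable from a default handler to a dedicated handler, which then does a
bounded copy into a global buffer. The names ending in _sketch and the
buffer size are assumptions made for this example:

```c
#include <string.h>

#define SKETCH_MAXPATHLEN 4096   /* hypothetical stand-in for MAXPATHLEN */

static char buildid_dir_sketch[SKETCH_MAXPATHLEN];

/* Returns non-zero when s begins with prefix. */
static int starts_with(const char *s, const char *prefix)
{
	return strncmp(s, prefix, strlen(prefix)) == 0;
}

/* Handles only "buildid.*" variables; other buildid keys are ignored. */
static int buildid_config_sketch(const char *var, const char *value)
{
	if (!strcmp(var, "buildid.dir")) {
		if (!value)
			return -1;
		/* bounded copy, always NUL-terminated */
		strncpy(buildid_dir_sketch, value, SKETCH_MAXPATHLEN - 1);
		buildid_dir_sketch[SKETCH_MAXPATHLEN - 1] = '\0';
	}
	return 0;
}

/* Single entry point: one pass over the config reaches every handler. */
static int default_config_sketch(const char *var, const char *value)
{
	if (starts_with(var, "buildid."))
		return buildid_config_sketch(var, value);

	/* other prefixes would be dispatched here */
	return 0;
}
```

The point of the design is that the config file is walked once with one
callback, instead of once per subsystem that wants a value from it.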

Signed-off-by: Taeung Song 
Acked-by: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Peter Zijlstra 
Cc: Wang Nan 
Link: 
http://lkml.kernel.org/r/1459099340-16911-2-git-send-email-treeze.tae...@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/util/config.c | 50 +---
 1 file changed, 18 insertions(+), 32 deletions(-)

diff --git a/tools/perf/util/config.c b/tools/perf/util/config.c
index 4e727635476e..2dd78f4c97a0 100644
--- a/tools/perf/util/config.c
+++ b/tools/perf/util/config.c
@@ -377,6 +377,21 @@ const char *perf_config_dirname(const char *name, const 
char *value)
return value;
 }
 
+static int perf_buildid_config(const char *var, const char *value)
+{
+   /* same dir for all commands */
+   if (!strcmp(var, "buildid.dir")) {
+   const char *dirname = perf_config_dirname(var, value);
+
+   if (!dirname)
+   return -1;
+   strncpy(buildid_dir, dirname, MAXPATHLEN-1);
+   buildid_dir[MAXPATHLEN-1] = '\0';
+   }
+
+   return 0;
+}
+
 static int perf_default_core_config(const char *var __maybe_unused,
const char *value __maybe_unused)
 {
@@ -412,6 +427,9 @@ int perf_default_config(const char *var, const char *value,
if (!prefixcmp(var, "llvm."))
return perf_llvm_config(var, value);
 
+   if (!prefixcmp(var, "buildid."))
+   return perf_buildid_config(var, value);
+
/* Add other config variables here. */
return 0;
 }
@@ -515,43 +533,11 @@ int config_error_nonbool(const char *var)
return error("Missing value for '%s'", var);
 }
 
-struct buildid_dir_config {
-   char *dir;
-};
-
-static int buildid_dir_command_config(const char *var, const char *value,
- void *data)
-{
-   struct buildid_dir_config *c = data;
-   const char *v;
-
-   /* same dir for all commands */
-   if (!strcmp(var, "buildid.dir")) {
-   v = perf_config_dirname(var, value);
-   if (!v)
-   return -1;
-   strncpy(c->dir, v, MAXPATHLEN-1);
-   c->dir[MAXPATHLEN-1] = '\0';
-   }
-   return 0;
-}
-
-static void check_buildid_dir_config(void)
-{
-   struct buildid_dir_config c;
-   c.dir = buildid_dir;
-   perf_config(buildid_dir_command_config, &c);
-}
-
 void set_buildid_dir(const char *dir)
 {
if (dir)
scnprintf(buildid_dir, MAXPATHLEN-1, "%s", dir);
 
-   /* try config file */
-   if (buildid_dir[0] == '\0')
-   check_buildid_dir_config();
-
/* default to $HOME/.debug */
if (buildid_dir[0] == '\0') {
char *v = getenv("HOME");
-- 
2.5.5



[PATCH 04/11] perf tests: Add test to check for event times

2016-03-29 Thread Arnaldo Carvalho de Melo
From: Jiri Olsa 

This test creates the software event 'cpu-clock', attaches it in several
ways, and checks that the enabled and running times match.
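
For readers unfamiliar with the two times, here is a minimal standalone
sketch (not the test added by this patch) that opens a 'cpu-clock' event
on the calling thread through the raw perf_event_open syscall and
compares time_enabled with time_running. The SKETCH_* codes, struct
read_times and both function names are invented for this example, and
any syscall failure is treated as a skip:

```c
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

enum { SKETCH_OK = 0, SKETCH_FAIL = -1, SKETCH_SKIP = -2 };

/* read() layout when both TOTAL_TIME read_format bits are requested. */
struct read_times {
	uint64_t value;
	uint64_t enabled;
	uint64_t running;
};

/* The check itself: the event must have run whenever it was enabled. */
static int compare_times(const struct read_times *rt)
{
	return rt->enabled == rt->running ? SKETCH_OK : SKETCH_FAIL;
}

static int check_current_thread_times(void)
{
	struct perf_event_attr attr;
	struct read_times rt;
	volatile unsigned long spin;
	int fd;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_SOFTWARE;
	attr.config = PERF_COUNT_SW_CPU_CLOCK;
	attr.read_format = PERF_FORMAT_TOTAL_TIME_ENABLED |
			   PERF_FORMAT_TOTAL_TIME_RUNNING;

	/* pid 0, cpu -1: count the calling thread on any CPU */
	fd = (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
	if (fd < 0)
		return SKETCH_SKIP;   /* e.g. not enough rights */

	/* burn a little CPU time so the counters advance */
	for (spin = 0; spin < 1000000; spin++)
		;

	if (read(fd, &rt, sizeof(rt)) != (ssize_t)sizeof(rt)) {
		close(fd);
		return SKETCH_SKIP;
	}
	close(fd);

	return compare_times(&rt);
}
```

Calling check_current_thread_times() roughly corresponds to the
"attaching to current thread as enabled" case in the output below.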

Committer notes:

Testing it:

  [acme@jouet linux]$ perf test -v times
  44: Test events times:
  --- start ---
  test child forked, pid 27170
  attaching to spawned child, enable on exec
OK: ena 307328, run 307328
  attaching to current thread as enabled
OK: ena 7826, run 7826
  attaching to current thread as disabled
OK: ena 738, run 738
  attaching to CPU 0 as enabled
SKIP  : not enough rights
  attaching to CPU 0 as enabled
SKIP  : not enough rights
  test child finished with -2
  ---- end ----
  Test events times: Skip
  [acme@jouet linux]$

  [root@jouet ~]# perf test times
  44: Test events times: Ok
  [root@jouet ~]# perf test -v times
  44: Test events times:
  --- start ---
  test child forked, pid 27306
  attaching to spawned child, enable on exec
OK: ena 479290, run 479290
  attaching to current thread as enabled
OK: ena 11356, run 11356
  attaching to current thread as disabled
OK: ena 987, run 987
  attaching to CPU 0 as enabled
OK: ena 3717, run 3717
  attaching to CPU 0 as enabled
OK: ena 2323, run 2323
  test child finished with 0
  ---- end ----
  Test events times: Ok
  [root@jouet ~]#

Signed-off-by: Jiri Olsa 
Tested-by: Arnaldo Carvalho de Melo 
Cc: Andi Kleen 
Cc: David Ahern 
Cc: Namhyung Kim 
Cc: Peter Zijlstra 
Cc: Stephane Eranian 
Link: 
http://lkml.kernel.org/r/1458823940-24583-7-git-send-email-jo...@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/tests/Build  |   1 +
 tools/perf/tests/builtin-test.c |   4 +
 tools/perf/tests/event-times.c  | 236 
 tools/perf/tests/tests.h|   1 +
 4 files changed, 242 insertions(+)
 create mode 100644 tools/perf/tests/event-times.c

diff --git a/tools/perf/tests/Build b/tools/perf/tests/Build
index 1ba628ed049a..449fe97a555f 100644
--- a/tools/perf/tests/Build
+++ b/tools/perf/tests/Build
@@ -37,6 +37,7 @@ perf-y += topology.o
 perf-y += cpumap.o
 perf-y += stat.o
 perf-y += event_update.o
+perf-y += event-times.o
 
 $(OUTPUT)tests/llvm-src-base.c: tests/bpf-script-example.c tests/Build
$(call rule_mkdir)
diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c
index f2b1dcac45d3..93c467015e71 100644
--- a/tools/perf/tests/builtin-test.c
+++ b/tools/perf/tests/builtin-test.c
@@ -204,6 +204,10 @@ static struct test generic_tests[] = {
.func = test__event_update,
},
{
+   .desc = "Test events times",
+   .func = test__event_times,
+   },
+   {
.func = NULL,
},
 };
diff --git a/tools/perf/tests/event-times.c b/tools/perf/tests/event-times.c
new file mode 100644
index ..95fb744f6628
--- /dev/null
+++ b/tools/perf/tests/event-times.c
@@ -0,0 +1,236 @@
+#include 
+#include 
+#include "tests.h"
+#include "evlist.h"
+#include "evsel.h"
+#include "util.h"
+#include "debug.h"
+#include "thread_map.h"
+#include "target.h"
+
+static int attach__enable_on_exec(struct perf_evlist *evlist)
+{
+   struct perf_evsel *evsel = perf_evlist__last(evlist);
+   struct target target = {
+   .uid = UINT_MAX,
+   };
+   const char *argv[] = { "true", NULL, };
+   char sbuf[STRERR_BUFSIZE];
+   int err;
+
+   pr_debug("attaching to spawned child, enable on exec\n");
+
+   err = perf_evlist__create_maps(evlist, &target);
+   if (err < 0) {
+   pr_debug("Not enough memory to create thread/cpu maps\n");
+   return err;
+   }
+
+   err = perf_evlist__prepare_workload(evlist, &target, argv, false, NULL);
+   if (err < 0) {
+   pr_debug("Couldn't run the workload!\n");
+   return err;
+   }
+
+   evsel->attr.enable_on_exec = 1;
+
+   err = perf_evlist__open(evlist);
+   if (err < 0) {
+   pr_debug("perf_evlist__open: %s\n",
+strerror_r(errno, sbuf, sizeof(sbuf)));
+   return err;
+   }
+
+   return perf_evlist__start_workload(evlist) == 1 ? TEST_OK : TEST_FAIL;
+}
+
+static int detach__enable_on_exec(struct perf_evlist *evlist)
+{
+   waitpid(evlist->workload.pid, NULL, 0);
+   return 0;
+}
+
+static int attach__current_disabled(struct perf_evlist *evlist)
+{
+   struct perf_evsel *evsel = perf_evlist__last(evlist);
+   struct thread_map *threads;
+   int err;
+
+   pr_debug("attaching to current thread as disabled\n");
+
+   threads = thread_map__new(-1, getpid(), UINT_MAX);
+   if (threads == NULL) {
+   pr_debug("thread_map__new\n");
+   return -1;
+   }
+
+   evsel->attr.disabled = 1;
+
+   err = 

[PATCH 07/11] perf config: Rename 'v' to 'home' in set_buildid_dir()

2016-03-29 Thread Arnaldo Carvalho de Melo
From: Taeung Song 

Change the variable name 'v' to 'home' to make it more readable.
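
For context, the fallback this hunk touches can be sketched on its own as
below. This is an illustration rather than the perf function:
default_buildid_dir() and SKETCH_CACHE_DIR are hypothetical names, with
".debug" taken from the "$HOME/.debug" comment in the code:

```c
#include <stdio.h>
#include <stdlib.h>

#define SKETCH_CACHE_DIR ".debug"   /* assumed value, per the code comment */

/*
 * Fill buf with "$HOME/.debug", or with a relative ".debug" when HOME is
 * not set. Returns what snprintf() returns, so the caller can detect
 * truncation by comparing against size.
 */
static int default_buildid_dir(char *buf, size_t size)
{
	const char *home = getenv("HOME");

	if (home)
		return snprintf(buf, size, "%s/%s", home, SKETCH_CACHE_DIR);

	return snprintf(buf, size, "%s", SKETCH_CACHE_DIR);
}
```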

Signed-off-by: Taeung Song 
Acked-by: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Peter Zijlstra 
Link: 
http://lkml.kernel.org/r/1459099340-16911-3-git-send-email-treeze.tae...@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/util/config.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/config.c b/tools/perf/util/config.c
index 2dd78f4c97a0..5c20d783423b 100644
--- a/tools/perf/util/config.c
+++ b/tools/perf/util/config.c
@@ -540,10 +540,11 @@ void set_buildid_dir(const char *dir)
 
/* default to $HOME/.debug */
if (buildid_dir[0] == '\0') {
-   char *v = getenv("HOME");
-   if (v) {
+   char *home = getenv("HOME");
+
+   if (home) {
snprintf(buildid_dir, MAXPATHLEN-1, "%s/%s",
-v, DEBUG_CACHE_DIR);
+home, DEBUG_CACHE_DIR);
} else {
strncpy(buildid_dir, DEBUG_CACHE_DIR, MAXPATHLEN-1);
}
-- 
2.5.5



[PATCH 08/11] perf script perl: Perl scripts now get a backtrace, like the python ones

2016-03-29 Thread Arnaldo Carvalho de Melo
From: Dima Kogan 

We have some infrastructure to use perl or python to analyze logs
generated by perf.  Prior to this patch, only the python tools had
access to backtrace information.  This patch makes this information
available to perl scripts as well.  Example:

  Let's look at malloc() calls made by the seq utility.  First we
  create a probe point:

  $ perf probe -x /lib/x86_64-linux-gnu/libc.so.6 malloc
  Added new events:
  ...

  Now we run seq, while monitoring malloc() calls with perf

  $ perf record --call-graph=dwarf -e probe_libc:malloc seq 5
  1
  2
  3
  4
  5
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.064 MB perf.data (6 samples) ]

  We can use perf to look at its log to see the malloc calls and the backtrace

  $ perf script
  seq 14195 [000] 1927993.748254: probe_libc:malloc: (7f9ff8edd320) 
bytes=0x22
  7f9ff8edd320 malloc (/lib/x86_64-linux-gnu/libc-2.22.so)
  7f9ff8e8eab0 set_binding_values.part.0 
(/lib/x86_64-linux-gnu/libc-2.22.so)
  7f9ff8e8eda1 __bindtextdomain 
(/lib/x86_64-linux-gnu/libc-2.22.so)
401b22 main (/usr/bin/seq)
  7f9ff8e82610 __libc_start_main 
(/lib/x86_64-linux-gnu/libc-2.22.so)
402799 _start (/usr/bin/seq)
  ...

  We can also use the scripting facilities.  We create a skeleton perl
  script that simply prints out the events

  $ perf script -g perl
  generated Perl script: perf-script.pl

  We can then use this script to see the malloc() calls with a
  backtrace.  Prior to this patch, the backtrace was not available to
  the perl scripts.

  $ perf script -s perf-script.pl
  probe_libc::malloc  0 1927993.748254260  14195 seq   
__probe_ip=140325052863264, bytes=34
  [7f9ff8edd320] malloc
  [7f9ff8e8eab0] set_binding_values.part.0
  [7f9ff8e8eda1] __bindtextdomain
  [401b22] main
  [7f9ff8e82610] __libc_start_main
  [402799] _start
  ...
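
  The patch passes each frame to the script as a hash with an "ip" key
  and, when the symbol resolves, a "sym" hash. As a rough C-only sketch
  of that per-frame data and of the "[ip] name" lines shown above, with
  struct frame_sketch and format_frame() being hypothetical names and the
  "[unknown]" fallback for unresolved symbols an assumption of this
  sketch:

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified view of one resolved callchain entry. */
struct frame_sketch {
	uint64_t ip;       /* instruction pointer of the frame */
	const char *sym;   /* symbol name, or NULL when unresolved */
};

/* Format one frame as "[<hex ip>] <symbol>"; returns snprintf()'s result. */
static int format_frame(char *buf, size_t size, const struct frame_sketch *f)
{
	return snprintf(buf, size, "[%" PRIx64 "] %s", f->ip,
			f->sym ? f->sym : "[unknown]");
}
```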

Tested-by: Arnaldo Carvalho de Melo 
Link: http://lkml.kernel.org/r/87mvphzld0@secretsauce.net
Signed-off-by: Dima Kogan 
---
 .../perf/util/scripting-engines/trace-event-perl.c | 114 +++--
 1 file changed, 106 insertions(+), 8 deletions(-)

diff --git a/tools/perf/util/scripting-engines/trace-event-perl.c 
b/tools/perf/util/scripting-engines/trace-event-perl.c
index b3aabc0d4eb0..1d160855cda9 100644
--- a/tools/perf/util/scripting-engines/trace-event-perl.c
+++ b/tools/perf/util/scripting-engines/trace-event-perl.c
@@ -31,6 +31,8 @@
 #include 
 
 #include "../../perf.h"
+#include "../callchain.h"
+#include "../machine.h"
 #include "../thread.h"
 #include "../event.h"
 #include "../trace-event.h"
@@ -248,10 +250,78 @@ static void define_event_symbols(struct event_format 
*event,
define_event_symbols(event, ev_name, args->next);
 }
 
+static SV *perl_process_callchain(struct perf_sample *sample,
+ struct perf_evsel *evsel,
+ struct addr_location *al)
+{
+   AV *list;
+
+   list = newAV();
+   if (!list)
+   goto exit;
+
+   if (!symbol_conf.use_callchain || !sample->callchain)
+   goto exit;
+
+   if (thread__resolve_callchain(al->thread, evsel,
+ sample, NULL, NULL,
+ PERF_MAX_STACK_DEPTH) != 0) {
+   pr_err("Failed to resolve callchain. Skipping\n");
+   goto exit;
+   }
+   callchain_cursor_commit(&callchain_cursor);
+
+
+   while (1) {
+   HV *elem;
+   struct callchain_cursor_node *node;
+   node = callchain_cursor_current(&callchain_cursor);
+   if (!node)
+   break;
+
+   elem = newHV();
+   if (!elem)
+   goto exit;
+
+   hv_stores(elem, "ip", newSVuv(node->ip));
+
+   if (node->sym) {
+   HV *sym = newHV();
+   if (!sym)
+   goto exit;
+   hv_stores(sym, "start",   newSVuv(node->sym->start));
+   hv_stores(sym, "end", newSVuv(node->sym->end));
+   hv_stores(sym, "binding", newSVuv(node->sym->binding));
+   hv_stores(sym, "name",newSVpvn(node->sym->name,
+  node->sym->namelen));
+   hv_stores(elem, "sym",newRV_noinc((SV*)sym));
+   }
+
+   if (node->map) {
+   struct map *map = node->map;
+   const char *dsoname = "[unknown]";
+   if (map && map->dso && (map->dso->name || 

[PATCH 07/11] perf config: Rename 'v' to 'home' in set_buildid_dir()

2016-03-29 Thread Arnaldo Carvalho de Melo
From: Taeung Song 

Change the variable name 'v' to 'home' to make it more readable.

Signed-off-by: Taeung Song 
Acked-by: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Peter Zijlstra 
Link: 
http://lkml.kernel.org/r/1459099340-16911-3-git-send-email-treeze.tae...@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/util/config.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/config.c b/tools/perf/util/config.c
index 2dd78f4c97a0..5c20d783423b 100644
--- a/tools/perf/util/config.c
+++ b/tools/perf/util/config.c
@@ -540,10 +540,11 @@ void set_buildid_dir(const char *dir)
 
/* default to $HOME/.debug */
if (buildid_dir[0] == '\0') {
-   char *v = getenv("HOME");
-   if (v) {
+   char *home = getenv("HOME");
+
+   if (home) {
snprintf(buildid_dir, MAXPATHLEN-1, "%s/%s",
-v, DEBUG_CACHE_DIR);
+home, DEBUG_CACHE_DIR);
} else {
strncpy(buildid_dir, DEBUG_CACHE_DIR, MAXPATHLEN-1);
}
-- 
2.5.5



[PATCH 08/11] perf script perl: Perl scripts now get a backtrace, like the python ones

2016-03-29 Thread Arnaldo Carvalho de Melo
From: Dima Kogan 

We have some infrastructure to use perl or python to analyze logs
generated by perf.  Prior to this patch, only the python tools had
access to backtrace information.  This patch makes this information
available to perl scripts as well.  Example:

  Let's look at malloc() calls made by the seq utility.  First we
  create a probe point:

  $ perf probe -x /lib/x86_64-linux-gnu/libc.so.6 malloc
  Added new events:
  ...

  Now we run seq, while monitoring malloc() calls with perf

  $ perf record --call-graph=dwarf -e probe_libc:malloc seq 5
  1
  2
  3
  4
  5
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.064 MB perf.data (6 samples) ]

  We can use perf to look at its log to see the malloc calls and the backtrace

  $ perf script
  seq 14195 [000] 1927993.748254: probe_libc:malloc: (7f9ff8edd320) 
bytes=0x22
  7f9ff8edd320 malloc (/lib/x86_64-linux-gnu/libc-2.22.so)
  7f9ff8e8eab0 set_binding_values.part.0 
(/lib/x86_64-linux-gnu/libc-2.22.so)
  7f9ff8e8eda1 __bindtextdomain 
(/lib/x86_64-linux-gnu/libc-2.22.so)
401b22 main (/usr/bin/seq)
  7f9ff8e82610 __libc_start_main 
(/lib/x86_64-linux-gnu/libc-2.22.so)
402799 _start (/usr/bin/seq)
  ...

  We can also use the scripting facilities.  We create a skeleton perl
  script that simply prints out the events

  $ perf script -g perl
  generated Perl script: perf-script.pl

  We can then use this script to see the malloc() calls with a
  backtrace.  Prior to this patch, the backtrace was not available to
  the perl scripts.

  $ perf script -s perf-script.pl
  probe_libc::malloc  0 1927993.748254260  14195 seq   __probe_ip=140325052863264, bytes=34
  [7f9ff8edd320] malloc
  [7f9ff8e8eab0] set_binding_values.part.0
  [7f9ff8e8eda1] __bindtextdomain
  [401b22] main
  [7f9ff8e82610] __libc_start_main
  [402799] _start
  ...

Tested-by: Arnaldo Carvalho de Melo 
Link: http://lkml.kernel.org/r/87mvphzld0@secretsauce.net
Signed-off-by: Dima Kogan 
---
 .../perf/util/scripting-engines/trace-event-perl.c | 114 +++--
 1 file changed, 106 insertions(+), 8 deletions(-)

diff --git a/tools/perf/util/scripting-engines/trace-event-perl.c b/tools/perf/util/scripting-engines/trace-event-perl.c
index b3aabc0d4eb0..1d160855cda9 100644
--- a/tools/perf/util/scripting-engines/trace-event-perl.c
+++ b/tools/perf/util/scripting-engines/trace-event-perl.c
@@ -31,6 +31,8 @@
 #include 
 
 #include "../../perf.h"
+#include "../callchain.h"
+#include "../machine.h"
 #include "../thread.h"
 #include "../event.h"
 #include "../trace-event.h"
@@ -248,10 +250,78 @@ static void define_event_symbols(struct event_format *event,
define_event_symbols(event, ev_name, args->next);
 }
 
+static SV *perl_process_callchain(struct perf_sample *sample,
+ struct perf_evsel *evsel,
+ struct addr_location *al)
+{
+   AV *list;
+
+   list = newAV();
+   if (!list)
+   goto exit;
+
+   if (!symbol_conf.use_callchain || !sample->callchain)
+   goto exit;
+
+   if (thread__resolve_callchain(al->thread, evsel,
+ sample, NULL, NULL,
+ PERF_MAX_STACK_DEPTH) != 0) {
+   pr_err("Failed to resolve callchain. Skipping\n");
+   goto exit;
+   }
+   callchain_cursor_commit(&callchain_cursor);
+
+
+   while (1) {
+   HV *elem;
+   struct callchain_cursor_node *node;
+   node = callchain_cursor_current(&callchain_cursor);
+   if (!node)
+   break;
+
+   elem = newHV();
+   if (!elem)
+   goto exit;
+
+   hv_stores(elem, "ip", newSVuv(node->ip));
+
+   if (node->sym) {
+   HV *sym = newHV();
+   if (!sym)
+   goto exit;
+   hv_stores(sym, "start",   newSVuv(node->sym->start));
+   hv_stores(sym, "end", newSVuv(node->sym->end));
+   hv_stores(sym, "binding", newSVuv(node->sym->binding));
+   hv_stores(sym, "name",newSVpvn(node->sym->name,
+  node->sym->namelen));
+   hv_stores(elem, "sym",newRV_noinc((SV*)sym));
+   }
+
+   if (node->map) {
+   struct map *map = node->map;
+   const char *dsoname = "[unknown]";
+   if (map && map->dso && (map->dso->name || map->dso->long_name)) {
+   if 

Re: [RFC][PATCH] mm/slub: Skip CPU slab activation when debugging

2016-03-29 Thread Laura Abbott

On 03/28/2016 06:52 PM, Laura Abbott wrote:

On 03/28/2016 03:53 PM, Laura Abbott wrote:

The per-cpu slab is designed to be the primary path for allocation in SLUB
since it is assumed allocations will go through the fast path if possible.
When debugging is enabled, the fast path is disabled and per-cpu
allocations are not used. The current debugging code path still activates
the cpu slab for allocations and then immediately deactivates it. This
is useless work. When a slab is enabled for debugging, skip cpu
activation.

Signed-off-by: Laura Abbott 
---
This is a follow-on to the optimization of the debug paths for poisoning.
With this I get a ~2 second drop on hackbench -g 20 -l 1000 with slub_debug=P
and no noticeable change with slub_debug=- .


The zero-day robot pointed out that this triggers one of the BUG_ONs on bootup.
I'll take a deeper look tomorrow unless the approach is actually worthless.

---
  mm/slub.c | 82 +++
  1 file changed, 77 insertions(+), 5 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 7277413..4507bd8 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1482,8 +1482,8 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
  }

  page->freelist = fixup_red_left(s, start);
-page->inuse = page->objects;
-page->frozen = 1;
+page->inuse = kmem_cache_debug(s) ? 1 : page->objects;
+page->frozen = kmem_cache_debug(s) ? 0 : 1;

  out:
  if (gfpflags_allow_blocking(flags))
@@ -1658,6 +1658,64 @@ static inline void *acquire_slab(struct kmem_cache *s,
  return freelist;
  }

+
+static inline void *acquire_slab_debug(struct kmem_cache *s,
+struct kmem_cache_node *n, struct page *page,
+int mode, int *objects)
+{
+void *freelist;
+unsigned long counters;
+struct page new;
+void *next;
+
+lockdep_assert_held(&n->list_lock);
+
+
+/*
+ * Zap the freelist and set the frozen bit.
+ * The old freelist is the list of objects for the
+ * per cpu allocation list.
+ */
+freelist = page->freelist;
+counters = page->counters;
+
+BUG_ON(!freelist);
+
+next = get_freepointer_safe(s, freelist);
+
+new.counters = counters;
+*objects = new.objects - new.inuse;
+if (mode) {
+new.inuse++;
+new.freelist = next;
+} else {
+BUG();
+}
+
+VM_BUG_ON(new.frozen);
+
+if (!new.freelist) {
+remove_partial(n, page);
+add_full(s, n, page);
+}
+
+if (!__cmpxchg_double_slab(s, page,
+freelist, counters,
+new.freelist, new.counters,
+"acquire_slab")) {
+if (!new.freelist) {
+remove_full(s, n, page);
+add_partial(n, page, DEACTIVATE_TO_HEAD);
+}
+return NULL;
+}
+
+WARN_ON(!freelist);
+return freelist;
+}
+
+
+
  static void put_cpu_partial(struct kmem_cache *s, struct page *page, int drain);
  static inline bool pfmemalloc_match(struct page *page, gfp_t gfpflags);

@@ -1688,7 +1746,11 @@ static void *get_partial_node(struct kmem_cache *s, struct kmem_cache_node *n,
  if (!pfmemalloc_match(page, flags))
  continue;

-t = acquire_slab(s, n, page, object == NULL, &objects);
+if (kmem_cache_debug(s))
+t = acquire_slab_debug(s, n, page, object == NULL, &objects);
+else
+t = acquire_slab(s, n, page, object == NULL, &objects);
+
  if (!t)
  break;

@@ -2284,7 +2346,17 @@ static inline void *new_slab_objects(struct kmem_cache *s, gfp_t flags,
   * muck around with it freely without cmpxchg
   */
  freelist = page->freelist;
-page->freelist = NULL;
+page->freelist = kmem_cache_debug(s) ?
+get_freepointer(s, freelist) : NULL;
+
+if (kmem_cache_debug(s)) {
+struct kmem_cache_node *n;
+
+n = get_node(s, page_to_nid(page));
+spin_lock(&n->list_lock);
+add_partial(n, page, DEACTIVATE_TO_HEAD);
+spin_unlock(&n->list_lock);
+}


This needs to account for slabs full after one object, otherwise it bugs out on
the partial list.



  stat(s, ALLOC_SLAB);
  c->page = page;
@@ -2446,7 +2518,7 @@ new_slab:
  !alloc_debug_processing(s, page, freelist, addr))
  goto new_slab;/* Slab failed checks. Next slab needed */

-deactivate_slab(s, page, get_freepointer(s, freelist));
+/* No need to deactivate, no cpu slab */
  c->page = NULL;
  c->freelist = NULL;
  return freelist;
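
The inline note about slabs that are full after one object can be illustrated with a toy user-space model. This is only a sketch under stated assumptions: struct toy_slab and toy_take_one() are hypothetical and are not the kernel's struct page or any SLUB function. It shows why the full/partial decision has to be made at the moment the last free object is taken.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a slab; NOT the kernel's struct page. */
struct toy_slab {
	void **freelist;	/* first free object; each free object stores the next */
	unsigned int inuse;	/* objects handed out so far */
	unsigned int objects;	/* total objects in the slab */
	int on_full_list;	/* 1 once no free object remains */
};

/* Take one object off the freelist, as the debug path does per allocation. */
static void *toy_take_one(struct toy_slab *slab)
{
	void **object;

	assert(slab != NULL);

	object = slab->freelist;
	if (!object)
		return NULL;

	slab->freelist = (void **)*object;	/* next free object, or NULL */
	slab->inuse++;

	/* A slab holding a single object becomes full on its first allocation. */
	if (!slab->freelist)
		slab->on_full_list = 1;

	return object;
}
```

In this model a one-object slab never belongs on a partial list after its first allocation, which is the case the note says the patch has to handle.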







[PATCH 11/11] perf script: Add support for printing assembler

2016-03-29 Thread Arnaldo Carvalho de Melo
From: Andi Kleen 

When dumping PT traces with perf script it is very useful to see the
assembler for each sample, so that it is easily possible to follow the
control flow.

As using objdump is difficult and inefficient from perf script this
patch uses the udis86 library to implement assembler output.  The
library can be downloaded from http://udis86.sourceforge.net/

The library is probed as an external dependency in the usual way. Then
'perf script' calls into it when needed, and handles callbacks to
resolve symbols.

  % perf record -e intel_pt//u true
  % perf script -F sym,symoff,ip,asm --itrace=i0ns | head
     7fc7188b4190 _start+0x0       mov %rsp, %rdi
     7fc7188b4193 _start+0x3       call _dl_start
     7fc7188b7710 _dl_start+0x0    push %rbp
     7fc7188b7711 _dl_start+0x1    mov %rsp, %rbp
     7fc7188b7714 _dl_start+0x4    push %r15
     7fc7188b7716 _dl_start+0x6    push %r14
     7fc7188b7718 _dl_start+0x8    push %r13
     7fc7188b771a _dl_start+0xa    push %r12
     7fc7188b771c _dl_start+0xc    mov %rdi, %r12
     7fc7188b771f _dl_start+0xf    push %rbx

Current issues:

- Some jump references do not get resolved to symbols.
- udis86 release does not support STAC/CLAC, which are used in the kernel,
  but there is a pending patch for it.

v2: Fix address resolution. Port to latest acme/perf/core
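
The diff below rejects the asm field unless ip is also selected. A minimal sketch of that dependency check on a field bitmask follows. The EXAMPLE_ names and check_fields() are hypothetical; bit 20 for asm matches the patch, while the bit used for ip here is an assumption made only for illustration.

```c
#include <errno.h>
#include <stdio.h>

/* Illustrative subset of an output-field bitmask. */
enum example_output_field {
	EXAMPLE_OUTPUT_IP  = 1U << 7,	/* bit position assumed */
	EXAMPLE_OUTPUT_ASM = 1U << 20,	/* same bit the patch assigns to asm */
};

/* Return 0 if the selection is consistent, -EINVAL if asm lacks ip. */
static int check_fields(unsigned int fields)
{
	if ((fields & EXAMPLE_OUTPUT_ASM) && !(fields & EXAMPLE_OUTPUT_IP)) {
		fprintf(stderr,
			"Display of assembler requested but sample IP is not selected.\n");
		return -EINVAL;
	}
	return 0;
}
```

The check exists because disassembly needs the sample address: without ip there is nothing to decode.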

Committer note:

To test intel_pt one needs to make sure VT-x isn't active, i.e.
stopping KVM guests on the test machine, as described by Andi Kleen at
http://lkml.kernel.org/r/20160301234953.gd23...@tassilo.jf.intel.com

Signed-off-by: Andi Kleen 
Tested-by: Arnaldo Carvalho de Melo 
Cc: Adrian Hunter 
Cc: Jiri Olsa 
Cc: Stephane Eranian 
Link: http://lkml.kernel.org/r/1459187142-20035-3-git-send-email-a...@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/Documentation/perf-script.txt |   4 +-
 tools/perf/builtin-script.c  | 107 +--
 2 files changed, 105 insertions(+), 6 deletions(-)

diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt
index 22ef3933342a..f2b81d837799 100644
--- a/tools/perf/Documentation/perf-script.txt
+++ b/tools/perf/Documentation/perf-script.txt
@@ -116,7 +116,7 @@ OPTIONS
 --fields::
 Comma separated list of fields to print. Options are:
 comm, tid, pid, time, cpu, event, trace, ip, sym, dso, addr, symoff,
-   srcline, period, iregs, brstack, brstacksym, flags.
+   srcline, period, iregs, brstack, brstacksym, flags, asm.
 Field list can be prepended with the type, trace, sw or hw,
 to indicate to which event type the field list applies.
 e.g., -f sw:comm,tid,time,ip,sym  and -f trace:time,cpu,trace
@@ -185,6 +185,8 @@ OPTIONS
 
The brstacksym is identical to brstack, except that the FROM and TO addresses are printed in a symbolic form if possible.
 
+   When asm is specified the assembler instruction of each sample is printed in disassembled form.
+
 -k::
 --vmlinux=::
 vmlinux pathname
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 3770c3dffe5e..323572e72706 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -25,6 +25,10 @@
 #include "asm/bug.h"
 #include "util/mem-events.h"
 
+#ifdef HAVE_UDIS86
+#include <udis86.h>
+#endif
+
 static char const  *script_name;
 static char const  *generate_script_lang;
 static bool debug_mode;
@@ -62,6 +66,7 @@ enum perf_output_field {
PERF_OUTPUT_DATA_SRC= 1U << 17,
PERF_OUTPUT_WEIGHT  = 1U << 18,
PERF_OUTPUT_BPF_OUTPUT  = 1U << 19,
+   PERF_OUTPUT_ASM = 1U << 20,
 };
 
 struct output_option {
@@ -88,6 +93,7 @@ struct output_option {
{.str = "data_src", .field = PERF_OUTPUT_DATA_SRC},
{.str = "weight",   .field = PERF_OUTPUT_WEIGHT},
{.str = "bpf-output",   .field = PERF_OUTPUT_BPF_OUTPUT},
+   {.str = "asm", .field = PERF_OUTPUT_ASM},
 };
 
 /* default set to maintain compatibility with current format */
@@ -282,7 +288,11 @@ static int perf_evsel__check_attr(struct perf_evsel *evsel,
   "selected. Hence, no address to lookup the source line number.\n");
return -EINVAL;
}
-
+   if (PRINT_FIELD(ASM) && !PRINT_FIELD(IP)) {
+   pr_err("Display of assembler requested but sample IP is not\n"
+  "selected.\n");
+   return -EINVAL;
+   }
if ((PRINT_FIELD(PID) || PRINT_FIELD(TID)) &&
perf_evsel__check_stype(evsel, PERF_SAMPLE_TID, "TID",
PERF_OUTPUT_TID|PERF_OUTPUT_PID))
@@ -421,6 +431,88 @@ static void print_sample_iregs(struct perf_sample *sample,
}
 }
 
+#ifdef HAVE_UDIS86
+
+struct perf_ud {
+   ud_t ud_obj;
+   struct thread *thread;
+   u8 cpumode;
+   int cpu;
+};
+
+static const char *dis_resolve(struct ud *u, uint64_t addr, int64_t 

[PATCH 10/11] perf tools: Add probing for udev86 library

2016-03-29 Thread Arnaldo Carvalho de Melo
From: Andi Kleen 

Add autoprobing for the udis86 disassembler library.

Signed-off-by: Andi Kleen 
Acked-by: Jiri Olsa 
Cc: Stephane Eranian 
Link: http://lkml.kernel.org/r/1459187142-20035-2-git-send-email-a...@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/build/Makefile.feature  | 6 --
 tools/build/feature/Makefile  | 8 ++--
 tools/build/feature/test-all.c| 5 +
 tools/build/feature/test-udis86.c | 8 
 tools/perf/config/Makefile| 5 +
 5 files changed, 28 insertions(+), 4 deletions(-)
 create mode 100644 tools/build/feature/test-udis86.c

diff --git a/tools/build/Makefile.feature b/tools/build/Makefile.feature
index 6b7707270aa3..db4f426cae09 100644
--- a/tools/build/Makefile.feature
+++ b/tools/build/Makefile.feature
@@ -55,7 +55,8 @@ FEATURE_TESTS_BASIC :=\
zlib\
lzma\
get_cpuid   \
-   bpf
+   bpf \
+   udis86
 
 # FEATURE_TESTS_BASIC + FEATURE_TESTS_EXTRA is the complete list
 # of all feature tests
@@ -94,7 +95,8 @@ FEATURE_DISPLAY ?=\
zlib\
lzma\
get_cpuid   \
-   bpf
+   bpf \
+   udis86
 
 # Set FEATURE_CHECK_(C|LD)FLAGS-all for all FEATURE_TESTS features.
 # If in the future we need per-feature checks/flags for features not
diff --git a/tools/build/feature/Makefile b/tools/build/feature/Makefile
index c5f4c417428d..d05c312f25c0 100644
--- a/tools/build/feature/Makefile
+++ b/tools/build/feature/Makefile
@@ -36,7 +36,8 @@ FILES=\
test-zlib.bin   \
test-lzma.bin   \
test-bpf.bin\
-   test-get_cpuid.bin
+   test-get_cpuid.bin  \
+   test-udis86.bin
 
 FILES := $(addprefix $(OUTPUT),$(FILES))
 
@@ -51,7 +52,7 @@ __BUILD = $(CC) $(CFLAGS) -Wall -Werror -o $@ $(patsubst 
%.bin,%.c,$(@F)) $(LDFL
 ###
 
 $(OUTPUT)test-all.bin:
-   $(BUILD) -fstack-protector-all -O2 -D_FORTIFY_SOURCE=2 -ldw -lelf -lnuma -lelf -laudit -I/usr/include/slang -lslang $(shell $(PKG_CONFIG) --libs --cflags gtk+-2.0 2>/dev/null) $(FLAGS_PERL_EMBED) $(FLAGS_PYTHON_EMBED) -DPACKAGE='"perf"' -lbfd -ldl -lz -llzma
+   $(BUILD) -fstack-protector-all -O2 -D_FORTIFY_SOURCE=2 -ldw -lelf -lnuma -lelf -laudit -I/usr/include/slang -lslang $(shell $(PKG_CONFIG) --libs --cflags gtk+-2.0 2>/dev/null) $(FLAGS_PERL_EMBED) $(FLAGS_PYTHON_EMBED) -DPACKAGE='"perf"' -lbfd -ldl -lz -llzma -ludis86
 
 $(OUTPUT)test-hello.bin:
$(BUILD)
@@ -97,6 +98,9 @@ $(OUTPUT)test-numa_num_possible_cpus.bin:
 $(OUTPUT)test-libunwind.bin:
$(BUILD) -lelf
 
+$(OUTPUT)test-udis86.bin:
+   $(BUILD) -ludis86
+
 $(OUTPUT)test-libunwind-debug-frame.bin:
$(BUILD) -lelf
 
diff --git a/tools/build/feature/test-all.c b/tools/build/feature/test-all.c
index e499a36c1e4a..76b0de3d145a 100644
--- a/tools/build/feature/test-all.c
+++ b/tools/build/feature/test-all.c
@@ -133,6 +133,10 @@
 # include "test-libcrypto.c"
 #undef main
 
+#define main main_test_udis86
+# include "test-udis86.c"
+#undef main
+
 int main(int argc, char *argv[])
 {
main_test_libpython();
@@ -163,6 +167,7 @@ int main(int argc, char *argv[])
main_test_get_cpuid();
main_test_bpf();
main_test_libcrypto();
+   main_test_udis86();
 
return 0;
 }
diff --git a/tools/build/feature/test-udis86.c b/tools/build/feature/test-udis86.c
new file mode 100644
index 000000000000..623c545f4bad
--- /dev/null
+++ b/tools/build/feature/test-udis86.c
@@ -0,0 +1,8 @@
+#include <udis86.h>
+
+int main(void)
+{
+   ud_t ud;
+   ud_init(&ud);
+   return 0;
+}
diff --git a/tools/perf/config/Makefile b/tools/perf/config/Makefile
index f7d7f5a1cad5..399ada8e7a47 100644
--- a/tools/perf/config/Makefile
+++ b/tools/perf/config/Makefile
@@ -587,6 +587,11 @@ ifneq ($(filter -lbfd,$(EXTLIBS)),)
   CFLAGS += -DHAVE_LIBBFD_SUPPORT
 endif
 
+ifeq ($(feature-udis86), 1)
+  CFLAGS += -DHAVE_UDIS86
+  EXTLIBS += -ludis86
+endif
+
 ifndef NO_ZLIB
   ifeq ($(feature-zlib), 1)
 CFLAGS += -DHAVE_ZLIB_SUPPORT
-- 
2.5.5
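
The probe above ends up defining HAVE_UDIS86 when the library is found. A hedged sketch of how such a define can gate optional code follows; example_disasm() is a hypothetical helper and is not taken from the perf sources. It decodes the first instruction of a buffer when the library is available and falls back to a stub otherwise.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#ifdef HAVE_UDIS86
#include <udis86.h>

/* Decode the first instruction in code[] as 64-bit AT&T syntax. */
static int example_disasm(const uint8_t *code, size_t len,
			  char *out, size_t outlen)
{
	ud_t ud;

	ud_init(&ud);
	ud_set_mode(&ud, 64);
	ud_set_syntax(&ud, UD_SYN_ATT);
	/* the cast keeps this compiling whether or not the prototype is const */
	ud_set_input_buffer(&ud, (uint8_t *)code, len);
	if (!ud_disassemble(&ud))
		return -1;
	snprintf(out, outlen, "%s", ud_insn_asm(&ud));
	return 0;
}
#else
/* Stub used when the feature probe did not find the library. */
static int example_disasm(const uint8_t *code, size_t len,
			  char *out, size_t outlen)
{
	(void)code;
	(void)len;
	snprintf(out, outlen, "(disassembly unavailable)");
	return -1;
}
#endif
```

Keeping both variants behind one signature lets callers stay free of #ifdef blocks; only the helper knows whether the library was detected.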



[PATCH 09/11] perf tools: Add support for skipping itrace instructions

2016-03-29 Thread Arnaldo Carvalho de Melo
From: Andi Kleen 

When using 'perf script' to look at PT traces it is often useful to
ignore the initialization code at the beginning.

On larger traces which may have many millions of instructions in
initialization code doing that in a pipeline can be very slow, with perf
script spending a lot of CPU time calling printf and writing data.

This patch adds an extension to the --itrace argument that skips 'n'
events (instructions, branches or transactions) at the beginning. This
is much more efficient.
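
The skip logic described above can be sketched in two steps: parse the count that follows 's', then drop that many events before emitting anything. The names below (struct example_skip_opts, parse_initial_skip(), should_emit()) are hypothetical and only illustrate the technique; they are not the functions added by this patch.

```c
#include <assert.h>
#include <stdlib.h>

struct example_skip_opts {
	unsigned long initial_skip;	/* events to drop at the beginning */
};

/*
 * Parse a decimal count at *pp and advance *pp past it.
 * Returns 0 on success, -1 if no digits were found.
 */
static int parse_initial_skip(const char **pp, struct example_skip_opts *opts)
{
	char *endptr;

	assert(pp != NULL && *pp != NULL && opts != NULL);

	opts->initial_skip = strtoul(*pp, &endptr, 10);
	if (endptr == *pp)
		return -1;
	*pp = endptr;
	return 0;
}

/* Count one event; return 1 once the initial skip has been consumed. */
static int should_emit(const struct example_skip_opts *opts,
		       unsigned long *num_events)
{
	return (*num_events)++ >= opts->initial_skip;
}
```

Dropping events in the decoder avoids formatting and writing millions of lines that a downstream filter would discard anyway, which is the efficiency gain the commit message refers to.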

v2:
Add support for BTS (Adrian Hunter)
Document in itrace.txt
Fix branch check
Check transactions and instructions too

Committer note:

To test intel_pt one needs to make sure VT-x isn't active, i.e.
stopping KVM guests on the test machine, as described by Andi Kleen
at http://lkml.kernel.org/r/20160301234953.gd23...@tassilo.jf.intel.com

Signed-off-by: Andi Kleen 
Tested-by: Arnaldo Carvalho de Melo 
Cc: Adrian Hunter 
Cc: Jiri Olsa 
Cc: Stephane Eranian 
Link: http://lkml.kernel.org/r/1459187142-20035-1-git-send-email-a...@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/Documentation/intel-pt.txt |  7 +++
 tools/perf/Documentation/itrace.txt   |  8 
 tools/perf/util/auxtrace.c|  7 +++
 tools/perf/util/auxtrace.h|  2 ++
 tools/perf/util/intel-bts.c   |  5 +
 tools/perf/util/intel-pt.c| 22 --
 6 files changed, 49 insertions(+), 2 deletions(-)

diff --git a/tools/perf/Documentation/intel-pt.txt b/tools/perf/Documentation/intel-pt.txt
index be764f9ec769..c6c8318e38a2 100644
--- a/tools/perf/Documentation/intel-pt.txt
+++ b/tools/perf/Documentation/intel-pt.txt
@@ -672,6 +672,7 @@ The letters are:
d   create a debug log
g   synthesize a call chain (use with i or x)
l   synthesize last branch entries (use with i or x)
+   s   skip initial number of events
 
 "Instructions" events look like they were recorded by "perf record -e
 instructions".
@@ -730,6 +731,12 @@ from one sample to the next.
 
 To disable trace decoding entirely, use the option --no-itrace.
 
+It is also possible to skip events generated (instructions, branches, transactions)
+at the beginning. This is useful to ignore initialization code.
+
+   --itrace=i0nss1000000
+
+skips the first million instructions.
 
 dump option
 ---
diff --git a/tools/perf/Documentation/itrace.txt b/tools/perf/Documentation/itrace.txt
index 65453f4c7006..e2a4c5e0dbe5 100644
--- a/tools/perf/Documentation/itrace.txt
+++ b/tools/perf/Documentation/itrace.txt
@@ -7,6 +7,7 @@
d   create a debug log
g   synthesize a call chain (use with i or x)
l   synthesize last branch entries (use with i or x)
+   s   skip initial number of events
 
The default is all events i.e. the same as --itrace=ibxe
 
@@ -24,3 +25,10 @@
 
Also the number of last branch entries (default 64, max. 1024) for
instructions or transactions events can be specified.
+
+   It is also possible to skip events generated (instructions, branches, transactions)
+   at the beginning. This is useful to ignore initialization code.
+
+   --itrace=i0nss1000000
+
+   skips the first million instructions.
diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
index ec164fe70718..c9169011e55e 100644
--- a/tools/perf/util/auxtrace.c
+++ b/tools/perf/util/auxtrace.c
@@ -940,6 +940,7 @@ void itrace_synth_opts__set_default(struct itrace_synth_opts *synth_opts)
synth_opts->period = PERF_ITRACE_DEFAULT_PERIOD;
synth_opts->callchain_sz = PERF_ITRACE_DEFAULT_CALLCHAIN_SZ;
synth_opts->last_branch_sz = PERF_ITRACE_DEFAULT_LAST_BRANCH_SZ;
+   synth_opts->initial_skip = 0;
 }
 
 /*
@@ -1064,6 +1065,12 @@ int itrace_parse_synth_opts(const struct option *opt, const char *str,
synth_opts->last_branch_sz = val;
}
break;
+   case 's':
+   synth_opts->initial_skip = strtoul(p, &endptr, 10);
+   if (p == endptr)
+   goto out_err;
+   p = endptr;
+   break;
case ' ':
case ',':
break;
diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
index 57ff31ecb8e4..767989e0e312 100644
--- a/tools/perf/util/auxtrace.h
+++ b/tools/perf/util/auxtrace.h
@@ -68,6 +68,7 @@ enum itrace_period_type {
  * @last_branch_sz: branch context size
  * @period: 'instructions' events period
  * @period_type: 'instructions' events period type
+ * @initial_skip: skip N events at the beginning.
  */
 struct itrace_synth_opts {
 	bool			set;
@@ -86,6 +87,7 @@ struct itrace_synth_opts {
 	unsigned int		last_branch_sz;
unsigned long long 
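
The 's' case in itrace_parse_synth_opts() relies on the usual strtoul() idiom: if no digits were consumed, endptr still equals the start pointer and the option is rejected. A minimal standalone sketch of that check, assuming a hypothetical helper parse_initial_skip() that is not part of perf:

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not part of perf): parse the decimal count that
 * follows an 's' in an --itrace option string.  On success, store the
 * count in *skip, advance *pp past the digits and return 0.  Return -1
 * when no digits follow, mirroring the "p == endptr" check in the patch.
 */
int parse_initial_skip(const char **pp, unsigned long *skip)
{
	char *endptr;
	unsigned long val;

	assert(pp && *pp && skip);

	val = strtoul(*pp, &endptr, 10);
	if (endptr == *pp)
		return -1;	/* nothing was converted */

	*skip = val;
	*pp = endptr;		/* continue parsing after the number */
	return 0;
}
```

With the documented example, the text after the 's' is "1000000", so the helper would report a skip count of one million and leave the pointer at the end of the string.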

[PATCH 10/11] perf tools: Add probing for udis86 library

2016-03-29 Thread Arnaldo Carvalho de Melo
From: Andi Kleen 

Add autoprobing for the udis86 disassembler library.

Signed-off-by: Andi Kleen 
Acked-by: Jiri Olsa 
Cc: Stephane Eranian 
Link: 
http://lkml.kernel.org/r/1459187142-20035-2-git-send-email-a...@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/build/Makefile.feature      | 6 ++++--
 tools/build/feature/Makefile      | 8 ++++++--
 tools/build/feature/test-all.c    | 5 +++++
 tools/build/feature/test-udis86.c | 8 ++++++++
 tools/perf/config/Makefile        | 5 +++++
 5 files changed, 28 insertions(+), 4 deletions(-)
 create mode 100644 tools/build/feature/test-udis86.c

diff --git a/tools/build/Makefile.feature b/tools/build/Makefile.feature
index 6b7707270aa3..db4f426cae09 100644
--- a/tools/build/Makefile.feature
+++ b/tools/build/Makefile.feature
@@ -55,7 +55,8 @@ FEATURE_TESTS_BASIC :=\
zlib\
lzma\
get_cpuid   \
-   bpf
+   bpf \
+   udis86
 
 # FEATURE_TESTS_BASIC + FEATURE_TESTS_EXTRA is the complete list
 # of all feature tests
@@ -94,7 +95,8 @@ FEATURE_DISPLAY ?=\
zlib\
lzma\
get_cpuid   \
-   bpf
+   bpf \
+   udis86
 
 # Set FEATURE_CHECK_(C|LD)FLAGS-all for all FEATURE_TESTS features.
 # If in the future we need per-feature checks/flags for features not
diff --git a/tools/build/feature/Makefile b/tools/build/feature/Makefile
index c5f4c417428d..d05c312f25c0 100644
--- a/tools/build/feature/Makefile
+++ b/tools/build/feature/Makefile
@@ -36,7 +36,8 @@ FILES=\
test-zlib.bin   \
test-lzma.bin   \
test-bpf.bin\
-   test-get_cpuid.bin
+   test-get_cpuid.bin  \
+   test-udis86.bin
 
 FILES := $(addprefix $(OUTPUT),$(FILES))
 
@@ -51,7 +52,7 @@ __BUILD = $(CC) $(CFLAGS) -Wall -Werror -o $@ $(patsubst %.bin,%.c,$(@F)) $(LDFL
 ###
 
 $(OUTPUT)test-all.bin:
-   $(BUILD) -fstack-protector-all -O2 -D_FORTIFY_SOURCE=2 -ldw -lelf -lnuma -lelf -laudit -I/usr/include/slang -lslang $(shell $(PKG_CONFIG) --libs --cflags gtk+-2.0 2>/dev/null) $(FLAGS_PERL_EMBED) $(FLAGS_PYTHON_EMBED) -DPACKAGE='"perf"' -lbfd -ldl -lz -llzma
+   $(BUILD) -fstack-protector-all -O2 -D_FORTIFY_SOURCE=2 -ldw -lelf -lnuma -lelf -laudit -I/usr/include/slang -lslang $(shell $(PKG_CONFIG) --libs --cflags gtk+-2.0 2>/dev/null) $(FLAGS_PERL_EMBED) $(FLAGS_PYTHON_EMBED) -DPACKAGE='"perf"' -lbfd -ldl -lz -llzma -ludis86
 
 $(OUTPUT)test-hello.bin:
$(BUILD)
@@ -97,6 +98,9 @@ $(OUTPUT)test-numa_num_possible_cpus.bin:
 $(OUTPUT)test-libunwind.bin:
$(BUILD) -lelf
 
+$(OUTPUT)test-udis86.bin:
+   $(BUILD) -ludis86
+
 $(OUTPUT)test-libunwind-debug-frame.bin:
$(BUILD) -lelf
 
diff --git a/tools/build/feature/test-all.c b/tools/build/feature/test-all.c
index e499a36c1e4a..76b0de3d145a 100644
--- a/tools/build/feature/test-all.c
+++ b/tools/build/feature/test-all.c
@@ -133,6 +133,10 @@
 # include "test-libcrypto.c"
 #undef main
 
+#define main main_test_udis86
+#  include "test-udis86.c"
+#undef main
+
 int main(int argc, char *argv[])
 {
main_test_libpython();
@@ -163,6 +167,7 @@ int main(int argc, char *argv[])
main_test_get_cpuid();
main_test_bpf();
main_test_libcrypto();
+   main_test_udis86();
 
return 0;
 }
diff --git a/tools/build/feature/test-udis86.c b/tools/build/feature/test-udis86.c
new file mode 100644
index 000000000000..623c545f4bad
--- /dev/null
+++ b/tools/build/feature/test-udis86.c
@@ -0,0 +1,8 @@
+#include <udis86.h>
+
+int main(void)
+{
+   ud_t ud;
+   ud_init(&ud);
+   return 0;
+}
diff --git a/tools/perf/config/Makefile b/tools/perf/config/Makefile
index f7d7f5a1cad5..399ada8e7a47 100644
--- a/tools/perf/config/Makefile
+++ b/tools/perf/config/Makefile
@@ -587,6 +587,11 @@ ifneq ($(filter -lbfd,$(EXTLIBS)),)
   CFLAGS += -DHAVE_LIBBFD_SUPPORT
 endif
 
+ifeq ($(feature-udis86), 1)
+  CFLAGS += -DHAVE_UDIS86
+  EXTLIBS += -ludis86
+endif
+
 ifndef NO_ZLIB
   ifeq ($(feature-zlib), 1)
 CFLAGS += -DHAVE_ZLIB_SUPPORT
-- 
2.5.5
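
The patch only wires up the probe: when test-udis86.c links, the build adds -DHAVE_UDIS86 and -ludis86. How perf code would consume the macro is not shown in this message. A minimal sketch of the usual pattern, assuming a hypothetical function disasm_available() that does not appear in the series:

```c
#ifdef HAVE_UDIS86
#include <udis86.h>
#endif

/*
 * Hypothetical consumer of the feature macro (not taken from the patch
 * series): returns 1 when the build has udis86 support and a
 * disassembler object can be initialized, 0 otherwise.
 */
int disasm_available(void)
{
#ifdef HAVE_UDIS86
	ud_t ud;

	ud_init(&ud);	/* the same call the feature test makes */
	return 1;
#else
	return 0;	/* built without -DHAVE_UDIS86 */
#endif
}
```

Keeping the #ifdef inside one small function means callers need no conditional compilation of their own.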



Re: [PATCH v3] sparc64: Reduce TLB flushes during hugepte changes

2016-03-29 Thread David Miller
From: Nitin Gupta 
Date: Tue, 29 Mar 2016 14:11:14 -0700

> During hugepage map/unmap, TSB and TLB flushes are currently
> issued at every PAGE_SIZE'd boundary which is unnecessary.
> We now issue the flush at REAL_HPAGE_SIZE boundaries only.
> 
> Without this patch workloads which unmap a large hugepage
> backed VMA region get CPU lockups due to excessive TLB
> flush calls.
> 
> Orabug: 22365539, 22643230, 22995196
> 
> Signed-off-by: Nitin Gupta 

You really need to put some more work into this, there are lots of
config variants you didn't even test.

See the kbuild robot replies...
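
For a rough sense of scale in the quoted description, the number of flushes per region is simply the region size divided by the flush step. A small sketch, assuming an 8 KiB PAGE_SIZE and a 4 MiB REAL_HPAGE_SIZE (values commonly associated with sparc64, but not stated in this thread); flush_count() is a hypothetical helper:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: number of flush calls needed to cover a region
 * when one flush is issued per step_bytes boundary (rounded up).
 */
size_t flush_count(size_t region_bytes, size_t step_bytes)
{
	assert(step_bytes > 0);
	return (region_bytes + step_bytes - 1) / step_bytes;
}
```

Under those assumed sizes, unmapping one 8 MiB hugepage would need 1024 flushes at PAGE_SIZE granularity versus 2 at REAL_HPAGE_SIZE granularity.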



Applied "regulator: core: Log when we bring constraints into range" to the regulator tree

2016-03-29 Thread Mark Brown
The patch

   regulator: core: Log when we bring constraints into range

has been applied to the regulator tree at

   git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator.git 

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.  

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark

From 45a91e8f767afbb46bf7251f81d15d121136 Mon Sep 17 00:00:00 2001
From: Mark Brown 
Date: Tue, 29 Mar 2016 16:33:42 -0700
Subject: [PATCH] regulator: core: Log when we bring constraints into range

This aids in debugging problems triggered by the regulator core applying
its constraints; we could potentially crash immediately after updating
the voltage if the constraints are buggy.

Signed-off-by: Mark Brown 
---
 drivers/regulator/core.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
index 881c37e61f75..18dd7ee61455 100644
--- a/drivers/regulator/core.c
+++ b/drivers/regulator/core.c
@@ -935,6 +935,8 @@ static int machine_constraints_voltage(struct regulator_dev *rdev,
}
 
if (target_min != current_uV || target_max != current_uV) {
+   rdev_info(rdev, "Bringing %duV into %d-%duV\n",
+ current_uV, target_min, target_max);
ret = _regulator_do_set_voltage(
rdev, target_min, target_max);
if (ret < 0) {
-- 
2.8.0.rc3
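
The hunk shows only the comparison and the new log call. A user-space sketch of the surrounding logic, assuming the target range is obtained by clamping the current voltage into [min_uV, max_uV] (that step is an assumption here, and bring_into_range() is a hypothetical name):

```c
#include <stdio.h>

/*
 * Hypothetical user-space model of the check in the patch.  Returns 1
 * and fills log when the voltage has to be moved, 0 (with an empty log)
 * when it is already inside the constraints.
 */
int bring_into_range(int current_uV, int min_uV, int max_uV,
		     char *log, size_t len)
{
	int target_min = current_uV;
	int target_max = current_uV;

	if (current_uV < min_uV)
		target_min = target_max = min_uV;	/* too low: move up */
	if (current_uV > max_uV)
		target_min = target_max = max_uV;	/* too high: move down */

	if (target_min != current_uV || target_max != current_uV) {
		/* Same format string as the rdev_info() call in the patch. */
		snprintf(log, len, "Bringing %duV into %d-%duV",
			 current_uV, target_min, target_max);
		return 1;
	}

	if (len)
		log[0] = '\0';
	return 0;
}
```

Logging before the voltage is changed is the point of the patch: if the new setting crashes the board, the message is already in the log.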



Re: [GIT PULL] bcm2835 clk changes for 4.6 maybe

2016-03-29 Thread Stephen Boyd
On 03/17, Eric Anholt wrote:
> This is late, so feel free to drop it, but I figured I'd send it to
> you in case you were still open to merges.  I've pounded on it a bit
> today (modesets to all sorts of resolutions on HDMI, used it for
> testing the DPI panel support that I'm hoping to have for 4.7, and did
> a whole lot of browsing of clk_summary as I debugged DPI), and kbuild
> test robot came back clean, so I'm pretty happy with it.
> 
> The following changes since commit 4d3ac6662452060721599a3392bc2f524af984cb:
> 
>   clk: bcm2835: fix check of error code returned by devm_ioremap_resource() 
> (2016-03-15 18:14:11 -0700)
> 
> are available in the git repository at:
> 
>   g...@github.com:anholt/linux.git tags/bcm2835-clk-next-2016-03-17

Please make sure to use a proper URL here. I don't have access to
g...@github.com, but I can fetch this from
git://github.com/anholt/linux.git. The tag and the contents match
so I'm fairly confident all is well.

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


[PATCH] regulator: core: Log when we bring constraints into range

2016-03-29 Thread Mark Brown
This aids in debugging problems triggered by the regulator core applying
its constraints; we could potentially crash immediately after updating
the voltage if the constraints are buggy.

Signed-off-by: Mark Brown 
---
 drivers/regulator/core.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
index 881c37e61f75..18dd7ee61455 100644
--- a/drivers/regulator/core.c
+++ b/drivers/regulator/core.c
@@ -935,6 +935,8 @@ static int machine_constraints_voltage(struct regulator_dev *rdev,
}
 
if (target_min != current_uV || target_max != current_uV) {
+   rdev_info(rdev, "Bringing %duV into %d-%duV\n",
+ current_uV, target_min, target_max);
ret = _regulator_do_set_voltage(
rdev, target_min, target_max);
if (ret < 0) {
-- 
2.8.0.rc3



Re: 3.14.65: Memory leak when slub_debug is enabled

2016-03-29 Thread Christoph Lameter
On Tue, 29 Mar 2016, Ajay Patel wrote:

> We have a custom board with a Marvell Armada dual-core ARMv7.
> The driver uses buffers from the kmalloc-8192 slab heavily.
> When slub_debug is enabled, the kmalloc-8192 active slabs keep
> increasing. The slub stats show cmpxchg_double_fail and objects_partial
> increasing too. Eventually the system panics on OOM.

Hmmm... I thought we fall back to pass through to the page allocator for
order 1 requests? Why is it going through the regular allocator paths?

> Following patch fixes the issue.

Wonder how that could be? Does the __cmpxchg_double work correctly on ARM?

> Has anybody encountered this issue?
> Is this right fix?

Looks like something is screwing around with the page flags because an
order 1 page is a compound page? Can you ensure that order 1 allocs are
using page allocator fallback. See kmalloc_large().
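
For reference, the "order 1" in the reply follows from the object size: an 8192-byte buffer spans two pages if pages are 4 KiB, which is an assumption here since the report does not state the page size. A small sketch with a hypothetical order_for_size() helper:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: smallest order such that
 * (page_size << order) >= size.
 */
unsigned int order_for_size(size_t size, size_t page_size)
{
	unsigned int order = 0;
	size_t span = page_size;

	assert(size > 0 && page_size > 0);

	while (span < size) {
		span <<= 1;	/* each order doubles the span */
		order++;
	}
	return order;
}
```

With 4 KiB pages, 8192 bytes gives order 1; anything above 8192 bytes would already need order 2.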


Re: [PATCH v4 0/4] SROP Mitigation: Sigreturn Cookies

2016-03-29 Thread Scotty Bauer


On 03/29/2016 05:25 PM, Linus Torvalds wrote:
> On Tue, Mar 29, 2016 at 6:11 PM, Scotty Bauer  wrote:
>>
>> Yeah I had toyed with using hashes, I used hash_64 not md5 which is like 14
>> extra instructions or something.
> 
> That sounds fine. Anything that requires enough code to undo that it
> kind of defeats the purpose of a SROP should be enough. It's not about
> encryption, I'd just think that if you can force the buffer overflow
> while already in a signal handler, you'd want something that is at
> least *slightly* harder to defeat than a single "xor" instruction.
> 
>> It's not hard to implement So I can try it. When you say an extra hardening
>> mode do you mean hide it behind a sysctl or some sort of compile time CONFIG?
> 
> Since there already is a sysctl, I'd just assume that.
> 
> The important part is that the *default* value for that sysctl can't
> break real applications. I don't really count CRIU as a real app, if
> only because once you start doing checkpoint-restore you are going to
> do some amount of system maintenance anyway, so somebody doing CRIU is
> kind of expected to have a certain amount of system expertise, I would
> say.
> 
> But dosemu - or Wine - is very much something that "normal people" run
> - people who we do *not* expect to have to know about new sysctl's
> etc. They already have one (mmap at zero), but that is very directly
> related to what vm86 mode and Wine does, and people have had time to
> learn about it. Let's not add another.
> 
> So testing dosemu and wine would be good. I wonder what else has shown
> issues with signal stack layout changes. Debuggers and some JIT
> engines, I suspect.
> 
>   Linus
> 


Alright, I'll test Wine/Mono, Dosemu, and some random languages/debuggers to
see if anything breaks.

Thanks.
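
To illustrate the "single xor" point from the quoted discussion: with a plain xor cookie, one leaked cookie plus the address it was stored at gives back the secret in one instruction. The sketch below contrasts that with a cheap mixing step. It is not the code from the series: the function names and the multiplier constant are hypothetical stand-ins, and the mixing function is not cryptographic (it is still invertible by someone who knows it, just with more work than one xor).

```c
#include <stdint.h>

/* Illustrative odd 64-bit multiplier; not the kernel's hash_64() constant. */
#define MIX_MULT 0x9E3779B97F4A7C15ULL

/* Plain xor cookie: secret ^ address of the signal frame. */
uint64_t cookie_xor(uint64_t secret, uint64_t frame_addr)
{
	return secret ^ frame_addr;
}

/* Mixed cookie: xor, then a multiply and a shift-xor to spread the bits. */
uint64_t cookie_mixed(uint64_t secret, uint64_t frame_addr)
{
	uint64_t x = secret ^ frame_addr;

	x *= MIX_MULT;
	x ^= x >> 32;
	return x;
}

/* Verification recomputes the cookie and compares it with the stored one. */
int cookie_ok(uint64_t secret, uint64_t frame_addr, uint64_t stored)
{
	return cookie_mixed(secret, frame_addr) == stored;
}
```

Both steps in cookie_mixed() are one-to-one on 64-bit values, so a cookie computed for one frame address never verifies at a different address under the same secret.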




Re: [PATCH v4 3/4] perf config: Prepare all default configs

2016-03-29 Thread Taeung Song



On 03/30/2016 01:12 AM, Arnaldo Carvalho de Melo wrote:
> On Tue, Mar 29, 2016 at 09:43:56AM +0900, Taeung Song wrote:
>> To precisely manage configs,
>> prepare all default perf's configs that contain
>> default section name, variable name, value
>> and correct type, not string type.
>>
>> In the near future, this will be used when
>> checking type of config variable or showing
>> all configs with default values, etc.
>>
>> Acked-by: Namhyung Kim 
>
> Doesn't apply, probably because needs the first patch?

All right, I'm waiting for Namhyung to reconsider the first patch.

Thanks,
Taeung




Cc: Jiri Olsa 
Signed-off-by: Taeung Song 
---
  tools/perf/util/config.c | 110 ++-
  tools/perf/util/config.h |  62 +-
  2 files changed, 168 insertions(+), 4 deletions(-)

diff --git a/tools/perf/util/config.c b/tools/perf/util/config.c
index 2dbf47c..df9f0dd 100644
--- a/tools/perf/util/config.c
+++ b/tools/perf/util/config.c
@@ -15,6 +15,7 @@
  #include "util/llvm-utils.h"   /* perf_llvm_config */
  #include "config.h"

+#define MAX_CONFIGS 64
  #define MAXNAME (256)

  #define DEBUG_CACHE_DIR ".debug"
@@ -29,6 +30,111 @@ static int config_file_eof;

  const char *config_exclusive_filename;

+struct perf_config_section default_sections[] = {
+   { .name = "colors" },
+   { .name = "tui" },
+   { .name = "buildid" },
+   { .name = "annotate" },
+   { .name = "gtk" },
+   { .name = "pager" },
+   { .name = "help" },
+   { .name = "hist" },
+   { .name = "ui" },
+   { .name = "call-graph" },
+   { .name = "report" },
+   { .name = "top" },
+   { .name = "man" },
+   { .name = "kmem" }
+};
+
+struct perf_config_item default_config_items[][MAX_CONFIGS] = {
+   [CONFIG_COLORS] = {
+   CONF_STR_VAR("top", "red, default"),
+   CONF_STR_VAR("medium", "green, default"),
+   CONF_STR_VAR("normal", "lightgray, default"),
+   CONF_STR_VAR("selected", "white, lightgray"),
+   CONF_STR_VAR("jump_arrows", "blue, default"),
+   CONF_STR_VAR("addr", "magenta, default"),
+   CONF_STR_VAR("root", "white, blue"),
+   CONF_END()
+   },
+   [CONFIG_TUI] = {
+   CONF_BOOL_VAR("report", true),
+   CONF_BOOL_VAR("annotate", true),
+   CONF_BOOL_VAR("top", true),
+   CONF_END()
+   },
+   [CONFIG_BUILDID] = {
+   CONF_STR_VAR("dir", "~/.debug"),
+   CONF_END()
+   },
+   [CONFIG_ANNOTATE] = {
+   CONF_BOOL_VAR("hide_src_code", false),
+   CONF_BOOL_VAR("use_offset", true),
+   CONF_BOOL_VAR("jump_arrows", true),
+   CONF_BOOL_VAR("show_nr_jumps", false),
+   CONF_BOOL_VAR("show_linenr", false),
+   CONF_BOOL_VAR("show_total_period", false),
+   CONF_END()
+   },
+   [CONFIG_GTK] = {
+   CONF_BOOL_VAR("annotate", false),
+   CONF_BOOL_VAR("report", false),
+   CONF_BOOL_VAR("top", false),
+   CONF_END()
+   },
+   [CONFIG_PAGER] = {
+   CONF_BOOL_VAR("cmd", true),
+   CONF_BOOL_VAR("report", true),
+   CONF_BOOL_VAR("annotate", true),
+   CONF_BOOL_VAR("top", true),
+   CONF_BOOL_VAR("diff", true),
+   CONF_END()
+   },
+   [CONFIG_HELP] = {
+   CONF_STR_VAR("format", "man"),
+   CONF_INT_VAR("autocorrect", 0),
+   CONF_END()
+   },
+   [CONFIG_HIST] = {
+   CONF_STR_VAR("percentage", "absolute"),
+   CONF_END()
+   },
+   [CONFIG_UI] = {
+   CONF_BOOL_VAR("show-headers", true),
+   CONF_END()
+   },
+   [CONFIG_CALL_GRAPH] = {
+   CONF_STR_VAR("record-mode", "fp"),
+   CONF_LONG_VAR("dump-size", 8192),
+   CONF_STR_VAR("print-type", "graph"),
+   CONF_STR_VAR("order", "callee"),
+   CONF_STR_VAR("sort-key", "function"),
+   CONF_DOUBLE_VAR("threshold", 0.5),
+   CONF_LONG_VAR("print-limit", 0),
+   CONF_END()
+   },
+   [CONFIG_REPORT] = {
+   CONF_BOOL_VAR("group", true),
+   CONF_BOOL_VAR("children", true),
+   CONF_FLOAT_VAR("percent-limit", 0),
+   CONF_U64_VAR("queue-size", 0),
+   CONF_END()
+   },
+   [CONFIG_TOP] = {
+   CONF_BOOL_VAR("children", true),
+   CONF_END()
+   },
+   [CONFIG_MAN] = {
+   CONF_STR_VAR("viewer", "man"),
+   CONF_END()
+   },
+   [CONFIG_KMEM] = {
+   CONF_STR_VAR("default", "slab"),
+   CONF_END()
+   }
+};
+
  static int get_next_char(void)
  {
int c;

@@ -665,12 +771,12 @@ void perf_config_set__delete(struct 
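
The config.h half of the patch, where the CONF_*_VAR macros are defined, is not visible in this message. As a rough illustration of the idea (a table of defaults that carries a type tag instead of storing everything as strings), here is a self-contained sketch. All names in it (struct conf_item, ITEM_*, find_item) are hypothetical and deliberately differ from the patch's own:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum conf_type { CONF_TYPE_BOOL, CONF_TYPE_INT, CONF_TYPE_STRING, CONF_TYPE_END };

/* One default: a name, a type tag, and the value stored in its real type. */
struct conf_item {
	const char *name;
	enum conf_type type;
	union {
		bool b;
		int i;
		const char *s;
	} value;
};

#define ITEM_BOOL(_name, _val) \
	{ .name = (_name), .type = CONF_TYPE_BOOL, .value.b = (_val) }
#define ITEM_INT(_name, _val) \
	{ .name = (_name), .type = CONF_TYPE_INT, .value.i = (_val) }
#define ITEM_STR(_name, _val) \
	{ .name = (_name), .type = CONF_TYPE_STRING, .value.s = (_val) }
#define ITEM_END() \
	{ .name = NULL, .type = CONF_TYPE_END }

/* Defaults for a "help" section, using the values listed in the patch. */
const struct conf_item help_defaults[] = {
	ITEM_STR("format", "man"),
	ITEM_INT("autocorrect", 0),
	ITEM_END()
};

/* Walk a sentinel-terminated table and return the matching item or NULL. */
const struct conf_item *find_item(const struct conf_item *items,
				  const char *name)
{
	for (; items->type != CONF_TYPE_END; items++)
		if (strcmp(items->name, name) == 0)
			return items;
	return NULL;
}
```

The sentinel entry plays the role of CONF_END() in the patch: it lets each section hold a different number of items inside a fixed-width array without storing a length.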

Re: [PATCH 04/17] clk: qcom: ipq4019: switch remaining defines to enums

2016-03-29 Thread Stephen Boyd
On 03/23, Matthew McClintock wrote:
> When this was added not all the remaining defines were switched over to
> use enums, so let's complete that process here
> 
> Reported-by: Stephen Boyd 
> Signed-off-by: Matthew McClintock 
> ---

Applied to clk-fixes

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


Re: [PATCH 05/17] clk: qcom: ipq4019: add some fixed clocks for ddrppl and fepll

2016-03-29 Thread Stephen Boyd
On 03/23, Matthew McClintock wrote:
> Drivers for these don't exist yet so we will add them as fixed clocks
> so we don't BUG() if we change clocks that reference these clocks.
> 
> Signed-off-by: Matthew McClintock 
> ---

Applied to clk-fixes

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


Re: [PATCH v2] module: fix noreturn attribute for __module_put_and_exit()

2016-03-29 Thread Jiri Kosina
On Mon, 21 Mar 2016, Jiri Kosina wrote:

> __module_put_and_exit() is marked noreturn in module.h declaration, but is
> lacking the attribute in the definition, which makes some tools (such as
> sparse) unhappy. Amend the definition with the attribute as well (and
> reformat the declaration so that it uses more common format).
> 
> Signed-off-by: Jiri Kosina 
> ---
> 
> v1 -> v2: use __noreturn instead of __attribute__((noreturn)) as requested 
> by Rusty

Rusty, friendly ping on this one.

Thanks!

> 
>  include/linux/module.h | 4 ++--
>  kernel/module.c        | 2 +-
>  2 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/module.h b/include/linux/module.h
> index 2bb0c30..17a13ec 100644
> --- a/include/linux/module.h
> +++ b/include/linux/module.h
> @@ -562,8 +562,8 @@ int module_kallsyms_on_each_symbol(int (*fn)(void *, const char *,
>struct module *, unsigned long),
>  void *data);
>  
> -extern void __module_put_and_exit(struct module *mod, long code)
> - __attribute__((noreturn));
> +extern void __noreturn __module_put_and_exit(struct module *mod,
> + long code);
>  #define module_put_and_exit(code) __module_put_and_exit(THIS_MODULE, code)
>  
>  #ifdef CONFIG_MODULE_UNLOAD
> diff --git a/kernel/module.c b/kernel/module.c
> index 041200c..d367ba0 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -336,7 +336,7 @@ static inline void add_taint_module(struct module *mod, unsigned flag,
>   * A thread that wants to hold a reference to a module only while it
>   * is running can call this to safely exit.  nfsd and lockd use this.
>   */
> -void __module_put_and_exit(struct module *mod, long code)
> +void __noreturn __module_put_and_exit(struct module *mod, long code)
>  {
>   module_put(mod);
>   do_exit(code);
> -- 
> Jiri Kosina
> SUSE Labs
> 

-- 
Jiri Kosina
SUSE Labs
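
Outside the kernel, the same pattern (attribute on both the declaration and the definition) can be sketched as follows. The names are hypothetical, and my_noreturn stands in for the kernel's __noreturn using the GCC/Clang attribute syntax:

```c
#include <stdlib.h>

/* Stand-in for the kernel's __noreturn (GCC/Clang attribute syntax). */
#define my_noreturn __attribute__((noreturn))

struct fake_module {
	int refcnt;
};

/* Drop one reference; a stand-in for module_put(). */
void fake_module_put(struct fake_module *mod)
{
	mod->refcnt--;
}

/* Declaration, as it would appear in a header. */
my_noreturn void fake_put_and_exit(struct fake_module *mod, long code);

/* Definition carries the same attribute, so checkers see both agree. */
my_noreturn void fake_put_and_exit(struct fake_module *mod, long code)
{
	fake_module_put(mod);
	exit((int)code);	/* never returns */
}
```

A compiler only needs the attribute on the declaration it sees at the call site; repeating it on the definition is what keeps tools that compare the two, such as sparse, from complaining.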



Re: [PATCH v4 1/4] perf config: Introduce perf_config_set class

2016-03-29 Thread Taeung Song

Hi, Arnaldo and Namhyung

On 03/30/2016 01:12 AM, Arnaldo Carvalho de Melo wrote:

On Tue, Mar 29, 2016 at 09:43:13AM +0900, Taeung Song wrote:

This infrastructure code was designed for
upcoming features of perf-config.

That collect config key-value pairs from user and
system config files (i.e. user wide ~/.perfconfig
and system wide $(sysconfdir)/perfconfig)
to manage perf's configs.

Cc: Namhyung Kim 
Cc: Jiri Olsa 
Signed-off-by: Taeung Song 


Waiting for ack.


Namhyung,
The difference between v3 and v4 for this patch is shown below
(perf_config_set__delete() is now called in collect_config() on the error path).

Can you review this patch again?

diff --git a/tools/perf/util/config.c b/tools/perf/util/config.c
index 725015f..2cfafff 100644
--- a/tools/perf/util/config.c
+++ b/tools/perf/util/config.c
@@ -705,14 +706,15 @@ static int set_value(struct perf_config_item 
*config_item, const char *value)

 }

 static int collect_config(const char *var, const char *value,
-  void *section_list)
+  void *perf_config_set)
 {
 int ret = -1;
 char *ptr, *key;
 char *section_name, *name;
 struct perf_config_section *section = NULL;
 struct perf_config_item *config_item = NULL;
-struct list_head *sections = section_list;
+struct perf_config_set *perf_configs = perf_config_set;
+struct list_head *sections = &perf_configs->sections;

 key = ptr = strdup(var);
 if (!key) {
@@ -743,7 +745,8 @@ static int collect_config(const char *var, const 
char *value,


 out_free:
 free(key);
-return ret;
+perf_config_set__delete(perf_configs);
+return -1;
 }

 static struct perf_config_set *perf_config_set__init(struct 
perf_config_set *perf_configs)

@@ -777,7 +780,7 @@ struct perf_config_set *perf_config_set__new(void)
 return NULL;

 perf_config_set__init(perf_configs);
-perf_config(collect_config, &perf_configs->sections);
+perf_config(collect_config, perf_configs);

 return perf_configs;
 }


Thanks,
Taeung



---
  tools/perf/util/config.c | 171 +++
  tools/perf/util/config.h |  26 +++
  2 files changed, 197 insertions(+)
  create mode 100644 tools/perf/util/config.h

diff --git a/tools/perf/util/config.c b/tools/perf/util/config.c
index 4e72763..2dbf47c 100644
--- a/tools/perf/util/config.c
+++ b/tools/perf/util/config.c
@@ -13,6 +13,7 @@
  #include 
  #include "util/hist.h"  /* perf_hist_config */
  #include "util/llvm-utils.h"   /* perf_llvm_config */
+#include "config.h"

  #define MAXNAME (256)

@@ -506,6 +507,176 @@ out:
return ret;
  }

+static struct perf_config_section *find_section(struct list_head *sections,
+   const char *section_name)
+{
+   struct perf_config_section *section;
+
+   list_for_each_entry(section, sections, list)
+   if (!strcmp(section->name, section_name))
+   return section;
+
+   return NULL;
+}
+
+static struct perf_config_item *find_config_item(const char *name,
+struct perf_config_section 
*section)
+{
+   struct perf_config_item *config_item;
+
+   list_for_each_entry(config_item, &section->config_items, list)
+   if (!strcmp(config_item->name, name))
+   return config_item;
+
+   return NULL;
+}
+
+static void find_config(struct list_head *sections,
+   struct perf_config_section **section,
+   struct perf_config_item **config_item,
+   const char *section_name, const char *name)
+{
+   *section = find_section(sections, section_name);
+
+   if (*section != NULL)
+   *config_item = find_config_item(name, *section);
+   else
+   *config_item = NULL;
+}
+
+static struct perf_config_section *add_section(struct list_head *sections,
+  const char *section_name)
+{
+   struct perf_config_section *section = zalloc(sizeof(*section));
+
+   if (!section)
+   return NULL;
+
+   INIT_LIST_HEAD(&section->config_items);
+   section->name = strdup(section_name);
+   if (!section->name) {
+   pr_err("%s: strdup failed\n", __func__);
+   free(section);
+   return NULL;
+   }
+
+   list_add_tail(&section->list, sections);
+   return section;
+}
+
+static struct perf_config_item *add_config_item(struct perf_config_section 
*section,
+   const char *name)
+{
+   struct perf_config_item *config_item = zalloc(sizeof(*config_item));
+
+   if (!config_item)
+   return NULL;
+
+   config_item->name = strdup(name);
+   if (!name) {
+   pr_err("%s: strdup failed\n", __func__);
+   goto out_err;
+   }
+
+ 

Re: [PATCH v4 0/4] SROP Mitigation: Sigreturn Cookies

2016-03-29 Thread Linus Torvalds
On Tue, Mar 29, 2016 at 6:11 PM, Scotty Bauer  wrote:
>
> Yeah I had toyed with using hashes, I used hash_64 not md5 which is like 14
> extra instructions or something.

That sounds fine. Anything that requires enough code to undo that it
kind of defeats the purpose of a SROP should be enough. It's not about
encryption, I'd just think that if you can force the buffer overflow
while already in a signal handler, you'd want something that is at
least *slightly* harder to defeat than a single "xor" instruction.

> It's not hard to implement So I can try it. When you say an extra hardening
> mode do you mean hide it behind a sysctl or some sort of compile time CONFIG?

Since there already is a sysctl, I'd just assume that.

The important part is that the *default* value for that sysctl can't
break real applications. I don't really count CRIU as a real app, if
only because once you start doing checkpoint-restore you are going to
do some amount of system maintenance anyway, so somebody doing CRIU is
kind of expected to have a certain amount of system expertise, I would
say.

But dosemu - or Wine - is very much something that "normal people" run
- people who we do *not* expect to have to know about new sysctl's
etc. They already have one (mmap at zero), but that is very directly
related to what vm86 mode and Wine does, and people have had time to
learn about it. Let's not add another.

So testing dosemu and wine would be good. I wonder what else has shown
issues with signal stack layout changes. Debuggers and some JIT
engines, I suspect.

  Linus
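
As a rough illustration of the point above about a single "xor" being too easy to undo, here is a user-space sketch comparing a plain xor cookie with one that is passed through a small mixing function. All names (`cookie_xor`, `mix64`, `cookie_hashed`, `cookie_ok`) are hypothetical and this is not the code from the patch series. The mixer uses the MurmurHash3 64-bit finalizer constants rather than the kernel's hash_64(), and it is not cryptographic.

```c
#include <stdint.h>

/* Plain xor cookie: anyone who can read the stored cookie and knows the
 * frame address recovers the secret with one xor. */
static inline uint64_t cookie_xor(uint64_t secret, uint64_t frame_addr)
{
	return secret ^ frame_addr;
}

/* Illustrative 64-bit mixer (shift-xor and multiply rounds).
 * Assumption: stands in for "some cheap hash"; it is not hash_64(). */
static inline uint64_t mix64(uint64_t x)
{
	x ^= x >> 33;
	x *= 0xff51afd7ed558ccdULL;
	x ^= x >> 33;
	x *= 0xc4ceb9fe1a85ec53ULL;
	x ^= x >> 33;
	return x;
}

/* Hashed cookie: undoing it takes real code, not one instruction. */
static inline uint64_t cookie_hashed(uint64_t secret, uint64_t frame_addr)
{
	return mix64(secret ^ frame_addr);
}

/* What a sigreturn-time check would do in this model: recompute the
 * cookie for the frame address and compare it with the stored value. */
static inline int cookie_ok(uint64_t secret, uint64_t frame_addr,
			    uint64_t stored)
{
	return cookie_hashed(secret, frame_addr) == stored;
}
```

The cost is a handful of extra instructions per signal delivery and return, which is in line with the "14 extra instructions or something" estimate quoted above.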


Re: [PATCH 2/2] x86/mtrr: Refactor PAT initialization code

2016-03-29 Thread Toshi Kani
On Tue, 2016-03-29 at 15:12 -0700, Luis R. Rodriguez wrote:
> On Tue, Mar 29, 2016 at 2:46 PM, Toshi Kani  wrote:
> > On Tue, 2016-03-29 at 10:14 -0700, Luis R. Rodriguez wrote:
> > > On Fri, Mar 18, 2016 at 2:35 PM, Toshi Kani 
> > > wrote:
 :
> > > 
> > > Do we really need UC for the fan?
> > 
> > When you say "we", are you referring to Xen guests?  Xen guests do not
> > need to control the fan, so they do not need UC set in MTRRs.
> > 
> > In general, yes, MMIO registers need UC when they need to be accessed.
> 
> Curious, what does a BIOS do for fan control when MTRRs are disabled?

You mean, when the kernel modified the MTRR setup and disabled them.  BIOS
would assume the original setup and still access the registers.  This may
lead to undefined behavior and may result in a system crash.

> Also what if a BIOS just set MSR_MTRRdefType to uncacheable only?

Many BIOSes actually set the default type to UC.  MTRRs then cover regular
memory with WB.

> Wouldn't that help simplify the BIOS when systems are known as not
> wanting to deal with reading MTRRs on the kernel front, even if it's
> just to read the setup?

Nope.

> I'm trying to determine exactly why a BIOS cannot simply use an
> alternative for what it needs for fan control and let the kernel live
> without any MTRR code at run time as an option. Although the
> documentation says that the same "procedure" is needed for PAT setup,
> I see it possible to split the skeleton of the code and have each
> piece of code live separately and compartmentalized, they'd just have
> respective calls on the skeleton of the procedure.

I agree that the MTRR rendezvous handler can be improved for PAT, but I do
not see a compelling reason to make such change now.  With my fix, I think
the code works reasonably for Xen.

> > > What is the default for PAT?
> > 
> > There is no such thing as the default for PAT.
> > 
> > > Can't
> > > the same be used so that we say by default all ranges match what is
> > > also the default by PAT? Would that really break fan control? If we
> > > have a match shouldn't we be able to not have to worry about MTRRs at
> > > all in-kernel even on bare metal?
> > 
> > We do not need to know about BIOS impl, such as fan control, etc.  The
> > point is that if BIOS sets MTRRs, then the kernel keeps their setup.
> 
> Right, if the kernel no longer uses it directly it seems like an
> awful lot of code to keep updating simply for a BIOS requirement, I'm
> trying to see if we can have the option to live without this
> requirement.

Please be aware of the hibernation case. I think this procedure involves
setting MTRRs back to the original setup.

> > If (virtual) BIOS does not enable MTRRs, the kernel keeps them
> > disabled.  We just need not to mess with the setup.
> 
> Sure, thanks! I'm trying to see if we can have a similar option on bare
> metal.
> 
> > > Another option, which I've alluded to on the Xen thread is skipping
> > > over the MTRR space from the e820 map. Is that not possible ? This
> > > could be last resort... but which I'm hinting more for the Xen side
> > > of things if we *really* need get_mtrr() on the Xen guest side of
> > > things...
> > 
> > There is no MTRR space in the e820 map since they are MSRs.  Since Xen
> > guests disable MTRRs, I do not think you have any issue here...
> 
> Xen seems to clip the e820 map given to a guest in certain MTRR
> conditions, see init_e820(), this calls
> machine_specific_memory_setup() which later clips MTRR if
> mtrr_top_of_ram(). This is an Intel check that trims the e820 map if
> MTRRs were found to be enabled and the default MTRR is not write-back.
> It returns the address of the first non write-back variable MTRR, it
> uses clip_to_limit() to limit the exposed memory [0], notice how
> clip_to_limit() is also used to generally limit exposed memory through
> the opt_mem boot parameter as well. It's not exactly clear why that's
> done, but this looks very similar to the Linux MTRR cleanup -- see
> x86_get_mtrr_mem_range().
> 
> [0] http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/arch/x86/e820.c

It looks to me that the code makes sure all E820_RAM ranges in the e820
table are covered by WB entries of MTRRs.  If not, it trims the e820 table.

I suppose it tries to react to a case where someone modified the MTRRs and
caused a mismatch with the e820 table.  I'd think you do not need this
code as long as you do not modify the MTRR setup.

Thanks,
-Toshi
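
The check described above (every RAM range in the e820 table should be covered by write-back MTRR entries, otherwise the table is trimmed) can be modelled with a short sketch. This is a simplified model under stated assumptions, not Xen's mtrr_top_of_ram() or clip_to_limit(): the types `mem_range` and `wb_range` and the functions `wb_top()` and `clip_ram_to_limit()` are hypothetical, write-back coverage is assumed to be given as ranges sorted by start address, and marking a wholly uncovered RAM entry as reserved is a choice made only for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical memory-map entry types (not the real E820_* values). */
enum mem_type { MEM_RAM, MEM_RESERVED };

/* One memory-map entry covering [start, end). */
struct mem_range {
	uint64_t start, end;
	enum mem_type type;
};

/* One region that the MTRRs mark write-back, covering [start, end). */
struct wb_range {
	uint64_t start, end;
};

/* Return the first address, counting up from 0, that is not covered by
 * a write-back range. Assumes wb[] is sorted by start address. */
static uint64_t wb_top(const struct wb_range *wb, size_t n)
{
	uint64_t top = 0;

	for (size_t i = 0; i < n; i++) {
		if (wb[i].start > top)
			break;		/* gap: coverage ends at 'top' */
		if (wb[i].end > top)
			top = wb[i].end;
	}
	return top;
}

/* Trim RAM entries so none extends past 'limit'. Entries that straddle
 * the limit are shortened; entries wholly above it are marked reserved.
 * Returns the number of entries that were changed. */
static size_t clip_ram_to_limit(struct mem_range *map, size_t n,
				uint64_t limit)
{
	size_t changed = 0;

	for (size_t i = 0; i < n; i++) {
		if (map[i].type != MEM_RAM || map[i].end <= limit)
			continue;
		if (map[i].start >= limit)
			map[i].type = MEM_RESERVED;
		else
			map[i].end = limit;
		changed++;
	}
	return changed;
}
```

If the firmware's MTRR setup is left alone and matches the memory map, `clip_ram_to_limit()` changes nothing, which matches the remark that the trimming only matters once someone has modified the MTRRs.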


Re: (mostly) Arch-independent livepatch

2016-03-29 Thread Jiri Kosina
On Tue, 29 Mar 2016, Jessica Yu wrote:

> > v6:
> > - Since we hard-code the field widths for the objname and symbol name
> >   for the sscanf() calls, which are supposed to correspond to the values
> >   of MODULE_NAME_LEN and KSYM_NAME_LEN, use BUILD_BUG_ON() to detect when
> >   the values of these constants deviate from the expected values.
> > - Squash the sample livepatch module patch into patch 4
> >   ("livepatch: reuse module loader code to write relocations") so
> >   git bisects don't break
> > - Don't need the klp_buf struct, just use plain char arrays to hold the
> >   output of sscanf(). Also, no need to clear the bufs after every
> >   invocation, as sscanf() takes care to put a null byte at the end of
> >   the bufs.
> > - Fix compiler kbuild errors for the !CONFIG_LIVEPATCH case
> > - Fixed some small module.c nits
> > 
> 
> Pinging Rusty, just in case this thread got buried :-)
> How do the module.c changes look?

Plus there are (admittedly indeed rather small and trivial) changes to 
s390 module loader, so I'd prefer to have Heiko's / Martin's Ack before 
merging this.

Hence, let me piggy back on this ping to Rusty, and let me ping Heiko and 
Martin as well (adding to CC explicitly to make sure this doesn't get lost 
in general noise).

Thanks,

-- 
Jiri Kosina
SUSE Labs
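
The first v6 item quoted above (hard-coded sscanf() field widths guarded by a build-time check) can be sketched in user space as follows. This is an illustration under assumptions, not the livepatch code: `OBJ_NAME_LEN`, `SYM_NAME_LEN`, `parse_sym()` and the "objname.symname,pos" input format are hypothetical, and C11 `_Static_assert` stands in for the kernel's BUILD_BUG_ON().

```c
#include <stdio.h>

/* Hypothetical buffer sizes for this sketch; in the kernel the relevant
 * constants are MODULE_NAME_LEN and KSYM_NAME_LEN. */
#define OBJ_NAME_LEN 56
#define SYM_NAME_LEN 128

/* The widths in the format string below are literals, so break the
 * build if the constants ever drift away from them. */
_Static_assert(OBJ_NAME_LEN == 56, "update the %55 width in parse_sym()");
_Static_assert(SYM_NAME_LEN == 128, "update the %127 width in parse_sym()");

/* Parse "objname.symname,pos" into plain char arrays.
 * objname must hold OBJ_NAME_LEN bytes, symname SYM_NAME_LEN bytes.
 * Returns 0 on success, -1 if the input does not match. */
static int parse_sym(const char *s, char *objname, char *symname,
		     unsigned long *pos)
{
	/* Each %N[...] stores at most N characters plus a terminating
	 * null byte, so the buffers need no clearing between calls. */
	int n = sscanf(s, "%55[^.].%127[^,],%lu", objname, symname, pos);

	return n == 3 ? 0 : -1;
}
```

With this layout a change to either constant fails at compile time instead of silently allowing sscanf() to write past the end of a buffer.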



Re: [PATCH v2 03/15] MIPS: PCI: Compatibility with ARM-like PCI host drivers

2016-03-29 Thread Florian Fainelli
On 03/02/2016 03:30, Paul Burton wrote:
> Introduce support for struct hw_pci & the associated pci_common_init_dev
> function as used by the PCI drivers written for ARM platforms under
> drivers/pci. This is in preparation for reusing the xilinx-pcie driver
> on the MIPS Boston board.
> 
> Platforms that make use of this more generic code will need to select
> CONFIG_MIPS_GENERIC_PCI. Platforms which don't will continue to work as
> they have, with the intent that PCI drivers be migrated towards struct
> hw_pci & drivers/pci/ over time.
> 
> Signed-off-by: Paul Burton 
> ---

[snip]

> + if (hw->preinit)
> + hw->preinit();
> +
> + ret = hw->setup(i, >sysdata);
> + if (ret < 0) {

This needs to be ret <= 0 to be compliant with what ARM PCI host
controllers do: hw->setup returns 1 when it finishes successfully, and
0 or a negative value when it does not, see arch/arm/kernel/bios32.c.
-- 
Florian
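
The return convention described above can be shown with a small stand-alone sketch. The names here are hypothetical (`struct hw_pci_stub`, `demo_setup()`, `init_controllers()`); only the rule itself comes from the review comment: a setup callback returns 1 on success and 0 or a negative value on failure, so the caller has to test `ret <= 0`.

```c
#include <stddef.h>

/* Hypothetical, cut-down stand-in for struct hw_pci. */
struct hw_pci_stub {
	int nr_controllers;
	int (*setup)(int nr, void *sysdata);
};

/* Example callback for this sketch: controller 0 succeeds (1),
 * controller 1 reports "not set up" (0), the rest fail (-1). */
static int demo_setup(int nr, void *sysdata)
{
	(void)sysdata;
	if (nr == 0)
		return 1;
	return nr == 1 ? 0 : -1;
}

/* Count the controllers whose setup succeeded. */
static int init_controllers(const struct hw_pci_stub *hw)
{
	int ok = 0;

	for (int i = 0; i < hw->nr_controllers; i++) {
		int ret = hw->setup(i, NULL);

		/* "ret < 0" would wrongly treat a 0 return as success. */
		if (ret <= 0)
			continue;
		ok++;
	}
	return ok;
}
```

With `demo_setup()` and three controllers, only controller 0 is counted; a `ret < 0` test would have counted controller 1 as well.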


Re: [PATCH v4 0/4] SROP Mitigation: Sigreturn Cookies

2016-03-29 Thread Scotty Bauer


On 03/29/2016 04:34 PM, Linus Torvalds wrote:
> On Tue, Mar 29, 2016 at 4:38 PM, Andy Lutomirski  wrote:
>>
>> Then there's an unanswered question: is this patch acceptable given
>> that it's an ABI break?  Security fixes are sometimes an exception to
>> the "no ABI breaks" rule, but it's by no means an automatic exception.
> 
> So there isn't any "no ABI break" rule - there is only a "does it
> break real applications" rule.
> 
> (This can also be re-stated as: "Talk is cheap", aka "reality trumps
> documentation".
> 
> Documentation is meaningless if it doesn't match reality, and what we
> actually *do* is what matters.
> 
> So the ABI isn't about some theoretical interface documentation, the
> ABI is about what people use and have tested.
> 
> On the one hand, that means that our ABI is _stricter_ than any
> documentation, and that "but we can make this change that breaks app
> XYZ, because XYZ is depending on undocumented behavior" is not an
> acceptable excuse.
> 
> But on the other hand it *also* means that since the ABI is about real
> programs, not theoretical issues, we can also change things as long as
> we don't actually break anything that people can notice and depend
> on).
> 
> And while *acute* security holes will be fixed regardless of ABI
> issues, something like this that is only hardening rather than fixing
> a particular security hole, really needs to not break any
> applications.
> 
> Because if it does break anything, it needs to be turned off by
> default. That's a hard rule. And since that would be largely defeating
> the whole point of the series, I think we really need to have made
> sure nothing breaks before a patch series like this can be accepted.
> 
> That said, if this is done right, I don't think it will break
> anything. CRIU may indeed be a special case, but CRIU isn't really a
> normal application, and the CRIU people may need to turn this off
> explicitly, if it does break.
> 
> But yes, dosemu needs to be tested, and needs to just continue
> working. But does dosemu actually create a signal stack, as opposed to
> just playing with one that has been created for it? I thought it was
> just the latter case, which should be ok even with a magic cookie in
> there.
> 
>Linus
> 


For what it's worth this series is breaking CRIU, I just tested:

root@node0:/mnt/criu# criu restore - -o restore.log --shell-job
root@node0:/mnt/criu# tail -3 /var/log/syslog
Mar 29 17:12:08 localhost kernel: [ 3554.625535] Possible exploit attempt or buggy program!
Mar 29 17:12:08 localhost kernel: [ 3554.625535] If you believe this is an error you can disable SROP  Protection by #echo 1 > /proc/sys/kernel/disable-srop-protection
Mar 29 17:12:08 localhost kernel: [ 3554.625545] test_[25305] bad frame in rt_sigreturn frame:0001e540 ip:7f561542cf20 sp:7ffe004ecfd8 orax: in libc-2.19.so[7f561536c000+1bb0]
root@node0:/mnt/criu# echo 1 > /proc/sys/kernel/disable-srop-protection 
root@node0:/mnt/criu# criu restore - -o restore.log --shell-job
slept for one second
slept for one second
slept for one second
slept for one second
root@node0:/mnt/criu# 


I'm working on getting dosemu up and running-- are there any other applications
off the top of your head that I should be testing with?





[ANNOUNCE] 3.10.101-rt111

2016-03-29 Thread Steven Rostedt

Dear RT Folks,

I'm pleased to announce the 3.10.101-rt111 stable release.


This release is just an update to the new stable 3.10.101 version
and no RT specific changes have been made.


You can get this release via the git tree at:

  git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git

  branch: v3.10-rt
  Head SHA1: 0b5a9056e6b482592c02b6253da383a41cf7e255


Or to build 3.10.101-rt111 directly, the following patches should be applied:

  http://www.kernel.org/pub/linux/kernel/v3.x/linux-3.10.tar.xz

  http://www.kernel.org/pub/linux/kernel/v3.x/patch-3.10.101.xz

  http://www.kernel.org/pub/linux/kernel/projects/rt/3.10/patch-3.10.101-rt111.patch.xz




Enjoy,

-- Steve



[ANNOUNCE] 3.12.57-rt77

2016-03-29 Thread Steven Rostedt

Dear RT Folks,

I'm pleased to announce the 3.12.57-rt77 stable release.


This release is just an update to the new stable 3.12.57 version
and no RT specific changes have been made.


You can get this release via the git tree at:

  git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git

  branch: v3.12-rt
  Head SHA1: b3636f196951441e6f578d909482b903a5aee40b


Or to build 3.12.57-rt77 directly, the following patches should be applied:

  http://www.kernel.org/pub/linux/kernel/v3.x/linux-3.12.tar.xz

  http://www.kernel.org/pub/linux/kernel/v3.x/patch-3.12.57.xz

  http://www.kernel.org/pub/linux/kernel/projects/rt/3.12/patch-3.12.57-rt77.patch.xz




Enjoy,

-- Steve



[ANNOUNCE] 3.4.111-rt141

2016-03-29 Thread Steven Rostedt

Dear RT Folks,

I'm pleased to announce the 3.4.111-rt141 stable release.


This release is just an update to the new stable 3.4.111 version
and no RT specific changes have been made.


You can get this release via the git tree at:

  git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git

  branch: v3.4-rt
  Head SHA1: 588e04089a1afd30416b43c0091922162d86dcab


Or to build 3.4.111-rt141 directly, the following patches should be applied:

  http://www.kernel.org/pub/linux/kernel/v3.x/linux-3.4.tar.xz

  http://www.kernel.org/pub/linux/kernel/v3.x/patch-3.4.111.xz

  http://www.kernel.org/pub/linux/kernel/projects/rt/3.4/patch-3.4.111-rt141.patch.xz




Enjoy,

-- Steve



[ANNOUNCE] 3.14.65-rt68

2016-03-29 Thread Steven Rostedt

Dear RT Folks,

I'm pleased to announce the 3.14.65-rt68 stable release.


This release is just an update to the new stable 3.14.65 version
and no RT specific changes have been made.


You can get this release via the git tree at:

  git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git

  branch: v3.14-rt
  Head SHA1: 85093d7b4dfe1f0080bfd557a32bc717bd7c4a03


Or to build 3.14.65-rt68 directly, the following patches should be applied:

  http://www.kernel.org/pub/linux/kernel/v3.x/linux-3.14.tar.xz

  http://www.kernel.org/pub/linux/kernel/v3.x/patch-3.14.65.xz

  http://www.kernel.org/pub/linux/kernel/projects/rt/3.14/patch-3.14.65-rt68.patch.xz




Enjoy,

-- Steve



Re: [PATCH v4 0/4] SROP Mitigation: Sigreturn Cookies

2016-03-29 Thread Scotty Bauer


On 03/29/2016 04:54 PM, Linus Torvalds wrote:
> On Tue, Mar 29, 2016 at 2:53 PM, Scott Bauer  wrote:
>>
>> These patches implement the necessary changes to generate a cookie
>> which will be placed above signal frame upon signal delivery to userland.
>> The cookie is generated using a per-process random value xor'd with
>> the address where the cookie will be stored on the stack.
> 
> Side note: wouldn't it be better to make the cookie something that
> doesn't make it trivial to figure out the random value in case you
> already have access to a signal stack?
> 
> Maybe there could be a stronger variation of this that makes the
> cookie be something like a single md5 round (not a full md5).
> Something fast, and not necessarily secure, but something that needs
> more than one single CPU instruction to figure out.
> 
> So you could do 4 32
> 
>  - the random value
>  - the low 32 bits of the address of the cookie
>  - the low 32 bits of the return point stack and instruction pointer
> 
> Yes, yes, md5 is not cryptographically secure, and making it a single
> iteration rather than the full four makes it even less so, but if the
> attacker can generate long arbitrary code, then the whole SROP is
> pointless to begin with, no?
> 

Yeah, I had toyed with using hashes. I used hash_64, not md5, which is like 14
extra instructions or something. Anyway, Daniel Micay pointed out we could use
SipHash (https://131002.net/siphash/), but there's no siphash for me to use in
the kernel and I'm the *last* person on earth to start porting/implementing
'crypto' algos.

Anyway, we all sort of agreed that if you already have enough arbitrary
execution to cause a signal, leak the cookie, and do some xor magic to get the
per-process secret, then you probably don't really need SROP in your exploit.
Although you did mention an interesting attack, which is to force a signal and
then muck with an existing legitimate frame, which I would like to protect
against now.

> In contrast, with the plain xor, the SROP would be a trivial operation
> if you can just force it to happen within the context of a signal, so
> that you can just re-use the signal return stack as-is. But mixing in
> the returning IP and SP would make it *much* harder to use the
> sigreturn as an attack vector.
> 
> I realize that this would likely need to be a separate and non-default
> extra hardening mode, because there are *definitely* applications that
> take signals and then update the return address (maybe single-stepping
> over instructions etc). But for a *lot* of applications, signal return
> implies changing no signal state at all, and mixing in the returning
> IP and SP would seem to be a fundamentally stronger cookie.
> 
> No?

It's not hard to implement, so I can try it. When you say an extra hardening
mode, do you mean hide it behind a sysctl or some sort of compile-time CONFIG?
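
For readers following the cookie discussion, here is a minimal userspace
sketch of the two schemes being compared. It is not code from the patch
series: all function names are hypothetical, and the mixer is a
splitmix64-style finalizer used purely as a stand-in for the md5 round,
hash_64 or SipHash mentioned above. It shows why a plain xor cookie hands the
per-process secret to anyone who can read one cookie and knows its address,
and how folding in the return IP and SP ties the cookie to an unmodified
frame.

```c
#include <stdint.h>

/* Plain scheme from the cover letter: per-process secret xor cookie address. */
static uint64_t cookie_xor(uint64_t secret, uint64_t cookie_addr)
{
	return secret ^ cookie_addr;
}

/* One xor undoes it: a leaked cookie plus its address yields the secret. */
static uint64_t recover_secret(uint64_t leaked_cookie, uint64_t cookie_addr)
{
	return leaked_cookie ^ cookie_addr;
}

/*
 * Stand-in mixer (splitmix64 finalizer). This is an assumption made for
 * illustration; it is NOT md5, hash_64 or SipHash, and it is invertible.
 */
static uint64_t mix64(uint64_t x)
{
	x ^= x >> 30;
	x *= 0xbf58476d1ce4e5b9ULL;
	x ^= x >> 27;
	x *= 0x94d049bb133111ebULL;
	x ^= x >> 31;
	return x;
}

/* Hypothetical stronger variant: also bind the cookie to return IP and SP. */
static uint64_t cookie_mixed(uint64_t secret, uint64_t cookie_addr,
			     uint64_t ret_ip, uint64_t ret_sp)
{
	uint64_t h = mix64(secret ^ cookie_addr);

	h = mix64(h ^ ret_ip);
	h = mix64(h ^ ret_sp);
	return h;
}
```

Because the stand-in mixer is invertible, it only raises the cost of
recovering the secret beyond a single instruction. What it does illustrate is
that a frame whose return IP or SP has been altered no longer matches its
cookie, which is also why applications that legitimately rewrite the return
address would break under such a mode.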



[ANNOUNCE] 3.18.29-rt30

2016-03-29 Thread Steven Rostedt

Dear RT Folks,

I'm pleased to announce the 3.18.29-rt30 stable release.

Note, 3.18.29-rt29 was also released that only included the stable
update. But -rt30 is released because the last pull of changes to
3.18-rt included a change that was also reverted in upstream. This
release includes both the stable update and that revert.

You can get this release via the git tree at:

  git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git

  branch: v3.18-rt
  Head SHA1: 18303cd31e84f06eb780dd58ea213e4a515609fc


Or to build 3.18.29-rt30 directly, the following patches should be applied:

  http://www.kernel.org/pub/linux/kernel/v3.x/linux-3.18.tar.xz

  http://www.kernel.org/pub/linux/kernel/v3.x/patch-3.18.29.xz

  http://www.kernel.org/pub/linux/kernel/projects/rt/3.18/patch-3.18.29-rt30.patch.xz



You can also build from 3.18.29-rt29 by applying the incremental patch:

  http://www.kernel.org/pub/linux/kernel/projects/rt/3.18/incr/patch-3.18.29-rt29-rt30.patch.xz



Enjoy,

-- Steve


Changes from v3.18.29-rt29:

---

Steven Rostedt (Red Hat) (1):
  Linux 3.18.29-rt30

Thomas Gleixner (1):
  Revert d04ea10ba1ea mmc: sdhci: don't provide hard irq handler


 drivers/mmc/host/sdhci.c | 32 +++++---------------------------
 localversion-rt          |  2 +-
 2 files changed, 6 insertions(+), 28 deletions(-)
---
diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
index 9411f8b0cd11..9109287e47ac 100644
--- a/drivers/mmc/host/sdhci.c
+++ b/drivers/mmc/host/sdhci.c
@@ -2565,31 +2565,6 @@ static irqreturn_t sdhci_thread_irq(int irq, void *dev_id)
 	return isr ? IRQ_HANDLED : IRQ_NONE;
 }
 
-#ifdef CONFIG_PREEMPT_RT_BASE
-static irqreturn_t sdhci_rt_irq(int irq, void *dev_id)
-{
-	irqreturn_t ret;
-
-	local_bh_disable();
-	ret = sdhci_irq(irq, dev_id);
-	local_bh_enable();
-	if (ret == IRQ_WAKE_THREAD)
-		ret = sdhci_thread_irq(irq, dev_id);
-	return ret;
-}
-#endif
-
-static int sdhci_req_irq(struct sdhci_host *host)
-{
-#ifdef CONFIG_PREEMPT_RT_BASE
-	return request_threaded_irq(host->irq, NULL, sdhci_rt_irq,
-				    IRQF_SHARED, mmc_hostname(host->mmc), host);
-#else
-	return request_threaded_irq(host->irq, sdhci_irq, sdhci_thread_irq,
-				    IRQF_SHARED, mmc_hostname(host->mmc), host);
-#endif
-}
-
 /*****************************************************************************\
  *                                                                           *
  * Suspend/resume                                                            *
@@ -2657,7 +2632,9 @@ int sdhci_resume_host(struct sdhci_host *host)
 	}
 
 	if (!device_may_wakeup(mmc_dev(host->mmc))) {
-		ret = sdhci_req_irq(host);
+		ret = request_threaded_irq(host->irq, sdhci_irq,
+					   sdhci_thread_irq, IRQF_SHARED,
+					   mmc_hostname(host->mmc), host);
 		if (ret)
 			return ret;
 	} else {
@@ -3276,7 +3253,8 @@ int sdhci_add_host(struct sdhci_host *host)
 
 	sdhci_init(host, 0);
 
-	ret = sdhci_req_irq(host);
+	ret = request_threaded_irq(host->irq, sdhci_irq, sdhci_thread_irq,
+				   IRQF_SHARED, mmc_hostname(mmc), host);
 	if (ret) {
 		pr_err("%s: Failed to request IRQ %d: %d\n",
 		       mmc_hostname(mmc), host->irq, ret);
diff --git a/localversion-rt b/localversion-rt
index 90290c642ed5..b72862e06be4 100644
--- a/localversion-rt
+++ b/localversion-rt
@@ -1 +1 @@
--rt29
+-rt30


[PATCH 1/1] perf tools: Add missing initialization of perf_sample.cpumode in synthesized samples

2016-03-29 Thread Arnaldo Carvalho de Melo
From: Arnaldo Carvalho de Melo 

In 473398a21d28 ("perf tools: Add cpumode to struct perf_sample"), I
missed some places where perf_sample fields are initialized directly,
outside of perf_evsel__parse_sample(): when synthesizing
PERF_RECORD_{MMAP*,COMM,FORK,EXIT} for pre-existing threads, and in
intel_pt and intel_bts when synthesizing events from processor trace.
The jitdump code was also affected. Fix it.

The problem was noticed when running:

  # perf record -e intel_pt//u true
  # perf script

Where the samples wouldn't get resolved because perf_sample.cpumode
would be left as zero, i.e. PERF_RECORD_MISC_CPUMODE_UNKNOWN, not
resolving as kernel, hypervisor or user cpu modes.

Cc: Alexander Shishkin 
Cc: Andi Kleen 
Cc: David Ahern 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Peter Zijlstra 
Cc: Stephane Eranian 
Cc: Thomas Gleixner 
Fixes: 473398a21d28 ("perf tools: Add cpumode to struct perf_sample")
Link: http://lkml.kernel.org/n/tip-n5sdauxgk24d5nun8kuuu...@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/util/event.c     | 23 ++++++++++++++++-------
 tools/perf/util/intel-bts.c |  1 +
 tools/perf/util/intel-pt.c  |  3 +++
 tools/perf/util/jitdump.c   |  2 ++
 4 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index 52cf479bc593..dad55d04ffdd 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -56,13 +56,22 @@ const char *perf_event__name(unsigned int id)
 	return perf_event__names[id];
 }
 
-static struct perf_sample synth_sample = {
+static int perf_tool__process_synth_event(struct perf_tool *tool,
+					  union perf_event *event,
+					  struct machine *machine,
+					  perf_event__handler_t process)
+{
+	struct perf_sample synth_sample = {
 	.pid	   = -1,
 	.tid	   = -1,
 	.time	   = -1,
 	.stream_id = -1,
 	.cpu	   = -1,
 	.period	   = 1,
+	.cpumode   = event->header.misc & PERF_RECORD_MISC_CPUMODE_MASK,
+	};
+
+	return process(tool, event, &synth_sample, machine);
 };
 
 /*
@@ -186,7 +195,7 @@ pid_t perf_event__synthesize_comm(struct perf_tool *tool,
 	if (perf_event__prepare_comm(event, pid, machine, &tgid, &ppid) != 0)
 		return -1;
 
-	if (process(tool, event, &synth_sample, machine) != 0)
+	if (perf_tool__process_synth_event(tool, event, machine, process) != 0)
 		return -1;
 
 	return tgid;
@@ -218,7 +227,7 @@ static int perf_event__synthesize_fork(struct perf_tool *tool,
 
 	event->fork.header.size = (sizeof(event->fork) + machine->id_hdr_size);
 
-	if (process(tool, event, &synth_sample, machine) != 0)
+	if (perf_tool__process_synth_event(tool, event, machine, process) != 0)
 		return -1;
 
 	return 0;
@@ -344,7 +353,7 @@ out:
 		event->mmap2.pid = tgid;
 		event->mmap2.tid = pid;
 
-		if (process(tool, event, &synth_sample, machine) != 0) {
+		if (perf_tool__process_synth_event(tool, event, machine, process) != 0) {
 			rc = -1;
 			break;
 		}
@@ -402,7 +411,7 @@ int perf_event__synthesize_modules(struct perf_tool *tool,
 
 		memcpy(event->mmap.filename, pos->dso->long_name,
 		       pos->dso->long_name_len + 1);
-		if (process(tool, event, &synth_sample, machine) != 0) {
+		if (perf_tool__process_synth_event(tool, event, machine, process) != 0) {
 			rc = -1;
 			break;
 		}
@@ -472,7 +481,7 @@ static int __event__synthesize_thread(union perf_event *comm_event,
 		/*
 		 * Send the prepared comm event
 		 */
-		if (process(tool, comm_event, &synth_sample, machine) != 0)
+		if (perf_tool__process_synth_event(tool, comm_event, machine, process) != 0)
 			break;
 
 		rc = 0;
@@ -701,7 +710,7 @@ int perf_event__synthesize_kernel_mmap(struct perf_tool *tool,
 	event->mmap.len   = map->end - event->mmap.start;
 	event->mmap.pid   = machine->pid;
 
-	err = process(tool, event, &synth_sample, machine);
+	err = perf_tool__process_synth_event(tool, event, machine, process);
 	free(event);
 
 	return err;
diff --git a/tools/perf/util/intel-bts.c b/tools/perf/util/intel-bts.c
index 6bc3ecd2e7ca..abf1366e2a24 100644
--- a/tools/perf/util/intel-bts.c
+++ b/tools/perf/util/intel-bts.c
@@ -279,6 +279,7 @@ static int intel_bts_synth_branch_sample(struct intel_bts_queue *btsq,
 	event.sample.header.misc = PERF_RECORD_MISC_USER;
 	event.sample.header.size = sizeof(struct perf_event_header);
 
+	sample.cpumode = PERF_RECORD_MISC_USER;
 	sample.ip = le64_to_cpu(branch->from);
 	sample.pid = 

Re: [PATCH] regulator: qcom_spmi: Add slewing delays for all SMPS types

2016-03-29 Thread Mark Brown
On Tue, Mar 29, 2016 at 03:58:40PM -0700, Stephen Boyd wrote:

> - if (vreg->logical_type == SPMI_REGULATOR_LOGICAL_TYPE_FTSMPS) {
> - ret = spmi_regulator_ftsmps_init_slew_rate(vreg);
> + if (vreg->logical_type == SPMI_REGULATOR_LOGICAL_TYPE_FTSMPS ||
> + vreg->logical_type == SPMI_REGULATOR_LOGICAL_TYPE_ULT_LO_SMPS ||
> + vreg->logical_type == SPMI_REGULATOR_LOGICAL_TYPE_ULT_HO_SMPS ||
> + vreg->logical_type == SPMI_REGULATOR_LOGICAL_TYPE_SMPS) {
> + ret = spmi_regulator_init_slew_rate(vreg);

This should be a switch statement.  Otherwise this looks fine.
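
A rough sketch of what the switch form could look like, written as a
standalone helper so that it compiles outside the driver. The enum is a
stand-in: the four SMPS names come from the quoted hunk, while the LDO entry,
the numeric values and the helper name needs_slew_rate_init() are assumptions
made for illustration only.

```c
#include <stdbool.h>

/*
 * Stand-in for the driver's logical type enum. Only the four SMPS names are
 * taken from the quoted diff; the LDO entry and the ordering are illustrative.
 */
enum spmi_regulator_logical_type {
	SPMI_REGULATOR_LOGICAL_TYPE_SMPS,
	SPMI_REGULATOR_LOGICAL_TYPE_LDO,
	SPMI_REGULATOR_LOGICAL_TYPE_FTSMPS,
	SPMI_REGULATOR_LOGICAL_TYPE_ULT_LO_SMPS,
	SPMI_REGULATOR_LOGICAL_TYPE_ULT_HO_SMPS,
};

/* Hypothetical helper: true for the types the patch wants slew delays on. */
static bool needs_slew_rate_init(enum spmi_regulator_logical_type type)
{
	switch (type) {
	case SPMI_REGULATOR_LOGICAL_TYPE_FTSMPS:
	case SPMI_REGULATOR_LOGICAL_TYPE_ULT_LO_SMPS:
	case SPMI_REGULATOR_LOGICAL_TYPE_ULT_HO_SMPS:
	case SPMI_REGULATOR_LOGICAL_TYPE_SMPS:
		return true;
	default:
		return false;
	}
}
```

In the driver itself the case labels would presumably sit directly around the
spmi_regulator_init_slew_rate(vreg) call rather than in a separate helper;
either way, adding another SMPS type becomes a one-line change.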




[GIT PULL 0/1] perf/urgent fix

2016-03-29 Thread Arnaldo Carvalho de Melo
Hi Ingo,

	Please consider pulling. This fixes a regression introduced in this
merge window.

- Arnaldo

The following changes since commit 5dc1037305140d8a7e580e916ac17df5d0124add:

  Merge tag 'perf-urgent-for-mingo-20160328' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent (2016-03-29 10:39:12 +0200)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-urgent-for-mingo-20160329

for you to fetch changes up to 3ea223adcb0c5893a6dc8ed3a84dce264cbb61d6:

  perf tools: Add missing initialization of perf_sample.cpumode in synthesized samples (2016-03-29 20:03:56 -0300)


perf/urgent fix:

- Add missing initialization of perf_sample.cpumode in synthesized samples,
  affects jitdump, records for pre-existing threads and records synthesized
  from processor trace data, noticed while testing intel_pt events with
  'perf script' (Arnaldo Carvalho de Melo)

Signed-off-by: Arnaldo Carvalho de Melo <a...@redhat.com>


Arnaldo Carvalho de Melo (1):
      perf tools: Add missing initialization of perf_sample.cpumode in synthesized samples

 tools/perf/util/event.c     | 23 ++++++++++++++++-------
 tools/perf/util/intel-bts.c |  1 +
 tools/perf/util/intel-pt.c  |  3 +++
 tools/perf/util/jitdump.c   |  2 ++
 4 files changed, 22 insertions(+), 7 deletions(-)
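
To illustrate the fix summarized above, here is a small standalone sketch of
how the cpumode is derived from the misc field of the event header. The
PERF_RECORD_MISC_* values are reproduced from the perf_event UAPI header as
far as I know them; cpumode_from_misc() is a hypothetical helper that mirrors
the expression added to event.c and is not a function in perf.

```c
#include <stdint.h>

/* Values believed to match include/uapi/linux/perf_event.h. */
#define PERF_RECORD_MISC_CPUMODE_MASK		(7 << 0)
#define PERF_RECORD_MISC_CPUMODE_UNKNOWN	(0 << 0)
#define PERF_RECORD_MISC_KERNEL			(1 << 0)
#define PERF_RECORD_MISC_USER			(2 << 0)
#define PERF_RECORD_MISC_HYPERVISOR		(3 << 0)

/*
 * Hypothetical helper: keep only the cpumode bits of header.misc, as the
 * patch does when it fills in perf_sample.cpumode for synthesized events.
 */
static uint8_t cpumode_from_misc(uint16_t misc)
{
	return misc & PERF_RECORD_MISC_CPUMODE_MASK;
}
```

A sample whose cpumode is never filled in stays at zero, which is
PERF_RECORD_MISC_CPUMODE_UNKNOWN, so it cannot be resolved as kernel,
hypervisor or user. Masking keeps unrelated flag bits in misc from leaking
into the mode.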


[ANNOUNCE] 4.1.20-rt23

2016-03-29 Thread Steven Rostedt

Dear RT Folks,

I'm pleased to announce the 4.1.20-rt23 stable release.


This release is just an update to the new stable 4.1.20 version
and no RT specific changes have been made.


You can get this release via the git tree at:

  git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git

  branch: v4.1-rt
  Head SHA1: 115d588693b6f8f9cfad409c091225d4095159e3


Or to build 4.1.20-rt23 directly, the following patches should be applied:

  http://www.kernel.org/pub/linux/kernel/v4.x/linux-4.1.tar.xz

  http://www.kernel.org/pub/linux/kernel/v4.x/patch-4.1.20.xz

  
  http://www.kernel.org/pub/linux/kernel/projects/rt/4.1/patch-4.1.20-rt23.patch.xz




Enjoy,

-- Steve



Re: [PATCH v4 0/4] SROP Mitigation: Sigreturn Cookies

2016-03-29 Thread Andy Lutomirski
On Tue, Mar 29, 2016 at 3:54 PM, Linus Torvalds wrote:
> On Tue, Mar 29, 2016 at 2:53 PM, Scott Bauer  wrote:
>>
>> These patches implement the necessary changes to generate a cookie
>> which will be placed above signal frame upon signal delivery to userland.
>> The cookie is generated using a per-process random value xor'd with
>> the address where the cookie will be stored on the stack.
>

> I realize that this would likely need to be a separate and non-default
> extra hardening mode, because there are *definitely* applications that
> take signals and then update the return address (maybe single-stepping
> over instructions etc). But for a *lot* of applications, signal return
> implies changing no signal state at all, and mixing in the returning
> IP and SP would seem to be a fundamentally stronger cookie.

Like selftests/x86? :)

If we wanted to increase confidence that this wouldn't break existing
applications, I've been thinking about adding an extensible bit mask
of backwards compatibility breaks that an application and/or libc is okay with.
One of these would be "I don't use vsyscalls", in which case the
vsyscall page would be unmapped entirely.  Another could be
"sigcontext cookies are okay".  These could potentially be programmed
by syscall and/or ELF notes.
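
A rough sketch of how such an opt-in mask might be represented. Every name
here (the COMPAT_BREAK_* flags, struct task_compat and both helpers) is
hypothetical; nothing like this exists in the kernel ABI, and how the mask
would actually be set (syscall or ELF note) is left open.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flags, one bit per compatibility break a program accepts. */
#define COMPAT_BREAK_NO_VSYSCALL	(UINT64_C(1) << 0)	/* "I don't use vsyscalls" */
#define COMPAT_BREAK_SIGCONTEXT_COOKIE	(UINT64_C(1) << 1)	/* "sigcontext cookies are okay" */

/* Hypothetical per-task state holding the accepted breaks. */
struct task_compat {
	uint64_t accepted_breaks;
};

/* Opt in to one or more breaks; bits can only be added, never cleared. */
void compat_accept(struct task_compat *t, uint64_t breaks)
{
	t->accepted_breaks |= breaks;
}

/* True only if every requested break has been accepted. */
bool compat_accepts(const struct task_compat *t, uint64_t breaks)
{
	return (t->accepted_breaks & breaks) == breaks;
}
```

Unknown bits stay available for later breaks, which is what would make the
mask extensible.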

--Andy


Re: [PATCH v4 1/4] SROP Mitigation: Architecture independent code for signal cookies

2016-03-29 Thread Linus Torvalds
On Tue, Mar 29, 2016 at 2:53 PM, Scott Bauer  wrote:
> @@ -1231,6 +1232,8 @@ void setup_new_exec(struct linux_binprm * bprm)
> /* This is the point of no return */
> current->sas_ss_sp = current->sas_ss_size = 0;
>
> +   get_random_bytes(&current->sig_cookie, sizeof(current->sig_cookie));
> +

This should probably just be

 current->sig_cookie = get_random_long();

instead. That will use hardware random numbers if available, and be
*much* faster.

I realize that some people don't like the hardware random number
generators because they don't trust them, but quite frankly, for
something like this it's fine. If the attacker is in collusion with
the hardware manufacturer, you have way bigger problems than a SROP
attack.

 Linus


Re: [PATCH v12 07/29] HMM: add per mirror page table v4.

2016-03-29 Thread John Hubbard
On Tue, 8 Mar 2016, Jérôme Glisse wrote:

> This patch add the per mirror page table. It also propagate CPU page
> table update to this per mirror page table using mmu_notifier callback.
> All update are contextualized with an HMM event structure that convey
> all information needed by device driver to take proper actions (update
> its own mmu to reflect changes and schedule proper flushing).
> 
> Core HMM is responsible for updating the per mirror page table once
> the device driver is done with its update. Most importantly HMM will
> properly propagate HMM page table dirty bit to underlying page.
> 
> Changed since v1:
>   - Removed unused fence code to defer it to latter patches.
> 
> Changed since v2:
>   - Use new bit flag helper for mirror page table manipulation.
>   - Differentiate fork event with HMM_FORK from other events.
> 
> Changed since v3:
>   - Get rid of HMM_ISDIRTY and rely on write protect instead.
>   - Adapt to HMM page table changes
> 
> Signed-off-by: Jérôme Glisse 
> Signed-off-by: Sherry Cheung 
> Signed-off-by: Subhash Gutti 
> Signed-off-by: Mark Hairgrove 
> Signed-off-by: John Hubbard 
> Signed-off-by: Jatin Kumar 
> ---
>  include/linux/hmm.h |  83
>  mm/hmm.c            | 221
>  2 files changed, 304 insertions(+)
> 
> diff --git a/include/linux/hmm.h b/include/linux/hmm.h
> index b559c0b..5488fa9 100644
> --- a/include/linux/hmm.h
> +++ b/include/linux/hmm.h
> @@ -46,6 +46,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  
>  struct hmm_device;
> @@ -53,6 +54,38 @@ struct hmm_mirror;
>  struct hmm;
>  
>  
> +/*
> + * hmm_event - each event is described by a type associated with a struct.
> + */
> +enum hmm_etype {
> + HMM_NONE = 0,
> + HMM_FORK,
> + HMM_MIGRATE,
> + HMM_MUNMAP,
> + HMM_DEVICE_RFAULT,
> + HMM_DEVICE_WFAULT,

Hi Jerome,

Just a tiny thing I noticed, while connecting HMM to NVIDIA's upcoming 
device driver: the last two enum items above should probably be named 
like this:

HMM_DEVICE_READ_FAULT,
HMM_DEVICE_WRITE_FAULT,

instead of _WFAULT / _RFAULT. (Earlier code reviewers asked for more 
clarity on these types of names.)
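
For reference, a sketch of the enum with the two names spelled out. The
member list is the one from the quoted hunk; hmm_etype_name() is a
hypothetical helper added only to show the names in use and is not part of
the patch.

```c
/* Event types from the quoted hunk, with the two fault events renamed. */
enum hmm_etype {
	HMM_NONE = 0,
	HMM_FORK,
	HMM_MIGRATE,
	HMM_MUNMAP,
	HMM_DEVICE_READ_FAULT,
	HMM_DEVICE_WRITE_FAULT,
	HMM_WRITE_PROTECT,
};

/* Hypothetical helper: map an event type to its name, e.g. for debug output. */
const char *hmm_etype_name(enum hmm_etype etype)
{
	switch (etype) {
	case HMM_NONE:			return "HMM_NONE";
	case HMM_FORK:			return "HMM_FORK";
	case HMM_MIGRATE:		return "HMM_MIGRATE";
	case HMM_MUNMAP:		return "HMM_MUNMAP";
	case HMM_DEVICE_READ_FAULT:	return "HMM_DEVICE_READ_FAULT";
	case HMM_DEVICE_WRITE_FAULT:	return "HMM_DEVICE_WRITE_FAULT";
	case HMM_WRITE_PROTECT:		return "HMM_WRITE_PROTECT";
	}
	return "unknown";
}
```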

thanks,
John Hubbard

> + HMM_WRITE_PROTECT,
> +};
> +

[PATCH] regulator: qcom_spmi: Add slewing delays for all SMPS types

2016-03-29 Thread Stephen Boyd
Only the FT SMPS type regulators have slewing supported in the
driver, but all types of SMPS regulators need the same support.
The only difference is that some SMPS regulators don't have a
step size and the step delay is typically 20, not 8. Luckily, the
step size reads as 0 for the non-FT types, so we can always read
that, but we need to detect which type of regulator we're using
to figure out what step delay to use. Make these minor
adjustments to the slew rate calculations and add support for the
delay function to the appropriate regulator ops.
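
The arithmetic described above can be sketched in isolation as follows. The
two step-delay values come from this patch. The clock rate and margin
values are assumptions (those macros are used by the patch but not defined
in it), and enum example_smps_type and example_slew_rate() are made-up
names for this sketch only.

```c
/* From the patch. */
#define SPMI_FTSMPS_STEP_DELAY		8
#define SPMI_DEFAULT_STEP_DELAY		20

/* Assumed values; the patch uses these macros without showing them. */
#define SPMI_FTSMPS_CLOCK_RATE		19200
#define SPMI_FTSMPS_STEP_MARGIN_NUM	4
#define SPMI_FTSMPS_STEP_MARGIN_DEN	5

/* Hypothetical stand-in for vreg->logical_type. */
enum example_smps_type {
	EXAMPLE_FTSMPS,
	EXAMPLE_OTHER_SMPS,
};

/* Same sequence of integer operations as the patched function; result in uV/us. */
int example_slew_rate(enum example_smps_type type, int step_uV, int step, int delay)
{
	int step_delay;
	int slew_rate;

	switch (type) {
	case EXAMPLE_FTSMPS:
		step_delay = SPMI_FTSMPS_STEP_DELAY;
		break;
	default:
		step_delay = SPMI_DEFAULT_STEP_DELAY;
		break;
	}

	slew_rate = SPMI_FTSMPS_CLOCK_RATE * step_uV * (1 << step);
	slew_rate /= 1000 * (step_delay << delay);
	slew_rate *= SPMI_FTSMPS_STEP_MARGIN_NUM;
	slew_rate /= SPMI_FTSMPS_STEP_MARGIN_DEN;

	return slew_rate;
}
```

Under these assumed constants, a 5000 uV step with step = 0 and delay = 0
works out to 9600 uV/us for the FT type and 3840 uV/us for the others, so
the larger step delay lowers the computed slew rate by a factor of 2.5.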

Reported-by: Georgi Djakov 
Cc: David Collins 
Signed-off-by: Stephen Boyd 
---
 drivers/regulator/qcom_spmi-regulator.c | 26 +++++++++++++++++++++-----
 1 file changed, 21 insertions(+), 5 deletions(-)

diff --git a/drivers/regulator/qcom_spmi-regulator.c b/drivers/regulator/qcom_spmi-regulator.c
index 88a5dc88badc..c2a63f9e1c10 100644
--- a/drivers/regulator/qcom_spmi-regulator.c
+++ b/drivers/regulator/qcom_spmi-regulator.c
@@ -246,6 +246,7 @@ enum spmi_common_control_register_index {
 
 /* Minimum voltage stepper delay for each step. */
 #define SPMI_FTSMPS_STEP_DELAY 8
+#define SPMI_DEFAULT_STEP_DELAY 20
 
 /*
  * The ratio SPMI_FTSMPS_STEP_MARGIN_NUM/SPMI_FTSMPS_STEP_MARGIN_DEN is used to
@@ -1008,6 +1009,7 @@ static struct regulator_ops spmi_smps_ops = {
 	.disable               = spmi_regulator_common_disable,
 	.is_enabled            = spmi_regulator_common_is_enabled,
 	.set_voltage           = spmi_regulator_common_set_voltage,
+	.set_voltage_time_sel  = spmi_regulator_set_voltage_time_sel,
 	.get_voltage           = spmi_regulator_common_get_voltage,
 	.list_voltage          = spmi_regulator_common_list_voltage,
 	.set_mode              = spmi_regulator_common_set_mode,
@@ -1081,6 +1083,7 @@ static struct regulator_ops spmi_ult_lo_smps_ops = {
 	.disable               = spmi_regulator_common_disable,
 	.is_enabled            = spmi_regulator_common_is_enabled,
 	.set_voltage           = spmi_regulator_ult_lo_smps_set_voltage,
+	.set_voltage_time_sel  = spmi_regulator_set_voltage_time_sel,
 	.get_voltage           = spmi_regulator_ult_lo_smps_get_voltage,
 	.list_voltage          = spmi_regulator_common_list_voltage,
 	.set_mode              = spmi_regulator_common_set_mode,
@@ -1094,6 +1097,7 @@ static struct regulator_ops spmi_ult_ho_smps_ops = {
 	.disable               = spmi_regulator_common_disable,
 	.is_enabled            = spmi_regulator_common_is_enabled,
 	.set_voltage           = spmi_regulator_single_range_set_voltage,
+	.set_voltage_time_sel  = spmi_regulator_set_voltage_time_sel,
 	.get_voltage           = spmi_regulator_single_range_get_voltage,
 	.list_voltage          = spmi_regulator_common_list_voltage,
 	.set_mode              = spmi_regulator_common_set_mode,
@@ -1245,11 +1249,11 @@ found:
 	return 0;
 }
 
-static int spmi_regulator_ftsmps_init_slew_rate(struct spmi_regulator *vreg)
+static int spmi_regulator_init_slew_rate(struct spmi_regulator *vreg)
 {
 	int ret;
 	u8 reg = 0;
-	int step, delay, slew_rate;
+	int step, delay, slew_rate, step_delay;
 	const struct spmi_voltage_range *range;
 
 	ret = spmi_vreg_read(vreg, SPMI_COMMON_REG_STEP_CTRL, &reg, 1);
@@ -1262,6 +1266,15 @@ static int spmi_regulator_ftsmps_init_slew_rate(struct spmi_regulator *vreg)
 	if (!range)
 		return -EINVAL;
 
+	switch (vreg->logical_type) {
+	case SPMI_REGULATOR_LOGICAL_TYPE_FTSMPS:
+		step_delay = SPMI_FTSMPS_STEP_DELAY;
+		break;
+	default:
+		step_delay = SPMI_DEFAULT_STEP_DELAY;
+		break;
+	}
+
 	step = reg & SPMI_FTSMPS_STEP_CTRL_STEP_MASK;
 	step >>= SPMI_FTSMPS_STEP_CTRL_STEP_SHIFT;
 
@@ -1270,7 +1283,7 @@ static int spmi_regulator_ftsmps_init_slew_rate(struct spmi_regulator *vreg)
 
 	/* slew_rate has units of uV/us */
 	slew_rate = SPMI_FTSMPS_CLOCK_RATE * range->step_uV * (1 << step);
-	slew_rate /= 1000 * (SPMI_FTSMPS_STEP_DELAY << delay);
+	slew_rate /= 1000 * (step_delay << delay);
 	slew_rate *= SPMI_FTSMPS_STEP_MARGIN_NUM;
 	slew_rate /= SPMI_FTSMPS_STEP_MARGIN_DEN;
 
@@ -1411,8 +1424,11 @@ static int spmi_regulator_of_parse(struct device_node *node,
 		return ret;
 	}
 
-	if (vreg->logical_type == SPMI_REGULATOR_LOGICAL_TYPE_FTSMPS) {
-		ret = spmi_regulator_ftsmps_init_slew_rate(vreg);
+	if (vreg->logical_type == SPMI_REGULATOR_LOGICAL_TYPE_FTSMPS ||
+	    vreg->logical_type == SPMI_REGULATOR_LOGICAL_TYPE_ULT_LO_SMPS ||
+	    vreg->logical_type == SPMI_REGULATOR_LOGICAL_TYPE_ULT_HO_SMPS ||
+	    vreg->logical_type == SPMI_REGULATOR_LOGICAL_TYPE_SMPS) {
+		ret = spmi_regulator_init_slew_rate(vreg);

Re: [PATCH v4 0/4] SROP Mitigation: Sigreturn Cookies

2016-03-29 Thread Linus Torvalds
On Tue, Mar 29, 2016 at 5:54 PM, Linus Torvalds wrote:
>
> So you could do 4 32
>
>  - the random value
>  - the low 32 bits of the address of the cookie
>  - the low 32 bits of the return point stack and instruction pointer

Oops, editing mishap. That was supposed to be about the 128-bit md5
chunk, which uses 4 32-bit values, but then I edited things and didn't
get back to it.
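
One way to read "4 32-bit values" is as a 128-bit input block built from
the quoted list, with the third bullet counted as two values (stack pointer
and instruction pointer). That reading, struct cookie_block and
cookie_block_pack() are assumptions made for this sketch; the mixing step
itself (the single md5 round) is deliberately left out.

```c
#include <stdint.h>

/* Hypothetical 128-bit input block: four 32-bit words. */
struct cookie_block {
	uint32_t word[4];
};

/* Keep the random value and the low 32 bits of each address. */
struct cookie_block cookie_block_pack(uint32_t secret, uint64_t cookie_addr,
				      uint64_t ret_sp, uint64_t ret_ip)
{
	struct cookie_block b = {{
		secret,			/* the random value */
		(uint32_t)cookie_addr,	/* low 32 bits of the cookie address */
		(uint32_t)ret_sp,	/* low 32 bits of the return stack pointer */
		(uint32_t)ret_ip,	/* low 32 bits of the return instruction pointer */
	}};

	return b;
}
```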

Linus


mmotm 2016-03-29-15-54 uploaded

2016-03-29 Thread akpm
The mm-of-the-moment snapshot 2016-03-29-15-54 has been uploaded to

   http://www.ozlabs.org/~akpm/mmotm/

mmotm-readme.txt says

README for mm-of-the-moment:

http://www.ozlabs.org/~akpm/mmotm/

This is a snapshot of my -mm patch queue.  Uploaded at random hopefully
more than once a week.

You will need quilt to apply these patches to the latest Linus release (4.x
or 4.x-rcY).  The series file is in broken-out.tar.gz and is duplicated in
http://ozlabs.org/~akpm/mmotm/series

The file broken-out.tar.gz contains two datestamp files: .DATE and
.DATE-yyyy-mm-dd-hh-mm-ss.  Both contain the string yyyy-mm-dd-hh-mm-ss,
followed by the base kernel version against which this patch series is to
be applied.

This tree is partially included in linux-next.  To see which patches are
included in linux-next, consult the `series' file.  Only the patches
within the #NEXT_PATCHES_START/#NEXT_PATCHES_END markers are included in
linux-next.

A git tree which contains the memory management portion of this tree is
maintained at git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git
by Michal Hocko.  It contains the patches which are between the
"#NEXT_PATCHES_START mm" and "#NEXT_PATCHES_END" markers, from the series
file, http://www.ozlabs.org/~akpm/mmotm/series.


A full copy of the full kernel tree with the linux-next and mmotm patches
already applied is available through git within an hour of the mmotm
release.  Individual mmotm releases are tagged.  The master branch always
points to the latest release, so it's constantly rebasing.

http://git.cmpxchg.org/cgit.cgi/linux-mmotm.git/

To develop on top of mmotm git:

  $ git remote add mmotm git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git
  $ git remote update mmotm
  $ git checkout -b topic mmotm/master
  <make changes, commit>
  $ git send-email mmotm/master.. [...]

To rebase a branch with older patches to a new mmotm release:

  $ git remote update mmotm
  $ git rebase --onto mmotm/master <topic base> topic




The directory http://www.ozlabs.org/~akpm/mmots/ (mm-of-the-second)
contains daily snapshots of the -mm tree.  It is updated more frequently
than mmotm, and is untested.

A git copy of this tree is available at

http://git.cmpxchg.org/cgit.cgi/linux-mmots.git/

and use of this tree is similar to
http://git.cmpxchg.org/cgit.cgi/linux-mmotm.git/, described above.


This mmotm tree contains the following patches against 4.6-rc1:
(patches marked "*" will be included in linux-next)

  origin.patch
  i-need-old-gcc.patch
  arch-alpha-kernel-systblss-remove-debug-check.patch
  drivers-gpu-drm-i915-intel_spritec-fix-build.patch
  drivers-gpu-drm-i915-intel_tvc-fix-build.patch
* maintainers-orangefs-mailing-list-is-subscribers-only.patch
* include-linux-huge_mmh-return-null-instead-of-false-for-pmd_trans_huge_lock.patch
* mm-fix-invalid-node-in-alloc_migrate_target.patch
* x86-mm-tlb_remote_send_ipi-should-count-pages.patch
* mm-rmap-batched-invalidations-should-use-existing-api.patch
* mm-page_ref-use-page_ref-helper-instead-of-direct-modification-of-_count.patch
* mm-rename-_count-field-of-the-struct-page-to-_refcount.patch
* mm-rename-_count-field-of-the-struct-page-to-_refcount-fix.patch
* ksm-introduce-ksm_max_page_sharing-per-page-deduplication-limit.patch
* ksm-introduce-ksm_max_page_sharing-per-page-deduplication-limit-fix-2.patch
* ksm-introduce-ksm_max_page_sharing-per-page-deduplication-limit-fix-3.patch
* mm-page_isolation-fix-tracepoint-to-mirror-check-function-behavior.patch
* oom-oom_reaper-do-not-enqueue-task-if-it-is-on-the-oom_reaper_list-head.patch
* mm-page_isolationc-fix-the-function-comments.patch
* compilerh-provide-__always_inline-to-userspace-headers-too.patch
* arm-arch-arm-include-asm-pageh-needs-personalityh.patch
* fs-ext4-fsyncc-generic_file_fsync-call-based-on-barrier-flag.patch
* ocfs2-error-code-comments-and-amendments-the-comment-of-ocfs2_extended_slot-should-be-0x08.patch
* ocfs2-clean-up-an-unused-variable-wants_rotate-in-ocfs2_truncate_rec.patch
* ocfs2-o2hb-add-negotiate-timer.patch
* ocfs2-o2hb-add-negotiate-timer-v2.patch
* ocfs2-o2hb-add-nego_timeout-message.patch
* ocfs2-o2hb-add-nego_timeout-message-v2.patch
* ocfs2-o2hb-add-negotiate_approve-message.patch
* ocfs2-o2hb-add-negotiate_approve-message-v2.patch
* ocfs2-o2hb-add-some-user-debug-log.patch
* ocfs2-o2hb-add-some-user-debug-log-v2.patch
* ocfs2-o2hb-dont-negotiate-if-last-hb-fail.patch
* ocfs2-o2hb-fix-hb-hung-time.patch
* block-restore-proc-partitions-to-not-display-non-partitionable-removable-devices.patch
  mm.patch
* mm-slab-hold-a-slab_mutex-when-calling-__kmem_cache_shrink.patch
* mm-slab-remove-bad_alien_magic-again.patch
* mm-slab-drain-the-free-slab-as-much-as-possible.patch
* mm-slab-factor-out-kmem_cache_node-initialization-code.patch
* mm-slab-clean-up-kmem_cache_node-setup.patch
* mm-slab-dont-keep-free-slabs-if-free_objects-exceeds-free_limit.patch
* mm-slab-racy-access-modify-the-slab-color.patch
* 

Re: [kernel-hardening] Re: [PATCH v4 0/4] SROP Mitigation: Sigreturn Cookies

2016-03-29 Thread Daniel Micay
> Then there's an unanswered question: is this patch acceptable given
> that it's an ABI break?  Security fixes are sometimes an exception to
> the "no ABI breaks" rule, but it's by no means an automatic exception.
> 
> --Andy

It seems this could be worked around in general. Processes can have a
bit tracking whether this is enabled, and CRIU can save/restore it. It
would just leave it off for resuming old saved processes.

Should CRIU really be covered by the kernel's ABI guarantee though? It
seems like this was meant to be extensible, so it's adding an extra ABI
guarantee that wasn't there before. It makes sense to freeze this ABI
for CRIU, but a version field should be added first in one final ABI
break if it's not too late.


Re: [kernel-hardening] Re: [PATCH v4 0/4] SROP Mitigation: Sigreturn Cookies

2016-03-29 Thread Daniel Micay
> Then there's an unanswered question: is this patch acceptable given
> that it's an ABI break?  Security fixes are sometimes an exception to
> the "no ABI breaks" rule, but it's by no means an automatic exception.
> 
> --Andy

It seems this could be worked around in general. Processes can have a
bit tracking whether this is enabled, and CRIU can save/restore it. It
would just leave it off for resuming old saved processes.

Should CRIU really be covered by the kernel's ABI guarantee though? It
seems like this was meant to be extensible, so it's adding an extra ABI
guarantee that wasn't there before. It makes sense to freeze this ABI
for CRIU, but a version field should be added first in one final ABI
break if it's not too late.



Re: [PATCH v4 0/4] SROP Mitigation: Sigreturn Cookies

2016-03-29 Thread Linus Torvalds
On Tue, Mar 29, 2016 at 2:53 PM, Scott Bauer  wrote:
>
> These patches implement the necessary changes to generate a cookie
> which will be placed above signal frame upon signal delivery to userland.
> The cookie is generated using a per-process random value xor'd with
> the address where the cookie will be stored on the stack.

Side note: wouldn't it be better to make the cookie something that
doesn't make it trivial to figure out the random value in case you
already have access to a signal stack?

Maybe there could be a stronger variation of this that makes the
cookie be something like a single md5 round (not a full md5).
Something fast, and not necessarily secure, but something that needs
more than one single CPU instruction to figure out.

So you could do 4 32-bit values:

 - the random value
 - the low 32 bits of the address of the cookie
 - the low 32 bits of the return point stack and instruction pointer

Yes, yes, md5 is not cryptographically secure, and making it a single
iteration rather than the full four makes it even less so, but if the
attacker can generate long arbitrary code, then the whole SROP is
pointless to begin with, no?

In contrast, with the plain xor, the SROP would be a trivial operation
if you can just force it to happen within the context of a signal, so
that you can just re-use the signal return stack as-is. But mixing in
the returning IP and SP would make it *much* harder to use the
sigreturn as an attack vector.
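
To make the difference concrete, here is a minimal userspace sketch, not the code from the patches. The names cookie_xor(), cookie_mixed() and mix_step() are made up for illustration, and the multiply-rotate step is only a stand-in for "something like a single md5 round": fast, and not cryptographically secure.

```c
#include <stdint.h>

/* Rotate a 32-bit value left by s bits (0 < s < 32). */
static uint32_t rotl32(uint32_t v, unsigned int s)
{
	return (v << s) | (v >> (32 - s));
}

/* Plain xor cookie as described in the posting: random value ^ address. */
static uint32_t cookie_xor(uint32_t random, uint32_t cookie_addr)
{
	return random ^ cookie_addr;
}

/*
 * Hypothetical mixing step (an assumption, not md5): add, multiply by an
 * odd constant, rotate.  Each operation is invertible modulo 2^32, so for
 * a fixed h two different x values never produce the same result.
 */
static uint32_t mix_step(uint32_t h, uint32_t x)
{
	return rotl32((h + x) * 0x9e3779b1u, 13);
}

/* Cookie that also folds in the low 32 bits of the return SP and IP. */
static uint32_t cookie_mixed(uint32_t random, uint32_t cookie_addr,
			     uint32_t ret_sp, uint32_t ret_ip)
{
	uint32_t h = random;

	h = mix_step(h, cookie_addr);
	h = mix_step(h, ret_sp);
	h = mix_step(h, ret_ip);
	return h;
}
```

With cookie_xor(), anybody who can read the cookie and knows its address gets the random value back with one xor. With cookie_mixed(), a sigreturn frame whose SP or IP was changed no longer matches the stored cookie.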

I realize that this would likely need to be a separate and non-default
extra hardening mode, because there are *definitely* applications that
take signals and then update the return address (maybe single-stepping
over instructions etc). But for a *lot* of applications, signal return
implies changing no signal state at all, and mixing in the returning
IP and SP would seem to be a fundamentally stronger cookie.

No?

 Linus


Re: arm64: kernel v4.6-rc1 hangs on QEMU

2016-03-29 Thread Yury Norov
On Wed, Mar 30, 2016 at 12:32:42AM +0200, Arnd Bergmann wrote:
> On Wednesday 30 March 2016 01:22:17 Yury Norov wrote:
> > > 
> > > Undefined instruction in cpuinfo_store_boot_cpu() could be related
> > > to the SYS_ID_AA64MMFR2_EL1 access that was recently added.
> > > 
> > > What does the architecture say about reading unknown cpuid registers?
> > > 
> > >   Arnd
> > 
> > > ThunderX has some unimplemented system registers. AFAIR, attempt to access it
> > > causes data abort.
> 
> Ok, if that is the case, maybe the read_cpuid() macro can be changed
> so it contains a fixup for the trap? That should handle both data abort
> and undefinstr.
> 
>   Arnd

Sounds alluring, but it's not clear what we'd return in that case. I mean, how
would we distinguish between a correct value and an error code (0, -1 or
whatever)? But I think we can do something like this:

val = read_cpuid_safe(reg, impossible_val);
if (val == impossible_val)
	goto err;

I think it will work for many cases.
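
A rough userspace sketch of the sentinel idea, under loud assumptions: read_cpuid_safe() does not exist in the kernel, and the fake_regs table below only simulates which registers are implemented. In the kernel the lookup would be the actual register read plus an exception fixup that supplies the fallback.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register table standing in for real system registers. */
struct fake_reg {
	unsigned int id;
	bool implemented;
	uint64_t value;
};

static const struct fake_reg fake_regs[] = {
	{ 0x1, true,  0x1122ULL },
	{ 0x2, false, 0 },	/* access would trap on this "CPU" */
};

/*
 * Return the register value, or 'fallback' if the read would have trapped.
 * The trap is simulated here by the 'implemented' flag.
 */
static uint64_t read_cpuid_safe(unsigned int reg, uint64_t fallback)
{
	size_t i;

	for (i = 0; i < sizeof(fake_regs) / sizeof(fake_regs[0]); i++) {
		if (fake_regs[i].id == reg && fake_regs[i].implemented)
			return fake_regs[i].value;
	}
	return fallback;
}
```

The caller has to pick a fallback that the register can never legitimately hold, which is why this covers many cases but not all of them.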

Yury.


Re: [PATCH v3] sched/deadline: do not try to push tasks if pinned task switches to dl

2016-03-29 Thread Steven Rostedt
On Wed, 22 Oct 2014 10:33:05 +0100
Juri Lelli  wrote:

> On 22/10/14 01:36, Wanpeng Li wrote:
> > As Kirill mentioned(https://lkml.org/lkml/2013/1/29/118):
> > | If rq has already had 2 or more pushable tasks and we try to add a 
> > | pinned task then call of push_rt_task will just waste a time.
> > 
> > Just switched pinned task is not able to be pushed. If the rq has had
> > several dl tasks before they have already been considered as candidates
> > to be pushed (or pulled). This patch implements the same behavior as rt 
> > class which introduced by commit 10447917551e ("sched/rt: Do not try to 
> > push tasks if pinned task switches to RT"). 
> > 
> > Signed-off-by: Wanpeng Li 
> > ---
> > v2 -> v3:
> >  * cleanup patch description
> >  * align && to p->nr_cpus_allowed
> > v1 -> v2:
> >  * use 12 or more chars for the git commit ID
> > 
> >  kernel/sched/deadline.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> > index abfaf3d..bd5e479 100644
> > --- a/kernel/sched/deadline.c
> > +++ b/kernel/sched/deadline.c
> > @@ -1603,7 +1603,8 @@ static void switched_to_dl(struct rq *rq, struct 
> > task_struct *p)
> >  
> > if (task_on_rq_queued(p) && rq->curr != p) {
> >  #ifdef CONFIG_SMP
> > -   if (rq->dl.overloaded && push_dl_task(rq) && rq != task_rq(p))
> > +   if (p->nr_cpus_allowed > 1 && rq->dl.overloaded &&
> > +   push_dl_task(rq) && rq != task_rq(p))
> > /* Only reschedule if pushing failed */
> > check_resched = 0;
> >  #endif /* CONFIG_SMP */
> >   
> 

I'm looking at some old changes for sched-deadline, and I stumbled
across this. While working on sched deadline tests, I've discovered
that deadline tasks can't have their own cpu affinity. They are limited
to their sched domains. That is, a sched deadline task has whatever
affinity the domain it happens to be in has.

Is there a condition where rq != task_rq(p) and p->nr_cpus_allowed > 1
isn't true?
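
For reference, the short-circuit behaviour of the patched check can be sketched in userspace. Everything here (fake_rq, fake_task, fake_push_dl_task(), patched_check()) is a hypothetical stand-in, not scheduler code; it only shows that the push helper is never called for a task with a single allowed CPU.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for struct rq and struct task_struct. */
struct fake_rq {
	bool overloaded;
	int push_calls;		/* how often the push helper ran */
};

struct fake_task {
	int nr_cpus_allowed;
	struct fake_rq *rq;	/* models task_rq(p) */
};

/* Stand-in for push_dl_task(): only counts calls and reports success. */
static bool fake_push_dl_task(struct fake_rq *rq)
{
	rq->push_calls++;
	return true;
}

/* Same shape as the patched condition in switched_to_dl(). */
static bool patched_check(struct fake_rq *rq, const struct fake_task *p)
{
	return p->nr_cpus_allowed > 1 && rq->overloaded &&
	       fake_push_dl_task(rq) && rq != p->rq;
}
```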

Now maybe this will help with -rt when a task hits a migrate disable?

Just asking.

-- Steve



[PATCH] acpi/acpica: fix Thunderbolt hotplug

2016-03-29 Thread Prarit Bhargava
The following hung task trace is seen when hotplugging
an ethernet dongle in a Thunderbolt port on Linux.

INFO: task kworker/0:4:1468 blocked for more than 120 seconds.
  Tainted: GW   4.6.0-rc1+ #1
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 kworker/0:4 D 8802a265ba38 13344  1468  2 0x
 Workqueue: kacpid acpi_os_execute_deferred
 8802a265ba38 8802a265ba00 81130200 81e0d580
 88029e5eb340 8802a265c000 88029d69d000 88029e5eb340
 818c1b8d 8802b64e8758 8802a265ba50 818bdfcc
 Call Trace:
 [] ? test_callback+0x10/0x30
 [] ? __down_timeout+0x5d/0xd0
 [] schedule+0x3c/0x90
 [] schedule_timeout+0x210/0x360
 [] ? sched_clock+0x9/0x10
 [] ? local_clock+0x1c/0x20
 [] ? mark_held_locks+0x76/0xa0
 [] ? _raw_spin_unlock_irq+0x2c/0x40
 [] ? __down_timeout+0x5d/0xd0
 [] ? trace_hardirqs_on_caller+0xf5/0x1b0
 [] ? __down_timeout+0x5d/0xd0
 [] __down_timeout+0x7c/0xd0
 [] ? _raw_spin_lock_irqsave+0x82/0x90
 [] down_timeout+0x4c/0x60
 [] acpi_os_wait_semaphore+0xaa/0x16a
 [] acpi_ex_system_wait_mutex+0x81/0xfa
 [] acpi_ds_begin_method_execution+0x25a/0x373
 [] acpi_ds_call_control_method+0x107/0x2e0
 [] acpi_ps_parse_aml+0x177/0x495
 [] acpi_ps_execute_method+0x1f7/0x2b9
 [] acpi_ns_evaluate+0x2ee/0x435
 [] acpi_ev_asynch_execute_gpe_method+0xbd/0x159
 [] acpi_os_execute_deferred+0x17/0x23
 [] process_one_work+0x242/0x700
 [] ? process_one_work+0x1ba/0x700
 [] worker_thread+0x4e/0x490
 [] ? process_one_work+0x700/0x700
 [] ? process_one_work+0x700/0x700
 [] kthread+0x101/0x120
 [] ? trace_hardirqs_on_caller+0xf5/0x1b0
 [] ret_from_fork+0x22/0x50
 [] ? kthread_create_on_node+0x250/0x250
 2 locks held by kworker/0:4/1468:
 #0:  ("kacpid"){.+.+.+}, at: [] process_one_work+0x1ba/0x700
 #1:  ((>work)){+.+.+.}, at: [] process_one_work+0x1ba/0x700

The issue appears to be that the kworker thread attempts to acquire the
_E42 method's mutex twice when executing acpi_ps_execute_method() and
recursing through the entry method.

The current code does take the possibility of this recursion into account;
however, it does so only for the case where the walk_state has been populated.

This can be fixed by setting the thread id in the !walk_state case to
allow for recursion.
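
As a rough illustration of why the recorded thread id matters, here is a userspace model. It is a sketch under assumptions: struct method_mutex and method_mutex_acquire() are invented for this example and are not the ACPICA structures; record_owner models whether the first acquisition stores the thread id.

```c
#include <stdbool.h>

typedef unsigned long thread_id_t;

/* Hypothetical stand-in for a method mutex; not the ACPICA structure. */
struct method_mutex {
	thread_id_t owner;	/* 0 means "no owner recorded" */
	unsigned int depth;	/* acquisition depth */
};

/*
 * Try to acquire the mutex for thread 'tid' (must be non-zero).
 * Returns true if acquired or re-entered by the recorded owner, false if
 * the caller would have to block.
 */
static bool method_mutex_acquire(struct method_mutex *m, thread_id_t tid,
				 bool record_owner)
{
	if (m->depth > 0) {
		if (m->owner != 0 && m->owner == tid) {
			m->depth++;	/* recursive entry by the owner */
			return true;
		}
		return false;		/* held, owner unknown or different */
	}
	m->depth = 1;
	m->owner = record_owner ? tid : 0;
	return true;
}
```

When the owner is not recorded, the second acquisition by the same thread cannot be recognised as recursion and would block, which matches the hung task above.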

Cc: Robert Moore 
Cc: Lv Zheng 
Cc: "Rafael J. Wysocki" 
Cc: Len Brown 
Cc: linux-a...@vger.kernel.org
Cc: de...@acpica.org
Signed-off-by: Prarit Bhargava 
---
 drivers/acpi/acpica/dsmethod.c |3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/acpi/acpica/dsmethod.c b/drivers/acpi/acpica/dsmethod.c
index 1982310..93799db 100644
--- a/drivers/acpi/acpica/dsmethod.c
+++ b/drivers/acpi/acpica/dsmethod.c
@@ -428,6 +428,9 @@ acpi_ds_begin_method_execution(struct acpi_namespace_node *method_node,
 				obj_desc->method.mutex->mutex.
 				    original_sync_level =
 				    obj_desc->method.mutex->mutex.sync_level;
+
+				obj_desc->method.mutex->mutex.thread_id =
+				    acpi_os_get_thread_id();
 			}
 		}
 
-- 
1.7.9.3



Re: [PATCH v3] sparc64: Reduce TLB flushes during hugepte changes

2016-03-29 Thread kbuild test robot
Hi Nitin,

[auto build test ERROR on sparc/master]
[also build test ERROR on v4.6-rc1 next-20160329]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system]

url:    https://github.com/0day-ci/linux/commits/Nitin-Gupta/sparc64-Reduce-TLB-flushes-during-hugepte-changes/20160330-051327
base:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc.git master
config: sparc64-allnoconfig (attached as .config)
reproduce:
wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=sparc64 

All errors (new ones prefixed by >>):

   arch/sparc/mm/tlb.c: In function 'tlb_batch_add':
>> arch/sparc/mm/tlb.c:115:2: error: implicit declaration of function 'is_hugetlb_pte' [-Werror=implicit-function-declaration]
 bool huge = is_hugetlb_pte(orig);
 ^
   cc1: all warnings being treated as errors

vim +/is_hugetlb_pte +115 arch/sparc/mm/tlb.c

   109  put_cpu_var(tlb_batch);
   110  }
   111  
   112  void tlb_batch_add(struct mm_struct *mm, unsigned long vaddr,
   113 pte_t *ptep, pte_t orig, int fullmm)
   114  {
 > 115  bool huge = is_hugetlb_pte(orig);
   116  
   117  if (tlb_type != hypervisor &&
   118  pte_dirty(orig)) {

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation




Re: fallocate INSERT_RANGE/COLLAPSE_RANGE is completely broken [PATCH]

2016-03-29 Thread Dave Chinner
On Mon, Mar 28, 2016 at 10:04:10PM -0800, Kent Overstreet wrote:
> On Tue, Mar 29, 2016 at 04:15:58PM +1100, Dave Chinner wrote:
> > On Mon, Mar 28, 2016 at 08:25:46PM -0800, Kent Overstreet wrote:
> > > Bit of previous discussion:
> > > http://thread.gmane.org/gmane.linux.file-systems/101201/
> > > 
> > > The underlying issue is that we have no mechanism for invalidating a range of
> > > the pagecache and then _keeping it invalidated_ while we Do Stuff.
> > > 
> > > The fallocate INSERT_RANGE/COLLAPSE_RANGE situation seems likely to be worse
> > > than I initially thought. I've been digging into this in the course of bcachefs
> > > testing - I was hitting assertions that meant state hanging off the page cache
> > > (in this case, allocation information, i.e. whether we needed to reserve space
> > > on write) was inconsistent with the btree in writepages().
> > > 
> > > Well, bcachefs isn't the only filesystem that hangs additional state off the
> > > pagecache, and the situation today is that an unprivileged user can cause
> > > inconsistencies there by just doing buffered reads concurrently with
> > > INSERT_RANGE/COLLAPSE_RANGE. I highly highly doubt this is an issue of just
> > > "oops, you corrupted your file because you were doing stupid stuff" - who knows
> > > what internal invariants are getting broken here, and I don't particularly care
> > > to find out.
> > 
> > I'd like to see a test case for this. Concurrent IO and/or page
> > faults should not run at the same time as fallocate on XFS. Hence I'd
> > like to see the test cases that demonstrate buffered reads are
> > causing corruption during insert/collapse range operations. We use
> > the same locking strategy for fallocate as we use for truncate and
> > all the other internal extent manipulation operations, so if there's
> > something wrong, we need to fix it.
> 
> It's entirely possible I'm wrong about XFS - your fault path locking looked
> correct, and I did see you had extra locking in your buffered read path but I
> thought it was a different lock. I'll recheck later, but for the moment I'm just
> going to assume I misspoke (and tbh always found xfs's locking to be quite
> rigorous).

There are two locks: the XFS_IOLOCK for read/write/splice IO path vs
truncate/fallocate exclusion, and the XFS_MMAPLOCK for page fault vs
truncate/fallocate exclusion.

> ext4 uses the generic code in all the places you're hooking into though -
> .fault, .read_iter, etc.
> 
> The scheme I've got in this patch should perform quite a bit better than what
> you're doing - only locking in the slow cache miss path, vs. every time you
> touch the page cache.

I'm not sure I follow - how does this work when a fallocate
operation uses the page cache for, say, zeroing data blocks rather
than invalidating them (e.g.  FALLOC_FL_ZERO_RANGE can validly zero
blocks through the page cache, so can hole punching)?  Won't the
buffered read then return a mix of real and zeroed data, depending
on who wins the race to each underlying page lock?

i.e. if the locking only occurs in the page insert slow path, then
it doesn't provide sufficient exclusion for extent manipulation
operations that use the page cache during their normal operation.
IOWs, other, higher level synchronisation methods for fallocate
are still necessary.
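
A toy userspace model of that race, as a sketch only: file_model, zero_pages(), buffered_read() and is_mixed() are invented names, each "page" is one byte, and excl_held stands in for a higher level lock taken by fallocate. It shows a reader seeing a mix of real and zeroed data when nothing above the per-page level excludes it.

```c
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 4

struct file_model {
	char page[NPAGES];	/* 'D' = real data, 0 = zeroed */
	bool excl_held;		/* higher level lock held by fallocate */
};

static void file_init(struct file_model *f)
{
	size_t i;

	for (i = 0; i < NPAGES; i++)
		f->page[i] = 'D';
	f->excl_held = false;
}

/* Zero pages [from, to): one step of a zero-range going page by page. */
static void zero_pages(struct file_model *f, size_t from, size_t to)
{
	size_t i;

	for (i = from; i < to && i < NPAGES; i++)
		f->page[i] = 0;
}

/* Buffered read of the whole file; fails while the exclusive lock is held. */
static bool buffered_read(const struct file_model *f, char out[NPAGES])
{
	size_t i;

	if (f->excl_held)
		return false;	/* reader would have to wait */
	for (i = 0; i < NPAGES; i++)
		out[i] = f->page[i];
	return true;
}

/* True if the buffer holds both real and zeroed pages. */
static bool is_mixed(const char buf[NPAGES])
{
	bool data = false, zero = false;
	size_t i;

	for (i = 0; i < NPAGES; i++) {
		if (buf[i])
			data = true;
		else
			zero = true;
	}
	return data && zero;
}
```

If the read runs after only half of the pages were zeroed and excl_held is false, is_mixed() reports true; with excl_held set for the whole operation the reader only ever sees the state before or after it.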

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: [REGRESSION 4.6-rc1] NFS mounts (using autofs) failing

2016-03-29 Thread Junichi Nomura
On 03/30/16 04:17, Arend van Spriel wrote:
> On 29-03-16 16:02, Al Viro wrote:
>> On Tue, Mar 29, 2016 at 01:11:55PM +0200, Arend Van Spriel wrote:
>>> Moved to 4.6-rc1 and found NFS mounts were failing moving to the new
>>> kernel. The NFS mounts are done using autofs. Below is the bisect log
>>> and attached the kernel .config file. Let me know if you need any other
>>> information.
>>
>> AFAICS, it's the same one that got reported yesterday by Junichi Nomura.
>> Folks, could you check if the delta below fixes it?
> 
> Works for me so you may add
> 
> Tested-by: Arend van Spriel 

Yes, that works for me, too. Thank you.

Tested-by: Jun'ichi Nomura 

-- 
Jun'ichi Nomura, NEC Corporation

