Re: RFC: WMI Enhancements

2017-04-17 Thread Darren Hart
On Mon, Apr 17, 2017 at 03:03:29PM -0700, Andy Lutomirski wrote:
> On Fri, Apr 14, 2017 at 4:05 PM, Darren Hart  wrote:
> > On Sat, Apr 15, 2017 at 12:45:30AM +0200, Rafael Wysocki wrote:
> >> On Wednesday, April 12, 2017 04:08:54 PM Darren Hart wrote:
> >> > Hi All,
> >> >
> >> > There are a few parallel efforts involving the Windows Management
> >> > Instrumentation (WMI)[1] and dependent/related drivers. I'd like to have 
> >> > a round of
> >> > discussion among those of you that have been involved in this space 
> >> > before we
> >> > decide on a direction.
> >> >
> >> > The WMI support in the kernel today fairly narrowly supports a handful of
> >> > systems. Andy L. has a work-in-progress series [2] which converts wmi 
> >> > into a
> >> > platform device and a proper bus, providing devices for dependent 
> >> > drivers to
> >> > bind to, and a mechanism for sibling devices to communicate with each 
> >> > other.
> >> > I've reviewed the series and feel like the approach is sound, I plan to 
> >> > carry
> >> > this series forward and merge it (with Andy L's permission).
> >> >
> >> > Are there any objections to this?
> >> >
> >> > In Windows, applications interact with WMI more or less directly. We 
> >> > don't do
> >> > this in Linux currently, although it has been discussed in the past [3]. 
> >> > Some
> >> > vendors will work around this by performing SMI/SMM, which is 
> >> > inefficient at
> >> > best. Exposing WMI methods to userspace would bring parity to WMI for 
> >> > Linux and
> >> > Windows.
> >> >
> >> > There are two principal concerns I'd appreciate your thoughts on:
> >> >
> >> > a) As an undiscoverable interface (you need to know the method 
> >> > signatures ahead
> >> > of time), universally exposing every WMI "device" to userspace seems 
> >> > like "a bad
> >> > idea" from a security and stability perspective. While access would 
> >> > certainly be
> >> > privileged, it seems more prudent to make this exposure opt-in. We also 
> >> > handle
> >> > some of this with kernel drivers and exposing those "devices" to 
> >> > userspace would
> >> > enable userspace and the kernel to fight over control. So - if we expose 
> >> > WMI
> >> > devices to userspace, I believe this should be done on a case by case 
> >> > basis,
> >> > opting in, and not by default as part of the WMI driver (although it can 
> >> > provide
> >> > the mechanism for a sub-driver to use), and possibly a devmode to do so 
> >> > by
> >> > default.
> >>
> >> A couple of loose thoughts here.
> >>
> >> In principle there could be a "generic default WMI driver" or similar that 
> >> would
> >> "claim" all WMI "devices" that have not been "claimed" by anyone else and 
> >> would
> >> simply expose them to user space somehow (e.g. using a chardev interface).
> >>
> >> Then, depending on how that thing is implemented, opt-in etc should be 
> >> possible
> >> too.
> >>
> >
> > I think we agree this would be an ideal approach.
> >
> > As we look into this more, it is becoming clear that the necessary 
> > functionality
> > is not nicely divided into GUIDs for what is necessary in userspace and 
> > what is
> > handled in the kernel. A single WMI METHOD GUID may be needed by userspace 
> > for
> > certain functionality, while the kernel drivers may use it for something 
> > else.
> >
> > :-(
> >
> > The input to a WMI method is just a buffer, so it is very free form. One
> > approach Mario has mentioned was to audit the user space WMI METHOD calls 
> > in the
> > kernel platform drivers and reject those calls with arguments matching those
> > issued by the kernel driver. This is likely to be complex and error prone 
> > in my
> > opinion. However, I have not yet thought of another means to meet the
> > requirement of having disjoint feature sets for userspace and kernel space 
> > via a
> > mechanism that was effectively designed to be used solely from user space 
> > with
> > vendor defined method signatures.
> >
> > Next step is to look at just how complex it would be to audit the method 
> > calls
> > the kernel currently uses.
> 
> I'm wondering whether it's really worth it.  We already allow doing
> darned near anything using dcdbas.  Maybe the world won't end if we
> expose a complete-ish ioctl interface to WMI.
> 
> Also, dcdbas is, to put it mildly, a bit ridiculous.  It seems to be a
> seriously awkward sysfs interface that allows you to, drumroll please,
> issue outb and inb instructions.  It doesn't even check that it's
> running on a Dell system.  It might be nice to deprecate it some day
> in favor of a real interface.  I'd consider a low-level WMI ioctl
> interface to be a vast improvement.
> 

I've been reluctantly arriving here as well. Given that every WMI interface will
be vendor specific, and non-discoverable, it seems unlikely developers will
eagerly duplicate kernel functionality in user-space. And if they do, it will
affect very few platforms.
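[To make the opt-in idea above concrete, here is a rough user-space model of the kind of check a low-level WMI interface could apply before dispatching a method call: only GUIDs a sub-driver has explicitly exported are callable, everything else is refused by default. All names, the structure layout, and the GUID below are hypothetical illustrations, not a proposed ABI.]

```c
#include <stddef.h>
#include <string.h>

/* A WMI method call: block GUID + method id + opaque input buffer.
 * Purely illustrative -- not the shape of any real kernel interface. */
struct wmi_method_call {
	char guid[37];		/* textual GUID, NUL-terminated */
	unsigned int method_id;
	const void *in;
	size_t in_len;
};

/* Opt-in table: only GUIDs explicitly exported by a sub-driver are
 * reachable from userspace; everything else is rejected by default. */
static const char *exported_guids[] = {
	"ABCDEF00-0000-0000-0000-000000000001",	/* hypothetical GUID */
};

/* Returns 1 if the call targets an exported GUID, 0 otherwise. */
static int wmi_call_allowed(const struct wmi_method_call *call)
{
	size_t i;

	for (i = 0; i < sizeof(exported_guids) / sizeof(exported_guids[0]); i++)
		if (strcmp(call->guid, exported_guids[i]) == 0)
			return 1;
	return 0;	/* not opted in: refuse the request */
}
```

The point of the sketch is only that the default is "closed": a GUID becomes callable from userspace solely because a driver placed it in the table, which is the case-by-case exposure discussed earlier in the thread.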

I still think it makes sense to only expose a WMI 

[PATCH 3/3] perf report: fill per-callchain symbol annotation histograms

2017-04-17 Thread Alexis Berlemont
The per-callchain histograms must be fed at some points with the
profiling samples. A solution is to fill them right after having filled
the per-symbol ones: in the callback hist_iter__report_callback.

Signed-off-by: Alexis Berlemont 
---
 tools/perf/builtin-report.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index c18158b83eb1..d825a599d4b4 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -136,6 +136,14 @@ static int hist_iter__report_callback(struct hist_entry_iter *iter,
if (single)
err = hist_entry__inc_addr_samples(he, evsel->idx,
   al->addr);
+   if (err == 0) {
+   struct callchain_cursor *cursor = &callchain_cursor;
+
+   err = hist_entry_cxt__inc_addr_samples(he,
+  evsel->idx,
+  al->addr,
+  cursor);
+   }
} else {
err = hist_entry__inc_addr_samples(he, evsel->idx, al->addr);
}
-- 
2.12.2



[PATCH 2/3] perf hists browser: add callchain-specific annotation

2017-04-17 Thread Alexis Berlemont
By pressing 'a' in the hists browser, the user gets an annotated view of
the code of the selected symbols.

This patch adds the case 'A'; if this key is pressed (and if the
call-graph option was enabled at record time), perf will build
the callchain from the top frame until the selected symbol and select
the corresponding annotation histogram.

Thus, the user will get an annotated view of the symbol code specific to
the callchain displayed in the hists browser.

Signed-off-by: Alexis Berlemont 
---
 tools/perf/ui/browsers/hists.c | 188 -
 1 file changed, 185 insertions(+), 3 deletions(-)

diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
index da24072bb76e..9ed7fdc4dc51 100644
--- a/tools/perf/ui/browsers/hists.c
+++ b/tools/perf/ui/browsers/hists.c
@@ -2543,17 +2543,157 @@ struct popup_action {
struct thread   *thread;
struct map_symbol   ms;
int socket;
+   boolcontext_annotate;
 
int (*fn)(struct hist_browser *browser, struct popup_action *act);
 };
 
+static int copy_call_list_entry(struct list_head *callchain,
+   struct callchain_list *call)
+{
+   struct call_list_entry *entry = zalloc(sizeof(*entry));
+
+   if (!entry) {
+   perror("not enough memory to scan callchains");
+   return -1;
+   }
+
+   entry->ip = entry->sym_start = call->ip;
+   if (call->ms.sym != NULL)
+   entry->sym_start = call->ms.sym->start;
+
+   list_add_tail(&entry->list, callchain);
+
+   return 0;
+}
+
+static int __hist_browser_build_callchain(struct list_head *callchain,
+ struct rb_root *root,
+ struct callchain_list *target)
+{
+   struct callchain_node *tmp_node;
+   struct rb_node *node;
+   struct callchain_list *call;
+   struct call_list_entry *new_call, *tmp;
+
+   node = rb_first(root);
+
+   while (node) {
+   char folded_sign = ' ';
+   size_t added_count = 0;
+
+   tmp_node = rb_entry(node, struct callchain_node, rb_node);
+
+   /*
+* If the callchain display mode is "flat", the list
+* "parent_val" may contain the entries in common.
+*/
+
+   list_for_each_entry(call, &tmp_node->parent_val, list) {
+
+   /*
+* If we have not found the highlighted callchain
+* entry...
+*/
+
+   if (target == call)
+   return 0;
+
+   /*
+* ...we need to keep the current element: the next
+* one could be the right one and we need to build a
+* callchain.
+*/
+
+   if (copy_call_list_entry(callchain, call) < 0)
+   return -1;
+
+   added_count++;
+   }
+
+   /*
+* If the callchain display mode is "graph", "fractal" or even
+* "flat", the callchain entries (the last one for "flat") are in
+* the list "val".
+*/
+
+   list_for_each_entry(call, &tmp_node->val, list) {
+
+   /*
+* If we have not found the highlighted callchain
+* entry...
+*/
+
+   if (target == call)
+   return 0;
+
+   /*
+* ...we need to keep the current element: the next
+* one could be the right one and we need to build a
+* callchain.
+*/
+
+   if (copy_call_list_entry(callchain, call) < 0)
+   return -1;
+
+   added_count++;
+
+   /*
+* If we meet the folded sign '+' (and if the current
+* element does not match), there is no need to go
+* further, the callchain elements below cannot be the
+* ones we are looking for.
+*/
+
+   folded_sign = callchain_list__folded(call);
+   if (folded_sign == '+')
+   break;
+   }
+
+   /*
+* If the last scanned entry is unfolded, the callchain element
+* we are looking for may be behind; so, let's scan its tree of
+* callchain nodes.
+*/
+
+   if (folded_sign == '-' &&
+   __hist_browser_build_callchain(callchain,
+  


[PATCH 1/3] perf annotate: implement per-callchain annotation histogram

2017-04-17 Thread Alexis Berlemont
A symbol can be called from various points and according to its calling
context, it can provide different results.

This patch creates one histogram for every different callchain recorded
and accumulates profiling samples into the appropriate one. Samples
recorded with the same callchain are not mixed with other ones, so the
noise should be reduced.
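[The idea can be sketched independently of perf's data structures: key each annotation histogram on the full chain of caller addresses, and bump the bucket for the sampled offset only in the histogram whose chain matches. Everything below -- the names, the fixed table sizes, the flat arrays standing in for the patch's lists of cxt_hist_entry -- is invented for illustration.]

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_CHAINS 16	/* illustrative cap on distinct callchains */
#define SYM_SIZE   64	/* hypothetical symbol size: one bucket per byte */

/* One annotation histogram per distinct callchain. */
struct cxt_hist {
	uint64_t chain[8];		/* caller addresses, outermost first */
	size_t chain_len;
	uint32_t addr_hits[SYM_SIZE];	/* samples per offset in the symbol */
};

static struct cxt_hist hists[MAX_CHAINS];
static size_t nr_hists;

/* Find the histogram whose callchain matches, or create a fresh one. */
static struct cxt_hist *hist_for_chain(const uint64_t *chain, size_t len)
{
	size_t i;

	for (i = 0; i < nr_hists; i++)
		if (hists[i].chain_len == len &&
		    memcmp(hists[i].chain, chain, len * sizeof(uint64_t)) == 0)
			return &hists[i];
	if (nr_hists == MAX_CHAINS || len > 8)
		return NULL;
	memcpy(hists[nr_hists].chain, chain, len * sizeof(uint64_t));
	hists[nr_hists].chain_len = len;
	return &hists[nr_hists++];
}

/* Account one sample at `offset` within the symbol, under `chain`. */
static int cxt_hist_inc_addr(const uint64_t *chain, size_t len, uint64_t offset)
{
	struct cxt_hist *h = hist_for_chain(chain, len);

	if (h == NULL || offset >= SYM_SIZE)
		return -1;
	h->addr_hits[offset]++;
	return 0;
}
```

Two samples at the same offset but reached through different callchains end up in two different histograms, which is exactly the separation the patch description promises.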

Signed-off-by: Alexis Berlemont 
---
 tools/perf/util/annotate.c | 308 -
 tools/perf/util/annotate.h |  19 +++
 2 files changed, 324 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 30498a2d4a6f..fa7691a0f205 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -18,6 +18,7 @@
 #include "annotate.h"
 #include "evsel.h"
 #include "block-range.h"
+#include "callchain.h"
 #include "arch/common.h"
 #include 
 #include 
@@ -554,7 +555,7 @@ int symbol__alloc_hist(struct symbol *sym)
 {
struct annotation *notes = symbol__annotation(sym);
const size_t size = symbol__size(sym);
-   size_t sizeof_sym_hist;
+   size_t sizeof_sym_hist, nr_sym_hist;
 
/* Check for overflow when calculating sizeof_sym_hist */
if (size > (SIZE_MAX - sizeof(struct sym_hist)) / sizeof(u64))
@@ -567,12 +568,17 @@ int symbol__alloc_hist(struct symbol *sym)
/ symbol_conf.nr_events)
return -1;
 
-   notes->src = zalloc(sizeof(*notes->src) + symbol_conf.nr_events * sizeof_sym_hist);
+   /* Allocate 1 histogram per event and 1 more for context annotation */
+   nr_sym_hist = symbol_conf.nr_events + 1;
+
+   notes->src = zalloc(sizeof(*notes->src) +
+   nr_sym_hist * sizeof_sym_hist);
if (notes->src == NULL)
return -1;
notes->src->sizeof_sym_hist = sizeof_sym_hist;
-   notes->src->nr_histograms   = symbol_conf.nr_events;
+   notes->src->nr_histograms   = nr_sym_hist;
INIT_LIST_HEAD(&notes->src->source);
+   INIT_LIST_HEAD(&notes->src->context_hists);
return 0;
 }
 
@@ -591,11 +597,18 @@ static int symbol__alloc_hist_cycles(struct symbol *sym)
 void symbol__annotate_zero_histograms(struct symbol *sym)
 {
struct annotation *notes = symbol__annotation(sym);
+   struct cxt_hist_entry *hist_entry;
 
pthread_mutex_lock(&notes->lock);
if (notes->src != NULL) {
memset(notes->src->histograms, 0,
   notes->src->nr_histograms * notes->src->sizeof_sym_hist);
+
+   list_for_each_entry(hist_entry,
+   &notes->src->context_hists, list) {
+   list_del(&hist_entry->list);
+   free(hist_entry);
+   }
if (notes->src->cycles_hist)
memset(notes->src->cycles_hist, 0,
symbol__size(sym) * sizeof(struct cyc_hist));
@@ -681,6 +694,7 @@ static struct annotation *symbol__get_annotation(struct symbol *sym, bool cycles
if (symbol__alloc_hist_cycles(sym) < 0)
return NULL;
}
+
return notes;
 }
 
@@ -697,6 +711,281 @@ static int symbol__inc_addr_samples(struct symbol *sym, struct map *map,
return __symbol__inc_addr_samples(sym, map, notes, evidx, addr);
 }
 
+static bool __symbol_cxt__cmp_call_cursor(struct cxt_hist_entry *hist_entry,
+ struct callchain_cursor *cursor)
+{
+   struct callchain_cursor_node *node;
+   struct call_list_entry *entry;
+   u64 match_count = 0;
+
+   callchain_cursor_commit(cursor);
+
+   /*
+* One element of the callchain cursor cannot be matched: the last
+* element (or first according to the order) which corresponds to the
+* real sampled address; so, let's skip it (if need be) and...
+*/
+   if (callchain_param.order == ORDER_CALLEE)
+   callchain_cursor_advance(cursor);
+
+   /* ...consider it has matched */
+   match_count++;
+
+   list_for_each_entry(entry, &hist_entry->callchain, list) {
+
+   node = callchain_cursor_current(cursor);
+   if (node == NULL)
+   break;
+
+   if (entry->ip == node->ip)
+   match_count++;
+
+   callchain_cursor_advance(cursor);
+   }
+
+   return match_count == cursor->nr;
+}
+
+static int __symbol_cxt__copy_call_cursor(struct cxt_hist_entry *hist_entry,
+ struct callchain_cursor *cursor)
+{
+   struct callchain_cursor_node *node;
+   struct call_list_entry *entry, *n;
+
+   callchain_cursor_commit(cursor);
+   node = callchain_cursor_current(cursor);
+
+   /*
+* For each entry in the callchain cursor, we need to copy 2 fields: the
+* text addresses and the symbol in which the address is located...
+*/
+   while 

[PATCH 0/3] perf: callchain-specific annotation

2017-04-17 Thread Alexis Berlemont
Hi,

These patches are a proposal to fulfill the following point of perf's TODO list
(https://perf.wiki.kernel.org/index.php/Todo):

* What I want is that if I am on bar*(), it annotates bar*(), no samples just
  the call site (obtained from the callchain) disassembly. This is useful
  because in many cases there may be multiple call sites within a function and
  there may be inlines in between. Hard to track down if you cannot figure out
  the surrounding addresses of the call site. (Request made by Stephane
  Eranian).

These patches are still at an early stage:
* Per-callchain annotation is only available in perf-report;
* Tests were not performed on real-world applications but on small basic ones.

Alexis.


Alexis Berlemont (3):
  perf annotate: implement per-callchain annotation histogram
  perf hists browser: add callchain-specific annotation
  perf report: fill per-callchain symbol annotation histograms

 tools/perf/builtin-report.c|   8 ++
 tools/perf/ui/browsers/hists.c | 188 -
 tools/perf/util/annotate.c | 308 -
 tools/perf/util/annotate.h |  19 +++
 4 files changed, 517 insertions(+), 6 deletions(-)

--
2.12.2




[PATCH] nl80211: Fix enum type of variable in nl80211_put_sta_rate()

2017-04-17 Thread Matthias Kaehlcke
rate_flg is of type 'enum nl80211_attrs', however it is assigned with
'enum nl80211_rate_info' values. Change the type of rate_flg accordingly.
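[A tiny standalone illustration of why this slipped through: C treats enumerators as plain integers, so assigning a value from one enum to a variable declared as a different enum compiles silently and behaves numerically; only stricter diagnostics such as clang's -Wenum-conversion flag it. The two enums below are invented purely for the demonstration.]

```c
/* Two unrelated enums; the values are arbitrary. */
enum first_enum  { FIRST_A = 1, FIRST_B = 2 };
enum second_enum { SECOND_A = 1, SECOND_B = 9 };

static int enum_mixup(void)
{
	/* Wrong enum type, yet this compiles and carries the value 9 --
	 * the same shape of mismatch the patch corrects in
	 * nl80211_put_sta_rate(). */
	enum first_enum v = SECOND_B;
	return (int)v;
}
```

So the original code worked by accident; the patch fixes the declared intent rather than observable behavior.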

Signed-off-by: Matthias Kaehlcke 
---
 net/wireless/nl80211.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index 2312dc2ffdb9..9af21a21ea6b 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -4151,7 +4151,7 @@ static bool nl80211_put_sta_rate(struct sk_buff *msg, 
struct rate_info *info,
struct nlattr *rate;
u32 bitrate;
u16 bitrate_compat;
-   enum nl80211_attrs rate_flg;
+   enum nl80211_rate_info rate_flg;
 
rate = nla_nest_start(msg, attr);
if (!rate)
-- 
2.12.2.762.g0e3151a226-goog




Re: [PATCH v2 08/17] fs: retrofit old error reporting API onto new infrastructure

2017-04-17 Thread NeilBrown
On Wed, Apr 12 2017, Jeff Layton wrote:

> On Thu, 2017-04-13 at 08:14 +1000, NeilBrown wrote:
>> 
>> I suspect that the filemap_check_wb_error() will need to be moved
>> into some parent of the current call site, which is essentially what you
>> suggest below.  It would be nice if we could do that first, rather than
>> having the current rather odd code.  But maybe this way is an easier
>> transition.  It isn't obviously wrong, it just isn't obviously right
>> either.
>> 
>
> Yeah. It's just such a daunting task to have to change so much of the
> existing code. I'm looking for ways to make this simpler.
>
> I think it probably is reasonable for filemap_write_and_wait* to just
> sample it as early as possible in those functions. filemap_fdatawait is
> the real questionable one, as you may have already had some writebacks
> complete with errors.
>
> In any case, my thinking was that the old code is not obviously correct
> either, so while this shortens the "error capture window" on these
> calls, it seems like a reasonable place to start improving things.

I agree.  It wouldn't hurt to add a note to this effect in the patch
comment so that people understand that the code isn't seen to be
"correct" but only "no worse" with clear direction on what sort of
improvement might be appropriate.

Thanks,
NeilBrown


signature.asc
Description: PGP signature


Re: [PATCH] ACPICA: Export mutex functions

2017-04-17 Thread Rafael J. Wysocki
On Tue, Apr 18, 2017 at 12:32 AM, Guenter Roeck  wrote:
> On Mon, Apr 17, 2017 at 11:29:38PM +0200, Rafael J. Wysocki wrote:
>> On Mon, Apr 17, 2017 at 11:03 PM, Guenter Roeck  wrote:
>> > On Mon, Apr 17, 2017 at 08:40:38PM +, Moore, Robert wrote:
>> >>

[cut]

>> >> [Moore, Robert]
>> >>
>> >> I'm not an expert on the _DLM method, but I would point you to the 
>> >> description section in the ACPI spec, 5.7.5 _DLM (Device Lock Mutex).
>> >>
>> >
>> > I did. However, not being an ACPI expert, that doesn't tell me anything.
>>
>> Basically, if the kernel and AML need to access a device concurrently,
>> there should be a _DLM object under that device in the ACPI tables.
>> In that case it is expected to return a list of (AML) mutexes that can
>> be acquired by the kernel in order to synchronize device access with
>> respect to AML (and for each mutex it may also return a description of
>> the specific resources to be protected by it).
>>
>> Bottom line: without _DLM, the kernel cannot synchronize things with
>> respect to AML properly, because it has no information how to do that
>> then.
>
> That is all quite interesting. I do see the mutex in question used on various
> motherboards from various vendors (I checked boards from Gigabyte, MSI, and
> Intel). Interestingly, the naming seems to be consistent - it is always named
> "MUT0". For the most part, it seems to be available on more recent
> motherboards; older motherboards tend to use the resource without locking.
> However, I don't see any mention of "_DLM" in any of the DSDTs.
>
> At the same time, access to ports 0x2e/0x2f is widely used in the kernel.
> As mentioned before, it is used in watchdog, hardware monitoring, and gpio
> drivers, but also in parallel port and infrared driver code. Effectively
> that means that all this code is inherently unsafe on systems with ACPI
> support.

If AML accesses those ports via operation regions, then it is unsafe.

Note, however, that the mere existence of operation regions covering
them doesn't automatically mean that they are accessed by any AML on
the given platform.

> I had thought about implementing a set of utility functions to make the kernel
> code safer to use if the mutex is found to exist. Right now I wonder, though,
> if such code would have a chance to be accepted. Any thoughts on that ?

You can argue that there should be _DLM objects in the ACPI tables on
those platforms, but they are missing, so platform quirks (or
something else to that effect) are needed to ensure mutual exclusion
between AML accesses and direct accesses in drivers etc.

Unlike in the DT case, we have no influence on what the vendors put
into the ACPI tables on their systems, so we need to live with what is
in there and possibly fix up things as needed.

Thanks,
Rafael


Re: [PATCH v2 07/17] fs: new infrastructure for writeback error handling and reporting

2017-04-17 Thread NeilBrown
On Wed, Apr 12 2017, Jeff Layton wrote:

> On Thu, 2017-04-13 at 07:55 +1000, NeilBrown wrote:
>> On Wed, Apr 12 2017, Jeff Layton wrote:
>> 
>> 
>> > +void __filemap_set_wb_error(struct address_space *mapping, int err)
>> 
>> I was really hoping that this would be
>> 
>>   void __set_wb_error(wb_err_t *wb_err, int err)
>> 
>> so
>> 
>> Then nfs_context_set_write_error could become
>> 
>> static void nfs_context_set_write_error(struct nfs_open_context *ctx, int 
>> error)
>> {
>>  __set_wb_error(&ctx->wb_err, error);
>> }
>> 
>> and filemap_set_sb_error() would be:
>> 
>> static inline void filemap_set_wb_error(struct address_space *mapping, int 
>> err)
>> {
>>  /* Optimize for the common case of no error */
>>  if (unlikely(err))
>>  __set_wb_error(&mapping->f_wb_err, err);
>> }
>> 
>> Similarly we would have
>>   wb_err_t sample_wb_error(wb_err_t *wb_err)
>>   {
>>...
>>   }
>> 
>> and
>> 
>> wb_err_t filemap_sample_wb_error(struct address_space *mapping)
>> {
>>   return sample_wb_error(&mapping->f_wb_err);
>> }
>> 
>> so nfs_file_fsync_commit() could have
>>   ret = sample_wb_error(&ctx->wb_err);
>> in place of
>>  ret = xchg(&ctx->error, 0);
>> 
>> int filemap_report_wb_error(struct file *file)
>> 
>> would become
>> 
>> int filemap_report_wb_error(struct file *file, wb_err_t *err)
>> 
>> or something.
>> 
>> The address space is just one (obvious) place where the wb error can be
>> stored.  The filesystem might have a different place with finer
>> granularity (nfs already does).
>> 
>> 
>
> I think it'd be much simpler to adapt NFS over to use the new
> infrastructure (I have a draft patch for that already). You'd lose the
> ability to track a different error for each nfs_open_context, but I'm
> not sure how valuable that is anyway. I'll need to think about that
> one...

From a technical perspective, it might be "simpler" but I contest "much
simpler".   I think it would be easy to put one wb_err_t per
nfs_open_context, if the former were designed well (which itself would
be easy).

From a political perspective, I doubt it would be simple.  NFS is the
way it is for a reason, and convincing an author that their reason is
not valid tends to be harder than most technical issues.
(looking to history...
the 'error' field was added to the nfs_open_context in
Commit: 6caf69feb23a ("NFSv2/v3/v4: Place NFS nfs_page shared data into a
single structure that hangs off filp->private_data. As a side effect, this
also cleans up the NFSv4 private file state info.")

in 2.6.12.  Prior to that file->f_error was used.
Prior to commit 9ffb8c3a1955 ("Import 2.2.3pre1") (which has no comment)
errors were ... interesting.  Look for nfs_check_error in
commit d9c0ffee4db7 ("Import 2.1.128") and notice the use of current->pid!!
All commits from the history.git tree.
)

It is quite possible for an NFS server to return different errors to
different users.  It might be odd, but it is possible.  Should an error
that affects one user pollute all other users?

Thanks,
NeilBrown


signature.asc
Description: PGP signature


Re: [PATCH v8] nvme: improve performance for virtual NVMe devices

2017-04-17 Thread Keith Busch
On Fri, Apr 14, 2017 at 03:10:30PM -0300, Helen Koike wrote:
> + Add missing maintainers from scripts/get_maintainer.pl in the email thread
> 
> Hi,
> 
> I would like to know if it would be possible to get this patch for kernel
> 4.12.
> Should I send a pull request? Or do you usually get the patch from the email
> thread?

Hi Helen,

This looks good to me. I pulled this into a branch for the next merge
window here:

http://git.infradead.org/nvme.git/shortlog/refs/heads/for-next

There's only one other commit at the moment in addition to yours, so
I'll just need to make sure we've got everything we want for 4.12 before
submitting our next pull request.


[cpu/hotplug] 6362ef376a: [ INFO: possible circular locking dependency detected ]

2017-04-17 Thread kernel test robot
Greetings,

0day kernel testing robot got the below dmesg and the first bad commit is

https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git WIP.hotplug

commit 6362ef376aacbc309022ad12f08693d201cdda08
Author: Thomas Gleixner 
AuthorDate: Thu Apr 13 10:50:55 2017 +0200
Commit: Thomas Gleixner 
CommitDate: Mon Apr 17 16:12:27 2017 +0200

cpu/hotplug: Convert hotplug locking to percpu rwsem

There are no more (known) nested calls to get_online_cpus() so it's
possible to remove the nested call magic and convert the mutex to a
percpu-rwsem, which speeds up get/put_online_cpus() significantly for the
uncontended case.

The contended case (write locked for hotplug operations) is slow anyway, so
the slightly more expensive down_write of the percpu rwsem does not matter.

Signed-off-by: Thomas Gleixner 

00ebb15e90  perf/core: Remove redundant get_online_cpus()
6362ef376a  cpu/hotplug: Convert hotplug locking to percpu rwsem
6362ef376a  cpu/hotplug: Convert hotplug locking to percpu rwsem
d36d99770e  Merge branch 'timers/core'
+----------------------------------------------------+------------+------------+------------+------------+
|                                                    | 00ebb15e90 | 6362ef376a | 6362ef376a | d36d99770e |
+----------------------------------------------------+------------+------------+------------+------------+
| boot_successes                                     | 35         | 0          | 0          | 0          |
| boot_failures                                      | 0          | 13         | 13         | 11         |
| INFO:possible_circular_locking_dependency_detected | 0          | 13         | 13         | 5          |
| INFO:possible_recursive_locking_detected           | 0          | 0          | 0          | 6          |
+----------------------------------------------------+------------+------------+------------+------------+

[init] Using pid_max = 32768
[init] Started watchdog process, PID is 8909
[main] Main thread is alive.
[   23.762049] 
[   23.762311] ==
[   23.763199] [ INFO: possible circular locking dependency detected ]
[   23.764101] 4.11.0-rc6-00237-g6362ef3 #1 Not tainted
[   23.764847] ---
[   23.765822] trinity-main/352 is trying to acquire lock:
[   23.766631]  (jump_label_mutex){+.+...}, at: [] 
jump_label_lock+0x12/0x14
[   23.767937] 
[   23.767937] but task is already holding lock:
[   23.768778]  (perf_sched_mutex){+.+...}, at: [] 
perf_event_alloc+0x563/0x65e
[   23.770002] 
[   23.770002] which lock already depends on the new lock.
[   23.770002] 
[   23.771156] 
[   23.771156] the existing dependency chain (in reverse order) is:
[   23.772309] 
[   23.772309] -> #3 (perf_sched_mutex){+.+...}:

  # HH:MM RESULT GOOD BAD GOOD_BUT_DIRTY DIRTY_NOT_BAD
git bisect start 98c5b9ad245d1735d95d66e463a44b449122581c 4f7d029b9bf009fbee76bb10c0c4351a1870d2f3 --
git bisect  bad 9364dcd917be48868feab7d699b90ec8457e85b4  # 03:47  B  0 11   22   0  Merge 'net/master' into devel-catchup-201704180142
git bisect good c5de08f31b482b1ba1cf637d41921e41efe519bd  # 03:56  G 11 00   0  Merge 'linux-review/Aneesh-Kumar-K-V/HugeTLB-migration-support-for-PPC64/20170418-011540' into devel-catchup-201704180142
git bisect good 7c90451c9c633b4c7fefc29272ba15f18e56b87c  # 04:10  G 11 00   0  Merge 'net-next/master' into devel-catchup-201704180142
git bisect  bad b5cea3442f45b5c98f45f8225978e0aa05f18da8  # 04:19  B  0 1   12   0  Merge 'tip/WIP.hotplug' into devel-catchup-201704180142
git bisect good a5cbdf693a60d5b86d4d21dfedd90f17754eb273  # 04:34  G 10 00   0  ACPI/processor: Fix error handling in __acpi_processor_start()
git bisect good 92b636ebde0776e579cd16d47783b8b755a4c9ea  # 04:42  G 11 00   0  KVM/PPC/Book3S HV: Use cpuhp_setup_state_nocalls_locked()
git bisect good d21819c3b480fa0d24419f09abef05c3214ce72e  # 04:59  G 11 00   0  kernel/hotplug: Use stop_machine_locked() in takedown_cpu()
git bisect good 862a8fda91ef0575bdd559158f6db5b6e9a70fe7  # 05:26  G 11 00   0  PCI: Replace the racy recursion prevention
git bisect good 00ebb15e900ebda8ee9036dddc63e3214cfad22b  # 05:36  G 11 00   0  perf/core: Remove redundant get_online_cpus()
git bisect  bad 6362ef376aacbc309022ad12f08693d201cdda08  # 05:53  B  0 2   13   0  cpu/hotplug: Convert hotplug locking to percpu rwsem
# first bad commit: [6362ef376aacbc309022ad12f08693d201cdda08] cpu/hotplug: Convert hotplug locking to percpu rwsem
git bisect good 00ebb15e900ebda8ee9036dddc63e3214cfad22b  # 06:00  G 31 00   0  perf/core: Remove redundant get_online_cpus()
# extra tests with CONFIG_DEBUG_INFO_REDUCED
git bisect  bad 6362ef376aacbc309022ad12f08693d201cdda08  # 06:09  B  0

[PATCH] sparc64: Fix hugepage page table free

2017-04-17 Thread Nitin Gupta
Make sure the start address is aligned to PMD_SIZE
boundary when freeing page table backing a hugepage
region. The issue was causing segfaults when a region
backed by 64K pages was unmapped since such a region
is in general not PMD_SIZE aligned.

Signed-off-by: Nitin Gupta 
---
 arch/sparc/mm/hugetlbpage.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/arch/sparc/mm/hugetlbpage.c b/arch/sparc/mm/hugetlbpage.c
index ee5273a..7c29d38 100644
--- a/arch/sparc/mm/hugetlbpage.c
+++ b/arch/sparc/mm/hugetlbpage.c
@@ -461,6 +461,22 @@ void hugetlb_free_pgd_range(struct mmu_gather *tlb,
pgd_t *pgd;
unsigned long next;
 
+   addr &= PMD_MASK;
+   if (addr < floor) {
+   addr += PMD_SIZE;
+   if (!addr)
+   return;
+   }
+   if (ceiling) {
+   ceiling &= PMD_MASK;
+   if (!ceiling)
+   return;
+   }
+   if (end - 1 > ceiling - 1)
+   end -= PMD_SIZE;
+   if (addr > end - 1)
+   return;
+
pgd = pgd_offset(tlb->mm, addr);
do {
next = pgd_addr_end(addr, end);
-- 
2.9.2



[PATCH v5 3/3] vfio/type1: Reduce repetitive calls in vfio_pin_pages_remote()

2017-04-17 Thread Alex Williamson
vfio_pin_pages_remote() is typically called to iterate over a range
of memory.  Testing CAP_IPC_LOCK is relatively expensive, so it makes
sense to push it up to the caller, which can then repeatedly call
vfio_pin_pages_remote() using that value.  This can show nearly a 20%
improvement on the worst case path through VFIO_IOMMU_MAP_DMA with
contiguous page mapping disabled.  Testing RLIMIT_MEMLOCK is much more
lightweight, but we bring it along on the same principle and it does
seem to show a marginal improvement.

Signed-off-by: Alex Williamson 
---
 drivers/vfio/vfio_iommu_type1.c |   17 +++--
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 372e4f626138..8549cb111627 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -380,10 +380,10 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
  * first page and all consecutive pages with the same locking.
  */
 static long vfio_pin_pages_remote(struct vfio_dma *dma, unsigned long vaddr,
- long npage, unsigned long *pfn_base)
+ long npage, unsigned long *pfn_base,
+ bool lock_cap, unsigned long limit)
 {
-   unsigned long pfn = 0, limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
-   bool lock_cap = capable(CAP_IPC_LOCK);
+   unsigned long pfn = 0;
long ret, pinned = 0, lock_acct = 0;
bool rsvd;
dma_addr_t iova = vaddr - dma->vaddr + dma->iova;
@@ -924,13 +924,15 @@ static int vfio_pin_map_dma(struct vfio_iommu *iommu, struct vfio_dma *dma,
unsigned long vaddr = dma->vaddr;
size_t size = map_size;
long npage;
-   unsigned long pfn;
+   unsigned long pfn, limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
+   bool lock_cap = capable(CAP_IPC_LOCK);
int ret = 0;
 
while (size) {
/* Pin a contiguous chunk of memory */
npage = vfio_pin_pages_remote(dma, vaddr + dma->size,
- size >> PAGE_SHIFT, &pfn);
+ size >> PAGE_SHIFT, &pfn,
+ lock_cap, limit);
if (npage <= 0) {
WARN_ON(!npage);
ret = (int)npage;
@@ -1040,6 +1042,8 @@ static int vfio_iommu_replay(struct vfio_iommu *iommu,
 {
struct vfio_domain *d;
struct rb_node *n;
+   unsigned long limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
+   bool lock_cap = capable(CAP_IPC_LOCK);
int ret;
 
/* Arbitrarily pick the first domain in the list for lookups */
@@ -1086,7 +1090,8 @@ static int vfio_iommu_replay(struct vfio_iommu *iommu,
 
npage = vfio_pin_pages_remote(dma, vaddr,
  n >> PAGE_SHIFT,
- &pfn);
+ &pfn, lock_cap,
+ limit);
if (npage <= 0) {
WARN_ON(!npage);
ret = (int)npage;



[PATCH v5 3/3] vfio/type1: Reduce repetitive calls in vfio_pin_pages_remote()

2017-04-17 Thread Alex Williamson
vfio_pin_pages_remote() is typically called to iterate over a range
of memory.  Testing CAP_IPC_LOCK is relatively expensive, so it makes
sense to push it up to the caller, which can then repeatedly call
vfio_pin_pages_remote() using that value.  This can show nearly a 20%
improvement on the worst case path through VFIO_IOMMU_MAP_DMA with
contiguous page mapping disabled.  Testing RLIMIT_MEMLOCK is much more
lightweight, but we bring it along on the same principle and it does
seem to show a marginal improvement.

Signed-off-by: Alex Williamson 
---
 drivers/vfio/vfio_iommu_type1.c |   17 +++--
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 372e4f626138..8549cb111627 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -380,10 +380,10 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned 
long vaddr,
  * first page and all consecutive pages with the same locking.
  */
 static long vfio_pin_pages_remote(struct vfio_dma *dma, unsigned long vaddr,
- long npage, unsigned long *pfn_base)
+ long npage, unsigned long *pfn_base,
+ bool lock_cap, unsigned long limit)
 {
-   unsigned long pfn = 0, limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
-   bool lock_cap = capable(CAP_IPC_LOCK);
+   unsigned long pfn = 0;
long ret, pinned = 0, lock_acct = 0;
bool rsvd;
dma_addr_t iova = vaddr - dma->vaddr + dma->iova;
@@ -924,13 +924,15 @@ static int vfio_pin_map_dma(struct vfio_iommu *iommu, 
struct vfio_dma *dma,
unsigned long vaddr = dma->vaddr;
size_t size = map_size;
long npage;
-   unsigned long pfn;
+   unsigned long pfn, limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
+   bool lock_cap = capable(CAP_IPC_LOCK);
int ret = 0;
 
while (size) {
/* Pin a contiguous chunk of memory */
npage = vfio_pin_pages_remote(dma, vaddr + dma->size,
- size >> PAGE_SHIFT, );
+ size >> PAGE_SHIFT, ,
+ lock_cap, limit);
if (npage <= 0) {
WARN_ON(!npage);
ret = (int)npage;
@@ -1040,6 +1042,8 @@ static int vfio_iommu_replay(struct vfio_iommu *iommu,
 {
struct vfio_domain *d;
struct rb_node *n;
+   unsigned long limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
+   bool lock_cap = capable(CAP_IPC_LOCK);
int ret;
 
/* Arbitrarily pick the first domain in the list for lookups */
@@ -1086,7 +1090,8 @@ static int vfio_iommu_replay(struct vfio_iommu *iommu,
 
npage = vfio_pin_pages_remote(dma, vaddr,
  n >> PAGE_SHIFT,
- &pfn);
+ &pfn, lock_cap,
+ limit);
if (npage <= 0) {
WARN_ON(!npage);
ret = (int)npage;



[PATCH v5 1/3] vfio/type1: Remove locked page accounting workqueue

2017-04-17 Thread Alex Williamson
If the mmap_sem is contended then the vfio type1 IOMMU backend will
defer locked page accounting updates to a workqueue task.  This has a
few problems and depending on which side the user tries to play, they
might be over-penalized for unmaps that haven't yet been accounted or
race the workqueue to enter more mappings than they're allowed.  The
original intent of this workqueue mechanism seems to be focused on
reducing latency through the ioctl, but we cannot do so at the cost
of correctness.  Remove this workqueue mechanism and update the
callers to allow for failure.  We can also now recheck the limit under
write lock to make sure we don't exceed it.

vfio_pin_pages_remote() also now necessarily includes an unwind path
which we can jump to directly if the consecutive page pinning finds
that we're exceeding the user's memory limits.  This avoids the
current lazy approach which does accounting and mapping up to the
fault, only to return an error on the next iteration to unwind the
entire vfio_dma.

Cc: sta...@vger.kernel.org
Signed-off-by: Alex Williamson 
---
 drivers/vfio/vfio_iommu_type1.c |  110 ++-
 1 file changed, 51 insertions(+), 59 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 32d2633092a3..a8a079ba9477 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -246,69 +246,46 @@ static int vfio_iova_put_vfio_pfn(struct vfio_dma *dma, 
struct vfio_pfn *vpfn)
return ret;
 }
 
-struct vwork {
-   struct mm_struct*mm;
-   longnpage;
-   struct work_struct  work;
-};
-
-/* delayed decrement/increment for locked_vm */
-static void vfio_lock_acct_bg(struct work_struct *work)
-{
-   struct vwork *vwork = container_of(work, struct vwork, work);
-   struct mm_struct *mm;
-
-   mm = vwork->mm;
-   down_write(&mm->mmap_sem);
-   mm->locked_vm += vwork->npage;
-   up_write(&mm->mmap_sem);
-   mmput(mm);
-   kfree(vwork);
-}
-
-static void vfio_lock_acct(struct task_struct *task, long npage)
+static int vfio_lock_acct(struct task_struct *task, long npage, bool *lock_cap)
 {
-   struct vwork *vwork;
struct mm_struct *mm;
bool is_current;
+   int ret;
 
if (!npage)
-   return;
+   return 0;
 
is_current = (task->mm == current->mm);
 
mm = is_current ? task->mm : get_task_mm(task);
if (!mm)
-   return; /* process exited */
+   return -ESRCH; /* process exited */
 
-   if (down_write_trylock(&mm->mmap_sem)) {
-   mm->locked_vm += npage;
-   up_write(&mm->mmap_sem);
-   if (!is_current)
-   mmput(mm);
-   return;
-   }
+   ret = down_write_killable(&mm->mmap_sem);
+   if (!ret) {
+   if (npage > 0) {
+   if (lock_cap ? !*lock_cap :
+   !has_capability(task, CAP_IPC_LOCK)) {
+   unsigned long limit;
+
+   limit = task_rlimit(task,
+   RLIMIT_MEMLOCK) >> PAGE_SHIFT;
+
+   if (mm->locked_vm + npage > limit)
+   ret = -ENOMEM;
+   }
+   }
+
+   if (!ret)
+   mm->locked_vm += npage;
 
-   if (is_current) {
-   mm = get_task_mm(task);
-   if (!mm)
-   return;
+   up_write(&mm->mmap_sem);
}
 
-   /*
-* Couldn't get mmap_sem lock, so must setup to update
-* mm->locked_vm later. If locked_vm were atomic, we
-* wouldn't need this silliness
-*/
-   vwork = kmalloc(sizeof(struct vwork), GFP_KERNEL);
-   if (WARN_ON(!vwork)) {
+   if (!is_current)
mmput(mm);
-   return;
-   }
-   INIT_WORK(&vwork->work, vfio_lock_acct_bg);
-   vwork->mm = mm;
-   vwork->npage = npage;
-   schedule_work(&vwork->work);
+
+   return ret;
 }
 
 /*
@@ -405,7 +382,7 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned 
long vaddr,
 static long vfio_pin_pages_remote(struct vfio_dma *dma, unsigned long vaddr,
  long npage, unsigned long *pfn_base)
 {
-   unsigned long limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
+   unsigned long pfn = 0, limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
bool lock_cap = capable(CAP_IPC_LOCK);
long ret, pinned = 0, lock_acct = 0;
bool rsvd;
@@ -442,8 +419,6 @@ static long vfio_pin_pages_remote(struct vfio_dma *dma, 
unsigned long vaddr,
/* Lock all the consecutive pages from pfn_base */
for (vaddr += PAGE_SIZE, iova += PAGE_SIZE; pinned < npage;
 pinned++, vaddr += PAGE_SIZE, iova += PAGE_SIZE) {
-   unsigned long pfn = 0;
-

[PATCH v5 0/3] vfio/type1: Synchronous locked page accounting

2017-04-17 Thread Alex Williamson
v5:
  - patch 1/ Use bool* to clean up vfio_lock_acct() callers; sorry
 we cannot re-test CAP_IPC_LOCK for all callers
  - patch 2/ Re-add pr_warn, add Kirti's R-b
  - patch 3/ NEW, analyzing impact of vfio_lock_acct() testing
 CAP_IPC_LOCK for all callers revealed a low-hanging
 optimization

Thanks for the reviews, keep 'em coming,
Alex

---

Alex Williamson (3):
  vfio/type1: Remove locked page accounting workqueue
  vfio/type1: Prune vfio_pin_page_external()
  vfio/type1: Reduce repetitive calls in vfio_pin_pages_remote()


 drivers/vfio/vfio_iommu_type1.c |  150 +--
 1 file changed, 64 insertions(+), 86 deletions(-)



[PATCH v5 2/3] vfio/type1: Prune vfio_pin_page_external()

2017-04-17 Thread Alex Williamson
With vfio_lock_acct() testing the locked memory limit under mmap_sem,
it's redundant to do it here for a single page.  We can also reorder
our tests such that we can avoid testing for reserved pages if we're
not doing accounting and let vfio_lock_acct() test the process
CAP_IPC_LOCK.  Finally, this function oddly returns 1 on success.
Update to return zero on success, -errno on error.  Since the function
only pins a single page, there's no need to return the number of pages
pinned.

N.B. vfio_pin_pages_remote() can pin a large contiguous range of pages
before calling vfio_lock_acct().  If we were to similarly remove the
extra test there, a user could temporarily pin far more pages than
they're allowed.

Suggested-by: Kirti Wankhede 
Suggested-by: Eric Auger 
Reviewed-by: Kirti Wankhede 
Signed-off-by: Alex Williamson 
---
 drivers/vfio/vfio_iommu_type1.c |   35 ---
 1 file changed, 8 insertions(+), 27 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index a8a079ba9477..372e4f626138 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -482,43 +482,26 @@ static long vfio_unpin_pages_remote(struct vfio_dma *dma, 
dma_addr_t iova,
 static int vfio_pin_page_external(struct vfio_dma *dma, unsigned long vaddr,
  unsigned long *pfn_base, bool do_accounting)
 {
-   unsigned long limit;
-   bool lock_cap = has_capability(dma->task, CAP_IPC_LOCK);
struct mm_struct *mm;
int ret;
-   bool rsvd;
 
mm = get_task_mm(dma->task);
if (!mm)
return -ENODEV;
 
ret = vaddr_get_pfn(mm, vaddr, dma->prot, pfn_base);
-   if (ret)
-   goto pin_page_exit;
-
-   rsvd = is_invalid_reserved_pfn(*pfn_base);
-   limit = task_rlimit(dma->task, RLIMIT_MEMLOCK) >> PAGE_SHIFT;
-
-   if (!rsvd && !lock_cap && mm->locked_vm + 1 > limit) {
-   put_pfn(*pfn_base, dma->prot);
-   pr_warn("%s: Task %s (%d) RLIMIT_MEMLOCK (%ld) exceeded\n",
-   __func__, dma->task->comm, task_pid_nr(dma->task),
-   limit << PAGE_SHIFT);
-   ret = -ENOMEM;
-   goto pin_page_exit;
-   }
-
-   if (!rsvd && do_accounting) {
-   ret = vfio_lock_acct(dma->task, 1, &lock_cap);
+   if (!ret && do_accounting && !is_invalid_reserved_pfn(*pfn_base)) {
+   ret = vfio_lock_acct(dma->task, 1, NULL);
if (ret) {
put_pfn(*pfn_base, dma->prot);
-   goto pin_page_exit;
+   if (ret == -ENOMEM)
+   pr_warn("%s: Task %s (%d) RLIMIT_MEMLOCK "
+   "(%ld) exceeded\n", __func__,
+   dma->task->comm, task_pid_nr(dma->task),
+   task_rlimit(dma->task, RLIMIT_MEMLOCK));
}
}
 
-   ret = 1;
-
-pin_page_exit:
mmput(mm);
return ret;
 }
@@ -598,10 +581,8 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
remote_vaddr = dma->vaddr + iova - dma->iova;
ret = vfio_pin_page_external(dma, remote_vaddr, &phys_pfn[i],
 do_accounting);
-   if (ret <= 0) {
-   WARN_ON(!ret);
+   if (ret)
goto pin_unwind;
-   }
 
ret = vfio_add_to_pfn_list(dma, iova, phys_pfn[i]);
if (ret) {



Re: bug? dwc2: insufficient fifo memory

2017-04-17 Thread John Stultz
On Fri, Feb 24, 2017 at 2:46 PM, John Stultz  wrote:
> Hey John,
>   So after the USB tree landed in 4.11-rc, I've been seeing the
> following warning at bootup.
>
> It seems the fifo_mem/total_fifo_size value on hikey is 1920, but I'm
> seeing the addresses zip upward quickly as the txfsz values are all
> 2048.  Not exactly sure whats wrong here.  Things still seem to work
> properly.
>
> thanks
> -john
>
>
> [8.944987] dwc2 f72c.usb: bound driver configfs-gadget
> [8.956651] insufficient fifo memory
> [8.956703] [ cut here ]
> [8.964906] WARNING: CPU: 7 PID: 1 at drivers/usb/dwc2/gadget.c:330
> dwc2_hsotg_init_fifo+0x1a8/0x1c8


Hey John,
   So I finally got a bit of time to look deeper into this, and it
seems like this issue was introduced by commit 3c6aea7344c3 ("usb:
dwc2: gadget: Add checking for g-tx-fifo-size parameter"), as that
change added the following snippet:

	if (hsotg->params.g_tx_fifo_size[fifo] < min ||
	    hsotg->params.g_tx_fifo_size[fifo] > dptxfszn) {
		dev_warn(hsotg->dev,
			 "%s: Invalid parameter g_tx_fifo_size[%d]=%d\n",
			 __func__, fifo,
			 hsotg->params.g_tx_fifo_size[fifo]);
		hsotg->params.g_tx_fifo_size[fifo] = dptxfszn;
	}

On HiKey, we have g-tx-fifo-size = <128 128 128 128 128 128> in the
dtsi, and the fifo_mem value ends up initialized at 1920.

Unfortunately, in the above, it sees other entries in the
g_tx_fifo_size[] array are initialized to zero, which then causes them
to be set to the "default" value of dptxfszn which is 2048.  So then
later in dwc2_hsotg_init_fifo() the addr value (which adds the
fifo_size array value each step) quickly grows beyond the fifo_mem
value, causing the warning.

Not sure what the right fix is here? Should the min value be used
instead of the "default" dptxfszn value?  Again, I'm not sure I see
any side-effects from this warning, but wanted to try to figure out
how to fix it properly.

thanks
-john


Re: [PATCH] ACPICA: Export mutex functions

2017-04-17 Thread Guenter Roeck
On Mon, Apr 17, 2017 at 11:29:38PM +0200, Rafael J. Wysocki wrote:
> On Mon, Apr 17, 2017 at 11:03 PM, Guenter Roeck  wrote:
> > On Mon, Apr 17, 2017 at 08:40:38PM +, Moore, Robert wrote:
> >>
> >>
> >> > -Original Message-
> >> > From: Guenter Roeck [mailto:li...@roeck-us.net]
> >> > Sent: Monday, April 17, 2017 12:45 PM
> >> > To: Moore, Robert 
> >> > Cc: Zheng, Lv ; Wysocki, Rafael J
> >> > ; 'Len Brown' ; 'linux-
> >> > a...@vger.kernel.org' ; 'de...@acpica.org'
> >> > ; 'linux-kernel@vger.kernel.org'  >> > ker...@vger.kernel.org>; Box, David E 
> >> > Subject: Re: [PATCH] ACPICA: Export mutex functions
> >> >
> >> > On Mon, Apr 17, 2017 at 07:27:37PM +, Moore, Robert wrote:
> >> > >
> >> > > > -Original Message-
> >> > > > From: Moore, Robert
> >> > > > Sent: Monday, April 17, 2017 10:13 AM
> >> > > > To: Guenter Roeck ; Zheng, Lv
> >> > > > 
> >> > > > Cc: Wysocki, Rafael J ; Len Brown
> >> > > > ; linux-a...@vger.kernel.org; de...@acpica.org;
> >> > > > linux- ker...@vger.kernel.org
> >> > > > Subject: RE: [PATCH] ACPICA: Export mutex functions
> >> > > >
> >> > > > There is a model for the drivers to directly acquire an AML mutex
> >> > > > object. That is why the acquire/release public interfaces were added
> >> > > > to ACPICA.
> >> > > >
> >> > > > I forget all of the details, but the model was developed with MS and
> >> > > > others during the ACPI 6.0 timeframe.
> >> > > >
> >> > > >
> >> > > [Moore, Robert]
> >> > >
> >> > >
> >> > > Here is the case where the OS may need to directly acquire an AML
> >> > mutex:
> >> > >
> >> > > From the ACPI spec:
> >> > >
> >> > > 19.6.2 Acquire (Acquire a Mutex)
> >> > >
> >> > > Note: For Mutex objects referenced by a _DLM object, the host OS may
> >> > also contend for ownership.
> >> > >
> >> > From the context in the dsdt, and from description of expected use cases
> >> > for _DLM objects I can find, this is what the mutex is used for (to
> >> > serialize access to a resource on a low pin count serial interconnect,
> >> > aka LPC).
> >> >
> >> > What does that mean in practice ? That I am not supposed to use it
> >> > because it doesn't follow standard ACPI mutex declaration rules ?
> >> >
> >> > Thanks,
> >> > Guenter
> >> >
> >> > >
> >> [Moore, Robert]
> >>
> >> I'm not an expert on the _DLM method, but I would point you to the 
> >> description section in the ACPI spec, 5.7.5 _DLM (DeviceLock Mutex).
> >>
> >
> > I did. However, not being an ACPI expert, that doesn't tell me anything.
> 
> Basically, if the kernel and AML need to access a device concurrently,
> there should be a _DLM object under that device in the ACPI tables.
> In that case it is expected to return a list of (AML) mutexes that can
> be acquired by the kernel in order to synchronize device access with
> respect to AML (and for each mutex it may also return a description of
> the specific resources to be protected by it).
> 
> Bottom line: without _DLM, the kernel cannot synchronize things with
> respect to AML properly, because it has no information how to do that
> then.

That is all quite interesting. I do see the mutex in question used on various
motherboards from various vendors (I checked boards from Gigabyte, MSI, and
Intel). Interestingly, the naming seems to be consistent - it is always named
"MUT0". For the most part, it seems to be available on more recent
motherboards; older motherboards tend to use the resource without locking.
However, I don't see any mention of "_DLM" in any of the DSDTs.

At the same time, access to ports 0x2e/0x2f is widely used in the kernel.
As mentioned before, it is used in watchdog, hardware monitoring, and gpio
drivers, but also in parallel port and infrared driver code. Effectively
that means that all this code is inherently unsafe on systems with ACPI
support.

I had thought about implementing a set of utility functions to make the kernel
code safer to use if the mutex is found to exist. Right now I wonder, though,
if such code would have a chance to be accepted. Any thoughts on that ?

Thanks,
Guenter

> 
> Thanks,
> Rafael


Re: [PATCH] slab: avoid IPIs when creating kmem caches

2017-04-17 Thread David Rientjes
On Sun, 16 Apr 2017, Greg Thelen wrote:

> Each slab kmem cache has per cpu array caches.  The array caches are
> created when the kmem_cache is created, either via kmem_cache_create()
> or lazily when the first object is allocated in context of a kmem
> enabled memcg.  Array caches are replaced by writing to /proc/slabinfo.
> 
> Array caches are protected by holding slab_mutex or disabling
> interrupts.  Array cache allocation and replacement is done by
> __do_tune_cpucache() which holds slab_mutex and calls
> kick_all_cpus_sync() to interrupt all remote processors which confirms
> there are no references to the old array caches.
> 
> IPIs are needed when replacing array caches.  But when creating a new
> array cache, there's no need to send IPIs because there cannot be any
> references to the new cache.  Outside of memcg kmem accounting these
> IPIs occur at boot time, so they're not a problem.  But with memcg kmem
> accounting each container can create kmem caches, so the IPIs are
> wasteful.
> 
> Avoid unnecessary IPIs when creating array caches.
> 
> Test which reports the IPI count of allocating slab in 1 memcg:
>   import os
> 
>   def ipi_count():
>       with open("/proc/interrupts") as f:
>           for l in f:
>               if 'Function call interrupts' in l:
>                   return int(l.split()[1])
> 
>   def echo(val, path):
>       with open(path, "w") as f:
>           f.write(val)
> 
>   n = 1
>   os.chdir("/mnt/cgroup/memory")
>   pid = str(os.getpid())
>   a = ipi_count()
>   for i in range(n):
>       os.mkdir(str(i))
>       echo("1G\n", "%d/memory.limit_in_bytes" % i)
>       echo("1G\n", "%d/memory.kmem.limit_in_bytes" % i)
>       echo(pid, "%d/cgroup.procs" % i)
>       open("/tmp/x", "w").close()
>       os.unlink("/tmp/x")
>   b = ipi_count()
>   print "%d loops: %d => %d (+%d ipis)" % (n, a, b, b-a)
>   echo(pid, "cgroup.procs")
>   for i in range(n):
>       os.rmdir(str(i))
> 
> patched:   1 loops: 1069 => 1170 (+101 ipis)
> unpatched: 1 loops: 1192 => 48933 (+47741 ipis)
> 
> Signed-off-by: Greg Thelen 

Acked-by: David Rientjes 


Re: [PATCH] hidma_mgmt_sys: Use devm_kmalloc_array() in hidma_mgmt_init_sys()

2017-04-17 Thread Sinan Kaya
On 4/17/2017 4:55 PM, SF Markus Elfring wrote:
> From: Markus Elfring 
> Date: Mon, 17 Apr 2017 22:42:40 +0200
> 
> * A multiplication for the size determination of a memory allocation
>   indicated that an array data structure should be processed.
>   Thus use the corresponding function "devm_kmalloc_array".
> 
>   This issue was detected by using the Coccinelle software.
> 
> * Delete the local variable "required" which became unnecessary
>   with this refactoring.
> 
> Signed-off-by: Markus Elfring 
> ---
>  drivers/dma/qcom/hidma_mgmt_sys.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/dma/qcom/hidma_mgmt_sys.c 
> b/drivers/dma/qcom/hidma_mgmt_sys.c
> index d61f1068a34b..e7222a0e9ba4 100644
> --- a/drivers/dma/qcom/hidma_mgmt_sys.c
> +++ b/drivers/dma/qcom/hidma_mgmt_sys.c
> @@ -245,11 +245,12 @@ int hidma_mgmt_init_sys(struct hidma_mgmt_dev *mdev)
>  {
>   unsigned int i;
>   int rc;
> - int required;
>   struct kobject *chanops;
>  
> - required = sizeof(*mdev->chroots) * mdev->dma_channels;
> - mdev->chroots = devm_kmalloc(&mdev->pdev->dev, required, GFP_KERNEL);
> + mdev->chroots = devm_kmalloc_array(&mdev->pdev->dev,
> +mdev->dma_channels,
> +sizeof(*mdev->chroots),
> +GFP_KERNEL);
>   if (!mdev->chroots)
>   return -ENOMEM;
>  
> 
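Why the conversion matters: `devm_kmalloc_array()` rejects multiplications that would overflow, whereas an open-coded `sizeof(*p) * n` silently wraps and allocates a too-small buffer. A hedged userspace Python model of that check (not the kernel implementation):

```python
# Model of the overflow check devm_kmalloc_array()/kcalloc() performs
# before allocating: a product n * size that would wrap around size_t
# is refused rather than truncated.
SIZE_MAX = 2**64 - 1   # assuming a 64-bit size_t

def kmalloc_array_size(n, size):
    """Return the byte count to allocate, or None on overflow."""
    if size != 0 and n > SIZE_MAX // size:
        return None            # overflow: would allocate too little
    return n * size

assert kmalloc_array_size(16, 8) == 128
assert kmalloc_array_size(SIZE_MAX // 2, 4) is None   # rejected
```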

Reviewed-by: Sinan Kaya 



-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.


Re: [PATCH 2/2] iio: adc: Allow setting Shunt Voltage PGA gain and Bus Voltage range

2017-04-17 Thread Stefan Bruens
On Freitag, 14. April 2017 17:12:03 CEST Jonathan Cameron wrote:
> On 12/04/17 04:01, Stefan Brüns wrote:
> > Reducing shunt and bus voltage range improves the accuracy, so allow
> > altering the default settings.
> > 
> > Signed-off-by: Stefan Brüns 
> 
> Hi Stefan,
> 
> There is new userspace ABI in here, so starting point is to document that.
> 
> That would allow the discussion of whether it is the right ABI to begin.
> 
> In particular can one of these at least be rolled into the standard
> scale attributes that are already supported by the driver?
> It looks to me like they both probably can - perhaps in conjunction with
> use of the _available callback to notify userspace the range available from
> _raw thus allowing easy computation of the range you are providing.
> 
> Keeping new ABI to a minimum makes life a lot easier for userspace tooling!
> 
> I particularly love the way it's described in the datasheet as a gain
> for the shunt voltage but a range for the bus voltage - despite being the
> same PGA (at least in the symbolic representation).

Unfortunately, correct use of raw and scale is somewhat underdocumented. I 
would expect the raw values to reflect the value read from the device, 
unaltered.

For the INA226, all value registers are 16 bit, while for the INA219 the bus 
voltage register is 13-bit (MSB-aligned; the lowest 3 bits of the register are 
masked) and the other 3 registers are 16 bit as well.

The INA219 incorporates the bus range and shunt voltage gain in the register 
value, i.e. the shunt voltage value 0x0100 always corresponds to 256 * 10uV, 
irrespective of the PGA setting.
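The fixed-LSB behaviour described above (raw shunt value 0x0100 corresponding to 256 * 10uV regardless of the PGA setting) can be sketched in Python. The LSB constants follow the INA219 datasheet; the helper names are illustrative, not driver API:

```python
SHUNT_LSB_UV = 10   # shunt voltage register LSB: 10 uV, PGA-independent
BUS_LSB_MV = 4      # bus voltage register LSB: 4 mV

def shunt_uv(raw16):
    """Raw 16-bit shunt register (two's complement) -> microvolts."""
    if raw16 & 0x8000:
        raw16 -= 1 << 16
    return raw16 * SHUNT_LSB_UV

def bus_mv(raw16):
    """Bus voltage register: 13-bit value is MSB-aligned, low 3 bits masked."""
    return (raw16 >> 3) * BUS_LSB_MV

assert shunt_uv(0x0100) == 256 * 10   # the 0x0100 -> 256 * 10uV example above
```

Because the register value already folds in the bus range and shunt gain, changing the PGA alters only the usable range, not this conversion, which is the crux of the argument against folding gain into the IIO scale attribute.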

I think it's a bad idea to incorporate the gain settings into the scale 
attribute:
1. Raw values would no longer be raw values
2. Scale for the INA219 would be settable, but read-only for the INA226
3. If the device has a gain setting, it should be exposed as such, and names 
should correspond to the datasheet
4. Any user of the gain settings would have to be made aware of the possibility 
to change it, no matter how it is exposed. Making it part of the scale, and thus 
changing the meaning of the raw values, would break the existing ABI.

Kind regards,

Stefan

-- 
Stefan Brüns  /  Bergstraße 21  /  52062 Aachen
home: +49 241 53809034 mobile: +49 151 50412019
work: +49 2405 49936-424


Re: RFC: WMI Enhancements

2017-04-17 Thread Andy Lutomirski
On Fri, Apr 14, 2017 at 4:05 PM, Darren Hart  wrote:
> On Sat, Apr 15, 2017 at 12:45:30AM +0200, Rafael Wysocki wrote:
>> On Wednesday, April 12, 2017 04:08:54 PM Darren Hart wrote:
>> > Hi All,
>> >
>> > There are a few parallel efforts involving the Windows Management
>> > Instrumentation (WMI)[1] and dependent/related drivers. I'd like to have a 
>> > round of
>> > discussion among those of you that have been involved in this space before 
>> > we
>> > decide on a direction.
>> >
>> > The WMI support in the kernel today fairly narrowly supports a handful of
>> > systems. Andy L. has a work-in-progress series [2] which converts wmi into 
>> > a
>> > platform device and a proper bus, providing devices for dependent drivers 
>> > to
>> > bind to, and a mechanism for sibling devices to communicate with each 
>> > other.
>> > I've reviewed the series and feel like the approach is sound, I plan to 
>> > carry
>> > this series forward and merge it (with Andy L's permission).
>> >
>> > Are there any objections to this?
>> >
>> > In Windows, applications interact with WMI more or less directly. We don't 
>> > do
>> > this in Linux currently, although it has been discussed in the past [3]. 
>> > Some
>> > vendors will work around this by performing SMI/SMM, which is inefficient 
>> > at
>> > best. Exposing WMI methods to userspace would bring parity to WMI for 
>> > Linux and
>> > Windows.
>> >
>> > There are two principal concerns I'd appreciate your thoughts on:
>> >
>> > a) As an undiscoverable interface (you need to know the method signatures 
>> > ahead
>> > of time), universally exposing every WMI "device" to userspace seems like 
>> > "a bad
>> > idea" from a security and stability perspective. While access would 
>> > certainly be
>> > privileged, it seems more prudent to make this exposure opt-in. We also 
>> > handle
>> > some of this with kernel drivers and exposing those "devices" to userspace 
>> > would
>> > enable userspace and the kernel to fight over control. So - if we expose 
>> > WMI
>> > devices to userspace, I believe this should be done on a case by case 
>> > basis,
>> > opting in, and not by default as part of the WMI driver (although it can 
>> > provide
>> > the mechanism for a sub-driver to use), and possibly a devmode to do so by
>> > default.
>>
>> A couple of loose thoughts here.
>>
>> In principle there could be a "generic default WMI driver" or similar that 
>> would
>> "claim" all WMI "devices" that have not been "claimed" by anyone else and 
>> would
>> simply expose them to user space somehow (e.g. using a chardev interface).
>>
>> Then, depending on how that thing is implemented, opt-in etc should be 
>> possible
>> too.
>>
>
> I think we agree this would be an ideal approach.
>
> As we look into this more, it is becoming clear that the necessary 
> functionality
> is not nicely divided into GUIDs for what is necessary in userspace and what 
> is
> handled in the kernel. A single WMI METHOD GUID may be needed by userspace for
> certain functionality, while the kernel drivers may use it for something else.
>
> :-(
>
> The input to a WMI method is just a buffer, so it is very free form. One
> approach Mario has mentioned was to audit the user space WMI METHOD calls in 
> the
> kernel platform drivers and reject those calls with arguments matching those
> issued by the kernel driver. This is likely to be complex and error prone in 
> my
> opinion. However, I have not yet thought of another means to meet the
> requirement of having disjoint feature sets for userspace and kernel space 
> via a
> mechanism that was effectively designed to be used solely from user space with
> vendor defined method signatures.
>
> Next step is to look at just how complex it would be to audit the method calls
> the kernel currently uses.

I'm wondering whether it's really worth it.  We already allow doing
darned near anything using dcdbas.  Maybe the world won't end if we
expose a complete-ish ioctl interface to WMI.

Also, dcdbas is, to put it mildly, a bit ridiculous.  It seems to be a
seriously awkward sysfs interface that allows you to, drumroll please,
issue outb and inb instructions.  It doesn't even check that it's
running on a Dell system.  It might be nice to deprecate it some day
in favor of a real interface.  I'd consider a low-level WMI ioctl
interface to be a vast improvement.
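The audit-and-reject scheme Darren describes above can be modeled as a small filter: a WMI method is reachable from userspace only if its (GUID, method) pair is explicitly opted in, and calls whose arguments match those a kernel driver issues are refused. This toy sketch uses entirely hypothetical GUIDs and argument patterns:

```python
# Arguments reserved by kernel drivers, per (guid, method_id).
KERNEL_RESERVED = {("ABC-123", 1): [b"\x01"]}
# Methods a sub-driver has opted in to exposing to userspace.
USERSPACE_OPT_IN = {("ABC-123", 1), ("DEF-456", 7)}

def wmi_call_allowed(guid, method_id, args):
    key = (guid, method_id)
    if key not in USERSPACE_OPT_IN:
        return False    # not exposed at all (opt-in, not default)
    if args in KERNEL_RESERVED.get(key, []):
        return False    # would fight the kernel driver for control
    return True

assert wmi_call_allowed("DEF-456", 7, b"\x00")
assert not wmi_call_allowed("ABC-123", 1, b"\x01")   # kernel-reserved args
```

As the thread notes, the hard part is that WMI method inputs are free-form buffers, so matching "kernel-reserved" argument patterns reliably is exactly the complex, error-prone step under debate.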


Re: [PATCH v8 10/15] ACPI: platform-msi: retrieve dev id from IORT

2017-04-17 Thread Sinan Kaya
On 4/17/2017 5:44 PM, Sinan Kaya wrote:
> Any idea what happened to the change in this function during merge?
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=ae7c18380495ac5c14a614fdb6c452c3bf9148ac
> 

I realized that there is a V9 out there. I'm catching up with the work. 

https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/commit/?h=irq/irqchip-4.12&id=e6db07d0f3b6da1f8cfd485776bfefa4fcdbfc45

seems to be fixing the issue. 

> 63a52b3 platform-msi: Make platform_msi_create_device_domain() ACPI aware
> e07b978 irqchip/gicv3-its: platform-msi: Scan MADT to create platform msi 
> domain
> 09be1d5 irqchip/gicv3-its: platform-msi: Refactor its_pmsi_init() to prepare 
> for ACPI
> 4e96df9 irqchip/gicv3-its: platform-msi: Refactor its_pmsi_prepare()
> e593828 irqchip/gic-v3-its: Keep the include header files in alphabetic order
> e226a74 ACPI: platform: setup MSI domain for ACPI based platform device
> 79c7533 ACPI: platform-msi: retrieve devid from IORT
> 43394c9 ACPI/IORT: Introduce iort_node_map_platform_id() to retrieve dev id
> e6eaeae ACPI/IORT: Rename iort_node_map_rid() to make it generic
> 6acbb92 ACPI/IORT: Rework iort_match_node_callback() return value handling
> f8e2f50 ACPI/IORT: Add missing comment for iort_dev_find_its_id()
> 4b1 ACPI/IORT: Fix the indentation in iort_scan_node()

Apologies for the noise.

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.


Re: [PATCH -v2 0/9] mm: make movable onlining suck less

2017-04-17 Thread Dan Williams
On Tue, Apr 11, 2017 at 10:03 AM, Michal Hocko  wrote:
> All the reported issues seem to be fixed and pushed to my git tree
> attempts/rewrite-mem_hotplug branch. I will wait a day or two for more
> feedback and then repost for inclusion. I would really appreciate
> more testing/review!

This still seems to be based on 4.10? It's missing some block-layer
fixes and other things that trigger failures in the nvdimm unit tests.
Can you rebase to a more recent 4.11-rc?


Re: [PATCH] drm/vc4: Fix refcounting of runtime PM get if it errors out.

2017-04-17 Thread Eric Anholt
Sean Paul  writes:

> On Mon, Apr 17, 2017 at 09:26:03AM -0700, Eric Anholt wrote:
>> We were returning without decrementing if the error happened, meaning
>> that at the next submit we wouldn't try to bring up the power domain.
>> 
>> Signed-off-by: Eric Anholt 
>
> This change looks good to me. It seems a little odd to wrap pm_runtime which
> is already refcounted in another refcount, but I'm really not familiar with
> the driver, and I'm sure there's a good reason for it.
>
> Reviewed-by: Sean Paul 

Yeah, it's an unfortunate hack because the reset control is buried in
the power domain driver on the RPi, so we have to get the power domain
actually off (0 refcount) in order to reset the GPU.

Yay for building around closed-source firmware :(
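The bug being fixed is a classic unbalanced-refcount pattern: a get that fails must roll the usage count back, or the next submit sees a nonzero count and never tries to power the domain up again. A toy model (not the vc4 driver itself; names are illustrative):

```python
# If the power-up attempt inside get_sync() fails, the count must be
# dropped again so a later caller retries bringing the domain up.
class PowerDomain:
    def __init__(self):
        self.count = 0

    def get_sync(self, fail=False):
        self.count += 1
        if fail:
            self.count -= 1      # the fix: roll the count back on error
            return -1            # error from the power-up attempt
        return 0

pd = PowerDomain()
assert pd.get_sync(fail=True) == -1
assert pd.count == 0             # balanced: next submit powers up again
```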


Re: [PATCH v8 10/15] ACPI: platform-msi: retrieve dev id from IORT

2017-04-17 Thread Sinan Kaya
On 1/18/2017 7:55 AM, Hanjun Guo wrote:
> --- a/drivers/irqchip/irq-gic-v3-its-platform-msi.c
> +++ b/drivers/irqchip/irq-gic-v3-its-platform-msi.c
> @@ -57,7 +57,8 @@ static int its_pmsi_prepare(struct irq_domain *domain, 
> struct device *dev,
>  
>   msi_info = msi_get_domain_info(domain->parent);
>  
> - ret = of_pmsi_get_dev_id(domain, dev, &dev_id);
> + ret = dev->of_node ? of_pmsi_get_dev_id(domain, dev, &dev_id) :
> + iort_pmsi_get_dev_id(dev, &dev_id);
>   if (ret)
>   return ret;
>  

Any idea what happened to the change in this function during merge?

https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=ae7c18380495ac5c14a614fdb6c452c3bf9148ac

63a52b3 platform-msi: Make platform_msi_create_device_domain() ACPI aware
e07b978 irqchip/gicv3-its: platform-msi: Scan MADT to create platform msi domain
09be1d5 irqchip/gicv3-its: platform-msi: Refactor its_pmsi_init() to prepare 
for ACPI
4e96df9 irqchip/gicv3-its: platform-msi: Refactor its_pmsi_prepare()
e593828 irqchip/gic-v3-its: Keep the include header files in alphabetic order
e226a74 ACPI: platform: setup MSI domain for ACPI based platform device
79c7533 ACPI: platform-msi: retrieve devid from IORT
43394c9 ACPI/IORT: Introduce iort_node_map_platform_id() to retrieve dev id
e6eaeae ACPI/IORT: Rename iort_node_map_rid() to make it generic
6acbb92 ACPI/IORT: Rework iort_match_node_callback() return value handling
f8e2f50 ACPI/IORT: Add missing comment for iort_dev_find_its_id()
4b1 ACPI/IORT: Fix the indentation in iort_scan_node()

I'm getting these warnings when using the above patches on top of 4.11-rc6.
The code is basically trying to read an OF attribute on an ACPI system.

[   11.644792] [ cut here ]
[   11.649407] WARNING: CPU: 35 PID: 1 at 
/local/mnt/workspace/projects/caf/kernel/drivers/irqchip/irq-gic-v3-its-platform-msi.c:41
 its_pmsi_prepare+0x9c/0xfc
[   11.663305] Modules linked in:
[   11.666353]
[   11.667839] CPU: 35 PID: 1 Comm: swapper/0 Tainted: GW   
4.11.0-00036-g63a52b3 #1
[   11.676355] Hardware name: (null) (DT)
[   11.680097] task: 8007dbab8000 task.stack: 8007dbab4000
[   11.686009] PC is at its_pmsi_prepare+0x9c/0xfc
[   11.690533] LR is at its_pmsi_prepare+0x5c/0xfc
[   11.695056] pc : [] lr : [] pstate: 
20400045
[   11.702443] sp : 8007dbab7a40
[   11.705750] x29: 8007dbab7a40 x28: 
[   11.711057] x27: 092a9000 x26: 
[   11.716364] x25: 8007df02ba00 x24: 08a745a9
[   11.721670] x23: 08a7459e x22: 000b
[   11.726976] x21: 8007dbab7b70 x20: 8007d9c48c10
[   11.732282] x19: 8007dba8ba80 x18: 000a
[   11.737588] x17: 1000 x16: 0040
[   11.742895] x15: 0009bb9b x14: 00400041
[   11.748201] x13: 0140 x12: 0088
[   11.753507] x11: 08af x10: 094ad000
[   11.758814] x9 :  x8 : 8007d977ac00
[   11.764120] x7 :  x6 : 003f
[   11.769426] x5 : 8007dbab7a90 x4 : 
[   11.774733] x3 : 0006 x2 : fffe
[   11.780039] x1 : 8007df02b760 x0 : 08d8fb70
[   11.785345]
[   11.786830] ---[ end trace 265ed4f0c6d0486b ]---
[   11.791439] Call trace:
[   11.793879] Exception stack(0x8007dbab7870 to 0x8007dbab79a0)
[   11.800312] 7860:   8007dba8ba80 
0001
[   11.808134] 7880: 8007dbab7a40 08371b68 7e00 
0004
[   11.815956] 78a0: 08af 8200 000ff200 
8007dbab8000
[   11.823779] 78c0: 000ff000 081cb6ec 8007dbab78e0 
000ff000
[   11.831601] 78e0: 0040 80072a80 8007dbab7940 
08166530
[   11.839423] 7900: 8007dbab7960 081984f4 08d8fb70 
8007df02b760
[   11.847245] 7920: fffe 0006  
8007dbab7a90
[   11.855067] 7940: 003f  8007d977ac00 

[   11.862890] 7960: 094ad000 08af 0088 
0140
[   11.870712] 7980: 00400041 0009bb9b 0040 
1000
[   11.878535] [] its_pmsi_prepare+0x9c/0xfc
[   11.884101] [] msi_domain_prepare_irqs+0x54/0x68
[   11.890274] [] msi_domain_alloc_irqs+0x30/0x14c
[   11.896361] [] platform_msi_domain_alloc_irqs+0x54/0x8c
[   11.903142] [] hidma_probe+0x5ac/0x824
[   11.908448] [] platform_drv_probe+0x54/0xa4
[   11.914186] [] driver_probe_device+0x140/0x2a0
[   11.920185] [] __driver_attach+0x74/0xa0
[   11.925664] [] bus_for_each_dev+0x68/0x98
[   11.931229] [] driver_attach+0x20/0x28
[   11.936534] [] bus_add_driver+0xe0/0x1ec
[   11.942012] [] driver_register+0x90/0xdc
[   11.947490] [] __platform_driver_register+0x48/0x50
[   11.953924] [] hidma_driver_init+0x18/0x20
[   


Re: [GIT PULL] KEYS: Blacklisting, restrictions and DH

2017-04-17 Thread James Morris
On Wed, 12 Apr 2017, David Howells wrote:

> 
> Hi James,
> 
> Could you pull these changes into security/next please:
> 
>  (1) Provide a blacklist keyring and a blacklist key type such that X.509
>  keys and PKCS#7 certs can be blacklisted.  It is possible to load the
>  blacklist from a file at compile time.  A future patch will
>  additionally load the blacklist from the UEFI blacklist if available.
> 
>  (2) Make it possible to create a userspace keyring and to apply a
>  restriction to it such that no new keys can be added unless they meet
>  the criteria.
> 
>  (3) Add SP800-56A KDF support for the DH operation.
> 

Pulled, thanks.


-- 
James Morris





Re: [PATCH v4 1/2] vfio/type1: Remove locked page accounting workqueue

2017-04-17 Thread Alex Williamson
On Tue, 18 Apr 2017 01:02:12 +0530
Kirti Wankhede  wrote:

> On 4/18/2017 12:49 AM, Alex Williamson wrote:
> > On Tue, 18 Apr 2017 00:35:06 +0530
> > Kirti Wankhede  wrote:
> >   
> >> On 4/17/2017 8:02 PM, Alex Williamson wrote:  
> >>> On Mon, 17 Apr 2017 14:47:54 +0800
> >>> Peter Xu  wrote:
> >>> 
>  On Sun, Apr 16, 2017 at 07:42:27PM -0600, Alex Williamson wrote:
> 
>  [...]
> 
> > -static void vfio_lock_acct(struct task_struct *task, long npage)
> > +static int vfio_lock_acct(struct task_struct *task, long npage, bool 
> > lock_cap)
> >  {
> > -   struct vwork *vwork;
> > struct mm_struct *mm;
> > bool is_current;
> > +   int ret;
> >  
> > if (!npage)
> > -   return;
> > +   return 0;
> >  
> > is_current = (task->mm == current->mm);
> >  
> > mm = is_current ? task->mm : get_task_mm(task);
> > if (!mm)
> > -   return; /* process exited */
> > +   return -ESRCH; /* process exited */
> >  
> > -   if (down_write_trylock(&mm->mmap_sem)) {
> > -   mm->locked_vm += npage;
> > -   up_write(&mm->mmap_sem);
> > -   if (!is_current)
> > -   mmput(mm);
> > -   return;
> > -   }
> > +   ret = down_write_killable(&mm->mmap_sem);
> > +   if (!ret) {
> > +   if (npage < 0 || lock_cap) {  
> 
>  Nit: maybe we can avoid passing in lock_cap in all the callers of
>  vfio_lock_acct() and fetch it via has_capability() only if npage < 0?
>  IMHO that'll keep the vfio_lock_acct() interface cleaner, and we won't
>  need to pass in "false" any time when doing unpins.
> >>>
> >>> Unfortunately vfio_pin_pages_remote() needs to know about lock_cap
> >>> since it tests whether the user is exceeding their locked memory
> >>> limit.  The other callers could certainly get away with
> >>> vfio_lock_acct() testing the capability itself but that would add a
> >>> redundant call for the most common user.  I'm not a big fan of passing
> >>> a lock_cap bool either, but it seemed the best fix for now.  The
> >>> cleanest alternative I can come up with is this (untested):
> >>> 
> >>
> >> In my opinion, passing 'bool lock_cap' looks much cleaner and simpler.
> >>
> >> Reviewed-by: Kirti Wankhede   
> > 
> > Well shoot, I was just starting to warm up to the bool*.  I like that
> > we're not presuming the polarity for the callers we expect to be
> > removing pages and I generally just dislike passing fixed bool
> > parameters to change the function behavior.  I've cleaned it up a bit
> > further and was starting to do some testing on this which I'd propose
> > for v5.  Does it change your opinion?  
> 
> If passing a fixed bool parameter is the concern then I would lean towards
> Peter's suggestion. vfio_pin_pages_remote() will check the lock capability
> outside vfio_lock_acct() and again in vfio_lock_acct(). At other places,
> it will be taken care of within vfio_lock_acct().

Sorry, I don't see that as a viable option.  Testing for CAP_IPC_LOCK in
both vfio_pin_pages_remote() and vfio_lock_acct() results in over a
10% performance hit on the mapping path with a custom micro-benchmark.
In fact, it suggests we should probably pass that from even higher in
the call stack.  Thanks,

Alex

> > 
> > commit cd61c5f507d614ac14b75b0a548c8738deff88ea
> > Author: Alex Williamson 
> > Date:   Thu Apr 13 14:10:15 2017 -0600
> > 
> > vfio/type1: Remove locked page accounting workqueue
> > 
> > If the mmap_sem is contended then the vfio type1 IOMMU backend will
> > defer locked page accounting updates to a workqueue task.  This has a
> > few problems and depending on which side the user tries to play, they
> > might be over-penalized for unmaps that haven't yet been accounted or
> > race the workqueue to enter more mappings than they're allowed.  The
> > original intent of this workqueue mechanism seems to be focused on
> > reducing latency through the ioctl, but we cannot do so at the cost
> > of correctness.  Remove this workqueue mechanism and update the
> > callers to allow for failure.  We can also now recheck the limit under
> > write lock to make sure we don't exceed it.
> > 
> > vfio_pin_pages_remote() also now necessarily includes an unwind path
> > which we can jump to directly if the consecutive page pinning finds
> > that we're exceeding the user's memory limits.  This avoids the
> > current lazy approach which does accounting and mapping up to the
> > fault, only to return an error on the next iteration to unwind the
> > entire vfio_dma.
> > 
> > Cc: sta...@vger.kernel.org
> > Signed-off-by: Alex Williamson 
> > 
> > diff --git a/drivers/vfio/vfio_iommu_type1.c 
> > b/drivers/vfio/vfio_iommu_type1.c
> > index 
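The accounting change Alex describes — take the write lock and fail rather than defer to a workqueue, and recheck the limit under that same lock — can be modeled in plain C. This is a userspace sketch with made-up names (`mm_model`, `lock_acct`); the real code operates on `mm->locked_vm` under `mmap_sem` and tests `CAP_IPC_LOCK`:

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Userspace model of the vfio accounting fix: the limit is checked
 * under the same lock that updates the counter, so a racing caller
 * can no longer slip extra pages past the limit. */
struct mm_model {
    pthread_mutex_t lock;   /* stands in for mm->mmap_sem */
    long locked_vm;         /* pages currently accounted */
    long limit;             /* RLIMIT_MEMLOCK in pages */
};

/* Returns 0 on success, -ENOMEM if the pin would exceed the limit
 * and the task lacks the lock capability. Decrements (npage < 0)
 * are always allowed, as in the patch. */
static int lock_acct(struct mm_model *mm, long npage, int lock_cap)
{
    int ret = 0;

    if (!npage)
        return 0;

    pthread_mutex_lock(&mm->lock);      /* down_write_killable() */
    if (npage > 0 && !lock_cap &&
        mm->locked_vm + npage > mm->limit)
        ret = -ENOMEM;                  /* caller unwinds the pin */
    else
        mm->locked_vm += npage;
    pthread_mutex_unlock(&mm->lock);

    return ret;
}
```

The key point, matching the commit message, is that an error is returned for the caller to unwind instead of deferring the update, which is what allowed the over-penalize/race behavior in the workqueue version.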

Re: [PATCH] ACPICA: Export mutex functions

2017-04-17 Thread Rafael J. Wysocki
On Mon, Apr 17, 2017 at 11:03 PM, Guenter Roeck  wrote:
> On Mon, Apr 17, 2017 at 08:40:38PM +, Moore, Robert wrote:
>>
>>
>> > -Original Message-
>> > From: Guenter Roeck [mailto:li...@roeck-us.net]
>> > Sent: Monday, April 17, 2017 12:45 PM
>> > To: Moore, Robert 
>> > Cc: Zheng, Lv ; Wysocki, Rafael J
>> > ; 'Len Brown' ; 'linux-
>> > a...@vger.kernel.org' ; 'de...@acpica.org'
>> > ; 'linux-kernel@vger.kernel.org' > > ker...@vger.kernel.org>; Box, David E 
>> > Subject: Re: [PATCH] ACPICA: Export mutex functions
>> >
>> > On Mon, Apr 17, 2017 at 07:27:37PM +, Moore, Robert wrote:
>> > >
>> > > > -Original Message-
>> > > > From: Moore, Robert
>> > > > Sent: Monday, April 17, 2017 10:13 AM
>> > > > To: Guenter Roeck ; Zheng, Lv
>> > > > 
>> > > > Cc: Wysocki, Rafael J ; Len Brown
>> > > > ; linux-a...@vger.kernel.org; de...@acpica.org;
>> > > > linux- ker...@vger.kernel.org
>> > > > Subject: RE: [PATCH] ACPICA: Export mutex functions
>> > > >
>> > > > There is a model for the drivers to directly acquire an AML mutex
>> > > > object. That is why the acquire/release public interfaces were added
>> > > > to ACPICA.
>> > > >
>> > > > I forget all of the details, but the model was developed with MS and
>> > > > others during the ACPI 6.0 timeframe.
>> > > >
>> > > >
>> > > [Moore, Robert]
>> > >
>> > >
>> > > Here is the case where the OS may need to directly acquire an AML
>> > mutex:
>> > >
>> > > From the ACPI spec:
>> > >
>> > > 19.6.2 Acquire (Acquire a Mutex)
>> > >
>> > > Note: For Mutex objects referenced by a _DLM object, the host OS may
>> > also contend for ownership.
>> > >
>> > From the context in the dsdt, and from description of expected use cases
>> > for _DLM objects I can find, this is what the mutex is used for (to
>> > serialize access to a resource on a low pin count serial interconnect,
>> > aka LPC).
>> >
>> > What does that mean in practice ? That I am not supposed to use it
>> > because it doesn't follow standard ACPI mutex declaration rules ?
>> >
>> > Thanks,
>> > Guenter
>> >
>> > >
>> [Moore, Robert]
>>
>> I'm not an expert on the _DLM method, but I would point you to the 
>> description section in the ACPI spec, 5.7.5 _DLM (DeviceLock Mutex).
>>
>
> I did. However, not being an ACPI expert, that doesn't tell me anything.

Basically, if the kernel and AML need to access a device concurrently,
there should be a _DLM object under that device in the ACPI tables.
In that case it is expected to return a list of (AML) mutexes that can
be acquired by the kernel in order to synchronize device access with
respect to AML (and for each mutex it may also return a description of
the specific resources to be protected by it).

Bottom line: without _DLM, the kernel cannot synchronize things with
respect to AML properly, because it has no information on how to do
that.

Thanks,
Rafael
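For illustration, the usage pattern under discussion looks roughly like the sketch below. This is not buildable outside a kernel tree, and the mutex pathname ("\\_SB.PCI0.LPCB.MUT0") is a made-up example; a real driver would obtain the name from the device's _DLM package:

```c
/* Kernel-side sketch: acquiring an AML mutex advertised via _DLM
 * before touching a resource shared with AML, using the ACPICA
 * public interfaces the patch proposes to export. */
#include <acpi/acpi.h>

static int touch_shared_device(void)
{
	acpi_status status;

	/* NULL handle: the pathname is absolute; 0x10 is a timeout in ms */
	status = acpi_acquire_mutex(NULL, "\\_SB.PCI0.LPCB.MUT0", 0x10);
	if (ACPI_FAILURE(status))
		return -EBUSY;

	/* ... access the resource also used by AML ... */

	acpi_release_mutex(NULL, "\\_SB.PCI0.LPCB.MUT0");
	return 0;
}
```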



[tip:WIP.timers 8/10] kernel//time/timer_migration.c:190:4: error: implicit declaration of function 'timer_expire_remote'

2017-04-17 Thread kbuild test robot
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git WIP.timers
head:   c0b7a5dbb870d1660aa5e566c5ce9972290a2bed
commit: 6a3164fa4cd35a587b5bb2e4bd86b75900af8286 [8/10] timer: Implement the 
hierarchical pull model
config: i386-randconfig-r0-201716 (attached as .config)
compiler: gcc-5 (Debian 5.4.1-2) 5.4.1 20160904
reproduce:
git checkout 6a3164fa4cd35a587b5bb2e4bd86b75900af8286
# save the attached .config to linux build tree
make ARCH=i386 

All errors (new ones prefixed by >>):

   In file included from kernel//time/timer_migration.c:16:0:
   kernel//time/timer_migration.h: In function 'tmigr_cpu_idle':
   kernel//time/timer_migration.h:26:1: warning: no return statement in 
function returning non-void [-Wreturn-type]
static inline u64 tmigr_cpu_idle(u64 nextevt) { }
^
   kernel//time/timer_migration.c: In function '__tmigr_handle_remote':
>> kernel//time/timer_migration.c:190:4: error: implicit declaration of 
>> function 'timer_expire_remote' [-Werror=implicit-function-declaration]
   timer_expire_remote(evt->cpu);
   ^
   kernel//time/timer_migration.c: At top level:
>> kernel//time/timer_migration.c:223:6: error: redefinition of 
>> 'tmigr_handle_remote'
void tmigr_handle_remote(void)
 ^
   In file included from kernel//time/timer_migration.c:16:0:
   kernel//time/timer_migration.h:25:20: note: previous definition of 
'tmigr_handle_remote' was here
static inline void tmigr_handle_remote(void) { }
   ^
>> kernel//time/timer_migration.c:348:5: error: redefinition of 'tmigr_cpu_idle'
u64 tmigr_cpu_idle(u64 nextevt)
^
   In file included from kernel//time/timer_migration.c:16:0:
   kernel//time/timer_migration.h:26:19: note: previous definition of 
'tmigr_cpu_idle' was here
static inline u64 tmigr_cpu_idle(u64 nextevt) { }
  ^
>> kernel//time/timer_migration.c:406:6: error: redefinition of 
>> 'tmigr_cpu_activate'
void tmigr_cpu_activate(void)
 ^
   In file included from kernel//time/timer_migration.c:16:0:
   kernel//time/timer_migration.h:27:20: note: previous definition of 
'tmigr_cpu_activate' was here
static inline void tmigr_cpu_activate(void) { }
   ^
   cc1: some warnings being treated as errors

vim +/timer_expire_remote +190 kernel//time/timer_migration.c

10  #include 
11  #include 
12  #include 
13  #include 
14  #include 
15  
  > 16  #include "timer_migration.h"
17  #include "tick-internal.h"
18  
19  #ifdef DEBUG
20  # define DBG_BUG_ON(x)  BUG_ON(x)
21  #else
22  # define DBG_BUG_ON(x)
23  #endif
24  
25  /* Per group capacity. Must be a power of 2! */
26  static const unsigned int tmigr_childs_per_group = 8;
27  
28  bool tmigr_enabled __read_mostly;
29  static unsigned int tmigr_hierarchy_levels __read_mostly;
30  static unsigned int tmigr_crossnode_level __read_mostly;
31  static struct list_head *tmigr_level_list __read_mostly;
32  
33  static DEFINE_MUTEX(tmigr_mutex);
34  
35  static DEFINE_PER_CPU(struct tmigr_cpu, tmigr_cpu);
36  
37  static void tmigr_add_evt(struct tmigr_group *group, struct tmigr_event 
*evt)
38  {
39  /*
40   * Can be called with @evt == NULL, an already queued @evt or
41   * an event that do not need to be queued (expires ==
42   * KTIME_MAX)
43   */
44  if (!evt || !RB_EMPTY_NODE(&evt->nextevt.node) ||
45  evt->nextevt.expires == KTIME_MAX)
46  return;
47  
48  /* @group->group event must not be queued in the parent group */
49  DBG_BUG_ON(!RB_EMPTY_NODE(&group->groupevt.nextevt.node));
50  
51  /*  If this is the new first to expire event, update group 
event */
52  if (timerqueue_add(&group->events, &evt->nextevt)) {
53  group->groupevt.nextevt.expires = evt->nextevt.expires;
54  group->groupevt.cpu = evt->cpu;
55  }
56  }
57  
58  static void tmigr_remove_evt(struct tmigr_group *group, struct 
tmigr_event *evt)
59  {
60  struct timerqueue_node *next;
61  struct tmigr_event *nextevt;
62  bool first;
63  
64  /*
65   * It's safe to modify the group event of this group, because 
it is
66   * not queued in the parent group.
67   */
68  DBG_BUG_ON(!RB_EMPTY_NODE(&group->groupevt.nextevt.node));
69  
70  /* Remove the child event, if pending */
71  if (!evt || RB_EMPTY_NODE(&evt->nextevt.node))
72  return;
73  /*
74   * If this was the last queued event in the group, clear
75   * the group event. If this was the first event to expire,
76   * update the group.
77   */
78  first = 

[tip:WIP.timers 8/10] kernel/time/timer_migration.h:26:1: warning: no return statement in function returning non-void

2017-04-17 Thread kbuild test robot
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git WIP.timers
head:   c0b7a5dbb870d1660aa5e566c5ce9972290a2bed
commit: 6a3164fa4cd35a587b5bb2e4bd86b75900af8286 [8/10] timer: Implement the 
hierarchical pull model
config: x86_64-randconfig-x012-201716 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
git checkout 6a3164fa4cd35a587b5bb2e4bd86b75900af8286
# save the attached .config to linux build tree
make ARCH=x86_64 

All warnings (new ones prefixed by >>):

   In file included from kernel/time/timer.c:54:0:
   kernel/time/timer_migration.h: In function 'tmigr_cpu_idle':
>> kernel/time/timer_migration.h:26:1: warning: no return statement in function 
>> returning non-void [-Wreturn-type]
static inline u64 tmigr_cpu_idle(u64 nextevt) { }
^~

vim +26 kernel/time/timer_migration.h

10  #define _KERNEL_TIME_MIGRATION_H
11  
12  #ifdef CONFIG_NO_HZ_COMMON
13  extern bool tmigr_enabled;
14  void tmigr_init(void);
15  #else
16  static inline void tmigr_init(void) { }
17  #endif
18  
19  #if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ_COMMON)
20  extern void tmigr_handle_remote(void);
21  extern u64 tmigr_cpu_idle(u64 nextevt);
22  extern void tmigr_cpu_activate(void);
23  extern void timer_expire_remote(unsigned int cpu);
24  #else
25  static inline void tmigr_handle_remote(void) { }
  > 26  static inline u64 tmigr_cpu_idle(u64 nextevt) { }
27  static inline void tmigr_cpu_activate(void) { }
28  #endif
29  
30  #define TMIGR_NONE  (~0U)
31  
32  /**
33   * struct tmigr_event - A timer event associated to a CPU or a group
34   * @nextevt:The node to enqueue an event in the group queue

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip
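The warning itself is straightforward: the stub version of tmigr_cpu_idle() is declared to return u64 but has an empty body, so callers read an undefined value. One plausible fix, modeled here in plain C (returning the argument unchanged is an assumption; the real stub might prefer a sentinel such as KTIME_MAX):

```c
#include <stdint.h>

typedef uint64_t u64;

/* Buggy stub, as in the report: declared u64 but returns nothing,
 * which triggers -Wreturn-type and leaves callers with garbage:
 *
 *     static inline u64 tmigr_cpu_idle(u64 nextevt) { }
 *
 * With timer migration compiled out, the CPU's next event is simply
 * whatever the caller already computed, so the stub can pass it
 * through unchanged: */
static inline u64 tmigr_cpu_idle(u64 nextevt)
{
	return nextevt;
}
```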


Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-17 Thread Benjamin Herrenschmidt
On Mon, 2017-04-17 at 10:52 -0600, Logan Gunthorpe wrote:
> 
> On 17/04/17 01:20 AM, Benjamin Herrenschmidt wrote:
> > But is it ? For example take a GPU, does it, in your scheme, need an
> > additional "p2pmem" child ? Why can't the GPU driver just use some
> > helper to instantiate the necessary struct pages ? What does having an
> > actual "struct device" child buys you ?
> 
> Yes, in this scheme, it needs an additional p2pmem child. Why is that an
> issue? It certainly makes it a lot easier for the user to understand the
> p2pmem memory in the system (through the sysfs tree) and reason about
> the topology and when to use it. This is important.

Is it ? Again, you create a "concept" the user may have no idea about,
"p2pmem memory". So now any kind of memory buffer on a device can could
be use for p2p but also potentially a bunch of other things becomes
special and called "p2pmem" ...

> > > 2) In order to create the struct pages we use the ZONE_DEVICE
> > > infrastructure which requires a struct device. (See
> > > devm_memremap_pages.)
> > 
> > Yup, but you already have one in the actual pci_dev ... What is the
> > benefit of adding a second one ?
> 
> But that would tie all of this very tightly to be pci only and may get
> hard to differentiate if more users of ZONE_DEVICE crop up who happen to
> be using a pci device.

But what do you have in p2pmem that somebody benefits from. Again I
don't understand what that "p2pmem" device buys you in terms of
functionality vs. having the device just instantiate the pages.

Now having some kind of way to override the dma_ops, yes I do get that,
and it could be that this "p2pmem" is typically the way to do it, but
at the moment you don't even have that. So I'm a bit at a loss here.
 
>  Having a specific class for this makes it very
> clear how this memory would be handled.

But it doesn't *have* to be. Again, take my GPU example. The fact that
a NIC might be able to DMA into it doesn't make it specifically "p2p
memory".

Essentially you are saying that any device that happens to have a piece
of mappable "memory" (or something that behaves like it) and can be
DMA'ed into should now have that "p2pmem" thing attached to it.

Now take an example where that becomes really awkward (it's also a real
example of something people want to do). I have a NIC and a GPU, the
NIC DMA's data to/from the GPU, but they also want to poke at each
other doorbell, the GPU to kick the NIC into action when data is ready
to send, the NIC to poke the GPU when data has been received.

Those doorbells are MMIO registers.

So now your "p2pmem" device needs to also be laid out on top of those
MMIO registers ? It's becoming weird.

See, basically, doing peer 2 peer between devices has 3 main challenges
today: The DMA API needing struct pages, the MMIO translation issues
and the IOMMU translation issues.

You seem to create that added device as some kind of "owner" for the
struct pages, solving #1, but leave #2 and #3 alone.

Now, as I said, it could very well be that having the devmap pointer
point to some specific device-type with a well known structure to
provide solutions for #2 and #3 such as dma_ops overrides, is indeed
the right way to solve these problems.

If we go down that path, though, rather than calling it p2pmem I would
call it something like dma_target which I find much clearer especially
since it doesn't have to be just memory.

For the sole case of creating struct page's however, I fail to see the
point.

>  For example, although I haven't
> looked into it, this could very well be a point of conflict with HMM. If
> they were to use the pci device to populate the dev_pagemap then we
> couldn't also use the pci device. I feel it's much better for users of
> dev_pagemap to have their struct devices they own to avoid such conflicts.

If we are going to create some sort of struct dma_target, HMM could
potentially just look for the parent if it needs the PCI device.

> > >  This amazingly gets us the get_dev_pagemap
> > > architecture which also uses a struct device. So by using a p2pmem
> > > device we can go from struct page to struct device to p2pmem device
> > > quickly and effortlessly.
> > 
> > Which isn't terribly useful in itself right ? What you care about is
> > the "enclosing" pci_dev no ? Or am I missing something ?
> 
> Sure it is. What if we want to someday support p2pmem that's on another bus?

But why not directly use that other bus' device in that case ?

> > > 3) You wouldn't want to use the pci's struct device because it doesn't
> > > really describe what's going on. For example, there may be multiple
> > > devices on the pci device in question: eg. an NVME card and some p2pmem.
> > 
> > What is "some p2pmem" ?
> > > Or it could be a NIC with some p2pmem.
> > 
> > Again what is "some p2pmem" ?
> 
> Some device local memory intended for use as a DMA target from a
> neighbour device or itself. On a PCI device, this would be a BAR, or a
> portion of a BAR with memory 

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-17 Thread Benjamin Herrenschmidt
On Mon, 2017-04-17 at 10:52 -0600, Logan Gunthorpe wrote:
> 
> On 17/04/17 01:20 AM, Benjamin Herrenschmidt wrote:
> > But is it ? For example take a GPU, does it, in your scheme, need an
> > additional "p2pmem" child ? Why can't the GPU driver just use some
> > helper to instantiate the necessary struct pages ? What does having an
> > actual "struct device" child buys you ?
> 
> Yes, in this scheme, it needs an additional p2pmem child. Why is that an
> issue? It certainly makes it a lot easier for the user to understand the
> p2pmem memory in the system (through the sysfs tree) and reason about
> the topology and when to use it. This is important.

Is it ? Again, you create a "concept" the user may have no idea about,
"p2pmem memory". So now any kind of memory buffer on a device that could
be used for p2p (but also potentially a bunch of other things) becomes
special and called "p2pmem" ...

> > > 2) In order to create the struct pages we use the ZONE_DEVICE
> > > infrastructure which requires a struct device. (See
> > > devm_memremap_pages.)
> > 
> > Yup, but you already have one in the actual pci_dev ... What is the
> > benefit of adding a second one ?
> 
> But that would tie all of this very tightly to be pci only and may get
> hard to differentiate if more users of ZONE_DEVICE crop up who happen to
> be using a pci device.

But what do you have in p2pmem that somebody benefits from? Again I
don't understand what that "p2pmem" device buys you in terms of
functionality vs. having the device just instantiate the pages.

Now having some kind of way to override the dma_ops, yes I do get that,
and it could be that this "p2pmem" is typically the way to do it, but
at the moment you don't even have that. So I'm a bit at a loss here.
 
>  Having a specific class for this makes it very
> clear how this memory would be handled.

But it doesn't *have* to be. Again, take my GPU example. The fact that
a NIC might be able to DMA into it doesn't make it specifically "p2p
memory".

Essentially you are saying that any device that happens to have a piece
of mappable "memory" (or something that behaves like it) and can be
DMA'ed into should now have that "p2pmem" thing attached to it.

Now take an example where that becomes really awkward (it's also a real
example of something people want to do). I have a NIC and a GPU, the
NIC DMA's data to/from the GPU, but they also want to poke at each
other's doorbells: the GPU to kick the NIC into action when data is ready
to send, the NIC to poke the GPU when data has been received.

Those doorbells are MMIO registers.

So now your "p2pmem" device needs to also be laid out on top of those
MMIO registers ? It's becoming weird.

See, basically, doing peer-to-peer between devices has 3 main challenges
today: the DMA API needing struct pages, the MMIO translation issues
and the IOMMU translation issues.

You seem to create that added device as some kind of "owner" for the
struct pages, solving #1, but leave #2 and #3 alone.

Now, as I said, it could very well be that having the devmap pointer
point to some specific device-type with a well known structure to
provide solutions for #2 and #3 such as dma_ops overrides, is indeed
the right way to solve these problems.

If we go down that path, though, rather than calling it p2pmem I would
call it something like dma_target which I find much clearer especially
since it doesn't have to be just memory.

For the sole case of creating struct pages, however, I fail to see the
point.

>  For example, although I haven't
> looked into it, this could very well be a point of conflict with HMM. If
> they were to use the pci device to populate the dev_pagemap then we
> couldn't also use the pci device. I feel it's much better for users of
> dev_pagemap to have their struct devices they own to avoid such conflicts.

If we are going to create some sort of struct dma_target, HMM could
potentially just look for the parent if it needs the PCI device.

> > >  This amazingly gets us the get_dev_pagemap
> > > architecture which also uses a struct device. So by using a p2pmem
> > > device we can go from struct page to struct device to p2pmem device
> > > quickly and effortlessly.
> > 
> > Which isn't terribly useful in itself right ? What you care about is
> > the "enclosing" pci_dev no ? Or am I missing something ?
> 
> Sure it is. What if we want to someday support p2pmem that's on another bus?

But why not directly use that other bus' device in that case ?

> > > 3) You wouldn't want to use the pci's struct device because it doesn't
> > > really describe what's going on. For example, there may be multiple
> > > devices on the pci device in question: eg. an NVME card and some p2pmem.
> > 
> > What is "some p2pmem" ?
> > > Or it could be a NIC with some p2pmem.
> > 
> > Again what is "some p2pmem" ?
> 
> Some device local memory intended for use as a DMA target from a
> neighbour device or itself. On a PCI device, this would be a BAR, or a
> portion of a BAR with memory 
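To make the disputed step concrete, here is a kernel-style pseudocode sketch of the struct-page creation both sides agree is needed; whether the device passed in is the pci_dev itself or a child "p2pmem"/"dma_target" device is exactly the open question. The BAR index, the percpu_ref, and the 4.11-era devm_memremap_pages() signature are assumptions for illustration, not code from the series:

```c
/* Pseudocode sketch only -- not from the patch series under discussion.
 * Map a hypothetical BAR of a PCI device as ZONE_DEVICE memory so the
 * DMA API can obtain struct pages for it.
 */
struct resource *res = &pdev->resource[4];   /* hypothetical p2p BAR */
struct device *dev = &pdev->dev;             /* or a child "p2pmem"/"dma_target" device */
void *addr;

addr = devm_memremap_pages(dev, res, &ref, NULL);  /* ref: a percpu_ref */
if (IS_ERR(addr))
	return PTR_ERR(addr);
/* get_dev_pagemap() can now walk from any of these pages back to dev */
```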

Re: [PATCH] ACPICA: Export mutex functions

2017-04-17 Thread Guenter Roeck
On Mon, Apr 17, 2017 at 08:40:38PM +, Moore, Robert wrote:
> 
> 
> > -Original Message-
> > From: Guenter Roeck [mailto:li...@roeck-us.net]
> > Sent: Monday, April 17, 2017 12:45 PM
> > To: Moore, Robert 
> > Cc: Zheng, Lv ; Wysocki, Rafael J
> > ; 'Len Brown' ; 'linux-
> > a...@vger.kernel.org' ; 'de...@acpica.org'
> > ; 'linux-kernel@vger.kernel.org'  > ker...@vger.kernel.org>; Box, David E 
> > Subject: Re: [PATCH] ACPICA: Export mutex functions
> > 
> > On Mon, Apr 17, 2017 at 07:27:37PM +, Moore, Robert wrote:
> > >
> > > > -Original Message-
> > > > From: Moore, Robert
> > > > Sent: Monday, April 17, 2017 10:13 AM
> > > > To: Guenter Roeck ; Zheng, Lv
> > > > 
> > > > Cc: Wysocki, Rafael J ; Len Brown
> > > > ; linux-a...@vger.kernel.org; de...@acpica.org;
> > > > linux- ker...@vger.kernel.org
> > > > Subject: RE: [PATCH] ACPICA: Export mutex functions
> > > >
> > > > There is a model for the drivers to directly acquire an AML mutex
> > > > object. That is why the acquire/release public interfaces were added
> > > > to ACPICA.
> > > >
> > > > I forget all of the details, but the model was developed with MS and
> > > > others during the ACPI 6.0 timeframe.
> > > >
> > > >
> > > [Moore, Robert]
> > >
> > >
> > > Here is the case where the OS may need to directly acquire an AML
> > mutex:
> > >
> > > From the ACPI spec:
> > >
> > > 19.6.2 Acquire (Acquire a Mutex)
> > >
> > > Note: For Mutex objects referenced by a _DLM object, the host OS may
> > also contend for ownership.
> > >
> > From the context in the dsdt, and from description of expected use cases
> > for _DLM objects I can find, this is what the mutex is used for (to
> > serialize access to a resource on a low pin count serial interconnect,
> > aka LPC).
> > 
> > What does that mean in practice ? That I am not supposed to use it
> > because it doesn't follow standard ACPI mutex declaration rules ?
> > 
> > Thanks,
> > Guenter
> > 
> > >
> [Moore, Robert] 
> 
> I'm not an expert on the _DLM method, but I would point you to the 
> description section in the ACPI spec, 5.7.5 _DLM (DeviceLock Mutex).
> 

I did. However, not being an ACPI expert, that doesn't tell me anything.

Guenter

> 
> 
> > >
> > >
> > > Other than this case, the OS/drivers should never need to directly
> > acquire an AML mutex.
> > > Bob
> > >



Re: [PATCH] device-mapper: Convert printks to pr_ macros

2017-04-17 Thread kbuild test robot
Hi Joe,

[auto build test ERROR on v4.9-rc8]
[also build test ERROR on next-20170413]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Joe-Perches/device-mapper-Convert-printks-to-pr_-level-macros/20170418-030508
config: i386-allmodconfig (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

All error/warnings (new ones prefixed by >>):

   In file included from fs/nfs/blocklayout/blocklayout.h:35:0,
from fs/nfs/blocklayout/dev.c:11:
   fs/nfs/blocklayout/dev.c: In function 'bl_free_device':
>> include/linux/device-mapper.h:536:34: error: expected ')' before 
>> 'DM_MSG_PREFIX'
#define pr_fmt(fmt) DM_NAME ": " DM_MSG_PREFIX ": " fmt
 ^
>> include/linux/printk.h:277:18: note: in expansion of macro 'pr_fmt'
 printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
 ^~
>> fs/nfs/blocklayout/dev.c:33:5: note: in expansion of macro 'pr_err'
pr_err("failed to unregister PR key.\n");
^~
   fs/nfs/blocklayout/dev.c: In function 'nfs4_block_decode_volume':
>> include/linux/device-mapper.h:536:34: error: expected ')' before 
>> 'DM_MSG_PREFIX'
#define pr_fmt(fmt) DM_NAME ": " DM_MSG_PREFIX ": " fmt
 ^
   include/linux/printk.h:284:19: note: in expansion of macro 'pr_fmt'
 printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
  ^~
>> fs/nfs/blocklayout/dev.c:81:5: note: in expansion of macro 'pr_info'
pr_info("signature too long: %d\n",
^~~
   fs/nfs/blocklayout/dev.c: In function 'bl_validate_designator':
>> include/linux/device-mapper.h:536:34: error: expected ')' before 
>> 'DM_MSG_PREFIX'
#define pr_fmt(fmt) DM_NAME ": " DM_MSG_PREFIX ": " fmt
 ^
>> include/linux/printk.h:277:18: note: in expansion of macro 'pr_fmt'
 printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
 ^~
   fs/nfs/blocklayout/dev.c:287:3: note: in expansion of macro 'pr_err'
  pr_err("pNFS: unsupported designator "
  ^~
>> include/linux/device-mapper.h:536:34: error: expected ')' before 
>> 'DM_MSG_PREFIX'
#define pr_fmt(fmt) DM_NAME ": " DM_MSG_PREFIX ": " fmt
 ^
>> include/linux/printk.h:277:18: note: in expansion of macro 'pr_fmt'
 printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
 ^~
   fs/nfs/blocklayout/dev.c:294:3: note: in expansion of macro 'pr_err'
  pr_err("pNFS: invalid designator "
  ^~
   fs/nfs/blocklayout/dev.c: In function 'bl_open_udev_path':
>> include/linux/device-mapper.h:536:34: error: expected ')' before 
>> 'DM_MSG_PREFIX'
#define pr_fmt(fmt) DM_NAME ": " DM_MSG_PREFIX ": " fmt
 ^
   include/linux/printk.h:279:22: note: in expansion of macro 'pr_fmt'
 printk(KERN_WARNING pr_fmt(fmt), ##__VA_ARGS__)
 ^~
   include/linux/printk.h:280:17: note: in expansion of macro 'pr_warning'
#define pr_warn pr_warning
^~
>> fs/nfs/blocklayout/dev.c:320:3: note: in expansion of macro 'pr_warn'
  pr_warn("pNFS: failed to open device %s (%ld)\n",
  ^~~
   fs/nfs/blocklayout/dev.c: In function 'bl_parse_scsi':
>> include/linux/device-mapper.h:536:34: error: expected ')' before 
>> 'DM_MSG_PREFIX'
#define pr_fmt(fmt) DM_NAME ": " DM_MSG_PREFIX ": " fmt
 ^
   include/linux/printk.h:284:19: note: in expansion of macro 'pr_fmt'
 printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
  ^~
   fs/nfs/blocklayout/dev.c:373:2: note: in expansion of macro 'pr_info'
 pr_info("pNFS: using block device %s (reservation key 0x%llx)\n",
 ^~~
>> include/linux/device-mapper.h:536:34: error: expected ')' before 
>> 'DM_MSG_PREFIX'
#define pr_fmt(fmt) DM_NAME ": " DM_MSG_PREFIX ": " fmt
 ^
>> include/linux/printk.h:277:18: note: in expansion of macro 'pr_fmt'
 printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
 ^~
   fs/nfs/blocklayout/dev.c:378:3: note: in expansion of macro 'pr_err'
  pr_err("pNFS: block device %s does not support reservations.",
  ^~
>> include/linux/device-mapper.h:536:34: error: expected ')' before 
>> 'DM_MSG_PREFIX'
#define pr_fmt(fmt) DM_NAME ": " DM_MSG_PREFIX ": " fmt
 ^
>> include/linux/printk.h:277:18: note: in expansion of macro 'pr_fmt'
 printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
 ^~
   fs/nfs/blocklayout/dev.c:386:3: note: in expansion of macro 'pr_err'
  pr_err("pNFS: failed to register key for block device %s.",
  ^~
--
   In file included from 

[PATCH] mac80211: ibss: Fix channel type enum in ieee80211_sta_join_ibss()

2017-04-17 Thread Matthias Kaehlcke
cfg80211_chandef_create() expects an 'enum nl80211_channel_type' as
channel type, however in ieee80211_sta_join_ibss()
NL80211_CHAN_WIDTH_20_NOHT, which is of the enum type
'nl80211_chan_width', is passed on two occasions. Change the value to
NL80211_CHAN_NO_HT (20 MHz, non-HT channel) of the channel type enum.

Signed-off-by: Matthias Kaehlcke 
---
 net/mac80211/ibss.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/mac80211/ibss.c b/net/mac80211/ibss.c
index 98999d3d5262..e957351976a2 100644
--- a/net/mac80211/ibss.c
+++ b/net/mac80211/ibss.c
@@ -425,7 +425,7 @@ static void ieee80211_sta_join_ibss(struct ieee80211_sub_if_data *sdata,
 	case NL80211_CHAN_WIDTH_5:
 	case NL80211_CHAN_WIDTH_10:
 		cfg80211_chandef_create(&chandef, cbss->channel,
-					NL80211_CHAN_WIDTH_20_NOHT);
+					NL80211_CHAN_NO_HT);
 		chandef.width = sdata->u.ibss.chandef.width;
 		break;
 	case NL80211_CHAN_WIDTH_80:
@@ -437,7 +437,7 @@ static void ieee80211_sta_join_ibss(struct ieee80211_sub_if_data *sdata,
 	default:
 		/* fall back to 20 MHz for unsupported modes */
 		cfg80211_chandef_create(&chandef, cbss->channel,
-					NL80211_CHAN_WIDTH_20_NOHT);
+					NL80211_CHAN_NO_HT);
 		break;
 	}
 
-- 
2.12.2.762.g0e3151a226-goog



[PATCH] sched/rt: minimize rq->lock contention in do_sched_rt_period_timer()

2017-04-17 Thread Dave Kleikamp
With CONFIG_RT_GROUP_SCHED defined, do_sched_rt_period_timer() sequentially
takes each cpu's rq->lock. On a large, busy system, the cumulative time it
takes to acquire each lock can be excessive, even triggering a watchdog
timeout.

If rt_rq->rt_time and rt_rq->rt_nr_running are both zero, this function does
nothing while holding the lock, so don't bother taking it at all.

Orabug: 25491970

Signed-off-by: Dave Kleikamp 
Cc: Ingo Molnar 
Cc: Peter Zijlstra 
---
 kernel/sched/rt.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 9f3e40226dec..ae4a8c529a02 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -840,6 +840,17 @@ static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun)
 		int enqueue = 0;
 		struct rt_rq *rt_rq = sched_rt_period_rt_rq(rt_b, i);
 		struct rq *rq = rq_of_rt_rq(rt_rq);
+		int skip;
+
+		/*
+		 * When span == cpu_online_mask, taking each rq->lock
+		 * can be time-consuming. Try to avoid it when possible.
+		 */
+		raw_spin_lock(&rt_rq->rt_runtime_lock);
+		skip = !rt_rq->rt_time && !rt_rq->rt_nr_running;
+		raw_spin_unlock(&rt_rq->rt_runtime_lock);
+		if (skip)
+			continue;
 
 		raw_spin_lock(&rq->lock);
 		if (rt_rq->rt_time) {
-- 
2.12.2


[PATCH] hidma_mgmt_sys: Use devm_kmalloc_array() in hidma_mgmt_init_sys()

2017-04-17 Thread SF Markus Elfring
From: Markus Elfring 
Date: Mon, 17 Apr 2017 22:42:40 +0200

* A multiplication for the size determination of a memory allocation
  indicated that an array data structure should be processed.
  Thus use the corresponding function "devm_kmalloc_array".

  This issue was detected by using the Coccinelle software.

* Delete the local variable "required" which became unnecessary
  with this refactoring.

Signed-off-by: Markus Elfring 
---
 drivers/dma/qcom/hidma_mgmt_sys.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/dma/qcom/hidma_mgmt_sys.c 
b/drivers/dma/qcom/hidma_mgmt_sys.c
index d61f1068a34b..e7222a0e9ba4 100644
--- a/drivers/dma/qcom/hidma_mgmt_sys.c
+++ b/drivers/dma/qcom/hidma_mgmt_sys.c
@@ -245,11 +245,12 @@ int hidma_mgmt_init_sys(struct hidma_mgmt_dev *mdev)
 {
unsigned int i;
int rc;
-   int required;
struct kobject *chanops;
 
-	required = sizeof(*mdev->chroots) * mdev->dma_channels;
-	mdev->chroots = devm_kmalloc(&mdev->pdev->dev, required, GFP_KERNEL);
+	mdev->chroots = devm_kmalloc_array(&mdev->pdev->dev,
+					   mdev->dma_channels,
+					   sizeof(*mdev->chroots),
+					   GFP_KERNEL);
if (!mdev->chroots)
return -ENOMEM;
 
-- 
2.12.2



[PATCH] soc/fsl/qbman: Disable IRQs for deferred QBMan work

2017-04-17 Thread Roy Pledge
Work for Congestion State Notifications (CSCN) and Message Ring (MR)
handling is deferred via the workqueue mechanism. This requires the
driver to disable those IRQs before scheduling the work and re-enable
them once the work is completed so that the interrupts don't continually
fire.

Signed-off-by: Roy Pledge 
---
 drivers/soc/fsl/qbman/qman.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/soc/fsl/qbman/qman.c b/drivers/soc/fsl/qbman/qman.c
index 3d891db..18eefc3 100644
--- a/drivers/soc/fsl/qbman/qman.c
+++ b/drivers/soc/fsl/qbman/qman.c
@@ -1344,6 +1344,7 @@ static void qm_congestion_task(struct work_struct *work)
 	if (!qm_mc_result_timeout(&p->p, &mcr)) {
 		spin_unlock(&p->cgr_lock);
 		dev_crit(p->config->dev, "QUERYCONGESTION timeout\n");
+		qman_p_irqsource_add(p, QM_PIRQ_CSCI);
 		return;
 	}
/* mask out the ones I'm not interested in */
@@ -1358,6 +1359,7 @@ static void qm_congestion_task(struct work_struct *work)
if (cgr->cb && qman_cgrs_get(, cgr->cgrid))
cgr->cb(p, cgr, qman_cgrs_get(, cgr->cgrid));
 	spin_unlock(&p->cgr_lock);
+   qman_p_irqsource_add(p, QM_PIRQ_CSCI);
 }
 
 static void qm_mr_process_task(struct work_struct *work)
@@ -1417,12 +1419,14 @@ static void qm_mr_process_task(struct work_struct *work)
}
 
 	qm_mr_cci_consume(&p->p, num);
+   qman_p_irqsource_add(p, QM_PIRQ_MRI);
preempt_enable();
 }
 
 static u32 __poll_portal_slow(struct qman_portal *p, u32 is)
 {
if (is & QM_PIRQ_CSCI) {
+   qman_p_irqsource_remove(p, QM_PIRQ_CSCI);
 		queue_work_on(smp_processor_id(), qm_portal_wq,
 			      &p->congestion_work);
}
@@ -1434,6 +1438,7 @@ static u32 __poll_portal_slow(struct qman_portal *p, u32 is)
}
 
if (is & QM_PIRQ_MRI) {
+   qman_p_irqsource_remove(p, QM_PIRQ_MRI);
 		queue_work_on(smp_processor_id(), qm_portal_wq,
 			      &p->mr_work);
}
-- 
2.7.4



[PATCH v2] net: cx89x0: move attribute declaration before struct keyword

2017-04-17 Thread Stefan Agner
The attribute declaration is typically before the definition. Move
the __maybe_unused attribute declaration before the struct keyword.

Signed-off-by: Stefan Agner 
---
Changes in v2:
- Move __maybe_unused after the complete type

 drivers/net/ethernet/cirrus/cs89x0.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/cirrus/cs89x0.c 
b/drivers/net/ethernet/cirrus/cs89x0.c
index 3647b28e8de0..47384f7323ac 100644
--- a/drivers/net/ethernet/cirrus/cs89x0.c
+++ b/drivers/net/ethernet/cirrus/cs89x0.c
@@ -1896,7 +1896,7 @@ static int cs89x0_platform_remove(struct platform_device 
*pdev)
return 0;
 }
 
-static const struct __maybe_unused of_device_id cs89x0_match[] = {
+static const struct of_device_id __maybe_unused cs89x0_match[] = {
{ .compatible = "cirrus,cs8900", },
{ .compatible = "cirrus,cs8920", },
{ },
-- 
2.12.2



Re: [PATCH] net: cx89x0: move attribute declaration before struct keyword

2017-04-17 Thread David Miller
From: Stefan Agner 
Date: Mon, 17 Apr 2017 13:31:28 -0700

> Given that, can you reconsider?

Please put the attribute after the complete type.

Thanks.


RE: [PATCH] ACPICA: Export mutex functions

2017-04-17 Thread Moore, Robert


> -Original Message-
> From: Guenter Roeck [mailto:li...@roeck-us.net]
> Sent: Monday, April 17, 2017 12:45 PM
> To: Moore, Robert 
> Cc: Zheng, Lv ; Wysocki, Rafael J
> ; 'Len Brown' ; 'linux-
> a...@vger.kernel.org' ; 'de...@acpica.org'
> ; 'linux-kernel@vger.kernel.org'  ker...@vger.kernel.org>; Box, David E 
> Subject: Re: [PATCH] ACPICA: Export mutex functions
> 
> On Mon, Apr 17, 2017 at 07:27:37PM +, Moore, Robert wrote:
> >
> > > -Original Message-
> > > From: Moore, Robert
> > > Sent: Monday, April 17, 2017 10:13 AM
> > > To: Guenter Roeck ; Zheng, Lv
> > > 
> > > Cc: Wysocki, Rafael J ; Len Brown
> > > ; linux-a...@vger.kernel.org; de...@acpica.org;
> > > linux- ker...@vger.kernel.org
> > > Subject: RE: [PATCH] ACPICA: Export mutex functions
> > >
> > > There is a model for the drivers to directly acquire an AML mutex
> > > object. That is why the acquire/release public interfaces were added
> > > to ACPICA.
> > >
> > > I forget all of the details, but the model was developed with MS and
> > > others during the ACPI 6.0 timeframe.
> > >
> > >
> > [Moore, Robert]
> >
> >
> > Here is the case where the OS may need to directly acquire an AML
> mutex:
> >
> > From the ACPI spec:
> >
> > 19.6.2 Acquire (Acquire a Mutex)
> >
> > Note: For Mutex objects referenced by a _DLM object, the host OS may
> also contend for ownership.
> >
> From the context in the dsdt, and from description of expected use cases
> for _DLM objects I can find, this is what the mutex is used for (to
> serialize access to a resource on a low pin count serial interconnect,
> aka LPC).
> 
> What does that mean in practice ? That I am not supposed to use it
> because it doesn't follow standard ACPI mutex declaration rules ?
> 
> Thanks,
> Guenter
> 
> >
[Moore, Robert] 

I'm not an expert on the _DLM method, but I would point you to the description 
section in the ACPI spec, 5.7.5 _DLM (DeviceLock Mutex).





> >
> >
> > Other than this case, the OS/drivers should never need to directly
> acquire an AML mutex.
> > Bob
> >


Re: [PATCH] um: Include kbuild.h instead of duplicating its macros

2017-04-17 Thread Matthias Kaehlcke
El Mon, Apr 03, 2017 at 12:54:58PM -0700 Matthias Kaehlcke ha dit:

> Signed-off-by: Matthias Kaehlcke 
> ---
>  arch/x86/um/shared/sysdep/kernel-offsets.h | 9 +
>  1 file changed, 1 insertion(+), 8 deletions(-)
> 
> diff --git a/arch/x86/um/shared/sysdep/kernel-offsets.h 
> b/arch/x86/um/shared/sysdep/kernel-offsets.h
> index 46a9df99f3c5..7e1d35b6ad5c 100644
> --- a/arch/x86/um/shared/sysdep/kernel-offsets.h
> +++ b/arch/x86/um/shared/sysdep/kernel-offsets.h
> @@ -2,16 +2,9 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  
> -#define DEFINE(sym, val) \
> - asm volatile("\n->" #sym " %0 " #val : : "i" (val))
> -
> -#define BLANK() asm volatile("\n->" : : )
> -
> -#define OFFSET(sym, str, mem) \
> - DEFINE(sym, offsetof(struct str, mem));
> -
>  void foo(void)
>  {
>  #include 
> -- 

Ping, any comment on this patch?

Cheers

Matthias


[PATCH] watchdog: f71808e_wdt: Add F71868 support

2017-04-17 Thread Maciej S. Szmigiero
This adds support for watchdog part of Fintek F71868 Super I/O chip to
f71808e_wdt driver.

The F71868 chip is, in general, very similar to the F71869; however, it
has a slightly different set of available reset pulse widths.

Tested on MSI A55M-P33 motherboard.

Signed-off-by: Maciej S. Szmigiero 
---
 drivers/watchdog/Kconfig   |  7 ---
 drivers/watchdog/f71808e_wdt.c | 27 ---
 2 files changed, 24 insertions(+), 10 deletions(-)

diff --git a/drivers/watchdog/Kconfig b/drivers/watchdog/Kconfig
index 8b9049dac094..1f23079537ad 100644
--- a/drivers/watchdog/Kconfig
+++ b/drivers/watchdog/Kconfig
@@ -829,11 +829,12 @@ config EBC_C384_WDT
  the timeout module parameter.
 
 config F71808E_WDT
-   tristate "Fintek F71808E, F71862FG, F71869, F71882FG and F71889FG 
Watchdog"
+   tristate "Fintek F718xx, F818xx Super I/O Watchdog"
depends on X86
help
- This is the driver for the hardware watchdog on the Fintek
- F71808E, F71862FG, F71869, F71882FG and F71889FG Super I/O 
controllers.
+ This is the driver for the hardware watchdog on the Fintek F71808E,
+ F71862FG, F71868, F71869, F71882FG, F71889FG, F81865 and F81866
+ Super I/O controllers.
 
  You can compile this driver directly into the kernel, or use
  it as a module.  The module will be called f71808e_wdt.
diff --git a/drivers/watchdog/f71808e_wdt.c b/drivers/watchdog/f71808e_wdt.c
index 1b7e9169072f..8658dba21768 100644
--- a/drivers/watchdog/f71808e_wdt.c
+++ b/drivers/watchdog/f71808e_wdt.c
@@ -57,6 +57,7 @@
 #define SIO_F71808_ID  0x0901  /* Chipset ID */
 #define SIO_F71858_ID  0x0507  /* Chipset ID */
 #define SIO_F71862_ID  0x0601  /* Chipset ID */
+#define SIO_F71868_ID  0x1106  /* Chipset ID */
 #define SIO_F71869_ID  0x0814  /* Chipset ID */
 #define SIO_F71869A_ID 0x1007  /* Chipset ID */
 #define SIO_F71882_ID  0x0541  /* Chipset ID */
@@ -101,7 +102,7 @@ MODULE_PARM_DESC(timeout,
 static unsigned int pulse_width = WATCHDOG_PULSE_WIDTH;
 module_param(pulse_width, uint, 0);
 MODULE_PARM_DESC(pulse_width,
-   "Watchdog signal pulse width. 0(=level), 1 ms, 25 ms, 125 ms or 5000 ms"
+   "Watchdog signal pulse width. 0(=level), 1, 25, 30, 125, 150, 5000 or 
6000 ms"
" (default=" __MODULE_STRING(WATCHDOG_PULSE_WIDTH) ")");
 
 static unsigned int f71862fg_pin = WATCHDOG_F71862FG_PIN;
@@ -119,13 +120,14 @@ module_param(start_withtimeout, uint, 0);
 MODULE_PARM_DESC(start_withtimeout, "Start watchdog timer on module load with"
" given initial timeout. Zero (default) disables this feature.");
 
-enum chips { f71808fg, f71858fg, f71862fg, f71869, f71882fg, f71889fg, f81865,
-f81866};
+enum chips { f71808fg, f71858fg, f71862fg, f71868, f71869, f71882fg, f71889fg,
+f81865, f81866};
 
 static const char *f71808e_names[] = {
"f71808fg",
"f71858fg",
"f71862fg",
+   "f71868",
"f71869",
"f71882fg",
"f71889fg",
@@ -252,16 +254,23 @@ static int watchdog_set_timeout(int timeout)
 static int watchdog_set_pulse_width(unsigned int pw)
 {
int err = 0;
+   unsigned int t1 = 25, t2 = 125, t3 = 5000;
+
+   if (watchdog.type == f71868) {
+   t1 = 30;
+   t2 = 150;
+   t3 = 6000;
+   }
 
mutex_lock();
 
-   if(pw <=1) {
+   if(pw <=  1) {
watchdog.pulse_val = 0;
-   } else if (pw <=   25) {
+   } else if (pw <= t1) {
watchdog.pulse_val = 1;
-   } else if (pw <=  125) {
+   } else if (pw <= t2) {
watchdog.pulse_val = 2;
-   } else if (pw <= 5000) {
+   } else if (pw <= t3) {
watchdog.pulse_val = 3;
} else {
pr_err("pulse width out of range\n");
@@ -354,6 +363,7 @@ static int watchdog_start(void)
goto exit_superio;
break;
 
+   case f71868:
case f71869:
/* GPIO14 --> WDTRST# */
superio_clear_bit(watchdog.sioaddr, SIO_REG_MFUNCT1, 4);
@@ -792,6 +802,9 @@ static int __init f71808e_find(int sioaddr)
watchdog.type = f71862fg;
err = f71862fg_pin_configure(0); /* validate module parameter */
break;
+   case SIO_F71868_ID:
+   watchdog.type = f71868;
+   break;
case SIO_F71869_ID:
case SIO_F71869A_ID:
watchdog.type = f71869;


Re: [PATCH] frv: Use OFFSET macro in DEF_*REG()

2017-04-17 Thread Matthias Kaehlcke
El Mon, Apr 03, 2017 at 12:46:36PM -0700 Matthias Kaehlcke ha dit:

> Avoid code duplication by using OFFSET() in DEF_*REG() instead of
> replicating the macro.
> 
> Signed-off-by: Matthias Kaehlcke 

Ping, any feedback on this patch?

Thanks

Matthias


Re: Microphone gain on N900

2017-04-17 Thread Pavel Machek
Hi!

>  v4.10 works quite nicely on N900, but I still have problems with
>  audio. Even GSM calls would be usable, if I had reasonable volume on
>  microphone and speaker... but I don't.
> 
>  Both speaker and microphone are too quiet. I can get louder output by
>  plugging headset -- good. I can hear the other side nicely, but am
>  too quiet for them. Unfortunately, even if I plug in wired headset,
>  internal microphone is still used.
> 
> >> For me these settings provide reasonable playback and capture volumes
> >> using v4.10.1:
> >>
> >> amixer -D hw:0 set PCM 100%
> >> amixer -D hw:0 set 'Line DAC' 80% on
> >> amixer -D hw:0 set 'Line' on
> >> amixer -D hw:0 set 'Left Line Mixer DACL1' on
> >> amixer -D hw:0 set 'Right Line Mixer DACR1' on
> >> amixer -D hw:0 set 'Jack Function' 'Headphone'
> >> amixer -D hw:0 set 'TPA6130A2 Headphone' 80%
> >>
> >> amixer -D hw:0 set 'Jack Function' 'Headset'
> >> amixer -D hw:0 set 'AGC' off
> >> amixer -D hw:0 set 'Input Select' 'ADC'
> >> amixer -D hw:0 set 'Left PGA Mixer Line1L' on
> >> amixer -D hw:0 set 'Left Line1L Mux' 'differential'
> >> amixer -D hw:0 set 'Right PGA Mixer Line1L' on
> >> amixer -D hw:0 set 'Right Line1L Mux' 'differential'
> >> amixer -D hw:0 set 'ADC HPF Cut-off' '0.0045xFs' 'disabled'
> >> amixer -D hw:0 set PGA 50% on
> >> amixer -D hw:0 cset iface=MIXER,name='PGA Capture Switch',index=0 on
> > 
> > This seems to correspond to volumes I have had before. They are enough
> > for phone call in quiet environment, but not really for normal usage.
> > 
> > Do you have settings where microphone on wired headset works?
> > 
> Those settings above. They are set after putting all volumes to 0 and
> muted. I'm using the Nokia WH-205 headset that came with the phone.

I just wanted to say thank you. I finally got to testing those
settings again, and this time headset microphone definitely works.

Loopback using arecord | aplay now produces reasonable volume of
audio. (Unfortunately, I still have some problems somewhere in the
audio<->GSM path :-( ).

Best regards,
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


signature.asc
Description: Digital signature


Re: [PATCH] net: cx89x0: move attribute declaration before struct keyword

2017-04-17 Thread Stefan Agner
On 2017-04-17 13:09, David Miller wrote:
> From: Stefan Agner 
> Date: Sun, 16 Apr 2017 23:20:32 -0700
> 
>> The attribute declaration is typically before the definition. Move
>> the __maybe_unused attribute declaration before the struct keyword.
>>
>> Signed-off-by: Stefan Agner 
> 

I did catch that while compiling with clang, and the exact error message
is:
drivers/net/ethernet/cirrus/cs89x0.c:1899:21: warning: attribute
declaration must precede definition [-Wignored-attributes]
static const struct __maybe_unused of_device_id cs89x0_match[] = {

> Well, I see if just as often after the variable name too:
> 
> net/irda/iriap.c:static const char *const ias_charset_types[] __maybe_unused 
> = {
> net/irda/irlap.c:static const char *const lap_reasons[] __maybe_unused = {
> net/irda/irlap_event.c:static const char *const irlap_event[] __maybe_unused 
> = {
> net/irda/irlmp_event.c:static const char *const irlmp_event[] __maybe_unused 
> = {
> 

That seems not to fire when compiling with clang. I guess because the
attribute is after the _complete_ type?

> Or after the struct:
> 
> drivers/net/phy/ste10Xp.c:static struct mdio_device_id __maybe_unused
> ste10Xp_tbl[] = {
> drivers/net/phy/teranetics.c:static struct mdio_device_id
> __maybe_unused teranetics_tbl[] = {
> drivers/net/phy/vitesse.c:static struct mdio_device_id __maybe_unused
> vitesse_tbl[] = {
> 

Same here...

> So unless we decide tree wide to do it in one order or another, such changes
> are largely a waste of time.

Afaik, "struct of_device_id" as a whole is a type. This case is really
odd since it puts the attribute in the middle of a type. It is the only
instance I came across (not everything compiles fine with clang yet,
so there might be more... but a quick grep did not turn up more of the
same cases)...

> 
> Sorry I'm not applying this patch.

Given that, can you reconsider?

--
Stefan


[GIT PULL] parisc architecture fix for v4.11

2017-04-17 Thread Helge Deller
Hi Linus,

please pull one important fix for the parisc architecture for kernel 4.11 from:

  git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux.git 
parisc-4.11-5

One patch which fixes get_user() for 64-bit values on 32-bit kernels.
Until now we lost the upper 32 bits of the returned 64-bit value.

Thanks,
Helge


Helge Deller (1):
  parisc: Fix get_user() for 64-bit value on 32-bit kernel

 arch/parisc/include/asm/uaccess.h | 86 +--
 1 file changed, 55 insertions(+), 31 deletions(-)


Re: [PATCH 4/9] mm, memory_hotplug: get rid of is_zone_device_section

2017-04-17 Thread Jerome Glisse
On Mon, Apr 10, 2017 at 01:03:46PM +0200, Michal Hocko wrote:
> From: Michal Hocko 
> 
> device memory hotplug hooks into regular memory hotplug only half way.
> It needs memory sections to track struct pages but there is no
> need/desire to associate those sections with memory blocks and export
> them to the userspace via sysfs because they cannot be onlined anyway.
> 
> This is currently expressed by for_device argument to arch_add_memory
> which then makes sure to associate the given memory range with
> ZONE_DEVICE. register_new_memory then relies on is_zone_device_section
> to distinguish special memory hotplug from the regular one. While this
> works now, later patches in this series want to move __add_zone outside
> of arch_add_memory path so we have to come up with something else.
> 
> Add want_memblock down the __add_pages path and use it to control
> whether the section->memblock association should be done. arch_add_memory
> then just trivially want memblock for everything but for_device hotplug.
> 
> remove_memory_section doesn't need is_zone_device_section either. We can
> simply skip all the memblock specific cleanup if there is no memblock
> for the given section.
> 
> This shouldn't introduce any functional change.
> 
> Cc: Dan Williams 
> Signed-off-by: Michal Hocko 
> ---
>  arch/ia64/mm/init.c|  2 +-
>  arch/powerpc/mm/mem.c  |  2 +-
>  arch/s390/mm/init.c|  2 +-
>  arch/sh/mm/init.c  |  2 +-
>  arch/x86/mm/init_32.c  |  2 +-
>  arch/x86/mm/init_64.c  |  2 +-
>  drivers/base/memory.c  | 22 --
>  include/linux/memory_hotplug.h |  2 +-
>  mm/memory_hotplug.c| 11 +++
>  9 files changed, 22 insertions(+), 25 deletions(-)
> 
> diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
> index 06cdaef54b2e..62085fd902e6 100644
Re: [PATCH 4/9] mm, memory_hotplug: get rid of is_zone_device_section

2017-04-17 Thread Jerome Glisse
On Mon, Apr 10, 2017 at 01:03:46PM +0200, Michal Hocko wrote:
> From: Michal Hocko 
> 
> device memory hotplug hooks into regular memory hotplug only halfway.
> It needs memory sections to track struct pages but there is no
> need/desire to associate those sections with memory blocks and export
> them to the userspace via sysfs because they cannot be onlined anyway.
> 
> This is currently expressed by for_device argument to arch_add_memory
> which then makes sure to associate the given memory range with
> ZONE_DEVICE. register_new_memory then relies on is_zone_device_section
> to distinguish special memory hotplug from the regular one. While this
> works now, later patches in this series want to move __add_zone outside
> of arch_add_memory path so we have to come up with something else.
> 
> Add want_memblock down the __add_pages path and use it to control
> whether the section->memblock association should be done. arch_add_memory
> then just trivially wants a memblock for everything but for_device hotplug.
> 
> remove_memory_section doesn't need is_zone_device_section either. We can
> simply skip all the memblock specific cleanup if there is no memblock
> for the given section.
> 
> This shouldn't introduce any functional change.
> 
> Cc: Dan Williams 
> Signed-off-by: Michal Hocko 
> ---
>  arch/ia64/mm/init.c|  2 +-
>  arch/powerpc/mm/mem.c  |  2 +-
>  arch/s390/mm/init.c|  2 +-
>  arch/sh/mm/init.c  |  2 +-
>  arch/x86/mm/init_32.c  |  2 +-
>  arch/x86/mm/init_64.c  |  2 +-
>  drivers/base/memory.c  | 22 --
>  include/linux/memory_hotplug.h |  2 +-
>  mm/memory_hotplug.c| 11 +++
>  9 files changed, 22 insertions(+), 25 deletions(-)
> 
> diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
> index 06cdaef54b2e..62085fd902e6 100644
> --- a/arch/ia64/mm/init.c
> +++ b/arch/ia64/mm/init.c
> @@ -657,7 +657,7 @@ int arch_add_memory(int nid, u64 start, u64 size, bool 
> for_device)
>  
>   zone = pgdat->node_zones +
>   zone_for_memory(nid, start, size, ZONE_NORMAL, for_device);
> - ret = __add_pages(nid, zone, start_pfn, nr_pages);
> + ret = __add_pages(nid, zone, start_pfn, nr_pages, !for_device);
>  
>   if (ret)
>   printk("%s: Problem encountered in __add_pages() as ret=%d\n",
> diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
> index 5f844337de21..ea3e09a62f38 100644
> --- a/arch/powerpc/mm/mem.c
> +++ b/arch/powerpc/mm/mem.c
> @@ -149,7 +149,7 @@ int arch_add_memory(int nid, u64 start, u64 size, bool 
> for_device)
>   zone = pgdata->node_zones +
>   zone_for_memory(nid, start, size, 0, for_device);
>  
> - return __add_pages(nid, zone, start_pfn, nr_pages);
> + return __add_pages(nid, zone, start_pfn, nr_pages, !for_device);
>  }
>  
>  #ifdef CONFIG_MEMORY_HOTREMOVE
> diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
> index bf5b8a0c4ff7..5c84346e5211 100644
> --- a/arch/s390/mm/init.c
> +++ b/arch/s390/mm/init.c
> @@ -182,7 +182,7 @@ int arch_add_memory(int nid, u64 start, u64 size, bool 
> for_device)
>   continue;
>   nr_pages = (start_pfn + size_pages > zone_end_pfn) ?
>  zone_end_pfn - start_pfn : size_pages;
> - rc = __add_pages(nid, zone, start_pfn, nr_pages);
> + rc = __add_pages(nid, zone, start_pfn, nr_pages, !for_device);
>   if (rc)
>   break;
>   start_pfn += nr_pages;
> diff --git a/arch/sh/mm/init.c b/arch/sh/mm/init.c
> index 75491862d900..a9d57f75ae8c 100644
> --- a/arch/sh/mm/init.c
> +++ b/arch/sh/mm/init.c
> @@ -498,7 +498,7 @@ int arch_add_memory(int nid, u64 start, u64 size, bool 
> for_device)
>   ret = __add_pages(nid, pgdat->node_zones +
>   zone_for_memory(nid, start, size, ZONE_NORMAL,
>   for_device),
> - start_pfn, nr_pages);
> + start_pfn, nr_pages, !for_device);
>   if (unlikely(ret))
>   printk("%s: Failed, __add_pages() == %d\n", __func__, ret);
>  
> diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
> index c68078fd06fd..4b0f05328af0 100644
> --- a/arch/x86/mm/init_32.c
> +++ b/arch/x86/mm/init_32.c
> @@ -834,7 +834,7 @@ int arch_add_memory(int nid, u64 start, u64 size, bool 
> for_device)
>   unsigned long start_pfn = start >> PAGE_SHIFT;
>   unsigned long nr_pages = size >> PAGE_SHIFT;
>  
> - return __add_pages(nid, zone, start_pfn, nr_pages);
> + return __add_pages(nid, zone, start_pfn, nr_pages, !for_device);
>  }
>  
>  #ifdef CONFIG_MEMORY_HOTREMOVE
> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> index 7eef17239378..39cfaee93975 100644
> --- a/arch/x86/mm/init_64.c
> +++ b/arch/x86/mm/init_64.c
> @@ -652,7 +652,7 @@ int arch_add_memory(int nid, u64 start, u64 size, bool 
> for_device)
>  
>   

Re: [PATCH] net: cx89x0: move attribute declaration before struct keyword

2017-04-17 Thread David Miller
From: Stefan Agner 
Date: Sun, 16 Apr 2017 23:20:32 -0700

> The attribute declaration is typically before the definition. Move
> the __maybe_unused attribute declaration before the struct keyword.
> 
> Signed-off-by: Stefan Agner 

Well, I see it just as often after the variable name too:

net/irda/iriap.c:static const char *const ias_charset_types[] __maybe_unused = {
net/irda/irlap.c:static const char *const lap_reasons[] __maybe_unused = {
net/irda/irlap_event.c:static const char *const irlap_event[] __maybe_unused = {
net/irda/irlmp_event.c:static const char *const irlmp_event[] __maybe_unused = {

Or after the struct:

drivers/net/phy/ste10Xp.c:static struct mdio_device_id __maybe_unused 
ste10Xp_tbl[] = {
drivers/net/phy/teranetics.c:static struct mdio_device_id __maybe_unused 
teranetics_tbl[] = {
drivers/net/phy/vitesse.c:static struct mdio_device_id __maybe_unused 
vitesse_tbl[] = {

So unless we decide tree wide to do it in one order or another, such changes
are largely a waste of time.

Sorry I'm not applying this patch.


Re: [PATCH 3.18 010/145] sctp: avoid BUG_ON on sctp_wait_for_sndbuf

2017-04-17 Thread Marcelo Ricardo Leitner



Em 16-04-2017 07:48, Greg Kroah-Hartman escreveu:

3.18-stable review patch.  If anyone has any objections, please let me know.

--

From: Marcelo Ricardo Leitner 


[ Upstream commit 2dcab598484185dea7ec22219c76dcdd59e3cb90 ]



Hi Greg. Are you also including dfcb9f4f99f1 ("sctp: deny peeloff 
operation on asocs with threads sleeping on it")?

It's a follow-up fix for this one, would be good to have it too.



Alexander Popov reported that an application may trigger a BUG_ON in
sctp_wait_for_sndbuf if the socket tx buffer is full, a thread is
waiting on it to queue more data and meanwhile another thread peels off
the association being used by the first thread.

This patch replaces the BUG_ON call with proper error handling. It
will return -EPIPE to the original sendmsg call, similarly to what would
have been done if the association wasn't found in the first place.

Acked-by: Alexander Popov 
Signed-off-by: Marcelo Ricardo Leitner 
Reviewed-by: Xin Long 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman 
---
  net/sctp/socket.c |3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -6962,7 +6962,8 @@ static int sctp_wait_for_sndbuf(struct s
 */
release_sock(sk);
current_timeo = schedule_timeout(current_timeo);
-   BUG_ON(sk != asoc->base.sk);
+   if (sk != asoc->base.sk)
+   goto do_error;
lock_sock(sk);
  
  		*timeo_p = current_timeo;




