Re: [PATCH 1/2] vfs: make sure struct filename->iname is word-aligned

2016-02-22 Thread Rasmus Villemoes
On Fri, Feb 19 2016, Theodore Ts'o  wrote:

> On Thu, Feb 18, 2016 at 09:10:21PM +0100, Rasmus Villemoes wrote:
>> 
>> Sure, that would work as well. I don't really care how ->iname is pushed
>> out to offset 32, but I'd like to know if it's worth it.
>
> Do you have access to one of these platforms where unaligned access is
> really painful?

No. But FWIW, I did a microbenchmark on my aging Core2, doing nothing
but lstat() on the same "aaa..." string in a loop. 'before' is 4.4.2
with a few unrelated patches, 'after' is that plus 1/2 and 2/2. In
perf_x_y, x is length of "aaa..." string and y is alignment mod 8 in
userspace.
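
Roughly, the test program was just this (a sketch, not the exact
program; len and align correspond to the x and y in perf_x_y):

#include <string.h>
#include <sys/stat.h>

int main(void)
{
	static char buf[128] __attribute__((aligned(8)));
	struct stat st;
	int len = 40, align = 3;	/* e.g. perf_40_3 */
	long i;

	memset(buf + align, 'a', len);	/* buf is zeroed, so NUL-terminated */
	for (i = 0; i < 100000000L; i++)
		lstat(buf + align, &st);	/* getname() -> strncpy_from_user() */
	return 0;
}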

$ grep strncpy_from_user *.report
perf_30_0_after.report: 5.47%  s_f_u[k] strncpy_from_user
perf_30_0_before.report: 7.40%  s_f_u[k] strncpy_from_user 
perf_30_3_after.report: 5.05%  s_f_u[k] strncpy_from_user 
perf_30_3_before.report: 7.29%  s_f_u[k] strncpy_from_user  
perf_30_4_after.report: 4.88%  s_f_u[k] strncpy_from_user
perf_30_4_before.report: 7.28%  s_f_u[k] strncpy_from_user
perf_30_6_after.report: 5.43%  s_f_u[k] strncpy_from_user
perf_30_6_before.report: 6.74%  s_f_u[k] strncpy_from_user
perf_40_0_after.report: 5.68%  s_f_u[k] strncpy_from_user
perf_40_0_before.report:10.99%  s_f_u[k] strncpy_from_user
perf_40_3_after.report: 5.37%  s_f_u[k] strncpy_from_user
perf_40_3_before.report:10.62%  s_f_u[k] strncpy_from_user
perf_40_4_after.report: 5.61%  s_f_u[k] strncpy_from_user
perf_40_4_before.report:10.91%  s_f_u[k] strncpy_from_user
perf_40_6_after.report: 5.81%  s_f_u[k] strncpy_from_user
perf_40_6_before.report:10.84%  s_f_u[k] strncpy_from_user  
perf_50_0_after.report: 6.29%  s_f_u[k] strncpy_from_user
perf_50_0_before.report:12.46%  s_f_u[k] strncpy_from_user
perf_50_3_after.report: 7.15%  s_f_u[k] strncpy_from_user
perf_50_3_before.report:14.09%  s_f_u[k] strncpy_from_user
perf_50_4_after.report: 7.64%  s_f_u[k] strncpy_from_user
perf_50_4_before.report:14.10%  s_f_u[k] strncpy_from_user
perf_50_6_after.report: 7.30%  s_f_u[k] strncpy_from_user
perf_50_6_before.report:14.10%  s_f_u[k] strncpy_from_user
perf_60_0_after.report: 6.81%  s_f_u[k] strncpy_from_user
perf_60_0_before.report:13.25%  s_f_u[k] strncpy_from_user 
perf_60_3_after.report: 9.48%  s_f_u[k] strncpy_from_user
perf_60_3_before.report:13.26%  s_f_u[k] strncpy_from_user 
perf_60_4_after.report: 9.90%  s_f_u[k] strncpy_from_user
perf_60_4_before.report:15.09%  s_f_u[k] strncpy_from_user  
perf_60_6_after.report: 9.91%  s_f_u[k] strncpy_from_user
perf_60_6_before.report:13.85%  s_f_u[k] strncpy_from_user

So the numbers vary and it's a bit odd that some of the
userspace-unaligned cases seem faster than the corresponding aligned
ones, but overall I think it's ok to conclude there's a measurable
difference.

Note the huge jump from 30_y_before to 40_y_before. I suppose that's
because we do an unaligned store crossing a cache line boundary when the
string is > 32 bytes.

I suppose 2/2 is also responsible for some of the above, since it not
only aligns the kernel-side stores, but also means we stay within a
single cacheline for strings up to 56 bytes. I should measure the effect
of 1/2 by itself, but compiling a kernel takes forever for me, so I
won't get to that tonight.

[It turns out that 32 is the median length from 'git ls-files' in the
kernel tree, with 33.2 being the mean, so even though I used relatively
long paths above to get strncpy_from_user to stand out, such path
lengths are not totally uncommon.]

> The usual thing is to benchmark something like "git
> stat" which has to stat every single file in a repository's working
> directory.

I tried that as well; strncpy_from_user was around 0.5% both before and
after.

Rasmus


[PATCH] Staging: comedi: contec_pci_dio: fixed comment blocks coding style issues

2016-02-22 Thread Philippe Loctaux
Makes the comment blocks start with /* on a separate line and end
with */ on a separate line, with each comment line starting with *.

Signed-off-by: Philippe Loctaux 
---
 drivers/staging/comedi/drivers/contec_pci_dio.c | 47 +
 1 file changed, 24 insertions(+), 23 deletions(-)

diff --git a/drivers/staging/comedi/drivers/contec_pci_dio.c b/drivers/staging/comedi/drivers/contec_pci_dio.c
index 4956a49..5f84839 100644
--- a/drivers/staging/comedi/drivers/contec_pci_dio.c
+++ b/drivers/staging/comedi/drivers/contec_pci_dio.c
@@ -1,29 +1,30 @@
 /*
-comedi/drivers/contec_pci_dio.c
-
-COMEDI - Linux Control and Measurement Device Interface
-Copyright (C) 2000 David A. Schleef 
-
-This program is free software; you can redistribute it and/or modify
-it under the terms of the GNU General Public License as published by
-the Free Software Foundation; either version 2 of the License, or
-(at your option) any later version.
+ * comedi/drivers/contec_pci_dio.c
+ *
+ * COMEDI - Linux Control and Measurement Device Interface
+ * Copyright (C) 2000 David A. Schleef 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
 
-This program is distributed in the hope that it will be useful,
-but WITHOUT ANY WARRANTY; without even the implied warranty of
-MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-GNU General Public License for more details.
-*/
 /*
-Driver: contec_pci_dio
-Description: Contec PIO1616L digital I/O board
-Devices: [Contec] PIO1616L (contec_pci_dio)
-Author: Stefano Rivoir 
-Updated: Wed, 27 Jun 2007 13:00:06 +0100
-Status: works
-
-Configuration Options: not applicable, uses comedi PCI auto config
-*/
+ * Driver: contec_pci_dio
+ * Description: Contec PIO1616L digital I/O board
+ * Devices: [Contec] PIO1616L (contec_pci_dio)
+ * Author: Stefano Rivoir 
+ * Updated: Wed, 27 Jun 2007 13:00:06 +0100
+ * Status: works
+ *
+ * Configuration Options: not applicable, uses comedi PCI auto config
+ */
 
 #include 
 
-- 
2.7.1



Re: [PATCH 1/2] MIPS: Add barriers between dcache & icache flushes

2016-02-22 Thread Joshua Kinard
On 02/22/2016 13:09, Paul Burton wrote:
> Index-based cache operations may be arbitrarily reordered by out of
> order CPUs. Thus code which writes back the dcache & then invalidates
> the icache using indexed cache ops must include a barrier between
> operating on the 2 caches in order to prevent the scenario in which:
> 
>   - icache invalidation occurs.
> 
>   - icache fetch occurs, due to speculation.
> 
>   - dcache writeback occurs.
> 
> If the above were allowed to happen then the icache would contain stale
> data. Forcing the dcache writeback to complete before the icache
> invalidation avoids this.

Is there a particular symptom one should look for to check for this issue
occurring?  I haven't seen any odd effects on my SGI systems that appear to
relate to this.  I believe the R1x000 family resolves all hazards in hardware,
so maybe this issue doesn't affect that CPU family?

If not, let me know what to look for or test so I can check the patch out on
my systems.

Thanks!

--J


> Signed-off-by: Paul Burton 
> Cc: James Hogan 
> ---
> 
>  arch/mips/mm/c-r4k.c | 15 +--
>  1 file changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/mips/mm/c-r4k.c b/arch/mips/mm/c-r4k.c
> index caac3d7..a49010c 100644
> --- a/arch/mips/mm/c-r4k.c
> +++ b/arch/mips/mm/c-r4k.c
> @@ -449,6 +449,7 @@ static inline void local_r4k___flush_cache_all(void * args)
>  
>   default:
>   r4k_blast_dcache();
> + mb(); /* cache instructions may be reordered */
>   r4k_blast_icache();
>   break;
>   }
> @@ -493,8 +494,10 @@ static inline void local_r4k_flush_cache_range(void * args)
>   return;
>  
>   r4k_blast_dcache();
> - if (exec)
> + if (exec) {
> + mb(); /* cache instructions may be reordered */
>   r4k_blast_icache();
> + }
>  }
>  
>  static void r4k_flush_cache_range(struct vm_area_struct *vma,
> @@ -599,8 +602,13 @@ static inline void local_r4k_flush_cache_page(void *args)
>   if (cpu_has_dc_aliases || (exec && !cpu_has_ic_fills_f_dc)) {
>   vaddr ? r4k_blast_dcache_page(addr) :
>   r4k_blast_dcache_user_page(addr);
> - if (exec && !cpu_icache_snoops_remote_store)
> + if (exec)
> + mb(); /* cache instructions may be reordered */
> +
> + if (exec && !cpu_icache_snoops_remote_store) {
>   r4k_blast_scache_page(addr);
> + mb(); /* cache instructions may be reordered */
> + }
>   }
>   if (exec) {
>   if (vaddr && cpu_has_vtag_icache && mm == current->active_mm) {
> @@ -660,6 +668,7 @@ static inline void local_r4k_flush_icache_range(unsigned long start, unsigned lo
>   R4600_HIT_CACHEOP_WAR_IMPL;
>   protected_blast_dcache_range(start, end);
>   }
> + mb(); /* cache instructions may be reordered */
>   }
>  
>   if (end - start > icache_size)
> @@ -798,6 +807,8 @@ static void local_r4k_flush_cache_sigtramp(void * arg)
>   protected_writeback_dcache_line(addr & ~(dc_lsize - 1));
>   if (!cpu_icache_snoops_remote_store && scache_size)
>   protected_writeback_scache_line(addr & ~(sc_lsize - 1));
> + if ((dc_lsize || scache_size) && ic_lsize)
> + mb(); /* cache instructions may be reordered */
>   if (ic_lsize)
>   protected_flush_icache_line(addr & ~(ic_lsize - 1));
>   if (MIPS4K_ICACHE_REFILL_WAR) {
> 




[PATCH] clk: Update some outdated comments

2016-02-22 Thread Stephen Boyd
__clk_init() was renamed to __clk_core_init() but these comments
weren't updated.

Cc: Masahiro Yamada 
Signed-off-by: Stephen Boyd 
---
 drivers/clk/clk.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index 51d673370d42..d95d6f924cac 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -2157,7 +2157,7 @@ unlock:
  *
  * Dynamically removes a clk and all its child nodes from the
  * debugfs clk directory if clk->dentry points to debugfs created by
- * clk_debug_register in __clk_init.
+ * clk_debug_register in __clk_core_init.
  */
 static void clk_debug_unregister(struct clk_core *core)
 {
@@ -2309,8 +2309,8 @@ static int __clk_core_init(struct clk_core *core)
core->parent = __clk_init_parent(core);
 
/*
-* Populate core->parent if parent has already been __clk_init'd.  If
-* parent has not yet been __clk_init'd then place clk in the orphan
+* Populate core->parent if parent has already been clk_core_init'd. If
+* parent has not yet been clk_core_init'd then place clk in the orphan
 * list.  If clk doesn't have any parents then place it in the root
 * clk list.
 *
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



Re: [PATCH] intel-pstate: Update frequencies of policy->cpus only from ->set_policy()

2016-02-22 Thread Srinivas Pandruvada



On 02/21/2016 08:57 PM, Viresh Kumar wrote:

The intel-pstate driver is using intel_pstate_hwp_set() from two
separate paths, i.e. ->set_policy() callback and sysfs update path for
the files present in /sys/devices/system/cpu/intel_pstate/ directory.

While an update to the sysfs path applies to all the CPUs being managed
by the driver (which essentially means all the online CPUs), the update
via the ->set_policy() callback applies to a smaller group of CPUs
managed by the policy for which ->set_policy() is called.

And so, intel_pstate_hwp_set() should update frequencies of only the
CPUs that are part of policy->cpus mask, while it is called from
->set_policy() callback.

In order to do that, add a parameter (cpumask) to intel_pstate_hwp_set()
and apply the frequency changes only to the concerned CPUs.

For ->set_policy() path, we are only concerned about policy->cpus, and
so policy->rwsem lock taken by the core prior to calling ->set_policy()
is enough to take care of any races. The larger lock acquired by
get_online_cpus() is required only for the updates to sysfs files.

Add another routine, intel_pstate_hwp_set_online_cpus(), and call it
from the sysfs update paths.

This also fixes a lockdep warning reported recently, where policy->rwsem and
get_online_cpus() could have been acquired in any order, causing an ABBA
deadlock. The sequence of events leading to that was:

intel_pstate_init(...)
	...cpufreq_online(...)
		down_write(&policy->rwsem); // Locks policy->rwsem
		...
		cpufreq_init_policy(policy);
		...intel_pstate_hwp_set();
			get_online_cpus(); // Temporarily locks cpu_hotplug.lock
	...
	up_write(&policy->rwsem);

pm_suspend(...)
	...disable_nonboot_cpus()
		_cpu_down()
			cpu_hotplug_begin(); // Locks cpu_hotplug.lock
			__cpu_notify(CPU_DOWN_PREPARE, ...);
			...cpufreq_offline_prepare();
				down_write(&policy->rwsem); // Locks policy->rwsem

Reported-by: Joonas Lahtinen 
Signed-off-by: Viresh Kumar 

Acked-by: Srinivas Pandruvada 

---
  drivers/cpufreq/intel_pstate.c | 21 -
  1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index f4d85c2ae7b1..2e7058a2479d 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -287,7 +287,7 @@ static inline void update_turbo_state(void)
 cpu->pstate.max_pstate == cpu->pstate.turbo_pstate);
  }
  
-static void intel_pstate_hwp_set(void)
+static void intel_pstate_hwp_set(const struct cpumask *cpumask)
  {
int min, hw_min, max, hw_max, cpu, range, adj_range;
u64 value, cap;
@@ -297,9 +297,7 @@ static void intel_pstate_hwp_set(void)
hw_max = HWP_HIGHEST_PERF(cap);
range = hw_max - hw_min;
  
-	get_online_cpus();
-
-   for_each_online_cpu(cpu) {
+   for_each_cpu(cpu, cpumask) {
rdmsrl_on_cpu(cpu, MSR_HWP_REQUEST, &value);
adj_range = limits->min_perf_pct * range / 100;
min = hw_min + adj_range;
@@ -318,7 +316,12 @@ static void intel_pstate_hwp_set(void)
value |= HWP_MAX_PERF(max);
wrmsrl_on_cpu(cpu, MSR_HWP_REQUEST, value);
}
+}
  
+static void intel_pstate_hwp_set_online_cpus(void)
+{
+   get_online_cpus();
+   intel_pstate_hwp_set(cpu_online_mask);
put_online_cpus();
  }
  
@@ -440,7 +443,7 @@ static ssize_t store_no_turbo(struct kobject *a, struct attribute *b,
	limits->no_turbo = clamp_t(int, input, 0, 1);
  
  	if (hwp_active)
-   intel_pstate_hwp_set();
+   intel_pstate_hwp_set_online_cpus();
  
  	return count;
  }
@@ -466,7 +469,7 @@ static ssize_t store_max_perf_pct(struct kobject *a, struct attribute *b,
  int_tofp(100));
  
  	if (hwp_active)
-   intel_pstate_hwp_set();
+   intel_pstate_hwp_set_online_cpus();
return count;
  }
  
@@ -491,7 +494,7 @@ static ssize_t store_min_perf_pct(struct kobject *a, struct attribute *b,
   int_tofp(100));
  
  	if (hwp_active)
-   intel_pstate_hwp_set();
+   intel_pstate_hwp_set_online_cpus();
return count;
  }
  
@@ -1112,7 +1115,7 @@ static int intel_pstate_set_policy(struct cpufreq_policy *policy)
	pr_debug("intel_pstate: set performance\n");
limits = &performance_limits;
if (hwp_active)
-   intel_pstate_hwp_set();
+   intel_pstate_hwp_set(policy->cpus);
return 0;
}
  
@@ -1144,7 +1147,7 @@ static int intel_pstate_set_policy(struct cpufreq_policy *policy)
   int_tofp(100));

Re: [PATCH 4/8] akcipher: Move the RSA DER encoding to the crypto layer

2016-02-22 Thread Tadeusz Struk
On 02/22/2016 02:28 PM, David Howells wrote:
> Tadeusz Struk  wrote:
> 
>> I wonder if this should be merged with the crypto/rsa-pkcs1pad.c template
>> that we already have. Looks like the two do the same padding now.
>> Should we merge then and pass the hash param as a separate template param,
>> e.g the public_key would allocate "pkcs1pad(rsa, sha1)"?
> 
> Ummm...  Possibly.  Is that how it's used?
> 
>   warthog>git grep pkcs1pad -- Documentation
>   warthog1>

Yes, no docs. Sorry.

> 
> Anyway, the problem I have with this is that I want to get that knowledge out
> of the asymmetric key in-software public key subtype.  It knows "rsa", "dsa",
> "ecdsa", ... because that's all the OIDs tell it.

Right, for now the public_key would need to build the full algorithm string as
follows:

sprintf(name, "pkcs1pad(%s, %s)", pkey_algo_name[sig->pkey_algo],
	hash_algo_name[sig->pkey_hash_algo]);
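
For illustration, the consumer side would then be something like this
(a sketch only; it assumes the two-parameter pkcs1pad spelling proposed
above actually gets registered):

struct crypto_akcipher *tfm;

tfm = crypto_alloc_akcipher(name, 0, 0);
if (IS_ERR(tfm))
	return PTR_ERR(tfm);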

Do you plan to add more padding schemes later? 

> 
> I guess if I have to, I can stoop to converting "rsa" to "pkcs1pad(rsa, 
> sha1)".
> 
> Can you do me a really quick merge?  -rc5 is already out, and I want to get it
> to James pronto - plus I have things that are pending on this change being
> made.

Yes, I can start working on a subsequent patch based on your changes in
http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=keys-rsa
Is that ok with you?

> 
> Oh - and how does the padding template find the algorithm DER encoding string
> to use?  I have wondered whether it should be stored in with the hash
> algorithm, but it probably makes more sense to keep it with the rsa module.

We can put everything into the crypto/rsa-pkcs1pad.c
This is where all the padding logic should be, I think.
Thanks,
-- 
TS


Re: [PATCH 0/8] tty: n_gsm: Make mux work as a responder station *DUPLICATE*

2016-02-22 Thread Andrej Krpic


Please ignore this duplicate thread.


[PATCH] Staging: comedi: mite: added spaces around | and *

2016-02-22 Thread Philippe Loctaux
Added spaces around | and *, fixing 2 checkpatch checks.

Signed-off-by: Philippe Loctaux 
---
 drivers/staging/comedi/drivers/mite.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/comedi/drivers/mite.c b/drivers/staging/comedi/drivers/mite.c
index 8f24702..d5cd4f3 100644
--- a/drivers/staging/comedi/drivers/mite.c
+++ b/drivers/staging/comedi/drivers/mite.c
@@ -51,7 +51,7 @@
 
 #include "mite.h"
 
-#define TOP_OF_PAGE(x) ((x)|(~(PAGE_MASK)))
+#define TOP_OF_PAGE(x) ((x) | (~(PAGE_MASK)))
 
 struct mite_struct *mite_alloc(struct pci_dev *pcidev)
 {
@@ -216,7 +216,7 @@ EXPORT_SYMBOL_GPL(mite_free_ring);
 struct mite_channel *mite_request_channel_in_range(struct mite_struct *mite,
   struct
   mite_dma_descriptor_ring
-  *ring, unsigned min_channel,
+  * ring, unsigned min_channel,
   unsigned max_channel)
 {
int i;
-- 
2.7.1



Re: [PATCH 04/10] x86/xsaves: Introduce a new check that allows correct xstates copy from kernel to user directly

2016-02-22 Thread Andy Lutomirski
On Mon, Feb 22, 2016 at 3:27 PM, Yu-cheng Yu  wrote:
> On Mon, Feb 22, 2016 at 02:42:54PM -0800, Andy Lutomirski wrote:
>> > +static int should_save_registers_directly(void)
>>
>> I don't like the name of this function because:
>>
>> > +{
>> > +   /*
>> > +* In signal handling path, the kernel already checks if
>> > +* FPU instructions have been used before it calls
>> > +* copy_fpstate_to_sigframe(). We check this here again
>> > +* to detect any potential mis-use and saving invalid
>> > +* register values directly to a signal frame.
>> > +*/
>> > +   WARN_ONCE(!current->thread.fpu.fpstate_active,
>> > + "direct FPU save with no math use\n");
>>
>> ... Here "direct" seems to mean that we're asking whether to directly save 
>> ...
>>
>> > +
>> > +   /*
>> > +* In the case that we are using a compacted kernel
>> > +* xsave area, we can not copy the thread.fpu.state
>> > +* directly to userspace and *must* save it from the
>> > +* registers directly.
>> > +*/
>>
>> ... and here "directly" means *both* copying directly to userspace and
>> saving using xsave directly.
>>
>> So can you rename it to something with an obvious meaning like
>> "may_memcpy_fpu_regs" or similar?
>
> This function determines whether the kernel should do
> copy_fpregs_to_sigframe(). Perhaps we can re-name it to
> may_copy_fpregs_to_sigframe()?
>

Works for me.


> --Yu-cheng
>



-- 
Andy Lutomirski
AMA Capital Management, LLC


[PATCH v2] mm: scale kswapd watermarks in proportion to memory

2016-02-22 Thread Johannes Weiner
In machines with 140G of memory and enterprise flash storage, we have
seen read and write bursts routinely exceed the kswapd watermarks and
cause thundering herds in direct reclaim. Unfortunately, the only way
to tune kswapd aggressiveness is through adjusting min_free_kbytes -
the system's emergency reserves - which is entirely unrelated to the
system's latency requirements. In order to get kswapd to maintain a
250M buffer of free memory, the emergency reserves need to be set to
1G. That is a lot of memory wasted for no good reason.

On the other hand, it's reasonable to assume that allocation bursts
and overall allocation concurrency scale with memory capacity, so it
makes sense to make kswapd aggressiveness a function of that as well.

Change the kswapd watermark scale factor from the currently fixed 25%
of the tunable emergency reserve to a tunable 0.1% of memory.

Beyond 1G of memory, this will produce bigger watermark steps than the
current formula in default settings. Ensure that the new formula never
chooses steps smaller than that, i.e. 25% of the emergency reserve.

On a 140G machine, this raises the default watermark steps - the
distance between min and low, and low and high - from 16M to 143M.
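
(Sanity check on that arithmetic: 140G is ~143,360M, and
143360M * 10 / 10,000 ≈ 143M, matching the step size above.)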

Signed-off-by: Johannes Weiner 
Acked-by: Mel Gorman 
---
 Documentation/sysctl/vm.txt | 18 ++
 include/linux/mm.h  |  1 +
 include/linux/mmzone.h  |  2 ++
 kernel/sysctl.c | 10 ++
 mm/page_alloc.c | 29 +++--
 5 files changed, 58 insertions(+), 2 deletions(-)

v2: Ensure 25% of emergency reserves as a minimum on small machines -Rik

diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index 89a887c..b02d940 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -803,6 +803,24 @@ performance impact. Reclaim code needs to take various locks to find freeable
 directory and inode objects. With vfs_cache_pressure=1000, it will look for
 ten times more freeable objects than there are.
 
+=
+
+watermark_scale_factor:
+
+This factor controls the aggressiveness of kswapd. It defines the
+amount of memory left in a node/system before kswapd is woken up and
+how much memory needs to be free before kswapd goes back to sleep.
+
+The unit is in fractions of 10,000. The default value of 10 means the
+distances between watermarks are 0.1% of the available memory in the
+node/system. The maximum value is 1000, or 10% of memory.
+
+A high rate of threads entering direct reclaim (allocstall) or kswapd
+going to sleep prematurely (kswapd_low_wmark_hit_quickly) can indicate
+that the number of free pages kswapd maintains for latency reasons is
+too small for the allocation bursts occurring in the system. This knob
+can then be used to tune kswapd aggressiveness accordingly.
+
 ==
 
 zone_reclaim_mode:
diff --git a/include/linux/mm.h b/include/linux/mm.h
index a0ad7af..d330cbb 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1869,6 +1869,7 @@ extern void zone_pcp_reset(struct zone *zone);
 
 /* page_alloc.c */
 extern int min_free_kbytes;
+extern int watermark_scale_factor;
 
 /* nommu.c */
 extern atomic_long_t mmap_pages_allocated;
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 03cbdd9..85d6702 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -833,6 +833,8 @@ static inline int is_highmem(struct zone *zone)
 struct ctl_table;
 int min_free_kbytes_sysctl_handler(struct ctl_table *, int,
void __user *, size_t *, loff_t *);
+int watermark_scale_factor_sysctl_handler(struct ctl_table *, int,
+   void __user *, size_t *, loff_t *);
 extern int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES-1];
 int lowmem_reserve_ratio_sysctl_handler(struct ctl_table *, int,
void __user *, size_t *, loff_t *);
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index d479707..780769e 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -126,6 +126,7 @@ static int __maybe_unused two = 2;
 static int __maybe_unused four = 4;
 static unsigned long one_ul = 1;
 static int one_hundred = 100;
+static int one_thousand = 1000;
 #ifdef CONFIG_PRINTK
 static int ten_thousand = 10000;
 #endif
@@ -1393,6 +1394,15 @@ static struct ctl_table vm_table[] = {
.extra1 = &zero,
},
{
+   .procname   = "watermark_scale_factor",
+   .data   = &watermark_scale_factor,
+   .maxlen = sizeof(watermark_scale_factor),
+   .mode   = 0644,
+   .proc_handler   = watermark_scale_factor_sysctl_handler,
+   .extra1 = &one,
+   .extra2 = &one_thousand,
+   },
+   {
.procname   = "percpu_pa

Re: [PATCH v2 12/16] clk: avoid circular clock topology

2016-02-22 Thread Joachim Eastwood
On 22 February 2016 at 03:29, Masahiro Yamada
 wrote:
> Hi Joachim,
>
>
> 2016-02-22 6:39 GMT+09:00 Joachim Eastwood :
>> Hi everyone,
>>
>> On 28 December 2015 at 11:10, Masahiro Yamada
>>  wrote:
>>> Currently, clk_register() never checks a circular parent looping,
>>> but clock providers could register such an insane clock topology.
>>> For example, "clk_a" could have "clk_b" as a parent, and vice versa.
>>> In this case, clk_core_reparent() creates a circular parent list
>>> and __clk_recalc_accuracies() calls itself recursively forever.
>>>
>>> The core infrastructure should be kind enough to bail out, showing
>>> an appropriate error message in such a case.  This helps to easily
>>> find a bug in clock providers.  (uh, I made such a silly mistake
>>> when I was implementing my clock providers first.  I was upset
>>> because the kernel did not respond, without any error message.)
>>>
>>> This commit adds a new helper function, __clk_is_ancestor().  It
>>> returns true if the second argument is a possible ancestor of the
>>> first one.  If a clock core is a possible ancestor of itself, it
>>> would make a loop when it were registered.  That should be detected
>>> as an error.
>>
>> This commit breaks lpc18xx boot in next right now. See
>> http://marc.info/?l=linux-arm-kernel&m=145608597106087&w=2
>>
>> The Clock Generation Unit (CGU) on lpc18xx allows for circular parents
>> in hardware. While it is obviously not a good idea to configure the
>> clocks in that manner, there is nothing that stops you either.
>>
>> Please take a look at the second figure on:
>> https://github.com/manabian/linux-lpc/wiki/LPC18xx-LPC43xx-clocks
>> All PLLs can feed clock into the dividers and the dividers can feed
>> clock into the PLLs.
>>
>> The reason why this is made possible in the CGU is because you can
>> then choose where to put your divider; either before the PLL or after.
>>
>
>
> Sorry for breaking your board.

No worries, that is why we have next so we can catch it before it hits
mainline :-)


> I am OK with reverting b58f75aa83fb.
>
>
>
> I guess your hardware allows clock loops for maximum flexibility,
> but you do not create clock loops in actual use cases.

That's right.


> Maybe it makes sense to check for parent looping
> in clk_set_parent() or somewhere, not in clk_register()?

I think that would be a nice addition. While the CGU can certainly be
configured with loops it is indeed something that we should prevent
from happening.


regards,
Joachim Eastwood


Re: [PATCH 04/10] x86/xsaves: Introduce a new check that allows correct xstates copy from kernel to user directly

2016-02-22 Thread Yu-cheng Yu
On Mon, Feb 22, 2016 at 02:42:54PM -0800, Andy Lutomirski wrote:
> > +static int should_save_registers_directly(void)
> 
> I don't like the name of this function because:
> 
> > +{
> > +   /*
> > +* In signal handling path, the kernel already checks if
> > +* FPU instructions have been used before it calls
> > +* copy_fpstate_to_sigframe(). We check this here again
> > +* to detect any potential mis-use and saving invalid
> > +* register values directly to a signal frame.
> > +*/
> > +   WARN_ONCE(!current->thread.fpu.fpstate_active,
> > + "direct FPU save with no math use\n");
> 
> ... Here "direct" seems to mean that we're asking whether to directly save ...
> 
> > +
> > +   /*
> > +* In the case that we are using a compacted kernel
> > +* xsave area, we can not copy the thread.fpu.state
> > +* directly to userspace and *must* save it from the
> > +* registers directly.
> > +*/
> 
> ... and here "directly" means *both* copying directly to userspace and
> saving using xsave directly.
> 
> So can you rename it to something with an obvious meaning like
> "may_memcpy_fpu_regs" or similar?

This function determines whether the kernel should do
copy_fpregs_to_sigframe(). Perhaps we can re-name it to
may_copy_fpregs_to_sigframe()?

--Yu-cheng



Re: [PATCH v3 4/5] ACPI / processor_idle : introduce ARCH_SUPPORTS_ACPI_PROCESSOR_CSTATE

2016-02-22 Thread Rafael J. Wysocki
On Mon, Feb 22, 2016 at 2:46 PM, Sudeep Holla  wrote:
> Hi Rafael,
>
> On 17/02/16 12:21, Sudeep Holla wrote:
>>
>>
>>
>> On 16/02/16 20:18, Rafael J. Wysocki wrote:
>
>
> [..]
>
>>
>>> This way it all should work without any new Kconfig options.
>>>
>>
>> I agree with you in terms of avoiding new Kconfig option. However the
>> main reason for adding it is to avoid declaring dummy functions and
>> variables on ARM64.
>>
>> It's hard to justify to the maintainers, as it's totally useless on ARM64.
>> E.g. boot_option_idle_override, IDLE_NOMWAIT, acpi_unlazy_tlb,
>> arch_safe_halt.
>>
>> The other option is to push those under CONFIG_X86, but then I don't have
>> much idea of what is needed for IA64, so I took an option that
>> encapsulates everything under the CSTATE feature Kconfig, which is not user
>> visible and is selected by default by archs supporting it.
>>
>> I am open to any other alternative.
>>
>
> Whatever alternative methods I tried so far ended up much more horrible
> than this. So any suggestions are much appreciated.

Well, you have to give me some time here, sorry.

I'm still not done with cpufreq at this point and it's going to
consume some more cycles I'm afraid.

Thanks,
Rafael


Re: [PATCH] lkdtm: add test for executing .rodata

2016-02-22 Thread PaX Team
On 22 Feb 2016 at 12:46, Kees Cook wrote:

> GCC really wants to declare the section. :(

hmm, i see, so how about going about it another way. instead of trying
to do this at compile/link time, do it an load/runtime. one way of doing
it would be to preserve a page in .rodata then map in a code page underneath
that holds your empty function (which you can generate from C). it'd be
somewhat similar to how the vsyscall page on amd64 is mapped (or used to
be mapped) from the kernel image into its userland visible place.
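
for concreteness, the reservation half could look roughly like this (a
sketch; the symbol name is made up, and the runtime step of mapping an
executable page over the region is omitted):

static const char rodata_exec_page[PAGE_SIZE]
	__aligned(PAGE_SIZE) = { 0xc3 };	/* x86 'ret' */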



[PATCH] Revert "clk: avoid circular clock topology"

2016-02-22 Thread Stephen Boyd
This reverts commit 858d5881564026cbc4e6f5e25ae878a27df5d4c9.

Joachim reports that this commit breaks lpc18xx boot. This is
because the hardware has circular clk topology where PLLs can
feed into dividers and the same dividers can feed into the PLLs.
The hardware is designed this way so that you can choose to put
the divider before the PLL or after the PLL depending on what you
configure to be the parent of the divider and what you configure
to be the parent of the PLL.

So let's drop this patch for now because we have hardware that
actually has loops. A future patch could check for circular
parents when we change parents and fail the switch, but that's
probably best left to some debugging Kconfig option so that we
don't suffer the sanity checking cost all the time.
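
For illustration, such a check on a parent switch might look roughly
like this (a sketch only, not a tested implementation; it walks the
currently configured parent chain):

static bool clk_core_would_loop(struct clk_core *core,
				struct clk_core *new_parent)
{
	struct clk_core *p;

	/* A loop exists if walking up from the proposed parent
	 * reaches the clock being reparented.
	 */
	for (p = new_parent; p; p = p->parent)
		if (p == core)
			return true;

	return false;
}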

Reported-by: Joachim Eastwood 
Cc: Masahiro Yamada 
Signed-off-by: Stephen Boyd 
---
 drivers/clk/clk.c | 40 
 1 file changed, 40 deletions(-)

diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index 58ef3dab894a..51d673370d42 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -2242,38 +2242,6 @@ static inline void clk_debug_unregister(struct clk_core *core)
 #endif
 
 /**
- * __clk_is_ancestor - check if a clk_core is a possible ancestor of another
- * @core: clock core
- * @ancestor: ancestor clock core
- *
- * Returns true if there is a possibility that @ancestor can be an ancestor
- * of @core, false otherwise.
- *
- * This function can be used against @core or @ancestor that has not been
- * registered yet.
- */
-static bool __clk_is_ancestor(struct clk_core *core, struct clk_core *ancestor)
-{
-   struct clk_core *parent;
-   int i;
-
-   for (i = 0; i < core->num_parents; i++) {
-   parent = clk_core_get_parent_by_index(core, i);
-   /*
-* If ancestor has not been added to clk_{root,orphan}_list
-* yet, clk_core_lookup() cannot find it.  If parent is NULL,
-* compare the name strings, too.
-*/
-   if ((parent && (parent == ancestor ||
-   __clk_is_ancestor(parent, ancestor))) ||
-   (!parent && !strcmp(core->parent_names[i], ancestor->name)))
-   return true;
-   }
-
-   return false;
-}
-
-/**
  * __clk_core_init - initialize the data structures in a struct clk_core
  * @core:  clk_core being initialized
  *
@@ -2338,14 +2306,6 @@ static int __clk_core_init(struct clk_core *core)
"%s: invalid NULL in %s's .parent_names\n",
__func__, core->name);
 
-   /* If core is an ancestor of itself, it would make a loop. */
-   if (__clk_is_ancestor(core, core)) {
-   pr_err("%s: %s would create circular parent\n", __func__,
-  core->name);
-   ret = -EINVAL;
-   goto out;
-   }
-
core->parent = __clk_init_parent(core);
 
/*
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



[PATCH 2/6] ncr5380: Don't release lock for PIO transfer

2016-02-22 Thread Finn Thain
The calls to NCR5380_transfer_pio() for DATA IN and DATA OUT phases will
modify cmd->SCp.this_residual, cmd->SCp.ptr and cmd->SCp.buffer. That
works as long as EH does not intervene, which became possible in
atari_NCR5380.c when I changed the locking to bring it closer to
NCR5380.c.

If error recovery aborts the command, the scsi_cmnd in question and its
buffer will be returned to the mid-layer. So the transfer has to cease,
but it can't be stopped by the initiator because the target controls the
bus phase.

The problem does not arise if the lock is not released. That was fine for
atari_scsi, because it implements DMA. For the other drivers, we have to
release the lock and re-enable interrupts for long PIO data transfers.

The solution is to split the transfer into small chunks. In between chunks
the main loop releases the lock and re-enables interrupts. Thus interrupts
can be serviced and eh_bus_reset_handler can intervene if need be.
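
Each chunk is sized at roughly 3 ms worth of handshakes, i.e.
hostdata->accesses_per_ms / 2 bytes on the assumption of about 6
register accesses per byte transferred, as the hunks below show.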

This fixes an oops in NCR5380_transfer_pio() that can happen when the EH
abort handler is invoked during DATA IN or DATA OUT phase.

Fixes: 11d2f63b9cf5 ("ncr5380: Change instance->host_lock to hostdata->lock")
Reported-and-tested-by: Michael Schmitz 
Signed-off-by: Finn Thain 

---
 drivers/scsi/NCR5380.c   |   16 +---
 drivers/scsi/atari_NCR5380.c |   16 +---
 2 files changed, 18 insertions(+), 14 deletions(-)

Index: linux/drivers/scsi/NCR5380.c
===
--- linux.orig/drivers/scsi/NCR5380.c   2016-02-23 10:06:56.0 +1100
+++ linux/drivers/scsi/NCR5380.c2016-02-23 10:06:57.0 +1100
@@ -1759,9 +1759,7 @@ static void NCR5380_information_transfer
unsigned char msgout = NOP;
int sink = 0;
int len;
-#if defined(PSEUDO_DMA) || defined(REAL_DMA_POLL)
int transfersize;
-#endif
unsigned char *data;
unsigned char phase, tmp, extended_msg[10], old_phase = 0xff;
struct scsi_cmnd *cmd;
@@ -1854,13 +1852,17 @@ static void NCR5380_information_transfer
} else
 #endif /* defined(PSEUDO_DMA) || defined(REAL_DMA_POLL) */
{
-   spin_unlock_irq(&hostdata->lock);
-   NCR5380_transfer_pio(instance, &phase,
-(int *)&cmd->SCp.this_residual,
+   /* Break up transfer into 3 ms chunks,
+* presuming 6 accesses per handshake.
+*/
+   transfersize = min((unsigned long)cmd->SCp.this_residual,
+  hostdata->accesses_per_ms / 2);
+   len = transfersize;
+   NCR5380_transfer_pio(instance, &phase, &len,
 (unsigned char **)&cmd->SCp.ptr);
-   spin_lock_irq(&hostdata->lock);
+   cmd->SCp.this_residual -= transfersize - len;
}
-   break;
+   return;
case PHASE_MSGIN:
len = 1;
data = &tmp;
Index: linux/drivers/scsi/atari_NCR5380.c
===
--- linux.orig/drivers/scsi/atari_NCR5380.c 2016-02-23 10:06:56.0 +1100
+++ linux/drivers/scsi/atari_NCR5380.c  2016-02-23 10:06:57.0 +1100
@@ -1838,9 +1838,7 @@ static void NCR5380_information_transfer
unsigned char msgout = NOP;
int sink = 0;
int len;
-#if defined(REAL_DMA)
int transfersize;
-#endif
unsigned char *data;
unsigned char phase, tmp, extended_msg[10], old_phase = 0xff;
struct scsi_cmnd *cmd;
@@ -1983,18 +1981,22 @@ static void NCR5380_information_transfer
} else
 #endif /* defined(REAL_DMA) */
{
-   spin_unlock_irq(&hostdata->lock);
-   NCR5380_transfer_pio(instance, &phase,
-(int *)&cmd->SCp.this_residual,
+   /* Break up transfer into 3 ms chunks,
+* presuming 6 accesses per handshake.
+*/
+   transfersize = min((unsigned long)cmd->SCp.this_residual,
+  hostdata->accesses_per_ms / 2);
+   len = transfersize;
+  

[PATCH 3/6] ncr5380: Don't re-enter NCR5380_select()

2016-02-22 Thread Finn Thain
Calling NCR5380_select() from the abort handler causes various problems.
Firstly, it means potentially re-entering NCR5380_select(). Secondly, it
means that the lock is released, which permits the EH handlers to be
re-entered. The combination results in crashes. Don't do it.

Fixes: 8b00c3d5d40d ("ncr5380: Implement new eh_abort_handler")
Reported-and-tested-by: Michael Schmitz 
Signed-off-by: Finn Thain 

---
 drivers/scsi/NCR5380.c   |   16 
 drivers/scsi/atari_NCR5380.c |   16 
 2 files changed, 16 insertions(+), 16 deletions(-)

Index: linux/drivers/scsi/NCR5380.c
===
--- linux.orig/drivers/scsi/NCR5380.c   2016-02-23 10:06:57.0 +1100
+++ linux/drivers/scsi/NCR5380.c2016-02-23 10:06:58.0 +1100
@@ -2302,6 +2302,9 @@ static bool list_del_cmd(struct list_hea
  * If cmd was not found at all then presumably it has already been completed,
  * in which case return SUCCESS to try to avoid further EH measures.
  * If the command has not completed yet, we must not fail to find it.
+ *
+ * The lock protects driver data structures, but EH handlers also use it
+ * to serialize their own execution and prevent their own re-entry.
  */
 
 static int NCR5380_abort(struct scsi_cmnd *cmd)
@@ -2338,14 +2341,11 @@ static int NCR5380_abort(struct scsi_cmn
if (list_del_cmd(&hostdata->disconnected, cmd)) {
dsprintk(NDEBUG_ABORT, instance,
 "abort: removed %p from disconnected list\n", cmd);
-   cmd->result = DID_ERROR << 16;
-   if (!hostdata->connected)
-   NCR5380_select(instance, cmd);
-   if (hostdata->connected != cmd) {
-   complete_cmd(instance, cmd);
-   result = FAILED;
-   goto out;
-   }
+   /* Can't call NCR5380_select() and send ABORT because that
+* means releasing the lock. Need a bus reset.
+*/
+   result = FAILED;
+   goto out;
}
 
if (hostdata->connected == cmd) {
Index: linux/drivers/scsi/atari_NCR5380.c
===
--- linux.orig/drivers/scsi/atari_NCR5380.c 2016-02-23 10:06:57.0 +1100
+++ linux/drivers/scsi/atari_NCR5380.c  2016-02-23 10:06:58.0 +1100
@@ -2497,6 +2497,9 @@ static bool list_del_cmd(struct list_hea
  * If cmd was not found at all then presumably it has already been completed,
  * in which case return SUCCESS to try to avoid further EH measures.
  * If the command has not completed yet, we must not fail to find it.
+ *
+ * The lock protects driver data structures, but EH handlers also use it
+ * to serialize their own execution and prevent their own re-entry.
  */
 
 static int NCR5380_abort(struct scsi_cmnd *cmd)
@@ -2533,14 +2536,11 @@ static int NCR5380_abort(struct scsi_cmn
if (list_del_cmd(&hostdata->disconnected, cmd)) {
dsprintk(NDEBUG_ABORT, instance,
 "abort: removed %p from disconnected list\n", cmd);
-   cmd->result = DID_ERROR << 16;
-   if (!hostdata->connected)
-   NCR5380_select(instance, cmd);
-   if (hostdata->connected != cmd) {
-   complete_cmd(instance, cmd);
-   result = FAILED;
-   goto out;
-   }
+   /* Can't call NCR5380_select() and send ABORT because that
+* means releasing the lock. Need a bus reset.
+*/
+   result = FAILED;
+   goto out;
}
 
if (hostdata->connected == cmd) {




[PATCH 4/6] ncr5380: Forget aborted commands

2016-02-22 Thread Finn Thain
The list structures and related logic used in the NCR5380 driver mean that
a command cannot be queued twice (i.e. can't appear on more than one queue
and can't appear on the same queue more than once).

The abort handler must forget the command so that the mid-layer can re-use
it. E.g. the ML may send it back to the LLD via scsi_eh_get_sense().

Fix this and also fix two error paths, so that commands get forgotten iff
completed.

Fixes: 8b00c3d5d40d ("ncr5380: Implement new eh_abort_handler")
Tested-by: Michael Schmitz 
Signed-off-by: Finn Thain 

---
 drivers/scsi/NCR5380.c   |   62 +++
 drivers/scsi/atari_NCR5380.c |   62 +++
 2 files changed, 34 insertions(+), 90 deletions(-)

Index: linux/drivers/scsi/NCR5380.c
===
--- linux.orig/drivers/scsi/NCR5380.c   2016-02-23 10:06:58.0 +1100
+++ linux/drivers/scsi/NCR5380.c2016-02-23 10:07:00.0 +1100
@@ -1796,6 +1796,7 @@ static void NCR5380_information_transfer
do_abort(instance);
cmd->result = DID_ERROR << 16;
complete_cmd(instance, cmd);
+   hostdata->connected = NULL;
return;
 #endif
case PHASE_DATAIN:
@@ -1845,7 +1846,6 @@ static void NCR5380_information_transfer
sink = 1;
do_abort(instance);
cmd->result = DID_ERROR << 16;
-   complete_cmd(instance, cmd);
/* XXX - need to source or sink data here, as appropriate */
} else
cmd->SCp.this_residual -= transfersize - len;
@@ -2294,14 +2294,14 @@ static bool list_del_cmd(struct list_hea
  * [disconnected -> connected ->]...
  * [autosense -> connected ->] done
  *
- * If cmd is unissued then just remove it.
- * If cmd is disconnected, try to select the target.
- * If cmd is connected, try to send an abort message.
- * If cmd is waiting for autosense, give it a chance to complete but check
- * that it isn't left connected.
  * If cmd was not found at all then presumably it has already been completed,
  * in which case return SUCCESS to try to avoid further EH measures.
+ *
  * If the command has not completed yet, we must not fail to find it.
+ * We have no option but to forget the aborted command (even if it still
+ * lacks sense data). The mid-layer may re-issue a command that is in error
+ * recovery (see scsi_send_eh_cmnd), but the logic and data structures in
+ * this driver are such that a command can appear on one queue only.
  *
  * The lock protects driver data structures, but EH handlers also use it
  * to serialize their own execution and prevent their own re-entry.
@@ -2327,6 +2327,7 @@ static int NCR5380_abort(struct scsi_cmn
 "abort: removed %p from issue queue\n", cmd);
cmd->result = DID_ABORT << 16;
cmd->scsi_done(cmd); /* No tag or busy flag to worry about */
+   goto out;
}
 
if (hostdata->selecting == cmd) {
@@ -2344,6 +2345,8 @@ static int NCR5380_abort(struct scsi_cmn
/* Can't call NCR5380_select() and send ABORT because that
 * means releasing the lock. Need a bus reset.
 */
+   set_host_byte(cmd, DID_ERROR);
+   complete_cmd(instance, cmd);
result = FAILED;
goto out;
}
@@ -2351,45 +2354,9 @@ static int NCR5380_abort(struct scsi_cmn
if (hostdata->connected == cmd) {
dsprintk(NDEBUG_ABORT, instance, "abort: cmd %p is connected\n", cmd);
hostdata->connected = NULL;
-   if (do_abort(instance)) {
-   set_host_byte(cmd, DID_ERROR);
-   complete_cmd(instance, cmd);
-   result = FAILED;
-   goto out;
-   }
-   set_host_byte(cmd, DID_ABORT);
 #ifdef REAL_DMA
hostdata->dma_len = 0;
 #endif
-   if (cmd->cmnd[0] == REQUEST_SENSE)
-   complete_cmd(instance, cmd);
-   else {
-   struct NCR5380_cmd *ncmd = scsi_cmd_priv(cmd);
-
-   /* Perform autosense for this command */
-   list_add(&ncmd->list, &hostdata->autosense);
-   }
-   }
-
-   if (list_find_cmd(&hostdata->autosense, cmd)) {
-   dsprintk(NDEBUG_ABORT, instance,
-"abort: found %p on sense queue\n", cmd);
-   spin_unlock_irqrestore(&hostdata->lock, 

[PATCH 1/6] ncr5380: Correctly clear command pointers and lists after bus reset

2016-02-22 Thread Finn Thain
Commands subject to exception handling are to be returned to the scsi
mid-layer. Make sure that the various command pointers and command lists
in the low-level driver are correctly cleansed of affected commands.

This fixes some bugs that I accidentally introduced in v4.5-rc1 including
the removal of INIT_LIST_HEAD for the 'autosense' and 'disconnected'
command lists, and the possible NULL pointer dereference in
NCR5380_bus_reset() that was reported by Dan Carpenter.

hostdata->sensing may also point to an affected command so this pointer
also has to be cleared. The abort handler calls complete_cmd() to take
care of this; let's have the bus reset handler do the same.

The issue queue may also contain an affected command. If so, remove it.
This also follows the abort handler logic.

Reported-by: Dan Carpenter 
Fixes: 62717f537e1b ("ncr5380: Implement new eh_bus_reset_handler")
Tested-by: Michael Schmitz 
Signed-off-by: Finn Thain 

---
 drivers/scsi/NCR5380.c   |   19 ---
 drivers/scsi/atari_NCR5380.c |   19 ---
 2 files changed, 24 insertions(+), 14 deletions(-)

Index: linux/drivers/scsi/NCR5380.c
===
--- linux.orig/drivers/scsi/NCR5380.c   2016-02-23 10:06:56.0 +1100
+++ linux/drivers/scsi/NCR5380.c2016-02-23 10:06:56.0 +1100
@@ -2450,7 +2450,16 @@ static int NCR5380_bus_reset(struct scsi
 * commands!
 */
 
-   hostdata->selecting = NULL;
+   if (list_del_cmd(&hostdata->unissued, cmd)) {
+   cmd->result = DID_RESET << 16;
+   cmd->scsi_done(cmd);
+   }
+
+   if (hostdata->selecting) {
+   hostdata->selecting->result = DID_RESET << 16;
+   complete_cmd(instance, hostdata->selecting);
+   hostdata->selecting = NULL;
+   }
 
list_for_each_entry(ncmd, &hostdata->disconnected, list) {
struct scsi_cmnd *cmd = NCR5380_to_scmd(ncmd);
@@ -2458,6 +2467,7 @@ static int NCR5380_bus_reset(struct scsi
set_host_byte(cmd, DID_RESET);
cmd->scsi_done(cmd);
}
+   INIT_LIST_HEAD(&hostdata->disconnected);
 
list_for_each_entry(ncmd, &hostdata->autosense, list) {
struct scsi_cmnd *cmd = NCR5380_to_scmd(ncmd);
@@ -2465,6 +2475,7 @@ static int NCR5380_bus_reset(struct scsi
set_host_byte(cmd, DID_RESET);
cmd->scsi_done(cmd);
}
+   INIT_LIST_HEAD(&hostdata->autosense);
 
if (hostdata->connected) {
set_host_byte(hostdata->connected, DID_RESET);
@@ -2472,12 +2483,6 @@ static int NCR5380_bus_reset(struct scsi
hostdata->connected = NULL;
}
 
-   if (hostdata->sensing) {
-   set_host_byte(hostdata->connected, DID_RESET);
-   complete_cmd(instance, hostdata->sensing);
-   hostdata->sensing = NULL;
-   }
-
for (i = 0; i < 8; ++i)
hostdata->busy[i] = 0;
 #ifdef REAL_DMA
Index: linux/drivers/scsi/atari_NCR5380.c
===
--- linux.orig/drivers/scsi/atari_NCR5380.c 2016-02-23 10:06:56.0 +1100
+++ linux/drivers/scsi/atari_NCR5380.c  2016-02-23 10:06:56.0 +1100
@@ -2646,7 +2646,16 @@ static int NCR5380_bus_reset(struct scsi
 * commands!
 */
 
-   hostdata->selecting = NULL;
+   if (list_del_cmd(&hostdata->unissued, cmd)) {
+   cmd->result = DID_RESET << 16;
+   cmd->scsi_done(cmd);
+   }
+
+   if (hostdata->selecting) {
+   hostdata->selecting->result = DID_RESET << 16;
+   complete_cmd(instance, hostdata->selecting);
+   hostdata->selecting = NULL;
+   }
 
list_for_each_entry(ncmd, &hostdata->disconnected, list) {
struct scsi_cmnd *cmd = NCR5380_to_scmd(ncmd);
@@ -2654,6 +2663,7 @@ static int NCR5380_bus_reset(struct scsi
set_host_byte(cmd, DID_RESET);
cmd->scsi_done(cmd);
}
+   INIT_LIST_HEAD(&hostdata->disconnected);
 
list_for_each_entry(ncmd, &hostdata->autosense, list) {
struct scsi_cmnd *cmd = NCR5380_to_scmd(ncmd);
@@ -2661,6 +2671,7 @@ static int NCR5380_bus_reset(struct scsi
set_host_byte(cmd, DID_RESET);
cmd->scsi_done(cmd);
}
+   INIT_LIST_HEAD(&hostdata->autosense);
 
if (hostdata->connected) {
set_host_byte(hostdata->connected, DID_RESET);
@@ -2668,12 +2679,6 @@ static int NCR5380_bus_reset(struct scsi
hostdata->connected = NULL;
}
 
-   if (hostdata->sensing) {
-   set_host_byte(hostdata->connected, DID_RESET);
-   complete_cmd(instance, hostdata->sensing);
-   hostdata->sensing = NULL;
-   }
-
 #ifdef SUPPORT_TAGS
free_all_tags(hostdata);
 #endif




[PATCH 5/6] ncr5380: Fix NCR5380_select() EH checks and result handling

2016-02-22 Thread Finn Thain
Add missing checks for EH abort during arbitration and selection.
Rework the handling of NCR5380_select() result to improve clarity.

Fixes: 707d62b37fbb ("ncr5380: Fix EH during arbitration and selection")
Tested-by: Michael Schmitz 
Signed-off-by: Finn Thain 

---
 drivers/scsi/NCR5380.c   |   16 +++-
 drivers/scsi/atari_NCR5380.c |   16 +++-
 2 files changed, 22 insertions(+), 10 deletions(-)

Index: linux/drivers/scsi/NCR5380.c
===
--- linux.orig/drivers/scsi/NCR5380.c   2016-02-23 10:07:00.0 +1100
+++ linux/drivers/scsi/NCR5380.c2016-02-23 10:07:01.0 +1100
@@ -815,15 +815,17 @@ static void NCR5380_main(struct work_str
struct NCR5380_hostdata *hostdata =
container_of(work, struct NCR5380_hostdata, main_task);
struct Scsi_Host *instance = hostdata->host;
-   struct scsi_cmnd *cmd;
int done;
 
do {
done = 1;
 
spin_lock_irq(&hostdata->lock);
-   while (!hostdata->connected &&
-  (cmd = dequeue_next_cmd(instance))) {
+   while (!hostdata->connected && !hostdata->selecting) {
+   struct scsi_cmnd *cmd = dequeue_next_cmd(instance);
+
+   if (!cmd)
+   break;
 
dsprintk(NDEBUG_MAIN, instance, "main: dequeued %p\n", cmd);
 
@@ -840,8 +842,7 @@ static void NCR5380_main(struct work_str
 * entire unit.
 */
 
-   cmd = NCR5380_select(instance, cmd);
-   if (!cmd) {
+   if (!NCR5380_select(instance, cmd)) {
dsprintk(NDEBUG_MAIN, instance, "main: select complete\n");
} else {
dsprintk(NDEBUG_MAIN | NDEBUG_QUEUES, instance,
@@ -1056,6 +1057,11 @@ static struct scsi_cmnd *NCR5380_select(
/* Reselection interrupt */
goto out;
}
+   if (!hostdata->selecting) {
+   /* Command was aborted */
+   NCR5380_write(MODE_REG, MR_BASE);
+   goto out;
+   }
if (err < 0) {
NCR5380_write(MODE_REG, MR_BASE);
shost_printk(KERN_ERR, instance,
Index: linux/drivers/scsi/atari_NCR5380.c
===
--- linux.orig/drivers/scsi/atari_NCR5380.c 2016-02-23 10:07:00.0 +1100
+++ linux/drivers/scsi/atari_NCR5380.c  2016-02-23 10:07:01.0 +1100
@@ -923,7 +923,6 @@ static void NCR5380_main(struct work_str
struct NCR5380_hostdata *hostdata =
container_of(work, struct NCR5380_hostdata, main_task);
struct Scsi_Host *instance = hostdata->host;
-   struct scsi_cmnd *cmd;
int done;
 
/*
@@ -936,8 +935,11 @@ static void NCR5380_main(struct work_str
done = 1;
 
spin_lock_irq(&hostdata->lock);
-   while (!hostdata->connected &&
-  (cmd = dequeue_next_cmd(instance))) {
+   while (!hostdata->connected && !hostdata->selecting) {
+   struct scsi_cmnd *cmd = dequeue_next_cmd(instance);
+
+   if (!cmd)
+   break;
 
dsprintk(NDEBUG_MAIN, instance, "main: dequeued %p\n", cmd);
 
@@ -960,8 +962,7 @@ static void NCR5380_main(struct work_str
 #ifdef SUPPORT_TAGS
cmd_get_tag(cmd, cmd->cmnd[0] != REQUEST_SENSE);
 #endif
-   cmd = NCR5380_select(instance, cmd);
-   if (!cmd) {
+   if (!NCR5380_select(instance, cmd)) {
dsprintk(NDEBUG_MAIN, instance, "main: select complete\n");
maybe_release_dma_irq(instance);
} else {
@@ -1257,6 +1258,11 @@ static struct scsi_cmnd *NCR5380_select(
/* Reselection interrupt */
goto out;
}
+   if (!hostdata->selecting) {
+   /* Command was aborted */
+   NCR5380_write(MODE_REG, MR_BASE);
+   goto out;
+   }
if (err < 0) {
NCR5380_write(MODE_REG, MR_BASE);
shost_printk(KERN_ERR, instance,




[PATCH 6/6] ncr5380: Call scsi_eh_prep_cmnd() and scsi_eh_restore_cmnd() as and when appropriate

2016-02-22 Thread Finn Thain
This bug causes the wrong command to have its sense pointer overwritten,
which sometimes leads to a NULL pointer deref. Fix this by checking which
command is being requeued before restoring the scsi_eh_save data.

It turns out that some targets will disconnect a REQUEST SENSE command.
The autosense algorithm doesn't anticipate this. Hence multiple commands
can end up undergoing autosense simultaneously, and they will all try to
use the same scsi_eh_save struct, which won't work. Defer autosense when
the scsi_eh_save storage is in use by another command.

Fixes: f27db8eb98a1 ("ncr5380: Fix autosense bugs")
Reported-and-tested-by: Michael Schmitz 
Signed-off-by: Finn Thain 

---
 drivers/scsi/NCR5380.c   |4 ++--
 drivers/scsi/atari_NCR5380.c |4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

Index: linux/drivers/scsi/NCR5380.c
===
--- linux.orig/drivers/scsi/NCR5380.c   2016-02-23 10:07:01.0 +1100
+++ linux/drivers/scsi/NCR5380.c2016-02-23 10:07:02.0 +1100
@@ -760,7 +760,7 @@ static struct scsi_cmnd *dequeue_next_cm
struct NCR5380_cmd *ncmd;
struct scsi_cmnd *cmd;
 
-   if (list_empty(&hostdata->autosense)) {
+   if (hostdata->sensing || list_empty(&hostdata->autosense)) {
list_for_each_entry(ncmd, &hostdata->unissued, list) {
cmd = NCR5380_to_scmd(ncmd);
dsprintk(NDEBUG_QUEUES, instance, "dequeue: cmd=%p target=%d busy=0x%02x lun=%llu\n",
@@ -793,7 +793,7 @@ static void requeue_cmd(struct Scsi_Host
struct NCR5380_hostdata *hostdata = shost_priv(instance);
struct NCR5380_cmd *ncmd = scsi_cmd_priv(cmd);
 
-   if (hostdata->sensing) {
+   if (hostdata->sensing == cmd) {
scsi_eh_restore_cmnd(cmd, &hostdata->ses);
list_add(&ncmd->list, &hostdata->autosense);
hostdata->sensing = NULL;
Index: linux/drivers/scsi/atari_NCR5380.c
===
--- linux.orig/drivers/scsi/atari_NCR5380.c 2016-02-23 10:07:01.0 +1100
+++ linux/drivers/scsi/atari_NCR5380.c  2016-02-23 10:07:02.0 +1100
@@ -862,7 +862,7 @@ static struct scsi_cmnd *dequeue_next_cm
struct NCR5380_cmd *ncmd;
struct scsi_cmnd *cmd;
 
-   if (list_empty(&hostdata->autosense)) {
+   if (hostdata->sensing || list_empty(&hostdata->autosense)) {
list_for_each_entry(ncmd, &hostdata->unissued, list) {
cmd = NCR5380_to_scmd(ncmd);
dsprintk(NDEBUG_QUEUES, instance, "dequeue: cmd=%p target=%d busy=0x%02x lun=%llu\n",
@@ -901,7 +901,7 @@ static void requeue_cmd(struct Scsi_Host
struct NCR5380_hostdata *hostdata = shost_priv(instance);
struct NCR5380_cmd *ncmd = scsi_cmd_priv(cmd);
 
-   if (hostdata->sensing) {
+   if (hostdata->sensing == cmd) {
scsi_eh_restore_cmnd(cmd, &hostdata->ses);
list_add(&ncmd->list, &hostdata->autosense);
hostdata->sensing = NULL;




Re: [PATCH net v2] r8169:fix "rtl_counters_cond == 1 (loop: 1000, delay: 10)" log spam.

2016-02-22 Thread Francois Romieu
Chunhao Lin  :
[...]
> diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
> index 537974c..a645f8d 100644
> --- a/drivers/net/ethernet/realtek/r8169.c
> +++ b/drivers/net/ethernet/realtek/r8169.c
> @@ -7730,10 +7730,13 @@ rtl8169_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats)
>  {
>   struct rtl8169_private *tp = netdev_priv(dev);
>   void __iomem *ioaddr = tp->mmio_addr;
> + struct pci_dev *pdev = tp->pci_dev;

+   struct device *d = &tp->pci_dev->dev;

(the patch does not use pdev alone)

[...]
> @@ -7761,7 +7764,8 @@ rtl8169_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats)
>* Fetch additonal counter values missing in stats collected by driver
>* from tally counters.
>*/
> - rtl8169_update_counters(dev);
> + if (pm_runtime_active(&pdev->dev))
> + rtl8169_update_counters(dev);

pm_runtime_active() won't change after pm_runtime_get_noresume(). You may
set a boolean active = pm_runtime_active(d) before testing netif_running().
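
In other words, something like this near the top of
rtl8169_get_stats64() (a sketch of the suggestion, not tested):

	struct rtl8169_private *tp = netdev_priv(dev);
	bool active = pm_runtime_active(&tp->pci_dev->dev);

	if (netif_running(dev) && active)
		rtl8169_rx_missed(dev, tp->mmio_addr);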

[...]
> @@ -7842,6 +7848,12 @@ static int rtl8169_runtime_suspend(struct device *device)
>   struct pci_dev *pdev = to_pci_dev(device);
>   struct net_device *dev = pci_get_drvdata(pdev);
>   struct rtl8169_private *tp = netdev_priv(dev);
> + void __iomem *ioaddr = tp->mmio_addr;
> +
> + /* Update counters before going runtime suspend */
> + if (netif_running(dev))
> + rtl8169_rx_missed(dev, ioaddr);
> + rtl8169_update_counters(dev);
>  
>   if (!tp->TxDescArray)
>   return 0;

Nits:

- the tp->TxDescArray test provides the required synchronization: see
  rtl8169_{open/close} and their pm_runtime_{get / put}.

- ioaddr is not really needed : tp->mmio_addr appears only once and it does
  not mess the 72..80 cols limit.

- even if the device can only be automatically runtime suspended some time
  after a link down event, you may address davem's point regarding stats
  reliability and move rtl8169_rx_missed + rtl8169_update_counters after
  rtl8169_net_suspend.

-- 
Ueimor


[PATCH 0/6] ncr5380: Exception handling fixes for v4.5

2016-02-22 Thread Finn Thain

These patches fix some exception handling and autosense bugs that I
accidentally introduced in v4.5-rc1.

The error recovery and autosense code in these drivers has been unstable
for a long time. Despite that, v4.5-rc1 shows a regression inasmuch as
it exposes a bug in the aranym emulator. This leads to error recovery,
which can crash.

Also, Michael Schmitz reported some crashes involving abort handling
for a certain target device. And Dan Carpenter found a NULL pointer deref
in the new bus reset code.

Error recovery and autosense are stable with these patches.

I tested them using a Domex 3191D PCI card. Errors during IO were
simulated by sending bus resets and unplugging/replugging the SCSI
cables. Some of these patches fix bugs that only affect more capable
hardware (like Atari). Thanks to Michael Schmitz for patiently testing
those.

Please review this series for v4.5.

---
 drivers/scsi/NCR5380.c   |  133 +++
 drivers/scsi/atari_NCR5380.c |  133 +++
 2 files changed, 118 insertions(+), 148 deletions(-)


Re: [Xen-devel] [PATCH 8/9] x86/rtc: replace paravirt_enabled() check with subarch check

2016-02-22 Thread Luis R. Rodriguez
On Mon, Feb 22, 2016 at 09:34:26AM -0500, Boris Ostrovsky wrote:
> On 02/22/2016 05:27 AM, Borislav Petkov wrote:
> >On Mon, Feb 22, 2016 at 07:07:56AM +0100, Luis R. Rodriguez wrote:
> >>diff --git a/arch/x86/include/asm/x86_init.h 
> >>b/arch/x86/include/asm/x86_init.h
> >>index 1ae89a2721d6..fe0d579b63e3 100644
> >>--- a/arch/x86/include/asm/x86_init.h
> >>+++ b/arch/x86/include/asm/x86_init.h
> >>@@ -84,11 +84,14 @@ struct x86_init_paging {
> >>   *boot cpu
> >>   * @timer_init:   initialize the platform timer (default 
> >> PIT/HPET)
> >>   * @wallclock_init:   init the wallclock device
> >>+ * @no_cmos_rtc:   set when platform has no CMOS real-time clock
> >>+ * present
> >>   */
> >>  struct x86_init_timers {
> >>void (*setup_percpu_clockev)(void);
> >>void (*timer_init)(void);
> >>void (*wallclock_init)(void);
> >>+   bool no_cmos_rtc;
> >I'd add
> >
> > u64 flags;
> >
> >to x86_init_ops and then set X86_PLATFORM_NO_RTC or so in there. The
> >reason being, others could use that flags field too, for other stuff and
> >define more bits.
> 
> Maybe timer_flags or platform_flags (or something else) to be a
> little more cscope-friendly?

Sure, I'll go with platform_flags on x86_init_ops. Will repost a new
series after 0-day testing.
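
For the record, the shape under discussion is roughly this (a sketch,
not the final patch; flag and field names follow the suggestions above):

	/* arch/x86/include/asm/x86_init.h */
	#define X86_PLATFORM_NO_RTC	(1ULL << 0)

	struct x86_init_ops {
		...
		u64 platform_flags;
	};

	/* a platform without a CMOS RTC would set, early in boot: */
	x86_init.platform_flags |= X86_PLATFORM_NO_RTC;

	/* and the RTC code would then bail out with: */
	if (x86_init.platform_flags & X86_PLATFORM_NO_RTC)
		return -ENODEV;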

  Luis


linux-next: manual merge of the f2fs tree with Linus' tree

2016-02-22 Thread Stephen Rothwell
Hi Jaegeuk,

Today's linux-next merge of the f2fs tree got conflicts in:

  fs/ext4/crypto.c
  fs/ext4/ext4.h
  fs/ext4/namei.c

between commits:

  28b4c263961c ("ext4 crypto: revalidate dentry after adding or removing the 
key")
  ff978b09f973 ("ext4 crypto: move context consistency check to 
ext4_file_open()")

from Linus' tree (presumably bug fixes in v4.5-rc5) and commit:

  e29c55a8d406 ("ext4 crypto: migrate into vfs's crypto engine")

from the f2fs tree.

I do not have enough information to fix this up, so I dropped the last
commit from the f2fs tree for today.

I *assume* that this has been passed by Ted - I can't easily tell as
there are no Acked-by or Reviewed-by tags in the f2fs tree commit.  If
so, perhaps you could either merge v4.5-rc5 yourself and do the
resolutions necessary (with nice comments in the merge commit), or give
me some hints on how to do it.

-- 
Cheers,
Stephen Rothwell


[PATCH 1/6] cgroup: fix error return value of cgroup_addrm_files()

2016-02-22 Thread Tejun Heo
cgroup_addrm_files() incorrectly returned 0 after add failure.  Fix
it.

Signed-off-by: Tejun Heo 
---
 kernel/cgroup.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 7ad6191..68b032d 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -3369,7 +3369,7 @@ static int cgroup_addrm_files(struct cgroup_subsys_state 
*css,
  bool is_add)
 {
struct cftype *cft, *cft_end = NULL;
-   int ret;
+   int ret = 0;
 
lockdep_assert_held(&cgroup_mutex);
 
@@ -3398,7 +3398,7 @@ static int cgroup_addrm_files(struct cgroup_subsys_state 
*css,
cgroup_rm_file(cgrp, cft);
}
}
-   return 0;
+   return ret;
 }
 
 static int cgroup_apply_cftypes(struct cftype *cfts, bool is_add)
-- 
2.5.0



[PATCH 3/6] cgroup: s/child_subsys_mask/subtree_ss_mask/

2016-02-22 Thread Tejun Heo
For consistency with cgroup->subtree_control.

* cgroup->child_subsys_mask -> cgroup->subtree_ss_mask
* cgroup_calc_child_subsys_mask() -> cgroup_calc_subtree_ss_mask()
* cgroup_refresh_child_subsys_mask() -> cgroup_refresh_subtree_ss_mask()

No functional changes.

Signed-off-by: Tejun Heo 
---
 include/linux/cgroup-defs.h | 11 +--
 kernel/cgroup.c | 48 ++---
 2 files changed, 29 insertions(+), 30 deletions(-)

diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 4f3c0da..c68ae7f 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -253,13 +253,12 @@ struct cgroup {
/*
 * The bitmask of subsystems enabled on the child cgroups.
 * ->subtree_control is the one configured through
-* "cgroup.subtree_control" while ->child_subsys_mask is the
-* effective one which may have more subsystems enabled.
-* Controller knobs are made available iff it's enabled in
-* ->subtree_control.
+* "cgroup.subtree_control" while ->child_ss_mask is the effective
+* one which may have more subsystems enabled.  Controller knobs
+* are made available iff it's enabled in ->subtree_control.
 */
-   unsigned int subtree_control;
-   unsigned int child_subsys_mask;
+   unsigned long subtree_control;
+   unsigned long subtree_ss_mask;
 
/* Private pointers for each registered subsystem */
struct cgroup_subsys_state __rcu *subsys[CGROUP_SUBSYS_COUNT];
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 7727b6e..f3cd67b 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -391,10 +391,10 @@ static struct cgroup_subsys_state *cgroup_e_css(struct 
cgroup *cgrp,
 
/*
 * This function is used while updating css associations and thus
-* can't test the csses directly.  Use ->child_subsys_mask.
+* can't test the csses directly.  Use ->subtree_ss_mask.
 */
while (cgroup_parent(cgrp) &&
-  !(cgroup_parent(cgrp)->child_subsys_mask & (1 << ss->id)))
+  !(cgroup_parent(cgrp)->subtree_ss_mask & (1 << ss->id)))
cgrp = cgroup_parent(cgrp);
 
return cgroup_css(cgrp, ss);
@@ -1256,7 +1256,7 @@ static umode_t cgroup_file_mode(const struct cftype *cft)
 }
 
 /**
- * cgroup_calc_child_subsys_mask - calculate child_subsys_mask
+ * cgroup_calc_subtree_ss_mask - calculate subtree_ss_mask
  * @cgrp: the target cgroup
  * @subtree_control: the new subtree_control mask to consider
  *
@@ -1268,8 +1268,8 @@ static umode_t cgroup_file_mode(const struct cftype *cft)
  * @subtree_control is to be applied to @cgrp.  The returned mask is always
  * a superset of @subtree_control and follows the usual hierarchy rules.
  */
-static unsigned long cgroup_calc_child_subsys_mask(struct cgroup *cgrp,
- unsigned long subtree_control)
+static unsigned long cgroup_calc_subtree_ss_mask(struct cgroup *cgrp,
+unsigned long subtree_control)
 {
struct cgroup *parent = cgroup_parent(cgrp);
unsigned long cur_ss_mask = subtree_control;
@@ -1293,7 +1293,7 @@ static unsigned long cgroup_calc_child_subsys_mask(struct 
cgroup *cgrp,
 * to non-default hierarchies.
 */
if (parent)
-   new_ss_mask &= parent->child_subsys_mask;
+   new_ss_mask &= parent->subtree_ss_mask;
else
new_ss_mask &= cgrp->root->subsys_mask;
 
@@ -1306,16 +1306,16 @@ static unsigned long 
cgroup_calc_child_subsys_mask(struct cgroup *cgrp,
 }
 
 /**
- * cgroup_refresh_child_subsys_mask - update child_subsys_mask
+ * cgroup_refresh_subtree_ss_mask - update subtree_ss_mask
  * @cgrp: the target cgroup
  *
- * Update @cgrp->child_subsys_mask according to the current
- * @cgrp->subtree_control using cgroup_calc_child_subsys_mask().
+ * Update @cgrp->subtree_ss_mask according to the current
+ * @cgrp->subtree_control using cgroup_calc_subtree_ss_mask().
  */
-static void cgroup_refresh_child_subsys_mask(struct cgroup *cgrp)
+static void cgroup_refresh_subtree_ss_mask(struct cgroup *cgrp)
 {
-   cgrp->child_subsys_mask =
-   cgroup_calc_child_subsys_mask(cgrp, cgrp->subtree_control);
+   cgrp->subtree_ss_mask =
+   cgroup_calc_subtree_ss_mask(cgrp, cgrp->subtree_control);
 }
 
 /**
@@ -1542,7 +1542,7 @@ static int rebind_subsystems(struct cgroup_root *dst_root,
 
src_root->subsys_mask &= ~(1 << ssid);
scgrp->subtree_control &= ~(1 << ssid);
-   cgroup_refresh_child_subsys_mask(scgrp);
+   cgroup_refresh_subtree_ss_mask(scgrp);
 
/* default hierarchy doesn't enable controllers by default */
dst_root->subsys_mask |= 1 << ssid;
@@ -1550,7 +1550,7 @@ static in

[PATCHSET] cgroup: misc fixes and cleanups

2016-02-22 Thread Tejun Heo
Hello,

This patchset contains the following six patches.  The first one is a
fix but should be safe to route through for-4.6.  The rest are misc
cleanups and improvements which don't cause notable behavior changes.

 0001-cgroup-fix-error-return-value-of-cgroup_addrm_files.patch
 0002-Revert-cgroup-add-cgroup_subsys-css_e_css_changed.patch
 0003-cgroup-s-child_subsys_mask-subtree_ss_mask.patch
 0004-cgroup-convert-for_each_subsys_which-to-do-while-sty.patch
 0005-cgroup-use-do_each_subsys_mask-where-applicable.patch
 0006-cgroup-make-cgroup-subsystem-masks-u16.patch

The patchset based on top of cgroup/for-4.6 223ffb29f972 ("cgroup:
provide cgroup_nov1= to disable controllers in v1 mounts") and is
available in the following git branch.

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-misc

diffstat follows.  I'll pull in the branch into cgroup/for-4.6
soonish.

 include/linux/cgroup-defs.h |   12 +-
 kernel/cgroup.c |  217 +++-
 2 files changed, 102 insertions(+), 127 deletions(-)

Thanks.

--
tejun


[PATCH 5/6] cgroup: use do_each_subsys_mask() where applicable

2016-02-22 Thread Tejun Heo
There are several places in cgroup_subtree_control_write() which can
use do_each_subsys_mask() instead of manual mask testing.  Use it.

No functional changes.

Signed-off-by: Tejun Heo 
---
 kernel/cgroup.c | 35 ---
 1 file changed, 12 insertions(+), 23 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 5d10298..1e561bd 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -3082,10 +3082,7 @@ static ssize_t cgroup_subtree_control_write(struct 
kernfs_open_file *of,
 * dependency.  An invisible css is made visible when the userland
 * explicitly enables it.
 */
-   for_each_subsys(ss, ssid) {
-   if (!(enable & (1 << ssid)))
-   continue;
-
+   do_each_subsys_mask(ss, ssid, enable) {
cgroup_for_each_live_child(child, cgrp) {
if (css_enable & (1 << ssid))
ret = create_css(child, ss,
@@ -3096,7 +3093,7 @@ static ssize_t cgroup_subtree_control_write(struct 
kernfs_open_file *of,
if (ret)
goto err_undo_css;
}
-   }
+   } while_each_subsys_mask();
 
/*
 * At this point, cgroup_e_css() results reflect the new csses
@@ -3115,10 +3112,7 @@ static ssize_t cgroup_subtree_control_write(struct 
kernfs_open_file *of,
 * state if it's made visible again later.  Controllers which may
 * be depended upon should provide ->css_reset() for this purpose.
 */
-   for_each_subsys(ss, ssid) {
-   if (!(disable & (1 << ssid)))
-   continue;
-
+   do_each_subsys_mask(ss, ssid, disable) {
cgroup_for_each_live_child(child, cgrp) {
struct cgroup_subsys_state *css = cgroup_css(child, ss);
 
@@ -3130,7 +3124,7 @@ static ssize_t cgroup_subtree_control_write(struct 
kernfs_open_file *of,
ss->css_reset(css);
}
}
-   }
+   } while_each_subsys_mask();
 
kernfs_activate(cgrp->kn);
ret = 0;
@@ -3142,10 +3136,7 @@ static ssize_t cgroup_subtree_control_write(struct 
kernfs_open_file *of,
cgrp->subtree_control = old_sc;
cgrp->subtree_ss_mask = old_ss;
 
-   for_each_subsys(ss, ssid) {
-   if (!(enable & (1 << ssid)))
-   continue;
-
+   do_each_subsys_mask(ss, ssid, enable) {
cgroup_for_each_live_child(child, cgrp) {
struct cgroup_subsys_state *css = cgroup_css(child, ss);
 
@@ -3157,7 +3148,7 @@ static ssize_t cgroup_subtree_control_write(struct 
kernfs_open_file *of,
else
css_clear_dir(css, NULL);
}
-   }
+   } while_each_subsys_mask();
goto out_unlock;
 }
 
@@ -4973,14 +4964,12 @@ static int cgroup_mkdir(struct kernfs_node *parent_kn, 
const char *name,
goto out_destroy;
 
/* let's create and online css's */
-   for_each_subsys(ss, ssid) {
-   if (parent->subtree_ss_mask & (1 << ssid)) {
-   ret = create_css(cgrp, ss,
-parent->subtree_control & (1 << ssid));
-   if (ret)
-   goto out_destroy;
-   }
-   }
+   do_each_subsys_mask(ss, ssid, parent->subtree_ss_mask) {
+   ret = create_css(cgrp, ss,
+parent->subtree_control & (1 << ssid));
+   if (ret)
+   goto out_destroy;
+   } while_each_subsys_mask();
 
/*
 * On the default hierarchy, a child doesn't automatically inherit
-- 
2.5.0



[PATCH 2/6] Revert "cgroup: add cgroup_subsys->css_e_css_changed()"

2016-02-22 Thread Tejun Heo
This reverts commit 56c807ba4e91f0980567b6a69de239677879b17f.

cgroup_subsys->css_e_css_changed() was supposed to be used by cgroup
writeback support; however, the change to per-inode cgroup association
made it unnecessary and the callback doesn't have any user.  Remove
it.

Signed-off-by: Tejun Heo 
---
 include/linux/cgroup-defs.h |  1 -
 kernel/cgroup.c | 18 --
 2 files changed, 19 deletions(-)

diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 789471d..4f3c0da 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -434,7 +434,6 @@ struct cgroup_subsys {
void (*css_released)(struct cgroup_subsys_state *css);
void (*css_free)(struct cgroup_subsys_state *css);
void (*css_reset)(struct cgroup_subsys_state *css);
-   void (*css_e_css_changed)(struct cgroup_subsys_state *css);
 
int (*can_attach)(struct cgroup_taskset *tset);
void (*cancel_attach)(struct cgroup_taskset *tset);
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 68b032d..7727b6e 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -3127,24 +3127,6 @@ static ssize_t cgroup_subtree_control_write(struct 
kernfs_open_file *of,
}
}
 
-   /*
-* The effective csses of all the descendants (excluding @cgrp) may
-* have changed.  Subsystems can optionally subscribe to this event
-* by implementing ->css_e_css_changed() which is invoked if any of
-* the effective csses seen from the css's cgroup may have changed.
-*/
-   for_each_subsys(ss, ssid) {
-   struct cgroup_subsys_state *this_css = cgroup_css(cgrp, ss);
-   struct cgroup_subsys_state *css;
-
-   if (!ss->css_e_css_changed || !this_css)
-   continue;
-
-   css_for_each_descendant_pre(css, this_css)
-   if (css != this_css)
-   ss->css_e_css_changed(css);
-   }
-
kernfs_activate(cgrp->kn);
ret = 0;
 out_unlock:
-- 
2.5.0



[PATCH 6/6] cgroup: make cgroup subsystem masks u16

2016-02-22 Thread Tejun Heo
After the recent do_each_subsys_mask() conversion, there's no reason
to use ulong for subsystem masks.  We'll be adding more subsystem
masks to persistent data structures, let's reduce its size to u16
which should be enough for now and the foreseeable future.

This doesn't create any noticeable behavior differences.
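
This works because do_each_subsys_mask() (patch 4) copies the mask into
a local unsigned long before handing it to for_each_set_bit(), so a u16
argument promotes cleanly:

	#define do_each_subsys_mask(ss, ssid, ss_mask) do {	\
		unsigned long __ss_mask = (ss_mask);		\
		...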

Signed-off-by: Tejun Heo 
---
 include/linux/cgroup-defs.h |  4 ++--
 kernel/cgroup.c | 46 ++---
 2 files changed, 24 insertions(+), 26 deletions(-)

diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index c68ae7f..0abf6aa 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -257,8 +257,8 @@ struct cgroup {
 * one which may have more subsystems enabled.  Controller knobs
 * are made available iff it's enabled in ->subtree_control.
 */
-   unsigned long subtree_control;
-   unsigned long subtree_ss_mask;
+   u16 subtree_control;
+   u16 subtree_ss_mask;
 
/* Private pointers for each registered subsystem */
struct cgroup_subsys_state __rcu *subsys[CGROUP_SUBSYS_COUNT];
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 1e561bd..a2f81ef 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -184,7 +184,7 @@ static bool cgrp_dfl_root_visible;
 static unsigned long cgroup_no_v1_mask;
 
 /* some controllers are not supported in the default hierarchy */
-static unsigned long cgrp_dfl_root_inhibit_ss_mask;
+static u16 cgrp_dfl_root_inhibit_ss_mask;
 
 /* The list of hierarchy roots */
 
@@ -208,19 +208,18 @@ static u64 css_serial_nr_next = 1;
  * fork/exit handlers to call. This avoids us having to do extra work in the
  * fork/exit path to check which subsystems have fork/exit callbacks.
  */
-static unsigned long have_fork_callback __read_mostly;
-static unsigned long have_exit_callback __read_mostly;
-static unsigned long have_free_callback __read_mostly;
+static u16 have_fork_callback __read_mostly;
+static u16 have_exit_callback __read_mostly;
+static u16 have_free_callback __read_mostly;
 
 /* Ditto for the can_fork callback. */
-static unsigned long have_canfork_callback __read_mostly;
+static u16 have_canfork_callback __read_mostly;
 
 static struct file_system_type cgroup2_fs_type;
 static struct cftype cgroup_dfl_base_files[];
 static struct cftype cgroup_legacy_base_files[];
 
-static int rebind_subsystems(struct cgroup_root *dst_root,
-unsigned long ss_mask);
+static int rebind_subsystems(struct cgroup_root *dst_root, u16 ss_mask);
 static void css_task_iter_advance(struct css_task_iter *it);
 static int cgroup_destroy_locked(struct cgroup *cgrp);
 static int create_css(struct cgroup *cgrp, struct cgroup_subsys *ss,
@@ -1274,11 +1273,10 @@ static umode_t cgroup_file_mode(const struct cftype 
*cft)
  * @subtree_control is to be applied to @cgrp.  The returned mask is always
  * a superset of @subtree_control and follows the usual hierarchy rules.
  */
-static unsigned long cgroup_calc_subtree_ss_mask(struct cgroup *cgrp,
-unsigned long subtree_control)
+static u16 cgroup_calc_subtree_ss_mask(struct cgroup *cgrp, u16 
subtree_control)
 {
struct cgroup *parent = cgroup_parent(cgrp);
-   unsigned long cur_ss_mask = subtree_control;
+   u16 cur_ss_mask = subtree_control;
struct cgroup_subsys *ss;
int ssid;
 
@@ -1288,7 +1286,7 @@ static unsigned long cgroup_calc_subtree_ss_mask(struct 
cgroup *cgrp,
return cur_ss_mask;
 
while (true) {
-   unsigned long new_ss_mask = cur_ss_mask;
+   u16 new_ss_mask = cur_ss_mask;
 
do_each_subsys_mask(ss, ssid, cur_ss_mask) {
new_ss_mask |= ss->depends_on;
@@ -1466,12 +1464,11 @@ static int css_populate_dir(struct cgroup_subsys_state 
*css,
return ret;
 }
 
-static int rebind_subsystems(struct cgroup_root *dst_root,
-unsigned long ss_mask)
+static int rebind_subsystems(struct cgroup_root *dst_root, u16 ss_mask)
 {
struct cgroup *dcgrp = &dst_root->cgrp;
struct cgroup_subsys *ss;
-   unsigned long tmp_ss_mask;
+   u16 tmp_ss_mask;
int ssid, i, ret;
 
lockdep_assert_held(&cgroup_mutex);
@@ -1507,7 +1504,7 @@ static int rebind_subsystems(struct cgroup_root *dst_root,
 */
if (dst_root == &cgrp_dfl_root) {
if (cgrp_dfl_root_visible) {
-   pr_warn("failed to create files (%d) while 
rebinding 0x%lx to default root\n",
+   pr_warn("failed to create files (%d) while 
rebinding 0x%x to default root\n",
ret, ss_mask);
pr_warn("you may retry by moving them to a 
different hierarchy and unbinding\n");
}
@@ -1599,7 +1596,7 @@ static int cgroup_show_

[PATCH 4/6] cgroup: convert for_each_subsys_which() to do-while style

2016-02-22 Thread Tejun Heo
for_each_subsys_which() allows iterating subsystems specified in a
subsystem bitmask; unfortunately, it requires the mask to be an
unsigned long l-value which can be inconvenient and makes it awkward
to use a smaller type for subsystem masks.

This patch converts for_each_subsys_which() to do-while style which
allows it to drop the l-value requirement.  The new iterator is named
do_each_subsys_mask() / while_each_subsys_mask().
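
Usage then looks like this (sketch):

	do_each_subsys_mask(ss, ssid, ss_mask) {
		/* runs only for subsystems whose bit is set in ss_mask;
		 * ss_mask can now be any integer r-value, not just an
		 * unsigned long l-value */
	} while_each_subsys_mask();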

Signed-off-by: Tejun Heo 
Cc: Aleksa Sarai 
---
 kernel/cgroup.c | 72 -
 1 file changed, 40 insertions(+), 32 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index f3cd67b..5d10298 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -514,22 +514,28 @@ static int notify_on_release(const struct cgroup *cgrp)
 (((ss) = cgroup_subsys[ssid]) || true); (ssid)++)
 
 /**
- * for_each_subsys_which - filter for_each_subsys with a bitmask
+ * do_each_subsys_mask - filter for_each_subsys with a bitmask
  * @ss: the iteration cursor
  * @ssid: the index of @ss, CGROUP_SUBSYS_COUNT after reaching the end
- * @ss_maskp: a pointer to the bitmask
+ * @ss_mask: the bitmask
  *
  * The block will only run for cases where the ssid-th bit (1 << ssid) of
- * mask is set to 1.
+ * @ss_mask is set.
  */
-#define for_each_subsys_which(ss, ssid, ss_maskp)  \
-   if (!CGROUP_SUBSYS_COUNT) /* to avoid spurious gcc warning */   \
+#define do_each_subsys_mask(ss, ssid, ss_mask) do {\
+   unsigned long __ss_mask = (ss_mask);\
+   if (!CGROUP_SUBSYS_COUNT) { /* to avoid spurious gcc warning */ \
(ssid) = 0; \
-   else\
-   for_each_set_bit(ssid, ss_maskp, CGROUP_SUBSYS_COUNT)   \
-   if (((ss) = cgroup_subsys[ssid]) && false)  \
-   break;  \
-   else
+   break;  \
+   }   \
+   for_each_set_bit(ssid, &__ss_mask, CGROUP_SUBSYS_COUNT) {   \
+   (ss) = cgroup_subsys[ssid]; \
+   {
+
+#define while_each_subsys_mask()   \
+   }   \
+   }   \
+} while (false)
 
 /* iterate across the hierarchies */
 #define for_each_root(root)\
@@ -1284,8 +1290,9 @@ static unsigned long cgroup_calc_subtree_ss_mask(struct 
cgroup *cgrp,
while (true) {
unsigned long new_ss_mask = cur_ss_mask;
 
-   for_each_subsys_which(ss, ssid, &cur_ss_mask)
+   do_each_subsys_mask(ss, ssid, cur_ss_mask) {
new_ss_mask |= ss->depends_on;
+   } while_each_subsys_mask();
 
/*
 * Mask out subsystems which aren't available.  This can
@@ -1469,7 +1476,7 @@ static int rebind_subsystems(struct cgroup_root *dst_root,
 
lockdep_assert_held(&cgroup_mutex);
 
-   for_each_subsys_which(ss, ssid, &ss_mask) {
+   do_each_subsys_mask(ss, ssid, ss_mask) {
/* if @ss has non-root csses attached to it, can't move */
if (css_next_child(NULL, cgroup_css(&ss->root->cgrp, ss)))
return -EBUSY;
@@ -1477,14 +1484,14 @@ static int rebind_subsystems(struct cgroup_root 
*dst_root,
/* can't move between two non-dummy roots either */
if (ss->root != &cgrp_dfl_root && dst_root != &cgrp_dfl_root)
return -EBUSY;
-   }
+   } while_each_subsys_mask();
 
/* skip creating root files on dfl_root for inhibited subsystems */
tmp_ss_mask = ss_mask;
if (dst_root == &cgrp_dfl_root)
tmp_ss_mask &= ~cgrp_dfl_root_inhibit_ss_mask;
 
-   for_each_subsys_which(ss, ssid, &tmp_ss_mask) {
+   do_each_subsys_mask(ss, ssid, tmp_ss_mask) {
struct cgroup *scgrp = &ss->root->cgrp;
int tssid;
 
@@ -1507,19 +1514,19 @@ static int rebind_subsystems(struct cgroup_root 
*dst_root,
continue;
}
 
-   for_each_subsys_which(ss, tssid, &tmp_ss_mask) {
+   do_each_subsys_mask(ss, tssid, tmp_ss_mask) {
if (tssid == ssid)
break;
css_clear_dir(cgroup_css(scgrp, ss), dcgrp);
-   }
+   } while_each_subsys_mask();
return ret;
-   }
+   } while_each_subsys_mask();
 
/*
 * Nothing can fail from this point on.  Remove f

Re: [RFC/RFT][PATCH 1/1] cpufreq: New governor using utilization data from the scheduler

2016-02-22 Thread Rafael J. Wysocki
On Mon, Feb 22, 2016 at 3:16 PM, Juri Lelli  wrote:
> Hi Rafael,

Hi,

> thanks for this RFC. I'm going to test it more in the next few days, but
> I already have some questions from skimming through it. Please find them
> inline below.
>
> On 22/02/16 00:18, Rafael J. Wysocki wrote:
>> From: Rafael J. Wysocki 
>>
>> Add a new cpufreq scaling governor, called "schedutil", that uses
>> scheduler-provided CPU utilization information as input for making
>> its decisions.
>>
>
> I guess the first (macro) question is why did you decide to go with a
> complete new governor, where new here is w.r.t. the sched-freq solution.

Probably the most comprehensive answer to this question is my intro
message: http://marc.info/?l=linux-pm&m=145609673008122&w=2

The executive summary is probably that this was the most
straightforward way to use the scheduler-provided numbers in cpufreq
that I could think about.

> AFAICT, it is true that your solution directly builds on top of the
> latest changes to cpufreq core and governor, but it also seems to have
> more than a few points in common with sched-freq,

That surely isn't a drawback, is it?

If two people come to the same conclusions in different ways, that's
an indication that the conclusions may actually be correct.

> and sched-freq has been discussed and evaluated for already quite some time.

Yes, it has.

Does this mean that no one is allowed to try any alternatives to it now?

> Also, it appears to me that they both shares (or they might encounter in the
> future as development progresses) the same kind of problems, like for
> example the fact that we can't trigger opp changes from scheduler
> context ATM.

"Give them a finger and they will ask for the hand."

If you read my intro message linked above, you'll find a paragraph or
two about that in it.

And the short summary is that I have a plan to actually implement that
feature in the schedutil governor at least for the ACPI cpufreq
driver.  It shouldn't be too difficult to do either AFAICS.  So it is
not "we can't", but rather "we haven't implemented that yet" in this
particular case.

I may not be able to do that in the next few days, as I have other
things to do too, but you may expect to see that done at one point.

So it's not a fundamental issue or anything, it's just that I haven't
done that *yet* at this point, OK?

> Don't get me wrong. I think that looking at different ways to solve a
> problem is always beneficial, since I guess that the goal in the end is
> to come up with something that suits everybody's needs.

Precisely.

> I was only curious about your thoughts on sched-freq. But we can also wait 
> for the
> next RFC from Steve for this macro question. :-)

Right, but I have some thoughts anyway.

My goal, that may be quite different from yours, is to reduce the
cpufreq's overhead as much as I possibly can.  If I have to change the
way it drives the CPU frequency selection to achieve that goal, I will
do that, but if that can stay the way it is, that's fine too.

Some progress has been made already here: we have dealt with the
timers for good now I think.

This patch deals with the overhead associated with the load tracking
carried by "traditional" cpufreq governors and with a couple of
questionable things done by "ondemand" in addition to that (which is
one of the reasons why I didn't want to modify "ondemand" itself for
now).

The next step will be to teach the governor and the ACPI driver to
switch CPU frequencies in the scheduler context, without spawning
extra work items etc.

Finally, the sampling should go away and that's where I want it to be.

I just don't want to run extra code when that's not necessary and I
want things to stay simple when that's as good as it can get.  If
sched-freq can pull that off for me, that's fine, but can it really?

> [...]
>
>> +static void sugov_update_commit(struct policy_dbs_info *policy_dbs, u64 
>> time,
>> + unsigned int next_freq)
>> +{
>> + struct sugov_policy *sg_policy = to_sg_policy(policy_dbs);
>> +
>> + sg_policy->next_freq = next_freq;
>> + policy_dbs->last_sample_time = time;
>> + policy_dbs->work_in_progress = true;
>> + irq_work_queue(&policy_dbs->irq_work);
>
> Here we basically use the system wq to be able to do the freq transition
> in process context. CFS is probably fine with this, but don't you think
> we might get into troubles when, in the future, we will want to service
> RT/DL requests more properly and they will end up being serviced
> together with all the others wq users and at !RT priority?

That may be regarded as a problem, but I'm not sure why you're talking
about it in the context of this particular patch.  That problem has
been there forever in cpufreq: in theory RT tasks may stall frequency
changes indefinitely.

Is the problem real, though?

Suppose that that actually happens and there are RT tasks effectively
stalling frequency updates.  In that case some other important
activ

Re: 4.4-final: 28 bioset threads on small notebook

2016-02-22 Thread Kent Overstreet
On Sun, Feb 21, 2016 at 05:40:59PM +0800, Ming Lei wrote:
> On Sun, Feb 21, 2016 at 2:43 PM, Ming Lin-SSI  wrote:
> >>-Original Message-
> >
> > So it's almost already "per request_queue"
> 
> Yes, that is because of the following line:
> 
> q->bio_split = bioset_create(BIO_POOL_SIZE, 0);
> 
> in blk_alloc_queue_node().
> 
> Looks like this bio_set doesn't need to be per-request_queue, and
> now it is only used for fast-cloning bio for splitting, and one global
> split bio_set should be enough.

It does have to be per request queue for stacking block devices (which includes
loopback).


Re: [PATCH 08/10] x86/xsaves: Fix PTRACE frames for XSAVES

2016-02-22 Thread Andy Lutomirski
On Mon, Feb 22, 2016 at 2:53 PM, Dave Hansen
 wrote:
> On 02/22/2016 02:45 PM, Andy Lutomirski wrote:
>>> +/*
>>> > + * Convert from kernel XSAVES compacted format to standard format and 
>>> > copy
>>> > + * to a ptrace buffer. It supports partial copy but pos always starts 
>>> > from
>>> > + * zero. This is called from xstateregs_get() and there we check the cpu
>>> > + * has XSAVES.
>>> > + */
>>> > +int copyout_from_xsaves(unsigned int pos, unsigned int count, void *kbuf,
>>> > +   void __user *ubuf, const struct xregs_state 
>>> > *xsave)
>> Now that you've written this code, can it be shared with the signal
>> handling code?
>
> It could be.  But the signal handler code has the advantage of already
> having the data in the registers since it's running on its *own* FPU
> state, so it can just call XSAVE(S) directly.
>
> This ptrace code *could* do a kernel_fpu_begin(), XRSTOR the user buffer
> into the registers, XRSTOR the ptracee's system state in to the
> registers, then XSAVES the whole thing to the kernel buffer, then
> kernel_fpu_end().
>
> Or, we could remove the signal handler's ability to XSAVE directly to
> userspace.  But it already *had* that and we know it works.

Some day I kind of want to delete all this xsave/xrstor directly on
user buffers code.   I've never been thrilled with the concept, and it
has messy (although AFAICT not presently buggy [1]) interactions with
context switches, and it can't run with preemption disabled because it
can take page faults.

In the mean time, it's fine as far as I know, but maybe it would be
cleaner if it used the software copy code.  Or maybe we can change it
later if a good reason shows up.

[1] actually there's a minor bug in the 32-bit compat code that
Borislav has a patch for.

-- 
Andy Lutomirski
AMA Capital Management, LLC


[PATCH 1/8] tty: n_gsm: fix formatting errors

2016-02-22 Thread Andrej Krpic
Minor formatting changes to remove errors and reduce the number of
warnings produced by the checkpatch.pl script.

Signed-off-by: Andrej Krpic 
---
 drivers/tty/n_gsm.c | 138 
 1 file changed, 95 insertions(+), 43 deletions(-)

diff --git a/drivers/tty/n_gsm.c b/drivers/tty/n_gsm.c
index c016207..cc3b374 100644
--- a/drivers/tty/n_gsm.c
+++ b/drivers/tty/n_gsm.c
@@ -431,6 +431,7 @@ static int gsm_read_ea(unsigned int *val, u8 c)
 static u8 gsm_encode_modem(const struct gsm_dlci *dlci)
 {
u8 modembits = 0;
+
/* FC is true flow control not modem bits */
if (dlci->throttled)
modembits |= MDM_FC;
@@ -489,7 +490,8 @@ static void gsm_print_packet(const char *hdr, int addr, int 
cr,
if (!(control & 0x01)) {
pr_cont("I N(S)%d N(R)%d",
(control & 0x0E) >> 1, (control & 0xE0) >> 5);
-   } else switch (control & 0x0F) {
+   } else
+   switch (control & 0x0F) {
case RR:
pr_cont("RR(%d)", (control & 0xE0) >> 5);
break;
@@ -511,6 +513,7 @@ static void gsm_print_packet(const char *hdr, int addr, int 
cr,
 
if (dlen) {
int ct = 0;
+
while (dlen--) {
if (ct % 8 == 0) {
pr_cont("\n");
@@ -542,6 +545,7 @@ static void gsm_print_packet(const char *hdr, int addr, int 
cr,
 static int gsm_stuff_frame(const u8 *input, u8 *output, int len)
 {
int olen = 0;
+
while (len--) {
if (*input == GSM1_SOF || *input == GSM1_ESCAPE
|| *input == XON || *input == XOFF) {
@@ -695,7 +699,7 @@ static void gsm_data_kick(struct gsm_mux *gsm)
len += 2;
} else {
gsm->txframe[0] = GSM0_SOF;
-   memcpy(gsm->txframe + 1 , msg->data, msg->len);
+   memcpy(gsm->txframe + 1, msg->data, msg->len);
gsm->txframe[msg->len + 1] = GSM0_SOF;
len = msg->len + 2;
}
@@ -711,7 +715,8 @@ static void gsm_data_kick(struct gsm_mux *gsm)
/* FIXME: Can eliminate one SOF in many more cases */
gsm->tx_bytes -= msg->len;
/* For a burst of frames skip the extra SOF within the
-  burst */
+* burst
+*/
skip_sof = 1;
 
list_del(&msg->list);
@@ -750,7 +755,7 @@ static void __gsm_data_queue(struct gsm_dlci *dlci, struct 
gsm_msg *msg)
*--dp = (msg->addr << 2) | 2 | EA;
else
*--dp = (msg->addr << 2) | EA;
-   *fcs = gsm_fcs_add_block(INIT_FCS, dp , msg->data - dp);
+   *fcs = gsm_fcs_add_block(INIT_FCS, dp, msg->data - dp);
/* Ugly protocol layering violation */
if (msg->ctrl == UI || msg->ctrl == (UI|PF))
*fcs = gsm_fcs_add_block(*fcs, msg->data, msg->len);
@@ -760,7 +765,8 @@ static void __gsm_data_queue(struct gsm_dlci *dlci, struct 
gsm_msg *msg)
msg->data, msg->len);
 
/* Move the header back and adjust the length, also allow for the FCS
-  now tacked on the end */
+* now tacked on the end
+*/
msg->len += (msg->data - dp) + 1;
msg->data = dp;
 
@@ -783,6 +789,7 @@ static void __gsm_data_queue(struct gsm_dlci *dlci, struct 
gsm_msg *msg)
 static void gsm_data_queue(struct gsm_dlci *dlci, struct gsm_msg *msg)
 {
unsigned long flags;
+
spin_lock_irqsave(&dlci->gsm->tx_lock, flags);
__gsm_data_queue(dlci, msg);
spin_unlock_irqrestore(&dlci->gsm->tx_lock, flags);
@@ -821,7 +828,8 @@ static int gsm_dlci_data_output(struct gsm_mux *gsm, struct 
gsm_dlci *dlci)
 
msg = gsm_data_alloc(gsm, dlci->addr, size, gsm->ftype);
/* FIXME: need a timer or something to kick this so it can't
-  get stuck with no work outstanding and no buffer free */
+* get stuck with no work outstanding and no buffer free
+*/
if (msg == NULL)
return -ENOMEM;
dp = msg->data;
@@ -829,11 +837,13 @@ static int gsm_dlci_data_output(struct gsm_mux *gsm, 
struct gsm_dlci *dlci)
case 1: /* Unstructured */
break;
case 2: /* Unstructed with modem bits.
-   Always one byte as we never send inline break data */
+* Always one byte as we never send inline break data
+*/
*dp++ = gsm_encode_modem(dlci);
break;
}
-   WARN_ON(kfifo_out_locked(dlci->fifo, dp , len, &dlci->lock) != 

[PATCH 3/8] tty: n_gsm: make mux work as a responder station

2016-02-22 Thread Andrej Krpic
The comment suggests that cr == 1 represents a received command and
cr == 0 a received response. Received frames are then filtered:
 - correctly, by rejecting SABM and DISC responses (these are
   command-only frame types), and
 - incorrectly, by rejecting UA (a response-only frame type) responses.

The mux as an initiator successfully establishes a DLC by receiving a UA
response frame to a previously sent open channel command (SABM). The
incorrect equation (eqA) happens to make the UA "reject cr == 0
commands" case work, but filters out all received SABM and DISC command
frames.

Change eqA to eqB to match the intent and fix the filtering of UA
frames. This enables reception of SABM and DISC frames and consequently
makes the mux work as a responder station.

received  receiving as     eqA            eqB       3GPP TS 27.010
 CR bit   initiator (ir)  cr=CR^(1-ir)   cr=CR^ir   5.2.1.2
   0           0              1              0      0 (response)
   1           0              0              1      1 (command)
   0           1              0              1      1 (command)
   1           1              1              0      0 (response)

Signed-off-by: Andrej Krpic 
---
 drivers/tty/n_gsm.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/tty/n_gsm.c b/drivers/tty/n_gsm.c
index a0fb92c..05b562d 100644
--- a/drivers/tty/n_gsm.c
+++ b/drivers/tty/n_gsm.c
@@ -1798,7 +1798,7 @@ static void gsm_queue(struct gsm_mux *gsm)
 
gsm_print_packet("<--", address, cr, gsm->control, gsm->buf, gsm->len);
 
-   cr ^= 1 - gsm->initiator;   /* Flip so 1 always means command */
+   cr ^= gsm->initiator;   /* Flip so 1 always means command */
dlci = gsm->dlci[address];
 
switch (gsm->control) {
@@ -1829,7 +1829,7 @@ static void gsm_queue(struct gsm_mux *gsm)
break;
case UA:
case UA|PF:
-   if (cr == 0 || dlci == NULL)
+   if (cr || dlci == NULL)
break;
switch (dlci->state) {
case DLCI_CLOSING:
-- 
2.7.0



[PATCH 2/8] tty: n_gsm: fix C/R bit when sending as a responder

2016-02-22 Thread Andrej Krpic
According to the specification (3GPP TS 27.010 v12.0.0 R1, 5.2.1.2),
C/R bit must be the same for corresponding command and response.
If mux is an initiator (station that initiates DLC 0), valid sent
commands and received responses must have C/R bit set to 1.
For a station that is a responder, valid sent commands and received
responses must have C/R bit set to 0.

Change the value of C/R bit in command and response frames to
depend on whether the station is initator or not.

Signed-off-by: Andrej Krpic 
---
 drivers/tty/n_gsm.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/tty/n_gsm.c b/drivers/tty/n_gsm.c
index cc3b374..a0fb92c 100644
--- a/drivers/tty/n_gsm.c
+++ b/drivers/tty/n_gsm.c
@@ -622,7 +622,7 @@ static void gsm_send(struct gsm_mux *gsm, int addr, int cr, 
int control)
 
 static inline void gsm_response(struct gsm_mux *gsm, int addr, int control)
 {
-   gsm_send(gsm, addr, 0, control);
+   gsm_send(gsm, addr, gsm->initiator ? 0 : 1, control);
 }
 
 /**
@@ -636,7 +636,7 @@ static inline void gsm_response(struct gsm_mux *gsm, int 
addr, int control)
 
 static inline void gsm_command(struct gsm_mux *gsm, int addr, int control)
 {
-   gsm_send(gsm, addr, 1, control);
+   gsm_send(gsm, addr, gsm->initiator ? 1 : 0, control);
 }
 
 /* Data transmission */
-- 
2.7.0



[PATCH 7/8] tty: n_gsm: properly format Modem Status Command message

2016-02-22 Thread Andrej Krpic
Change the format of the transmitted Modem Status Command (MSC) message
to the one expected in the receive function gsm_control_modem and
specified in 3GPP TS 27.010 version 12.0.0 Release 12, 5.4.6.3.7.

A wrongly formatted MSC causes the DLC to be marked as constipated. The
bug only appears after the format of transmitted control messages is
fixed (previous patches) and control messages start to be recognized.
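
For reference, the value octets gsm_control_modem expects are (per the
spec section above; a simplified sketch mirroring the hunk below):

	/* MSC value field, after the type/length bytes prepended by
	 * gsm_control_transmit():
	 *   [0] DLCI address:   dlci->addr << 2 | 3  (EA, 1)
	 *   [1] modem signals:  gsm_encode_modem(dlci) << 1 | EA
	 *   [2] optional break: brk << 4 | 2 | EA
	 */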

Signed-off-by: Andrej Krpic 
---
 drivers/tty/n_gsm.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/tty/n_gsm.c b/drivers/tty/n_gsm.c
index 8aa90e0..b0d9edd 100644
--- a/drivers/tty/n_gsm.c
+++ b/drivers/tty/n_gsm.c
@@ -2874,12 +2874,11 @@ static int gsmtty_modem_update(struct gsm_dlci *dlci, 
u8 brk)
if (brk)
len++;
 
-   modembits[0] = len << 1 | EA;   /* Data bytes */
-   modembits[1] = dlci->addr << 2 | 3; /* DLCI, EA, 1 */
-   modembits[2] = gsm_encode_modem(dlci) << 1 | EA;
+   modembits[0] = dlci->addr << 2 | 3; /* DLCI, EA, 1 */
+   modembits[1] = gsm_encode_modem(dlci) << 1 | EA;
if (brk)
-   modembits[3] = brk << 4 | 2 | EA;   /* Valid, EA */
-   ctrl = gsm_control_send(dlci->gsm, CMD_MSC, modembits, len + 1);
+   modembits[2] = brk << 4 | 2 | EA;   /* Valid, EA */
+   ctrl = gsm_control_send(dlci->gsm, CMD_MSC, modembits, len);
if (ctrl == NULL)
return -ENOMEM;
return gsm_control_wait(dlci->gsm, ctrl);
-- 
2.7.0



[PATCH 6/8] tty: n_gsm: add missing length field in control channel commands

2016-02-22 Thread Andrej Krpic
Observing debug output while running an initiator and a responder using
n_gsm shows that all control channel commands sent by the initiator are
malformed - they don't include the length field (3GPP TS 07.10 ver
7.2.0, 5.4.6.1).

Add the length field to transmitted control channel commands in
gsm_control_transmit(), as is done in gsm_control_reply and expected in
gsm_dlci_command.
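
For reference, the control channel message layout being restored here
is (27.010, 5.4.6.1; the type and length octets are EA-coded):

	+--------+--------+------------------+
	|  type  | length |   value octets   |
	+--------+--------+------------------+
	 1 byte    1 byte    'length' bytes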

Signed-off-by: Andrej Krpic 
---
 drivers/tty/n_gsm.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/tty/n_gsm.c b/drivers/tty/n_gsm.c
index 3c4c521..8aa90e0 100644
--- a/drivers/tty/n_gsm.c
+++ b/drivers/tty/n_gsm.c
@@ -1320,12 +1320,13 @@ static void gsm_control_response(struct gsm_mux *gsm, 
unsigned int command,
 
 static void gsm_control_transmit(struct gsm_mux *gsm, struct gsm_control *ctrl)
 {
-   struct gsm_msg *msg = gsm_data_alloc(gsm, 0, ctrl->len + 1, gsm->ftype);
+   struct gsm_msg *msg = gsm_data_alloc(gsm, 0, ctrl->len + 2, gsm->ftype);
 
if (msg == NULL)
return;
-   msg->data[0] = (ctrl->cmd << 1) | 2 | EA;   /* command */
-   memcpy(msg->data + 1, ctrl->data, ctrl->len);
+   msg->data[0] = (ctrl->cmd << 1) | 2 | EA;   /* command */
+   msg->data[1] = ((ctrl->len) << 1) | EA;
+   memcpy(msg->data + 2, ctrl->data, ctrl->len);
gsm_data_queue(gsm->dlci[0], msg);
 }
 
-- 
2.7.0



[PATCH 8/8] tty: n_gsm: Enable reception of frames separated with a single SOF marker

2016-02-22 Thread Andrej Krpic
Frames may be separated with a single SOF (5.2.6.1 of 27.010 mux spec).
While transmission of a single SOF between frames is implemented in
gsm_data_kick, the reception isn't.

As a side effect, it is now possible to receive and ignore a stream of
consecutive SOFs (5.2.5 of 27.010 mux spec).
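
In other words (a sketch of the framings now accepted):

	... SOF <frame 1> SOF <frame 2> SOF ...       shared single SOF
	... SOF <frame 1> SOF SOF <frame 2> SOF ...   classic framing
	... SOF SOF SOF <frame 1> SOF ...             extra SOFs ignored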

Signed-off-by: Andrej Krpic 
---
 drivers/tty/n_gsm.c | 18 ++
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/drivers/tty/n_gsm.c b/drivers/tty/n_gsm.c
index b0d9edd..12b149d 100644
--- a/drivers/tty/n_gsm.c
+++ b/drivers/tty/n_gsm.c
@@ -1895,9 +1895,14 @@ static void gsm0_receive(struct gsm_mux *gsm, unsigned 
char c)
}
break;
case GSM_ADDRESS:   /* Address EA */
-   gsm->fcs = gsm_fcs_add(gsm->fcs, c);
+   /* Ignore (not first) GSM0_SOF as it decodes into
+* reserved DLCI 62 (5.6 of the 27.010 mux spec).
+*/
+   if (c == GSM0_SOF)
+   break;
if (gsm_read_ea(&gsm->address, c))
gsm->state = GSM_CONTROL;
+   gsm->fcs = gsm_fcs_add(gsm->fcs, c);
break;
case GSM_CONTROL:   /* Control Byte */
gsm->fcs = gsm_fcs_add(gsm->fcs, c);
@@ -1944,13 +1949,10 @@ static void gsm0_receive(struct gsm_mux *gsm, unsigned 
char c)
case GSM_FCS:   /* FCS follows the packet */
gsm->received_fcs = c;
gsm_queue(gsm);
-   gsm->state = GSM_SSOF;
-   break;
-   case GSM_SSOF:
-   if (c == GSM0_SOF) {
-   gsm->state = GSM_SEARCH;
-   break;
-   }
+   /* Frames can be separated with a single GSM0_SOF.
+* Deal with consecutive GSM0_SOFs in GSM_ADDRESS.
+*/
+   gsm->state = GSM_SEARCH;
break;
}
 }
-- 
2.7.0



[PATCH 0/8] tty: n_gsm: Make mux work as a responder station

2016-02-22 Thread Andrej Krpic
When using n_gsm you have to explicitly set it to work as an initiator
station. This led me to believe that it can also work as a responder.

In a use case where I needed a responder-station mux I came to the
conclusion that it actually does not work. The second and third patches
fix the handling of the frame C/R bit and thereby enable the mux to
work as a responder station.

The next patches in the series then fix bugs that were found once two
instances of n_gsm were connected over a null-modem cable.

First patch fixes formatting errors, such as space before comma, and
most of the warnings reported by the checkpatch script.


Andrej Krpic (8):
  tty: n_gsm: fix formatting errors
  tty: n_gsm: fix C/R bit when sending as a responder
  tty: n_gsm: make mux work as a responder station
  tty: n_gsm: send DM response when accessing an invalid channel
  tty: n_gsm: replace dead code with a meaningful comment
  tty: n_gsm: add missing length field in control channel commands
  tty: n_gsm: properly format Modem Status Command message
  tty: n_gsm: Enable reception of frames separated with a single SOF
marker

 drivers/tty/n_gsm.c | 191 +---
 1 file changed, 123 insertions(+), 68 deletions(-)

-- 
2.7.0



[PATCH 5/8] tty: n_gsm: replace dead code with a meaningful comment

2016-02-22 Thread Andrej Krpic
A UI/UIH frame can be received as a command from the other station or
as a response to a command we issued earlier.

Add this knowledge to the source code as a comment and remove the
useless #if 0/#endif block.

Signed-off-by: Andrej Krpic 
---
 drivers/tty/n_gsm.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/tty/n_gsm.c b/drivers/tty/n_gsm.c
index 8551fa4..3c4c521 100644
--- a/drivers/tty/n_gsm.c
+++ b/drivers/tty/n_gsm.c
@@ -1852,10 +1852,11 @@ static void gsm_queue(struct gsm_mux *gsm)
case UI|PF:
case UIH:
case UIH|PF:
-#if 0
-   if (cr)
-   goto invalid;
-#endif
+   /* Don't reject frames based on cr value as UI/UIH
+* frame can be received as a command from other
+* station or as a response to a command we issued
+* earlier.
+*/
if (dlci == NULL || dlci->state != DLCI_OPEN) {
gsm_response(gsm, address, DM|PF);
return;
-- 
2.7.0



[PATCH 4/8] tty: n_gsm: send DM response when accessing an invalid channel

2016-02-22 Thread Andrej Krpic
Change the C/R bit in the response to a UI/UIH frame sent to a
non-existing/closed channel. As the DM frame type is only valid as a
response, it should be sent using the gsm_response function.

Signed-off-by: Andrej Krpic 
---
 drivers/tty/n_gsm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/tty/n_gsm.c b/drivers/tty/n_gsm.c
index 05b562d..8551fa4 100644
--- a/drivers/tty/n_gsm.c
+++ b/drivers/tty/n_gsm.c
@@ -1857,7 +1857,7 @@ static void gsm_queue(struct gsm_mux *gsm)
goto invalid;
 #endif
if (dlci == NULL || dlci->state != DLCI_OPEN) {
-   gsm_command(gsm, address, DM|PF);
+   gsm_response(gsm, address, DM|PF);
return;
}
dlci->data(dlci, gsm->buf, gsm->len);
-- 
2.7.0



Re: [PATCH 08/10] x86/xsaves: Fix PTRACE frames for XSAVES

2016-02-22 Thread Dave Hansen
On 02/22/2016 02:45 PM, Andy Lutomirski wrote:
>> +/*
>> > + * Convert from kernel XSAVES compacted format to standard format and copy
>> > + * to a ptrace buffer. It supports partial copy but pos always starts from
>> > + * zero. This is called from xstateregs_get() and there we check the cpu
>> > + * has XSAVES.
>> > + */
>> > +int copyout_from_xsaves(unsigned int pos, unsigned int count, void *kbuf,
>> > +   void __user *ubuf, const struct xregs_state *xsave)
> Now that you've written this code, can it be shared with the signal
> handling code?

It could be.  But the signal handler code has the advantage of already
having the data in the registers since it's running on its *own* FPU
state, so it can just call XSAVE(S) directly.

This ptrace code *could* do a kernel_fpu_begin(), XRSTOR the user buffer
into the registers, XRSTOR the ptracee's system state in to the
registers, then XSAVES the whole thing to the kernel buffer, then
kernel_fpu_end().
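
In pseudo-code, that round trip would look something like this (a
sketch only; the helper names are illustrative, not the current API):

	kernel_fpu_begin();
	/* 1. XRSTOR the standard-format user buffer into the registers */
	xrstor_from_user_buffer(ubuf);
	/* 2. XRSTOR the ptracee's supervisor/system state on top */
	xrstor_system_state(&target->thread.fpu.state.xsave);
	/* 3. XSAVES everything back out in compacted format */
	xsaves_to_kernel_buffer(&target->thread.fpu.state.xsave);
	kernel_fpu_end();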

Or, we could remove the signal handler's ability to XSAVE directly to
userspace.  But it already *had* that and we know it works.


Re: [PATCH 08/10] x86/xsaves: Fix PTRACE frames for XSAVES

2016-02-22 Thread Yu-cheng Yu
On Mon, Feb 22, 2016 at 02:45:54PM -0800, Andy Lutomirski wrote:
> > +/*
> > + * Convert from kernel XSAVES compacted format to standard format and copy
> > + * to a ptrace buffer. It supports partial copy but pos always starts from
> > + * zero. This is called from xstateregs_get() and there we check the cpu
> > + * has XSAVES.
> > + */
> > +int copyout_from_xsaves(unsigned int pos, unsigned int count, void *kbuf,
> > +   void __user *ubuf, const struct xregs_state *xsave)
> 
> Now that you've written this code, can it be shared with the signal
> handling code?
> 

For signal handling, we most likely save registers directly to memory.
But for ptrace, the thread being debugged is not the active thread.
Please let me think about it more.

--Yu-cheng




Re: [PATCH 08/10] x86/xsaves: Fix PTRACE frames for XSAVES

2016-02-22 Thread Andy Lutomirski
On Mon, Feb 22, 2016 at 11:00 AM, Yu-cheng Yu  wrote:
> XSAVES uses compacted format and is a kernel instruction. The kernel
> should use standard-format, non-supervisor state data for PTRACE.
>

> +/*
> + * Convert from kernel XSAVES compacted format to standard format and copy
> + * to a ptrace buffer. It supports partial copy but pos always starts from
> + * zero. This is called from xstateregs_get() and there we check the cpu
> + * has XSAVES.
> + */
> +int copyout_from_xsaves(unsigned int pos, unsigned int count, void *kbuf,
> +   void __user *ubuf, const struct xregs_state *xsave)

Now that you've written this code, can it be shared with the signal
handling code?

--Andy


Re: [PATCH 04/10] x86/xsaves: Introduce a new check that allows correct xstates copy from kernel to user directly

2016-02-22 Thread Andy Lutomirski
On Mon, Feb 22, 2016 at 10:58 AM, Yu-cheng Yu  wrote:
> XSAVES is a kernel instruction and uses a compacted format. When
> working with user space, the kernel should provide standard-format,
> non-supervisor state data. We cannot do __copy_to_user() from a compacted-
> format kernel xstate area to a signal frame.
>
> Note that the path to copy_fpstate_to_sigframe() does currently check if
> the thread has used FPU, but add a WARN_ONCE() there to detect any
> potential mis-use.
>
> Dave Hansen proposes this method to simplify copy xstate directly to user.
>
> Signed-off-by: Fenghua Yu 
> Signed-off by: Yu-cheng Yu 
> ---
>  arch/x86/kernel/fpu/signal.c | 41 -
>  1 file changed, 40 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
> index 0fbf60c..7676718 100644
> --- a/arch/x86/kernel/fpu/signal.c
> +++ b/arch/x86/kernel/fpu/signal.c
> @@ -130,6 +130,45 @@ static inline int copy_fpregs_to_sigframe(struct 
> xregs_state __user *buf)
> return err;
>  }
>
> +static int should_save_registers_directly(void)

I don't like the name of this function because:

> +{
> +   /*
> +* In signal handling path, the kernel already checks if
> +* FPU instructions have been used before it calls
> +* copy_fpstate_to_sigframe(). We check this here again
> +* to detect any potential mis-use and saving invalid
> +* register values directly to a signal frame.
> +*/
> +   WARN_ONCE(!current->thread.fpu.fpstate_active,
> + "direct FPU save with no math use\n");

... Here "direct" seems to mean that we're asking whether to directly save ...

> +
> +   /*
> +* In the case that we are using a compacted kernel
> +* xsave area, we can not copy the thread.fpu.state
> +* directly to userspace and *must* save it from the
> +* registers directly.
> +*/

... and here "directly" means *both* copying directly to userspace and
saving using xsave directly.

So can you rename it to something with an obvious meaning like
"may_memcpy_fpu_regs" or similar?

--Andy


Re: [PATCH 9/9] ARM: mmp: mark usb_dma_mask as __maybe_unused

2016-02-22 Thread Robert Jarzmik
Arnd Bergmann  writes:

> This variable may be used by some devices that each have their
> on Kconfig symbol, or by none of them, and that causes a build
> warning:
>
> arch/arm/mach-mmp/devices.c:241:12: error: 'usb_dma_mask' defined but not 
> used [-Werror=unused-variable]
>
> Marking it __maybe_unused avoids the warning.
>
> Signed-off-by: Arnd Bergmann 
> ---
>  arch/arm/mach-mmp/devices.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Hi Haojian,

I still didn't get an ack on this one, do you want to carry it through your
tree, or let Arnd carry it or let me carry it ?

Cheers.

--
Robert

[1] The patch
> diff --git a/arch/arm/mach-mmp/devices.c b/arch/arm/mach-mmp/devices.c
> index 3330ac7cfbef..671c7a09ab3d 100644
> --- a/arch/arm/mach-mmp/devices.c
> +++ b/arch/arm/mach-mmp/devices.c
> @@ -238,7 +238,7 @@ void pxa_usb_phy_deinit(void __iomem *phy_reg)
>  #endif
>  
>  #if IS_ENABLED(CONFIG_USB_SUPPORT)
> -static u64 usb_dma_mask = ~(u32)0;
> +static u64 __maybe_unused usb_dma_mask = ~(u32)0;
>  
>  #if IS_ENABLED(CONFIG_USB_MV_UDC)
>  struct resource pxa168_u2o_resources[] = {

-- 
Robert


[PATCH 01/24] staging: lustre: Dynamic LNet Configuration (DLC) IOCTL changes

2016-02-22 Thread James Simmons
From: Amir Shehata 

This is the fourth patch of a set of patches that enables DLC.

This patch changes the IOCTL infrastructure in preparation of
adding extra IOCTL communication between user and kernel space.
The changes include:
- adding a common header to be passed to the ioctl infra functions
  instead of passing an exact structure.  This header is meant
  to be included in all structures to be passed through that
  interface.  The IOCTL handler casts this header to the particular
  type that it expects (see the sketch below this list)
- All sanity testing on the passed-in structure is performed in the
  generic ioctl infrastructure code.
- All ioctl handlers changed to take the header instead of a
  particular structure type
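
A minimal sketch of the resulting handler pattern ("example_handle_ioctl"
is illustrative; the field names are from this patch):

	static int example_handle_ioctl(unsigned int cmd,
					struct libcfs_ioctl_hdr *hdr)
	{
		struct libcfs_ioctl_data *data;

		if (hdr->ioc_len < sizeof(*data))
			return -EINVAL;

		/* ioc_hdr is the first member, so this cast is valid */
		data = container_of(hdr, struct libcfs_ioctl_data, ioc_hdr);

		/* ... operate on data ... */
		return 0;
	}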

Signed-off-by: Amir Shehata 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-2456
Reviewed-on: http://review.whamcloud.com/8021
Reviewed-by: Doug Oucharek 
Reviewed-by: James Simmons 
Reviewed-by: John L. Hammond 
Reviewed-by: Oleg Drokin 
---
 .../lustre/include/linux/libcfs/libcfs_ioctl.h |   23 
 drivers/staging/lustre/lnet/lnet/module.c  |4 +-
 drivers/staging/lustre/lnet/selftest/conctl.c  |9 +++-
 drivers/staging/lustre/lnet/selftest/console.c |2 +-
 drivers/staging/lustre/lnet/selftest/console.h |1 -
 .../lustre/lustre/libcfs/linux/linux-module.c  |   54 
 drivers/staging/lustre/lustre/libcfs/module.c  |   51 +-
 7 files changed, 80 insertions(+), 64 deletions(-)

diff --git a/drivers/staging/lustre/include/linux/libcfs/libcfs_ioctl.h 
b/drivers/staging/lustre/include/linux/libcfs/libcfs_ioctl.h
index e4463ad..0598702 100644
--- a/drivers/staging/lustre/include/linux/libcfs/libcfs_ioctl.h
+++ b/drivers/staging/lustre/include/linux/libcfs/libcfs_ioctl.h
@@ -43,9 +43,13 @@
 
 #define LIBCFS_IOCTL_VERSION 0x0001000a
 
-struct libcfs_ioctl_data {
+struct libcfs_ioctl_hdr {
__u32 ioc_len;
__u32 ioc_version;
+};
+
+struct libcfs_ioctl_data {
+   struct libcfs_ioctl_hdr ioc_hdr;
 
__u64 ioc_nid;
__u64 ioc_u64[1];
@@ -70,11 +74,6 @@ struct libcfs_ioctl_data {
 
 #define ioc_priority ioc_u32[0]
 
-struct libcfs_ioctl_hdr {
-   __u32 ioc_len;
-   __u32 ioc_version;
-};
-
 struct libcfs_debug_ioctl_data {
struct libcfs_ioctl_hdr hdr;
unsigned int subs;
@@ -90,7 +89,7 @@ do {  \
 
 struct libcfs_ioctl_handler {
struct list_head item;
-   int (*handle_ioctl)(unsigned int cmd, struct libcfs_ioctl_data *data);
+   int (*handle_ioctl)(unsigned int cmd, struct libcfs_ioctl_hdr *hdr);
 };
 
 #define DECLARE_IOCTL_HANDLER(ident, func)   \
@@ -148,9 +147,9 @@ static inline int libcfs_ioctl_packlen(struct 
libcfs_ioctl_data *data)
return len;
 }
 
-static inline int libcfs_ioctl_is_invalid(struct libcfs_ioctl_data *data)
+static inline bool libcfs_ioctl_is_invalid(struct libcfs_ioctl_data *data)
 {
-   if (data->ioc_len > (1<<30)) {
+   if (data->ioc_hdr.ioc_len > (1 << 30)) {
CERROR("LIBCFS ioctl: ioc_len larger than 1<<30\n");
return 1;
}
@@ -186,7 +185,7 @@ static inline int libcfs_ioctl_is_invalid(struct 
libcfs_ioctl_data *data)
CERROR("LIBCFS ioctl: plen2 nonzero but no pbuf2 pointer\n");
return 1;
}
-   if ((__u32)libcfs_ioctl_packlen(data) != data->ioc_len) {
+   if ((__u32)libcfs_ioctl_packlen(data) != data->ioc_hdr.ioc_len) {
CERROR("LIBCFS ioctl: packlen != ioc_len\n");
return 1;
}
@@ -206,7 +205,9 @@ static inline int libcfs_ioctl_is_invalid(struct 
libcfs_ioctl_data *data)
 
 int libcfs_register_ioctl(struct libcfs_ioctl_handler *hand);
 int libcfs_deregister_ioctl(struct libcfs_ioctl_handler *hand);
-int libcfs_ioctl_getdata(char *buf, char *end, void __user *arg);
+int libcfs_ioctl_getdata_len(const struct libcfs_ioctl_hdr __user *arg,
+__u32 *buf_len);
 int libcfs_ioctl_popdata(void __user *arg, void *buf, int size);
+int libcfs_ioctl_data_adjust(struct libcfs_ioctl_data *data);
 
 #endif /* __LIBCFS_IOCTL_H__ */
diff --git a/drivers/staging/lustre/lnet/lnet/module.c 
b/drivers/staging/lustre/lnet/lnet/module.c
index cd37303..46f5241 100644
--- a/drivers/staging/lustre/lnet/lnet/module.c
+++ b/drivers/staging/lustre/lnet/lnet/module.c
@@ -84,7 +84,7 @@ lnet_unconfigure(void)
 }
 
 static int
-lnet_ioctl(unsigned int cmd, struct libcfs_ioctl_data *data)
+lnet_ioctl(unsigned int cmd, struct libcfs_ioctl_hdr *hdr)
 {
int rc;
 
@@ -103,7 +103,7 @@ lnet_ioctl(unsigned int cmd, struct libcfs_ioctl_data *data)
 */
rc = LNetNIInit(LNET_PID_ANY);
if (rc >= 0) {
-   rc = LNetCtl(cmd, data);
+   rc = LNetCtl(cmd, hdr);
LNetNIFini();
}
return rc;
diff --git a/drivers/staging/lustre/l

[PATCH 03/24] staging: lustre: fix crash due to NULL networks string

2016-02-22 Thread James Simmons
From: Amir Shehata 

If there is an invalid networks or ip2nets setting,
lnet_parse_networks() gets called with a NULL 'networks' string
parameter.

lnet_parse_networks() needs to sanitize its input string now that
it's being called from multiple places.  Check for a NULL string
every time the function is called, which reduces the probability of
errors as other code is modified.

Signed-off-by: Amir Shehata 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5540
Reviewed-on: http://review.whamcloud.com/11626
Reviewed-by: Isaac Huang 
Reviewed-by: Doug Oucharek 
Reviewed-by: Oleg Drokin 
---
 drivers/staging/lustre/lnet/lnet/api-ni.c |5 +
 drivers/staging/lustre/lnet/lnet/config.c |9 -
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c 
b/drivers/staging/lustre/lnet/lnet/api-ni.c
index b2b914a..c68d01e 100644
--- a/drivers/staging/lustre/lnet/lnet/api-ni.c
+++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
@@ -1535,7 +1535,6 @@ LNetNIInit(lnet_pid_t requested_pid)
lnet_ping_info_t *pinfo;
lnet_handle_md_t md_handle;
struct list_head net_head;
-   char *nets;
 
INIT_LIST_HEAD(&net_head);
 
@@ -1550,13 +1549,11 @@ LNetNIInit(lnet_pid_t requested_pid)
return rc;
}
 
-   nets = lnet_get_networks();
-
rc = lnet_prepare(requested_pid);
if (rc)
goto failed0;
 
-   rc = lnet_parse_networks(&net_head, nets);
+   rc = lnet_parse_networks(&net_head, lnet_get_networks());
if (rc < 0)
goto failed1;
 
diff --git a/drivers/staging/lustre/lnet/lnet/config.c 
b/drivers/staging/lustre/lnet/lnet/config.c
index 1ef07cd..013d41b 100644
--- a/drivers/staging/lustre/lnet/lnet/config.c
+++ b/drivers/staging/lustre/lnet/lnet/config.c
@@ -184,7 +184,7 @@ int
 lnet_parse_networks(struct list_head *nilist, char *networks)
 {
struct cfs_expr_list *el = NULL;
-   int tokensize = strlen(networks) + 1;
+   int tokensize;
char *tokens;
char *str;
char *tmp;
@@ -192,6 +192,11 @@ lnet_parse_networks(struct list_head *nilist, char 
*networks)
__u32 net;
int nnets = 0;
 
+   if (!networks) {
+   CERROR("networks string is undefined\n");
+   return -EINVAL;
+   }
+
if (strlen(networks) > LNET_SINGLE_TEXTBUF_NOB) {
/* _WAY_ conservative */
LCONSOLE_ERROR_MSG(0x112,
@@ -199,6 +204,8 @@ lnet_parse_networks(struct list_head *nilist, char 
*networks)
return -EINVAL;
}
 
+   tokensize = strlen(networks) + 1;
+
LIBCFS_ALLOC(tokens, tokensize);
if (!tokens) {
CERROR("Can't allocate net tokens\n");
-- 
1.7.1



[PATCH 04/24] staging: lustre: DLC user/kernel space glue code

2016-02-22 Thread James Simmons
From: Amir Shehata 

This is the sixth patch of a set of patches that enables DLC.

This patch enables user space to call into the kernel-space DLC
code.  Handlers were added to the LNetCtl function to call the new
functions added for Dynamic LNet Configuration.
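
A sketch of the user-space side this enables (hedged: cfg_hdr,
ioc_len and IOC_LIBCFS_DEL_ROUTE appear elsewhere in this series,
but the device node, the header include and the omitted version
handling are assumptions, not verified against lib-dlc.h):

#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/types.h>
#include "libcfs_ioctl.h"       /* assumed: brings in the DLC structs */

/* hypothetical helper: delete a route through the new handlers */
static int dlc_del_route(__u32 net, __u64 gw_nid)
{
        struct lnet_ioctl_config_data config;
        int fd, rc;

        fd = open("/dev/lnet", O_RDWR);         /* assumed device node */
        if (fd < 0)
                return -1;

        memset(&config, 0, sizeof(config));
        config.cfg_hdr.ioc_len = sizeof(config); /* validated kernel-side */
        config.cfg_net = net;
        config.cfg_nid = gw_nid;

        rc = ioctl(fd, IOC_LIBCFS_DEL_ROUTE, &config);
        close(fd);
        return rc;
}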

Signed-off-by: Amir Shehata 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-2456
Reviewed-on: http://review.whamcloud.com/8023
Reviewed-by: James Simmons 
Reviewed-by: Doug Oucharek 
Reviewed-by: John L. Hammond 
Reviewed-by: Oleg Drokin 
---
 .../staging/lustre/include/linux/lnet/lib-lnet.h   |   22 ++-
 .../staging/lustre/include/linux/lnet/lib-types.h  |8 +
 drivers/staging/lustre/lnet/lnet/api-ni.c  |  188 ++--
 drivers/staging/lustre/lnet/lnet/module.c  |   64 +++-
 drivers/staging/lustre/lnet/lnet/peer.c|   30 ++--
 drivers/staging/lustre/lnet/lnet/router.c  |   67 ++--
 6 files changed, 329 insertions(+), 50 deletions(-)

diff --git a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h 
b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
index 1157819..5e16fe0 100644
--- a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
+++ b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
@@ -39,6 +39,7 @@
 #include "api.h"
 #include "lnet.h"
 #include "lib-types.h"
+#include "lib-dlc.h"
 
 extern lnet_t  the_lnet;   /* THE network */
 
@@ -458,6 +459,12 @@ int lnet_del_route(__u32 net, lnet_nid_t gw_nid);
 void lnet_destroy_routes(void);
 int lnet_get_route(int idx, __u32 *net, __u32 *hops,
   lnet_nid_t *gateway, __u32 *alive, __u32 *priority);
+int lnet_get_net_config(int idx, __u32 *cpt_count, __u64 *nid,
+   int *peer_timeout, int *peer_tx_credits,
+   int *peer_rtr_cr, int *max_tx_credits,
+   struct lnet_ioctl_net_config *net_config);
+int lnet_get_rtr_pool_cfg(int idx, struct lnet_ioctl_pool_cfg *pool_cfg);
+
 void lnet_router_debugfs_init(void);
 void lnet_router_debugfs_fini(void);
 int  lnet_rtrpools_alloc(int im_a_router);
@@ -467,6 +474,10 @@ int lnet_rtrpools_enable(void);
 void lnet_rtrpools_disable(void);
 void lnet_rtrpools_free(int keep_pools);
 lnet_remotenet_t *lnet_find_net_locked(__u32 net);
+int lnet_dyn_add_ni(lnet_pid_t requested_pid, char *nets,
+   __s32 peer_timeout, __s32 peer_cr, __s32 peer_buf_cr,
+   __s32 credits);
+int lnet_dyn_del_ni(__u32 net);
 
 int lnet_islocalnid(lnet_nid_t nid);
 int lnet_islocalnet(__u32 net);
@@ -693,11 +704,12 @@ void lnet_peer_tables_cleanup(lnet_ni_t *ni);
 void lnet_peer_tables_destroy(void);
 int lnet_peer_tables_create(void);
 void lnet_debug_peer(lnet_nid_t nid);
-int lnet_get_peers(int count, __u64 *nid, char *alivness,
-  int *ncpt, int *refcount,
-  int *ni_peer_tx_credits, int *peer_tx_credits,
-  int *peer_rtr_credits, int *peer_min_rtr_credtis,
-  int *peer_tx_qnob);
+int lnet_get_peer_info(__u32 peer_index, __u64 *nid,
+  char alivness[LNET_MAX_STR_LEN],
+  __u32 *cpt_iter, __u32 *refcount,
+  __u32 *ni_peer_tx_credits, __u32 *peer_tx_credits,
+  __u32 *peer_rtr_credits, __u32 *peer_min_rtr_credtis,
+  __u32 *peer_tx_qnob);
 
 static inline void
 lnet_peer_set_alive(lnet_peer_t *lp)
diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h 
b/drivers/staging/lustre/include/linux/lnet/lib-types.h
index b0ba9d8..e4a8f6e 100644
--- a/drivers/staging/lustre/include/linux/lnet/lib-types.h
+++ b/drivers/staging/lustre/include/linux/lnet/lib-types.h
@@ -627,6 +627,14 @@ typedef struct {
/* test protocol compatibility flags */
int   ln_testprotocompat;
 
+   /*
+* 0 - load the NIs from the mod params
+* 1 - do not load the NIs from the mod params
+* Reverse logic to ensure that other calls to LNetNIInit
+* need no change
+*/
+   bool  ln_nis_from_mod_params;
+
 } lnet_t;
 
 #endif
diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c 
b/drivers/staging/lustre/lnet/lnet/api-ni.c
index c68d01e..fa65797 100644
--- a/drivers/staging/lustre/lnet/lnet/api-ni.c
+++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
@@ -1553,7 +1553,9 @@ LNetNIInit(lnet_pid_t requested_pid)
if (rc)
goto failed0;
 
-   rc = lnet_parse_networks(&net_head, lnet_get_networks());
+   rc = lnet_parse_networks(&net_head,
+!the_lnet.ln_nis_from_mod_params ?
+lnet_get_networks() : "");
if (rc < 0)
goto failed1;
 
@@ -1668,6 +1670,94 @@ LNetNIFini(void)
 }
 EXPORT_SYMBOL(LNetNIFini);
 
+/**
+ * Grabs the ni data from the ni structure and fills the out
+ * parameters
+ *
+ * \param[in] ni network   interface structure
+ * \param[out] c

[PATCH 08/24] staging: lustre: return appropriate errno when adding route

2016-02-22 Thread James Simmons
From: Amir Shehata 

When adding a route, specific scenarios were silently ignored, namely:
1. route already exists
2. route is on a local net
3. route is unreachable

This patch returns the appropriate return codes from the lower-level
function lnet_add_route(), and then ignores the above cases in the
calling function, lnet_parse_route().  This is needed so we don't
halt processing routes from the module parameters.

However, we can now also add routes dynamically, and in that case
whether adding the requested route succeeded or failed should be
returned to the user.

In userspace it is then decided whether to continue adding routes or
to halt processing.  Currently "lnetctl import < config" continues
adding the rest of the configuration and reports at the end which
operations passed and which ones failed.
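
The calling convention in miniature (stand-alone C; the stubs stand
in for lnet_add_route() and lnet_parse_route(), they are not the
LNet implementation):

#include <errno.h>
#include <stdio.h>

static int add_route(void)              /* stub: route already there */
{
        return -EEXIST;
}

static int parse_route(void)            /* module-parameter path */
{
        int rc = add_route();

        if (rc && rc != -EEXIST && rc != -EHOSTUNREACH) {
                fprintf(stderr, "Can't create route\n");
                return rc;              /* real failure: halt */
        }
        return 0;                       /* benign: keep parsing */
}

int main(void)
{
        /* a dynamic (DLC) caller would instead hand add_route()'s
         * rc back to the user unfiltered */
        return parse_route();
}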

Signed-off-by: Amir Shehata 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6045
Reviewed-on: http://review.whamcloud.com/13116
Reviewed-by: James Simmons 
Reviewed-by: Doug Oucharek 
Reviewed-by: Isaac Huang 
Reviewed-by: Oleg Drokin 
---
 drivers/staging/lustre/lnet/lnet/config.c |2 +-
 drivers/staging/lustre/lnet/lnet/router.c |   11 +++
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/staging/lustre/lnet/lnet/config.c 
b/drivers/staging/lustre/lnet/lnet/config.c
index c04a0ef..8c80625 100644
--- a/drivers/staging/lustre/lnet/lnet/config.c
+++ b/drivers/staging/lustre/lnet/lnet/config.c
@@ -769,7 +769,7 @@ lnet_parse_route(char *str, int *im_a_router)
}
 
rc = lnet_add_route(net, hops, nid, priority);
-   if (rc) {
+   if (rc && rc != -EEXIST && rc != -EHOSTUNREACH) {
CERROR("Can't create route to %s via %s\n",
   libcfs_net2str(net),
   libcfs_nid2str(nid));
diff --git a/drivers/staging/lustre/lnet/lnet/router.c 
b/drivers/staging/lustre/lnet/lnet/router.c
index d748931..511e446 100644
--- a/drivers/staging/lustre/lnet/lnet/router.c
+++ b/drivers/staging/lustre/lnet/lnet/router.c
@@ -317,7 +317,7 @@ lnet_add_route(__u32 net, unsigned int hops, lnet_nid_t 
gateway,
return -EINVAL;
 
if (lnet_islocalnet(net))  /* it's a local network */
-   return 0;  /* ignore the route entry */
+   return -EEXIST;
 
/* Assume net, route, all new */
LIBCFS_ALLOC(route, sizeof(*route));
@@ -348,7 +348,7 @@ lnet_add_route(__u32 net, unsigned int hops, lnet_nid_t 
gateway,
LIBCFS_FREE(rnet, sizeof(*rnet));
 
if (rc == -EHOSTUNREACH) /* gateway is not on a local net */
-   return 0;   /* ignore the route entry */
+   return rc;  /* ignore the route entry */
CERROR("Error %d creating route %s %d %s\n", rc,
   libcfs_net2str(net), hops,
   libcfs_nid2str(gateway));
@@ -395,14 +395,17 @@ lnet_add_route(__u32 net, unsigned int hops, lnet_nid_t 
gateway,
/* -1 for notify or !add_route */
lnet_peer_decref_locked(route->lr_gateway);
lnet_net_unlock(LNET_LOCK_EX);
+   rc = 0;
 
-   if (!add_route)
+   if (!add_route) {
+   rc = -EEXIST;
LIBCFS_FREE(route, sizeof(*route));
+   }
 
if (rnet != rnet2)
LIBCFS_FREE(rnet, sizeof(*rnet));
 
-   return 0;
+   return rc;
 }
 
 int
-- 
1.7.1



[PATCH v2] ASoC: wm9713: fix regmap free path

2016-02-22 Thread Robert Jarzmik
In the conversion to regmap, I assumed that the devm_() variant could be
used in the soc probe function.

As a matter of fact, with the current code the regmap is freed twice
because of the devm_() call:
(mutex_lock) from [] (debugfs_remove_recursive+0x50/0x1d0)
(debugfs_remove_recursive) from [] (regmap_debugfs_exit+0x1c/0xd4)
(regmap_debugfs_exit) from [] (regmap_exit+0x28/0xc8)
(regmap_exit) from [] (release_nodes+0x18c/0x204)
(release_nodes) from [] (device_release+0x18/0x90)
(device_release) from [] (kobject_release+0x90/0x1bc)
(kobject_release) from [] (wm9713_soc_remove+0x1c/0x24)
(wm9713_soc_remove) from [] (soc_remove_component+0x50/0x7c)
(soc_remove_component) from [] (soc_remove_dai_links+0x118/0x228)
(soc_remove_dai_links) from [] (snd_soc_register_card+0x4e4/0xdd4)
(snd_soc_register_card) from [] (devm_snd_soc_register_card+0x34/0x70)

Fix this by replacing the devm_regmap initialization code with the
non-devm_() variant.
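
The underlying rule, modeled as a stand-alone toy (plain C, not the
kernel devm API): managed allocations are freed automatically when
the device is released, so freeing them by hand as well frees them
twice.

#include <stdio.h>
#include <stdlib.h>

static void *managed[8];
static int nmanaged;

static void *devm_alloc(size_t size)    /* the core frees it later */
{
        void *p = calloc(1, size);

        managed[nmanaged++] = p;
        return p;
}

static void device_release(void)        /* automatic cleanup pass */
{
        while (nmanaged > 0)
                free(managed[--nmanaged]);
}

int main(void)
{
        void *regmap = devm_alloc(64);

        (void)regmap;
        /* free(regmap); */             /* the bug: device_release()
                                         * would free it a second time */
        device_release();
        printf("released exactly once\n");
        return 0;
}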

Fixes: 700dadfefc3d ("ASoC: wm9713: convert to regmap")
Cc: Lars-Peter Clausen 
Signed-off-by: Robert Jarzmik 
---
Since v1: changed the fix into removing the devm_() variant
---
 sound/soc/codecs/wm9713.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sound/soc/codecs/wm9713.c b/sound/soc/codecs/wm9713.c
index 4e522302bb8a..edd270f6d8e5 100644
--- a/sound/soc/codecs/wm9713.c
+++ b/sound/soc/codecs/wm9713.c
@@ -1212,7 +1212,7 @@ static int wm9713_soc_probe(struct snd_soc_codec *codec)
if (IS_ERR(wm9713->ac97))
return PTR_ERR(wm9713->ac97);
 
-   regmap = devm_regmap_init_ac97(wm9713->ac97, &wm9713_regmap_config);
+   regmap = regmap_init_ac97(wm9713->ac97, &wm9713_regmap_config);
if (IS_ERR(regmap)) {
snd_soc_free_ac97_codec(wm9713->ac97);
return PTR_ERR(regmap);
-- 
2.1.4



[PATCH 09/24] staging: lustre: make some lnet functions static

2016-02-22 Thread James Simmons
From: Frank Zago 

Some functions and variables are only used in their C file, so reduce
their scope. This reduces the code size, and fixes sparse warnings
such as:

warning: symbol 'proc_lnet_routes' was not declared.
Should it be static?
warning: symbol 'proc_lnet_routers' was not declared.
Should it be static?

Some prototypes were removed from C files and added to the proper
header.
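
Both moves in a stand-alone miniature (hypothetical names):

#include <stdio.h>

int shared_counter;             /* shared symbol: declared once in a
                                 * header as 'extern int shared_counter;'
                                 * instead of ad-hoc externs in .c files */

static int local_helper(void)   /* was non-static; sparse warned:
                                 * "not declared. Should it be static?" */
{
        return ++shared_counter;
}

int main(void)
{
        printf("%d\n", local_helper());
        return 0;
}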

Signed-off-by: Frank Zago 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5396
Reviewed-on: http://review.whamcloud.com/12206
Reviewed-by: James Simmons 
Reviewed-by: John L. Hammond 
Reviewed-by: Andreas Dilger 
---
 .../staging/lustre/include/linux/lnet/lib-lnet.h   |2 ++
 drivers/staging/lustre/lnet/lnet/router_proc.c |2 --
 drivers/staging/lustre/lnet/selftest/console.c |4 +---
 drivers/staging/lustre/lnet/selftest/framework.c   |   10 --
 drivers/staging/lustre/lnet/selftest/module.c  |4 +---
 drivers/staging/lustre/lnet/selftest/rpc.c |2 +-
 6 files changed, 5 insertions(+), 19 deletions(-)

diff --git a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h 
b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
index 2ee3d73..0928bc9 100644
--- a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
+++ b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
@@ -448,6 +448,8 @@ lnet_ni_t *lnet_nid2ni_locked(lnet_nid_t nid, int cpt);
 lnet_ni_t *lnet_net2ni_locked(__u32 net, int cpt);
 lnet_ni_t *lnet_net2ni(__u32 net);
 
+extern int portal_rotor;
+
 int lnet_init(void);
 void lnet_fini(void);
 
diff --git a/drivers/staging/lustre/lnet/lnet/router_proc.c 
b/drivers/staging/lustre/lnet/lnet/router_proc.c
index 5cb87b8..fc643df 100644
--- a/drivers/staging/lustre/lnet/lnet/router_proc.c
+++ b/drivers/staging/lustre/lnet/lnet/router_proc.c
@@ -800,8 +800,6 @@ static struct lnet_portal_rotors portal_rotors[] = {
},
 };
 
-extern int portal_rotor;
-
 static int __proc_lnet_portal_rotor(void *data, int write,
loff_t pos, void __user *buffer, int nob)
 {
diff --git a/drivers/staging/lustre/lnet/selftest/console.c 
b/drivers/staging/lustre/lnet/selftest/console.c
index badc696..e8ca1bf 100644
--- a/drivers/staging/lustre/lnet/selftest/console.c
+++ b/drivers/staging/lustre/lnet/selftest/console.c
@@ -1693,8 +1693,6 @@ lstcon_new_session_id(lst_sid_t *sid)
sid->ses_stamp = cfs_time_current();
 }
 
-extern srpc_service_t lstcon_acceptor_service;
-
 int
 lstcon_session_new(char *name, int key, unsigned feats,
   int timeout, int force, lst_sid_t __user *sid_up)
@@ -1973,7 +1971,7 @@ out:
return rc;
 }
 
-srpc_service_t lstcon_acceptor_service;
+static srpc_service_t lstcon_acceptor_service;
 static void lstcon_init_acceptor_service(void)
 {
/* initialize selftest console acceptor service table */
diff --git a/drivers/staging/lustre/lnet/selftest/framework.c 
b/drivers/staging/lustre/lnet/selftest/framework.c
index 7eca046..3bbc720 100644
--- a/drivers/staging/lustre/lnet/selftest/framework.c
+++ b/drivers/staging/lustre/lnet/selftest/framework.c
@@ -1629,16 +1629,6 @@ static srpc_service_t sfw_services[] = {
}
 };
 
-extern sfw_test_client_ops_t ping_test_client;
-extern srpc_service_t  ping_test_service;
-extern void ping_init_test_client(void);
-extern void ping_init_test_service(void);
-
-extern sfw_test_client_ops_t brw_test_client;
-extern srpc_service_t  brw_test_service;
-extern void brw_init_test_client(void);
-extern void brw_init_test_service(void);
-
 int
 sfw_startup(void)
 {
diff --git a/drivers/staging/lustre/lnet/selftest/module.c 
b/drivers/staging/lustre/lnet/selftest/module.c
index c4bf442..cbb7884 100644
--- a/drivers/staging/lustre/lnet/selftest/module.c
+++ b/drivers/staging/lustre/lnet/selftest/module.c
@@ -37,6 +37,7 @@
 #define DEBUG_SUBSYSTEM S_LNET
 
 #include "selftest.h"
+#include "console.h"
 
 enum {
LST_INIT_NONE = 0,
@@ -47,9 +48,6 @@ enum {
LST_INIT_CONSOLE
 };
 
-extern int lstcon_console_init(void);
-extern int lstcon_console_fini(void);
-
 static int lst_init_step = LST_INIT_NONE;
 
 struct cfs_wi_sched *lst_sched_serial;
diff --git a/drivers/staging/lustre/lnet/selftest/rpc.c 
b/drivers/staging/lustre/lnet/selftest/rpc.c
index 4213198..1b76933 100644
--- a/drivers/staging/lustre/lnet/selftest/rpc.c
+++ b/drivers/staging/lustre/lnet/selftest/rpc.c
@@ -1097,7 +1097,7 @@ srpc_client_rpc_expired(void *data)
spin_unlock(&srpc_data.rpc_glock);
 }
 
-inline void
+static void
 srpc_add_client_rpc_timer(srpc_client_rpc_t *rpc)
 {
stt_timer_t *timer = &rpc->crpc_timer;
-- 
1.7.1



[PATCH 10/24] staging: lustre: missed a few cases of using NULL instead of 0

2016-02-22 Thread James Simmons
From: Frank Zago 

It is preferable to use NULL instead of 0 for pointers. This fixes sparse
warnings such as:

lustre/fld/fld_request.c:126:17: warning: Using plain integer as NULL pointer

The second parameter of class_match_param() was changed to a
const char *, which allows removing a cast in one caller and avoids
splitting a long line.  No other code change.
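
A stand-alone illustration of the sparse complaint (hypothetical
struct; the first member is a pointer, as in lnet_md_t):

#include <stdio.h>

struct md {
        void *start;
        unsigned int length;
};

int main(void)
{
        struct md a = { 0 };    /* zero-initializes too, but sparse warns:
                                 * "Using plain integer as NULL pointer" */
        struct md b = { NULL }; /* type-correct for a pointer member */

        printf("%u %u\n", a.length, b.length);
        return 0;
}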

Signed-off-by: Frank Zago 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5396
Reviewed-on: http://review.whamcloud.com/12567
Reviewed-by: Dmitry Eremin 
Reviewed-by: John L. Hammond 
Reviewed-by: Bob Glossman 
Reviewed-by: James Simmons 
Reviewed-by: Oleg Drokin 
---
 drivers/staging/lustre/lnet/lnet/api-ni.c  |2 +-
 drivers/staging/lustre/lustre/llite/llite_lib.c|2 +-
 .../staging/lustre/lustre/obdclass/obd_config.c|4 ++--
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c 
b/drivers/staging/lustre/lnet/lnet/api-ni.c
index 3b7bc36..f223d5d 100644
--- a/drivers/staging/lustre/lnet/lnet/api-ni.c
+++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
@@ -904,7 +904,7 @@ lnet_ping_info_setup(lnet_ping_info_t **ppinfo, 
lnet_handle_md_t *md_handle,
 {
lnet_process_id_t id = {LNET_NID_ANY, LNET_PID_ANY};
lnet_handle_me_t me_handle;
-   lnet_md_t md = {0};
+   lnet_md_t md = { NULL };
int rc, rc2;
 
if (set_eq) {
diff --git a/drivers/staging/lustre/lustre/llite/llite_lib.c 
b/drivers/staging/lustre/lustre/llite/llite_lib.c
index 4de085d..2440b07 100644
--- a/drivers/staging/lustre/lustre/llite/llite_lib.c
+++ b/drivers/staging/lustre/lustre/llite/llite_lib.c
@@ -1940,7 +1940,7 @@ int ll_prep_inode(struct inode **inode, struct 
ptlrpc_request *req,
  struct super_block *sb, struct lookup_intent *it)
 {
struct ll_sb_info *sbi = NULL;
-   struct lustre_md md;
+   struct lustre_md md = { NULL };
int rc;
 
LASSERT(*inode || sb);
diff --git a/drivers/staging/lustre/lustre/obdclass/obd_config.c 
b/drivers/staging/lustre/lustre/obdclass/obd_config.c
index c4128ac..6417946 100644
--- a/drivers/staging/lustre/lustre/obdclass/obd_config.c
+++ b/drivers/staging/lustre/lustre/obdclass/obd_config.c
@@ -72,7 +72,7 @@ EXPORT_SYMBOL(class_find_param);
 
 /* returns 0 if this is the first key in the buffer, else 1.
valp points to first char after key. */
-static int class_match_param(char *buf, char *key, char **valp)
+static int class_match_param(char *buf, const char *key, char **valp)
 {
if (!buf)
return 1;
@@ -1008,7 +1008,7 @@ int class_process_proc_param(char *prefix, struct 
lprocfs_vars *lvars,
/* Search proc entries */
while (lvars[j].name) {
var = &lvars[j];
-   if (class_match_param(key, (char *)var->name, NULL) == 0
+   if (!class_match_param(key, var->name, NULL)
&& keylen == strlen(var->name)) {
matched++;
rc = -EROFS;
-- 
1.7.1



[PATCH 05/24] staging: lustre: make local functions static for LNet ni

2016-02-22 Thread James Simmons
From: Frank Zago 

The function lnet_unprepare can be made static.

Signed-off-by: Frank Zago 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5396
Reviewed-on: http://review.whamcloud.com/11306
Reviewed-by: James Simmons 
Reviewed-by: Patrick Farrell 
Reviewed-by: Amir Shehata 
Reviewed-by: John L. Hammond 
Reviewed-by: Oleg Drokin 
---
 drivers/staging/lustre/lnet/lnet/api-ni.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c 
b/drivers/staging/lustre/lnet/lnet/api-ni.c
index fa65797..7583ae4 100644
--- a/drivers/staging/lustre/lnet/lnet/api-ni.c
+++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
@@ -525,7 +525,7 @@ lnet_res_lh_initialize(struct lnet_res_container *rec, 
lnet_libhandle_t *lh)
list_add(&lh->lh_hash_chain, &rec->rec_lh_hash[hash]);
 }
 
-int lnet_unprepare(void);
+static int lnet_unprepare(void);
 
 static int
 lnet_prepare(lnet_pid_t requested_pid)
@@ -611,7 +611,7 @@ lnet_prepare(lnet_pid_t requested_pid)
return rc;
 }
 
-int
+static int
 lnet_unprepare(void)
 {
/*
-- 
1.7.1



[PATCH 14/24] staging: lustre: handle lnet_check_routes() errors

2016-02-22 Thread James Simmons
From: Amir Shehata 

After adding a route, lnet_check_routes() is called to ensure that
the route added doesn't invalidate the routing configuration.  If
lnet_check_routes() fails, then the route just added, which
invalidated the current configuration, is deleted, and an error is
returned to the user.
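
The control flow in miniature (stand-alone C; the stubs stand in
for lnet_add_route(), lnet_check_routes() and lnet_del_route()):

#include <errno.h>
#include <stdio.h>

static int routes_valid;                /* toggled for the demo */

static int  add_route(void)    { return 0; }
static void del_route(void)    { }
static int  check_routes(void) { return routes_valid ? 0 : -EINVAL; }

/* add, validate, and roll back the add if validation fails */
static int add_route_checked(void)
{
        int rc = add_route();

        if (rc)
                return rc;
        rc = check_routes();
        if (rc)
                del_route();            /* undo the offending route */
        return rc;
}

int main(void)
{
        printf("rc=%d\n", add_route_checked());         /* rc=-22 */
        return 0;
}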

Signed-off-by: Amir Shehata 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6218
Reviewed-on: http://review.whamcloud.com/13445
Reviewed-by: Liang Zhen 
Reviewed-by: Doug Oucharek 
Reviewed-by: Oleg Drokin 
---
 drivers/staging/lustre/lnet/lnet/api-ni.c |8 +++-
 1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c 
b/drivers/staging/lustre/lnet/lnet/api-ni.c
index ccd7dcd..ed121a8 100644
--- a/drivers/staging/lustre/lnet/lnet/api-ni.c
+++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
@@ -1907,8 +1907,14 @@ LNetCtl(unsigned int cmd, void *arg)
config->cfg_config_u.cfg_route.rtr_hop,
config->cfg_nid,

config->cfg_config_u.cfg_route.rtr_priority);
+   if (!rc) {
+   rc = lnet_check_routes();
+   if (rc)
+   lnet_del_route(config->cfg_net,
+  config->cfg_nid);
+   }
mutex_unlock(&the_lnet.ln_api_mutex);
-   return rc ? rc : lnet_check_routes();
+   return rc;
 
case IOC_LIBCFS_DEL_ROUTE:
config = arg;
-- 
1.7.1



[PATCH 07/24] staging: lustre: improve LNet clean up code and API

2016-02-22 Thread James Simmons
From: Amir Shehata 

This patch addresses a set of related issues: LU-5568, LU-5734,
LU-5839, LU-5849, LU-5850.

Create the local lnet_startup_lndni() API.  This function starts
up one LND.  lnet_startup_lndnis() calls this function in a loop
on every NI in the list passed in.  lnet_startup_lndni() is
responsible for cleaning up after itself in case of failure:
it calls lnet_free_ni() if the NI fails to start, and it calls
lnet_shutdown_lndni() if it successfully called the LND startup
function but fails later on.

lnet_startup_lndnis() also cleans up after itself.
If lnet_startup_lndni() fails, then lnet_shutdown_lndnis() is
called to clean up all NIs that might have been started, and
then frees the rest of the NIs on the list which have not been
started yet.

To facilitate the above changes, lnet_dyn_del_ni() now manages
the ping info.  It calls lnet_shutdown_lndni() to shut down the
NI.  lnet_shutdown_lndni() is no longer an exposed API and
doesn't manage the ping info, making it callable from
lnet_startup_lndni() as well.

There are two scenarios for calling lnet_startup_lndni()

1. from lnet_startup_lndnis()
If lnet_startup_lndni() fails, the NI must be shut down without
doing anything with the ping information, as it hasn't been
created yet.

2. from lnet_dyn_add_ni()
As above, it will shut down the NI, and then lnet_dyn_add_ni()
will take care of managing the ping info.

The second part of this change is to ensure that the LOLND is not
added by lnet_parse_networks(), but by the caller that needs it
(i.e. LNetNIInit).

This change ensures that lnet_dyn_add_ni() need only check whether
exactly one net is being added; if not, it frees everything,
otherwise it proceeds to start up the requested net.

Signed-off-by: Amir Shehata 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5734
Reviewed-on: http://review.whamcloud.com/12658
Reviewed-by: Liang Zhen 
Reviewed-by: Isaac Huang 
Reviewed-by: James Simmons 
Reviewed-by: Oleg Drokin 
---
 .../staging/lustre/include/linux/lnet/lib-lnet.h   |2 +
 drivers/staging/lustre/lnet/lnet/api-ni.c  |  468 ++--
 drivers/staging/lustre/lnet/lnet/config.c  |   14 +-
 3 files changed, 251 insertions(+), 233 deletions(-)

diff --git a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h 
b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
index 5e16fe0..2ee3d73 100644
--- a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
+++ b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
@@ -422,6 +422,8 @@ lnet_ni_decref(lnet_ni_t *ni)
 }
 
 void lnet_ni_free(lnet_ni_t *ni);
+lnet_ni_t *
+lnet_ni_alloc(__u32 net, struct cfs_expr_list *el, struct list_head *nilist);
 
 static inline int
 lnet_nid2peerhash(lnet_nid_t nid)
diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c 
b/drivers/staging/lustre/lnet/lnet/api-ni.c
index 3bed4c3..3b7bc36 100644
--- a/drivers/staging/lustre/lnet/lnet/api-ni.c
+++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
@@ -1066,6 +1066,20 @@ lnet_ni_tq_credits(lnet_ni_t *ni)
 }
 
 static void
+lnet_ni_unlink_locked(lnet_ni_t *ni)
+{
+   if (!list_empty(&ni->ni_cptlist)) {
+   list_del_init(&ni->ni_cptlist);
+   lnet_ni_decref_locked(ni, 0);
+   }
+
+   /* move it to zombie list and nobody can find it anymore */
+   LASSERT(!list_empty(&ni->ni_list));
+   list_move(&ni->ni_list, &the_lnet.ln_nis_zombie);
+   lnet_ni_decref_locked(ni, 0);   /* drop ln_nis' ref */
+}
+
+static void
 lnet_clear_zombies_nis_locked(void)
 {
int i;
@@ -1148,14 +1162,7 @@ lnet_shutdown_lndnis(void)
while (!list_empty(&the_lnet.ln_nis)) {
ni = list_entry(the_lnet.ln_nis.next,
lnet_ni_t, ni_list);
-   /* move it to zombie list and nobody can find it anymore */
-   list_move(&ni->ni_list, &the_lnet.ln_nis_zombie);
-   lnet_ni_decref_locked(ni, 0);   /* drop ln_nis' ref */
-
-   if (!list_empty(&ni->ni_cptlist)) {
-   list_del_init(&ni->ni_cptlist);
-   lnet_ni_decref_locked(ni, 0);
-   }
+   lnet_ni_unlink_locked(ni);
}
 
/* Drop the cached eqwait NI. */
@@ -1192,228 +1199,196 @@ lnet_shutdown_lndnis(void)
lnet_net_unlock(LNET_LOCK_EX);
 }
 
-int
-lnet_shutdown_lndni(__u32 net)
+/* shut down the NI and release refcount */
+static void
+lnet_shutdown_lndni(struct lnet_ni *ni)
 {
-   lnet_ping_info_t *pinfo;
-   lnet_handle_md_t md_handle;
-   lnet_ni_t *found_ni = NULL;
-   int ni_count;
-   int rc;
-
-   if (LNET_NETTYP(net) == LOLND)
-   return -EINVAL;
-
-   ni_count = lnet_get_ni_count();
-
-   /* create and link a new ping info, before removing the old one */
-   rc = lnet_ping_info_setup(&pinfo, &md_handle, ni_count - 1, false);
-   if (rc)
-   return rc;
-
-   /* proceed with shutting down the NI */
lnet_n

[PATCH 11/24] staging: lustre: startup lnet acceptor thread dynamically

2016-02-22 Thread James Simmons
From: Amir Shehata 

With DLC it's possible to start up a system with no NIs that require
the acceptor thread, and thus it won't start.  Later on the user can
add an NI that requires the acceptor thread; it is then necessary to
start it up.

If the user removes an NI and as a result there are no more NIs that
require the acceptor thread, then it should be stopped.  This patch
adds logic to the dynamic NI add and remove paths to ensure the
above behaviour is implemented.
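
The policy as a stand-alone toy (plain C; the real code keys off
lnet_acceptor_state.pta_shutdown and lnet_count_acceptor_nis()):

#include <stdio.h>

static int acceptor_running;            /* ~ !pta_shutdown */
static int nis_needing_acceptor;        /* ~ lnet_count_acceptor_nis() */

static int acceptor_start(void)
{
        if (acceptor_running)           /* idempotent: already up */
                return 0;
        acceptor_running = 1;
        return 0;
}

static void acceptor_stop(void)
{
        if (!acceptor_running)
                return;
        acceptor_running = 0;
}

static int add_ni(int needs_acceptor)
{
        if (!needs_acceptor)
                return 0;
        nis_needing_acceptor++;
        return acceptor_start();
}

static void del_ni(int needed_acceptor)
{
        if (needed_acceptor && --nis_needing_acceptor == 0)
                acceptor_stop();        /* last user gone */
}

int main(void)
{
        add_ni(1);
        add_ni(1);
        del_ni(1);
        del_ni(1);
        printf("acceptor running: %d\n", acceptor_running);
        return 0;
}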

Signed-off-by: Amir Shehata 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6002
Reviewed-on: http://review.whamcloud.com/13010
Reviewed-by: Isaac Huang 
Reviewed-by: Doug Oucharek 
Reviewed-by: Oleg Drokin 
---
 drivers/staging/lustre/lnet/lnet/acceptor.c |   10 --
 drivers/staging/lustre/lnet/lnet/api-ni.c   |   14 ++
 2 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/lustre/lnet/lnet/acceptor.c 
b/drivers/staging/lustre/lnet/lnet/acceptor.c
index 07df727..9fe3ff7 100644
--- a/drivers/staging/lustre/lnet/lnet/acceptor.c
+++ b/drivers/staging/lustre/lnet/lnet/acceptor.c
@@ -46,7 +46,9 @@ static struct {
int pta_shutdown;
struct socket   *pta_sock;
struct completion   pta_signal;
-} lnet_acceptor_state;
+} lnet_acceptor_state = {
+   .pta_shutdown = 1
+};
 
 int
 lnet_acceptor_port(void)
@@ -444,6 +446,10 @@ lnet_acceptor_start(void)
long rc2;
long secure;
 
+   /* if acceptor is already running return immediately */
+   if (!lnet_acceptor_state.pta_shutdown)
+   return 0;
+
LASSERT(!lnet_acceptor_state.pta_sock);
 
rc = lnet_acceptor_get_tunables();
@@ -484,7 +490,7 @@ lnet_acceptor_start(void)
 void
 lnet_acceptor_stop(void)
 {
-   if (!lnet_acceptor_state.pta_sock) /* not running */
+   if (lnet_acceptor_state.pta_shutdown) /* not running */
return;
 
lnet_acceptor_state.pta_shutdown = 1;
diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c 
b/drivers/staging/lustre/lnet/lnet/api-ni.c
index f223d5d..9497ce1 100644
--- a/drivers/staging/lustre/lnet/lnet/api-ni.c
+++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
@@ -1785,6 +1785,16 @@ lnet_dyn_add_ni(lnet_pid_t requested_pid, char *nets,
if (rc)
goto failed1;
 
+   if (ni->ni_lnd->lnd_accept) {
+   rc = lnet_acceptor_start();
+   if (rc < 0) {
+   /* shutdown the ni that we just started */
+   CERROR("Failed to start up acceptor thread\n");
+   lnet_shutdown_lndni(ni);
+   goto failed1;
+   }
+   }
+
lnet_ping_target_update(pinfo, md_handle);
mutex_unlock(&the_lnet.ln_api_mutex);
 
@@ -1832,6 +1842,10 @@ lnet_dyn_del_ni(__u32 net)
lnet_ni_decref_locked(ni, 0);
 
lnet_shutdown_lndni(ni);
+
+   if (!lnet_count_acceptor_nis())
+   lnet_acceptor_stop();
+
lnet_ping_target_update(pinfo, md_handle);
goto out;
 failed:
-- 
1.7.1



[PATCH 13/24] staging: lustre: return -EEXIST if NI is not unique

2016-02-22 Thread James Simmons
From: Amir Shehata 

Return -EEXIST and not -EINVAL when trying to add a
network interface which is not unique.

Some minor cleanup in api-ni.c

Signed-off-by: Amir Shehata 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5875
Reviewed-on: http://review.whamcloud.com/13056
Reviewed-by: Isaac Huang 
Reviewed-by: James Simmons 
Reviewed-by: Oleg Drokin 
---
 drivers/staging/lustre/lnet/lnet/api-ni.c |   20 +---
 1 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c 
b/drivers/staging/lustre/lnet/lnet/api-ni.c
index 62a9e45..ccd7dcd 100644
--- a/drivers/staging/lustre/lnet/lnet/api-ni.c
+++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
@@ -1219,7 +1219,7 @@ static int
 lnet_startup_lndni(struct lnet_ni *ni, __s32 peer_timeout,
   __s32 peer_cr, __s32 peer_buf_cr, __s32 credits)
 {
-   int rc = 0;
+   int rc = -EINVAL;
int lnd_type;
lnd_t *lnd;
struct lnet_tx_queue *tq;
@@ -1237,19 +1237,19 @@ lnet_startup_lndni(struct lnet_ni *ni, __s32 
peer_timeout,
 
/* Make sure this new NI is unique. */
lnet_net_lock(LNET_LOCK_EX);
-   if (!lnet_net_unique(LNET_NIDNET(ni->ni_nid), &the_lnet.ln_nis)) {
+   rc = lnet_net_unique(LNET_NIDNET(ni->ni_nid), &the_lnet.ln_nis);
+   lnet_net_unlock(LNET_LOCK_EX);
+   if (!rc) {
if (lnd_type == LOLND) {
-   lnet_net_unlock(LNET_LOCK_EX);
lnet_ni_free(ni);
return 0;
}
-   lnet_net_unlock(LNET_LOCK_EX);
 
CERROR("Net %s is not unique\n",
   libcfs_net2str(LNET_NIDNET(ni->ni_nid)));
+   rc = -EEXIST;
goto failed0;
}
-   lnet_net_unlock(LNET_LOCK_EX);
 
mutex_lock(&the_lnet.ln_lnd_mutex);
lnd = lnet_find_lnd_by_type(lnd_type);
@@ -1265,6 +1265,7 @@ lnet_startup_lndni(struct lnet_ni *ni, __s32 peer_timeout,
CERROR("Can't load LND %s, module %s, rc=%d\n",
   libcfs_lnd2str(lnd_type),
   libcfs_lnd2modname(lnd_type), rc);
+   rc = -EINVAL;
goto failed0;
}
}
@@ -1354,7 +1355,7 @@ lnet_startup_lndni(struct lnet_ni *ni, __s32 peer_timeout,
return 0;
 failed0:
lnet_ni_free(ni);
-   return -EINVAL;
+   return rc;
 }
 
 static int
@@ -1503,7 +1504,7 @@ int
 LNetNIInit(lnet_pid_t requested_pid)
 {
int im_a_router = 0;
-   int rc, rc2;
+   int rc;
int ni_count;
lnet_ping_info_t *pinfo;
lnet_handle_md_t md_handle;
@@ -1592,10 +1593,7 @@ LNetNIInit(lnet_pid_t requested_pid)
return 0;
 
 err_stop_ping:
-   lnet_ping_md_unlink(pinfo, &md_handle);
-   lnet_ping_info_free(pinfo);
-   rc2 = LNetEQFree(the_lnet.ln_ping_target_eq);
-   LASSERT(!rc2);
+   lnet_ping_target_fini();
 err_acceptor_stop:
the_lnet.ln_refcount = 0;
lnet_acceptor_stop();
-- 
1.7.1



[PATCH 12/24] staging: lustre: reject invalid net configuration for lnet

2016-02-22 Thread James Simmons
From: Amir Shehata 

Currently, if there exists a route that goes over a remote net
and that net is then added dynamically as a local net, traffic
stops: the code in lnet_send() determines that the destination
nid can be reached from another local NI, but the src_nid is
still stuck on the earlier NI, because the src_nid is stored in
the ptlrpc layer and is not updated when a local NI is
configured.

Signed-off-by: Amir Shehata 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5874
Reviewed-on: http://review.whamcloud.com/12912
Reviewed-by: Isaac Huang 
Reviewed-by: James Simmons 
Reviewed-by: Oleg Drokin 
---
 drivers/staging/lustre/lnet/lnet/api-ni.c |   18 +-
 1 files changed, 17 insertions(+), 1 deletions(-)

diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c 
b/drivers/staging/lustre/lnet/lnet/api-ni.c
index 9497ce1..62a9e45 100644
--- a/drivers/staging/lustre/lnet/lnet/api-ni.c
+++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
@@ -1756,6 +1756,7 @@ lnet_dyn_add_ni(lnet_pid_t requested_pid, char *nets,
lnet_handle_md_t md_handle;
struct lnet_ni *ni;
struct list_head net_head;
+   lnet_remotenet_t *rnet;
int rc;
 
INIT_LIST_HEAD(&net_head);
@@ -1772,12 +1773,27 @@ lnet_dyn_add_ni(lnet_pid_t requested_pid, char *nets,
goto failed0;
}
 
+   ni = list_entry(net_head.next, struct lnet_ni, ni_list);
+
+   lnet_net_lock(LNET_LOCK_EX);
+   rnet = lnet_find_net_locked(LNET_NIDNET(ni->ni_nid));
+   lnet_net_unlock(LNET_LOCK_EX);
+   /*
+* make sure that the net added doesn't invalidate the current
+* configuration LNet is keeping
+*/
+   if (rnet) {
+   CERROR("Adding net %s will invalidate routing configuration\n",
+  nets);
+   rc = -EUSERS;
+   goto failed0;
+   }
+
rc = lnet_ping_info_setup(&pinfo, &md_handle, 1 + lnet_get_ni_count(),
  false);
if (rc)
goto failed0;
 
-   ni = list_entry(net_head.next, struct lnet_ni, ni_list);
list_del_init(&ni->ni_list);
 
rc = lnet_startup_lndni(ni, peer_timeout, peer_cr,
-- 
1.7.1



[PATCH 18/24] staging: lustre: remove messages from lazy portal on NI shutdown

2016-02-22 Thread James Simmons
From: Amir Shehata 

When shutting down an NI in a busy system, some messages received
on this NI might be on the lazy portal.  They would have grabbed a
ref count on the NI, so the NI will not be removed until those
messages are processed.

To avoid this scenario, when an NI is shut down, go through all
messages queued on the lazy portal and drop the messages for the
NI being shut down.

Signed-off-by: Amir Shehata 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6040
Reviewed-on: http://review.whamcloud.com/13836
Reviewed-by: Isaac Huang 
Reviewed-by: Liang Zhen 
Reviewed-by: Oleg Drokin 
---
 .../staging/lustre/include/linux/lnet/lib-lnet.h   |1 +
 drivers/staging/lustre/lnet/lnet/api-ni.c  |6 ++
 drivers/staging/lustre/lnet/lnet/lib-ptl.c |   54 +---
 3 files changed, 43 insertions(+), 18 deletions(-)

diff --git a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h 
b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
index 0928bc9..a5f1aec 100644
--- a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
+++ b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
@@ -482,6 +482,7 @@ int lnet_dyn_add_ni(lnet_pid_t requested_pid, char *nets,
__s32 peer_timeout, __s32 peer_cr, __s32 peer_buf_cr,
__s32 credits);
 int lnet_dyn_del_ni(__u32 net);
+int lnet_clear_lazy_portal(struct lnet_ni *ni, int portal, char *reason);
 
 int lnet_islocalnid(lnet_nid_t nid);
 int lnet_islocalnet(__u32 net);
diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c 
b/drivers/staging/lustre/lnet/lnet/api-ni.c
index 0c7db19..3ecc96a 100644
--- a/drivers/staging/lustre/lnet/lnet/api-ni.c
+++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
@@ -1196,10 +1196,16 @@ lnet_shutdown_lndnis(void)
 static void
 lnet_shutdown_lndni(struct lnet_ni *ni)
 {
+   int i;
+
lnet_net_lock(LNET_LOCK_EX);
lnet_ni_unlink_locked(ni);
lnet_net_unlock(LNET_LOCK_EX);
 
+   /* clear messages for this NI on the lazy portal */
+   for (i = 0; i < the_lnet.ln_nportals; i++)
+   lnet_clear_lazy_portal(ni, i, "Shutting down NI");
+
/* Do peer table cleanup for this ni */
lnet_peer_tables_cleanup(ni);
 
diff --git a/drivers/staging/lustre/lnet/lnet/lib-ptl.c 
b/drivers/staging/lustre/lnet/lnet/lib-ptl.c
index 0cdeea9..5a9ab87 100644
--- a/drivers/staging/lustre/lnet/lnet/lib-ptl.c
+++ b/drivers/staging/lustre/lnet/lnet/lib-ptl.c
@@ -902,17 +902,8 @@ LNetSetLazyPortal(int portal)
 }
 EXPORT_SYMBOL(LNetSetLazyPortal);
 
-/**
- * Turn off the lazy portal attribute. Delayed requests on the portal,
- * if any, will be all dropped when this function returns.
- *
- * \param portal Index of the portal to disable the lazy attribute on.
- *
- * \retval 0   On success.
- * \retval -EINVAL If \a portal is not a valid index.
- */
 int
-LNetClearLazyPortal(int portal)
+lnet_clear_lazy_portal(struct lnet_ni *ni, int portal, char *reason)
 {
struct lnet_portal *ptl;
LIST_HEAD(zombies);
@@ -931,21 +922,48 @@ LNetClearLazyPortal(int portal)
return 0;
}
 
-   if (the_lnet.ln_shutdown)
-   CWARN("Active lazy portal %d on exit\n", portal);
-   else
-   CDEBUG(D_NET, "clearing portal %d lazy\n", portal);
+   if (ni) {
+   struct lnet_msg *msg, *tmp;
 
-   /* grab all the blocked messages atomically */
-   list_splice_init(&ptl->ptl_msg_delayed, &zombies);
+   /* grab all messages which are on the NI passed in */
+   list_for_each_entry_safe(msg, tmp, &ptl->ptl_msg_delayed,
+msg_list) {
+   if (msg->msg_rxpeer->lp_ni == ni)
+   list_move(&msg->msg_list, &zombies);
+   }
+   } else {
+   if (the_lnet.ln_shutdown)
+   CWARN("Active lazy portal %d on exit\n", portal);
+   else
+   CDEBUG(D_NET, "clearing portal %d lazy\n", portal);
+
+   /* grab all the blocked messages atomically */
+   list_splice_init(&ptl->ptl_msg_delayed, &zombies);
 
-   lnet_ptl_unsetopt(ptl, LNET_PTL_LAZY);
+   lnet_ptl_unsetopt(ptl, LNET_PTL_LAZY);
+   }
 
lnet_ptl_unlock(ptl);
lnet_res_unlock(LNET_LOCK_EX);
 
-   lnet_drop_delayed_msg_list(&zombies, "Clearing lazy portal attr");
+   lnet_drop_delayed_msg_list(&zombies, reason);
 
return 0;
 }
+
+/**
+ * Turn off the lazy portal attribute. Delayed requests on the portal,
+ * if any, will be all dropped when this function returns.
+ *
+ * \param portal Index of the portal to disable the lazy attribute on.
+ *
+ * \retval 0   On success.
+ * \retval -EINVAL If \a portal is not a valid index.
+ */
+int
+LNetClearLazyPortal(int portal)
+{
+   return lnet_clear_lazy_portal(NULL, portal,
+ "Clearing lazy portal attr")

[PATCH 16/24] staging: lustre: assume a kernel build

2016-02-22 Thread James Simmons
From: John L. Hammond 

In lnet/lnet/ and lnet/selftest/, assume a kernel build (i.e. that
__KERNEL__ is defined).  Remove some common code only needed for
user-space LNet.

Only part of the work of this patch got merged earlier; these are
the final bits.

Signed-off-by: John L. Hammond 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-2675
Reviewed-on: http://review.whamcloud.com/13121
Reviewed-by: James Simmons 
Reviewed-by: Amir Shehata 
Reviewed-by: Oleg Drokin 
---
 .../staging/lustre/include/linux/lnet/lib-types.h  |4 --
 drivers/staging/lustre/lnet/lnet/acceptor.c|2 -
 drivers/staging/lustre/lnet/lnet/api-ni.c  |   39 ++--
 drivers/staging/lustre/lnet/lnet/lib-eq.c  |3 --
 drivers/staging/lustre/lnet/lnet/lib-md.c  |3 --
 drivers/staging/lustre/lnet/lnet/lib-me.c  |3 --
 drivers/staging/lustre/lnet/lnet/lib-move.c|5 ---
 drivers/staging/lustre/lnet/lnet/lib-msg.c |   20 +--
 drivers/staging/lustre/lnet/lnet/router.c  |   11 ++
 9 files changed, 7 insertions(+), 83 deletions(-)

diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h 
b/drivers/staging/lustre/include/linux/lnet/lib-types.h
index 06d4656..f588e06 100644
--- a/drivers/staging/lustre/include/linux/lnet/lib-types.h
+++ b/drivers/staging/lustre/include/linux/lnet/lib-types.h
@@ -577,8 +577,6 @@ typedef struct {
/* dying LND instances */
struct list_head  ln_nis_zombie;
lnet_ni_t*ln_loni;  /* the loopback NI */
-   /* NI to wait for events in */
-   lnet_ni_t*ln_eq_waitni;
 
/* remote networks with routes to them */
struct list_head *ln_remote_nets_hash;
@@ -608,8 +606,6 @@ typedef struct {
 
struct mutex  ln_api_mutex;
struct mutex  ln_lnd_mutex;
-   int   ln_init;  /* lnet_init()
-  called? */
/* Have I called LNetNIInit myself? */
int   ln_niinit_self;
/* LNetNIInit/LNetNIFini counter */
diff --git a/drivers/staging/lustre/lnet/lnet/acceptor.c 
b/drivers/staging/lustre/lnet/lnet/acceptor.c
index 9fe3ff7..8f9876b 100644
--- a/drivers/staging/lustre/lnet/lnet/acceptor.c
+++ b/drivers/staging/lustre/lnet/lnet/acceptor.c
@@ -206,8 +206,6 @@ lnet_connect(struct socket **sockp, lnet_nid_t peer_nid,
 }
 EXPORT_SYMBOL(lnet_connect);
 
-/* Below is the code common for both kernel and MT user-space */
-
 static int
 lnet_accept(struct socket *sock, __u32 magic)
 {
diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c 
b/drivers/staging/lustre/lnet/lnet/api-ni.c
index 0ec656a..0c7db19 100644
--- a/drivers/staging/lustre/lnet/lnet/api-ni.c
+++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
@@ -291,7 +291,6 @@ lnet_register_lnd(lnd_t *lnd)
 {
mutex_lock(&the_lnet.ln_lnd_mutex);
 
-   LASSERT(the_lnet.ln_init);
LASSERT(libcfs_isknown_lnd(lnd->lnd_type));
LASSERT(!lnet_find_lnd_by_type(lnd->lnd_type));
 
@@ -309,7 +308,6 @@ lnet_unregister_lnd(lnd_t *lnd)
 {
mutex_lock(&the_lnet.ln_lnd_mutex);
 
-   LASSERT(the_lnet.ln_init);
LASSERT(lnet_find_lnd_by_type(lnd->lnd_type) == lnd);
LASSERT(!lnd->lnd_refcount);
 
@@ -1166,12 +1164,6 @@ lnet_shutdown_lndnis(void)
lnet_ni_unlink_locked(ni);
}
 
-   /* Drop the cached eqwait NI. */
-   if (the_lnet.ln_eq_waitni) {
-   lnet_ni_decref_locked(the_lnet.ln_eq_waitni, 0);
-   the_lnet.ln_eq_waitni = NULL;
-   }
-
/* Drop the cached loopback NI. */
if (the_lnet.ln_loni) {
lnet_ni_decref_locked(the_lnet.ln_loni, 0);
@@ -1364,7 +1356,6 @@ lnet_startup_lndnis(struct list_head *nilist)
 {
struct lnet_ni *ni;
int rc;
-   int lnd_type;
int ni_count = 0;
 
while (!list_empty(nilist)) {
@@ -1378,14 +1369,6 @@ lnet_startup_lndnis(struct list_head *nilist)
ni_count++;
}
 
-   if (the_lnet.ln_eq_waitni && ni_count > 1) {
-   lnd_type = the_lnet.ln_eq_waitni->ni_lnd->lnd_type;
-   LCONSOLE_ERROR_MSG(0x109, "LND %s can only run 
single-network\n",
-  libcfs_lnd2str(lnd_type));
-   rc = -EINVAL;
-   goto failed;
-   }
-
return ni_count;
 failed:
lnet_shutdown_lndnis();
@@ -1396,10 +1379,9 @@ failed:
 /**
  * Initialize LNet library.
  *
- * Only userspace program needs to call this function - it's automatically
- * called in the kernel at module loading time. Caller has to call lnet_fini()
- * after a call to lnet_init(), if and only if the latter returned 0. It must
- * be called exactly once.
+ * Automatically called at module loading time. Caller has to call
+ * lnet_exit() aft

[PATCH 17/24] staging: lustre: prevent assert on LNet module unload

2016-02-22 Thread James Simmons
From: Amir Shehata 

There is a use case where LNet can be unloaded while there are
no NIs configured.  Removing LNet in this case will cause
LNetFini() to be called without a prior call to LNetNIFini().
This will cause the LASSERT(the_lnet.ln_refcount == 0) to be
triggered.

To deal with this use case, a reference count on the module is
taken using try_module_get() when LNet is configured.  This way
LNet must be unconfigured before it can be removed, thereby
avoiding the above case.  When LNet is unconfigured, module_put()
is called to return the reference count.

Signed-off-by: Amir Shehata 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6010
Reviewed-on: http://review.whamcloud.com/13110
Reviewed-by: James Simmons 
Reviewed-by: Doug Oucharek 
Reviewed-by: Oleg Drokin 
---
 drivers/staging/lustre/lnet/lnet/module.c |9 +
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/drivers/staging/lustre/lnet/lnet/module.c 
b/drivers/staging/lustre/lnet/lnet/module.c
index 8f053d7..e12fe37 100644
--- a/drivers/staging/lustre/lnet/lnet/module.c
+++ b/drivers/staging/lustre/lnet/lnet/module.c
@@ -53,13 +53,21 @@ lnet_configure(void *arg)
mutex_lock(&lnet_config_mutex);
 
if (!the_lnet.ln_niinit_self) {
+   rc = try_module_get(THIS_MODULE);
+
+   if (rc != 1)
+   goto out;
+
rc = LNetNIInit(LNET_PID_LUSTRE);
if (rc >= 0) {
the_lnet.ln_niinit_self = 1;
rc = 0;
+   } else {
+   module_put(THIS_MODULE);
}
}
 
+out:
mutex_unlock(&lnet_config_mutex);
return rc;
 }
@@ -74,6 +82,7 @@ lnet_unconfigure(void)
if (the_lnet.ln_niinit_self) {
the_lnet.ln_niinit_self = 0;
LNetNIFini();
+   module_put(THIS_MODULE);
}
 
mutex_lock(&the_lnet.ln_api_mutex);
-- 
1.7.1



[PATCH 21/24] staging: lustre: use sock.h in only acceptor.c

2016-02-22 Thread James Simmons
On some platforms, having sock.h in lib-types.h collides with
other header files used in the LNet layer.  Looking at what was
needed from sock.h, only acceptor.c actually depends on it, so
avoid these issues by including sock.h only in acceptor.c.

Signed-off-by: James Simmons 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6763
Reviewed-on: http://review.whamcloud.com/15386
Reviewed-by: Chris Horn 
Reviewed-by: Amir Shehata 
Reviewed-by: Oleg Drokin 
---
 .../staging/lustre/include/linux/lnet/lib-types.h  |1 -
 drivers/staging/lustre/lnet/lnet/acceptor.c|1 +
 2 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h 
b/drivers/staging/lustre/include/linux/lnet/lib-types.h
index f588e06..c10f03b 100644
--- a/drivers/staging/lustre/include/linux/lnet/lib-types.h
+++ b/drivers/staging/lustre/include/linux/lnet/lib-types.h
@@ -38,7 +38,6 @@
 #include 
 #include 
 #include 
-#include 
 
 #include "types.h"
 
diff --git a/drivers/staging/lustre/lnet/lnet/acceptor.c 
b/drivers/staging/lustre/lnet/lnet/acceptor.c
index 3468433..1452bb3 100644
--- a/drivers/staging/lustre/lnet/lnet/acceptor.c
+++ b/drivers/staging/lustre/lnet/lnet/acceptor.c
@@ -36,6 +36,7 @@
 
 #define DEBUG_SUBSYSTEM S_LNET
 #include 
+#include 
 #include "../../include/linux/lnet/lib-lnet.h"
 
 static int   accept_port= 988;
-- 
1.7.1



[PATCH 15/24] staging: lustre: improvement to router checker

2016-02-22 Thread James Simmons
From: Amir Shehata 

This patch starts the router checker thread all the time.

The router checker only checks routes by ping if
live_router_check_interval or dead_router_check_interval is set
to something other than 0, and there are routes configured.

If these conditions are not met, the router checker sleeps until
woken up when a route is added.  It is also woken up whenever the
RC is being stopped, to ensure the thread doesn't hang.

In the future, when DLC starts configuring the live and dead
router_check_interval parameters, the user will be able to turn
the router checker on and off by manipulating them.
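
The sleep itself then reduces to the standard lost-wakeup-safe
idiom, roughly (kernel-style sketch, not the exact hunk; both names
appear in the diff below):

        wait_event_interruptible(the_lnet.ln_rc_waitq,
                                 lnet_router_checker_active());

The condition is evaluated before sleeping and again on every
wake-up, so a wake_up() racing with the check cannot be missed.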

Signed-off-by: Amir Shehata 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6003
Reviewed-on: http://review.whamcloud.com/13035
Reviewed-by: Liang Zhen 
Reviewed-by: Doug Oucharek 
Reviewed-by: James Simmons 
Reviewed-by: Oleg Drokin 
---
 .../staging/lustre/include/linux/lnet/lib-types.h  |7 +++
 drivers/staging/lustre/lnet/lnet/api-ni.c  |1 +
 drivers/staging/lustre/lnet/lnet/router.c  |   51 +---
 3 files changed, 52 insertions(+), 7 deletions(-)

diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h 
b/drivers/staging/lustre/include/linux/lnet/lib-types.h
index e4a8f6e..06d4656 100644
--- a/drivers/staging/lustre/include/linux/lnet/lib-types.h
+++ b/drivers/staging/lustre/include/linux/lnet/lib-types.h
@@ -635,6 +635,13 @@ typedef struct {
 */
bool  ln_nis_from_mod_params;
 
+   /*
+* waitq for router checker.  As long as there are no routes in
+* the list, the router checker will sleep on this queue.  when
+* routes are added the thread will wake up
+*/
+   wait_queue_head_t ln_rc_waitq;
+
 } lnet_t;
 
 #endif
diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c 
b/drivers/staging/lustre/lnet/lnet/api-ni.c
index ed121a8..0ec656a 100644
--- a/drivers/staging/lustre/lnet/lnet/api-ni.c
+++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
@@ -98,6 +98,7 @@ lnet_init_locks(void)
 {
spin_lock_init(&the_lnet.ln_eq_wait_lock);
init_waitqueue_head(&the_lnet.ln_eq_waitq);
+   init_waitqueue_head(&the_lnet.ln_rc_waitq);
mutex_init(&the_lnet.ln_lnd_mutex);
mutex_init(&the_lnet.ln_api_mutex);
 }
diff --git a/drivers/staging/lustre/lnet/lnet/router.c 
b/drivers/staging/lustre/lnet/lnet/router.c
index 511e446..ad9cd44 100644
--- a/drivers/staging/lustre/lnet/lnet/router.c
+++ b/drivers/staging/lustre/lnet/lnet/router.c
@@ -405,6 +405,9 @@ lnet_add_route(__u32 net, unsigned int hops, lnet_nid_t 
gateway,
if (rnet != rnet2)
LIBCFS_FREE(rnet, sizeof(*rnet));
 
+   /* indicate to startup the router checker if configured */
+   wake_up(&the_lnet.ln_rc_waitq);
+
return rc;
 }
 
@@ -1056,11 +1059,6 @@ lnet_router_checker_start(void)
return -EINVAL;
}
 
-   if (!the_lnet.ln_routing &&
-   live_router_check_interval <= 0 &&
-   dead_router_check_interval <= 0)
-   return 0;
-
sema_init(&the_lnet.ln_rc_signal, 0);
/*
 * EQ size doesn't matter; the callback is guaranteed to get every
@@ -1109,6 +1107,8 @@ lnet_router_checker_stop(void)
 
LASSERT(the_lnet.ln_rc_state == LNET_RC_STATE_RUNNING);
the_lnet.ln_rc_state = LNET_RC_STATE_STOPPING;
+   /* wakeup the RC thread if it's sleeping */
+   wake_up(&the_lnet.ln_rc_waitq);
 
/* block until event callback signals exit */
down(&the_lnet.ln_rc_signal);
@@ -1199,6 +1199,33 @@ lnet_prune_rc_data(int wait_unlink)
lnet_net_unlock(LNET_LOCK_EX);
 }
 
+/*
+ * This function is called to check if the RC should block indefinitely.
+ * It's called from lnet_router_checker() as well as being passed to
+ * wait_event_interruptible() to avoid the lost wake_up problem.
+ *
+ * When it's called from wait_event_interruptible() it is necessary to
+ * also not sleep if the rc state is not running to avoid a deadlock
+ * when the system is shutting down
+ */
+static inline bool
+lnet_router_checker_active(void)
+{
+   if (the_lnet.ln_rc_state != LNET_RC_STATE_RUNNING)
+   return true;
+
+   /*
+* Router Checker thread needs to run when routing is enabled in
+* order to call lnet_update_ni_status_locked()
+*/
+   if (the_lnet.ln_routing)
+   return true;
+
+   return !list_empty(&the_lnet.ln_routers) &&
+   (live_router_check_interval > 0 ||
+dead_router_check_interval > 0);
+}
+
 static int
 lnet_router_checker(void *arg)
 {
@@ -1252,8 +1279,18 @@ rescan:
 * because kernel counts # active tasks as nr_running
 * + nr_uninterruptible.
 */
-   set_current_state(TASK_INTERRUPTIBLE);
-   schedule_timeout(cfs_time_seconds(1));
+   /*
+* if there are any routes th

[PATCH 22/24] staging: lustre: Allocate the correct number of rtr buffers

2016-02-22 Thread James Simmons
From: Amir Shehata 

This patch ensures that the correct number of router buffers is
allocated.  One count keeps track of the number of buffers
allocated; another keeps the number of buffers requested.  The
allocated count is raised when creating new buffers and reduced
when buffers are freed.

The requested count is set when the buffers are allocated and is
checked when credits are returned, to determine whether the buffer
should be freed or kept.

In lnet_rtrpool_adjust_bufs(), grab lnet_net_lock() before using
rbp_nbuffers to ensure that it isn't changed by
lnet_return_rx_credits_locked() during the process of allocating
new buffers.  All other access to rbp_nbuffers is already
protected by lnet_net_lock().

This avoids the case where we allocate fewer than the desired
number of buffers.
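
The two counters in miniature (stand-alone C; names mirror
rbp_req_nbuffers/rbp_nbuffers but the logic is illustrative):

#include <stdio.h>

static int req_nbuffers;        /* target the admin asked for */
static int nbuffers;            /* actually allocated */

static void return_buffer(void) /* a credit comes back */
{
        if (nbuffers > req_nbuffers)
                nbuffers--;     /* over target: free the buffer */
        /* else: put it back on the free list */
}

int main(void)
{
        nbuffers = 8;
        req_nbuffers = 4;       /* admin lowered the request */
        while (nbuffers > req_nbuffers)
                return_buffer();
        printf("nbuffers=%d\n", nbuffers);      /* 4 */
        return 0;
}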

Signed-off-by: Amir Shehata 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6122
Reviewed-on: http://review.whamcloud.com/13519
Reviewed-by: Jinshan Xiong 
Reviewed-by: Doug Oucharek 
Reviewed-by: Oleg Drokin 
---
 .../staging/lustre/include/linux/lnet/lib-types.h  |5 ++-
 drivers/staging/lustre/lnet/lnet/lib-move.c|3 +-
 drivers/staging/lustre/lnet/lnet/router.c  |   32 +++-
 3 files changed, 30 insertions(+), 10 deletions(-)

diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h 
b/drivers/staging/lustre/include/linux/lnet/lib-types.h
index c10f03b..07b8db1 100644
--- a/drivers/staging/lustre/include/linux/lnet/lib-types.h
+++ b/drivers/staging/lustre/include/linux/lnet/lib-types.h
@@ -396,7 +396,10 @@ typedef struct {
struct list_headrbp_msgs;   /* messages blocking
   for a buffer */
int rbp_npages; /* # pages in each buffer */
-   int rbp_nbuffers;   /* # buffers */
+   /* requested number of buffers */
+   int rbp_req_nbuffers;
+   /* # buffers actually allocated */
+   int rbp_nbuffers;
int rbp_credits;/* # free buffers /
 blocked messages */
int rbp_mincredits; /* low water mark */
diff --git a/drivers/staging/lustre/lnet/lnet/lib-move.c 
b/drivers/staging/lustre/lnet/lnet/lib-move.c
index e5a8dbc..7bc3e91 100644
--- a/drivers/staging/lustre/lnet/lnet/lib-move.c
+++ b/drivers/staging/lustre/lnet/lnet/lib-move.c
@@ -1113,9 +1113,10 @@ lnet_return_rx_credits_locked(lnet_msg_t *msg)
 * buffers in this pool.  Make sure we never put back
 * more buffers than the stated number.
 */
-   if (rbp->rbp_credits >= rbp->rbp_nbuffers) {
+   if (unlikely(rbp->rbp_credits >= rbp->rbp_req_nbuffers)) {
/* Discard this buffer so we don't have too many. */
lnet_destroy_rtrbuf(rb, rbp->rbp_npages);
+   rbp->rbp_nbuffers--;
} else {
list_add(&rb->rb_list, &rbp->rbp_bufs);
rbp->rbp_credits++;
diff --git a/drivers/staging/lustre/lnet/lnet/router.c 
b/drivers/staging/lustre/lnet/lnet/router.c
index c1e7bc5..198ff03 100644
--- a/drivers/staging/lustre/lnet/lnet/router.c
+++ b/drivers/staging/lustre/lnet/lnet/router.c
@@ -1359,6 +1359,7 @@ lnet_rtrpool_free_bufs(lnet_rtrbufpool_t *rbp, int cpt)
lnet_net_lock(cpt);
lnet_drop_routed_msgs_locked(&rbp->rbp_msgs, cpt);
list_splice_init(&rbp->rbp_bufs, &tmp);
+   rbp->rbp_req_nbuffers = 0;
rbp->rbp_nbuffers = 0;
rbp->rbp_credits = 0;
rbp->rbp_mincredits = 0;
@@ -1379,20 +1380,33 @@ lnet_rtrpool_adjust_bufs(lnet_rtrbufpool_t *rbp, int 
nbufs, int cpt)
lnet_rtrbuf_t *rb;
int num_rb;
int num_buffers = 0;
+   int old_req_nbufs;
int npages = rbp->rbp_npages;
 
+   lnet_net_lock(cpt);
/*
 * If we are called for less buffers than already in the pool, we
-* just lower the nbuffers number and excess buffers will be
+* just lower the req_nbuffers number and excess buffers will be
 * thrown away as they are returned to the free list.  Credits
 * then get adjusted as well.
+* If we already have enough buffers allocated to serve the
+* increase requested, then we can treat that the same way as we
+* do the decrease.
 */
-   if (nbufs <= rbp->rbp_nbuffers) {
-   lnet_net_lock(cpt);
-   rbp->rbp_nbuffers = nbufs;
+   num_rb = nbufs - rbp->rbp_nbuffers;
+   if (nbufs <= rbp->rbp_req_nbuffers || num_rb <= 0) {
+   rbp->rbp_req_nbuffers = nbufs;
lnet_net_unlock(cpt);
return 0;
}
+   /*
+* store the older value of rbp_req_nbuffers and then set it to
+

Re: [PATCH 08/10] x86/xsaves: Fix PTRACE frames for XSAVES

2016-02-22 Thread Yu-cheng Yu
On Mon, Feb 22, 2016 at 02:19:05PM -0800, Dave Hansen wrote:
> On 02/22/2016 12:48 PM, Yu-cheng Yu wrote:
> > It should have been:
> > 
> > xsave->header.xfeatures = xfeatures |
> > (xsave->header.xfeatures & XFEATURE_MASK_SUPERVISOR);
> > 
> > I'll fix it.
> 
> Can we break it out to make it more clear?
> 
> /*
>  * The state that came in from userspace was user-state only.
>  * Mask all the user states out of 'xfeatures'.
>  */
> xsave->header.xfeatures &= XFEATURE_MASK_SUPERVISOR;
> /*
>  * add back in the features that came in from userspace
>  */
> xsave->header.xfeatures |= xfeatures

I will update it in the next version. Thanks!



[PATCH 20/24] staging: lustre: avoid race during lnet acceptor thread termination

2016-02-22 Thread James Simmons
From: Bruno Faccini 

This patch avoids a potential race around the socket sleepers
wait list during acceptor thread termination, using the
sk_callback_lock RW-lock protection.
Signed-off-by: Bruno Faccini 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6476
Reviewed-on: http://review.whamcloud.com/14503
Reviewed-by: Amir Shehata 
Reviewed-by: John L. Hammond 
Reviewed-by: Oleg Drokin 
---
 drivers/staging/lustre/lnet/lnet/acceptor.c |8 +++-
 1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/drivers/staging/lustre/lnet/lnet/acceptor.c 
b/drivers/staging/lustre/lnet/lnet/acceptor.c
index 8f9876b..3468433 100644
--- a/drivers/staging/lustre/lnet/lnet/acceptor.c
+++ b/drivers/staging/lustre/lnet/lnet/acceptor.c
@@ -488,11 +488,17 @@ lnet_acceptor_start(void)
 void
 lnet_acceptor_stop(void)
 {
+   struct sock *sk;
+
if (lnet_acceptor_state.pta_shutdown) /* not running */
return;
 
lnet_acceptor_state.pta_shutdown = 1;
-   wake_up_all(sk_sleep(lnet_acceptor_state.pta_sock->sk));
+
+   sk = lnet_acceptor_state.pta_sock->sk;
+
+   /* awake any sleepers using safe method */
+   sk->sk_state_change(sk);
 
/* block until acceptor signals exit */
wait_for_completion(&lnet_acceptor_state.pta_signal);
-- 
1.7.1



[PATCH 23/24] staging: lustre: Use lnet_is_route_alive for router aliveness

2016-02-22 Thread James Simmons
From: Chris Horn 

lctl show_route and lctl route_list will output router aliveness
information via lnet_get_route(). lnet_get_route() should use the
lnet_is_route_alive() function, introduced in e8a1124
http://review.whamcloud.com/7857, to determine route aliveness.

Signed-off-by: Chris Horn 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5733
Reviewed-on: http://review.whamcloud.com/14055
Reviewed-by: James Simmons 
Reviewed-by: Amir Shehata 
Reviewed-by: Oleg Drokin 
---
 drivers/staging/lustre/lnet/lnet/router.c |3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/lustre/lnet/lnet/router.c 
b/drivers/staging/lustre/lnet/lnet/router.c
index 198ff03..5a6086b 100644
--- a/drivers/staging/lustre/lnet/lnet/router.c
+++ b/drivers/staging/lustre/lnet/lnet/router.c
@@ -607,8 +607,7 @@ lnet_get_route(int idx, __u32 *net, __u32 *hops,
*hops = route->lr_hops;
*priority = route->lr_priority;
*gateway  = route->lr_gateway->lp_nid;
-   *alive = route->lr_gateway->lp_alive &&
-!route->lr_downis;
+   *alive = lnet_is_route_alive(route);
lnet_net_unlock(cpt);
return 0;
}
-- 
1.7.1



Re: [PATCH 3/4] sched/deadline: Tracepoints for deadline scheduler

2016-02-22 Thread Steven Rostedt
On Mon, 22 Feb 2016 22:30:17 +0100
Peter Zijlstra  wrote:

> On Mon, Feb 22, 2016 at 12:48:54PM -0500, Steven Rostedt wrote:
> 
> > > As it stands, the existing tracepoint have already been an ABI
> > > trainwreck, why would I want to add more?  
> > 
> > Yes, this may become a type of ABI, but even the sched switch
> > tracepoints haven't been that bad. Has it really prevented us from
> > changing anything?  
> 
> The whole wakeup thing where we _still_ have a dummy argument, and have
> been lying about the value for a long time really stinks.

Having a dummy argument is not that bad. Yes, it's inconvenient, and
I'm not sure who even uses it (can we delete it without breaking
anything?). But it doesn't prevent us from going forward with
development.

> 
> > But let me ask, what would you recommend to finding out if the kernel
> > has really given your tasks the recommended runtime within a given
> > period? We can't expect users of SCHED_DEADLINE to be modifying the
> > kernel themselves.  
> 
> So why are these deadline specific tracepoint? Why not extend the ones
> we already have?

I'm not sure how to do that and be able to report when a period has
elapsed, and when the next period is coming.

> 
> Also, will these tracepoints still work if we implement SCHED_DEADLINE
> using Least-Laxity-First or Pfair or some other exotic algorithm? Or
> will be forever be bound to EDF just because tracepoint ABI shite?

Can we come up with generic numbers? I mean, the user that asks for
their task to have a specific runtime within a specific
period/deadline, these seem to be generic already. I'll have to read up
on those that you mention, but do those not have a "replenish" for when
the period starts again? And then a yield, showing the task has given
up its remaining time, or a block, where a task is scheduled out
because it blocked on a lock?

> 
> Worse, the proposed tracepoints are atrocious, look at crap like this:
> 
> > +   if (trace_sched_deadline_yield_enabled()) {
> > +   u64 delta_exec = rq_clock_task(rq) - p->se.exec_start;
> > +   /* Subtract the last run till now */
> > +   if (likely((s64)delta_exec > 0))
> > +   p->dl.runtime -= delta_exec;
> > +   trace_sched_deadline_yield(&p->dl);
> > +   }  
> 
> tracepoints should _NEVER_ change state, ever.

Heh, it's not really changing state. The code directly after this is:

p->dl.runtime = 0;

Without updating dl.runtime, the tracepoint would report an inaccurate
remaining time: you would get reports of yielding with full runtimes,
making it look like the task never ran at all.


> 
> And there's the whole COND tracepoint muck, which also doesn't win any
> prices.

It keeps us from adding flags that may end up going away and us
maintaining a dummy field forever ;-)

> 
> So tell me why these specific tracepoints and why the existing ones
> could not be extended to include this information. For example, why a
> trace_sched_deadline_yield, and not a generic trace_sched_yield() that
> works for all classes.

But what about reporting actual runtime, and when the next period will
come? That only matters for deadline.

> 
> Tell me that the information presented does not pin the implementation.
> 
> And clean up the crap.
> 
> Then I might maybe consider this.
> 
> But do not present me with a bunch of random arse, hacked together
> tracepoints and tell me they might be useful, maybe.


They ARE useful. These are the tracepoints I'm currently using to
debug the deadline scheduler with. They have been indispensable for my
current work.

-- Steve


[PATCH 24/24] staging: lustre: Remove LASSERTS from router checker

2016-02-22 Thread James Simmons
From: Doug Oucharek 

In lnet_router_checker(), there are two LASSERTs.  Neither protects
us from anything, and one of them triggered for a customer, crashing
the system unnecessarily.  This patch removes them.

Signed-off-by: Doug Oucharek 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-7362
Reviewed-on: http://review.whamcloud.com/17003
Reviewed-by: James Simmons 
Reviewed-by: Chris Horn 
Reviewed-by: Matt Ezell 
Reviewed-by: Oleg Drokin 
---
 drivers/staging/lustre/lnet/lnet/router.c |4 
 1 files changed, 0 insertions(+), 4 deletions(-)

diff --git a/drivers/staging/lustre/lnet/lnet/router.c 
b/drivers/staging/lustre/lnet/lnet/router.c
index 5a6086b..5e8b0ba 100644
--- a/drivers/staging/lustre/lnet/lnet/router.c
+++ b/drivers/staging/lustre/lnet/lnet/router.c
@@ -1228,8 +1228,6 @@ lnet_router_checker(void *arg)
 
cfs_block_allsigs();
 
-   LASSERT(the_lnet.ln_rc_state == LNET_RC_STATE_RUNNING);
-
while (the_lnet.ln_rc_state == LNET_RC_STATE_RUNNING) {
__u64 version;
int cpt;
@@ -1287,8 +1285,6 @@ rescan:
 cfs_time_seconds(1));
}
 
-   LASSERT(the_lnet.ln_rc_state == LNET_RC_STATE_STOPPING);
-
lnet_prune_rc_data(1); /* wait for UNLINK */
 
the_lnet.ln_rc_state = LNET_RC_STATE_SHUTDOWN;
-- 
1.7.1



[PATCH 00/24] Second batch of LNet updates

2016-02-22 Thread James Simmons
This patch set fixes many of the LNet issues encountered in
production environments. One of the long-standing issues was
not being able to reconfigure LNet after initialization. Doing
so left it in a broken state. Several other issues are also
addressed in this patch set. Also merged back into this patch
set are the suggestions for improvement that Dan Carpenter
made when the original patch set was posted.

Amir Shehata (14):
  staging: lustre: Dynamic LNet Configuration (DLC) IOCTL changes
  staging: lustre: Dynamic LNet Configuration (DLC) show command
  staging: lustre: fix crash due to NULL networks string
  staging: lustre: DLC user/kernel space glue code
  staging: lustre: improve LNet clean up code and API
  staging: lustre: return appropriate errno when adding route
  staging: lustre: startup lnet acceptor thread dynamically
  staging: lustre: reject invalid net configuration for lnet
  staging: lustre: return -EEXIST if NI is not unique
  staging: lustre: handle lnet_check_routes() errors
  staging: lustre: improvement to router checker
  staging: lustre: prevent assert on LNet module unload
  staging: lustre: remove messages from lazy portal on NI shutdown
  staging: lustre: Allocate the correct number of rtr buffers

Bruno Faccini (1):
  staging: lustre: avoid race during lnet acceptor thread termination

Chris Horn (1):
  staging: lustre: Use lnet_is_route_alive for router aliveness

Doug Oucharek (1):
  staging: lustre: Remove LASSERTS from router checker

Frank Zago (4):
  staging: lustre: make local functions static for LNet ni
  staging: lustre: make some lnet functions static
  staging: lustre: missed a few cases of using NULL instead of 0
  staging: lustre: remove unnecessary EXPORT_SYMBOL from lnet layer

James Simmons (1):
  staging: lustre: use sock.h in only acceptor.c

John L. Hammond (2):
  staging: lustre: remove LUSTRE_{,SRV_}LNET_PID
  staging: lustre: assume a kernel build

 .../staging/lustre/include/linux/libcfs/libcfs.h   |2 -
 .../lustre/include/linux/libcfs/libcfs_ioctl.h |   49 +-
 .../lustre/include/linux/libcfs/linux/libcfs.h |3 -
 .../staging/lustre/include/linux/lnet/lib-dlc.h|  122 
 .../staging/lustre/include/linux/lnet/lib-lnet.h   |   22 +
 .../staging/lustre/include/linux/lnet/lib-types.h  |   25 +-
 .../staging/lustre/lnet/klnds/socklnd/socklnd.c|7 +-
 drivers/staging/lustre/lnet/lnet/acceptor.c|   21 +-
 drivers/staging/lustre/lnet/lnet/api-ni.c  |  761 +---
 drivers/staging/lustre/lnet/lnet/config.c  |   25 +-
 drivers/staging/lustre/lnet/lnet/lib-eq.c  |3 -
 drivers/staging/lustre/lnet/lnet/lib-md.c  |3 -
 drivers/staging/lustre/lnet/lnet/lib-me.c  |3 -
 drivers/staging/lustre/lnet/lnet/lib-move.c|   10 +-
 drivers/staging/lustre/lnet/lnet/lib-msg.c |   20 +-
 drivers/staging/lustre/lnet/lnet/lib-ptl.c |   54 +-
 drivers/staging/lustre/lnet/lnet/lib-socket.c  |3 -
 drivers/staging/lustre/lnet/lnet/module.c  |   81 ++-
 drivers/staging/lustre/lnet/lnet/peer.c|   63 ++
 drivers/staging/lustre/lnet/lnet/router.c  |  181 --
 drivers/staging/lustre/lnet/lnet/router_proc.c |2 -
 drivers/staging/lustre/lnet/selftest/conctl.c  |9 +-
 drivers/staging/lustre/lnet/selftest/console.c |6 +-
 drivers/staging/lustre/lnet/selftest/console.h |1 -
 drivers/staging/lustre/lnet/selftest/framework.c   |   10 -
 drivers/staging/lustre/lnet/selftest/module.c  |4 +-
 drivers/staging/lustre/lnet/selftest/rpc.c |4 +-
 .../lustre/lustre/libcfs/linux/linux-module.c  |   55 +-
 drivers/staging/lustre/lustre/libcfs/module.c  |   58 ++-
 drivers/staging/lustre/lustre/llite/llite_lib.c|2 +-
 .../staging/lustre/lustre/obdclass/obd_config.c|4 +-
 drivers/staging/lustre/lustre/ptlrpc/events.c  |4 +-
 32 files changed, 1122 insertions(+), 495 deletions(-)
 create mode 100644 drivers/staging/lustre/include/linux/lnet/lib-dlc.h



[PATCH 19/24] staging: lustre: remove unnecessary EXPORT_SYMBOL from lnet layer

2016-02-22 Thread James Simmons
From: Frank Zago 

A lot of symbols don't need to be exported at all because they are
only used in the module they belong to.

Signed-off-by: Frank Zago 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5829
Reviewed-on: http://review.whamcloud.com/13320
Reviewed-by: James Simmons 
Reviewed-by: Isaac Huang 
Reviewed-by: John L. Hammond 
Reviewed-by: Oleg Drokin 
---
 drivers/staging/lustre/lnet/lnet/lib-socket.c |3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/drivers/staging/lustre/lnet/lnet/lib-socket.c 
b/drivers/staging/lustre/lnet/lnet/lib-socket.c
index 53dd0bd..88905d5 100644
--- a/drivers/staging/lustre/lnet/lnet/lib-socket.c
+++ b/drivers/staging/lustre/lnet/lnet/lib-socket.c
@@ -514,7 +514,6 @@ lnet_sock_listen(struct socket **sockp, __u32 local_ip, int 
local_port,
sock_release(*sockp);
return rc;
 }
-EXPORT_SYMBOL(lnet_sock_listen);
 
 int
 lnet_sock_accept(struct socket **newsockp, struct socket *sock)
@@ -558,7 +557,6 @@ failed:
sock_release(newsock);
return rc;
 }
-EXPORT_SYMBOL(lnet_sock_accept);
 
 int
 lnet_sock_connect(struct socket **sockp, int *fatal, __u32 local_ip,
@@ -596,4 +594,3 @@ lnet_sock_connect(struct socket **sockp, int *fatal, __u32 
local_ip,
sock_release(*sockp);
return rc;
 }
-EXPORT_SYMBOL(lnet_sock_connect);
-- 
1.7.1



[PATCH 06/24] staging: lustre: remove LUSTRE_{,SRV_}LNET_PID

2016-02-22 Thread James Simmons
From: John L. Hammond 

Remove LUSTRE_LNET_PID (12345) and LUSTRE_SRV_LNET_PID (12345) from
the libcfs headers and replace their uses with a new macro
LNET_PID_LUSTRE (also 12345) in lnet/types.h.

Signed-off-by: John L. Hammond 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-2675
Reviewed-on: http://review.whamcloud.com/11985
Reviewed-by: James Simmons 
Reviewed-by: Bob Glossman 
Reviewed-by: Dmitry Eremin 
Reviewed-by: Andreas Dilger 
---
 .../staging/lustre/include/linux/libcfs/libcfs.h   |2 --
 .../lustre/include/linux/libcfs/linux/libcfs.h |3 ---
 .../staging/lustre/lnet/klnds/socklnd/socklnd.c|7 +--
 drivers/staging/lustre/lnet/lnet/api-ni.c  |2 +-
 drivers/staging/lustre/lnet/lnet/lib-move.c|2 +-
 drivers/staging/lustre/lnet/lnet/module.c  |4 ++--
 drivers/staging/lustre/lnet/lnet/router.c  |2 +-
 drivers/staging/lustre/lnet/selftest/rpc.c |2 +-
 drivers/staging/lustre/lustre/ptlrpc/events.c  |4 ++--
 9 files changed, 13 insertions(+), 15 deletions(-)

diff --git a/drivers/staging/lustre/include/linux/libcfs/libcfs.h 
b/drivers/staging/lustre/include/linux/libcfs/libcfs.h
index dc9b88f..5c598c8 100644
--- a/drivers/staging/lustre/include/linux/libcfs/libcfs.h
+++ b/drivers/staging/lustre/include/linux/libcfs/libcfs.h
@@ -51,8 +51,6 @@
 #define LERRCHKSUM(hexnum) (((hexnum) & 0xf) ^ ((hexnum) >> 4 & 0xf) ^ \
   ((hexnum) >> 8 & 0xf))
 
-#define LUSTRE_SRV_LNET_PID  LUSTRE_LNET_PID
-
 #include 
 
 /* need both kernel and user-land acceptor */
diff --git a/drivers/staging/lustre/include/linux/libcfs/linux/libcfs.h 
b/drivers/staging/lustre/include/linux/libcfs/linux/libcfs.h
index aac5900..d94b266 100644
--- a/drivers/staging/lustre/include/linux/libcfs/linux/libcfs.h
+++ b/drivers/staging/lustre/include/linux/libcfs/linux/libcfs.h
@@ -118,9 +118,6 @@ do {
\
 #define CDEBUG_STACK() (0L)
 #endif /* __x86_64__ */
 
-/* initial pid  */
-#define LUSTRE_LNET_PID  12345
-
 #define __current_nesting_level() (0)
 
 /**
diff --git a/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c 
b/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c
index 49d716d..854814c 100644
--- a/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c
+++ b/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c
@@ -1842,7 +1842,10 @@ ksocknal_query(lnet_ni_t *ni, lnet_nid_t nid, unsigned 
long *when)
unsigned long now = cfs_time_current();
ksock_peer_t *peer = NULL;
rwlock_t *glock = &ksocknal_data.ksnd_global_lock;
-   lnet_process_id_t id = {.nid = nid, .pid = LUSTRE_SRV_LNET_PID};
+   lnet_process_id_t id = {
+   .nid = nid,
+   .pid = LNET_PID_LUSTRE,
+   };
 
read_lock(glock);
 
@@ -2187,7 +2190,7 @@ ksocknal_ctl(lnet_ni_t *ni, unsigned int cmd, void *arg)
 
case IOC_LIBCFS_ADD_PEER:
id.nid = data->ioc_nid;
-   id.pid = LUSTRE_SRV_LNET_PID;
+   id.pid = LNET_PID_LUSTRE;
return ksocknal_add_peer(ni, id,
  data->ioc_u32[0], /* IP */
  data->ioc_u32[1]); /* port */
diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c 
b/drivers/staging/lustre/lnet/lnet/api-ni.c
index 7583ae4..3bed4c3 100644
--- a/drivers/staging/lustre/lnet/lnet/api-ni.c
+++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
@@ -2117,7 +2117,7 @@ static int lnet_ping(lnet_process_id_t id, int timeout_ms,
return -EINVAL;
 
if (id.pid == LNET_PID_ANY)
-   id.pid = LUSTRE_SRV_LNET_PID;
+   id.pid = LNET_PID_LUSTRE;
 
LIBCFS_ALLOC(info, infosz);
if (!info)
diff --git a/drivers/staging/lustre/lnet/lnet/lib-move.c 
b/drivers/staging/lustre/lnet/lnet/lib-move.c
index f2b1116..a342ce0 100644
--- a/drivers/staging/lustre/lnet/lnet/lib-move.c
+++ b/drivers/staging/lustre/lnet/lnet/lib-move.c
@@ -1407,7 +1407,7 @@ lnet_send(lnet_nid_t src_nid, lnet_msg_t *msg, lnet_nid_t 
rtr_nid)
 
msg->msg_target_is_router = 1;
msg->msg_target.nid = lp->lp_nid;
-   msg->msg_target.pid = LUSTRE_SRV_LNET_PID;
+   msg->msg_target.pid = LNET_PID_LUSTRE;
}
 
/* 'lp' is our best choice of peer */
diff --git a/drivers/staging/lustre/lnet/lnet/module.c 
b/drivers/staging/lustre/lnet/lnet/module.c
index e9b1e69..8f053d7 100644
--- a/drivers/staging/lustre/lnet/lnet/module.c
+++ b/drivers/staging/lustre/lnet/lnet/module.c
@@ -53,7 +53,7 @@ lnet_configure(void *arg)
mutex_lock(&lnet_config_mutex);
 
if (!the_lnet.ln_niinit_self) {
-   rc = LNetNIInit(LUSTRE_SRV_LNET_PID);
+   rc = LNetNIInit(LNET_PID_LUSTRE);
if (rc >= 0) {
the_lnet.ln_niinit_self = 1;
rc = 0;
@@ -99,7 +99,

[PATCH 02/24] staging: lustre: Dynamic LNet Configuration (DLC) show command

2016-02-22 Thread James Simmons
From: Amir Shehata 

This is the fifth patch of a set of patches that enables DLC.

This patch adds the new structures which will be used
in the IOCTL communication.  It also adds a set of
show operations to show buffers, networks, statistics
and peer information.
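
As a rough illustration, user space would drive one of the new show
ioctls along these lines (a hypothetical sketch; field names follow
the structures added below, and an already-open libcfs device fd is
assumed):

	struct lnet_ioctl_config_data data;

	memset(&data, 0, sizeof(data));
	data.cfg_hdr.ioc_len = sizeof(data);
	data.cfg_hdr.ioc_version = LIBCFS_IOCTL_VERSION2;
	data.cfg_count = idx;	/* index of the route to show */

	if (ioctl(fd, IOC_LIBCFS_GET_ROUTE, &data) < 0)
		return -errno;	/* no such route, or older kernel */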

Signed-off-by: Amir Shehata 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-2456
Reviewed-on: http://review.whamcloud.com/8022
Reviewed-by: John L. Hammond 
Reviewed-by: Doug Oucharek 
Reviewed-by: James Simmons 
Reviewed-by: Oleg Drokin 
---
 .../lustre/include/linux/libcfs/libcfs_ioctl.h |   26 -
 .../staging/lustre/include/linux/lnet/lib-dlc.h|  122 
 .../staging/lustre/include/linux/lnet/lib-lnet.h   |5 +
 drivers/staging/lustre/lnet/lnet/api-ni.c  |   56 -
 drivers/staging/lustre/lnet/lnet/module.c  |4 +
 drivers/staging/lustre/lnet/lnet/peer.c|   61 ++
 .../lustre/lustre/libcfs/linux/linux-module.c  |3 +-
 drivers/staging/lustre/lustre/libcfs/module.c  |   15 ++-
 8 files changed, 277 insertions(+), 15 deletions(-)
 create mode 100644 drivers/staging/lustre/include/linux/lnet/lib-dlc.h

diff --git a/drivers/staging/lustre/include/linux/libcfs/libcfs_ioctl.h 
b/drivers/staging/lustre/include/linux/libcfs/libcfs_ioctl.h
index 0598702..f788631 100644
--- a/drivers/staging/lustre/include/linux/libcfs/libcfs_ioctl.h
+++ b/drivers/staging/lustre/include/linux/libcfs/libcfs_ioctl.h
@@ -41,7 +41,8 @@
 #ifndef __LIBCFS_IOCTL_H__
 #define __LIBCFS_IOCTL_H__
 
-#define LIBCFS_IOCTL_VERSION 0x0001000a
+#define LIBCFS_IOCTL_VERSION   0x0001000a
+#define LIBCFS_IOCTL_VERSION2  0x0001000b
 
 struct libcfs_ioctl_hdr {
__u32 ioc_len;
@@ -111,9 +112,6 @@ struct libcfs_ioctl_handler {
 /* lnet ioctls */
 #define IOC_LIBCFS_GET_NI_IOWR('e', 50, long)
 #define IOC_LIBCFS_FAIL_NID_IOWR('e', 51, long)
-#define IOC_LIBCFS_ADD_ROUTE  _IOWR('e', 52, long)
-#define IOC_LIBCFS_DEL_ROUTE  _IOWR('e', 53, long)
-#define IOC_LIBCFS_GET_ROUTE  _IOWR('e', 54, long)
 #define IOC_LIBCFS_NOTIFY_ROUTER  _IOWR('e', 55, long)
 #define IOC_LIBCFS_UNCONFIGURE  _IOWR('e', 56, long)
 /* #define IOC_LIBCFS_PORTALS_COMPATIBILITY   _IOWR('e', 57, long) */
@@ -136,7 +134,25 @@ struct libcfs_ioctl_handler {
 #define IOC_LIBCFS_DEL_INTERFACE  _IOWR('e', 79, long)
 #define IOC_LIBCFS_GET_INTERFACE  _IOWR('e', 80, long)
 
-#define IOC_LIBCFS_MAX_NR   80
+/*
+ * DLC Specific IOCTL numbers.
+ * In order to maintain backward compatibility with any possible external
+ * tools which might be accessing the IOCTL numbers, a new group of IOCTL
+ * numbers has been allocated.
+ */
+#define IOCTL_CONFIG_SIZE  struct lnet_ioctl_config_data
+#define IOC_LIBCFS_ADD_ROUTE   _IOWR(IOC_LIBCFS_TYPE, 81, 
IOCTL_CONFIG_SIZE)
+#define IOC_LIBCFS_DEL_ROUTE   _IOWR(IOC_LIBCFS_TYPE, 82, 
IOCTL_CONFIG_SIZE)
+#define IOC_LIBCFS_GET_ROUTE   _IOWR(IOC_LIBCFS_TYPE, 83, 
IOCTL_CONFIG_SIZE)
+#define IOC_LIBCFS_ADD_NET _IOWR(IOC_LIBCFS_TYPE, 84, 
IOCTL_CONFIG_SIZE)
+#define IOC_LIBCFS_DEL_NET _IOWR(IOC_LIBCFS_TYPE, 85, 
IOCTL_CONFIG_SIZE)
+#define IOC_LIBCFS_GET_NET _IOWR(IOC_LIBCFS_TYPE, 86, 
IOCTL_CONFIG_SIZE)
+#define IOC_LIBCFS_CONFIG_RTR  _IOWR(IOC_LIBCFS_TYPE, 87, 
IOCTL_CONFIG_SIZE)
+#define IOC_LIBCFS_ADD_BUF _IOWR(IOC_LIBCFS_TYPE, 88, 
IOCTL_CONFIG_SIZE)
+#define IOC_LIBCFS_GET_BUF _IOWR(IOC_LIBCFS_TYPE, 89, 
IOCTL_CONFIG_SIZE)
+#define IOC_LIBCFS_GET_PEER_INFO   _IOWR(IOC_LIBCFS_TYPE, 90, 
IOCTL_CONFIG_SIZE)
+#define IOC_LIBCFS_GET_LNET_STATS  _IOWR(IOC_LIBCFS_TYPE, 91, 
IOCTL_CONFIG_SIZE)
+#define IOC_LIBCFS_MAX_NR  91
 
 static inline int libcfs_ioctl_packlen(struct libcfs_ioctl_data *data)
 {
diff --git a/drivers/staging/lustre/include/linux/lnet/lib-dlc.h 
b/drivers/staging/lustre/include/linux/lnet/lib-dlc.h
new file mode 100644
index 000..84a19e9
--- /dev/null
+++ b/drivers/staging/lustre/include/linux/lnet/lib-dlc.h
@@ -0,0 +1,122 @@
+/*
+ * LGPL HEADER START
+ *
+ * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library.
+ *
+ * LGPL HEADER END
+ *
+ */
+/*
+ * Copyright (c) 2014, Intel Corporation.
+ */
+/*
+ * Author: Amir Shehat

Re: [PATCH 0/8] X.509: Software public key subtype changes

2016-02-22 Thread David Howells
Mimi Zohar  wrote:

> >  (1) - (3) These are Tadeusz's RSA akcipher conversion.
> 
> Up to here, IMA-appraisal works properly.

I don't have IMA set up anywhere.

David


[patch V3 04/28] x86/perf/intel/uncore: Add sanity checks for pci dev package id

2016-02-22 Thread Thomas Gleixner
The storage array is size limited, but the package id used to index it
misses a sanity check.

Signed-off-by: Thomas Gleixner 
---
 arch/x86/kernel/cpu/perf_event_intel_uncore.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
===
--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
@@ -882,7 +882,7 @@ static int uncore_pci_probe(struct pci_d
int phys_id, ret;
 
phys_id = uncore_pcibus_to_physid(pdev->bus);
-   if (phys_id < 0)
+   if (phys_id < 0 || phys_id >= UNCORE_SOCKET_MAX)
return -ENODEV;
 
if (UNCORE_PCI_DEV_TYPE(id->driver_data) == UNCORE_EXTRA_PCI_DEV) {




[patch V3 05/28] x86/perf/intel_uncore: Cleanup hardware on exit

2016-02-22 Thread Thomas Gleixner
When tearing down the boxes nothing undoes the hardware state which was set up
by box->init_box(). Add a box->exit_box() callback and implement it for the
uncores which have an init_box() callback.

This misses the cleanup in the error exit paths, but I cannot be bothered to
implement it before cleaning up the rest of the driver, which makes that task
way simpler.

Signed-off-by: Thomas Gleixner 
---
V2: Removed the code which affects IVB and newer as Liang pointed out that
it's incorrect
---
 arch/x86/kernel/cpu/perf_event_intel_uncore.c   |6 +-
 arch/x86/kernel/cpu/perf_event_intel_uncore.h   |9 +
 arch/x86/kernel/cpu/perf_event_intel_uncore_nhmex.c |6 ++
 arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c   |   13 +
 4 files changed, 33 insertions(+), 1 deletion(-)

--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
@@ -937,6 +937,7 @@ static int uncore_pci_probe(struct pci_d
raw_spin_lock(&uncore_box_lock);
list_del(&box->list);
raw_spin_unlock(&uncore_box_lock);
+   uncore_box_exit(box);
kfree(box);
}
return ret;
@@ -982,6 +983,7 @@ static void uncore_pci_remove(struct pci
}
 
WARN_ON_ONCE(atomic_read(&box->refcnt) != 1);
+   uncore_box_exit(box);
kfree(box);
 
if (last_box)
@@ -1091,8 +1093,10 @@ static void uncore_cpu_dying(int cpu)
pmu = &type->pmus[j];
box = *per_cpu_ptr(pmu->box, cpu);
*per_cpu_ptr(pmu->box, cpu) = NULL;
-   if (box && atomic_dec_and_test(&box->refcnt))
+   if (box && atomic_dec_and_test(&box->refcnt)) {
list_add(&box->list, &boxes_to_free);
+   uncore_box_exit(box);
+   }
}
}
 }
--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.h
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.h
@@ -61,6 +61,7 @@ struct intel_uncore_type {
 
 struct intel_uncore_ops {
void (*init_box)(struct intel_uncore_box *);
+   void (*exit_box)(struct intel_uncore_box *);
void (*disable_box)(struct intel_uncore_box *);
void (*enable_box)(struct intel_uncore_box *);
void (*disable_event)(struct intel_uncore_box *, struct perf_event *);
@@ -306,6 +307,14 @@ static inline void uncore_box_init(struc
}
 }
 
+static inline void uncore_box_exit(struct intel_uncore_box *box)
+{
+   if (test_and_clear_bit(UNCORE_BOX_FLAG_INITIATED, &box->flags)) {
+   if (box->pmu->type->ops->exit_box)
+   box->pmu->type->ops->exit_box(box);
+   }
+}
+
 static inline bool uncore_box_is_fake(struct intel_uncore_box *box)
 {
return (box->phys_id < 0);
--- a/arch/x86/kernel/cpu/perf_event_intel_uncore_nhmex.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore_nhmex.c
@@ -201,6 +201,11 @@ static void nhmex_uncore_msr_init_box(st
wrmsrl(NHMEX_U_MSR_PMON_GLOBAL_CTL, NHMEX_U_PMON_GLOBAL_EN_ALL);
 }
 
+static void nhmex_uncore_msr_exit_box(struct intel_uncore_box *box)
+{
+   wrmsrl(NHMEX_U_MSR_PMON_GLOBAL_CTL, 0);
+}
+
 static void nhmex_uncore_msr_disable_box(struct intel_uncore_box *box)
 {
unsigned msr = uncore_msr_box_ctl(box);
@@ -250,6 +255,7 @@ static void nhmex_uncore_msr_enable_even
 
 #define NHMEX_UNCORE_OPS_COMMON_INIT() \
.init_box   = nhmex_uncore_msr_init_box,\
+   .exit_box   = nhmex_uncore_msr_exit_box,\
.disable_box= nhmex_uncore_msr_disable_box, \
.enable_box = nhmex_uncore_msr_enable_box,  \
.disable_event  = nhmex_uncore_msr_disable_event,   \
--- a/arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c
@@ -95,6 +95,12 @@ static void snb_uncore_msr_init_box(stru
}
 }
 
+static void snb_uncore_msr_exit_box(struct intel_uncore_box *box)
+{
+   if (box->pmu->pmu_idx == 0)
+   wrmsrl(SNB_UNC_PERF_GLOBAL_CTL, 0);
+}
+
 static struct uncore_event_desc snb_uncore_events[] = {
INTEL_UNCORE_EVENT_DESC(clockticks, "event=0xff,umask=0x00"),
{ /* end: all zeroes */ },
@@ -116,6 +122,7 @@ static struct attribute_group snb_uncore
 
 static struct intel_uncore_ops snb_uncore_msr_ops = {
.init_box   = snb_uncore_msr_init_box,
+   .exit_box   = snb_uncore_msr_exit_box,
.disable_event  = snb_uncore_msr_disable_event,
.enable_event   = snb_uncore_msr_enable_event,
.read_counter   = uncore_msr_read_counter,
@@ -231,6 +238,11 @@ static void snb_uncore_imc_init_box(stru
box->hrtimer_duration = UNCORE_SNB_IMC_HRTIMER_INTERVAL;
 }
 
+static void snb_uncore_imc_exit_box(struct intel_uncore_box *box)
+{
+   iounmap(box->io_addr);
+}

[patch V3 01/28] x86/perf/intel/uncore: Remove pointless mask check

2016-02-22 Thread Thomas Gleixner
uncore_cpumask_init() is only ever called from intel_uncore_init() where the
mask is guaranteed to be empty.

Signed-off-by: Thomas Gleixner 
---
 arch/x86/kernel/cpu/perf_event_intel_uncore.c |6 --
 1 file changed, 6 deletions(-)

--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
@@ -1342,12 +1342,6 @@ static void __init uncore_cpumask_init(v
 {
int cpu;
 
-   /*
-* ony invoke once from msr or pci init code
-*/
-   if (!cpumask_empty(&uncore_cpu_mask))
-   return;
-
cpu_notifier_register_begin();
 
for_each_online_cpu(cpu) {




Re: [PATCH 4/8] akcipher: Move the RSA DER encoding to the crypto layer

2016-02-22 Thread David Howells
Tadeusz Struk  wrote:

> I wonder if this should be merged with the crypto/rsa-pkcs1pad.c template
> that we already have. Looks like the two do the same padding now.
> Should we merge them and pass the hash param as a separate template param,
> e.g the public_key would allocate "pkcs1pad(rsa, sha1)"?

Ummm...  Possibly.  Is that how it's used?

warthog>git grep pkcs1pad -- Documentation
warthog1>

Anyway, the problem I have with this is that I want to get that knowledge out
of the asymmetric key in-software public key subtype.  It knows "rsa", "dsa",
"ecdsa", ... because that's all the OIDs tell it.

I guess if I have to, I can stoop to converting "rsa" to "pkcs1pad(rsa, sha1)".
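
If it comes to that, the conversion is presumably just template-name
construction before allocating the transform — a hypothetical sketch
(the final template syntax and where the hash name comes from are
assumptions, not settled API):

	char name[CRYPTO_MAX_ALG_NAME];
	struct crypto_akcipher *tfm;

	/* hypothetical: wrap the OID-derived algorithm name in the
	 * padding template */
	if (snprintf(name, sizeof(name), "pkcs1pad(%s,%s)",
		     pkey_algo, hash_algo) >= sizeof(name))
		return -EINVAL;

	tfm = crypto_alloc_akcipher(name, 0, 0);
	if (IS_ERR(tfm))
		return PTR_ERR(tfm);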

Can you do me a really quick merge?  -rc5 is already out, and I want to get it
to James pronto - plus I have things that are pending on this change being
made.

Oh - and how does the padding template find the algorithm DER encoding string
to use?  I have wondered whether it should be stored in with the hash
algorithm, but it probably makes more sense to keep it with the rsa module.

David


[patch V3 07/28] x86/perf/intel_uncore: Make code readable

2016-02-22 Thread Thomas Gleixner
Clean up the code a bit before reworking it completely.

Signed-off-by: Thomas Gleixner 
---
 arch/x86/kernel/cpu/perf_event_intel_uncore.c |   95 +-
 1 file changed, 50 insertions(+), 45 deletions(-)

Index: b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
===
--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
@@ -217,7 +217,8 @@ u64 uncore_shared_reg_config(struct inte
return config;
 }
 
-static void uncore_assign_hw_event(struct intel_uncore_box *box, struct 
perf_event *event, int idx)
+static void uncore_assign_hw_event(struct intel_uncore_box *box,
+  struct perf_event *event, int idx)
 {
struct hw_perf_event *hwc = &event->hw;
 
@@ -312,18 +313,19 @@ static void uncore_pmu_init_hrtimer(stru
box->hrtimer.function = uncore_pmu_hrtimer;
 }
 
-static struct intel_uncore_box *uncore_alloc_box(struct intel_uncore_type 
*type, int node)
+static struct intel_uncore_box *uncore_alloc_box(struct intel_uncore_type 
*type,
+int node)
 {
+   int i, size, numshared = type->num_shared_regs;
struct intel_uncore_box *box;
-   int i, size;
 
-   size = sizeof(*box) + type->num_shared_regs * sizeof(struct 
intel_uncore_extra_reg);
+   size = sizeof(*box) + numshared * sizeof(struct intel_uncore_extra_reg);
 
box = kzalloc_node(size, GFP_KERNEL, node);
if (!box)
return NULL;
 
-   for (i = 0; i < type->num_shared_regs; i++)
+   for (i = 0; i < numshared; i++)
raw_spin_lock_init(&box->shared_regs[i].lock);
 
uncore_pmu_init_hrtimer(box);
@@ -351,7 +353,8 @@ static bool is_uncore_event(struct perf_
 }
 
 static int
-uncore_collect_events(struct intel_uncore_box *box, struct perf_event *leader, 
bool dogrp)
+uncore_collect_events(struct intel_uncore_box *box, struct perf_event *leader,
+ bool dogrp)
 {
struct perf_event *event;
int n, max_count;
@@ -412,7 +415,8 @@ uncore_get_event_constraint(struct intel
return &type->unconstrainted;
 }
 
-static void uncore_put_event_constraint(struct intel_uncore_box *box, struct 
perf_event *event)
+static void uncore_put_event_constraint(struct intel_uncore_box *box,
+   struct perf_event *event)
 {
if (box->pmu->type->ops->put_constraint)
box->pmu->type->ops->put_constraint(box, event);
@@ -592,7 +596,7 @@ static void uncore_pmu_event_del(struct
if (event == box->event_list[i]) {
uncore_put_event_constraint(box, event);
 
-   while (++i < box->n_events)
+   for (++i; i < box->n_events; i++)
box->event_list[i - 1] = box->event_list[i];
 
--box->n_events;
@@ -801,10 +805,8 @@ static void __init uncore_type_exit(stru
 
 static void __init uncore_types_exit(struct intel_uncore_type **types)
 {
-   int i;
-
-   for (i = 0; types[i]; i++)
-   uncore_type_exit(types[i]);
+   for (; *types; types++)
+   uncore_type_exit(*types);
 }
 
 static int __init uncore_type_init(struct intel_uncore_type *type)
@@ -908,9 +910,11 @@ static int uncore_pci_probe(struct pci_d
 * some device types. Hence PCI device idx would be 0 for all devices.
 * So increment pmu pointer to point to an unused array element.
 */
-   if (boot_cpu_data.x86_model == 87)
+   if (boot_cpu_data.x86_model == 87) {
while (pmu->func_id >= 0)
pmu++;
+   }
+
if (pmu->func_id < 0)
pmu->func_id = pdev->devfn;
else
@@ -1170,44 +1174,45 @@ static int uncore_cpu_prepare(int cpu, i
return 0;
 }
 
-static void
-uncore_change_context(struct intel_uncore_type **uncores, int old_cpu, int 
new_cpu)
+static void uncore_change_type_ctx(struct intel_uncore_type *type, int old_cpu,
+  int new_cpu)
 {
-   struct intel_uncore_type *type;
-   struct intel_uncore_pmu *pmu;
+   struct intel_uncore_pmu *pmu = type->pmus;
struct intel_uncore_box *box;
-   int i, j;
-
-   for (i = 0; uncores[i]; i++) {
-   type = uncores[i];
-   for (j = 0; j < type->num_boxes; j++) {
-   pmu = &type->pmus[j];
-   if (old_cpu < 0)
-   box = uncore_pmu_to_box(pmu, new_cpu);
-   else
-   box = uncore_pmu_to_box(pmu, old_cpu);
-   if (!box)
-   continue;
+   int i;
 
-   if (old_cpu < 0) {
-   WARN_ON_ONCE(box->cpu != -1);
-   box->cpu = new

[patch V3 06/28] x86/perf/intel/uncore: Drop pointless hotplug priority

2016-02-22 Thread Thomas Gleixner
The perf core doesn't do anything related to hardware pmus on hotplug. Drop
the priority.

Signed-off-by: Thomas Gleixner 
---
 arch/x86/kernel/cpu/perf_event_intel_uncore.c |5 -
 1 file changed, 5 deletions(-)

--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
@@ -1294,11 +1294,6 @@ static int uncore_cpu_notifier(struct no
 
 static struct notifier_block uncore_cpu_nb = {
.notifier_call  = uncore_cpu_notifier,
-   /*
-* to migrate uncore events, our notifier should be executed
-* before perf core's notifier.
-*/
-   .priority   = CPU_PRI_PERF + 1,
 };
 
 static int __init type_pmu_register(struct intel_uncore_type *type)




Re: [PATCH] intel-hid: allocate correct amount of memory for private struct

2016-02-22 Thread Darren Hart
On Sun, Feb 21, 2016 at 03:22:27PM +0100, Wolfram Sang wrote:
> We want the size of the struct, not of a pointer to it. To be future
> proof, just dereference the pointer to get the desired type.
> 
> Signed-off-by: Wolfram Sang 

Queued to testing. Thank you Wolfram.

> ---
> 
> Compile tested only. Found by static code analysis.
> 
>  drivers/platform/x86/intel-hid.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/drivers/platform/x86/intel-hid.c 
> b/drivers/platform/x86/intel-hid.c
> index e20f23e04c24ce..f93abc8c1424ad 100644
> --- a/drivers/platform/x86/intel-hid.c
> +++ b/drivers/platform/x86/intel-hid.c
> @@ -180,8 +180,7 @@ static int intel_hid_probe(struct platform_device *device)
>   return -ENODEV;
>   }
>  
> - priv = devm_kzalloc(&device->dev,
> - sizeof(struct intel_hid_priv *), GFP_KERNEL);
> + priv = devm_kzalloc(&device->dev, sizeof(*priv), GFP_KERNEL);
>   if (!priv)
>   return -ENOMEM;
>   dev_set_drvdata(&device->dev, priv);
> -- 
> 2.7.0
> 
> 

-- 
Darren Hart
Intel Open Source Technology Center


[patch V3 02/28] x86/perf/intel/uncore: Simplify error rollback

2016-02-22 Thread Thomas Gleixner
No point in doing partial rollbacks. Robustify uncore_exit_type() so it does
not dereference type->pmus unconditionally and remove all the partial rollback
hackery.

Preparatory patch for proper error handling.

Signed-off-by: Thomas Gleixner 
---
 arch/x86/kernel/cpu/perf_event_intel_uncore.c |   45 +-
 1 file changed, 24 insertions(+), 21 deletions(-)

--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
@@ -767,10 +767,12 @@ static void __init uncore_type_exit(stru
 {
int i;
 
-   for (i = 0; i < type->num_boxes; i++)
-   free_percpu(type->pmus[i].box);
-   kfree(type->pmus);
-   type->pmus = NULL;
+   if (type->pmus) {
+   for (i = 0; i < type->num_boxes; i++)
+   free_percpu(type->pmus[i].box);
+   kfree(type->pmus);
+   type->pmus = NULL;
+   }
kfree(type->events_group);
type->events_group = NULL;
 }
@@ -778,6 +780,7 @@ static void __init uncore_type_exit(stru
 static void __init uncore_types_exit(struct intel_uncore_type **types)
 {
int i;
+
for (i = 0; types[i]; i++)
uncore_type_exit(types[i]);
 }
@@ -806,7 +809,7 @@ static int __init uncore_type_init(struc
INIT_LIST_HEAD(&pmus[i].box_list);
pmus[i].box = alloc_percpu(struct intel_uncore_box *);
if (!pmus[i].box)
-   goto fail;
+   return -ENOMEM;
}
 
if (type->event_descs) {
@@ -817,7 +820,7 @@ static int __init uncore_type_init(struc
attr_group = kzalloc(sizeof(struct attribute *) * (i + 1) +
sizeof(*attr_group), GFP_KERNEL);
if (!attr_group)
-   goto fail;
+   return -ENOMEM;
 
attrs = (struct attribute **)(attr_group + 1);
attr_group->name = "events";
@@ -831,9 +834,6 @@ static int __init uncore_type_init(struc
 
type->pmu_group = &uncore_pmu_attr_group;
return 0;
-fail:
-   uncore_type_exit(type);
-   return -ENOMEM;
 }
 
 static int __init uncore_types_init(struct intel_uncore_type **types)
@@ -843,13 +843,9 @@ static int __init uncore_types_init(stru
for (i = 0; types[i]; i++) {
ret = uncore_type_init(types[i]);
if (ret)
-   goto fail;
+   return ret;
}
return 0;
-fail:
-   while (--i >= 0)
-   uncore_type_exit(types[i]);
-   return ret;
 }
 
 /*
@@ -1007,17 +1003,21 @@ static int __init uncore_pci_init(void)
 
ret = uncore_types_init(uncore_pci_uncores);
if (ret)
-   return ret;
+   goto err;
 
uncore_pci_driver->probe = uncore_pci_probe;
uncore_pci_driver->remove = uncore_pci_remove;
 
ret = pci_register_driver(uncore_pci_driver);
-   if (ret == 0)
-   pcidrv_registered = true;
-   else
-   uncore_types_exit(uncore_pci_uncores);
+   if (ret)
+   goto err;
+
+   pcidrv_registered = true;
+   return 0;
 
+err:
+   uncore_types_exit(uncore_pci_uncores);
+   uncore_pci_uncores = empty_uncore;
return ret;
 }
 
@@ -1316,9 +1316,12 @@ static int __init uncore_cpu_init(void)
 
ret = uncore_types_init(uncore_msr_uncores);
if (ret)
-   return ret;
-
+   goto err;
return 0;
+err:
+   uncore_types_exit(uncore_msr_uncores);
+   uncore_msr_uncores = empty_uncore;
+   return ret;
 }
 
 static int __init uncore_pmus_register(void)




[patch V3 00/28] x86/perf/intel/uncore|rapl: Fix error handling and sanitize pmu management

2016-02-22 Thread Thomas Gleixner
This series addresses the following issues:

 - Add proper error handling to uncore and rapl drivers

 - Get rid of the pseudo per cpuness of these drivers and do a proper per
   package storage

 - Allow them to be modular

In order to do proper per package storage I added a facility which sanity
checks the physical package id of the processors, which is supplied by the
BIOS, and does a logical package id translation. That allows drivers to do
allocations for the maximum number of possible packages independent of
possible BIOS creativity.
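
Conceptually, the translation facility boils down to something like this
(an illustrative sketch only, not the patch's exact code — pkg_id_shift
stands in for the CPUID-derived shift; see patch 11 for the real
implementation):

	static int *physical_to_logical_pkg;	/* -1 = unassigned */
	static unsigned int logical_packages;

	int topology_update_package_map(unsigned int apicid, unsigned int cpu)
	{
		unsigned int pkg = apicid >> pkg_id_shift;	/* physical id */

		if (physical_to_logical_pkg[pkg] < 0) {
			if (logical_packages >= __max_logical_packages)
				return -ENOSPC;
			physical_to_logical_pkg[pkg] = logical_packages++;
		}
		cpu_data(cpu).logical_proc_id = physical_to_logical_pkg[pkg];
		return 0;
	}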

The module patches are optional. Andi pointed out that they miss the proper
auto loading/probing machinery, but I kept them and let Peter decide what to
do with them.

Changes vs. V2:

  - Change export to GPL

  - Fix the leftover while loop

  - Fix the snb pmu_private bogosity

  - Explain in the changelogs why we want to keep pmu_private

  - Drop the proc/cpuinfo change

  - Add an explanation to the Kconfig help text why selecting 'm' might break
existing setups.

Delta patch below.

Thanks,

tglx
---

--- a/arch/x86/Kconfig.perf
+++ b/arch/x86/Kconfig.perf
@@ -8,6 +8,9 @@ config PERF_EVENTS_INTEL_UNCORE
  Include support for Intel uncore performance events. These are
  available on NehalemEX and more modern processors.
 
+ Note: Selecting 'm' might break existing setups as the drivers
+ lack the autoprobe/load magic. If you need them select: y.
+
  If unsure say y.
 
 config PERF_EVENTS_INTEL_RAPL
@@ -18,6 +21,9 @@ config PERF_EVENTS_INTEL_RAPL
  Include support for Intel rapl performance events for power
  monitoring on modern processors.
 
+ Note: Selecting 'm' might break existing setups as the drivers
+ lack the autoprobe/load magic. If you need them select: y.
+
  If unsure say y.
 
 endmenu
--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
@@ -774,9 +774,9 @@ static void __uncore_exit_boxes(struct i
 
 static void uncore_exit_boxes(void *dummy)
 {
-   struct intel_uncore_type **types = uncore_msr_uncores;
+   struct intel_uncore_type **types;
 
-   while (*types)
-       __uncore_exit_boxes(*types++, smp_processor_id());
+   for (types = uncore_msr_uncores; *types; types++)
+       __uncore_exit_boxes(*types, smp_processor_id());
 }
 
--- a/arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c
@@ -313,7 +313,7 @@ static int snb_uncore_imc_event_init(str
return -EINVAL;
 
event->cpu = box->cpu;
-   event->pmu_private = pmu;
+   event->pmu_private = box;
 
event->hw.idx = -1;
event->hw.last_tag = ~0ULL;
--- a/arch/x86/kernel/cpu/proc.c
+++ b/arch/x86/kernel/cpu/proc.c
@@ -12,7 +12,6 @@ static void show_cpuinfo_core(struct seq
 {
 #ifdef CONFIG_SMP
seq_printf(m, "physical id\t: %d\n", c->phys_proc_id);
-   seq_printf(m, "logical id\t: %d\n", c->logical_proc_id);
seq_printf(m, "siblings\t: %d\n",
   cpumask_weight(topology_core_cpumask(cpu)));
seq_printf(m, "core id\t\t: %d\n", c->cpu_core_id);
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -9335,7 +9335,7 @@ ssize_t perf_event_sysfs_show(struct dev
 
return 0;
 }
-EXPORT_SYMBOL(perf_event_sysfs_show);
+EXPORT_SYMBOL_GPL(perf_event_sysfs_show);
 
 static int __init perf_event_sysfs_init(void)
 {





[patch V3 08/28] x86/perf/uncore: Make uncore_pcibus_to_physid static

2016-02-22 Thread Thomas Gleixner
No users outside of this file.

Signed-off-by: Thomas Gleixner 
---
 arch/x86/kernel/cpu/perf_event_intel_uncore.c |2 +-
 arch/x86/kernel/cpu/perf_event_intel_uncore.h |1 -
 2 files changed, 1 insertion(+), 2 deletions(-)

Index: b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
===
--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
@@ -21,7 +21,7 @@ static struct event_constraint uncore_co
 struct event_constraint uncore_constraint_empty =
EVENT_CONSTRAINT(0, 0, 0);
 
-int uncore_pcibus_to_physid(struct pci_bus *bus)
+static int uncore_pcibus_to_physid(struct pci_bus *bus)
 {
struct pci2phy_map *map;
int phys_id = -1;
Index: b/arch/x86/kernel/cpu/perf_event_intel_uncore.h
===
--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.h
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.h
@@ -125,7 +125,6 @@ struct pci2phy_map {
int pbus_to_physid[256];
 };
 
-int uncore_pcibus_to_physid(struct pci_bus *bus);
 struct pci2phy_map *__find_pci2phy_map(int segment);
 
 ssize_t uncore_event_show(struct kobject *kobj,




[patch V3 11/28] x86/topology: Create logical package id

2016-02-22 Thread Thomas Gleixner
For per package oriented services we must be able to rely on the number of cpu
packages to be within bounds. Create a tracking facility, which

- calculates the number of possible packages depending on nr_cpu_ids after
  boot

- makes sure that the package id is within the number of possible packages. If
  the apic id is outside we map it to a logical package id if there is enough
  space available.

Provide interfaces for drivers to query the mapping and do translations from
physical to logical ids.
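
A minimal sketch of the intended driver-side usage (names here are
illustrative, not part of this patch): allocate storage for the maximum
number of possible packages once, then index it by logical package id:

	struct foo_pkg *foo_pkgs;	/* hypothetical per-package state */

	static int __init foo_init(void)
	{
		foo_pkgs = kcalloc(topology_max_packages(),
				   sizeof(*foo_pkgs), GFP_KERNEL);
		return foo_pkgs ? 0 : -ENOMEM;
	}

	static void foo_handle_event(unsigned int cpu)
	{
		struct foo_pkg *pkg;

		pkg = &foo_pkgs[topology_logical_package_id(cpu)];
		/* operate on the package-wide state */
	}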

Signed-off-by: Thomas Gleixner 
---
 arch/x86/include/asm/processor.h |2 
 arch/x86/include/asm/topology.h  |   10 +++
 arch/x86/kernel/apic/apic.c  |   14 +
 arch/x86/kernel/cpu/common.c |2 
 arch/x86/kernel/cpu/intel.c  |   13 +
 arch/x86/kernel/smpboot.c|  100 +++
 6 files changed, 141 insertions(+)

--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -129,6 +129,8 @@ struct cpuinfo_x86 {
u16 booted_cores;
/* Physical processor id: */
u16 phys_proc_id;
+   /* Logical processor id: */
+   u16 logical_proc_id;
/* Core id: */
u16 cpu_core_id;
/* Compute unit id */
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -119,12 +119,22 @@ static inline void setup_node_to_cpumask
 
 extern const struct cpumask *cpu_coregroup_mask(int cpu);
 
+#define topology_logical_package_id(cpu)   (cpu_data(cpu).logical_proc_id)
 #define topology_physical_package_id(cpu)  (cpu_data(cpu).phys_proc_id)
 #define topology_core_id(cpu)  (cpu_data(cpu).cpu_core_id)
 
 #ifdef ENABLE_TOPO_DEFINES
 #define topology_core_cpumask(cpu) (per_cpu(cpu_core_map, cpu))
 #define topology_sibling_cpumask(cpu)  (per_cpu(cpu_sibling_map, cpu))
+
+extern unsigned int __max_logical_packages;
+#define topology_max_packages()(__max_logical_packages)
+int topology_update_package_map(unsigned int apicid, unsigned int cpu);
+extern int topology_phys_to_logical_pkg(unsigned int pkg);
+#else
+#define topology_max_packages()(1)
+static inline int
+topology_update_package_map(unsigned int apicid, unsigned int cpu) { return 0; 
}
 #endif
 
 static inline void arch_fix_phys_package_id(int num, u32 slot)
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -2078,6 +2078,20 @@ int generic_processor_info(int apicid, i
cpu = cpumask_next_zero(-1, cpu_present_mask);
 
/*
+* This can happen on physical hotplug. The sanity check at boot time
+* is done from native_smp_prepare_cpus() after num_possible_cpus() is
+* established.
+*/
+   if (topology_update_package_map(apicid, cpu) < 0) {
+   int thiscpu = max + disabled_cpus;
+
+   pr_warning("ACPI: Package limit reached. Processor %d/0x%x 
ignored.\n",
+  thiscpu, apicid);
+   disabled_cpus++;
+   return -ENOSPC;
+   }
+
+   /*
 * Validate version
 */
if (version == 0x0) {
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -977,6 +977,8 @@ static void identify_cpu(struct cpuinfo_
 #ifdef CONFIG_NUMA
numa_add_cpu(smp_processor_id());
 #endif
+   /* The boot/hotplug time assignment got cleared, restore it */
+   c->logical_proc_id = topology_phys_to_logical_pkg(c->phys_proc_id);
 }
 
 /*
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -160,6 +160,19 @@ static void early_init_intel(struct cpui
pr_info("Disabling PGE capability bit\n");
setup_clear_cpu_cap(X86_FEATURE_PGE);
}
+
+   if (c->cpuid_level >= 0x0001) {
+   u32 eax, ebx, ecx, edx;
+
+   cpuid(0x0001, &eax, &ebx, &ecx, &edx);
+   /*
+* If HTT (EDX[28]) is set EBX[16:23] contain the number of
+* apicids which are reserved per package. Store the resulting
+* shift value for the package management code.
+*/
+   if (edx & (1U << 28))
+   c->x86_coreid_bits = get_count_order((ebx >> 16) & 
0xff);
+   }
 }
 
 #ifdef CONFIG_X86_32
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -97,6 +97,14 @@ DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t
 DEFINE_PER_CPU_READ_MOSTLY(struct cpuinfo_x86, cpu_info);
 EXPORT_PER_CPU_SYMBOL(cpu_info);
 
+/* Logical package management. We might want to allocate that dynamically */
+static int *physical_to_logical_pkg __read_mostly;
+static unsigned long *physical_package_map __read_mostly;
+static unsigned long *logical_package_map  __read_mostly;
+static unsigned int max_physical_pkg_id __read_mostly;
+unsigned int __max_logical_packages __read_mostly;
+EXP

[patch V3 03/28] x86/perf/intel/uncore: Fix error handling

2016-02-22 Thread Thomas Gleixner
This driver lacks any form of proper error handling. If initialization or
hotplug preparation fails, it leaves the facility around in a half-initialized
state.

Fix the state and memory leaks in a first step. As a second step we need to
undo the hardware state which is set via uncore_box_init() on some of the
uncore implementations.

Signed-off-by: Thomas Gleixner 
---
 arch/x86/kernel/cpu/perf_event_intel_uncore.c |  122 ++
 arch/x86/kernel/cpu/perf_event_intel_uncore.h |   15 +--
 2 files changed, 94 insertions(+), 43 deletions(-)

Index: b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
===
--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
@@ -38,6 +38,16 @@ int uncore_pcibus_to_physid(struct pci_b
return phys_id;
 }
 
+static void uncore_free_pcibus_map(void)
+{
+   struct pci2phy_map *map, *tmp;
+
+   list_for_each_entry_safe(map, tmp, &pci2phy_map_head, list) {
+   list_del(&map->list);
+   kfree(map);
+   }
+}
+
 struct pci2phy_map *__find_pci2phy_map(int segment)
 {
struct pci2phy_map *map, *alloc = NULL;
@@ -760,16 +770,28 @@ static int uncore_pmu_register(struct in
}
 
ret = perf_pmu_register(&pmu->pmu, pmu->name, -1);
+   if (!ret)
+   pmu->registered = true;
return ret;
 }
 
+static void uncore_pmu_unregister(struct intel_uncore_pmu *pmu)
+{
+   if (!pmu->registered)
+   return;
+   perf_pmu_unregister(&pmu->pmu);
+   pmu->registered = false;
+}
+
 static void __init uncore_type_exit(struct intel_uncore_type *type)
 {
int i;
 
if (type->pmus) {
-   for (i = 0; i < type->num_boxes; i++)
+   for (i = 0; i < type->num_boxes; i++) {
+   uncore_pmu_unregister(&type->pmus[i]);
free_percpu(type->pmus[i].box);
+   }
kfree(type->pmus);
type->pmus = NULL;
}
@@ -856,8 +878,8 @@ static int uncore_pci_probe(struct pci_d
struct intel_uncore_pmu *pmu;
struct intel_uncore_box *box;
struct intel_uncore_type *type;
-   int phys_id;
bool first_box = false;
+   int phys_id, ret;
 
phys_id = uncore_pcibus_to_physid(pdev->bus);
if (phys_id < 0)
@@ -906,9 +928,18 @@ static int uncore_pci_probe(struct pci_d
list_add_tail(&box->list, &pmu->box_list);
raw_spin_unlock(&uncore_box_lock);
 
-   if (first_box)
-   uncore_pmu_register(pmu);
-   return 0;
+   if (!first_box)
+   return 0;
+
+   ret = uncore_pmu_register(pmu);
+   if (ret) {
+   pci_set_drvdata(pdev, NULL);
+   raw_spin_lock(&uncore_box_lock);
+   list_del(&box->list);
+   raw_spin_unlock(&uncore_box_lock);
+   kfree(box);
+   }
+   return ret;
 }
 
 static void uncore_pci_remove(struct pci_dev *pdev)
@@ -954,7 +985,7 @@ static void uncore_pci_remove(struct pci
kfree(box);
 
if (last_box)
-   perf_pmu_unregister(&pmu->pmu);
+   uncore_pmu_unregister(pmu);
 }
 
 static int __init uncore_pci_init(void)
@@ -1018,6 +1049,7 @@ static int __init uncore_pci_init(void)
 err:
uncore_types_exit(uncore_pci_uncores);
uncore_pci_uncores = empty_uncore;
+   uncore_free_pcibus_map();
return ret;
 }
 
@@ -1027,6 +1059,7 @@ static void __init uncore_pci_exit(void)
pcidrv_registered = false;
pci_unregister_driver(uncore_pci_driver);
uncore_types_exit(uncore_pci_uncores);
+   uncore_free_pcibus_map();
}
 }
 
@@ -1223,8 +1256,7 @@ static int uncore_cpu_notifier(struct no
/* allocate/free data structure for uncore box */
switch (action & ~CPU_TASKS_FROZEN) {
case CPU_UP_PREPARE:
-   uncore_cpu_prepare(cpu, -1);
-   break;
+   return notifier_from_errno(uncore_cpu_prepare(cpu, -1));
case CPU_STARTING:
uncore_cpu_starting(cpu);
break;
@@ -1265,9 +1297,29 @@ static struct notifier_block uncore_cpu_
.priority   = CPU_PRI_PERF + 1,
 };
 
-static void __init uncore_cpu_setup(void *dummy)
+static int __init type_pmu_register(struct intel_uncore_type *type)
 {
-   uncore_cpu_starting(smp_processor_id());
+   int i, ret;
+
+   for (i = 0; i < type->num_boxes; i++) {
+   ret = uncore_pmu_register(&type->pmus[i]);
+   if (ret)
+   return ret;
+   }
+   return 0;
+}
+
+static int __init uncore_msr_pmus_register(void)
+{
+   struct intel_uncore_type **types = uncore_msr_uncores;
+   int ret;
+
+   while (*types) {
+   ret = type_pmu_register(*types++);
+   if (ret)
+ 

[patch V3 10/28] x86/perf/intel_uncore: Store box in event->pmu_private

2016-02-22 Thread Thomas Gleixner
Store the box in event->pmu_private, so we can get rid of the per cpu data
storage. We keep it after converting to per package data, because a cpu to
package lookup would be 3 loads versus one, and these usage sites are the perf
fast paths.

Signed-off-by: Thomas Gleixner 
---
 arch/x86/kernel/cpu/perf_event_intel_uncore.c |   15 +--
 arch/x86/kernel/cpu/perf_event_intel_uncore.h |   12 ++--
 arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c |1 +
 3 files changed, 12 insertions(+), 16 deletions(-)

--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
@@ -92,11 +92,6 @@ ssize_t uncore_event_show(struct kobject
return sprintf(buf, "%s", event->config);
 }
 
-struct intel_uncore_pmu *uncore_event_to_pmu(struct perf_event *event)
-{
-   return container_of(event->pmu, struct intel_uncore_pmu, pmu);
-}
-
 struct intel_uncore_box *uncore_pmu_to_box(struct intel_uncore_pmu *pmu, int 
cpu)
 {
struct intel_uncore_box *box;
@@ -122,15 +117,6 @@ struct intel_uncore_box *uncore_pmu_to_b
return *per_cpu_ptr(pmu->box, cpu);
 }
 
-struct intel_uncore_box *uncore_event_to_box(struct perf_event *event)
-{
-   /*
-* perf core schedules event on the basis of cpu, uncore events are
-* collected by one of the cpus inside a physical package.
-*/
-   return uncore_pmu_to_box(uncore_event_to_pmu(event), 
smp_processor_id());
-}
-
 u64 uncore_msr_read_counter(struct intel_uncore_box *box, struct perf_event 
*event)
 {
u64 count;
@@ -690,6 +676,7 @@ static int uncore_pmu_event_init(struct
if (!box || box->cpu < 0)
return -EINVAL;
event->cpu = box->cpu;
+   event->pmu_private = box;
 
event->hw.idx = -1;
event->hw.last_tag = ~0ULL;
--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.h
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.h
@@ -319,9 +319,17 @@ static inline bool uncore_box_is_fake(st
return (box->phys_id < 0);
 }
 
-struct intel_uncore_pmu *uncore_event_to_pmu(struct perf_event *event);
+static inline struct intel_uncore_pmu *uncore_event_to_pmu(struct perf_event 
*event)
+{
+   return container_of(event->pmu, struct intel_uncore_pmu, pmu);
+}
+
+static inline struct intel_uncore_box *uncore_event_to_box(struct perf_event 
*event)
+{
+   return event->pmu_private;
+}
+
 struct intel_uncore_box *uncore_pmu_to_box(struct intel_uncore_pmu *pmu, int 
cpu);
-struct intel_uncore_box *uncore_event_to_box(struct perf_event *event);
 u64 uncore_msr_read_counter(struct intel_uncore_box *box, struct perf_event 
*event);
 void uncore_pmu_start_hrtimer(struct intel_uncore_box *box);
 void uncore_pmu_cancel_hrtimer(struct intel_uncore_box *box);
--- a/arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c
@@ -313,6 +313,7 @@ static int snb_uncore_imc_event_init(str
return -EINVAL;
 
event->cpu = box->cpu;
+   event->pmu_private = box;
 
event->hw.idx = -1;
event->hw.last_tag = ~0ULL;




[patch V3 14/28] x86/perf/intel_uncore: Make PCI and MSR uncore independent

2016-02-22 Thread Thomas Gleixner
Andi wanted to do this before, but the patch fell through the cracks.
Implement it with proper error handling.

Requested-by: Andi Kleen 
Signed-off-by: Thomas Gleixner 
---
 arch/x86/kernel/cpu/perf_event_intel_uncore.c |   34 +-
 1 file changed, 18 insertions(+), 16 deletions(-)

Index: b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
===
--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
@@ -1023,7 +1023,7 @@ static int __init uncore_pci_init(void)
ret = skl_uncore_pci_init();
break;
default:
-   return 0;
+   return -ENODEV;
}
 
if (ret)
@@ -1319,7 +1319,7 @@ static int __init uncore_cpu_init(void)
knl_uncore_cpu_init();
break;
default:
-   return 0;
+   return -ENODEV;
}
 
ret = uncore_types_init(uncore_msr_uncores, true);
@@ -1344,7 +1344,7 @@ static void __init uncore_cpu_setup(void
 /* Lazy to avoid allocation of a few bytes for the normal case */
 static __initdata DECLARE_BITMAP(packages, MAX_LOCAL_APIC);
 
-static int __init uncore_cpumask_init(void)
+static int __init uncore_cpumask_init(bool msr)
 {
unsigned int cpu;
 
@@ -1355,12 +1355,15 @@ static int __init uncore_cpumask_init(vo
if (test_and_set_bit(pkg, packages))
continue;
/*
-* The first online cpu of each package takes the refcounts
-* for all other online cpus in that package.
+* The first online cpu of each package allocates and takes
+* the refcounts for all other online cpus in that package.
+* If msrs are not enabled no allocation is required.
 */
-   ret = uncore_cpu_prepare(cpu);
-   if (ret)
-   return ret;
+   if (msr) {
+   ret = uncore_cpu_prepare(cpu);
+   if (ret)
+   return ret;
+   }
uncore_event_init_cpu(cpu);
smp_call_function_single(cpu, uncore_cpu_setup, NULL, 1);
}
@@ -1370,7 +1373,7 @@ static int __init uncore_cpumask_init(vo
 
 static int __init intel_uncore_init(void)
 {
-   int ret;
+   int pret, cret, ret;
 
if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
return -ENODEV;
@@ -1380,15 +1383,14 @@ static int __init intel_uncore_init(void
 
max_packages = topology_max_packages();
 
-   ret = uncore_pci_init();
-   if (ret)
-   return ret;
-   ret = uncore_cpu_init();
-   if (ret)
-   goto err;
+   pret = uncore_pci_init();
+   cret = uncore_cpu_init();
+
+   if (cret && pret)
+   return -ENODEV;
 
cpu_notifier_register_begin();
-   ret = uncore_cpumask_init();
+   ret = uncore_cpumask_init(!cret);
if (ret)
goto err;
cpu_notifier_register_done();




[patch V3 09/28] perf: Allow storage of pmu private data in event

2016-02-22 Thread Thomas Gleixner
For pmus which are not per cpu, but e.g. per package/socket, we want to be
able to store a reference to the underlying per package/socket facility in the
event at init time so we can avoid magic storage constructs in the pmu driver.

This allows us to get rid of the per cpu dance in the intel uncore and rapl
drivers and avoids a lookup of the per package data in the perf hotpath.
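
The intended pattern is simple — a sketch of a hypothetical driver
(names illustrative): stash the per-package object at event init time
and read it back with a single load in the hot path:

	/* in the driver's ->event_init() */
	event->pmu_private = box;

	/* in the hot path, replacing a per-cpu lookup */
	static inline struct intel_uncore_box *
	event_to_box(struct perf_event *event)
	{
		return event->pmu_private;
	}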

Signed-off-by: Thomas Gleixner 
---
 include/linux/perf_event.h |1 +
 1 file changed, 1 insertion(+)

--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -467,6 +467,7 @@ struct perf_event {
int group_flags;
struct perf_event   *group_leader;
struct pmu  *pmu;
+   void*pmu_private;
 
enum perf_event_active_statestate;
unsigned intattach_state;




[patch V3 15/28] cpumask: Export cpumask_any_but

2016-02-22 Thread Thomas Gleixner
Almost every cpumask function is exported, just not the one I need to make the
intel uncore driver modular.

Signed-off-by: Thomas Gleixner 
---
 lib/cpumask.c |1 +
 1 file changed, 1 insertion(+)

Index: b/lib/cpumask.c
===
--- a/lib/cpumask.c
+++ b/lib/cpumask.c
@@ -41,6 +41,7 @@ int cpumask_any_but(const struct cpumask
break;
return i;
 }
+EXPORT_SYMBOL(cpumask_any_but);
 
 /* These are not inline because of header tangles. */
 #ifdef CONFIG_CPUMASK_OFFSTACK




[patch V3 12/28] x86/perf/uncore: Track packages not per cpu data

2016-02-22 Thread Thomas Gleixner
uncore is a per package facility, but the code tries to mimic a per cpu
facility with completely convoluted constructs.

Simplify the whole machinery by tracking per package information. While at it,
avoid the kfree/alloc dance when a cpu goes offline and online again. There is
no point in freeing the box after it was allocated. We just keep proper
refcounting and the first cpu which comes online in a package does the
initialization/activation of the box.
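
A sketch of the resulting lifetime rule (condensed and simplified from
the hunks below; the hook shapes are illustrative): the box memory is
allocated once, and the per-package refcount alone decides when the
hardware init/exit runs:

	/* a cpu comes online in a package */
	box = pmu->boxes[topology_logical_package_id(cpu)];
	if (atomic_inc_return(&box->refcnt) == 1)
		box->cpu = cpu;		/* first cpu manages the box */

	/* a cpu goes offline */
	if (atomic_dec_and_test(&box->refcnt))
		uncore_box_exit(box);	/* last cpu left; memory stays */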

Signed-off-by: Thomas Gleixner 
---
 arch/x86/kernel/cpu/perf_event_intel_uncore.c   |  420 
 arch/x86/kernel/cpu/perf_event_intel_uncore.h   |   18 
 arch/x86/kernel/cpu/perf_event_intel_uncore_snbep.c |   10 
 3 files changed, 197 insertions(+), 251 deletions(-)

--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
@@ -9,9 +9,9 @@ struct pci_driver *uncore_pci_driver;
 /* pci bus to socket mapping */
 DEFINE_RAW_SPINLOCK(pci2phy_map_lock);
 struct list_head pci2phy_map_head = LIST_HEAD_INIT(pci2phy_map_head);
-struct pci_dev *uncore_extra_pci_dev[UNCORE_SOCKET_MAX][UNCORE_EXTRA_PCI_DEV_MAX];
+struct pci_extra_dev *uncore_extra_pci_dev;
+static int max_packages;
 
-static DEFINE_RAW_SPINLOCK(uncore_box_lock);
 /* mask of cpus that collect uncore events */
 static cpumask_t uncore_cpu_mask;
 
@@ -94,27 +94,7 @@ ssize_t uncore_event_show(struct kobject
 
struct intel_uncore_box *uncore_pmu_to_box(struct intel_uncore_pmu *pmu, int cpu)
 {
-   struct intel_uncore_box *box;
-
-   box = *per_cpu_ptr(pmu->box, cpu);
-   if (box)
-   return box;
-
-   raw_spin_lock(&uncore_box_lock);
-   /* Recheck in lock to handle races. */
-   if (*per_cpu_ptr(pmu->box, cpu))
-   goto out;
-   list_for_each_entry(box, &pmu->box_list, list) {
-   if (box->phys_id == topology_physical_package_id(cpu)) {
-   atomic_inc(&box->refcnt);
-   *per_cpu_ptr(pmu->box, cpu) = box;
-   break;
-   }
-   }
-out:
-   raw_spin_unlock(&uncore_box_lock);
-
-   return *per_cpu_ptr(pmu->box, cpu);
+   return pmu->boxes[topology_logical_package_id(cpu)];
 }
 
u64 uncore_msr_read_counter(struct intel_uncore_box *box, struct perf_event *event)
@@ -315,9 +295,9 @@ static struct intel_uncore_box *uncore_a
raw_spin_lock_init(&box->shared_regs[i].lock);
 
uncore_pmu_init_hrtimer(box);
-   atomic_set(&box->refcnt, 1);
box->cpu = -1;
-   box->phys_id = -1;
+   box->pci_phys_id = -1;
+   box->pkgid = -1;
 
/* set default hrtimer timeout */
box->hrtimer_duration = UNCORE_PMU_HRTIMER_INTERVAL;
@@ -774,14 +754,24 @@ static void uncore_pmu_unregister(struct
pmu->registered = false;
 }
 
+static void uncore_free_boxes(struct intel_uncore_pmu *pmu)
+{
+   int pkg;
+
+   for (pkg = 0; pkg < max_packages; pkg++)
+   kfree(pmu->boxes[pkg]);
+   kfree(pmu->boxes);
+}
+
 static void __init uncore_type_exit(struct intel_uncore_type *type)
 {
+   struct intel_uncore_pmu *pmu = type->pmus;
int i;
 
-   if (type->pmus) {
-   for (i = 0; i < type->num_boxes; i++) {
-   uncore_pmu_unregister(&type->pmus[i]);
-   free_percpu(type->pmus[i].box);
+   if (pmu) {
+   for (i = 0; i < type->num_boxes; i++, pmu++) {
+   uncore_pmu_unregister(pmu);
+   uncore_free_boxes(pmu);
}
kfree(type->pmus);
type->pmus = NULL;
@@ -796,37 +786,36 @@ static void __init uncore_types_exit(str
uncore_type_exit(*types);
 }
 
-static int __init uncore_type_init(struct intel_uncore_type *type)
+static int __init uncore_type_init(struct intel_uncore_type *type, bool setid)
 {
struct intel_uncore_pmu *pmus;
struct attribute_group *attr_group;
struct attribute **attrs;
+   size_t size;
int i, j;
 
pmus = kzalloc(sizeof(*pmus) * type->num_boxes, GFP_KERNEL);
if (!pmus)
return -ENOMEM;
 
-   type->pmus = pmus;
-
-   type->unconstrainted = (struct event_constraint)
-   __EVENT_CONSTRAINT(0, (1ULL << type->num_counters) - 1,
-   0, type->num_counters, 0, 0);
+   size = max_packages * sizeof(struct intel_uncore_box *);
 
for (i = 0; i < type->num_boxes; i++) {
-   pmus[i].func_id = -1;
-   pmus[i].pmu_idx = i;
-   pmus[i].type = type;
-   INIT_LIST_HEAD(&pmus[i].box_list);
-   pmus[i].box = alloc_percpu(struct intel_uncore_box *);
-   if (!pmus[i].box)
+   pmus[i].func_id = setid ? i : -1;
+   pmus[i].pmu_idx = i;
+   pmus[i].type= type;
+   pmus[i].boxes   = kzalloc(size, GFP_KERNEL);
+ 
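
In short, the per cpu pointer plus global box list becomes a flat pointer
array per pmu, indexed by the logical package id. A minimal sketch of the
resulting layout (example_* names invented):

#include <linux/atomic.h>
#include <linux/topology.h>

struct example_box { atomic_t refcnt; };

struct example_pmu {
	/* max_packages entries, allocated once at type init */
	struct example_box **boxes;
};

static struct example_box *
example_pmu_to_box(struct example_pmu *pmu, int cpu)
{
	/* Constant time: no list walk, no lock, no recheck dance. */
	return pmu->boxes[topology_logical_package_id(cpu)];
}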

[patch V3 16/28] x86/perf/intel_uncore: Make it modular

2016-02-22 Thread Thomas Gleixner
Now that cleanup is handled properly everywhere, it's simple to turn this
into a modular driver.

Signed-off-by: Thomas Gleixner 
---
 arch/x86/Kconfig  |6 ++
 arch/x86/Kconfig.perf |   16 
 arch/x86/kernel/cpu/Makefile  |3 ++-
 arch/x86/kernel/cpu/perf_event_intel_uncore.c |   24 ++--
 4 files changed, 38 insertions(+), 11 deletions(-)

--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -160,10 +160,6 @@ config INSTRUCTION_DECODER
def_bool y
depends on KPROBES || PERF_EVENTS || UPROBES
 
-config PERF_EVENTS_INTEL_UNCORE
-   def_bool y
-   depends on PERF_EVENTS && CPU_SUP_INTEL && PCI
-
 config OUTPUT_FORMAT
string
default "elf32-i386" if X86_32
@@ -1039,6 +1035,8 @@ config X86_THERMAL_VECTOR
def_bool y
depends on X86_MCE_INTEL
 
+source "arch/x86/Kconfig.perf"
+
 config X86_LEGACY_VM86
bool "Legacy VM86 support"
default n
--- /dev/null
+++ b/arch/x86/Kconfig.perf
@@ -0,0 +1,16 @@
+menu "Performance monitoring"
+
+config PERF_EVENTS_INTEL_UNCORE
+   tristate "Intel uncore performance events"
+   depends on PERF_EVENTS && CPU_SUP_INTEL && PCI
+   default y
+   ---help---
+ Include support for Intel uncore performance events. These are
+ available on NehalemEX and more modern processors.
+
+ Note: Selecting 'm' might break existing setups as the drivers
+ lack the autoprobe/load magic. If you need them select: y.
+
+ If unsure say y.
+
+endmenu
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -43,7 +43,8 @@ obj-$(CONFIG_CPU_SUP_INTEL)   += perf_eve
obj-$(CONFIG_CPU_SUP_INTEL)+= perf_event_intel_pt.o perf_event_intel_bts.o
 obj-$(CONFIG_CPU_SUP_INTEL)+= perf_event_intel_cstate.o
 
-obj-$(CONFIG_PERF_EVENTS_INTEL_UNCORE) += perf_event_intel_uncore.o \
+obj-$(CONFIG_PERF_EVENTS_INTEL_UNCORE) += perf_event_intel_uncores.o
+perf_event_intel_uncores-objs  := perf_event_intel_uncore.o \
   perf_event_intel_uncore_snb.o \
   perf_event_intel_uncore_snbep.o \
   perf_event_intel_uncore_nhmex.o
--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
@@ -21,6 +21,8 @@ static struct event_constraint uncore_co
 struct event_constraint uncore_constraint_empty =
EVENT_CONSTRAINT(0, 0, 0);
 
+MODULE_LICENSE("GPL");
+
 static int uncore_pcibus_to_physid(struct pci_bus *bus)
 {
struct pci2phy_map *map;
@@ -754,7 +756,7 @@ static void uncore_pmu_unregister(struct
pmu->registered = false;
 }
 
-static void __init __uncore_exit_boxes(struct intel_uncore_type *type, int cpu)
+static void __uncore_exit_boxes(struct intel_uncore_type *type, int cpu)
 {
struct intel_uncore_pmu *pmu = type->pmus;
struct intel_uncore_box *box;
@@ -770,7 +772,7 @@ static void __init __uncore_exit_boxes(s
}
 }
 
-static void __init uncore_exit_boxes(void *dummy)
+static void uncore_exit_boxes(void *dummy)
 {
struct intel_uncore_type **types;
 
@@ -787,7 +789,7 @@ static void uncore_free_boxes(struct int
kfree(pmu->boxes);
 }
 
-static void __init uncore_type_exit(struct intel_uncore_type *type)
+static void uncore_type_exit(struct intel_uncore_type *type)
 {
struct intel_uncore_pmu *pmu = type->pmus;
int i;
@@ -804,7 +806,7 @@ static void __init uncore_type_exit(stru
type->events_group = NULL;
 }
 
-static void __init uncore_types_exit(struct intel_uncore_type **types)
+static void uncore_types_exit(struct intel_uncore_type **types)
 {
for (; *types; types++)
uncore_type_exit(*types);
@@ -1060,7 +1062,7 @@ static int __init uncore_pci_init(void)
return ret;
 }
 
-static void __init uncore_pci_exit(void)
+static void uncore_pci_exit(void)
 {
if (pcidrv_registered) {
pcidrv_registered = false;
@@ -1404,4 +1406,14 @@ static int __init intel_uncore_init(void
cpu_notifier_register_done();
return ret;
 }
-device_initcall(intel_uncore_init);
+module_init(intel_uncore_init);
+
+static void __exit intel_uncore_exit(void)
+{
+   cpu_notifier_register_begin();
+   __unregister_cpu_notifier(&uncore_cpu_nb);
+   uncore_types_exit(uncore_msr_uncores);
+   uncore_pci_exit();
+   cpu_notifier_register_done();
+}
+module_exit(intel_uncore_exit);
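
A note on why the __init annotations are dropped above: __init code is
discarded once initialization is done, so anything reachable from the
module exit path must not be __init. A stripped-down sketch (invented
names):

#include <linux/module.h>

/* Must NOT be __init: also called at module removal time. */
static void example_teardown(void)
{
}

static int __init example_init(void)
{
	/* error paths here would also call example_teardown() */
	return 0;
}
module_init(example_init);

static void __exit example_exit(void)
{
	example_teardown();	/* __init code would be long gone here */
}
module_exit(example_exit);

MODULE_LICENSE("GPL");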




[patch V3 17/28] x86/perf/cqm: Get rid of the silly for_each_cpu lookups

2016-02-22 Thread Thomas Gleixner
CQM is a strict per package facility. Use the proper cpumasks to look up the
readers.

Signed-off-by: Thomas Gleixner 
---
 arch/x86/kernel/cpu/perf_event_intel_cqm.c |   34 ++---
 1 file changed, 12 insertions(+), 22 deletions(-)

Index: b/arch/x86/kernel/cpu/perf_event_intel_cqm.c
===
--- a/arch/x86/kernel/cpu/perf_event_intel_cqm.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_cqm.c
@@ -1244,15 +1244,12 @@ static struct pmu intel_cqm_pmu = {
 
 static inline void cqm_pick_event_reader(int cpu)
 {
-   int phys_id = topology_physical_package_id(cpu);
-   int i;
+   int reader;
 
-   for_each_cpu(i, &cqm_cpumask) {
-   if (phys_id == topology_physical_package_id(i))
-   return; /* already got reader for this socket */
-   }
-
-   cpumask_set_cpu(cpu, &cqm_cpumask);
+   /* First online cpu in package becomes the reader */
+   reader = cpumask_any_and(&cqm_cpumask, topology_core_cpumask(cpu));
+   if (reader >= nr_cpu_ids)
+   cpumask_set_cpu(cpu, &cqm_cpumask);
 }
 
 static void intel_cqm_cpu_starting(unsigned int cpu)
@@ -1270,24 +1267,17 @@ static void intel_cqm_cpu_starting(unsig
 
 static void intel_cqm_cpu_exit(unsigned int cpu)
 {
-   int phys_id = topology_physical_package_id(cpu);
-   int i;
+   int target;
 
-   /*
-* Is @cpu a designated cqm reader?
-*/
+   /* Is @cpu the current cqm reader for this package ? */
if (!cpumask_test_and_clear_cpu(cpu, &cqm_cpumask))
return;
 
-   for_each_online_cpu(i) {
-   if (i == cpu)
-   continue;
-
-   if (phys_id == topology_physical_package_id(i)) {
-   cpumask_set_cpu(i, &cqm_cpumask);
-   break;
-   }
-   }
+   /* Find another online reader in this package */
+   target = cpumask_any_but(topology_core_cpumask(cpu), cpu);
+
+   if (target < nr_cpu_ids)
+   cpumask_set_cpu(target, &cqm_cpumask);
 }
 
 static int intel_cqm_cpu_notifier(struct notifier_block *nb,
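
Both hunks lean on the same convention, which a compact sketch may make
explicit (the wrapper is invented): the cpumask search helpers return
nr_cpu_ids or higher when the intersection is empty, which doubles as the
"no reader yet / no replacement available" signal.

#include <linux/cpumask.h>
#include <linux/topology.h>

/* Does @cpu's package already have a designated reader in @mask? */
static bool example_pkg_has_reader(const struct cpumask *mask, int cpu)
{
	return cpumask_any_and(mask, topology_core_cpumask(cpu)) < nr_cpu_ids;
}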




[patch V3 20/28] x86/perf/intel/rapl: Sanitize the quirk handling

2016-02-22 Thread Thomas Gleixner
There is no point in having a quirk machinery for a single possible
function. Get rid of it and move the quirk to a place where it actually
makes sense.

Signed-off-by: Thomas Gleixner 
---
 arch/x86/kernel/cpu/perf_event_intel_rapl.c |   49 ++--
 1 file changed, 19 insertions(+), 30 deletions(-)

Index: b/arch/x86/kernel/cpu/perf_event_intel_rapl.c
===
--- a/arch/x86/kernel/cpu/perf_event_intel_rapl.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_rapl.c
@@ -133,7 +133,6 @@ static int rapl_cntr_mask;
 static DEFINE_PER_CPU(struct rapl_pmu *, rapl_pmu);
 static DEFINE_PER_CPU(struct rapl_pmu *, rapl_pmu_to_free);
 
-static struct x86_pmu_quirk *rapl_quirks;
 static inline u64 rapl_read_counter(struct perf_event *event)
 {
u64 raw;
@@ -141,15 +140,6 @@ static inline u64 rapl_read_counter(stru
return raw;
 }
 
-#define rapl_add_quirk(func_)  \
-do {   \
-   static struct x86_pmu_quirk __quirk __initdata = {  \
-   .func = func_,  \
-   };  \
-   __quirk.next = rapl_quirks; \
-   rapl_quirks = &__quirk; \
-} while (0)
-
 static inline u64 rapl_scale(u64 v, int cfg)
 {
if (cfg > NR_RAPL_DOMAINS) {
@@ -564,17 +554,6 @@ static void rapl_cpu_init(int cpu)
cpumask_set_cpu(cpu, &rapl_cpu_mask);
 }
 
-static __init void rapl_hsw_server_quirk(void)
-{
-   /*
-* DRAM domain on HSW server has fixed energy unit which can be
-* different than the unit from power unit MSR.
-* "Intel Xeon Processor E5-1600 and E5-2600 v3 Product Families, V2
-* of 2. Datasheet, September 2014, Reference Number: 330784-001 "
-*/
-   rapl_hw_unit[RAPL_IDX_RAM_NRG_STAT] = 16;
-}
-
 static int rapl_cpu_prepare(int cpu)
 {
struct rapl_pmu *pmu = per_cpu(rapl_pmu, cpu);
@@ -672,7 +651,18 @@ static int rapl_cpu_notifier(struct noti
return NOTIFY_OK;
 }
 
-static int rapl_check_hw_unit(void)
+static __init void rapl_hsw_server_quirk(void)
+{
+   /*
+* DRAM domain on HSW server has fixed energy unit which can be
+* different than the unit from power unit MSR.
+* "Intel Xeon Processor E5-1600 and E5-2600 v3 Product Families, V2
+* of 2. Datasheet, September 2014, Reference Number: 330784-001 "
+*/
+   rapl_hw_unit[RAPL_IDX_RAM_NRG_STAT] = 16;
+}
+
+static int rapl_check_hw_unit(void (*quirk)(void))
 {
u64 msr_rapl_power_unit_bits;
int i;
@@ -683,6 +673,9 @@ static int rapl_check_hw_unit(void)
for (i = 0; i < NR_RAPL_DOMAINS; i++)
rapl_hw_unit[i] = (msr_rapl_power_unit_bits >> 8) & 0x1FULL;
 
+   /* Apply cpu model quirk */
+   if (quirk)
+   quirk();
return 0;
 }
 
@@ -701,9 +694,9 @@ static const struct x86_cpu_id rapl_cpu_
 
 static int __init rapl_pmu_init(void)
 {
+   void (*quirk)(void) = NULL;
struct rapl_pmu *pmu;
int cpu, ret;
-   struct x86_pmu_quirk *quirk;
int i;
 
/*
@@ -720,7 +713,7 @@ static int __init rapl_pmu_init(void)
rapl_pmu_events_group.attrs = rapl_events_cln_attr;
break;
case 63: /* Haswell-Server */
-   rapl_add_quirk(rapl_hsw_server_quirk);
+   quirk = rapl_hsw_server_quirk;
rapl_cntr_mask = RAPL_IDX_SRV;
rapl_pmu_events_group.attrs = rapl_events_srv_attr;
break;
@@ -736,7 +729,7 @@ static int __init rapl_pmu_init(void)
rapl_pmu_events_group.attrs = rapl_events_srv_attr;
break;
case 87: /* Knights Landing */
-   rapl_add_quirk(rapl_hsw_server_quirk);
+   quirk = rapl_hsw_server_quirk;
rapl_cntr_mask = RAPL_IDX_KNL;
rapl_pmu_events_group.attrs = rapl_events_knl_attr;
break;
@@ -745,14 +738,10 @@ static int __init rapl_pmu_init(void)
return -ENODEV;
}
 
-   ret = rapl_check_hw_unit();
+   ret = rapl_check_hw_unit(quirk);
if (ret)
return ret;
 
-   /* run cpu model quirks */
-   for (quirk = rapl_quirks; quirk; quirk = quirk->next)
-   quirk->func();
-
cpu_notifier_register_begin();
 
for_each_online_cpu(cpu) {
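
The replacement pattern in miniature (invented names): the cpu model
switch picks at most one quirk function, and it is applied exactly once
while the hw units are read, instead of being queued on a list.

static void example_hsw_server_quirk(void)
{
	/* override the one deviating hw unit */
}

static int example_check_hw_unit(void (*quirk)(void))
{
	/* read the unit MSRs, fill in the defaults ... */
	if (quirk)
		quirk();	/* ... then let the model quirk override */
	return 0;
}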




[patch V3 21/28] x86/perf/intel/rapl: Calculate timing once

2016-02-22 Thread Thomas Gleixner
No point in doing the same calculation over and over. Do it once in
rapl_check_hw_unit().

Signed-off-by: Thomas Gleixner 
---
 arch/x86/kernel/cpu/perf_event_intel_rapl.c |   39 
 1 file changed, 18 insertions(+), 21 deletions(-)

Index: b/arch/x86/kernel/cpu/perf_event_intel_rapl.c
===
--- a/arch/x86/kernel/cpu/perf_event_intel_rapl.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_rapl.c
@@ -129,6 +129,7 @@ static int rapl_hw_unit[NR_RAPL_DOMAINS]
 static struct pmu rapl_pmu_class;
 static cpumask_t rapl_cpu_mask;
 static int rapl_cntr_mask;
+static u64 rapl_timer_ms;
 
 static DEFINE_PER_CPU(struct rapl_pmu *, rapl_pmu);
 static DEFINE_PER_CPU(struct rapl_pmu *, rapl_pmu_to_free);
@@ -558,7 +559,6 @@ static int rapl_cpu_prepare(int cpu)
 {
struct rapl_pmu *pmu = per_cpu(rapl_pmu, cpu);
int phys_id = topology_physical_package_id(cpu);
-   u64 ms;
 
if (pmu)
return 0;
@@ -575,19 +575,7 @@ static int rapl_cpu_prepare(int cpu)
 
pmu->pmu = &rapl_pmu_class;
 
-   /*
-* use reference of 200W for scaling the timeout
-* to avoid missing counter overflows.
-* 200W = 200 Joules/sec
-* divide interval by 2 to avoid lockstep (2 * 100)
-* if hw unit is 32, then we use 2 ms 1/200/2
-*/
-   if (rapl_hw_unit[0] < 32)
-   ms = (1000 / (2 * 100)) * (1ULL << (32 - rapl_hw_unit[0] - 1));
-   else
-   ms = 2;
-
-   pmu->timer_interval = ms_to_ktime(ms);
+   pmu->timer_interval = ms_to_ktime(rapl_timer_ms);
 
rapl_hrtimer_init(pmu);
 
@@ -676,6 +664,19 @@ static int rapl_check_hw_unit(void (*qui
/* Apply cpu model quirk */
if (quirk)
quirk();
+
+   /*
+* Calculate the timer rate:
+* Use reference of 200W for scaling the timeout to avoid counter
+* overflows. 200W = 200 Joules/sec
+* Divide interval by 2 to avoid lockstep (2 * 100)
+* if hw unit is 32, then we use 2 ms 1/200/2
+*/
+   rapl_timer_ms = 2;
+   if (rapl_hw_unit[0] < 32) {
+   rapl_timer_ms = (1000 / (2 * 100));
+   rapl_timer_ms *= (1ULL << (32 - rapl_hw_unit[0] - 1));
+   }
return 0;
 }
 
@@ -695,9 +696,7 @@ static const struct x86_cpu_id rapl_cpu_
 static int __init rapl_pmu_init(void)
 {
void (*quirk)(void) = NULL;
-   struct rapl_pmu *pmu;
-   int cpu, ret;
-   int i;
+   int cpu, ret, i;
 
/*
 * check for Intel processor family 6
@@ -758,15 +757,14 @@ static int __init rapl_pmu_init(void)
}
 
__perf_cpu_notifier(rapl_cpu_notifier);
-
-   pmu = __this_cpu_read(rapl_pmu);
+   cpu_notifier_register_done();
 
pr_info("RAPL PMU detected,"
" API unit is 2^-32 Joules,"
" %d fixed counters"
" %llu ms ovfl timer\n",
hweight32(rapl_cntr_mask),
-   ktime_to_ms(pmu->timer_interval));
+   rapl_timer_ms);
for (i = 0; i < NR_RAPL_DOMAINS; i++) {
if (rapl_cntr_mask & (1 << i)) {
pr_info("hw unit of domain %s 2^-%d Joules\n",
@@ -774,7 +772,6 @@ static int __init rapl_pmu_init(void)
}
}
 
-   cpu_notifier_register_done();
return 0;
 
 out:
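
As a sanity check on the comment above, the arithmetic worked standalone
(plain userspace C, not kernel code): at 200 W with an energy unit of
2^-unit Joules, a 32-bit counter overflows after 2^32 / (200 * 2^unit)
seconds, and the interval is halved to avoid lockstep.

#include <stdio.h>

int main(void)
{
	for (int unit = 14; unit <= 32; unit += 2) {
		unsigned long long ms = 2;	/* floor, used for unit >= 32 */

		if (unit < 32)
			ms = (1000 / (2 * 100)) * (1ULL << (32 - unit - 1));
		/* e.g. unit 16, a typical value: 163840 ms (~164 s) */
		printf("hw unit 2^-%d J -> timer %llu ms\n", unit, ms);
	}
	return 0;
}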




[patch V3 18/28] x86/perf/intel_rapl: Make Knights Landings support functional

2016-02-22 Thread Thomas Gleixner
The Knights Landing support added the events and the detection case, but a
missing break lets the switch fall through into the default case, which
returns 0 without actually initializing the driver.

Fixes: 3a2a7797326a4 "perf/x86/intel/rapl: Add support for Knights Landing (KNL)"
Signed-off-by: Thomas Gleixner 
Cc: Dasaratharaman Chandramouli 
---
 arch/x86/kernel/cpu/perf_event_intel_rapl.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: b/arch/x86/kernel/cpu/perf_event_intel_rapl.c
===
--- a/arch/x86/kernel/cpu/perf_event_intel_rapl.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_rapl.c
@@ -731,7 +731,7 @@ static int __init rapl_pmu_init(void)
rapl_add_quirk(rapl_hsw_server_quirk);
rapl_cntr_mask = RAPL_IDX_KNL;
rapl_pmu_events_group.attrs = rapl_events_knl_attr;
-
+   break;
default:
/* unsupported */
return 0;
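
A toy, compilable illustration of the failure mode (none of this is the
driver's code): the case body runs, then control falls through into the
default arm and the whole setup is reported as unsupported anyway.

#include <stdio.h>

static int example_init(int model, int fixed)
{
	int configured = 0;

	switch (model) {
	case 87:			/* Knights Landing */
		configured = 1;		/* events/masks set up ... */
		if (fixed)
			break;		/* the one-line fix */
		/* fall through */
	default:
		return 0;		/* ... yet "unsupported" wins */
	}
	return configured;		/* registration would proceed */
}

int main(void)
{
	printf("broken: %d, fixed: %d\n",
	       example_init(87, 0), example_init(87, 1));
	return 0;
}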



