Re: [PATCH 1/2] vfs: make sure struct filename->iname is word-aligned
On Fri, Feb 19 2016, Theodore Ts'o wrote:

> On Thu, Feb 18, 2016 at 09:10:21PM +0100, Rasmus Villemoes wrote:
>>
>> Sure, that would work as well. I don't really care how ->iname is pushed
>> out to offset 32, but I'd like to know if it's worth it.
>
> Do you have access to one of these platforms where unaligned access is
> really painful?

No. But FWIW, I did a microbenchmark on my aging Core2, doing nothing
but lstat() on the same "..." string in a loop. 'before' is 4.4.2 with a
few unrelated patches, 'after' is that plus 1/2 and 2/2. In perf_x_y, x
is the length of the "aaa..." string and y is its alignment mod 8 in
userspace.

$ grep strncpy_from_user *.report
perf_30_0_after.report:   5.47%  s_f_u  [k] strncpy_from_user
perf_30_0_before.report:  7.40%  s_f_u  [k] strncpy_from_user
perf_30_3_after.report:   5.05%  s_f_u  [k] strncpy_from_user
perf_30_3_before.report:  7.29%  s_f_u  [k] strncpy_from_user
perf_30_4_after.report:   4.88%  s_f_u  [k] strncpy_from_user
perf_30_4_before.report:  7.28%  s_f_u  [k] strncpy_from_user
perf_30_6_after.report:   5.43%  s_f_u  [k] strncpy_from_user
perf_30_6_before.report:  6.74%  s_f_u  [k] strncpy_from_user
perf_40_0_after.report:   5.68%  s_f_u  [k] strncpy_from_user
perf_40_0_before.report: 10.99%  s_f_u  [k] strncpy_from_user
perf_40_3_after.report:   5.37%  s_f_u  [k] strncpy_from_user
perf_40_3_before.report: 10.62%  s_f_u  [k] strncpy_from_user
perf_40_4_after.report:   5.61%  s_f_u  [k] strncpy_from_user
perf_40_4_before.report: 10.91%  s_f_u  [k] strncpy_from_user
perf_40_6_after.report:   5.81%  s_f_u  [k] strncpy_from_user
perf_40_6_before.report: 10.84%  s_f_u  [k] strncpy_from_user
perf_50_0_after.report:   6.29%  s_f_u  [k] strncpy_from_user
perf_50_0_before.report: 12.46%  s_f_u  [k] strncpy_from_user
perf_50_3_after.report:   7.15%  s_f_u  [k] strncpy_from_user
perf_50_3_before.report: 14.09%  s_f_u  [k] strncpy_from_user
perf_50_4_after.report:   7.64%  s_f_u  [k] strncpy_from_user
perf_50_4_before.report: 14.10%  s_f_u  [k] strncpy_from_user
perf_50_6_after.report:   7.30%  s_f_u  [k] strncpy_from_user
perf_50_6_before.report: 14.10%  s_f_u  [k] strncpy_from_user
perf_60_0_after.report:   6.81%  s_f_u  [k] strncpy_from_user
perf_60_0_before.report: 13.25%  s_f_u  [k] strncpy_from_user
perf_60_3_after.report:   9.48%  s_f_u  [k] strncpy_from_user
perf_60_3_before.report: 13.26%  s_f_u  [k] strncpy_from_user
perf_60_4_after.report:   9.90%  s_f_u  [k] strncpy_from_user
perf_60_4_before.report: 15.09%  s_f_u  [k] strncpy_from_user
perf_60_6_after.report:   9.91%  s_f_u  [k] strncpy_from_user
perf_60_6_before.report: 13.85%  s_f_u  [k] strncpy_from_user

So the numbers vary, and it's a bit odd that some of the
userspace-unaligned cases seem faster than the corresponding aligned
ones, but overall I think it's ok to conclude there's a measurable
difference. Note the huge jump from 30_y_before to 40_y_before. I
suppose that's because we do an unaligned store crossing a cache line
boundary when the string is > 32 bytes. I suppose 2/2 is also
responsible for some of the above, since it not only aligns the
kernel-side stores, but also means we stay within a single cacheline for
strings up to 56 bytes. I should measure the effect of 1/2 by itself,
but compiling a kernel takes forever for me, so I won't get to that
tonight.

[It turns out that 32 is the median length from 'git ls-files' in the
kernel tree, with 33.2 being the mean, so even though I used relatively
long paths above to get strncpy_from_user to stand out, such path
lengths are not totally uncommon.]

> The usual thing is to benchmark something like "git
> stat" which has to stat every single file in a repository's working
> directory.

I tried that as well; strncpy_from_user was around 0.5% both before and
after.

Rasmus
[PATCH] Staging: comedi: contec_pci_dio: fixed comment blocks coding style issues
Makes the comment blocks start with /* on separate lines, and end with */
on separate lines as well, starting with * for each comment line.

Signed-off-by: Philippe Loctaux
---
 drivers/staging/comedi/drivers/contec_pci_dio.c | 47 +
 1 file changed, 24 insertions(+), 23 deletions(-)

diff --git a/drivers/staging/comedi/drivers/contec_pci_dio.c b/drivers/staging/comedi/drivers/contec_pci_dio.c
index 4956a49..5f84839 100644
--- a/drivers/staging/comedi/drivers/contec_pci_dio.c
+++ b/drivers/staging/comedi/drivers/contec_pci_dio.c
@@ -1,29 +1,30 @@
 /*
-comedi/drivers/contec_pci_dio.c
-
-COMEDI - Linux Control and Measurement Device Interface
-Copyright (C) 2000 David A. Schleef
-
-This program is free software; you can redistribute it and/or modify
-it under the terms of the GNU General Public License as published by
-the Free Software Foundation; either version 2 of the License, or
-(at your option) any later version.
+ * comedi/drivers/contec_pci_dio.c
+ *
+ * COMEDI - Linux Control and Measurement Device Interface
+ * Copyright (C) 2000 David A. Schleef
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
-This program is distributed in the hope that it will be useful,
-but WITHOUT ANY WARRANTY; without even the implied warranty of
-MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
-GNU General Public License for more details.
-*/

 /*
-Driver: contec_pci_dio
-Description: Contec PIO1616L digital I/O board
-Devices: [Contec] PIO1616L (contec_pci_dio)
-Author: Stefano Rivoir
-Updated: Wed, 27 Jun 2007 13:00:06 +0100
-Status: works
-
-Configuration Options: not applicable, uses comedi PCI auto config
-*/
+ * Driver: contec_pci_dio
+ * Description: Contec PIO1616L digital I/O board
+ * Devices: [Contec] PIO1616L (contec_pci_dio)
+ * Author: Stefano Rivoir
+ * Updated: Wed, 27 Jun 2007 13:00:06 +0100
+ * Status: works
+ *
+ * Configuration Options: not applicable, uses comedi PCI auto config
+ */

 #include
--
2.7.1
Re: [PATCH 1/2] MIPS: Add barriers between dcache & icache flushes
On 02/22/2016 13:09, Paul Burton wrote: > Index-based cache operations may be arbitrarily reordered by out of > order CPUs. Thus code which writes back the dcache & then invalidates > the icache using indexed cache ops must include a barrier between > operating on the 2 caches in order to prevent the scenario in which: > > - icache invalidation occurs. > > - icache fetch occurs, due to speculation. > > - dcache writeback occurs. > > If the above were allowed to happen then the icache would contain stale > data. Forcing the dcache writeback to complete before the icache > invalidation avoids this. Is there a particular symptom one should look for to check for this issue occurring? I haven't seen any odd effects on my SGI systems that appear to relate to this. I believe the R1x000 family resolves all hazards in hardware, so maybe this issue doesn't affect that CPU family? If not, let me know what to look or test for so I can check the patch out on my systems. Thanks! --J > Signed-off-by: Paul Burton > Cc: James Hogan > --- > > arch/mips/mm/c-r4k.c | 15 +-- > 1 file changed, 13 insertions(+), 2 deletions(-) > > diff --git a/arch/mips/mm/c-r4k.c b/arch/mips/mm/c-r4k.c > index caac3d7..a49010c 100644 > --- a/arch/mips/mm/c-r4k.c > +++ b/arch/mips/mm/c-r4k.c > @@ -449,6 +449,7 @@ static inline void local_r4k___flush_cache_all(void * > args) > > default: > r4k_blast_dcache(); > + mb(); /* cache instructions may be reordered */ > r4k_blast_icache(); > break; > } > @@ -493,8 +494,10 @@ static inline void local_r4k_flush_cache_range(void * > args) > return; > > r4k_blast_dcache(); > - if (exec) > + if (exec) { > + mb(); /* cache instructions may be reordered */ > r4k_blast_icache(); > + } > } > > static void r4k_flush_cache_range(struct vm_area_struct *vma, > @@ -599,8 +602,13 @@ static inline void local_r4k_flush_cache_page(void *args) > if (cpu_has_dc_aliases || (exec && !cpu_has_ic_fills_f_dc)) { > vaddr ? 
r4k_blast_dcache_page(addr) : > r4k_blast_dcache_user_page(addr); > - if (exec && !cpu_icache_snoops_remote_store) > + if (exec) > + mb(); /* cache instructions may be reordered */ > + > + if (exec && !cpu_icache_snoops_remote_store) { > r4k_blast_scache_page(addr); > + mb(); /* cache instructions may be reordered */ > + } > } > if (exec) { > if (vaddr && cpu_has_vtag_icache && mm == current->active_mm) { > @@ -660,6 +668,7 @@ static inline void local_r4k_flush_icache_range(unsigned > long start, unsigned lo > R4600_HIT_CACHEOP_WAR_IMPL; > protected_blast_dcache_range(start, end); > } > + mb(); /* cache instructions may be reordered */ > } > > if (end - start > icache_size) > @@ -798,6 +807,8 @@ static void local_r4k_flush_cache_sigtramp(void * arg) > protected_writeback_dcache_line(addr & ~(dc_lsize - 1)); > if (!cpu_icache_snoops_remote_store && scache_size) > protected_writeback_scache_line(addr & ~(sc_lsize - 1)); > + if ((dc_lsize || scache_size) && ic_lsize) > + mb(); /* cache instructions may be reordered */ > if (ic_lsize) > protected_flush_icache_line(addr & ~(ic_lsize - 1)); > if (MIPS4K_ICACHE_REFILL_WAR) { >
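The ordering hazard the changelog describes can be sketched in kernel-style pseudocode (a sketch only; `mb()` and the `r4k_blast_*` helpers are the kernel's own, and the interleaving shown is the speculative one being prevented):

```c
/* Without a barrier, indexed cache ops may be reordered by the CPU: */
r4k_blast_dcache();   /* dcache writeback issued ...                  */
r4k_blast_icache();   /* ... but the icache invalidate may complete   */
                      /* first; a speculative icache fetch then pulls */
                      /* in stale instructions before the dirty       */
                      /* dcache lines actually reach memory.          */

/* With the patch, the writeback is forced to complete first: */
r4k_blast_dcache();
mb();                 /* cache instructions may be reordered */
r4k_blast_icache();   /* invalidate only sees post-writeback memory */
```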
[PATCH] clk: Update some outdated comments
__clk_init() was renamed to __clk_core_init() but these comments weren't updated. Cc: Masahiro Yamada Signed-off-by: Stephen Boyd --- drivers/clk/clk.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c index 51d673370d42..d95d6f924cac 100644 --- a/drivers/clk/clk.c +++ b/drivers/clk/clk.c @@ -2157,7 +2157,7 @@ unlock: * * Dynamically removes a clk and all its child nodes from the * debugfs clk directory if clk->dentry points to debugfs created by - * clk_debug_register in __clk_init. + * clk_debug_register in __clk_core_init. */ static void clk_debug_unregister(struct clk_core *core) { @@ -2309,8 +2309,8 @@ static int __clk_core_init(struct clk_core *core) core->parent = __clk_init_parent(core); /* -* Populate core->parent if parent has already been __clk_init'd. If -* parent has not yet been __clk_init'd then place clk in the orphan +* Populate core->parent if parent has already been clk_core_init'd. If +* parent has not yet been clk_core_init'd then place clk in the orphan * list. If clk doesn't have any parents then place it in the root * clk list. * -- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project
Re: [PATCH] intel-pstate: Update frequencies of policy->cpus only from ->set_policy()
On 02/21/2016 08:57 PM, Viresh Kumar wrote: The intel-pstate driver is using intel_pstate_hwp_set() from two separate paths, i.e. ->set_policy() callback and sysfs update path for the files present in /sys/devices/system/cpu/intel_pstate/ directory. While an update to the sysfs path applies to all the CPUs being managed by the driver (which essentially means all the online CPUs), the update via the ->set_policy() callback applies to a smaller group of CPUs managed by the policy for which ->set_policy() is called. And so, intel_pstate_hwp_set() should update frequencies of only the CPUs that are part of policy->cpus mask, while it is called from ->set_policy() callback. In order to do that, add a parameter (cpumask) to intel_pstate_hwp_set() and apply the frequency changes only to the concerned CPUs. For ->set_policy() path, we are only concerned about policy->cpus, and so policy->rwsem lock taken by the core prior to calling ->set_policy() is enough to take care of any races. The larger lock acquired by get_online_cpus() is required only for the updates to sysfs files. Add another routine, intel_pstate_hwp_set_online_cpus(), and call it from the sysfs update paths. This also fixes a lockdep reported recently, where policy->rwsem and get_online_cpus() could have been acquired in any order causing an ABBA deadlock. The sequence of events leading to that was: intel_pstate_init(...) ...cpufreq_online(...) down_write(&policy->rwsem); // Locks policy->rwsem ... cpufreq_init_policy(policy); ...intel_pstate_hwp_set(); get_online_cpus(); // Temporarily locks cpu_hotplug.lock ... up_write(&policy->rwsem); pm_suspend(...) 
...disable_nonboot_cpus() _cpu_down() cpu_hotplug_begin(); // Locks cpu_hotplug.lock __cpu_notify(CPU_DOWN_PREPARE, ...); ...cpufreq_offline_prepare(); down_write(&policy->rwsem); // Locks policy->rwsem Reported-by: Joonas Lahtinen Signed-off-by: Viresh Kumar Acked-by: Srinivas Pandruvada --- drivers/cpufreq/intel_pstate.c | 21 - 1 file changed, 12 insertions(+), 9 deletions(-) diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c index f4d85c2ae7b1..2e7058a2479d 100644 --- a/drivers/cpufreq/intel_pstate.c +++ b/drivers/cpufreq/intel_pstate.c @@ -287,7 +287,7 @@ static inline void update_turbo_state(void) cpu->pstate.max_pstate == cpu->pstate.turbo_pstate); } -static void intel_pstate_hwp_set(void) +static void intel_pstate_hwp_set(const struct cpumask *cpumask) { int min, hw_min, max, hw_max, cpu, range, adj_range; u64 value, cap; @@ -297,9 +297,7 @@ static void intel_pstate_hwp_set(void) hw_max = HWP_HIGHEST_PERF(cap); range = hw_max - hw_min; - get_online_cpus(); - - for_each_online_cpu(cpu) { + for_each_cpu(cpu, cpumask) { rdmsrl_on_cpu(cpu, MSR_HWP_REQUEST, &value); adj_range = limits->min_perf_pct * range / 100; min = hw_min + adj_range; @@ -318,7 +316,12 @@ static void intel_pstate_hwp_set(void) value |= HWP_MAX_PERF(max); wrmsrl_on_cpu(cpu, MSR_HWP_REQUEST, value); } +} +static void intel_pstate_hwp_set_online_cpus(void) +{ + get_online_cpus(); + intel_pstate_hwp_set(cpu_online_mask); put_online_cpus(); } @@ -440,7 +443,7 @@ static ssize_t store_no_turbo(struct kobject *a, struct attribute *b, limits->no_turbo = clamp_t(int, input, 0, 1); if (hwp_active) - intel_pstate_hwp_set(); + intel_pstate_hwp_set_online_cpus(); return count; } @@ -466,7 +469,7 @@ static ssize_t store_max_perf_pct(struct kobject *a, struct attribute *b, int_tofp(100)); if (hwp_active) - intel_pstate_hwp_set(); + intel_pstate_hwp_set_online_cpus(); return count; } @@ -491,7 +494,7 @@ static ssize_t store_min_perf_pct(struct kobject *a, struct attribute *b, 
int_tofp(100)); if (hwp_active) - intel_pstate_hwp_set(); + intel_pstate_hwp_set_online_cpus(); return count; } @@ -1112,7 +1115,7 @@ static int intel_pstate_set_policy(struct cpufreq_policy *policy) pr_debug("intel_pstate: set performance\n"); limits = &performance_limits; if (hwp_active) - intel_pstate_hwp_set(); + intel_pstate_hwp_set(policy->cpus); return 0; } @@ -1144,7 +1147,7 @@ static int intel_pstate_set_policy(struct cpufreq_policy *policy) int_tofp(100));
Re: [PATCH 4/8] akcipher: Move the RSA DER encoding to the crypto layer
On 02/22/2016 02:28 PM, David Howells wrote:
> Tadeusz Struk wrote:
>
>> I wonder if this should be merged with the crypto/rsa-pkcs1pad.c template
>> that we already have. Looks like the two do the same padding now.
>> Should we merge them and pass the hash param as a separate template param,
>> e.g. the public_key would allocate "pkcs1pad(rsa, sha1)"?
>
> Ummm... Possibly. Is that how it's used?
>
> warthog>git grep pkcs1pad -- Documentation
> warthog1>
>
> Yes, no docs. Sorry.
>
> Anyway, the problem I have with this is that I want to get that knowledge out
> of the asymmetric key in-software public key subtype. It knows "rsa", "dsa",
> "ecdsa", ... because that's all the OIDs tell it.

Right, for now the public_key would need to build the full algorithm
string as follows:

	vsprintf(name, "pkcs1pad(%s, %s)", pkey_algo_name[sig->pkey_algo],
		 hash_algo_name[sig->pkey_hash_algo]);

Do you plan to add more padding schemes later?

> I guess if I have to, I can stoop to converting "rsa" to "pkcs1pad(rsa,
> sha1)".
>
> Can you do me a really quick merge? -rc5 is already out, and I want to get it
> to James pronto - plus I have things that are pending on this change being
> made.

Yes, I can start working on a subsequent patch based on your changes in
http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=keys-rsa
Is that ok with you?

> Oh - and how does the padding template find the algorithm DER encoding string
> to use? I have wondered whether it should be stored in with the hash
> algorithm, but it probably makes more sense to keep it with the rsa module.

We can put everything into crypto/rsa-pkcs1pad.c. This is where all the
padding logic should be, I think.

Thanks,
--
TS
Re: [PATCH 0/8] tty: n_gsm: Make mux work as a responder station *DUPLICATE*
Please ignore this duplicate thread.
[PATCH] Staging: comedi: mite: added spaces around | and *
Added spaces around | and *, fixing 2 checkpatch checks.

Signed-off-by: Philippe Loctaux
---
 drivers/staging/comedi/drivers/mite.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/comedi/drivers/mite.c b/drivers/staging/comedi/drivers/mite.c
index 8f24702..d5cd4f3 100644
--- a/drivers/staging/comedi/drivers/mite.c
+++ b/drivers/staging/comedi/drivers/mite.c
@@ -51,7 +51,7 @@

 #include "mite.h"

-#define TOP_OF_PAGE(x) ((x)|(~(PAGE_MASK)))
+#define TOP_OF_PAGE(x) ((x) | (~(PAGE_MASK)))

 struct mite_struct *mite_alloc(struct pci_dev *pcidev)
 {
@@ -216,7 +216,7 @@ EXPORT_SYMBOL_GPL(mite_free_ring);
 struct mite_channel *mite_request_channel_in_range(struct mite_struct *mite,
 						   struct mite_dma_descriptor_ring
-						   *ring, unsigned min_channel,
+						   * ring, unsigned min_channel,
 						   unsigned max_channel)
 {
 	int i;
--
2.7.1
Re: [PATCH 04/10] x86/xsaves: Introduce a new check that allows correct xstates copy from kernel to user directly
On Mon, Feb 22, 2016 at 3:27 PM, Yu-cheng Yu wrote: > On Mon, Feb 22, 2016 at 02:42:54PM -0800, Andy Lutomirski wrote: >> > +static int should_save_registers_directly(void) >> >> I don't like the name of this function because: >> >> > +{ >> > + /* >> > +* In signal handling path, the kernel already checks if >> > +* FPU instructions have been used before it calls >> > +* copy_fpstate_to_sigframe(). We check this here again >> > +* to detect any potential mis-use and saving invalid >> > +* register values directly to a signal frame. >> > +*/ >> > + WARN_ONCE(!current->thread.fpu.fpstate_active, >> > + "direct FPU save with no math use\n"); >> >> ... Here "direct" seems to mean that we're asking whether to directly save >> ... >> >> > + >> > + /* >> > +* In the case that we are using a compacted kernel >> > +* xsave area, we can not copy the thread.fpu.state >> > +* directly to userspace and *must* save it from the >> > +* registers directly. >> > +*/ >> >> ... and here "directly" means *both* copying directly to userspace and >> saving using xsave directly. >> >> So can you rename it to something with an obvious meaning like >> "may_memcpy_fpu_regs" or similar? > > This function determines whether the kernel should do > copy_fpregs_to_sigframe(). Perhaps we can re-name it to > may_copy_fpregs_to_sigframe()? > Works for me. > --Yu-cheng > -- Andy Lutomirski AMA Capital Management, LLC
[PATCH v2] mm: scale kswapd watermarks in proportion to memory
In machines with 140G of memory and enterprise flash storage, we have seen read and write bursts routinely exceed the kswapd watermarks and cause thundering herds in direct reclaim. Unfortunately, the only way to tune kswapd aggressiveness is through adjusting min_free_kbytes - the system's emergency reserves - which is entirely unrelated to the system's latency requirements. In order to get kswapd to maintain a 250M buffer of free memory, the emergency reserves need to be set to 1G. That is a lot of memory wasted for no good reason. On the other hand, it's reasonable to assume that allocation bursts and overall allocation concurrency scale with memory capacity, so it makes sense to make kswapd aggressiveness a function of that as well. Change the kswapd watermark scale factor from the currently fixed 25% of the tunable emergency reserve to a tunable 0.001% of memory. Beyond 1G of memory, this will produce bigger watermark steps than the current formula in default settings. Ensure that the new formula never chooses steps smaller than that, i.e. 25% of the emergency reserve. On a 140G machine, this raises the default watermark steps - the distance between min and low, and low and high - from 16M to 143M. Signed-off-by: Johannes Weiner Acked-by: Mel Gorman --- Documentation/sysctl/vm.txt | 18 ++ include/linux/mm.h | 1 + include/linux/mmzone.h | 2 ++ kernel/sysctl.c | 10 ++ mm/page_alloc.c | 29 +++-- 5 files changed, 58 insertions(+), 2 deletions(-) v2: Ensure 25% of emergency reserves as a minimum on small machines -Rik diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt index 89a887c..b02d940 100644 --- a/Documentation/sysctl/vm.txt +++ b/Documentation/sysctl/vm.txt @@ -803,6 +803,24 @@ performance impact. Reclaim code needs to take various locks to find freeable directory and inode objects. With vfs_cache_pressure=1000, it will look for ten times more freeable objects than there are. 
+= + +watermark_scale_factor: + +This factor controls the aggressiveness of kswapd. It defines the +amount of memory left in a node/system before kswapd is woken up and +how much memory needs to be free before kswapd goes back to sleep. + +The unit is in fractions of 10,000. The default value of 10 means the +distances between watermarks are 0.001% of the available memory in the +node/system. The maximum value is 1000, or 10% of memory. + +A high rate of threads entering direct reclaim (allocstall) or kswapd +going to sleep prematurely (kswapd_low_wmark_hit_quickly) can indicate +that the number of free pages kswapd maintains for latency reasons is +too small for the allocation bursts occurring in the system. This knob +can then be used to tune kswapd aggressiveness accordingly. + == zone_reclaim_mode: diff --git a/include/linux/mm.h b/include/linux/mm.h index a0ad7af..d330cbb 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1869,6 +1869,7 @@ extern void zone_pcp_reset(struct zone *zone); /* page_alloc.c */ extern int min_free_kbytes; +extern int watermark_scale_factor; /* nommu.c */ extern atomic_long_t mmap_pages_allocated; diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 03cbdd9..85d6702 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -833,6 +833,8 @@ static inline int is_highmem(struct zone *zone) struct ctl_table; int min_free_kbytes_sysctl_handler(struct ctl_table *, int, void __user *, size_t *, loff_t *); +int watermark_scale_factor_sysctl_handler(struct ctl_table *, int, + void __user *, size_t *, loff_t *); extern int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES-1]; int lowmem_reserve_ratio_sysctl_handler(struct ctl_table *, int, void __user *, size_t *, loff_t *); diff --git a/kernel/sysctl.c b/kernel/sysctl.c index d479707..780769e 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -126,6 +126,7 @@ static int __maybe_unused two = 2; static int __maybe_unused four = 4; static unsigned long one_ul = 1; 
static int one_hundred = 100; +static int one_thousand = 1000; #ifdef CONFIG_PRINTK static int ten_thousand = 1; #endif @@ -1393,6 +1394,15 @@ static struct ctl_table vm_table[] = { .extra1 = &zero, }, { + .procname = "watermark_scale_factor", + .data = &watermark_scale_factor, + .maxlen = sizeof(watermark_scale_factor), + .mode = 0644, + .proc_handler = watermark_scale_factor_sysctl_handler, + .extra1 = &one, + .extra2 = &one_thousand, + }, + { .procname = "percpu_pa
Re: [PATCH v2 12/16] clk: avoid circular clock topology
On 22 February 2016 at 03:29, Masahiro Yamada wrote:
> Hi Joachim,
>
> 2016-02-22 6:39 GMT+09:00 Joachim Eastwood :
>> Hi everyone,
>>
>> On 28 December 2015 at 11:10, Masahiro Yamada wrote:
>>> Currently, clk_register() never checks a circular parent looping,
>>> but clock providers could register such an insane clock topology.
>>> For example, "clk_a" could have "clk_b" as a parent, and vice versa.
>>> In this case, clk_core_reparent() creates a circular parent list
>>> and __clk_recalc_accuracies() calls itself recursively forever.
>>>
>>> The core infrastructure should be kind enough to bail out, showing
>>> an appropriate error message in such a case. This helps to easily
>>> find a bug in clock providers. (uh, I made such a silly mistake
>>> when I was implementing my clock providers first. I was upset
>>> because the kernel did not respond, without any error message.)
>>>
>>> This commit adds a new helper function, __clk_is_ancestor(). It
>>> returns true if the second argument is a possible ancestor of the
>>> first one. If a clock core is a possible ancestor of itself, it
>>> would make a loop when it were registered. That should be detected
>>> as an error.
>>
>> This commit breaks lpc18xx boot in next right now. See
>> http://marc.info/?l=linux-arm-kernel&m=145608597106087&w=2
>>
>> The Clock Generation Unit (CGU) on lpc18xx allows for circular parents
>> in hardware. While it is obviously not a good idea to configure the
>> clocks in that manner, there is nothing that stops you either.
>>
>> Please take a look at the second figure on:
>> https://github.com/manabian/linux-lpc/wiki/LPC18xx-LPC43xx-clocks
>> All PLLs can feed clock into the dividers and the dividers can feed
>> clock into the PLLs.
>>
>> The reason why this is made possible in the CGU is because you can
>> then choose where to put your divider; either before the PLL or after.
>
> Sorry for breaking your board.

No worries, that is why we have next, so we can catch it before it hits
mainline :-)

> I am OK with reverting b58f75aa83fb.
>
> I guess your hardware could make clock looping for the best flexibility,
> but you do not make clock looping in actual use cases.

That's right.

> Maybe, does it make sense to check the parent looping
> in clk_set_parent() or somewhere, not in clk_register()?

I think that would be a nice addition. While the CGU can certainly be
configured with loops, it is indeed something that we should prevent
from happening.

regards,
Joachim Eastwood
Re: [PATCH 04/10] x86/xsaves: Introduce a new check that allows correct xstates copy from kernel to user directly
On Mon, Feb 22, 2016 at 02:42:54PM -0800, Andy Lutomirski wrote: > > +static int should_save_registers_directly(void) > > I don't like the name of this function because: > > > +{ > > + /* > > +* In signal handling path, the kernel already checks if > > +* FPU instructions have been used before it calls > > +* copy_fpstate_to_sigframe(). We check this here again > > +* to detect any potential mis-use and saving invalid > > +* register values directly to a signal frame. > > +*/ > > + WARN_ONCE(!current->thread.fpu.fpstate_active, > > + "direct FPU save with no math use\n"); > > ... Here "direct" seems to mean that we're asking whether to directly save ... > > > + > > + /* > > +* In the case that we are using a compacted kernel > > +* xsave area, we can not copy the thread.fpu.state > > +* directly to userspace and *must* save it from the > > +* registers directly. > > +*/ > > ... and here "directly" means *both* copying directly to userspace and > saving using xsave directly. > > So can you rename it to something with an obvious meaning like > "may_memcpy_fpu_regs" or similar? This function determines whether the kernel should do copy_fpregs_to_sigframe(). Perhaps we can re-name it to may_copy_fpregs_to_sigframe()? --Yu-cheng
Re: [PATCH v3 4/5] ACPI / processor_idle : introduce ARCH_SUPPORTS_ACPI_PROCESSOR_CSTATE
On Mon, Feb 22, 2016 at 2:46 PM, Sudeep Holla wrote: > Hi Rafael, > > On 17/02/16 12:21, Sudeep Holla wrote: >> >> >> >> On 16/02/16 20:18, Rafael J. Wysocki wrote: > > > [..] > >> >>> This way it all should work without any new Kconfig options. >>> >> >> I agree with you in terms of avoiding new Kconfig option. However the >> main reason for adding it is to avoid declaring dummy functions and >> variables on ARM64. >> >> It's hard to justify the maintainers as it's totally useless on ARM64. >> E.g. boot_option_idle_override, IDLE_NOMWAIT, acpi_unlazy_tlb, >> arch_safe_halt. >> >> Other option is to push those under CONFIG_X86, but then I don't have >> much idea on what are all needed for IA64, so took an option that >> encapsulates everything under CSTATE feature Kconfig, which is not user >> visible and selected by archs supporting it by default. >> >> I am open to any other alternative. >> > > Whatever alternative methods I tried so far ended up much horrible than > this. So any suggestions are much appreciated. Well, you have to give me some time here, sorry. I'm still not done with cpufreq at this point and it's going to consume some more cycles I'm afraid. Thanks, Rafael
Re: [PATCH] lkdtm: add test for executing .rodata
On 22 Feb 2016 at 12:46, Kees Cook wrote:

> GCC really wants to declare the section. :(

hmm, i see, so how about going about it another way. instead of trying
to do this at compile/link time, do it at load/runtime. one way of doing
it would be to preserve a page in .rodata then map in a code page
underneath that holds your empty function (which you can generate from
C). it'd be somewhat similar to how the vsyscall page on amd64 is mapped
(or used to be mapped) from the kernel image into its userland visible
place.
[PATCH] Revert "clk: avoid circular clock topology"
This reverts commit 858d5881564026cbc4e6f5e25ae878a27df5d4c9. Joachim reports that this commit breaks lpc18xx boot. This is because the hardware has circular clk topology where PLLs can feed into dividers and the same dividers can feed into the PLLs. The hardware is designed this way so that you can choose to put the divider before the PLL or after the PLL depending on what you configure to be the parent of the divider and what you configure to be the parent of the PLL. So let's drop this patch for now because we have hardware that actually has loops. A future patch could check for circular parents when we change parents and fail the switch, but that's probably best left to some debugging Kconfig option so that we don't suffer the sanity checking cost all the time. Reported-by: Joachim Eastwood Cc: Masahiro Yamada Signed-off-by: Stephen Boyd --- drivers/clk/clk.c | 40 1 file changed, 40 deletions(-) diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c index 58ef3dab894a..51d673370d42 100644 --- a/drivers/clk/clk.c +++ b/drivers/clk/clk.c @@ -2242,38 +2242,6 @@ static inline void clk_debug_unregister(struct clk_core *core) #endif /** - * __clk_is_ancestor - check if a clk_core is a possible ancestor of another - * @core: clock core - * @ancestor: ancestor clock core - * - * Returns true if there is a possibility that @ancestor can be an ancestor - * of @core, false otherwise. - * - * This function can be used against @core or @ancestor that has not been - * registered yet. - */ -static bool __clk_is_ancestor(struct clk_core *core, struct clk_core *ancestor) -{ - struct clk_core *parent; - int i; - - for (i = 0; i < core->num_parents; i++) { - parent = clk_core_get_parent_by_index(core, i); - /* -* If ancestor has not been added to clk_{root,orphan}_list -* yet, clk_core_lookup() cannot find it. If parent is NULL, -* compare the name strings, too. 
-*/ - if ((parent && (parent == ancestor || - __clk_is_ancestor(parent, ancestor))) || - (!parent && !strcmp(core->parent_names[i], ancestor->name))) - return true; - } - - return false; -} - -/** * __clk_core_init - initialize the data structures in a struct clk_core * @core: clk_core being initialized * @@ -2338,14 +2306,6 @@ static int __clk_core_init(struct clk_core *core) "%s: invalid NULL in %s's .parent_names\n", __func__, core->name); - /* If core is an ancestor of itself, it would make a loop. */ - if (__clk_is_ancestor(core, core)) { - pr_err("%s: %s would create circular parent\n", __func__, - core->name); - ret = -EINVAL; - goto out; - } - core->parent = __clk_init_parent(core); /* -- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project
[PATCH 2/6] ncr5380: Don't release lock for PIO transfer
The calls to NCR5380_transfer_pio() for DATA IN and DATA OUT phases will modify cmd->SCp.this_residual, cmd->SCp.ptr and cmd->SCp.buffer. That works as long as EH does not intervene, which became possible in atari_NCR5380.c when I changed the locking to bring it closer to NCR5380.c. If error recovery aborts the command, the scsi_cmnd in question and its buffer will be returned to the mid-layer. So the transfer has to cease, but it can't be stopped by the initiator because the target controls the bus phase. The problem does not arise if the lock is not released. That was fine for atari_scsi, because it implements DMA. For the other drivers, we have to release the lock and re-enable interrupts for long PIO data transfers. The solution is to split the transfer into small chunks. In between chunks the main loop releases the lock and re-enables interrupts. Thus interrupts can be serviced and eh_bus_reset_handler can intervene if need be. This fixes an oops in NCR5380_transfer_pio() that can happen when the EH abort handler is invoked during DATA IN or DATA OUT phase. 
Fixes: 11d2f63b9cf5 ("ncr5380: Change instance->host_lock to hostdata->lock") Reported-and-tested-by: Michael Schmitz Signed-off-by: Finn Thain --- drivers/scsi/NCR5380.c | 16 +--- drivers/scsi/atari_NCR5380.c | 16 +--- 2 files changed, 18 insertions(+), 14 deletions(-) Index: linux/drivers/scsi/NCR5380.c === --- linux.orig/drivers/scsi/NCR5380.c 2016-02-23 10:06:56.0 +1100 +++ linux/drivers/scsi/NCR5380.c2016-02-23 10:06:57.0 +1100 @@ -1759,9 +1759,7 @@ static void NCR5380_information_transfer unsigned char msgout = NOP; int sink = 0; int len; -#if defined(PSEUDO_DMA) || defined(REAL_DMA_POLL) int transfersize; -#endif unsigned char *data; unsigned char phase, tmp, extended_msg[10], old_phase = 0xff; struct scsi_cmnd *cmd; @@ -1854,13 +1852,17 @@ static void NCR5380_information_transfer } else #endif /* defined(PSEUDO_DMA) || defined(REAL_DMA_POLL) */ { - spin_unlock_irq(&hostdata->lock); - NCR5380_transfer_pio(instance, &phase, -(int *)&cmd->SCp.this_residual, + /* Break up transfer into 3 ms chunks, +* presuming 6 accesses per handshake. 
+*/ + transfersize = min((unsigned long)cmd->SCp.this_residual, + hostdata->accesses_per_ms / 2); + len = transfersize; + NCR5380_transfer_pio(instance, &phase, &len, (unsigned char **)&cmd->SCp.ptr); - spin_lock_irq(&hostdata->lock); + cmd->SCp.this_residual -= transfersize - len; } - break; + return; case PHASE_MSGIN: len = 1; data = &tmp; Index: linux/drivers/scsi/atari_NCR5380.c === --- linux.orig/drivers/scsi/atari_NCR5380.c 2016-02-23 10:06:56.0 +1100 +++ linux/drivers/scsi/atari_NCR5380.c 2016-02-23 10:06:57.0 +1100 @@ -1838,9 +1838,7 @@ static void NCR5380_information_transfer unsigned char msgout = NOP; int sink = 0; int len; -#if defined(REAL_DMA) int transfersize; -#endif unsigned char *data; unsigned char phase, tmp, extended_msg[10], old_phase = 0xff; struct scsi_cmnd *cmd; @@ -1983,18 +1981,22 @@ static void NCR5380_information_transfer } else #endif /* defined(REAL_DMA) */ { - spin_unlock_irq(&hostdata->lock); - NCR5380_transfer_pio(instance, &phase, -(int *)&cmd->SCp.this_residual, + /* Break up transfer into 3 ms chunks, +* presuming 6 accesses per handshake. +*/ + transfersize = min((unsigned long)cmd->SCp.this_residual, + hostdata->accesses_per_ms / 2); + len = transfersize; +
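The 3 ms chunk arithmetic in the hunks above follows from the driver's accesses_per_ms calibration. A minimal sketch of that computation (a hypothetical helper, not the driver function):

```c
#include <assert.h>

/* With roughly 6 register accesses per PIO handshake, transferring
 * accesses_per_ms / 2 bytes costs about 3 * accesses_per_ms accesses,
 * i.e. ~3 ms. The chunk is capped by the bytes still outstanding. */
static unsigned long pio_chunk(unsigned long residual,
			       unsigned long accesses_per_ms)
{
	unsigned long chunk = accesses_per_ms / 2;

	return residual < chunk ? residual : chunk;
}
```

Between chunks the main loop can drop the lock and service interrupts, so eh_bus_reset_handler gets a chance to intervene, which is the point of the patch.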
[PATCH 3/6] ncr5380: Don't re-enter NCR5380_select()
Calling NCR5380_select() from the abort handler causes various problems. Firstly, it means potentially re-entering NCR5380_select(). Secondly, it means that the lock is released, which permits the EH handlers to be re-entered. The combination results in crashes. Don't do it. Fixes: 8b00c3d5d40d ("ncr5380: Implement new eh_abort_handler") Reported-and-tested-by: Michael Schmitz Signed-off-by: Finn Thain --- drivers/scsi/NCR5380.c | 16 drivers/scsi/atari_NCR5380.c | 16 2 files changed, 16 insertions(+), 16 deletions(-) Index: linux/drivers/scsi/NCR5380.c === --- linux.orig/drivers/scsi/NCR5380.c 2016-02-23 10:06:57.0 +1100 +++ linux/drivers/scsi/NCR5380.c2016-02-23 10:06:58.0 +1100 @@ -2302,6 +2302,9 @@ static bool list_del_cmd(struct list_hea * If cmd was not found at all then presumably it has already been completed, * in which case return SUCCESS to try to avoid further EH measures. * If the command has not completed yet, we must not fail to find it. + * + * The lock protects driver data structures, but EH handlers also use it + * to serialize their own execution and prevent their own re-entry. */ static int NCR5380_abort(struct scsi_cmnd *cmd) @@ -2338,14 +2341,11 @@ static int NCR5380_abort(struct scsi_cmn if (list_del_cmd(&hostdata->disconnected, cmd)) { dsprintk(NDEBUG_ABORT, instance, "abort: removed %p from disconnected list\n", cmd); - cmd->result = DID_ERROR << 16; - if (!hostdata->connected) - NCR5380_select(instance, cmd); - if (hostdata->connected != cmd) { - complete_cmd(instance, cmd); - result = FAILED; - goto out; - } + /* Can't call NCR5380_select() and send ABORT because that +* means releasing the lock. Need a bus reset. 
+*/ + result = FAILED; + goto out; } if (hostdata->connected == cmd) { Index: linux/drivers/scsi/atari_NCR5380.c === --- linux.orig/drivers/scsi/atari_NCR5380.c 2016-02-23 10:06:57.0 +1100 +++ linux/drivers/scsi/atari_NCR5380.c 2016-02-23 10:06:58.0 +1100 @@ -2497,6 +2497,9 @@ static bool list_del_cmd(struct list_hea * If cmd was not found at all then presumably it has already been completed, * in which case return SUCCESS to try to avoid further EH measures. * If the command has not completed yet, we must not fail to find it. + * + * The lock protects driver data structures, but EH handlers also use it + * to serialize their own execution and prevent their own re-entry. */ static int NCR5380_abort(struct scsi_cmnd *cmd) @@ -2533,14 +2536,11 @@ static int NCR5380_abort(struct scsi_cmn if (list_del_cmd(&hostdata->disconnected, cmd)) { dsprintk(NDEBUG_ABORT, instance, "abort: removed %p from disconnected list\n", cmd); - cmd->result = DID_ERROR << 16; - if (!hostdata->connected) - NCR5380_select(instance, cmd); - if (hostdata->connected != cmd) { - complete_cmd(instance, cmd); - result = FAILED; - goto out; - } + /* Can't call NCR5380_select() and send ABORT because that +* means releasing the lock. Need a bus reset. +*/ + result = FAILED; + goto out; } if (hostdata->connected == cmd) {
[PATCH 4/6] ncr5380: Forget aborted commands
The list structures and related logic used in the NCR5380 driver mean that a command cannot be queued twice (i.e. can't appear on more than one queue and can't appear on the same queue more than once). The abort handler must forget the command so that the mid-layer can re-use it. E.g. the ML may send it back to the LLD via scsi_eh_get_sense(). Fix this and also fix two error paths, so that commands get forgotten iff completed. Fixes: 8b00c3d5d40d ("ncr5380: Implement new eh_abort_handler") Tested-by: Michael Schmitz Signed-off-by: Finn Thain --- drivers/scsi/NCR5380.c | 62 +++ drivers/scsi/atari_NCR5380.c | 62 +++ 2 files changed, 34 insertions(+), 90 deletions(-) Index: linux/drivers/scsi/NCR5380.c === --- linux.orig/drivers/scsi/NCR5380.c 2016-02-23 10:06:58.0 +1100 +++ linux/drivers/scsi/NCR5380.c2016-02-23 10:07:00.0 +1100 @@ -1796,6 +1796,7 @@ static void NCR5380_information_transfer do_abort(instance); cmd->result = DID_ERROR << 16; complete_cmd(instance, cmd); + hostdata->connected = NULL; return; #endif case PHASE_DATAIN: @@ -1845,7 +1846,6 @@ static void NCR5380_information_transfer sink = 1; do_abort(instance); cmd->result = DID_ERROR << 16; - complete_cmd(instance, cmd); /* XXX - need to source or sink data here, as appropriate */ } else cmd->SCp.this_residual -= transfersize - len; @@ -2294,14 +2294,14 @@ static bool list_del_cmd(struct list_hea * [disconnected -> connected ->]... * [autosense -> connected ->] done * - * If cmd is unissued then just remove it. - * If cmd is disconnected, try to select the target. - * If cmd is connected, try to send an abort message. - * If cmd is waiting for autosense, give it a chance to complete but check - * that it isn't left connected. * If cmd was not found at all then presumably it has already been completed, * in which case return SUCCESS to try to avoid further EH measures. + * * If the command has not completed yet, we must not fail to find it. 
+ * We have no option but to forget the aborted command (even if it still + * lacks sense data). The mid-layer may re-issue a command that is in error + * recovery (see scsi_send_eh_cmnd), but the logic and data structures in + * this driver are such that a command can appear on one queue only. * * The lock protects driver data structures, but EH handlers also use it * to serialize their own execution and prevent their own re-entry. @@ -2327,6 +2327,7 @@ static int NCR5380_abort(struct scsi_cmn "abort: removed %p from issue queue\n", cmd); cmd->result = DID_ABORT << 16; cmd->scsi_done(cmd); /* No tag or busy flag to worry about */ + goto out; } if (hostdata->selecting == cmd) { @@ -2344,6 +2345,8 @@ static int NCR5380_abort(struct scsi_cmn /* Can't call NCR5380_select() and send ABORT because that * means releasing the lock. Need a bus reset. */ + set_host_byte(cmd, DID_ERROR); + complete_cmd(instance, cmd); result = FAILED; goto out; } @@ -2351,45 +2354,9 @@ static int NCR5380_abort(struct scsi_cmn if (hostdata->connected == cmd) { dsprintk(NDEBUG_ABORT, instance, "abort: cmd %p is connected\n", cmd); hostdata->connected = NULL; - if (do_abort(instance)) { - set_host_byte(cmd, DID_ERROR); - complete_cmd(instance, cmd); - result = FAILED; - goto out; - } - set_host_byte(cmd, DID_ABORT); #ifdef REAL_DMA hostdata->dma_len = 0; #endif - if (cmd->cmnd[0] == REQUEST_SENSE) - complete_cmd(instance, cmd); - else { - struct NCR5380_cmd *ncmd = scsi_cmd_priv(cmd); - - /* Perform autosense for this command */ - list_add(&ncmd->list, &hostdata->autosense); - } - } - - if (list_find_cmd(&hostdata->autosense, cmd)) { - dsprintk(NDEBUG_ABORT, instance, -"abort: found %p on sense queue\n", cmd); - spin_unlock_irqrestore(&hostdata->lock,
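The "one queue only" invariant relied on above can be sketched with a simplified removal helper. This is illustrative C, not the driver's `list_del_cmd()` (which uses the kernel's `list_head`):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct cmd {
	struct cmd *next;
};

/* Remove victim from a singly-linked queue, reporting whether it was
 * found. Because a command sits on at most one queue, the abort handler
 * can stop searching after the first successful removal. */
static bool list_del_cmd_sketch(struct cmd **head, struct cmd *victim)
{
	struct cmd **pp;

	for (pp = head; *pp; pp = &(*pp)->next) {
		if (*pp == victim) {
			*pp = victim->next;
			victim->next = NULL;
			return true;	/* caller can "goto out" here */
		}
	}
	return false;
}

static bool demo_single_queue(void)
{
	struct cmd a = { NULL }, b = { NULL };
	struct cmd *queue = &a;

	a.next = &b;
	return list_del_cmd_sketch(&queue, &b) &&	/* found once */
	       !list_del_cmd_sketch(&queue, &b) &&	/* gone afterwards */
	       queue == &a && a.next == NULL;
}
```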
[PATCH 1/6] ncr5380: Correctly clear command pointers and lists after bus reset
Commands subject to exception handling are to be returned to the scsi mid-layer. Make sure that the various command pointers and command lists in the low-level driver are correctly cleansed of affected commands. This fixes some bugs that I accidentally introduced in v4.5-rc1 including the removal of INIT_LIST_HEAD for the 'autosense' and 'disconnected' command lists, and the possible NULL pointer dereference in NCR5380_bus_reset() that was reported by Dan Carpenter. hostdata->sensing may also point to an affected command so this pointer also has to be cleared. The abort handler calls complete_cmd() to take care of this; let's have the bus reset handler do the same. The issue queue may also contain an affected command. If so, remove it. This also follows the abort handler logic. Reported-by: Dan Carpenter Fixes: 62717f537e1b ("ncr5380: Implement new eh_bus_reset_handler") Tested-by: Michael Schmitz Signed-off-by: Finn Thain --- drivers/scsi/NCR5380.c | 19 --- drivers/scsi/atari_NCR5380.c | 19 --- 2 files changed, 24 insertions(+), 14 deletions(-) Index: linux/drivers/scsi/NCR5380.c === --- linux.orig/drivers/scsi/NCR5380.c 2016-02-23 10:06:56.0 +1100 +++ linux/drivers/scsi/NCR5380.c2016-02-23 10:06:56.0 +1100 @@ -2450,7 +2450,16 @@ static int NCR5380_bus_reset(struct scsi * commands! 
*/ - hostdata->selecting = NULL; + if (list_del_cmd(&hostdata->unissued, cmd)) { + cmd->result = DID_RESET << 16; + cmd->scsi_done(cmd); + } + + if (hostdata->selecting) { + hostdata->selecting->result = DID_RESET << 16; + complete_cmd(instance, hostdata->selecting); + hostdata->selecting = NULL; + } list_for_each_entry(ncmd, &hostdata->disconnected, list) { struct scsi_cmnd *cmd = NCR5380_to_scmd(ncmd); @@ -2458,6 +2467,7 @@ static int NCR5380_bus_reset(struct scsi set_host_byte(cmd, DID_RESET); cmd->scsi_done(cmd); } + INIT_LIST_HEAD(&hostdata->disconnected); list_for_each_entry(ncmd, &hostdata->autosense, list) { struct scsi_cmnd *cmd = NCR5380_to_scmd(ncmd); @@ -2465,6 +2475,7 @@ static int NCR5380_bus_reset(struct scsi set_host_byte(cmd, DID_RESET); cmd->scsi_done(cmd); } + INIT_LIST_HEAD(&hostdata->autosense); if (hostdata->connected) { set_host_byte(hostdata->connected, DID_RESET); @@ -2472,12 +2483,6 @@ static int NCR5380_bus_reset(struct scsi hostdata->connected = NULL; } - if (hostdata->sensing) { - set_host_byte(hostdata->connected, DID_RESET); - complete_cmd(instance, hostdata->sensing); - hostdata->sensing = NULL; - } - for (i = 0; i < 8; ++i) hostdata->busy[i] = 0; #ifdef REAL_DMA Index: linux/drivers/scsi/atari_NCR5380.c === --- linux.orig/drivers/scsi/atari_NCR5380.c 2016-02-23 10:06:56.0 +1100 +++ linux/drivers/scsi/atari_NCR5380.c 2016-02-23 10:06:56.0 +1100 @@ -2646,7 +2646,16 @@ static int NCR5380_bus_reset(struct scsi * commands! 
*/ - hostdata->selecting = NULL; + if (list_del_cmd(&hostdata->unissued, cmd)) { + cmd->result = DID_RESET << 16; + cmd->scsi_done(cmd); + } + + if (hostdata->selecting) { + hostdata->selecting->result = DID_RESET << 16; + complete_cmd(instance, hostdata->selecting); + hostdata->selecting = NULL; + } list_for_each_entry(ncmd, &hostdata->disconnected, list) { struct scsi_cmnd *cmd = NCR5380_to_scmd(ncmd); @@ -2654,6 +2663,7 @@ static int NCR5380_bus_reset(struct scsi set_host_byte(cmd, DID_RESET); cmd->scsi_done(cmd); } + INIT_LIST_HEAD(&hostdata->disconnected); list_for_each_entry(ncmd, &hostdata->autosense, list) { struct scsi_cmnd *cmd = NCR5380_to_scmd(ncmd); @@ -2661,6 +2671,7 @@ static int NCR5380_bus_reset(struct scsi set_host_byte(cmd, DID_RESET); cmd->scsi_done(cmd); } + INIT_LIST_HEAD(&hostdata->autosense); if (hostdata->connected) { set_host_byte(hostdata->connected, DID_RESET); @@ -2668,12 +2679,6 @@ static int NCR5380_bus_reset(struct scsi hostdata->connected = NULL; } - if (hostdata->sensing) { - set_host_byte(hostdata->connected, DID_RESET); - complete_cmd(instance, hostdata->sensing); - hostdata->sensing = NULL; - } - #ifdef SUPPORT_TAGS free_all_tags(hostdata); #endif
[PATCH 5/6] ncr5380: Fix NCR5380_select() EH checks and result handling
Add missing checks for EH abort during arbitration and selection. Rework the handling of NCR5380_select() result to improve clarity. Fixes: 707d62b37fbb ("ncr5380: Fix EH during arbitration and selection") Tested-by: Michael Schmitz Signed-off-by: Finn Thain --- drivers/scsi/NCR5380.c | 16 +++- drivers/scsi/atari_NCR5380.c | 16 +++- 2 files changed, 22 insertions(+), 10 deletions(-) Index: linux/drivers/scsi/NCR5380.c === --- linux.orig/drivers/scsi/NCR5380.c 2016-02-23 10:07:00.0 +1100 +++ linux/drivers/scsi/NCR5380.c2016-02-23 10:07:01.0 +1100 @@ -815,15 +815,17 @@ static void NCR5380_main(struct work_str struct NCR5380_hostdata *hostdata = container_of(work, struct NCR5380_hostdata, main_task); struct Scsi_Host *instance = hostdata->host; - struct scsi_cmnd *cmd; int done; do { done = 1; spin_lock_irq(&hostdata->lock); - while (!hostdata->connected && - (cmd = dequeue_next_cmd(instance))) { + while (!hostdata->connected && !hostdata->selecting) { + struct scsi_cmnd *cmd = dequeue_next_cmd(instance); + + if (!cmd) + break; dsprintk(NDEBUG_MAIN, instance, "main: dequeued %p\n", cmd); @@ -840,8 +842,7 @@ static void NCR5380_main(struct work_str * entire unit. 
*/ - cmd = NCR5380_select(instance, cmd); - if (!cmd) { + if (!NCR5380_select(instance, cmd)) { dsprintk(NDEBUG_MAIN, instance, "main: select complete\n"); } else { dsprintk(NDEBUG_MAIN | NDEBUG_QUEUES, instance, @@ -1056,6 +1057,11 @@ static struct scsi_cmnd *NCR5380_select( /* Reselection interrupt */ goto out; } + if (!hostdata->selecting) { + /* Command was aborted */ + NCR5380_write(MODE_REG, MR_BASE); + goto out; + } if (err < 0) { NCR5380_write(MODE_REG, MR_BASE); shost_printk(KERN_ERR, instance, Index: linux/drivers/scsi/atari_NCR5380.c === --- linux.orig/drivers/scsi/atari_NCR5380.c 2016-02-23 10:07:00.0 +1100 +++ linux/drivers/scsi/atari_NCR5380.c 2016-02-23 10:07:01.0 +1100 @@ -923,7 +923,6 @@ static void NCR5380_main(struct work_str struct NCR5380_hostdata *hostdata = container_of(work, struct NCR5380_hostdata, main_task); struct Scsi_Host *instance = hostdata->host; - struct scsi_cmnd *cmd; int done; /* @@ -936,8 +935,11 @@ static void NCR5380_main(struct work_str done = 1; spin_lock_irq(&hostdata->lock); - while (!hostdata->connected && - (cmd = dequeue_next_cmd(instance))) { + while (!hostdata->connected && !hostdata->selecting) { + struct scsi_cmnd *cmd = dequeue_next_cmd(instance); + + if (!cmd) + break; dsprintk(NDEBUG_MAIN, instance, "main: dequeued %p\n", cmd); @@ -960,8 +962,7 @@ static void NCR5380_main(struct work_str #ifdef SUPPORT_TAGS cmd_get_tag(cmd, cmd->cmnd[0] != REQUEST_SENSE); #endif - cmd = NCR5380_select(instance, cmd); - if (!cmd) { + if (!NCR5380_select(instance, cmd)) { dsprintk(NDEBUG_MAIN, instance, "main: select complete\n"); maybe_release_dma_irq(instance); } else { @@ -1257,6 +1258,11 @@ static struct scsi_cmnd *NCR5380_select( /* Reselection interrupt */ goto out; } + if (!hostdata->selecting) { + /* Command was aborted */ + NCR5380_write(MODE_REG, MR_BASE); + goto out; + } if (err < 0) { NCR5380_write(MODE_REG, MR_BASE); shost_printk(KERN_ERR, instance,
[PATCH 6/6] ncr5380: Call scsi_eh_prep_cmnd() and scsi_eh_restore_cmnd() as and when appropriate
It turns out that some targets will disconnect a REQUEST SENSE command. The autosense algorithm doesn't anticipate this. Hence multiple commands can end up undergoing autosense simultaneously, and they will all try to use the same scsi_eh_save struct, which won't work. This bug causes the wrong command to have its sense pointer overwritten, which sometimes leads to a NULL pointer deref. Fix this by checking which command is being requeued before restoring the scsi_eh_save data, and defer autosense when the scsi_eh_save storage is in use by another command. Fixes: f27db8eb98a1 ("ncr5380: Fix autosense bugs") Reported-and-tested-by: Michael Schmitz Signed-off-by: Finn Thain --- drivers/scsi/NCR5380.c |4 ++-- drivers/scsi/atari_NCR5380.c |4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) Index: linux/drivers/scsi/NCR5380.c === --- linux.orig/drivers/scsi/NCR5380.c 2016-02-23 10:07:01.0 +1100 +++ linux/drivers/scsi/NCR5380.c2016-02-23 10:07:02.0 +1100 @@ -760,7 +760,7 @@ static struct scsi_cmnd *dequeue_next_cm struct NCR5380_cmd *ncmd; struct scsi_cmnd *cmd; - if (list_empty(&hostdata->autosense)) { + if (hostdata->sensing || list_empty(&hostdata->autosense)) { list_for_each_entry(ncmd, &hostdata->unissued, list) { cmd = NCR5380_to_scmd(ncmd); dsprintk(NDEBUG_QUEUES, instance, "dequeue: cmd=%p target=%d busy=0x%02x lun=%llu\n", @@ -793,7 +793,7 @@ static void requeue_cmd(struct Scsi_Host struct NCR5380_hostdata *hostdata = shost_priv(instance); struct NCR5380_cmd *ncmd = scsi_cmd_priv(cmd); - if (hostdata->sensing) { + if (hostdata->sensing == cmd) { scsi_eh_restore_cmnd(cmd, &hostdata->ses); list_add(&ncmd->list, &hostdata->autosense); hostdata->sensing = NULL; Index: linux/drivers/scsi/atari_NCR5380.c === --- linux.orig/drivers/scsi/atari_NCR5380.c 2016-02-23 10:07:01.0 +1100 +++ linux/drivers/scsi/atari_NCR5380.c 2016-02-23 10:07:02.0 +1100 @@ -862,7 +862,7 @@ static struct scsi_cmnd *dequeue_next_cm struct NCR5380_cmd *ncmd; struct scsi_cmnd *cmd; - if 
(list_empty(&hostdata->autosense)) { + if (hostdata->sensing || list_empty(&hostdata->autosense)) { list_for_each_entry(ncmd, &hostdata->unissued, list) { cmd = NCR5380_to_scmd(ncmd); dsprintk(NDEBUG_QUEUES, instance, "dequeue: cmd=%p target=%d busy=0x%02x lun=%llu\n", @@ -901,7 +901,7 @@ static void requeue_cmd(struct Scsi_Host struct NCR5380_hostdata *hostdata = shost_priv(instance); struct NCR5380_cmd *ncmd = scsi_cmd_priv(cmd); - if (hostdata->sensing) { + if (hostdata->sensing == cmd) { scsi_eh_restore_cmnd(cmd, &hostdata->ses); list_add(&ncmd->list, &hostdata->autosense); hostdata->sensing = NULL;
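The two hunks above encode the same ownership rule in both files; a hedged sketch of that rule (hypothetical names and types, not the driver's):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct cmd {
	int id;
};

/* dequeue side: a new autosense may start only while the single
 * scsi_eh_save slot is free */
static bool may_start_autosense(const struct cmd *sensing)
{
	return sensing == NULL;
}

/* requeue side: only the command that owns the slot may restore it; the
 * old test (sensing != NULL) let any requeued command clobber another
 * command's sense pointer */
static bool may_restore_eh_save(const struct cmd *sensing,
				const struct cmd *cmd)
{
	return cmd != NULL && sensing == cmd;
}

static bool demo_ownership(void)
{
	struct cmd a = { 1 }, b = { 2 };

	return may_start_autosense(NULL) &&
	       !may_start_autosense(&a) &&
	       may_restore_eh_save(&a, &a) &&
	       !may_restore_eh_save(&a, &b);
}
```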
Re: [PATCH net v2] r8169:fix "rtl_counters_cond == 1 (loop: 1000, delay: 10)" log spam.
Chunhao Lin : [...] > diff --git a/drivers/net/ethernet/realtek/r8169.c > b/drivers/net/ethernet/realtek/r8169.c > index 537974c..a645f8d 100644 > --- a/drivers/net/ethernet/realtek/r8169.c > +++ b/drivers/net/ethernet/realtek/r8169.c > @@ -7730,10 +7730,13 @@ rtl8169_get_stats64(struct net_device *dev, struct > rtnl_link_stats64 *stats) > { > struct rtl8169_private *tp = netdev_priv(dev); > void __iomem *ioaddr = tp->mmio_addr; > + struct pci_dev *pdev = tp->pci_dev; + struct device *d = &tp->pci_dev->dev; (the patch does not use pdev alone) [...] > @@ -7761,7 +7764,8 @@ rtl8169_get_stats64(struct net_device *dev, struct > rtnl_link_stats64 *stats) >* Fetch additonal counter values missing in stats collected by driver >* from tally counters. >*/ > - rtl8169_update_counters(dev); > + if (pm_runtime_active(&pdev->dev)) > + rtl8169_update_counters(dev); pm_runtime_active() won't change after pm_runtime_get_noresume(). You may set a boolean active = pm_runtime_active(d) before testing netif_running(). [...] > @@ -7842,6 +7848,12 @@ static int rtl8169_runtime_suspend(struct device > *device) > struct pci_dev *pdev = to_pci_dev(device); > struct net_device *dev = pci_get_drvdata(pdev); > struct rtl8169_private *tp = netdev_priv(dev); > + void __iomem *ioaddr = tp->mmio_addr; > + > + /* Update counters before going runtime suspend */ > + if (netif_running(dev)) > + rtl8169_rx_missed(dev, ioaddr); > + rtl8169_update_counters(dev); > > if (!tp->TxDescArray) > return 0; Nits: - the tp->TxDescArray test provides the required synchronization: see rtl8169_{open/close} and their pm_runtime_{get / put}. - ioaddr is not really needed : tp->mmio_addr appears only once and it does not mess the 72..80 cols limit. - even if the device can only be automatically runtime suspended some time after a link down event, you may address davem's point regarding stats reliability and move rtl8169_rx_missed + rtl8169_update_counters after rtl8169_net_suspend. -- Ueimor
[PATCH 0/6] ncr5380: Exception handling fixes for v4.5
These patches fix some exception handling and autosense bugs that I accidentally introduced in v4.5-rc1. The error recovery and autosense code in these drivers has been unstable for a long time. Despite that, v4.5-rc1 shows a regression in as much as it exposes a bug in the aranym emulator. This leads to error recovery, which can crash. Also, Michael Schmitz reported some crashes involving abort handling for a certain target device. And Dan Carpenter found a NULL pointer deref in the new bus reset code. Error recovery and autosense are stable with these patches. I tested them using a Domex 3191D PCI card. Errors during IO were simulated by sending bus resets and unplugging/replugging the SCSI cables. Some of these patches fix bugs that only affect more capable hardware (like Atari). Thanks to Michael Schmitz for patiently testing those. Please review this series for v4.5. --- drivers/scsi/NCR5380.c | 133 +++ drivers/scsi/atari_NCR5380.c | 133 +++ 2 files changed, 118 insertions(+), 148 deletions(-)
Re: [Xen-devel] [PATCH 8/9] x86/rtc: replace paravirt_enabled() check with subarch check
On Mon, Feb 22, 2016 at 09:34:26AM -0500, Boris Ostrovsky wrote: > On 02/22/2016 05:27 AM, Borislav Petkov wrote: > >On Mon, Feb 22, 2016 at 07:07:56AM +0100, Luis R. Rodriguez wrote: > >>diff --git a/arch/x86/include/asm/x86_init.h > >>b/arch/x86/include/asm/x86_init.h > >>index 1ae89a2721d6..fe0d579b63e3 100644 > >>--- a/arch/x86/include/asm/x86_init.h > >>+++ b/arch/x86/include/asm/x86_init.h > >>@@ -84,11 +84,14 @@ struct x86_init_paging { > >> *boot cpu > >> * @timer_init: initialize the platform timer (default > >> PIT/HPET) > >> * @wallclock_init: init the wallclock device > >>+ * @no_cmos_rtc: set when platform has no CMOS real-time clock > >>+ * present > >> */ > >> struct x86_init_timers { > >>void (*setup_percpu_clockev)(void); > >>void (*timer_init)(void); > >>void (*wallclock_init)(void); > >>+ bool no_cmos_rtc; > >I'd add > > > > u64 flags; > > > >to x86_init_ops and then set X86_PLATFORM_NO_RTC or so in there. The > >reason being, others could use that flags field too, for other stuff and > >define more bits. > > Maybe timer_flags or platform_flags (or something else) to be a > little more cscope-friendly? Sure, I'll go with platform_flags on x86_init_ops. Will repost a new series after 0-day testing. Luis
linux-next: manual merge of the f2fs tree with Linus' tree
Hi Jaegeuk, Today's linux-next merge of the f2fs tree got conflicts in: fs/ext4/crypto.c fs/ext4/ext4.h fs/ext4/namei.c between commits: 28b4c263961c ("ext4 crypto: revalidate dentry after adding or removing the key") ff978b09f973 ("ext4 crypto: move context consistency check to ext4_file_open()") from Linus' tree (presumably) bug fixes in v4.5-rc5 and commit: e29c55a8d406 ("ext4 crypto: migrate into vfs's crypto engine") from the f2fs tree. I do not have enough information to fix this up, so I dropped the last commit from the f2fs tree for today. I *assume* that this has been passed by Ted - I can't easily tell as there are no Acked-by or Reviewed-by tags in the f2fs tree commit. If so, perhaps you could either merge v4.5-rc5 yourself and do the resolutions necessary (with nice comments in the merge commit), or give me some hints on how to do it. -- Cheers, Stephen Rothwell
[PATCH 1/6] cgroup: fix error return value of cgroup_addrm_files()
cgroup_addrm_files() incorrectly returned 0 after add failure. Fix it. Signed-off-by: Tejun Heo --- kernel/cgroup.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/kernel/cgroup.c b/kernel/cgroup.c index 7ad6191..68b032d 100644 --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -3369,7 +3369,7 @@ static int cgroup_addrm_files(struct cgroup_subsys_state *css, bool is_add) { struct cftype *cft, *cft_end = NULL; - int ret; + int ret = 0; lockdep_assert_held(&cgroup_mutex); @@ -3398,7 +3398,7 @@ static int cgroup_addrm_files(struct cgroup_subsys_state *css, cgroup_rm_file(cgrp, cft); } } - return 0; + return ret; } static int cgroup_apply_cftypes(struct cftype *cfts, bool is_add) -- 2.5.0
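The one-line nature of the fix is easy to miss in the diff; a minimal illustration of the bug class (hypothetical function, not the kernel's):

```c
#include <assert.h>

/* An error code captured in a local is useless if the function ends
 * with a hard-coded "return 0" - exactly the pattern fixed above. */
static int addrm_files_sketch(int add_result)
{
	int ret = 0;

	if (add_result < 0)
		ret = add_result;	/* e.g. -ENOMEM from the add path */

	return ret;	/* before the fix this line was "return 0;" */
}
```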
[PATCH 3/6] cgroup: s/child_subsys_mask/subtree_ss_mask/
For consistency with cgroup->subtree_control. * cgroup->child_subsys_mask -> cgroup->subtree_ss_mask * cgroup_calc_child_subsys_mask() -> cgroup_calc_subtree_ss_mask() * cgroup_refresh_child_subsys_mask() -> cgroup_refresh_subtree_ss_mask() No functional changes. Signed-off-by: Tejun Heo --- include/linux/cgroup-defs.h | 11 +-- kernel/cgroup.c | 48 ++--- 2 files changed, 29 insertions(+), 30 deletions(-) diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h index 4f3c0da..c68ae7f 100644 --- a/include/linux/cgroup-defs.h +++ b/include/linux/cgroup-defs.h @@ -253,13 +253,12 @@ struct cgroup { /* * The bitmask of subsystems enabled on the child cgroups. * ->subtree_control is the one configured through -* "cgroup.subtree_control" while ->child_subsys_mask is the -* effective one which may have more subsystems enabled. -* Controller knobs are made available iff it's enabled in -* ->subtree_control. +* "cgroup.subtree_control" while ->child_ss_mask is the effective +* one which may have more subsystems enabled. Controller knobs +* are made available iff it's enabled in ->subtree_control. */ - unsigned int subtree_control; - unsigned int child_subsys_mask; + unsigned long subtree_control; + unsigned long subtree_ss_mask; /* Private pointers for each registered subsystem */ struct cgroup_subsys_state __rcu *subsys[CGROUP_SUBSYS_COUNT]; diff --git a/kernel/cgroup.c b/kernel/cgroup.c index 7727b6e..f3cd67b 100644 --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -391,10 +391,10 @@ static struct cgroup_subsys_state *cgroup_e_css(struct cgroup *cgrp, /* * This function is used while updating css associations and thus -* can't test the csses directly. Use ->child_subsys_mask. +* can't test the csses directly. Use ->subtree_ss_mask. 
*/ while (cgroup_parent(cgrp) && - !(cgroup_parent(cgrp)->child_subsys_mask & (1 << ss->id))) + !(cgroup_parent(cgrp)->subtree_ss_mask & (1 << ss->id))) cgrp = cgroup_parent(cgrp); return cgroup_css(cgrp, ss); @@ -1256,7 +1256,7 @@ static umode_t cgroup_file_mode(const struct cftype *cft) } /** - * cgroup_calc_child_subsys_mask - calculate child_subsys_mask + * cgroup_calc_subtree_ss_mask - calculate subtree_ss_mask * @cgrp: the target cgroup * @subtree_control: the new subtree_control mask to consider * @@ -1268,8 +1268,8 @@ static umode_t cgroup_file_mode(const struct cftype *cft) * @subtree_control is to be applied to @cgrp. The returned mask is always * a superset of @subtree_control and follows the usual hierarchy rules. */ -static unsigned long cgroup_calc_child_subsys_mask(struct cgroup *cgrp, - unsigned long subtree_control) +static unsigned long cgroup_calc_subtree_ss_mask(struct cgroup *cgrp, +unsigned long subtree_control) { struct cgroup *parent = cgroup_parent(cgrp); unsigned long cur_ss_mask = subtree_control; @@ -1293,7 +1293,7 @@ static unsigned long cgroup_calc_child_subsys_mask(struct cgroup *cgrp, * to non-default hierarchies. */ if (parent) - new_ss_mask &= parent->child_subsys_mask; + new_ss_mask &= parent->subtree_ss_mask; else new_ss_mask &= cgrp->root->subsys_mask; @@ -1306,16 +1306,16 @@ static unsigned long cgroup_calc_child_subsys_mask(struct cgroup *cgrp, } /** - * cgroup_refresh_child_subsys_mask - update child_subsys_mask + * cgroup_refresh_subtree_ss_mask - update subtree_ss_mask * @cgrp: the target cgroup * - * Update @cgrp->child_subsys_mask according to the current - * @cgrp->subtree_control using cgroup_calc_child_subsys_mask(). + * Update @cgrp->subtree_ss_mask according to the current + * @cgrp->subtree_control using cgroup_calc_subtree_ss_mask(). 
*/ -static void cgroup_refresh_child_subsys_mask(struct cgroup *cgrp) +static void cgroup_refresh_subtree_ss_mask(struct cgroup *cgrp) { - cgrp->child_subsys_mask = - cgroup_calc_child_subsys_mask(cgrp, cgrp->subtree_control); + cgrp->subtree_ss_mask = + cgroup_calc_subtree_ss_mask(cgrp, cgrp->subtree_control); } /** @@ -1542,7 +1542,7 @@ static int rebind_subsystems(struct cgroup_root *dst_root, src_root->subsys_mask &= ~(1 << ssid); scgrp->subtree_control &= ~(1 << ssid); - cgroup_refresh_child_subsys_mask(scgrp); + cgroup_refresh_subtree_ss_mask(scgrp); /* default hierarchy doesn't enable controllers by default */ dst_root->subsys_mask |= 1 << ssid; @@ -1550,7 +1550,7 @@ static in
[PATCHSET] cgroup: misc fixes and cleanups
Hello, This patchset contains the following six patches. The first one is a fix but should be safe to route through for-4.6. The rest are misc cleanups and improvements which don't cause notable behavior changes. 0001-cgroup-fix-error-return-value-of-cgroup_addrm_files.patch 0002-Revert-cgroup-add-cgroup_subsys-css_e_css_changed.patch 0003-cgroup-s-child_subsys_mask-subtree_ss_mask.patch 0004-cgroup-convert-for_each_subsys_which-to-do-while-sty.patch 0005-cgroup-use-do_each_subsys_mask-where-applicable.patch 0006-cgroup-make-cgroup-subsystem-masks-u16.patch The patchset based on top of cgroup/for-4.6 223ffb29f972 ("cgroup: provide cgroup_nov1= to disable controllers in v1 mounts") and is available in the following git branch. git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-misc diffstat follows. I'll pull in the branch into cgroup/for-4.6 soonish. include/linux/cgroup-defs.h | 12 +- kernel/cgroup.c | 217 +++- 2 files changed, 102 insertions(+), 127 deletions(-) Thanks. -- tejun
[PATCH 5/6] cgroup: use do_each_subsys_mask() where applicable
There are several places in cgroup_subtree_control_write() which can use do_each_subsys_mask() instead of manual mask testing. Use it. No functional changes. Signed-off-by: Tejun Heo --- kernel/cgroup.c | 35 --- 1 file changed, 12 insertions(+), 23 deletions(-) diff --git a/kernel/cgroup.c b/kernel/cgroup.c index 5d10298..1e561bd 100644 --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -3082,10 +3082,7 @@ static ssize_t cgroup_subtree_control_write(struct kernfs_open_file *of, * dependency. An invisible css is made visible when the userland * explicitly enables it. */ - for_each_subsys(ss, ssid) { - if (!(enable & (1 << ssid))) - continue; - + do_each_subsys_mask(ss, ssid, enable) { cgroup_for_each_live_child(child, cgrp) { if (css_enable & (1 << ssid)) ret = create_css(child, ss, @@ -3096,7 +3093,7 @@ static ssize_t cgroup_subtree_control_write(struct kernfs_open_file *of, if (ret) goto err_undo_css; } - } + } while_each_subsys_mask(); /* * At this point, cgroup_e_css() results reflect the new csses @@ -3115,10 +3112,7 @@ static ssize_t cgroup_subtree_control_write(struct kernfs_open_file *of, * state if it's made visible again later. Controllers which may * be depended upon should provide ->css_reset() for this purpose. 
*/ - for_each_subsys(ss, ssid) { - if (!(disable & (1 << ssid))) - continue; - + do_each_subsys_mask(ss, ssid, disable) { cgroup_for_each_live_child(child, cgrp) { struct cgroup_subsys_state *css = cgroup_css(child, ss); @@ -3130,7 +3124,7 @@ static ssize_t cgroup_subtree_control_write(struct kernfs_open_file *of, ss->css_reset(css); } } - } + } while_each_subsys_mask(); kernfs_activate(cgrp->kn); ret = 0; @@ -3142,10 +3136,7 @@ static ssize_t cgroup_subtree_control_write(struct kernfs_open_file *of, cgrp->subtree_control = old_sc; cgrp->subtree_ss_mask = old_ss; - for_each_subsys(ss, ssid) { - if (!(enable & (1 << ssid))) - continue; - + do_each_subsys_mask(ss, ssid, enable) { cgroup_for_each_live_child(child, cgrp) { struct cgroup_subsys_state *css = cgroup_css(child, ss); @@ -3157,7 +3148,7 @@ static ssize_t cgroup_subtree_control_write(struct kernfs_open_file *of, else css_clear_dir(css, NULL); } - } + } while_each_subsys_mask(); goto out_unlock; } @@ -4973,14 +4964,12 @@ static int cgroup_mkdir(struct kernfs_node *parent_kn, const char *name, goto out_destroy; /* let's create and online css's */ - for_each_subsys(ss, ssid) { - if (parent->subtree_ss_mask & (1 << ssid)) { - ret = create_css(cgrp, ss, -parent->subtree_control & (1 << ssid)); - if (ret) - goto out_destroy; - } - } + do_each_subsys_mask(ss, ssid, parent->subtree_ss_mask) { + ret = create_css(cgrp, ss, +parent->subtree_control & (1 << ssid)); + if (ret) + goto out_destroy; + } while_each_subsys_mask(); /* * On the default hierarchy, a child doesn't automatically inherit -- 2.5.0
[PATCH 2/6] Revert "cgroup: add cgroup_subsys->css_e_css_changed()"
This reverts commit 56c807ba4e91f0980567b6a69de239677879b17f. cgroup_subsys->css_e_css_changed() was supposed to be used by cgroup writeback support; however, the change to per-inode cgroup association made it unnecessary and the callback doesn't have any user. Remove it. Signed-off-by: Tejun Heo --- include/linux/cgroup-defs.h | 1 - kernel/cgroup.c | 18 -- 2 files changed, 19 deletions(-) diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h index 789471d..4f3c0da 100644 --- a/include/linux/cgroup-defs.h +++ b/include/linux/cgroup-defs.h @@ -434,7 +434,6 @@ struct cgroup_subsys { void (*css_released)(struct cgroup_subsys_state *css); void (*css_free)(struct cgroup_subsys_state *css); void (*css_reset)(struct cgroup_subsys_state *css); - void (*css_e_css_changed)(struct cgroup_subsys_state *css); int (*can_attach)(struct cgroup_taskset *tset); void (*cancel_attach)(struct cgroup_taskset *tset); diff --git a/kernel/cgroup.c b/kernel/cgroup.c index 68b032d..7727b6e 100644 --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -3127,24 +3127,6 @@ static ssize_t cgroup_subtree_control_write(struct kernfs_open_file *of, } } - /* -* The effective csses of all the descendants (excluding @cgrp) may -* have changed. Subsystems can optionally subscribe to this event -* by implementing ->css_e_css_changed() which is invoked if any of -* the effective csses seen from the css's cgroup may have changed. -*/ - for_each_subsys(ss, ssid) { - struct cgroup_subsys_state *this_css = cgroup_css(cgrp, ss); - struct cgroup_subsys_state *css; - - if (!ss->css_e_css_changed || !this_css) - continue; - - css_for_each_descendant_pre(css, this_css) - if (css != this_css) - ss->css_e_css_changed(css); - } - kernfs_activate(cgrp->kn); ret = 0; out_unlock: -- 2.5.0
[PATCH 6/6] cgroup: make cgroup subsystem masks u16
After the recent do_each_subsys_mask() conversion, there's no reason to use ulong for subsystem masks. We'll be adding more subsystem masks to persistent data structures, let's reduce its size to u16 which should be enough for now and the foreseeable future. This doesn't create any noticeable behavior differences. Signed-off-by: Tejun Heo --- include/linux/cgroup-defs.h | 4 ++-- kernel/cgroup.c | 46 ++--- 2 files changed, 24 insertions(+), 26 deletions(-) diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h index c68ae7f..0abf6aa 100644 --- a/include/linux/cgroup-defs.h +++ b/include/linux/cgroup-defs.h @@ -257,8 +257,8 @@ struct cgroup { * one which may have more subsystems enabled. Controller knobs * are made available iff it's enabled in ->subtree_control. */ - unsigned long subtree_control; - unsigned long subtree_ss_mask; + u16 subtree_control; + u16 subtree_ss_mask; /* Private pointers for each registered subsystem */ struct cgroup_subsys_state __rcu *subsys[CGROUP_SUBSYS_COUNT]; diff --git a/kernel/cgroup.c b/kernel/cgroup.c index 1e561bd..a2f81ef 100644 --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -184,7 +184,7 @@ static bool cgrp_dfl_root_visible; static unsigned long cgroup_no_v1_mask; /* some controllers are not supported in the default hierarchy */ -static unsigned long cgrp_dfl_root_inhibit_ss_mask; +static u16 cgrp_dfl_root_inhibit_ss_mask; /* The list of hierarchy roots */ @@ -208,19 +208,18 @@ static u64 css_serial_nr_next = 1; * fork/exit handlers to call. This avoids us having to do extra work in the * fork/exit path to check which subsystems have fork/exit callbacks. */ -static unsigned long have_fork_callback __read_mostly; -static unsigned long have_exit_callback __read_mostly; -static unsigned long have_free_callback __read_mostly; +static u16 have_fork_callback __read_mostly; +static u16 have_exit_callback __read_mostly; +static u16 have_free_callback __read_mostly; /* Ditto for the can_fork callback. 
*/ -static unsigned long have_canfork_callback __read_mostly; +static u16 have_canfork_callback __read_mostly; static struct file_system_type cgroup2_fs_type; static struct cftype cgroup_dfl_base_files[]; static struct cftype cgroup_legacy_base_files[]; -static int rebind_subsystems(struct cgroup_root *dst_root, -unsigned long ss_mask); +static int rebind_subsystems(struct cgroup_root *dst_root, u16 ss_mask); static void css_task_iter_advance(struct css_task_iter *it); static int cgroup_destroy_locked(struct cgroup *cgrp); static int create_css(struct cgroup *cgrp, struct cgroup_subsys *ss, @@ -1274,11 +1273,10 @@ static umode_t cgroup_file_mode(const struct cftype *cft) * @subtree_control is to be applied to @cgrp. The returned mask is always * a superset of @subtree_control and follows the usual hierarchy rules. */ -static unsigned long cgroup_calc_subtree_ss_mask(struct cgroup *cgrp, -unsigned long subtree_control) +static u16 cgroup_calc_subtree_ss_mask(struct cgroup *cgrp, u16 subtree_control) { struct cgroup *parent = cgroup_parent(cgrp); - unsigned long cur_ss_mask = subtree_control; + u16 cur_ss_mask = subtree_control; struct cgroup_subsys *ss; int ssid; @@ -1288,7 +1286,7 @@ static unsigned long cgroup_calc_subtree_ss_mask(struct cgroup *cgrp, return cur_ss_mask; while (true) { - unsigned long new_ss_mask = cur_ss_mask; + u16 new_ss_mask = cur_ss_mask; do_each_subsys_mask(ss, ssid, cur_ss_mask) { new_ss_mask |= ss->depends_on; @@ -1466,12 +1464,11 @@ static int css_populate_dir(struct cgroup_subsys_state *css, return ret; } -static int rebind_subsystems(struct cgroup_root *dst_root, -unsigned long ss_mask) +static int rebind_subsystems(struct cgroup_root *dst_root, u16 ss_mask) { struct cgroup *dcgrp = &dst_root->cgrp; struct cgroup_subsys *ss; - unsigned long tmp_ss_mask; + u16 tmp_ss_mask; int ssid, i, ret; lockdep_assert_held(&cgroup_mutex); @@ -1507,7 +1504,7 @@ static int rebind_subsystems(struct cgroup_root *dst_root, */ if (dst_root == 
&cgrp_dfl_root) { if (cgrp_dfl_root_visible) { - pr_warn("failed to create files (%d) while rebinding 0x%lx to default root\n", + pr_warn("failed to create files (%d) while rebinding 0x%x to default root\n", ret, ss_mask); pr_warn("you may retry by moving them to a different hierarchy and unbinding\n"); } @@ -1599,7 +1596,7 @@ static int cgroup_show_
[PATCH 4/6] cgroup: convert for_each_subsys_which() to do-while style
for_each_subsys_which() allows iterating subsystems specified in a subsystem bitmask; unfortunately, it requires the mask to be an unsigned long l-value which can be inconvenient and makes it awkward to use a smaller type for subsystem masks. This patch converts for_each_subsys_which() to do-while style which allows it to drop the l-value requirement. The new iterator is named do_each_subsys_mask() / while_each_subsys_mask(). Signed-off-by: Tejun Heo Cc: Aleksa Sarai --- kernel/cgroup.c | 72 - 1 file changed, 40 insertions(+), 32 deletions(-) diff --git a/kernel/cgroup.c b/kernel/cgroup.c index f3cd67b..5d10298 100644 --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -514,22 +514,28 @@ static int notify_on_release(const struct cgroup *cgrp) (((ss) = cgroup_subsys[ssid]) || true); (ssid)++) /** - * for_each_subsys_which - filter for_each_subsys with a bitmask + * do_each_subsys_mask - filter for_each_subsys with a bitmask * @ss: the iteration cursor * @ssid: the index of @ss, CGROUP_SUBSYS_COUNT after reaching the end - * @ss_maskp: a pointer to the bitmask + * @ss_mask: the bitmask * * The block will only run for cases where the ssid-th bit (1 << ssid) of - * mask is set to 1. + * @ss_mask is set. 
*/ -#define for_each_subsys_which(ss, ssid, ss_maskp) \ - if (!CGROUP_SUBSYS_COUNT) /* to avoid spurious gcc warning */ \ +#define do_each_subsys_mask(ss, ssid, ss_mask) do {\ + unsigned long __ss_mask = (ss_mask);\ + if (!CGROUP_SUBSYS_COUNT) { /* to avoid spurious gcc warning */ \ (ssid) = 0; \ - else\ - for_each_set_bit(ssid, ss_maskp, CGROUP_SUBSYS_COUNT) \ - if (((ss) = cgroup_subsys[ssid]) && false) \ - break; \ - else + break; \ + } \ + for_each_set_bit(ssid, &__ss_mask, CGROUP_SUBSYS_COUNT) { \ + (ss) = cgroup_subsys[ssid]; \ + { + +#define while_each_subsys_mask() \ + } \ + } \ +} while (false) /* iterate across the hierarchies */ #define for_each_root(root)\ @@ -1284,8 +1290,9 @@ static unsigned long cgroup_calc_subtree_ss_mask(struct cgroup *cgrp, while (true) { unsigned long new_ss_mask = cur_ss_mask; - for_each_subsys_which(ss, ssid, &cur_ss_mask) + do_each_subsys_mask(ss, ssid, cur_ss_mask) { new_ss_mask |= ss->depends_on; + } while_each_subsys_mask(); /* * Mask out subsystems which aren't available. 
This can @@ -1469,7 +1476,7 @@ static int rebind_subsystems(struct cgroup_root *dst_root, lockdep_assert_held(&cgroup_mutex); - for_each_subsys_which(ss, ssid, &ss_mask) { + do_each_subsys_mask(ss, ssid, ss_mask) { /* if @ss has non-root csses attached to it, can't move */ if (css_next_child(NULL, cgroup_css(&ss->root->cgrp, ss))) return -EBUSY; @@ -1477,14 +1484,14 @@ static int rebind_subsystems(struct cgroup_root *dst_root, /* can't move between two non-dummy roots either */ if (ss->root != &cgrp_dfl_root && dst_root != &cgrp_dfl_root) return -EBUSY; - } + } while_each_subsys_mask(); /* skip creating root files on dfl_root for inhibited subsystems */ tmp_ss_mask = ss_mask; if (dst_root == &cgrp_dfl_root) tmp_ss_mask &= ~cgrp_dfl_root_inhibit_ss_mask; - for_each_subsys_which(ss, ssid, &tmp_ss_mask) { + do_each_subsys_mask(ss, ssid, tmp_ss_mask) { struct cgroup *scgrp = &ss->root->cgrp; int tssid; @@ -1507,19 +1514,19 @@ static int rebind_subsystems(struct cgroup_root *dst_root, continue; } - for_each_subsys_which(ss, tssid, &tmp_ss_mask) { + do_each_subsys_mask(ss, tssid, tmp_ss_mask) { if (tssid == ssid) break; css_clear_dir(cgroup_css(scgrp, ss), dcgrp); - } + } while_each_subsys_mask(); return ret; - } + } while_each_subsys_mask(); /* * Nothing can fail from this point on. Remove f
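The do/while macro pattern introduced above can be mimicked in plain C. Below is a hedged userspace sketch (macro and names are invented stand-ins, not the kernel's do_each_subsys_mask()/while_each_subsys_mask()) showing why copying the mask into a block-local variable drops the l-value requirement: the caller may now pass any integer expression instead of taking the address of an unsigned long.

```c
#include <assert.h>

#define NSUBSYS 4

static const char *subsys_name[NSUBSYS] = { "cpu", "memory", "io", "pids" };

/*
 * Simplified stand-ins for the kernel pair: the mask argument is copied
 * into a block-local __mask, so any integer expression works (the old
 * for_each_subsys_which() needed &mask, i.e. an unsigned long l-value).
 */
#define do_each_name_mask(name, id, mask) do {				\
	unsigned long __mask = (mask);					\
	for ((id) = 0; (id) < NSUBSYS; (id)++) {			\
		if (!(__mask & (1UL << (id))))				\
			continue;					\
		(name) = subsys_name[(id)];				\
		{

#define while_each_name_mask()						\
		}							\
	}								\
} while (0)

/* Count selected names; note the mask is an arbitrary expression here. */
int count_named(unsigned int mask)
{
	const char *name;
	int id, n = 0;

	do_each_name_mask(name, id, mask & 0x7u) {
		if (name)
			n++;
	} while_each_name_mask();
	return n;
}
```

Expanding the macros by hand shows the braces balance: `do {` and the two opening braces from the iterator are closed, in order, by `while_each_name_mask()`.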
Re: [RFC/RFT][PATCH 1/1] cpufreq: New governor using utilization data from the scheduler
On Mon, Feb 22, 2016 at 3:16 PM, Juri Lelli wrote: > Hi Rafael, Hi, > thanks for this RFC. I'm going to test it more in the next few days, but > I already have some questions from skimming through it. Please find them > inline below. > > On 22/02/16 00:18, Rafael J. Wysocki wrote: >> From: Rafael J. Wysocki >> >> Add a new cpufreq scaling governor, called "schedutil", that uses >> scheduler-provided CPU utilization information as input for making >> its decisions. >> > > I guess the first (macro) question is why did you decide to go with a > complete new governor, where new here is w.r.t. the sched-freq solution. Probably the most comprehensive answer to this question is my intro message: http://marc.info/?l=linux-pm&m=145609673008122&w=2 The executive summary is probably that this was the most straightforward way to use the scheduler-provided numbers in cpufreq that I could think about. > AFAICT, it is true that your solution directly builds on top of the > latest changes to cpufreq core and governor, but it also seems to have > more than a few points in common with sched-freq, That surely isn't a drawback, is it? If two people come to the same conclusions in different ways, that's an indication that the conclusions may actually be correct. > and sched-freq has been discussed and evaluated for already quite some time. Yes, it has. Does this mean that no one is allowed to try any alternatives to it now? > Also, it appears to me that they both shares (or they might encounter in the > future as development progresses) the same kind of problems, like for > example the fact that we can't trigger opp changes from scheduler > context ATM. "Give them a finger and they will ask for the hand." If you read my intro message linked above, you'll find a paragraph or two about that in it. And the short summary is that I have a plan to actually implement that feature in the schedutil governor at least for the ACPI cpufreq driver. It shouldn't be too difficult to do either AFAICS. 
So it is not "we can't", but rather "we haven't implemented that yet" in this particular case. I may not be able to do that in the next few days, as I have other things to do too, but you may expect to see that done at one point. So it's not a fundamental issue or anything, it's just that I haven't done that *yet* at this point, OK? > Don't get me wrong. I think that looking at different ways to solve a > problem is always beneficial, since I guess that the goal in the end is > to come up with something that suits everybody's needs. Precisely. > I was only curious about your thoughts on sched-freq. But we can also wait > for the > next RFC from Steve for this macro question. :-) Right, but I have some thoughts anyway. My goal, that may be quite different from yours, is to reduce the cpufreq's overhead as much as I possibly can. If I have to change the way it drives the CPU frequency selection to achieve that goal, I will do that, but if that can stay the way it is, that's fine too. Some progress has been made already here: we have dealt with the timers for good now I think. This patch deals with the overhead associated with the load tracking carried by "traditional" cpufreq governors and with a couple of questionable things done by "ondemand" in addition to that (which is one of the reasons why I didn't want to modify "ondemand" itself for now). The next step will be to teach the governor and the ACPI driver to switch CPU frequencies in the scheduler context, without spawning extra work items etc. Finally, the sampling should go away and that's where I want it to be. I just don't want to run extra code when that's not necessary and I want things to stay simple when that's as good as it can get. If sched-freq can pull that off for me, that's fine, but can it really? > [...] 
> >> +static void sugov_update_commit(struct policy_dbs_info *policy_dbs, u64 >> time, >> + unsigned int next_freq) >> +{ >> + struct sugov_policy *sg_policy = to_sg_policy(policy_dbs); >> + >> + sg_policy->next_freq = next_freq; >> + policy_dbs->last_sample_time = time; >> + policy_dbs->work_in_progress = true; >> + irq_work_queue(&policy_dbs->irq_work); > > Here we basically use the system wq to be able to do the freq transition > in process context. CFS is probably fine with this, but don't you think > we might get into troubles when, in the future, we will want to service > RT/DL requests more properly and they will end up being serviced > together with all the others wq users and at !RT priority? That may be regarded as a problem, but I'm not sure why you're talking about it in the context of this particular patch. That problem has been there forever in cpufreq: in theory RT tasks may stall frequency changes indefinitely. Is the problem real, though? Suppose that that actually happens and there are RT tasks effectively stalling frequency updates. In that case some other important activ
Re: 4.4-final: 28 bioset threads on small notebook
On Sun, Feb 21, 2016 at 05:40:59PM +0800, Ming Lei wrote: > On Sun, Feb 21, 2016 at 2:43 PM, Ming Lin-SSI wrote: > >>-Original Message- > > > > So it's almost already "per request_queue" > > Yes, that is because of the following line: > > q->bio_split = bioset_create(BIO_POOL_SIZE, 0); > > in blk_alloc_queue_node(). > > Looks like this bio_set doesn't need to be per-request_queue, and > now it is only used for fast-cloning bio for splitting, and one global > split bio_set should be enough. It does have to be per request queue for stacking block devices (which includes loopback).
Re: [PATCH 08/10] x86/xsaves: Fix PTRACE frames for XSAVES
On Mon, Feb 22, 2016 at 2:53 PM, Dave Hansen wrote: > On 02/22/2016 02:45 PM, Andy Lutomirski wrote: >>> +/* >>> > + * Convert from kernel XSAVES compacted format to standard format and >>> > copy >>> > + * to a ptrace buffer. It supports partial copy but pos always starts >>> > from >>> > + * zero. This is called from xstateregs_get() and there we check the cpu >>> > + * has XSAVES. >>> > + */ >>> > +int copyout_from_xsaves(unsigned int pos, unsigned int count, void *kbuf, >>> > + void __user *ubuf, const struct xregs_state >>> > *xsave) >> Now that you've written this code, can it be shared with the signal >> handling code? > > It could be. But the signal handler code has the advantage of already > having the data in the registers since it's running on its *own* FPU > state, so it can just call XSAVE(S) directly. > > This ptrace code *could* do a kernel_fpu_begin(), XRSTOR the user buffer > into the registers, XRSTOR the ptracee's system state in to the > registers, then XSAVES the whole thing to the kernel buffer, then > kernel_fpu_end(). > > Or, we could remove the signal handler's ability to XSAVE directly to > userspace. But it already *had* that and we know it works. Some day I kind of want to delete all this xsave/xrstor directly on user buffers code. I've never been thrilled with the concept, and it has messy (although AFAICT not presently buggy [1]) interactions with context switches, and it can't run with preemption disabled because it can take page faults. In the mean time, it's fine as far as I know, but maybe it would be cleaner if it used the software copy code. Or maybe we can change it later if a good reason shows up. [1] actually there's a minor bug in the 32-bit compat code that Borislav has a patch for. -- Andy Lutomirski AMA Capital Management, LLC
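The compacted-vs-standard layout distinction being discussed can be sketched as a toy model: in the compacted XSAVES layout, enabled components are packed back to back, while the standard layout places each component at a fixed architectural offset. The offsets and sizes below are invented for illustration only; the real ones come from CPUID, and the real kernel code also handles the legacy region and the XSAVE header.

```c
#include <assert.h>
#include <string.h>

#define NFEAT 3

/* Hypothetical per-feature layout; real values come from CPUID. */
static const unsigned int std_off[NFEAT]   = { 0, 64, 192 };
static const unsigned int feat_size[NFEAT] = { 64, 128, 64 };

/* Copy each enabled component from its packed slot to its fixed offset. */
void compacted_to_standard(unsigned char *dst, const unsigned char *src,
			   unsigned int mask)
{
	unsigned int i, src_off = 0;

	for (i = 0; i < NFEAT; i++) {
		if (!(mask & (1u << i)))
			continue;	/* disabled components are skipped */
		memcpy(dst + std_off[i], src + src_off, feat_size[i]);
		src_off += feat_size[i];	/* compacted: no gaps */
	}
}

/* Enable features 0 and 2 and verify where each lands. */
int demo_check(void)
{
	unsigned char src[256], dst[256] = { 0 };
	unsigned int i;

	for (i = 0; i < sizeof(src); i++)
		src[i] = (unsigned char)i;
	compacted_to_standard(dst, src, 0x5);	/* features 0 and 2 */
	/* feature 0: src[0..63] -> dst[0..63]; feature 2: src[64..127] -> dst[192..255] */
	return dst[0] == src[0] && dst[63] == src[63] &&
	       dst[192] == src[64] && dst[255] == src[127] && dst[64] == 0;
}
```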
[PATCH 1/8] tty: n_gsm: fix formatting errors
Minor formatting changes to remove errors and reduce number of warnings produced by checkpatch.pl script. Signed-off-by: Andrej Krpic --- drivers/tty/n_gsm.c | 138 1 file changed, 95 insertions(+), 43 deletions(-) diff --git a/drivers/tty/n_gsm.c b/drivers/tty/n_gsm.c index c016207..cc3b374 100644 --- a/drivers/tty/n_gsm.c +++ b/drivers/tty/n_gsm.c @@ -431,6 +431,7 @@ static int gsm_read_ea(unsigned int *val, u8 c) static u8 gsm_encode_modem(const struct gsm_dlci *dlci) { u8 modembits = 0; + /* FC is true flow control not modem bits */ if (dlci->throttled) modembits |= MDM_FC; @@ -489,7 +490,8 @@ static void gsm_print_packet(const char *hdr, int addr, int cr, if (!(control & 0x01)) { pr_cont("I N(S)%d N(R)%d", (control & 0x0E) >> 1, (control & 0xE0) >> 5); - } else switch (control & 0x0F) { + } else + switch (control & 0x0F) { case RR: pr_cont("RR(%d)", (control & 0xE0) >> 5); break; @@ -511,6 +513,7 @@ static void gsm_print_packet(const char *hdr, int addr, int cr, if (dlen) { int ct = 0; + while (dlen--) { if (ct % 8 == 0) { pr_cont("\n"); @@ -542,6 +545,7 @@ static void gsm_print_packet(const char *hdr, int addr, int cr, static int gsm_stuff_frame(const u8 *input, u8 *output, int len) { int olen = 0; + while (len--) { if (*input == GSM1_SOF || *input == GSM1_ESCAPE || *input == XON || *input == XOFF) { @@ -695,7 +699,7 @@ static void gsm_data_kick(struct gsm_mux *gsm) len += 2; } else { gsm->txframe[0] = GSM0_SOF; - memcpy(gsm->txframe + 1 , msg->data, msg->len); + memcpy(gsm->txframe + 1, msg->data, msg->len); gsm->txframe[msg->len + 1] = GSM0_SOF; len = msg->len + 2; } @@ -711,7 +715,8 @@ static void gsm_data_kick(struct gsm_mux *gsm) /* FIXME: Can eliminate one SOF in many more cases */ gsm->tx_bytes -= msg->len; /* For a burst of frames skip the extra SOF within the - burst */ +* burst +*/ skip_sof = 1; list_del(&msg->list); @@ -750,7 +755,7 @@ static void __gsm_data_queue(struct gsm_dlci *dlci, struct gsm_msg *msg) *--dp = (msg->addr << 2) | 2 | EA; else 
*--dp = (msg->addr << 2) | EA; - *fcs = gsm_fcs_add_block(INIT_FCS, dp , msg->data - dp); + *fcs = gsm_fcs_add_block(INIT_FCS, dp, msg->data - dp); /* Ugly protocol layering violation */ if (msg->ctrl == UI || msg->ctrl == (UI|PF)) *fcs = gsm_fcs_add_block(*fcs, msg->data, msg->len); @@ -760,7 +765,8 @@ static void __gsm_data_queue(struct gsm_dlci *dlci, struct gsm_msg *msg) msg->data, msg->len); /* Move the header back and adjust the length, also allow for the FCS - now tacked on the end */ +* now tacked on the end +*/ msg->len += (msg->data - dp) + 1; msg->data = dp; @@ -783,6 +789,7 @@ static void __gsm_data_queue(struct gsm_dlci *dlci, struct gsm_msg *msg) static void gsm_data_queue(struct gsm_dlci *dlci, struct gsm_msg *msg) { unsigned long flags; + spin_lock_irqsave(&dlci->gsm->tx_lock, flags); __gsm_data_queue(dlci, msg); spin_unlock_irqrestore(&dlci->gsm->tx_lock, flags); @@ -821,7 +828,8 @@ static int gsm_dlci_data_output(struct gsm_mux *gsm, struct gsm_dlci *dlci) msg = gsm_data_alloc(gsm, dlci->addr, size, gsm->ftype); /* FIXME: need a timer or something to kick this so it can't - get stuck with no work outstanding and no buffer free */ +* get stuck with no work outstanding and no buffer free +*/ if (msg == NULL) return -ENOMEM; dp = msg->data; @@ -829,11 +837,13 @@ static int gsm_dlci_data_output(struct gsm_mux *gsm, struct gsm_dlci *dlci) case 1: /* Unstructured */ break; case 2: /* Unstructed with modem bits. - Always one byte as we never send inline break data */ +* Always one byte as we never send inline break data +*/ *dp++ = gsm_encode_modem(dlci); break; } - WARN_ON(kfifo_out_locked(dlci->fifo, dp , len, &dlci->lock) !=
[PATCH 3/8] tty: n_gsm: make mux work as a responder station
The comment suggests that cr == 1 represents a received command and cr == 0 a received response. Received frames are then filtered:

 - correctly, by rejecting SABM and DISC responses (they are command-only frame types), and
 - incorrectly, by rejecting UA (a response-only frame type) responses.

A mux acting as an initiator successfully establishes a DLC by receiving a UA response frame to a previously sent open channel command (SABM). The incorrect equation (eqA) happens to make the UA "reject cr == 0" case behave correctly, but it filters out all received SABM and DISC command frames. Change eqA to eqB to match the intent and fix the filtering of UA frames. This enables reception of SABM and DISC frames and consequently makes the mux work as a responder station.

 received   receiving as     eqA            eqB        3GPP TS 27.010
 CR bit     initiator (ir)   cr=CR^(1-ir)   cr=CR^ir   5.2.1.2
 0          0                1              0          0 (response)
 1          0           _\   0              1          1 (command)
 0          1            /   0              1          1 (command)
 1          1                1              0          0 (response)

Signed-off-by: Andrej Krpic --- drivers/tty/n_gsm.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/tty/n_gsm.c b/drivers/tty/n_gsm.c index a0fb92c..05b562d 100644 --- a/drivers/tty/n_gsm.c +++ b/drivers/tty/n_gsm.c @@ -1798,7 +1798,7 @@ static void gsm_queue(struct gsm_mux *gsm) gsm_print_packet("<--", address, cr, gsm->control, gsm->buf, gsm->len); - cr ^= 1 - gsm->initiator; /* Flip so 1 always means command */ + cr ^= gsm->initiator; /* Flip so 1 always means command */ dlci = gsm->dlci[address]; switch (gsm->control) { @@ -1829,7 +1829,7 @@ static void gsm_queue(struct gsm_mux *gsm) break; case UA: case UA|PF: - if (cr == 0 || dlci == NULL) + if (cr || dlci == NULL) break; switch (dlci->state) { case DLCI_CLOSING: -- 2.7.0
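The commit message's truth table can be checked mechanically. A small hedged sketch (function names are illustrative, not the driver's) comparing the fixed equation eqB against the old eqA:

```c
#include <assert.h>

/* Fixed normalization (eqB): afterwards, cr == 1 always means command. */
int normalize_cr(int received_cr, int initiator)
{
	return received_cr ^ initiator;
}

/* The old, incorrect normalization (eqA), kept for comparison. */
int normalize_cr_old(int received_cr, int initiator)
{
	return received_cr ^ (1 - initiator);
}
```

With eqB, an initiator receiving CR=0 (a command per 27.010 5.2.1.2) gets cr == 1; with eqA it got cr == 0 and the frame was mislabeled as a response.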
[PATCH 2/8] tty: n_gsm: fix C/R bit when sending as a responder
According to the specification (3GPP TS 27.010 v12.0.0 R1, 5.2.1.2), the C/R bit must be the same for a corresponding command and response. If the mux is an initiator (the station that initiates DLC 0), valid sent commands and received responses must have the C/R bit set to 1. For a station that is a responder, valid sent commands and received responses must have the C/R bit set to 0. Change the value of the C/R bit in command and response frames to depend on whether the station is an initiator or not. Signed-off-by: Andrej Krpic --- drivers/tty/n_gsm.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/tty/n_gsm.c b/drivers/tty/n_gsm.c index cc3b374..a0fb92c 100644 --- a/drivers/tty/n_gsm.c +++ b/drivers/tty/n_gsm.c @@ -622,7 +622,7 @@ static void gsm_send(struct gsm_mux *gsm, int addr, int cr, int control) static inline void gsm_response(struct gsm_mux *gsm, int addr, int control) { - gsm_send(gsm, addr, 0, control); + gsm_send(gsm, addr, gsm->initiator ? 0 : 1, control); } /** @@ -636,7 +636,7 @@ static inline void gsm_response(struct gsm_mux *gsm, int addr, int control) static inline void gsm_command(struct gsm_mux *gsm, int addr, int control) { - gsm_send(gsm, addr, 1, control); + gsm_send(gsm, addr, gsm->initiator ? 1 : 0, control); } /* Data transmission */ -- 2.7.0
[PATCH 7/8] tty: n_gsm: properly format Modem Status Command message
Change the format of the Modem Status Command (MSC) message that is sent to match the one expected in the receive function gsm_control_modem() and specified in 3GPP TS 27.010 version 12.0.0 Release 12, 5.4.6.3.7. A wrongly formatted MSC causes the DLC to be marked as constipated. The bug only becomes apparent once the format of transmitted control messages is fixed and control messages start to be recognized. Signed-off-by: Andrej Krpic --- drivers/tty/n_gsm.c | 9 - 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/drivers/tty/n_gsm.c b/drivers/tty/n_gsm.c index 8aa90e0..b0d9edd 100644 --- a/drivers/tty/n_gsm.c +++ b/drivers/tty/n_gsm.c @@ -2874,12 +2874,11 @@ static int gsmtty_modem_update(struct gsm_dlci *dlci, u8 brk) if (brk) len++; - modembits[0] = len << 1 | EA; /* Data bytes */ - modembits[1] = dlci->addr << 2 | 3; /* DLCI, EA, 1 */ - modembits[2] = gsm_encode_modem(dlci) << 1 | EA; + modembits[0] = dlci->addr << 2 | 3; /* DLCI, EA, 1 */ + modembits[1] = gsm_encode_modem(dlci) << 1 | EA; if (brk) - modembits[3] = brk << 4 | 2 | EA; /* Valid, EA */ - ctrl = gsm_control_send(dlci->gsm, CMD_MSC, modembits, len + 1); + modembits[2] = brk << 4 | 2 | EA; /* Valid, EA */ + ctrl = gsm_control_send(dlci->gsm, CMD_MSC, modembits, len); if (ctrl == NULL) return -ENOMEM; return gsm_control_wait(dlci->gsm, ctrl); -- 2.7.0
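For reference, a hedged sketch of the MSC value octets in the fixed layout, mirroring the order the patch produces (the helper name and test values are illustrative, not the driver's): the value starts directly with the DLCI address octet, since the length octet is now prepended by the control-channel transmit path rather than embedded in the value.

```c
#include <assert.h>

#define EA 0x01	/* extension bit: set in the last octet of a field */

/* Build the MSC value octets: address, modem bits, optional break. */
int encode_msc_value(unsigned char *out, unsigned int dlci_addr,
		     unsigned char modem, int brk)
{
	int len = 2;

	out[0] = (dlci_addr << 2) | 3;		/* DLCI, C/R, EA */
	out[1] = (modem << 1) | EA;		/* modem status bits */
	if (brk) {
		out[2] = ((unsigned char)brk << 4) | 2 | EA; /* break, valid */
		len++;
	}
	return len;
}

/* Encode with and without a break signal and check each octet. */
int msc_demo(void)
{
	unsigned char out[3];

	if (encode_msc_value(out, 1, 0x0D, 0) != 2 ||
	    out[0] != 0x07 || out[1] != 0x1B)
		return 0;
	if (encode_msc_value(out, 1, 0x0D, 1) != 3 || out[2] != 0x13)
		return 0;
	return 1;
}
```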
[PATCH 6/8] tty: n_gsm: add missing length field in control channel commands
Observing debug output while running an initiator and a responder using n_gsm shows that all control channel commands sent by the initiator are malformed - they don't include a length field (3GPP TS 07.10 ver 7.2.0, 5.4.6.1). Add the length field to transmitted control channel commands in gsm_control_transmit(), as is done in gsm_control_reply() and expected in gsm_dlci_command(). Signed-off-by: Andrej Krpic --- drivers/tty/n_gsm.c | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/drivers/tty/n_gsm.c b/drivers/tty/n_gsm.c index 3c4c521..8aa90e0 100644 --- a/drivers/tty/n_gsm.c +++ b/drivers/tty/n_gsm.c @@ -1320,12 +1320,13 @@ static void gsm_control_response(struct gsm_mux *gsm, unsigned int command, static void gsm_control_transmit(struct gsm_mux *gsm, struct gsm_control *ctrl) { - struct gsm_msg *msg = gsm_data_alloc(gsm, 0, ctrl->len + 1, gsm->ftype); + struct gsm_msg *msg = gsm_data_alloc(gsm, 0, ctrl->len + 2, gsm->ftype); if (msg == NULL) return; - msg->data[0] = (ctrl->cmd << 1) | 2 | EA; /* command */ - memcpy(msg->data + 1, ctrl->data, ctrl->len); + msg->data[0] = (ctrl->cmd << 1) | 2 | EA; /* command */ + msg->data[1] = ((ctrl->len) << 1) | EA; + memcpy(msg->data + 2, ctrl->data, ctrl->len); gsm_data_queue(gsm->dlci[0], msg); } -- 2.7.0
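The resulting control-message layout can be sketched in a few lines of hedged userspace C (constants and the helper are illustrative, not the driver's; it assumes len < 128 so a single EA-encoded length octet suffices): a type octet with the command bit and EA set, then the length octet this patch adds, then the payload.

```c
#include <assert.h>
#include <string.h>

#define EA 0x01	/* extension bit: set in the last octet of a field */
#define CR 0x02	/* command/response bit in the type octet */

/* Encode a control-channel command: type, length, then payload. */
int encode_ctrl_cmd(unsigned char *out, unsigned int cmd,
		    const unsigned char *data, unsigned int len)
{
	out[0] = (cmd << 1) | CR | EA;	/* type octet: command */
	out[1] = (len << 1) | EA;	/* length octet (the missing field) */
	memcpy(out + 2, data, len);
	return (int)len + 2;
}

/* Encode an arbitrary command 7 with a one-byte payload and check it. */
int ctrl_demo(void)
{
	unsigned char buf[8];
	unsigned char payload[1] = { 0xAA };
	int n = encode_ctrl_cmd(buf, 7, payload, 1);

	return n == 3 && buf[0] == 0x0F && buf[1] == 0x03 && buf[2] == 0xAA;
}
```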
[PATCH 8/8] tty: n_gsm: Enable reception of frames separated with a single SOF marker
Frames may be separated with a single SOF (5.2.6.1 of 27.010 mux spec). While transmission of a single SOF between frames is implemented in gsm_data_kick, the reception isn't. As a side effect, it is now possible to receive and ignore a stream of consecutive SOFs (5.2.5 of 27.010 mux spec). Signed-off-by: Andrej Krpic --- drivers/tty/n_gsm.c | 18 ++ 1 file changed, 10 insertions(+), 8 deletions(-) diff --git a/drivers/tty/n_gsm.c b/drivers/tty/n_gsm.c index b0d9edd..12b149d 100644 --- a/drivers/tty/n_gsm.c +++ b/drivers/tty/n_gsm.c @@ -1895,9 +1895,14 @@ static void gsm0_receive(struct gsm_mux *gsm, unsigned char c) } break; case GSM_ADDRESS: /* Address EA */ - gsm->fcs = gsm_fcs_add(gsm->fcs, c); + /* Ignore (not first) GSM0_SOF as it decodes into +* reserved DLCI 62 (5.6 of the 27.010 mux spec). +*/ + if (c == GSM0_SOF) + break; if (gsm_read_ea(&gsm->address, c)) gsm->state = GSM_CONTROL; + gsm->fcs = gsm_fcs_add(gsm->fcs, c); break; case GSM_CONTROL: /* Control Byte */ gsm->fcs = gsm_fcs_add(gsm->fcs, c); @@ -1944,13 +1949,10 @@ static void gsm0_receive(struct gsm_mux *gsm, unsigned char c) case GSM_FCS: /* FCS follows the packet */ gsm->received_fcs = c; gsm_queue(gsm); - gsm->state = GSM_SSOF; - break; - case GSM_SSOF: - if (c == GSM0_SOF) { - gsm->state = GSM_SEARCH; - break; - } + /* Frames can be separated with a single GSM0_SOF. +* Deal with consecutive GSM0_SOFs in GSM_ADDRESS. +*/ + gsm->state = GSM_SEARCH; break; } } -- 2.7.0
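The address octet being skipped here uses the 27.010 EA (extension bit) encoding, as read by the driver's gsm_read_ea(): each octet contributes 7 value bits, and bit 0 set marks the last octet of the field. A hedged userspace copy of that decoding step, for illustration only:

```c
#include <assert.h>

/* One step of EA decoding; returns non-zero once the field is complete. */
int read_ea(unsigned int *val, unsigned char c)
{
	*val <<= 7;
	*val |= c >> 1;
	return c & 0x01;
}

/* Feed a byte stream through the decoder, returning the decoded value. */
unsigned int decode_ea(const unsigned char *buf, int len)
{
	unsigned int val = 0;
	int i;

	for (i = 0; i < len; i++)
		if (read_ea(&val, buf[i]))
			break;
	return val;
}

/* Decode a one-octet and a two-octet field and check both results. */
int ea_demo(void)
{
	unsigned char one[] = { 0x07 };		/* single octet: value 3 */
	unsigned char two[] = { 0x02, 0x03 };	/* two octets: (1 << 7) | 1 */

	return decode_ea(one, 1) == 3 && decode_ea(two, 2) == 129;
}
```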
[PATCH 0/8] tty: n_gsm: Make mux work as a responder station
When using n_gsm you have to explicitly set it to work as an initiator station. This led me to believe that it can also work as a responder. In a use case where I needed a responder-station mux, I came to the conclusion that it actually does not work.

The second and third patches fix handling of the frame C/R bit, which then enables the mux to work as a responder station. The next patches in the series then fix bugs that were found after two instances of n_gsm connected over a null-modem cable were used. The first patch fixes formatting errors, such as a space before a comma, and most of the warnings reported by the checkpatch script.

Andrej Krpic (8):
  tty: n_gsm: fix formatting errors
  tty: n_gsm: fix C/R bit when sending as a responder
  tty: n_gsm: make mux work as a responder station
  tty: n_gsm: send DM response when accessing an invalid channel
  tty: n_gsm: replace dead code with a meaningful comment
  tty: n_gsm: add missing length field in control channel commands
  tty: n_gsm: properly format Modem Status Command message
  tty: n_gsm: Enable reception of frames separated with a single SOF marker

 drivers/tty/n_gsm.c | 191 +---
 1 file changed, 123 insertions(+), 68 deletions(-)

--
2.7.0
[PATCH 5/8] tty: n_gsm: replace dead code with a meaningful comment
UI/UIH frame can be received as a command from other station or as a response to a command we issued earlier. Add this knowledge to the source code as a comment and remove useless #if 0/#endif block. Signed-off-by: Andrej Krpic --- drivers/tty/n_gsm.c | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/drivers/tty/n_gsm.c b/drivers/tty/n_gsm.c index 8551fa4..3c4c521 100644 --- a/drivers/tty/n_gsm.c +++ b/drivers/tty/n_gsm.c @@ -1852,10 +1852,11 @@ static void gsm_queue(struct gsm_mux *gsm) case UI|PF: case UIH: case UIH|PF: -#if 0 - if (cr) - goto invalid; -#endif + /* Don't reject frames based on cr value as UI/UIH +* frame can be received as a command from other +* station or as a response to a command we issued +* earlier. +*/ if (dlci == NULL || dlci->state != DLCI_OPEN) { gsm_response(gsm, address, DM|PF); return; -- 2.7.0
[PATCH 4/8] tty: n_gsm: send DM response when accessing an invalid channel
Fix the C/R bit in the response to a UI/UIH frame sent to a non-existing or closed channel. As the DM frame type is only valid as a response, it should be sent using the gsm_response() function. Signed-off-by: Andrej Krpic --- drivers/tty/n_gsm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/tty/n_gsm.c b/drivers/tty/n_gsm.c index 05b562d..8551fa4 100644 --- a/drivers/tty/n_gsm.c +++ b/drivers/tty/n_gsm.c @@ -1857,7 +1857,7 @@ static void gsm_queue(struct gsm_mux *gsm) goto invalid; #endif if (dlci == NULL || dlci->state != DLCI_OPEN) { - gsm_command(gsm, address, DM|PF); + gsm_response(gsm, address, DM|PF); return; } dlci->data(dlci, gsm->buf, gsm->len); -- 2.7.0
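For reference, the C/R rule these two patches revolve around can be sketched as a small helper. This is an illustrative model of the 3GPP TS 27.010 address octet, not code taken from n_gsm; the helper name and parameters are assumptions.

```c
#include <stdbool.h>

/*
 * Illustrative model of the TS 27.010 address octet: bit 0 is EA
 * (1 for a single-octet address), bit 1 is C/R, bits 2-7 carry the
 * DLCI.  Commands from the initiator and responses from the responder
 * carry C/R = 1; commands from the responder and responses from the
 * initiator carry C/R = 0.  So a frame's C/R value alone cannot tell
 * command from response without knowing which station sent it -- which
 * is why the "#if 0 ... if (cr) goto invalid" check removed in patch
 * 5/8 was never a valid test.
 */
static unsigned char gsm_address_octet(unsigned int dlci, bool initiator,
				       bool command)
{
	bool cr = (initiator == command);

	return (unsigned char)((dlci << 2) | (cr ? 0x02 : 0x00) | 0x01);
}
```

Under this rule a DM sent by the mux as a responder (patch 4/8) must carry the responder's response C/R value, which is exactly what switching from gsm_command() to gsm_response() achieves.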
Re: [PATCH 08/10] x86/xsaves: Fix PTRACE frames for XSAVES
On 02/22/2016 02:45 PM, Andy Lutomirski wrote: >> +/* >> > + * Convert from kernel XSAVES compacted format to standard format and copy >> > + * to a ptrace buffer. It supports partial copy but pos always starts from >> > + * zero. This is called from xstateregs_get() and there we check the cpu >> > + * has XSAVES. >> > + */ >> > +int copyout_from_xsaves(unsigned int pos, unsigned int count, void *kbuf, >> > + void __user *ubuf, const struct xregs_state *xsave) > Now that you've written this code, can it be shared with the signal > handling code? It could be. But the signal handler code has the advantage of already having the data in the registers since it's running on its *own* FPU state, so it can just call XSAVE(S) directly. This ptrace code *could* do a kernel_fpu_begin(), XRSTOR the user buffer into the registers, XRSTOR the ptracee's system state into the registers, then XSAVES the whole thing to the kernel buffer, then kernel_fpu_end(). Or, we could remove the signal handler's ability to XSAVE directly to userspace. But it already *had* that and we know it works.
Re: [PATCH 08/10] x86/xsaves: Fix PTRACE frames for XSAVES
On Mon, Feb 22, 2016 at 02:45:54PM -0800, Andy Lutomirski wrote: > > +/* > > + * Convert from kernel XSAVES compacted format to standard format and copy > > + * to a ptrace buffer. It supports partial copy but pos always starts from > > + * zero. This is called from xstateregs_get() and there we check the cpu > > + * has XSAVES. > > + */ > > +int copyout_from_xsaves(unsigned int pos, unsigned int count, void *kbuf, > > + void __user *ubuf, const struct xregs_state *xsave) > > Now that you've written this code, can it be shared with the signal > handling code? > For signal handling, we most likely save registers directly to memory. But for ptrace, the thread being debugged is not the active thread. Please let me think about it more. --Yu-cheng
Re: [PATCH 08/10] x86/xsaves: Fix PTRACE frames for XSAVES
On Mon, Feb 22, 2016 at 11:00 AM, Yu-cheng Yu wrote: > XSAVES uses compacted format and is a kernel instruction. The kernel > should use standard-format, non-supervisor state data for PTRACE. > > +/* > + * Convert from kernel XSAVES compacted format to standard format and copy > + * to a ptrace buffer. It supports partial copy but pos always starts from > + * zero. This is called from xstateregs_get() and there we check the cpu > + * has XSAVES. > + */ > +int copyout_from_xsaves(unsigned int pos, unsigned int count, void *kbuf, > + void __user *ubuf, const struct xregs_state *xsave) Now that you've written this code, can it be shared with the signal handling code? --Andy
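The conversion under discussion can be illustrated with a toy standalone model. This is not the real copyout_from_xsaves() logic — real xstate features have per-feature sizes and offsets reported by CPUID, plus a legacy region and xstate header — but it shows the compacted-to-standard relocation the thread is about, with fixed-size fake features.

```c
#include <stdint.h>
#include <string.h>

/*
 * Toy illustration: XSAVES stores only the enabled features back to
 * back ("compacted" format), while ptrace/user space expects each
 * feature at its fixed architectural offset ("standard" format), with
 * absent features reading as zero.  Three fake 8-byte features.
 */
#define NFEAT 3
#define FSIZE 8

static void compacted_to_standard(const uint8_t *src, unsigned int enabled,
				  uint8_t *dst)
{
	unsigned int src_off = 0;
	int i;

	memset(dst, 0, NFEAT * FSIZE);	/* disabled features read as zero */
	for (i = 0; i < NFEAT; i++) {
		if (!(enabled & (1u << i)))
			continue;	/* absent from the compacted image */
		memcpy(dst + i * FSIZE, src + src_off, FSIZE);
		src_off += FSIZE;	/* compacted data is contiguous */
	}
}

/* Demo: only features 0 and 2 enabled (mask 0b101). */
static int demo_copyout(void)
{
	uint8_t src[2 * FSIZE], dst[NFEAT * FSIZE];

	memset(src, 0xAA, FSIZE);		/* feature 0 payload */
	memset(src + FSIZE, 0xBB, FSIZE);	/* feature 2 payload */
	compacted_to_standard(src, 0x5, dst);
	return dst[0] == 0xAA && dst[FSIZE] == 0x00 && dst[2 * FSIZE] == 0xBB;
}
```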
Re: [PATCH 04/10] x86/xsaves: Introduce a new check that allows correct xstates copy from kernel to user directly
On Mon, Feb 22, 2016 at 10:58 AM, Yu-cheng Yu wrote: > XSAVES is a kernel instruction and uses a compacted format. When > working with user space, the kernel should provide standard-format, > non-supervisor state data. We cannot do __copy_to_user() from a compacted- > format kernel xstate area to a signal frame. > > Note that the path to copy_fpstate_to_sigframe() does currently check if > the thread has used FPU, but add a WARN_ONCE() there to detect any > potential mis-use. > > Dave Hansen proposed this method to simplify copying xstate directly to user space. > > Signed-off-by: Fenghua Yu > Signed-off-by: Yu-cheng Yu > --- > arch/x86/kernel/fpu/signal.c | 41 - > 1 file changed, 40 insertions(+), 1 deletion(-) > > diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c > index 0fbf60c..7676718 100644 > --- a/arch/x86/kernel/fpu/signal.c > +++ b/arch/x86/kernel/fpu/signal.c > @@ -130,6 +130,45 @@ static inline int copy_fpregs_to_sigframe(struct > xregs_state __user *buf) > return err; > } > > +static int should_save_registers_directly(void) I don't like the name of this function because: > +{ > + /* > +* In signal handling path, the kernel already checks if > +* FPU instructions have been used before it calls > +* copy_fpstate_to_sigframe(). We check this here again > +* to detect any potential mis-use and saving invalid > +* register values directly to a signal frame. > +*/ > + WARN_ONCE(!current->thread.fpu.fpstate_active, > + "direct FPU save with no math use\n"); ... Here "direct" seems to mean that we're asking whether to directly save ... > + > + /* > +* In the case that we are using a compacted kernel > +* xsave area, we can not copy the thread.fpu.state > +* directly to userspace and *must* save it from the > +* registers directly. > +*/ ... and here "directly" means *both* copying directly to userspace and saving using xsave directly. So can you rename it to something with an obvious meaning like "may_memcpy_fpu_regs" or similar? --Andy
Re: [PATCH 9/9] ARM: mmp: mark usb_dma_mask as __maybe_unused
Arnd Bergmann writes: > This variable may be used by some devices that each have their > own Kconfig symbol, or by none of them, and that causes a build > warning: > > arch/arm/mach-mmp/devices.c:241:12: error: 'usb_dma_mask' defined but not > used [-Werror=unused-variable] > > Marking it __maybe_unused avoids the warning. > > Signed-off-by: Arnd Bergmann > --- > arch/arm/mach-mmp/devices.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) Hi Haojian, I still haven't gotten an ack on this one. Do you want to carry it through your tree, let Arnd carry it, or let me carry it? Cheers. -- Robert [1] The patch > diff --git a/arch/arm/mach-mmp/devices.c b/arch/arm/mach-mmp/devices.c > index 3330ac7cfbef..671c7a09ab3d 100644 > --- a/arch/arm/mach-mmp/devices.c > +++ b/arch/arm/mach-mmp/devices.c > @@ -238,7 +238,7 @@ void pxa_usb_phy_deinit(void __iomem *phy_reg) > #endif > > #if IS_ENABLED(CONFIG_USB_SUPPORT) > -static u64 usb_dma_mask = ~(u32)0; > +static u64 __maybe_unused usb_dma_mask = ~(u32)0; > > #if IS_ENABLED(CONFIG_USB_MV_UDC) > struct resource pxa168_u2o_resources[] = {
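The effect of the attribute is easy to demonstrate outside the kernel tree. This standalone sketch spells __maybe_unused out as the GCC attribute the kernel macro expands to; the config symbol is hypothetical, not the mach-mmp code.

```c
/*
 * Sketch of what __maybe_unused buys: with -Werror=unused-variable, a
 * file-scope object referenced only under config-dependent #ifdef
 * branches breaks the build when none of those branches is enabled.
 * The attribute tells the compiler the object may legitimately go
 * unused.  In the kernel, __maybe_unused comes from
 * <linux/compiler_attributes.h>; here it is written out directly.
 */
#define __maybe_unused __attribute__((unused))

static unsigned long long __maybe_unused usb_dma_mask = ~(unsigned int)0;

#if defined(CONFIG_USB_EXAMPLE)	/* hypothetical config symbol */
unsigned long long get_usb_dma_mask(void)
{
	return usb_dma_mask;	/* only user, compiled out when disabled */
}
#endif
```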
[PATCH 01/24] staging: lustre: Dynamic LNet Configuration (DLC) IOCTL changes
From: Amir Shehata This is the fourth patch of a set of patches that enables DLC. This patch changes the IOCTL infrastructure in preparation for adding extra IOCTL communication between user and kernel space. The changes include: - adding a common header to be passed to ioctl infra functions instead of passing an exact structure. This header is meant to be included in all structures to be passed through that interface. The IOCTL handler casts this header to the particular type that it expects. - All sanity testing on the passed-in structure is performed in the generic ioctl infrastructure code. - All ioctl handlers changed to take the header instead of a particular structure type. Signed-off-by: Amir Shehata Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-2456 Reviewed-on: http://review.whamcloud.com/8021 Reviewed-by: Doug Oucharek Reviewed-by: James Simmons Reviewed-by: John L. Hammond Reviewed-by: Oleg Drokin --- .../lustre/include/linux/libcfs/libcfs_ioctl.h | 23 drivers/staging/lustre/lnet/lnet/module.c |4 +- drivers/staging/lustre/lnet/selftest/conctl.c |9 +++- drivers/staging/lustre/lnet/selftest/console.c |2 +- drivers/staging/lustre/lnet/selftest/console.h |1 - .../lustre/lustre/libcfs/linux/linux-module.c | 54 drivers/staging/lustre/lustre/libcfs/module.c | 51 +- 7 files changed, 80 insertions(+), 64 deletions(-) diff --git a/drivers/staging/lustre/include/linux/libcfs/libcfs_ioctl.h b/drivers/staging/lustre/include/linux/libcfs/libcfs_ioctl.h index e4463ad..0598702 100644 --- a/drivers/staging/lustre/include/linux/libcfs/libcfs_ioctl.h +++ b/drivers/staging/lustre/include/linux/libcfs/libcfs_ioctl.h @@ -43,9 +43,13 @@ #define LIBCFS_IOCTL_VERSION 0x0001000a -struct libcfs_ioctl_data { +struct libcfs_ioctl_hdr { __u32 ioc_len; __u32 ioc_version; +}; + +struct libcfs_ioctl_data { + struct libcfs_ioctl_hdr ioc_hdr; __u64 ioc_nid; __u64 ioc_u64[1]; @@ -70,11 +74,6 @@ struct libcfs_ioctl_data { #define ioc_priority ioc_u32[0] -struct libcfs_ioctl_hdr { - __u32
ioc_len; - __u32 ioc_version; -}; - struct libcfs_debug_ioctl_data { struct libcfs_ioctl_hdr hdr; unsigned int subs; @@ -90,7 +89,7 @@ do { \ struct libcfs_ioctl_handler { struct list_head item; - int (*handle_ioctl)(unsigned int cmd, struct libcfs_ioctl_data *data); + int (*handle_ioctl)(unsigned int cmd, struct libcfs_ioctl_hdr *hdr); }; #define DECLARE_IOCTL_HANDLER(ident, func) \ @@ -148,9 +147,9 @@ static inline int libcfs_ioctl_packlen(struct libcfs_ioctl_data *data) return len; } -static inline int libcfs_ioctl_is_invalid(struct libcfs_ioctl_data *data) +static inline bool libcfs_ioctl_is_invalid(struct libcfs_ioctl_data *data) { - if (data->ioc_len > (1<<30)) { + if (data->ioc_hdr.ioc_len > (1 << 30)) { CERROR("LIBCFS ioctl: ioc_len larger than 1<<30\n"); return 1; } @@ -186,7 +185,7 @@ static inline int libcfs_ioctl_is_invalid(struct libcfs_ioctl_data *data) CERROR("LIBCFS ioctl: plen2 nonzero but no pbuf2 pointer\n"); return 1; } - if ((__u32)libcfs_ioctl_packlen(data) != data->ioc_len) { + if ((__u32)libcfs_ioctl_packlen(data) != data->ioc_hdr.ioc_len) { CERROR("LIBCFS ioctl: packlen != ioc_len\n"); return 1; } @@ -206,7 +205,9 @@ static inline int libcfs_ioctl_is_invalid(struct libcfs_ioctl_data *data) int libcfs_register_ioctl(struct libcfs_ioctl_handler *hand); int libcfs_deregister_ioctl(struct libcfs_ioctl_handler *hand); -int libcfs_ioctl_getdata(char *buf, char *end, void __user *arg); +int libcfs_ioctl_getdata_len(const struct libcfs_ioctl_hdr __user *arg, +__u32 *buf_len); int libcfs_ioctl_popdata(void __user *arg, void *buf, int size); +int libcfs_ioctl_data_adjust(struct libcfs_ioctl_data *data); #endif /* __LIBCFS_IOCTL_H__ */ diff --git a/drivers/staging/lustre/lnet/lnet/module.c b/drivers/staging/lustre/lnet/lnet/module.c index cd37303..46f5241 100644 --- a/drivers/staging/lustre/lnet/lnet/module.c +++ b/drivers/staging/lustre/lnet/lnet/module.c @@ -84,7 +84,7 @@ lnet_unconfigure(void) } static int -lnet_ioctl(unsigned int cmd, struct 
libcfs_ioctl_data *data) +lnet_ioctl(unsigned int cmd, struct libcfs_ioctl_hdr *hdr) { int rc; @@ -103,7 +103,7 @@ lnet_ioctl(unsigned int cmd, struct libcfs_ioctl_data *data) */ rc = LNetNIInit(LNET_PID_ANY); if (rc >= 0) { - rc = LNetCtl(cmd, data); + rc = LNetCtl(cmd, hdr); LNetNIFini(); } return rc; diff --git a/drivers/staging/lustre/l
[PATCH 03/24] staging: lustre: fix crash due to NULL networks string
From: Amir Shehata If there is an invalid 'networks' or 'ip2nets' string, lnet_parse_networks() gets called with a NULL 'networks' string parameter. lnet_parse_networks() needs to sanitize its input string now that it's being called from multiple places, so check for a NULL string every time the function is called; this reduces the probability of errors from other code modifications. Signed-off-by: Amir Shehata Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5540 Reviewed-on: http://review.whamcloud.com/11626 Reviewed-by: Isaac Huang Reviewed-by: Doug Oucharek Reviewed-by: Oleg Drokin --- drivers/staging/lustre/lnet/lnet/api-ni.c |5 + drivers/staging/lustre/lnet/lnet/config.c |9 - 2 files changed, 9 insertions(+), 5 deletions(-) diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c index b2b914a..c68d01e 100644 --- a/drivers/staging/lustre/lnet/lnet/api-ni.c +++ b/drivers/staging/lustre/lnet/lnet/api-ni.c @@ -1535,7 +1535,6 @@ LNetNIInit(lnet_pid_t requested_pid) lnet_ping_info_t *pinfo; lnet_handle_md_t md_handle; struct list_head net_head; - char *nets; INIT_LIST_HEAD(&net_head); @@ -1550,13 +1549,11 @@ LNetNIInit(lnet_pid_t requested_pid) return rc; } - nets = lnet_get_networks(); - rc = lnet_prepare(requested_pid); if (rc) goto failed0; - rc = lnet_parse_networks(&net_head, nets); + rc = lnet_parse_networks(&net_head, lnet_get_networks()); if (rc < 0) goto failed1; diff --git a/drivers/staging/lustre/lnet/lnet/config.c b/drivers/staging/lustre/lnet/lnet/config.c index 1ef07cd..013d41b 100644 --- a/drivers/staging/lustre/lnet/lnet/config.c +++ b/drivers/staging/lustre/lnet/lnet/config.c @@ -184,7 +184,7 @@ int lnet_parse_networks(struct list_head *nilist, char *networks) { struct cfs_expr_list *el = NULL; - int tokensize = strlen(networks) + 1; + int tokensize; char *tokens; char *str; char *tmp; @@ -192,6 +192,11 @@ lnet_parse_networks(struct list_head *nilist, char *networks) __u32 net; int nnets = 0; + if (!networks) {
+ CERROR("networks string is undefined\n"); + return -EINVAL; + } + if (strlen(networks) > LNET_SINGLE_TEXTBUF_NOB) { /* _WAY_ conservative */ LCONSOLE_ERROR_MSG(0x112, @@ -199,6 +204,8 @@ lnet_parse_networks(struct list_head *nilist, char *networks) return -EINVAL; } + tokensize = strlen(networks) + 1; + LIBCFS_ALLOC(tokens, tokensize); if (!tokens) { CERROR("Can't allocate net tokens\n"); -- 1.7.1
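The hardening this patch applies follows a common shape: validate the pointer before any use (strlen() on NULL is undefined behaviour), and compute the allocation size only after the length checks pass. A standalone sketch, with a hypothetical stand-in for lnet_parse_networks() and an arbitrary limit in place of LNET_SINGLE_TEXTBUF_NOB:

```c
#include <errno.h>
#include <string.h>

#define MAX_TEXTBUF 4096	/* stand-in for LNET_SINGLE_TEXTBUF_NOB */

/* Returns the token-buffer size needed, or a negative errno. */
static int parse_networks(const char *networks)
{
	int tokensize;

	if (!networks)
		return -EINVAL;		/* reject NULL before any strlen() */

	if (strlen(networks) > MAX_TEXTBUF)
		return -EINVAL;		/* "_WAY_ conservative", as in the patch */

	tokensize = (int)strlen(networks) + 1;	/* safe: networks is non-NULL */
	return tokensize;
}
```

Moving the `tokensize` computation below the checks mirrors the patch's move of `tokensize = strlen(networks) + 1` out of the declaration.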
[PATCH 04/24] staging: lustre: DLC user/kernel space glue code
From: Amir Shehata This is the sixth patch of a set of patches that enables DLC. This patch enables the user space to call into the kernel space DLC code. Added handlers in the LNetCtl function to call the new functions added for Dynamic LNet Configuration. Signed-off-by: Amir Shehata Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-2456 Reviewed-on: http://review.whamcloud.com/8023 Reviewed-by: James Simmons Reviewed-by: Doug Oucharek Reviewed-by: John L. Hammond Reviewed-by: Oleg Drokin --- .../staging/lustre/include/linux/lnet/lib-lnet.h | 22 ++- .../staging/lustre/include/linux/lnet/lib-types.h |8 + drivers/staging/lustre/lnet/lnet/api-ni.c | 188 ++-- drivers/staging/lustre/lnet/lnet/module.c | 64 +++- drivers/staging/lustre/lnet/lnet/peer.c| 30 ++-- drivers/staging/lustre/lnet/lnet/router.c | 67 ++-- 6 files changed, 329 insertions(+), 50 deletions(-) diff --git a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h index 1157819..5e16fe0 100644 --- a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h +++ b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h @@ -39,6 +39,7 @@ #include "api.h" #include "lnet.h" #include "lib-types.h" +#include "lib-dlc.h" extern lnet_t the_lnet; /* THE network */ @@ -458,6 +459,12 @@ int lnet_del_route(__u32 net, lnet_nid_t gw_nid); void lnet_destroy_routes(void); int lnet_get_route(int idx, __u32 *net, __u32 *hops, lnet_nid_t *gateway, __u32 *alive, __u32 *priority); +int lnet_get_net_config(int idx, __u32 *cpt_count, __u64 *nid, + int *peer_timeout, int *peer_tx_credits, + int *peer_rtr_cr, int *max_tx_credits, + struct lnet_ioctl_net_config *net_config); +int lnet_get_rtr_pool_cfg(int idx, struct lnet_ioctl_pool_cfg *pool_cfg); + void lnet_router_debugfs_init(void); void lnet_router_debugfs_fini(void); int lnet_rtrpools_alloc(int im_a_router); @@ -467,6 +474,10 @@ int lnet_rtrpools_enable(void); void lnet_rtrpools_disable(void); void lnet_rtrpools_free(int
keep_pools); lnet_remotenet_t *lnet_find_net_locked(__u32 net); +int lnet_dyn_add_ni(lnet_pid_t requested_pid, char *nets, + __s32 peer_timeout, __s32 peer_cr, __s32 peer_buf_cr, + __s32 credits); +int lnet_dyn_del_ni(__u32 net); int lnet_islocalnid(lnet_nid_t nid); int lnet_islocalnet(__u32 net); @@ -693,11 +704,12 @@ void lnet_peer_tables_cleanup(lnet_ni_t *ni); void lnet_peer_tables_destroy(void); int lnet_peer_tables_create(void); void lnet_debug_peer(lnet_nid_t nid); -int lnet_get_peers(int count, __u64 *nid, char *alivness, - int *ncpt, int *refcount, - int *ni_peer_tx_credits, int *peer_tx_credits, - int *peer_rtr_credits, int *peer_min_rtr_credtis, - int *peer_tx_qnob); +int lnet_get_peer_info(__u32 peer_index, __u64 *nid, + char alivness[LNET_MAX_STR_LEN], + __u32 *cpt_iter, __u32 *refcount, + __u32 *ni_peer_tx_credits, __u32 *peer_tx_credits, + __u32 *peer_rtr_credits, __u32 *peer_min_rtr_credtis, + __u32 *peer_tx_qnob); static inline void lnet_peer_set_alive(lnet_peer_t *lp) diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h b/drivers/staging/lustre/include/linux/lnet/lib-types.h index b0ba9d8..e4a8f6e 100644 --- a/drivers/staging/lustre/include/linux/lnet/lib-types.h +++ b/drivers/staging/lustre/include/linux/lnet/lib-types.h @@ -627,6 +627,14 @@ typedef struct { /* test protocol compatibility flags */ int ln_testprotocompat; + /* +* 0 - load the NIs from the mod params +* 1 - do not load the NIs from the mod params +* Reverse logic to ensure that other calls to LNetNIInit +* need no change +*/ + bool ln_nis_from_mod_params; + } lnet_t; #endif diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c index c68d01e..fa65797 100644 --- a/drivers/staging/lustre/lnet/lnet/api-ni.c +++ b/drivers/staging/lustre/lnet/lnet/api-ni.c @@ -1553,7 +1553,9 @@ LNetNIInit(lnet_pid_t requested_pid) if (rc) goto failed0; - rc = lnet_parse_networks(&net_head, lnet_get_networks()); + rc = 
lnet_parse_networks(&net_head, +!the_lnet.ln_nis_from_mod_params ? +lnet_get_networks() : ""); if (rc < 0) goto failed1; @@ -1668,6 +1670,94 @@ LNetNIFini(void) } EXPORT_SYMBOL(LNetNIFini); +/** + * Grabs the ni data from the ni structure and fills the out + * parameters + * + * \param[in] ni network interface structure + * \param[out] c
[PATCH 08/24] staging: lustre: return appropriate errno when adding route
From: Amir Shehata When adding a route, specific scenarios were ignored, namely: 1. the route already exists 2. the route is on a local net 3. the route is unreachable This patch returns the appropriate return codes from the lower-level function lnet_add_route(), and then ignores the above cases in the calling function, lnet_parse_route(). This is needed so we don't halt processing routes from the module parameters. However, routes can now be added dynamically, and whether adding the requested route succeeded or failed should be returned to the user. User space then determines whether to continue adding routes or to halt processing. Currently "lnetctl import < config" continues adding the rest of the configuration and reports at the end which operations passed and which ones failed. Signed-off-by: Amir Shehata Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6045 Reviewed-on: http://review.whamcloud.com/13116 Reviewed-by: James Simmons Reviewed-by: Doug Oucharek Reviewed-by: Isaac Huang Reviewed-by: Oleg Drokin --- drivers/staging/lustre/lnet/lnet/config.c |2 +- drivers/staging/lustre/lnet/lnet/router.c | 11 +++ 2 files changed, 8 insertions(+), 5 deletions(-) diff --git a/drivers/staging/lustre/lnet/lnet/config.c b/drivers/staging/lustre/lnet/lnet/config.c index c04a0ef..8c80625 100644 --- a/drivers/staging/lustre/lnet/lnet/config.c +++ b/drivers/staging/lustre/lnet/lnet/config.c @@ -769,7 +769,7 @@ lnet_parse_route(char *str, int *im_a_router) } rc = lnet_add_route(net, hops, nid, priority); - if (rc) { + if (rc && rc != -EEXIST && rc != -EHOSTUNREACH) { CERROR("Can't create route to %s via %s\n", libcfs_net2str(net), libcfs_nid2str(nid)); diff --git a/drivers/staging/lustre/lnet/lnet/router.c b/drivers/staging/lustre/lnet/lnet/router.c index d748931..511e446 100644 --- a/drivers/staging/lustre/lnet/lnet/router.c +++ b/drivers/staging/lustre/lnet/lnet/router.c @@ -317,7 +317,7 @@ lnet_add_route(__u32 net, unsigned int hops, lnet_nid_t gateway, return -EINVAL; if
(lnet_islocalnet(net)) /* it's a local network */ - return 0; /* ignore the route entry */ + return -EEXIST; /* Assume net, route, all new */ LIBCFS_ALLOC(route, sizeof(*route)); @@ -348,7 +348,7 @@ lnet_add_route(__u32 net, unsigned int hops, lnet_nid_t gateway, LIBCFS_FREE(rnet, sizeof(*rnet)); if (rc == -EHOSTUNREACH) /* gateway is not on a local net */ - return 0; /* ignore the route entry */ + return rc; /* ignore the route entry */ CERROR("Error %d creating route %s %d %s\n", rc, libcfs_net2str(net), hops, libcfs_nid2str(gateway)); @@ -395,14 +395,17 @@ lnet_add_route(__u32 net, unsigned int hops, lnet_nid_t gateway, /* -1 for notify or !add_route */ lnet_peer_decref_locked(route->lr_gateway); lnet_net_unlock(LNET_LOCK_EX); + rc = 0; - if (!add_route) + if (!add_route) { + rc = -EEXIST; LIBCFS_FREE(route, sizeof(*route)); + } if (rnet != rnet2) LIBCFS_FREE(rnet, sizeof(*rnet)); - return 0; + return rc; } int -- 1.7.1
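The error-propagation split this patch makes can be sketched with hypothetical stand-ins for lnet_add_route() and lnet_parse_route(): the low level reports a real errno, the boot-time (module-parameter) caller treats the benign codes as non-fatal so parsing continues, and the dynamic-configuration path would hand the code straight back to user space.

```c
#include <errno.h>
#include <stdbool.h>

/* Low level: report what actually happened, as in lnet_add_route(). */
static int add_route(bool net_is_local, bool gw_reachable)
{
	if (net_is_local)
		return -EEXIST;		/* local net: nothing to add */
	if (!gw_reachable)
		return -EHOSTUNREACH;	/* gateway is not on a local net */
	return 0;
}

/* Module-parameter path: only hard errors abort route parsing. */
static int parse_route(bool net_is_local, bool gw_reachable)
{
	int rc = add_route(net_is_local, gw_reachable);

	if (rc && rc != -EEXIST && rc != -EHOSTUNREACH)
		return rc;	/* real failure: halt processing */
	return 0;		/* benign: keep parsing the rest */
}
```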
[PATCH v2] ASoC: wm9713: fix regmap free path
In the conversion to regmap, I assumed that the devm_() variant could be used in the soc probe function. As a matter of fact, with the current code the regmap is freed twice because of the devm_() call: (mutex_lock) from [] (debugfs_remove_recursive+0x50/0x1d0) (debugfs_remove_recursive) from [] (regmap_debugfs_exit+0x1c/0xd4) (regmap_debugfs_exit) from [] (regmap_exit+0x28/0xc8) (regmap_exit) from [] (release_nodes+0x18c/0x204) (release_nodes) from [] (device_release+0x18/0x90) (device_release) from [] (kobject_release+0x90/0x1bc) (kobject_release) from [] (wm9713_soc_remove+0x1c/0x24) (wm9713_soc_remove) from [] (soc_remove_component+0x50/0x7c) (soc_remove_component) from [] (soc_remove_dai_links+0x118/0x228) (soc_remove_dai_links) from [] (snd_soc_register_card+0x4e4/0xdd4) (snd_soc_register_card) from [] (devm_snd_soc_register_card+0x34/0x70) Fix this by replacing the devm_regmap initialization code with the non-devm_() variant. Fixes: 700dadfefc3d ("ASoC: wm9713: convert to regmap") Cc: Lars-Peter Clausen Signed-off-by: Robert Jarzmik --- Since v1: changed the fix into removing the devm_() variant --- sound/soc/codecs/wm9713.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sound/soc/codecs/wm9713.c b/sound/soc/codecs/wm9713.c index 4e522302bb8a..edd270f6d8e5 100644 --- a/sound/soc/codecs/wm9713.c +++ b/sound/soc/codecs/wm9713.c @@ -1212,7 +1212,7 @@ static int wm9713_soc_probe(struct snd_soc_codec *codec) if (IS_ERR(wm9713->ac97)) return PTR_ERR(wm9713->ac97); - regmap = devm_regmap_init_ac97(wm9713->ac97, &wm9713_regmap_config); + regmap = regmap_init_ac97(wm9713->ac97, &wm9713_regmap_config); if (IS_ERR(regmap)) { snd_soc_free_ac97_codec(wm9713->ac97); return PTR_ERR(regmap); -- 2.1.4
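The ownership mismatch behind the double free can be modelled without the regmap API. This toy (simplified stand-ins, not real driver code) just counts how many times the resource would be released under each variant.

```c
#include <stdbool.h>

/*
 * Toy model of the bug: a devm_-managed allocation is released
 * automatically by the driver core (devres) when the device's
 * resources are torn down, so a component-level remove path that also
 * releases it explicitly frees the same object twice.  With the
 * non-devm variant, the remove path is the single owner.
 */
static int releases_on_teardown(bool used_devm_variant)
{
	int releases = 0;

	/* component .remove: the codec explicitly exits its regmap */
	releases++;

	/* driver core: devres release runs only for devm_ allocations */
	if (used_devm_variant)
		releases++;

	return releases;	/* anything > 1 is a double free */
}
```

The patch picks the non-devm variant rather than dropping the explicit exit because the regmap's lifetime must match the ASoC component, not the underlying device.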
[PATCH 09/24] staging: lustre: make some lnet functions static
From: Frank Zago Some functions and variables are only used in their C file, so reduce their scope. This reduces the code size, and fixes sparse warnings such as: warning: symbol 'proc_lnet_routes' was not declared. Should it be static? warning: symbol 'proc_lnet_routers' was not declared. Should it be static? Some prototypes were removed from C files and added to the proper header. Signed-off-by: Frank Zago Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5396 Reviewed-on: http://review.whamcloud.com/12206 Reviewed-by: James Simmons Reviewed-by: John L. Hammond Reviewed-by: Andreas Dilger --- .../staging/lustre/include/linux/lnet/lib-lnet.h |2 ++ drivers/staging/lustre/lnet/lnet/router_proc.c |2 -- drivers/staging/lustre/lnet/selftest/console.c |4 +--- drivers/staging/lustre/lnet/selftest/framework.c | 10 -- drivers/staging/lustre/lnet/selftest/module.c |4 +--- drivers/staging/lustre/lnet/selftest/rpc.c |2 +- 6 files changed, 5 insertions(+), 19 deletions(-) diff --git a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h index 2ee3d73..0928bc9 100644 --- a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h +++ b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h @@ -448,6 +448,8 @@ lnet_ni_t *lnet_nid2ni_locked(lnet_nid_t nid, int cpt); lnet_ni_t *lnet_net2ni_locked(__u32 net, int cpt); lnet_ni_t *lnet_net2ni(__u32 net); +extern int portal_rotor; + int lnet_init(void); void lnet_fini(void); diff --git a/drivers/staging/lustre/lnet/lnet/router_proc.c b/drivers/staging/lustre/lnet/lnet/router_proc.c index 5cb87b8..fc643df 100644 --- a/drivers/staging/lustre/lnet/lnet/router_proc.c +++ b/drivers/staging/lustre/lnet/lnet/router_proc.c @@ -800,8 +800,6 @@ static struct lnet_portal_rotorsportal_rotors[] = { }, }; -extern int portal_rotor; - static int __proc_lnet_portal_rotor(void *data, int write, loff_t pos, void __user *buffer, int nob) { diff --git a/drivers/staging/lustre/lnet/selftest/console.c 
b/drivers/staging/lustre/lnet/selftest/console.c index badc696..e8ca1bf 100644 --- a/drivers/staging/lustre/lnet/selftest/console.c +++ b/drivers/staging/lustre/lnet/selftest/console.c @@ -1693,8 +1693,6 @@ lstcon_new_session_id(lst_sid_t *sid) sid->ses_stamp = cfs_time_current(); } -extern srpc_service_t lstcon_acceptor_service; - int lstcon_session_new(char *name, int key, unsigned feats, int timeout, int force, lst_sid_t __user *sid_up) @@ -1973,7 +1971,7 @@ out: return rc; } -srpc_service_t lstcon_acceptor_service; +static srpc_service_t lstcon_acceptor_service; static void lstcon_init_acceptor_service(void) { /* initialize selftest console acceptor service table */ diff --git a/drivers/staging/lustre/lnet/selftest/framework.c b/drivers/staging/lustre/lnet/selftest/framework.c index 7eca046..3bbc720 100644 --- a/drivers/staging/lustre/lnet/selftest/framework.c +++ b/drivers/staging/lustre/lnet/selftest/framework.c @@ -1629,16 +1629,6 @@ static srpc_service_t sfw_services[] = { } }; -extern sfw_test_client_ops_t ping_test_client; -extern srpc_service_t ping_test_service; -extern void ping_init_test_client(void); -extern void ping_init_test_service(void); - -extern sfw_test_client_ops_t brw_test_client; -extern srpc_service_t brw_test_service; -extern void brw_init_test_client(void); -extern void brw_init_test_service(void); - int sfw_startup(void) { diff --git a/drivers/staging/lustre/lnet/selftest/module.c b/drivers/staging/lustre/lnet/selftest/module.c index c4bf442..cbb7884 100644 --- a/drivers/staging/lustre/lnet/selftest/module.c +++ b/drivers/staging/lustre/lnet/selftest/module.c @@ -37,6 +37,7 @@ #define DEBUG_SUBSYSTEM S_LNET #include "selftest.h" +#include "console.h" enum { LST_INIT_NONE = 0, @@ -47,9 +48,6 @@ enum { LST_INIT_CONSOLE }; -extern int lstcon_console_init(void); -extern int lstcon_console_fini(void); - static int lst_init_step = LST_INIT_NONE; struct cfs_wi_sched *lst_sched_serial; diff --git a/drivers/staging/lustre/lnet/selftest/rpc.c 
b/drivers/staging/lustre/lnet/selftest/rpc.c index 4213198..1b76933 100644 --- a/drivers/staging/lustre/lnet/selftest/rpc.c +++ b/drivers/staging/lustre/lnet/selftest/rpc.c @@ -1097,7 +1097,7 @@ srpc_client_rpc_expired(void *data) spin_unlock(&srpc_data.rpc_glock); } -inline void +static void srpc_add_client_rpc_timer(srpc_client_rpc_t *rpc) { stt_timer_t *timer = &rpc->crpc_timer; -- 1.7.1
[PATCH 10/24] staging: lustre: missed a few cases of using NULL instead of 0
From: Frank Zago It is preferable to use NULL instead of 0 for pointers. This fixes sparse warnings such as: lustre/fld/fld_request.c:126:17: warning: Using plain integer as NULL pointer The second parameter of class_match_param() was changed to const, making it possible to remove a cast in one user without splitting a long line. No other code change. Signed-off-by: Frank Zago Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5396 Reviewed-on: http://review.whamcloud.com/12567 Reviewed-by: Dmitry Eremin Reviewed-by: John L. Hammond Reviewed-by: Bob Glossman Reviewed-by: James Simmons Reviewed-by: Oleg Drokin --- drivers/staging/lustre/lnet/lnet/api-ni.c |2 +- drivers/staging/lustre/lustre/llite/llite_lib.c|2 +- .../staging/lustre/lustre/obdclass/obd_config.c|4 ++-- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c index 3b7bc36..f223d5d 100644 --- a/drivers/staging/lustre/lnet/lnet/api-ni.c +++ b/drivers/staging/lustre/lnet/lnet/api-ni.c @@ -904,7 +904,7 @@ lnet_ping_info_setup(lnet_ping_info_t **ppinfo, lnet_handle_md_t *md_handle, { lnet_process_id_t id = {LNET_NID_ANY, LNET_PID_ANY}; lnet_handle_me_t me_handle; - lnet_md_t md = {0}; + lnet_md_t md = { NULL }; int rc, rc2; if (set_eq) { diff --git a/drivers/staging/lustre/lustre/llite/llite_lib.c b/drivers/staging/lustre/lustre/llite/llite_lib.c index 4de085d..2440b07 100644 --- a/drivers/staging/lustre/lustre/llite/llite_lib.c +++ b/drivers/staging/lustre/lustre/llite/llite_lib.c @@ -1940,7 +1940,7 @@ int ll_prep_inode(struct inode **inode, struct ptlrpc_request *req, struct super_block *sb, struct lookup_intent *it) { struct ll_sb_info *sbi = NULL; - struct lustre_md md; + struct lustre_md md = { NULL }; int rc; LASSERT(*inode || sb); diff --git a/drivers/staging/lustre/lustre/obdclass/obd_config.c b/drivers/staging/lustre/lustre/obdclass/obd_config.c index c4128ac..6417946 100644 ---
a/drivers/staging/lustre/lustre/obdclass/obd_config.c +++ b/drivers/staging/lustre/lustre/obdclass/obd_config.c @@ -72,7 +72,7 @@ EXPORT_SYMBOL(class_find_param); /* returns 0 if this is the first key in the buffer, else 1. valp points to first char after key. */ -static int class_match_param(char *buf, char *key, char **valp) +static int class_match_param(char *buf, const char *key, char **valp) { if (!buf) return 1; @@ -1008,7 +1008,7 @@ int class_process_proc_param(char *prefix, struct lprocfs_vars *lvars, /* Search proc entries */ while (lvars[j].name) { var = &lvars[j]; - if (class_match_param(key, (char *)var->name, NULL) == 0 + if (!class_match_param(key, var->name, NULL) && keylen == strlen(var->name)) { matched++; rc = -EROFS; -- 1.7.1
[PATCH 05/24] staging: lustre: make local functions static for LNet ni
From: Frank Zago The function lnet_unprepare can be made static. Signed-off-by: Frank Zago Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5396 Reviewed-on: http://review.whamcloud.com/11306 Reviewed-by: James Simmons Reviewed-by: Patrick Farrell Reviewed-by: Amir Shehata Reviewed-by: John L. Hammond Reviewed-by: Oleg Drokin --- drivers/staging/lustre/lnet/lnet/api-ni.c |4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c index fa65797..7583ae4 100644 --- a/drivers/staging/lustre/lnet/lnet/api-ni.c +++ b/drivers/staging/lustre/lnet/lnet/api-ni.c @@ -525,7 +525,7 @@ lnet_res_lh_initialize(struct lnet_res_container *rec, lnet_libhandle_t *lh) list_add(&lh->lh_hash_chain, &rec->rec_lh_hash[hash]); } -int lnet_unprepare(void); +static int lnet_unprepare(void); static int lnet_prepare(lnet_pid_t requested_pid) @@ -611,7 +611,7 @@ lnet_prepare(lnet_pid_t requested_pid) return rc; } -int +static int lnet_unprepare(void) { /* -- 1.7.1
[PATCH 14/24] staging: lustre: handle lnet_check_routes() errors
From: Amir Shehata After adding a route, lnet_check_routes() is called to ensure that the route added doesn't invalidate the routing configuration. If lnet_check_routes() fails then the route just added, which caused the current configuration to be invalidated is deleted, and an error is returned to the user. Signed-off-by: Amir Shehata Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6218 Reviewed-on: http://review.whamcloud.com/13445 Reviewed-by: Liang Zhen Reviewed-by: Doug Oucharek Reviewed-by: Oleg Drokin --- drivers/staging/lustre/lnet/lnet/api-ni.c |8 +++- 1 files changed, 7 insertions(+), 1 deletions(-) diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c index ccd7dcd..ed121a8 100644 --- a/drivers/staging/lustre/lnet/lnet/api-ni.c +++ b/drivers/staging/lustre/lnet/lnet/api-ni.c @@ -1907,8 +1907,14 @@ LNetCtl(unsigned int cmd, void *arg) config->cfg_config_u.cfg_route.rtr_hop, config->cfg_nid, config->cfg_config_u.cfg_route.rtr_priority); + if (!rc) { + rc = lnet_check_routes(); + if (rc) + lnet_del_route(config->cfg_net, + config->cfg_nid); + } mutex_unlock(&the_lnet.ln_api_mutex); - return rc ? rc : lnet_check_routes(); + return rc; case IOC_LIBCFS_DEL_ROUTE: config = arg; -- 1.7.1
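The add/validate/roll-back ordering this patch adopts is a general pattern: perform the add, run the consistency check while still under the configuration mutex, and delete the just-added route if the check fails, so a configuration the validator rejects is never left in place. A sketch with a toy route table and validator, not the LNet structures:

```c
/* Toy route table standing in for the LNet routing state. */
static int route_table[8];
static int route_count;

/* Toy validator: pretend more than two routes is inconsistent. */
static int check_routes(void)
{
	return route_count > 2 ? -22 /* -EINVAL */ : 0;
}

/* Mirror of the patched IOC_LIBCFS_ADD_ROUTE flow, minus the mutex. */
static int add_route_checked(int net)
{
	int rc;

	route_table[route_count++] = net;	/* tentative add */
	rc = check_routes();
	if (rc)
		route_count--;			/* roll back the bad add */
	return rc;
}
```

Contrast with the old code, which returned `rc ? rc : lnet_check_routes()` after dropping the mutex: a validation failure was reported, but the offending route stayed configured.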
[PATCH 07/24] staging: lustre: improve LNet clean up code and API
From: Amir Shehata This patch addresses a set of related issues: LU-5568, LU-5734, LU-5839, LU-5849, LU-5850. Create the local lnet_startup_lndni() API. This function starts up one LND. lnet_startup_lndnis() calls this function in a loop on every ni in the list passed in. lnet_startup_lndni() is responsible for cleaning up after itself in case of failure. It calls lnet_free_ni() if the ni fails to start. It calls lnet_shutdown_lndni() if it successfully called the lnd startup function, but fails later on. lnet_startup_lndnis() also cleans up after itself. If lnet_startup_lndni() fails then lnet_shutdown_lndnis() is called to clean up all nis that might have been started, and then free the rest of the nis on the list which have not been started yet. To facilitate the above changes lnet_dyn_del_ni() now manages the ping info. It calls lnet_shutdown_lndni(), to shutdown the NI. lnet_shutdown_lndni() is no longer an exposed API and doesn't manage the ping info, making it callable from lnet_startup_lndni() as well. There are two scenarios for calling lnet_startup_lndni() 1. from lnet_startup_lndnis() If lnet_startup_lndni() fails it requires to shutdown the ni without doing anything with the ping information as it hasn't been created yet. 2. from lnet_dyn_add_ni() As above it will shutdown the ni, and then lnet_dyn_add_ni() will take care of managing the ping info The second part of this change is to ensure that the LOLND is not added by lnet_parse_networks(), but the caller which needs to do it (IE: LNetNIInit) This change ensures that lnet_dyn_add_ni() need only check if there is only one net that's being added, if not then it frees everything, otherwise it proceeds to startup the requested net. 
Signed-off-by: Amir Shehata Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5734 Reviewed-on: http://review.whamcloud.com/12658 Reviewed-by: Liang Zhen Reviewed-by: Isaac Huang Reviewed-by: James Simmons Reviewed-by: Oleg Drokin --- .../staging/lustre/include/linux/lnet/lib-lnet.h |2 + drivers/staging/lustre/lnet/lnet/api-ni.c | 468 ++-- drivers/staging/lustre/lnet/lnet/config.c | 14 +- 3 files changed, 251 insertions(+), 233 deletions(-) diff --git a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h index 5e16fe0..2ee3d73 100644 --- a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h +++ b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h @@ -422,6 +422,8 @@ lnet_ni_decref(lnet_ni_t *ni) } void lnet_ni_free(lnet_ni_t *ni); +lnet_ni_t * +lnet_ni_alloc(__u32 net, struct cfs_expr_list *el, struct list_head *nilist); static inline int lnet_nid2peerhash(lnet_nid_t nid) diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c index 3bed4c3..3b7bc36 100644 --- a/drivers/staging/lustre/lnet/lnet/api-ni.c +++ b/drivers/staging/lustre/lnet/lnet/api-ni.c @@ -1066,6 +1066,20 @@ lnet_ni_tq_credits(lnet_ni_t *ni) } static void +lnet_ni_unlink_locked(lnet_ni_t *ni) +{ + if (!list_empty(&ni->ni_cptlist)) { + list_del_init(&ni->ni_cptlist); + lnet_ni_decref_locked(ni, 0); + } + + /* move it to zombie list and nobody can find it anymore */ + LASSERT(!list_empty(&ni->ni_list)); + list_move(&ni->ni_list, &the_lnet.ln_nis_zombie); + lnet_ni_decref_locked(ni, 0); /* drop ln_nis' ref */ +} + +static void lnet_clear_zombies_nis_locked(void) { int i; @@ -1148,14 +1162,7 @@ lnet_shutdown_lndnis(void) while (!list_empty(&the_lnet.ln_nis)) { ni = list_entry(the_lnet.ln_nis.next, lnet_ni_t, ni_list); - /* move it to zombie list and nobody can find it anymore */ - list_move(&ni->ni_list, &the_lnet.ln_nis_zombie); - lnet_ni_decref_locked(ni, 0); /* drop ln_nis' ref */ - - if 
(!list_empty(&ni->ni_cptlist)) { - list_del_init(&ni->ni_cptlist); - lnet_ni_decref_locked(ni, 0); - } + lnet_ni_unlink_locked(ni); } /* Drop the cached eqwait NI. */ @@ -1192,228 +1199,196 @@ lnet_shutdown_lndnis(void) lnet_net_unlock(LNET_LOCK_EX); } -int -lnet_shutdown_lndni(__u32 net) +/* shutdown down the NI and release refcount */ +static void +lnet_shutdown_lndni(struct lnet_ni *ni) { - lnet_ping_info_t *pinfo; - lnet_handle_md_t md_handle; - lnet_ni_t *found_ni = NULL; - int ni_count; - int rc; - - if (LNET_NETTYP(net) == LOLND) - return -EINVAL; - - ni_count = lnet_get_ni_count(); - - /* create and link a new ping info, before removing the old one */ - rc = lnet_ping_info_setup(&pinfo, &md_handle, ni_count - 1, false); - if (rc) - return rc; - - /* proceed with shutting down the NI */ lnet_n
[PATCH 11/24] staging: lustre: startup lnet acceptor thread dynamically
From: Amir Shehata With DLC it's possible to start up a system with no NIs that require the acceptor thread, and thus it won't start. Later on the user can add an NI that requires the acceptor thread to start, it is then necessary to start it up. If the user removes a NI and as a result there are no more NIs that require the acceptor thread then it should be stopped. This patch adds logic in the dynamically adding and removing NIs code to ensure the above logic is implemented. Signed-off-by: Amir Shehata Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6002 Reviewed-on: http://review.whamcloud.com/13010 Reviewed-by: Isaac Huang Reviewed-by: Doug Oucharek Reviewed-by: Oleg Drokin --- drivers/staging/lustre/lnet/lnet/acceptor.c | 10 -- drivers/staging/lustre/lnet/lnet/api-ni.c | 14 ++ 2 files changed, 22 insertions(+), 2 deletions(-) diff --git a/drivers/staging/lustre/lnet/lnet/acceptor.c b/drivers/staging/lustre/lnet/lnet/acceptor.c index 07df727..9fe3ff7 100644 --- a/drivers/staging/lustre/lnet/lnet/acceptor.c +++ b/drivers/staging/lustre/lnet/lnet/acceptor.c @@ -46,7 +46,9 @@ static struct { int pta_shutdown; struct socket *pta_sock; struct completion pta_signal; -} lnet_acceptor_state; +} lnet_acceptor_state = { + .pta_shutdown = 1 +}; int lnet_acceptor_port(void) @@ -444,6 +446,10 @@ lnet_acceptor_start(void) long rc2; long secure; + /* if acceptor is already running return immediately */ + if (!lnet_acceptor_state.pta_shutdown) + return 0; + LASSERT(!lnet_acceptor_state.pta_sock); rc = lnet_acceptor_get_tunables(); @@ -484,7 +490,7 @@ lnet_acceptor_start(void) void lnet_acceptor_stop(void) { - if (!lnet_acceptor_state.pta_sock) /* not running */ + if (lnet_acceptor_state.pta_shutdown) /* not running */ return; lnet_acceptor_state.pta_shutdown = 1; diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c index f223d5d..9497ce1 100644 --- a/drivers/staging/lustre/lnet/lnet/api-ni.c +++ 
b/drivers/staging/lustre/lnet/lnet/api-ni.c @@ -1785,6 +1785,16 @@ lnet_dyn_add_ni(lnet_pid_t requested_pid, char *nets, if (rc) goto failed1; + if (ni->ni_lnd->lnd_accept) { + rc = lnet_acceptor_start(); + if (rc < 0) { + /* shutdown the ni that we just started */ + CERROR("Failed to start up acceptor thread\n"); + lnet_shutdown_lndni(ni); + goto failed1; + } + } + lnet_ping_target_update(pinfo, md_handle); mutex_unlock(&the_lnet.ln_api_mutex); @@ -1832,6 +1842,10 @@ lnet_dyn_del_ni(__u32 net) lnet_ni_decref_locked(ni, 0); lnet_shutdown_lndni(ni); + + if (!lnet_count_acceptor_nis()) + lnet_acceptor_stop(); + lnet_ping_target_update(pinfo, md_handle); goto out; failed: -- 1.7.1
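The dynamic acceptor logic reduces to a start-on-first-user, stop-on-last-user pattern. A userspace sketch of that lifecycle — the names and the "pta_shutdown doubles as the not-running flag" convention are modeled on the patch, not taken from it verbatim:

```c
#include <assert.h>
#include <stdbool.h>

/* Model of the acceptor lifecycle: a shared thread that must run while
 * at least one NI needs it.  pta_shutdown == 1 means "not running",
 * matching the patch's static initializer. */
static struct {
	int pta_shutdown;
} acceptor_state = { .pta_shutdown = 1 };

static int nis_needing_acceptor;

static int acceptor_start(void)
{
	if (!acceptor_state.pta_shutdown)
		return 0;			/* already running */
	acceptor_state.pta_shutdown = 0;	/* "start" the thread */
	return 0;
}

static void acceptor_stop(void)
{
	if (acceptor_state.pta_shutdown)	/* not running */
		return;
	acceptor_state.pta_shutdown = 1;
}

/* Mirrors lnet_dyn_add_ni()/lnet_dyn_del_ni(): start the service for
 * the first NI that needs it, stop it when the last one goes away. */
static int ni_add(bool needs_acceptor)
{
	if (needs_acceptor) {
		nis_needing_acceptor++;
		return acceptor_start();
	}
	return 0;
}

static void ni_del(bool needed_acceptor)
{
	if (needed_acceptor && !--nis_needing_acceptor)
		acceptor_stop();
}
```

Making acceptor_start() idempotent (the early return when already running) is what lets every NI-add path call it unconditionally.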
[PATCH 13/24] staging: lustre: return -EEXIST if NI is not unique
From: Amir Shehata Return -EEXIST and not -EINVAL when trying to add a network interface which is not unique. Some minor cleanup in api-ni.c Signed-off-by: Amir Shehata Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5875 Reviewed-on: http://review.whamcloud.com/13056 Reviewed-by: Isaac Huang Reviewed-by: James Simmons Reviewed-by: Oleg Drokin --- drivers/staging/lustre/lnet/lnet/api-ni.c | 20 +--- 1 files changed, 9 insertions(+), 11 deletions(-) diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c index 62a9e45..ccd7dcd 100644 --- a/drivers/staging/lustre/lnet/lnet/api-ni.c +++ b/drivers/staging/lustre/lnet/lnet/api-ni.c @@ -1219,7 +1219,7 @@ static int lnet_startup_lndni(struct lnet_ni *ni, __s32 peer_timeout, __s32 peer_cr, __s32 peer_buf_cr, __s32 credits) { - int rc = 0; + int rc = -EINVAL; int lnd_type; lnd_t *lnd; struct lnet_tx_queue *tq; @@ -1237,19 +1237,19 @@ lnet_startup_lndni(struct lnet_ni *ni, __s32 peer_timeout, /* Make sure this new NI is unique. 
*/ lnet_net_lock(LNET_LOCK_EX); - if (!lnet_net_unique(LNET_NIDNET(ni->ni_nid), &the_lnet.ln_nis)) { + rc = lnet_net_unique(LNET_NIDNET(ni->ni_nid), &the_lnet.ln_nis); + lnet_net_unlock(LNET_LOCK_EX); + if (!rc) { if (lnd_type == LOLND) { - lnet_net_unlock(LNET_LOCK_EX); lnet_ni_free(ni); return 0; } - lnet_net_unlock(LNET_LOCK_EX); CERROR("Net %s is not unique\n", libcfs_net2str(LNET_NIDNET(ni->ni_nid))); + rc = -EEXIST; goto failed0; } - lnet_net_unlock(LNET_LOCK_EX); mutex_lock(&the_lnet.ln_lnd_mutex); lnd = lnet_find_lnd_by_type(lnd_type); @@ -1265,6 +1265,7 @@ lnet_startup_lndni(struct lnet_ni *ni, __s32 peer_timeout, CERROR("Can't load LND %s, module %s, rc=%d\n", libcfs_lnd2str(lnd_type), libcfs_lnd2modname(lnd_type), rc); + rc = -EINVAL; goto failed0; } } @@ -1354,7 +1355,7 @@ lnet_startup_lndni(struct lnet_ni *ni, __s32 peer_timeout, return 0; failed0: lnet_ni_free(ni); - return -EINVAL; + return rc; } static int @@ -1503,7 +1504,7 @@ int LNetNIInit(lnet_pid_t requested_pid) { int im_a_router = 0; - int rc, rc2; + int rc; int ni_count; lnet_ping_info_t *pinfo; lnet_handle_md_t md_handle; @@ -1592,10 +1593,7 @@ LNetNIInit(lnet_pid_t requested_pid) return 0; err_stop_ping: - lnet_ping_md_unlink(pinfo, &md_handle); - lnet_ping_info_free(pinfo); - rc2 = LNetEQFree(the_lnet.ln_ping_target_eq); - LASSERT(!rc2); + lnet_ping_target_fini(); err_acceptor_stop: the_lnet.ln_refcount = 0; lnet_acceptor_stop(); -- 1.7.1
[PATCH 12/24] staging: lustre: reject invalid net configuration for lnet
From: Amir Shehata Currently if there exists a route that goes over a remote net and then this net is added dynamically as a local net, then traffic stops because the code in lnet_send() determines that the destination nid can be reached from another local_ni, but the src_nid is still stuck on the earlier NI, because the src_nid is stored in the ptlrpc layer and is not updated when a local NI is configured. Signed-off-by: Amir Shehata Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5874 Reviewed-on: http://review.whamcloud.com/12912 Reviewed-by: Isaac Huang Reviewed-by: James Simmons Reviewed-by: Oleg Drokin --- drivers/staging/lustre/lnet/lnet/api-ni.c | 18 +- 1 files changed, 17 insertions(+), 1 deletions(-) diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c index 9497ce1..62a9e45 100644 --- a/drivers/staging/lustre/lnet/lnet/api-ni.c +++ b/drivers/staging/lustre/lnet/lnet/api-ni.c @@ -1756,6 +1756,7 @@ lnet_dyn_add_ni(lnet_pid_t requested_pid, char *nets, lnet_handle_md_t md_handle; struct lnet_ni *ni; struct list_head net_head; + lnet_remotenet_t *rnet; int rc; INIT_LIST_HEAD(&net_head); @@ -1772,12 +1773,27 @@ lnet_dyn_add_ni(lnet_pid_t requested_pid, char *nets, goto failed0; } + ni = list_entry(net_head.next, struct lnet_ni, ni_list); + + lnet_net_lock(LNET_LOCK_EX); + rnet = lnet_find_net_locked(LNET_NIDNET(ni->ni_nid)); + lnet_net_unlock(LNET_LOCK_EX); + /* +* make sure that the net added doesn't invalidate the current +* configuration LNet is keeping +*/ + if (rnet) { + CERROR("Adding net %s will invalidate routing configuration\n", + nets); + rc = -EUSERS; + goto failed0; + } + rc = lnet_ping_info_setup(&pinfo, &md_handle, 1 + lnet_get_ni_count(), false); if (rc) goto failed0; - ni = list_entry(net_head.next, struct lnet_ni, ni_list); list_del_init(&ni->ni_list); rc = lnet_startup_lndni(ni, peer_timeout, peer_cr, -- 1.7.1
[PATCH 18/24] staging: lustre: remove messages from lazy portal on NI shutdown
From: Amir Shehata When shutting down an NI in a busy system, some messages received on this NI, might be on the lazy portal. They would have grabbed a ref count on the NI. Therefore NI will not be removed until messages are processed. In order to avoid this scenario, when an NI is shutdown go through all messages queued on the lazy portal and drop messages for the NI being shutdown Signed-off-by: Amir Shehata Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6040 Reviewed-on: http://review.whamcloud.com/13836 Reviewed-by: Isaac Huang Reviewed-by: Liang Zhen Reviewed-by: Oleg Drokin --- .../staging/lustre/include/linux/lnet/lib-lnet.h |1 + drivers/staging/lustre/lnet/lnet/api-ni.c |6 ++ drivers/staging/lustre/lnet/lnet/lib-ptl.c | 54 +--- 3 files changed, 43 insertions(+), 18 deletions(-) diff --git a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h index 0928bc9..a5f1aec 100644 --- a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h +++ b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h @@ -482,6 +482,7 @@ int lnet_dyn_add_ni(lnet_pid_t requested_pid, char *nets, __s32 peer_timeout, __s32 peer_cr, __s32 peer_buf_cr, __s32 credits); int lnet_dyn_del_ni(__u32 net); +int lnet_clear_lazy_portal(struct lnet_ni *ni, int portal, char *reason); int lnet_islocalnid(lnet_nid_t nid); int lnet_islocalnet(__u32 net); diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c index 0c7db19..3ecc96a 100644 --- a/drivers/staging/lustre/lnet/lnet/api-ni.c +++ b/drivers/staging/lustre/lnet/lnet/api-ni.c @@ -1196,10 +1196,16 @@ lnet_shutdown_lndnis(void) static void lnet_shutdown_lndni(struct lnet_ni *ni) { + int i; + lnet_net_lock(LNET_LOCK_EX); lnet_ni_unlink_locked(ni); lnet_net_unlock(LNET_LOCK_EX); + /* clear messages for this NI on the lazy portal */ + for (i = 0; i < the_lnet.ln_nportals; i++) + lnet_clear_lazy_portal(ni, i, "Shutting down NI"); + /* Do peer table 
cleanup for this ni */ lnet_peer_tables_cleanup(ni); diff --git a/drivers/staging/lustre/lnet/lnet/lib-ptl.c b/drivers/staging/lustre/lnet/lnet/lib-ptl.c index 0cdeea9..5a9ab87 100644 --- a/drivers/staging/lustre/lnet/lnet/lib-ptl.c +++ b/drivers/staging/lustre/lnet/lnet/lib-ptl.c @@ -902,17 +902,8 @@ LNetSetLazyPortal(int portal) } EXPORT_SYMBOL(LNetSetLazyPortal); -/** - * Turn off the lazy portal attribute. Delayed requests on the portal, - * if any, will be all dropped when this function returns. - * - * \param portal Index of the portal to disable the lazy attribute on. - * - * \retval 0 On success. - * \retval -EINVAL If \a portal is not a valid index. - */ int -LNetClearLazyPortal(int portal) +lnet_clear_lazy_portal(struct lnet_ni *ni, int portal, char *reason) { struct lnet_portal *ptl; LIST_HEAD(zombies); @@ -931,21 +922,48 @@ LNetClearLazyPortal(int portal) return 0; } - if (the_lnet.ln_shutdown) - CWARN("Active lazy portal %d on exit\n", portal); - else - CDEBUG(D_NET, "clearing portal %d lazy\n", portal); + if (ni) { + struct lnet_msg *msg, *tmp; - /* grab all the blocked messages atomically */ - list_splice_init(&ptl->ptl_msg_delayed, &zombies); + /* grab all messages which are on the NI passed in */ + list_for_each_entry_safe(msg, tmp, &ptl->ptl_msg_delayed, +msg_list) { + if (msg->msg_rxpeer->lp_ni == ni) + list_move(&msg->msg_list, &zombies); + } + } else { + if (the_lnet.ln_shutdown) + CWARN("Active lazy portal %d on exit\n", portal); + else + CDEBUG(D_NET, "clearing portal %d lazy\n", portal); + + /* grab all the blocked messages atomically */ + list_splice_init(&ptl->ptl_msg_delayed, &zombies); - lnet_ptl_unsetopt(ptl, LNET_PTL_LAZY); + lnet_ptl_unsetopt(ptl, LNET_PTL_LAZY); + } lnet_ptl_unlock(ptl); lnet_res_unlock(LNET_LOCK_EX); - lnet_drop_delayed_msg_list(&zombies, "Clearing lazy portal attr"); + lnet_drop_delayed_msg_list(&zombies, reason); return 0; } + +/** + * Turn off the lazy portal attribute. 
Delayed requests on the portal, + * if any, will be all dropped when this function returns. + * + * \param portal Index of the portal to disable the lazy attribute on. + * + * \retval 0 On success. + * \retval -EINVAL If \a portal is not a valid index. + */ +int +LNetClearLazyPortal(int portal) +{ + return lnet_clear_lazy_portal(NULL, portal, + "Clearing lazy portal attr")
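The NI-selective drain in lnet_clear_lazy_portal() boils down to walking a queue under a lock, moving the matching entries onto a private zombies list, and dropping them after the lock is released. A small userspace model of that selection step — the intrusive kernel list API is replaced here by a hand-rolled singly linked list, and "ni" is just an integer tag:

```c
#include <assert.h>
#include <stddef.h>

/* Model of the selective drain: messages queued on a lazy portal are
 * moved to a "zombies" list, either all at once (portal being cleared)
 * or only those owned by one NI (NI being shut down). */
struct msg {
	int ni;			/* owning NI, as an integer tag */
	struct msg *next;
};

/* Move messages matching @ni (or all of them, if @ni < 0) from *src
 * onto *dst; returns how many were moved. */
static int drain(struct msg **src, struct msg **dst, int ni)
{
	struct msg **p = src;
	int moved = 0;

	while (*p) {
		struct msg *m = *p;

		if (ni < 0 || m->ni == ni) {
			*p = m->next;	/* unlink from src */
			m->next = *dst;	/* push onto dst */
			*dst = m;
			moved++;
		} else {
			p = &m->next;
		}
	}
	return moved;
}
```

In the kernel version the per-NI walk uses list_for_each_entry_safe() plus list_move(), and the drop-everything path keeps the cheaper list_splice_init(); the sketch collapses both into one function for brevity.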
[PATCH 16/24] staging: lustre: assume a kernel build
From: John L. Hammond In lnet/lnet/ and lnet/selftest/ assume a kernel build (assume that __KERNEL__ is defined). Remove some common code only needed for user space LNet. Only part of the work of this patch got merged. This is the final bits. Signed-off-by: John L. Hammond Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-2675 Reviewed-on: http://review.whamcloud.com/13121 Reviewed-by: James Simmons Reviewed-by: Amir Shehata Reviewed-by: Oleg Drokin --- .../staging/lustre/include/linux/lnet/lib-types.h |4 -- drivers/staging/lustre/lnet/lnet/acceptor.c|2 - drivers/staging/lustre/lnet/lnet/api-ni.c | 39 ++-- drivers/staging/lustre/lnet/lnet/lib-eq.c |3 -- drivers/staging/lustre/lnet/lnet/lib-md.c |3 -- drivers/staging/lustre/lnet/lnet/lib-me.c |3 -- drivers/staging/lustre/lnet/lnet/lib-move.c|5 --- drivers/staging/lustre/lnet/lnet/lib-msg.c | 20 +-- drivers/staging/lustre/lnet/lnet/router.c | 11 ++ 9 files changed, 7 insertions(+), 83 deletions(-) diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h b/drivers/staging/lustre/include/linux/lnet/lib-types.h index 06d4656..f588e06 100644 --- a/drivers/staging/lustre/include/linux/lnet/lib-types.h +++ b/drivers/staging/lustre/include/linux/lnet/lib-types.h @@ -577,8 +577,6 @@ typedef struct { /* dying LND instances */ struct list_head ln_nis_zombie; lnet_ni_t*ln_loni; /* the loopback NI */ - /* NI to wait for events in */ - lnet_ni_t*ln_eq_waitni; /* remote networks with routes to them */ struct list_head *ln_remote_nets_hash; @@ -608,8 +606,6 @@ typedef struct { struct mutex ln_api_mutex; struct mutex ln_lnd_mutex; - int ln_init; /* lnet_init() - called? */ /* Have I called LNetNIInit myself? 
*/ int ln_niinit_self; /* LNetNIInit/LNetNIFini counter */ diff --git a/drivers/staging/lustre/lnet/lnet/acceptor.c b/drivers/staging/lustre/lnet/lnet/acceptor.c index 9fe3ff7..8f9876b 100644 --- a/drivers/staging/lustre/lnet/lnet/acceptor.c +++ b/drivers/staging/lustre/lnet/lnet/acceptor.c @@ -206,8 +206,6 @@ lnet_connect(struct socket **sockp, lnet_nid_t peer_nid, } EXPORT_SYMBOL(lnet_connect); -/* Below is the code common for both kernel and MT user-space */ - static int lnet_accept(struct socket *sock, __u32 magic) { diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c index 0ec656a..0c7db19 100644 --- a/drivers/staging/lustre/lnet/lnet/api-ni.c +++ b/drivers/staging/lustre/lnet/lnet/api-ni.c @@ -291,7 +291,6 @@ lnet_register_lnd(lnd_t *lnd) { mutex_lock(&the_lnet.ln_lnd_mutex); - LASSERT(the_lnet.ln_init); LASSERT(libcfs_isknown_lnd(lnd->lnd_type)); LASSERT(!lnet_find_lnd_by_type(lnd->lnd_type)); @@ -309,7 +308,6 @@ lnet_unregister_lnd(lnd_t *lnd) { mutex_lock(&the_lnet.ln_lnd_mutex); - LASSERT(the_lnet.ln_init); LASSERT(lnet_find_lnd_by_type(lnd->lnd_type) == lnd); LASSERT(!lnd->lnd_refcount); @@ -1166,12 +1164,6 @@ lnet_shutdown_lndnis(void) lnet_ni_unlink_locked(ni); } - /* Drop the cached eqwait NI. */ - if (the_lnet.ln_eq_waitni) { - lnet_ni_decref_locked(the_lnet.ln_eq_waitni, 0); - the_lnet.ln_eq_waitni = NULL; - } - /* Drop the cached loopback NI. 
*/ if (the_lnet.ln_loni) { lnet_ni_decref_locked(the_lnet.ln_loni, 0); @@ -1364,7 +1356,6 @@ lnet_startup_lndnis(struct list_head *nilist) { struct lnet_ni *ni; int rc; - int lnd_type; int ni_count = 0; while (!list_empty(nilist)) { @@ -1378,14 +1369,6 @@ lnet_startup_lndnis(struct list_head *nilist) ni_count++; } - if (the_lnet.ln_eq_waitni && ni_count > 1) { - lnd_type = the_lnet.ln_eq_waitni->ni_lnd->lnd_type; - LCONSOLE_ERROR_MSG(0x109, "LND %s can only run single-network\n", - libcfs_lnd2str(lnd_type)); - rc = -EINVAL; - goto failed; - } - return ni_count; failed: lnet_shutdown_lndnis(); @@ -1396,10 +1379,9 @@ failed: /** * Initialize LNet library. * - * Only userspace program needs to call this function - it's automatically - * called in the kernel at module loading time. Caller has to call lnet_fini() - * after a call to lnet_init(), if and only if the latter returned 0. It must - * be called exactly once. + * Automatically called at module loading time. Caller has to call + * lnet_exit() aft
[PATCH 17/24] staging: lustre: prevent assert on LNet module unload
From: Amir Shehata

There is a use case where LNet can be unloaded while there are no NIs configured. Removing LNet in this case will cause LNetFini() to be called without a prior call to LNetNIFini(), which triggers the LASSERT(the_lnet.ln_refcount == 0).

To deal with this use case, when LNet is configured a reference count on the module is taken using try_module_get(). This way LNet must be unconfigured before it can be removed, avoiding the above case. When LNet is unconfigured, module_put() is called to drop the reference count.

Signed-off-by: Amir Shehata
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6010
Reviewed-on: http://review.whamcloud.com/13110
Reviewed-by: James Simmons
Reviewed-by: Doug Oucharek
Reviewed-by: Oleg Drokin
---
 drivers/staging/lustre/lnet/lnet/module.c | 9 +++++++++
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/drivers/staging/lustre/lnet/lnet/module.c b/drivers/staging/lustre/lnet/lnet/module.c
index 8f053d7..e12fe37 100644
--- a/drivers/staging/lustre/lnet/lnet/module.c
+++ b/drivers/staging/lustre/lnet/lnet/module.c
@@ -53,13 +53,21 @@ lnet_configure(void *arg)
 	mutex_lock(&lnet_config_mutex);
 
 	if (!the_lnet.ln_niinit_self) {
+		rc = try_module_get(THIS_MODULE);
+
+		if (rc != 1)
+			goto out;
+
 		rc = LNetNIInit(LNET_PID_LUSTRE);
 		if (rc >= 0) {
 			the_lnet.ln_niinit_self = 1;
 			rc = 0;
+		} else {
+			module_put(THIS_MODULE);
 		}
 	}
 
+out:
 	mutex_unlock(&lnet_config_mutex);
 	return rc;
 }
@@ -74,6 +82,7 @@ lnet_unconfigure(void)
 	if (the_lnet.ln_niinit_self) {
 		the_lnet.ln_niinit_self = 0;
 		LNetNIFini();
+		module_put(THIS_MODULE);
 	}
 
 	mutex_lock(&the_lnet.ln_api_mutex);
-- 
1.7.1
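The shape of the fix is a pin/unpin pattern: configuration takes a reference that blocks unload, and unconfiguration releases it. Since module refcounts only exist in the kernel, here is a userspace model with stub counters standing in for try_module_get()/module_put(); the "unloading" flag models a module already heading for unload, where try_module_get() fails:

```c
#include <assert.h>
#include <stdbool.h>

static int module_refs;		/* stands in for the module refcount */
static bool unloading;		/* module already on its way out */
static bool configured;

static bool try_module_get_stub(void)
{
	if (unloading)
		return false;	/* too late to pin the module */
	module_refs++;
	return true;
}

static void module_put_stub(void) { module_refs--; }

/* Mirrors lnet_configure(): pin the module before initializing, and
 * drop the pin again if initialization never happens or fails. */
static int configure(void)
{
	if (configured)
		return 0;	/* already configured: nothing to pin */
	if (!try_module_get_stub())
		return -1;
	configured = true;
	return 0;
}

/* Mirrors lnet_unconfigure(): tearing down releases the pin, making
 * the module removable again. */
static void unconfigure(void)
{
	if (configured) {
		configured = false;
		module_put_stub();
	}
}
```

The invariant the patch relies on is visible here: module_refs is nonzero exactly while configured, so unload (which requires the count to drop) can only proceed after unconfigure().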
[PATCH 21/24] staging: lustre: use sock.h in only acceptor.c
On some platforms, including sock.h from lib-types.h collides with other header files used in the LNet layer. Looking at what was needed from sock.h, only acceptor.c actually depends on it. To avoid these collisions, include sock.h only in acceptor.c.

Signed-off-by: James Simmons
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6763
Reviewed-on: http://review.whamcloud.com/15386
Reviewed-by: Chris Horn
Reviewed-by: Amir Shehata
Reviewed-by: Oleg Drokin
---
 .../staging/lustre/include/linux/lnet/lib-types.h | 1 -
 drivers/staging/lustre/lnet/lnet/acceptor.c       | 1 +
 2 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h b/drivers/staging/lustre/include/linux/lnet/lib-types.h
index f588e06..c10f03b 100644
--- a/drivers/staging/lustre/include/linux/lnet/lib-types.h
+++ b/drivers/staging/lustre/include/linux/lnet/lib-types.h
@@ -38,7 +38,6 @@
 #include
 #include
 #include
-#include
 
 #include "types.h"

diff --git a/drivers/staging/lustre/lnet/lnet/acceptor.c b/drivers/staging/lustre/lnet/lnet/acceptor.c
index 3468433..1452bb3 100644
--- a/drivers/staging/lustre/lnet/lnet/acceptor.c
+++ b/drivers/staging/lustre/lnet/lnet/acceptor.c
@@ -36,6 +36,7 @@
 #define DEBUG_SUBSYSTEM S_LNET
 
 #include
+#include
 #include "../../include/linux/lnet/lib-lnet.h"
 
 static int accept_port = 988;
-- 
1.7.1
[PATCH 15/24] staging: lustre: improvement to router checker
From: Amir Shehata This patch starts router checker thread all the time. The router checker only checks routes by ping if live_router_check_interval or dead_router_check_interval are set to something other than 0, and there are routes configured. If these conditions are not met the router checker sleeps until woken up when a route is added. It is also woken up whenever the RC is being stopped to ensure the thread doesn't hang. In the future when DLC starts configuring the live and dead router_check_interval parameters, then by manipulating them the router checker can be turned on and off by the user. Signed-off-by: Amir Shehata Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6003 Reviewed-on: http://review.whamcloud.com/13035 Reviewed-by: Liang Zhen Reviewed-by: Doug Oucharek Reviewed-by: James Simmons Reviewed-by: Oleg Drokin --- .../staging/lustre/include/linux/lnet/lib-types.h |7 +++ drivers/staging/lustre/lnet/lnet/api-ni.c |1 + drivers/staging/lustre/lnet/lnet/router.c | 51 +--- 3 files changed, 52 insertions(+), 7 deletions(-) diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h b/drivers/staging/lustre/include/linux/lnet/lib-types.h index e4a8f6e..06d4656 100644 --- a/drivers/staging/lustre/include/linux/lnet/lib-types.h +++ b/drivers/staging/lustre/include/linux/lnet/lib-types.h @@ -635,6 +635,13 @@ typedef struct { */ bool ln_nis_from_mod_params; + /* +* waitq for router checker. As long as there are no routes in +* the list, the router checker will sleep on this queue. 
when +* routes are added the thread will wake up +*/ + wait_queue_head_t ln_rc_waitq; + } lnet_t; #endif diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c index ed121a8..0ec656a 100644 --- a/drivers/staging/lustre/lnet/lnet/api-ni.c +++ b/drivers/staging/lustre/lnet/lnet/api-ni.c @@ -98,6 +98,7 @@ lnet_init_locks(void) { spin_lock_init(&the_lnet.ln_eq_wait_lock); init_waitqueue_head(&the_lnet.ln_eq_waitq); + init_waitqueue_head(&the_lnet.ln_rc_waitq); mutex_init(&the_lnet.ln_lnd_mutex); mutex_init(&the_lnet.ln_api_mutex); } diff --git a/drivers/staging/lustre/lnet/lnet/router.c b/drivers/staging/lustre/lnet/lnet/router.c index 511e446..ad9cd44 100644 --- a/drivers/staging/lustre/lnet/lnet/router.c +++ b/drivers/staging/lustre/lnet/lnet/router.c @@ -405,6 +405,9 @@ lnet_add_route(__u32 net, unsigned int hops, lnet_nid_t gateway, if (rnet != rnet2) LIBCFS_FREE(rnet, sizeof(*rnet)); + /* indicate to startup the router checker if configured */ + wake_up(&the_lnet.ln_rc_waitq); + return rc; } @@ -1056,11 +1059,6 @@ lnet_router_checker_start(void) return -EINVAL; } - if (!the_lnet.ln_routing && - live_router_check_interval <= 0 && - dead_router_check_interval <= 0) - return 0; - sema_init(&the_lnet.ln_rc_signal, 0); /* * EQ size doesn't matter; the callback is guaranteed to get every @@ -1109,6 +1107,8 @@ lnet_router_checker_stop(void) LASSERT(the_lnet.ln_rc_state == LNET_RC_STATE_RUNNING); the_lnet.ln_rc_state = LNET_RC_STATE_STOPPING; + /* wakeup the RC thread if it's sleeping */ + wake_up(&the_lnet.ln_rc_waitq); /* block until event callback signals exit */ down(&the_lnet.ln_rc_signal); @@ -1199,6 +1199,33 @@ lnet_prune_rc_data(int wait_unlink) lnet_net_unlock(LNET_LOCK_EX); } +/* + * This function is called to check if the RC should block indefinitely. + * It's called from lnet_router_checker() as well as being passed to + * wait_event_interruptible() to avoid the lost wake_up problem. 
+ * + * When it's called from wait_event_interruptible() it is necessary to + * also not sleep if the rc state is not running to avoid a deadlock + * when the system is shutting down + */ +static inline bool +lnet_router_checker_active(void) +{ + if (the_lnet.ln_rc_state != LNET_RC_STATE_RUNNING) + return true; + + /* +* Router Checker thread needs to run when routing is enabled in +* order to call lnet_update_ni_status_locked() +*/ + if (the_lnet.ln_routing) + return true; + + return !list_empty(&the_lnet.ln_routers) && + (live_router_check_interval > 0 || +dead_router_check_interval > 0); +} + static int lnet_router_checker(void *arg) { @@ -1252,8 +1279,18 @@ rescan: * because kernel counts # active tasks as nr_running * + nr_uninterruptible. */ - set_current_state(TASK_INTERRUPTIBLE); - schedule_timeout(cfs_time_seconds(1)); + /* +* if there are any routes th
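The heart of the router-checker change is the wake-up predicate: the same condition function is evaluated by the thread itself and passed to wait_event_interruptible(), which re-checks it under the waitqueue lock and thereby avoids the lost-wakeup race. A userspace model of that predicate — struct fields and the RC_* states are stand-ins for the_lnet's globals:

```c
#include <assert.h>
#include <stdbool.h>

enum { RC_RUNNING, RC_STOPPING };

/* Stand-ins for the_lnet.ln_rc_state, ln_routing, ln_routers and the
 * two module parameters. */
struct rc_state {
	int  state;
	bool routing;
	int  nrouters;
	int  live_interval;	/* live_router_check_interval */
	int  dead_interval;	/* dead_router_check_interval */
};

/* Mirrors lnet_router_checker_active(): "active" means "do not sleep".
 * The thread must never sleep while being stopped (or shutdown would
 * deadlock), must run while routing is enabled, and otherwise only
 * runs when there are routers AND some check interval is configured. */
static bool router_checker_active(const struct rc_state *s)
{
	if (s->state != RC_RUNNING)
		return true;	/* don't sleep: let the thread exit */

	if (s->routing)
		return true;	/* needed for NI status updates */

	return s->nrouters > 0 &&
	       (s->live_interval > 0 || s->dead_interval > 0);
}
```

Because lnet_add_route() and lnet_router_checker_stop() both call wake_up() on the same waitqueue, any transition that makes this predicate true gets the thread out of its sleep.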
[PATCH 22/24] staging: lustre: Allocate the correct number of rtr buffers
From: Amir Shehata This patch ensures that the correct number of router buffers are allocated. It keeps a count that keeps track of the number of buffers allocated. Another count keeps the number of buffers requested. The number of buffers allocated is set when creating new buffers and reduced when buffers are freed. The number of requested buffer is set when the buffers are allocated and is checked when credits are returned to determine whether the buffer should be freed or kept. In lnet_rtrpool_adjust_bufs() grab lnet_net_lock() before using rbp_nbuffers to ensure that it doesn't change by lnet_return_rx_credits_locked() during the process of allocating new buffers. All other access to rbp_nbuffers is already being protected by lnet_net_lock(). This avoids the case where we allocate less than the desired number of buffers. Signed-off-by: Amir Shehata Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6122 Reviewed-on: http://review.whamcloud.com/13519 Reviewed-by: Jinshan Xiong Reviewed-by: Doug Oucharek Reviewed-by: Oleg Drokin --- .../staging/lustre/include/linux/lnet/lib-types.h |5 ++- drivers/staging/lustre/lnet/lnet/lib-move.c|3 +- drivers/staging/lustre/lnet/lnet/router.c | 32 +++- 3 files changed, 30 insertions(+), 10 deletions(-) diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h b/drivers/staging/lustre/include/linux/lnet/lib-types.h index c10f03b..07b8db1 100644 --- a/drivers/staging/lustre/include/linux/lnet/lib-types.h +++ b/drivers/staging/lustre/include/linux/lnet/lib-types.h @@ -396,7 +396,10 @@ typedef struct { struct list_headrbp_msgs; /* messages blocking for a buffer */ int rbp_npages; /* # pages in each buffer */ - int rbp_nbuffers; /* # buffers */ + /* requested number of buffers */ + int rbp_req_nbuffers; + /* # buffers actually allocated */ + int rbp_nbuffers; int rbp_credits;/* # free buffers / blocked messages */ int rbp_mincredits; /* low water mark */ diff --git a/drivers/staging/lustre/lnet/lnet/lib-move.c 
b/drivers/staging/lustre/lnet/lnet/lib-move.c index e5a8dbc..7bc3e91 100644 --- a/drivers/staging/lustre/lnet/lnet/lib-move.c +++ b/drivers/staging/lustre/lnet/lnet/lib-move.c @@ -1113,9 +1113,10 @@ lnet_return_rx_credits_locked(lnet_msg_t *msg) * buffers in this pool. Make sure we never put back * more buffers than the stated number. */ - if (rbp->rbp_credits >= rbp->rbp_nbuffers) { + if (unlikely(rbp->rbp_credits >= rbp->rbp_req_nbuffers)) { /* Discard this buffer so we don't have too many. */ lnet_destroy_rtrbuf(rb, rbp->rbp_npages); + rbp->rbp_nbuffers--; } else { list_add(&rb->rb_list, &rbp->rbp_bufs); rbp->rbp_credits++; diff --git a/drivers/staging/lustre/lnet/lnet/router.c b/drivers/staging/lustre/lnet/lnet/router.c index c1e7bc5..198ff03 100644 --- a/drivers/staging/lustre/lnet/lnet/router.c +++ b/drivers/staging/lustre/lnet/lnet/router.c @@ -1359,6 +1359,7 @@ lnet_rtrpool_free_bufs(lnet_rtrbufpool_t *rbp, int cpt) lnet_net_lock(cpt); lnet_drop_routed_msgs_locked(&rbp->rbp_msgs, cpt); list_splice_init(&rbp->rbp_bufs, &tmp); + rbp->rbp_req_nbuffers = 0; rbp->rbp_nbuffers = 0; rbp->rbp_credits = 0; rbp->rbp_mincredits = 0; @@ -1379,20 +1380,33 @@ lnet_rtrpool_adjust_bufs(lnet_rtrbufpool_t *rbp, int nbufs, int cpt) lnet_rtrbuf_t *rb; int num_rb; int num_buffers = 0; + int old_req_nbufs; int npages = rbp->rbp_npages; + lnet_net_lock(cpt); /* * If we are called for less buffers than already in the pool, we -* just lower the nbuffers number and excess buffers will be +* just lower the req_nbuffers number and excess buffers will be * thrown away as they are returned to the free list. Credits * then get adjusted as well. +* If we already have enough buffers allocated to serve the +* increase requested, then we can treat that the same way as we +* do the decrease. 
*/ - if (nbufs <= rbp->rbp_nbuffers) { - lnet_net_lock(cpt); - rbp->rbp_nbuffers = nbufs; + num_rb = nbufs - rbp->rbp_nbuffers; + if (nbufs <= rbp->rbp_req_nbuffers || num_rb <= 0) { + rbp->rbp_req_nbuffers = nbufs; lnet_net_unlock(cpt); return 0; } + /* +* store the older value of rbp_req_nbuffers and then set it to +
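The two-counter scheme can be modeled compactly: "req" is what the admin asked for, "alloc" is what actually exists, and shrinking only lowers the target while excess buffers are freed lazily as they are returned. A userspace sketch (field and function names are illustrative, not the kernel's):

```c
#include <assert.h>

struct pool {
	int req;	/* models rbp_req_nbuffers: requested count */
	int alloc;	/* models rbp_nbuffers: actually allocated */
	int free;	/* buffers sitting idle in the pool */
};

/* Mirrors lnet_rtrpool_adjust_bufs(): shrinking (or growing within
 * what is already allocated) only records the new target; real growth
 * allocates the missing buffers. */
static void pool_resize(struct pool *p, int nbufs)
{
	int grow = nbufs - p->alloc;

	if (nbufs <= p->req || grow <= 0) {
		p->req = nbufs;	/* excess trimmed as buffers return */
		return;
	}
	p->req = nbufs;
	p->alloc += grow;	/* "allocate" the missing buffers */
	p->free += grow;
}

/* Mirrors the lnet_return_rx_credits_locked() hunk: a returning buffer
 * is discarded, and alloc decremented, whenever the pool is already at
 * or above its requested size. */
static void pool_return(struct pool *p)
{
	if (p->free >= p->req)
		p->alloc--;	/* discard instead of pooling */
	else
		p->free++;
}
```

Comparing the returned buffer against req rather than alloc is the whole fix: it lets alloc converge down to the new target without ever freeing a buffer that is still wanted.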
Re: [PATCH 08/10] x86/xsaves: Fix PTRACE frames for XSAVES
On Mon, Feb 22, 2016 at 02:19:05PM -0800, Dave Hansen wrote: > On 02/22/2016 12:48 PM, Yu-cheng Yu wrote: > > It should have been: > > > > xsave->header.xfeatures = xfeatures | > > (xsave->header.xfeatures & XFEATURE_MASK_SUPERVISOR); > > > > I'll fix it. > > Can we break it out to make it more clear? > > /* > * The state that came in from userspace was user-state only. > * Mask all the user states out of 'xfeatures'. > */ > xsave->header.xfeatures &= XFEATURE_MASK_SUPERVISOR; > /* > * add back in the features that came in from userspace > */ > xsave->header.xfeatures |= xfeatures I will update it in the next version. Thanks!
[PATCH 20/24] staging: lustre: avoid race during lnet acceptor thread termination
From: Bruno Faccini This patch avoids a potential race around the socket sleepers wait list during acceptor thread termination, relying on sk_callback_lock RW-lock protection. Signed-off-by: Bruno Faccini Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6476 Reviewed-on: http://review.whamcloud.com/14503 Reviewed-by: Amir Shehata Reviewed-by: John L. Hammond Reviewed-by: Oleg Drokin --- drivers/staging/lustre/lnet/lnet/acceptor.c |8 +++- 1 files changed, 7 insertions(+), 1 deletions(-) diff --git a/drivers/staging/lustre/lnet/lnet/acceptor.c b/drivers/staging/lustre/lnet/lnet/acceptor.c index 8f9876b..3468433 100644 --- a/drivers/staging/lustre/lnet/lnet/acceptor.c +++ b/drivers/staging/lustre/lnet/lnet/acceptor.c @@ -488,11 +488,17 @@ lnet_acceptor_start(void) void lnet_acceptor_stop(void) { + struct sock *sk; + if (lnet_acceptor_state.pta_shutdown) /* not running */ return; lnet_acceptor_state.pta_shutdown = 1; - wake_up_all(sk_sleep(lnet_acceptor_state.pta_sock->sk)); + + sk = lnet_acceptor_state.pta_sock->sk; + + /* awake any sleepers using safe method */ + sk->sk_state_change(sk); /* block until acceptor signals exit */ wait_for_completion(&lnet_acceptor_state.pta_signal); -- 1.7.1
[PATCH 23/24] staging: lustre: Use lnet_is_route_alive for router aliveness
From: Chris Horn lctl show_route and lctl route_list will output router aliveness information via lnet_get_route(). lnet_get_route() should use the lnet_is_route_alive() function, introduced in e8a1124 http://review.whamcloud.com/7857, to determine route aliveness. Signed-off-by: Chris Horn Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5733 Reviewed-on: http://review.whamcloud.com/14055 Reviewed-by: James Simmons Reviewed-by: Amir Shehata Reviewed-by: Oleg Drokin --- drivers/staging/lustre/lnet/lnet/router.c |3 +-- 1 files changed, 1 insertions(+), 2 deletions(-) diff --git a/drivers/staging/lustre/lnet/lnet/router.c b/drivers/staging/lustre/lnet/lnet/router.c index 198ff03..5a6086b 100644 --- a/drivers/staging/lustre/lnet/lnet/router.c +++ b/drivers/staging/lustre/lnet/lnet/router.c @@ -607,8 +607,7 @@ lnet_get_route(int idx, __u32 *net, __u32 *hops, *hops = route->lr_hops; *priority = route->lr_priority; *gateway = route->lr_gateway->lp_nid; - *alive = route->lr_gateway->lp_alive && -!route->lr_downis; + *alive = lnet_is_route_alive(route); lnet_net_unlock(cpt); return 0; } -- 1.7.1
Re: [PATCH 3/4] sched/deadline: Tracepoints for deadline scheduler
On Mon, 22 Feb 2016 22:30:17 +0100 Peter Zijlstra wrote: > On Mon, Feb 22, 2016 at 12:48:54PM -0500, Steven Rostedt wrote: > > > > As it stands, the existing tracepoint have already been an ABI > > > trainwreck, why would I want to add more? > > > > Yes, this may become a type of ABI, but even the sched switch > > tracepoints haven't been that bad. Has it really prevented us from > > changing anything? > > The whole wakeup thing where we _still_ have a dummy argument, and have > been lying about the value for a long time really stinks. Having a dummy argument is not that bad. Yes, it's inconvenient, and I'm not sure who even uses it (can we delete it without breaking anything?) But it doesn't prevent us from going forward with development. > > > But let me ask, what would you recommend to finding out if the kernel > > has really given your tasks the recommended runtime within a given > > period? We can't expect users of SCHED_DEADLINE to be modifying the > > kernel themselves. > > So why are these deadline specific tracepoint? Why not extend the ones > we already have? I'm not sure how to do that and be able to report when a period has elapsed, and when the next period is coming. > > Also, will these tracepoints still work if we implement SCHED_DEADLINE > using Least-Laxity-First or Pfair or some other exotic algorithm? Or > will be forever be bound to EDF just because tracepoint ABI shite? Can we come up with generic numbers? I mean, the user that asks for their task to have a specific runtime within a specific period/deadline, these seem to be generic already. I'll have to read up on those that you mention, but do that not have a "replenish" for when the period starts again? And then a yield, showing the task has given up its remaining time, or a block, where a task is scheduled out because it blocked on a lock? 
> > Worse, the proposed tracepoints are atrocious, look at crap like this: > > > + if (trace_sched_deadline_yield_enabled()) { > > + u64 delta_exec = rq_clock_task(rq) - p->se.exec_start; > > + /* Subtract the last run till now */ > > + if (likely((s64)delta_exec > 0)) > > + p->dl.runtime -= delta_exec; > > + trace_sched_deadline_yield(&p->dl); > > + } > > tracepoints should _NEVER_ change state, ever. Heh, it's not really changing state. The code directly after this is: p->dl.runtime = 0; Without updating dl.runtime, the tracepoint would report inaccurate remaining time. Without that, you would get reports of yielding with full runtimes, making it look like you never ran at all. > > And there's the whole COND tracepoint muck, which also doesn't win any prices. It keeps us from adding flags that may end up going away and us maintaining a dummy field forever ;-) > > So tell me why these specific tracepoints and why the existing ones > could not be extended to include this information. For example, why a > trace_sched_dealine_yield, and not a generic trace_sched_yield() that > works for all classes. But what about reporting actual runtime, and when the next period will come. That only matters for deadline. > > Tell me that the information presented does not pin the implementation. > > And clean up the crap. > > Then I might maybe consider this. > > But do not present me with a bunch of random arse, hacked together tracepoints and tell me they might be useful, maybe. They ARE useful. These are the tracepoints I'm currently using to debug the deadline scheduler with. They have been indispensable for my current work. -- Steve

[PATCH 24/24] staging: lustre: Remove LASSERTS from router checker
From: Doug Oucharek In lnet_router_checker(), there are two LASSERTS. Neither protects us from anything, and one of them triggered for a customer, crashing the system unnecessarily. This patch removes them. Signed-off-by: Doug Oucharek Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-7362 Reviewed-on: http://review.whamcloud.com/17003 Reviewed-by: James Simmons Reviewed-by: Chris Horn Reviewed-by: Matt Ezell Reviewed-by: Oleg Drokin --- drivers/staging/lustre/lnet/lnet/router.c |4 1 files changed, 0 insertions(+), 4 deletions(-) diff --git a/drivers/staging/lustre/lnet/lnet/router.c b/drivers/staging/lustre/lnet/lnet/router.c index 5a6086b..5e8b0ba 100644 --- a/drivers/staging/lustre/lnet/lnet/router.c +++ b/drivers/staging/lustre/lnet/lnet/router.c @@ -1228,8 +1228,6 @@ lnet_router_checker(void *arg) cfs_block_allsigs(); - LASSERT(the_lnet.ln_rc_state == LNET_RC_STATE_RUNNING); - while (the_lnet.ln_rc_state == LNET_RC_STATE_RUNNING) { __u64 version; int cpt; @@ -1287,8 +1285,6 @@ rescan: cfs_time_seconds(1)); } - LASSERT(the_lnet.ln_rc_state == LNET_RC_STATE_STOPPING); - lnet_prune_rc_data(1); /* wait for UNLINK */ the_lnet.ln_rc_state = LNET_RC_STATE_SHUTDOWN; -- 1.7.1
[PATCH 00/24] Second batch of LNet updates
This patch set fixes many of the LNet issues encountered in production environments. One of the long-standing issues was not being able to reconfigure LNet after initialization. Doing so left it in a broken state. Several other issues are also addressed in this patch set. Merged back into this patch set are also the suggestions for improvement from Dan Carpenter when the original patch set was posted. Amir Shehata (14): staging: lustre: Dynamic LNet Configuration (DLC) IOCTL changes staging: lustre: Dynamic LNet Configuration (DLC) show command staging: lustre: fix crash due to NULL networks string staging: lustre: DLC user/kernel space glue code staging: lustre: improve LNet clean up code and API staging: lustre: return appropriate errno when adding route staging: lustre: startup lnet acceptor thread dynamically staging: lustre: reject invalid net configuration for lnet staging: lustre: return -EEXIST if NI is not unique staging: lustre: handle lnet_check_routes() errors staging: lustre: improvement to router checker staging: lustre: prevent assert on LNet module unload staging: lustre: remove messages from lazy portal on NI shutdown staging: lustre: Allocate the correct number of rtr buffers Bruno Faccini (1): staging: lustre: avoid race during lnet acceptor thread termination Chris Horn (1): staging: lustre: Use lnet_is_route_alive for router aliveness Doug Oucharek (1): staging: lustre: Remove LASSERTS from router checker Frank Zago (4): staging: lustre: make local functions static for LNet ni staging: lustre: make some lnet functions static staging: lustre: missed a few cases of using NULL instead of 0 staging: lustre: remove unnecessary EXPORT_SYMBOL from lnet layer James Simmons (1): staging: lustre: use sock.h in only acceptor.c John L. 
Hammond (2): staging: lustre: remove LUSTRE_{,SRV_}LNET_PID staging: lustre: assume a kernel build .../staging/lustre/include/linux/libcfs/libcfs.h |2 - .../lustre/include/linux/libcfs/libcfs_ioctl.h | 49 +- .../lustre/include/linux/libcfs/linux/libcfs.h |3 - .../staging/lustre/include/linux/lnet/lib-dlc.h| 122 .../staging/lustre/include/linux/lnet/lib-lnet.h | 22 + .../staging/lustre/include/linux/lnet/lib-types.h | 25 +- .../staging/lustre/lnet/klnds/socklnd/socklnd.c|7 +- drivers/staging/lustre/lnet/lnet/acceptor.c| 21 +- drivers/staging/lustre/lnet/lnet/api-ni.c | 761 +--- drivers/staging/lustre/lnet/lnet/config.c | 25 +- drivers/staging/lustre/lnet/lnet/lib-eq.c |3 - drivers/staging/lustre/lnet/lnet/lib-md.c |3 - drivers/staging/lustre/lnet/lnet/lib-me.c |3 - drivers/staging/lustre/lnet/lnet/lib-move.c| 10 +- drivers/staging/lustre/lnet/lnet/lib-msg.c | 20 +- drivers/staging/lustre/lnet/lnet/lib-ptl.c | 54 +- drivers/staging/lustre/lnet/lnet/lib-socket.c |3 - drivers/staging/lustre/lnet/lnet/module.c | 81 ++- drivers/staging/lustre/lnet/lnet/peer.c| 63 ++ drivers/staging/lustre/lnet/lnet/router.c | 181 -- drivers/staging/lustre/lnet/lnet/router_proc.c |2 - drivers/staging/lustre/lnet/selftest/conctl.c |9 +- drivers/staging/lustre/lnet/selftest/console.c |6 +- drivers/staging/lustre/lnet/selftest/console.h |1 - drivers/staging/lustre/lnet/selftest/framework.c | 10 - drivers/staging/lustre/lnet/selftest/module.c |4 +- drivers/staging/lustre/lnet/selftest/rpc.c |4 +- .../lustre/lustre/libcfs/linux/linux-module.c | 55 +- drivers/staging/lustre/lustre/libcfs/module.c | 58 ++- drivers/staging/lustre/lustre/llite/llite_lib.c|2 +- .../staging/lustre/lustre/obdclass/obd_config.c|4 +- drivers/staging/lustre/lustre/ptlrpc/events.c |4 +- 32 files changed, 1122 insertions(+), 495 deletions(-) create mode 100644 drivers/staging/lustre/include/linux/lnet/lib-dlc.h
[PATCH 19/24] staging: lustre: remove unnecessary EXPORT_SYMBOL from lnet layer
From: Frank Zago A lot of symbols don't need to be exported at all because they are only used in the module they belong to. Signed-off-by: Frank Zago Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5829 Reviewed-on: http://review.whamcloud.com/13320 Reviewed-by: James Simmons Reviewed-by: Isaac Huang Reviewed-by: John L. Hammond Reviewed-by: Oleg Drokin --- drivers/staging/lustre/lnet/lnet/lib-socket.c |3 --- 1 files changed, 0 insertions(+), 3 deletions(-) diff --git a/drivers/staging/lustre/lnet/lnet/lib-socket.c b/drivers/staging/lustre/lnet/lnet/lib-socket.c index 53dd0bd..88905d5 100644 --- a/drivers/staging/lustre/lnet/lnet/lib-socket.c +++ b/drivers/staging/lustre/lnet/lnet/lib-socket.c @@ -514,7 +514,6 @@ lnet_sock_listen(struct socket **sockp, __u32 local_ip, int local_port, sock_release(*sockp); return rc; } -EXPORT_SYMBOL(lnet_sock_listen); int lnet_sock_accept(struct socket **newsockp, struct socket *sock) @@ -558,7 +557,6 @@ failed: sock_release(newsock); return rc; } -EXPORT_SYMBOL(lnet_sock_accept); int lnet_sock_connect(struct socket **sockp, int *fatal, __u32 local_ip, @@ -596,4 +594,3 @@ lnet_sock_connect(struct socket **sockp, int *fatal, __u32 local_ip, sock_release(*sockp); return rc; } -EXPORT_SYMBOL(lnet_sock_connect); -- 1.7.1
[PATCH 06/24] staging: lustre: remove LUSTRE_{,SRV_}LNET_PID
From: John L. Hammond Remove LUSTRE_LNET_PID (12354) and LUSTRE_SRV_LNET_PID (12345) from the libcfs headers and replace their uses with a new macro LNET_PID_LUSTRE (also 12345) in lnet/types.h. Signed-off-by: John L. Hammond Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-2675 Reviewed-on: http://review.whamcloud.com/11985 Reviewed-by: James Simmons Reviewed-by: Bob Glossman Reviewed-by: Dmitry Eremin Reviewed-by: Andreas Dilger --- .../staging/lustre/include/linux/libcfs/libcfs.h |2 -- .../lustre/include/linux/libcfs/linux/libcfs.h |3 --- .../staging/lustre/lnet/klnds/socklnd/socklnd.c|7 +-- drivers/staging/lustre/lnet/lnet/api-ni.c |2 +- drivers/staging/lustre/lnet/lnet/lib-move.c|2 +- drivers/staging/lustre/lnet/lnet/module.c |4 ++-- drivers/staging/lustre/lnet/lnet/router.c |2 +- drivers/staging/lustre/lnet/selftest/rpc.c |2 +- drivers/staging/lustre/lustre/ptlrpc/events.c |4 ++-- 9 files changed, 13 insertions(+), 15 deletions(-) diff --git a/drivers/staging/lustre/include/linux/libcfs/libcfs.h b/drivers/staging/lustre/include/linux/libcfs/libcfs.h index dc9b88f..5c598c8 100644 --- a/drivers/staging/lustre/include/linux/libcfs/libcfs.h +++ b/drivers/staging/lustre/include/linux/libcfs/libcfs.h @@ -51,8 +51,6 @@ #define LERRCHKSUM(hexnum) (((hexnum) & 0xf) ^ ((hexnum) >> 4 & 0xf) ^ \ ((hexnum) >> 8 & 0xf)) -#define LUSTRE_SRV_LNET_PID LUSTRE_LNET_PID - #include /* need both kernel and user-land acceptor */ diff --git a/drivers/staging/lustre/include/linux/libcfs/linux/libcfs.h b/drivers/staging/lustre/include/linux/libcfs/linux/libcfs.h index aac5900..d94b266 100644 --- a/drivers/staging/lustre/include/linux/libcfs/linux/libcfs.h +++ b/drivers/staging/lustre/include/linux/libcfs/linux/libcfs.h @@ -118,9 +118,6 @@ do { \ #define CDEBUG_STACK() (0L) #endif /* __x86_64__ */ -/* initial pid */ -#define LUSTRE_LNET_PID 12345 - #define __current_nesting_level() (0) /** diff --git a/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c 
b/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c index 49d716d..854814c 100644 --- a/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c +++ b/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c @@ -1842,7 +1842,10 @@ ksocknal_query(lnet_ni_t *ni, lnet_nid_t nid, unsigned long *when) unsigned long now = cfs_time_current(); ksock_peer_t *peer = NULL; rwlock_t *glock = &ksocknal_data.ksnd_global_lock; - lnet_process_id_t id = {.nid = nid, .pid = LUSTRE_SRV_LNET_PID}; + lnet_process_id_t id = { + .nid = nid, + .pid = LNET_PID_LUSTRE, + }; read_lock(glock); @@ -2187,7 +2190,7 @@ ksocknal_ctl(lnet_ni_t *ni, unsigned int cmd, void *arg) case IOC_LIBCFS_ADD_PEER: id.nid = data->ioc_nid; - id.pid = LUSTRE_SRV_LNET_PID; + id.pid = LNET_PID_LUSTRE; return ksocknal_add_peer(ni, id, data->ioc_u32[0], /* IP */ data->ioc_u32[1]); /* port */ diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c index 7583ae4..3bed4c3 100644 --- a/drivers/staging/lustre/lnet/lnet/api-ni.c +++ b/drivers/staging/lustre/lnet/lnet/api-ni.c @@ -2117,7 +2117,7 @@ static int lnet_ping(lnet_process_id_t id, int timeout_ms, return -EINVAL; if (id.pid == LNET_PID_ANY) - id.pid = LUSTRE_SRV_LNET_PID; + id.pid = LNET_PID_LUSTRE; LIBCFS_ALLOC(info, infosz); if (!info) diff --git a/drivers/staging/lustre/lnet/lnet/lib-move.c b/drivers/staging/lustre/lnet/lnet/lib-move.c index f2b1116..a342ce0 100644 --- a/drivers/staging/lustre/lnet/lnet/lib-move.c +++ b/drivers/staging/lustre/lnet/lnet/lib-move.c @@ -1407,7 +1407,7 @@ lnet_send(lnet_nid_t src_nid, lnet_msg_t *msg, lnet_nid_t rtr_nid) msg->msg_target_is_router = 1; msg->msg_target.nid = lp->lp_nid; - msg->msg_target.pid = LUSTRE_SRV_LNET_PID; + msg->msg_target.pid = LNET_PID_LUSTRE; } /* 'lp' is our best choice of peer */ diff --git a/drivers/staging/lustre/lnet/lnet/module.c b/drivers/staging/lustre/lnet/lnet/module.c index e9b1e69..8f053d7 100644 --- a/drivers/staging/lustre/lnet/lnet/module.c +++ 
b/drivers/staging/lustre/lnet/lnet/module.c @@ -53,7 +53,7 @@ lnet_configure(void *arg) mutex_lock(&lnet_config_mutex); if (!the_lnet.ln_niinit_self) { - rc = LNetNIInit(LUSTRE_SRV_LNET_PID); + rc = LNetNIInit(LNET_PID_LUSTRE); if (rc >= 0) { the_lnet.ln_niinit_self = 1; rc = 0; @@ -99,7 +99,
[PATCH 02/24] staging: lustre: Dynamic LNet Configuration (DLC) show command
From: Amir Shehata This is the fifth patch of a set of patches that enables DLC. This patch adds the new structures which will be used in the IOCTL communication. It also added a set of show operations to show buffers, networks, statistics and peer information. Signed-off-by: Amir Shehata Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-2456 Reviewed-on: http://review.whamcloud.com/8022 Reviewed-by: John L. Hammond Reviewed-by: Doug Oucharek Reviewed-by: James Simmons Reviewed-by: Oleg Drokin --- .../lustre/include/linux/libcfs/libcfs_ioctl.h | 26 - .../staging/lustre/include/linux/lnet/lib-dlc.h| 122 .../staging/lustre/include/linux/lnet/lib-lnet.h |5 + drivers/staging/lustre/lnet/lnet/api-ni.c | 56 - drivers/staging/lustre/lnet/lnet/module.c |4 + drivers/staging/lustre/lnet/lnet/peer.c| 61 ++ .../lustre/lustre/libcfs/linux/linux-module.c |3 +- drivers/staging/lustre/lustre/libcfs/module.c | 15 ++- 8 files changed, 277 insertions(+), 15 deletions(-) create mode 100644 drivers/staging/lustre/include/linux/lnet/lib-dlc.h diff --git a/drivers/staging/lustre/include/linux/libcfs/libcfs_ioctl.h b/drivers/staging/lustre/include/linux/libcfs/libcfs_ioctl.h index 0598702..f788631 100644 --- a/drivers/staging/lustre/include/linux/libcfs/libcfs_ioctl.h +++ b/drivers/staging/lustre/include/linux/libcfs/libcfs_ioctl.h @@ -41,7 +41,8 @@ #ifndef __LIBCFS_IOCTL_H__ #define __LIBCFS_IOCTL_H__ -#define LIBCFS_IOCTL_VERSION 0x0001000a +#define LIBCFS_IOCTL_VERSION 0x0001000a +#define LIBCFS_IOCTL_VERSION2 0x0001000b struct libcfs_ioctl_hdr { __u32 ioc_len; @@ -111,9 +112,6 @@ struct libcfs_ioctl_handler { /* lnet ioctls */ #define IOC_LIBCFS_GET_NI_IOWR('e', 50, long) #define IOC_LIBCFS_FAIL_NID_IOWR('e', 51, long) -#define IOC_LIBCFS_ADD_ROUTE _IOWR('e', 52, long) -#define IOC_LIBCFS_DEL_ROUTE _IOWR('e', 53, long) -#define IOC_LIBCFS_GET_ROUTE _IOWR('e', 54, long) #define IOC_LIBCFS_NOTIFY_ROUTER _IOWR('e', 55, long) #define IOC_LIBCFS_UNCONFIGURE _IOWR('e', 56, long) /* 
#define IOC_LIBCFS_PORTALS_COMPATIBILITY _IOWR('e', 57, long) */ @@ -136,7 +134,25 @@ struct libcfs_ioctl_handler { #define IOC_LIBCFS_DEL_INTERFACE _IOWR('e', 79, long) #define IOC_LIBCFS_GET_INTERFACE _IOWR('e', 80, long) -#define IOC_LIBCFS_MAX_NR 80 +/* + * DLC Specific IOCTL numbers. + * In order to maintain backward compatibility with any possible external + * tools which might be accessing the IOCTL numbers, a new group of IOCTL + * number have been allocated. + */ +#define IOCTL_CONFIG_SIZE struct lnet_ioctl_config_data +#define IOC_LIBCFS_ADD_ROUTE _IOWR(IOC_LIBCFS_TYPE, 81, IOCTL_CONFIG_SIZE) +#define IOC_LIBCFS_DEL_ROUTE _IOWR(IOC_LIBCFS_TYPE, 82, IOCTL_CONFIG_SIZE) +#define IOC_LIBCFS_GET_ROUTE _IOWR(IOC_LIBCFS_TYPE, 83, IOCTL_CONFIG_SIZE) +#define IOC_LIBCFS_ADD_NET _IOWR(IOC_LIBCFS_TYPE, 84, IOCTL_CONFIG_SIZE) +#define IOC_LIBCFS_DEL_NET _IOWR(IOC_LIBCFS_TYPE, 85, IOCTL_CONFIG_SIZE) +#define IOC_LIBCFS_GET_NET _IOWR(IOC_LIBCFS_TYPE, 86, IOCTL_CONFIG_SIZE) +#define IOC_LIBCFS_CONFIG_RTR _IOWR(IOC_LIBCFS_TYPE, 87, IOCTL_CONFIG_SIZE) +#define IOC_LIBCFS_ADD_BUF _IOWR(IOC_LIBCFS_TYPE, 88, IOCTL_CONFIG_SIZE) +#define IOC_LIBCFS_GET_BUF _IOWR(IOC_LIBCFS_TYPE, 89, IOCTL_CONFIG_SIZE) +#define IOC_LIBCFS_GET_PEER_INFO _IOWR(IOC_LIBCFS_TYPE, 90, IOCTL_CONFIG_SIZE) +#define IOC_LIBCFS_GET_LNET_STATS _IOWR(IOC_LIBCFS_TYPE, 91, IOCTL_CONFIG_SIZE) +#define IOC_LIBCFS_MAX_NR 91 static inline int libcfs_ioctl_packlen(struct libcfs_ioctl_data *data) { diff --git a/drivers/staging/lustre/include/linux/lnet/lib-dlc.h b/drivers/staging/lustre/include/linux/lnet/lib-dlc.h new file mode 100644 index 000..84a19e9 --- /dev/null +++ b/drivers/staging/lustre/include/linux/lnet/lib-dlc.h @@ -0,0 +1,122 @@ +/* + * LGPL HEADER START + * + * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. 
+ * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library. + * + * LGPL HEADER END + * + */ +/* + * Copyright (c) 2014, Intel Corporation. + */ +/* + * Author: Amir Shehat
Re: [PATCH 0/8] X.509: Software public key subtype changes
Mimi Zohar wrote: > > (1) - (3) These are Tadeusz's RSA akcipher conversion. > > Up to here, IMA-appraisal works properly. I don't have IMA set up anywhere. David
[patch V3 04/28] x86/perf/intel/uncore: Add sanity checks for pci dev package id
The storage array is size limited, but the lookup misses a sanity check. Signed-off-by: Thomas Gleixner --- arch/x86/kernel/cpu/perf_event_intel_uncore.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) Index: b/arch/x86/kernel/cpu/perf_event_intel_uncore.c === --- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c +++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c @@ -882,7 +882,7 @@ static int uncore_pci_probe(struct pci_d int phys_id, ret; phys_id = uncore_pcibus_to_physid(pdev->bus); - if (phys_id < 0) + if (phys_id < 0 || phys_id >= UNCORE_SOCKET_MAX) return -ENODEV; if (UNCORE_PCI_DEV_TYPE(id->driver_data) == UNCORE_EXTRA_PCI_DEV) {
[patch V3 05/28] x86/perf/intel_uncore: Cleanup hardware on exit
When tearing down the boxes nothing undoes the hardware state which was set up by box->init_box(). Add a box->exit_box() callback and implement it for the uncores which have an init_box() callback. This misses the cleanup in the error exit paths, but I cannot be bothered to implement it before cleaning up the rest of the driver, which makes that task way simpler. Signed-off-by: Thomas Gleixner --- V2: Removed the code which affects IVB and newer as Liang pointed out that it's incorrect --- arch/x86/kernel/cpu/perf_event_intel_uncore.c |6 +- arch/x86/kernel/cpu/perf_event_intel_uncore.h |9 + arch/x86/kernel/cpu/perf_event_intel_uncore_nhmex.c |6 ++ arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c | 13 + 4 files changed, 33 insertions(+), 1 deletion(-) --- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c +++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c @@ -937,6 +937,7 @@ static int uncore_pci_probe(struct pci_d raw_spin_lock(&uncore_box_lock); list_del(&box->list); raw_spin_unlock(&uncore_box_lock); + uncore_box_exit(box); kfree(box); } return ret; @@ -982,6 +983,7 @@ static void uncore_pci_remove(struct pci } WARN_ON_ONCE(atomic_read(&box->refcnt) != 1); + uncore_box_exit(box); kfree(box); if (last_box) @@ -1091,8 +1093,10 @@ static void uncore_cpu_dying(int cpu) pmu = &type->pmus[j]; box = *per_cpu_ptr(pmu->box, cpu); *per_cpu_ptr(pmu->box, cpu) = NULL; - if (box && atomic_dec_and_test(&box->refcnt)) + if (box && atomic_dec_and_test(&box->refcnt)) { list_add(&box->list, &boxes_to_free); + uncore_box_exit(box); + } } } } --- a/arch/x86/kernel/cpu/perf_event_intel_uncore.h +++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.h @@ -61,6 +61,7 @@ struct intel_uncore_type { struct intel_uncore_ops { void (*init_box)(struct intel_uncore_box *); + void (*exit_box)(struct intel_uncore_box *); void (*disable_box)(struct intel_uncore_box *); void (*enable_box)(struct intel_uncore_box *); void (*disable_event)(struct intel_uncore_box *, struct perf_event *); @@ -306,6 
+307,14 @@ static inline void uncore_box_init(struc } } +static inline void uncore_box_exit(struct intel_uncore_box *box) +{ + if (test_and_clear_bit(UNCORE_BOX_FLAG_INITIATED, &box->flags)) { + if (box->pmu->type->ops->exit_box) + box->pmu->type->ops->exit_box(box); + } +} + static inline bool uncore_box_is_fake(struct intel_uncore_box *box) { return (box->phys_id < 0); --- a/arch/x86/kernel/cpu/perf_event_intel_uncore_nhmex.c +++ b/arch/x86/kernel/cpu/perf_event_intel_uncore_nhmex.c @@ -201,6 +201,11 @@ static void nhmex_uncore_msr_init_box(st wrmsrl(NHMEX_U_MSR_PMON_GLOBAL_CTL, NHMEX_U_PMON_GLOBAL_EN_ALL); } +static void nhmex_uncore_msr_exit_box(struct intel_uncore_box *box) +{ + wrmsrl(NHMEX_U_MSR_PMON_GLOBAL_CTL, 0); +} + static void nhmex_uncore_msr_disable_box(struct intel_uncore_box *box) { unsigned msr = uncore_msr_box_ctl(box); @@ -250,6 +255,7 @@ static void nhmex_uncore_msr_enable_even #define NHMEX_UNCORE_OPS_COMMON_INIT() \ .init_box = nhmex_uncore_msr_init_box,\ + .exit_box = nhmex_uncore_msr_exit_box,\ .disable_box= nhmex_uncore_msr_disable_box, \ .enable_box = nhmex_uncore_msr_enable_box, \ .disable_event = nhmex_uncore_msr_disable_event, \ --- a/arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c +++ b/arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c @@ -95,6 +95,12 @@ static void snb_uncore_msr_init_box(stru } } +static void snb_uncore_msr_exit_box(struct intel_uncore_box *box) +{ + if (box->pmu->pmu_idx == 0) + wrmsrl(SNB_UNC_PERF_GLOBAL_CTL, 0); +} + static struct uncore_event_desc snb_uncore_events[] = { INTEL_UNCORE_EVENT_DESC(clockticks, "event=0xff,umask=0x00"), { /* end: all zeroes */ }, @@ -116,6 +122,7 @@ static struct attribute_group snb_uncore static struct intel_uncore_ops snb_uncore_msr_ops = { .init_box = snb_uncore_msr_init_box, + .exit_box = snb_uncore_msr_exit_box, .disable_event = snb_uncore_msr_disable_event, .enable_event = snb_uncore_msr_enable_event, .read_counter = uncore_msr_read_counter, @@ -231,6 +238,11 @@ static void 
snb_uncore_imc_init_box(stru box->hrtimer_duration = UNCORE_SNB_IMC_HRTIMER_INTERVAL; } +static void snb_uncore_imc_exit_box(struct intel_uncore_box *box) +{ + iounmap(box->io_addr); +}
[patch V3 01/28] x86/perf/intel/uncore: Remove pointless mask check
uncore_cpumask_init() is only ever called from intel_uncore_init() where the mask is guaranteed to be empty. Signed-off-by: Thomas Gleixner --- arch/x86/kernel/cpu/perf_event_intel_uncore.c |6 -- 1 file changed, 6 deletions(-) --- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c +++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c @@ -1342,12 +1342,6 @@ static void __init uncore_cpumask_init(v { int cpu; - /* -* ony invoke once from msr or pci init code -*/ - if (!cpumask_empty(&uncore_cpu_mask)) - return; - cpu_notifier_register_begin(); for_each_online_cpu(cpu) {
Re: [PATCH 4/8] akcipher: Move the RSA DER encoding to the crypto layer
Tadeusz Struk wrote: > I wonder if this should be merged with the crypto/rsa-pkcs1pad.c template > that we already have. Looks like the two do the same padding now. > Should we merge then and pass the hash param as a separate template param, > e.g the public_key would allocate "pkcs1pad(rsa, sha1)"? Ummm... Possibly. Is that how it's used? warthog>git grep pkcs1pad -- Documentation warthog1> Anyway, the problem I have with this is that I want to get that knowledge out of the asymmetric key in-software public key subtype. It knows "rsa", "dsa", "ecdsa", ... because that's all the OIDs tell it. I guess if I have to, I can stoop to converting "rsa" to "pkcs1pad(rsa, sha1)". Can you do me a really quick merge? -rc5 is already out, and I want to get it to James pronto - plus I have things that are pending on this change being made. Oh - and how does the padding template find the algorithm DER encoding string to use? I have wondered whether it should be stored in with the hash algorithm, but it probably makes more sense to keep it with the rsa module. David
[patch V3 07/28] x86/perf/intel_uncore: Make code readable
Cleanup the code a bit before reworking it completely. Signed-off-by: Thomas Gleixner --- arch/x86/kernel/cpu/perf_event_intel_uncore.c | 95 +- 1 file changed, 50 insertions(+), 45 deletions(-) Index: b/arch/x86/kernel/cpu/perf_event_intel_uncore.c === --- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c +++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c @@ -217,7 +217,8 @@ u64 uncore_shared_reg_config(struct inte return config; } -static void uncore_assign_hw_event(struct intel_uncore_box *box, struct perf_event *event, int idx) +static void uncore_assign_hw_event(struct intel_uncore_box *box, + struct perf_event *event, int idx) { struct hw_perf_event *hwc = &event->hw; @@ -312,18 +313,19 @@ static void uncore_pmu_init_hrtimer(stru box->hrtimer.function = uncore_pmu_hrtimer; } -static struct intel_uncore_box *uncore_alloc_box(struct intel_uncore_type *type, int node) +static struct intel_uncore_box *uncore_alloc_box(struct intel_uncore_type *type, +int node) { + int i, size, numshared = type->num_shared_regs ; struct intel_uncore_box *box; - int i, size; - size = sizeof(*box) + type->num_shared_regs * sizeof(struct intel_uncore_extra_reg); + size = sizeof(*box) + numshared * sizeof(struct intel_uncore_extra_reg); box = kzalloc_node(size, GFP_KERNEL, node); if (!box) return NULL; - for (i = 0; i < type->num_shared_regs; i++) + for (i = 0; i < numshared; i++) raw_spin_lock_init(&box->shared_regs[i].lock); uncore_pmu_init_hrtimer(box); @@ -351,7 +353,8 @@ static bool is_uncore_event(struct perf_ } static int -uncore_collect_events(struct intel_uncore_box *box, struct perf_event *leader, bool dogrp) +uncore_collect_events(struct intel_uncore_box *box, struct perf_event *leader, + bool dogrp) { struct perf_event *event; int n, max_count; @@ -412,7 +415,8 @@ uncore_get_event_constraint(struct intel return &type->unconstrainted; } -static void uncore_put_event_constraint(struct intel_uncore_box *box, struct perf_event *event) +static void 
uncore_put_event_constraint(struct intel_uncore_box *box, + struct perf_event *event) { if (box->pmu->type->ops->put_constraint) box->pmu->type->ops->put_constraint(box, event); @@ -592,7 +596,7 @@ static void uncore_pmu_event_del(struct if (event == box->event_list[i]) { uncore_put_event_constraint(box, event); - while (++i < box->n_events) + for (++i; i < box->n_events; i++) box->event_list[i - 1] = box->event_list[i]; --box->n_events; @@ -801,10 +805,8 @@ static void __init uncore_type_exit(stru static void __init uncore_types_exit(struct intel_uncore_type **types) { - int i; - - for (i = 0; types[i]; i++) - uncore_type_exit(types[i]); + for (; *types; types++) + uncore_type_exit(*types); } static int __init uncore_type_init(struct intel_uncore_type *type) @@ -908,9 +910,11 @@ static int uncore_pci_probe(struct pci_d * some device types. Hence PCI device idx would be 0 for all devices. * So increment pmu pointer to point to an unused array element. */ - if (boot_cpu_data.x86_model == 87) + if (boot_cpu_data.x86_model == 87) { while (pmu->func_id >= 0) pmu++; + } + if (pmu->func_id < 0) pmu->func_id = pdev->devfn; else @@ -1170,44 +1174,45 @@ static int uncore_cpu_prepare(int cpu, i return 0; } -static void -uncore_change_context(struct intel_uncore_type **uncores, int old_cpu, int new_cpu) +static void uncore_change_type_ctx(struct intel_uncore_type *type, int old_cpu, + int new_cpu) { - struct intel_uncore_type *type; - struct intel_uncore_pmu *pmu; + struct intel_uncore_pmu *pmu = type->pmus; struct intel_uncore_box *box; - int i, j; - - for (i = 0; uncores[i]; i++) { - type = uncores[i]; - for (j = 0; j < type->num_boxes; j++) { - pmu = &type->pmus[j]; - if (old_cpu < 0) - box = uncore_pmu_to_box(pmu, new_cpu); - else - box = uncore_pmu_to_box(pmu, old_cpu); - if (!box) - continue; + int i; - if (old_cpu < 0) { - WARN_ON_ONCE(box->cpu != -1); - box->cpu = new
[patch V3 06/28] x86/perf/intel/uncore: Drop pointless hotplug priority
The perf core doesn't do anything related to hardware pmus on hotplug. Drop the priority. Signed-off-by: Thomas Gleixner --- arch/x86/kernel/cpu/perf_event_intel_uncore.c |5 - 1 file changed, 5 deletions(-) --- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c +++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c @@ -1294,11 +1294,6 @@ static int uncore_cpu_notifier(struct no static struct notifier_block uncore_cpu_nb = { .notifier_call = uncore_cpu_notifier, - /* -* to migrate uncore events, our notifier should be executed -* before perf core's notifier. -*/ - .priority = CPU_PRI_PERF + 1, }; static int __init type_pmu_register(struct intel_uncore_type *type)
Re: [PATCH] intel-hid: allocate correct amount of memory for private struct
On Sun, Feb 21, 2016 at 03:22:27PM +0100, Wolfram Sang wrote: > We want the size of the struct, not of a pointer to it. To be future > proof, just dereference the pointer to get the desired type. > > Signed-off-by: Wolfram Sang Queued to testing. Thank you Wolfram. > --- > > Compile tested only. Found by static code analysis. > > drivers/platform/x86/intel-hid.c | 3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) > > diff --git a/drivers/platform/x86/intel-hid.c > b/drivers/platform/x86/intel-hid.c > index e20f23e04c24ce..f93abc8c1424ad 100644 > --- a/drivers/platform/x86/intel-hid.c > +++ b/drivers/platform/x86/intel-hid.c > @@ -180,8 +180,7 @@ static int intel_hid_probe(struct platform_device *device) > return -ENODEV; > } > > - priv = devm_kzalloc(&device->dev, > - sizeof(struct intel_hid_priv *), GFP_KERNEL); > + priv = devm_kzalloc(&device->dev, sizeof(*priv), GFP_KERNEL); > if (!priv) > return -ENOMEM; > dev_set_drvdata(&device->dev, priv); > -- > 2.7.0 > > -- Darren Hart Intel Open Source Technology Center
[patch V3 02/28] x86/perf/intel/uncore: Simplify error rollback
No point in doing partial rollbacks. Robustify uncore_exit_type() so it does not dereference type->pmus unconditionally and remove all the partial rollback hackery. Preparatory patch for proper error handling. Signed-off-by: Thomas Gleixner --- arch/x86/kernel/cpu/perf_event_intel_uncore.c | 45 +- 1 file changed, 24 insertions(+), 21 deletions(-) --- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c +++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c @@ -767,10 +767,12 @@ static void __init uncore_type_exit(stru { int i; - for (i = 0; i < type->num_boxes; i++) - free_percpu(type->pmus[i].box); - kfree(type->pmus); - type->pmus = NULL; + if (type->pmus) { + for (i = 0; i < type->num_boxes; i++) + free_percpu(type->pmus[i].box); + kfree(type->pmus); + type->pmus = NULL; + } kfree(type->events_group); type->events_group = NULL; } @@ -778,6 +780,7 @@ static void __init uncore_type_exit(stru static void __init uncore_types_exit(struct intel_uncore_type **types) { int i; + for (i = 0; types[i]; i++) uncore_type_exit(types[i]); } @@ -806,7 +809,7 @@ static int __init uncore_type_init(struc INIT_LIST_HEAD(&pmus[i].box_list); pmus[i].box = alloc_percpu(struct intel_uncore_box *); if (!pmus[i].box) - goto fail; + return -ENOMEM; } if (type->event_descs) { @@ -817,7 +820,7 @@ static int __init uncore_type_init(struc attr_group = kzalloc(sizeof(struct attribute *) * (i + 1) + sizeof(*attr_group), GFP_KERNEL); if (!attr_group) - goto fail; + return -ENOMEM; attrs = (struct attribute **)(attr_group + 1); attr_group->name = "events"; @@ -831,9 +834,6 @@ static int __init uncore_type_init(struc type->pmu_group = &uncore_pmu_attr_group; return 0; -fail: - uncore_type_exit(type); - return -ENOMEM; } static int __init uncore_types_init(struct intel_uncore_type **types) @@ -843,13 +843,9 @@ static int __init uncore_types_init(stru for (i = 0; types[i]; i++) { ret = uncore_type_init(types[i]); if (ret) - goto fail; + return ret; } return 0; -fail: - while (--i >= 0) - 
uncore_type_exit(types[i]); - return ret; } /* @@ -1007,17 +1003,21 @@ static int __init uncore_pci_init(void) ret = uncore_types_init(uncore_pci_uncores); if (ret) - return ret; + goto err; uncore_pci_driver->probe = uncore_pci_probe; uncore_pci_driver->remove = uncore_pci_remove; ret = pci_register_driver(uncore_pci_driver); - if (ret == 0) - pcidrv_registered = true; - else - uncore_types_exit(uncore_pci_uncores); + if (ret) + goto err; + + pcidrv_registered = true; + return 0; +err: + uncore_types_exit(uncore_pci_uncores); + uncore_pci_uncores = empty_uncore; return ret; } @@ -1316,9 +1316,12 @@ static int __init uncore_cpu_init(void) ret = uncore_types_init(uncore_msr_uncores); if (ret) - return ret; - + goto err; return 0; +err: + uncore_types_exit(uncore_msr_uncores); + uncore_msr_uncores = empty_uncore; + return ret; } static int __init uncore_pmus_register(void)
[patch V3 00/28] x86/perf/intel/uncore|rapl: Fix error handling and sanitize pmu management
This series addresses the following issues: - Add proper error handling to uncore and rapl drivers - Get rid of the pseudo per cpuness of these drivers and do a proper per package storage - Allow them to be modular In order to do proper per package storage I added a facility which sanity checks the physical package id of the processors which is supplied by bios and does a logical package id translation. That allows drivers to do allocations for the maximum number of possible packages independent of possible BIOS creativity. The module patches are optional. Andi pointed out that they miss the proper auto loading/probing machinery, but I kept them and let Peter decide what to do with them. Changes vs. V2: - Change export to GPL - Fix the leftover while loop - Fix the snb pmu_private bogosity - Explain in the changelogs why we want to keep pmu_private - Drop the proc/cpuinfo change - Add an explanation to the Kconfig help text why selecting 'm' might break existing setups. Delta patch below. Thanks, tglx --- --- a/arch/x86/Kconfig.perf +++ b/arch/x86/Kconfig.perf @@ -8,6 +8,9 @@ config PERF_EVENTS_INTEL_UNCORE Include support for Intel uncore performance events. These are available on NehalemEX and more modern processors. + Note: Selecting 'm' might break existing setups as the drivers + lack the autoprobe/load magic. If you need them select: y. + If unsure say y. config PERF_EVENTS_INTEL_RAPL @@ -18,6 +21,9 @@ config PERF_EVENTS_INTEL_RAPL Include support for Intel rapl performance events for power monitoring on modern processors. + Note: Selecting 'm' might break existing setups as the drivers + lack the autoprobe/load magic. If you need them select: y. + If unsure say y. 
endmenu --- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c +++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c @@ -774,9 +774,9 @@ static void __uncore_exit_boxes(struct i static void uncore_exit_boxes(void *dummy) { - struct intel_uncore_type **types = uncore_msr_uncores; + struct intel_uncore_type **types; - while (*types) + for (types = uncore_msr_uncores; *types; types++) __uncore_exit_boxes(*types++, smp_processor_id()); } --- a/arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c +++ b/arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c @@ -313,7 +313,7 @@ static int snb_uncore_imc_event_init(str return -EINVAL; event->cpu = box->cpu; - event->pmu_private = pmu; + event->pmu_private = box; event->hw.idx = -1; event->hw.last_tag = ~0ULL; --- a/arch/x86/kernel/cpu/proc.c +++ b/arch/x86/kernel/cpu/proc.c @@ -12,7 +12,6 @@ static void show_cpuinfo_core(struct seq { #ifdef CONFIG_SMP seq_printf(m, "physical id\t: %d\n", c->phys_proc_id); - seq_printf(m, "logical id\t: %d\n", c->logical_proc_id); seq_printf(m, "siblings\t: %d\n", cpumask_weight(topology_core_cpumask(cpu))); seq_printf(m, "core id\t\t: %d\n", c->cpu_core_id); --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -9335,7 +9335,7 @@ ssize_t perf_event_sysfs_show(struct dev return 0; } -EXPORT_SYMBOL(perf_event_sysfs_show); +EXPORT_SYMBOL_GPL(perf_event_sysfs_show); static int __init perf_event_sysfs_init(void) {
[patch V3 08/28] x86/perf/uncore: Make uncore_pcibus_to_physid static
No users outside of this file. Signed-off-by: Thomas Gleixner --- arch/x86/kernel/cpu/perf_event_intel_uncore.c |2 +- arch/x86/kernel/cpu/perf_event_intel_uncore.h |1 - 2 files changed, 1 insertion(+), 2 deletions(-) Index: b/arch/x86/kernel/cpu/perf_event_intel_uncore.c === --- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c +++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c @@ -21,7 +21,7 @@ static struct event_constraint uncore_co struct event_constraint uncore_constraint_empty = EVENT_CONSTRAINT(0, 0, 0); -int uncore_pcibus_to_physid(struct pci_bus *bus) +static int uncore_pcibus_to_physid(struct pci_bus *bus) { struct pci2phy_map *map; int phys_id = -1; Index: b/arch/x86/kernel/cpu/perf_event_intel_uncore.h === --- a/arch/x86/kernel/cpu/perf_event_intel_uncore.h +++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.h @@ -125,7 +125,6 @@ struct pci2phy_map { int pbus_to_physid[256]; }; -int uncore_pcibus_to_physid(struct pci_bus *bus); struct pci2phy_map *__find_pci2phy_map(int segment); ssize_t uncore_event_show(struct kobject *kobj,
[patch V3 11/28] x86/topology: Create logical package id
For per package oriented services we must be able to rely on the number of cpu packages to be within bounds. Create a tracking facility, which - calculates the number of possible packages depending on nr_cpu_ids after boot - makes sure that the package id is within the number of possible packages. If the apic id is outside we map it to a logical package id if there is enough space available. Provide interfaces for drivers to query the mapping and do translations from physical to logical ids. Signed-off-by: Thomas Gleixner --- arch/x86/include/asm/processor.h |2 arch/x86/include/asm/topology.h | 10 +++ arch/x86/kernel/apic/apic.c | 14 + arch/x86/kernel/cpu/common.c |2 arch/x86/kernel/cpu/intel.c | 13 + arch/x86/kernel/smpboot.c| 100 +++ 6 files changed, 141 insertions(+) --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -129,6 +129,8 @@ struct cpuinfo_x86 { u16 booted_cores; /* Physical processor id: */ u16 phys_proc_id; + /* Logical processor id: */ + u16 logical_proc_id; /* Core id: */ u16 cpu_core_id; /* Compute unit id */ --- a/arch/x86/include/asm/topology.h +++ b/arch/x86/include/asm/topology.h @@ -119,12 +119,22 @@ static inline void setup_node_to_cpumask extern const struct cpumask *cpu_coregroup_mask(int cpu); +#define topology_logical_package_id(cpu) (cpu_data(cpu).logical_proc_id) #define topology_physical_package_id(cpu) (cpu_data(cpu).phys_proc_id) #define topology_core_id(cpu) (cpu_data(cpu).cpu_core_id) #ifdef ENABLE_TOPO_DEFINES #define topology_core_cpumask(cpu) (per_cpu(cpu_core_map, cpu)) #define topology_sibling_cpumask(cpu) (per_cpu(cpu_sibling_map, cpu)) + +extern unsigned int __max_logical_packages; +#define topology_max_packages()(__max_logical_packages) +int topology_update_package_map(unsigned int apicid, unsigned int cpu); +extern int topology_phys_to_logical_pkg(unsigned int pkg); +#else +#define topology_max_packages()(1) +static inline int +topology_update_package_map(unsigned int apicid, unsigned int cpu)
{ return 0; } #endif static inline void arch_fix_phys_package_id(int num, u32 slot) --- a/arch/x86/kernel/apic/apic.c +++ b/arch/x86/kernel/apic/apic.c @@ -2078,6 +2078,20 @@ int generic_processor_info(int apicid, i cpu = cpumask_next_zero(-1, cpu_present_mask); /* +* This can happen on physical hotplug. The sanity check at boot time +* is done from native_smp_prepare_cpus() after num_possible_cpus() is +* established. +*/ + if (topology_update_package_map(apicid, cpu) < 0) { + int thiscpu = max + disabled_cpus; + + pr_warning("ACPI: Package limit reached. Processor %d/0x%x ignored.\n", + thiscpu, apicid); + disabled_cpus++; + return -ENOSPC; + } + + /* * Validate version */ if (version == 0x0) { --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -977,6 +977,8 @@ static void identify_cpu(struct cpuinfo_ #ifdef CONFIG_NUMA numa_add_cpu(smp_processor_id()); #endif + /* The boot/hotplug time assigment got cleared, restore it */ + c->logical_proc_id = topology_phys_to_logical_pkg(c->phys_proc_id); } /* --- a/arch/x86/kernel/cpu/intel.c +++ b/arch/x86/kernel/cpu/intel.c @@ -160,6 +160,19 @@ static void early_init_intel(struct cpui pr_info("Disabling PGE capability bit\n"); setup_clear_cpu_cap(X86_FEATURE_PGE); } + + if (c->cpuid_level >= 0x0001) { + u32 eax, ebx, ecx, edx; + + cpuid(0x0001, &eax, &ebx, &ecx, &edx); + /* +* If HTT (EDX[28]) is set EBX[16:23] contain the number of +* apicids which are reserved per package. Store the resulting +* shift value for the package management code. +*/ + if (edx & (1U << 28)) + c->x86_coreid_bits = get_count_order((ebx >> 16) & 0xff); + } } #ifdef CONFIG_X86_32 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -97,6 +97,14 @@ DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t DEFINE_PER_CPU_READ_MOSTLY(struct cpuinfo_x86, cpu_info); EXPORT_PER_CPU_SYMBOL(cpu_info); +/* Logical package management. 
We might want to allocate that dynamically */ +static int *physical_to_logical_pkg __read_mostly; +static unsigned long *physical_package_map __read_mostly; +static unsigned long *logical_package_map __read_mostly; +static unsigned int max_physical_pkg_id __read_mostly; +unsigned int __max_logical_packages __read_mostly; +EXP
[patch V3 03/28] x86/perf/intel/uncore: Fix error handling
This driver lacks any form of proper error handling. If initialization fails or hotplug prepare fails, it lets the facility with half initialized stuff around. Fix the state and memory leaks in a first step. As a second step we need to undo the hardware state which is set via uncore_box_init() on some of the uncore implementations. Signed-off-by: Thomas Gleixner --- arch/x86/kernel/cpu/perf_event_intel_uncore.c | 122 ++ arch/x86/kernel/cpu/perf_event_intel_uncore.h | 15 +-- 2 files changed, 94 insertions(+), 43 deletions(-) Index: b/arch/x86/kernel/cpu/perf_event_intel_uncore.c === --- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c +++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c @@ -38,6 +38,16 @@ int uncore_pcibus_to_physid(struct pci_b return phys_id; } +static void uncore_free_pcibus_map(void) +{ + struct pci2phy_map *map, *tmp; + + list_for_each_entry_safe(map, tmp, &pci2phy_map_head, list) { + list_del(&map->list); + kfree(map); + } +} + struct pci2phy_map *__find_pci2phy_map(int segment) { struct pci2phy_map *map, *alloc = NULL; @@ -760,16 +770,28 @@ static int uncore_pmu_register(struct in } ret = perf_pmu_register(&pmu->pmu, pmu->name, -1); + if (!ret) + pmu->registered = true; return ret; } +static void uncore_pmu_unregister(struct intel_uncore_pmu *pmu) +{ + if (!pmu->registered) + return; + perf_pmu_unregister(&pmu->pmu); + pmu->registered = false; +} + static void __init uncore_type_exit(struct intel_uncore_type *type) { int i; if (type->pmus) { - for (i = 0; i < type->num_boxes; i++) + for (i = 0; i < type->num_boxes; i++) { + uncore_pmu_unregister(&type->pmus[i]); free_percpu(type->pmus[i].box); + } kfree(type->pmus); type->pmus = NULL; } @@ -856,8 +878,8 @@ static int uncore_pci_probe(struct pci_d struct intel_uncore_pmu *pmu; struct intel_uncore_box *box; struct intel_uncore_type *type; - int phys_id; bool first_box = false; + int phys_id, ret; phys_id = uncore_pcibus_to_physid(pdev->bus); if (phys_id < 0) @@ -906,9 +928,18 @@ static int 
uncore_pci_probe(struct pci_d list_add_tail(&box->list, &pmu->box_list); raw_spin_unlock(&uncore_box_lock); - if (first_box) - uncore_pmu_register(pmu); - return 0; + if (!first_box) + return 0; + + ret = uncore_pmu_register(pmu); + if (ret) { + pci_set_drvdata(pdev, NULL); + raw_spin_lock(&uncore_box_lock); + list_del(&box->list); + raw_spin_unlock(&uncore_box_lock); + kfree(box); + } + return ret; } static void uncore_pci_remove(struct pci_dev *pdev) @@ -954,7 +985,7 @@ static void uncore_pci_remove(struct pci kfree(box); if (last_box) - perf_pmu_unregister(&pmu->pmu); + uncore_pmu_unregister(pmu); } static int __init uncore_pci_init(void) @@ -1018,6 +1049,7 @@ static int __init uncore_pci_init(void) err: uncore_types_exit(uncore_pci_uncores); uncore_pci_uncores = empty_uncore; + uncore_free_pcibus_map(); return ret; } @@ -1027,6 +1059,7 @@ static void __init uncore_pci_exit(void) pcidrv_registered = false; pci_unregister_driver(uncore_pci_driver); uncore_types_exit(uncore_pci_uncores); + uncore_free_pcibus_map(); } } @@ -1223,8 +1256,7 @@ static int uncore_cpu_notifier(struct no /* allocate/free data structure for uncore box */ switch (action & ~CPU_TASKS_FROZEN) { case CPU_UP_PREPARE: - uncore_cpu_prepare(cpu, -1); - break; + return notifier_from_errno(uncore_cpu_prepare(cpu, -1)); case CPU_STARTING: uncore_cpu_starting(cpu); break; @@ -1265,9 +1297,29 @@ static struct notifier_block uncore_cpu_ .priority = CPU_PRI_PERF + 1, }; -static void __init uncore_cpu_setup(void *dummy) +static int __init type_pmu_register(struct intel_uncore_type *type) { - uncore_cpu_starting(smp_processor_id()); + int i, ret; + + for (i = 0; i < type->num_boxes; i++) { + ret = uncore_pmu_register(&type->pmus[i]); + if (ret) + return ret; + } + return 0; +} + +static int __init uncore_msr_pmus_register(void) +{ + struct intel_uncore_type **types = uncore_msr_uncores; + int ret; + + while (*types) { + ret = type_pmu_register(*types++); + if (ret) +
[patch V3 10/28] x86/perf/intel_uncore: Store box in event->pmu_private
Store the box in event->pmu_private, so we can get rid of the per cpu data storage. We keep it after converting to per package data, because a cpu to package lookup will be 3 loads versus one and these usage sites are the perf fast paths. Signed-off-by: Thomas Gleixner --- arch/x86/kernel/cpu/perf_event_intel_uncore.c | 15 +-- arch/x86/kernel/cpu/perf_event_intel_uncore.h | 12 ++-- arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c |1 + 3 files changed, 12 insertions(+), 16 deletions(-) --- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c +++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c @@ -92,11 +92,6 @@ ssize_t uncore_event_show(struct kobject return sprintf(buf, "%s", event->config); } -struct intel_uncore_pmu *uncore_event_to_pmu(struct perf_event *event) -{ - return container_of(event->pmu, struct intel_uncore_pmu, pmu); -} - struct intel_uncore_box *uncore_pmu_to_box(struct intel_uncore_pmu *pmu, int cpu) { struct intel_uncore_box *box; @@ -122,15 +117,6 @@ struct intel_uncore_box *uncore_pmu_to_b return *per_cpu_ptr(pmu->box, cpu); } -struct intel_uncore_box *uncore_event_to_box(struct perf_event *event) -{ - /* -* perf core schedules event on the basis of cpu, uncore events are -* collected by one of the cpus inside a physical package. 
-*/ - return uncore_pmu_to_box(uncore_event_to_pmu(event), smp_processor_id()); -} - u64 uncore_msr_read_counter(struct intel_uncore_box *box, struct perf_event *event) { u64 count; @@ -690,6 +676,7 @@ static int uncore_pmu_event_init(struct if (!box || box->cpu < 0) return -EINVAL; event->cpu = box->cpu; + event->pmu_private = box; event->hw.idx = -1; event->hw.last_tag = ~0ULL; --- a/arch/x86/kernel/cpu/perf_event_intel_uncore.h +++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.h @@ -319,9 +319,17 @@ static inline bool uncore_box_is_fake(st return (box->phys_id < 0); } -struct intel_uncore_pmu *uncore_event_to_pmu(struct perf_event *event); +static inline struct intel_uncore_pmu *uncore_event_to_pmu(struct perf_event *event) +{ + return container_of(event->pmu, struct intel_uncore_pmu, pmu); +} + +static inline struct intel_uncore_box *uncore_event_to_box(struct perf_event *event) +{ + return event->pmu_private; +} + struct intel_uncore_box *uncore_pmu_to_box(struct intel_uncore_pmu *pmu, int cpu); -struct intel_uncore_box *uncore_event_to_box(struct perf_event *event); u64 uncore_msr_read_counter(struct intel_uncore_box *box, struct perf_event *event); void uncore_pmu_start_hrtimer(struct intel_uncore_box *box); void uncore_pmu_cancel_hrtimer(struct intel_uncore_box *box); --- a/arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c +++ b/arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c @@ -313,6 +313,7 @@ static int snb_uncore_imc_event_init(str return -EINVAL; event->cpu = box->cpu; + event->pmu_private = box; event->hw.idx = -1; event->hw.last_tag = ~0ULL;
[patch V3 14/28] x86/perf/intel_uncore: Make PCI and MSR uncore independent
Andi wanted to do this before, but the patch fell down the cracks. Implement it with the proper error handling. Requested-by: Andi Kleen Signed-off-by: Thomas Gleixner --- arch/x86/kernel/cpu/perf_event_intel_uncore.c | 34 +- 1 file changed, 18 insertions(+), 16 deletions(-) Index: b/arch/x86/kernel/cpu/perf_event_intel_uncore.c === --- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c +++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c @@ -1023,7 +1023,7 @@ static int __init uncore_pci_init(void) ret = skl_uncore_pci_init(); break; default: - return 0; + return -ENODEV; } if (ret) @@ -1319,7 +1319,7 @@ static int __init uncore_cpu_init(void) knl_uncore_cpu_init(); break; default: - return 0; + return -ENODEV; } ret = uncore_types_init(uncore_msr_uncores, true); @@ -1344,7 +1344,7 @@ static void __init uncore_cpu_setup(void /* Lazy to avoid allocation of a few bytes for the normal case */ static __initdata DECLARE_BITMAP(packages, MAX_LOCAL_APIC); -static int __init uncore_cpumask_init(void) +static int __init uncore_cpumask_init(bool msr) { unsigned int cpu; @@ -1355,12 +1355,15 @@ static int __init uncore_cpumask_init(vo if (test_and_set_bit(pkg, packages)) continue; /* -* The first online cpu of each package takes the refcounts -* for all other online cpus in that package. +* The first online cpu of each package allocates and takes +* the refcounts for all other online cpus in that package. +* If msrs are not enabled no allocation is required. 
*/ - ret = uncore_cpu_prepare(cpu); - if (ret) - return ret; + if (msr) { + ret = uncore_cpu_prepare(cpu); + if (ret) + return ret; + } uncore_event_init_cpu(cpu); smp_call_function_single(cpu, uncore_cpu_setup, NULL, 1); } @@ -1370,7 +1373,7 @@ static int __init uncore_cpumask_init(vo static int __init intel_uncore_init(void) { - int ret; + int pret, cret, ret; if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL) return -ENODEV; @@ -1380,15 +1383,14 @@ static int __init intel_uncore_init(void max_packages = topology_max_packages(); - ret = uncore_pci_init(); - if (ret) - return ret; - ret = uncore_cpu_init(); - if (ret) - goto err; + pret = uncore_pci_init(); + cret = uncore_cpu_init(); + + if (cret && pret) + return -ENODEV; cpu_notifier_register_begin(); - ret = uncore_cpumask_init(); + ret = uncore_cpumask_init(!cret); if (ret) goto err; cpu_notifier_register_done();
[patch V3 09/28] perf: Allow storage of pmu private data in event
For pmus which are not per cpu, but e.g. per package/socket, we want to be able to store a reference to the underlying per package/socket facility in the event at init time so we can avoid magic storage constructs in the pmu driver. This allows us to get rid of the per cpu dance in the intel uncore and rapl drivers and avoids a lookup of the per package data in the perf hotpath. Signed-off-by: Thomas Gleixner --- include/linux/perf_event.h |1 + 1 file changed, 1 insertion(+) --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -467,6 +467,7 @@ struct perf_event { int group_flags; struct perf_event *group_leader; struct pmu *pmu; + void *pmu_private; enum perf_event_active_state state; unsigned int attach_state;
[patch V3 15/28] cpumask: Export cpumask_any_but
Almost every cpumask function is exported, just not the one I need to make the intel uncore driver modular. Signed-off-by: Thomas Gleixner --- lib/cpumask.c |1 + 1 file changed, 1 insertion(+) Index: b/lib/cpumask.c === --- a/lib/cpumask.c +++ b/lib/cpumask.c @@ -41,6 +41,7 @@ int cpumask_any_but(const struct cpumask break; return i; } +EXPORT_SYMBOL(cpumask_any_but); /* These are not inline because of header tangles. */ #ifdef CONFIG_CPUMASK_OFFSTACK
[patch V3 12/28] x86/perf/uncore: Track packages not per cpu data
uncore is a per package facility, but the code tries to mimick a per cpu facility with completely convoluted constructs. Simplify the whole machinery by tracking per package information. While at it, avoid the kfree/alloc dance when a cpu goes offline and online again. There is no point in freeing the box after it was allocated. We just keep proper refcounting and the first cpu which comes online in a package does the initialization/activation of the box. Signed-off-by: Thomas Gleixner --- arch/x86/kernel/cpu/perf_event_intel_uncore.c | 420 arch/x86/kernel/cpu/perf_event_intel_uncore.h | 18 arch/x86/kernel/cpu/perf_event_intel_uncore_snbep.c | 10 3 files changed, 197 insertions(+), 251 deletions(-) --- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c +++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c @@ -9,9 +9,9 @@ struct pci_driver *uncore_pci_driver; /* pci bus to socket mapping */ DEFINE_RAW_SPINLOCK(pci2phy_map_lock); struct list_head pci2phy_map_head = LIST_HEAD_INIT(pci2phy_map_head); -struct pci_dev *uncore_extra_pci_dev[UNCORE_SOCKET_MAX][UNCORE_EXTRA_PCI_DEV_MAX]; +struct pci_extra_dev *uncore_extra_pci_dev; +static int max_packages; -static DEFINE_RAW_SPINLOCK(uncore_box_lock); /* mask of cpus that collect uncore events */ static cpumask_t uncore_cpu_mask; @@ -94,27 +94,7 @@ ssize_t uncore_event_show(struct kobject struct intel_uncore_box *uncore_pmu_to_box(struct intel_uncore_pmu *pmu, int cpu) { - struct intel_uncore_box *box; - - box = *per_cpu_ptr(pmu->box, cpu); - if (box) - return box; - - raw_spin_lock(&uncore_box_lock); - /* Recheck in lock to handle races. 
*/ - if (*per_cpu_ptr(pmu->box, cpu)) - goto out; - list_for_each_entry(box, &pmu->box_list, list) { - if (box->phys_id == topology_physical_package_id(cpu)) { - atomic_inc(&box->refcnt); - *per_cpu_ptr(pmu->box, cpu) = box; - break; - } - } -out: - raw_spin_unlock(&uncore_box_lock); - - return *per_cpu_ptr(pmu->box, cpu); + return pmu->boxes[topology_logical_package_id(cpu)]; } u64 uncore_msr_read_counter(struct intel_uncore_box *box, struct perf_event *event) @@ -315,9 +295,9 @@ static struct intel_uncore_box *uncore_a raw_spin_lock_init(&box->shared_regs[i].lock); uncore_pmu_init_hrtimer(box); - atomic_set(&box->refcnt, 1); box->cpu = -1; - box->phys_id = -1; + box->pci_phys_id = -1; + box->pkgid = -1; /* set default hrtimer timeout */ box->hrtimer_duration = UNCORE_PMU_HRTIMER_INTERVAL; @@ -774,14 +754,24 @@ static void uncore_pmu_unregister(struct pmu->registered = false; } +static void uncore_free_boxes(struct intel_uncore_pmu *pmu) +{ + int pkg; + + for (pkg = 0; pkg < max_packages; pkg++) + kfree(pmu->boxes[pkg]); + kfree(pmu->boxes); +} + static void __init uncore_type_exit(struct intel_uncore_type *type) { + struct intel_uncore_pmu *pmu = type->pmus; int i; - if (type->pmus) { - for (i = 0; i < type->num_boxes; i++) { - uncore_pmu_unregister(&type->pmus[i]); - free_percpu(type->pmus[i].box); + if (pmu) { + for (i = 0; i < type->num_boxes; i++, pmu++) { + uncore_pmu_unregister(pmu); + uncore_free_boxes(pmu); } kfree(type->pmus); type->pmus = NULL; @@ -796,37 +786,36 @@ static void __init uncore_types_exit(str uncore_type_exit(*types); } -static int __init uncore_type_init(struct intel_uncore_type *type) +static int __init uncore_type_init(struct intel_uncore_type *type, bool setid) { struct intel_uncore_pmu *pmus; struct attribute_group *attr_group; struct attribute **attrs; + size_t size; int i, j; pmus = kzalloc(sizeof(*pmus) * type->num_boxes, GFP_KERNEL); if (!pmus) return -ENOMEM; - type->pmus = pmus; - - type->unconstrainted = (struct 
event_constraint) - __EVENT_CONSTRAINT(0, (1ULL << type->num_counters) - 1, - 0, type->num_counters, 0, 0); + size = max_packages * sizeof(struct intel_uncore_box *); for (i = 0; i < type->num_boxes; i++) { - pmus[i].func_id = -1; - pmus[i].pmu_idx = i; - pmus[i].type = type; - INIT_LIST_HEAD(&pmus[i].box_list); - pmus[i].box = alloc_percpu(struct intel_uncore_box *); - if (!pmus[i].box) + pmus[i].func_id = setid ? i : -1; + pmus[i].pmu_idx = i; + pmus[i].type= type; + pmus[i].boxes = kzalloc(size, GFP_KERNEL); +
[patch V3 16/28] x86/perf/intel_uncore: Make it modular
Now that we have a proper cleanup all over the place, it's simple to make this a modular driver. Signed-off-by: Thomas Gleixner --- arch/x86/Kconfig |6 ++ arch/x86/Kconfig.perf | 16 arch/x86/kernel/cpu/Makefile |3 ++- arch/x86/kernel/cpu/perf_event_intel_uncore.c | 24 ++-- 4 files changed, 38 insertions(+), 11 deletions(-) --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -160,10 +160,6 @@ config INSTRUCTION_DECODER def_bool y depends on KPROBES || PERF_EVENTS || UPROBES -config PERF_EVENTS_INTEL_UNCORE - def_bool y - depends on PERF_EVENTS && CPU_SUP_INTEL && PCI - config OUTPUT_FORMAT string default "elf32-i386" if X86_32 @@ -1039,6 +1035,8 @@ config X86_THERMAL_VECTOR def_bool y depends on X86_MCE_INTEL +source "arch/x86/Kconfig.perf" + config X86_LEGACY_VM86 bool "Legacy VM86 support" default n --- /dev/null +++ b/arch/x86/Kconfig.perf @@ -0,0 +1,16 @@ +menu "Performance monitoring" + +config PERF_EVENTS_INTEL_UNCORE + tristate "Intel uncore performance events" + depends on PERF_EVENTS && CPU_SUP_INTEL && PCI + default y + ---help--- + Include support for Intel uncore performance events. These are + available on NehalemEX and more modern processors. + + Note: Selecting 'm' might break existing setups as the drivers + lack the autoprobe/load magic. If you need them select: y. + + If unsure say y. 
+ +endmenu --- a/arch/x86/kernel/cpu/Makefile +++ b/arch/x86/kernel/cpu/Makefile @@ -43,7 +43,8 @@ obj-$(CONFIG_CPU_SUP_INTEL) += perf_eve obj-$(CONFIG_CPU_SUP_INTEL)+= perf_event_intel_pt.o perf_event_intel_bts.o obj-$(CONFIG_CPU_SUP_INTEL)+= perf_event_intel_cstate.o -obj-$(CONFIG_PERF_EVENTS_INTEL_UNCORE) += perf_event_intel_uncore.o \ +obj-$(CONFIG_PERF_EVENTS_INTEL_UNCORE) += perf_event_intel_uncores.o +perf_event_intel_uncores-objs := perf_event_intel_uncore.o \ perf_event_intel_uncore_snb.o \ perf_event_intel_uncore_snbep.o \ perf_event_intel_uncore_nhmex.o --- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c +++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c @@ -21,6 +21,8 @@ static struct event_constraint uncore_co struct event_constraint uncore_constraint_empty = EVENT_CONSTRAINT(0, 0, 0); +MODULE_LICENSE("GPL"); + static int uncore_pcibus_to_physid(struct pci_bus *bus) { struct pci2phy_map *map; @@ -754,7 +756,7 @@ static void uncore_pmu_unregister(struct pmu->registered = false; } -static void __init __uncore_exit_boxes(struct intel_uncore_type *type, int cpu) +static void __uncore_exit_boxes(struct intel_uncore_type *type, int cpu) { struct intel_uncore_pmu *pmu = type->pmus; struct intel_uncore_box *box; @@ -770,7 +772,7 @@ static void __init __uncore_exit_boxes(s } } -static void __init uncore_exit_boxes(void *dummy) +static void uncore_exit_boxes(void *dummy) { struct intel_uncore_type **types; @@ -787,7 +789,7 @@ static void uncore_free_boxes(struct int kfree(pmu->boxes); } -static void __init uncore_type_exit(struct intel_uncore_type *type) +static void uncore_type_exit(struct intel_uncore_type *type) { struct intel_uncore_pmu *pmu = type->pmus; int i; @@ -804,7 +806,7 @@ static void __init uncore_type_exit(stru type->events_group = NULL; } -static void __init uncore_types_exit(struct intel_uncore_type **types) +static void uncore_types_exit(struct intel_uncore_type **types) { for (; *types; types++) uncore_type_exit(*types); @@ -1060,7 +1062,7 @@ 
static int __init uncore_pci_init(void) return ret; } -static void __init uncore_pci_exit(void) +static void uncore_pci_exit(void) { if (pcidrv_registered) { pcidrv_registered = false; @@ -1404,4 +1406,14 @@ static int __init intel_uncore_init(void cpu_notifier_register_done(); return ret; } -device_initcall(intel_uncore_init); +module_init(intel_uncore_init); + +static void __exit intel_uncore_exit(void) +{ + cpu_notifier_register_done(); + __unregister_cpu_notifier(&uncore_cpu_nb); + uncore_types_exit(uncore_msr_uncores); + uncore_pci_exit(); + cpu_notifier_register_done(); +} +module_exit(intel_uncore_exit);
[patch V3 17/28] x86/perf/cqm: Get rid of the silly for_each_cpu lookups
CQM is a strict per package facility. Use the proper cpumasks to look up
the readers.

Signed-off-by: Thomas Gleixner
---
 arch/x86/kernel/cpu/perf_event_intel_cqm.c | 34 ++---
 1 file changed, 12 insertions(+), 22 deletions(-)

Index: b/arch/x86/kernel/cpu/perf_event_intel_cqm.c
===================================================================
--- a/arch/x86/kernel/cpu/perf_event_intel_cqm.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_cqm.c
@@ -1244,15 +1244,12 @@ static struct pmu intel_cqm_pmu = {

 static inline void cqm_pick_event_reader(int cpu)
 {
-	int phys_id = topology_physical_package_id(cpu);
-	int i;
+	int reader;

-	for_each_cpu(i, &cqm_cpumask) {
-		if (phys_id == topology_physical_package_id(i))
-			return; /* already got reader for this socket */
-	}
-
-	cpumask_set_cpu(cpu, &cqm_cpumask);
+	/* First online cpu in package becomes the reader */
+	reader = cpumask_any_and(&cqm_cpumask, topology_core_cpumask(cpu));
+	if (reader >= nr_cpu_ids)
+		cpumask_set_cpu(cpu, &cqm_cpumask);
 }

 static void intel_cqm_cpu_starting(unsigned int cpu)
@@ -1270,24 +1267,17 @@ static void intel_cqm_cpu_starting(unsig

 static void intel_cqm_cpu_exit(unsigned int cpu)
 {
-	int phys_id = topology_physical_package_id(cpu);
-	int i;
+	int target;

-	/*
-	 * Is @cpu a designated cqm reader?
-	 */
+	/* Is @cpu the current cqm reader for this package ? */
	if (!cpumask_test_and_clear_cpu(cpu, &cqm_cpumask))
		return;

-	for_each_online_cpu(i) {
-		if (i == cpu)
-			continue;
-
-		if (phys_id == topology_physical_package_id(i)) {
-			cpumask_set_cpu(i, &cqm_cpumask);
-			break;
-		}
-	}
+	/* Find another online reader in this package */
+	target = cpumask_any_but(topology_core_cpumask(cpu), cpu);
+
+	if (target < nr_cpu_ids)
+		cpumask_set_cpu(target, &cqm_cpumask);
 }

 static int intel_cqm_cpu_notifier(struct notifier_block *nb,
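For illustration, the per-package reader selection above can be sketched in plain userspace C, modelling a cpumask of up to 64 CPUs as a `uint64_t`. The names `pick_reader()`, `drop_reader()` and `any_and()` are made-up stand-ins for `cqm_pick_event_reader()`, `intel_cqm_cpu_exit()` and `cpumask_any_and()`; this is a sketch of the logic, not the kernel code:

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPU_IDS 64

/* cpumask_any_and() analogue: lowest set bit of (a & b), NR_CPU_IDS if none */
static int any_and(uint64_t a, uint64_t b)
{
	uint64_t m = a & b;

	for (int i = 0; i < NR_CPU_IDS; i++)
		if (m & (1ULL << i))
			return i;
	return NR_CPU_IDS;
}

/* First cpu coming up in a package becomes the reader */
void pick_reader(uint64_t *readers, uint64_t pkg_mask, int cpu)
{
	if (any_and(*readers, pkg_mask) >= NR_CPU_IDS)
		*readers |= 1ULL << cpu;
}

/* On hotplug exit, hand the reader role to another cpu in the package */
void drop_reader(uint64_t *readers, uint64_t pkg_mask, int cpu)
{
	if (!(*readers & (1ULL << cpu)))
		return;			/* @cpu was not the reader */
	*readers &= ~(1ULL << cpu);

	/* cpumask_any_but() analogue: any package sibling except @cpu */
	int target = any_and(pkg_mask & ~(1ULL << cpu), ~0ULL);

	if (target < NR_CPU_IDS)
		*readers |= 1ULL << target;
}
```

Here `pkg_mask` plays the role of `topology_core_cpumask(cpu)` restricted to online CPUs, which is why no separate online check is needed.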
[patch V3 20/28] x86/perf/intel/rapl: Sanitize the quirk handling
There is no point in having a quirk machinery for a single possible
function. Get rid of it and move the quirk to a place where it actually
makes sense.

Signed-off-by: Thomas Gleixner
---
 arch/x86/kernel/cpu/perf_event_intel_rapl.c | 49 ++--
 1 file changed, 19 insertions(+), 30 deletions(-)

Index: b/arch/x86/kernel/cpu/perf_event_intel_rapl.c
===================================================================
--- a/arch/x86/kernel/cpu/perf_event_intel_rapl.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_rapl.c
@@ -133,7 +133,6 @@ static int rapl_cntr_mask;
 static DEFINE_PER_CPU(struct rapl_pmu *, rapl_pmu);
 static DEFINE_PER_CPU(struct rapl_pmu *, rapl_pmu_to_free);

-static struct x86_pmu_quirk *rapl_quirks;
 static inline u64 rapl_read_counter(struct perf_event *event)
 {
	u64 raw;
@@ -141,15 +140,6 @@ static inline u64 rapl_read_counter(stru
	return raw;
 }

-#define rapl_add_quirk(func_)						\
-do {									\
-	static struct x86_pmu_quirk __quirk __initdata = {		\
-		.func = func_,						\
-	};								\
-	__quirk.next = rapl_quirks;					\
-	rapl_quirks = &__quirk;						\
-} while (0)
-
 static inline u64 rapl_scale(u64 v, int cfg)
 {
	if (cfg > NR_RAPL_DOMAINS) {
@@ -564,17 +554,6 @@ static void rapl_cpu_init(int cpu)
	cpumask_set_cpu(cpu, &rapl_cpu_mask);
 }

-static __init void rapl_hsw_server_quirk(void)
-{
-	/*
-	 * DRAM domain on HSW server has fixed energy unit which can be
-	 * different than the unit from power unit MSR.
-	 * "Intel Xeon Processor E5-1600 and E5-2600 v3 Product Families, V2
-	 * of 2. Datasheet, September 2014, Reference Number: 330784-001 "
-	 */
-	rapl_hw_unit[RAPL_IDX_RAM_NRG_STAT] = 16;
-}
-
 static int rapl_cpu_prepare(int cpu)
 {
	struct rapl_pmu *pmu = per_cpu(rapl_pmu, cpu);
@@ -672,7 +651,18 @@ static int rapl_cpu_notifier(struct noti
	return NOTIFY_OK;
 }

-static int rapl_check_hw_unit(void)
+static __init void rapl_hsw_server_quirk(void)
+{
+	/*
+	 * DRAM domain on HSW server has fixed energy unit which can be
+	 * different than the unit from power unit MSR.
+	 * "Intel Xeon Processor E5-1600 and E5-2600 v3 Product Families, V2
+	 * of 2.
Datasheet, September 2014, Reference Number: 330784-001 "
+	 */
+	rapl_hw_unit[RAPL_IDX_RAM_NRG_STAT] = 16;
+}
+
+static int rapl_check_hw_unit(void (*quirk)(void))
 {
	u64 msr_rapl_power_unit_bits;
	int i;
@@ -683,6 +673,9 @@ static int rapl_check_hw_unit(void)
	for (i = 0; i < NR_RAPL_DOMAINS; i++)
		rapl_hw_unit[i] = (msr_rapl_power_unit_bits >> 8) & 0x1FULL;

+	/* Apply cpu model quirk */
+	if (quirk)
+		quirk();
	return 0;
 }

@@ -701,9 +694,9 @@ static const struct x86_cpu_id rapl_cpu_

 static int __init rapl_pmu_init(void)
 {
+	void (*quirk)(void) = NULL;
	struct rapl_pmu *pmu;
	int cpu, ret;
-	struct x86_pmu_quirk *quirk;
	int i;

	/*
@@ -720,7 +713,7 @@ static int __init rapl_pmu_init(void)
		rapl_pmu_events_group.attrs = rapl_events_cln_attr;
		break;
	case 63: /* Haswell-Server */
-		rapl_add_quirk(rapl_hsw_server_quirk);
+		quirk = rapl_hsw_server_quirk;
		rapl_cntr_mask = RAPL_IDX_SRV;
		rapl_pmu_events_group.attrs = rapl_events_srv_attr;
		break;
@@ -736,7 +729,7 @@ static int __init rapl_pmu_init(void)
		rapl_pmu_events_group.attrs = rapl_events_srv_attr;
		break;
	case 87: /* Knights Landing */
-		rapl_add_quirk(rapl_hsw_server_quirk);
+		quirk = rapl_hsw_server_quirk;
		rapl_cntr_mask = RAPL_IDX_KNL;
		rapl_pmu_events_group.attrs = rapl_events_knl_attr;
		break;
@@ -745,14 +738,10 @@ static int __init rapl_pmu_init(void)
		return -ENODEV;
	}

-	ret = rapl_check_hw_unit();
+	ret = rapl_check_hw_unit(quirk);
	if (ret)
		return ret;

-	/* run cpu model quirks */
-	for (quirk = rapl_quirks; quirk; quirk = quirk->next)
-		quirk->func();
-
	cpu_notifier_register_begin();

	for_each_online_cpu(cpu) {
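The shape of the change can be illustrated with a small userspace sketch: instead of a registration macro building a linked list of quirk structs, the init code hands a single function pointer to the hardware-unit check. `check_hw_unit()`, `hsw_server_quirk()` and `hw_unit[]` are made-up stand-ins for the kernel's `rapl_check_hw_unit()`, `rapl_hsw_server_quirk()` and `rapl_hw_unit[]`:

```c
#include <assert.h>
#include <stdint.h>

int hw_unit[3];		/* energy unit exponents, one per toy domain */

/* Model-specific fixup: DRAM domain uses a fixed 2^-16 J energy unit */
void hsw_server_quirk(void)
{
	hw_unit[2] = 16;
}

/*
 * Read the (simulated) power unit MSR bits, then apply the cpu model
 * quirk, if the caller supplied one.
 */
int check_hw_unit(void (*quirk)(void), uint64_t unit_bits)
{
	for (int i = 0; i < 3; i++)
		hw_unit[i] = (unit_bits >> 8) & 0x1F;

	if (quirk)
		quirk();
	return 0;
}
```

The caller simply does `quirk = hsw_server_quirk;` in the model switch and passes it along, mirroring the patch.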
[patch V3 21/28] x86/perf/intel/rapl: Calculate timing once
No point in doing the same calculation over and over. Do it once in
rapl_check_hw_unit().

Signed-off-by: Thomas Gleixner
---
 arch/x86/kernel/cpu/perf_event_intel_rapl.c | 39
 1 file changed, 18 insertions(+), 21 deletions(-)

Index: b/arch/x86/kernel/cpu/perf_event_intel_rapl.c
===================================================================
--- a/arch/x86/kernel/cpu/perf_event_intel_rapl.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_rapl.c
@@ -129,6 +129,7 @@ static int rapl_hw_unit[NR_RAPL_DOMAINS]
 static struct pmu rapl_pmu_class;
 static cpumask_t rapl_cpu_mask;
 static int rapl_cntr_mask;
+static u64 rapl_timer_ms;

 static DEFINE_PER_CPU(struct rapl_pmu *, rapl_pmu);
 static DEFINE_PER_CPU(struct rapl_pmu *, rapl_pmu_to_free);
@@ -558,7 +559,6 @@ static int rapl_cpu_prepare(int cpu)
 {
	struct rapl_pmu *pmu = per_cpu(rapl_pmu, cpu);
	int phys_id = topology_physical_package_id(cpu);
-	u64 ms;

	if (pmu)
		return 0;
@@ -575,19 +575,7 @@ static int rapl_cpu_prepare(int cpu)

	pmu->pmu = &rapl_pmu_class;

-	/*
-	 * use reference of 200W for scaling the timeout
-	 * to avoid missing counter overflows.
-	 * 200W = 200 Joules/sec
-	 * divide interval by 2 to avoid lockstep (2 * 100)
-	 * if hw unit is 32, then we use 2 ms 1/200/2
-	 */
-	if (rapl_hw_unit[0] < 32)
-		ms = (1000 / (2 * 100)) * (1ULL << (32 - rapl_hw_unit[0] - 1));
-	else
-		ms = 2;
-
-	pmu->timer_interval = ms_to_ktime(ms);
+	pmu->timer_interval = ms_to_ktime(rapl_timer_ms);

	rapl_hrtimer_init(pmu);

@@ -676,6 +664,19 @@ static int rapl_check_hw_unit(void (*qui
	/* Apply cpu model quirk */
	if (quirk)
		quirk();
+
+	/*
+	 * Calculate the timer rate:
+	 * Use reference of 200W for scaling the timeout to avoid counter
+	 * overflows.
200W = 200 Joules/sec
+	 * Divide interval by 2 to avoid lockstep (2 * 100)
+	 * if hw unit is 32, then we use 2 ms 1/200/2
+	 */
+	rapl_timer_ms = 2;
+	if (rapl_hw_unit[0] < 32) {
+		rapl_timer_ms = (1000 / (2 * 100));
+		rapl_timer_ms *= (1ULL << (32 - rapl_hw_unit[0] - 1));
+	}
	return 0;
 }

@@ -695,9 +696,7 @@ static const struct x86_cpu_id rapl_cpu_

 static int __init rapl_pmu_init(void)
 {
	void (*quirk)(void) = NULL;
-	struct rapl_pmu *pmu;
-	int cpu, ret;
-	int i;
+	int cpu, ret, i;

	/*
	 * check for Intel processor family 6
@@ -758,15 +757,14 @@ static int __init rapl_pmu_init(void)
	}

	__perf_cpu_notifier(rapl_cpu_notifier);
-
-	pmu = __this_cpu_read(rapl_pmu);
+	cpu_notifier_register_done();

	pr_info("RAPL PMU detected,"
		" API unit is 2^-32 Joules,"
		" %d fixed counters"
		" %llu ms ovfl timer\n",
		hweight32(rapl_cntr_mask),
-		ktime_to_ms(pmu->timer_interval));
+		rapl_timer_ms);
	for (i = 0; i < NR_RAPL_DOMAINS; i++) {
		if (rapl_cntr_mask & (1 << i)) {
			pr_info("hw unit of domain %s 2^-%d Joules\n",
@@ -774,7 +772,6 @@ static int __init rapl_pmu_init(void)
		}
	}

-	cpu_notifier_register_done();
	return 0;
 out:
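Since the timeout now depends only on `rapl_hw_unit[0]`, the calculation can be checked in isolation. A standalone sketch (`rapl_timer_interval_ms()` is a made-up name, `hw_unit` is the energy-unit exponent from the power unit MSR): at a 200W reference load a counter with a 2^-hw_unit J unit wraps its 32 bits after 2^32 / 2^hw_unit / 200 seconds, and halving that avoids sampling in lockstep with the overflow:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Compute the overflow-polling interval in ms for a given energy unit
 * exponent, mirroring the calculation moved into rapl_check_hw_unit().
 */
uint64_t rapl_timer_interval_ms(int hw_unit)
{
	uint64_t ms = 2;	/* floor used for hw_unit >= 32 */

	if (hw_unit < 32) {
		/* 1000 ms/s / 200 J/s / 2 (anti-lockstep) = 5 ms per Joule-bit */
		ms = 1000 / (2 * 100);
		ms *= 1ULL << (32 - hw_unit - 1);
	}
	return ms;
}
```

For the common `hw_unit = 16` this yields 5 * 2^15 = 163840 ms, i.e. roughly every 164 seconds, which matches the per-cpu value the old code recomputed on every prepare.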
[patch V3 18/28] x86/perf/intel_rapl: Make Knights Landings support functional
The Knights Landing support added the events and the detection case, but
due to the missing 'break' the switch statement falls through into the
default case and returns 0 without actually initializing the driver.

Fixes: 3a2a7797326a4 "perf/x86/intel/rapl: Add support for Knights Landing (KNL)"
Signed-off-by: Thomas Gleixner
Cc: Dasaratharaman Chandramouli
---
 arch/x86/kernel/cpu/perf_event_intel_rapl.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: b/arch/x86/kernel/cpu/perf_event_intel_rapl.c
===================================================================
--- a/arch/x86/kernel/cpu/perf_event_intel_rapl.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_rapl.c
@@ -731,7 +731,7 @@ static int __init rapl_pmu_init(void)
		rapl_add_quirk(rapl_hsw_server_quirk);
		rapl_cntr_mask = RAPL_IDX_KNL;
		rapl_pmu_events_group.attrs = rapl_events_knl_attr;
-
+		break;
	default: /* unsupported */
		return 0;
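A minimal userspace sketch of the bug, for illustration: `detect()` is a hypothetical stand-in for the model switch in `rapl_pmu_init()`, with 87 mirroring the Knights Landing model number. Without the `break`, the supported case falls straight into `default` and the init bails out as "unsupported":

```c
#include <assert.h>

/*
 * Toy model of the init switch. Returns 1 when the model was detected
 * and initialized, 0 when the code bailed out via the default case.
 */
int detect(int model, int with_break)
{
	int initialized = 0;

	switch (model) {
	case 87:			/* Knights Landing */
		initialized = 1;
		if (with_break)
			break;
		/* fall through, as in the buggy code */
	default:			/* unsupported */
		return 0;
	}
	return initialized;
}
```

The one-line `break;` turns the fall-through into a normal exit from the switch, which is exactly what the patch does.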