[PATCH] PM / suspend: Count suspend-to-idle loop as sleep time
From: Rafael J. Wysocki There is a difference in behavior between suspend-to-idle and suspend-to-RAM in the timekeeping handling that leads to functional issues. Namely, every iteration of the loop in s2idle_loop() increases the monotinic clock somewhat, even if timekeeping_suspend() and timekeeping_resume() are invoked from s2idle_enter(), and if many of them are carried out in a row, the monotonic clock can grow significantly while the system is regarded as suspended, which doesn't happen during suspend-to-RAM and so it is unexpected and leads to confusion and misbehavior in user space (similar to what ensued when we tried to combine the boottime and monotonic clocks). To avoid that, count all iterations of the loop in s2idle_loop() as "sleep time" and adjust the clock for that on exit from suspend-to-idle. [That also covers systems on which timekeeping is not suspended by by s2idle_enter().] Signed-off-by: Rafael J. Wysocki --- This is a replacement for https://patchwork.kernel.org/patch/10599209/ I decided to count the entire loop in s2idle_loop() as "sleep time" as the patch is then simpler and it also covers systems where timekeeping is not suspended in the final step of suspend-to-idle. I dropped the "Fixes:" tag, because the monotonic clock delta problem has been present on the latter since the very introduction of "freeze" (as suspend-to-idle was referred to previously) and so this doesn't fix any particular later commits. --- kernel/power/suspend.c | 18 ++ 1 file changed, 18 insertions(+) Index: linux-pm/kernel/power/suspend.c === --- linux-pm.orig/kernel/power/suspend.c +++ linux-pm/kernel/power/suspend.c @@ -109,8 +109,12 @@ static void s2idle_enter(void) static void s2idle_loop(void) { + ktime_t start, delta; + pm_pr_dbg("suspend-to-idle\n"); + start = ktime_get(); + for (;;) { int error; @@ -150,6 +154,20 @@ static void s2idle_loop(void) pm_wakeup_clear(false); } + /* +* If the monotonic clock difference between the start of the loop and +* this point is too large, user space may get confused about whether or +* not the system has been suspended and tasks may get killed by +* watchdogs etc., so count the loop as "sleep time" to compensate for +* that. +*/ + delta = ktime_sub(ktime_get(), start); + if (ktime_to_ns(delta) > 0) { + struct timespec64 timespec64_delta = ktime_to_timespec64(delta); + + timekeeping_inject_sleeptime64(×pec64_delta); + } + pm_pr_dbg("resume from suspend-to-idle\n"); }
[PATCH] ia64: enable GENERIC_HWEIGHT
Noticed on a single driver failure: ERROR: "__sw_hweight8" [drivers/net/wireless/mediatek/mt76/mt76.ko] undefined! CC: Tony Luck CC: Fenghua Yu CC: linux-i...@vger.kernel.org CC: Andrew Morton CC: linux-kernel@vger.kernel.org Signed-off-by: Sergei Trofimovich --- arch/ia64/Kconfig | 4 1 file changed, 4 insertions(+) diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig index 8b4a0c1748c0..1a71f92f0b8e 100644 --- a/arch/ia64/Kconfig +++ b/arch/ia64/Kconfig @@ -576,3 +576,7 @@ config MSPEC If you have an ia64 and you want to enable memory special operations support (formerly known as fetchop), say Y here, otherwise say N. + +config GENERIC_HWEIGHT + bool + default y -- 2.19.0
[PATCH] ia64: disable SCHED_STACK_END_CHECK
SCHED_STACK_END_CHECK assumes stack grows in one direction. ia64 is a rare case where it is not. As a result kernel fails at startup as: Kernel panic - not syncing: corrupted stack end detected inside scheduler The error does not find a real problem: it's register backing store is written on top of canary value. Disable SCHED_STACK_END_CHECK on ia64 as there is no good place for canary without moving initial stack address. CC: Tony Luck CC: Fenghua Yu CC: linux-i...@vger.kernel.org CC: Andrew Morton CC: linux-kernel@vger.kernel.org Signed-off-by: Sergei Trofimovich --- lib/Kconfig.debug | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index 4966c4fbe7f7..a097dfe38d2b 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -1004,7 +1004,7 @@ config SCHEDSTATS config SCHED_STACK_END_CHECK bool "Detect stack corruption on calls to schedule()" - depends on DEBUG_KERNEL + depends on DEBUG_KERNEL && !IA64 default n help This option checks for a stack overrun on calls to schedule(). -- 2.19.0
[RFC] gpiolib: Fix gpio_direction_* for single direction GPIOs
GPIOs with no programmable direction are not required to implement direction_output nor direction_input. If we try to set an output direction on an output-only GPIO or input direction on an input-only GPIO simply return 0. This allows this single direction GPIO to be used by libgpiod. Signed-off-by: Ricardo Ribalda Delgado --- drivers/gpio/gpiolib.c | 36 1 file changed, 28 insertions(+), 8 deletions(-) diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c index af853749e0bb..3c09bf70e7ab 100644 --- a/drivers/gpio/gpiolib.c +++ b/drivers/gpio/gpiolib.c @@ -2517,19 +2517,27 @@ EXPORT_SYMBOL_GPL(gpiochip_free_own_desc); int gpiod_direction_input(struct gpio_desc *desc) { struct gpio_chip*chip; - int status = -EINVAL; + int status = 0; VALIDATE_DESC(desc); chip = desc->gdev->chip; - if (!chip->get || !chip->direction_input) { + if (!chip->get && chip->direction_input) { gpiod_warn(desc, - "%s: missing get() or direction_input() operations\n", + "%s: missing get() and direction_input() operations\n", __func__); return -EIO; } - status = chip->direction_input(chip, gpio_chip_hwgpio(desc)); + if (chip->direction_input) { + status = chip->direction_input(chip, gpio_chip_hwgpio(desc)); + } else if (chip->get_direction && + (chip->get_direction(chip, gpio_chip_hwgpio(desc)) != 1)) { + gpiod_warn(desc, + "%s: missing direction_input() operation\n", + __func__); + return -EIO; + } if (status == 0) clear_bit(FLAG_IS_OUT, &desc->flags); @@ -2551,16 +2559,28 @@ static int gpiod_direction_output_raw_commit(struct gpio_desc *desc, int value) { struct gpio_chip *gc = desc->gdev->chip; int val = !!value; - int ret; + int ret = 0; - if (!gc->set || !gc->direction_output) { + if (!gc->set && !gc->direction_output) { gpiod_warn(desc, - "%s: missing set() or direction_output() operations\n", + "%s: missing set() and direction_output() operations\n", __func__); return -EIO; } - ret = gc->direction_output(gc, gpio_chip_hwgpio(desc), val); + if (gc->direction_output) { + ret = gc->direction_output(gc, gpio_chip_hwgpio(desc), val); + } else { + if (gc->get_direction && + gc->get_direction(gc, gpio_chip_hwgpio(desc))) { + gpiod_warn(desc, + "%s: missing direction_output() operation\n", + __func__); + return -EIO; + } + gc->set(gc, gpio_chip_hwgpio(desc), val); + } + if (!ret) set_bit(FLAG_IS_OUT, &desc->flags); trace_gpio_value(desc_to_gpio(desc), 0, val); -- 2.18.0
[PATCH] gpiolib: Show correct direction from the beginning
Current code assumes that the direction is input if direction_input function is set. This might not be the case on GPIOs with programmable direction. Signed-off-by: Ricardo Ribalda Delgado --- drivers/gpio/gpiolib.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c index a57300c1d649..af853749e0bb 100644 --- a/drivers/gpio/gpiolib.c +++ b/drivers/gpio/gpiolib.c @@ -1352,7 +1352,12 @@ int gpiochip_add_data_with_key(struct gpio_chip *chip, void *data, * it does, and in case chip->get_direction is not set, we may * expose the wrong direction in sysfs. */ - desc->flags = !chip->direction_input ? (1 << FLAG_IS_OUT) : 0; + if (chip->get_direction) + desc->flags = !chip->get_direction(chip, i) ? + (1 << FLAG_IS_OUT) : 0; + else + desc->flags = !chip->direction_input ? + (1 << FLAG_IS_OUT) : 0; } #ifdef CONFIG_PINCTRL -- 2.18.0
Re: [PATCH v8 1/2] x86/mm: add .bss..decrypted section to hold shared variables
On Thu, Sep 13, 2018 at 04:51:10PM -0500, Brijesh Singh wrote: > kvmclock defines few static variables which are shared with the > hypervisor during the kvmclock initialization. ... > diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c > index 8047379..c16af27 100644 > --- a/arch/x86/kernel/head64.c > +++ b/arch/x86/kernel/head64.c > @@ -112,6 +112,7 @@ static bool __head check_la57_support(unsigned long > physaddr) > unsigned long __head __startup_64(unsigned long physaddr, > struct boot_params *bp) > { > + unsigned long vaddr, vaddr_end; > unsigned long load_delta, *p; > unsigned long pgtable_flags; > pgdval_t *pgd; > @@ -235,6 +236,21 @@ unsigned long __head __startup_64(unsigned long physaddr, > sme_encrypt_kernel(bp); > > /* > + * Clear the memory encryption mask from the .bss..decrypted section. > + * The bss section will be memset to zero later in the initialization so > + * there is no need to zero it after changing the memory encryption > + * attribute. > + */ > + if (mem_encrypt_active()) { > + vaddr = (unsigned long)__start_bss_decrypted; > + vaddr_end = (unsigned long)__end_bss_decrypted; > + for (; vaddr < vaddr_end; vaddr += PMD_SIZE) { > + i = pmd_index(vaddr); > + pmd[i] -= sme_get_me_mask(); > + } > + } Why isn't this chunk at the end of sme_encrypt_kernel() instead of here? > + /* >* Return the SME encryption mask (if SME is active) to be used as a >* modifier for the initial pgdir entry programmed into CR3. >*/ > diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S > index 9c77d2d..0d618ee 100644 > --- a/arch/x86/kernel/vmlinux.lds.S > +++ b/arch/x86/kernel/vmlinux.lds.S > @@ -65,6 +65,23 @@ jiffies_64 = jiffies; > #define ALIGN_ENTRY_TEXT_BEGIN . = ALIGN(PMD_SIZE); > #define ALIGN_ENTRY_TEXT_END . = ALIGN(PMD_SIZE); > > +/* > + * This section contains data which will be mapped as decrypted. Memory > + * encryption operates on a page basis. Make this section PMD-aligned > + * to avoid splitting the pages while mapping the section early. > + * > + * Note: We use a separate section so that only this section gets > + * decrypted to avoid exposing more than we wish. > + */ > +#define BSS_DECRYPTED\ > + . = ALIGN(PMD_SIZE);\ > + __start_bss_decrypted = .; \ > + *(.bss..decrypted); \ > + . = ALIGN(PAGE_SIZE); \ > + __start_bss_decrypted_unused = .; \ > + . = ALIGN(PMD_SIZE);\ > + __end_bss_decrypted = .;\ > + > #else > > #define X86_ALIGN_RODATA_BEGIN > @@ -74,6 +91,7 @@ jiffies_64 = jiffies; > > #define ALIGN_ENTRY_TEXT_BEGIN > #define ALIGN_ENTRY_TEXT_END > +#define BSS_DECRYPTED > > #endif > > @@ -345,6 +363,7 @@ SECTIONS > __bss_start = .; > *(.bss..page_aligned) > *(.bss) > + BSS_DECRYPTED > . = ALIGN(PAGE_SIZE); > __bss_stop = .; Putting it in the BSS would need a bit of care in the future as it poses a certain ordering on the calls in x86_64_start_kernel() (not that there isn't any now :)): You have: x86_64_start_kernel: ... clear_bss(); ... x86_64_start_reservations() -> ... -> setup_arch() -> kvmclock_init() so it would be prudent to put at least a comment somewhere, say, over the definition of BSS_DECRYPTED or so, that attention should be paid to the clear_bss() call early. Thx. -- Regards/Gruss, Boris. SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg) --
Re: [REGRESSION] Errors at reboot after 722e5f2b1eec
+ Jörg. On Fri, Sep 14, 2018 at 08:26:07AM +0200, Rafael J. Wysocki wrote: > On Friday, September 14, 2018 4:29:46 AM CEST Pingfan Liu wrote: > > On Thu, Sep 13, 2018 at 10:15 PM Rafael J. Wysocki > > wrote: > > > > > > On Thursday, September 13, 2018 12:03:36 PM CEST James Wang wrote: > > > > This is a multi-part message in MIME format. > > > > --F5519E624D0AD1E3F7DDA019 > > > > Content-Type: text/plain; charset=utf-8 > > > > Content-Transfer-Encoding: 8bit > > > > > > > > > > > > > > > > On 09/11/2018 02:15 PM, Takashi Iwai wrote: > > > > > On Tue, 11 Sep 2018 14:11:30 +0200, > > > > > James Wang wrote: > > > > >> I did try it from kernel : head > > > > > OK, then the bug is present with 4.19-rc2, at least. > > > > > Please check my test kernel later (it's still being built). > > > > Hi folks, I attach two log about 4.19-rc3 and 4.19-rc3+Rafeal > > > > suggestion. > > > > > > OK, no difference AFAICS. > > > > > > This means that the commit turned up by bisection simply uncovered an > > > existing > > > ordering issue, apparently between an IOMMU and its client (ie. it > > > appears that > > > the client is shut down after the IOMMU). > > > > > > This isn't limited to shutdown and you'd see the same issue on system-wide > > > suspend/resume (in fact, the Pingfan Liu's patches make shutdown use the > > > same device list that is used for system-wide PM). > > > > > > One way to mitigate such issues is to add a device link between the two > > > devices in question to enforce the correct suspend/resume/shutdown > > > ordering > > > between them. > > > > > I found the iommu was at " :00:00.2", while ohci-pci is at > > :00:13.1. Hence ohci-pci should be shutdown before iommu. Not > > familiar with AMD's iommu code, but I think there is no shutdown > > interface exported to drivers/base. It is shutdown by platform code in > > arch/x86. So I think there should be someone teared down the iotbl, > > e.g. by invalidating pages, before the shutdown of ohci-pci. I wonder > > whether adding a device link can fix this bug or not. (Forgive me if I > > made a mistake, since I am ignorant in this field) > > Adding a device link should help, as it effectively causes dpm_list to > be reordered in accordance with the link direction, but it also takes > care of the other children and linked devices as appropriate. > > The difficulty is that whoever wants to add a device link between two > devices needs to have pointers to the device objects in question upfront. > > Thanks, > Rafael > -- Regards/Gruss, Boris. SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg) --
Re: [RFCv2 00/48] perf tools: Add threads to record command
Hi Namhyung, On 14.09.2018 5:29, Namhyung Kim wrote: > On Thu, Sep 13, 2018 at 07:10:35PM +0300, Alexey Budankov wrote: >> Hi, > > Hello, > >> >> On 13.09.2018 15:54, Jiri Olsa wrote: >>> hi, >>> sending *RFC* for threads support in perf record command. >>> >>> In big picture this patchset adds perf record --threads >>> option that allows to create threads in following modes: >>> >>> 1) single thread mode (current) >>> >>> $ perf record ... >>> $ perf record --threads=1 ... >>> >>> - all maps are read/stored under process thread >>> >>> 2) mode with specific (X) number of threads >>> >>> $ perf record --threads=X ... >>> >>> - maps are spread equaly among threads >>> >>> 3) mode that creates thread for every monitored memory map >>> >>> $ perf record --threads ... >>> >>> - which in perf record is equal to number of CPUs, and >>> it pins each thread to its map's cpu: >>> >>> 4) TODO - NUMA aware threads/maps separation >>>... >>> >>> The perf.data stays as a single file. > > I'm not sure we really need to keep it as a single file. As it's a > kind of big changes, we might consider breaking compatibility and use > a directory structure. > > >>> >>> v2 changes: >>> - rebased to current Arnaldo's perf/core >>> (also based on few fixes from my perf/core, see the branch details >>> below) >>> >>> This patchset contains lot of preparation changes to make >>> threaded record possible: >>> >>> - Namhyung's changes to create multiple data streams in >>> perf data file, which allows having each thread data >>> being stored in separate files and merged into single >>> perf data after >>> >>> - Namhyung's changes to create track mmaps for auxiliary >>> events >>> >>> - Namhyung's changes to search for threads/mmaps/comms >>> using the time. This is needed because we have now >>> multiple data streams which are processed separately, >>> but they all need access to complete auxiliary events >>> data (threads/mmaps/comms). That's also a reason why >>> the auxiliary events are stored into separate data >>> stream, which is processed before real data. >>> >>> - the rest of the code that adds threads abstraction into >>> record command allows to create them and distribute maps >>> among them >>> >>> - other preparational changes >>> >>> The threaded monitoring currently can't monitor backward maps >>> and there are probably more limitations which I haven't spotted >>> yet. >>> >>> So far I tested on laptop: >>> http://people.redhat.com/~jolsa/record_threads/test-4CPU.txt >>> >>> and a one bigger server: >>> http://people.redhat.com/~jolsa/record_threads/test-208CPU.txt >>> >>> I can see decrease in recorded LOST events, but both the benchmark >>> and the monitoring must be carefully configured wrt: >>> - number of events (frequency) >>> - size of the memory maps >>> - size of events (callchains) >>> - final perf.data size >>> >>> It's also available in: >>> git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git >>> perf/record_threads >>> >>> thoughts? ;-) thanks >>> jirka >> >> It is preferable to split into smaller pieces that bring >> some improvement proved by metrics numbers and ready for >> merging and upstream. Do we have more metrics than the >> data loss from trace AIO patches? > > Well, this change is to enable parallel access of perf data so the > preparation works don't affect single thread processing (hopefully) or > make it even worse. I'm not sure if could be split for performance > benefits. > > >> >> There is usage of Posix threading API but there is no >> its implementation in the patch series, to avoid dependency >> on externally coded designs in the core of the tool. > > Do you mean it needs to implement its own threading? I don't think > that's what Ingo wanted to. Pthreads implementation is still a part of libc and even part of other libpthread.so binary in case of glibc. I guess for Pthreads it is probably acceptable to reside separately because as glibc as bionic provide reasonable implementation. But which else libc libraries do Perf target? It is preferable to clarify it in advance in order to provide complete implementation. Thanks, Alexey > > Thanks, > Namhyung > > >> >>> >>> >>> --- >>> Jiri Olsa (30): >>> perf tools: Remove perf_tool from event_op2 >>> perf tools: Remove perf_tool from event_op3 >>> perf tools: Pass struct perf_mmap into auxtrace_mmap__read* functions >>> perf tools: Add struct perf_mmap arg into record__write >>> perf tools: Create separate mmap for dummy tracking event >>> perf tools: Make copyfile_offset global >>> perf tools: Add perf_data__create_index function >>> perf record: Add --index option for building index table >>> perf tools: Convert dead thread list into rbtree >>> perf tools: Add thread::exited flag >>> perf callchain: Maintain libunwind's address space in ma
Re: [PATCH v2] mm: mprotect: check page dirty when change ptes
On Thu, Sep 13, 2018 at 08:42:39PM -0400, Jerome Glisse wrote: > On Thu, Sep 13, 2018 at 10:23:28AM -0400, Jerome Glisse wrote: > > On Thu, Sep 13, 2018 at 03:37:22PM +0800, Peter Xu wrote: > > > On Wed, Sep 12, 2018 at 09:24:39AM -0400, Jerome Glisse wrote: > > > > On Wed, Sep 12, 2018 at 09:03:55AM -0400, Jerome Glisse wrote: > > > > > On Wed, Sep 12, 2018 at 02:49:21PM +0800, Peter Xu wrote: > > > > > > Add an extra check on page dirty bit in change_pte_range() since > > > > > > there > > > > > > might be case where PTE dirty bit is unset but it's actually > > > > > > dirtied. > > > > > > One example is when a huge PMD is splitted after written: the dirty > > > > > > bit > > > > > > will be set on the compound page however we won't have the dirty > > > > > > bit set > > > > > > on each of the small page PTEs. > > > > > > > > > > > > I noticed this when debugging with a customized kernel that > > > > > > implemented > > > > > > userfaultfd write-protect. In that case, the dirty bit will be > > > > > > critical > > > > > > since that's required for userspace to handle the write protect page > > > > > > fault (otherwise it'll get a SIGBUS with a loop of page faults). > > > > > > However it should still be good even for upstream Linux to cover > > > > > > more > > > > > > scenarios where we shouldn't need to do extra page faults on the > > > > > > small > > > > > > pages if the previous huge page is already written, so the dirty bit > > > > > > optimization path underneath can cover more. > > > > > > > > > > > > > > > > So as said by Kirill NAK you are not looking at the right place for > > > > > your bug please first apply the below patch and read my analysis in > > > > > my last reply. > > > > > > > > Just to be clear you are trying to fix a userspace bug that is hidden > > > > for non THP pages by a kernel space bug inside userfaultfd by making > > > > the kernel space bug of userfaultfd buggy for THP too. > > > > > > > > > > > > > > > > > > Below patch fix userfaultfd bug. I am not posting it as it is on a > > > > > branch and i am not sure when Andrea plan to post. Andrea feel free > > > > > to squash that fix. > > > > > > > > > > > > > > > From 35cdb30afa86424c2b9f23c0982afa6731be961c Mon Sep 17 00:00:00 2001 > > > > > From: =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= > > > > > Date: Wed, 12 Sep 2018 08:58:33 -0400 > > > > > Subject: [PATCH] userfaultfd: do not set dirty accountable when > > > > > changing > > > > > protection > > > > > MIME-Version: 1.0 > > > > > Content-Type: text/plain; charset=UTF-8 > > > > > Content-Transfer-Encoding: 8bit > > > > > > > > > > mwriteprotect_range() has nothing to do with the dirty accountable > > > > > optimization so do not set it as it opens a door for userspace to > > > > > unwrite protect pages in a range that is write protected ie the vma > > > > > !(vm_flags & VM_WRITE). > > > > > > > > > > Signed-off-by: Jérôme Glisse > > > > > --- > > > > > mm/userfaultfd.c | 2 +- > > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > > > diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c > > > > > index a0379c5ffa7c..59db1ce48fa0 100644 > > > > > --- a/mm/userfaultfd.c > > > > > +++ b/mm/userfaultfd.c > > > > > @@ -632,7 +632,7 @@ int mwriteprotect_range(struct mm_struct *dst_mm, > > > > > unsigned long start, > > > > > newprot = vm_get_page_prot(dst_vma->vm_flags); > > > > > > > > > > change_protection(dst_vma, start, start + len, newprot, > > > > > - !enable_wp, 0); > > > > > + false, 0); > > > > > > > > > > err = 0; > > > > > out_unlock: > > > > > > Hi, Jerome, > > > > > > I tried your patch, unluckily it didn't work just like when not > > > applied: > > > > > > Sep 13 15:16:52 px-ws kernel: FAULT_FLAG_ALLOW_RETRY missing 71 > > > Sep 13 15:16:52 px-ws kernel: CPU: 5 PID: 1625 Comm: qemu-system-x86 Not > > > tainted 4.19.0-rc2+ #31 > > > > > > Sep 13 15:16:52 px-ws kernel: Hardware name: LENOVO ThinkCentre > > > M8500t-N000/SHARKBAY, BIOS FBKTC6AUS 06/22/2016 > > > > > > Sep 13 15:16:52 px-ws kernel: Call Trace: > > > Sep 13 15:16:52 px-ws kernel: dump_stack+0x5c/0x7b > > > Sep 13 15:16:52 px-ws kernel: handle_userfault+0x4b5/0x780 > > > Sep 13 15:16:52 px-ws kernel: ? userfaultfd_ctx_put+0xb0/0xb0 > > > Sep 13 15:16:52 px-ws kernel: do_wp_page+0x1bd/0x5a0 > > > Sep 13 15:16:52 px-ws kernel: __handle_mm_fault+0x7f9/0x1250 > > > Sep 13 15:16:52 px-ws kernel: handle_mm_fault+0xfc/0x1f0 > > > Sep 13 15:16:52 px-ws kernel: __do_page_fault+0x255/0x520 > > > Sep 13 15:16:52 px-ws kernel: do_page_fault+0x32/0x110 > > > Sep 13 15:16:52 px-ws kernel: ? page_fault+0x8/0x30 > > > Sep 13 15:16:52 px-ws kernel: page_fault+0x1e/0x30 > > > Sep 13 15:16:52 px-ws kernel: RIP: 0033:0x7f2a9d3254e0 > > > Sep 13 15:1
[PATCH v2 3/4] iio: adc: Add Xilinx AMS driver
The AMS includes an ADC as well as on-chip sensors that can be used to sample external voltages and monitor on-die operating conditions, such as temperature and supply voltage levels. The AMS has two SYSMON blocks. PL-SYSMON block is capable of monitoring off chip voltage and temperature. PL-SYSMON block has DRP, JTAG and I2C interface to enable monitoring from external master. Out of these interface currently only DRP is supported. Other block PS-SYSMON is memory mapped to PS. The AMS can use internal channels to monitor voltage and temperature as well as one primary and up to 16 auxiliary channels for measuring external voltages. The voltage and temperature monitoring channels also have event capability which allows to generate an interrupt when their value falls below or raises above a set threshold. Signed-off-by: Manish Narani --- drivers/iio/adc/Kconfig | 10 + drivers/iio/adc/Makefile |1 + drivers/iio/adc/xilinx-ams.c | 1337 ++ 3 files changed, 1348 insertions(+) create mode 100644 drivers/iio/adc/xilinx-ams.c diff --git a/drivers/iio/adc/Kconfig b/drivers/iio/adc/Kconfig index 4a75492..405ea00 100644 --- a/drivers/iio/adc/Kconfig +++ b/drivers/iio/adc/Kconfig @@ -941,4 +941,14 @@ config XILINX_XADC The driver can also be build as a module. If so, the module will be called xilinx-xadc. +config XILINX_AMS + tristate "Xilinx AMS driver" + depends on ARCH_ZYNQMP || COMPILE_TEST + depends on HAS_IOMEM + help + Say yes here to have support for the Xilinx AMS. + + The driver can also be build as a module. If so, the module will be called + xilinx-ams. + endmenu diff --git a/drivers/iio/adc/Makefile b/drivers/iio/adc/Makefile index 03db7b5..fbfcc45 100644 --- a/drivers/iio/adc/Makefile +++ b/drivers/iio/adc/Makefile @@ -85,4 +85,5 @@ obj-$(CONFIG_VF610_ADC) += vf610_adc.o obj-$(CONFIG_VIPERBOARD_ADC) += viperboard_adc.o xilinx-xadc-y := xilinx-xadc-core.o xilinx-xadc-events.o obj-$(CONFIG_XILINX_XADC) += xilinx-xadc.o +obj-$(CONFIG_XILINX_AMS) += xilinx-ams.o obj-$(CONFIG_SD_ADC_MODULATOR) += sd_adc_modulator.o diff --git a/drivers/iio/adc/xilinx-ams.c b/drivers/iio/adc/xilinx-ams.c new file mode 100644 index 000..c239a02 --- /dev/null +++ b/drivers/iio/adc/xilinx-ams.c @@ -0,0 +1,1337 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Xilinx AMS driver + * + * Copyright (C) 2017-2018 Xilinx, Inc. + * + * Manish Narani + * Rajnikant Bhojani + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include + +static const unsigned int AMS_UNMASK_TIMEOUT_MS = 500; + +/* AMS registers definitions */ +#define AMS_ISR_0 0x010 +#define AMS_ISR_1 0x014 +#define AMS_IER_0 0x020 +#define AMS_IER_1 0x024 +#define AMS_IDR_0 0x028 +#define AMS_IDR_1 0x02c +#define AMS_PS_CSTS0x040 +#define AMS_PL_CSTS0x044 + +#define AMS_VCC_PSPLL0 0x060 +#define AMS_VCC_PSPLL3 0x06C +#define AMS_VCCINT 0x078 +#define AMS_VCCBRAM0x07C +#define AMS_VCCAUX 0x080 +#define AMS_PSDDRPLL 0x084 +#define AMS_PSINTFPDDR 0x09C + +#define AMS_VCC_PSPLL0_CH 48 +#define AMS_VCC_PSPLL3_CH 51 +#define AMS_VCCINT_CH 54 +#define AMS_VCCBRAM_CH 55 +#define AMS_VCCAUX_CH 56 +#define AMS_PSDDRPLL_CH57 +#define AMS_PSINTFPDDR_CH 63 + +#define AMS_REG_CONFIG00x100 +#define AMS_REG_CONFIG10x104 +#define AMS_REG_CONFIG30x10C +#define AMS_REG_SEQ_CH00x120 +#define AMS_REG_SEQ_CH10x124 +#define AMS_REG_SEQ_CH20x118 + +#define AMS_TEMP 0x000 +#define AMS_SUPPLY10x004 +#define AMS_SUPPLY20x008 +#define AMS_VP_VN 0x00c +#define AMS_VREFP 0x010 +#define AMS_VREFN 0x014 +#define AMS_SUPPLY30x018 +#define AMS_SUPPLY40x034 +#define AMS_SUPPLY50x038 +#define AMS_SUPPLY60x03c +#define AMS_SUPPLY70x200 +#define AMS_SUPPLY80x204 +#define AMS_SUPPLY90x208 +#define AMS_SUPPLY10 0x20c +#define AMS_VCCAMS 0x210 +#define AMS_TEMP_REMOTE0x214 + +#define AMS_REG_VAUX(x)(0x40 + (4*(x))) + +#define AMS_PS_RESET_VALUE 0xU +#define AMS_PL_RESET_VALUE
[PATCH v2 0/4] Add Xilinx AMS Driver
Add Xilinx AMS driver which is used for Xilinx's ZynqMP AMS controller. This AMS driver is used to report various interface voltages and temperatures across the system. This driver handles AMS module including PS-Sysmon & PL-Sysmon. The binding documentation is added for understanding of AMS, PS, PL Sysmon Channels. Changes in v2: - Added documentation for sysfs (Patch-2) - Addressed code style review comments - Patch-2 (Now it is Patch-3) - Arranged the includes in alphabetical order - Removed the wrapper 'ams_pl_write_reg()' and used writel instead - Removed the unnecessary delay of 1ms and used polling of EOC instead - Removed spin_lock and used mutex only. - Used request_irq() instead of devm_request_irq() and handled respective error conditions - Moved contents of xilinx-ams.h to inline with xilinx-ams.c - Patch-1 - Addressed Documentation style comments Manish Narani (4): dt-bindings: iio: adc: Add Xilinx AMS binding documentation iio: Documentation: Add Xilinx AMS sysfs documentation iio: adc: Add Xilinx AMS driver arm64: zynqmp: DT: Add Xilinx AMS node .../ABI/testing/sysfs-bus-iio-adc-xilinx-ams | 246 .../devicetree/bindings/iio/adc/xilinx-ams.txt | 180 +++ arch/arm64/boot/dts/xilinx/zynqmp.dtsi | 26 + drivers/iio/adc/Kconfig| 10 + drivers/iio/adc/Makefile |1 + drivers/iio/adc/xilinx-ams.c | 1337 6 files changed, 1800 insertions(+) create mode 100644 Documentation/ABI/testing/sysfs-bus-iio-adc-xilinx-ams create mode 100644 Documentation/devicetree/bindings/iio/adc/xilinx-ams.txt create mode 100644 drivers/iio/adc/xilinx-ams.c -- 2.1.1
[PATCH v2 4/4] arm64: zynqmp: DT: Add Xilinx AMS node
The Xilinx AMS includes an ADC as well as on-chip sensors that can be used to sample external and monitor on-die operating conditions, such as temperature and supply voltage levels. Signed-off-by: Manish Narani --- arch/arm64/boot/dts/xilinx/zynqmp.dtsi | 26 ++ 1 file changed, 26 insertions(+) diff --git a/arch/arm64/boot/dts/xilinx/zynqmp.dtsi b/arch/arm64/boot/dts/xilinx/zynqmp.dtsi index 29ce234..6e42ca2 100644 --- a/arch/arm64/boot/dts/xilinx/zynqmp.dtsi +++ b/arch/arm64/boot/dts/xilinx/zynqmp.dtsi @@ -617,5 +617,31 @@ reg = <0x0 0xfd4d 0x0 0x1000>; timeout-sec = <10>; }; + + xilinx_ams: ams@ffa5 { + compatible = "xlnx,zynqmp-ams"; + status = "disabled"; + interrupt-parent = <&gic>; + interrupts = <0 56 4>; + interrupt-names = "ams-irq"; + reg = <0x0 0xffa5 0x0 0x800>; + reg-names = "ams-base"; + #address-cells = <2>; + #size-cells = <2>; + #io-channel-cells = <1>; + ranges; + + ams_ps: ams_ps@ffa50800 { + compatible = "xlnx,zynqmp-ams-ps"; + status = "disabled"; + reg = <0x0 0xffa50800 0x0 0x400>; + }; + + ams_pl: ams_pl@ffa50c00 { + compatible = "xlnx,zynqmp-ams-pl"; + status = "disabled"; + reg = <0x0 0xffa50c00 0x0 0x400>; + }; + }; }; }; -- 2.1.1
[PATCH v2 2/4] iio: Documentation: Add Xilinx AMS sysfs documentation
Add documentation for xilinx-ams driver. This contains information about various voltages and temperatures on PS (Processing System), PL (Programmable Logic) and AMS Control Block. Signed-off-by: Manish Narani --- .../ABI/testing/sysfs-bus-iio-adc-xilinx-ams | 246 + 1 file changed, 246 insertions(+) create mode 100644 Documentation/ABI/testing/sysfs-bus-iio-adc-xilinx-ams diff --git a/Documentation/ABI/testing/sysfs-bus-iio-adc-xilinx-ams b/Documentation/ABI/testing/sysfs-bus-iio-adc-xilinx-ams new file mode 100644 index 000..589c389 --- /dev/null +++ b/Documentation/ABI/testing/sysfs-bus-iio-adc-xilinx-ams @@ -0,0 +1,246 @@ +What: /sys/bus/iio/devices/iio:deviceX/in_voltageY_vccpsintlp +Date: September 2018 +KernelVersion: 4.19.0 +Contact: mnar...@xilinx.com +Description: + Voltage input for PS (Processing System) LPD (Low Power Domain) + power supply. + + +What: /sys/bus/iio/devices/iio:deviceX/in_voltageY_vccpsintfp +Date: September 2018 +KernelVersion: 4.19.0 +Contact: mnar...@xilinx.com +Description: + Voltage input for PS (Processing System) FPD (Full Power Domain) + power supply. + + +What: /sys/bus/iio/devices/iio:deviceX/in_voltageY_vccpsaux +Date: September 2018 +KernelVersion: 4.19.0 +Contact: mnar...@xilinx.com +Description: + Voltage input for PS (Processing System) Auxiliary power + supply. + + +What: /sys/bus/iio/devices/iio:deviceX/in_voltageY_vccpsddr +Date: September 2018 +KernelVersion: 4.19.0 +Contact: mnar...@xilinx.com +Description: + Voltage input for PS (Processing System) I/O bank 504 (DDR + PHY) power supply. + + +What: /sys/bus/iio/devices/iio:deviceX/in_voltageY_vccpsio3 +Date: September 2018 +KernelVersion: 4.19.0 +Contact: mnar...@xilinx.com +Description: + Voltage input for PS (Processing System) I/O bank 503 + (boot, config, JTAG, SRST, POR) power supply. + + +What: /sys/bus/iio/devices/iio:deviceX/in_voltageY_vccpsio0 +Date: September 2018 +KernelVersion: 4.19.0 +Contact: mnar...@xilinx.com +Description: + Voltage input for PS (Processing System) I/O bank 500 + (MIO[0:25]) power supply. + + +What: /sys/bus/iio/devices/iio:deviceX/in_voltageY_vccpsio1 +Date: September 2018 +KernelVersion: 4.19.0 +Contact: mnar...@xilinx.com +Description: + Voltage input for PS (Processing System) I/O bank 501 + (MIO[26:51]) power supply. + + +What: /sys/bus/iio/devices/iio:deviceX/in_voltageY_vccpsio2 +Date: September 2018 +KernelVersion: 4.19.0 +Contact: mnar...@xilinx.com +Description: + Voltage input for PS (Processing System) I/O bank 502 + (MIO[52:77]) power supply. + + +What: /sys/bus/iio/devices/iio:deviceX/in_voltageY_psmgtravcc +Date: September 2018 +KernelVersion: 4.19.0 +Contact: mnar...@xilinx.com +Description: + Voltage input for PS (Processing System) GTR SerDes I/O + power supply. + + +What: /sys/bus/iio/devices/iio:deviceX/in_voltageY_psmgtravtt +Date: September 2018 +KernelVersion: 4.19.0 +Contact: mnar...@xilinx.com +Description: + Voltage input for PS (Processing System) GTR SerDes + terminators power supply. + + +What: /sys/bus/iio/devices/iio:deviceX/in_voltageY_vccams +Date: September 2018 +KernelVersion: 4.19.0 +Contact: mnar...@xilinx.com +Description: + Voltage input for PS (Processing System) SYSMON ADC circuitry + power supply. + + +What: /sys/bus/iio/devices/iio:deviceX/in_voltageY_vccint +Date: September 2018 +KernelVersion: 4.19.0 +Contact: mnar...@xilinx.com +Description: + Voltage input for PL (Programmable Logic) internal power supply. + + +What: /sys/bus/iio/devices/iio:deviceX/in_voltageY_vccvrefp +Date: September 2018 +KernelVersion: 4.19.0 +Contact: mnar...@xilinx.com +Description: + Voltage input for PL (Programmable Logic) ADC positive V + reference power supply. + + +What: /sys/bus/iio/devices/iio:deviceX/in_voltageY_vccvrefn +Date: September 2018 +KernelVersion: 4.19.0 +Contact: mnar...@xilinx.com +Description: + Voltage input for PL (Programmable Logic) ADC negative V + reference power supply. + + +What: /sys/bus/iio/devices/iio:deviceX/in_voltageY_vccplintlp +Date: September 2018 +KernelVersion: 4.19.0 +Contact: mnar...@xilinx.com +Description: + Voltage input for PL (Programmable Logic) LPD (Low Power Domain) + power supply. + + +What: /sys/bus/iio/dev
[PATCH v2 1/4] dt-bindings: iio: adc: Add Xilinx AMS binding documentation
Xilinx AMS have several ADC channels that can be used for measurement of different voltages and temperatures. Document the same in the bindings. Signed-off-by: Manish Narani --- .../devicetree/bindings/iio/adc/xilinx-ams.txt | 180 + 1 file changed, 180 insertions(+) create mode 100644 Documentation/devicetree/bindings/iio/adc/xilinx-ams.txt diff --git a/Documentation/devicetree/bindings/iio/adc/xilinx-ams.txt b/Documentation/devicetree/bindings/iio/adc/xilinx-ams.txt new file mode 100644 index 000..d0958a1 --- /dev/null +++ b/Documentation/devicetree/bindings/iio/adc/xilinx-ams.txt @@ -0,0 +1,180 @@ +Xilinx AMS controller +` +The AMS includes an ADC as well as on-chip sensors that can be used to +sample external voltages and monitor on-die operating conditions, such as +temperature and supply voltage levels. +The AMS has two SYSMON blocks which are PL (Programmable Logic) SYSMON and +PS (Processing System) SYSMON. +All designs should have AMS registers, but PS and PL are optional. The +AMS controller can work with only PS, only PL and both PS and PL +configurations. Please specify registers according to your design. Devicetree +should always have AMS module property. Providing PS & PL module is optional. + +Required properties: + - compatible: Should be "xlnx,zynqmp-ams" + - reg: Should specify AMS register space + - interrupts: Interrupt number for the AMS control interface + - interrupt-names: Interrupt name, must be "ams-irq" + - clocks: Should contain a clock specifier for the device + - ranges: keep the property empty to map child address space + (for PS and/or PL) nodes 1:1 onto the parent address + space + + AMS sub-nodes: + - ams_ps : Used as PS-SYSMON node + - ams_pl : Used as PL-SYSMON node + + + +AMS PS-SYSMON +` +PS (Processing System) SYSMON is memory mapped to PS. This block has built-in +alarm generation logic that is used to interrupt the processor based on +condition set. + +Required properties: + - compatible: Should be "xlnx,zynqmp-ams-ps" + - reg: Register space for PS-SYSMON + + + +AMS PL-SYSMON +` +PL-SYSMON is capable of monitoring off chip voltage and temperature. PL-SYSMON +block has DRP, JTAG and I2C interface to enable monitoring from external master. +Out of this interface currently only DRP is supported. This block has alarm +generation logic that is used to interrupt the processor based on condition set. + +Required properties: + - compatible: Should be "xlnx,zynqmp-ams-pl" + - reg: Register space for PL-SYSMON + +PL-SYSMON optional sub-nodes: + - xlnx,ext-channels: List of external channels that are connected to the +AMS PL module. + + The child nodes of PL-SYSMON represent the external channels which are + connected to this Module. If the property is not present + no external channels will be assumed to be connected. + + Each child node represents one channel and has the following + properties: + + Required properties: + * reg: Pair of pins the channel is connected to. + + 'reg' value Channel NameChannel Number + --- -- +0 VP/VN 30 +1 VUSER031 +2 VUSER132 +3 VUSER333 +4 VUSER434 +5 VAUXP[0]/VAUXN[0] 35 +6 VAUXP[1]/VAUXN[1] 36 + ... + 20 VAUXP[15]/VAUXN[15] 50 + + Each channel number should only be used at most once. For + more details on channels, refer table given at the end. + + Optional properties: + * xlnx,bipolar: If set the channel is used in bipolar + mode. + + +Example: + xilinx_ams: ams@ffa5 { + compatible = "xlnx,zynqmp-ams"; + interrupt-parent = <&gic>; + interrupts = <0 56 4>; + interrupt-names = "ams-irq"; + clocks = <&clkc 70>; + reg = <0x0 0xffa5 0x0 0x800>; + reg-names = "ams-base"; + ranges; + + ams_ps: ams_ps@ffa50800 { + compatible = "xlnx,zynqmp-ams-ps"; + reg = <0x0 0xffa50800 0x0 0x400>; + }; + + ams_pl: ams_pl@ffa50c00 { + compatible = "xlnx,zynqmp-ams-pl"; + reg = <0x0 0xffa5
[PATCH] kernel: prevent submission of creds with higher privileges inside container
From: Xin Lin <18650033...@163.com> Adversaries often attack the Linux kernel via using commit_creds(prepare_kernel_cred(0)) to submit ROOT credential for the purpose of privilege escalation. For processes inside the Linux container, the above approach also works, because the container and the host share the same Linux kernel. Therefore, we en- force a check in commit_creds() before updating the cred of the caller process. If the process is insi- de a container (judging from the Namespace ID) and try to submit credentials with higher privileges t- han current (judging from the uid, gid, and cap_bset in the new cred), we will stop the modification. We consider that if the namespace ID of the process is different from the init Namespace ID (enumed in /i- nclude/linux/proc_ns.h), the process is inside a c- ontainer. And if the uid/gid in the new cred is sm- aller or the cap_bset (capability bounding set) in the new cred is larger, it may be a privilege esca- lation operation. Signed-off-by: Xin Lin <18650033...@163.com> --- kernel/cred.c | 23 +++ 1 file changed, 23 insertions(+) diff --git a/kernel/cred.c b/kernel/cred.c index ecf0365..0496f32 100644 --- a/kernel/cred.c +++ b/kernel/cred.c @@ -19,6 +19,11 @@ #include #include #include +#include +#include "../fs/mount.h" +#include +#include +#include #if 0 #define kdebug(FMT, ...) \ @@ -33,6 +38,8 @@ do { \ } while (0) #endif +bool flag = true; +static struct net *initnet; static struct kmem_cache *cred_jar; /* init to 2 - one for init_task, one to ensure it is never freed */ @@ -425,6 +432,22 @@ int commit_creds(struct cred *new) struct task_struct *task = current; const struct cred *old = task->real_cred; + if (flag) { + initnet = get_net_ns_by_pid(1); + flag = false; + } + if (task->nsproxy->uts_ns->ns.inum != PROC_UTS_INIT_INO || + task->nsproxy->ipc_ns->ns.inum != PROC_IPC_INIT_INO || + task->nsproxy->mnt_ns->ns.inum != 0xF000U || + task->nsproxy->pid_ns_for_children->ns.inum != PROC_PID_INIT_INO || + task->nsproxy->net_ns->ns.inum != initnet->ns.inum || + old->user_ns->ns.inum != PROC_USER_INIT_INO || + task->nsproxy->cgroup_ns->ns.inum != PROC_CGROUP_INIT_INO) { + if (new->uid.val < old->uid.val || new->gid.val < old->gid.val + || new->cap_bset.cap[0] > old->cap_bset.cap[0]) + return 0; + } + kdebug("commit_creds(%p{%d,%d})", new, atomic_read(&new->usage), read_cred_subscribers(new)); -- 2.7.4
Re: [PATCH] pinctrl: icelake: Fix the resource number for community-4/5
On Fri, Sep 14, 2018 at 1:54 AM Rajat Jain wrote: > > The Icelake does not have a community-3, and the memory resources are > laid out in the following order in the ACPI: > > resource-0: community-0 registers > resource-1: community-1 registers > resource-2: community-2 registers > resource-3: community-4 registers > resource-4: community-5 registers > > (EDS also describes the communities in the above order). > > Since the pinctrl driver exposes communities 0, 1, 4, 5, it needs to get > the corresponding community registers by getting the resourse number right. > Currently the resourse number is not correct for community 4 and 5, thus > fix that. > Can you share link to the ACPI dump of the tables? (you may get one by running `acpidump -o tables.dat`) > Signed-off-by: Rajat Jain > --- > drivers/pinctrl/intel/pinctrl-icelake.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/drivers/pinctrl/intel/pinctrl-icelake.c > b/drivers/pinctrl/intel/pinctrl-icelake.c > index 630b966ce081..5b4eaf7c90df 100644 > --- a/drivers/pinctrl/intel/pinctrl-icelake.c > +++ b/drivers/pinctrl/intel/pinctrl-icelake.c > @@ -331,8 +331,8 @@ static const struct intel_padgroup > icllp_community5_gpps[] = { > static const struct intel_community icllp_communities[] = { > ICL_COMMUNITY(0, 0, 58, icllp_community0_gpps), > ICL_COMMUNITY(1, 59, 152, icllp_community1_gpps), > - ICL_COMMUNITY(2, 153, 215, icllp_community4_gpps), > - ICL_COMMUNITY(3, 216, 240, icllp_community5_gpps), > + ICL_COMMUNITY(3, 153, 215, icllp_community4_gpps), > + ICL_COMMUNITY(4, 216, 240, icllp_community5_gpps), > }; > > static const unsigned int icllp_spi0_pins[] = { 22, 23, 24, 25, 26 }; > -- > 2.19.0.397.gdd90340f6a-goog > -- With Best Regards, Andy Shevchenko
Re: [PATCH] PM / suspend: Count suspend-to-idle loop as sleep time
On Fri, Sep 14, 2018 at 08:59:03AM +0200, Rafael J. Wysocki wrote: > From: Rafael J. Wysocki > > There is a difference in behavior between suspend-to-idle and > suspend-to-RAM in the timekeeping handling that leads to functional > issues. Namely, every iteration of the loop in s2idle_loop() > increases the monotinic clock somewhat, even if timekeeping_suspend() > and timekeeping_resume() are invoked from s2idle_enter(), and if > many of them are carried out in a row, the monotonic clock can grow > significantly while the system is regarded as suspended, which > doesn't happen during suspend-to-RAM and so it is unexpected and > leads to confusion and misbehavior in user space (similar to what > ensued when we tried to combine the boottime and monotonic clocks). > > To avoid that, count all iterations of the loop in s2idle_loop() > as "sleep time" and adjust the clock for that on exit from > suspend-to-idle. > > [That also covers systems on which timekeeping is not suspended > by by s2idle_enter().] > > Signed-off-by: Rafael J. Wysocki Do we want a 'warning' of sorts when the delta becomes significant (for whatever that is) ? That might be an indication that there are frequent wakeups which we might not be expecting. Of keep the number of spurious wakeups in a stat counter somewhere -- something to look at if the battery drains faster than expected. Otherwise: Acked-by: Peter Zijlstra (Intel) One minor nit below: > --- > kernel/power/suspend.c | 18 ++ > 1 file changed, 18 insertions(+) > > Index: linux-pm/kernel/power/suspend.c > === > --- linux-pm.orig/kernel/power/suspend.c > +++ linux-pm/kernel/power/suspend.c > @@ -109,8 +109,12 @@ static void s2idle_enter(void) > > static void s2idle_loop(void) > { > + ktime_t start, delta; > + > pm_pr_dbg("suspend-to-idle\n"); > > + start = ktime_get(); > + > for (;;) { > int error; > > @@ -150,6 +154,20 @@ static void s2idle_loop(void) > pm_wakeup_clear(false); > } > > + /* > + * If the monotonic clock difference between the start of the loop and > + * this point is too large, user space may get confused about whether or > + * not the system has been suspended and tasks may get killed by > + * watchdogs etc., so count the loop as "sleep time" to compensate for > + * that. > + */ > + delta = ktime_sub(ktime_get(), start); > + if (ktime_to_ns(delta) > 0) { > + struct timespec64 timespec64_delta = ktime_to_timespec64(delta); > + > + timekeeping_inject_sleeptime64(×pec64_delta); > + } > + > pm_pr_dbg("resume from suspend-to-idle\n"); > } Like I mentioned yesterday; I myself prefer the form: u64 stamp = ktimer_get_ns(); for (;;) { /* ... */ } stamp = ktime_get_ns() - stamp; if (stamp) timekeeping_inject_sleeptime64(ns_to_timespec64(ns)); Esp. since ktime_t _is_ s64 these days, there is no point in keep using all the weird ktime helpers that make the code harder to read.
Re: [PATCH] PM / suspend: Count suspend-to-idle loop as sleep time
On Fri, Sep 14, 2018 at 9:41 AM Peter Zijlstra wrote: > > On Fri, Sep 14, 2018 at 08:59:03AM +0200, Rafael J. Wysocki wrote: > > From: Rafael J. Wysocki > > > > There is a difference in behavior between suspend-to-idle and > > suspend-to-RAM in the timekeeping handling that leads to functional > > issues. Namely, every iteration of the loop in s2idle_loop() > > increases the monotinic clock somewhat, even if timekeeping_suspend() > > and timekeeping_resume() are invoked from s2idle_enter(), and if > > many of them are carried out in a row, the monotonic clock can grow > > significantly while the system is regarded as suspended, which > > doesn't happen during suspend-to-RAM and so it is unexpected and > > leads to confusion and misbehavior in user space (similar to what > > ensued when we tried to combine the boottime and monotonic clocks). > > > > To avoid that, count all iterations of the loop in s2idle_loop() > > as "sleep time" and adjust the clock for that on exit from > > suspend-to-idle. > > > > [That also covers systems on which timekeeping is not suspended > > by by s2idle_enter().] > > > > Signed-off-by: Rafael J. Wysocki > > Do we want a 'warning' of sorts when the delta becomes significant (for > whatever that is) ? That might be an indication that there are frequent > wakeups which we might not be expecting. Of keep the number of spurious > wakeups in a stat counter somewhere -- something to look at if the > battery drains faster than expected. If you echo 1 to /sys/power/pm_debug_messages, dmesg will tell you that (with gory details). :-) > Otherwise: > > Acked-by: Peter Zijlstra (Intel) > > One minor nit below: > > > --- > > kernel/power/suspend.c | 18 ++ > > 1 file changed, 18 insertions(+) > > > > Index: linux-pm/kernel/power/suspend.c > > === > > --- linux-pm.orig/kernel/power/suspend.c > > +++ linux-pm/kernel/power/suspend.c > > @@ -109,8 +109,12 @@ static void s2idle_enter(void) > > > > static void s2idle_loop(void) > > { > > + ktime_t start, delta; > > + > > pm_pr_dbg("suspend-to-idle\n"); > > > > + start = ktime_get(); > > + > > for (;;) { > > int error; > > > > @@ -150,6 +154,20 @@ static void s2idle_loop(void) > > pm_wakeup_clear(false); > > } > > > > + /* > > + * If the monotonic clock difference between the start of the loop and > > + * this point is too large, user space may get confused about whether > > or > > + * not the system has been suspended and tasks may get killed by > > + * watchdogs etc., so count the loop as "sleep time" to compensate for > > + * that. > > + */ > > + delta = ktime_sub(ktime_get(), start); > > + if (ktime_to_ns(delta) > 0) { > > + struct timespec64 timespec64_delta = > > ktime_to_timespec64(delta); > > + > > + timekeeping_inject_sleeptime64(×pec64_delta); > > + } > > + > > pm_pr_dbg("resume from suspend-to-idle\n"); > > } > > Like I mentioned yesterday; I myself prefer the form: > > > u64 stamp = ktimer_get_ns(); > > for (;;) { > /* ... */ > } > > stamp = ktime_get_ns() - stamp; > if (stamp) > timekeeping_inject_sleeptime64(ns_to_timespec64(ns)); > > > Esp. since ktime_t _is_ s64 these days, there is no point in keep using > all the weird ktime helpers that make the code harder to read. Looks like a good idea, let me try that.
Re: [PATCH v5 09/12] x86/kvm/nVMX: allow bare VMXON state migration
Jim Mattson writes: > On Thu, Sep 13, 2018 at 10:05 AM, Vitaly Kuznetsov > wrote: >> It is perfectly valid for a guest to do VMXON and not do VMPTRLD. This >> state needs to be preserved on migration. >> >> Signed-off-by: Vitaly Kuznetsov >> --- >> arch/x86/kvm/vmx.c | 15 --- >> 1 file changed, 8 insertions(+), 7 deletions(-) >> >> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c >> index d3297fadf7ed..25a25fff8dd9 100644 >> --- a/arch/x86/kvm/vmx.c >> +++ b/arch/x86/kvm/vmx.c >> @@ -14482,13 +14482,6 @@ static int vmx_set_nested_state(struct kvm_vcpu >> *vcpu, >> if (!page_address_valid(vcpu, kvm_state->vmx.vmxon_pa)) >> return -EINVAL; >> >> - if (kvm_state->size < sizeof(kvm_state) + sizeof(*vmcs12)) >> - return -EINVAL; >> - >> - if (kvm_state->vmx.vmcs_pa == kvm_state->vmx.vmxon_pa || >> - !page_address_valid(vcpu, kvm_state->vmx.vmcs_pa)) >> - return -EINVAL; >> - >> if ((kvm_state->vmx.smm.flags & KVM_STATE_NESTED_SMM_GUEST_MODE) && >> (kvm_state->flags & KVM_STATE_NESTED_GUEST_MODE)) >> return -EINVAL; >> @@ -14510,6 +14503,14 @@ static int vmx_set_nested_state(struct kvm_vcpu >> *vcpu, >> if (ret) >> return ret; >> >> + /* Empty 'VMXON' state is permitted */ >> + if (kvm_state->size < sizeof(kvm_state) + sizeof(*vmcs12)) >> + return 0; > > In the original version of this code, the early exit is: > > if (kvm_state.vmcs_pa == -1ull) > return 0; Time to restore the status quo :-) In any case this early exit should be done after enter_vmx_operation() as even without a valid VMCS we need to preserve VMXON state (e.g. Linux with loaded KVM but without any guests running). -- Vitaly
Re: [PATCH 2/2] thunderbolt: Initialize after IOMMUs
On Mon, Sep 03, 2018 at 04:20:12PM +0300, Mika Westerberg wrote: > If IOMMU is enabled and Thunderbolt driver is built into the kernel > image, it will be probed before IOMMUs are attached to the PCI bus. > Because of this DMA mappings the driver does will not go through IOMMU > and start failing right after IOMMUs are enabled. > > For this reason move the Thunderbolt driver initialization happen at > rootfs level. > > Signed-off-by: Mika Westerberg Applied to thunderbolt.git/fixes.
Re: [PATCH 1/2] thunderbolt: Do not handle ICM events after domain is stopped
On Mon, Sep 03, 2018 at 04:20:11PM +0300, Mika Westerberg wrote: > If there is a long chain of devices connected when the driver is loaded > ICM sends device connected event for each and those are put to tb->wq > for later processing. Now if the driver gets unloaded in the middle, so > that the work queue is not yet empty it gets flushed by tb_domain_stop(). > However, by that time the root switch is already removed so the driver > crashes when it tries to dereference it in ICM event handling callbacks. > > Fix this by checking whether the root switch is already removed. If it > is we know that the domain is stopped and we should merely skip handling > the event. > > Signed-off-by: Mika Westerberg Applied to thunderbolt.git/fixes.
Re: [PATCH] PM / suspend: Count suspend-to-idle loop as sleep time
On Fri, Sep 14, 2018 at 09:47:28AM +0200, Rafael J. Wysocki wrote: > On Fri, Sep 14, 2018 at 9:41 AM Peter Zijlstra wrote: > > Do we want a 'warning' of sorts when the delta becomes significant (for > > whatever that is) ? That might be an indication that there are frequent > > wakeups which we might not be expecting. Of keep the number of spurious > > wakeups in a stat counter somewhere -- something to look at if the > > battery drains faster than expected. > > If you echo 1 to /sys/power/pm_debug_messages, dmesg will tell you > that (with gory details). :-) Shiny, didn't know that ;-)
Re: Multi-parent IRQ domains
On Thu, Sep 13, 2018 at 02:50:21PM -0700, Florian Fainelli wrote: > > > On 09/13/18 08:59, Thierry Reding wrote: > > Hi everyone, > > > > I've been trying to implement a feature on recent Tegra chips that's > > called "wake events". These wake events are implemented as part of the > > power management controller (PMC) and they control which events can be > > used to wake the system from suspend. Events include things like an > > alarm interrupt from an RTC, or the GPIO interrupt hooked up to a > > power button for example. > > > > We already have a driver for a couple of other things that the PMC does, > > so I thought it'd be natural to implement this functionality as an IRQ > > chip in the PMC driver. The way that this would work is that for all the > > devices that are wakeup sources, we'd end up specifying the PMC as the > > interrupt parent and the PMC would then itself have the main GIC as its > > parent. That way, the hierarchical IRQ domain infrastructure would take > > care of pretty much everything. > > > > Unfortunately I've run into some issues here. One problem that I'm > > facing is that PMC could have more than one parent. For example, the RTC > > alarm interrupt goes straight to the GIC, but the power button GPIO goes > > through the GPIO controller first and then to the GIC. This doesn't seem > > to be something that's currently possible. > > > > The way I imagine this to work from a DT perspective is as follows: > > There is an interrupts-extended property which as a first cell takes a > phandle to the parent interrupt controller, e.g: > > interrupts-extended = <&gic 73 GIPC_SPI IRQ_TYPE_LEVEL_HIGH>, > <&pmc 10>; > > (provided that I understood your problem and example correctly). We have > a similar use case on our STB chips where we have "main" interrupts > wired to the ARM GIC and wake-up interrupts wired to a specific wake-up > interrupt controller in an always-on domain. The two controllers have a > different #interrupt-cells specifier. > > The interrupt infrastructure in Linux already knows how to deal with > that and whether you use of_get_irq() or platform_get_irq() you should > obtain the correct virtual interrupt numbers that you expected, parented > to the appropriate interrupt controller. Yeah, I had looked at the interrupts-extended property as well, but while I think it might work, one issue with that is that we'd be modelling the regular IRQ and wakeup IRQ as two different IRQs. On Tegra, however, the wakeup interrupt is the same as the regular interrupt, except that the bits to enable the wake logic is in a different hardware block. So the PMC acts more like an additional mux for a subset of the interrupts in the system. In addition to that, if I were to model the interrupts separately, I'd also have to modify existing drivers for all the wakeup-able devices to deal with two separate interrupts, which is rather undesirable. Thierry signature.asc Description: PGP signature
[PATCH v2] PM / suspend: Count suspend-to-idle loop as sleep time
From: Rafael J. Wysocki There is a difference in behavior between suspend-to-idle and suspend-to-RAM in the timekeeping handling that leads to functional issues. Namely, every iteration of the loop in s2idle_loop() increases the monotinic clock somewhat, even if timekeeping_suspend() and timekeeping_resume() are invoked from s2idle_enter(), and if many of them are carried out in a row, the monotonic clock can grow significantly while the system is regarded as suspended, which doesn't happen during suspend-to-RAM and so it is unexpected and leads to confusion and misbehavior in user space (similar to what ensued when we tried to combine the boottime and monotonic clocks). To avoid that, count all iterations of the loop in s2idle_loop() as "sleep time" and adjust the clock for that on exit from suspend-to-idle. [That also covers systems on which timekeeping is not suspended by s2idle_enter().] Signed-off-by: Rafael J. Wysocki Acked-by: Peter Zijlstra (Intel) --- -> v2: - Switched over to ktime_get_ns() as suggested by Peter. - Tentatively added "Acked-by" from Peter. --- kernel/power/suspend.c | 18 ++ 1 file changed, 18 insertions(+) Index: linux-pm/kernel/power/suspend.c === --- linux-pm.orig/kernel/power/suspend.c +++ linux-pm/kernel/power/suspend.c @@ -109,8 +109,12 @@ static void s2idle_enter(void) static void s2idle_loop(void) { + u64 delta_ns; + pm_pr_dbg("suspend-to-idle\n"); + delta_ns = ktime_get_ns(); + for (;;) { int error; @@ -150,6 +154,20 @@ static void s2idle_loop(void) pm_wakeup_clear(false); } + /* +* If the monotonic clock difference between the start of the loop and +* this point is too large, user space may get confused about whether or +* not the system has been suspended and tasks may get killed by +* watchdogs etc., so count the loop as "sleep time" to compensate for +* that. +*/ + delta_ns = ktime_get_ns() - delta_ns; + if (delta_ns) { + struct timespec64 timespec64_delta = ns_to_timespec64(delta_ns); + + timekeeping_inject_sleeptime64(×pec64_delta); + } + pm_pr_dbg("resume from suspend-to-idle\n"); }
[PATCH net-next v3 04/11] net: mscc: ocelot: move the HSIO header to include/soc
Since HSIO address space can be used by different drivers (PLL, SerDes muxing, temperature sensor), let's move it somewhere it can be included by all drivers. Acked-by: Alexandre Belloni Signed-off-by: Quentin Schulz --- drivers/net/ethernet/mscc/ocelot.h | 1 +- drivers/net/ethernet/mscc/ocelot_hsio.h | 785 +- drivers/net/ethernet/mscc/ocelot_regs.c | 1 +- include/soc/mscc/ocelot_hsio.h | 785 +- 4 files changed, 786 insertions(+), 786 deletions(-) delete mode 100644 drivers/net/ethernet/mscc/ocelot_hsio.h create mode 100644 include/soc/mscc/ocelot_hsio.h diff --git a/drivers/net/ethernet/mscc/ocelot.h b/drivers/net/ethernet/mscc/ocelot.h index 616bec3..2da20a3 100644 --- a/drivers/net/ethernet/mscc/ocelot.h +++ b/drivers/net/ethernet/mscc/ocelot.h @@ -16,7 +16,6 @@ #include "ocelot_ana.h" #include "ocelot_dev.h" -#include "ocelot_hsio.h" #include "ocelot_qsys.h" #include "ocelot_rew.h" #include "ocelot_sys.h" diff --git a/drivers/net/ethernet/mscc/ocelot_hsio.h b/drivers/net/ethernet/mscc/ocelot_hsio.h deleted file mode 100644 index d93ddec..000 --- a/drivers/net/ethernet/mscc/ocelot_hsio.h +++ /dev/null @@ -1,785 +0,0 @@ -/* SPDX-License-Identifier: (GPL-2.0 OR MIT) */ -/* - * Microsemi Ocelot Switch driver - * - * Copyright (c) 2017 Microsemi Corporation - */ - -#ifndef _MSCC_OCELOT_HSIO_H_ -#define _MSCC_OCELOT_HSIO_H_ - -#define HSIO_PLL5G_CFG0_ENA_ROT BIT(31) -#define HSIO_PLL5G_CFG0_ENA_LANE BIT(30) -#define HSIO_PLL5G_CFG0_ENA_CLKTREE BIT(29) -#define HSIO_PLL5G_CFG0_DIV4 BIT(28) -#define HSIO_PLL5G_CFG0_ENA_LOCK_FINE BIT(27) -#define HSIO_PLL5G_CFG0_SELBGV820(x) (((x) << 23) & GENMASK(26, 23)) -#define HSIO_PLL5G_CFG0_SELBGV820_M GENMASK(26, 23) -#define HSIO_PLL5G_CFG0_SELBGV820_X(x)(((x) & GENMASK(26, 23)) >> 23) -#define HSIO_PLL5G_CFG0_LOOP_BW_RES(x)(((x) << 18) & GENMASK(22, 18)) -#define HSIO_PLL5G_CFG0_LOOP_BW_RES_M GENMASK(22, 18) -#define HSIO_PLL5G_CFG0_LOOP_BW_RES_X(x) (((x) & GENMASK(22, 18)) >> 18) -#define HSIO_PLL5G_CFG0_SELCPI(x) (((x) << 16) & GENMASK(17, 16)) -#define HSIO_PLL5G_CFG0_SELCPI_M GENMASK(17, 16) -#define HSIO_PLL5G_CFG0_SELCPI_X(x) (((x) & GENMASK(17, 16)) >> 16) -#define HSIO_PLL5G_CFG0_ENA_VCO_CONTRHBIT(15) -#define HSIO_PLL5G_CFG0_ENA_CP1 BIT(14) -#define HSIO_PLL5G_CFG0_ENA_VCO_BUF BIT(13) -#define HSIO_PLL5G_CFG0_ENA_BIAS BIT(12) -#define HSIO_PLL5G_CFG0_CPU_CLK_DIV(x)(((x) << 6) & GENMASK(11, 6)) -#define HSIO_PLL5G_CFG0_CPU_CLK_DIV_M GENMASK(11, 6) -#define HSIO_PLL5G_CFG0_CPU_CLK_DIV_X(x) (((x) & GENMASK(11, 6)) >> 6) -#define HSIO_PLL5G_CFG0_CORE_CLK_DIV(x) ((x) & GENMASK(5, 0)) -#define HSIO_PLL5G_CFG0_CORE_CLK_DIV_MGENMASK(5, 0) - -#define HSIO_PLL5G_CFG1_ENA_DIRECTBIT(18) -#define HSIO_PLL5G_CFG1_ROT_SPEED BIT(17) -#define HSIO_PLL5G_CFG1_ROT_DIR BIT(16) -#define HSIO_PLL5G_CFG1_READBACK_DATA_SEL BIT(15) -#define HSIO_PLL5G_CFG1_RC_ENABLE BIT(14) -#define HSIO_PLL5G_CFG1_RC_CTRL_DATA(x) (((x) << 6) & GENMASK(13, 6)) -#define HSIO_PLL5G_CFG1_RC_CTRL_DATA_MGENMASK(13, 6) -#define HSIO_PLL5G_CFG1_RC_CTRL_DATA_X(x) (((x) & GENMASK(13, 6)) >> 6) -#define HSIO_PLL5G_CFG1_QUARTER_RATE BIT(5) -#define HSIO_PLL5G_CFG1_PWD_TXBIT(4) -#define HSIO_PLL5G_CFG1_PWD_RXBIT(3) -#define HSIO_PLL5G_CFG1_OUT_OF_RANGE_RECAL_ENABIT(2) -#define HSIO_PLL5G_CFG1_HALF_RATE BIT(1) -#define HSIO_PLL5G_CFG1_FORCE_SET_ENA BIT(0) - -#define HSIO_PLL5G_CFG2_ENA_TEST_MODE BIT(30) -#define HSIO_PLL5G_CFG2_ENA_PFD_IN_FLIP BIT(29) -#define HSIO_PLL5G_CFG2_ENA_VCO_NREF_TESTOUT BIT(28) -#define HSIO_PLL5G_CFG2_ENA_FBTESTOUT BIT(27) -#define HSIO_PLL5G_CFG2_ENA_RCPLL BIT(26) -#define HSIO_PLL5G_CFG2_ENA_CP2 BIT(25) -#define HSIO_PLL5G_CFG2_ENA_CLK_BYPASS1 BIT(24) -#define HSIO_PLL5G_CFG2_AMPC_SEL(x) (((x) << 16) & GENMASK(23, 16)) -#define HSIO_PLL5G_CFG2_AMPC_SEL_MGENMASK(23, 16) -#define HSIO_PLL5G_CFG2_AMPC_SEL_X(x) (((x) & GENMASK(23, 16)) >> 16) -#define HSIO_PLL5G_CFG2_ENA_CLK_BYPASS
Re: [PATCH v7 1/6] dt-bindings: ti-lmu: Remove LM3697
Hi! > >> How do I politely explain that the original implementation was wrong for > >> certain devices? > > > > Implementation? Device tree is hardware description. > > Yes this hardware description is incorrect. The hardware description is > describing a MFD but this LED driver (and a couple others) only perform > 1 function and that is to drive a LED string. So what? Does not seem incorrect to me. Maybe the description should not be in MFD directory, but other than that... > >> Isn't code and documentation supposed to be pushed in stages > >> together? > > > > Device tree is _not_ documentation. And yes, it is normally pushed > > together. But that did not happen here, and bindings are already in. > > Hmm.. Its not documentation but it is in the Documentation folder. > And just because the bindings are in does not mean they cannot be > changed. You may want to learn more about device tree and/or talk to the device tree maintainers. This is an old article. https://lwn.net/Articles/561462/ NAK on this patch. I see that this binding has problems, but introducing different binding for subset of devices is _not_ a fix. > > What about the multi function devices? They should have same binding. > > The MFD devices defined are not in contention here only the SFD. I'd like to see common solutions for SFD and MFD, as the hardware is similar, and that includes the code. Having code that is easier to maintain is important, and having many drivers are harder to maintain than one driver. Milo's code looks better than yours in that regard. I disagree about Milo's code being "nightmare" to modify, and care about "easy to maintain" more than "binary size". Best regards, Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html signature.asc Description: Digital signature
Re: [PATCHES] tty ioctls cleanups, compat and not only
On Fri, Sep 14, 2018 at 4:27 AM Al Viro wrote: > On Thu, Sep 13, 2018 at 10:56:48PM +0200, Arnd Bergmann wrote: > commit de36af5ca465156863b5fb7548e3660ea7d3bbcf > Author: Al Viro > Date: Thu Sep 13 22:12:15 2018 -0400 > > change semantics of ldisc ->compat_ioctl() > > First of all, make it return int. Returning long when native method > had never allowed that is ridiculous and inconvenient. Ack. > More importantly, change the caller; if ldisc ->compat_ioctl() is NULL > or returns -ENOIOCTLCMD, tty_compat_ioctl() will try to feed cmd and > compat_ptr(arg) to ldisc's native ->ioctl(). > > That simplifies ->compat_ioctl() instances quite a bit - they only > need to deal with ioctls that are neither generic tty ones (those > would get shunted off to tty_ioctl()) nor simple compat pointer ones. This does sound very appealing, but there is a small downside: The difference between ".compat_ioctl = NULL" and ".compat_ioctl=native_ioctl" is now very subtle, and I wouldn't necessarily expect casual readers to understand that. If we go with my file_operations patch for generic_compat_ioctl_ptrarg and add generic_compat_ioctl_intarg, we can do the same thing here with ldisc_compat_ioctl_ptrarg/ldisc_compat_ioctl_intarg to make it a little more consistent with fops and self-documenting. > diff --git a/drivers/bluetooth/hci_ldisc.c b/drivers/bluetooth/hci_ldisc.c > index 963bb0309e25..ae0dd57a8e99 100644 > --- a/drivers/bluetooth/hci_ldisc.c > +++ b/drivers/bluetooth/hci_ldisc.c > @@ -821,6 +821,7 @@ static int __init hci_uart_init(void) > hci_uart_ldisc.read = hci_uart_tty_read; > hci_uart_ldisc.write= hci_uart_tty_write; > hci_uart_ldisc.ioctl= hci_uart_tty_ioctl; > + hci_uart_ldisc.compat_ioctl = hci_uart_tty_ioctl; > hci_uart_ldisc.poll = hci_uart_tty_poll; > hci_uart_ldisc.receive_buf = hci_uart_tty_receive; > hci_uart_ldisc.write_wakeup = hci_uart_tty_wakeup; so this would become hci_uart_ldisc.compat_ioctl = ldisc_compat_ioctl_intarg; > static struct tty_ldisc_ops sp_ldisc = { > .owner = THIS_MODULE, > .magic = TTY_LDISC_MAGIC, > @@ -776,9 +758,6 @@ static struct tty_ldisc_ops sp_ldisc = { > .open = sixpack_open, > .close = sixpack_close, > .ioctl = sixpack_ioctl, > -#ifdef CONFIG_COMPAT > - .compat_ioctl = sixpack_compat_ioctl, > -#endif > .receive_buf= sixpack_receive_buf, > .write_wakeup = sixpack_write_wakeup, > }; And this would be .compat_ioctl = ldisc_compat_ioctl_ptrarg, respectively > diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c > index 0f75ae6bfaa7..483ad432d906 100644 > --- a/drivers/tty/tty_io.c > +++ b/drivers/tty/tty_io.c > @@ -2824,6 +2824,9 @@ static long tty_compat_ioctl(struct file *file, > unsigned int cmd, > return hung_up_tty_compat_ioctl(file, cmd, arg); > if (ld->ops->compat_ioctl) > retval = ld->ops->compat_ioctl(tty, file, cmd, arg); > + if (retval == -ENOIOCTLCMD && ld->ops->ioctl) > + retval = ld->ops->ioctl(tty, file, > + (unsigned long)compat_ptr(cmd), arg); > tty_ldisc_deref(ld); The added fallback would still be useful in case someone forgets to add the .compat_ioctl assignment to a newly added ldisc. Another idea I had for a final fallback would be tty_ldisc_deref(ld); +if (retval == -ENOIOCTLCMD && _IOC_TYPE(cmd) == 'T') { + retval = tty_ioctl(file, cmd, (unsigned long)compat_ptr(arg)); + WARN_ON_ONCE(retval != -ENOIOCTLCMD && retval != -ENOTTY); +} Seeing that you list every single 'T' command in tty_compat_ioctl() that we handle in the native case, that WARN_ON_ONCE should not trigger for any input, but it would catch (and warn about) any of those that might get added in the future to the native code path without the compat entry. That might be too much -- if it gets easy enough to keep the code correct in the first place, there is no need. Arnd
Re: [RFCv2 00/48] perf tools: Add threads to record command
On Fri, Sep 14, 2018 at 11:29:10AM +0900, Namhyung Kim wrote: > On Thu, Sep 13, 2018 at 07:10:35PM +0300, Alexey Budankov wrote: > > Hi, > > Hello, > > > > > On 13.09.2018 15:54, Jiri Olsa wrote: > > > hi, > > > sending *RFC* for threads support in perf record command. > > > > > > In big picture this patchset adds perf record --threads > > > option that allows to create threads in following modes: > > > > > > 1) single thread mode (current) > > > > > > $ perf record ... > > > $ perf record --threads=1 ... > > > > > > - all maps are read/stored under process thread > > > > > > 2) mode with specific (X) number of threads > > > > > > $ perf record --threads=X ... > > > > > > - maps are spread equaly among threads > > > > > > 3) mode that creates thread for every monitored memory map > > > > > > $ perf record --threads ... > > > > > > - which in perf record is equal to number of CPUs, and > > > it pins each thread to its map's cpu: > > > > > > 4) TODO - NUMA aware threads/maps separation > > >... > > > > > > The perf.data stays as a single file. > > I'm not sure we really need to keep it as a single file. As it's a > kind of big changes, we might consider breaking compatibility and use > a directory structure. moving the files into the perf.data at the end is actualy not a lot code.. and I think it's one of the 'small' things that make this feature more user friendly jirka
Re: [PATCH v7 1/6] dt-bindings: ti-lmu: Remove LM3697
Hi! > All be it the analog side of these devices may be the same but the digital > side for each chip is different. > The register maps are not the same, the number of supported LED strings are > not > the same, there are features not supported by this driver and will > be a pain to add. Digital side looks similar enough that Milo was able to share code between them. Yes, obviously register map will be different. > Adding features will be a complete night mare especially for the LM3632 which > also support Torch and Flash. > > > Here are where a single driver will start getting messy and support difficult > LM3533 has ALS and an ADC for an ALS analog sensor > LM3631 has no ALS functionality > LM3632 has strobe/torch functionality and no ramp support > LM3633 has lvled support coupled with the hvled support > LM3695 does not even appear to be available publicly > LM3697 is the only device that that this driver could be used for as is. We already been through this once. More than one of these has ramp support, and more than one has overvoltage protection. (Plus, all of them are LED drivers). Surely non-ugly design can be done where the code is shared? Thanks, Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html signature.asc Description: Digital signature
Re: [RFCv2 00/48] perf tools: Add threads to record command
On Thu, Sep 13, 2018 at 07:10:35PM +0300, Alexey Budankov wrote: > Hi, > > On 13.09.2018 15:54, Jiri Olsa wrote: > > hi, > > sending *RFC* for threads support in perf record command. > > > > In big picture this patchset adds perf record --threads > > option that allows to create threads in following modes: > > > > 1) single thread mode (current) > > > > $ perf record ... > > $ perf record --threads=1 ... > > > > - all maps are read/stored under process thread > > > > 2) mode with specific (X) number of threads > > > > $ perf record --threads=X ... > > > > - maps are spread equaly among threads > > > > 3) mode that creates thread for every monitored memory map > > > > $ perf record --threads ... > > > > - which in perf record is equal to number of CPUs, and > > it pins each thread to its map's cpu: > > > > 4) TODO - NUMA aware threads/maps separation > >... > > > > The perf.data stays as a single file. > > > > v2 changes: > > - rebased to current Arnaldo's perf/core > > (also based on few fixes from my perf/core, see the branch details > > below) > > > > This patchset contains lot of preparation changes to make > > threaded record possible: > > > > - Namhyung's changes to create multiple data streams in > > perf data file, which allows having each thread data > > being stored in separate files and merged into single > > perf data after > > > > - Namhyung's changes to create track mmaps for auxiliary > > events > > > > - Namhyung's changes to search for threads/mmaps/comms > > using the time. This is needed because we have now > > multiple data streams which are processed separately, > > but they all need access to complete auxiliary events > > data (threads/mmaps/comms). That's also a reason why > > the auxiliary events are stored into separate data > > stream, which is processed before real data. > > > > - the rest of the code that adds threads abstraction into > > record command allows to create them and distribute maps > > among them > > > > - other preparational changes > > > > The threaded monitoring currently can't monitor backward maps > > and there are probably more limitations which I haven't spotted > > yet. > > > > So far I tested on laptop: > > http://people.redhat.com/~jolsa/record_threads/test-4CPU.txt > > > > and a one bigger server: > > http://people.redhat.com/~jolsa/record_threads/test-208CPU.txt > > > > I can see decrease in recorded LOST events, but both the benchmark > > and the monitoring must be carefully configured wrt: > > - number of events (frequency) > > - size of the memory maps > > - size of events (callchains) > > - final perf.data size > > > > It's also available in: > > git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git > > perf/record_threads > > > > thoughts? ;-) thanks > > jirka > > It is preferable to split into smaller pieces that bring > some improvement proved by metrics numbers and ready for > merging and upstream. Do we have more metrics than the > data loss from trace AIO patches? well the primary focus is to get more events in, so the LOST metric is the main one > > There is usage of Posix threading API but there is no > its implementation in the patch series, to avoid dependency > on externally coded designs in the core of the tool. well, we use pthreads in here, bt it's really not that much code.. we could make that generic in future if needed jirka
[PATCH 1/4] parisc: Move __IGNORE* entries to non uapi header
All the __IGNORE* entries are resides in the uapi header file and it is not used by any user space applications. One of the patch in this patch series will generate the uapi header file and system call table file. So if we move all the __IGNORE* entries to non uapi header, it will simplify the uapi header and system call table file generation script. It is correct to keep __IGNORE* entry in non uapi header asm/unistd.h while uapi/asm/unistd.h must hold information only useful for user space applications. Signed-off-by: Firoz Khan --- arch/parisc/include/asm/unistd.h | 6 ++ arch/parisc/include/uapi/asm/unistd.h | 7 --- 2 files changed, 6 insertions(+), 7 deletions(-) diff --git a/arch/parisc/include/asm/unistd.h b/arch/parisc/include/asm/unistd.h index 3d507d0..fc71edf 100644 --- a/arch/parisc/include/asm/unistd.h +++ b/arch/parisc/include/asm/unistd.h @@ -165,6 +165,12 @@ type name(type1 arg1, type2 arg2, type3 arg3, type4 arg4, type5 arg5) \ #define __ARCH_WANT_SYS_CLONE #define __ARCH_WANT_COMPAT_SYS_SENDFILE +#define __IGNORE_select/* newselect */ +#define __IGNORE_fadvise64 /* fadvise64_64 */ +#define __IGNORE_pkey_mprotect +#define __IGNORE_pkey_alloc +#define __IGNORE_pkey_free + #endif /* __ASSEMBLY__ */ #undef STR diff --git a/arch/parisc/include/uapi/asm/unistd.h b/arch/parisc/include/uapi/asm/unistd.h index dc77c5a..bb52e12 100644 --- a/arch/parisc/include/uapi/asm/unistd.h +++ b/arch/parisc/include/uapi/asm/unistd.h @@ -368,13 +368,6 @@ #define __NR_Linux_syscalls(__NR_io_pgetevents + 1) - -#define __IGNORE_select/* newselect */ -#define __IGNORE_fadvise64 /* fadvise64_64 */ -#define __IGNORE_pkey_mprotect -#define __IGNORE_pkey_alloc -#define __IGNORE_pkey_free - #define LINUX_GATEWAY_ADDR 0x100 #endif /* _UAPI_ASM_PARISC_UNISTD_H_ */ -- 1.9.1
[PATCH v3 0/7] regulator/mfd/dt_bindings: bd718x7: Support ROHM bd71847
Patch series adding support for ROHM BD71847 PMIC. ROHM BD71847 is modified version of ROHM BD71837 Power Management IC. BD71847 integrates 6 buck regulators and 6 LDO’s. The patch series adds support for BD71847 in BD71837 driver. It also extends BD71837 to support all BUCK/LDO output voltages. Support for "pickable voltage ranges" is introduced in order to fully support BD71837/BD71847 voltage selection registers. Some of the voltage regulators provide different multiple voltage ranges and output voltage is determined by range selection bit(s) and voltage selection bits instead of plain voltage selection bits. Changelog v3 - Fix warning caused by void * to unsigned int cast - Fix patch dependency issue (patch 4 used define from patch 5) - Fix indent problem. Changelog v2 - Dropped first patch (voltage monitoring fix) and "rebased" on top of next-20180912 because first patch was already applied. - Fixed two commit message typos. Fixes based on comments from Lee Jones - removed PMIC revision check. - stored chip type directly to data pointer - minor readability improvement to DT binding - unified naming for last enum members - moved unrelated changes from pickable ranges patch - removed C++ style comment Patch 2: limited BD71847 support (without range selection) Patch 3: BD71847 device-tree bindings for regulators Patch 4: BD71847 device-tree bindings for mfd Patch 5: Voltage range selection support to regulator helpers Patch 6: Simple renaming plenty of BD71837 occurrences to BD718XX Patch 7: BD71837/BD71847 Full voltage range support using helpers from patch 5 Patch 8: Renaming bd71837 regulator driver to bd718x7 this patch series is based on linux-next tag next-20180912. Matti Vaittinen (7): regulator/mfd: Support ROHM BD71847 power management IC regulator: dt bindings: add BD71847 device-tree binding documentation mfd: dt bindings: add BD71847 device-tree binding documentation regulator: Support regulators where voltage ranges are selectable regulator/mfd: bd718xx: rename bd71837/bd71847 common instances regulator: bd718XX use pickable ranges regulator: bd718xx: rename bd71837 to 718xx .../devicetree/bindings/mfd/rohm,bd71837-pmic.txt | 17 +- .../bindings/regulator/rohm,bd71837-regulator.txt | 12 +- drivers/mfd/rohm-bd718x7.c | 162 ++- drivers/regulator/Kconfig |2 +- drivers/regulator/Makefile |2 +- drivers/regulator/bd71837-regulator.c | 644 --- drivers/regulator/bd718x7-regulator.c | 1138 drivers/regulator/core.c |5 + drivers/regulator/helpers.c| 232 include/linux/mfd/rohm-bd718x7.h | 333 +++--- include/linux/regulator/driver.h | 20 +- 11 files changed, 1646 insertions(+), 921 deletions(-) delete mode 100644 drivers/regulator/bd71837-regulator.c create mode 100644 drivers/regulator/bd718x7-regulator.c -- 2.14.3
[PATCH 3/4] parisc: Add system call table generation support
The system call tables are in different format in all architecture and it will be difficult to manually add or modify the system calls in the respective files. To make it easy by keeping a script and which'll generate the header file and syscall table file so this change will unify them across all architectures. The system call table generation script is added in syscalls directory which contain the script to generate both uapi header file system call table generation file and syscall_32/64.tbl file which'll be the input for the scripts. syscall_32/64.tbl contains the list of available system calls along with system call number and corresponding entry point. Add a new system call in this architecture will be possible by adding new entry in the syscall_32/64.tbl file. Adding a new table entry consisting of: - System call number. - ABI. - System call name. - Entry point name. - Compat entry name, if required. syscallhdr.sh and syscalltbl.sh will generate uapi header- unistd_32/64.h and syscall_table_32/64/c32.h files respectively. File syscall_table_32/64/c32.h is included by syscall.S - the real system call table. Both .sh files will parse the content syscall.tbl to generate the header and table files. ARM, s390 and x86 architecuture does have the similar support. I leverage their implementation to come up with a generic solution. Signed-off-by: Firoz Khan --- arch/parisc/kernel/syscalls/Makefile | 52 + arch/parisc/kernel/syscalls/syscall_32.tbl | 358 + arch/parisc/kernel/syscalls/syscall_64.tbl | 357 arch/parisc/kernel/syscalls/syscallhdr.sh | 38 +++ arch/parisc/kernel/syscalls/syscalltbl.sh | 36 +++ 5 files changed, 841 insertions(+) create mode 100644 arch/parisc/kernel/syscalls/Makefile create mode 100644 arch/parisc/kernel/syscalls/syscall_32.tbl create mode 100644 arch/parisc/kernel/syscalls/syscall_64.tbl create mode 100644 arch/parisc/kernel/syscalls/syscallhdr.sh create mode 100644 arch/parisc/kernel/syscalls/syscalltbl.sh diff --git a/arch/parisc/kernel/syscalls/Makefile b/arch/parisc/kernel/syscalls/Makefile new file mode 100644 index 000..12d1def --- /dev/null +++ b/arch/parisc/kernel/syscalls/Makefile @@ -0,0 +1,52 @@ +# SPDX-License-Identifier: GPL-2.0 +out := arch/$(SRCARCH)/include/generated/asm +uapi := arch/$(SRCARCH)/include/generated/uapi/asm + +_dummy := $(shell [ -d '$(uapi)' ] || mkdir -p '$(uapi)') \ + $(shell [ -d '$(out)' ] || mkdir -p '$(out)') + +syscall32 := $(srctree)/$(src)/syscall_32.tbl +syscall64 := $(srctree)/$(src)/syscall_64.tbl + +syshdr := $(srctree)/$(src)/syscallhdr.sh +systbl := $(srctree)/$(src)/syscalltbl.sh + +quiet_cmd_syshdr = SYSHDR $@ + cmd_syshdr = $(CONFIG_SHELL) '$(syshdr)' '$<' '$@' \ + '$(syshdr_abi_$(basetarget))' \ + '$(syshdr_pfx_$(basetarget))' \ + '$(syshdr_offset_$(basetarget))' + +quiet_cmd_systbl = SYSTBL $@ + cmd_systbl = $(CONFIG_SHELL) '$(systbl)' '$<' '$@' \ + '$(systbl_abi_$(basetarget))' + +syshdr_offset_unistd_32 := __NR_Linux +$(uapi)/unistd_32.h: $(syscall32) $(syshdr) + $(call if_changed,syshdr) + +syshdr_offset_unistd_64 := __NR_Linux +$(uapi)/unistd_64.h: $(syscall64) $(syshdr) + $(call if_changed,syshdr) + +systbl_abi_syscall_table_32 := 32 +$(out)/syscall_table_32.h: $(syscall32) $(systbl) + $(call if_changed,systbl) + +systbl_abi_syscall_table_64 := 64 +$(out)/syscall_table_64.h: $(syscall64) $(systbl) + $(call if_changed,systbl) + +systbl_abi_syscall_table_c32 := c32 +$(out)/syscall_table_c32.h: $(syscall32) $(systbl) + $(call if_changed,systbl) + +uapisyshdr-y += unistd_32.h unistd_64.h +syshdr-y += syscall_table_32.h syscall_table_64.h syscall_table_c32.h + +targets+= $(uapisyshdr-y) $(syshdr-y) + +PHONY += all +all: $(addprefix $(uapi)/,$(uapisyshdr-y)) +all: $(addprefix $(out)/,$(syshdr-y)) + @: diff --git a/arch/parisc/kernel/syscalls/syscall_32.tbl b/arch/parisc/kernel/syscalls/syscall_32.tbl new file mode 100644 index 000..637abc8 --- /dev/null +++ b/arch/parisc/kernel/syscalls/syscall_32.tbl @@ -0,0 +1,358 @@ +# +# 32-bit system call numbers and entry vectors +# +# The format is: +# +# +# The abi is always "common" for this file. +# +0 common restart_syscall sys_restart_syscall +1 common exitsys_exit +2 common forksys_fork_wrapper +3 common readsys_read +4 common write sys_write +5 common opensys_open compat_sys_open +6 common close sys_cl
[PATCH 2/4] parisc: Replace __NR_Linux_syscalls macro with __NR_syscalls
__NR_Linux_syscalls macro holds the number of system call exist in PARISC architecture. This macro is currently the part of uapi/asm/ unistd.h file. We have to change the value of __NR_Linux_syscalls, if we add or delete a system call. One of the patch in this patch series has a script which will generate a uapi header based on syscall.tbl file. The syscall.tbl file contains the number of system call information. So we have two option to update __NR_Linux_syscalls value. 1. Update __NR_Linux_syscalls in uapi/asm/unistd.h manually by counting the no.of system calls. No need to update NR_syscalls until we either add a new system call or delete an existing system call. 2. We can keep this feature it above mentioned script, that'll count the number of syscalls and keep it in a generated file. In this case we don't need to explicitly update __NR_Linux_syscalls in asm/unistd.h file. The 2nd option will be the recommended one. For that, I moved the __NR_Linux_syscalls macro from uapi/asm/unistd.h to asm/unistd.h. The macro name also changed form __NR_Linux_syscalls to __NR_syscalls for making the name convention same across all architecture. While __NR_ syscalls isn't strictly part of the uapi, having it as part of the generated header to simplifies the implementation. Signed-off-by: Firoz Khan --- arch/parisc/include/asm/unistd.h | 2 ++ arch/parisc/include/uapi/asm/unistd.h | 2 +- 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/arch/parisc/include/asm/unistd.h b/arch/parisc/include/asm/unistd.h index fc71edf..71417c1 100644 --- a/arch/parisc/include/asm/unistd.h +++ b/arch/parisc/include/asm/unistd.h @@ -8,6 +8,8 @@ #define SYS_ify(syscall_name) __NR_##syscall_name +#define __NR_Linux_syscalls __NR_syscalls + #ifndef ASM_LINE_SEP # define ASM_LINE_SEP ; #endif diff --git a/arch/parisc/include/uapi/asm/unistd.h b/arch/parisc/include/uapi/asm/unistd.h index bb52e12..7532817 100644 --- a/arch/parisc/include/uapi/asm/unistd.h +++ b/arch/parisc/include/uapi/asm/unistd.h @@ -366,7 +366,7 @@ #define __NR_statx (__NR_Linux + 349) #define __NR_io_pgetevents (__NR_Linux + 350) -#define __NR_Linux_syscalls(__NR_io_pgetevents + 1) +#define __NR_syscalls (__NR_Linux + 351) #define LINUX_GATEWAY_ADDR 0x100 -- 1.9.1
[PATCH v3 1/7] regulator/mfd: Support ROHM BD71847 power management IC
BD71847 is reduced version of BD71837. DVS bucks 3 and 4 are removed as is LDO7. Voltage ranges of some regulators are expanded. Add initial support for BD71847 with BD71837 driver. Signed-off-by: Matti Vaittinen --- drivers/mfd/rohm-bd718x7.c| 70 +-- drivers/regulator/bd71837-regulator.c | 1055 +++-- include/linux/mfd/rohm-bd718x7.h | 271 - 3 files changed, 879 insertions(+), 517 deletions(-) diff --git a/drivers/mfd/rohm-bd718x7.c b/drivers/mfd/rohm-bd718x7.c index 75c8ec659547..ce5aa5cd3545 100644 --- a/drivers/mfd/rohm-bd718x7.c +++ b/drivers/mfd/rohm-bd718x7.c @@ -7,21 +7,16 @@ // Datasheet available from // https://www.rohm.com/datasheet/BD71837MWV/bd71837mwv-e +#include #include #include #include #include #include #include +#include #include - -/* - * gpio_keys.h requires definiton of bool. It is brought in - * by above includes. Keep this as last until gpio_keys.h gets fixed. - */ -#include - -static const u8 supported_revisions[] = { 0xA2 /* BD71837 */ }; +#include static struct gpio_keys_button button = { .code = KEY_POWER, @@ -41,8 +36,8 @@ static struct mfd_cell bd71837_mfd_cells[] = { .platform_data = &bd718xx_powerkey_data, .pdata_size = sizeof(bd718xx_powerkey_data), }, - { .name = "bd71837-clk", }, - { .name = "bd71837-pmic", }, + { .name = "bd718xx-clk", }, + { .name = "bd718xx-pmic", }, }; static const struct regmap_irq bd71837_irqs[] = { @@ -61,16 +56,16 @@ static struct regmap_irq_chip bd71837_irq_chip = { .num_irqs = ARRAY_SIZE(bd71837_irqs), .num_regs = 1, .irq_reg_stride = 1, - .status_base = BD71837_REG_IRQ, - .mask_base = BD71837_REG_MIRQ, - .ack_base = BD71837_REG_IRQ, + .status_base = BD718XX_REG_IRQ, + .mask_base = BD718XX_REG_MIRQ, + .ack_base = BD718XX_REG_IRQ, .init_ack_masked = true, .mask_invert = false, }; static const struct regmap_range pmic_status_range = { - .range_min = BD71837_REG_IRQ, - .range_max = BD71837_REG_POW_STATE, + .range_min = BD718XX_REG_IRQ, + .range_max = BD718XX_REG_POW_STATE, }; static const struct regmap_access_table volatile_regs = { @@ -82,7 +77,7 @@ static const struct regmap_config bd71837_regmap_config = { .reg_bits = 8, .val_bits = 8, .volatile_table = &volatile_regs, - .max_register = BD71837_MAX_REGISTER - 1, + .max_register = BD718XX_MAX_REGISTER - 1, .cache_type = REGCACHE_RBTREE, }; @@ -90,8 +85,12 @@ static int bd71837_i2c_probe(struct i2c_client *i2c, const struct i2c_device_id *id) { struct bd71837 *bd71837; - int ret, i; - unsigned int val; + int ret; + + if (!i2c->irq) { + dev_err(&i2c->dev, "No IRQ configured\n"); + return -EINVAL; + } bd71837 = devm_kzalloc(&i2c->dev, sizeof(struct bd71837), GFP_KERNEL); @@ -99,12 +98,8 @@ static int bd71837_i2c_probe(struct i2c_client *i2c, return -ENOMEM; bd71837->chip_irq = i2c->irq; - - if (!bd71837->chip_irq) { - dev_err(&i2c->dev, "No IRQ configured\n"); - return -EINVAL; - } - + bd71837->chip_type = (unsigned int)(uintptr_t) + of_device_get_match_data(&i2c->dev); bd71837->dev = &i2c->dev; dev_set_drvdata(&i2c->dev, bd71837); @@ -114,20 +109,6 @@ static int bd71837_i2c_probe(struct i2c_client *i2c, return PTR_ERR(bd71837->regmap); } - ret = regmap_read(bd71837->regmap, BD71837_REG_REV, &val); - if (ret) { - dev_err(&i2c->dev, "Read BD71837_REG_DEVICE failed\n"); - return ret; - } - for (i = 0; i < ARRAY_SIZE(supported_revisions); i++) - if (supported_revisions[i] == val) - break; - - if (i == ARRAY_SIZE(supported_revisions)) { - dev_err(&i2c->dev, "Unsupported chip revision\n"); - return -ENODEV; - } - ret = devm_regmap_add_irq_chip(&i2c->dev, bd71837->regmap, bd71837->chip_irq, IRQF_ONESHOT, 0, &bd71837_irq_chip, &bd71837->irq_data); @@ -138,7 +119,7 @@ static int bd71837_i2c_probe(struct i2c_client *i2c, /* Configure short press to 10 milliseconds */ ret = regmap_update_bits(bd71837->regmap, -BD71837_REG_PWRONCONFIG0, +BD718XX_REG_PWRONCONFIG0, BD718XX_PWRBTN_PRESS_DURATION_MASK, BD718XX_PWRBTN_SHORT_PRESS_10MS); if (ret) { @@ -149,7 +130,7 @@ static int bd71837_i2c_probe(struct i2c_client *i2c, /* Configure long press to 10 seconds */ ret = regmap_
Re: [PATCH] PM / suspend: Count suspend-to-idle loop as sleep time
Hi! On 09/14/2018 09:59 AM, Rafael J. Wysocki wrote: > From: Rafael J. Wysocki > > There is a difference in behavior between suspend-to-idle and > suspend-to-RAM in the timekeeping handling that leads to functional > issues. Namely, every iteration of the loop in s2idle_loop() > increases the monotinic clock somewhat, even if timekeeping_suspend() > and timekeeping_resume() are invoked from s2idle_enter(), and if > many of them are carried out in a row, the monotonic clock can grow > significantly while the system is regarded as suspended, which > doesn't happen during suspend-to-RAM and so it is unexpected and > leads to confusion and misbehavior in user space (similar to what > ensued when we tried to combine the boottime and monotonic clocks). > > To avoid that, count all iterations of the loop in s2idle_loop() > as "sleep time" and adjust the clock for that on exit from > suspend-to-idle. > > [That also covers systems on which timekeeping is not suspended > by by s2idle_enter().] > > Signed-off-by: Rafael J. Wysocki > --- > > This is a replacement for https://patchwork.kernel.org/patch/10599209/ > > I decided to count the entire loop in s2idle_loop() as "sleep time" as the > patch is then simpler and it also covers systems where timekeeping is not > suspended in the final step of suspend-to-idle. > > I dropped the "Fixes:" tag, because the monotonic clock delta problem > has been present on the latter since the very introduction of "freeze" > (as suspend-to-idle was referred to previously) and so this doesn't fix > any particular later commits. > > --- > kernel/power/suspend.c | 18 ++ > 1 file changed, 18 insertions(+) > > Index: linux-pm/kernel/power/suspend.c > === > --- linux-pm.orig/kernel/power/suspend.c > +++ linux-pm/kernel/power/suspend.c > @@ -109,8 +109,12 @@ static void s2idle_enter(void) > > static void s2idle_loop(void) > { > + ktime_t start, delta; > + > pm_pr_dbg("suspend-to-idle\n"); > > + start = ktime_get(); > + > for (;;) { > int error; > > @@ -150,6 +154,20 @@ static void s2idle_loop(void) > pm_wakeup_clear(false); > } > > + /* > + * If the monotonic clock difference between the start of the loop and > + * this point is too large, user space may get confused about whether or > + * not the system has been suspended and tasks may get killed by > + * watchdogs etc., so count the loop as "sleep time" to compensate for > + * that. > + */ > + delta = ktime_sub(ktime_get(), start); > + if (ktime_to_ns(delta) > 0) { > + struct timespec64 timespec64_delta = ktime_to_timespec64(delta); > + > + timekeeping_inject_sleeptime64(×pec64_delta); > + } But doesn't injecting sleep time here make monotonic clock too large by the amount of sleeptime? tick_freeze() / tick_unfreeze() already injects the sleeptime (otherwise delta would be 0). > + > pm_pr_dbg("resume from suspend-to-idle\n"); > } > >
Re: [RFCv2 00/48] perf tools: Add threads to record command
On Fri, Sep 14, 2018 at 10:26:53AM +0200, Jiri Olsa wrote: SNIP > > > The threaded monitoring currently can't monitor backward maps > > > and there are probably more limitations which I haven't spotted > > > yet. > > > > > > So far I tested on laptop: > > > http://people.redhat.com/~jolsa/record_threads/test-4CPU.txt > > > > > > and a one bigger server: > > > http://people.redhat.com/~jolsa/record_threads/test-208CPU.txt > > > > > > I can see decrease in recorded LOST events, but both the benchmark > > > and the monitoring must be carefully configured wrt: > > > - number of events (frequency) > > > - size of the memory maps > > > - size of events (callchains) > > > - final perf.data size > > > > > > It's also available in: > > > git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git > > > perf/record_threads > > > > > > thoughts? ;-) thanks > > > jirka > > > > It is preferable to split into smaller pieces that bring > > some improvement proved by metrics numbers and ready for > > merging and upstream. Do we have more metrics than the > > data loss from trace AIO patches? > > well the primary focus is to get more events in, > so the LOST metric is the main one actualy I was hoping, could you please run it through the same tests as you do for AIO code on some huge server? thanks, jirka
Re: [PATCH] cpufreq: Convert to using %pOFn instead of device_node.name
On Tuesday, August 28, 2018 3:52:15 AM CEST Rob Herring wrote: > In preparation to remove the node name pointer from struct device_node, > convert printf users to use the %pOFn format specifier. > > Cc: "Rafael J. Wysocki" > Cc: Viresh Kumar > Cc: linux-arm-ker...@lists.infradead.org > Cc: linux-samsung-...@vger.kernel.org > Cc: linux...@vger.kernel.org > Signed-off-by: Rob Herring > --- > drivers/cpufreq/s5pv210-cpufreq.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/drivers/cpufreq/s5pv210-cpufreq.c > b/drivers/cpufreq/s5pv210-cpufreq.c > index 5d31c2db12a3..dbecd7667db2 100644 > --- a/drivers/cpufreq/s5pv210-cpufreq.c > +++ b/drivers/cpufreq/s5pv210-cpufreq.c > @@ -611,8 +611,8 @@ static int s5pv210_cpufreq_probe(struct platform_device > *pdev) > for_each_compatible_node(np, NULL, "samsung,s5pv210-dmc") { > id = of_alias_get_id(np, "dmc"); > if (id < 0 || id >= ARRAY_SIZE(dmc_base)) { > - pr_err("%s: failed to get alias of dmc node '%s'\n", > - __func__, np->name); > + pr_err("%s: failed to get alias of dmc node '%pOFn'\n", > + __func__, np); > of_node_put(np); > return id; > } > Applied, thanks!
[PATCH v3 2/7] regulator: dt bindings: add BD71847 device-tree binding documentation
Add ROHM BD71847 Power Management IC regulator binding information to device-tree binding documents. Signed-off-by: Matti Vaittinen Reviewed-by: Rob Herring --- .../devicetree/bindings/regulator/rohm,bd71837-regulator.txt | 12 +--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/Documentation/devicetree/bindings/regulator/rohm,bd71837-regulator.txt b/Documentation/devicetree/bindings/regulator/rohm,bd71837-regulator.txt index 76ead07072b1..4b98ca26e61a 100644 --- a/Documentation/devicetree/bindings/regulator/rohm,bd71837-regulator.txt +++ b/Documentation/devicetree/bindings/regulator/rohm,bd71837-regulator.txt @@ -1,7 +1,9 @@ -ROHM BD71837 Power Management Integrated Circuit (PMIC) regulator bindings +ROHM BD71837 and BD71847 Power Management Integrated Circuit regulator bindings Required properties: - - regulator-name: should be "buck1", ..., "buck8" and "ldo1", ..., "ldo7" + - regulator-name: should be "buck1", ..., "buck8" and "ldo1", ..., "ldo7" for + BD71837. For BD71847 names should be "buck1", ..., "buck6" + and "ldo1", ..., "ldo6" List of regulators provided by this controller. BD71837 regulators node should be sub node of the BD71837 MFD node. See BD71837 MFD bindings at @@ -16,10 +18,14 @@ disabled by driver at startup. LDO5 and LDO6 are supplied by those and if they are disabled at startup the voltage monitoring for LDO5/LDO6 will cause PMIC to reset. -The valid names for regulator nodes are: +The valid names for BD71837 regulator nodes are: BUCK1, BUCK2, BUCK3, BUCK4, BUCK5, BUCK6, BUCK7, BUCK8 LDO1, LDO2, LDO3, LDO4, LDO5, LDO6, LDO7 +The valid names for BD71847 regulator nodes are: +BUCK1, BUCK2, BUCK3, BUCK4, BUCK5, BUCK6 +LDO1, LDO2, LDO3, LDO4, LDO5, LDO6 + Optional properties: - Any optional property defined in bindings/regulator/regulator.txt -- 2.14.3
[PATCH v3 3/7] mfd: dt bindings: add BD71847 device-tree binding documentation
Add ROHM BD71847 Power Management IC MFD binding information to device-tree binding documents. Signed-off-by: Matti Vaittinen Reviewed-by: Rob Herring Acked-for-MFD-by: Lee Jones --- .../devicetree/bindings/mfd/rohm,bd71837-pmic.txt | 17 + 1 file changed, 9 insertions(+), 8 deletions(-) diff --git a/Documentation/devicetree/bindings/mfd/rohm,bd71837-pmic.txt b/Documentation/devicetree/bindings/mfd/rohm,bd71837-pmic.txt index 3ca56fdb5ffe..a4b056761eaa 100644 --- a/Documentation/devicetree/bindings/mfd/rohm,bd71837-pmic.txt +++ b/Documentation/devicetree/bindings/mfd/rohm,bd71837-pmic.txt @@ -1,16 +1,17 @@ -* ROHM BD71837 Power Management Integrated Circuit bindings +* ROHM BD71837 and BD71847 Power Management Integrated Circuit bindings -BD71837MWV is a programmable Power Management IC for powering single-core, -dual-core, and quad-core SoCs such as NXP-i.MX 8M. It is optimized for -low BOM cost and compact solution footprint. It integrates 8 Buck -egulators and 7 LDOs to provide all the power rails required by the SoC and -the commonly used peripherals. +BD71837MWV and BD71847MWV are programmable Power Management ICs for powering +single-core, dual-core, and quad-core SoCs such as NXP-i.MX 8M. They are +optimized for low BOM cost and compact solution footprint. BD71837MWV +integrates 8 Buck regulators and 7 LDOs. BD71847MWV contains 6 Buck regulators +and 6 LDOs. -Datasheet for PMIC is available at: +Datasheet for BD71837 is available at: https://www.rohm.com/datasheet/BD71837MWV/bd71837mwv-e Required properties: - - compatible : Should be "rohm,bd71837". + - compatible : Should be "rohm,bd71837" for bd71837 + "rohm,bd71847" for bd71847. - reg : I2C slave address. - interrupt-parent: Phandle to the parent interrupt controller. - interrupts : The interrupt line the device is connected to. -- 2.14.3
[PATCH v3 4/7] regulator: Support regulators where voltage ranges are selectable
For example ROHM BD71837 and ROHM BD71847 Power management ICs have regulators which provide multiple linear ranges. Ranges can be selected by individual non contagious bit in vsel register. Add regmap helper functions for selecting ranges. Signed-off-by: Matti Vaittinen --- drivers/regulator/core.c | 5 + drivers/regulator/helpers.c | 232 +++ include/linux/regulator/driver.h | 20 +++- 3 files changed, 256 insertions(+), 1 deletion(-) diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c index 5f2218c67edf..b836a129288a 100644 --- a/drivers/regulator/core.c +++ b/drivers/regulator/core.c @@ -2788,6 +2788,11 @@ static int regulator_map_voltage(struct regulator_dev *rdev, int min_uV, if (desc->ops->list_voltage == regulator_list_voltage_linear_range) return regulator_map_voltage_linear_range(rdev, min_uV, max_uV); + if (desc->ops->list_voltage == + regulator_list_voltage_pickable_linear_range) + return regulator_map_voltage_pickable_linear_range(rdev, + min_uV, max_uV); + return regulator_map_voltage_iterate(rdev, min_uV, max_uV); } diff --git a/drivers/regulator/helpers.c b/drivers/regulator/helpers.c index ef09021dc46e..5686a1335bd3 100644 --- a/drivers/regulator/helpers.c +++ b/drivers/regulator/helpers.c @@ -103,6 +103,128 @@ int regulator_disable_regmap(struct regulator_dev *rdev) } EXPORT_SYMBOL_GPL(regulator_disable_regmap); +static int regulator_range_selector_to_index(struct regulator_dev *rdev, +unsigned int rval) +{ + int i; + + if (!rdev->desc->linear_range_selectors) + return -EINVAL; + + rval &= rdev->desc->vsel_range_mask; + + for (i = 0; i < rdev->desc->n_linear_ranges; i++) { + if (rdev->desc->linear_range_selectors[i] == rval) + return i; + } + return -EINVAL; +} + +/** + * regulator_get_voltage_sel_pickable_regmap - pickable range get_voltage_sel + * + * @rdev: regulator to operate on + * + * Regulators that use regmap for their register I/O and use pickable + * ranges can set the vsel_reg, vsel_mask, vsel_range_reg and vsel_range_mask + * fields in their descriptor and then use this as their get_voltage_vsel + * operation, saving some code. + */ +int regulator_get_voltage_sel_pickable_regmap(struct regulator_dev *rdev) +{ + unsigned int r_val; + int range; + unsigned int val; + int ret, i; + unsigned int voltages_in_range = 0; + + if (!rdev->desc->linear_ranges) + return -EINVAL; + + ret = regmap_read(rdev->regmap, rdev->desc->vsel_reg, &val); + if (ret != 0) + return ret; + + ret = regmap_read(rdev->regmap, rdev->desc->vsel_range_reg, &r_val); + if (ret != 0) + return ret; + + val &= rdev->desc->vsel_mask; + val >>= ffs(rdev->desc->vsel_mask) - 1; + + range = regulator_range_selector_to_index(rdev, r_val); + if (range < 0) + return -EINVAL; + + for (i = 0; i < range; i++) + voltages_in_range += (rdev->desc->linear_ranges[i].max_sel - +rdev->desc->linear_ranges[i].min_sel) + 1; + + return val + voltages_in_range; +} +EXPORT_SYMBOL_GPL(regulator_get_voltage_sel_pickable_regmap); + +/** + * regulator_set_voltage_sel_pickable_regmap - pickable range set_voltage_sel + * + * @rdev: regulator to operate on + * @sel: Selector to set + * + * Regulators that use regmap for their register I/O and use pickable + * ranges can set the vsel_reg, vsel_mask, vsel_range_reg and vsel_range_mask + * fields in their descriptor and then use this as their set_voltage_vsel + * operation, saving some code. + */ +int regulator_set_voltage_sel_pickable_regmap(struct regulator_dev *rdev, + unsigned int sel) +{ + unsigned int range; + int ret, i; + unsigned int voltages_in_range = 0; + + for (i = 0; i < rdev->desc->n_linear_ranges; i++) { + voltages_in_range = (rdev->desc->linear_ranges[i].max_sel - +rdev->desc->linear_ranges[i].min_sel) + 1; + if (sel < voltages_in_range) + break; + sel -= voltages_in_range; + } + + if (i == rdev->desc->n_linear_ranges) + return -EINVAL; + + sel <<= ffs(rdev->desc->vsel_mask) - 1; + sel += rdev->desc->linear_ranges[i].min_sel; + + range = rdev->desc->linear_range_selectors[i]; + + if (rdev->desc->vsel_reg == rdev->desc->vsel_range_reg) { + ret = regmap_update_bits(rdev->regmap, +rdev->desc->vsel_reg, +rdev->desc->vsel_range_mask | +
[PATCH v3 5/7] regulator/mfd: bd718xx: rename bd71837/bd71847 common instances
Rename parts of code that support both BD71837 and BD71847 to BD718XX. Signed-off-by: Matti Vaittinen Acked-for-MFD-by: Lee Jones --- drivers/mfd/rohm-bd718x7.c| 98 +-- drivers/regulator/bd71837-regulator.c | 26 +- include/linux/mfd/rohm-bd718x7.h | 64 +++ 3 files changed, 94 insertions(+), 94 deletions(-) diff --git a/drivers/mfd/rohm-bd718x7.c b/drivers/mfd/rohm-bd718x7.c index ce5aa5cd3545..161c8aac6d86 100644 --- a/drivers/mfd/rohm-bd718x7.c +++ b/drivers/mfd/rohm-bd718x7.c @@ -2,9 +2,9 @@ // // Copyright (C) 2018 ROHM Semiconductors // -// ROHM BD71837MWV PMIC driver +// ROHM BD71837MWV and BD71847MWV PMIC driver // -// Datasheet available from +// Datasheet for BD71837MWV available from // https://www.rohm.com/datasheet/BD71837MWV/bd71837mwv-e #include @@ -30,7 +30,7 @@ static struct gpio_keys_platform_data bd718xx_powerkey_data = { .name = "bd718xx-pwrkey", }; -static struct mfd_cell bd71837_mfd_cells[] = { +static struct mfd_cell bd718xx_mfd_cells[] = { { .name = "gpio-keys", .platform_data = &bd718xx_powerkey_data, @@ -40,20 +40,20 @@ static struct mfd_cell bd71837_mfd_cells[] = { { .name = "bd718xx-pmic", }, }; -static const struct regmap_irq bd71837_irqs[] = { - REGMAP_IRQ_REG(BD71837_INT_SWRST, 0, BD71837_INT_SWRST_MASK), - REGMAP_IRQ_REG(BD71837_INT_PWRBTN_S, 0, BD71837_INT_PWRBTN_S_MASK), - REGMAP_IRQ_REG(BD71837_INT_PWRBTN_L, 0, BD71837_INT_PWRBTN_L_MASK), - REGMAP_IRQ_REG(BD71837_INT_PWRBTN, 0, BD71837_INT_PWRBTN_MASK), - REGMAP_IRQ_REG(BD71837_INT_WDOG, 0, BD71837_INT_WDOG_MASK), - REGMAP_IRQ_REG(BD71837_INT_ON_REQ, 0, BD71837_INT_ON_REQ_MASK), - REGMAP_IRQ_REG(BD71837_INT_STBY_REQ, 0, BD71837_INT_STBY_REQ_MASK), +static const struct regmap_irq bd718xx_irqs[] = { + REGMAP_IRQ_REG(BD718XX_INT_SWRST, 0, BD718XX_INT_SWRST_MASK), + REGMAP_IRQ_REG(BD718XX_INT_PWRBTN_S, 0, BD718XX_INT_PWRBTN_S_MASK), + REGMAP_IRQ_REG(BD718XX_INT_PWRBTN_L, 0, BD718XX_INT_PWRBTN_L_MASK), + REGMAP_IRQ_REG(BD718XX_INT_PWRBTN, 0, BD718XX_INT_PWRBTN_MASK), + REGMAP_IRQ_REG(BD718XX_INT_WDOG, 0, BD718XX_INT_WDOG_MASK), + REGMAP_IRQ_REG(BD718XX_INT_ON_REQ, 0, BD718XX_INT_ON_REQ_MASK), + REGMAP_IRQ_REG(BD718XX_INT_STBY_REQ, 0, BD718XX_INT_STBY_REQ_MASK), }; -static struct regmap_irq_chip bd71837_irq_chip = { - .name = "bd71837-irq", - .irqs = bd71837_irqs, - .num_irqs = ARRAY_SIZE(bd71837_irqs), +static struct regmap_irq_chip bd718xx_irq_chip = { + .name = "bd718xx-irq", + .irqs = bd718xx_irqs, + .num_irqs = ARRAY_SIZE(bd718xx_irqs), .num_regs = 1, .irq_reg_stride = 1, .status_base = BD718XX_REG_IRQ, @@ -73,7 +73,7 @@ static const struct regmap_access_table volatile_regs = { .n_yes_ranges = 1, }; -static const struct regmap_config bd71837_regmap_config = { +static const struct regmap_config bd718xx_regmap_config = { .reg_bits = 8, .val_bits = 8, .volatile_table = &volatile_regs, @@ -81,10 +81,10 @@ static const struct regmap_config bd71837_regmap_config = { .cache_type = REGCACHE_RBTREE, }; -static int bd71837_i2c_probe(struct i2c_client *i2c, +static int bd718xx_i2c_probe(struct i2c_client *i2c, const struct i2c_device_id *id) { - struct bd71837 *bd71837; + struct bd718xx *bd718xx; int ret; if (!i2c->irq) { @@ -92,33 +92,33 @@ static int bd71837_i2c_probe(struct i2c_client *i2c, return -EINVAL; } - bd71837 = devm_kzalloc(&i2c->dev, sizeof(struct bd71837), GFP_KERNEL); + bd718xx = devm_kzalloc(&i2c->dev, sizeof(struct bd718xx), GFP_KERNEL); - if (!bd71837) + if (!bd718xx) return -ENOMEM; - bd71837->chip_irq = i2c->irq; - bd71837->chip_type = (unsigned int)(uintptr_t) + bd718xx->chip_irq = i2c->irq; + bd718xx->chip_type = (unsigned int) of_device_get_match_data(&i2c->dev); - bd71837->dev = &i2c->dev; - dev_set_drvdata(&i2c->dev, bd71837); + bd718xx->dev = &i2c->dev; + dev_set_drvdata(&i2c->dev, bd718xx); - bd71837->regmap = devm_regmap_init_i2c(i2c, &bd71837_regmap_config); - if (IS_ERR(bd71837->regmap)) { + bd718xx->regmap = devm_regmap_init_i2c(i2c, &bd718xx_regmap_config); + if (IS_ERR(bd718xx->regmap)) { dev_err(&i2c->dev, "regmap initialization failed\n"); - return PTR_ERR(bd71837->regmap); + return PTR_ERR(bd718xx->regmap); } - ret = devm_regmap_add_irq_chip(&i2c->dev, bd71837->regmap, - bd71837->chip_irq, IRQF_ONESHOT, 0, - &bd71837_irq_chip, &bd71837->irq_data); + ret = devm_regmap_add_irq_chip(&i2c->dev
Re: [PATCH 2/2] clk: meson-g12a: Add AO Clock controller driver
On Tue, 2018-08-28 at 11:35 +0800, Jian Hu wrote: > Hi: Jerome > > On 2018/8/27 21:07, Jerome Brunet wrote: > > On Fri, 2018-08-24 at 21:34 +0800, Jian Hu wrote: > > > > > > > > > > I am confued about aoclk81's parent clocks. > > > > > > I can not get the example of axg audio clock driver, Could you provide > > > the link? Had it merged into clk-meson.git? > > > > Yes and mainline as well : drivers/clk/meson/axg-audio.c > > > > Basically this driver is creating bypass input clocks (audio_pclk, > > mst_in[0-9], > > etc...) . > > > > This allows to collect input clocks from DT (like any consumer should) will > > keeping constant in the controller clock tree. > > > > > From what I've seen of your controller drivers, the EE controller should > > > have > > > > one input, the AO should have 3. > > > > > > > > . > > > > I still can not get the example meaning in axg audio driver. here is one specific example: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/clk/meson/axg-audio.c?h=v4.19-rc1#n785 > > In 26 page of A113D_Datasheet V0.7 20170725-Baylibre.pdf,We can see the > aoclk81 has two parents. clk81 and ao_slow_clk. I can not get 3 parents. > > clk81|\ > ---| \aoclk81 > src0 |\ | |--- > -| \ ao_slow_clk | | > | |-| / > -| / |/ > src1|/ > > > src0 is from xtal, if can generate 32k clock, but it is never used. > src1 is from gpio clock, It is never used. If necessary, the ao_slow_clk > maybe described. > > So why aoclk81 has 3 parents? I never said that. I said your clock controller driver should have 3 input clocks. These clock are the one used by the controller but not produced by it: for the AO controller xtal, ao_xtal, and ee controller clk81.
Re: [PATCH -next] ASoC: qdsp6: q6asm-dai: remove duplicated include from q6asm-dai.c
On 14/09/18 02:36, YueHaibing wrote: Remove duplicated include. Signed-off-by: YueHaibing Acked-by: Srinivas Kandagatla --- sound/soc/qcom/qdsp6/q6asm-dai.c | 1 - 1 file changed, 1 deletion(-) diff --git a/sound/soc/qcom/qdsp6/q6asm-dai.c b/sound/soc/qcom/qdsp6/q6asm-dai.c index c75fab3..c3806d7 100644 --- a/sound/soc/qcom/qdsp6/q6asm-dai.c +++ b/sound/soc/qcom/qdsp6/q6asm-dai.c @@ -8,7 +8,6 @@ #include #include #include -#include #include #include #include
Re: [PATCH] cpufreq: remove unnecessary unlikely()
On Friday, September 7, 2018 6:09:55 PM CEST Igor Stoppa wrote: > WARN_ON() already contains an unlikely(), so it's not necessary to wrap it > into another. > > Signed-off-by: Igor Stoppa > Cc: Srivatsa S. Bhat > Cc: "Rafael J. Wysocki" > Cc: linux...@vger.kernel.org > Cc: linux-kernel@vger.kernel.org > --- > drivers/cpufreq/cpufreq.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c > index f53fb41efb7b..7aa3dcad2175 100644 > --- a/drivers/cpufreq/cpufreq.c > +++ b/drivers/cpufreq/cpufreq.c > @@ -403,7 +403,7 @@ EXPORT_SYMBOL_GPL(cpufreq_freq_transition_begin); > void cpufreq_freq_transition_end(struct cpufreq_policy *policy, > struct cpufreq_freqs *freqs, int transition_failed) > { > - if (unlikely(WARN_ON(!policy->transition_ongoing))) > + if (WARN_ON(!policy->transition_ongoing)) > return; > > cpufreq_notify_post_transition(policy, freqs, transition_failed); > Applied, thanks!
[PATCH net-next 5/5] net: phy: mscc: remove unneeded temporary variable
Here, the rc variable is either used only for the condition right after the assignment or right before being used as the return value of the function it's being used in. So let's remove this unneeded temporary variable whenever possible. Signed-off-by: Quentin Schulz --- drivers/net/phy/mscc.c | 17 + 1 file changed, 5 insertions(+), 12 deletions(-) diff --git a/drivers/net/phy/mscc.c b/drivers/net/phy/mscc.c index efa9352..24f4754 100644 --- a/drivers/net/phy/mscc.c +++ b/drivers/net/phy/mscc.c @@ -199,10 +199,7 @@ static const struct vsc8531_edge_rate_table edge_table[] = { static int vsc85xx_phy_page_set(struct phy_device *phydev, u16 page) { - int rc; - - rc = phy_write(phydev, MSCC_EXT_PAGE_ACCESS, page); - return rc; + return phy_write(phydev, MSCC_EXT_PAGE_ACCESS, page); } static int vsc85xx_get_sset_count(struct phy_device *phydev) @@ -504,7 +501,7 @@ static void vsc85xx_wol_get(struct phy_device *phydev, static int vsc85xx_edge_rate_magic_get(struct phy_device *phydev) { u32 vdd, sd; - int rc, i, j; + int i, j; struct device *dev = &phydev->mdio.dev; struct device_node *of_node = dev->of_node; u8 sd_array_size = ARRAY_SIZE(edge_table[0].slowdown); @@ -512,12 +509,10 @@ static int vsc85xx_edge_rate_magic_get(struct phy_device *phydev) if (!of_node) return -ENODEV; - rc = of_property_read_u32(of_node, "vsc8531,vddmac", &vdd); - if (rc != 0) + if (of_property_read_u32(of_node, "vsc8531,vddmac", &vdd)) vdd = MSCC_VDDMAC_3300; - rc = of_property_read_u32(of_node, "vsc8531,edge-slowdown", &sd); - if (rc != 0) + if (of_property_read_u32(of_node, "vsc8531,edge-slowdown", &sd)) sd = 0; for (i = 0; i < ARRAY_SIZE(edge_table); i++) @@ -762,9 +757,7 @@ static int vsc85xx_config_init(struct phy_device *phydev) return rc; } - rc = genphy_config_init(phydev); - - return rc; + return genphy_config_init(phydev); } static int vsc85xx_ack_interrupt(struct phy_device *phydev) -- git-series 0.9.1
[PATCH 2/3] powerpc: Add system call table generation support
The system call tables are in different format in all architecture and it will be difficult to manually add or modify the system calls in the respective files. To make it easy by keeping a script and which'll generate the header file and syscall table file so this change will unify them across all architectures. The system call table generation script is added in syscalls directory which contain the script to generate both uapi header file system call table generation file and syscall_32/64.tbl file which'll be the input for the scripts. syscall_32/64.tbl contains the list of available system calls along with system call number and corresponding entry point. Add a new system call in this architecture will be possible by adding new entry in the syscall_32/64.tbl file. Adding a new table entry consisting of: - System call number. - ABI. - System call name. - Entry point name. - Compat entry name, if required. syscallhdr.sh and syscalltbl.sh will generate uapi header- unistd_32/64.h and syscall_table_32/64/c32.h files respectively. File syscall_table_32/64/c32.h is included by syscall.S - the real system call table. Both .sh files will parse the content syscall.tbl to generate the header and table files. ARM, s390 and x86 architecuture does have the similar support. I leverage their implementation to come up with a generic solution. Signed-off-by: Firoz Khan --- arch/powerpc/kernel/syscalls/Makefile | 51 arch/powerpc/kernel/syscalls/syscall_32.tbl | 378 arch/powerpc/kernel/syscalls/syscall_64.tbl | 372 +++ arch/powerpc/kernel/syscalls/syscallhdr.sh | 37 +++ arch/powerpc/kernel/syscalls/syscalltbl.sh | 38 +++ 5 files changed, 876 insertions(+) create mode 100644 arch/powerpc/kernel/syscalls/Makefile create mode 100644 arch/powerpc/kernel/syscalls/syscall_32.tbl create mode 100644 arch/powerpc/kernel/syscalls/syscall_64.tbl create mode 100644 arch/powerpc/kernel/syscalls/syscallhdr.sh create mode 100644 arch/powerpc/kernel/syscalls/syscalltbl.sh diff --git a/arch/powerpc/kernel/syscalls/Makefile b/arch/powerpc/kernel/syscalls/Makefile new file mode 100644 index 000..0c87acb --- /dev/null +++ b/arch/powerpc/kernel/syscalls/Makefile @@ -0,0 +1,51 @@ +# SPDX-License-Identifier: GPL-2.0 +out := arch/$(SRCARCH)/include/generated/asm +uapi := arch/$(SRCARCH)/include/generated/uapi/asm + +_dummy := $(shell [ -d '$(uapi)' ] || mkdir -p '$(uapi)') \ + $(shell [ -d '$(out)' ] || mkdir -p '$(out)') + +syscall32 := $(srctree)/$(src)/syscall_32.tbl +syscall64 := $(srctree)/$(src)/syscall_64.tbl + +syshdr := $(srctree)/$(src)/syscallhdr.sh +systbl := $(srctree)/$(src)/syscalltbl.sh + +quiet_cmd_syshdr = SYSHDR $@ + cmd_syshdr = $(CONFIG_SHELL) '$(syshdr)' '$<' '$@' \ + '$(syshdr_abi_$(basetarget))' \ + '$(syshdr_pfx_$(basetarget))' \ + '$(syshdr_offset_$(basetarget))' + +quiet_cmd_systbl = SYSTBL $@ + cmd_systbl = $(CONFIG_SHELL) '$(systbl)' '$<' '$@' \ + '$(systbl_abi_$(basetarget))' + +$(uapi)/unistd_32.h: $(syscall32) $(syshdr) + $(call if_changed,syshdr) + +$(uapi)/unistd_64.h: $(syscall64) $(syshdr) + $(call if_changed,syshdr) + +systbl_abi_syscall_table_32 := 32 +$(out)/syscall_table_32.h: $(syscall32) $(systbl) + $(call if_changed,systbl) + +systbl_abi_syscall_table_64 := 64 +$(out)/syscall_table_64.h: $(syscall64) $(systbl) + $(call if_changed,systbl) + +systbl_abi_syscall_table_c32 := c32 +$(out)/syscall_table_c32.h: $(syscall32) $(systbl) + $(call if_changed,systbl) + +uapisyshdr-y += unistd_32.h unistd_64.h +syshdr-y += syscall_table_32.h syscall_table_64.h \ + syscall_table_c32.h + +targets+= $(uapisyshdr-y) $(syshdr-y) + +PHONY += all +all: $(addprefix $(uapi)/,$(uapisyshdr-y)) +all: $(addprefix $(out)/,$(syshdr-y)) + @: diff --git a/arch/powerpc/kernel/syscalls/syscall_32.tbl b/arch/powerpc/kernel/syscalls/syscall_32.tbl new file mode 100644 index 000..50c419c --- /dev/null +++ b/arch/powerpc/kernel/syscalls/syscall_32.tbl @@ -0,0 +1,378 @@ +# +# 32-bit system call numbers and entry vectors +# +# The format is: +# +# +# The abi is always "common" for this file. +# +0 common restart_syscall sys_restart_syscall +1 common exitsys_exit +2 common forkppc_fork +3 common readsys_read +4 common write sys_write +5 common opensys_open compat_sys_open +6 common close sys_close +
[PATCH v3 7/7] regulator: bd718xx: rename bd71837 to 718xx
rename bd71837-regulator.c to bd718x7-regulator.c to reflect the fact that also BD71847 is now supported by the driver. Signed-off-by: Matti Vaittinen --- drivers/regulator/Kconfig | 2 +- drivers/regulator/Makefile | 2 +- drivers/regulator/{bd71837-regulator.c => bd718x7-regulator.c} | 0 3 files changed, 2 insertions(+), 2 deletions(-) rename drivers/regulator/{bd71837-regulator.c => bd718x7-regulator.c} (100%) diff --git a/drivers/regulator/Kconfig b/drivers/regulator/Kconfig index 329cdd33ed62..6e96ef1bd74a 100644 --- a/drivers/regulator/Kconfig +++ b/drivers/regulator/Kconfig @@ -189,7 +189,7 @@ config REGULATOR_BD718XX and LDO regulators. This driver can also be built as a module. If so, the module - will be called bd71837-regulator. + will be called bd718x7-regulator. config REGULATOR_BD9571MWV tristate "ROHM BD9571MWV Regulators" diff --git a/drivers/regulator/Makefile b/drivers/regulator/Makefile index 801d9a34a203..eac4d794f3b8 100644 --- a/drivers/regulator/Makefile +++ b/drivers/regulator/Makefile @@ -27,7 +27,7 @@ obj-$(CONFIG_REGULATOR_AS3711) += as3711-regulator.o obj-$(CONFIG_REGULATOR_AS3722) += as3722-regulator.o obj-$(CONFIG_REGULATOR_AXP20X) += axp20x-regulator.o obj-$(CONFIG_REGULATOR_BCM590XX) += bcm590xx-regulator.o -obj-$(CONFIG_REGULATOR_BD718XX) += bd71837-regulator.o +obj-$(CONFIG_REGULATOR_BD718XX) += bd718x7-regulator.o obj-$(CONFIG_REGULATOR_BD9571MWV) += bd9571mwv-regulator.o obj-$(CONFIG_REGULATOR_DA903X) += da903x.o obj-$(CONFIG_REGULATOR_DA9052) += da9052-regulator.o diff --git a/drivers/regulator/bd71837-regulator.c b/drivers/regulator/bd718x7-regulator.c similarity index 100% rename from drivers/regulator/bd71837-regulator.c rename to drivers/regulator/bd718x7-regulator.c -- 2.14.3
[PATCH net-next 4/5] net: phy: mscc: shorten `x != 0` condition to `x`
`if (x != 0)` is basically a more verbose version of `if (x)` so let's use the latter so it's consistent throughout the whole driver. Signed-off-by: Quentin Schulz --- drivers/net/phy/mscc.c | 30 +++--- 1 file changed, 15 insertions(+), 15 deletions(-) diff --git a/drivers/net/phy/mscc.c b/drivers/net/phy/mscc.c index 734d9fb..efa9352 100644 --- a/drivers/net/phy/mscc.c +++ b/drivers/net/phy/mscc.c @@ -311,11 +311,11 @@ static int vsc85xx_mdix_set(struct phy_device *phydev, u8 mdix) DISABLE_HP_AUTO_MDIX_MASK); } rc = phy_write(phydev, MSCC_PHY_BYPASS_CONTROL, reg_val); - if (rc != 0) + if (rc) return rc; rc = vsc85xx_phy_page_set(phydev, MSCC_PHY_PAGE_EXTENDED); - if (rc != 0) + if (rc) return rc; reg_val = phy_read(phydev, MSCC_PHY_EXT_MODE_CNTL); @@ -325,11 +325,11 @@ static int vsc85xx_mdix_set(struct phy_device *phydev, u8 mdix) else if (mdix == ETH_TP_MDI_X) reg_val |= FORCE_MDI_CROSSOVER_MDIX; rc = phy_write(phydev, MSCC_PHY_EXT_MODE_CNTL, reg_val); - if (rc != 0) + if (rc) return rc; rc = vsc85xx_phy_page_set(phydev, MSCC_PHY_PAGE_STANDARD); - if (rc != 0) + if (rc) return rc; return genphy_restart_aneg(phydev); @@ -341,7 +341,7 @@ static int vsc85xx_downshift_get(struct phy_device *phydev, u8 *count) u16 reg_val; rc = vsc85xx_phy_page_set(phydev, MSCC_PHY_PAGE_EXTENDED); - if (rc != 0) + if (rc) goto out; reg_val = phy_read(phydev, MSCC_PHY_ACTIPHY_CNTL); @@ -373,14 +373,14 @@ static int vsc85xx_downshift_set(struct phy_device *phydev, u8 count) } rc = vsc85xx_phy_page_set(phydev, MSCC_PHY_PAGE_EXTENDED); - if (rc != 0) + if (rc) goto out; reg_val = phy_read(phydev, MSCC_PHY_ACTIPHY_CNTL); reg_val &= ~(DOWNSHIFT_CNTL_MASK); reg_val |= count; rc = phy_write(phydev, MSCC_PHY_ACTIPHY_CNTL, reg_val); - if (rc != 0) + if (rc) goto out; rc = vsc85xx_phy_page_set(phydev, MSCC_PHY_PAGE_STANDARD); @@ -401,7 +401,7 @@ static int vsc85xx_wol_set(struct phy_device *phydev, mutex_lock(&phydev->lock); rc = vsc85xx_phy_page_set(phydev, MSCC_PHY_PAGE_EXTENDED_2); - if (rc != 0) + if (rc) goto out_unlock; if (wol->wolopts & WAKE_MAGIC) { @@ -439,7 +439,7 @@ static int vsc85xx_wol_set(struct phy_device *phydev, phy_write(phydev, MSCC_PHY_WOL_MAC_CONTROL, reg_val); rc = vsc85xx_phy_page_set(phydev, MSCC_PHY_PAGE_STANDARD); - if (rc != 0) + if (rc) goto out_unlock; if (wol->wolopts & WAKE_MAGIC) { @@ -447,14 +447,14 @@ static int vsc85xx_wol_set(struct phy_device *phydev, reg_val = phy_read(phydev, MII_VSC85XX_INT_MASK); reg_val |= MII_VSC85XX_INT_MASK_WOL; rc = phy_write(phydev, MII_VSC85XX_INT_MASK, reg_val); - if (rc != 0) + if (rc) goto out_unlock; } else { /* Disable the WOL interrupt */ reg_val = phy_read(phydev, MII_VSC85XX_INT_MASK); reg_val &= (~MII_VSC85XX_INT_MASK_WOL); rc = phy_write(phydev, MII_VSC85XX_INT_MASK, reg_val); - if (rc != 0) + if (rc) goto out_unlock; } /* Clear WOL iterrupt status */ @@ -595,13 +595,13 @@ static int vsc85xx_edge_rate_cntl_set(struct phy_device *phydev, u8 edge_rate) mutex_lock(&phydev->lock); rc = vsc85xx_phy_page_set(phydev, MSCC_PHY_PAGE_EXTENDED_2); - if (rc != 0) + if (rc) goto out_unlock; reg_val = phy_read(phydev, MSCC_PHY_WOL_MAC_CONTROL); reg_val &= ~(EDGE_RATE_CNTL_MASK); reg_val |= (edge_rate << EDGE_RATE_CNTL_POS); rc = phy_write(phydev, MSCC_PHY_WOL_MAC_CONTROL, reg_val); - if (rc != 0) + if (rc) goto out_unlock; rc = vsc85xx_phy_page_set(phydev, MSCC_PHY_PAGE_STANDARD); @@ -636,7 +636,7 @@ static int vsc85xx_mac_if_set(struct phy_device *phydev, goto out_unlock; } rc = phy_write(phydev, MSCC_PHY_EXT_PHY_CNTL_1, reg_val); - if (rc != 0) + if (rc) goto out_unlock; rc = genphy_soft_reset(phydev); @@ -655,7 +655,7 @@ static int vsc85xx_default_config(struct phy_device *phydev) phydev->mdix_ctrl = ETH_TP_MDI_AUTO; mutex_lock(&phydev->lock); rc = vsc85xx_phy_page_set(phydev, MSCC_PHY_PAGE_EXTENDED_2); - if (rc != 0) + if (rc) goto out_unlock; reg_val = phy_read(phydev, MSCC_PHY_RGMII_CNTL); -- git-series 0.9.1
[PATCH v3 6/7] regulator: bd718XX use pickable ranges
Few regulators in BD71837 and BD71847 can output voltages from different voltage ranges. Register interface is arranged so that used range is selected by toggling bits which are not next to actual voltage selection bits. Then the voltage inside selected range is determined by voltage selection bits (as usual). Support BD71837 and BD71847 selectible range voltages using new pickable ranges helpers. Signed-off-by: Matti Vaittinen --- drivers/regulator/bd71837-regulator.c | 153 +++--- include/linux/mfd/rohm-bd718x7.h | 32 +++ 2 files changed, 143 insertions(+), 42 deletions(-) diff --git a/drivers/regulator/bd71837-regulator.c b/drivers/regulator/bd71837-regulator.c index 8e89334f94d1..d2522d4e1505 100644 --- a/drivers/regulator/bd71837-regulator.c +++ b/drivers/regulator/bd71837-regulator.c @@ -78,6 +78,34 @@ static int bd718xx_set_voltage_sel_restricted(struct regulator_dev *rdev, return regulator_set_voltage_sel_regmap(rdev, sel); } +static int bd718xx_set_voltage_sel_pickable_restricted( + struct regulator_dev *rdev, unsigned int sel) +{ + if (regulator_is_enabled_regmap(rdev)) + return -EBUSY; + + return regulator_set_voltage_sel_pickable_regmap(rdev, sel); +} + +static struct regulator_ops bd718xx_pickable_range_ldo_ops = { + .enable = regulator_enable_regmap, + .disable = regulator_disable_regmap, + .is_enabled = regulator_is_enabled_regmap, + .list_voltage = regulator_list_voltage_pickable_linear_range, + .set_voltage_sel = bd718xx_set_voltage_sel_pickable_restricted, + .get_voltage_sel = regulator_get_voltage_sel_pickable_regmap, +}; + +static struct regulator_ops bd718xx_pickable_range_buck_ops = { + .enable = regulator_enable_regmap, + .disable = regulator_disable_regmap, + .is_enabled = regulator_is_enabled_regmap, + .list_voltage = regulator_list_voltage_pickable_linear_range, + .set_voltage_sel = bd718xx_set_voltage_sel_pickable_restricted, + .get_voltage_sel = regulator_get_voltage_sel_pickable_regmap, + .set_voltage_time_sel = regulator_set_voltage_time_sel, +}; + static struct regulator_ops bd718xx_ldo_regulator_ops = { .enable = regulator_enable_regmap, .disable = regulator_disable_regmap, @@ -139,24 +167,61 @@ static const struct regulator_linear_range bd718xx_dvs_buck_volts[] = { /* * BD71837 BUCK5 + * 0.7V to 1.35V (range 0) + * and + * 0.675 to 1.325 (range 1) + */ +static const struct regulator_linear_range bd71837_buck5_volts[] = { + /* Ranges when VOLT_SEL bit is 0 */ + REGULATOR_LINEAR_RANGE(70, 0x00, 0x03, 10), + REGULATOR_LINEAR_RANGE(105, 0x04, 0x05, 5), + REGULATOR_LINEAR_RANGE(120, 0x06, 0x07, 15), + /* Ranges when VOLT_SEL bit is 1 */ + REGULATOR_LINEAR_RANGE(675000, 0x0, 0x3, 10), + REGULATOR_LINEAR_RANGE(1025000, 0x4, 0x5, 5), + REGULATOR_LINEAR_RANGE(1175000, 0x6, 0x7, 15), +}; + +/* + * Range selector for first 3 linear ranges is 0x0 + * and 0x1 for last 3 ranges. + */ +static const unsigned int bd71837_buck5_volt_range_sel[] = { + 0x0, 0x0, 0x0, 0x80, 0x80, 0x80 +}; + +/* * BD71847 BUCK3 - * 0.7V to 1.35V () */ -static const struct regulator_linear_range bd718xx_1st_nodvs_buck_volts[] = { +static const struct regulator_linear_range bd71847_buck3_volts[] = { + /* Ranges when VOLT_SEL bits are 00 */ REGULATOR_LINEAR_RANGE(70, 0x00, 0x03, 10), REGULATOR_LINEAR_RANGE(105, 0x04, 0x05, 5), REGULATOR_LINEAR_RANGE(120, 0x06, 0x07, 15), + /* Ranges when VOLT_SEL bits are 01 */ + REGULATOR_LINEAR_RANGE(55, 0x0, 0x7, 5), + /* Ranges when VOLT_SEL bits are 11 */ + REGULATOR_LINEAR_RANGE(675000, 0x0, 0x3, 10), + REGULATOR_LINEAR_RANGE(1025000, 0x4, 0x5, 5), + REGULATOR_LINEAR_RANGE(1175000, 0x6, 0x7, 15), +}; + +static const unsigned int bd71847_buck3_volt_range_sel[] = { + 0x0, 0x0, 0x0, 0x40, 0x80, 0x80, 0x80 }; -static const struct regulator_linear_range bd71847_buck4_voltage_ranges[] = { +static const struct regulator_linear_range bd71847_buck4_volts[] = { REGULATOR_LINEAR_RANGE(300, 0x00, 0x03, 10), + REGULATOR_LINEAR_RANGE(260, 0x00, 0x03, 10), }; +static const unsigned int bd71847_buck4_volt_range_sel[] = { 0x0, 0x40 }; + /* * BUCK6 * 3.0V to 3.3V (step 100mV) */ -static const struct regulator_linear_range bd71837_buck6_voltage_ranges[] = { +static const struct regulator_linear_range bd71837_buck6_volts[] = { REGULATOR_LINEAR_RANGE(300, 0x00, 0x03, 10), }; @@ -190,8 +255,11 @@ static const struct regulator_linear_range bd718xx_4th_nodvs_buck_volts[] = { */ static const struct regulator_linear_range bd718xx_ldo1_volts[] = { REGULATOR_LINEAR_RANGE(300, 0x00, 0x03, 10), + REGULATOR_LINEAR_RANG
Re: [RESEND PATCH v4 5/6] arm64/mm: Populate the swapper_pg_dir by fixmap.
Hi Jun, On 13/09/18 11:50, Jun Yao wrote: > On Fri, Sep 07, 2018 at 10:58:22AM +0100, James Morse wrote: >> On 22/08/18 10:54, Jun Yao wrote: >>> WRITE_ONCE(*pmdp, pmd); >>> dsb(ishst); >>> } >>> @@ -480,6 +511,19 @@ static inline phys_addr_t pmd_page_paddr(pmd_t pmd) >>> >>> static inline void set_pud(pud_t *pudp, pud_t pud) >>> { >>> +#ifdef __PAGETABLE_PUD_FOLDED >>> + if (in_swapper_pgdir(pudp)) { >>> + pud_t *fixmap_pudp; >>> + >>> + spin_lock(&swapper_pgdir_lock); >>> + fixmap_pudp = (pud_t *)pgd_set_fixmap(__pa(pudp)); >> >> I see this called 68 times during boot on a 64K/42bit-VA, 65 of which appear >> to >> be during paging_init(). What do you think to keeping paging_init()s use of >> the >> pgd fixmap for swapper_pg_dir, deliberately to skip the in_swapper_pgdir() >> test >> during paging_init()? > > I find that the __create_pgd_mapping() is used to set up the page table > during paging_init(). And there are six functions calling it with > different pgdps: > > update_mapping_prot() init_mm.pgd(swapper_pg_dir) > create_mapping_noalloc() init_mm.pgd(swapper_pg_dir) > __map_memblock() pgdp(swapper_pg_dir) > map_kernel_segment() pgdp(swapper_pg_dir) > create_pgd_mapping() !(init_mm.pgd) > map_entry_trampoline()tramp_pg_dir > > In order to skip the in_swapper_pgdir() test during paging_init(), We > need a way to determine if we are currently in paging_init(). We don't need to know paging_init() is the caller, we just want to fool in_swapper_pgdir() into not matching the address. Using pgd_set_fixmap() in paging_init(), as we do today would do this, as the value passed to map_kernel()/map_mem() would never match swapper_pg_dir(). (this is just to stop unnecessary tlbi for every write to swapper_pg_dir, as we don't need to do that during early boot when it isn't loaded) > The way I > can think of is to create a function similar to __create_pgd_mapping(). > And it is used to create the page table during paging_init(). It differs > from the __create_pgd_mapping() only in that it calls > p?d_populate_without_test(). However, in this way, I'm worried that I am > reinventing the wheel. I agree duplicating the code doesn't sound good. Something like this? (barely tested): %< diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c index 691a05bbf87b..64ba422482cc 100644 --- a/arch/arm64/mm/mmu.c +++ b/arch/arm64/mm/mmu.c @@ -644,8 +690,12 @@ void __init set_init_mm_pgd(pgd_t *pgd) */ void __init paging_init(void) { - map_kernel(swapper_pg_dir); - map_mem(swapper_pg_dir); + pgd_t *pgdp = pgd_set_fixmap(__pa_symbol(swapper_pg_dir)); + + map_kernel(pgdp); + map_mem(pgdp); + pgd_clear_fixmap(); + cpu_replace_ttbr1(lm_alias(swapper_pg_dir)); set_init_mm_pgd(swapper_pg_dir); } %< This reduced the pgd-fixmap setup/teardown calls during boot from 68 to 2... Thanks, James
Re: [Question] vendor-specific cpu enable-method
Hi. 2018-09-13 11:29 GMT+09:00 Jisheng Zhang : > On Thu, 13 Sep 2018 10:23:35 +0900 Masahiro Yamada wrote: > >> Hello. >> >> >> Sorry if I am asking a stupid question. >> >> >> For arm64, there are only 2 cpu methods, psci and spin-table. >> >> Why do we still allow vendor-specific methods upstreamed >> for arm 32bit ports? >> >> To me, it looks like SoC vendors continue inventing >> different (but similar) ways to do the same thing. >> >> It is a historical reason for old platforms. >> >> However, if I look at Documentation/devicetree/bindings/arm/cpus.txt >> enable-method properties are still increasing. >> >> >> psci is available in arch/arm/kernel/psci_smp.c, >> but not all SoCs support the security extension. >> Is there a simpler one like spin-table available for arm32? > > Per my understanding, spin-table is similar as the "pen" based > solution in arm32, both can't reliably support kexec, suspend etc... Right. spin-table is based on pen-based implementation, and just a back-up plan in case psci is not available for some reasons. >> >> If we force generic methods like psci or spin-table >> for new platforms, we can stop proliferated smp code. >> (Of course, we are just shifting the complexity >> from the kernel to firmware.) > > psci is good but not all SoCs support secure extensions. spin-table > can't support kexec, suspend. Except prefer psci for news SoCs > with secure extensions, no better solutions AFAIK. OK, psci is preferred if it is available. Otherwise, ... vendor specific code. Thanks. -- Best Regards Masahiro Yamada
Re: [PATCH] acpi: remove a meaningless null check before debugfs_remove
On Saturday, August 18, 2018 12:49:20 PM CEST zhong jiang wrote: > debugfs_remove has taken null pointer into account. So it is safe > to remove the check before debugfs_remove. > > Signed-off-by: zhong jiang > --- > drivers/acpi/custom_method.c | 3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) > > diff --git a/drivers/acpi/custom_method.c b/drivers/acpi/custom_method.c > index e967c11..4451877 100644 > --- a/drivers/acpi/custom_method.c > +++ b/drivers/acpi/custom_method.c > @@ -92,8 +92,7 @@ static int __init acpi_custom_method_init(void) > > static void __exit acpi_custom_method_exit(void) > { > - if (cm_dentry) > - debugfs_remove(cm_dentry); > + debugfs_remove(cm_dentry); > } > > module_init(acpi_custom_method_init); > Applied, thanks!
Re: [RESEND PATCH v4 5/6] arm64/mm: Populate the swapper_pg_dir by fixmap.
Hi Jun, On 10/09/18 12:41, Jun Yao wrote: > On Fri, Sep 07, 2018 at 10:58:22AM +0100, James Morse wrote: >> On 22/08/18 10:54, Jun Yao wrote: >>> WRITE_ONCE(*pmdp, pmd); >>> dsb(ishst); >>> } >>> @@ -480,6 +511,19 @@ static inline phys_addr_t pmd_page_paddr(pmd_t pmd) >>> >>> static inline void set_pud(pud_t *pudp, pud_t pud) >>> { >>> +#ifdef __PAGETABLE_PUD_FOLDED >>> + if (in_swapper_pgdir(pudp)) { >>> + pud_t *fixmap_pudp; >>> + >>> + spin_lock(&swapper_pgdir_lock); >>> + fixmap_pudp = (pud_t *)pgd_set_fixmap(__pa(pudp)); >> >> This is a bit subtle: are you using the pgd fixmap entry because the path >> from >> map_mem() uses the other three? >> >> Using the pgd fix slot for a pud looks a bit strange to me, but its arguably >> a >> side-effect of the folding. > > Yes, it's a side-effect of the folding. > > When the CONFIG_PGTABLE_LEVELS == 3, the pud is folded into the pgd. It > means that the pgd is never none and it is also a pud. That's why I use > the pgd fixmap entry. > > Maybe write this more clearly: > > static inline void set_pud(pud_t *pudp, pud_t pud) > { > #ifdef __PAGETABLE_PUD_FOLDED > pgd_t *pgdp = (pgd_t *)pudp; > > if (...) { > pgd_t *fixmap_pgdp; > pud_t *fixmap_pudp; > > spin_lock(...); > fixmap_pgdp = pgd_set_fixmap(__pa(pgdp)); > fixmap_pudp = pud_set_fixmap_offset(fixmap_pgdp, 0UL); Using two fixmap entries is excessive, this is behind __PAGETABLE_PUD_FOLDED, so we should know what is going on. (The folding confuses me every time I look at it) > Do you have any way to make it look more reasonable? I'm just reacting to a function with 'pud' in the name, that takes two pud's as arguments, using the pgd fixmap slot. I think its fine to leave it like this, as in_swapper_pg_dir() has told us this is the pgd we're dealing with, it just looks funny. >> I see this called 68 times during boot on a 64K/42bit-VA, 65 of which appear >> to >> be during paging_init(). What do you think to keeping paging_init()s use of >> the >> pgd fixmap for swapper_pg_dir, deliberately to skip the in_swapper_pgdir() >> test >> during paging_init()? > > I think the set_pud() should not be called on a 64K/42bit-VA. As only > the level 2 and level 3 page tables are in use. It means that the pmd is > folded into the pud and the pud is never none. So the set_pud() should > not be called. (yes, sorry, it was just the one I picked on!) > I think a variable can be introduced to indicate whether paging_init() > has been completed. And decide whether or not to skip the > in_swapper_pgdir() base on the value of it. > > I don't know if this is reasonable. What do you think? I think we can just trick in_swapper_pgdir(), this code only runs once, and its already in a very strange environment. Thanks, James
Re: [PATCH v2 11/11] arm: dts: ste-dbx5x0: Update coresight bindings for hardware port
On Wed, Sep 12, 2018 at 3:54 PM Suzuki K Poulose wrote: > Switch to the new coresight bindings > > Cc: Linus Walleij > Cc: Mathieu Poirier > Signed-off-by: Suzuki K Poulose Acked-by: Linus Walleij I guess you will merge this with the rest of the patches so I do not need to queue it? Yours, Linus Walleij
Re: [PATCH] mailbox: PCC: handle parse error
On Monday, August 27, 2018 10:55:41 PM CEST Al Stone wrote: > On 08/27/2018 01:19 PM, David Arcari wrote: > > acpi_pcc_probe calls acpi_table_parse_entries_array but fails to check > > for an error return. This in turn can result in calling kcalloc with > > a negative count as well as emitting the following misleading erorr > > message: > > > > [2.642015] Could not allocate space for PCC mbox channels > > > > Fixes: 8f8027c5f935 ("mailbox: PCC: erroneous error message when parsing > > ACPI PCCT") > > > > Signed-off-by: David Arcari > > Cc: Al Stone > > Cc: Jassi Brar > > --- > > drivers/mailbox/pcc.c | 7 +-- > > 1 file changed, 5 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/mailbox/pcc.c b/drivers/mailbox/pcc.c > > index 311e91b..256f18b 100644 > > --- a/drivers/mailbox/pcc.c > > +++ b/drivers/mailbox/pcc.c > > @@ -461,8 +461,11 @@ static int __init acpi_pcc_probe(void) > > count = acpi_table_parse_entries_array(ACPI_SIG_PCCT, > > sizeof(struct acpi_table_pcct), proc, > > ACPI_PCCT_TYPE_RESERVED, MAX_PCC_SUBSPACES); > > - if (count == 0 || count > MAX_PCC_SUBSPACES) { > > - pr_warn("Invalid PCCT: %d PCC subspaces\n", count); > > + if (count <= 0 || count > MAX_PCC_SUBSPACES) { > > + if (count < 0) > > + pr_warn("Error parsing PCC subspaces from PCCT\n"); > > + else > > + pr_warn("Invalid PCCT: %d PCC subspaces\n", count); > > return -EINVAL; > > } > > > > > > Thanks, David. Nice catch. > > Reviewed-by: Al Stone Patch applied, thanks!
Re: [PATCH v4 02/16] sched/core: uclamp: map TASK's clamp values into CPU's clamp groups
On 13-Sep 21:20, Peter Zijlstra wrote: > On Wed, Sep 12, 2018 at 06:42:09PM +0100, Patrick Bellasi wrote: > > On 12-Sep 18:24, Peter Zijlstra wrote: > > > On Tue, Aug 28, 2018 at 02:53:10PM +0100, Patrick Bellasi wrote: > > > > > { > > > > + int group_id[UCLAMP_CNT] = { UCLAMP_NOT_VALID }; > > > > + int lower_bound, upper_bound; > > > > + struct uclamp_se *uc_se; > > > > + int result = 0; > > > > > > I think the thing would become much more readable if you set > > > lower/upper_bound right here. > > > Actually it could also make sense to have them before the mutex ;) > > Indeed. > > + upper_bound = (attr->sched_flags & SCHED_FLAG_UTIL_CLAMP_MAX) > + ? attr->sched_util_max > + : p->uclamp[UCLAMP_MAX].value; > + > + if (upper_bound == UCLAMP_NOT_VALID) > + upper_bound = SCHED_CAPACITY_SCALE; > + if (attr->sched_util_min > upper_bound) { > + result = -EINVAL; > + goto done; > + } > + > + result = uclamp_group_find(UCLAMP_MIN, attr->sched_util_min); > + if (result == -ENOSPC) { > + pr_err(UCLAMP_ENOSPC_FMT, "MIN"); > + goto done; > + } > + group_id[UCLAMP_MIN] = result; > + } > + if (attr->sched_flags & SCHED_FLAG_UTIL_CLAMP_MAX) { > + lower_bound = (attr->sched_flags & SCHED_FLAG_UTIL_CLAMP_MIN) > + ? attr->sched_util_min > + : p->uclamp[UCLAMP_MIN].value; > + > + if (lower_bound == UCLAMP_NOT_VALID) > + lower_bound = 0; > + if (attr->sched_util_max < lower_bound || > + attr->sched_util_max > SCHED_CAPACITY_SCALE) { > + result = -EINVAL; > + goto done; > + } > > That would end up soething like: > > unsigned int lower_bound = p->uclamp[UCLAMP_MIN].value; > unsigned int upper_bound = p->uclamp[UCLAMP_MAX].value; > > if (sched_flags & SCHED_FLAG_UTIL_CLAMP_MIN) > lower_bound = attr->sched_util_min; > > if (sched_flags & SCHED_FLAG_UTIL_CLAMP_MAX) > upper_bound = attr->sched_util_max; > > if (lower_bound > upper_bound || > upper_bound > SCHED_CAPACITY_MAX) > return -EINVAL; > > mutex_lock(...); Yes... much cleaner, thanks. -- #include Patrick Bellasi
Re: [PATCH] PM / suspend: Count suspend-to-idle loop as sleep time
On Friday, September 14, 2018 10:28:44 AM CEST Mika Penttilä wrote: > Hi! > > > On 09/14/2018 09:59 AM, Rafael J. Wysocki wrote: > > From: Rafael J. Wysocki > > > > There is a difference in behavior between suspend-to-idle and > > suspend-to-RAM in the timekeeping handling that leads to functional > > issues. Namely, every iteration of the loop in s2idle_loop() > > increases the monotinic clock somewhat, even if timekeeping_suspend() > > and timekeeping_resume() are invoked from s2idle_enter(), and if > > many of them are carried out in a row, the monotonic clock can grow > > significantly while the system is regarded as suspended, which > > doesn't happen during suspend-to-RAM and so it is unexpected and > > leads to confusion and misbehavior in user space (similar to what > > ensued when we tried to combine the boottime and monotonic clocks). > > > > To avoid that, count all iterations of the loop in s2idle_loop() > > as "sleep time" and adjust the clock for that on exit from > > suspend-to-idle. > > > > [That also covers systems on which timekeeping is not suspended > > by by s2idle_enter().] > > > > Signed-off-by: Rafael J. Wysocki > > --- > > > > This is a replacement for https://patchwork.kernel.org/patch/10599209/ > > > > I decided to count the entire loop in s2idle_loop() as "sleep time" as the > > patch is then simpler and it also covers systems where timekeeping is not > > suspended in the final step of suspend-to-idle. > > > > I dropped the "Fixes:" tag, because the monotonic clock delta problem > > has been present on the latter since the very introduction of "freeze" > > (as suspend-to-idle was referred to previously) and so this doesn't fix > > any particular later commits. > > > > --- > > kernel/power/suspend.c | 18 ++ > > 1 file changed, 18 insertions(+) > > > > Index: linux-pm/kernel/power/suspend.c > > === > > --- linux-pm.orig/kernel/power/suspend.c > > +++ linux-pm/kernel/power/suspend.c > > @@ -109,8 +109,12 @@ static void s2idle_enter(void) > > > > static void s2idle_loop(void) > > { > > + ktime_t start, delta; > > + > > pm_pr_dbg("suspend-to-idle\n"); > > > > + start = ktime_get(); > > + > > for (;;) { > > int error; > > > > @@ -150,6 +154,20 @@ static void s2idle_loop(void) > > pm_wakeup_clear(false); > > } > > > > + /* > > +* If the monotonic clock difference between the start of the loop and > > +* this point is too large, user space may get confused about whether or > > +* not the system has been suspended and tasks may get killed by > > +* watchdogs etc., so count the loop as "sleep time" to compensate for > > +* that. > > +*/ > > + delta = ktime_sub(ktime_get(), start); > > + if (ktime_to_ns(delta) > 0) { > > + struct timespec64 timespec64_delta = ktime_to_timespec64(delta); > > + > > + timekeeping_inject_sleeptime64(×pec64_delta); > > + } > > But doesn't injecting sleep time here make monotonic clock too large by the > amount of sleeptime? > tick_freeze() / tick_unfreeze() already injects the sleeptime (otherwise > delta would be 0). No, it doesn't. The delta here is the extra time taken by the loop which hasn't been counted as sleep time yet. Thanks, Rafael
Re: [PATCH v4 02/16] sched/core: uclamp: map TASK's clamp values into CPU's clamp groups
On 13-Sep 21:14, Peter Zijlstra wrote: > On Wed, Sep 12, 2018 at 06:52:02PM +0100, Patrick Bellasi wrote: > > On 12-Sep 19:42, Peter Zijlstra wrote: > > > On Wed, Sep 12, 2018 at 06:35:15PM +0100, Patrick Bellasi wrote: > > > > On 12-Sep 18:12, Peter Zijlstra wrote: > > > > > > > > No idea; but if you want to go all fancy you can replace he whole > > > > > uclamp_map thing with something like: > > > > > > > > > > struct uclamp_map { > > > > > union { > > > > > struct { > > > > > unsigned long v : 10; > > > > > unsigned long c : BITS_PER_LONG - 10; > > > > > }; > > > > > atomic_long_t s; > > > > > }; > > > > > }; > > > > > > > > That sounds really cool and scary at the same time :) > > > > > > > > The v:10 requires that we never set SCHED_CAPACITY_SCALE>1024 > > > > or that we use it to track a percentage value (i.e. [0..100]). > > > > > > Or we pick 11 bits, it seems unlikely that capacity be larger than 2k. > > > > Just remembered a past experience where we had unaligned access traps > > on some machine because... don't you see any potentially issue on > > using bitfleds like you suggest above ? > > > > I'm thinking to: > > > >commit 317d359df95d ("sched/core: Force proper alignment of 'struct > > util_est'") > > There should not be (and I'm still confused by that particular commit > you reference). If we access everything through the uclamp_map::s, and > only use the bitfields to interpret the results, it all 'works'. Yes, the problem above was different... still I was wondering if there could be bitfields related alignment issue lurking somewhere. But, as you say, if we always R/W atomically via uclamp_map::s there should be none. > The tricky thing we did earlier was trying to use u64 accesses for 2 > u32 variables. And somehow ia64 didn't get the alignment right. Right, np... sorry for the noise. -- #include Patrick Bellasi
Re: [PATCH] perf/x86/intel/lbr: Optimize context switches for LBR
> > In principle the LBRs need to be flushed between threads. So does > > current code. > > IMHO, ideally, LBRs stack would be preserved and restored when > switching between execution stacks. That would allow implementing > per-thread statistical call graph view in Perf tools, fully based > on HW capabilities. It could be advantageous for some cases, in > comparison with traditional dwarf based call graph. This is already supported when you use LBR call stack mode (perf record --call-graph lbr) This change is only optimizing the case when call stack mode is not used. Of course in call stack mode the context switch overhead is even higher, because it not only writes, but also reads. -Andi
Re: [PATCH] gpiolib: Free the last requested descriptor
On Thu, Sep 13, 2018 at 3:37 PM Ricardo Ribalda Delgado wrote: > The current code only frees N-1 gpios if an error occurs during > gpiod_set_transitory, gpiod_direction_output or gpiod_direction_input. > Leading to gpios that cannot be used by userspace nor other drivers. > > Cc: Timur Tabi > Cc: sta...@vger.kernel.org > Fixes: ab3dbcf78f60f46d ("gpioib: do not free unrequested descriptors) > Signed-off-by: Ricardo Ribalda Delgado Patch applied for fixes with Jan's Reported-by tag. Yours, Linus Walleij
Re: [PATCH v2 11/11] arm: dts: ste-dbx5x0: Update coresight bindings for hardware port
Hi Linus, On 14/09/18 09:43, Linus Walleij wrote: On Wed, Sep 12, 2018 at 3:54 PM Suzuki K Poulose wrote: Switch to the new coresight bindings Cc: Linus Walleij Cc: Mathieu Poirier Signed-off-by: Suzuki K Poulose Acked-by: Linus Walleij Thanks. I guess you will merge this with the rest of the patches so I do not need to queue it? We haven't figured out that yet. I guess it would be ideal if we could merge this via arm-soc ? To avoid possible conflict with other changes to the dts in the series. Sudeep has queued the TC2 and Juno changes. So, if you could queue it, that is fine with us. Suzuki
Re: [LKP] [ipc] 61224adcd2: general_protection_fault:#[##]
On 09/13/2018 07:28 PM, David Howells wrote: kernel test robot wrote: To reproduce: git clone https://github.com/intel/lkp-tests.git cd lkp-tests bin/lkp qemu -k job-script # job-script is attached in this email Tried that. That seems to work for me. Or, at least, it comes to a prompt at which I can log in. I'm not sure whether anything else is meant to happen. It's a 0day problem, we will fix it as soon as possible. Best Regards, Rong Chen
Re: [PATCH] gpiolib: fix freeing of descriptors on error
On Thu, Sep 13, 2018 at 9:56 PM Ricardo Ribalda Delgado wrote: > Hi Jim > > I have sent an identical patch 3 hours before you [1] ;) > Seems that we are working on the same stuff. Haha w00t but I'm grateful that you fine people are taking the chardev ABI for a ride! I recorded Jim's contribution as Reported-by on your patch. Yours, Linus Walleij
Re: [PATCH v1 2/3] locking/pvqspinlock, hv: Enable PV qspinlock for Hyper-V
On 18-09-13 13:06:13, Thomas Gleixner wrote: > On Thu, 13 Sep 2018, Yi Sun wrote: > > +#include > > > > /* representing HT siblings of each logical CPU */ > > DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_sibling_map); > > @@ -1335,6 +1336,7 @@ void __init native_smp_prepare_boot_cpu(void) > > /* already set me in cpu_online_mask in boot_cpu_init() */ > > cpumask_set_cpu(me, cpu_callout_mask); > > cpu_set_state_online(me); > > + hv_init_spinlocks(); > > No. We have smp_ops.smp_prepare_boot_cpu for that. > Got it, will change it. Thanks! > Thanks, > > tglx
Re: [PATCH] printk: CON_PRINTBUFFER console registration is a bit racy
On Fri 2018-09-14 11:34:28, Sergey Senozhatsky wrote: > CON_PRINTBUFFER console registration requires us to do several > preparation steps: > - Rollback console_seq to replay logbuf messages which were already > seen on other consoles; > - Set exclusive_console flag so console_unlock() will ->write() logbuf > messages only to the exclusive_console driver. > > The way we do it, however, is a bit racy > > logbuf_lock_irqsave(flags); > console_seq = syslog_seq; > console_idx = syslog_idx; > logbuf_unlock_irqrestore(flags); > << preemption enabled > << irqs enabled > exclusive_console = newcon; > console_unlock(); > > We rollback console_seq under logbuf_lock with IRQs disabled, but > we set exclusive_console with local IRQs enabled and logbuf unlocked. > If the system oops-es or panic-s before we set exclusive_console - and > given that we have IRQs and preemption enabled there is such a > possibility - we will re-play all logbuf messages to every registered > console, which may be a bit annoying and time consuming. > > Move exclusive_console assignment under the same IRQs-disabled and > logbuf_lock protected section where we rollback console_seq. > > Signed-off-by: Sergey Senozhatsky > --- > kernel/printk/printk.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c > index 992bb6bb7ac2..c52a986a72b3 100644 > --- a/kernel/printk/printk.c > +++ b/kernel/printk/printk.c > @@ -2705,7 +2705,6 @@ void register_console(struct console *newcon) > logbuf_lock_irqsave(flags); > console_seq = syslog_seq; > console_idx = syslog_idx; > - logbuf_unlock_irqrestore(flags); > /* >* We're about to replay the log buffer. Only do this to the >* just-registered console to avoid excessive message spam to > @@ -2713,6 +2712,7 @@ void register_console(struct console *newcon) >*/ > exclusive_console = newcon; > exclusive_console_stop_seq = console_seq; > + logbuf_unlock_irqrestore(flags); This would make a false feeling that these variables need to be synchronized using logbug_lock. It might either confuse people or it could get removed during a code clean up. A better solution would be to explicitly disable preemption around the entire section and add a comment. Well, I am not sure if it is worth the code complexity. Best Regards, Petr
Re: [PATCH] pinctrl: at91: don't use the same irqchip with multiple gpiochips
On Thu, Sep 13, 2018 at 2:46 PM Ludovic Desroches wrote: > Sharing the same irqchip with multiple gpiochips is not a good > practice. For instance, when installing hooks, we change the state > of the irqchip. The initial state of the irqchip for the second > gpiochip to register is then disrupted. > > Signed-off-by: Ludovic Desroches Patch applied. Yours, Linus Walleij
Re: [PATCH] cpufreq: intel_pstate: Optimize IO boost in non HWP mode
On Tuesday, September 11, 2018 7:35:15 PM CEST Francisco Jerez wrote: > > "Rafael J. Wysocki" writes: > > > On Thursday, September 6, 2018 6:20:08 AM CEST Francisco Jerez wrote: > > > >> Srinivas Pandruvada writes: > >>=20 > >> > [...] > >> > > >> >> > >=3D20 > >> >> > > This patch causes a number of statistically significant > >> >> > > regressions > >> >> > > (with significance of 1%) on the two systems I've tested it > >> >> > > on. On > >> >> > > my > >> >> >=3D20 > >> >> > Sure. These patches are targeted to Atom clients where some of > >> >> > these > >> >> > server like workload may have some minor regression on few watts > >> >> > TDP > >> >> > parts. > >> >>=3D20 > >> >> Neither the 36% regression of fs-mark, the 21% regression of sqlite, > >> >> nor > >> >> the 10% regression of warsaw qualify as small. And most of the test > >> >> cases on the list of regressions aren't exclusively server-like, if > >> >> at > >> >> all. Warsaw, gtkperf, jxrendermark and lightsmark are all graphics > >> >> benchmarks -- Latency is as important if not more for interactive > >> >> workloads than it is for server workloads. In the case of a conflict > >> >> like the one we're dealing with right now between optimizing for > >> >> throughput (e.g. for the maximum number of requests per second) and > >> >> optimizing for latency (e.g. for the minimum request duration), you > >> >> are > >> >> more likely to be concerned about the former than about the latter in > >> >> a > >> >> server setup. > >> > > >> > Eero, > >> > Please add your test results here. > >> > > >> > No matter which algorithm you do, there will be variations. So you have > >> > to look at the platforms which you are targeting. For this platform=3D= > 20 > >> > number one item is use of less turbo and hope you know why? > >>=20 > >> Unfortunately the current controller uses turbo frequently on Atoms for > >> TDP-limited graphics workloads regardless of IOWAIT boosting. IOWAIT > >> boosting simply exacerbated the pre-existing energy efficiency problem. > > > > My current understanding of the issue at hand is that using IOWAIT boosti= > ng > > on Atoms is a regression relative to the previous behavior. > > Not universally. IOWAIT boosting helps under roughly the same > conditions on Atom as it does on big core, so applying this patch will > necessarily cause regressions too (see my reply from Sep. 3 for some > numbers), and won't completely restore the previous behavior since it > simply decreases the degree of IOWAIT boosting applied without being > able to avoid it (c.f. the series I'm working on that does something > similar to IOWAIT boosting when it's able to determine it's actually > CPU-bound, which prevents energy inefficient behavior for non-CPU-bound > workloads that don't benefit from a higher CPU clock frequency anyway). Well, OK. That doesn't seem to be a clear-cut regression situation, then, since getting back is not desirable, apparently. Or would it restore the previous behavior if we didn't do any IOWAIT boosting on Atoms at all? > > That is what Srinivas is trying to address here AFAICS. > > > > Now, you seem to be saying that the overall behavior is suboptimal and the > > IOWAIT boosting doesn't matter that much, > > I was just saying that IOWAIT boosting is less than half of the energy > efficiency problem, and this patch only partially addresses that half of > the problem. Well, fair enough, but there are two things to consider here, the general energy-efficiency problem and the difference made by IOWAIT boosting. If the general energy-efficiency problem had existed for a relatively long time, but it has got worse recently due to the IOWAIT boosting, it still may be desirable to get the IOWAIT boosting out of the picture first and then get to the general problem. I'm not sure if that is the case as my experience with Atoms is limited anyway, so please advise. Thanks, Rafael
Re: [PATCH 3/6] xenbus: dispatch per-domU watch event to per-domU xenwatch thread
On 14/09/18 09:34, Dongli Zhang wrote: > This is the 3rd patch of a (6-patch) patch set. > > This patch dispatches the watch event to per-domU xenwatch thread when the > event meets all of below conditions: > > 1. The watch is registered to use xenwatch multithreading feature and the >get_domid() method returns valid domid. > 2. There is xenwatch thread (mtwatch domain) available for the domid >obtained from get_domid() method. > 3. The xenwatch thread is forked successfully by kthread_run() during >initialization and therefore its state should be MTWATCH_DOMAIN_UP. > > Signed-off-by: Dongli Zhang > --- > drivers/xen/xenbus/xenbus_xs.c | 53 > ++ > 1 file changed, 49 insertions(+), 4 deletions(-) > > diff --git a/drivers/xen/xenbus/xenbus_xs.c b/drivers/xen/xenbus/xenbus_xs.c > index 741dc54..7335e19 100644 > --- a/drivers/xen/xenbus/xenbus_xs.c > +++ b/drivers/xen/xenbus/xenbus_xs.c > @@ -896,6 +896,32 @@ static struct xenbus_watch *find_watch(const char *token) > return NULL; > } > > +static int xs_watch_insert_event(struct xs_watch_event *event, domid_t domid) > +{ > + struct mtwatch_domain *domain; > + int ret = -1; > + > + rcu_read_lock(); > + > + domain = mtwatch_find_domain(domid); > + if (!domain) { > + rcu_read_unlock(); > + return ret; > + } > + > + spin_lock(&domain->events_lock); > + if (domain->state == MTWATCH_DOMAIN_UP) { > + list_add_tail(&event->list, &domain->events); > + wake_up(&domain->events_wq); > + ret = 0; > + } > + spin_unlock(&domain->events_lock); > + > + rcu_read_unlock(); > + > + return ret; > +} > + > int xs_watch_msg(struct xs_watch_event *event) > { > if (count_strings(event->body, event->len) != 2) { > @@ -908,10 +934,29 @@ int xs_watch_msg(struct xs_watch_event *event) > spin_lock(&watches_lock); > event->handle = find_watch(event->token); > if (event->handle != NULL) { > - spin_lock(&watch_events_lock); > - list_add_tail(&event->list, &watch_events); > - wake_up(&watch_events_waitq); > - spin_unlock(&watch_events_lock); > + domid_t domid = 0; I won't repeat it again (including following patches): use another special value like DOMID_SELF. Juergen
metadata operation reordering regards to crash
Hi, all, A probably bit of complex question: Does nowadays practical filesystems, eg., extX, btfs, preserve metadata operation order through a crash/power failure? What I know is modern filesystems ensure metadata consistency after crash/power failure. Journal filesystems like extX do that by write-ahead logging of metadata operations into transactions. Other filesystems do that in various ways as btfs do that by COW. What I'm not so far clear is whether these filesystems preserve metadata operation order after a crash. For example, op 1. rename(A, B) op 2. rename(C, D) As mentioned above, metadata consistency is ensured after a crash. Thus, B is either the original B(or not exists) or has been replaced by A. The same to D. Is it possible that, after a crash, D has been replaced by C but B is still the original file(or not exists)? Or, from the view of implementation, before the crash - in a journal filesystem, Is the atomic transaction `rename(C, D)` permitted to be written to disk journal before the transaction `rename(A, B)`? - in other filesystems, say btfs, Is it permit to reorder `rename(C,D)` and `rename(A,B)` atomic operation hiting disk? The question is meaningful as many applications do that: if (flag_file_says_need_generate_data) { open_write_sync_close(data_tmp); rename(data_tmp, data); open_write_sync_close(flag_file_tmp, no_need_to_generate_data); rename(flag_file_tmp, flag_file) } use_data_file() If flag is here but data is not after a crash, that is a problem. Thanks, Trol
Re: [PATCH v1 2/3] locking/pvqspinlock, hv: Enable PV qspinlock for Hyper-V
On 18-09-13 13:24:07, Thomas Gleixner wrote: > On Thu, 13 Sep 2018, Yi Sun wrote: > > +++ b/arch/x86/hyperv/hv_spinlock.c > > @@ -0,0 +1,99 @@ > > +// SPDX-License-Identifier: GPL-2.0 > > + > > +/* > > + * Hyper-V specific spinlock code. > > + * > > + * Copyright (C) 2018, Intel, Inc. > > + * > > + * Author : Yi Sun > > + * > > + * This program is free software; you can redistribute it and/or modify it > > + * under the terms of the GNU General Public License version 2 as published > > + * by the Free Software Foundation. > > + * > > + * This program is distributed in the hope that it will be useful, but > > + * WITHOUT ANY WARRANTY; without even the implied warranty of > > + * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or > > + * NON INFRINGEMENT. See the GNU General Public License for more > > + * details. > > Please remove the boilerplate. The SPDX identifier is enough. If in doubt > talk to Intel legal. > I will check this. Thanks! > > +static bool hv_pvspin = true; > > __initdata please. > > > +static u32 spin_wait_info = 0; > > No need for 0 initialization. > Will modify them. > > + > > +static void hv_notify_long_spin_wait(void) > > +{ > > + u64 input = spin_wait_info | 0x; > > What? The result is always 0x . > Oh, sorry for such error. > > + spin_wait_info++; > > + hv_do_fast_hypercall8(HVCALL_NOTIFY_LONG_SPIN_WAIT, input); > > +} > > + > > +static void hv_qlock_kick(int cpu) > > +{ > > + spin_wait_info--; > > Looking at the above this is completely pointless and I have no idea what > that variable is supposed to do. > Per Microsoft Hypervisor Top Level Functional Specification, the input parameter of HVCALL_NOTIFY_LONG_SPIN_WAIT is defined as below: SpinwaitInfo – Specifies the accumulated count the guest was spinning. So, it is increased when guest is spinning and reduced when spinlock is released. I may add comments to explain it. > > + apic->send_IPI(cpu, X86_PLATFORM_IPI_VECTOR); > > +} > > + > > +/* > > + * Halt the current CPU & release it back to the host > > + */ > > +static void hv_qlock_wait(u8 *byte, u8 val) > > +{ > > + unsigned long msr_val; > > + > > + if (READ_ONCE(*byte) != val) > > + return; > > + > > + hv_notify_long_spin_wait(); > > + > > + rdmsrl(HV_X64_MSR_GUEST_IDLE, msr_val); > > Magic rdmsrl(). That wants a comment what this is for. > Per above spec, reading HV_X64_MSR_GUEST_IDLE can make guest idle. I will add comment here. > > +/* > > + * Hyper-V does not support this so far. > > + */ > > +bool hv_vcpu_is_preempted(int vcpu) > > static ? > PV_CALLEE_SAVE_REGS_THUNK definition is below. #define PV_CALLEE_SAVE_REGS_THUNK(func) \ extern typeof(func) __raw_callee_save_##func; \ .. So, the hv_vcpu_is_preempted cannot be declared as 'static'. Otherwise, the make fails with below info. arch/x86/hyperv/hv_spinlock.o: In function `__raw_callee_save_hv_vcpu_is_preempted': hv_spinlock.c:(.text+0xd): undefined reference to `hv_vcpu_is_preempted' > > +{ > > + return false; > > +} > > +PV_CALLEE_SAVE_REGS_THUNK(hv_vcpu_is_preempted); > > + > > +void __init hv_init_spinlocks(void) > > +{ > > + if (!hv_pvspin || > > + !apic || > > + !(ms_hyperv.hints & HV_X64_CLUSTER_IPI_RECOMMENDED) || > > + !(ms_hyperv.features & HV_X64_MSR_GUEST_IDLE_AVAILABLE)) { > > + pr_warn("hv: PV spinlocks disabled\n"); > > Why does this need to be pr_warn? This is purely informational. Also please > use pr_fmt instead of the 'hv:' prefix. > Got it. Thanks! > > +static __init int hv_parse_nopvspin(char *arg) > > +{ > > + hv_pvspin = false; > > + return 0; > > +} > > +early_param("hv_nopvspin", hv_parse_nopvspin); > > That lacks Documentation of the command line parameter. Wants to be in the > same patch. > Will merge patch 3 into 2. Thanks! > Thanks, > > tglx
Re: [PATCH v4 03/16] sched/core: uclamp: add CPU's clamp groups accounting
On 13-Sep 21:12, Peter Zijlstra wrote: > On Tue, Aug 28, 2018 at 02:53:11PM +0100, Patrick Bellasi wrote: > > +static inline void uclamp_cpu_get_id(struct task_struct *p, > > +struct rq *rq, int clamp_id) > > +{ > > + struct uclamp_group *uc_grp; > > + struct uclamp_cpu *uc_cpu; > > + int clamp_value; > > + int group_id; > > + > > + /* Every task must reference a clamp group */ > > + group_id = p->uclamp[clamp_id].group_id; > > > +} > > + > > +static inline void uclamp_cpu_put_id(struct task_struct *p, > > +struct rq *rq, int clamp_id) > > +{ > > + struct uclamp_group *uc_grp; > > + struct uclamp_cpu *uc_cpu; > > + unsigned int clamp_value; > > + int group_id; > > + > > + /* New tasks don't have a previous clamp group */ > > + group_id = p->uclamp[clamp_id].group_id; > > + if (group_id == UCLAMP_NOT_VALID) > > + return; > > *confused*, so on enqueue they must have a group_id, but then on dequeue > they might no longer have? Why not ? Tasks always have a (task-specific) group_id, once set the first time. IOW, init_task::group_id is UCLAMP_NOT_VALID, as well as all the tasks forked under reset_on_fork, otherwise the get the group_id of the parent. Actually, I just noted that the reset_on_fork path is now setting p::group_id=0, which is semantically equivalent to UCLAMP_NOT_VALID... but will update that assignment for consistency in v5. The only way to set a !UCLAMP_NOT_VALID value for p::group_id is via the syscall, which will either fails or set a new valid group_id. Thus, if we have a valid p::group_id @enqueue time, we will have one @dequeue time too. Eventually it could be a different one, because while RUNNABLE we do a syscall... but this case is addressed by the following patch: [PATCH v4 04/16] sched/core: uclamp: update CPU's refcount on clamp changes https://lore.kernel.org/lkml/20180828135324.21976-5-patrick.bell...@arm.com/ Does that makes sense ? > > +} > > > @@ -1110,6 +1313,7 @@ static inline void enqueue_task(struct rq *rq, struct > > task_struct *p, int flags) > > if (!(flags & ENQUEUE_RESTORE)) > > sched_info_queued(rq, p); > > > > + uclamp_cpu_get(rq, p); > > p->sched_class->enqueue_task(rq, p, flags); > > } > > > > @@ -1121,6 +1325,7 @@ static inline void dequeue_task(struct rq *rq, struct > > task_struct *p, int flags) > > if (!(flags & DEQUEUE_SAVE)) > > sched_info_dequeued(rq, p); > > > > + uclamp_cpu_put(rq, p); > > p->sched_class->dequeue_task(rq, p, flags); > > } > > The ordering, is that right? We get while the task isn't enqueued yet, > which would suggest we put when the task is dequeued. That's the "usual trick" required for correct schedutil updates. The scheduler class code will likely poke schedutil and thus we wanna be sure to have updated CPU clamps by the time we have to compute the next OPP. Cheers, Patrick -- #include Patrick Bellasi
Re: [PATCH 11/11] UAPI: Check headers build for C++ [ver #2]
On Thu, Sep 6, 2018 at 11:21 AM David Howells wrote: > + > +typedef __s8 int8_t; > +typedef __s16 int16_t; > +typedef __s32 int32_t; > +typedef __s64 int64_t; > +typedef __u8 uint8_t; > +typedef __u16 uint16_t; > +typedef __u32 uint32_t; > +typedef __u64 uint64_t; > +typedef long int intptr_t; > +typedef unsigned long int uintptr_t; > +typedef unsigned short u_short; > +typedef unsigned int u_int; > +typedef unsigned long u_long; > +typedef char *caddr_t; > + > +typedef __kernel_clockid_t clockid_t; > +typedef __kernel_ino_t ino_t; > +typedef __kernel_pid_t pid_t; > +typedef __kernel_sa_family_t sa_family_t; > +typedef __kernel_size_tsize_t; > +typedef __kernel_uid_t uid_t; > + > +typedef unsigned long elf_greg_t; > +typedef elf_greg_t elf_gregset_t[1]; > +typedef unsigned long long elf_fpregset_t[1]; > +typedef unsigned long long elf_fpxregset_t[1]; > + > +#define INT_MIN ((int)0x8000) > +#define INT_MAX ((int)0x7fff) > + > +extern size_t strlen(const char *); > +extern void *memset(void *, int, size_t); > +extern void *memcpy(void *, const void *, size_t); > +extern __u16 ntohs(__u16); > +extern __u16 htons(__u16); > +extern __u32 ntohl(__u32); > +extern __u32 htonl(__u32); > + > +typedef uint32_t grant_ref_t; > +typedef uint16_t domid_t; > +typedef unsigned long xen_pfn_t; > + > +#define MSG_FIN 0x200 > + > +typedef int SVGA3dMSPattern; > +typedef int SVGA3dMSQualityLevel; > + > +struct sockaddr > +{ > + sa_family_t sa_family; > + charsa_data[14]; > +}; > +#define sockaddr_storage __kernel_sockaddr_storage I think we need to reduce that list as much as we can. In https://patchwork.ozlabs.org/patch/968814/, Joseph Myers pointed out header file (linux/elfcore.h) that simply cannot be included from user space at all, because its dependencies cannot be met without running into conflicting type definitions, and he would like to include that file in order to automatically check that it's compatible with the glibc version (I pointed out a couple of architectures on which it is in fact incompatible right now). In the list above, I see multiple classes of bugs that could be addressed: - references to identifiers that are only present in kernel internal headers: SVGA3dMSPattern, MSG_FIN, xen_pfn_t, ... I think these are all simple bugs, and we should either remove the references, or make sure the respective dependencies are included in the uapi headers as well, possibly renamed with a __kernel_ prefix to avoid clashing with user space headers. - references to user space types that should use the uapi internal types: sockaddr_storage, clockid_t, uid_t, ... I think these just need to get the __kernel_prefix consistently as we did a few years ago. Note that using the headers otherwise is broken anyway when the types in libc are different from the ones in the kernel. - standard types (uint32_t): either include the correct user space headers ifndef __KERNEL__ or use the kernel types - standard functions (memcpy(), ntohs(), ...): These should already be handled in the headers by including the user space. If we missed any, we should probably do the same thing there. Arnd
Re: [PATCH 1/3] drivers/base: move device_shutdown() to base/power
On Monday, August 20, 2018 11:18:35 AM CEST Pingfan Liu wrote: > Consider the shutdown as a system state transition, i.e. something like > suspend, hibernate, hence move it under the base/power. (This is a first > step to unify the duplicate code logic on devices_kset and dpm_list.) I don't really think that device_shutodwn() belongs in base/power/. Moving it to a separate file sounds like a good idea, but let that file reside in base/ proper. Then, I would separate the dpm_list definition, locking etc out of base/power/, move it to the new file containing device_shutdown() and make the code in base/power/ refer to it (you may want to move the list entry list head from dev_pm_info to struct device directly while at that too). Then, it should be straightforward enough to switch device_shutdown() over to using it. Thanks, Rafael
Re: [RFC] memory_hotplug: Free pages as pageblock_order
On Wed 12-09-18 20:12:30, Arun KS wrote: > On 2018-09-12 18:47, Michal Hocko wrote: > > On Wed 12-09-18 22:57:43, Balbir Singh wrote: > > > On Wed, Sep 12, 2018 at 12:38:53PM +0200, Michal Hocko wrote: > > > > On Wed 12-09-18 14:56:45, Arun KS wrote: > > > > > When free pages are done with pageblock_order, time spend on > > > > > coalescing pages by buddy allocator can be reduced. With > > > > > section size of 256MB, hot add latency of a single section > > > > > shows improvement from 50-60 ms to less than 1 ms, hence > > > > > improving the hot add latency by 60%. > > > > > > > > Where does the improvement come from? You are still doing the same > > > > amount of work except that the number of callbacks is lower. Is this the > > > > real source of 60% improvement? > > > > > > > > > > It looks like only the first page of the pageblock is initialized, is > > > some of the cost amortized in terms of doing one initialization for > > > the page with order (order) and then relying on split_page and helpers > > > to do the rest? Of course the number of callbacks reduce by a > > > significant > > > number as well. > > > > Ohh, I have missed that part. Now when re-reading I can see the reason > > for the perf improvement. It is most likely the higher order free which > > ends up being much cheaper. This part makes some sense. > > > > How much is this feasible is another question. Do not forget we have > > those external providers of the online callback and those would need to > > be updated as well. > Sure Michal, I ll look into this. > > > > > Btw. the normal memmap init code path does the same per-page free as > > well. If we really want to speed the hotplug path then I guess the init > > one would see a bigger improvement and those two should be in sync. > Thanks for pointers, Will look further. I haven't looked closer and I will be travelling next week so just hint. Have a look at the nobootmem and how it frees pages to the page allocator in __free_pages_boot_core. Seems exactly what you want and it also answers your question about reference counting. -- Michal Hocko SUSE Labs
Re: [PATCH] cpufreq: intel_pstate: warn if HWP Native mode and OOB mode are both set
On Sunday, July 29, 2018 6:14:56 AM CEST ocea...@zoho.com.cn wrote: > From: Ocean He > > Refer to Intel HWPM Compliance Guide (Document Number: 566474): Bit[6] or > bit[8] of MSR MISC_PWR_MGMT is set only when BIOS select either HWP Native > mode or HWP OOB mode. Bit[6] and bit[8] cannot be set both at the same > time. > > Add a warning message if intel_pstate detects both HWP Native mode and > HWP OOB mode are set at the same time. > > Signed-off-by: Ocean He > --- > drivers/cpufreq/intel_pstate.c | 6 +- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c > index 3c39712..678d3f2 100644 > --- a/drivers/cpufreq/intel_pstate.c > +++ b/drivers/cpufreq/intel_pstate.c > @@ -2455,8 +2455,12 @@ static bool __init > intel_pstate_platform_pwr_mgmt_exists(void) > id = x86_match_cpu(intel_pstate_cpu_oob_ids); > if (id) { > rdmsrl(MSR_MISC_PWR_MGMT, misc_pwr); > - if ( misc_pwr & (1 << 8)) > + if (misc_pwr & (1 << 8)) { > + if (misc_pwr & (1 << 6)) > + pr_warn(FW_WARN > + "HWP Native mode and OOB mode cannot be > set both at the same time.\n"); > return true; > + } > } > > idx = acpi_match_platform_list(plat_info); > Srinivas, what do you think about this one?
Re: [PATCH] perf/x86/intel/lbr: Optimize context switches for LBR
Hi Andi, On 14.09.2018 11:54, Andi Kleen wrote: >>> In principle the LBRs need to be flushed between threads. So does >>> current code. >> >> IMHO, ideally, LBRs stack would be preserved and restored when >> switching between execution stacks. That would allow implementing >> per-thread statistical call graph view in Perf tools, fully based >> on HW capabilities. It could be advantageous for some cases, in >> comparison with traditional dwarf based call graph. > > This is already supported when you use LBR call stack mode > (perf record --call-graph lbr) Which kernel versions does it make sense to try? Thanks, Alexey > > This change is only optimizing the case when call stack mode is not used. > > Of course in call stack mode the context switch overhead is even higher, > because it not only writes, but also reads. > > -Andi >
Re: [PATCH 3/4][v2] x86, hibernate: Extract the common code of 64/32 bit system
On Tuesday, September 11, 2018 7:20:46 PM CEST Chen Yu wrote: > From: Zhimin Gu > > Reduce the hibernation code duplication between x86-32 and x86-64 > by extracting the common code into hibernate.c. > > No functional change. > > Acked-by: Chen Yu > Acked-by: Pavel Machek > Cc: "Rafael J. Wysocki" > Cc: Thomas Gleixner > Signed-off-by: Zhimin Gu > --- > arch/x86/power/hibernate.c| 265 ++ > arch/x86/power/hibernate_32.c | 23 +-- > arch/x86/power/hibernate_64.c | 246 +-- > arch/x86/power/hibernate_asm_64.S | 2 +- > 4 files changed, 269 insertions(+), 267 deletions(-) > create mode 100644 arch/x86/power/hibernate.c > > diff --git a/arch/x86/power/hibernate.c b/arch/x86/power/hibernate.c > new file mode 100644 > index ..6aeac4d3c9df > --- /dev/null > +++ b/arch/x86/power/hibernate.c > @@ -0,0 +1,265 @@ > +// SPDX-License-Identifier: GPL-2.0 > +/* > + * Hibernation support for x86 > + * > + * Copyright (c) 2007 Rafael J. Wysocki > + * Copyright (c) 2002 Pavel Machek > + * Copyright (c) 2001 Patrick Mochel > + */ > + > +#include > +#include > +#include > +#include > +#include > +#include > + > +#include > + > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +/* Defined in hibernate_asm_64.S or hibernate_asm_32.S */ > +extern asmlinkage __visible int restore_image(void); > + > +/* > + * Address to jump to in the last phase of restore in order to get to the > image > + * kernel's text (this value is passed in the image header). > + */ > +unsigned long restore_jump_address __visible; > +unsigned long jump_address_phys; > + > +/* > + * Value of the cr3 register from before the hibernation (this value is > passed > + * in the image header). > + */ > +unsigned long restore_cr3 __visible; > +unsigned long temp_pgt __visible; > +unsigned long relocated_restore_code __visible; > + > +#ifdef CONFIG_X86_32 > + #define RESTORE_MAGIC 0x12345678UL > +#else > + #define RESTORE_MAGIC 0x23456789ABCDEF01UL > +#endif > + > +/** > + * pfn_is_nosave - check if given pfn is in the 'nosave' section > + * @pfn: the page frame number to be checked > + * > + * returns non-zero if the pfn is in the 'nosave' section, > + * otherwise returns zero > + */ > +int pfn_is_nosave(unsigned long pfn) > +{ > + unsigned long nosave_begin_pfn; > + unsigned long nosave_end_pfn; > + > + nosave_begin_pfn = __pa_symbol(&__nosave_begin) >> PAGE_SHIFT; > + nosave_end_pfn = PAGE_ALIGN(__pa_symbol(&__nosave_end)) >> PAGE_SHIFT; > + > + return pfn >= nosave_begin_pfn && pfn < nosave_end_pfn; > +} > + > +#ifdef CONFIG_X86_64 > +static int relocate_restore_code(void) > +{ > + pgd_t *pgd; > + p4d_t *p4d; > + pud_t *pud; > + pmd_t *pmd; > + pte_t *pte; > + > + relocated_restore_code = get_safe_page(GFP_ATOMIC); > + if (!relocated_restore_code) > + return -ENOMEM; > + > + memcpy((void *)relocated_restore_code, core_restore_code, PAGE_SIZE); > + > + /* Make the page containing the relocated code executable */ > + pgd = (pgd_t *)__va(read_cr3_pa()) + > + pgd_index(relocated_restore_code); > + p4d = p4d_offset(pgd, relocated_restore_code); > + if (p4d_large(*p4d)) { > + set_p4d(p4d, __p4d(p4d_val(*p4d) & ~_PAGE_NX)); > + goto out; > + } > + pud = pud_offset(p4d, relocated_restore_code); > + if (pud_large(*pud)) { > + set_pud(pud, __pud(pud_val(*pud) & ~_PAGE_NX)); > + goto out; > + } > + pmd = pmd_offset(pud, relocated_restore_code); > + if (pmd_large(*pmd)) { > + set_pmd(pmd, __pmd(pmd_val(*pmd) & ~_PAGE_NX)); > + goto out; > + } > + pte = pte_offset_kernel(pmd, relocated_restore_code); > + set_pte(pte, __pte(pte_val(*pte) & ~_PAGE_NX)); > +out: > + __flush_tlb_all(); > + return 0; > +} > + > +#define MD5_DIGEST_SIZE 16 > + > +struct restore_data_record { > + unsigned long jump_address; > + unsigned long jump_address_phys; > + unsigned long cr3; > + unsigned long magic; > + u8 e820_digest[MD5_DIGEST_SIZE]; > +}; > + > +#if IS_BUILTIN(CONFIG_CRYPTO_MD5) > +/** > + * get_e820_md5 - calculate md5 according to given e820 table > + * > + * @table: the e820 table to be calculated > + * @buf: the md5 result to be stored to > + */ > +static int get_e820_md5(struct e820_table *table, void *buf) > +{ > + struct crypto_shash *tfm; > + struct shash_desc *desc; > + int size; > + int ret = 0; > + > + tfm = crypto_alloc_shash("md5", 0, 0); > + if (IS_ERR(tfm)) > + return -ENOMEM; > + > + desc = kmalloc(sizeof(struct shash_desc) + crypto_shash_descsize(tfm), > +GFP_KERNEL); > + if (!desc) { > + ret = -ENOMEM; > + goto free_tfm; > + } > + > + desc->tfm = tfm
OK
Greetings, I humbly solicit for your partnership to transfer €15 million Euros into your personal or company’s account .Contact me for more detailed explanation. Kindly send me the followings Full Names Address Occupation Direct Mobile Telephone Lines Nationality Ahmed Zama +22675844869
Re: [RESEND PATCH V5 0/8] x86: infrastructure to enable FSGSBASE
* Chang S. Bae wrote: > Resending the patchset that was posted before [6]. > > Given feedbacks from [1], it was suggested to separate two parts > and to (re-)submit this patchset first. > > To facilitate FSGSBASE, Andy's FS/GS base read fix is first > ordered, then some helper functions and refactoring work > are included. Cleanup for the vDSO initialization is > for preparing per-CPU base finder that will be used in the > paranoid entry, when FSGSBASE enabled. > > Changes from V4 [5]: > * Change patch ordering; putting the fix first before introducing > the helper functions > * Cleanup further for vDSO CPU initialization codes > > Changes from V3 [4]: > * Unify CPU number initialization > > Changes from V2 [3]: > * Bisect the CPU number initialization patch > * Drop patches for introducing i386 CPU_NUMBER and switching > write_rdtscp_aux() to use wrmsr_safe() > > Changes from V1 [2]: > * Rename the x86-64 CPU_NUMBER segment from PER_CPU > * Add i386 CPU_NUMBER equivalent to x86-64 at GDT entry 23 > * Add additional helper function to store CPU number > * Switch write_rdtscp_aux() to use wrmsr_safe() > > [1] FSGSBASE patch set V2: > https://lore.kernel.org/patchwork/cover/912063/ > [2] infrastructure for FSGSBASE > V1: https://lore.kernel.org/patchwork/cover/913139/ > [3] V2: https://lore.kernel.org/patchwork/cover/913644/ > [4] V3: https://lore.kernel.org/patchwork/cover/949775/ > [5] V4: https://lore.kernel.org/patchwork/cover/951712/ > [6] V5: https://lore.kernel.org/patchwork/cover/956526/ > > Andy Lutomirski (1): > x86/arch_prctl/64: Make ptrace read FS/GS base accurately > > Chang S. Bae (7): > x86/fsgsbase/64: Introduce FS/GS base helper functions > x86/fsgsbase/64: Make ptrace use FS/GS base helpers > x86/fsgsbase/64: Use FS/GS base helpers in core dump > x86/fsgsbase/64: Factor out load FS/GS segments from __switch_to > x86/segments/64: Rename PER_CPU segment to CPU_NUMBER > x86/vdso: Introduce helper functions for CPU and node number > x86/vdso: Move out the CPU initialization > > arch/x86/entry/vdso/vgetcpu.c | 9 +- > arch/x86/entry/vdso/vma.c | 38 + > arch/x86/include/asm/elf.h | 6 +- > arch/x86/include/asm/fsgsbase.h | 47 +++ > arch/x86/include/asm/segment.h | 46 +- > arch/x86/include/asm/vgtod.h| 26 -- > arch/x86/kernel/cpu/common.c| 24 ++ > arch/x86/kernel/process_64.c| 183 > +++- > arch/x86/kernel/ptrace.c| 28 ++ > 9 files changed, 270 insertions(+), 137 deletions(-) > create mode 100644 arch/x86/include/asm/fsgsbase.h Looks mostly good, here's a couple of details I noticed before I almost applied the series: - Please use a single, unified name-space for all the new FS/GS helpers, such as: x86_fsbase_read_cpu() x86_fsbase_read_task() x86_fsbase_write_cpu() x86_fsbase_write_cpu_inactive() x86_fsbase_write_task() x86_gsbase_read_cpu() x86_gsbase_read_task() x86_gsbase_write_task() x86_fsgsbase_load() # was: load_fsgs() x86_fsgsbase_read_task()# was: task_seg_base() etc. - Please port patch #7 to latest -tip, it's conflicting with: e78e5a91456f: x86/vdso: Fix lsl operand order The resolution is to preserve the modified code. Also some style nits: - Please fix the inconsistent capitalization of 'CPU' in various places, comments, descriptions, etc. - Comments should start with a capital letter, like sentences do. Right now there's a mixed style of: + /* +* If performance here mattered, we could protect the LDT +* with RCU. This is a slow path, though, so we can just +* take the mutex. +*/ + /* +* set the selector to 0 as a notion, that the segment base is +* overwritten, which will be checked for skipping the segment load +* during context switch. +*/ - Please refer to functions in changelogs and comments with fn_name(), not just fn_name. For example: x86/fsgsbase/64: Factor out load FS/GS segments from __switch_to But there's more examples. - Please read your own patches once again and fix typos in comments and changelogs, such as: +* correctly wrt barrier() and to keep gcc from cleverly Thanks, Ingo
Re: [PATCH v8 04/26] PM / Domains: Add support for CPU devices to genpd
On Friday, August 24, 2018 8:47:21 AM CEST Ulf Hansson wrote: > On 6 August 2018 at 11:36, Rafael J. Wysocki wrote: > > On Fri, Aug 3, 2018 at 1:43 PM, Ulf Hansson wrote: > >> On 19 July 2018 at 12:25, Rafael J. Wysocki wrote: > >>> On Wednesday, June 20, 2018 7:22:04 PM CEST Ulf Hansson wrote: > To enable a device belonging to a CPU to be attached to a PM domain > managed > by genpd, let's do a few changes to genpd as to make it convenient to > manage the specifics around CPUs. > > First, as to be able to quickly find out what CPUs that are attached to a > genpd, which typically becomes useful from a genpd governor as following > changes is about to show, let's add a cpumask 'cpus' to the struct > generic_pm_domain. > > At the point when a device that belongs to a CPU, is attached/detached to > its corresponding PM domain via genpd_add_device(), let's update the > cpumask in genpd->cpus. Moreover, propagate the update of the cpumask to > the master domains, which makes the genpd->cpus to contain a cpumask that > hierarchically reflect all CPUs for a genpd, including CPUs attached to > subdomains. > > Second, to unconditionally manage CPUs and the cpumask in genpd->cpus, is > unnecessary for cases when only non-CPU devices are parts of a genpd. > Let's avoid this by adding a new configuration bit, > GENPD_FLAG_CPU_DOMAIN. > Clients must set the bit before they call pm_genpd_init(), as to instruct > genpd that it shall deal with CPUs and thus manage the cpumask in > genpd->cpus. > > Cc: Lina Iyer > Co-developed-by: Lina Iyer > Signed-off-by: Ulf Hansson > --- > drivers/base/power/domain.c | 69 - > include/linux/pm_domain.h | 3 ++ > 2 files changed, 71 insertions(+), 1 deletion(-) > > diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c > index 21d298e1820b..6149ce0bfa7b 100644 > --- a/drivers/base/power/domain.c > +++ b/drivers/base/power/domain.c > @@ -20,6 +20,7 @@ > #include > #include > #include > +#include > > #include "power.h" > > @@ -126,6 +127,7 @@ static const struct genpd_lock_ops genpd_spin_ops = { > #define genpd_is_irq_safe(genpd) (genpd->flags & > GENPD_FLAG_IRQ_SAFE) > #define genpd_is_always_on(genpd)(genpd->flags & > GENPD_FLAG_ALWAYS_ON) > #define genpd_is_active_wakeup(genpd)(genpd->flags & > GENPD_FLAG_ACTIVE_WAKEUP) > +#define genpd_is_cpu_domain(genpd) (genpd->flags & > GENPD_FLAG_CPU_DOMAIN) > > static inline bool irq_safe_dev_in_no_sleep_domain(struct device *dev, > const struct generic_pm_domain *genpd) > @@ -1377,6 +1379,62 @@ static void genpd_free_dev_data(struct device > *dev, > dev_pm_put_subsys_data(dev); > } > > +static void __genpd_update_cpumask(struct generic_pm_domain *genpd, > +int cpu, bool set, unsigned int depth) > +{ > + struct gpd_link *link; > + > + if (!genpd_is_cpu_domain(genpd)) > + return; > + > + list_for_each_entry(link, &genpd->slave_links, slave_node) { > + struct generic_pm_domain *master = link->master; > + > + genpd_lock_nested(master, depth + 1); > + __genpd_update_cpumask(master, cpu, set, depth + 1); > + genpd_unlock(master); > + } > + > + if (set) > + cpumask_set_cpu(cpu, genpd->cpus); > + else > + cpumask_clear_cpu(cpu, genpd->cpus); > +} > >>> > >>> As noted elsewhere, there is a concern about the possible weight of this > >>> cpumask and I think that it would be good to explicitly put a limit on it. > >> > >> I have been digesting your comments on the series, but wonder if this > >> is still a relevant concern? > > > > Well, there are systems with very large cpumasks and it is sort of > > good to have that in mind when designing any code using them. > > Right. > > So, if I avoid allocating the cpumask for those genpd structures that > doesn't need it (those not having GENPD_FLAG_CPU_DOMAIN set), would > that be sufficient to deal with your concern? Yes, it would, if I understand you correctly. Thanks, Rafael
Re: [PATCH v4 06/16] sched/cpufreq: uclamp: add utilization clamping for FAIR tasks
On Tue, Aug 28, 2018 at 02:53:14PM +0100, Patrick Bellasi wrote: > diff --git a/kernel/sched/cpufreq_schedutil.c > b/kernel/sched/cpufreq_schedutil.c > index 3fffad3bc8a8..949082555ee8 100644 > --- a/kernel/sched/cpufreq_schedutil.c > +++ b/kernel/sched/cpufreq_schedutil.c > @@ -222,8 +222,13 @@ static unsigned long sugov_get_util(struct sugov_cpu > *sg_cpu) >* CFS tasks and we use the same metric to track the effective >* utilization (PELT windows are synchronized) we can directly add them >* to obtain the CPU's actual utilization. > + * > + * CFS utilization can be boosted or capped, depending on utilization > + * clamp constraints configured for currently RUNNABLE tasks. >*/ > util = cpu_util_cfs(rq); > + if (util) > + util = uclamp_util(rq, util); Should that not be: util = clamp_util(rq, cpu_util_cfs(rq)); Because if !util might we not still want to enforce the min clamp? > util += cpu_util_rt(rq); > > /* > @@ -322,11 +328,24 @@ static void sugov_iowait_boost(struct sugov_cpu > *sg_cpu, u64 time, > return; > sg_cpu->iowait_boost_pending = true; > > + /* > + * Boost FAIR tasks only up to the CPU clamped utilization. > + * > + * Since DL tasks have a much more advanced bandwidth control, it's > + * safe to assume that IO boost does not apply to those tasks. > + * Instead, since RT tasks are not utiliation clamped, we don't want > + * to apply clamping on IO boost while there is blocked RT > + * utilization. > + */ > + max_boost = sg_cpu->iowait_boost_max; > + if (!cpu_util_rt(cpu_rq(sg_cpu->cpu))) > + max_boost = uclamp_util(cpu_rq(sg_cpu->cpu), max_boost); OK I suppose. > + > /* Double the boost at each request */ > if (sg_cpu->iowait_boost) { > sg_cpu->iowait_boost <<= 1; > - if (sg_cpu->iowait_boost > sg_cpu->iowait_boost_max) > - sg_cpu->iowait_boost = sg_cpu->iowait_boost_max; > + if (sg_cpu->iowait_boost > max_boost) > + sg_cpu->iowait_boost = max_boost; > return; > } > > +static inline unsigned int uclamp_value(struct rq *rq, int clamp_id) > +{ > + struct uclamp_cpu *uc_cpu = &rq->uclamp; > + > + if (uc_cpu->value[clamp_id] == UCLAMP_NOT_VALID) > + return uclamp_none(clamp_id); > + > + return uc_cpu->value[clamp_id]; > +} Would that not be more readable as: static inline unsigned int uclamp_value(struct rq *rq, int clamp_id) { unsigned int val = rq->uclamp.value[clamp_id]; if (unlikely(val == UCLAMP_NOT_VALID)) val = uclamp_none(clamp_id); return val; } And how come NOT_VALID is possible? I thought the idea was to always have all things a valid value.
Re: [RFCv2 00/48] perf tools: Add threads to record command
* Namhyung Kim wrote: > > > The perf.data stays as a single file. > > I'm not sure we really need to keep it as a single file. As it's a > kind of big changes, we might consider breaking compatibility and use > a directory structure. Agreed - and to make use of the highly scalable Linux VFS implementation we should attempt to use per CPU file resources as well. Any cross-CPU contention should stick out like a sore thumb. > > There is usage of Posix threading API but there is no > > its implementation in the patch series, to avoid dependency > > on externally coded designs in the core of the tool. > > Do you mean it needs to implement its own threading? I don't think > that's what Ingo wanted to. Yeah, I didn't mean that: every libc hoping to work on Linux implements a pthread API, plus the pthread APIs we are using are really just narrow wrappers on top of system calls that were written with libc pthread APIs in mind. So it's not a problem to rely on pthreads.h. (And if we have trouble with any particular pthread detail we can single out specific functionality and not use it or use our own implementation.) The AIO library is another matter: it's a family of interfaces with complex libc specific design choices that cannot be influenced. I.e. my suggestion was to keep using pthreads APIs like we do today, but not use the libc AIO library. Not because there's any problem with glibc AIO: but because basic event flow is a core competency of perf that we want to implement ourselves. Is this clearer? Thanks, Ingo
Re: [RFCv2 00/48] perf tools: Add threads to record command
On 14.09.2018 11:28, Jiri Olsa wrote: > On Fri, Sep 14, 2018 at 10:26:53AM +0200, Jiri Olsa wrote: > > SNIP > The threaded monitoring currently can't monitor backward maps and there are probably more limitations which I haven't spotted yet. So far I tested on laptop: http://people.redhat.com/~jolsa/record_threads/test-4CPU.txt and a one bigger server: http://people.redhat.com/~jolsa/record_threads/test-208CPU.txt I can see decrease in recorded LOST events, but both the benchmark and the monitoring must be carefully configured wrt: - number of events (frequency) - size of the memory maps - size of events (callchains) - final perf.data size It's also available in: git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git perf/record_threads thoughts? ;-) thanks jirka >>> >>> It is preferable to split into smaller pieces that bring >>> some improvement proved by metrics numbers and ready for >>> merging and upstream. Do we have more metrics than the >>> data loss from trace AIO patches? >> >> well the primary focus is to get more events in, >> so the LOST metric is the main one > > actualy I was hoping, could you please run it through the same > tests as you do for AIO code on some huge server? Yeah, I will, but it takes some time. > > thanks, > jirka >
Re: [RFCv2 00/48] perf tools: Add threads to record command
* Jiri Olsa wrote: > On Fri, Sep 14, 2018 at 11:29:10AM +0900, Namhyung Kim wrote: > > On Thu, Sep 13, 2018 at 07:10:35PM +0300, Alexey Budankov wrote: > > > Hi, > > > > Hello, > > > > > > > > On 13.09.2018 15:54, Jiri Olsa wrote: > > > > hi, > > > > sending *RFC* for threads support in perf record command. > > > > > > > > In big picture this patchset adds perf record --threads > > > > option that allows to create threads in following modes: > > > > > > > > 1) single thread mode (current) > > > > > > > > $ perf record ... > > > > $ perf record --threads=1 ... > > > > > > > > - all maps are read/stored under process thread > > > > > > > > 2) mode with specific (X) number of threads > > > > > > > > $ perf record --threads=X ... > > > > > > > > - maps are spread equaly among threads > > > > > > > > 3) mode that creates thread for every monitored memory map > > > > > > > > $ perf record --threads ... > > > > > > > > - which in perf record is equal to number of CPUs, and > > > > it pins each thread to its map's cpu: > > > > > > > > 4) TODO - NUMA aware threads/maps separation > > > >... > > > > > > > > The perf.data stays as a single file. > > > > I'm not sure we really need to keep it as a single file. As it's a > > kind of big changes, we might consider breaking compatibility and use > > a directory structure. > > moving the files into the perf.data at the end is actualy > not a lot code.. and I think it's one of the 'small' things > that make this feature more user friendly So the user shouldn't really care about the structure of the file when most uses of perf tooling, and 'single file' versus 'single directory' has similar usability IMHO. When moving across machines it's recommended to use 'perf archive' anyway, which already creates a tarball that includes debuginfo and other context. In fact keeping the files separate has scalability advantages for 'perf report' and similar parsing tools: they could read all the streams in a per-CPU fashion already, from the very beginning. BTW., random annoyance bugreport, for me 'perf archive' is spewing a ton of these messages: $ perf archive unwind: target platform=x86 is not supported unwind: target platform=x86 is not supported unwind: target platform=x86 is not supported ... unwind: target platform=x86 is not supported unwind: target platform=x86 is not supported Now please run: $ tar xvf perf.data.tar.bz2 -C ~/.debug wherever you need to run 'perf report' on. $ perf version perf version 4.19.rc2.gcb48b6 That message is repeated 7,200 times (!) and immediately nuked my terminal history. :-/ Even if we want to emit that warning (we really shouldn't unless it's important for the user to know), there's no reason to print thousands of messages to stderr. Thanks, Ingo
[PATCH] add label interface for cache device in sysfs
label interface will be called by bcache tools in user space. Signed-off-by: Dongbo Cao --- drivers/md/bcache/sysfs.c | 19 +++ 1 file changed, 19 insertions(+) diff --git a/drivers/md/bcache/sysfs.c b/drivers/md/bcache/sysfs.c index 225b15aa..e64c718f 100644 --- a/drivers/md/bcache/sysfs.c +++ b/drivers/md/bcache/sysfs.c @@ -950,6 +950,13 @@ SHOW(__bch_cache) return ret; } + if (attr == &sysfs_label) { + memcpy(buf, ca->sb.label, SB_LABEL_SIZE); + buf[SB_LABEL_SIZE + 1] = '\0'; + strcat(buf, "\n"); + return strlen(buf); + } + return 0; } SHOW_LOCKED(bch_cache) @@ -993,6 +1000,17 @@ STORE(__bch_cache) atomic_set(&ca->io_errors, 0); } + if (attr == &sysfs_label) { + if (size > SB_LABEL_SIZE) + return -EINVAL; + memcpy(ca->sb.label, buf, size); + if (size < SB_LABEL_SIZE) + ca->sb.label[size] = '\0'; + if (size && ca->sb.label[size - 1] == '\n') + ca->sb.label[size - 1] = '\0'; + bcache_write_super(ca->set); + } + return size; } STORE_LOCKED(bch_cache) @@ -1009,6 +1027,7 @@ static struct attribute *bch_cache_files[] = { &sysfs_io_errors, &sysfs_clear_stats, &sysfs_cache_replacement_policy, + &sysfs_label, NULL }; KTYPE(bch_cache); -- 2.17.1
Re: [PATCH v8 07/26] PM / Domains: Add genpd governor for CPUs
On Thursday, August 9, 2018 5:39:25 PM CEST Lorenzo Pieralisi wrote: > On Mon, Aug 06, 2018 at 11:20:59AM +0200, Rafael J. Wysocki wrote: > > [...] > > > >>> > @@ -245,6 +248,56 @@ static bool always_on_power_down_ok(struct > > >>> > dev_pm_domain *domain) > > >>> > return false; > > >>> > } > > >>> > > > >>> > +static bool cpu_power_down_ok(struct dev_pm_domain *pd) > > >>> > +{ > > >>> > + struct generic_pm_domain *genpd = pd_to_genpd(pd); > > >>> > + ktime_t domain_wakeup, cpu_wakeup; > > >>> > + s64 idle_duration_ns; > > >>> > + int cpu, i; > > >>> > + > > >>> > + if (!(genpd->flags & GENPD_FLAG_CPU_DOMAIN)) > > >>> > + return true; > > >>> > + > > >>> > + /* > > >>> > +* Find the next wakeup for any of the online CPUs within the PM > > >>> > domain > > >>> > +* and its subdomains. Note, we only need the genpd->cpus, as it > > >>> > already > > >>> > +* contains a mask of all CPUs from subdomains. > > >>> > +*/ > > >>> > + domain_wakeup = ktime_set(KTIME_SEC_MAX, 0); > > >>> > + for_each_cpu_and(cpu, genpd->cpus, cpu_online_mask) { > > >>> > + cpu_wakeup = tick_nohz_get_next_wakeup(cpu); > > >>> > + if (ktime_before(cpu_wakeup, domain_wakeup)) > > >>> > + domain_wakeup = cpu_wakeup; > > >>> > + } > > >> > > >> Here's a concern I have missed before. :-/ > > >> > > >> Say, one of the CPUs you're walking here is woken up in the meantime. > > > > > > Yes, that can happen - when we miss-predicted "next wakeup". > > > > > >> > > >> I don't think it is valid to evaluate tick_nohz_get_next_wakeup() for it > > >> then > > >> to update domain_wakeup. We really should just avoid the domain power > > >> off in > > >> that case at all IMO. > > > > > > Correct. > > > > > > However, we also want to avoid locking contentions in the idle path, > > > which is what this boils done to. > > > > This already is done under genpd_lock() AFAICS, so I'm not quite sure > > what exactly you mean. > > > > Besides, this is not just about increased latency, which is a concern > > by itself but maybe not so much in all environments, but also about > > possibility of missing a CPU wakeup, which is a major issue. > > > > If one of the CPUs sharing the domain with the current one is woken up > > during cpu_power_down_ok() and the wakeup is an edge-triggered > > interrupt and the domain is turned off regardless, the wakeup may be > > missed entirely if I'm not mistaken. > > > > It looks like there needs to be a way for the hardware to prevent a > > domain poweroff when there's a pending interrupt or I don't quite see > > how this can be handled correctly. > > > > >> Sure enough, if the domain power off is already started and one of the > > >> CPUs > > >> in the domain is woken up then, too bad, it will suffer the latency (but > > >> in > > >> that case the hardware should be able to help somewhat), but otherwise > > >> CPU > > >> wakeup should prevent domain power off from being carried out. > > > > > > The CPU is not prevented from waking up, as we rely on the FW to deal > > > with that. > > > > > > Even if the above computation turns out to wrongly suggest that the > > > cluster can be powered off, the FW shall together with the genpd > > > backend driver prevent it. > > > > Fine, but then the solution depends on specific FW/HW behavior, so I'm > > not sure how generic it really is. At least, that expectation should > > be clearly documented somewhere, preferably in code comments. > > > > > To cover this case for PSCI, we also use a per cpu variable for the > > > CPU's power off state, as can be seen later in the series. > > > > Oh great, but the generic part should be independent on the underlying > > implementation of the driver. If it isn't, then it also is not > > generic. > > > > > Hope this clarifies your concern, else tell and will to elaborate a bit > > > more. > > > > Not really. > > > > There also is one more problem and that is the interaction between > > this code and the idle governor. > > > > Namely, the idle governor may select a shallower state for some > > reason, for example due to an additional latency limit derived from > > CPU utilization (like in the menu governor), and how does the code in > > cpu_power_down_ok() know what state has been selected and how does it > > honor the selection made by the idle governor? > > That's a good question and it maybe gives a path towards a solution. > > AFAICS the genPD governor only selects the idle state parameter that > determines the idle state at, say, GenPD cpumask level it does not touch > the CPUidle decision, that works on a subset of idle states (at cpu > level). I've deferred responding to this as I wasn't quite sure if I followed you at that time, but I'm afraid I'm still not following you now. :-) The idle governor has to take the total worst-case wakeup latency into account. Not just from the logical CPU itself, but also from whatever state the SoC
Re: [PATCH] PM / suspend: Count suspend-to-idle loop as sleep time
On 09/14/2018 11:46 AM, Rafael J. Wysocki wrote: > On Friday, September 14, 2018 10:28:44 AM CEST Mika Penttilä wrote: >> Hi! >> >> >> On 09/14/2018 09:59 AM, Rafael J. Wysocki wrote: >>> From: Rafael J. Wysocki >>> >>> There is a difference in behavior between suspend-to-idle and >>> suspend-to-RAM in the timekeeping handling that leads to functional >>> issues. Namely, every iteration of the loop in s2idle_loop() >>> increases the monotinic clock somewhat, even if timekeeping_suspend() >>> and timekeeping_resume() are invoked from s2idle_enter(), and if >>> many of them are carried out in a row, the monotonic clock can grow >>> significantly while the system is regarded as suspended, which >>> doesn't happen during suspend-to-RAM and so it is unexpected and >>> leads to confusion and misbehavior in user space (similar to what >>> ensued when we tried to combine the boottime and monotonic clocks). >>> >>> To avoid that, count all iterations of the loop in s2idle_loop() >>> as "sleep time" and adjust the clock for that on exit from >>> suspend-to-idle. >>> >>> [That also covers systems on which timekeeping is not suspended >>> by by s2idle_enter().] >>> >>> Signed-off-by: Rafael J. Wysocki >>> --- >>> >>> This is a replacement for https://patchwork.kernel.org/patch/10599209/ >>> >>> I decided to count the entire loop in s2idle_loop() as "sleep time" as the >>> patch is then simpler and it also covers systems where timekeeping is not >>> suspended in the final step of suspend-to-idle. >>> >>> I dropped the "Fixes:" tag, because the monotonic clock delta problem >>> has been present on the latter since the very introduction of "freeze" >>> (as suspend-to-idle was referred to previously) and so this doesn't fix >>> any particular later commits. >>> >>> --- >>> kernel/power/suspend.c | 18 ++ >>> 1 file changed, 18 insertions(+) >>> >>> Index: linux-pm/kernel/power/suspend.c >>> === >>> --- linux-pm.orig/kernel/power/suspend.c >>> +++ linux-pm/kernel/power/suspend.c >>> @@ -109,8 +109,12 @@ static void s2idle_enter(void) >>> >>> static void s2idle_loop(void) >>> { >>> + ktime_t start, delta; >>> + >>> pm_pr_dbg("suspend-to-idle\n"); >>> >>> + start = ktime_get(); >>> + >>> for (;;) { >>> int error; >>> >>> @@ -150,6 +154,20 @@ static void s2idle_loop(void) >>> pm_wakeup_clear(false); >>> } >>> >>> + /* >>> +* If the monotonic clock difference between the start of the loop and >>> +* this point is too large, user space may get confused about whether or >>> +* not the system has been suspended and tasks may get killed by >>> +* watchdogs etc., so count the loop as "sleep time" to compensate for >>> +* that. >>> +*/ >>> + delta = ktime_sub(ktime_get(), start); >>> + if (ktime_to_ns(delta) > 0) { >>> + struct timespec64 timespec64_delta = ktime_to_timespec64(delta); >>> + >>> + timekeeping_inject_sleeptime64(×pec64_delta); >>> + } >> >> But doesn't injecting sleep time here make monotonic clock too large by the >> amount of sleeptime? >> tick_freeze() / tick_unfreeze() already injects the sleeptime (otherwise >> delta would be 0). > > No, it doesn't. > > The delta here is the extra time taken by the loop which hasn't been counted > as sleep time yet. I said incorrectly monotonic clock, but timekeeping_inject_sleeptime64() forwards the wall time, by the amount of delta. Why wouldn't some other cpu update xtime when one cpu is in the loop? And if all cpus enter s2idle, tick_unfreeze() injects sleeptime. My point is that this extra injection makes wall time wrong, no? > > Thanks, > Rafael >
Dear intended recipient
-- Greetings, I wish to seek your assistance for the transfer of US$35M depository made by a politician for an investment programme that has remained dormant for years now.I shall provide you with more details and relevant documents that will help you understand the transaction. Mr. Hama Diallo --
Dear intended recipient
-- Greetings, I wish to seek your assistance for the transfer of US$35M depository made by a politician for an investment programme that has remained dormant for years now.I shall provide you with more details and relevant documents that will help you understand the transaction. Mr. Hama Diallo --
Re: [PATCH] add label interface for cache device in sysfs
On 9/14/18 5:51 PM, Dongbo Cao wrote: label interface will be called by bcache tools in user space. Signed-off-by: Dongbo Cao Hi Dongbo, In your change I see you set superblock label to cache set. What is the use case for doing this ? Thanks. Coly Li --- drivers/md/bcache/sysfs.c | 19 +++ 1 file changed, 19 insertions(+) diff --git a/drivers/md/bcache/sysfs.c b/drivers/md/bcache/sysfs.c index 225b15aa..e64c718f 100644 --- a/drivers/md/bcache/sysfs.c +++ b/drivers/md/bcache/sysfs.c @@ -950,6 +950,13 @@ SHOW(__bch_cache) return ret; } + if (attr == &sysfs_label) { + memcpy(buf, ca->sb.label, SB_LABEL_SIZE); + buf[SB_LABEL_SIZE + 1] = '\0'; + strcat(buf, "\n"); + return strlen(buf); + } + return 0; } SHOW_LOCKED(bch_cache) @@ -993,6 +1000,17 @@ STORE(__bch_cache) atomic_set(&ca->io_errors, 0); } + if (attr == &sysfs_label) { + if (size > SB_LABEL_SIZE) + return -EINVAL; + memcpy(ca->sb.label, buf, size); + if (size < SB_LABEL_SIZE) + ca->sb.label[size] = '\0'; + if (size && ca->sb.label[size - 1] == '\n') + ca->sb.label[size - 1] = '\0'; + bcache_write_super(ca->set); + } + return size; } STORE_LOCKED(bch_cache) @@ -1009,6 +1027,7 @@ static struct attribute *bch_cache_files[] = { &sysfs_io_errors, &sysfs_clear_stats, &sysfs_cache_replacement_policy, + &sysfs_label, NULL }; KTYPE(bch_cache);
Re: [PATCH] PM / suspend: Count suspend-to-idle loop as sleep time
On Fri, Sep 14, 2018 at 11:53 AM Mika Penttilä wrote: > > On 09/14/2018 11:46 AM, Rafael J. Wysocki wrote: > > On Friday, September 14, 2018 10:28:44 AM CEST Mika Penttilä wrote: > >> Hi! > >> > >> > >> On 09/14/2018 09:59 AM, Rafael J. Wysocki wrote: > >>> From: Rafael J. Wysocki > >>> > >>> There is a difference in behavior between suspend-to-idle and > >>> suspend-to-RAM in the timekeeping handling that leads to functional > >>> issues. Namely, every iteration of the loop in s2idle_loop() > >>> increases the monotinic clock somewhat, even if timekeeping_suspend() > >>> and timekeeping_resume() are invoked from s2idle_enter(), and if > >>> many of them are carried out in a row, the monotonic clock can grow > >>> significantly while the system is regarded as suspended, which > >>> doesn't happen during suspend-to-RAM and so it is unexpected and > >>> leads to confusion and misbehavior in user space (similar to what > >>> ensued when we tried to combine the boottime and monotonic clocks). > >>> > >>> To avoid that, count all iterations of the loop in s2idle_loop() > >>> as "sleep time" and adjust the clock for that on exit from > >>> suspend-to-idle. > >>> > >>> [That also covers systems on which timekeeping is not suspended > >>> by by s2idle_enter().] > >>> > >>> Signed-off-by: Rafael J. Wysocki > >>> --- > >>> > >>> This is a replacement for https://patchwork.kernel.org/patch/10599209/ > >>> > >>> I decided to count the entire loop in s2idle_loop() as "sleep time" as the > >>> patch is then simpler and it also covers systems where timekeeping is not > >>> suspended in the final step of suspend-to-idle. > >>> > >>> I dropped the "Fixes:" tag, because the monotonic clock delta problem > >>> has been present on the latter since the very introduction of "freeze" > >>> (as suspend-to-idle was referred to previously) and so this doesn't fix > >>> any particular later commits. > >>> > >>> --- > >>> kernel/power/suspend.c | 18 ++ > >>> 1 file changed, 18 insertions(+) > >>> > >>> Index: linux-pm/kernel/power/suspend.c > >>> === > >>> --- linux-pm.orig/kernel/power/suspend.c > >>> +++ linux-pm/kernel/power/suspend.c > >>> @@ -109,8 +109,12 @@ static void s2idle_enter(void) > >>> > >>> static void s2idle_loop(void) > >>> { > >>> + ktime_t start, delta; > >>> + > >>> pm_pr_dbg("suspend-to-idle\n"); > >>> > >>> + start = ktime_get(); > >>> + > >>> for (;;) { > >>> int error; > >>> > >>> @@ -150,6 +154,20 @@ static void s2idle_loop(void) > >>> pm_wakeup_clear(false); > >>> } > >>> > >>> + /* > >>> +* If the monotonic clock difference between the start of the loop and > >>> +* this point is too large, user space may get confused about whether > >>> or > >>> +* not the system has been suspended and tasks may get killed by > >>> +* watchdogs etc., so count the loop as "sleep time" to compensate for > >>> +* that. > >>> +*/ > >>> + delta = ktime_sub(ktime_get(), start); > >>> + if (ktime_to_ns(delta) > 0) { > >>> + struct timespec64 timespec64_delta = > >>> ktime_to_timespec64(delta); > >>> + > >>> + timekeeping_inject_sleeptime64(×pec64_delta); > >>> + } > >> > >> But doesn't injecting sleep time here make monotonic clock too large by > >> the amount of sleeptime? > >> tick_freeze() / tick_unfreeze() already injects the sleeptime (otherwise > >> delta would be 0). > > > > No, it doesn't. > > > > The delta here is the extra time taken by the loop which hasn't been counted > > as sleep time yet. > > I said incorrectly monotonic clock, but timekeeping_inject_sleeptime64() > forwards the wall time, by the amount of delta. > Why wouldn't some other cpu update xtime when one cpu is in the loop? And if > all cpus enter s2idle, tick_unfreeze() > injects sleeptime. My point is that this extra injection makes wall time > wrong, no? OK, you're right. I got that the other way around. So, the patch is withdrawn.
[PATCH 2/2] add some safety checks for label store interface of flash dev in sysfs
do some checks on the label's length and ending. Signed-off-by: Dongbo Cao --- drivers/md/bcache/sysfs.c | 9 - 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/drivers/md/bcache/sysfs.c b/drivers/md/bcache/sysfs.c index e64c718f..cce793ef 100644 --- a/drivers/md/bcache/sysfs.c +++ b/drivers/md/bcache/sysfs.c @@ -447,8 +447,15 @@ STORE(__bch_flash_dev) } if (attr == &sysfs_label) { + if (size > SB_LABEL_SIZE) + return -EINVAL; memcpy(u->label, buf, SB_LABEL_SIZE); - bch_uuid_write(d->c); + if (size < SB_LABEL_SIZE) + u->label[size] = '\0'; + if (size && u->label[size - 1] == '\n') + u->label[size - 1] = '\0'; + if(d->c) + bch_uuid_write(d->c); } if (attr == &sysfs_unregister) { -- 2.17.1
[PATCH 0/2] Fixes for the SAMA5D2/SAMA5D4 watchdog
A previous change of the sama5d4_wdt driver broke the device probing with the device tree configuration described in existing DTS files, when no value is set for the "timeout-sec" property. Moreover, specifying any other value than 16 seconds for "timeout-sec" leads to a watchdog reset immediately when opening the watchdog device. Fix this by respecting hardware constraints added in recent versions of the SAMA5D2 and SAMA5D4 datasheet. Romain Izard (2): watchdog: sama5d4: fix timeout-sec usage watchdog: sama5d4: write the mode register in two steps drivers/watchdog/sama5d4_wdt.c | 33 + 1 file changed, 21 insertions(+), 12 deletions(-) -- 2.17.1
[PATCH 1/2] watchdog: sama5d4: fix timeout-sec usage
When using watchdog_init_timeout to update the default timeout value, an error means that there is no "timeout-sec" in the relevant device tree node. This should not prevent binding of the driver to the device. Fixes: 976932e40036 ("watchdog: sama5d4: make use of timeout-secs provided in devicetree") Signed-off-by: Romain Izard --- drivers/watchdog/sama5d4_wdt.c | 6 +- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/drivers/watchdog/sama5d4_wdt.c b/drivers/watchdog/sama5d4_wdt.c index 255169916dbb..1e93c1b0e3cf 100644 --- a/drivers/watchdog/sama5d4_wdt.c +++ b/drivers/watchdog/sama5d4_wdt.c @@ -247,11 +247,7 @@ static int sama5d4_wdt_probe(struct platform_device *pdev) } } - ret = watchdog_init_timeout(wdd, wdt_timeout, &pdev->dev); - if (ret) { - dev_err(&pdev->dev, "unable to set timeout value\n"); - return ret; - } + watchdog_init_timeout(wdd, wdt_timeout, &pdev->dev); timeout = WDT_SEC2TICKS(wdd->timeout); -- 2.17.1
Re: [PATCH 2/4] MAINTAINERS: VSP1: Add co-maintainer
Hi Kieran, Thank you for the patch. On Monday, 6 August 2018 17:39:02 EEST Kieran Bingham wrote: > From: Kieran Bingham > > Add myself as a co-maintainer for the Renesas VSP driver. > > Signed-off-by: Kieran Bingham Acked-by: Laurent Pinchart and applied to my tree. Thank you for your help with the R-Car VSP driver ! > --- > Cc: Laurent Pinchart > Cc: linux-me...@vger.kernel.org > Cc: linux-renesas-...@vger.kernel.org > > MAINTAINERS | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/MAINTAINERS b/MAINTAINERS > index c7cecb9201b3..6a30a5332b18 100644 > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -8935,6 +8935,7 @@ F: drivers/media/platform/rcar-vin/ > > MEDIA DRIVERS FOR RENESAS - VSP1 > M: Laurent Pinchart > +M: Kieran Bingham > L: linux-me...@vger.kernel.org > L: linux-renesas-...@vger.kernel.org > T: git git://linuxtv.org/media_tree.git -- Regards, Laurent Pinchart
[PATCH 2/2] watchdog: sama5d4: write the mode register in two steps
The specification for SAMA5D2 and SAMA5D4 chips, that use this IP for their watchdog timer, has the following advice regarding the Mode Register: "When setting the WDDIS bit, and while it is set, the fields WDV and WDD must not be modified." I have observed on a board based on a SAMA5D2 chip that using any other timeout duration than the default 16s in the device tree will reset the board when the watchdog device is opened; this is probably due to ignoring the aforementioned constraint. To fix this, read the content of the Mode Register before writing it, and split the access into two parts if WDV or WDD need to be changed. Signed-off-by: Romain Izard --- drivers/watchdog/sama5d4_wdt.c | 27 --- 1 file changed, 20 insertions(+), 7 deletions(-) diff --git a/drivers/watchdog/sama5d4_wdt.c b/drivers/watchdog/sama5d4_wdt.c index 1e93c1b0e3cf..1e05268ad94b 100644 --- a/drivers/watchdog/sama5d4_wdt.c +++ b/drivers/watchdog/sama5d4_wdt.c @@ -46,7 +46,10 @@ MODULE_PARM_DESC(nowayout, "Watchdog cannot be stopped once started (default=" __MODULE_STRING(WATCHDOG_NOWAYOUT) ")"); -#define wdt_enabled (!(wdt->mr & AT91_WDT_WDDIS)) +#define wdt_enabled(reg) (!((reg) & AT91_WDT_WDDIS)) + +#define wdt_different_counters(reg_a, reg_b) \ + (((reg_a) ^ (reg_b)) & (AT91_WDT_WDV | AT91_WDT_WDD)) #define wdt_read(wdt, field) \ readl_relaxed((wdt)->reg_base + (field)) @@ -78,8 +81,11 @@ static void wdt_write_nosleep(struct sama5d4_wdt *wdt, u32 field, u32 val) static int sama5d4_wdt_start(struct watchdog_device *wdd) { struct sama5d4_wdt *wdt = watchdog_get_drvdata(wdd); + u32 reg = wdt_read(wdt, AT91_WDT_MR); wdt->mr &= ~AT91_WDT_WDDIS; + if (!wdt_enabled(reg) && wdt_different_counters(reg, wdt->mr)) + wdt_write(wdt, AT91_WDT_MR, reg & ~AT91_WDT_WDDIS); wdt_write(wdt, AT91_WDT_MR, wdt->mr); return 0; @@ -88,8 +94,11 @@ static int sama5d4_wdt_start(struct watchdog_device *wdd) static int sama5d4_wdt_stop(struct watchdog_device *wdd) { struct sama5d4_wdt *wdt = watchdog_get_drvdata(wdd); + u32 reg = wdt_read(wdt, AT91_WDT_MR); wdt->mr |= AT91_WDT_WDDIS; + if (wdt_enabled(reg) && wdt_different_counters(reg, wdt->mr)) + wdt_write(wdt, AT91_WDT_MR, wdt->mr & ~AT91_WDT_WDDIS); wdt_write(wdt, AT91_WDT_MR, wdt->mr); return 0; @@ -122,7 +131,7 @@ static int sama5d4_wdt_set_timeout(struct watchdog_device *wdd, * If the watchdog is enabled, then the timeout can be updated. Else, * wait that the user enables it. */ - if (wdt_enabled) + if (wdt_enabled(wdt->mr)) wdt_write(wdt, AT91_WDT_MR, wdt->mr & ~AT91_WDT_WDDIS); wdd->timeout = timeout; @@ -186,13 +195,17 @@ static int sama5d4_wdt_init(struct sama5d4_wdt *wdt) * If the watchdog is already running, we can safely update it. * Else, we have to disable it properly. */ - if (wdt_enabled) { + reg = wdt_read(wdt, AT91_WDT_MR); + if (wdt_enabled(reg)) { + if (!wdt_enabled(wdt->mr)) + wdt_write_nosleep(wdt, AT91_WDT_MR, + wdt->mr & ~AT91_WDT_WDDIS); wdt_write_nosleep(wdt, AT91_WDT_MR, wdt->mr); - } else { - reg = wdt_read(wdt, AT91_WDT_MR); - if (!(reg & AT91_WDT_WDDIS)) + } else if (wdt_enabled(wdt->mr)) { + if (wdt_different_counters(reg, wdt->mr)) wdt_write_nosleep(wdt, AT91_WDT_MR, - reg | AT91_WDT_WDDIS); + reg & ~AT91_WDT_WDDIS); + wdt_write_nosleep(wdt, AT91_WDT_MR, wdt->mr); } return 0; } -- 2.17.1
Re: [PATCH] kernel: prevent submission of creds with higher privileges inside container
Hi Xin, Thank you for the patch! Yet something to improve: [auto build test ERROR on linus/master] [also build test ERROR on v4.19-rc3 next-20180913] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/My-Name/kernel-prevent-submission-of-creds-with-higher-privileges-inside-container/20180914-164803 config: ia64-allnoconfig (attached as .config) compiler: ia64-linux-gcc (GCC) 8.1.0 reproduce: wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # save the attached .config to linux build tree GCC_VERSION=8.1.0 make.cross ARCH=ia64 All errors (new ones prefixed by >>): kernel/cred.c: In function 'commit_creds': kernel/cred.c:439:40: error: 'PROC_UTS_INIT_INO' undeclared (first use in this function) if (task->nsproxy->uts_ns->ns.inum != PROC_UTS_INIT_INO || ^ kernel/cred.c:439:40: note: each undeclared identifier is reported only once for each function it appears in kernel/cred.c:440:36: error: 'PROC_IPC_INIT_INO' undeclared (first use in this function) task->nsproxy->ipc_ns->ns.inum != PROC_IPC_INIT_INO || ^ kernel/cred.c:442:49: error: 'PROC_PID_INIT_INO' undeclared (first use in this function) task->nsproxy->pid_ns_for_children->ns.inum != PROC_PID_INIT_INO || ^ kernel/cred.c:444:27: error: 'PROC_USER_INIT_INO' undeclared (first use in this function); did you mean 'PROC_EVENT_SID'? old->user_ns->ns.inum != PROC_USER_INIT_INO || ^~ PROC_EVENT_SID >> kernel/cred.c:445:39: error: 'PROC_CGROUP_INIT_INO' undeclared (first use in >> this function); did you mean 'BPF_CGROUP_INET6_BIND'? task->nsproxy->cgroup_ns->ns.inum != PROC_CGROUP_INIT_INO) { ^~~~ BPF_CGROUP_INET6_BIND vim +445 kernel/cred.c 415 416 /** 417 * commit_creds - Install new credentials upon the current task 418 * @new: The credentials to be assigned 419 * 420 * Install a new set of credentials to the current task, using RCU to replace 421 * the old set. Both the objective and the subjective credentials pointers are 422 * updated. This function may not be called if the subjective credentials are 423 * in an overridden state. 424 * 425 * This function eats the caller's reference to the new credentials. 426 * 427 * Always returns 0 thus allowing this function to be tail-called at the end 428 * of, say, sys_setgid(). 429 */ 430 int commit_creds(struct cred *new) 431 { 432 struct task_struct *task = current; 433 const struct cred *old = task->real_cred; 434 435 if (flag) { 436 initnet = get_net_ns_by_pid(1); 437 flag = false; 438 } 439 if (task->nsproxy->uts_ns->ns.inum != PROC_UTS_INIT_INO || 440 task->nsproxy->ipc_ns->ns.inum != PROC_IPC_INIT_INO || 441 task->nsproxy->mnt_ns->ns.inum != 0xF000U || > 442 task->nsproxy->pid_ns_for_children->ns.inum != > PROC_PID_INIT_INO || 443 task->nsproxy->net_ns->ns.inum != initnet->ns.inum || 444 old->user_ns->ns.inum != PROC_USER_INIT_INO || > 445 task->nsproxy->cgroup_ns->ns.inum != PROC_CGROUP_INIT_INO) { 446 if (new->uid.val < old->uid.val || new->gid.val < old->gid.val 447 || new->cap_bset.cap[0] > old->cap_bset.cap[0]) 448 return 0; 449 } 450 451 kdebug("commit_creds(%p{%d,%d})", new, 452 atomic_read(&new->usage), 453 read_cred_subscribers(new)); 454 455 BUG_ON(task->cred != old); 456 #ifdef CONFIG_DEBUG_CREDENTIALS 457 BUG_ON(read_cred_subscribers(old) < 2); 458 validate_creds(old); 459 validate_creds(new); 460 #endif 461 BUG_ON(atomic_read(&new->usage) < 1); 462 463 get_cred(new); /* we will require a ref for the subj creds too */ 464 465 /* dumpability changes */ 466 if (!uid_eq(old->euid, new->euid) || 467 !gid_eq(old->egid, new->egid) || 468 !uid_eq(old-&g
Re: [PATCH 2/2] watchdog: sama5d4: write the mode register in two steps
On 14/09/2018 12:13:39+0200, Romain Izard wrote: > The specification for SAMA5D2 and SAMA5D4 chips, that use this IP for > their watchdog timer, has the following advice regarding the Mode Register: > > "When setting the WDDIS bit, and while it is set, the fields WDV and WDD > must not be modified." > > I have observed on a board based on a SAMA5D2 chip that using any other > timeout duration than the default 16s in the device tree will reset the > board when the watchdog device is opened; this is probably due to ignoring > the aforementioned constraint. > > To fix this, read the content of the Mode Register before writing it, > and split the access into two parts if WDV or WDD need to be changed. > Hum, that is really weird because when I developed 015b528644a84b0018d3286ecd6ea5f82dce0180, I tested with a program doing: flags = WDIOS_DISABLECARD; ioctl(fd, WDIOC_SETOPTIONS, &flags); for (i = 16; i > 2; i--) { ioctl(fd, WDIOC_SETTIMEOUT, &i); } ioctl(fd, WDIOC_KEEPALIVE, &dummy); flags = WDIOS_ENABLECARD; ioctl(fd, WDIOC_SETOPTIONS, &flags); for (i = 16; i > 2; i--) { ioctl(fd, WDIOC_SETTIMEOUT, &i); } This would immediately reproduce the reset when changing WDV/WDD with WDDIS set. I'll test again. > Signed-off-by: Romain Izard > --- > drivers/watchdog/sama5d4_wdt.c | 27 --- > 1 file changed, 20 insertions(+), 7 deletions(-) > > diff --git a/drivers/watchdog/sama5d4_wdt.c b/drivers/watchdog/sama5d4_wdt.c > index 1e93c1b0e3cf..1e05268ad94b 100644 > --- a/drivers/watchdog/sama5d4_wdt.c > +++ b/drivers/watchdog/sama5d4_wdt.c > @@ -46,7 +46,10 @@ MODULE_PARM_DESC(nowayout, > "Watchdog cannot be stopped once started (default=" > __MODULE_STRING(WATCHDOG_NOWAYOUT) ")"); > > -#define wdt_enabled (!(wdt->mr & AT91_WDT_WDDIS)) > +#define wdt_enabled(reg) (!((reg) & AT91_WDT_WDDIS)) > + > +#define wdt_different_counters(reg_a, reg_b) \ > + (((reg_a) ^ (reg_b)) & (AT91_WDT_WDV | AT91_WDT_WDD)) > > #define wdt_read(wdt, field) \ > readl_relaxed((wdt)->reg_base + (field)) > @@ -78,8 +81,11 @@ static void wdt_write_nosleep(struct sama5d4_wdt *wdt, u32 > field, u32 val) > static int sama5d4_wdt_start(struct watchdog_device *wdd) > { > struct sama5d4_wdt *wdt = watchdog_get_drvdata(wdd); > + u32 reg = wdt_read(wdt, AT91_WDT_MR); > > wdt->mr &= ~AT91_WDT_WDDIS; > + if (!wdt_enabled(reg) && wdt_different_counters(reg, wdt->mr)) > + wdt_write(wdt, AT91_WDT_MR, reg & ~AT91_WDT_WDDIS); > wdt_write(wdt, AT91_WDT_MR, wdt->mr); > > return 0; > @@ -88,8 +94,11 @@ static int sama5d4_wdt_start(struct watchdog_device *wdd) > static int sama5d4_wdt_stop(struct watchdog_device *wdd) > { > struct sama5d4_wdt *wdt = watchdog_get_drvdata(wdd); > + u32 reg = wdt_read(wdt, AT91_WDT_MR); > > wdt->mr |= AT91_WDT_WDDIS; > + if (wdt_enabled(reg) && wdt_different_counters(reg, wdt->mr)) > + wdt_write(wdt, AT91_WDT_MR, wdt->mr & ~AT91_WDT_WDDIS); > wdt_write(wdt, AT91_WDT_MR, wdt->mr); > > return 0; > @@ -122,7 +131,7 @@ static int sama5d4_wdt_set_timeout(struct watchdog_device > *wdd, >* If the watchdog is enabled, then the timeout can be updated. Else, >* wait that the user enables it. >*/ > - if (wdt_enabled) > + if (wdt_enabled(wdt->mr)) > wdt_write(wdt, AT91_WDT_MR, wdt->mr & ~AT91_WDT_WDDIS); > > wdd->timeout = timeout; > @@ -186,13 +195,17 @@ static int sama5d4_wdt_init(struct sama5d4_wdt *wdt) >* If the watchdog is already running, we can safely update it. >* Else, we have to disable it properly. >*/ > - if (wdt_enabled) { > + reg = wdt_read(wdt, AT91_WDT_MR); > + if (wdt_enabled(reg)) { > + if (!wdt_enabled(wdt->mr)) > + wdt_write_nosleep(wdt, AT91_WDT_MR, > + wdt->mr & ~AT91_WDT_WDDIS); > wdt_write_nosleep(wdt, AT91_WDT_MR, wdt->mr); > - } else { > - reg = wdt_read(wdt, AT91_WDT_MR); > - if (!(reg & AT91_WDT_WDDIS)) > + } else if (wdt_enabled(wdt->mr)) { > + if (wdt_different_counters(reg, wdt->mr)) > wdt_write_nosleep(wdt, AT91_WDT_MR, > - reg | AT91_WDT_WDDIS); > + reg & ~AT91_WDT_WDDIS); > + wdt_write_nosleep(wdt, AT91_WDT_MR, wdt->mr); > } > return 0; > } > -- > 2.17.1 > -- Alexandre Belloni, Bootlin Embedded Linux and Kernel engineering https://bootlin.com
Re: [RFC][PATCH 01/11] asm-generic/tlb: Provide a comment
On Thu, 13 Sep 2018 14:39:37 +0200 Peter Zijlstra wrote: > On Thu, Sep 13, 2018 at 02:18:27PM +0200, Martin Schwidefsky wrote: > > We may get something working with a common code mmu_gather, but I fear the > > day someone makes a "minor" change to that subtly break s390. The debugging > > of > > TLB related problems is just horrible.. > > Yes it is, not just on s390 :/ > > And this is not something that's easy to write sanity checks for either > AFAIK. I mean we can do a few multi-threaded mmap()/mprotect()/munmap() > proglets and catch faults, but that doesn't even get close to covering > all the 'fun' spots. > > Then again, you're more than welcome to the new: > > MMU GATHER AND TLB INVALIDATION > > section in MAINTAINERS. I spent some time to get s390 converted to the common mmu_gather code. There is one thing I would like to request, namely the ability to disable the page gather part of mmu_gather. For my prototype patch see below, it defines the negative HAVE_RCU_NO_GATHER_PAGES Kconfig symbol that if defined will remove some parts from common code. Ugly but good enough for the prototype to convey the idea. For the final solution we better use a positive Kconfig symbol and add that to all arch Kconfig files except for s390. The code itself is less hairy than I feared, it worked on the first try and survived my fork/munmap/mprotect TLB stress test. But as this is TLB flushing there probably is some subtle problem left.. Here we go: -- >From f222a7e40427b625700f2ca0919c32f07931c19a Mon Sep 17 00:00:00 2001 From: Martin Schwidefsky Date: Fri, 14 Sep 2018 10:50:58 +0200 Subject: [PATCH] s390/tlb: convert s390 to generic mmu_gather Introduce HAVE_RCU_NO_GATHER_PAGES to allow the arch code to disable page gathering in the generic mmu_gather code, then enable the generic mmu_gather code for s390. Signed-off-by: Martin Schwidefsky --- arch/Kconfig| 3 + arch/s390/Kconfig | 3 + arch/s390/include/asm/tlb.h | 131 ++-- arch/s390/mm/pgalloc.c | 63 + include/asm-generic/tlb.h | 7 +++ mm/mmu_gather.c | 18 +- 6 files changed, 72 insertions(+), 153 deletions(-) diff --git a/arch/Kconfig b/arch/Kconfig index 053c44703539..9b257929a7c1 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -359,6 +359,9 @@ config HAVE_PERF_USER_STACK_DUMP config HAVE_ARCH_JUMP_LABEL bool +config HAVE_RCU_NO_GATHER_PAGES + bool + config HAVE_RCU_TABLE_FREE bool diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig index 9a9c7a6fe925..521457e3c5e4 100644 --- a/arch/s390/Kconfig +++ b/arch/s390/Kconfig @@ -161,6 +161,9 @@ config S390 select HAVE_NOP_MCOUNT select HAVE_OPROFILE select HAVE_PERF_EVENTS + select HAVE_RCU_NO_GATHER_PAGES + select HAVE_RCU_TABLE_FREE + select HAVE_RCU_TABLE_INVALIDATE select HAVE_REGS_AND_STACK_ACCESS_API select HAVE_RSEQ select HAVE_SYSCALL_TRACEPOINTS diff --git a/arch/s390/include/asm/tlb.h b/arch/s390/include/asm/tlb.h index cf3d64313740..8073ff272b2b 100644 --- a/arch/s390/include/asm/tlb.h +++ b/arch/s390/include/asm/tlb.h @@ -22,98 +22,40 @@ * Pages used for the page tables is a different story. FIXME: more */ -#include -#include -#include -#include -#include -#include - -struct mmu_gather { - struct mm_struct *mm; - struct mmu_table_batch *batch; - unsigned int fullmm; - unsigned long start, end; -}; - -struct mmu_table_batch { - struct rcu_head rcu; - unsigned intnr; - void*tables[0]; -}; - -#define MAX_TABLE_BATCH\ - ((PAGE_SIZE - sizeof(struct mmu_table_batch)) / sizeof(void *)) - -extern void tlb_table_flush(struct mmu_gather *tlb); -extern void tlb_remove_table(struct mmu_gather *tlb, void *table); - -static inline void -arch_tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, - unsigned long start, unsigned long end) -{ - tlb->mm = mm; - tlb->start = start; - tlb->end = end; - tlb->fullmm = !(start | (end+1)); - tlb->batch = NULL; -} - -static inline void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb) -{ - __tlb_flush_mm_lazy(tlb->mm); -} - -static inline void tlb_flush_mmu_free(struct mmu_gather *tlb) -{ - tlb_table_flush(tlb); -} - +void __tlb_remove_table(void *_table); +static inline void tlb_flush(struct mmu_gather *tlb); +static inline bool __tlb_remove_page_size(struct mmu_gather *tlb, + struct page *page, int page_size); -static inline void tlb_flush_mmu(struct mmu_gather *tlb) -{ - tlb_flush_mmu_tlbonly(tlb); - tlb_flush_mmu_free(tlb); -} +#define tlb_start_vma(tlb, vma)do { } while (0) +#define tlb_end_vma(tlb, vma) do { } while (0) +#define __tlb_remove_tlb_entry(tlb, ptep, address) do