date:20180914

[PATCH] PM / suspend: Count suspend-to-idle loop as sleep time

2018-09-14 Thread Rafael J. Wysocki

From: Rafael J. Wysocki 

There is a difference in behavior between suspend-to-idle and
suspend-to-RAM in the timekeeping handling that leads to functional
issues.  Namely, every iteration of the loop in s2idle_loop()
increases the monotinic clock somewhat, even if timekeeping_suspend()
and timekeeping_resume() are invoked from s2idle_enter(), and if
many of them are carried out in a row, the monotonic clock can grow
significantly while the system is regarded as suspended, which
doesn't happen during suspend-to-RAM and so it is unexpected and
leads to confusion and misbehavior in user space (similar to what
ensued when we tried to combine the boottime and monotonic clocks).

To avoid that, count all iterations of the loop in s2idle_loop()
as "sleep time" and adjust the clock for that on exit from
suspend-to-idle.

[That also covers systems on which timekeeping is not suspended
 by by s2idle_enter().]

Signed-off-by: Rafael J. Wysocki 
---

This is a replacement for https://patchwork.kernel.org/patch/10599209/

I decided to count the entire loop in s2idle_loop() as "sleep time" as the
patch is then simpler and it also covers systems where timekeeping is not
suspended in the final step of suspend-to-idle.

I dropped the "Fixes:" tag, because the monotonic clock delta problem
has been present on the latter since the very introduction of "freeze"
(as suspend-to-idle was referred to previously) and so this doesn't fix
any particular later commits.

---
 kernel/power/suspend.c |   18 ++
 1 file changed, 18 insertions(+)

Index: linux-pm/kernel/power/suspend.c
===
--- linux-pm.orig/kernel/power/suspend.c
+++ linux-pm/kernel/power/suspend.c
@@ -109,8 +109,12 @@ static void s2idle_enter(void)
 
 static void s2idle_loop(void)
 {
+   ktime_t start, delta;
+
pm_pr_dbg("suspend-to-idle\n");
 
+   start = ktime_get();
+
for (;;) {
int error;
 
@@ -150,6 +154,20 @@ static void s2idle_loop(void)
pm_wakeup_clear(false);
}
 
+   /*
+* If the monotonic clock difference between the start of the loop and
+* this point is too large, user space may get confused about whether or
+* not the system has been suspended and tasks may get killed by
+* watchdogs etc., so count the loop as "sleep time" to compensate for
+* that.
+*/
+   delta = ktime_sub(ktime_get(), start);
+   if (ktime_to_ns(delta) > 0) {
+   struct timespec64 timespec64_delta = ktime_to_timespec64(delta);
+
+   timekeeping_inject_sleeptime64(×pec64_delta);
+   }
+
pm_pr_dbg("resume from suspend-to-idle\n");
 }

[PATCH] ia64: enable GENERIC_HWEIGHT

2018-09-14 Thread Sergei Trofimovich

Noticed on a single driver failure:
  ERROR: "__sw_hweight8" [drivers/net/wireless/mediatek/mt76/mt76.ko] undefined!

CC: Tony Luck 
CC: Fenghua Yu 
CC: linux-i...@vger.kernel.org
CC: Andrew Morton 
CC: linux-kernel@vger.kernel.org
Signed-off-by: Sergei Trofimovich 
---
 arch/ia64/Kconfig | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
index 8b4a0c1748c0..1a71f92f0b8e 100644
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -576,3 +576,7 @@ config MSPEC
  If you have an ia64 and you want to enable memory special
  operations support (formerly known as fetchop), say Y here,
  otherwise say N.
+
+config GENERIC_HWEIGHT
+   bool
+   default y
-- 
2.19.0

[PATCH] ia64: disable SCHED_STACK_END_CHECK

2018-09-14 Thread Sergei Trofimovich

SCHED_STACK_END_CHECK assumes stack grows in one direction.
ia64 is a rare case where it is not.

As a result kernel fails at startup as:
  Kernel panic - not syncing: corrupted stack end detected inside scheduler

The error does not find a real problem: it's register backing store
is written on top of canary value.

Disable SCHED_STACK_END_CHECK on ia64 as there is no good
place for canary without moving initial stack address.

CC: Tony Luck 
CC: Fenghua Yu 
CC: linux-i...@vger.kernel.org
CC: Andrew Morton 
CC: linux-kernel@vger.kernel.org
Signed-off-by: Sergei Trofimovich 
---
 lib/Kconfig.debug | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 4966c4fbe7f7..a097dfe38d2b 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1004,7 +1004,7 @@ config SCHEDSTATS
 
 config SCHED_STACK_END_CHECK
bool "Detect stack corruption on calls to schedule()"
-   depends on DEBUG_KERNEL
+   depends on DEBUG_KERNEL && !IA64
default n
help
  This option checks for a stack overrun on calls to schedule().
-- 
2.19.0

[RFC] gpiolib: Fix gpio_direction_* for single direction GPIOs

2018-09-14 Thread Ricardo Ribalda Delgado

GPIOs with no programmable direction are not required to implement
direction_output nor direction_input.

If we try to set an output direction on an output-only GPIO or input
direction on an input-only GPIO simply return 0.

This allows this single direction GPIO to be used by libgpiod.

Signed-off-by: Ricardo Ribalda Delgado 
---
 drivers/gpio/gpiolib.c | 36 
 1 file changed, 28 insertions(+), 8 deletions(-)

diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c
index af853749e0bb..3c09bf70e7ab 100644
--- a/drivers/gpio/gpiolib.c
+++ b/drivers/gpio/gpiolib.c
@@ -2517,19 +2517,27 @@ EXPORT_SYMBOL_GPL(gpiochip_free_own_desc);
 int gpiod_direction_input(struct gpio_desc *desc)
 {
struct gpio_chip*chip;
-   int status = -EINVAL;
+   int status = 0;
 
VALIDATE_DESC(desc);
chip = desc->gdev->chip;
 
-   if (!chip->get || !chip->direction_input) {
+   if (!chip->get && chip->direction_input) {
gpiod_warn(desc,
-   "%s: missing get() or direction_input() operations\n",
+   "%s: missing get() and direction_input() operations\n",
__func__);
return -EIO;
}
 
-   status = chip->direction_input(chip, gpio_chip_hwgpio(desc));
+   if (chip->direction_input) {
+   status = chip->direction_input(chip, gpio_chip_hwgpio(desc));
+   } else if (chip->get_direction &&
+ (chip->get_direction(chip, gpio_chip_hwgpio(desc)) != 1)) {
+   gpiod_warn(desc,
+   "%s: missing direction_input() operation\n",
+   __func__);
+   return -EIO;
+   }
if (status == 0)
clear_bit(FLAG_IS_OUT, &desc->flags);
 
@@ -2551,16 +2559,28 @@ static int gpiod_direction_output_raw_commit(struct 
gpio_desc *desc, int value)
 {
struct gpio_chip *gc = desc->gdev->chip;
int val = !!value;
-   int ret;
+   int ret = 0;
 
-   if (!gc->set || !gc->direction_output) {
+   if (!gc->set && !gc->direction_output) {
gpiod_warn(desc,
-  "%s: missing set() or direction_output() operations\n",
+  "%s: missing set() and direction_output() operations\n",
   __func__);
return -EIO;
}
 
-   ret = gc->direction_output(gc, gpio_chip_hwgpio(desc), val);
+   if (gc->direction_output) {
+   ret = gc->direction_output(gc, gpio_chip_hwgpio(desc), val);
+   } else {
+   if (gc->get_direction &&
+   gc->get_direction(gc, gpio_chip_hwgpio(desc))) {
+   gpiod_warn(desc,
+   "%s: missing direction_output() operation\n",
+   __func__);
+   return -EIO;
+   }
+   gc->set(gc, gpio_chip_hwgpio(desc), val);
+   }
+
if (!ret)
set_bit(FLAG_IS_OUT, &desc->flags);
trace_gpio_value(desc_to_gpio(desc), 0, val);
-- 
2.18.0

[PATCH] gpiolib: Show correct direction from the beginning

2018-09-14 Thread Ricardo Ribalda Delgado

Current code assumes that the direction is input if direction_input
function is set.
This might not be the case on GPIOs with programmable direction.

Signed-off-by: Ricardo Ribalda Delgado 
---
 drivers/gpio/gpiolib.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c
index a57300c1d649..af853749e0bb 100644
--- a/drivers/gpio/gpiolib.c
+++ b/drivers/gpio/gpiolib.c
@@ -1352,7 +1352,12 @@ int gpiochip_add_data_with_key(struct gpio_chip *chip, 
void *data,
 * it does, and in case chip->get_direction is not set, we may
 * expose the wrong direction in sysfs.
 */
-   desc->flags = !chip->direction_input ? (1 << FLAG_IS_OUT) : 0;
+   if (chip->get_direction)
+   desc->flags = !chip->get_direction(chip, i) ?
+   (1 << FLAG_IS_OUT) : 0;
+   else
+   desc->flags = !chip->direction_input ?
+   (1 << FLAG_IS_OUT) : 0;
}
 
 #ifdef CONFIG_PINCTRL
-- 
2.18.0

Re: [PATCH v8 1/2] x86/mm: add .bss..decrypted section to hold shared variables

2018-09-14 Thread Borislav Petkov

On Thu, Sep 13, 2018 at 04:51:10PM -0500, Brijesh Singh wrote:
> kvmclock defines few static variables which are shared with the
> hypervisor during the kvmclock initialization.

...

> diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
> index 8047379..c16af27 100644
> --- a/arch/x86/kernel/head64.c
> +++ b/arch/x86/kernel/head64.c
> @@ -112,6 +112,7 @@ static bool __head check_la57_support(unsigned long 
> physaddr)
>  unsigned long __head __startup_64(unsigned long physaddr,
> struct boot_params *bp)
>  {
> + unsigned long vaddr, vaddr_end;
>   unsigned long load_delta, *p;
>   unsigned long pgtable_flags;
>   pgdval_t *pgd;
> @@ -235,6 +236,21 @@ unsigned long __head __startup_64(unsigned long physaddr,
>   sme_encrypt_kernel(bp);
>  
>   /*
> +  * Clear the memory encryption mask from the .bss..decrypted section.
> +  * The bss section will be memset to zero later in the initialization so
> +  * there is no need to zero it after changing the memory encryption
> +  * attribute.
> +  */
> + if (mem_encrypt_active()) {
> + vaddr = (unsigned long)__start_bss_decrypted;
> + vaddr_end = (unsigned long)__end_bss_decrypted;
> + for (; vaddr < vaddr_end; vaddr += PMD_SIZE) {
> + i = pmd_index(vaddr);
> + pmd[i] -= sme_get_me_mask();
> + }
> + }

Why isn't this chunk at the end of sme_encrypt_kernel() instead of here?

> + /*
>* Return the SME encryption mask (if SME is active) to be used as a
>* modifier for the initial pgdir entry programmed into CR3.
>*/
> diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
> index 9c77d2d..0d618ee 100644
> --- a/arch/x86/kernel/vmlinux.lds.S
> +++ b/arch/x86/kernel/vmlinux.lds.S
> @@ -65,6 +65,23 @@ jiffies_64 = jiffies;
>  #define ALIGN_ENTRY_TEXT_BEGIN   . = ALIGN(PMD_SIZE);
>  #define ALIGN_ENTRY_TEXT_END . = ALIGN(PMD_SIZE);
>  
> +/*
> + * This section contains data which will be mapped as decrypted. Memory
> + * encryption operates on a page basis. Make this section PMD-aligned
> + * to avoid splitting the pages while mapping the section early.
> + *
> + * Note: We use a separate section so that only this section gets
> + * decrypted to avoid exposing more than we wish.
> + */
> +#define BSS_DECRYPTED\
> + . = ALIGN(PMD_SIZE);\
> + __start_bss_decrypted = .;  \
> + *(.bss..decrypted); \
> + . = ALIGN(PAGE_SIZE);   \
> + __start_bss_decrypted_unused = .;   \
> + . = ALIGN(PMD_SIZE);\
> + __end_bss_decrypted = .;\
> +
>  #else
>  
>  #define X86_ALIGN_RODATA_BEGIN
> @@ -74,6 +91,7 @@ jiffies_64 = jiffies;
>  
>  #define ALIGN_ENTRY_TEXT_BEGIN
>  #define ALIGN_ENTRY_TEXT_END
> +#define BSS_DECRYPTED
>  
>  #endif
>  
> @@ -345,6 +363,7 @@ SECTIONS
>   __bss_start = .;
>   *(.bss..page_aligned)
>   *(.bss)
> + BSS_DECRYPTED
>   . = ALIGN(PAGE_SIZE);
>   __bss_stop = .;

Putting it in the BSS would need a bit of care in the future as it poses
a certain ordering on the calls in x86_64_start_kernel() (not that there
isn't any now :)):

You have:

x86_64_start_kernel:

...

clear_bss();

...

x86_64_start_reservations() -> ... -> setup_arch() -> kvmclock_init()

so it would be prudent to put at least a comment somewhere, say, over
the definition of BSS_DECRYPTED or so, that attention should be paid to
the clear_bss() call early.

Thx.

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
--

Re: [REGRESSION] Errors at reboot after 722e5f2b1eec

2018-09-14 Thread Borislav Petkov

+ Jörg.

On Fri, Sep 14, 2018 at 08:26:07AM +0200, Rafael J. Wysocki wrote:
> On Friday, September 14, 2018 4:29:46 AM CEST Pingfan Liu wrote:
> > On Thu, Sep 13, 2018 at 10:15 PM Rafael J. Wysocki  
> > wrote:
> > >
> > > On Thursday, September 13, 2018 12:03:36 PM CEST James Wang wrote:
> > > >  This is a multi-part message in MIME format.
> > > > --F5519E624D0AD1E3F7DDA019
> > > > Content-Type: text/plain; charset=utf-8
> > > > Content-Transfer-Encoding: 8bit
> > > >
> > > >
> > > >
> > > > On 09/11/2018 02:15 PM, Takashi Iwai wrote:
> > > > > On Tue, 11 Sep 2018 14:11:30 +0200,
> > > > > James Wang wrote:
> > > > >> I did try it from kernel : head
> > > > > OK, then the bug is present with 4.19-rc2, at least.
> > > > > Please check my test kernel later (it's still being built).
> > > > Hi folks,  I attach two log about 4.19-rc3 and 4.19-rc3+Rafeal 
> > > > suggestion.
> > >
> > > OK, no difference AFAICS.
> > >
> > > This means that the commit turned up by bisection simply uncovered an 
> > > existing
> > > ordering issue, apparently between an IOMMU and its client (ie. it 
> > > appears that
> > > the client is shut down after the IOMMU).
> > >
> > > This isn't limited to shutdown and you'd see the same issue on system-wide
> > > suspend/resume (in fact, the Pingfan Liu's patches make shutdown use the
> > > same device list that is used for system-wide PM).
> > >
> > > One way to mitigate such issues is to add a device link between the two
> > > devices in question to enforce the correct suspend/resume/shutdown 
> > > ordering
> > > between them.
> > >
> > I found the iommu was at  " :00:00.2", while  ohci-pci is at
> > :00:13.1. Hence ohci-pci should be shutdown before iommu. Not
> > familiar with AMD's iommu code, but I think there is no shutdown
> > interface exported to drivers/base. It is shutdown by platform code in
> > arch/x86. So I think there should be someone teared down the iotbl,
> > e.g. by invalidating pages, before the shutdown of ohci-pci. I wonder
> > whether adding a device link can fix this bug or not. (Forgive me if I
> > made a mistake, since I am ignorant in this field)
> 
> Adding a device link should help, as it effectively causes dpm_list to
> be reordered in accordance with the link direction, but it also takes
> care of the other children and linked devices as appropriate.
> 
> The difficulty is that whoever wants to add a device link between two
> devices needs to have pointers to the device objects in question upfront.
> 
> Thanks,
> Rafael
> 

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
--

Re: [RFCv2 00/48] perf tools: Add threads to record command

2018-09-14 Thread Alexey Budankov

Hi Namhyung,

On 14.09.2018 5:29, Namhyung Kim wrote:
> On Thu, Sep 13, 2018 at 07:10:35PM +0300, Alexey Budankov wrote:
>> Hi,
> 
> Hello,
> 
>>
>> On 13.09.2018 15:54, Jiri Olsa wrote:
>>> hi,
>>> sending *RFC* for threads support in perf record command.
>>>
>>> In big picture this patchset adds perf record --threads
>>> option that allows to create threads in following modes:
>>>
>>> 1) single thread mode (current)
>>>
>>>   $ perf record ...
>>>   $ perf record --threads=1 ...
>>>
>>>   - all maps are read/stored under process thread
>>>
>>> 2) mode with specific (X) number of threads
>>>
>>>   $ perf record --threads=X ...
>>>
>>>   - maps are spread equaly among threads
>>>
>>> 3) mode that creates thread for every monitored memory map
>>>
>>>   $ perf record --threads ...
>>>
>>>   - which in perf record is equal to number of CPUs, and
>>> it pins each thread to its map's cpu:
>>>
>>> 4) TODO - NUMA aware threads/maps separation
>>>...
>>>
>>> The perf.data stays as a single file.
> 
> I'm not sure we really need to keep it as a single file.  As it's a
> kind of big changes, we might consider breaking compatibility and use
> a directory structure.
> 
> 
>>>
>>> v2 changes:
>>>   - rebased to current Arnaldo's perf/core
>>> (also based on few fixes from my perf/core, see the branch details 
>>> below)
>>>
>>> This patchset contains lot of preparation changes to make
>>> threaded record possible:
>>>
>>>   - Namhyung's changes to create multiple data streams in
>>> perf data file, which allows having each thread data
>>> being stored in separate files and merged into single
>>> perf data after
>>>
>>>   - Namhyung's changes to create track mmaps for auxiliary
>>> events
>>>
>>>   - Namhyung's changes to search for threads/mmaps/comms
>>> using the time. This is needed because we have now
>>> multiple data streams which are processed separately,
>>> but they all need access to complete auxiliary events
>>> data (threads/mmaps/comms). That's also a reason why
>>> the auxiliary events are stored into separate data
>>> stream, which is processed before real data.
>>>
>>>   - the rest of the code that adds threads abstraction into
>>> record command allows to create them and distribute maps
>>> among them
>>>
>>>   - other preparational changes
>>>
>>> The threaded monitoring currently can't monitor backward maps
>>> and there are probably more limitations which I haven't spotted
>>> yet.
>>>
>>> So far I tested on laptop:
>>>   http://people.redhat.com/~jolsa/record_threads/test-4CPU.txt
>>>
>>> and a one bigger server:
>>>   http://people.redhat.com/~jolsa/record_threads/test-208CPU.txt
>>>
>>> I can see decrease in recorded LOST events, but both the benchmark
>>> and the monitoring must be carefully configured wrt:
>>>   - number of events (frequency)
>>>   - size of the memory maps
>>>   - size of events (callchains)
>>>   - final perf.data size
>>>
>>> It's also available in:
>>>   git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git
>>>   perf/record_threads
>>>
>>> thoughts? ;-) thanks
>>> jirka
>>
>> It is preferable to split into smaller pieces that bring 
>> some improvement proved by metrics numbers and ready for 
>> merging and upstream. Do we have more metrics than the 
>> data loss from trace AIO patches?
> 
> Well, this change is to enable parallel access of perf data so the
> preparation works don't affect single thread processing (hopefully) or
> make it even worse.  I'm not sure if could be split for performance
> benefits.
> 
> 
>>
>> There is usage of Posix threading API but there is no 
>> its implementation in the patch series, to avoid dependency 
>> on externally coded designs in the core of the tool.
> 
> Do you mean it needs to implement its own threading?  I don't think
> that's what Ingo wanted to.

Pthreads implementation is still a part of libc and even part of 
other libpthread.so binary in case of glibc.

I guess for Pthreads it is probably acceptable to reside separately 
because as glibc as bionic provide reasonable implementation. 

But which else libc libraries do Perf target? It is preferable to 
clarify it in advance in order to provide complete implementation.

Thanks,
Alexey

> 
> Thanks,
> Namhyung
> 
> 
>>
>>>
>>>
>>> ---
>>> Jiri Olsa (30):
>>>   perf tools: Remove perf_tool from event_op2
>>>   perf tools: Remove perf_tool from event_op3
>>>   perf tools: Pass struct perf_mmap into auxtrace_mmap__read* functions
>>>   perf tools: Add struct perf_mmap arg into record__write
>>>   perf tools: Create separate mmap for dummy tracking event
>>>   perf tools: Make copyfile_offset global
>>>   perf tools: Add perf_data__create_index function
>>>   perf record: Add --index option for building index table
>>>   perf tools: Convert dead thread list into rbtree
>>>   perf tools: Add thread::exited flag
>>>   perf callchain: Maintain libunwind's address space in ma

Re: [PATCH v2] mm: mprotect: check page dirty when change ptes

2018-09-14 Thread Peter Xu

On Thu, Sep 13, 2018 at 08:42:39PM -0400, Jerome Glisse wrote:
> On Thu, Sep 13, 2018 at 10:23:28AM -0400, Jerome Glisse wrote:
> > On Thu, Sep 13, 2018 at 03:37:22PM +0800, Peter Xu wrote:
> > > On Wed, Sep 12, 2018 at 09:24:39AM -0400, Jerome Glisse wrote:
> > > > On Wed, Sep 12, 2018 at 09:03:55AM -0400, Jerome Glisse wrote:
> > > > > On Wed, Sep 12, 2018 at 02:49:21PM +0800, Peter Xu wrote:
> > > > > > Add an extra check on page dirty bit in change_pte_range() since 
> > > > > > there
> > > > > > might be case where PTE dirty bit is unset but it's actually 
> > > > > > dirtied.
> > > > > > One example is when a huge PMD is splitted after written: the dirty 
> > > > > > bit
> > > > > > will be set on the compound page however we won't have the dirty 
> > > > > > bit set
> > > > > > on each of the small page PTEs.
> > > > > > 
> > > > > > I noticed this when debugging with a customized kernel that 
> > > > > > implemented
> > > > > > userfaultfd write-protect.  In that case, the dirty bit will be 
> > > > > > critical
> > > > > > since that's required for userspace to handle the write protect page
> > > > > > fault (otherwise it'll get a SIGBUS with a loop of page faults).
> > > > > > However it should still be good even for upstream Linux to cover 
> > > > > > more
> > > > > > scenarios where we shouldn't need to do extra page faults on the 
> > > > > > small
> > > > > > pages if the previous huge page is already written, so the dirty bit
> > > > > > optimization path underneath can cover more.
> > > > > > 
> > > > > 
> > > > > So as said by Kirill NAK you are not looking at the right place for
> > > > > your bug please first apply the below patch and read my analysis in
> > > > > my last reply.
> > > > 
> > > > Just to be clear you are trying to fix a userspace bug that is hidden
> > > > for non THP pages by a kernel space bug inside userfaultfd by making
> > > > the kernel space bug of userfaultfd buggy for THP too.
> > > > 
> > > > 
> > > > > 
> > > > > Below patch fix userfaultfd bug. I am not posting it as it is on a
> > > > > branch and i am not sure when Andrea plan to post. Andrea feel free
> > > > > to squash that fix.
> > > > > 
> > > > > 
> > > > > From 35cdb30afa86424c2b9f23c0982afa6731be961c Mon Sep 17 00:00:00 2001
> > > > > From: =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= 
> > > > > Date: Wed, 12 Sep 2018 08:58:33 -0400
> > > > > Subject: [PATCH] userfaultfd: do not set dirty accountable when 
> > > > > changing
> > > > >  protection
> > > > > MIME-Version: 1.0
> > > > > Content-Type: text/plain; charset=UTF-8
> > > > > Content-Transfer-Encoding: 8bit
> > > > > 
> > > > > mwriteprotect_range() has nothing to do with the dirty accountable
> > > > > optimization so do not set it as it opens a door for userspace to
> > > > > unwrite protect pages in a range that is write protected ie the vma
> > > > > !(vm_flags & VM_WRITE).
> > > > > 
> > > > > Signed-off-by: Jérôme Glisse 
> > > > > ---
> > > > >  mm/userfaultfd.c | 2 +-
> > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
> > > > > index a0379c5ffa7c..59db1ce48fa0 100644
> > > > > --- a/mm/userfaultfd.c
> > > > > +++ b/mm/userfaultfd.c
> > > > > @@ -632,7 +632,7 @@ int mwriteprotect_range(struct mm_struct *dst_mm, 
> > > > > unsigned long start,
> > > > >   newprot = vm_get_page_prot(dst_vma->vm_flags);
> > > > >  
> > > > >   change_protection(dst_vma, start, start + len, newprot,
> > > > > - !enable_wp, 0);
> > > > > + false, 0);
> > > > >  
> > > > >   err = 0;
> > > > >  out_unlock:
> > > 
> > > Hi, Jerome,
> > > 
> > > I tried your patch, unluckily it didn't work just like when not
> > > applied:
> > > 
> > > Sep 13 15:16:52 px-ws kernel: FAULT_FLAG_ALLOW_RETRY missing 71
> > > Sep 13 15:16:52 px-ws kernel: CPU: 5 PID: 1625 Comm: qemu-system-x86 Not 
> > > tainted 4.19.0-rc2+ #31   
> > >  
> > > Sep 13 15:16:52 px-ws kernel: Hardware name: LENOVO ThinkCentre 
> > > M8500t-N000/SHARKBAY, BIOS FBKTC6AUS 06/22/2016   
> > >   
> > > Sep 13 15:16:52 px-ws kernel: Call Trace:
> > > Sep 13 15:16:52 px-ws kernel:  dump_stack+0x5c/0x7b
> > > Sep 13 15:16:52 px-ws kernel:  handle_userfault+0x4b5/0x780
> > > Sep 13 15:16:52 px-ws kernel:  ? userfaultfd_ctx_put+0xb0/0xb0
> > > Sep 13 15:16:52 px-ws kernel:  do_wp_page+0x1bd/0x5a0
> > > Sep 13 15:16:52 px-ws kernel:  __handle_mm_fault+0x7f9/0x1250
> > > Sep 13 15:16:52 px-ws kernel:  handle_mm_fault+0xfc/0x1f0
> > > Sep 13 15:16:52 px-ws kernel:  __do_page_fault+0x255/0x520
> > > Sep 13 15:16:52 px-ws kernel:  do_page_fault+0x32/0x110
> > > Sep 13 15:16:52 px-ws kernel:  ? page_fault+0x8/0x30
> > > Sep 13 15:16:52 px-ws kernel:  page_fault+0x1e/0x30
> > > Sep 13 15:16:52 px-ws kernel: RIP: 0033:0x7f2a9d3254e0
> > > Sep 13 15:1

[PATCH v2 3/4] iio: adc: Add Xilinx AMS driver

2018-09-14 Thread Manish Narani

The AMS includes an ADC as well as on-chip sensors that can be used to
sample external voltages and monitor on-die operating conditions, such
as temperature and supply voltage levels. The AMS has two SYSMON blocks.
PL-SYSMON block is capable of monitoring off chip voltage and
temperature.
PL-SYSMON block has DRP, JTAG and I2C interface to enable monitoring
from external master. Out of these interface currently only DRP is
supported.
Other block PS-SYSMON is memory mapped to PS.
The AMS can use internal channels to monitor voltage and temperature as
well as one primary and up to 16 auxiliary channels for measuring
external voltages.
The voltage and temperature monitoring channels also have event
capability which allows to generate an interrupt when their value falls
below or raises above a set threshold.

Signed-off-by: Manish Narani 
---
 drivers/iio/adc/Kconfig  |   10 +
 drivers/iio/adc/Makefile |1 +
 drivers/iio/adc/xilinx-ams.c | 1337 ++
 3 files changed, 1348 insertions(+)
 create mode 100644 drivers/iio/adc/xilinx-ams.c

diff --git a/drivers/iio/adc/Kconfig b/drivers/iio/adc/Kconfig
index 4a75492..405ea00 100644
--- a/drivers/iio/adc/Kconfig
+++ b/drivers/iio/adc/Kconfig
@@ -941,4 +941,14 @@ config XILINX_XADC
  The driver can also be build as a module. If so, the module will be 
called
  xilinx-xadc.
 
+config XILINX_AMS
+   tristate "Xilinx AMS driver"
+   depends on ARCH_ZYNQMP || COMPILE_TEST
+   depends on HAS_IOMEM
+   help
+ Say yes here to have support for the Xilinx AMS.
+
+ The driver can also be build as a module. If so, the module will be 
called
+ xilinx-ams.
+
 endmenu
diff --git a/drivers/iio/adc/Makefile b/drivers/iio/adc/Makefile
index 03db7b5..fbfcc45 100644
--- a/drivers/iio/adc/Makefile
+++ b/drivers/iio/adc/Makefile
@@ -85,4 +85,5 @@ obj-$(CONFIG_VF610_ADC) += vf610_adc.o
 obj-$(CONFIG_VIPERBOARD_ADC) += viperboard_adc.o
 xilinx-xadc-y := xilinx-xadc-core.o xilinx-xadc-events.o
 obj-$(CONFIG_XILINX_XADC) += xilinx-xadc.o
+obj-$(CONFIG_XILINX_AMS) += xilinx-ams.o
 obj-$(CONFIG_SD_ADC_MODULATOR) += sd_adc_modulator.o
diff --git a/drivers/iio/adc/xilinx-ams.c b/drivers/iio/adc/xilinx-ams.c
new file mode 100644
index 000..c239a02
--- /dev/null
+++ b/drivers/iio/adc/xilinx-ams.c
@@ -0,0 +1,1337 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Xilinx AMS driver
+ *
+ *  Copyright (C) 2017-2018 Xilinx, Inc.
+ *
+ *  Manish Narani 
+ *  Rajnikant Bhojani 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+static const unsigned int AMS_UNMASK_TIMEOUT_MS = 500;
+
+/* AMS registers definitions */
+#define AMS_ISR_0  0x010
+#define AMS_ISR_1  0x014
+#define AMS_IER_0  0x020
+#define AMS_IER_1  0x024
+#define AMS_IDR_0  0x028
+#define AMS_IDR_1  0x02c
+#define AMS_PS_CSTS0x040
+#define AMS_PL_CSTS0x044
+
+#define AMS_VCC_PSPLL0 0x060
+#define AMS_VCC_PSPLL3 0x06C
+#define AMS_VCCINT 0x078
+#define AMS_VCCBRAM0x07C
+#define AMS_VCCAUX 0x080
+#define AMS_PSDDRPLL   0x084
+#define AMS_PSINTFPDDR 0x09C
+
+#define AMS_VCC_PSPLL0_CH  48
+#define AMS_VCC_PSPLL3_CH  51
+#define AMS_VCCINT_CH  54
+#define AMS_VCCBRAM_CH 55
+#define AMS_VCCAUX_CH  56
+#define AMS_PSDDRPLL_CH57
+#define AMS_PSINTFPDDR_CH  63
+
+#define AMS_REG_CONFIG00x100
+#define AMS_REG_CONFIG10x104
+#define AMS_REG_CONFIG30x10C
+#define AMS_REG_SEQ_CH00x120
+#define AMS_REG_SEQ_CH10x124
+#define AMS_REG_SEQ_CH20x118
+
+#define AMS_TEMP   0x000
+#define AMS_SUPPLY10x004
+#define AMS_SUPPLY20x008
+#define AMS_VP_VN  0x00c
+#define AMS_VREFP  0x010
+#define AMS_VREFN  0x014
+#define AMS_SUPPLY30x018
+#define AMS_SUPPLY40x034
+#define AMS_SUPPLY50x038
+#define AMS_SUPPLY60x03c
+#define AMS_SUPPLY70x200
+#define AMS_SUPPLY80x204
+#define AMS_SUPPLY90x208
+#define AMS_SUPPLY10   0x20c
+#define AMS_VCCAMS 0x210
+#define AMS_TEMP_REMOTE0x214
+
+#define AMS_REG_VAUX(x)(0x40 + (4*(x)))
+
+#define AMS_PS_RESET_VALUE 0xU
+#define AMS_PL_RESET_VALUE

[PATCH v2 0/4] Add Xilinx AMS Driver

2018-09-14 Thread Manish Narani

Add Xilinx AMS driver which is used for Xilinx's ZynqMP AMS controller.
This AMS driver is used to report various interface voltages and temperatures
across the system.
This driver handles AMS module including PS-Sysmon & PL-Sysmon. The binding
documentation is added for understanding of AMS, PS, PL Sysmon Channels.

Changes in v2:
- Added documentation for sysfs (Patch-2)
- Addressed code style review comments
- Patch-2 (Now it is Patch-3)
- Arranged the includes in alphabetical order
- Removed the wrapper 'ams_pl_write_reg()' and used writel
  instead
- Removed the unnecessary delay of 1ms and used polling of EOC
  instead
- Removed spin_lock and used mutex only.
- Used request_irq() instead of devm_request_irq() and handled
  respective error conditions
- Moved contents of xilinx-ams.h to inline with xilinx-ams.c
- Patch-1
- Addressed Documentation style comments

Manish Narani (4):
  dt-bindings: iio: adc: Add Xilinx AMS binding documentation
  iio: Documentation: Add Xilinx AMS sysfs documentation
  iio: adc: Add Xilinx AMS driver
  arm64: zynqmp: DT: Add Xilinx AMS node

 .../ABI/testing/sysfs-bus-iio-adc-xilinx-ams   |  246 
 .../devicetree/bindings/iio/adc/xilinx-ams.txt |  180 +++
 arch/arm64/boot/dts/xilinx/zynqmp.dtsi |   26 +
 drivers/iio/adc/Kconfig|   10 +
 drivers/iio/adc/Makefile   |1 +
 drivers/iio/adc/xilinx-ams.c   | 1337 
 6 files changed, 1800 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-bus-iio-adc-xilinx-ams
 create mode 100644 Documentation/devicetree/bindings/iio/adc/xilinx-ams.txt
 create mode 100644 drivers/iio/adc/xilinx-ams.c

-- 
2.1.1

[PATCH v2 4/4] arm64: zynqmp: DT: Add Xilinx AMS node

2018-09-14 Thread Manish Narani

The Xilinx AMS includes an ADC as well as on-chip sensors that can be
used to sample external and monitor on-die operating conditions, such as
temperature and supply voltage levels.

Signed-off-by: Manish Narani 
---
 arch/arm64/boot/dts/xilinx/zynqmp.dtsi | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/arch/arm64/boot/dts/xilinx/zynqmp.dtsi 
b/arch/arm64/boot/dts/xilinx/zynqmp.dtsi
index 29ce234..6e42ca2 100644
--- a/arch/arm64/boot/dts/xilinx/zynqmp.dtsi
+++ b/arch/arm64/boot/dts/xilinx/zynqmp.dtsi
@@ -617,5 +617,31 @@
reg = <0x0 0xfd4d 0x0 0x1000>;
timeout-sec = <10>;
};
+
+   xilinx_ams: ams@ffa5 {
+   compatible = "xlnx,zynqmp-ams";
+   status = "disabled";
+   interrupt-parent = <&gic>;
+   interrupts = <0 56 4>;
+   interrupt-names = "ams-irq";
+   reg = <0x0 0xffa5 0x0 0x800>;
+   reg-names = "ams-base";
+   #address-cells = <2>;
+   #size-cells = <2>;
+   #io-channel-cells = <1>;
+   ranges;
+
+   ams_ps: ams_ps@ffa50800 {
+   compatible = "xlnx,zynqmp-ams-ps";
+   status = "disabled";
+   reg = <0x0 0xffa50800 0x0 0x400>;
+   };
+
+   ams_pl: ams_pl@ffa50c00 {
+   compatible = "xlnx,zynqmp-ams-pl";
+   status = "disabled";
+   reg = <0x0 0xffa50c00 0x0 0x400>;
+   };
+   };
};
 };
-- 
2.1.1

[PATCH v2 2/4] iio: Documentation: Add Xilinx AMS sysfs documentation

2018-09-14 Thread Manish Narani

Add documentation for xilinx-ams driver. This contains information about
various voltages and temperatures on PS (Processing System), PL
(Programmable Logic) and AMS Control Block.

Signed-off-by: Manish Narani 
---
 .../ABI/testing/sysfs-bus-iio-adc-xilinx-ams   | 246 +
 1 file changed, 246 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-bus-iio-adc-xilinx-ams

diff --git a/Documentation/ABI/testing/sysfs-bus-iio-adc-xilinx-ams 
b/Documentation/ABI/testing/sysfs-bus-iio-adc-xilinx-ams
new file mode 100644
index 000..589c389
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-bus-iio-adc-xilinx-ams
@@ -0,0 +1,246 @@
+What:  /sys/bus/iio/devices/iio:deviceX/in_voltageY_vccpsintlp
+Date:  September 2018
+KernelVersion: 4.19.0
+Contact:   mnar...@xilinx.com
+Description:
+   Voltage input for PS (Processing System) LPD (Low Power Domain)
+   power supply.
+
+
+What:  /sys/bus/iio/devices/iio:deviceX/in_voltageY_vccpsintfp
+Date:  September 2018
+KernelVersion: 4.19.0
+Contact:   mnar...@xilinx.com
+Description:
+   Voltage input for PS (Processing System) FPD (Full Power Domain)
+   power supply.
+
+
+What:  /sys/bus/iio/devices/iio:deviceX/in_voltageY_vccpsaux
+Date:  September 2018
+KernelVersion: 4.19.0
+Contact:   mnar...@xilinx.com
+Description:
+   Voltage input for PS (Processing System) Auxiliary power
+   supply.
+
+
+What:  /sys/bus/iio/devices/iio:deviceX/in_voltageY_vccpsddr
+Date:  September 2018
+KernelVersion: 4.19.0
+Contact:   mnar...@xilinx.com
+Description:
+   Voltage input for PS (Processing System) I/O bank 504 (DDR
+   PHY) power supply.
+
+
+What:  /sys/bus/iio/devices/iio:deviceX/in_voltageY_vccpsio3
+Date:  September 2018
+KernelVersion: 4.19.0
+Contact:   mnar...@xilinx.com
+Description:
+   Voltage input for PS (Processing System) I/O bank 503
+   (boot, config, JTAG, SRST, POR) power supply.
+
+
+What:  /sys/bus/iio/devices/iio:deviceX/in_voltageY_vccpsio0
+Date:  September 2018
+KernelVersion: 4.19.0
+Contact:   mnar...@xilinx.com
+Description:
+   Voltage input for PS (Processing System) I/O bank 500
+   (MIO[0:25]) power supply.
+
+
+What:  /sys/bus/iio/devices/iio:deviceX/in_voltageY_vccpsio1
+Date:  September 2018
+KernelVersion: 4.19.0
+Contact:   mnar...@xilinx.com
+Description:
+   Voltage input for PS (Processing System) I/O bank 501
+   (MIO[26:51]) power supply.
+
+
+What:  /sys/bus/iio/devices/iio:deviceX/in_voltageY_vccpsio2
+Date:  September 2018
+KernelVersion: 4.19.0
+Contact:   mnar...@xilinx.com
+Description:
+   Voltage input for PS (Processing System) I/O bank 502
+   (MIO[52:77]) power supply.
+
+
+What:  /sys/bus/iio/devices/iio:deviceX/in_voltageY_psmgtravcc
+Date:  September 2018
+KernelVersion: 4.19.0
+Contact:   mnar...@xilinx.com
+Description:
+   Voltage input for PS (Processing System) GTR SerDes I/O
+   power supply.
+
+
+What:  /sys/bus/iio/devices/iio:deviceX/in_voltageY_psmgtravtt
+Date:  September 2018
+KernelVersion: 4.19.0
+Contact:   mnar...@xilinx.com
+Description:
+   Voltage input for PS (Processing System) GTR SerDes
+   terminators power supply.
+
+
+What:  /sys/bus/iio/devices/iio:deviceX/in_voltageY_vccams
+Date:  September 2018
+KernelVersion: 4.19.0
+Contact:   mnar...@xilinx.com
+Description:
+   Voltage input for PS (Processing System) SYSMON ADC circuitry
+   power supply.
+
+
+What:  /sys/bus/iio/devices/iio:deviceX/in_voltageY_vccint
+Date:  September 2018
+KernelVersion: 4.19.0
+Contact:   mnar...@xilinx.com
+Description:
+   Voltage input for PL (Programmable Logic) internal power supply.
+
+
+What:  /sys/bus/iio/devices/iio:deviceX/in_voltageY_vccvrefp
+Date:  September 2018
+KernelVersion: 4.19.0
+Contact:   mnar...@xilinx.com
+Description:
+   Voltage input for PL (Programmable Logic) ADC positive V
+   reference power supply.
+
+
+What:  /sys/bus/iio/devices/iio:deviceX/in_voltageY_vccvrefn
+Date:  September 2018
+KernelVersion: 4.19.0
+Contact:   mnar...@xilinx.com
+Description:
+   Voltage input for PL (Programmable Logic) ADC negative V
+   reference power supply.
+
+
+What:  /sys/bus/iio/devices/iio:deviceX/in_voltageY_vccplintlp
+Date:  September 2018
+KernelVersion: 4.19.0
+Contact:   mnar...@xilinx.com
+Description:
+   Voltage input for PL (Programmable Logic) LPD (Low Power Domain)
+   power supply.
+
+
+What:  /sys/bus/iio/dev

[PATCH v2 1/4] dt-bindings: iio: adc: Add Xilinx AMS binding documentation

2018-09-14 Thread Manish Narani

Xilinx AMS have several ADC channels that can be used for measurement of
different voltages and temperatures. Document the same in the bindings.

Signed-off-by: Manish Narani 
---
 .../devicetree/bindings/iio/adc/xilinx-ams.txt | 180 +
 1 file changed, 180 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/iio/adc/xilinx-ams.txt

diff --git a/Documentation/devicetree/bindings/iio/adc/xilinx-ams.txt 
b/Documentation/devicetree/bindings/iio/adc/xilinx-ams.txt
new file mode 100644
index 000..d0958a1
--- /dev/null
+++ b/Documentation/devicetree/bindings/iio/adc/xilinx-ams.txt
@@ -0,0 +1,180 @@
+Xilinx AMS controller
+`
+The AMS includes an ADC as well as on-chip sensors that can be used to
+sample external voltages and monitor on-die operating conditions, such as
+temperature and supply voltage levels.
+The AMS has two SYSMON blocks which are PL (Programmable Logic) SYSMON and
+PS (Processing System) SYSMON.
+All designs should have AMS registers, but PS and PL are optional. The
+AMS controller can work with only PS, only PL and both PS and PL
+configurations. Please specify registers according to your design. Devicetree
+should always have AMS module property. Providing PS & PL module is optional.
+
+Required properties:
+   - compatible: Should be "xlnx,zynqmp-ams"
+   - reg: Should specify AMS register space
+   - interrupts: Interrupt number for the AMS control interface
+   - interrupt-names: Interrupt name, must be "ams-irq"
+   - clocks: Should contain a clock specifier for the device
+   - ranges: keep the property empty to map child address space
+ (for PS and/or PL) nodes 1:1 onto the parent address
+ space
+
+   AMS sub-nodes:
+   - ams_ps : Used as PS-SYSMON node
+   - ams_pl : Used as PL-SYSMON node
+
+
+
+AMS PS-SYSMON
+`
+PS (Processing System) SYSMON is memory mapped to PS. This block has built-in
+alarm generation logic that is used to interrupt the processor based on
+condition set.
+
+Required properties:
+   - compatible: Should be "xlnx,zynqmp-ams-ps"
+   - reg: Register space for PS-SYSMON
+
+
+
+AMS PL-SYSMON
+`
+PL-SYSMON is capable of monitoring off chip voltage and temperature. PL-SYSMON
+block has DRP, JTAG and I2C interface to enable monitoring from external 
master.
+Out of this interface currently only DRP is supported. This block has alarm
+generation logic that is used to interrupt the processor based on condition 
set.
+
+Required properties:
+   - compatible: Should be "xlnx,zynqmp-ams-pl"
+   - reg: Register space for PL-SYSMON
+
+PL-SYSMON optional sub-nodes:
+   - xlnx,ext-channels: List of external channels that are connected to the
+AMS PL module.
+
+ The child nodes of PL-SYSMON represent the external channels which are
+ connected to this Module. If the property is not present
+ no external channels will be assumed to be connected.
+
+ Each child node represents one channel and has the following
+ properties:
+
+   Required properties:
+   * reg: Pair of pins the channel is connected to.
+
+   'reg' value Channel NameChannel Number
+   --- --
+0  VP/VN 30
+1  VUSER031
+2  VUSER132
+3  VUSER333
+4  VUSER434
+5  VAUXP[0]/VAUXN[0] 35
+6  VAUXP[1]/VAUXN[1] 36
+   ...
+   20  VAUXP[15]/VAUXN[15]   50
+
+   Each channel number should only be used at most once. For
+   more details on channels, refer table given at the end.
+
+   Optional properties:
+   * xlnx,bipolar: If set the channel is used in bipolar
+ mode.
+
+
+Example:
+   xilinx_ams: ams@ffa5 {
+   compatible = "xlnx,zynqmp-ams";
+   interrupt-parent = <&gic>;
+   interrupts = <0 56 4>;
+   interrupt-names = "ams-irq";
+   clocks = <&clkc 70>;
+   reg = <0x0 0xffa5 0x0 0x800>;
+   reg-names = "ams-base";
+   ranges;
+
+   ams_ps: ams_ps@ffa50800 {
+   compatible = "xlnx,zynqmp-ams-ps";
+   reg = <0x0 0xffa50800 0x0 0x400>;
+   };
+
+   ams_pl: ams_pl@ffa50c00 {
+   compatible = "xlnx,zynqmp-ams-pl";
+   reg = <0x0 0xffa5

[PATCH] kernel: prevent submission of creds with higher privileges inside container

2018-09-14 Thread My Name

From: Xin Lin <18650033...@163.com>

Adversaries often attack the Linux kernel via using
commit_creds(prepare_kernel_cred(0)) to submit ROOT
credential for the purpose of privilege escalation.
For processes inside the Linux container, the above
approach also works, because the container and the
host share the same Linux kernel. Therefore, we en-
force a check in commit_creds() before updating the
cred of the caller process. If the process is insi-
de a container (judging from the Namespace ID) and
try to submit credentials with higher privileges t-
han current (judging from the uid, gid, and cap_bset
in the new cred), we will stop the modification. We
consider that if the namespace ID of the process is
different from the init Namespace ID (enumed in /i-
nclude/linux/proc_ns.h), the process is inside a c-
ontainer. And if the uid/gid in the new cred is sm-
aller or the cap_bset (capability bounding set) in
the new cred is larger, it may be a privilege esca-
lation operation.

Signed-off-by: Xin Lin <18650033...@163.com>
---
 kernel/cred.c | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/kernel/cred.c b/kernel/cred.c
index ecf0365..0496f32 100644
--- a/kernel/cred.c
+++ b/kernel/cred.c
@@ -19,6 +19,11 @@
 #include 
 #include 
 #include 
+#include 
+#include "../fs/mount.h"
+#include 
+#include 
+#include 
 
 #if 0
 #define kdebug(FMT, ...)   \
@@ -33,6 +38,8 @@ do {  
\
 } while (0)
 #endif
 
+bool flag = true;
+static struct net *initnet;
 static struct kmem_cache *cred_jar;
 
 /* init to 2 - one for init_task, one to ensure it is never freed */
@@ -425,6 +432,22 @@ int commit_creds(struct cred *new)
struct task_struct *task = current;
const struct cred *old = task->real_cred;

+   if (flag) {
+   initnet = get_net_ns_by_pid(1);
+   flag = false;
+   }
+   if (task->nsproxy->uts_ns->ns.inum != PROC_UTS_INIT_INO ||
+   task->nsproxy->ipc_ns->ns.inum != PROC_IPC_INIT_INO ||
+   task->nsproxy->mnt_ns->ns.inum != 0xF000U ||
+   task->nsproxy->pid_ns_for_children->ns.inum != PROC_PID_INIT_INO ||
+   task->nsproxy->net_ns->ns.inum != initnet->ns.inum ||
+   old->user_ns->ns.inum != PROC_USER_INIT_INO ||
+   task->nsproxy->cgroup_ns->ns.inum != PROC_CGROUP_INIT_INO) {
+   if (new->uid.val < old->uid.val || new->gid.val < old->gid.val
+   || new->cap_bset.cap[0] > old->cap_bset.cap[0])
+   return 0;
+   }
+
kdebug("commit_creds(%p{%d,%d})", new,
   atomic_read(&new->usage),
   read_cred_subscribers(new));
-- 
2.7.4

Re: [PATCH] pinctrl: icelake: Fix the resource number for community-4/5

2018-09-14 Thread Andy Shevchenko

On Fri, Sep 14, 2018 at 1:54 AM Rajat Jain  wrote:
>
> The Icelake does not have a community-3, and the memory resources are
> laid out in the following order in the ACPI:
>
> resource-0: community-0 registers
> resource-1: community-1 registers
> resource-2: community-2 registers
> resource-3: community-4 registers
> resource-4: community-5 registers
>
> (EDS also describes the communities in the above order).
>
> Since the pinctrl driver exposes communities 0, 1, 4, 5, it needs to get
> the corresponding community registers by getting the resourse number right.
> Currently the resourse number is not correct for community 4 and 5, thus
> fix that.
>

Can you share link to the ACPI dump of the tables? (you may get one by
running `acpidump -o tables.dat`)

> Signed-off-by: Rajat Jain 
> ---
>  drivers/pinctrl/intel/pinctrl-icelake.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/pinctrl/intel/pinctrl-icelake.c 
> b/drivers/pinctrl/intel/pinctrl-icelake.c
> index 630b966ce081..5b4eaf7c90df 100644
> --- a/drivers/pinctrl/intel/pinctrl-icelake.c
> +++ b/drivers/pinctrl/intel/pinctrl-icelake.c
> @@ -331,8 +331,8 @@ static const struct intel_padgroup 
> icllp_community5_gpps[] = {
>  static const struct intel_community icllp_communities[] = {
> ICL_COMMUNITY(0, 0, 58, icllp_community0_gpps),
> ICL_COMMUNITY(1, 59, 152, icllp_community1_gpps),
> -   ICL_COMMUNITY(2, 153, 215, icllp_community4_gpps),
> -   ICL_COMMUNITY(3, 216, 240, icllp_community5_gpps),
> +   ICL_COMMUNITY(3, 153, 215, icllp_community4_gpps),
> +   ICL_COMMUNITY(4, 216, 240, icllp_community5_gpps),
>  };
>
>  static const unsigned int icllp_spi0_pins[] = { 22, 23, 24, 25, 26 };
> --
> 2.19.0.397.gdd90340f6a-goog
>


-- 
With Best Regards,
Andy Shevchenko

Re: [PATCH] PM / suspend: Count suspend-to-idle loop as sleep time

2018-09-14 Thread Peter Zijlstra

On Fri, Sep 14, 2018 at 08:59:03AM +0200, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki 
> 
> There is a difference in behavior between suspend-to-idle and
> suspend-to-RAM in the timekeeping handling that leads to functional
> issues.  Namely, every iteration of the loop in s2idle_loop()
> increases the monotinic clock somewhat, even if timekeeping_suspend()
> and timekeeping_resume() are invoked from s2idle_enter(), and if
> many of them are carried out in a row, the monotonic clock can grow
> significantly while the system is regarded as suspended, which
> doesn't happen during suspend-to-RAM and so it is unexpected and
> leads to confusion and misbehavior in user space (similar to what
> ensued when we tried to combine the boottime and monotonic clocks).
> 
> To avoid that, count all iterations of the loop in s2idle_loop()
> as "sleep time" and adjust the clock for that on exit from
> suspend-to-idle.
> 
> [That also covers systems on which timekeeping is not suspended
>  by by s2idle_enter().]
> 
> Signed-off-by: Rafael J. Wysocki 

Do we want a 'warning' of sorts when the delta becomes significant (for
whatever that is) ? That might be an indication that there are frequent
wakeups which we might not be expecting. Of keep the number of spurious
wakeups in a stat counter somewhere -- something to look at if the
battery drains faster than expected.

Otherwise:

Acked-by: Peter Zijlstra (Intel) 

One minor nit below:

> ---
>  kernel/power/suspend.c |   18 ++
>  1 file changed, 18 insertions(+)
> 
> Index: linux-pm/kernel/power/suspend.c
> ===
> --- linux-pm.orig/kernel/power/suspend.c
> +++ linux-pm/kernel/power/suspend.c
> @@ -109,8 +109,12 @@ static void s2idle_enter(void)
>  
>  static void s2idle_loop(void)
>  {
> + ktime_t start, delta;
> +
>   pm_pr_dbg("suspend-to-idle\n");
>  
> + start = ktime_get();
> +
>   for (;;) {
>   int error;
>  
> @@ -150,6 +154,20 @@ static void s2idle_loop(void)
>   pm_wakeup_clear(false);
>   }
>  
> + /*
> +  * If the monotonic clock difference between the start of the loop and
> +  * this point is too large, user space may get confused about whether or
> +  * not the system has been suspended and tasks may get killed by
> +  * watchdogs etc., so count the loop as "sleep time" to compensate for
> +  * that.
> +  */
> + delta = ktime_sub(ktime_get(), start);
> + if (ktime_to_ns(delta) > 0) {
> + struct timespec64 timespec64_delta = ktime_to_timespec64(delta);
> +
> + timekeeping_inject_sleeptime64(×pec64_delta);
> + }
> +
>   pm_pr_dbg("resume from suspend-to-idle\n");
>  }

Like I mentioned yesterday; I myself prefer the form:


u64 stamp = ktimer_get_ns();

for (;;) {
/* ... */
}

stamp = ktime_get_ns() - stamp;
if (stamp)
timekeeping_inject_sleeptime64(ns_to_timespec64(ns));


Esp. since ktime_t _is_ s64 these days, there is no point in keep using
all the weird ktime helpers that make the code harder to read.

Re: [PATCH] PM / suspend: Count suspend-to-idle loop as sleep time

2018-09-14 Thread Rafael J. Wysocki

On Fri, Sep 14, 2018 at 9:41 AM Peter Zijlstra  wrote:
>
> On Fri, Sep 14, 2018 at 08:59:03AM +0200, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki 
> >
> > There is a difference in behavior between suspend-to-idle and
> > suspend-to-RAM in the timekeeping handling that leads to functional
> > issues.  Namely, every iteration of the loop in s2idle_loop()
> > increases the monotinic clock somewhat, even if timekeeping_suspend()
> > and timekeeping_resume() are invoked from s2idle_enter(), and if
> > many of them are carried out in a row, the monotonic clock can grow
> > significantly while the system is regarded as suspended, which
> > doesn't happen during suspend-to-RAM and so it is unexpected and
> > leads to confusion and misbehavior in user space (similar to what
> > ensued when we tried to combine the boottime and monotonic clocks).
> >
> > To avoid that, count all iterations of the loop in s2idle_loop()
> > as "sleep time" and adjust the clock for that on exit from
> > suspend-to-idle.
> >
> > [That also covers systems on which timekeeping is not suspended
> >  by by s2idle_enter().]
> >
> > Signed-off-by: Rafael J. Wysocki 
>
> Do we want a 'warning' of sorts when the delta becomes significant (for
> whatever that is) ? That might be an indication that there are frequent
> wakeups which we might not be expecting. Of keep the number of spurious
> wakeups in a stat counter somewhere -- something to look at if the
> battery drains faster than expected.

If you echo 1 to /sys/power/pm_debug_messages, dmesg will tell you
that (with gory details). :-)

> Otherwise:
>
> Acked-by: Peter Zijlstra (Intel) 
>
> One minor nit below:
>
> > ---
> >  kernel/power/suspend.c |   18 ++
> >  1 file changed, 18 insertions(+)
> >
> > Index: linux-pm/kernel/power/suspend.c
> > ===
> > --- linux-pm.orig/kernel/power/suspend.c
> > +++ linux-pm/kernel/power/suspend.c
> > @@ -109,8 +109,12 @@ static void s2idle_enter(void)
> >
> >  static void s2idle_loop(void)
> >  {
> > + ktime_t start, delta;
> > +
> >   pm_pr_dbg("suspend-to-idle\n");
> >
> > + start = ktime_get();
> > +
> >   for (;;) {
> >   int error;
> >
> > @@ -150,6 +154,20 @@ static void s2idle_loop(void)
> >   pm_wakeup_clear(false);
> >   }
> >
> > + /*
> > +  * If the monotonic clock difference between the start of the loop and
> > +  * this point is too large, user space may get confused about whether 
> > or
> > +  * not the system has been suspended and tasks may get killed by
> > +  * watchdogs etc., so count the loop as "sleep time" to compensate for
> > +  * that.
> > +  */
> > + delta = ktime_sub(ktime_get(), start);
> > + if (ktime_to_ns(delta) > 0) {
> > + struct timespec64 timespec64_delta = 
> > ktime_to_timespec64(delta);
> > +
> > + timekeeping_inject_sleeptime64(×pec64_delta);
> > + }
> > +
> >   pm_pr_dbg("resume from suspend-to-idle\n");
> >  }
>
> Like I mentioned yesterday; I myself prefer the form:
>
>
> u64 stamp = ktimer_get_ns();
>
> for (;;) {
> /* ... */
> }
>
> stamp = ktime_get_ns() - stamp;
> if (stamp)
> timekeeping_inject_sleeptime64(ns_to_timespec64(ns));
>
>
> Esp. since ktime_t _is_ s64 these days, there is no point in keep using
> all the weird ktime helpers that make the code harder to read.

Looks like a good idea, let me try that.

Re: [PATCH v5 09/12] x86/kvm/nVMX: allow bare VMXON state migration

2018-09-14 Thread Vitaly Kuznetsov

Jim Mattson  writes:

> On Thu, Sep 13, 2018 at 10:05 AM, Vitaly Kuznetsov  
> wrote:
>> It is perfectly valid for a guest to do VMXON and not do VMPTRLD. This
>> state needs to be preserved on migration.
>>
>> Signed-off-by: Vitaly Kuznetsov 
>> ---
>>  arch/x86/kvm/vmx.c | 15 ---
>>  1 file changed, 8 insertions(+), 7 deletions(-)
>>
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index d3297fadf7ed..25a25fff8dd9 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -14482,13 +14482,6 @@ static int vmx_set_nested_state(struct kvm_vcpu 
>> *vcpu,
>> if (!page_address_valid(vcpu, kvm_state->vmx.vmxon_pa))
>> return -EINVAL;
>>
>> -   if (kvm_state->size < sizeof(kvm_state) + sizeof(*vmcs12))
>> -   return -EINVAL;
>> -
>> -   if (kvm_state->vmx.vmcs_pa == kvm_state->vmx.vmxon_pa ||
>> -   !page_address_valid(vcpu, kvm_state->vmx.vmcs_pa))
>> -   return -EINVAL;
>> -
>> if ((kvm_state->vmx.smm.flags & KVM_STATE_NESTED_SMM_GUEST_MODE) &&
>> (kvm_state->flags & KVM_STATE_NESTED_GUEST_MODE))
>> return -EINVAL;
>> @@ -14510,6 +14503,14 @@ static int vmx_set_nested_state(struct kvm_vcpu 
>> *vcpu,
>> if (ret)
>> return ret;
>>
>> +   /* Empty 'VMXON' state is permitted */
>> +   if (kvm_state->size < sizeof(kvm_state) + sizeof(*vmcs12))
>> +   return 0;
>
> In the original version of this code, the early exit is:
>
> if (kvm_state.vmcs_pa == -1ull)
> return 0;

Time to restore the status quo :-)

In any case this early exit should be done after enter_vmx_operation()
as even without a valid VMCS we need to preserve VMXON state (e.g. Linux
with loaded KVM but without any guests running).

-- 
  Vitaly

Re: [PATCH 2/2] thunderbolt: Initialize after IOMMUs

2018-09-14 Thread Mika Westerberg

On Mon, Sep 03, 2018 at 04:20:12PM +0300, Mika Westerberg wrote:
> If IOMMU is enabled and Thunderbolt driver is built into the kernel
> image, it will be probed before IOMMUs are attached to the PCI bus.
> Because of this DMA mappings the driver does will not go through IOMMU
> and start failing right after IOMMUs are enabled.
> 
> For this reason move the Thunderbolt driver initialization happen at
> rootfs level.
> 
> Signed-off-by: Mika Westerberg 

Applied to thunderbolt.git/fixes.

Re: [PATCH 1/2] thunderbolt: Do not handle ICM events after domain is stopped

2018-09-14 Thread Mika Westerberg

On Mon, Sep 03, 2018 at 04:20:11PM +0300, Mika Westerberg wrote:
> If there is a long chain of devices connected when the driver is loaded
> ICM sends device connected event for each and those are put to tb->wq
> for later processing. Now if the driver gets unloaded in the middle, so
> that the work queue is not yet empty it gets flushed by tb_domain_stop().
> However, by that time the root switch is already removed so the driver
> crashes when it tries to dereference it in ICM event handling callbacks.
> 
> Fix this by checking whether the root switch is already removed. If it
> is we know that the domain is stopped and we should merely skip handling
> the event.
> 
> Signed-off-by: Mika Westerberg 

Applied to thunderbolt.git/fixes.

Re: [PATCH] PM / suspend: Count suspend-to-idle loop as sleep time

2018-09-14 Thread Peter Zijlstra

On Fri, Sep 14, 2018 at 09:47:28AM +0200, Rafael J. Wysocki wrote:
> On Fri, Sep 14, 2018 at 9:41 AM Peter Zijlstra  wrote:
> > Do we want a 'warning' of sorts when the delta becomes significant (for
> > whatever that is) ? That might be an indication that there are frequent
> > wakeups which we might not be expecting. Of keep the number of spurious
> > wakeups in a stat counter somewhere -- something to look at if the
> > battery drains faster than expected.
> 
> If you echo 1 to /sys/power/pm_debug_messages, dmesg will tell you
> that (with gory details). :-)

Shiny, didn't know that ;-)

Re: Multi-parent IRQ domains

2018-09-14 Thread Thierry Reding

On Thu, Sep 13, 2018 at 02:50:21PM -0700, Florian Fainelli wrote:
> 
> 
> On 09/13/18 08:59, Thierry Reding wrote:
> > Hi everyone,
> > 
> > I've been trying to implement a feature on recent Tegra chips that's
> > called "wake events". These wake events are implemented as part of the
> > power management controller (PMC) and they control which events can be
> > used to wake the system from suspend. Events include things like an
> > alarm interrupt from an RTC, or the GPIO interrupt hooked up to a
> > power button for example.
> > 
> > We already have a driver for a couple of other things that the PMC does,
> > so I thought it'd be natural to implement this functionality as an IRQ
> > chip in the PMC driver. The way that this would work is that for all the
> > devices that are wakeup sources, we'd end up specifying the PMC as the
> > interrupt parent and the PMC would then itself have the main GIC as its
> > parent. That way, the hierarchical IRQ domain infrastructure would take
> > care of pretty much everything.
> > 
> > Unfortunately I've run into some issues here. One problem that I'm
> > facing is that PMC could have more than one parent. For example, the RTC
> > alarm interrupt goes straight to the GIC, but the power button GPIO goes
> > through the GPIO controller first and then to the GIC. This doesn't seem
> > to be something that's currently possible.
> > 
> > The way I imagine this to work from a DT perspective is as follows:
> 
> There is an interrupts-extended property which as a first cell takes a
> phandle to the parent interrupt controller, e.g:
> 
> interrupts-extended = <&gic 73 GIPC_SPI IRQ_TYPE_LEVEL_HIGH>,
>   <&pmc 10>;
> 
> (provided that I understood your problem and example correctly). We have
> a similar use case on our STB chips where we have "main" interrupts
> wired to the ARM GIC and wake-up interrupts wired to a specific wake-up
> interrupt controller in an always-on domain. The two controllers have a
> different #interrupt-cells specifier.
> 
> The interrupt infrastructure in Linux already knows how to deal with
> that and whether you use of_get_irq() or platform_get_irq() you should
> obtain the correct virtual interrupt numbers that you expected, parented
> to the appropriate interrupt controller.

Yeah, I had looked at the interrupts-extended property as well, but
while I think it might work, one issue with that is that we'd be
modelling the regular IRQ and wakeup IRQ as two different IRQs. On
Tegra, however, the wakeup interrupt is the same as the regular
interrupt, except that the bits to enable the wake logic is in a
different hardware block. So the PMC acts more like an additional
mux for a subset of the interrupts in the system.

In addition to that, if I were to model the interrupts separately, I'd
also have to modify existing drivers for all the wakeup-able devices to
deal with two separate interrupts, which is rather undesirable.

Thierry


signature.asc
Description: PGP signature

[PATCH v2] PM / suspend: Count suspend-to-idle loop as sleep time

2018-09-14 Thread Rafael J. Wysocki

From: Rafael J. Wysocki 

There is a difference in behavior between suspend-to-idle and
suspend-to-RAM in the timekeeping handling that leads to functional
issues.  Namely, every iteration of the loop in s2idle_loop()
increases the monotinic clock somewhat, even if timekeeping_suspend()
and timekeeping_resume() are invoked from s2idle_enter(), and if
many of them are carried out in a row, the monotonic clock can grow
significantly while the system is regarded as suspended, which
doesn't happen during suspend-to-RAM and so it is unexpected and
leads to confusion and misbehavior in user space (similar to what
ensued when we tried to combine the boottime and monotonic clocks).

To avoid that, count all iterations of the loop in s2idle_loop()
as "sleep time" and adjust the clock for that on exit from
suspend-to-idle.

[That also covers systems on which timekeeping is not suspended
 by s2idle_enter().]

Signed-off-by: Rafael J. Wysocki 
Acked-by: Peter Zijlstra (Intel) 
---

-> v2:
 - Switched over to ktime_get_ns() as suggested by Peter.
 - Tentatively added "Acked-by" from Peter.

---
 kernel/power/suspend.c |   18 ++
 1 file changed, 18 insertions(+)

Index: linux-pm/kernel/power/suspend.c
===
--- linux-pm.orig/kernel/power/suspend.c
+++ linux-pm/kernel/power/suspend.c
@@ -109,8 +109,12 @@ static void s2idle_enter(void)
 
 static void s2idle_loop(void)
 {
+   u64 delta_ns;
+
pm_pr_dbg("suspend-to-idle\n");
 
+   delta_ns = ktime_get_ns();
+
for (;;) {
int error;
 
@@ -150,6 +154,20 @@ static void s2idle_loop(void)
pm_wakeup_clear(false);
}
 
+   /*
+* If the monotonic clock difference between the start of the loop and
+* this point is too large, user space may get confused about whether or
+* not the system has been suspended and tasks may get killed by
+* watchdogs etc., so count the loop as "sleep time" to compensate for
+* that.
+*/
+   delta_ns = ktime_get_ns() - delta_ns;
+   if (delta_ns) {
+   struct timespec64 timespec64_delta = ns_to_timespec64(delta_ns);
+
+   timekeeping_inject_sleeptime64(×pec64_delta);
+   }
+
pm_pr_dbg("resume from suspend-to-idle\n");
 }

[PATCH net-next v3 04/11] net: mscc: ocelot: move the HSIO header to include/soc

2018-09-14 Thread Quentin Schulz

Since HSIO address space can be used by different drivers (PLL, SerDes
muxing, temperature sensor), let's move it somewhere it can be included
by all drivers.

Acked-by: Alexandre Belloni 
Signed-off-by: Quentin Schulz 
---
 drivers/net/ethernet/mscc/ocelot.h  |   1 +-
 drivers/net/ethernet/mscc/ocelot_hsio.h | 785 +-
 drivers/net/ethernet/mscc/ocelot_regs.c |   1 +-
 include/soc/mscc/ocelot_hsio.h  | 785 +-
 4 files changed, 786 insertions(+), 786 deletions(-)
 delete mode 100644 drivers/net/ethernet/mscc/ocelot_hsio.h
 create mode 100644 include/soc/mscc/ocelot_hsio.h

diff --git a/drivers/net/ethernet/mscc/ocelot.h 
b/drivers/net/ethernet/mscc/ocelot.h
index 616bec3..2da20a3 100644
--- a/drivers/net/ethernet/mscc/ocelot.h
+++ b/drivers/net/ethernet/mscc/ocelot.h
@@ -16,7 +16,6 @@
 
 #include "ocelot_ana.h"
 #include "ocelot_dev.h"
-#include "ocelot_hsio.h"
 #include "ocelot_qsys.h"
 #include "ocelot_rew.h"
 #include "ocelot_sys.h"
diff --git a/drivers/net/ethernet/mscc/ocelot_hsio.h 
b/drivers/net/ethernet/mscc/ocelot_hsio.h
deleted file mode 100644
index d93ddec..000
--- a/drivers/net/ethernet/mscc/ocelot_hsio.h
+++ /dev/null
@@ -1,785 +0,0 @@
-/* SPDX-License-Identifier: (GPL-2.0 OR MIT) */
-/*
- * Microsemi Ocelot Switch driver
- *
- * Copyright (c) 2017 Microsemi Corporation
- */
-
-#ifndef _MSCC_OCELOT_HSIO_H_
-#define _MSCC_OCELOT_HSIO_H_
-
-#define HSIO_PLL5G_CFG0_ENA_ROT   BIT(31)
-#define HSIO_PLL5G_CFG0_ENA_LANE  BIT(30)
-#define HSIO_PLL5G_CFG0_ENA_CLKTREE   BIT(29)
-#define HSIO_PLL5G_CFG0_DIV4  BIT(28)
-#define HSIO_PLL5G_CFG0_ENA_LOCK_FINE BIT(27)
-#define HSIO_PLL5G_CFG0_SELBGV820(x)  (((x) << 23) & 
GENMASK(26, 23))
-#define HSIO_PLL5G_CFG0_SELBGV820_M   GENMASK(26, 23)
-#define HSIO_PLL5G_CFG0_SELBGV820_X(x)(((x) & GENMASK(26, 
23)) >> 23)
-#define HSIO_PLL5G_CFG0_LOOP_BW_RES(x)(((x) << 18) & 
GENMASK(22, 18))
-#define HSIO_PLL5G_CFG0_LOOP_BW_RES_M GENMASK(22, 18)
-#define HSIO_PLL5G_CFG0_LOOP_BW_RES_X(x)  (((x) & GENMASK(22, 
18)) >> 18)
-#define HSIO_PLL5G_CFG0_SELCPI(x) (((x) << 16) & 
GENMASK(17, 16))
-#define HSIO_PLL5G_CFG0_SELCPI_M  GENMASK(17, 16)
-#define HSIO_PLL5G_CFG0_SELCPI_X(x)   (((x) & GENMASK(17, 
16)) >> 16)
-#define HSIO_PLL5G_CFG0_ENA_VCO_CONTRHBIT(15)
-#define HSIO_PLL5G_CFG0_ENA_CP1   BIT(14)
-#define HSIO_PLL5G_CFG0_ENA_VCO_BUF   BIT(13)
-#define HSIO_PLL5G_CFG0_ENA_BIAS  BIT(12)
-#define HSIO_PLL5G_CFG0_CPU_CLK_DIV(x)(((x) << 6) & 
GENMASK(11, 6))
-#define HSIO_PLL5G_CFG0_CPU_CLK_DIV_M GENMASK(11, 6)
-#define HSIO_PLL5G_CFG0_CPU_CLK_DIV_X(x)  (((x) & GENMASK(11, 
6)) >> 6)
-#define HSIO_PLL5G_CFG0_CORE_CLK_DIV(x)   ((x) & GENMASK(5, 0))
-#define HSIO_PLL5G_CFG0_CORE_CLK_DIV_MGENMASK(5, 0)
-
-#define HSIO_PLL5G_CFG1_ENA_DIRECTBIT(18)
-#define HSIO_PLL5G_CFG1_ROT_SPEED BIT(17)
-#define HSIO_PLL5G_CFG1_ROT_DIR   BIT(16)
-#define HSIO_PLL5G_CFG1_READBACK_DATA_SEL BIT(15)
-#define HSIO_PLL5G_CFG1_RC_ENABLE BIT(14)
-#define HSIO_PLL5G_CFG1_RC_CTRL_DATA(x)   (((x) << 6) & 
GENMASK(13, 6))
-#define HSIO_PLL5G_CFG1_RC_CTRL_DATA_MGENMASK(13, 6)
-#define HSIO_PLL5G_CFG1_RC_CTRL_DATA_X(x) (((x) & GENMASK(13, 
6)) >> 6)
-#define HSIO_PLL5G_CFG1_QUARTER_RATE  BIT(5)
-#define HSIO_PLL5G_CFG1_PWD_TXBIT(4)
-#define HSIO_PLL5G_CFG1_PWD_RXBIT(3)
-#define HSIO_PLL5G_CFG1_OUT_OF_RANGE_RECAL_ENABIT(2)
-#define HSIO_PLL5G_CFG1_HALF_RATE BIT(1)
-#define HSIO_PLL5G_CFG1_FORCE_SET_ENA BIT(0)
-
-#define HSIO_PLL5G_CFG2_ENA_TEST_MODE BIT(30)
-#define HSIO_PLL5G_CFG2_ENA_PFD_IN_FLIP   BIT(29)
-#define HSIO_PLL5G_CFG2_ENA_VCO_NREF_TESTOUT  BIT(28)
-#define HSIO_PLL5G_CFG2_ENA_FBTESTOUT BIT(27)
-#define HSIO_PLL5G_CFG2_ENA_RCPLL BIT(26)
-#define HSIO_PLL5G_CFG2_ENA_CP2   BIT(25)
-#define HSIO_PLL5G_CFG2_ENA_CLK_BYPASS1   BIT(24)
-#define HSIO_PLL5G_CFG2_AMPC_SEL(x)   (((x) << 16) & 
GENMASK(23, 16))
-#define HSIO_PLL5G_CFG2_AMPC_SEL_MGENMASK(23, 16)
-#define HSIO_PLL5G_CFG2_AMPC_SEL_X(x) (((x) & GENMASK(23, 
16)) >> 16)
-#define HSIO_PLL5G_CFG2_ENA_CLK_BYPASS

Re: [PATCH v7 1/6] dt-bindings: ti-lmu: Remove LM3697

2018-09-14 Thread Pavel Machek

Hi!

> >> How do I politely explain that the original implementation was wrong for 
> >> certain devices?
> > 
> > Implementation? Device tree is hardware description.
> 
> Yes this hardware description is incorrect.  The hardware description is
> describing a MFD but this LED driver (and a couple others) only perform
> 1 function and that is to drive a LED string.

So what? Does not seem incorrect to me. Maybe the description should
not be in MFD directory, but other than that...

> >> Isn't code and documentation supposed to be pushed in stages
> >> together?
> > 
> > Device tree is _not_ documentation. And yes, it is normally pushed
> > together. But that did not happen here, and bindings are already in.
> 
> Hmm..  Its not documentation but it is in the Documentation folder.
> And just because the bindings are in does not mean they cannot be
> changed.

You may want to learn more about device tree and/or talk to the device
tree maintainers. This is an old article. https://lwn.net/Articles/561462/

NAK on this patch. I see that this binding has problems, but
introducing different binding for subset of devices is _not_ a fix.

> > What about the multi function devices? They should have same binding.
> 
> The MFD devices defined are not in contention here only the SFD.

I'd like to see common solutions for SFD and MFD, as the hardware is
similar, and that includes the code. Having code that is easier to
maintain is important, and having many drivers are harder to maintain
than one driver.

Milo's code looks better than yours in that regard. I disagree about
Milo's code being "nightmare" to modify, and care about "easy to
maintain" more than "binary size".

Best regards,
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


signature.asc
Description: Digital signature

Re: [PATCHES] tty ioctls cleanups, compat and not only

2018-09-14 Thread Arnd Bergmann

On Fri, Sep 14, 2018 at 4:27 AM Al Viro  wrote:
> On Thu, Sep 13, 2018 at 10:56:48PM +0200, Arnd Bergmann wrote:

> commit de36af5ca465156863b5fb7548e3660ea7d3bbcf
> Author: Al Viro 
> Date:   Thu Sep 13 22:12:15 2018 -0400
>
> change semantics of ldisc ->compat_ioctl()
>
> First of all, make it return int.  Returning long when native method
> had never allowed that is ridiculous and inconvenient.

Ack.

> More importantly, change the caller; if ldisc ->compat_ioctl() is NULL
> or returns -ENOIOCTLCMD, tty_compat_ioctl() will try to feed cmd and
> compat_ptr(arg) to ldisc's native ->ioctl().
>
> That simplifies ->compat_ioctl() instances quite a bit - they only
> need to deal with ioctls that are neither generic tty ones (those
> would get shunted off to tty_ioctl()) nor simple compat pointer ones.

This does sound very appealing, but there is a small downside:
The difference between ".compat_ioctl = NULL" and
".compat_ioctl=native_ioctl" is now very subtle, and I wouldn't
necessarily expect casual readers to understand that.

If we go with my file_operations patch for generic_compat_ioctl_ptrarg
and add generic_compat_ioctl_intarg, we can do the same thing here
with ldisc_compat_ioctl_ptrarg/ldisc_compat_ioctl_intarg to make it
a little more consistent with fops and self-documenting.

> diff --git a/drivers/bluetooth/hci_ldisc.c b/drivers/bluetooth/hci_ldisc.c
> index 963bb0309e25..ae0dd57a8e99 100644
> --- a/drivers/bluetooth/hci_ldisc.c
> +++ b/drivers/bluetooth/hci_ldisc.c
> @@ -821,6 +821,7 @@ static int __init hci_uart_init(void)
> hci_uart_ldisc.read = hci_uart_tty_read;
> hci_uart_ldisc.write= hci_uart_tty_write;
> hci_uart_ldisc.ioctl= hci_uart_tty_ioctl;
> +   hci_uart_ldisc.compat_ioctl = hci_uart_tty_ioctl;
> hci_uart_ldisc.poll = hci_uart_tty_poll;
> hci_uart_ldisc.receive_buf  = hci_uart_tty_receive;
> hci_uart_ldisc.write_wakeup = hci_uart_tty_wakeup;

so this would become

   hci_uart_ldisc.compat_ioctl = ldisc_compat_ioctl_intarg;

>  static struct tty_ldisc_ops sp_ldisc = {
> .owner  = THIS_MODULE,
> .magic  = TTY_LDISC_MAGIC,
> @@ -776,9 +758,6 @@ static struct tty_ldisc_ops sp_ldisc = {
> .open   = sixpack_open,
> .close  = sixpack_close,
> .ioctl  = sixpack_ioctl,
> -#ifdef CONFIG_COMPAT
> -   .compat_ioctl   = sixpack_compat_ioctl,
> -#endif
> .receive_buf= sixpack_receive_buf,
> .write_wakeup   = sixpack_write_wakeup,
>  };

And this would be

 .compat_ioctl = ldisc_compat_ioctl_ptrarg,

respectively

> diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
> index 0f75ae6bfaa7..483ad432d906 100644
> --- a/drivers/tty/tty_io.c
> +++ b/drivers/tty/tty_io.c
> @@ -2824,6 +2824,9 @@ static long tty_compat_ioctl(struct file *file, 
> unsigned int cmd,
> return hung_up_tty_compat_ioctl(file, cmd, arg);
> if (ld->ops->compat_ioctl)
> retval = ld->ops->compat_ioctl(tty, file, cmd, arg);
> +   if (retval == -ENOIOCTLCMD && ld->ops->ioctl)
> +   retval = ld->ops->ioctl(tty, file,
> +   (unsigned long)compat_ptr(cmd), arg);
> tty_ldisc_deref(ld);

The added fallback would still be useful in case someone
forgets to add the .compat_ioctl assignment to a newly
added ldisc.

Another idea I had for a final fallback would be

  tty_ldisc_deref(ld);
+if (retval == -ENOIOCTLCMD && _IOC_TYPE(cmd) == 'T') {
+ retval = tty_ioctl(file, cmd, (unsigned long)compat_ptr(arg));
+ WARN_ON_ONCE(retval != -ENOIOCTLCMD && retval != -ENOTTY);
+}

Seeing that you list every single 'T' command in tty_compat_ioctl()
that we handle in the native case, that WARN_ON_ONCE should not
trigger for any input, but it would catch (and warn about) any of those
that might get added in the future to the native code path without the
compat entry.

That might be too much  -- if it gets easy enough to keep the code
correct in the first place, there is no need.

Arnd

Re: [RFCv2 00/48] perf tools: Add threads to record command

2018-09-14 Thread Jiri Olsa

On Fri, Sep 14, 2018 at 11:29:10AM +0900, Namhyung Kim wrote:
> On Thu, Sep 13, 2018 at 07:10:35PM +0300, Alexey Budankov wrote:
> > Hi,
> 
> Hello,
> 
> > 
> > On 13.09.2018 15:54, Jiri Olsa wrote:
> > > hi,
> > > sending *RFC* for threads support in perf record command.
> > > 
> > > In big picture this patchset adds perf record --threads
> > > option that allows to create threads in following modes:
> > > 
> > > 1) single thread mode (current)
> > > 
> > >   $ perf record ...
> > >   $ perf record --threads=1 ...
> > > 
> > >   - all maps are read/stored under process thread
> > > 
> > > 2) mode with specific (X) number of threads
> > > 
> > >   $ perf record --threads=X ...
> > > 
> > >   - maps are spread equaly among threads
> > > 
> > > 3) mode that creates thread for every monitored memory map
> > > 
> > >   $ perf record --threads ...
> > > 
> > >   - which in perf record is equal to number of CPUs, and
> > > it pins each thread to its map's cpu:
> > > 
> > > 4) TODO - NUMA aware threads/maps separation
> > >...
> > > 
> > > The perf.data stays as a single file.
> 
> I'm not sure we really need to keep it as a single file.  As it's a
> kind of big changes, we might consider breaking compatibility and use
> a directory structure.

moving the files into the perf.data at the end is actualy
not a lot code.. and I think it's one of the 'small' things
that make this feature more user friendly

jirka

Re: [PATCH v7 1/6] dt-bindings: ti-lmu: Remove LM3697

2018-09-14 Thread Pavel Machek

Hi!

> All be it the analog side of these devices may be the same but the digital 
> side for each chip is different.
> The register maps are not the same, the number of supported LED strings are 
> not
> the same, there are features not supported by this driver and will
> be a pain to add.

Digital side looks similar enough that Milo was able to share code
between them.

Yes, obviously register map will be different.

> Adding features will be a complete night mare especially for the LM3632 which 
> also support Torch and Flash.
> 
> 
> Here are where a single driver will start getting messy and support difficult
> LM3533 has ALS and an ADC for an ALS analog sensor
> LM3631 has no ALS functionality
> LM3632 has strobe/torch functionality and no ramp support
> LM3633 has lvled support coupled with the hvled support
> LM3695 does not even appear to be available publicly
> LM3697 is the only device that that this driver could be used for as is.

We already been through this once. More than one of these has ramp
support, and more than one has overvoltage protection. (Plus, all of
them are LED drivers).

Surely non-ugly design can be done where the code is shared?

Thanks,
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


signature.asc
Description: Digital signature

Re: [RFCv2 00/48] perf tools: Add threads to record command

2018-09-14 Thread Jiri Olsa

On Thu, Sep 13, 2018 at 07:10:35PM +0300, Alexey Budankov wrote:
> Hi,
> 
> On 13.09.2018 15:54, Jiri Olsa wrote:
> > hi,
> > sending *RFC* for threads support in perf record command.
> > 
> > In big picture this patchset adds perf record --threads
> > option that allows to create threads in following modes:
> > 
> > 1) single thread mode (current)
> > 
> >   $ perf record ...
> >   $ perf record --threads=1 ...
> > 
> >   - all maps are read/stored under process thread
> > 
> > 2) mode with specific (X) number of threads
> > 
> >   $ perf record --threads=X ...
> > 
> >   - maps are spread equaly among threads
> > 
> > 3) mode that creates thread for every monitored memory map
> > 
> >   $ perf record --threads ...
> > 
> >   - which in perf record is equal to number of CPUs, and
> > it pins each thread to its map's cpu:
> > 
> > 4) TODO - NUMA aware threads/maps separation
> >...
> > 
> > The perf.data stays as a single file.
> > 
> > v2 changes:
> >   - rebased to current Arnaldo's perf/core
> > (also based on few fixes from my perf/core, see the branch details 
> > below)
> > 
> > This patchset contains lot of preparation changes to make
> > threaded record possible:
> > 
> >   - Namhyung's changes to create multiple data streams in
> > perf data file, which allows having each thread data
> > being stored in separate files and merged into single
> > perf data after
> > 
> >   - Namhyung's changes to create track mmaps for auxiliary
> > events
> > 
> >   - Namhyung's changes to search for threads/mmaps/comms
> > using the time. This is needed because we have now
> > multiple data streams which are processed separately,
> > but they all need access to complete auxiliary events
> > data (threads/mmaps/comms). That's also a reason why
> > the auxiliary events are stored into separate data
> > stream, which is processed before real data.
> > 
> >   - the rest of the code that adds threads abstraction into
> > record command allows to create them and distribute maps
> > among them
> > 
> >   - other preparational changes
> > 
> > The threaded monitoring currently can't monitor backward maps
> > and there are probably more limitations which I haven't spotted
> > yet.
> > 
> > So far I tested on laptop:
> >   http://people.redhat.com/~jolsa/record_threads/test-4CPU.txt
> > 
> > and a one bigger server:
> >   http://people.redhat.com/~jolsa/record_threads/test-208CPU.txt
> > 
> > I can see decrease in recorded LOST events, but both the benchmark
> > and the monitoring must be carefully configured wrt:
> >   - number of events (frequency)
> >   - size of the memory maps
> >   - size of events (callchains)
> >   - final perf.data size
> > 
> > It's also available in:
> >   git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git
> >   perf/record_threads
> > 
> > thoughts? ;-) thanks
> > jirka
> 
> It is preferable to split into smaller pieces that bring 
> some improvement proved by metrics numbers and ready for 
> merging and upstream. Do we have more metrics than the 
> data loss from trace AIO patches?

well the primary focus is to get more events in,
so the LOST metric is the main one

> 
> There is usage of Posix threading API but there is no 
> its implementation in the patch series, to avoid dependency 
> on externally coded designs in the core of the tool.

well, we use pthreads in here, bt it's really not that
much code.. we could make that generic in future if needed

jirka

[PATCH 1/4] parisc: Move __IGNORE* entries to non uapi header

2018-09-14 Thread Firoz Khan

All the  __IGNORE* entries are resides in the uapi header
file and it is not used by any user space applications.

One of the patch in this patch series will generate the
uapi header file and system call table file. So if we move
all the __IGNORE* entries to non uapi header, it will simplify
the uapi header and system call table file generation script.

It is correct to keep __IGNORE* entry in non uapi header
asm/unistd.h while uapi/asm/unistd.h must hold information
only useful for user space applications.

Signed-off-by: Firoz Khan 
---
 arch/parisc/include/asm/unistd.h  | 6 ++
 arch/parisc/include/uapi/asm/unistd.h | 7 ---
 2 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/arch/parisc/include/asm/unistd.h b/arch/parisc/include/asm/unistd.h
index 3d507d0..fc71edf 100644
--- a/arch/parisc/include/asm/unistd.h
+++ b/arch/parisc/include/asm/unistd.h
@@ -165,6 +165,12 @@ type name(type1 arg1, type2 arg2, type3 arg3, type4 arg4, 
type5 arg5)  \
 #define __ARCH_WANT_SYS_CLONE
 #define __ARCH_WANT_COMPAT_SYS_SENDFILE
 
+#define __IGNORE_select/* newselect */
+#define __IGNORE_fadvise64 /* fadvise64_64 */
+#define __IGNORE_pkey_mprotect
+#define __IGNORE_pkey_alloc
+#define __IGNORE_pkey_free
+
 #endif /* __ASSEMBLY__ */
 
 #undef STR
diff --git a/arch/parisc/include/uapi/asm/unistd.h 
b/arch/parisc/include/uapi/asm/unistd.h
index dc77c5a..bb52e12 100644
--- a/arch/parisc/include/uapi/asm/unistd.h
+++ b/arch/parisc/include/uapi/asm/unistd.h
@@ -368,13 +368,6 @@
 
 #define __NR_Linux_syscalls(__NR_io_pgetevents + 1)
 
-
-#define __IGNORE_select/* newselect */
-#define __IGNORE_fadvise64 /* fadvise64_64 */
-#define __IGNORE_pkey_mprotect
-#define __IGNORE_pkey_alloc
-#define __IGNORE_pkey_free
-
 #define LINUX_GATEWAY_ADDR  0x100
 
 #endif /* _UAPI_ASM_PARISC_UNISTD_H_ */
-- 
1.9.1

[PATCH v3 0/7] regulator/mfd/dt_bindings: bd718x7: Support ROHM bd71847

2018-09-14 Thread Matti Vaittinen

Patch series adding support for ROHM BD71847 PMIC.

ROHM BD71847 is modified version of ROHM BD71837 Power Management IC.
BD71847 integrates 6 buck regulators and 6  LDO’s.

The patch series adds support for BD71847 in BD71837 driver. It also
extends BD71837 to support all BUCK/LDO output voltages.

Support for "pickable voltage ranges" is introduced in order to fully
support BD71837/BD71847 voltage selection registers. Some of the voltage
regulators provide different multiple voltage ranges and output voltage
is determined by range selection bit(s) and voltage selection bits
instead of plain voltage selection bits.

Changelog v3
- Fix warning caused by void * to unsigned int cast
- Fix patch dependency issue (patch 4 used define from patch 5)
- Fix indent problem.
Changelog v2
- Dropped first patch (voltage monitoring fix) and "rebased" on top of
  next-20180912 because first patch was already applied.
- Fixed two commit message typos.
Fixes based on comments from Lee Jones
- removed PMIC revision check.
- stored chip type directly to data pointer
- minor readability improvement to DT binding
- unified naming for last enum members
- moved unrelated changes from pickable ranges patch
- removed C++ style comment

Patch 2:
limited BD71847 support (without range selection)
Patch 3:
BD71847 device-tree bindings for regulators
Patch 4:
BD71847 device-tree bindings for mfd
Patch 5:
Voltage range selection support to regulator helpers
Patch 6:
Simple renaming plenty of BD71837 occurrences to BD718XX
Patch 7:
BD71837/BD71847 Full voltage range support using helpers from
patch 5
Patch 8:
Renaming bd71837 regulator driver to bd718x7

this patch series is based on linux-next tag next-20180912.

Matti Vaittinen (7):
  regulator/mfd: Support ROHM BD71847 power management IC
  regulator: dt bindings: add BD71847 device-tree binding documentation
  mfd: dt bindings: add BD71847 device-tree binding documentation
  regulator: Support regulators where voltage ranges are selectable
  regulator/mfd: bd718xx: rename bd71837/bd71847 common instances
  regulator: bd718XX use pickable ranges
  regulator: bd718xx: rename bd71837 to 718xx

 .../devicetree/bindings/mfd/rohm,bd71837-pmic.txt  |   17 +-
 .../bindings/regulator/rohm,bd71837-regulator.txt  |   12 +-
 drivers/mfd/rohm-bd718x7.c |  162 ++-
 drivers/regulator/Kconfig  |2 +-
 drivers/regulator/Makefile |2 +-
 drivers/regulator/bd71837-regulator.c  |  644 ---
 drivers/regulator/bd718x7-regulator.c  | 1138 
 drivers/regulator/core.c   |5 +
 drivers/regulator/helpers.c|  232 
 include/linux/mfd/rohm-bd718x7.h   |  333 +++---
 include/linux/regulator/driver.h   |   20 +-
 11 files changed, 1646 insertions(+), 921 deletions(-)
 delete mode 100644 drivers/regulator/bd71837-regulator.c
 create mode 100644 drivers/regulator/bd718x7-regulator.c

-- 
2.14.3

[PATCH 3/4] parisc: Add system call table generation support

2018-09-14 Thread Firoz Khan

The system call tables are in different format in all
architecture and it will be difficult to manually add or
modify the system calls in the respective files. To make
it easy by keeping a script and which'll generate the
header file and syscall table file so this change will
unify them across all architectures.

The system call table generation script is added in
syscalls directory which contain the script to generate
both uapi header file system call table generation file
and syscall_32/64.tbl file which'll be the input for the
scripts.

syscall_32/64.tbl contains the list of available system calls
along with system call number and corresponding entry point.
Add a new system call in this architecture will be possible
by adding new entry in the syscall_32/64.tbl file.

Adding a new table entry consisting of:
- System call number.
- ABI.
- System call name.
- Entry point name.
- Compat entry name, if required.

syscallhdr.sh and syscalltbl.sh will generate uapi header-
unistd_32/64.h and syscall_table_32/64/c32.h files respectively.
File syscall_table_32/64/c32.h is included by syscall.S - the
real system call table. Both .sh files will parse the content
syscall.tbl to generate the header and table files.

ARM, s390 and x86 architecuture does have the similar support.
I leverage their implementation to come up with a generic
solution.

Signed-off-by: Firoz Khan 
---
 arch/parisc/kernel/syscalls/Makefile   |  52 +
 arch/parisc/kernel/syscalls/syscall_32.tbl | 358 +
 arch/parisc/kernel/syscalls/syscall_64.tbl | 357 
 arch/parisc/kernel/syscalls/syscallhdr.sh  |  38 +++
 arch/parisc/kernel/syscalls/syscalltbl.sh  |  36 +++
 5 files changed, 841 insertions(+)
 create mode 100644 arch/parisc/kernel/syscalls/Makefile
 create mode 100644 arch/parisc/kernel/syscalls/syscall_32.tbl
 create mode 100644 arch/parisc/kernel/syscalls/syscall_64.tbl
 create mode 100644 arch/parisc/kernel/syscalls/syscallhdr.sh
 create mode 100644 arch/parisc/kernel/syscalls/syscalltbl.sh

diff --git a/arch/parisc/kernel/syscalls/Makefile 
b/arch/parisc/kernel/syscalls/Makefile
new file mode 100644
index 000..12d1def
--- /dev/null
+++ b/arch/parisc/kernel/syscalls/Makefile
@@ -0,0 +1,52 @@
+# SPDX-License-Identifier: GPL-2.0
+out := arch/$(SRCARCH)/include/generated/asm
+uapi := arch/$(SRCARCH)/include/generated/uapi/asm
+
+_dummy := $(shell [ -d '$(uapi)' ] || mkdir -p '$(uapi)') \
+ $(shell [ -d '$(out)' ] || mkdir -p '$(out)')
+
+syscall32 := $(srctree)/$(src)/syscall_32.tbl
+syscall64 := $(srctree)/$(src)/syscall_64.tbl
+
+syshdr := $(srctree)/$(src)/syscallhdr.sh
+systbl := $(srctree)/$(src)/syscalltbl.sh
+
+quiet_cmd_syshdr = SYSHDR  $@
+  cmd_syshdr = $(CONFIG_SHELL) '$(syshdr)' '$<' '$@'  \
+  '$(syshdr_abi_$(basetarget))'  \
+  '$(syshdr_pfx_$(basetarget))'  \
+  '$(syshdr_offset_$(basetarget))'
+
+quiet_cmd_systbl = SYSTBL  $@
+  cmd_systbl = $(CONFIG_SHELL) '$(systbl)' '$<' '$@'  \
+  '$(systbl_abi_$(basetarget))'
+
+syshdr_offset_unistd_32 := __NR_Linux
+$(uapi)/unistd_32.h: $(syscall32) $(syshdr)
+   $(call if_changed,syshdr)
+
+syshdr_offset_unistd_64 := __NR_Linux
+$(uapi)/unistd_64.h: $(syscall64) $(syshdr)
+   $(call if_changed,syshdr)
+
+systbl_abi_syscall_table_32 := 32
+$(out)/syscall_table_32.h: $(syscall32) $(systbl)
+   $(call if_changed,systbl)
+
+systbl_abi_syscall_table_64 := 64
+$(out)/syscall_table_64.h: $(syscall64) $(systbl)
+   $(call if_changed,systbl)
+
+systbl_abi_syscall_table_c32 := c32
+$(out)/syscall_table_c32.h: $(syscall32) $(systbl)
+   $(call if_changed,systbl)
+
+uapisyshdr-y   += unistd_32.h unistd_64.h
+syshdr-y   += syscall_table_32.h syscall_table_64.h 
syscall_table_c32.h
+
+targets+= $(uapisyshdr-y) $(syshdr-y)
+
+PHONY += all
+all: $(addprefix $(uapi)/,$(uapisyshdr-y))
+all: $(addprefix $(out)/,$(syshdr-y))
+   @:
diff --git a/arch/parisc/kernel/syscalls/syscall_32.tbl 
b/arch/parisc/kernel/syscalls/syscall_32.tbl
new file mode 100644
index 000..637abc8
--- /dev/null
+++ b/arch/parisc/kernel/syscalls/syscall_32.tbl
@@ -0,0 +1,358 @@
+#
+# 32-bit system call numbers and entry vectors
+#
+# The format is:
+# 
+#
+# The abi is always "common" for this file.
+#
+0   common  restart_syscall sys_restart_syscall
 
+1   common  exitsys_exit   
 
+2   common  forksys_fork_wrapper   
 
+3   common  readsys_read   
 
+4   common  write   sys_write  
 
+5   common  opensys_open   
 compat_sys_open
+6   common  close   sys_cl

[PATCH 2/4] parisc: Replace __NR_Linux_syscalls macro with __NR_syscalls

2018-09-14 Thread Firoz Khan

__NR_Linux_syscalls macro holds the number of system call exist in
PARISC architecture. This macro is currently the part of uapi/asm/
unistd.h file. We have to change the value of __NR_Linux_syscalls,
if we add or delete a system call.

One of the patch in this patch series has a script which will generate
a uapi header based on syscall.tbl file. The syscall.tbl file contains
the number of system call information. So we have two option to update
__NR_Linux_syscalls value.

1. Update __NR_Linux_syscalls in uapi/asm/unistd.h manually by counting
   the no.of system calls. No need to update NR_syscalls until we
   either add a new system call or delete an existing system call.

2. We can keep this feature it above mentioned script, that'll
   count the number of syscalls and keep it in a generated file.
   In this case we don't need to explicitly update __NR_Linux_syscalls
   in asm/unistd.h file.

The 2nd option will be the recommended one. For that, I moved the
__NR_Linux_syscalls macro from uapi/asm/unistd.h to asm/unistd.h. The
macro name also changed form __NR_Linux_syscalls to __NR_syscalls for
making the name convention same across all architecture. While __NR_
syscalls isn't strictly part of the uapi, having it as part of the
generated header to simplifies the implementation.

Signed-off-by: Firoz Khan 
---
 arch/parisc/include/asm/unistd.h  | 2 ++
 arch/parisc/include/uapi/asm/unistd.h | 2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/parisc/include/asm/unistd.h b/arch/parisc/include/asm/unistd.h
index fc71edf..71417c1 100644
--- a/arch/parisc/include/asm/unistd.h
+++ b/arch/parisc/include/asm/unistd.h
@@ -8,6 +8,8 @@
 
 #define SYS_ify(syscall_name)   __NR_##syscall_name
 
+#define __NR_Linux_syscalls __NR_syscalls
+
 #ifndef ASM_LINE_SEP
 # define ASM_LINE_SEP ;
 #endif
diff --git a/arch/parisc/include/uapi/asm/unistd.h 
b/arch/parisc/include/uapi/asm/unistd.h
index bb52e12..7532817 100644
--- a/arch/parisc/include/uapi/asm/unistd.h
+++ b/arch/parisc/include/uapi/asm/unistd.h
@@ -366,7 +366,7 @@
 #define __NR_statx (__NR_Linux + 349)
 #define __NR_io_pgetevents (__NR_Linux + 350)
 
-#define __NR_Linux_syscalls(__NR_io_pgetevents + 1)
+#define __NR_syscalls   (__NR_Linux + 351)
 
 #define LINUX_GATEWAY_ADDR  0x100
 
-- 
1.9.1

[PATCH v3 1/7] regulator/mfd: Support ROHM BD71847 power management IC

2018-09-14 Thread Matti Vaittinen

BD71847 is reduced version of BD71837. DVS bucks 3 and 4 are
removed as is LDO7. Voltage ranges of some regulators are
expanded.

Add initial support for BD71847 with BD71837 driver.

Signed-off-by: Matti Vaittinen 
---
 drivers/mfd/rohm-bd718x7.c|   70 +--
 drivers/regulator/bd71837-regulator.c | 1055 +++--
 include/linux/mfd/rohm-bd718x7.h  |  271 -
 3 files changed, 879 insertions(+), 517 deletions(-)

diff --git a/drivers/mfd/rohm-bd718x7.c b/drivers/mfd/rohm-bd718x7.c
index 75c8ec659547..ce5aa5cd3545 100644
--- a/drivers/mfd/rohm-bd718x7.c
+++ b/drivers/mfd/rohm-bd718x7.c
@@ -7,21 +7,16 @@
 // Datasheet available from
 // https://www.rohm.com/datasheet/BD71837MWV/bd71837mwv-e
 
+#include 
 #include 
 #include 
 #include 
 #include 
 #include 
 #include 
+#include 
 #include 
-
-/*
- * gpio_keys.h requires definiton of bool. It is brought in
- * by above includes. Keep this as last until gpio_keys.h gets fixed.
- */
-#include 
-
-static const u8 supported_revisions[] = { 0xA2 /* BD71837 */ };
+#include 
 
 static struct gpio_keys_button button = {
.code = KEY_POWER,
@@ -41,8 +36,8 @@ static struct mfd_cell bd71837_mfd_cells[] = {
.platform_data = &bd718xx_powerkey_data,
.pdata_size = sizeof(bd718xx_powerkey_data),
},
-   { .name = "bd71837-clk", },
-   { .name = "bd71837-pmic", },
+   { .name = "bd718xx-clk", },
+   { .name = "bd718xx-pmic", },
 };
 
 static const struct regmap_irq bd71837_irqs[] = {
@@ -61,16 +56,16 @@ static struct regmap_irq_chip bd71837_irq_chip = {
.num_irqs = ARRAY_SIZE(bd71837_irqs),
.num_regs = 1,
.irq_reg_stride = 1,
-   .status_base = BD71837_REG_IRQ,
-   .mask_base = BD71837_REG_MIRQ,
-   .ack_base = BD71837_REG_IRQ,
+   .status_base = BD718XX_REG_IRQ,
+   .mask_base = BD718XX_REG_MIRQ,
+   .ack_base = BD718XX_REG_IRQ,
.init_ack_masked = true,
.mask_invert = false,
 };
 
 static const struct regmap_range pmic_status_range = {
-   .range_min = BD71837_REG_IRQ,
-   .range_max = BD71837_REG_POW_STATE,
+   .range_min = BD718XX_REG_IRQ,
+   .range_max = BD718XX_REG_POW_STATE,
 };
 
 static const struct regmap_access_table volatile_regs = {
@@ -82,7 +77,7 @@ static const struct regmap_config bd71837_regmap_config = {
.reg_bits = 8,
.val_bits = 8,
.volatile_table = &volatile_regs,
-   .max_register = BD71837_MAX_REGISTER - 1,
+   .max_register = BD718XX_MAX_REGISTER - 1,
.cache_type = REGCACHE_RBTREE,
 };
 
@@ -90,8 +85,12 @@ static int bd71837_i2c_probe(struct i2c_client *i2c,
const struct i2c_device_id *id)
 {
struct bd71837 *bd71837;
-   int ret, i;
-   unsigned int val;
+   int ret;
+
+   if (!i2c->irq) {
+   dev_err(&i2c->dev, "No IRQ configured\n");
+   return -EINVAL;
+   }
 
bd71837 = devm_kzalloc(&i2c->dev, sizeof(struct bd71837), GFP_KERNEL);
 
@@ -99,12 +98,8 @@ static int bd71837_i2c_probe(struct i2c_client *i2c,
return -ENOMEM;
 
bd71837->chip_irq = i2c->irq;
-
-   if (!bd71837->chip_irq) {
-   dev_err(&i2c->dev, "No IRQ configured\n");
-   return -EINVAL;
-   }
-
+   bd71837->chip_type = (unsigned int)(uintptr_t)
+   of_device_get_match_data(&i2c->dev);
bd71837->dev = &i2c->dev;
dev_set_drvdata(&i2c->dev, bd71837);
 
@@ -114,20 +109,6 @@ static int bd71837_i2c_probe(struct i2c_client *i2c,
return PTR_ERR(bd71837->regmap);
}
 
-   ret = regmap_read(bd71837->regmap, BD71837_REG_REV, &val);
-   if (ret) {
-   dev_err(&i2c->dev, "Read BD71837_REG_DEVICE failed\n");
-   return ret;
-   }
-   for (i = 0; i < ARRAY_SIZE(supported_revisions); i++)
-   if (supported_revisions[i] == val)
-   break;
-
-   if (i == ARRAY_SIZE(supported_revisions)) {
-   dev_err(&i2c->dev, "Unsupported chip revision\n");
-   return -ENODEV;
-   }
-
ret = devm_regmap_add_irq_chip(&i2c->dev, bd71837->regmap,
   bd71837->chip_irq, IRQF_ONESHOT, 0,
   &bd71837_irq_chip, &bd71837->irq_data);
@@ -138,7 +119,7 @@ static int bd71837_i2c_probe(struct i2c_client *i2c,
 
/* Configure short press to 10 milliseconds */
ret = regmap_update_bits(bd71837->regmap,
-BD71837_REG_PWRONCONFIG0,
+BD718XX_REG_PWRONCONFIG0,
 BD718XX_PWRBTN_PRESS_DURATION_MASK,
 BD718XX_PWRBTN_SHORT_PRESS_10MS);
if (ret) {
@@ -149,7 +130,7 @@ static int bd71837_i2c_probe(struct i2c_client *i2c,
 
/* Configure long press to 10 seconds */
ret = regmap_

Re: [PATCH] PM / suspend: Count suspend-to-idle loop as sleep time

2018-09-14 Thread Mika Penttilä

Hi!


On 09/14/2018 09:59 AM, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki 
> 
> There is a difference in behavior between suspend-to-idle and
> suspend-to-RAM in the timekeeping handling that leads to functional
> issues.  Namely, every iteration of the loop in s2idle_loop()
> increases the monotinic clock somewhat, even if timekeeping_suspend()
> and timekeeping_resume() are invoked from s2idle_enter(), and if
> many of them are carried out in a row, the monotonic clock can grow
> significantly while the system is regarded as suspended, which
> doesn't happen during suspend-to-RAM and so it is unexpected and
> leads to confusion and misbehavior in user space (similar to what
> ensued when we tried to combine the boottime and monotonic clocks).
> 
> To avoid that, count all iterations of the loop in s2idle_loop()
> as "sleep time" and adjust the clock for that on exit from
> suspend-to-idle.
> 
> [That also covers systems on which timekeeping is not suspended
>  by by s2idle_enter().]
> 
> Signed-off-by: Rafael J. Wysocki 
> ---
> 
> This is a replacement for https://patchwork.kernel.org/patch/10599209/
> 
> I decided to count the entire loop in s2idle_loop() as "sleep time" as the
> patch is then simpler and it also covers systems where timekeeping is not
> suspended in the final step of suspend-to-idle.
> 
> I dropped the "Fixes:" tag, because the monotonic clock delta problem
> has been present on the latter since the very introduction of "freeze"
> (as suspend-to-idle was referred to previously) and so this doesn't fix
> any particular later commits.
> 
> ---
>  kernel/power/suspend.c |   18 ++
>  1 file changed, 18 insertions(+)
> 
> Index: linux-pm/kernel/power/suspend.c
> ===
> --- linux-pm.orig/kernel/power/suspend.c
> +++ linux-pm/kernel/power/suspend.c
> @@ -109,8 +109,12 @@ static void s2idle_enter(void)
>  
>  static void s2idle_loop(void)
>  {
> + ktime_t start, delta;
> +
>   pm_pr_dbg("suspend-to-idle\n");
>  
> + start = ktime_get();
> +
>   for (;;) {
>   int error;
>  
> @@ -150,6 +154,20 @@ static void s2idle_loop(void)
>   pm_wakeup_clear(false);
>   }
>  
> + /*
> +  * If the monotonic clock difference between the start of the loop and
> +  * this point is too large, user space may get confused about whether or
> +  * not the system has been suspended and tasks may get killed by
> +  * watchdogs etc., so count the loop as "sleep time" to compensate for
> +  * that.
> +  */
> + delta = ktime_sub(ktime_get(), start);
> + if (ktime_to_ns(delta) > 0) {
> + struct timespec64 timespec64_delta = ktime_to_timespec64(delta);
> +
> + timekeeping_inject_sleeptime64(×pec64_delta);
> + }

But doesn't injecting sleep time here make monotonic clock too large by the 
amount of sleeptime? 
tick_freeze() / tick_unfreeze() already injects the sleeptime (otherwise delta 
would be 0).

> +
>   pm_pr_dbg("resume from suspend-to-idle\n");
>  }
>  
>

Re: [RFCv2 00/48] perf tools: Add threads to record command

2018-09-14 Thread Jiri Olsa

On Fri, Sep 14, 2018 at 10:26:53AM +0200, Jiri Olsa wrote:

SNIP

> > > The threaded monitoring currently can't monitor backward maps
> > > and there are probably more limitations which I haven't spotted
> > > yet.
> > > 
> > > So far I tested on laptop:
> > >   http://people.redhat.com/~jolsa/record_threads/test-4CPU.txt
> > > 
> > > and a one bigger server:
> > >   http://people.redhat.com/~jolsa/record_threads/test-208CPU.txt
> > > 
> > > I can see decrease in recorded LOST events, but both the benchmark
> > > and the monitoring must be carefully configured wrt:
> > >   - number of events (frequency)
> > >   - size of the memory maps
> > >   - size of events (callchains)
> > >   - final perf.data size
> > > 
> > > It's also available in:
> > >   git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git
> > >   perf/record_threads
> > > 
> > > thoughts? ;-) thanks
> > > jirka
> > 
> > It is preferable to split into smaller pieces that bring 
> > some improvement proved by metrics numbers and ready for 
> > merging and upstream. Do we have more metrics than the 
> > data loss from trace AIO patches?
> 
> well the primary focus is to get more events in,
> so the LOST metric is the main one

actualy I was hoping, could you please run it through the same
tests as you do for AIO code on some huge server? 

thanks,
jirka

Re: [PATCH] cpufreq: Convert to using %pOFn instead of device_node.name

2018-09-14 Thread Rafael J. Wysocki

On Tuesday, August 28, 2018 3:52:15 AM CEST Rob Herring wrote:
> In preparation to remove the node name pointer from struct device_node,
> convert printf users to use the %pOFn format specifier.
> 
> Cc: "Rafael J. Wysocki" 
> Cc: Viresh Kumar 
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: linux-samsung-...@vger.kernel.org
> Cc: linux...@vger.kernel.org
> Signed-off-by: Rob Herring 
> ---
>  drivers/cpufreq/s5pv210-cpufreq.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/cpufreq/s5pv210-cpufreq.c 
> b/drivers/cpufreq/s5pv210-cpufreq.c
> index 5d31c2db12a3..dbecd7667db2 100644
> --- a/drivers/cpufreq/s5pv210-cpufreq.c
> +++ b/drivers/cpufreq/s5pv210-cpufreq.c
> @@ -611,8 +611,8 @@ static int s5pv210_cpufreq_probe(struct platform_device 
> *pdev)
>   for_each_compatible_node(np, NULL, "samsung,s5pv210-dmc") {
>   id = of_alias_get_id(np, "dmc");
>   if (id < 0 || id >= ARRAY_SIZE(dmc_base)) {
> - pr_err("%s: failed to get alias of dmc node '%s'\n",
> - __func__, np->name);
> + pr_err("%s: failed to get alias of dmc node '%pOFn'\n",
> + __func__, np);
>   of_node_put(np);
>   return id;
>   }
> 

Applied, thanks!

[PATCH v3 2/7] regulator: dt bindings: add BD71847 device-tree binding documentation

2018-09-14 Thread Matti Vaittinen

Add ROHM BD71847 Power Management IC regulator binding information to
device-tree binding documents.

Signed-off-by: Matti Vaittinen 
Reviewed-by: Rob Herring 
---
 .../devicetree/bindings/regulator/rohm,bd71837-regulator.txt | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git 
a/Documentation/devicetree/bindings/regulator/rohm,bd71837-regulator.txt 
b/Documentation/devicetree/bindings/regulator/rohm,bd71837-regulator.txt
index 76ead07072b1..4b98ca26e61a 100644
--- a/Documentation/devicetree/bindings/regulator/rohm,bd71837-regulator.txt
+++ b/Documentation/devicetree/bindings/regulator/rohm,bd71837-regulator.txt
@@ -1,7 +1,9 @@
-ROHM BD71837 Power Management Integrated Circuit (PMIC) regulator bindings
+ROHM BD71837 and BD71847 Power Management Integrated Circuit regulator bindings
 
 Required properties:
- - regulator-name: should be "buck1", ..., "buck8" and "ldo1", ..., "ldo7"
+ - regulator-name: should be "buck1", ..., "buck8" and "ldo1", ..., "ldo7" for
+   BD71837. For BD71847 names should be "buck1", ..., "buck6"
+  and "ldo1", ..., "ldo6"
 
 List of regulators provided by this controller. BD71837 regulators node
 should be sub node of the BD71837 MFD node. See BD71837 MFD bindings at
@@ -16,10 +18,14 @@ disabled by driver at startup. LDO5 and LDO6 are supplied 
by those and
 if they are disabled at startup the voltage monitoring for LDO5/LDO6 will
 cause PMIC to reset.
 
-The valid names for regulator nodes are:
+The valid names for BD71837 regulator nodes are:
 BUCK1, BUCK2, BUCK3, BUCK4, BUCK5, BUCK6, BUCK7, BUCK8
 LDO1, LDO2, LDO3, LDO4, LDO5, LDO6, LDO7
 
+The valid names for BD71847 regulator nodes are:
+BUCK1, BUCK2, BUCK3, BUCK4, BUCK5, BUCK6
+LDO1, LDO2, LDO3, LDO4, LDO5, LDO6
+
 Optional properties:
 - Any optional property defined in bindings/regulator/regulator.txt
 
-- 
2.14.3

[PATCH v3 3/7] mfd: dt bindings: add BD71847 device-tree binding documentation

2018-09-14 Thread Matti Vaittinen

Add ROHM BD71847 Power Management IC MFD binding information to
device-tree binding documents.

Signed-off-by: Matti Vaittinen 
Reviewed-by: Rob Herring 
Acked-for-MFD-by: Lee Jones 
---
 .../devicetree/bindings/mfd/rohm,bd71837-pmic.txt   | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/Documentation/devicetree/bindings/mfd/rohm,bd71837-pmic.txt 
b/Documentation/devicetree/bindings/mfd/rohm,bd71837-pmic.txt
index 3ca56fdb5ffe..a4b056761eaa 100644
--- a/Documentation/devicetree/bindings/mfd/rohm,bd71837-pmic.txt
+++ b/Documentation/devicetree/bindings/mfd/rohm,bd71837-pmic.txt
@@ -1,16 +1,17 @@
-* ROHM BD71837 Power Management Integrated Circuit bindings
+* ROHM BD71837 and BD71847 Power Management Integrated Circuit bindings
 
-BD71837MWV is a programmable Power Management IC for powering single-core,
-dual-core, and quad-core SoCs such as NXP-i.MX 8M. It is optimized for
-low BOM cost and compact solution footprint. It integrates 8 Buck
-egulators and 7 LDOs to provide all the power rails required by the SoC and
-the commonly used peripherals.
+BD71837MWV and BD71847MWV are programmable Power Management ICs for powering
+single-core, dual-core, and quad-core SoCs such as NXP-i.MX 8M. They are
+optimized for low BOM cost and compact solution footprint. BD71837MWV
+integrates 8 Buck regulators and 7 LDOs. BD71847MWV contains 6 Buck regulators
+and 6 LDOs.
 
-Datasheet for PMIC is available at:
+Datasheet for BD71837 is available at:
 https://www.rohm.com/datasheet/BD71837MWV/bd71837mwv-e
 
 Required properties:
- - compatible  : Should be "rohm,bd71837".
+ - compatible  : Should be "rohm,bd71837" for bd71837
+   "rohm,bd71847" for bd71847.
  - reg : I2C slave address.
  - interrupt-parent: Phandle to the parent interrupt controller.
  - interrupts  : The interrupt line the device is connected to.
-- 
2.14.3

[PATCH v3 4/7] regulator: Support regulators where voltage ranges are selectable

2018-09-14 Thread Matti Vaittinen

For example ROHM BD71837 and ROHM BD71847 Power management ICs have
regulators which provide multiple linear ranges. Ranges can be
selected by individual non contagious bit in vsel register. Add
regmap helper functions for selecting ranges.

Signed-off-by: Matti Vaittinen 
---
 drivers/regulator/core.c |   5 +
 drivers/regulator/helpers.c  | 232 +++
 include/linux/regulator/driver.h |  20 +++-
 3 files changed, 256 insertions(+), 1 deletion(-)

diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
index 5f2218c67edf..b836a129288a 100644
--- a/drivers/regulator/core.c
+++ b/drivers/regulator/core.c
@@ -2788,6 +2788,11 @@ static int regulator_map_voltage(struct regulator_dev 
*rdev, int min_uV,
if (desc->ops->list_voltage == regulator_list_voltage_linear_range)
return regulator_map_voltage_linear_range(rdev, min_uV, max_uV);
 
+   if (desc->ops->list_voltage ==
+   regulator_list_voltage_pickable_linear_range)
+   return regulator_map_voltage_pickable_linear_range(rdev,
+   min_uV, max_uV);
+
return regulator_map_voltage_iterate(rdev, min_uV, max_uV);
 }
 
diff --git a/drivers/regulator/helpers.c b/drivers/regulator/helpers.c
index ef09021dc46e..5686a1335bd3 100644
--- a/drivers/regulator/helpers.c
+++ b/drivers/regulator/helpers.c
@@ -103,6 +103,128 @@ int regulator_disable_regmap(struct regulator_dev *rdev)
 }
 EXPORT_SYMBOL_GPL(regulator_disable_regmap);
 
+static int regulator_range_selector_to_index(struct regulator_dev *rdev,
+unsigned int rval)
+{
+   int i;
+
+   if (!rdev->desc->linear_range_selectors)
+   return -EINVAL;
+
+   rval &= rdev->desc->vsel_range_mask;
+
+   for (i = 0; i < rdev->desc->n_linear_ranges; i++) {
+   if (rdev->desc->linear_range_selectors[i] == rval)
+   return i;
+   }
+   return -EINVAL;
+}
+
+/**
+ * regulator_get_voltage_sel_pickable_regmap - pickable range get_voltage_sel
+ *
+ * @rdev: regulator to operate on
+ *
+ * Regulators that use regmap for their register I/O and use pickable
+ * ranges can set the vsel_reg, vsel_mask, vsel_range_reg and vsel_range_mask
+ * fields in their descriptor and then use this as their get_voltage_vsel
+ * operation, saving some code.
+ */
+int regulator_get_voltage_sel_pickable_regmap(struct regulator_dev *rdev)
+{
+   unsigned int r_val;
+   int range;
+   unsigned int val;
+   int ret, i;
+   unsigned int voltages_in_range = 0;
+
+   if (!rdev->desc->linear_ranges)
+   return -EINVAL;
+
+   ret = regmap_read(rdev->regmap, rdev->desc->vsel_reg, &val);
+   if (ret != 0)
+   return ret;
+
+   ret = regmap_read(rdev->regmap, rdev->desc->vsel_range_reg, &r_val);
+   if (ret != 0)
+   return ret;
+
+   val &= rdev->desc->vsel_mask;
+   val >>= ffs(rdev->desc->vsel_mask) - 1;
+
+   range = regulator_range_selector_to_index(rdev, r_val);
+   if (range < 0)
+   return -EINVAL;
+
+   for (i = 0; i < range; i++)
+   voltages_in_range += (rdev->desc->linear_ranges[i].max_sel -
+rdev->desc->linear_ranges[i].min_sel) + 1;
+
+   return val + voltages_in_range;
+}
+EXPORT_SYMBOL_GPL(regulator_get_voltage_sel_pickable_regmap);
+
+/**
+ * regulator_set_voltage_sel_pickable_regmap - pickable range set_voltage_sel
+ *
+ * @rdev: regulator to operate on
+ * @sel: Selector to set
+ *
+ * Regulators that use regmap for their register I/O and use pickable
+ * ranges can set the vsel_reg, vsel_mask, vsel_range_reg and vsel_range_mask
+ * fields in their descriptor and then use this as their set_voltage_vsel
+ * operation, saving some code.
+ */
+int regulator_set_voltage_sel_pickable_regmap(struct regulator_dev *rdev,
+ unsigned int sel)
+{
+   unsigned int range;
+   int ret, i;
+   unsigned int voltages_in_range = 0;
+
+   for (i = 0; i < rdev->desc->n_linear_ranges; i++) {
+   voltages_in_range = (rdev->desc->linear_ranges[i].max_sel -
+rdev->desc->linear_ranges[i].min_sel) + 1;
+   if (sel < voltages_in_range)
+   break;
+   sel -= voltages_in_range;
+   }
+
+   if (i == rdev->desc->n_linear_ranges)
+   return -EINVAL;
+
+   sel <<= ffs(rdev->desc->vsel_mask) - 1;
+   sel += rdev->desc->linear_ranges[i].min_sel;
+
+   range = rdev->desc->linear_range_selectors[i];
+
+   if (rdev->desc->vsel_reg == rdev->desc->vsel_range_reg) {
+   ret = regmap_update_bits(rdev->regmap,
+rdev->desc->vsel_reg,
+rdev->desc->vsel_range_mask |
+

[PATCH v3 5/7] regulator/mfd: bd718xx: rename bd71837/bd71847 common instances

2018-09-14 Thread Matti Vaittinen

Rename parts of code that support both BD71837 and BD71847 to BD718XX.

Signed-off-by: Matti Vaittinen 
Acked-for-MFD-by: Lee Jones 
---
 drivers/mfd/rohm-bd718x7.c| 98 +--
 drivers/regulator/bd71837-regulator.c | 26 +-
 include/linux/mfd/rohm-bd718x7.h  | 64 +++
 3 files changed, 94 insertions(+), 94 deletions(-)

diff --git a/drivers/mfd/rohm-bd718x7.c b/drivers/mfd/rohm-bd718x7.c
index ce5aa5cd3545..161c8aac6d86 100644
--- a/drivers/mfd/rohm-bd718x7.c
+++ b/drivers/mfd/rohm-bd718x7.c
@@ -2,9 +2,9 @@
 //
 // Copyright (C) 2018 ROHM Semiconductors
 //
-// ROHM BD71837MWV PMIC driver
+// ROHM BD71837MWV and BD71847MWV PMIC driver
 //
-// Datasheet available from
+// Datasheet for BD71837MWV available from
 // https://www.rohm.com/datasheet/BD71837MWV/bd71837mwv-e
 
 #include 
@@ -30,7 +30,7 @@ static struct gpio_keys_platform_data bd718xx_powerkey_data = 
{
.name = "bd718xx-pwrkey",
 };
 
-static struct mfd_cell bd71837_mfd_cells[] = {
+static struct mfd_cell bd718xx_mfd_cells[] = {
{
.name = "gpio-keys",
.platform_data = &bd718xx_powerkey_data,
@@ -40,20 +40,20 @@ static struct mfd_cell bd71837_mfd_cells[] = {
{ .name = "bd718xx-pmic", },
 };
 
-static const struct regmap_irq bd71837_irqs[] = {
-   REGMAP_IRQ_REG(BD71837_INT_SWRST, 0, BD71837_INT_SWRST_MASK),
-   REGMAP_IRQ_REG(BD71837_INT_PWRBTN_S, 0, BD71837_INT_PWRBTN_S_MASK),
-   REGMAP_IRQ_REG(BD71837_INT_PWRBTN_L, 0, BD71837_INT_PWRBTN_L_MASK),
-   REGMAP_IRQ_REG(BD71837_INT_PWRBTN, 0, BD71837_INT_PWRBTN_MASK),
-   REGMAP_IRQ_REG(BD71837_INT_WDOG, 0, BD71837_INT_WDOG_MASK),
-   REGMAP_IRQ_REG(BD71837_INT_ON_REQ, 0, BD71837_INT_ON_REQ_MASK),
-   REGMAP_IRQ_REG(BD71837_INT_STBY_REQ, 0, BD71837_INT_STBY_REQ_MASK),
+static const struct regmap_irq bd718xx_irqs[] = {
+   REGMAP_IRQ_REG(BD718XX_INT_SWRST, 0, BD718XX_INT_SWRST_MASK),
+   REGMAP_IRQ_REG(BD718XX_INT_PWRBTN_S, 0, BD718XX_INT_PWRBTN_S_MASK),
+   REGMAP_IRQ_REG(BD718XX_INT_PWRBTN_L, 0, BD718XX_INT_PWRBTN_L_MASK),
+   REGMAP_IRQ_REG(BD718XX_INT_PWRBTN, 0, BD718XX_INT_PWRBTN_MASK),
+   REGMAP_IRQ_REG(BD718XX_INT_WDOG, 0, BD718XX_INT_WDOG_MASK),
+   REGMAP_IRQ_REG(BD718XX_INT_ON_REQ, 0, BD718XX_INT_ON_REQ_MASK),
+   REGMAP_IRQ_REG(BD718XX_INT_STBY_REQ, 0, BD718XX_INT_STBY_REQ_MASK),
 };
 
-static struct regmap_irq_chip bd71837_irq_chip = {
-   .name = "bd71837-irq",
-   .irqs = bd71837_irqs,
-   .num_irqs = ARRAY_SIZE(bd71837_irqs),
+static struct regmap_irq_chip bd718xx_irq_chip = {
+   .name = "bd718xx-irq",
+   .irqs = bd718xx_irqs,
+   .num_irqs = ARRAY_SIZE(bd718xx_irqs),
.num_regs = 1,
.irq_reg_stride = 1,
.status_base = BD718XX_REG_IRQ,
@@ -73,7 +73,7 @@ static const struct regmap_access_table volatile_regs = {
.n_yes_ranges = 1,
 };
 
-static const struct regmap_config bd71837_regmap_config = {
+static const struct regmap_config bd718xx_regmap_config = {
.reg_bits = 8,
.val_bits = 8,
.volatile_table = &volatile_regs,
@@ -81,10 +81,10 @@ static const struct regmap_config bd71837_regmap_config = {
.cache_type = REGCACHE_RBTREE,
 };
 
-static int bd71837_i2c_probe(struct i2c_client *i2c,
+static int bd718xx_i2c_probe(struct i2c_client *i2c,
const struct i2c_device_id *id)
 {
-   struct bd71837 *bd71837;
+   struct bd718xx *bd718xx;
int ret;
 
if (!i2c->irq) {
@@ -92,33 +92,33 @@ static int bd71837_i2c_probe(struct i2c_client *i2c,
return -EINVAL;
}
 
-   bd71837 = devm_kzalloc(&i2c->dev, sizeof(struct bd71837), GFP_KERNEL);
+   bd718xx = devm_kzalloc(&i2c->dev, sizeof(struct bd718xx), GFP_KERNEL);
 
-   if (!bd71837)
+   if (!bd718xx)
return -ENOMEM;
 
-   bd71837->chip_irq = i2c->irq;
-   bd71837->chip_type = (unsigned int)(uintptr_t)
+   bd718xx->chip_irq = i2c->irq;
+   bd718xx->chip_type = (unsigned int)
of_device_get_match_data(&i2c->dev);
-   bd71837->dev = &i2c->dev;
-   dev_set_drvdata(&i2c->dev, bd71837);
+   bd718xx->dev = &i2c->dev;
+   dev_set_drvdata(&i2c->dev, bd718xx);
 
-   bd71837->regmap = devm_regmap_init_i2c(i2c, &bd71837_regmap_config);
-   if (IS_ERR(bd71837->regmap)) {
+   bd718xx->regmap = devm_regmap_init_i2c(i2c, &bd718xx_regmap_config);
+   if (IS_ERR(bd718xx->regmap)) {
dev_err(&i2c->dev, "regmap initialization failed\n");
-   return PTR_ERR(bd71837->regmap);
+   return PTR_ERR(bd718xx->regmap);
}
 
-   ret = devm_regmap_add_irq_chip(&i2c->dev, bd71837->regmap,
-  bd71837->chip_irq, IRQF_ONESHOT, 0,
-  &bd71837_irq_chip, &bd71837->irq_data);
+   ret = devm_regmap_add_irq_chip(&i2c->dev

Re: [PATCH 2/2] clk: meson-g12a: Add AO Clock controller driver

2018-09-14 Thread Jerome Brunet

On Tue, 2018-08-28 at 11:35 +0800, Jian Hu wrote:
> Hi: Jerome
> 
> On 2018/8/27 21:07, Jerome Brunet wrote:
> > On Fri, 2018-08-24 at 21:34 +0800, Jian Hu wrote:
> > > > 
> > > 
> > > I am confued about aoclk81's parent clocks.
> > > 
> > > I can not get the example of axg audio clock driver, Could you provide
> > > the link? Had it merged into clk-meson.git?
> > 
> > Yes and mainline as well : drivers/clk/meson/axg-audio.c
> > 
> > Basically this driver is creating bypass input clocks (audio_pclk, 
> > mst_in[0-9],
> > etc...) .
> > 
> > This allows to collect input clocks from DT (like any consumer should) will
> > keeping constant in the controller clock tree.
> > 
> > > From what I've seen of your controller drivers, the EE controller should 
> > > have
> > 
> > one input, the AO should have 3.
> > 
> > 
> > 
> > .
> > 
> 
> I still can not get the example meaning in axg audio driver.

here is one specific example:


https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/clk/meson/axg-audio.c?h=v4.19-rc1#n785


> 
> In 26 page of A113D_Datasheet V0.7 20170725-Baylibre.pdf,We can see the 
> aoclk81 has two parents. clk81 and ao_slow_clk. I can not get 3 parents.
> 
>   clk81|\
> ---| \aoclk81
> src0 |\   |  |---
> -| \ ao_slow_clk  |  |
>   |  |-| /
> -| /  |/
>   src1|/
> 
> 
> src0 is from xtal, if can generate 32k clock, but it is never used.
> src1 is from gpio clock, It is never used. If necessary， the ao_slow_clk 
> maybe described.
> 
> So why aoclk81 has 3 parents?

I never said that. I said your clock controller driver should have 3 input
clocks. These clock are the one used by the controller but not produced by it:
for the AO controller xtal, ao_xtal, and ee controller clk81.

Re: [PATCH -next] ASoC: qdsp6: q6asm-dai: remove duplicated include from q6asm-dai.c

2018-09-14 Thread Srinivas Kandagatla





On 14/09/18 02:36, YueHaibing wrote:

Remove duplicated include.

Signed-off-by: YueHaibing 

Acked-by: Srinivas Kandagatla 


---
  sound/soc/qcom/qdsp6/q6asm-dai.c | 1 -
  1 file changed, 1 deletion(-)

diff --git a/sound/soc/qcom/qdsp6/q6asm-dai.c b/sound/soc/qcom/qdsp6/q6asm-dai.c
index c75fab3..c3806d7 100644
--- a/sound/soc/qcom/qdsp6/q6asm-dai.c
+++ b/sound/soc/qcom/qdsp6/q6asm-dai.c
@@ -8,7 +8,6 @@
  #include 
  #include 
  #include 
-#include 
  #include 
  #include 
  #include

Re: [PATCH] cpufreq: remove unnecessary unlikely()

2018-09-14 Thread Rafael J. Wysocki

On Friday, September 7, 2018 6:09:55 PM CEST Igor Stoppa wrote:
> WARN_ON() already contains an unlikely(), so it's not necessary to wrap it
> into another.
> 
> Signed-off-by: Igor Stoppa 
> Cc: Srivatsa S. Bhat 
> Cc: "Rafael J. Wysocki" 
> Cc: linux...@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> ---
>  drivers/cpufreq/cpufreq.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index f53fb41efb7b..7aa3dcad2175 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -403,7 +403,7 @@ EXPORT_SYMBOL_GPL(cpufreq_freq_transition_begin);
>  void cpufreq_freq_transition_end(struct cpufreq_policy *policy,
>   struct cpufreq_freqs *freqs, int transition_failed)
>  {
> - if (unlikely(WARN_ON(!policy->transition_ongoing)))
> + if (WARN_ON(!policy->transition_ongoing))
>   return;
>  
>   cpufreq_notify_post_transition(policy, freqs, transition_failed);
> 

Applied, thanks!

[PATCH net-next 5/5] net: phy: mscc: remove unneeded temporary variable

2018-09-14 Thread Quentin Schulz

Here, the rc variable is either used only for the condition right after
the assignment or right before being used as the return value of the
function it's being used in.

So let's remove this unneeded temporary variable whenever possible.

Signed-off-by: Quentin Schulz 
---
 drivers/net/phy/mscc.c | 17 +
 1 file changed, 5 insertions(+), 12 deletions(-)

diff --git a/drivers/net/phy/mscc.c b/drivers/net/phy/mscc.c
index efa9352..24f4754 100644
--- a/drivers/net/phy/mscc.c
+++ b/drivers/net/phy/mscc.c
@@ -199,10 +199,7 @@ static const struct vsc8531_edge_rate_table edge_table[] = 
{
 
 static int vsc85xx_phy_page_set(struct phy_device *phydev, u16 page)
 {
-   int rc;
-
-   rc = phy_write(phydev, MSCC_EXT_PAGE_ACCESS, page);
-   return rc;
+   return phy_write(phydev, MSCC_EXT_PAGE_ACCESS, page);
 }
 
 static int vsc85xx_get_sset_count(struct phy_device *phydev)
@@ -504,7 +501,7 @@ static void vsc85xx_wol_get(struct phy_device *phydev,
 static int vsc85xx_edge_rate_magic_get(struct phy_device *phydev)
 {
u32 vdd, sd;
-   int rc, i, j;
+   int i, j;
struct device *dev = &phydev->mdio.dev;
struct device_node *of_node = dev->of_node;
u8 sd_array_size = ARRAY_SIZE(edge_table[0].slowdown);
@@ -512,12 +509,10 @@ static int vsc85xx_edge_rate_magic_get(struct phy_device 
*phydev)
if (!of_node)
return -ENODEV;
 
-   rc = of_property_read_u32(of_node, "vsc8531,vddmac", &vdd);
-   if (rc != 0)
+   if (of_property_read_u32(of_node, "vsc8531,vddmac", &vdd))
vdd = MSCC_VDDMAC_3300;
 
-   rc = of_property_read_u32(of_node, "vsc8531,edge-slowdown", &sd);
-   if (rc != 0)
+   if (of_property_read_u32(of_node, "vsc8531,edge-slowdown", &sd))
sd = 0;
 
for (i = 0; i < ARRAY_SIZE(edge_table); i++)
@@ -762,9 +757,7 @@ static int vsc85xx_config_init(struct phy_device *phydev)
return rc;
}
 
-   rc = genphy_config_init(phydev);
-
-   return rc;
+   return genphy_config_init(phydev);
 }
 
 static int vsc85xx_ack_interrupt(struct phy_device *phydev)
-- 
git-series 0.9.1

[PATCH 2/3] powerpc: Add system call table generation support

2018-09-14 Thread Firoz Khan

The system call tables are in different format in all
architecture and it will be difficult to manually add or
modify the system calls in the respective files. To make
it easy by keeping a script and which'll generate the
header file and syscall table file so this change will
unify them across all architectures.

The system call table generation script is added in
syscalls directory which contain the script to generate
both uapi header file system call table generation file
and syscall_32/64.tbl file which'll be the input for the
scripts.

syscall_32/64.tbl contains the list of available system calls
along with system call number and corresponding entry point.
Add a new system call in this architecture will be possible
by adding new entry in the syscall_32/64.tbl file.

Adding a new table entry consisting of:
- System call number.
- ABI.
- System call name.
- Entry point name.
- Compat entry name, if required.

syscallhdr.sh and syscalltbl.sh will generate uapi header-
unistd_32/64.h and syscall_table_32/64/c32.h files respectively.
File syscall_table_32/64/c32.h is included by syscall.S - the
real system call table. Both .sh files will parse the content
syscall.tbl to generate the header and table files.

ARM, s390 and x86 architecuture does have the similar support.
I leverage their implementation to come up with a generic
solution.

Signed-off-by: Firoz Khan 
---
 arch/powerpc/kernel/syscalls/Makefile   |  51 
 arch/powerpc/kernel/syscalls/syscall_32.tbl | 378 
 arch/powerpc/kernel/syscalls/syscall_64.tbl | 372 +++
 arch/powerpc/kernel/syscalls/syscallhdr.sh  |  37 +++
 arch/powerpc/kernel/syscalls/syscalltbl.sh  |  38 +++
 5 files changed, 876 insertions(+)
 create mode 100644 arch/powerpc/kernel/syscalls/Makefile
 create mode 100644 arch/powerpc/kernel/syscalls/syscall_32.tbl
 create mode 100644 arch/powerpc/kernel/syscalls/syscall_64.tbl
 create mode 100644 arch/powerpc/kernel/syscalls/syscallhdr.sh
 create mode 100644 arch/powerpc/kernel/syscalls/syscalltbl.sh

diff --git a/arch/powerpc/kernel/syscalls/Makefile 
b/arch/powerpc/kernel/syscalls/Makefile
new file mode 100644
index 000..0c87acb
--- /dev/null
+++ b/arch/powerpc/kernel/syscalls/Makefile
@@ -0,0 +1,51 @@
+# SPDX-License-Identifier: GPL-2.0
+out := arch/$(SRCARCH)/include/generated/asm
+uapi := arch/$(SRCARCH)/include/generated/uapi/asm
+
+_dummy := $(shell [ -d '$(uapi)' ] || mkdir -p '$(uapi)') \
+ $(shell [ -d '$(out)' ] || mkdir -p '$(out)')
+
+syscall32 := $(srctree)/$(src)/syscall_32.tbl
+syscall64 := $(srctree)/$(src)/syscall_64.tbl
+
+syshdr := $(srctree)/$(src)/syscallhdr.sh
+systbl := $(srctree)/$(src)/syscalltbl.sh
+
+quiet_cmd_syshdr = SYSHDR  $@
+  cmd_syshdr = $(CONFIG_SHELL) '$(syshdr)' '$<' '$@'  \
+  '$(syshdr_abi_$(basetarget))'  \
+  '$(syshdr_pfx_$(basetarget))'  \
+  '$(syshdr_offset_$(basetarget))'
+
+quiet_cmd_systbl = SYSTBL  $@
+  cmd_systbl = $(CONFIG_SHELL) '$(systbl)' '$<' '$@'  \
+  '$(systbl_abi_$(basetarget))'
+
+$(uapi)/unistd_32.h: $(syscall32) $(syshdr)
+   $(call if_changed,syshdr)
+
+$(uapi)/unistd_64.h: $(syscall64) $(syshdr)
+   $(call if_changed,syshdr)
+
+systbl_abi_syscall_table_32 := 32
+$(out)/syscall_table_32.h: $(syscall32) $(systbl)
+   $(call if_changed,systbl)
+
+systbl_abi_syscall_table_64 := 64
+$(out)/syscall_table_64.h: $(syscall64) $(systbl)
+   $(call if_changed,systbl)
+
+systbl_abi_syscall_table_c32 := c32
+$(out)/syscall_table_c32.h: $(syscall32) $(systbl)
+   $(call if_changed,systbl)
+
+uapisyshdr-y   += unistd_32.h unistd_64.h
+syshdr-y   += syscall_table_32.h syscall_table_64.h \
+  syscall_table_c32.h
+
+targets+= $(uapisyshdr-y) $(syshdr-y)
+
+PHONY += all
+all: $(addprefix $(uapi)/,$(uapisyshdr-y))
+all: $(addprefix $(out)/,$(syshdr-y))
+   @:
diff --git a/arch/powerpc/kernel/syscalls/syscall_32.tbl 
b/arch/powerpc/kernel/syscalls/syscall_32.tbl
new file mode 100644
index 000..50c419c
--- /dev/null
+++ b/arch/powerpc/kernel/syscalls/syscall_32.tbl
@@ -0,0 +1,378 @@
+#
+# 32-bit system call numbers and entry vectors
+#
+# The format is:
+# 
+#
+# The abi is always "common" for this file.
+#
+0   common  restart_syscall sys_restart_syscall
 
+1   common  exitsys_exit   
 
+2   common  forkppc_fork   
 
+3   common  readsys_read   
 
+4   common  write   sys_write  
 
+5   common  opensys_open   
 compat_sys_open
+6   common  close   sys_close  
 
+

[PATCH v3 7/7] regulator: bd718xx: rename bd71837 to 718xx

2018-09-14 Thread Matti Vaittinen

rename bd71837-regulator.c to bd718x7-regulator.c to reflect the
fact that also BD71847 is now supported by the driver.

Signed-off-by: Matti Vaittinen 
---
 drivers/regulator/Kconfig  | 2 +-
 drivers/regulator/Makefile | 2 +-
 drivers/regulator/{bd71837-regulator.c => bd718x7-regulator.c} | 0
 3 files changed, 2 insertions(+), 2 deletions(-)
 rename drivers/regulator/{bd71837-regulator.c => bd718x7-regulator.c} (100%)

diff --git a/drivers/regulator/Kconfig b/drivers/regulator/Kconfig
index 329cdd33ed62..6e96ef1bd74a 100644
--- a/drivers/regulator/Kconfig
+++ b/drivers/regulator/Kconfig
@@ -189,7 +189,7 @@ config REGULATOR_BD718XX
  and LDO regulators.
 
  This driver can also be built as a module. If so, the module
- will be called bd71837-regulator.
+ will be called bd718x7-regulator.
 
 config REGULATOR_BD9571MWV
tristate "ROHM BD9571MWV Regulators"
diff --git a/drivers/regulator/Makefile b/drivers/regulator/Makefile
index 801d9a34a203..eac4d794f3b8 100644
--- a/drivers/regulator/Makefile
+++ b/drivers/regulator/Makefile
@@ -27,7 +27,7 @@ obj-$(CONFIG_REGULATOR_AS3711) += as3711-regulator.o
 obj-$(CONFIG_REGULATOR_AS3722) += as3722-regulator.o
 obj-$(CONFIG_REGULATOR_AXP20X) += axp20x-regulator.o
 obj-$(CONFIG_REGULATOR_BCM590XX) += bcm590xx-regulator.o
-obj-$(CONFIG_REGULATOR_BD718XX) += bd71837-regulator.o
+obj-$(CONFIG_REGULATOR_BD718XX) += bd718x7-regulator.o
 obj-$(CONFIG_REGULATOR_BD9571MWV) += bd9571mwv-regulator.o
 obj-$(CONFIG_REGULATOR_DA903X) += da903x.o
 obj-$(CONFIG_REGULATOR_DA9052) += da9052-regulator.o
diff --git a/drivers/regulator/bd71837-regulator.c 
b/drivers/regulator/bd718x7-regulator.c
similarity index 100%
rename from drivers/regulator/bd71837-regulator.c
rename to drivers/regulator/bd718x7-regulator.c
-- 
2.14.3

[PATCH net-next 4/5] net: phy: mscc: shorten `x != 0` condition to `x`

2018-09-14 Thread Quentin Schulz

`if (x != 0)` is basically a more verbose version of `if (x)` so let's
use the latter so it's consistent throughout the whole driver.

Signed-off-by: Quentin Schulz 
---
 drivers/net/phy/mscc.c | 30 +++---
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/drivers/net/phy/mscc.c b/drivers/net/phy/mscc.c
index 734d9fb..efa9352 100644
--- a/drivers/net/phy/mscc.c
+++ b/drivers/net/phy/mscc.c
@@ -311,11 +311,11 @@ static int vsc85xx_mdix_set(struct phy_device *phydev, u8 
mdix)
 DISABLE_HP_AUTO_MDIX_MASK);
}
rc = phy_write(phydev, MSCC_PHY_BYPASS_CONTROL, reg_val);
-   if (rc != 0)
+   if (rc)
return rc;
 
rc = vsc85xx_phy_page_set(phydev, MSCC_PHY_PAGE_EXTENDED);
-   if (rc != 0)
+   if (rc)
return rc;
 
reg_val = phy_read(phydev, MSCC_PHY_EXT_MODE_CNTL);
@@ -325,11 +325,11 @@ static int vsc85xx_mdix_set(struct phy_device *phydev, u8 
mdix)
else if (mdix == ETH_TP_MDI_X)
reg_val |= FORCE_MDI_CROSSOVER_MDIX;
rc = phy_write(phydev, MSCC_PHY_EXT_MODE_CNTL, reg_val);
-   if (rc != 0)
+   if (rc)
return rc;
 
rc = vsc85xx_phy_page_set(phydev, MSCC_PHY_PAGE_STANDARD);
-   if (rc != 0)
+   if (rc)
return rc;
 
return genphy_restart_aneg(phydev);
@@ -341,7 +341,7 @@ static int vsc85xx_downshift_get(struct phy_device *phydev, 
u8 *count)
u16 reg_val;
 
rc = vsc85xx_phy_page_set(phydev, MSCC_PHY_PAGE_EXTENDED);
-   if (rc != 0)
+   if (rc)
goto out;
 
reg_val = phy_read(phydev, MSCC_PHY_ACTIPHY_CNTL);
@@ -373,14 +373,14 @@ static int vsc85xx_downshift_set(struct phy_device 
*phydev, u8 count)
}
 
rc = vsc85xx_phy_page_set(phydev, MSCC_PHY_PAGE_EXTENDED);
-   if (rc != 0)
+   if (rc)
goto out;
 
reg_val = phy_read(phydev, MSCC_PHY_ACTIPHY_CNTL);
reg_val &= ~(DOWNSHIFT_CNTL_MASK);
reg_val |= count;
rc = phy_write(phydev, MSCC_PHY_ACTIPHY_CNTL, reg_val);
-   if (rc != 0)
+   if (rc)
goto out;
 
rc = vsc85xx_phy_page_set(phydev, MSCC_PHY_PAGE_STANDARD);
@@ -401,7 +401,7 @@ static int vsc85xx_wol_set(struct phy_device *phydev,
 
mutex_lock(&phydev->lock);
rc = vsc85xx_phy_page_set(phydev, MSCC_PHY_PAGE_EXTENDED_2);
-   if (rc != 0)
+   if (rc)
goto out_unlock;
 
if (wol->wolopts & WAKE_MAGIC) {
@@ -439,7 +439,7 @@ static int vsc85xx_wol_set(struct phy_device *phydev,
phy_write(phydev, MSCC_PHY_WOL_MAC_CONTROL, reg_val);
 
rc = vsc85xx_phy_page_set(phydev, MSCC_PHY_PAGE_STANDARD);
-   if (rc != 0)
+   if (rc)
goto out_unlock;
 
if (wol->wolopts & WAKE_MAGIC) {
@@ -447,14 +447,14 @@ static int vsc85xx_wol_set(struct phy_device *phydev,
reg_val = phy_read(phydev, MII_VSC85XX_INT_MASK);
reg_val |= MII_VSC85XX_INT_MASK_WOL;
rc = phy_write(phydev, MII_VSC85XX_INT_MASK, reg_val);
-   if (rc != 0)
+   if (rc)
goto out_unlock;
} else {
/* Disable the WOL interrupt */
reg_val = phy_read(phydev, MII_VSC85XX_INT_MASK);
reg_val &= (~MII_VSC85XX_INT_MASK_WOL);
rc = phy_write(phydev, MII_VSC85XX_INT_MASK, reg_val);
-   if (rc != 0)
+   if (rc)
goto out_unlock;
}
/* Clear WOL iterrupt status */
@@ -595,13 +595,13 @@ static int vsc85xx_edge_rate_cntl_set(struct phy_device 
*phydev, u8 edge_rate)
 
mutex_lock(&phydev->lock);
rc = vsc85xx_phy_page_set(phydev, MSCC_PHY_PAGE_EXTENDED_2);
-   if (rc != 0)
+   if (rc)
goto out_unlock;
reg_val = phy_read(phydev, MSCC_PHY_WOL_MAC_CONTROL);
reg_val &= ~(EDGE_RATE_CNTL_MASK);
reg_val |= (edge_rate << EDGE_RATE_CNTL_POS);
rc = phy_write(phydev, MSCC_PHY_WOL_MAC_CONTROL, reg_val);
-   if (rc != 0)
+   if (rc)
goto out_unlock;
rc = vsc85xx_phy_page_set(phydev, MSCC_PHY_PAGE_STANDARD);
 
@@ -636,7 +636,7 @@ static int vsc85xx_mac_if_set(struct phy_device *phydev,
goto out_unlock;
}
rc = phy_write(phydev, MSCC_PHY_EXT_PHY_CNTL_1, reg_val);
-   if (rc != 0)
+   if (rc)
goto out_unlock;
 
rc = genphy_soft_reset(phydev);
@@ -655,7 +655,7 @@ static int vsc85xx_default_config(struct phy_device *phydev)
phydev->mdix_ctrl = ETH_TP_MDI_AUTO;
mutex_lock(&phydev->lock);
rc = vsc85xx_phy_page_set(phydev, MSCC_PHY_PAGE_EXTENDED_2);
-   if (rc != 0)
+   if (rc)
goto out_unlock;
 
reg_val = phy_read(phydev, MSCC_PHY_RGMII_CNTL);
-- 
git-series 0.9.1

[PATCH v3 6/7] regulator: bd718XX use pickable ranges

2018-09-14 Thread Matti Vaittinen

Few regulators in BD71837 and BD71847 can output voltages from
different voltage ranges. Register interface is arranged so that
used range is selected by toggling bits which are not next to actual
voltage selection bits. Then the voltage inside selected range is
determined by voltage selection bits (as usual). Support BD71837
and BD71847 selectible range voltages using new pickable ranges
helpers.

Signed-off-by: Matti Vaittinen 
---
 drivers/regulator/bd71837-regulator.c | 153 +++---
 include/linux/mfd/rohm-bd718x7.h  |  32 +++
 2 files changed, 143 insertions(+), 42 deletions(-)

diff --git a/drivers/regulator/bd71837-regulator.c 
b/drivers/regulator/bd71837-regulator.c
index 8e89334f94d1..d2522d4e1505 100644
--- a/drivers/regulator/bd71837-regulator.c
+++ b/drivers/regulator/bd71837-regulator.c
@@ -78,6 +78,34 @@ static int bd718xx_set_voltage_sel_restricted(struct 
regulator_dev *rdev,
return regulator_set_voltage_sel_regmap(rdev, sel);
 }
 
+static int bd718xx_set_voltage_sel_pickable_restricted(
+   struct regulator_dev *rdev, unsigned int sel)
+{
+   if (regulator_is_enabled_regmap(rdev))
+   return -EBUSY;
+
+   return regulator_set_voltage_sel_pickable_regmap(rdev, sel);
+}
+
+static struct regulator_ops bd718xx_pickable_range_ldo_ops = {
+   .enable = regulator_enable_regmap,
+   .disable = regulator_disable_regmap,
+   .is_enabled = regulator_is_enabled_regmap,
+   .list_voltage = regulator_list_voltage_pickable_linear_range,
+   .set_voltage_sel = bd718xx_set_voltage_sel_pickable_restricted,
+   .get_voltage_sel = regulator_get_voltage_sel_pickable_regmap,
+};
+
+static struct regulator_ops bd718xx_pickable_range_buck_ops = {
+   .enable = regulator_enable_regmap,
+   .disable = regulator_disable_regmap,
+   .is_enabled = regulator_is_enabled_regmap,
+   .list_voltage = regulator_list_voltage_pickable_linear_range,
+   .set_voltage_sel = bd718xx_set_voltage_sel_pickable_restricted,
+   .get_voltage_sel = regulator_get_voltage_sel_pickable_regmap,
+   .set_voltage_time_sel = regulator_set_voltage_time_sel,
+};
+
 static struct regulator_ops bd718xx_ldo_regulator_ops = {
.enable = regulator_enable_regmap,
.disable = regulator_disable_regmap,
@@ -139,24 +167,61 @@ static const struct regulator_linear_range 
bd718xx_dvs_buck_volts[] = {
 
 /*
  * BD71837 BUCK5
+ * 0.7V to 1.35V  (range 0)
+ * and
+ * 0.675 to 1.325 (range 1)
+ */
+static const struct regulator_linear_range bd71837_buck5_volts[] = {
+   /* Ranges when VOLT_SEL bit is 0 */
+   REGULATOR_LINEAR_RANGE(70, 0x00, 0x03, 10),
+   REGULATOR_LINEAR_RANGE(105, 0x04, 0x05, 5),
+   REGULATOR_LINEAR_RANGE(120, 0x06, 0x07, 15),
+   /* Ranges when VOLT_SEL bit is 1  */
+   REGULATOR_LINEAR_RANGE(675000, 0x0, 0x3, 10),
+   REGULATOR_LINEAR_RANGE(1025000, 0x4, 0x5, 5),
+   REGULATOR_LINEAR_RANGE(1175000, 0x6, 0x7, 15),
+};
+
+/*
+ * Range selector for first 3 linear ranges is 0x0
+ * and 0x1 for last 3 ranges.
+ */
+static const unsigned int bd71837_buck5_volt_range_sel[] = {
+   0x0, 0x0, 0x0, 0x80, 0x80, 0x80
+};
+
+/*
  * BD71847 BUCK3
- * 0.7V to 1.35V ()
  */
-static const struct regulator_linear_range bd718xx_1st_nodvs_buck_volts[] = {
+static const struct regulator_linear_range bd71847_buck3_volts[] = {
+   /* Ranges when VOLT_SEL bits are 00 */
REGULATOR_LINEAR_RANGE(70, 0x00, 0x03, 10),
REGULATOR_LINEAR_RANGE(105, 0x04, 0x05, 5),
REGULATOR_LINEAR_RANGE(120, 0x06, 0x07, 15),
+   /* Ranges when VOLT_SEL bits are 01 */
+   REGULATOR_LINEAR_RANGE(55, 0x0, 0x7, 5),
+   /* Ranges when VOLT_SEL bits are 11 */
+   REGULATOR_LINEAR_RANGE(675000, 0x0, 0x3, 10),
+   REGULATOR_LINEAR_RANGE(1025000, 0x4, 0x5, 5),
+   REGULATOR_LINEAR_RANGE(1175000, 0x6, 0x7, 15),
+};
+
+static const unsigned int bd71847_buck3_volt_range_sel[] = {
+   0x0, 0x0, 0x0, 0x40, 0x80, 0x80, 0x80
 };
 
-static const struct regulator_linear_range bd71847_buck4_voltage_ranges[] = {
+static const struct regulator_linear_range bd71847_buck4_volts[] = {
REGULATOR_LINEAR_RANGE(300, 0x00, 0x03, 10),
+   REGULATOR_LINEAR_RANGE(260, 0x00, 0x03, 10),
 };
 
+static const unsigned int bd71847_buck4_volt_range_sel[] = { 0x0, 0x40 };
+
 /*
  * BUCK6
  * 3.0V to 3.3V (step 100mV)
  */
-static const struct regulator_linear_range bd71837_buck6_voltage_ranges[] = {
+static const struct regulator_linear_range bd71837_buck6_volts[] = {
REGULATOR_LINEAR_RANGE(300, 0x00, 0x03, 10),
 };
 
@@ -190,8 +255,11 @@ static const struct regulator_linear_range 
bd718xx_4th_nodvs_buck_volts[] = {
  */
 static const struct regulator_linear_range bd718xx_ldo1_volts[] = {
REGULATOR_LINEAR_RANGE(300, 0x00, 0x03, 10),
+   REGULATOR_LINEAR_RANG

Re: [RESEND PATCH v4 5/6] arm64/mm: Populate the swapper_pg_dir by fixmap.

2018-09-14 Thread James Morse

Hi Jun,

On 13/09/18 11:50, Jun Yao wrote:
> On Fri, Sep 07, 2018 at 10:58:22AM +0100, James Morse wrote:
>> On 22/08/18 10:54, Jun Yao wrote:
>>> WRITE_ONCE(*pmdp, pmd);
>>> dsb(ishst);
>>>  }
>>> @@ -480,6 +511,19 @@ static inline phys_addr_t pmd_page_paddr(pmd_t pmd)
>>>  
>>>  static inline void set_pud(pud_t *pudp, pud_t pud)
>>>  {
>>> +#ifdef __PAGETABLE_PUD_FOLDED
>>> +   if (in_swapper_pgdir(pudp)) {
>>> +   pud_t *fixmap_pudp;
>>> +
>>> +   spin_lock(&swapper_pgdir_lock);
>>> +   fixmap_pudp = (pud_t *)pgd_set_fixmap(__pa(pudp));
>>
>> I see this called 68 times during boot on a 64K/42bit-VA, 65 of which appear 
>> to
>> be during paging_init(). What do you think to keeping paging_init()s use of 
>> the
>> pgd fixmap for swapper_pg_dir, deliberately to skip the in_swapper_pgdir() 
>> test
>> during paging_init()?
> 
> I find that the __create_pgd_mapping() is used to set up the page table
> during paging_init(). And there are six functions calling it with
> different pgdps:
> 
> update_mapping_prot() init_mm.pgd(swapper_pg_dir)
> create_mapping_noalloc()  init_mm.pgd(swapper_pg_dir)
> __map_memblock()  pgdp(swapper_pg_dir)
> map_kernel_segment()  pgdp(swapper_pg_dir)
> create_pgd_mapping()  !(init_mm.pgd)
> map_entry_trampoline()tramp_pg_dir
> 
> In order to skip the in_swapper_pgdir() test during paging_init(), We
> need a way to determine if we are currently in paging_init().

We don't need to know paging_init() is the caller, we just want to fool
in_swapper_pgdir() into not matching the address. Using pgd_set_fixmap() in
paging_init(), as we do today would do this, as the value passed to
map_kernel()/map_mem() would never match swapper_pg_dir().

(this is just to stop unnecessary tlbi for every write to swapper_pg_dir, as we
don't need to do that during early boot when it isn't loaded)


> The way I
> can think of is to create a function similar to __create_pgd_mapping().
> And it is used to create the page table during paging_init(). It differs
> from the __create_pgd_mapping() only in that it calls
> p?d_populate_without_test(). However, in this way, I'm worried that I am
> reinventing the wheel.

I agree duplicating the code doesn't sound good.

Something like this? (barely tested):
%<
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 691a05bbf87b..64ba422482cc 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -644,8 +690,12 @@ void __init set_init_mm_pgd(pgd_t *pgd)
  */
 void __init paging_init(void)
 {
-   map_kernel(swapper_pg_dir);
-   map_mem(swapper_pg_dir);
+   pgd_t *pgdp = pgd_set_fixmap(__pa_symbol(swapper_pg_dir));
+
+   map_kernel(pgdp);
+   map_mem(pgdp);
+   pgd_clear_fixmap();
+
cpu_replace_ttbr1(lm_alias(swapper_pg_dir));
set_init_mm_pgd(swapper_pg_dir);
 }
%<


This reduced the pgd-fixmap setup/teardown calls during boot from 68 to 2...


Thanks,

James

Re: [Question] vendor-specific cpu enable-method

2018-09-14 Thread Masahiro Yamada

Hi.


2018-09-13 11:29 GMT+09:00 Jisheng Zhang :
> On Thu, 13 Sep 2018 10:23:35 +0900 Masahiro Yamada wrote:
>
>> Hello.
>>
>>
>> Sorry if I am asking a stupid question.
>>
>>
>> For arm64, there are only 2 cpu methods, psci and spin-table.
>>
>> Why do we still allow vendor-specific methods upstreamed
>> for arm 32bit ports?
>>
>> To me, it looks like SoC vendors continue inventing
>> different (but similar) ways to do the same thing.
>>
>> It is a historical reason for old platforms.
>>
>> However, if I look at Documentation/devicetree/bindings/arm/cpus.txt
>> enable-method properties are still increasing.
>>
>>
>> psci is available in arch/arm/kernel/psci_smp.c,
>> but not all SoCs support the security extension.
>> Is there a simpler one like spin-table available for arm32?
>
> Per my understanding, spin-table is similar as the "pen" based
> solution in arm32, both can't reliably support kexec, suspend etc...

Right.  spin-table is based on pen-based implementation,
and just a back-up plan in case psci is not available for some reasons.


>>
>> If we force generic methods like psci or spin-table
>> for new platforms, we can stop proliferated smp code.
>> (Of course, we are just shifting the complexity
>> from the kernel to firmware.)
>
> psci is good but not all SoCs support secure extensions. spin-table
> can't support kexec, suspend. Except prefer psci for news SoCs
> with secure extensions, no better solutions AFAIK.

OK, psci is preferred if it is available.

Otherwise, ... vendor specific code.


Thanks.


-- 
Best Regards
Masahiro Yamada

Re: [PATCH] acpi: remove a meaningless null check before debugfs_remove

2018-09-14 Thread Rafael J. Wysocki

On Saturday, August 18, 2018 12:49:20 PM CEST zhong jiang wrote:
> debugfs_remove has taken null pointer into account. So it is safe
> to remove the check before debugfs_remove.
> 
> Signed-off-by: zhong jiang 
> ---
>  drivers/acpi/custom_method.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/drivers/acpi/custom_method.c b/drivers/acpi/custom_method.c
> index e967c11..4451877 100644
> --- a/drivers/acpi/custom_method.c
> +++ b/drivers/acpi/custom_method.c
> @@ -92,8 +92,7 @@ static int __init acpi_custom_method_init(void)
>  
>  static void __exit acpi_custom_method_exit(void)
>  {
> - if (cm_dentry)
> - debugfs_remove(cm_dentry);
> + debugfs_remove(cm_dentry);
>  }
>  
>  module_init(acpi_custom_method_init);
> 

Applied, thanks!

Re: [RESEND PATCH v4 5/6] arm64/mm: Populate the swapper_pg_dir by fixmap.

2018-09-14 Thread James Morse

Hi Jun,

On 10/09/18 12:41, Jun Yao wrote:
> On Fri, Sep 07, 2018 at 10:58:22AM +0100, James Morse wrote:
>> On 22/08/18 10:54, Jun Yao wrote:
>>> WRITE_ONCE(*pmdp, pmd);
>>> dsb(ishst);
>>>  }
>>> @@ -480,6 +511,19 @@ static inline phys_addr_t pmd_page_paddr(pmd_t pmd)
>>>  
>>>  static inline void set_pud(pud_t *pudp, pud_t pud)
>>>  {
>>> +#ifdef __PAGETABLE_PUD_FOLDED
>>> +   if (in_swapper_pgdir(pudp)) {
>>> +   pud_t *fixmap_pudp;
>>> +
>>> +   spin_lock(&swapper_pgdir_lock);
>>> +   fixmap_pudp = (pud_t *)pgd_set_fixmap(__pa(pudp));
>>
>> This is a bit subtle: are you using the pgd fixmap entry because the path 
>> from
>> map_mem() uses the other three?
>>
>> Using the pgd fix slot for a pud looks a bit strange to me, but its arguably 
>> a
>> side-effect of the folding.
> 
> Yes, it's a side-effect of the folding.
> 
> When the CONFIG_PGTABLE_LEVELS == 3, the pud is folded into the pgd. It
> means that the pgd is never none and it is also a pud. That's why I use
> the pgd fixmap entry.
> 
> Maybe write this more clearly:
> 
> static inline void set_pud(pud_t *pudp, pud_t pud)
> {
> #ifdef __PAGETABLE_PUD_FOLDED
>   pgd_t *pgdp = (pgd_t *)pudp;
> 
>   if (...) {
>   pgd_t *fixmap_pgdp;
>   pud_t *fixmap_pudp;
> 
>   spin_lock(...);
>   fixmap_pgdp = pgd_set_fixmap(__pa(pgdp));
>   fixmap_pudp = pud_set_fixmap_offset(fixmap_pgdp, 0UL);

Using two fixmap entries is excessive, this is behind __PAGETABLE_PUD_FOLDED, so
we should know what is going on.

(The folding confuses me every time I look at it)


> Do you have any way to make it look more reasonable?

I'm just reacting to a function with 'pud' in the name, that takes two pud's as
arguments, using the pgd fixmap slot. I think its fine to leave it like this, as
in_swapper_pg_dir() has told us this is the pgd we're dealing with, it just
looks funny.


>> I see this called 68 times during boot on a 64K/42bit-VA, 65 of which appear 
>> to
>> be during paging_init(). What do you think to keeping paging_init()s use of 
>> the
>> pgd fixmap for swapper_pg_dir, deliberately to skip the in_swapper_pgdir() 
>> test
>> during paging_init()?
> 
> I think the set_pud() should not be called on a 64K/42bit-VA. As only
> the level 2 and level 3 page tables are in use. It means that the pmd is
> folded into the pud and the pud is never none. So the set_pud() should
> not be called.

(yes, sorry, it was just the one I picked on!)


> I think a variable can be introduced to indicate whether paging_init()
> has been completed. And decide whether or not to skip the
> in_swapper_pgdir() base on the value of it.
> 
> I don't know if this is reasonable. What do you think?

I think we can just trick in_swapper_pgdir(), this code only runs once, and its
already in a very strange environment.


Thanks,

James

Re: [PATCH v2 11/11] arm: dts: ste-dbx5x0: Update coresight bindings for hardware port

2018-09-14 Thread Linus Walleij

On Wed, Sep 12, 2018 at 3:54 PM Suzuki K Poulose  wrote:

> Switch to the new coresight bindings
>
> Cc: Linus Walleij 
> Cc: Mathieu Poirier 
> Signed-off-by: Suzuki K Poulose 

Acked-by: Linus Walleij 

I guess you will merge this with the rest of the patches so I do
not need to queue it?

Yours,
Linus Walleij

Re: [PATCH] mailbox: PCC: handle parse error

2018-09-14 Thread Rafael J. Wysocki

On Monday, August 27, 2018 10:55:41 PM CEST Al Stone wrote:
> On 08/27/2018 01:19 PM, David Arcari wrote:
> > acpi_pcc_probe calls acpi_table_parse_entries_array but fails to check
> > for an error return.  This in turn can result in calling kcalloc with
> > a negative count as well as emitting the following misleading erorr
> > message:
> > 
> > [2.642015] Could not allocate space for PCC mbox channels
> > 
> > Fixes: 8f8027c5f935 ("mailbox: PCC: erroneous error message when parsing 
> > ACPI PCCT")
> > 
> > Signed-off-by: David Arcari 
> > Cc: Al Stone 
> > Cc: Jassi Brar 
> > ---
> >  drivers/mailbox/pcc.c | 7 +--
> >  1 file changed, 5 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/mailbox/pcc.c b/drivers/mailbox/pcc.c
> > index 311e91b..256f18b 100644
> > --- a/drivers/mailbox/pcc.c
> > +++ b/drivers/mailbox/pcc.c
> > @@ -461,8 +461,11 @@ static int __init acpi_pcc_probe(void)
> > count = acpi_table_parse_entries_array(ACPI_SIG_PCCT,
> > sizeof(struct acpi_table_pcct), proc,
> > ACPI_PCCT_TYPE_RESERVED, MAX_PCC_SUBSPACES);
> > -   if (count == 0 || count > MAX_PCC_SUBSPACES) {
> > -   pr_warn("Invalid PCCT: %d PCC subspaces\n", count);
> > +   if (count <= 0 || count > MAX_PCC_SUBSPACES) {
> > +   if (count < 0)
> > +   pr_warn("Error parsing PCC subspaces from PCCT\n");
> > +   else
> > +   pr_warn("Invalid PCCT: %d PCC subspaces\n", count);
> > return -EINVAL;
> > }
> >  
> > 
> 
> Thanks, David.  Nice catch.
> 
> Reviewed-by: Al Stone 

Patch applied, thanks!

Re: [PATCH v4 02/16] sched/core: uclamp: map TASK's clamp values into CPU's clamp groups

2018-09-14 Thread Patrick Bellasi

On 13-Sep 21:20, Peter Zijlstra wrote:
> On Wed, Sep 12, 2018 at 06:42:09PM +0100, Patrick Bellasi wrote:
> > On 12-Sep 18:24, Peter Zijlstra wrote:
> > > On Tue, Aug 28, 2018 at 02:53:10PM +0100, Patrick Bellasi wrote:
> 
> > > >  {
> > > > +   int group_id[UCLAMP_CNT] = { UCLAMP_NOT_VALID };
> > > > +   int lower_bound, upper_bound;
> > > > +   struct uclamp_se *uc_se;
> > > > +   int result = 0;
> > > 
> > > I think the thing would become much more readable if you set
> > > lower/upper_bound right here.
> 
> > Actually it could also make sense to have them before the mutex ;)
> 
> Indeed.
> 
> +   upper_bound = (attr->sched_flags & SCHED_FLAG_UTIL_CLAMP_MAX)
> +   ? attr->sched_util_max
> +   : p->uclamp[UCLAMP_MAX].value;
> +
> +   if (upper_bound == UCLAMP_NOT_VALID)
> +   upper_bound = SCHED_CAPACITY_SCALE;
> +   if (attr->sched_util_min > upper_bound) {
> +   result = -EINVAL;
> +   goto done;
> +   }
> +
> +   result = uclamp_group_find(UCLAMP_MIN, attr->sched_util_min);
> +   if (result == -ENOSPC) {
> +   pr_err(UCLAMP_ENOSPC_FMT, "MIN");
> +   goto done;
> +   }
> +   group_id[UCLAMP_MIN] = result;
> +   }
> +   if (attr->sched_flags & SCHED_FLAG_UTIL_CLAMP_MAX) {
> +   lower_bound = (attr->sched_flags & SCHED_FLAG_UTIL_CLAMP_MIN)
> +   ? attr->sched_util_min
> +   : p->uclamp[UCLAMP_MIN].value;
> +
> +   if (lower_bound == UCLAMP_NOT_VALID)
> +   lower_bound = 0;
> +   if (attr->sched_util_max < lower_bound ||
> +   attr->sched_util_max > SCHED_CAPACITY_SCALE) {
> +   result = -EINVAL;
> +   goto done;
> +   }
> 
> That would end up soething like:
> 
>   unsigned int lower_bound = p->uclamp[UCLAMP_MIN].value;
>   unsigned int upper_bound = p->uclamp[UCLAMP_MAX].value;
> 
>   if (sched_flags & SCHED_FLAG_UTIL_CLAMP_MIN)
>   lower_bound = attr->sched_util_min;
> 
>   if (sched_flags & SCHED_FLAG_UTIL_CLAMP_MAX)
>   upper_bound = attr->sched_util_max;
> 
>   if (lower_bound > upper_bound ||
>   upper_bound > SCHED_CAPACITY_MAX)
>   return -EINVAL;
> 
>   mutex_lock(...);

Yes... much cleaner, thanks.

-- 
#include 

Patrick Bellasi

Re: [PATCH] PM / suspend: Count suspend-to-idle loop as sleep time

2018-09-14 Thread Rafael J. Wysocki

On Friday, September 14, 2018 10:28:44 AM CEST Mika Penttilä wrote:
> Hi!
> 
> 
> On 09/14/2018 09:59 AM, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki 
> > 
> > There is a difference in behavior between suspend-to-idle and
> > suspend-to-RAM in the timekeeping handling that leads to functional
> > issues.  Namely, every iteration of the loop in s2idle_loop()
> > increases the monotinic clock somewhat, even if timekeeping_suspend()
> > and timekeeping_resume() are invoked from s2idle_enter(), and if
> > many of them are carried out in a row, the monotonic clock can grow
> > significantly while the system is regarded as suspended, which
> > doesn't happen during suspend-to-RAM and so it is unexpected and
> > leads to confusion and misbehavior in user space (similar to what
> > ensued when we tried to combine the boottime and monotonic clocks).
> > 
> > To avoid that, count all iterations of the loop in s2idle_loop()
> > as "sleep time" and adjust the clock for that on exit from
> > suspend-to-idle.
> > 
> > [That also covers systems on which timekeeping is not suspended
> >  by by s2idle_enter().]
> > 
> > Signed-off-by: Rafael J. Wysocki 
> > ---
> > 
> > This is a replacement for https://patchwork.kernel.org/patch/10599209/
> > 
> > I decided to count the entire loop in s2idle_loop() as "sleep time" as the
> > patch is then simpler and it also covers systems where timekeeping is not
> > suspended in the final step of suspend-to-idle.
> > 
> > I dropped the "Fixes:" tag, because the monotonic clock delta problem
> > has been present on the latter since the very introduction of "freeze"
> > (as suspend-to-idle was referred to previously) and so this doesn't fix
> > any particular later commits.
> > 
> > ---
> >  kernel/power/suspend.c |   18 ++
> >  1 file changed, 18 insertions(+)
> > 
> > Index: linux-pm/kernel/power/suspend.c
> > ===
> > --- linux-pm.orig/kernel/power/suspend.c
> > +++ linux-pm/kernel/power/suspend.c
> > @@ -109,8 +109,12 @@ static void s2idle_enter(void)
> >  
> >  static void s2idle_loop(void)
> >  {
> > +   ktime_t start, delta;
> > +
> > pm_pr_dbg("suspend-to-idle\n");
> >  
> > +   start = ktime_get();
> > +
> > for (;;) {
> > int error;
> >  
> > @@ -150,6 +154,20 @@ static void s2idle_loop(void)
> > pm_wakeup_clear(false);
> > }
> >  
> > +   /*
> > +* If the monotonic clock difference between the start of the loop and
> > +* this point is too large, user space may get confused about whether or
> > +* not the system has been suspended and tasks may get killed by
> > +* watchdogs etc., so count the loop as "sleep time" to compensate for
> > +* that.
> > +*/
> > +   delta = ktime_sub(ktime_get(), start);
> > +   if (ktime_to_ns(delta) > 0) {
> > +   struct timespec64 timespec64_delta = ktime_to_timespec64(delta);
> > +
> > +   timekeeping_inject_sleeptime64(×pec64_delta);
> > +   }
> 
> But doesn't injecting sleep time here make monotonic clock too large by the 
> amount of sleeptime? 
> tick_freeze() / tick_unfreeze() already injects the sleeptime (otherwise 
> delta would be 0).

No, it doesn't.

The delta here is the extra time taken by the loop which hasn't been counted
as sleep time yet.

Thanks,
Rafael

Re: [PATCH v4 02/16] sched/core: uclamp: map TASK's clamp values into CPU's clamp groups

2018-09-14 Thread Patrick Bellasi

On 13-Sep 21:14, Peter Zijlstra wrote:
> On Wed, Sep 12, 2018 at 06:52:02PM +0100, Patrick Bellasi wrote:
> > On 12-Sep 19:42, Peter Zijlstra wrote:
> > > On Wed, Sep 12, 2018 at 06:35:15PM +0100, Patrick Bellasi wrote:
> > > > On 12-Sep 18:12, Peter Zijlstra wrote:
> > > 
> > > > > No idea; but if you want to go all fancy you can replace he whole
> > > > > uclamp_map thing with something like:
> > > > > 
> > > > > struct uclamp_map {
> > > > >   union {
> > > > >   struct {
> > > > >   unsigned long v : 10;
> > > > >   unsigned long c : BITS_PER_LONG - 10;
> > > > >   };
> > > > >   atomic_long_t s;
> > > > >   };
> > > > > };
> > > > 
> > > > That sounds really cool and scary at the same time :)
> > > > 
> > > > The v:10 requires that we never set SCHED_CAPACITY_SCALE>1024
> > > > or that we use it to track a percentage value (i.e. [0..100]).
> > > 
> > > Or we pick 11 bits, it seems unlikely that capacity be larger than 2k.
> > 
> > Just remembered a past experience where we had unaligned access traps
> > on some machine because... don't you see any potentially issue on
> > using bitfleds like you suggest above ?
> > 
> > I'm thinking to:
> > 
> >commit 317d359df95d ("sched/core: Force proper alignment of 'struct 
> > util_est'")
> 
> There should not be (and I'm still confused by that particular commit
> you reference). If we access everything through the uclamp_map::s, and
> only use the bitfields to interpret the results, it all 'works'.

Yes, the problem above was different... still I was wondering if there
could be bitfields related alignment issue lurking somewhere.
But, as you say, if we always R/W atomically via uclamp_map::s there
should be none.

> The tricky thing we did earlier was trying to use u64 accesses for 2
> u32 variables. And somehow ia64 didn't get the alignment right.

Right, np... sorry for the noise.

-- 
#include 

Patrick Bellasi

Re: [PATCH] perf/x86/intel/lbr: Optimize context switches for LBR

2018-09-14 Thread Andi Kleen

> > In principle the LBRs need to be flushed between threads. So does
> > current code.
> 
> IMHO, ideally, LBRs stack would be preserved and restored when 
> switching between execution stacks. That would allow implementing 
> per-thread statistical call graph view in Perf tools, fully based 
> on HW capabilities. It could be advantageous for some cases, in 
> comparison with traditional dwarf based call graph. 

This is already supported when you use LBR call stack mode
(perf record --call-graph lbr)

This change is only optimizing the case when call stack mode is not used.

Of course in call stack mode the context switch overhead is even higher,
because it not only writes, but also reads.

-Andi

Re: [PATCH] gpiolib: Free the last requested descriptor

2018-09-14 Thread Linus Walleij

On Thu, Sep 13, 2018 at 3:37 PM Ricardo Ribalda Delgado
 wrote:

> The current code only frees N-1 gpios if an error occurs during
> gpiod_set_transitory, gpiod_direction_output or gpiod_direction_input.
> Leading to gpios that cannot be used by userspace nor other drivers.
>
> Cc: Timur Tabi 
> Cc: sta...@vger.kernel.org
> Fixes: ab3dbcf78f60f46d ("gpioib: do not free unrequested descriptors)
> Signed-off-by: Ricardo Ribalda Delgado 

Patch applied for fixes with Jan's Reported-by tag.

Yours,
Linus Walleij

Re: [PATCH v2 11/11] arm: dts: ste-dbx5x0: Update coresight bindings for hardware port

2018-09-14 Thread Suzuki K Poulose


Hi Linus,

On 14/09/18 09:43, Linus Walleij wrote:

On Wed, Sep 12, 2018 at 3:54 PM Suzuki K Poulose  wrote:


Switch to the new coresight bindings

Cc: Linus Walleij 
Cc: Mathieu Poirier 
Signed-off-by: Suzuki K Poulose 


Acked-by: Linus Walleij 



Thanks.


I guess you will merge this with the rest of the patches so I do
not need to queue it?


We haven't figured out that yet. I guess it would be ideal if we
could merge this via arm-soc ? To avoid possible conflict with
other changes to the dts in the series. Sudeep has queued the
TC2 and Juno changes. So, if you could queue it, that is fine
with us.

Suzuki

Re: [LKP] [ipc] 61224adcd2: general_protection_fault:#[##]

2018-09-14 Thread Rong Chen





On 09/13/2018 07:28 PM, David Howells wrote:

kernel test robot  wrote:


To reproduce:

 git clone https://github.com/intel/lkp-tests.git
 cd lkp-tests
 bin/lkp qemu -k  job-script # job-script is attached in this 
email

Tried that.  That seems to work for me.  Or, at least, it comes to a prompt at
which I can log in.  I'm not sure whether anything else is meant to happen.

It's a 0day problem, we will fix it as soon as possible.

Best Regards,
Rong Chen

Re: [PATCH] gpiolib: fix freeing of descriptors on error

2018-09-14 Thread Linus Walleij

On Thu, Sep 13, 2018 at 9:56 PM Ricardo Ribalda Delgado
 wrote:

> Hi Jim
>
> I have sent an identical patch 3 hours before you [1] ;)
> Seems that we are working on the same stuff.

Haha w00t but I'm grateful that you fine people are taking the
chardev ABI for a ride!

I recorded Jim's contribution as Reported-by on your patch.

Yours,
Linus Walleij

Re: [PATCH v1 2/3] locking/pvqspinlock, hv: Enable PV qspinlock for Hyper-V

2018-09-14 Thread Yi Sun

On 18-09-13 13:06:13, Thomas Gleixner wrote:
> On Thu, 13 Sep 2018, Yi Sun wrote:
> > +#include 
> >  
> >  /* representing HT siblings of each logical CPU */
> >  DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_sibling_map);
> > @@ -1335,6 +1336,7 @@ void __init native_smp_prepare_boot_cpu(void)
> > /* already set me in cpu_online_mask in boot_cpu_init() */
> > cpumask_set_cpu(me, cpu_callout_mask);
> > cpu_set_state_online(me);
> > +   hv_init_spinlocks();
> 
> No. We have smp_ops.smp_prepare_boot_cpu for that.
> 
Got it, will change it. Thanks!

> Thanks,
> 
>   tglx

Re: [PATCH] printk: CON_PRINTBUFFER console registration is a bit racy

2018-09-14 Thread Petr Mladek

On Fri 2018-09-14 11:34:28, Sergey Senozhatsky wrote:
> CON_PRINTBUFFER console registration requires us to do several
> preparation steps:
> - Rollback console_seq to replay logbuf messages which were already
>   seen on other consoles;
> - Set exclusive_console flag so console_unlock() will ->write() logbuf
>   messages only to the exclusive_console driver.
> 
> The way we do it, however, is a bit racy
> 
>   logbuf_lock_irqsave(flags);
>   console_seq = syslog_seq;
>   console_idx = syslog_idx;
>   logbuf_unlock_irqrestore(flags);
>   << preemption enabled
>   << irqs enabled
>   exclusive_console = newcon;
>   console_unlock();
> 
> We rollback console_seq under logbuf_lock with IRQs disabled, but
> we set exclusive_console with local IRQs enabled and logbuf unlocked.
> If the system oops-es or panic-s before we set exclusive_console - and
> given that we have IRQs and preemption enabled there is such a
> possibility - we will re-play all logbuf messages to every registered
> console, which may be a bit annoying and time consuming.
>
> Move exclusive_console assignment under the same IRQs-disabled and
> logbuf_lock protected section where we rollback console_seq.
> 
> Signed-off-by: Sergey Senozhatsky 
> ---
>  kernel/printk/printk.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 992bb6bb7ac2..c52a986a72b3 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2705,7 +2705,6 @@ void register_console(struct console *newcon)
>   logbuf_lock_irqsave(flags);
>   console_seq = syslog_seq;
>   console_idx = syslog_idx;
> - logbuf_unlock_irqrestore(flags);
>   /*
>* We're about to replay the log buffer.  Only do this to the
>* just-registered console to avoid excessive message spam to
> @@ -2713,6 +2712,7 @@ void register_console(struct console *newcon)
>*/
>   exclusive_console = newcon;
>   exclusive_console_stop_seq = console_seq;
> + logbuf_unlock_irqrestore(flags);

This would make a false feeling that these variables need to
be synchronized using logbug_lock. It might either confuse people
or it could get removed during a code clean up.

A better solution would be to explicitly disable preemption
around the entire section and add a comment.

Well, I am not sure if it is worth the code complexity.

Best Regards,
Petr

Re: [PATCH] pinctrl: at91: don't use the same irqchip with multiple gpiochips

2018-09-14 Thread Linus Walleij

On Thu, Sep 13, 2018 at 2:46 PM Ludovic Desroches
 wrote:

> Sharing the same irqchip with multiple gpiochips is not a good
> practice. For instance, when installing hooks, we change the state
> of the irqchip. The initial state of the irqchip for the second
> gpiochip to register is then disrupted.
>
> Signed-off-by: Ludovic Desroches 

Patch applied.

Yours,
Linus Walleij

Re: [PATCH] cpufreq: intel_pstate: Optimize IO boost in non HWP mode

2018-09-14 Thread Rafael J. Wysocki

On Tuesday, September 11, 2018 7:35:15 PM CEST Francisco Jerez wrote:
>
> "Rafael J. Wysocki"  writes:
> 
> > On Thursday, September 6, 2018 6:20:08 AM CEST Francisco Jerez wrote:
> >
> >> Srinivas Pandruvada  writes:
> >>=20
> >> > [...]
> >> >
> >> >> > >=3D20
> >> >> > > This patch causes a number of statistically significant
> >> >> > > regressions
> >> >> > > (with significance of 1%) on the two systems I've tested it
> >> >> > > on.  On
> >> >> > > my
> >> >> >=3D20
> >> >> > Sure. These patches are targeted to Atom clients where some of
> >> >> > these
> >> >> > server like workload may have some minor regression on few watts
> >> >> > TDP
> >> >> > parts.
> >> >>=3D20
> >> >> Neither the 36% regression of fs-mark, the 21% regression of sqlite,
> >> >> nor
> >> >> the 10% regression of warsaw qualify as small.  And most of the test
> >> >> cases on the list of regressions aren't exclusively server-like, if
> >> >> at
> >> >> all.  Warsaw, gtkperf, jxrendermark and lightsmark are all graphics
> >> >> benchmarks -- Latency is as important if not more for interactive
> >> >> workloads than it is for server workloads.  In the case of a conflict
> >> >> like the one we're dealing with right now between optimizing for
> >> >> throughput (e.g. for the maximum number of requests per second) and
> >> >> optimizing for latency (e.g. for the minimum request duration), you
> >> >> are
> >> >> more likely to be concerned about the former than about the latter in
> >> >> a
> >> >> server setup.
> >> >
> >> > Eero,
> >> > Please add your test results here.
> >> >
> >> > No matter which algorithm you do, there will be variations. So you have
> >> > to look at the platforms which you are targeting. For this platform=3D=
> 20
> >> > number one item is use of less turbo and hope you know why?
> >>=20
> >> Unfortunately the current controller uses turbo frequently on Atoms for
> >> TDP-limited graphics workloads regardless of IOWAIT boosting.  IOWAIT
> >> boosting simply exacerbated the pre-existing energy efficiency problem.
> >
> > My current understanding of the issue at hand is that using IOWAIT boosti=
> ng
> > on Atoms is a regression relative to the previous behavior.
> 
> Not universally.  IOWAIT boosting helps under roughly the same
> conditions on Atom as it does on big core, so applying this patch will
> necessarily cause regressions too (see my reply from Sep. 3 for some
> numbers), and won't completely restore the previous behavior since it
> simply decreases the degree of IOWAIT boosting applied without being
> able to avoid it (c.f. the series I'm working on that does something
> similar to IOWAIT boosting when it's able to determine it's actually
> CPU-bound, which prevents energy inefficient behavior for non-CPU-bound
> workloads that don't benefit from a higher CPU clock frequency anyway).

Well, OK.  That doesn't seem to be a clear-cut regression situation, then,
since getting back is not desirable, apparently.

Or would it restore the previous behavior if we didn't do any IOWAIT
boosting on Atoms at all?

> > That is what Srinivas is trying to address here AFAICS.
> >
> > Now, you seem to be saying that the overall behavior is suboptimal and the
> > IOWAIT boosting doesn't matter that much,
> 
> I was just saying that IOWAIT boosting is less than half of the energy
> efficiency problem, and this patch only partially addresses that half of
> the problem.

Well, fair enough, but there are two things to consider here, the general
energy-efficiency problem and the difference made by IOWAIT boosting.

If the general energy-efficiency problem had existed for a relatively long
time, but it has got worse recently due to the IOWAIT boosting, it still
may be desirable to get the IOWAIT boosting out of the picture first
and then get to the general problem.

I'm not sure if that is the case as my experience with Atoms is limited
anyway, so please advise.

Thanks,
Rafael

Re: [PATCH 3/6] xenbus: dispatch per-domU watch event to per-domU xenwatch thread

2018-09-14 Thread Juergen Gross

On 14/09/18 09:34, Dongli Zhang wrote:
> This is the 3rd patch of a (6-patch) patch set.
> 
> This patch dispatches the watch event to per-domU xenwatch thread when the
> event meets all of below conditions:
> 
> 1. The watch is registered to use xenwatch multithreading feature and the
>get_domid() method returns valid domid.
> 2. There is xenwatch thread (mtwatch domain) available for the domid
>obtained from get_domid() method.
> 3. The xenwatch thread is forked successfully by kthread_run() during
>initialization and therefore its state should be MTWATCH_DOMAIN_UP.
> 
> Signed-off-by: Dongli Zhang 
> ---
>  drivers/xen/xenbus/xenbus_xs.c | 53 
> ++
>  1 file changed, 49 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/xen/xenbus/xenbus_xs.c b/drivers/xen/xenbus/xenbus_xs.c
> index 741dc54..7335e19 100644
> --- a/drivers/xen/xenbus/xenbus_xs.c
> +++ b/drivers/xen/xenbus/xenbus_xs.c
> @@ -896,6 +896,32 @@ static struct xenbus_watch *find_watch(const char *token)
>   return NULL;
>  }
>  
> +static int xs_watch_insert_event(struct xs_watch_event *event, domid_t domid)
> +{
> + struct mtwatch_domain *domain;
> + int ret = -1;
> +
> + rcu_read_lock();
> +
> + domain = mtwatch_find_domain(domid);
> + if (!domain) {
> + rcu_read_unlock();
> + return ret;
> + }
> +
> + spin_lock(&domain->events_lock);
> + if (domain->state == MTWATCH_DOMAIN_UP) {
> + list_add_tail(&event->list, &domain->events);
> + wake_up(&domain->events_wq);
> + ret = 0;
> + }
> + spin_unlock(&domain->events_lock);
> +
> + rcu_read_unlock();
> +
> + return ret;
> +}
> +
>  int xs_watch_msg(struct xs_watch_event *event)
>  {
>   if (count_strings(event->body, event->len) != 2) {
> @@ -908,10 +934,29 @@ int xs_watch_msg(struct xs_watch_event *event)
>   spin_lock(&watches_lock);
>   event->handle = find_watch(event->token);
>   if (event->handle != NULL) {
> - spin_lock(&watch_events_lock);
> - list_add_tail(&event->list, &watch_events);
> - wake_up(&watch_events_waitq);
> - spin_unlock(&watch_events_lock);
> + domid_t domid = 0;

I won't repeat it again (including following patches): use another
special value like DOMID_SELF.


Juergen

metadata operation reordering regards to crash

2018-09-14 Thread 焦晓冬

Hi, all,

A probably bit of complex question:
Does nowadays practical filesystems, eg., extX, btfs, preserve metadata
operation order through a crash/power failure?

What I know is modern filesystems ensure metadata consistency
after crash/power failure. Journal filesystems like extX do that by
write-ahead logging of metadata operations into transactions. Other
filesystems do that in various ways as btfs do that by COW.

What I'm not so far clear is whether these filesystems preserve
metadata operation order after a crash.

For example,
op 1.  rename(A, B)
op 2.  rename(C, D)

As mentioned above,  metadata consistency is ensured after a crash.
Thus, B is either the original B(or not exists) or has been replaced by A.
The same to D.

Is it possible that, after a crash, D has been replaced by C but B is still
the original file(or not exists)?

Or, from the view of implementation, before the crash
- in a journal filesystem,
Is the atomic transaction `rename(C, D)` permitted to be written to disk journal
before the transaction `rename(A, B)`?
- in other filesystems, say btfs,
Is it permit to reorder `rename(C,D)` and `rename(A,B)` atomic operation hiting
disk?

The question is meaningful as many applications do that:
if (flag_file_says_need_generate_data) {
open_write_sync_close(data_tmp);
rename(data_tmp, data);

open_write_sync_close(flag_file_tmp, no_need_to_generate_data);
rename(flag_file_tmp, flag_file)
}
use_data_file()

If flag is here but data is not after a crash, that is a problem.

Thanks,
Trol

Re: [PATCH v1 2/3] locking/pvqspinlock, hv: Enable PV qspinlock for Hyper-V

2018-09-14 Thread Yi Sun

On 18-09-13 13:24:07, Thomas Gleixner wrote:
> On Thu, 13 Sep 2018, Yi Sun wrote:
> > +++ b/arch/x86/hyperv/hv_spinlock.c
> > @@ -0,0 +1,99 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +/*
> > + * Hyper-V specific spinlock code.
> > + *
> > + * Copyright (C) 2018, Intel, Inc.
> > + *
> > + * Author : Yi Sun 
> > + *
> > + * This program is free software; you can redistribute it and/or modify it
> > + * under the terms of the GNU General Public License version 2 as published
> > + * by the Free Software Foundation.
> > + *
> > + * This program is distributed in the hope that it will be useful, but
> > + * WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
> > + * NON INFRINGEMENT.  See the GNU General Public License for more
> > + * details.
> 
> Please remove the boilerplate. The SPDX identifier is enough. If in doubt
> talk to Intel legal.
> 
I will check this. Thanks!

> > +static bool hv_pvspin = true;
> 
> __initdata please.
> 
> > +static u32 spin_wait_info = 0;
> 
> No need for 0 initialization.
> 
Will modify them.

> > +
> > +static void hv_notify_long_spin_wait(void)
> > +{
> > +   u64 input = spin_wait_info | 0x;
> 
> What? The result is always 0x .
> 
Oh, sorry for such error.

> > +   spin_wait_info++;
> > +   hv_do_fast_hypercall8(HVCALL_NOTIFY_LONG_SPIN_WAIT, input);
> > +}
> > +
> > +static void hv_qlock_kick(int cpu)
> > +{
> > +   spin_wait_info--;
> 
> Looking at the above this is completely pointless and I have no idea what
> that variable is supposed to do.
> 
Per Microsoft Hypervisor Top Level Functional Specification, the input
parameter of HVCALL_NOTIFY_LONG_SPIN_WAIT is defined as below:

SpinwaitInfo – Specifies the accumulated count the guest was spinning.

So, it is increased when guest is spinning and reduced when spinlock is
released.

I may add comments to explain it.

> > +   apic->send_IPI(cpu, X86_PLATFORM_IPI_VECTOR);
> > +}
> > +
> > +/*
> > + * Halt the current CPU & release it back to the host
> > + */
> > +static void hv_qlock_wait(u8 *byte, u8 val)
> > +{
> > +   unsigned long msr_val;
> > +
> > +   if (READ_ONCE(*byte) != val)
> > +   return;
> > +
> > +   hv_notify_long_spin_wait();
> > +
> > +   rdmsrl(HV_X64_MSR_GUEST_IDLE, msr_val);
> 
> Magic rdmsrl(). That wants a comment what this is for.
> 
Per above spec, reading HV_X64_MSR_GUEST_IDLE can make guest idle.

I will add comment here.

> > +/*
> > + * Hyper-V does not support this so far.
> > + */
> > +bool hv_vcpu_is_preempted(int vcpu)
> 
> static ?
> 
PV_CALLEE_SAVE_REGS_THUNK definition is below. 

#define PV_CALLEE_SAVE_REGS_THUNK(func) \
extern typeof(func) __raw_callee_save_##func; \
..

So, the hv_vcpu_is_preempted cannot be declared as 'static'. Otherwise,
the make fails with below info.

arch/x86/hyperv/hv_spinlock.o: In function 
`__raw_callee_save_hv_vcpu_is_preempted':
hv_spinlock.c:(.text+0xd): undefined reference to `hv_vcpu_is_preempted'

> > +{
> > +   return false;
> > +}
> > +PV_CALLEE_SAVE_REGS_THUNK(hv_vcpu_is_preempted);
> > +
> > +void __init hv_init_spinlocks(void)
> > +{
> > +   if (!hv_pvspin ||
> > +   !apic ||
> > +   !(ms_hyperv.hints & HV_X64_CLUSTER_IPI_RECOMMENDED) ||
> > +   !(ms_hyperv.features & HV_X64_MSR_GUEST_IDLE_AVAILABLE)) {
> > +   pr_warn("hv: PV spinlocks disabled\n");
> 
> Why does this need to be pr_warn? This is purely informational. Also please
> use pr_fmt instead of the 'hv:' prefix.
> 
Got it. Thanks!

> > +static __init int hv_parse_nopvspin(char *arg)
> > +{
> > +   hv_pvspin = false;
> > +   return 0;
> > +}
> > +early_param("hv_nopvspin", hv_parse_nopvspin);
> 
> That lacks Documentation of the command line parameter. Wants to be in the
> same patch.
> 
Will merge patch 3 into 2. Thanks!

> Thanks,
> 
>   tglx

Re: [PATCH v4 03/16] sched/core: uclamp: add CPU's clamp groups accounting

2018-09-14 Thread Patrick Bellasi

On 13-Sep 21:12, Peter Zijlstra wrote:
> On Tue, Aug 28, 2018 at 02:53:11PM +0100, Patrick Bellasi wrote:
> > +static inline void uclamp_cpu_get_id(struct task_struct *p,
> > +struct rq *rq, int clamp_id)
> > +{
> > +   struct uclamp_group *uc_grp;
> > +   struct uclamp_cpu *uc_cpu;
> > +   int clamp_value;
> > +   int group_id;
> > +
> > +   /* Every task must reference a clamp group */
> > +   group_id = p->uclamp[clamp_id].group_id;
> 
> > +}
> > +
> > +static inline void uclamp_cpu_put_id(struct task_struct *p,
> > +struct rq *rq, int clamp_id)
> > +{
> > +   struct uclamp_group *uc_grp;
> > +   struct uclamp_cpu *uc_cpu;
> > +   unsigned int clamp_value;
> > +   int group_id;
> > +
> > +   /* New tasks don't have a previous clamp group */
> > +   group_id = p->uclamp[clamp_id].group_id;
> > +   if (group_id == UCLAMP_NOT_VALID)
> > +   return;
> 
> *confused*, so on enqueue they must have a group_id, but then on dequeue
> they might no longer have?

Why not ?

Tasks always have a (task-specific) group_id, once set the first time.

IOW, init_task::group_id is UCLAMP_NOT_VALID, as well as all the tasks
forked under reset_on_fork, otherwise the get the group_id of the
parent.

Actually, I just noted that the reset_on_fork path is now setting
p::group_id=0, which is semantically equivalent to UCLAMP_NOT_VALID...
but will update that assignment for consistency in v5.

The only way to set a !UCLAMP_NOT_VALID value for p::group_id is via
the syscall, which will either fails or set a new valid group_id.

Thus, if we have a valid p::group_id @enqueue time, we will have one
@dequeue time too. Eventually it could be a different one, because
while RUNNABLE we do a syscall... but this case is addressed by the
following patch:

   [PATCH v4 04/16] sched/core: uclamp: update CPU's refcount on clamp changes
   https://lore.kernel.org/lkml/20180828135324.21976-5-patrick.bell...@arm.com/

Does that makes sense ?

> > +}
> 
> > @@ -1110,6 +1313,7 @@ static inline void enqueue_task(struct rq *rq, struct 
> > task_struct *p, int flags)
> > if (!(flags & ENQUEUE_RESTORE))
> > sched_info_queued(rq, p);
> >  
> > +   uclamp_cpu_get(rq, p);
> > p->sched_class->enqueue_task(rq, p, flags);
> >  }
> >  
> > @@ -1121,6 +1325,7 @@ static inline void dequeue_task(struct rq *rq, struct 
> > task_struct *p, int flags)
> > if (!(flags & DEQUEUE_SAVE))
> > sched_info_dequeued(rq, p);
> >  
> > +   uclamp_cpu_put(rq, p);
> > p->sched_class->dequeue_task(rq, p, flags);
> >  }
> 
> The ordering, is that right? We get while the task isn't enqueued yet,
> which would suggest we put when the task is dequeued.

That's the "usual trick" required for correct schedutil updates.

The scheduler class code will likely poke schedutil and thus we
wanna be sure to have updated CPU clamps by the time we have to
compute the next OPP.

Cheers,
Patrick

-- 
#include 

Patrick Bellasi

Re: [PATCH 11/11] UAPI: Check headers build for C++ [ver #2]

2018-09-14 Thread Arnd Bergmann

On Thu, Sep 6, 2018 at 11:21 AM David Howells  wrote:

> +
> +typedef __s8   int8_t;
> +typedef __s16  int16_t;
> +typedef __s32  int32_t;
> +typedef __s64  int64_t;
> +typedef __u8   uint8_t;
> +typedef __u16  uint16_t;
> +typedef __u32  uint32_t;
> +typedef __u64  uint64_t;
> +typedef long int   intptr_t;
> +typedef unsigned long int  uintptr_t;
> +typedef unsigned short u_short;
> +typedef unsigned int   u_int;
> +typedef unsigned long  u_long;
> +typedef char   *caddr_t;
> +
> +typedef __kernel_clockid_t clockid_t;
> +typedef __kernel_ino_t ino_t;
> +typedef __kernel_pid_t pid_t;
> +typedef __kernel_sa_family_t   sa_family_t;
> +typedef __kernel_size_tsize_t;
> +typedef __kernel_uid_t uid_t;
> +
> +typedef unsigned long  elf_greg_t;
> +typedef elf_greg_t elf_gregset_t[1];
> +typedef unsigned long long elf_fpregset_t[1];
> +typedef unsigned long long elf_fpxregset_t[1];
> +
> +#define INT_MIN ((int)0x8000)
> +#define INT_MAX ((int)0x7fff)
> +
> +extern size_t strlen(const char *);
> +extern void *memset(void *, int, size_t);
> +extern void *memcpy(void *, const void *, size_t);
> +extern __u16 ntohs(__u16);
> +extern __u16 htons(__u16);
> +extern __u32 ntohl(__u32);
> +extern __u32 htonl(__u32);
> +
> +typedef uint32_t   grant_ref_t;
> +typedef uint16_t   domid_t;
> +typedef unsigned long  xen_pfn_t;
> +
> +#define MSG_FIN 0x200
> +
> +typedef int SVGA3dMSPattern;
> +typedef int SVGA3dMSQualityLevel;
> +
> +struct sockaddr
> +{
> +   sa_family_t sa_family;
> +   charsa_data[14];
> +};
> +#define sockaddr_storage __kernel_sockaddr_storage

I think we need to reduce that list as much as we can. In
https://patchwork.ozlabs.org/patch/968814/, Joseph Myers pointed
out  header file (linux/elfcore.h) that simply cannot be included from
user space at all, because its dependencies cannot be met without
running into conflicting type definitions, and he would like to include
that file in order to automatically check that it's compatible with
the glibc version (I pointed out a couple of architectures on which
it is in fact incompatible right now).

In the list above, I see multiple classes of bugs that could be
addressed:

- references to identifiers that are only present in kernel internal
  headers: SVGA3dMSPattern, MSG_FIN, xen_pfn_t, ...
  I think these are all simple bugs, and we should either remove
  the references, or make sure the respective dependencies are
  included in the uapi headers as well, possibly renamed with
  a __kernel_ prefix to avoid clashing with user space headers.

- references to user space types that should use the uapi
  internal types: sockaddr_storage, clockid_t, uid_t, ...
  I think these just need to get the __kernel_prefix
  consistently as we did a few years ago. Note that using
  the headers otherwise is broken anyway when the types
  in libc are different from the ones in the kernel.

- standard types (uint32_t): either include the correct user
  space headers ifndef __KERNEL__ or use the kernel types

- standard functions (memcpy(), ntohs(), ...): These should
  already be handled in the headers by including the user space.
  If we missed any, we should probably do the same thing
  there.


   Arnd

Re: [PATCH 1/3] drivers/base: move device_shutdown() to base/power

2018-09-14 Thread Rafael J. Wysocki

On Monday, August 20, 2018 11:18:35 AM CEST Pingfan Liu wrote:
> Consider the shutdown as a system state transition, i.e. something like
> suspend, hibernate, hence move it under the base/power. (This is a first
> step to unify the duplicate code logic on devices_kset and dpm_list.)

I don't really think that device_shutodwn() belongs in base/power/.

Moving it to a separate file sounds like a good idea, but let that file
reside in base/ proper.

Then, I would separate the dpm_list definition, locking etc out of
base/power/, move it to the new file containing device_shutdown() and
make the code in base/power/ refer to it (you may want to move the list
entry list head from dev_pm_info to struct device directly while at that
too).

Then, it should be straightforward enough to switch device_shutdown() over to
using it.

Thanks,
Rafael

Re: [RFC] memory_hotplug: Free pages as pageblock_order

2018-09-14 Thread Michal Hocko

On Wed 12-09-18 20:12:30, Arun KS wrote:
> On 2018-09-12 18:47, Michal Hocko wrote:
> > On Wed 12-09-18 22:57:43, Balbir Singh wrote:
> > > On Wed, Sep 12, 2018 at 12:38:53PM +0200, Michal Hocko wrote:
> > > > On Wed 12-09-18 14:56:45, Arun KS wrote:
> > > > > When free pages are done with pageblock_order, time spend on
> > > > > coalescing pages by buddy allocator can be reduced. With
> > > > > section size of 256MB, hot add latency of a single section
> > > > > shows improvement from 50-60 ms to less than 1 ms, hence
> > > > > improving the hot add latency by 60%.
> > > >
> > > > Where does the improvement come from? You are still doing the same
> > > > amount of work except that the number of callbacks is lower. Is this the
> > > > real source of 60% improvement?
> > > >
> > > 
> > > It looks like only the first page of the pageblock is initialized, is
> > > some of the cost amortized in terms of doing one initialization for
> > > the page with order (order) and then relying on split_page and helpers
> > > to do the rest? Of course the number of callbacks reduce by a
> > > significant
> > > number as well.
> > 
> > Ohh, I have missed that part. Now when re-reading I can see the reason
> > for the perf improvement. It is most likely the higher order free which
> > ends up being much cheaper. This part makes some sense.
> > 
> > How much is this feasible is another question. Do not forget we have
> > those external providers of the online callback and those would need to
> > be updated as well.
> Sure Michal, I ll look into this.
> 
> > 
> > Btw. the normal memmap init code path does the same per-page free as
> > well. If we really want to speed the hotplug path then I guess the init
> > one would see a bigger improvement and those two should be in sync.
> Thanks for pointers, Will look further.

I haven't looked closer and I will be travelling next week so just hint.
Have a look at the nobootmem and how it frees pages to the page
allocator in __free_pages_boot_core. Seems exactly what you want and it
also answers your question about reference counting.
-- 
Michal Hocko
SUSE Labs

Re: [PATCH] cpufreq: intel_pstate: warn if HWP Native mode and OOB mode are both set

2018-09-14 Thread Rafael J. Wysocki

On Sunday, July 29, 2018 6:14:56 AM CEST ocea...@zoho.com.cn wrote:
> From: Ocean He 
> 
> Refer to Intel HWPM Compliance Guide (Document Number: 566474): Bit[6] or
> bit[8] of MSR MISC_PWR_MGMT is set only when BIOS select either HWP Native
> mode or HWP OOB mode. Bit[6] and bit[8] cannot be set both at the same
> time.
> 
> Add a warning message if intel_pstate detects both HWP Native mode and
> HWP OOB mode are set at the same time.
> 
> Signed-off-by: Ocean He 
> ---
>  drivers/cpufreq/intel_pstate.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
> index 3c39712..678d3f2 100644
> --- a/drivers/cpufreq/intel_pstate.c
> +++ b/drivers/cpufreq/intel_pstate.c
> @@ -2455,8 +2455,12 @@ static bool __init 
> intel_pstate_platform_pwr_mgmt_exists(void)
>   id = x86_match_cpu(intel_pstate_cpu_oob_ids);
>   if (id) {
>   rdmsrl(MSR_MISC_PWR_MGMT, misc_pwr);
> - if ( misc_pwr & (1 << 8))
> + if (misc_pwr & (1 << 8)) {
> + if (misc_pwr & (1 << 6))
> + pr_warn(FW_WARN
> + "HWP Native mode and OOB mode cannot be 
> set both at the same time.\n");
>   return true;
> + }
>   }
>  
>   idx = acpi_match_platform_list(plat_info);
> 

Srinivas, what do you think about this one?

Re: [PATCH] perf/x86/intel/lbr: Optimize context switches for LBR

2018-09-14 Thread Alexey Budankov



Hi Andi,

On 14.09.2018 11:54, Andi Kleen wrote:
>>> In principle the LBRs need to be flushed between threads. So does
>>> current code.
>>
>> IMHO, ideally, LBRs stack would be preserved and restored when 
>> switching between execution stacks. That would allow implementing 
>> per-thread statistical call graph view in Perf tools, fully based 
>> on HW capabilities. It could be advantageous for some cases, in 
>> comparison with traditional dwarf based call graph. 
> 
> This is already supported when you use LBR call stack mode
> (perf record --call-graph lbr)

Which kernel versions does it make sense to try?

Thanks,
Alexey

> 
> This change is only optimizing the case when call stack mode is not used.
> 
> Of course in call stack mode the context switch overhead is even higher,
> because it not only writes, but also reads.
> 
> -Andi
>

Re: [PATCH 3/4][v2] x86, hibernate: Extract the common code of 64/32 bit system

2018-09-14 Thread Rafael J. Wysocki

On Tuesday, September 11, 2018 7:20:46 PM CEST Chen Yu wrote:
> From: Zhimin Gu 
> 
> Reduce the hibernation code duplication between x86-32 and x86-64
> by extracting the common code into hibernate.c.
> 
> No functional change.
> 
> Acked-by: Chen Yu 
> Acked-by: Pavel Machek 
> Cc: "Rafael J. Wysocki" 
> Cc: Thomas Gleixner 
> Signed-off-by: Zhimin Gu 
> ---
>  arch/x86/power/hibernate.c| 265 ++
>  arch/x86/power/hibernate_32.c |  23 +--
>  arch/x86/power/hibernate_64.c | 246 +--
>  arch/x86/power/hibernate_asm_64.S |   2 +-
>  4 files changed, 269 insertions(+), 267 deletions(-)
>  create mode 100644 arch/x86/power/hibernate.c
> 
> diff --git a/arch/x86/power/hibernate.c b/arch/x86/power/hibernate.c
> new file mode 100644
> index ..6aeac4d3c9df
> --- /dev/null
> +++ b/arch/x86/power/hibernate.c
> @@ -0,0 +1,265 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Hibernation support for x86
> + *
> + * Copyright (c) 2007 Rafael J. Wysocki 
> + * Copyright (c) 2002 Pavel Machek 
> + * Copyright (c) 2001 Patrick Mochel 
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +/* Defined in hibernate_asm_64.S or hibernate_asm_32.S */
> +extern asmlinkage __visible int restore_image(void);
> +
> +/*
> + * Address to jump to in the last phase of restore in order to get to the 
> image
> + * kernel's text (this value is passed in the image header).
> + */
> +unsigned long restore_jump_address __visible;
> +unsigned long jump_address_phys;
> +
> +/*
> + * Value of the cr3 register from before the hibernation (this value is 
> passed
> + * in the image header).
> + */
> +unsigned long restore_cr3 __visible;
> +unsigned long temp_pgt __visible;
> +unsigned long relocated_restore_code __visible;
> +
> +#ifdef CONFIG_X86_32
> + #define RESTORE_MAGIC   0x12345678UL
> +#else
> + #define RESTORE_MAGIC   0x23456789ABCDEF01UL
> +#endif
> +
> +/**
> + * pfn_is_nosave - check if given pfn is in the 'nosave' section
> + * @pfn: the page frame number to be checked
> + *
> + * returns non-zero if the pfn is in the 'nosave' section,
> + * otherwise returns zero
> + */
> +int pfn_is_nosave(unsigned long pfn)
> +{
> + unsigned long nosave_begin_pfn;
> + unsigned long nosave_end_pfn;
> +
> + nosave_begin_pfn = __pa_symbol(&__nosave_begin) >> PAGE_SHIFT;
> + nosave_end_pfn = PAGE_ALIGN(__pa_symbol(&__nosave_end)) >> PAGE_SHIFT;
> +
> + return pfn >= nosave_begin_pfn && pfn < nosave_end_pfn;
> +}
> +
> +#ifdef CONFIG_X86_64
> +static int relocate_restore_code(void)
> +{
> + pgd_t *pgd;
> + p4d_t *p4d;
> + pud_t *pud;
> + pmd_t *pmd;
> + pte_t *pte;
> +
> + relocated_restore_code = get_safe_page(GFP_ATOMIC);
> + if (!relocated_restore_code)
> + return -ENOMEM;
> +
> + memcpy((void *)relocated_restore_code, core_restore_code, PAGE_SIZE);
> +
> + /* Make the page containing the relocated code executable */
> + pgd = (pgd_t *)__va(read_cr3_pa()) +
> + pgd_index(relocated_restore_code);
> + p4d = p4d_offset(pgd, relocated_restore_code);
> + if (p4d_large(*p4d)) {
> + set_p4d(p4d, __p4d(p4d_val(*p4d) & ~_PAGE_NX));
> + goto out;
> + }
> + pud = pud_offset(p4d, relocated_restore_code);
> + if (pud_large(*pud)) {
> + set_pud(pud, __pud(pud_val(*pud) & ~_PAGE_NX));
> + goto out;
> + }
> + pmd = pmd_offset(pud, relocated_restore_code);
> + if (pmd_large(*pmd)) {
> + set_pmd(pmd, __pmd(pmd_val(*pmd) & ~_PAGE_NX));
> + goto out;
> + }
> + pte = pte_offset_kernel(pmd, relocated_restore_code);
> + set_pte(pte, __pte(pte_val(*pte) & ~_PAGE_NX));
> +out:
> + __flush_tlb_all();
> + return 0;
> +}
> +
> +#define MD5_DIGEST_SIZE 16
> +
> +struct restore_data_record {
> + unsigned long jump_address;
> + unsigned long jump_address_phys;
> + unsigned long cr3;
> + unsigned long magic;
> + u8 e820_digest[MD5_DIGEST_SIZE];
> +};
> +
> +#if IS_BUILTIN(CONFIG_CRYPTO_MD5)
> +/**
> + * get_e820_md5 - calculate md5 according to given e820 table
> + *
> + * @table: the e820 table to be calculated
> + * @buf: the md5 result to be stored to
> + */
> +static int get_e820_md5(struct e820_table *table, void *buf)
> +{
> + struct crypto_shash *tfm;
> + struct shash_desc *desc;
> + int size;
> + int ret = 0;
> +
> + tfm = crypto_alloc_shash("md5", 0, 0);
> + if (IS_ERR(tfm))
> + return -ENOMEM;
> +
> + desc = kmalloc(sizeof(struct shash_desc) + crypto_shash_descsize(tfm),
> +GFP_KERNEL);
> + if (!desc) {
> + ret = -ENOMEM;
> + goto free_tfm;
> + }
> +
> + desc->tfm = tfm

OK

2018-09-14 Thread Ahmed Zama

Greetings,

I humbly solicit for your partnership to transfer €15 million Euros
into your personal or company’s account .Contact me for more detailed
explanation.

Kindly send me the followings

Full Names
Address
Occupation
Direct Mobile Telephone Lines
Nationality

Ahmed Zama
+22675844869

Re: [RESEND PATCH V5 0/8] x86: infrastructure to enable FSGSBASE

2018-09-14 Thread Ingo Molnar



* Chang S. Bae  wrote:

> Resending the patchset that was posted before [6].
> 
> Given feedbacks from [1], it was suggested to separate two parts
> and to (re-)submit this patchset first.
> 
> To facilitate FSGSBASE, Andy's FS/GS base read fix is first
> ordered, then some helper functions and refactoring work
> are included. Cleanup for the vDSO initialization is
> for preparing per-CPU base finder that will be used in the
> paranoid entry, when FSGSBASE enabled.
> 
> Changes from V4 [5]:
> * Change patch ordering; putting the fix first before introducing
> the helper functions
> * Cleanup further for vDSO CPU initialization codes
> 
> Changes from V3 [4]:
> * Unify CPU number initialization
> 
> Changes from V2 [3]:
> * Bisect the CPU number initialization patch
> * Drop patches for introducing i386 CPU_NUMBER and switching
> write_rdtscp_aux() to use wrmsr_safe()
> 
> Changes from V1 [2]:
> * Rename the x86-64 CPU_NUMBER segment from PER_CPU
> * Add i386 CPU_NUMBER equivalent to x86-64 at GDT entry 23
> * Add additional helper function to store CPU number
> * Switch write_rdtscp_aux() to use wrmsr_safe()
> 
> [1] FSGSBASE patch set V2: 
> https://lore.kernel.org/patchwork/cover/912063/
> [2] infrastructure for FSGSBASE 
> V1: https://lore.kernel.org/patchwork/cover/913139/
> [3] V2: https://lore.kernel.org/patchwork/cover/913644/
> [4] V3: https://lore.kernel.org/patchwork/cover/949775/
> [5] V4: https://lore.kernel.org/patchwork/cover/951712/
> [6] V5: https://lore.kernel.org/patchwork/cover/956526/
> 
> Andy Lutomirski (1):
>   x86/arch_prctl/64: Make ptrace read FS/GS base accurately
> 
> Chang S. Bae (7):
>   x86/fsgsbase/64: Introduce FS/GS base helper functions
>   x86/fsgsbase/64: Make ptrace use FS/GS base helpers
>   x86/fsgsbase/64: Use FS/GS base helpers in core dump
>   x86/fsgsbase/64: Factor out load FS/GS segments from __switch_to
>   x86/segments/64: Rename PER_CPU segment to CPU_NUMBER
>   x86/vdso: Introduce helper functions for CPU and node number
>   x86/vdso: Move out the CPU initialization
> 
>  arch/x86/entry/vdso/vgetcpu.c   |   9 +-
>  arch/x86/entry/vdso/vma.c   |  38 +
>  arch/x86/include/asm/elf.h  |   6 +-
>  arch/x86/include/asm/fsgsbase.h |  47 +++
>  arch/x86/include/asm/segment.h  |  46 +-
>  arch/x86/include/asm/vgtod.h|  26 --
>  arch/x86/kernel/cpu/common.c|  24 ++
>  arch/x86/kernel/process_64.c| 183 
> +++-
>  arch/x86/kernel/ptrace.c|  28 ++
>  9 files changed, 270 insertions(+), 137 deletions(-)
>  create mode 100644 arch/x86/include/asm/fsgsbase.h

Looks mostly good, here's a couple of details I noticed before I almost applied 
the series:


 - Please use a single, unified name-space for all the new FS/GS helpers, such 
as:

x86_fsbase_read_cpu()
x86_fsbase_read_task()
x86_fsbase_write_cpu()
x86_fsbase_write_cpu_inactive()
x86_fsbase_write_task()

x86_gsbase_read_cpu()
x86_gsbase_read_task()
x86_gsbase_write_task()

x86_fsgsbase_load() # was: load_fsgs()
x86_fsgsbase_read_task()# was: task_seg_base()

etc.

 - Please port patch #7 to latest -tip, it's conflicting with:

 e78e5a91456f: x86/vdso: Fix lsl operand order

   The resolution is to preserve the modified code.

Also some style nits:

 - Please fix the inconsistent capitalization of 'CPU' in various places, 
comments, 
   descriptions, etc.

 - Comments should start with a capital letter, like sentences do. Right now 
there's a mixed 
   style of:

+   /*
+* If performance here mattered, we could protect the LDT
+* with RCU.  This is a slow path, though, so we can just
+* take the mutex.
+*/

+   /*
+* set the selector to 0 as a notion, that the segment base is
+* overwritten, which will be checked for skipping the segment load
+* during context switch.
+*/


 - Please refer to functions in changelogs and comments with fn_name(), not 
just fn_name. For 
   example:

  x86/fsgsbase/64: Factor out load FS/GS segments from __switch_to

   But there's more examples.

 - Please read your own patches once again and fix typos in comments and 
changelogs, such as:

+* correctly wrt barrier() and to keep gcc from cleverly


Thanks,

Ingo

Re: [PATCH v8 04/26] PM / Domains: Add support for CPU devices to genpd

2018-09-14 Thread Rafael J. Wysocki

On Friday, August 24, 2018 8:47:21 AM CEST Ulf Hansson wrote:
> On 6 August 2018 at 11:36, Rafael J. Wysocki  wrote:
> > On Fri, Aug 3, 2018 at 1:43 PM, Ulf Hansson  wrote:
> >> On 19 July 2018 at 12:25, Rafael J. Wysocki  wrote:
> >>> On Wednesday, June 20, 2018 7:22:04 PM CEST Ulf Hansson wrote:
>  To enable a device belonging to a CPU to be attached to a PM domain 
>  managed
>  by genpd, let's do a few changes to genpd as to make it convenient to
>  manage the specifics around CPUs.
> 
>  First, as to be able to quickly find out what CPUs that are attached to a
>  genpd, which typically becomes useful from a genpd governor as following
>  changes is about to show, let's add a cpumask 'cpus' to the struct
>  generic_pm_domain.
> 
>  At the point when a device that belongs to a CPU, is attached/detached to
>  its corresponding PM domain via genpd_add_device(), let's update the
>  cpumask in genpd->cpus. Moreover, propagate the update of the cpumask to
>  the master domains, which makes the genpd->cpus to contain a cpumask that
>  hierarchically reflect all CPUs for a genpd, including CPUs attached to
>  subdomains.
> 
>  Second, to unconditionally manage CPUs and the cpumask in genpd->cpus, is
>  unnecessary for cases when only non-CPU devices are parts of a genpd.
>  Let's avoid this by adding a new configuration bit, 
>  GENPD_FLAG_CPU_DOMAIN.
>  Clients must set the bit before they call pm_genpd_init(), as to instruct
>  genpd that it shall deal with CPUs and thus manage the cpumask in
>  genpd->cpus.
> 
>  Cc: Lina Iyer 
>  Co-developed-by: Lina Iyer 
>  Signed-off-by: Ulf Hansson 
>  ---
>   drivers/base/power/domain.c | 69 -
>   include/linux/pm_domain.h   |  3 ++
>   2 files changed, 71 insertions(+), 1 deletion(-)
> 
>  diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
>  index 21d298e1820b..6149ce0bfa7b 100644
>  --- a/drivers/base/power/domain.c
>  +++ b/drivers/base/power/domain.c
>  @@ -20,6 +20,7 @@
>   #include 
>   #include 
>   #include 
>  +#include 
> 
>   #include "power.h"
> 
>  @@ -126,6 +127,7 @@ static const struct genpd_lock_ops genpd_spin_ops = {
>   #define genpd_is_irq_safe(genpd) (genpd->flags & 
>  GENPD_FLAG_IRQ_SAFE)
>   #define genpd_is_always_on(genpd)(genpd->flags & 
>  GENPD_FLAG_ALWAYS_ON)
>   #define genpd_is_active_wakeup(genpd)(genpd->flags & 
>  GENPD_FLAG_ACTIVE_WAKEUP)
>  +#define genpd_is_cpu_domain(genpd)   (genpd->flags & 
>  GENPD_FLAG_CPU_DOMAIN)
> 
>   static inline bool irq_safe_dev_in_no_sleep_domain(struct device *dev,
>    const struct generic_pm_domain *genpd)
>  @@ -1377,6 +1379,62 @@ static void genpd_free_dev_data(struct device 
>  *dev,
>    dev_pm_put_subsys_data(dev);
>   }
> 
>  +static void __genpd_update_cpumask(struct generic_pm_domain *genpd,
>  +int cpu, bool set, unsigned int depth)
>  +{
>  + struct gpd_link *link;
>  +
>  + if (!genpd_is_cpu_domain(genpd))
>  + return;
>  +
>  + list_for_each_entry(link, &genpd->slave_links, slave_node) {
>  + struct generic_pm_domain *master = link->master;
>  +
>  + genpd_lock_nested(master, depth + 1);
>  + __genpd_update_cpumask(master, cpu, set, depth + 1);
>  + genpd_unlock(master);
>  + }
>  +
>  + if (set)
>  + cpumask_set_cpu(cpu, genpd->cpus);
>  + else
>  + cpumask_clear_cpu(cpu, genpd->cpus);
>  +}
> >>>
> >>> As noted elsewhere, there is a concern about the possible weight of this
> >>> cpumask and I think that it would be good to explicitly put a limit on it.
> >>
> >> I have been digesting your comments on the series, but wonder if this
> >> is still a relevant concern?
> >
> > Well, there are systems with very large cpumasks and it is sort of
> > good to have that in mind when designing any code using them.
> 
> Right.
> 
> So, if I avoid allocating the cpumask for those genpd structures that
> doesn't need it (those not having GENPD_FLAG_CPU_DOMAIN set), would
> that be sufficient to deal with your concern?

Yes, it would, if I understand you correctly.

Thanks,
Rafael

Re: [PATCH v4 06/16] sched/cpufreq: uclamp: add utilization clamping for FAIR tasks

2018-09-14 Thread Peter Zijlstra

On Tue, Aug 28, 2018 at 02:53:14PM +0100, Patrick Bellasi wrote:
> diff --git a/kernel/sched/cpufreq_schedutil.c 
> b/kernel/sched/cpufreq_schedutil.c
> index 3fffad3bc8a8..949082555ee8 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -222,8 +222,13 @@ static unsigned long sugov_get_util(struct sugov_cpu 
> *sg_cpu)
>* CFS tasks and we use the same metric to track the effective
>* utilization (PELT windows are synchronized) we can directly add them
>* to obtain the CPU's actual utilization.
> +  *
> +  * CFS utilization can be boosted or capped, depending on utilization
> +  * clamp constraints configured for currently RUNNABLE tasks.
>*/
>   util = cpu_util_cfs(rq);
> + if (util)
> + util = uclamp_util(rq, util);

Should that not be:

util = clamp_util(rq, cpu_util_cfs(rq));

Because if !util might we not still want to enforce the min clamp?

>   util += cpu_util_rt(rq);
>  
>   /*

> @@ -322,11 +328,24 @@ static void sugov_iowait_boost(struct sugov_cpu 
> *sg_cpu, u64 time,
>   return;
>   sg_cpu->iowait_boost_pending = true;
>  
> + /*
> +  * Boost FAIR tasks only up to the CPU clamped utilization.
> +  *
> +  * Since DL tasks have a much more advanced bandwidth control, it's
> +  * safe to assume that IO boost does not apply to those tasks.
> +  * Instead, since RT tasks are not utiliation clamped, we don't want
> +  * to apply clamping on IO boost while there is blocked RT
> +  * utilization.
> +  */
> + max_boost = sg_cpu->iowait_boost_max;
> + if (!cpu_util_rt(cpu_rq(sg_cpu->cpu)))
> + max_boost = uclamp_util(cpu_rq(sg_cpu->cpu), max_boost);

OK I suppose.

> +
>   /* Double the boost at each request */
>   if (sg_cpu->iowait_boost) {
>   sg_cpu->iowait_boost <<= 1;
> - if (sg_cpu->iowait_boost > sg_cpu->iowait_boost_max)
> - sg_cpu->iowait_boost = sg_cpu->iowait_boost_max;
> + if (sg_cpu->iowait_boost > max_boost)
> + sg_cpu->iowait_boost = max_boost;
>   return;
>   }
>  


> +static inline unsigned int uclamp_value(struct rq *rq, int clamp_id)
> +{
> + struct uclamp_cpu *uc_cpu = &rq->uclamp;
> +
> + if (uc_cpu->value[clamp_id] == UCLAMP_NOT_VALID)
> + return uclamp_none(clamp_id);
> +
> + return uc_cpu->value[clamp_id];
> +}

Would that not be more readable as:

static inline unsigned int uclamp_value(struct rq *rq, int clamp_id)
{
unsigned int val = rq->uclamp.value[clamp_id];

if (unlikely(val == UCLAMP_NOT_VALID))
val = uclamp_none(clamp_id);

return val;
}

And how come NOT_VALID is possible? I thought the idea was to always
have all things a valid value.

Re: [RFCv2 00/48] perf tools: Add threads to record command

2018-09-14 Thread Ingo Molnar

* Namhyung Kim  wrote:

> > > The perf.data stays as a single file.
> 
> I'm not sure we really need to keep it as a single file.  As it's a
> kind of big changes, we might consider breaking compatibility and use
> a directory structure.

Agreed - and to make use of the highly scalable Linux VFS implementation we 
should
attempt to use per CPU file resources as well.

Any cross-CPU contention should stick out like a sore thumb.

> > There is usage of Posix threading API but there is no 
> > its implementation in the patch series, to avoid dependency 
> > on externally coded designs in the core of the tool.
> 
> Do you mean it needs to implement its own threading?  I don't think
> that's what Ingo wanted to.

Yeah, I didn't mean that: every libc hoping to work on Linux implements a 
pthread API, plus the 
pthread APIs we are using are really just narrow wrappers on top of system 
calls that were 
written with libc pthread APIs in mind. So it's not a problem to rely on 
pthreads.h. (And if we 
have trouble with any particular pthread detail we can single out specific 
functionality and 
not use it or use our own implementation.)

The AIO library is another matter: it's a family of interfaces with complex 
libc specific 
design choices that cannot be influenced.

I.e. my suggestion was to keep using pthreads APIs like we do today, but not 
use the libc AIO 
library. Not because there's any problem with glibc AIO: but because basic 
event flow is a core 
competency of perf that we want to implement ourselves.

Is this clearer?

Thanks,

Ingo

Re: [RFCv2 00/48] perf tools: Add threads to record command

2018-09-14 Thread Alexey Budankov

On 14.09.2018 11:28, Jiri Olsa wrote:
> On Fri, Sep 14, 2018 at 10:26:53AM +0200, Jiri Olsa wrote:
> 
> SNIP
> 
 The threaded monitoring currently can't monitor backward maps
 and there are probably more limitations which I haven't spotted
 yet.

 So far I tested on laptop:
   http://people.redhat.com/~jolsa/record_threads/test-4CPU.txt

 and a one bigger server:
   http://people.redhat.com/~jolsa/record_threads/test-208CPU.txt

 I can see decrease in recorded LOST events, but both the benchmark
 and the monitoring must be carefully configured wrt:
   - number of events (frequency)
   - size of the memory maps
   - size of events (callchains)
   - final perf.data size

 It's also available in:
   git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git
   perf/record_threads

 thoughts? ;-) thanks
 jirka
>>>
>>> It is preferable to split into smaller pieces that bring 
>>> some improvement proved by metrics numbers and ready for 
>>> merging and upstream. Do we have more metrics than the 
>>> data loss from trace AIO patches?
>>
>> well the primary focus is to get more events in,
>> so the LOST metric is the main one
> 
> actualy I was hoping, could you please run it through the same
> tests as you do for AIO code on some huge server? 

Yeah, I will, but it takes some time.

> 
> thanks,
> jirka
>

Re: [RFCv2 00/48] perf tools: Add threads to record command

2018-09-14 Thread Ingo Molnar

* Jiri Olsa  wrote:

> On Fri, Sep 14, 2018 at 11:29:10AM +0900, Namhyung Kim wrote:
> > On Thu, Sep 13, 2018 at 07:10:35PM +0300, Alexey Budankov wrote:
> > > Hi,
> > 
> > Hello,
> > 
> > > 
> > > On 13.09.2018 15:54, Jiri Olsa wrote:
> > > > hi,
> > > > sending *RFC* for threads support in perf record command.
> > > > 
> > > > In big picture this patchset adds perf record --threads
> > > > option that allows to create threads in following modes:
> > > > 
> > > > 1) single thread mode (current)
> > > > 
> > > >   $ perf record ...
> > > >   $ perf record --threads=1 ...
> > > > 
> > > >   - all maps are read/stored under process thread
> > > > 
> > > > 2) mode with specific (X) number of threads
> > > > 
> > > >   $ perf record --threads=X ...
> > > > 
> > > >   - maps are spread equaly among threads
> > > > 
> > > > 3) mode that creates thread for every monitored memory map
> > > > 
> > > >   $ perf record --threads ...
> > > > 
> > > >   - which in perf record is equal to number of CPUs, and
> > > > it pins each thread to its map's cpu:
> > > > 
> > > > 4) TODO - NUMA aware threads/maps separation
> > > >...
> > > > 
> > > > The perf.data stays as a single file.
> > 
> > I'm not sure we really need to keep it as a single file.  As it's a
> > kind of big changes, we might consider breaking compatibility and use
> > a directory structure.
> 
> moving the files into the perf.data at the end is actualy
> not a lot code.. and I think it's one of the 'small' things
> that make this feature more user friendly

So the user shouldn't really care about the structure of the file when most 
uses of perf 
tooling, and 'single file' versus 'single directory' has similar usability IMHO.

When moving across machines it's recommended to use 'perf archive' anyway, 
which already 
creates a tarball that includes debuginfo and other context.

In fact keeping the files separate has scalability advantages for 'perf report' 
and similar 
parsing tools: they could read all the streams in a per-CPU fashion already, 
from the very 
beginning.

BTW., random annoyance bugreport, for me 'perf archive' is spewing a ton of 
these messages:

  $ perf archive
  unwind: target platform=x86 is not supported
  unwind: target platform=x86 is not supported
  unwind: target platform=x86 is not supported
  ...
  unwind: target platform=x86 is not supported
  unwind: target platform=x86 is not supported

  Now please run:

  $ tar xvf perf.data.tar.bz2 -C ~/.debug

  wherever you need to run 'perf report' on.

  $ perf version
  perf version 4.19.rc2.gcb48b6

That message is repeated 7,200 times (!) and immediately nuked my terminal 
history. :-/

Even if we want to emit that warning (we really shouldn't unless it's important 
for the user to 
know), there's no reason to print thousands of messages to stderr.

Thanks,

Ingo

[PATCH] add label interface for cache device in sysfs

2018-09-14 Thread Dongbo Cao

label interface will be called by bcache tools in user space.

Signed-off-by: Dongbo Cao 
---
 drivers/md/bcache/sysfs.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/drivers/md/bcache/sysfs.c b/drivers/md/bcache/sysfs.c
index 225b15aa..e64c718f 100644
--- a/drivers/md/bcache/sysfs.c
+++ b/drivers/md/bcache/sysfs.c
@@ -950,6 +950,13 @@ SHOW(__bch_cache)
return ret;
}
 
+   if (attr == &sysfs_label) {
+   memcpy(buf, ca->sb.label, SB_LABEL_SIZE);
+   buf[SB_LABEL_SIZE + 1] = '\0';
+   strcat(buf, "\n");
+   return strlen(buf);
+   }
+
return 0;
 }
 SHOW_LOCKED(bch_cache)
@@ -993,6 +1000,17 @@ STORE(__bch_cache)
atomic_set(&ca->io_errors, 0);
}
 
+   if (attr == &sysfs_label) {
+   if (size > SB_LABEL_SIZE)
+   return -EINVAL;
+   memcpy(ca->sb.label, buf, size);
+   if (size < SB_LABEL_SIZE)
+   ca->sb.label[size] = '\0';
+   if (size && ca->sb.label[size - 1] == '\n')
+   ca->sb.label[size - 1] = '\0';
+   bcache_write_super(ca->set);
+   }
+
return size;
 }
 STORE_LOCKED(bch_cache)
@@ -1009,6 +1027,7 @@ static struct attribute *bch_cache_files[] = {
&sysfs_io_errors,
&sysfs_clear_stats,
&sysfs_cache_replacement_policy,
+   &sysfs_label,
NULL
 };
 KTYPE(bch_cache);
-- 
2.17.1

Re: [PATCH v8 07/26] PM / Domains: Add genpd governor for CPUs

2018-09-14 Thread Rafael J. Wysocki

On Thursday, August 9, 2018 5:39:25 PM CEST Lorenzo Pieralisi wrote:
> On Mon, Aug 06, 2018 at 11:20:59AM +0200, Rafael J. Wysocki wrote:
> 
> [...]
> 
> > >>> > @@ -245,6 +248,56 @@ static bool always_on_power_down_ok(struct 
> > >>> > dev_pm_domain *domain)
> > >>> > return false;
> > >>> >  }
> > >>> >
> > >>> > +static bool cpu_power_down_ok(struct dev_pm_domain *pd)
> > >>> > +{
> > >>> > +   struct generic_pm_domain *genpd = pd_to_genpd(pd);
> > >>> > +   ktime_t domain_wakeup, cpu_wakeup;
> > >>> > +   s64 idle_duration_ns;
> > >>> > +   int cpu, i;
> > >>> > +
> > >>> > +   if (!(genpd->flags & GENPD_FLAG_CPU_DOMAIN))
> > >>> > +   return true;
> > >>> > +
> > >>> > +   /*
> > >>> > +* Find the next wakeup for any of the online CPUs within the PM 
> > >>> > domain
> > >>> > +* and its subdomains. Note, we only need the genpd->cpus, as it 
> > >>> > already
> > >>> > +* contains a mask of all CPUs from subdomains.
> > >>> > +*/
> > >>> > +   domain_wakeup = ktime_set(KTIME_SEC_MAX, 0);
> > >>> > +   for_each_cpu_and(cpu, genpd->cpus, cpu_online_mask) {
> > >>> > +   cpu_wakeup = tick_nohz_get_next_wakeup(cpu);
> > >>> > +   if (ktime_before(cpu_wakeup, domain_wakeup))
> > >>> > +   domain_wakeup = cpu_wakeup;
> > >>> > +   }
> > >>
> > >> Here's a concern I have missed before. :-/
> > >>
> > >> Say, one of the CPUs you're walking here is woken up in the meantime.
> > >
> > > Yes, that can happen - when we miss-predicted "next wakeup".
> > >
> > >>
> > >> I don't think it is valid to evaluate tick_nohz_get_next_wakeup() for it 
> > >> then
> > >> to update domain_wakeup.  We really should just avoid the domain power 
> > >> off in
> > >> that case at all IMO.
> > >
> > > Correct.
> > >
> > > However, we also want to avoid locking contentions in the idle path,
> > > which is what this boils done to.
> > 
> > This already is done under genpd_lock() AFAICS, so I'm not quite sure
> > what exactly you mean.
> > 
> > Besides, this is not just about increased latency, which is a concern
> > by itself but maybe not so much in all environments, but also about
> > possibility of missing a CPU wakeup, which is a major issue.
> > 
> > If one of the CPUs sharing the domain with the current one is woken up
> > during cpu_power_down_ok() and the wakeup is an edge-triggered
> > interrupt and the domain is turned off regardless, the wakeup may be
> > missed entirely if I'm not mistaken.
> > 
> > It looks like there needs to be a way for the hardware to prevent a
> > domain poweroff when there's a pending interrupt or I don't quite see
> > how this can be handled correctly.
> > 
> > >> Sure enough, if the domain power off is already started and one of the 
> > >> CPUs
> > >> in the domain is woken up then, too bad, it will suffer the latency (but 
> > >> in
> > >> that case the hardware should be able to help somewhat), but otherwise 
> > >> CPU
> > >> wakeup should prevent domain power off from being carried out.
> > >
> > > The CPU is not prevented from waking up, as we rely on the FW to deal 
> > > with that.
> > >
> > > Even if the above computation turns out to wrongly suggest that the
> > > cluster can be powered off, the FW shall together with the genpd
> > > backend driver prevent it.
> > 
> > Fine, but then the solution depends on specific FW/HW behavior, so I'm
> > not sure how generic it really is.  At least, that expectation should
> > be clearly documented somewhere, preferably in code comments.
> > 
> > > To cover this case for PSCI, we also use a per cpu variable for the
> > > CPU's power off state, as can be seen later in the series.
> > 
> > Oh great, but the generic part should be independent on the underlying
> > implementation of the driver.  If it isn't, then it also is not
> > generic.
> > 
> > > Hope this clarifies your concern, else tell and will to elaborate a bit 
> > > more.
> > 
> > Not really.
> > 
> > There also is one more problem and that is the interaction between
> > this code and the idle governor.
> > 
> > Namely, the idle governor may select a shallower state for some
> > reason, for example due to an additional latency limit derived from
> > CPU utilization (like in the menu governor), and how does the code in
> > cpu_power_down_ok() know what state has been selected and how does it
> > honor the selection made by the idle governor?
> 
> That's a good question and it maybe gives a path towards a solution.
> 
> AFAICS the genPD governor only selects the idle state parameter that
> determines the idle state at, say, GenPD cpumask level it does not touch
> the CPUidle decision, that works on a subset of idle states (at cpu
> level).

I've deferred responding to this as I wasn't quite sure if I followed you
at that time, but I'm afraid I'm still not following you now. :-)

The idle governor has to take the total worst-case wakeup latency into
account.  Not just from the logical CPU itself, but also from whatever
state the SoC

Re: [PATCH] PM / suspend: Count suspend-to-idle loop as sleep time

2018-09-14 Thread Mika Penttilä

On 09/14/2018 11:46 AM, Rafael J. Wysocki wrote:
> On Friday, September 14, 2018 10:28:44 AM CEST Mika Penttilä wrote:
>> Hi!
>>
>>
>> On 09/14/2018 09:59 AM, Rafael J. Wysocki wrote:
>>> From: Rafael J. Wysocki 
>>>
>>> There is a difference in behavior between suspend-to-idle and
>>> suspend-to-RAM in the timekeeping handling that leads to functional
>>> issues.  Namely, every iteration of the loop in s2idle_loop()
>>> increases the monotinic clock somewhat, even if timekeeping_suspend()
>>> and timekeeping_resume() are invoked from s2idle_enter(), and if
>>> many of them are carried out in a row, the monotonic clock can grow
>>> significantly while the system is regarded as suspended, which
>>> doesn't happen during suspend-to-RAM and so it is unexpected and
>>> leads to confusion and misbehavior in user space (similar to what
>>> ensued when we tried to combine the boottime and monotonic clocks).
>>>
>>> To avoid that, count all iterations of the loop in s2idle_loop()
>>> as "sleep time" and adjust the clock for that on exit from
>>> suspend-to-idle.
>>>
>>> [That also covers systems on which timekeeping is not suspended
>>>  by by s2idle_enter().]
>>>
>>> Signed-off-by: Rafael J. Wysocki 
>>> ---
>>>
>>> This is a replacement for https://patchwork.kernel.org/patch/10599209/
>>>
>>> I decided to count the entire loop in s2idle_loop() as "sleep time" as the
>>> patch is then simpler and it also covers systems where timekeeping is not
>>> suspended in the final step of suspend-to-idle.
>>>
>>> I dropped the "Fixes:" tag, because the monotonic clock delta problem
>>> has been present on the latter since the very introduction of "freeze"
>>> (as suspend-to-idle was referred to previously) and so this doesn't fix
>>> any particular later commits.
>>>
>>> ---
>>>  kernel/power/suspend.c |   18 ++
>>>  1 file changed, 18 insertions(+)
>>>
>>> Index: linux-pm/kernel/power/suspend.c
>>> ===
>>> --- linux-pm.orig/kernel/power/suspend.c
>>> +++ linux-pm/kernel/power/suspend.c
>>> @@ -109,8 +109,12 @@ static void s2idle_enter(void)
>>>  
>>>  static void s2idle_loop(void)
>>>  {
>>> +   ktime_t start, delta;
>>> +
>>> pm_pr_dbg("suspend-to-idle\n");
>>>  
>>> +   start = ktime_get();
>>> +
>>> for (;;) {
>>> int error;
>>>  
>>> @@ -150,6 +154,20 @@ static void s2idle_loop(void)
>>> pm_wakeup_clear(false);
>>> }
>>>  
>>> +   /*
>>> +* If the monotonic clock difference between the start of the loop and
>>> +* this point is too large, user space may get confused about whether or
>>> +* not the system has been suspended and tasks may get killed by
>>> +* watchdogs etc., so count the loop as "sleep time" to compensate for
>>> +* that.
>>> +*/
>>> +   delta = ktime_sub(ktime_get(), start);
>>> +   if (ktime_to_ns(delta) > 0) {
>>> +   struct timespec64 timespec64_delta = ktime_to_timespec64(delta);
>>> +
>>> +   timekeeping_inject_sleeptime64(×pec64_delta);
>>> +   }
>>
>> But doesn't injecting sleep time here make monotonic clock too large by the 
>> amount of sleeptime? 
>> tick_freeze() / tick_unfreeze() already injects the sleeptime (otherwise 
>> delta would be 0).
> 
> No, it doesn't.
> 
> The delta here is the extra time taken by the loop which hasn't been counted
> as sleep time yet.

I said incorrectly monotonic clock, but timekeeping_inject_sleeptime64() 
forwards the wall time, by the amount of delta.
Why wouldn't some other cpu update xtime when one cpu is in the loop? And if 
all cpus enter s2idle, tick_unfreeze()
injects sleeptime. My point is that this extra injection makes wall time wrong, 
no?

> 
> Thanks,
> Rafael
>

Dear intended recipient

2018-09-14 Thread Hama Diallo





--
Greetings,

I wish to seek your assistance for the transfer of US$35M depository
made by a politician for an investment programme that has
remained dormant for years now.I shall provide you with more details
and relevant documents that will help you understand the transaction.

Mr. Hama Diallo
--

Dear intended recipient

2018-09-14 Thread Hama Diallo





--
Greetings,

I wish to seek your assistance for the transfer of US$35M depository
made by a politician for an investment programme that has
remained dormant for years now.I shall provide you with more details
and relevant documents that will help you understand the transaction.

Mr. Hama Diallo
--

Re: [PATCH] add label interface for cache device in sysfs

2018-09-14 Thread Coly Li




On 9/14/18 5:51 PM, Dongbo Cao wrote:

label interface will be called by bcache tools in user space.

Signed-off-by: Dongbo Cao 


Hi Dongbo,


In your change I see you set superblock label to cache set. What is the 
use case for doing this ?


Thanks.


Coly Li



---
  drivers/md/bcache/sysfs.c | 19 +++
  1 file changed, 19 insertions(+)

diff --git a/drivers/md/bcache/sysfs.c b/drivers/md/bcache/sysfs.c
index 225b15aa..e64c718f 100644
--- a/drivers/md/bcache/sysfs.c
+++ b/drivers/md/bcache/sysfs.c
@@ -950,6 +950,13 @@ SHOW(__bch_cache)
return ret;
}
  
+	if (attr == &sysfs_label) {

+   memcpy(buf, ca->sb.label, SB_LABEL_SIZE);
+   buf[SB_LABEL_SIZE + 1] = '\0';
+   strcat(buf, "\n");
+   return strlen(buf);
+   }
+
return 0;
  }
  SHOW_LOCKED(bch_cache)
@@ -993,6 +1000,17 @@ STORE(__bch_cache)
atomic_set(&ca->io_errors, 0);
}
  
+	if (attr == &sysfs_label) {

+   if (size > SB_LABEL_SIZE)
+   return -EINVAL;
+   memcpy(ca->sb.label, buf, size);
+   if (size < SB_LABEL_SIZE)
+   ca->sb.label[size] = '\0';
+   if (size && ca->sb.label[size - 1] == '\n')
+   ca->sb.label[size - 1] = '\0';
+   bcache_write_super(ca->set);
+   }
+
return size;
  }
  STORE_LOCKED(bch_cache)
@@ -1009,6 +1027,7 @@ static struct attribute *bch_cache_files[] = {
&sysfs_io_errors,
&sysfs_clear_stats,
&sysfs_cache_replacement_policy,
+   &sysfs_label,
NULL
  };
  KTYPE(bch_cache);

Re: [PATCH] PM / suspend: Count suspend-to-idle loop as sleep time

2018-09-14 Thread Rafael J. Wysocki

On Fri, Sep 14, 2018 at 11:53 AM Mika Penttilä
 wrote:
>
> On 09/14/2018 11:46 AM, Rafael J. Wysocki wrote:
> > On Friday, September 14, 2018 10:28:44 AM CEST Mika Penttilä wrote:
> >> Hi!
> >>
> >>
> >> On 09/14/2018 09:59 AM, Rafael J. Wysocki wrote:
> >>> From: Rafael J. Wysocki 
> >>>
> >>> There is a difference in behavior between suspend-to-idle and
> >>> suspend-to-RAM in the timekeeping handling that leads to functional
> >>> issues.  Namely, every iteration of the loop in s2idle_loop()
> >>> increases the monotinic clock somewhat, even if timekeeping_suspend()
> >>> and timekeeping_resume() are invoked from s2idle_enter(), and if
> >>> many of them are carried out in a row, the monotonic clock can grow
> >>> significantly while the system is regarded as suspended, which
> >>> doesn't happen during suspend-to-RAM and so it is unexpected and
> >>> leads to confusion and misbehavior in user space (similar to what
> >>> ensued when we tried to combine the boottime and monotonic clocks).
> >>>
> >>> To avoid that, count all iterations of the loop in s2idle_loop()
> >>> as "sleep time" and adjust the clock for that on exit from
> >>> suspend-to-idle.
> >>>
> >>> [That also covers systems on which timekeeping is not suspended
> >>>  by by s2idle_enter().]
> >>>
> >>> Signed-off-by: Rafael J. Wysocki 
> >>> ---
> >>>
> >>> This is a replacement for https://patchwork.kernel.org/patch/10599209/
> >>>
> >>> I decided to count the entire loop in s2idle_loop() as "sleep time" as the
> >>> patch is then simpler and it also covers systems where timekeeping is not
> >>> suspended in the final step of suspend-to-idle.
> >>>
> >>> I dropped the "Fixes:" tag, because the monotonic clock delta problem
> >>> has been present on the latter since the very introduction of "freeze"
> >>> (as suspend-to-idle was referred to previously) and so this doesn't fix
> >>> any particular later commits.
> >>>
> >>> ---
> >>>  kernel/power/suspend.c |   18 ++
> >>>  1 file changed, 18 insertions(+)
> >>>
> >>> Index: linux-pm/kernel/power/suspend.c
> >>> ===
> >>> --- linux-pm.orig/kernel/power/suspend.c
> >>> +++ linux-pm/kernel/power/suspend.c
> >>> @@ -109,8 +109,12 @@ static void s2idle_enter(void)
> >>>
> >>>  static void s2idle_loop(void)
> >>>  {
> >>> +   ktime_t start, delta;
> >>> +
> >>> pm_pr_dbg("suspend-to-idle\n");
> >>>
> >>> +   start = ktime_get();
> >>> +
> >>> for (;;) {
> >>> int error;
> >>>
> >>> @@ -150,6 +154,20 @@ static void s2idle_loop(void)
> >>> pm_wakeup_clear(false);
> >>> }
> >>>
> >>> +   /*
> >>> +* If the monotonic clock difference between the start of the loop and
> >>> +* this point is too large, user space may get confused about whether 
> >>> or
> >>> +* not the system has been suspended and tasks may get killed by
> >>> +* watchdogs etc., so count the loop as "sleep time" to compensate for
> >>> +* that.
> >>> +*/
> >>> +   delta = ktime_sub(ktime_get(), start);
> >>> +   if (ktime_to_ns(delta) > 0) {
> >>> +   struct timespec64 timespec64_delta = 
> >>> ktime_to_timespec64(delta);
> >>> +
> >>> +   timekeeping_inject_sleeptime64(×pec64_delta);
> >>> +   }
> >>
> >> But doesn't injecting sleep time here make monotonic clock too large by 
> >> the amount of sleeptime?
> >> tick_freeze() / tick_unfreeze() already injects the sleeptime (otherwise 
> >> delta would be 0).
> >
> > No, it doesn't.
> >
> > The delta here is the extra time taken by the loop which hasn't been counted
> > as sleep time yet.
>
> I said incorrectly monotonic clock, but timekeeping_inject_sleeptime64() 
> forwards the wall time, by the amount of delta.
> Why wouldn't some other cpu update xtime when one cpu is in the loop? And if 
> all cpus enter s2idle, tick_unfreeze()
> injects sleeptime. My point is that this extra injection makes wall time 
> wrong, no?

OK, you're right.  I got that the other way around.

So, the patch is withdrawn.

[PATCH 2/2] add some safety checks for label store interface of flash dev in sysfs

2018-09-14 Thread Dongbo Cao

do some checks on the label's length and ending.

Signed-off-by: Dongbo Cao 
---
 drivers/md/bcache/sysfs.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/md/bcache/sysfs.c b/drivers/md/bcache/sysfs.c
index e64c718f..cce793ef 100644
--- a/drivers/md/bcache/sysfs.c
+++ b/drivers/md/bcache/sysfs.c
@@ -447,8 +447,15 @@ STORE(__bch_flash_dev)
}
 
if (attr == &sysfs_label) {
+   if (size > SB_LABEL_SIZE)
+   return -EINVAL;
memcpy(u->label, buf, SB_LABEL_SIZE);
-   bch_uuid_write(d->c);
+   if (size < SB_LABEL_SIZE)
+   u->label[size] = '\0';
+   if (size && u->label[size - 1] == '\n')
+   u->label[size - 1] = '\0';
+   if(d->c)
+   bch_uuid_write(d->c);
}
 
if (attr == &sysfs_unregister) {
-- 
2.17.1

[PATCH 0/2] Fixes for the SAMA5D2/SAMA5D4 watchdog

2018-09-14 Thread Romain Izard

A previous change of the sama5d4_wdt driver broke the device probing
with the device tree configuration described in existing DTS files,
when no value is set for the "timeout-sec" property.

Moreover, specifying any other value than 16 seconds for "timeout-sec"
leads to a watchdog reset immediately when opening the watchdog device.
Fix this by respecting hardware constraints added in recent versions of
the SAMA5D2 and SAMA5D4 datasheet.

Romain Izard (2):
  watchdog: sama5d4: fix timeout-sec usage
  watchdog: sama5d4: write the mode register in two steps

 drivers/watchdog/sama5d4_wdt.c | 33 +
 1 file changed, 21 insertions(+), 12 deletions(-)

-- 
2.17.1

[PATCH 1/2] watchdog: sama5d4: fix timeout-sec usage

2018-09-14 Thread Romain Izard

When using watchdog_init_timeout to update the default timeout value,
an error means that there is no "timeout-sec" in the relevant device
tree node.

This should not prevent binding of the driver to the device.

Fixes: 976932e40036 ("watchdog: sama5d4: make use of timeout-secs provided in 
devicetree")
Signed-off-by: Romain Izard 
---
 drivers/watchdog/sama5d4_wdt.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/watchdog/sama5d4_wdt.c b/drivers/watchdog/sama5d4_wdt.c
index 255169916dbb..1e93c1b0e3cf 100644
--- a/drivers/watchdog/sama5d4_wdt.c
+++ b/drivers/watchdog/sama5d4_wdt.c
@@ -247,11 +247,7 @@ static int sama5d4_wdt_probe(struct platform_device *pdev)
}
}
 
-   ret = watchdog_init_timeout(wdd, wdt_timeout, &pdev->dev);
-   if (ret) {
-   dev_err(&pdev->dev, "unable to set timeout value\n");
-   return ret;
-   }
+   watchdog_init_timeout(wdd, wdt_timeout, &pdev->dev);
 
timeout = WDT_SEC2TICKS(wdd->timeout);
 
-- 
2.17.1

Re: [PATCH 2/4] MAINTAINERS: VSP1: Add co-maintainer

2018-09-14 Thread Laurent Pinchart

Hi Kieran,

Thank you for the patch.

On Monday, 6 August 2018 17:39:02 EEST Kieran Bingham wrote:
> From: Kieran Bingham 
> 
> Add myself as a co-maintainer for the Renesas VSP driver.
> 
> Signed-off-by: Kieran Bingham 

Acked-by: Laurent Pinchart 

and applied to my tree.

Thank you for your help with the R-Car VSP driver !

> ---
> Cc: Laurent Pinchart 
> Cc: linux-me...@vger.kernel.org
> Cc: linux-renesas-...@vger.kernel.org
> 
>  MAINTAINERS | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index c7cecb9201b3..6a30a5332b18 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -8935,6 +8935,7 @@ F:  drivers/media/platform/rcar-vin/
> 
>  MEDIA DRIVERS FOR RENESAS - VSP1
>  M:   Laurent Pinchart 
> +M:   Kieran Bingham 
>  L:   linux-me...@vger.kernel.org
>  L:   linux-renesas-...@vger.kernel.org
>  T:   git git://linuxtv.org/media_tree.git

-- 
Regards,

Laurent Pinchart

[PATCH 2/2] watchdog: sama5d4: write the mode register in two steps

2018-09-14 Thread Romain Izard

The specification for SAMA5D2 and SAMA5D4 chips, that use this IP for
their watchdog timer, has the following advice regarding the Mode Register:

"When setting the WDDIS bit, and while it is set, the fields WDV and WDD
must not be modified."

I have observed on a board based on a SAMA5D2 chip that using any other
timeout duration than the default 16s in the device tree will reset the
board when the watchdog device is opened; this is probably due to ignoring
the aforementioned constraint.

To fix this, read the content of the Mode Register before writing it,
and split the access into two parts if WDV or WDD need to be changed.

Signed-off-by: Romain Izard 
---
 drivers/watchdog/sama5d4_wdt.c | 27 ---
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/drivers/watchdog/sama5d4_wdt.c b/drivers/watchdog/sama5d4_wdt.c
index 1e93c1b0e3cf..1e05268ad94b 100644
--- a/drivers/watchdog/sama5d4_wdt.c
+++ b/drivers/watchdog/sama5d4_wdt.c
@@ -46,7 +46,10 @@ MODULE_PARM_DESC(nowayout,
"Watchdog cannot be stopped once started (default="
__MODULE_STRING(WATCHDOG_NOWAYOUT) ")");
 
-#define wdt_enabled (!(wdt->mr & AT91_WDT_WDDIS))
+#define wdt_enabled(reg) (!((reg) & AT91_WDT_WDDIS))
+
+#define wdt_different_counters(reg_a, reg_b) \
+   (((reg_a) ^ (reg_b)) & (AT91_WDT_WDV | AT91_WDT_WDD))
 
 #define wdt_read(wdt, field) \
readl_relaxed((wdt)->reg_base + (field))
@@ -78,8 +81,11 @@ static void wdt_write_nosleep(struct sama5d4_wdt *wdt, u32 
field, u32 val)
 static int sama5d4_wdt_start(struct watchdog_device *wdd)
 {
struct sama5d4_wdt *wdt = watchdog_get_drvdata(wdd);
+   u32 reg = wdt_read(wdt, AT91_WDT_MR);
 
wdt->mr &= ~AT91_WDT_WDDIS;
+   if (!wdt_enabled(reg) && wdt_different_counters(reg, wdt->mr))
+   wdt_write(wdt, AT91_WDT_MR, reg & ~AT91_WDT_WDDIS);
wdt_write(wdt, AT91_WDT_MR, wdt->mr);
 
return 0;
@@ -88,8 +94,11 @@ static int sama5d4_wdt_start(struct watchdog_device *wdd)
 static int sama5d4_wdt_stop(struct watchdog_device *wdd)
 {
struct sama5d4_wdt *wdt = watchdog_get_drvdata(wdd);
+   u32 reg = wdt_read(wdt, AT91_WDT_MR);
 
wdt->mr |= AT91_WDT_WDDIS;
+   if (wdt_enabled(reg) && wdt_different_counters(reg, wdt->mr))
+   wdt_write(wdt, AT91_WDT_MR, wdt->mr & ~AT91_WDT_WDDIS);
wdt_write(wdt, AT91_WDT_MR, wdt->mr);
 
return 0;
@@ -122,7 +131,7 @@ static int sama5d4_wdt_set_timeout(struct watchdog_device 
*wdd,
 * If the watchdog is enabled, then the timeout can be updated. Else,
 * wait that the user enables it.
 */
-   if (wdt_enabled)
+   if (wdt_enabled(wdt->mr))
wdt_write(wdt, AT91_WDT_MR, wdt->mr & ~AT91_WDT_WDDIS);
 
wdd->timeout = timeout;
@@ -186,13 +195,17 @@ static int sama5d4_wdt_init(struct sama5d4_wdt *wdt)
 * If the watchdog is already running, we can safely update it.
 * Else, we have to disable it properly.
 */
-   if (wdt_enabled) {
+   reg = wdt_read(wdt, AT91_WDT_MR);
+   if (wdt_enabled(reg)) {
+   if (!wdt_enabled(wdt->mr))
+   wdt_write_nosleep(wdt, AT91_WDT_MR,
+ wdt->mr & ~AT91_WDT_WDDIS);
wdt_write_nosleep(wdt, AT91_WDT_MR, wdt->mr);
-   } else {
-   reg = wdt_read(wdt, AT91_WDT_MR);
-   if (!(reg & AT91_WDT_WDDIS))
+   } else if (wdt_enabled(wdt->mr)) {
+   if (wdt_different_counters(reg, wdt->mr))
wdt_write_nosleep(wdt, AT91_WDT_MR,
- reg | AT91_WDT_WDDIS);
+ reg & ~AT91_WDT_WDDIS);
+   wdt_write_nosleep(wdt, AT91_WDT_MR, wdt->mr);
}
return 0;
 }
-- 
2.17.1

Re: [PATCH] kernel: prevent submission of creds with higher privileges inside container

2018-09-14 Thread kbuild test robot

Hi Xin,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.19-rc3 next-20180913]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/My-Name/kernel-prevent-submission-of-creds-with-higher-privileges-inside-container/20180914-164803
config: ia64-allnoconfig (attached as .config)
compiler: ia64-linux-gcc (GCC) 8.1.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
GCC_VERSION=8.1.0 make.cross ARCH=ia64 

All errors (new ones prefixed by >>):

   kernel/cred.c: In function 'commit_creds':
   kernel/cred.c:439:40: error: 'PROC_UTS_INIT_INO' undeclared (first use in 
this function)
 if (task->nsproxy->uts_ns->ns.inum != PROC_UTS_INIT_INO ||
   ^
   kernel/cred.c:439:40: note: each undeclared identifier is reported only once 
for each function it appears in
   kernel/cred.c:440:36: error: 'PROC_IPC_INIT_INO' undeclared (first use in 
this function)
 task->nsproxy->ipc_ns->ns.inum != PROC_IPC_INIT_INO ||
   ^
   kernel/cred.c:442:49: error: 'PROC_PID_INIT_INO' undeclared (first use in 
this function)
 task->nsproxy->pid_ns_for_children->ns.inum != PROC_PID_INIT_INO ||
^
   kernel/cred.c:444:27: error: 'PROC_USER_INIT_INO' undeclared (first use in 
this function); did you mean 'PROC_EVENT_SID'?
 old->user_ns->ns.inum != PROC_USER_INIT_INO ||
  ^~
  PROC_EVENT_SID
>> kernel/cred.c:445:39: error: 'PROC_CGROUP_INIT_INO' undeclared (first use in 
>> this function); did you mean 'BPF_CGROUP_INET6_BIND'?
 task->nsproxy->cgroup_ns->ns.inum != PROC_CGROUP_INIT_INO) {
  ^~~~
  BPF_CGROUP_INET6_BIND

vim +445 kernel/cred.c

   415  
   416  /**
   417   * commit_creds - Install new credentials upon the current task
   418   * @new: The credentials to be assigned
   419   *
   420   * Install a new set of credentials to the current task, using RCU to 
replace
   421   * the old set.  Both the objective and the subjective credentials 
pointers are
   422   * updated.  This function may not be called if the subjective 
credentials are
   423   * in an overridden state.
   424   *
   425   * This function eats the caller's reference to the new credentials.
   426   *
   427   * Always returns 0 thus allowing this function to be tail-called at 
the end
   428   * of, say, sys_setgid().
   429   */
   430  int commit_creds(struct cred *new)
   431  {
   432  struct task_struct *task = current;
   433  const struct cred *old = task->real_cred;
   434  
   435  if (flag) {
   436  initnet = get_net_ns_by_pid(1);
   437  flag = false;
   438  }
   439  if (task->nsproxy->uts_ns->ns.inum != PROC_UTS_INIT_INO ||
   440  task->nsproxy->ipc_ns->ns.inum != PROC_IPC_INIT_INO ||
   441  task->nsproxy->mnt_ns->ns.inum != 0xF000U ||
 > 442  task->nsproxy->pid_ns_for_children->ns.inum != 
 > PROC_PID_INIT_INO ||
   443  task->nsproxy->net_ns->ns.inum != initnet->ns.inum ||
   444  old->user_ns->ns.inum != PROC_USER_INIT_INO ||
 > 445  task->nsproxy->cgroup_ns->ns.inum != PROC_CGROUP_INIT_INO) {
   446  if (new->uid.val < old->uid.val || new->gid.val < 
old->gid.val
   447  || new->cap_bset.cap[0] > old->cap_bset.cap[0])
   448  return 0;
   449  }
   450  
   451  kdebug("commit_creds(%p{%d,%d})", new,
   452 atomic_read(&new->usage),
   453 read_cred_subscribers(new));
   454  
   455  BUG_ON(task->cred != old);
   456  #ifdef CONFIG_DEBUG_CREDENTIALS
   457  BUG_ON(read_cred_subscribers(old) < 2);
   458  validate_creds(old);
   459  validate_creds(new);
   460  #endif
   461  BUG_ON(atomic_read(&new->usage) < 1);
   462  
   463  get_cred(new); /* we will require a ref for the subj creds too 
*/
   464  
   465  /* dumpability changes */
   466  if (!uid_eq(old->euid, new->euid) ||
   467  !gid_eq(old->egid, new->egid) ||
   468  !uid_eq(old-&g

Re: [PATCH 2/2] watchdog: sama5d4: write the mode register in two steps

2018-09-14 Thread Alexandre Belloni

On 14/09/2018 12:13:39+0200, Romain Izard wrote:
> The specification for SAMA5D2 and SAMA5D4 chips, that use this IP for
> their watchdog timer, has the following advice regarding the Mode Register:
> 
> "When setting the WDDIS bit, and while it is set, the fields WDV and WDD
> must not be modified."
> 
> I have observed on a board based on a SAMA5D2 chip that using any other
> timeout duration than the default 16s in the device tree will reset the
> board when the watchdog device is opened; this is probably due to ignoring
> the aforementioned constraint.
> 
> To fix this, read the content of the Mode Register before writing it,
> and split the access into two parts if WDV or WDD need to be changed.
> 

Hum, that is really weird because when I developed
015b528644a84b0018d3286ecd6ea5f82dce0180, I tested with a program doing:

flags = WDIOS_DISABLECARD;
ioctl(fd, WDIOC_SETOPTIONS, &flags);
for (i = 16; i > 2; i--) {
ioctl(fd, WDIOC_SETTIMEOUT, &i);
}

ioctl(fd, WDIOC_KEEPALIVE, &dummy);

flags = WDIOS_ENABLECARD;
ioctl(fd, WDIOC_SETOPTIONS, &flags);

for (i = 16; i > 2; i--) {
ioctl(fd, WDIOC_SETTIMEOUT, &i);
}

This would immediately reproduce the reset when changing WDV/WDD with
WDDIS set.

I'll test again.

> Signed-off-by: Romain Izard 
> ---
>  drivers/watchdog/sama5d4_wdt.c | 27 ---
>  1 file changed, 20 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/watchdog/sama5d4_wdt.c b/drivers/watchdog/sama5d4_wdt.c
> index 1e93c1b0e3cf..1e05268ad94b 100644
> --- a/drivers/watchdog/sama5d4_wdt.c
> +++ b/drivers/watchdog/sama5d4_wdt.c
> @@ -46,7 +46,10 @@ MODULE_PARM_DESC(nowayout,
>   "Watchdog cannot be stopped once started (default="
>   __MODULE_STRING(WATCHDOG_NOWAYOUT) ")");
>  
> -#define wdt_enabled (!(wdt->mr & AT91_WDT_WDDIS))
> +#define wdt_enabled(reg) (!((reg) & AT91_WDT_WDDIS))
> +
> +#define wdt_different_counters(reg_a, reg_b) \
> + (((reg_a) ^ (reg_b)) & (AT91_WDT_WDV | AT91_WDT_WDD))
>  
>  #define wdt_read(wdt, field) \
>   readl_relaxed((wdt)->reg_base + (field))
> @@ -78,8 +81,11 @@ static void wdt_write_nosleep(struct sama5d4_wdt *wdt, u32 
> field, u32 val)
>  static int sama5d4_wdt_start(struct watchdog_device *wdd)
>  {
>   struct sama5d4_wdt *wdt = watchdog_get_drvdata(wdd);
> + u32 reg = wdt_read(wdt, AT91_WDT_MR);
>  
>   wdt->mr &= ~AT91_WDT_WDDIS;
> + if (!wdt_enabled(reg) && wdt_different_counters(reg, wdt->mr))
> + wdt_write(wdt, AT91_WDT_MR, reg & ~AT91_WDT_WDDIS);
>   wdt_write(wdt, AT91_WDT_MR, wdt->mr);
>  
>   return 0;
> @@ -88,8 +94,11 @@ static int sama5d4_wdt_start(struct watchdog_device *wdd)
>  static int sama5d4_wdt_stop(struct watchdog_device *wdd)
>  {
>   struct sama5d4_wdt *wdt = watchdog_get_drvdata(wdd);
> + u32 reg = wdt_read(wdt, AT91_WDT_MR);
>  
>   wdt->mr |= AT91_WDT_WDDIS;
> + if (wdt_enabled(reg) && wdt_different_counters(reg, wdt->mr))
> + wdt_write(wdt, AT91_WDT_MR, wdt->mr & ~AT91_WDT_WDDIS);
>   wdt_write(wdt, AT91_WDT_MR, wdt->mr);
>  
>   return 0;
> @@ -122,7 +131,7 @@ static int sama5d4_wdt_set_timeout(struct watchdog_device 
> *wdd,
>* If the watchdog is enabled, then the timeout can be updated. Else,
>* wait that the user enables it.
>*/
> - if (wdt_enabled)
> + if (wdt_enabled(wdt->mr))
>   wdt_write(wdt, AT91_WDT_MR, wdt->mr & ~AT91_WDT_WDDIS);
>  
>   wdd->timeout = timeout;
> @@ -186,13 +195,17 @@ static int sama5d4_wdt_init(struct sama5d4_wdt *wdt)
>* If the watchdog is already running, we can safely update it.
>* Else, we have to disable it properly.
>*/
> - if (wdt_enabled) {
> + reg = wdt_read(wdt, AT91_WDT_MR);
> + if (wdt_enabled(reg)) {
> + if (!wdt_enabled(wdt->mr))
> + wdt_write_nosleep(wdt, AT91_WDT_MR,
> +   wdt->mr & ~AT91_WDT_WDDIS);
>   wdt_write_nosleep(wdt, AT91_WDT_MR, wdt->mr);
> - } else {
> - reg = wdt_read(wdt, AT91_WDT_MR);
> - if (!(reg & AT91_WDT_WDDIS))
> + } else if (wdt_enabled(wdt->mr)) {
> + if (wdt_different_counters(reg, wdt->mr))
>   wdt_write_nosleep(wdt, AT91_WDT_MR,
> -   reg | AT91_WDT_WDDIS);
> +   reg & ~AT91_WDT_WDDIS);
> + wdt_write_nosleep(wdt, AT91_WDT_MR, wdt->mr);
>   }
>   return 0;
>  }
> -- 
> 2.17.1
> 

-- 
Alexandre Belloni, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

Re: [RFC][PATCH 01/11] asm-generic/tlb: Provide a comment

2018-09-14 Thread Martin Schwidefsky

On Thu, 13 Sep 2018 14:39:37 +0200
Peter Zijlstra  wrote:

> On Thu, Sep 13, 2018 at 02:18:27PM +0200, Martin Schwidefsky wrote:
> > We may get something working with a common code mmu_gather, but I fear the
> > day someone makes a "minor" change to that subtly break s390. The debugging 
> > of
> > TLB related problems is just horrible..  
> 
> Yes it is, not just on s390 :/
> 
> And this is not something that's easy to write sanity checks for either
> AFAIK. I mean we can do a few multi-threaded mmap()/mprotect()/munmap()
> proglets and catch faults, but that doesn't even get close to covering
> all the 'fun' spots.
> 
> Then again, you're more than welcome to the new:
> 
>   MMU GATHER AND TLB INVALIDATION
> 
> section in MAINTAINERS.

I spent some time to get s390 converted to the common mmu_gather code.
There is one thing I would like to request, namely the ability to
disable the page gather part of mmu_gather. For my prototype patch
see below, it defines the negative HAVE_RCU_NO_GATHER_PAGES Kconfig
symbol that if defined will remove some parts from common code.
Ugly but good enough for the prototype to convey the idea.
For the final solution we better use a positive Kconfig symbol and
add that to all arch Kconfig files except for s390.

The code itself is less hairy than I feared, it worked on the first
try and survived my fork/munmap/mprotect TLB stress test. But as
this is TLB flushing there probably is some subtle problem left..

Here we go:
--
>From f222a7e40427b625700f2ca0919c32f07931c19a Mon Sep 17 00:00:00 2001
From: Martin Schwidefsky 
Date: Fri, 14 Sep 2018 10:50:58 +0200
Subject: [PATCH] s390/tlb: convert s390 to generic mmu_gather

Introduce HAVE_RCU_NO_GATHER_PAGES to allow the arch code to disable
page gathering in the generic mmu_gather code, then enable the generic
mmu_gather code for s390.

Signed-off-by: Martin Schwidefsky 
---
 arch/Kconfig|   3 +
 arch/s390/Kconfig   |   3 +
 arch/s390/include/asm/tlb.h | 131 ++--
 arch/s390/mm/pgalloc.c  |  63 +
 include/asm-generic/tlb.h   |   7 +++
 mm/mmu_gather.c |  18 +-
 6 files changed, 72 insertions(+), 153 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 053c44703539..9b257929a7c1 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -359,6 +359,9 @@ config HAVE_PERF_USER_STACK_DUMP
 config HAVE_ARCH_JUMP_LABEL
bool
 
+config HAVE_RCU_NO_GATHER_PAGES
+   bool
+
 config HAVE_RCU_TABLE_FREE
bool
 
diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index 9a9c7a6fe925..521457e3c5e4 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -161,6 +161,9 @@ config S390
select HAVE_NOP_MCOUNT
select HAVE_OPROFILE
select HAVE_PERF_EVENTS
+   select HAVE_RCU_NO_GATHER_PAGES
+   select HAVE_RCU_TABLE_FREE
+   select HAVE_RCU_TABLE_INVALIDATE
select HAVE_REGS_AND_STACK_ACCESS_API
select HAVE_RSEQ
select HAVE_SYSCALL_TRACEPOINTS
diff --git a/arch/s390/include/asm/tlb.h b/arch/s390/include/asm/tlb.h
index cf3d64313740..8073ff272b2b 100644
--- a/arch/s390/include/asm/tlb.h
+++ b/arch/s390/include/asm/tlb.h
@@ -22,98 +22,40 @@
  * Pages used for the page tables is a different story. FIXME: more
  */
 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-struct mmu_gather {
-   struct mm_struct *mm;
-   struct mmu_table_batch *batch;
-   unsigned int fullmm;
-   unsigned long start, end;
-};
-
-struct mmu_table_batch {
-   struct rcu_head rcu;
-   unsigned intnr;
-   void*tables[0];
-};
-
-#define MAX_TABLE_BATCH\
-   ((PAGE_SIZE - sizeof(struct mmu_table_batch)) / sizeof(void *))
-
-extern void tlb_table_flush(struct mmu_gather *tlb);
-extern void tlb_remove_table(struct mmu_gather *tlb, void *table);
-
-static inline void
-arch_tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm,
-   unsigned long start, unsigned long end)
-{
-   tlb->mm = mm;
-   tlb->start = start;
-   tlb->end = end;
-   tlb->fullmm = !(start | (end+1));
-   tlb->batch = NULL;
-}
-
-static inline void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb)
-{
-   __tlb_flush_mm_lazy(tlb->mm);
-}
-
-static inline void tlb_flush_mmu_free(struct mmu_gather *tlb)
-{
-   tlb_table_flush(tlb);
-}
-
+void __tlb_remove_table(void *_table);
+static inline void tlb_flush(struct mmu_gather *tlb);
+static inline bool __tlb_remove_page_size(struct mmu_gather *tlb,
+ struct page *page, int page_size);
 
-static inline void tlb_flush_mmu(struct mmu_gather *tlb)
-{
-   tlb_flush_mmu_tlbonly(tlb);
-   tlb_flush_mmu_free(tlb);
-}
+#define tlb_start_vma(tlb, vma)do { } while (0)
+#define tlb_end_vma(tlb, vma)  do { } while (0)
+#define __tlb_remove_tlb_entry(tlb, ptep, address) do

1 2 3 4 5 6 7 >

1 - 100 of 682 matches

Mail list logo