Re: sparc: bpf_jit: Move four assignments in bpf_jit_compile()

2016-09-03 Thread SF Markus Elfring
>> I hope so. - I propose to give the refactorings "Reduce scope of variable"
>> and "Extract a function" (and the corresponding consequences) another look.
> 
> So you _think_ it does. Come back with real proof.

Which test environments would you find acceptable for further clarification?


> I must also point out that these sorts of optimisations are things the
> compiler does automatically when compiling this code.

Do you take this detail for granted?


> Therefore it's highly likely that this change will make absolutely
> no difference whatsoever.

I find this outcome unlikely.


> (And no it won't improve compile speed in any justifiable way)

This was not a goal of my update suggestion for the function "bpf_jit_compile".


>> It is generally possible that a specific code generation variant will also 
>> affect
>> the run time properties you mentioned.
> 
> It's _possible_? Come back with benchmarks.

Which code generation variant would be worth clarifying further?

Should we avoid comparing software the way one compares "apples" and "oranges"
(even if those fruits can be more fun)?   ;-)


> I must also point out that this is a "slow path" i.e. as long as it's
> not stupidly inefficient, the speed doesn't matter that much.

Can this execution path become warmer (or even "hot") for some other use cases?


> This change isn't going to improve the speed of this function by any amount
> that matters.

This is also possible.


> Unless you have some pretty damn good proof that these changes improve things,
> there is absolutely no reason to take them as-is

Would you care for a better source code structure?


> - you are making the code longer

Yes. - This is true for the suggested update in this function.


> and more difficult to read for no benefit

I proposed specific benefits for this software module.


> and wasting everyone's time in the process.

I assume that a few contributors will take the presented ideas into further
consideration.
Will their value evolve a bit more later?

Regards,
Markus


[PATCH] ARM: LPAE: initialize cachepolicy correctly

2016-09-03 Thread Stefan Agner
The cachepolicy variable gets initialized using a masked pmd value.
So far, the pmd has been masked with flags valid for the 2-level page
table format. In the LPAE case, this led to a wrong assumption about
which initial cache policy had been used. A later check then forces
the cache policy to writealloc and prints the following warning:
Forcing write-allocate cache policy for SMP

This patch uses PMD_SECT_WBWA to mask all cache setting flags.
The define represents the complete mask of the cache relevant
flags for both page table formats.

Signed-off-by: Stefan Agner 
---
 arch/arm/mm/mmu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index 724d6be..241e5e2 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -137,7 +137,7 @@ void __init init_default_cache_policy(unsigned long pmd)
 
initial_pmd_value = pmd;
 
-   pmd &= PMD_SECT_TEX(1) | PMD_SECT_BUFFERABLE | PMD_SECT_CACHEABLE;
+   pmd &= PMD_SECT_WBWA;
 
for (i = 0; i < ARRAY_SIZE(cache_policies); i++)
if (cache_policies[i].pmd == pmd) {
-- 
2.9.0



Re: [PATCH 2/4] sparc: bpf_jit: Move four assignments in bpf_jit_compile()

2016-09-03 Thread Julian Calaby
Hi Markus,

On Sun, Sep 4, 2016 at 2:33 PM, SF Markus Elfring
 wrote:
>>> Date: Sat, 3 Sep 2016 17:45:28 +0200
>>>
>>> Move the assignments for four local variables a bit at the beginning
>>> so that they will only be performed if a corresponding memory allocation
>>> succeeded by this function.
> …
>>> @@ -362,10 +362,10 @@ do {  *prog++ = BR_OPC | WDISP22(OFF);
>>> \
>>>
>>>  void bpf_jit_compile(struct bpf_prog *fp)
>>>  {
>>> -   unsigned int cleanup_addr, proglen, oldproglen = 0;
>>> -   u32 temp[8], *prog, *func, seen = 0, pass;
>>> -   const struct sock_filter *filter = fp->insns;
>>> -   int i, flen = fp->len, pc_ret0 = -1;
>>> +   unsigned int cleanup_addr, proglen, oldproglen;
>>> +   u32 temp[8], *prog, *func, seen, pass;
>>> +   const struct sock_filter *filter;
>>> +   int i, flen = fp->len, pc_ret0;
>>> unsigned int *addrs;
>>> void *image;
>>>
>>> @@ -385,6 +385,10 @@ void bpf_jit_compile(struct bpf_prog *fp)
>>> }
>>> cleanup_addr = proglen; /* epilogue address */
>>> image = NULL;
>>> +   filter = fp->insns;
>>> +   oldproglen = 0;
>>> +   pc_ret0 = -1;
>>> +   seen = 0;
>>> for (pass = 0; pass < 10; pass++) {
>>> u8 seen_or_pass0 = (pass == 0) ? (SEEN_XREG | SEEN_DATAREF 
>>> | SEEN_MEM) : seen;
> …
>> If you were moving the assignments on declaration onto separate lines
>> at the top of the file then ok,
>
> I see another software design option whose transformation result might look
> more pleasing to you.
>
>
>> but why all the way down here?
>
> * What do you think about the reason I gave in the short commit message?

Does this change improve the resulting binary? I.e. does it make it
smaller or faster? If it's smaller, by how much? If it's faster,
measure it.

Otherwise this change is useless churn - you're making the code more
complicated, longer and harder to read for practically no benefit.

Thanks,

-- 
Julian Calaby

Email: julian.cal...@gmail.com
Profile: http://www.google.com/profiles/julian.calaby/


Re: [PATCH v2 0/3] Add Platform MHU mailbox driver for Amlogic GXBB

2016-09-03 Thread Jassi Brar
On Sun, Sep 4, 2016 at 2:15 AM, Kevin Hilman  wrote:
> On Fri, Sep 2, 2016 at 10:33 PM, Jassi Brar  wrote:
>> On Sat, Sep 3, 2016 at 5:04 AM, Kevin Hilman  wrote:
>>> Hi Jassi,
>>>
>>> Neil Armstrong  writes:
>>>
 In order to support Mailbox links for the Amlogic GXBB SoC, add a generic
 platform MHU driver based on arm_mhu.c.

 This patchset follows a RFC thread along the GXBB SCPI support at :
 http://lkml.kernel.org/r/1466503374-28841-1-git-send-email-narmstr...@baylibre.com
 And specific MHU discussions at :
 http://lkml.kernel.org/r/CABb+yY3HqJG2+GMWCWF9PomxobrwWGZ=tze5nvxpchmddlh...@mail.gmail.com

 Changes since v1 at 
 http://lkml.kernel.org/r/1470732737-18391-1-git-send-email-narmstr...@baylibre.com
  :
  - Fix irq to signed to detect platform_get_irq() failures
  - Introduced back the secure channel
  - Fixed indexes
>>>
>>> How is this version looking to you?
>>>
>>> With your review/ack on the driver, I'd be happy to take it via the
>>> amlogic tree as there shouldn't be any conflicts with your tree.
>>>
>> I am ok with the driver, but I don't understand how it helps going via
>> amlogic tree. I would like to send a pull request every release ;)
>
> That's fine with me too.  I offered to take it just in case you had
> already sent pull requests for this cycle, and also since I'll take
> the DT changes through arm-soc.
>
This revision came after rc2, so it wasn't included. I will queue
patches 1 & 2 for the next release, and you can take patch 3.

Thanks.


Re: sparc: bpf_jit: Move four assignments in bpf_jit_compile()

2016-09-03 Thread Julian Calaby
Hi Markus,

On Sun, Sep 4, 2016 at 3:00 PM, SF Markus Elfring
 wrote:
>> Does this change improve the resulting binary?
>
> I hope so. - I propose to give the refactorings "Reduce scope of variable"
> and "Extract a function" (and the corresponding consequences) another look.

So you _think_ it does. Come back with real proof.

I must also point out that these sorts of optimisations are things the
compiler does automatically when compiling this code. Therefore it's
highly likely that this change will make absolutely no difference
whatsoever. (And no it won't improve compile speed in any justifiable
way)

>> I.e. does it make it smaller or faster?
>
> It is generally possible that a specific code generation variant will also 
> affect
> the run time properties you mentioned.

It's _possible_? Come back with benchmarks.

I must also point out that this is a "slow path" i.e. as long as it's
not stupidly inefficient, the speed doesn't matter that much. This
change isn't going to improve the speed of this function by any amount
that matters.

>> Otherwise this change is useless churn - you're making the code more
>> complicated, longer and harder to read for practically no benefit.
>
> I imagine that there are other reasons you could eventually accept
> for this use case, aren't there?

Unless you have some pretty damn good proof that these changes improve
things, there is absolutely no reason to take them as-is - you are
making the code longer and more difficult to read for no benefit and
wasting everyone's time in the process.

Thanks,

-- 
Julian Calaby

Email: julian.cal...@gmail.com
Profile: http://www.google.com/profiles/julian.calaby/


arch/mips/include/asm/mach-cavium-octeon/mangle-port.h:19:40: error: right shift count >= width of type

2016-09-03 Thread kbuild test robot
Hi Steven,

FYI, the error/warning still remains.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
master
head:   28e68154c5e2793123b248d38cf17b34dcb16d87
commit: 1685ddbe35cd4637f7f841d5f9755dd0470bd68d MIPS: Octeon: Changes to 
support readq()/writeq() usage.
date:   8 weeks ago
config: mips-cavium_octeon_defconfig (attached as .config)
compiler: mips64-linux-gnuabi64-gcc (Debian 5.4.0-6) 5.4.0 20160609
reproduce:
wget 
https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross
 -O ~/bin/make.cross
chmod +x ~/bin/make.cross
git checkout 1685ddbe35cd4637f7f841d5f9755dd0470bd68d
# save the attached .config to linux build tree
make.cross ARCH=mips 

All errors (new ones prefixed by >>):

   In file included from arch/mips/include/asm/io.h:32:0,
from arch/mips/include/asm/page.h:176,
from arch/mips/vdso/vdso.h:26,
from arch/mips/vdso/gettimeofday.c:11:
   arch/mips/include/asm/mach-cavium-octeon/mangle-port.h: In function 
'__should_swizzle_bits':
>> arch/mips/include/asm/mach-cavium-octeon/mangle-port.h:19:40: error: right 
>> shift count >= width of type [-Werror=shift-count-overflow]
 unsigned long did = ((unsigned long)a >> 40) & 0xff;
   ^
   cc1: all warnings being treated as errors

vim +19 arch/mips/include/asm/mach-cavium-octeon/mangle-port.h

13  #ifdef __BIG_ENDIAN
14  
15  static inline bool __should_swizzle_bits(volatile void *a)
16  {
17  extern const bool octeon_should_swizzle_table[];
18  
  > 19  unsigned long did = ((unsigned long)a >> 40) & 0xff;
20  return octeon_should_swizzle_table[did];
21  }
22  

---
0-DAY kernel test infrastructure   Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: Binary data


Re: cmsg newgroup alt.sex.fetish.bool (was Re: [PATCH] arch: all: include: asm: bitops: Use bool instead of int for all bit test functions)

2016-09-03 Thread Al Viro
On Sun, Sep 04, 2016 at 06:36:56AM +0800, Chen Gang wrote:

> And for all: shall I provide the proof for another archs?
> 
> For me, Boolean gives additional chance to compiler to improve the code.

Whereas for compiler it gives nothing.  Not in those cases.

> If the compiler can not improve the code, it can treat it as int simply.
> So theoretically, at least, Boolean should not be worse than int.

Except for pointless code churn and pandering to irrational beliefs, that is...
Please, RTFISO9899 and learn the semantics of _Bool; it's not that complicated.
Start with 6.2.5[2,6] and 6.3.1.2, then look through 6.8.4 and 6.8.5 to
figure out the semantics of conditions in if/while/for.  Note also 6.5.8,
6.5.9, 6.5.13 and 6.5.14 and observe that type of (x > 5 && y < 1) is *NOT* 
_Bool; it's int.

If you can show any improvement or loss in code generation in this case
(static inline int converted to static inline bool), I would really like to
see the details.  As in .config/file/function/gcc version/target architecture.
Optimizer bugs happen, but they should be reported when found, and I would
expect _Bool handling to be _less_ exercised than that of normal logical
expressions, so loss is probably more likely.  And yes, it also should be
reported.


[PATCH 1/3 (fix commit message)] perf tools: Recognize hugetlb mapping as anon mapping

2016-09-03 Thread Wang Nan
Hugetlbfs mappings should be recognized as anon mappings so the user has
a chance to create a /tmp/perf-.map file for symbol resolving. This
patch utilizes MAP_HUGETLB to identify hugetlb mappings.

After this patch, if perf is started before the program starts using
huge pages (so perf gets MMAP2 events from kernel), perf is able to
recognize hugetlb mapping as anon mapping.

Signed-off-by: Wang Nan 
Signed-off-by: Hou Pengyang 
Cc: He Kuang 
Cc: Arnaldo Carvalho de Melo 
Cc: Nilay Vaish 
---
 tools/perf/util/map.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/map.c b/tools/perf/util/map.c
index 728129a..a42010d 100644
--- a/tools/perf/util/map.c
+++ b/tools/perf/util/map.c
@@ -6,6 +6,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "map.h"
 #include "thread.h"
 #include "strlist.h"
@@ -24,9 +25,10 @@ const char *map_type__name[MAP__NR_TYPES] = {
[MAP__VARIABLE] = "Variables",
 };
 
-static inline int is_anon_memory(const char *filename)
+static inline int is_anon_memory(const char *filename, u32 flags)
 {
-   return !strcmp(filename, "//anon") ||
+   return flags & MAP_HUGETLB ||
+  !strcmp(filename, "//anon") ||
   !strncmp(filename, "/dev/zero", sizeof("/dev/zero") - 1) ||
   !strncmp(filename, "/anon_hugepage", sizeof("/anon_hugepage") - 
1);
 }
@@ -155,7 +157,7 @@ struct map *map__new(struct machine *machine, u64 start, 
u64 len,
int anon, no_dso, vdso, android;
 
android = is_android_lib(filename);
-   anon = is_anon_memory(filename);
+   anon = is_anon_memory(filename, flags);
vdso = is_vdso_map(filename);
no_dso = is_no_dso_memory(filename);
 
-- 
1.8.3.4



Re: sparc: bpf_jit: Move four assignments in bpf_jit_compile()

2016-09-03 Thread SF Markus Elfring
> Does this change improve the resulting binary?

I hope so. - I propose to give the refactorings "Reduce scope of variable"
and "Extract a function" (and the corresponding consequences) another look.


> I.e. does it make it smaller or faster?

It is generally possible that a specific code generation variant will also 
affect
the run time properties you mentioned.


> Otherwise this change is useless churn - you're making the code more
> complicated, longer and harder to read for practically no benefit.

I imagine that there are other reasons you could eventually accept
for this use case, aren't there?

Regards,
Markus


[tip:timers/core 6/6] include/trace/events/alarmtimer.h:62: undefined reference to `rtc_ktime_to_tm'

2016-09-03 Thread kbuild test robot
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git timers/core
head:   a0a6e06d545a753740c9d8d5ce2c4fdd3ab1c021
commit: a0a6e06d545a753740c9d8d5ce2c4fdd3ab1c021 [6/6] time: alarmtimer: Add 
tracepoints for alarmtimers
config: s390-default_defconfig (attached as .config)
compiler: s390x-linux-gnu-gcc (Debian 5.4.0-6) 5.4.0 20160609
reproduce:
wget 
https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross
 -O ~/bin/make.cross
chmod +x ~/bin/make.cross
git checkout a0a6e06d545a753740c9d8d5ce2c4fdd3ab1c021
# save the attached .config to linux build tree
make.cross ARCH=s390 

All errors (new ones prefixed by >>):

   kernel/built-in.o: In function `perf_trace_alarm_processing':
>> include/trace/events/alarmtimer.h:62: undefined reference to 
>> `rtc_ktime_to_tm'
>> include/trace/events/alarmtimer.h:62: undefined reference to 
>> `rtc_ktime_to_tm'
>> include/trace/events/alarmtimer.h:62: undefined reference to 
>> `rtc_ktime_to_tm'
>> include/trace/events/alarmtimer.h:62: undefined reference to 
>> `rtc_ktime_to_tm'
>> include/trace/events/alarmtimer.h:62: undefined reference to 
>> `rtc_ktime_to_tm'
   kernel/built-in.o:include/trace/events/alarmtimer.h:62: more undefined 
references to `rtc_ktime_to_tm' follow

vim +62 include/trace/events/alarmtimer.h

56  
57  TP_PROTO(struct rtc_time *time, int flag),
58  
59  TP_ARGS(time, flag)
60  );
61  
  > 62  DECLARE_EVENT_CLASS(alarm_processing,
63  
64  TP_PROTO(struct alarm *alarm, char *process_name),
65  

---
0-DAY kernel test infrastructure   Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: Binary data


[tip:timers/core 6/6] include/trace/events/alarmtimer.h:62: undefined reference to `rtc_ktime_to_tm'

2016-09-03 Thread kbuild test robot
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git timers/core
head:   a0a6e06d545a753740c9d8d5ce2c4fdd3ab1c021
commit: a0a6e06d545a753740c9d8d5ce2c4fdd3ab1c021 [6/6] time: alarmtimer: Add 
tracepoints for alarmtimers
config: s390-default_defconfig (attached as .config)
compiler: s390x-linux-gnu-gcc (Debian 5.4.0-6) 5.4.0 20160609
reproduce:
wget 
https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross
 -O ~/bin/make.cross
chmod +x ~/bin/make.cross
git checkout a0a6e06d545a753740c9d8d5ce2c4fdd3ab1c021
# save the attached .config to linux build tree
make.cross ARCH=s390 

All errors (new ones prefixed by >>):

   kernel/built-in.o: In function `perf_trace_alarm_processing':
>> include/trace/events/alarmtimer.h:62: undefined reference to `rtc_ktime_to_tm'
>> include/trace/events/alarmtimer.h:62: undefined reference to `rtc_ktime_to_tm'
>> include/trace/events/alarmtimer.h:62: undefined reference to `rtc_ktime_to_tm'
>> include/trace/events/alarmtimer.h:62: undefined reference to `rtc_ktime_to_tm'
>> include/trace/events/alarmtimer.h:62: undefined reference to `rtc_ktime_to_tm'
   kernel/built-in.o:include/trace/events/alarmtimer.h:62: more undefined references to `rtc_ktime_to_tm' follow

vim +62 include/trace/events/alarmtimer.h

56  
57  TP_PROTO(struct rtc_time *time, int flag),
58  
59  TP_ARGS(time, flag)
60  );
61  
  > 62  DECLARE_EVENT_CLASS(alarm_processing,
63  
64  TP_PROTO(struct alarm *alarm, char *process_name),
65  

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation




Re: [PATCH 2/4] sparc: bpf_jit: Move four assignments in bpf_jit_compile()

2016-09-03 Thread SF Markus Elfring
>> Date: Sat, 3 Sep 2016 17:45:28 +0200
>>
>> Move the assignments for four local variables a bit towards the beginning
>> so that they will only be performed if the corresponding memory allocation
>> in this function succeeded.
…
>> @@ -362,10 +362,10 @@ do {  *prog++ = BR_OPC | WDISP22(OFF); 
>>\
>>
>>  void bpf_jit_compile(struct bpf_prog *fp)
>>  {
>> -   unsigned int cleanup_addr, proglen, oldproglen = 0;
>> -   u32 temp[8], *prog, *func, seen = 0, pass;
>> -   const struct sock_filter *filter = fp->insns;
>> -   int i, flen = fp->len, pc_ret0 = -1;
>> +   unsigned int cleanup_addr, proglen, oldproglen;
>> +   u32 temp[8], *prog, *func, seen, pass;
>> +   const struct sock_filter *filter;
>> +   int i, flen = fp->len, pc_ret0;
>> unsigned int *addrs;
>> void *image;
>>
>> @@ -385,6 +385,10 @@ void bpf_jit_compile(struct bpf_prog *fp)
>> }
>> cleanup_addr = proglen; /* epilogue address */
>> image = NULL;
>> +   filter = fp->insns;
>> +   oldproglen = 0;
>> +   pc_ret0 = -1;
>> +   seen = 0;
>> for (pass = 0; pass < 10; pass++) {
>> u8 seen_or_pass0 = (pass == 0) ? (SEEN_XREG | SEEN_DATAREF | 
>> SEEN_MEM) : seen;
…
> If you were moving the assignments on declaration onto separate lines
> at the top of the file then ok,

I see another software design option where the transformation result might
look more pleasing to you.


> but why all the way down here?

* What do you think about the reason I gave in the short commit message?

* Are you interested in another software refactoring instead?
  http://refactoring.com/catalog/reduceScopeOfVariable.html

  Would you like to move the source code for this for loop into another
  function instead?
  http://refactoring.com/catalog/extractMethod.html

Regards,
Markus


[PATCH] video: mxsfb: get supply regulator optionally

2016-09-03 Thread Stefan Agner
The lcd-supply is meant to be optional: there are several device trees
not specifying it, and the code handles error values silently.
Therefore, avoid creating a dummy regulator (and the associated
warning) by using devm_regulator_get_optional.

While at it, document this fact in the device-tree bindings.

Signed-off-by: Stefan Agner 
---
 Documentation/devicetree/bindings/display/mxsfb.txt | 3 +++
 drivers/video/fbdev/mxsfb.c | 2 +-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/display/mxsfb.txt 
b/Documentation/devicetree/bindings/display/mxsfb.txt
index 96ec517..0bc530f 100644
--- a/Documentation/devicetree/bindings/display/mxsfb.txt
+++ b/Documentation/devicetree/bindings/display/mxsfb.txt
@@ -7,6 +7,9 @@ Required properties:
 - interrupts: Should contain lcdif interrupts
 - display : phandle to display node (see below for details)
 
+Optional properties:
+- lcd-supply: Regulator for LCD supply voltage.
+
 * display node
 
 Required properties:
diff --git a/drivers/video/fbdev/mxsfb.c b/drivers/video/fbdev/mxsfb.c
index 4e6608c..4f7570f 100644
--- a/drivers/video/fbdev/mxsfb.c
+++ b/drivers/video/fbdev/mxsfb.c
@@ -920,7 +920,7 @@ static int mxsfb_probe(struct platform_device *pdev)
if (IS_ERR(host->clk_disp_axi))
host->clk_disp_axi = NULL;
 
-   host->reg_lcd = devm_regulator_get(&pdev->dev, "lcd");
+   host->reg_lcd = devm_regulator_get_optional(&pdev->dev, "lcd");
if (IS_ERR(host->reg_lcd))
host->reg_lcd = NULL;
 
-- 
2.9.0



Re: [PATCH 09/15] virtio-blk: Pass attribute group to device_add_disk

2016-09-03 Thread Michael S. Tsirkin
On Wed, Aug 17, 2016 at 03:15:09PM +0800, Fam Zheng wrote:
> Previously after device_add_disk returns, the KOBJ_ADD uevent is already
> emitted. Adding attributes after that is a poor usage of kobject, and
> in practice may result in race conditions with userspace, for
> example udev checks availability of certain attributes and initializes
> /dev entries conditionally.
> 
> device_add_disk can handle adding attribute group better, so use it.
> 
> Meanwhile, handle error of device_add_disk.
> 
> Signed-off-by: Fam Zheng 

Feel free to merge this with the rest of the patchset.
Acked-by: Michael S. Tsirkin 



> ---
>  drivers/block/virtio_blk.c | 38 +++---
>  1 file changed, 27 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> index 4564df5..ff60d82 100644
> --- a/drivers/block/virtio_blk.c
> +++ b/drivers/block/virtio_blk.c
> @@ -522,10 +522,10 @@ virtblk_cache_type_show(struct device *dev, struct 
> device_attribute *attr,
>   return snprintf(buf, 40, "%s\n", virtblk_cache_types[writeback]);
>  }
>  
> -static const struct device_attribute dev_attr_cache_type_ro =
> +static struct device_attribute dev_attr_cache_type_ro =
>   __ATTR(cache_type, S_IRUGO,
>  virtblk_cache_type_show, NULL);
> -static const struct device_attribute dev_attr_cache_type_rw =
> +static struct device_attribute dev_attr_cache_type_rw =
>   __ATTR(cache_type, S_IRUGO|S_IWUSR,
>  virtblk_cache_type_show, virtblk_cache_type_store);
>  
> @@ -550,6 +550,26 @@ static struct blk_mq_ops virtio_mq_ops = {
>  static unsigned int virtblk_queue_depth;
>  module_param_named(queue_depth, virtblk_queue_depth, uint, 0444);
>  
> +static struct attribute *virtblk_attrs_ro[] = {
> + &dev_attr_serial.attr,
> + &dev_attr_cache_type_ro.attr,
> + NULL
> +};
> +
> +static struct attribute *virtblk_attrs_rw[] = {
> + &dev_attr_serial.attr,
> + &dev_attr_cache_type_rw.attr,
> + NULL
> +};
> +
> +static struct attribute_group virtblk_attr_group_ro = {
> + .attrs  = virtblk_attrs_ro,
> +};
> +
> +static struct attribute_group virtblk_attr_group_rw = {
> + .attrs  = virtblk_attrs_rw,
> +};
> +
>  static int virtblk_probe(struct virtio_device *vdev)
>  {
>   struct virtio_blk *vblk;
> @@ -560,6 +580,7 @@ static int virtblk_probe(struct virtio_device *vdev)
>   u32 v, blk_size, sg_elems, opt_io_size;
>   u16 min_io_size;
>   u8 physical_block_exp, alignment_offset;
> + struct attribute_group *attr_group;
>  
>   if (!vdev->config->get) {
> dev_err(&vdev->dev, "%s failure: config access disabled\n",
> @@ -719,19 +740,14 @@ static int virtblk_probe(struct virtio_device *vdev)
>  
>   virtio_device_ready(vdev);
>  
> - device_add_disk(&vdev->dev, vblk->disk, NULL);
> - err = device_create_file(disk_to_dev(vblk->disk), &dev_attr_serial);
> - if (err)
> - goto out_del_disk;
> -
>   if (virtio_has_feature(vdev, VIRTIO_BLK_F_CONFIG_WCE))
> - err = device_create_file(disk_to_dev(vblk->disk),
> -  &dev_attr_cache_type_rw);
> + attr_group = &virtblk_attr_group_rw;
>   else
> - err = device_create_file(disk_to_dev(vblk->disk),
> -  &dev_attr_cache_type_ro);
> + attr_group = &virtblk_attr_group_ro;
> + err = device_add_disk(&vdev->dev, vblk->disk, attr_group);
>   if (err)
>   goto out_del_disk;
> +
>   return 0;
>  
>  out_del_disk:
> -- 
> 2.7.4


[PATCH] usb: phy: generic: request regulator optionally

2016-09-03 Thread Stefan Agner
According to the device tree bindings the vcc-supply is optional.
So far the driver requested the regulator using devm_regulator_get,
which creates a dummy regulator for convenience. Since the supply can
be unconnected, we should use the optional variant of the regulator
call, which returns -ENODEV instead of a dummy regulator. The driver
already has checks in case the regulator is an error pointer.

Note that with this change the behavior is slightly different in
case devm_regulator_get_optional returns -EPROBE_DEFER: The driver
returns -EPROBE_DEFER even if needs_vcc is set false. This is the
correct behavior, since even if the regulator is optional, it might
get initialized later...

Signed-off-by: Stefan Agner 
---
This gets rid of warnings such as this (seen on i.MX 7):
3080.aips-bus:usbphynop1 supply vcc not found, using dummy regulator

--
Stefan

 drivers/usb/phy/phy-generic.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/usb/phy/phy-generic.c b/drivers/usb/phy/phy-generic.c
index 980c9de..38ceadc 100644
--- a/drivers/usb/phy/phy-generic.c
+++ b/drivers/usb/phy/phy-generic.c
@@ -275,12 +275,12 @@ int usb_phy_gen_create_phy(struct device *dev, struct 
usb_phy_generic *nop,
}
}
 
-   nop->vcc = devm_regulator_get(dev, "vcc");
+   nop->vcc = devm_regulator_get_optional(dev, "vcc");
if (IS_ERR(nop->vcc)) {
dev_dbg(dev, "Error getting vcc regulator: %ld\n",
PTR_ERR(nop->vcc));
-   if (needs_vcc)
-   return -EPROBE_DEFER;
+   if (needs_vcc || PTR_ERR(nop->vcc) == -EPROBE_DEFER)
+   return PTR_ERR(nop->vcc);
}
 
nop->dev= dev;
-- 
2.9.0





Re: [PATCH 2/2] mwifiex: simplify length computation for some memset

2016-09-03 Thread Julian Calaby
Hi All,

On Mon, Aug 8, 2016 at 5:39 PM, Christophe JAILLET
 wrote:
> This patch should be a no-op. It just simplifies code by using the name of
> a variable instead of its type when calling 'sizeof'.
>
> Signed-off-by: Christophe JAILLET 

Reviewed-by: Julian Calaby 

Thanks,

-- 
Julian Calaby

Email: julian.cal...@gmail.com
Profile: http://www.google.com/profiles/julian.calaby/



core.c:undefined reference to `fpu_save'

2016-09-03 Thread kbuild test robot
Hi Andrew,

It's probably a bug fix that unveils the link errors.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
master
head:   28e68154c5e2793123b248d38cf17b34dcb16d87
commit: c60f169202c7643991a8b4bfeea60e06843d5b5a 
arch/mn10300/kernel/fpu-nofpu.c: needs asm/elf.h
date:   6 months ago
config: mn10300-allnoconfig (attached as .config)
compiler: am33_2.0-linux-gcc (GCC) 4.9.0
reproduce:
wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
git checkout c60f169202c7643991a8b4bfeea60e06843d5b5a
# save the attached .config to linux build tree
make.cross ARCH=mn10300 

All errors (new ones prefixed by >>):

   kernel/built-in.o: In function `.L412':
>> core.c:(.sched.text+0x257): undefined reference to `fpu_save'




Re: [PATCH 2/4] sparc: bpf_jit: Move four assignments in bpf_jit_compile()

2016-09-03 Thread Julian Calaby
Hi Markus,

On Sun, Sep 4, 2016 at 2:38 AM, SF Markus Elfring
 wrote:
> From: Markus Elfring 
> Date: Sat, 3 Sep 2016 17:45:28 +0200
>
> Move the assignments for four local variables a bit towards the beginning
> so that they will only be performed if the corresponding memory allocation
> in this function succeeded.
>
> Signed-off-by: Markus Elfring 
> ---
>  arch/sparc/net/bpf_jit_comp.c | 12 
>  1 file changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/arch/sparc/net/bpf_jit_comp.c b/arch/sparc/net/bpf_jit_comp.c
> index ced1393..a927470 100644
> --- a/arch/sparc/net/bpf_jit_comp.c
> +++ b/arch/sparc/net/bpf_jit_comp.c
> @@ -362,10 +362,10 @@ do {  *prog++ = BR_OPC | WDISP22(OFF);  
>   \
>
>  void bpf_jit_compile(struct bpf_prog *fp)
>  {
> -   unsigned int cleanup_addr, proglen, oldproglen = 0;
> -   u32 temp[8], *prog, *func, seen = 0, pass;
> -   const struct sock_filter *filter = fp->insns;
> -   int i, flen = fp->len, pc_ret0 = -1;
> +   unsigned int cleanup_addr, proglen, oldproglen;
> +   u32 temp[8], *prog, *func, seen, pass;
> +   const struct sock_filter *filter;
> +   int i, flen = fp->len, pc_ret0;
> unsigned int *addrs;
> void *image;
>
> @@ -385,6 +385,10 @@ void bpf_jit_compile(struct bpf_prog *fp)
> }
> cleanup_addr = proglen; /* epilogue address */
> image = NULL;
> +   filter = fp->insns;
> +   oldproglen = 0;
> +   pc_ret0 = -1;
> +   seen = 0;
> for (pass = 0; pass < 10; pass++) {
> u8 seen_or_pass0 = (pass == 0) ? (SEEN_XREG | SEEN_DATAREF | 
> SEEN_MEM) : seen;

This is utterly pointless, why?

If you were moving the assignments on declaration onto separate lines
at the top of the file then ok, but why all the way down here?

Thanks,

-- 
Julian Calaby

Email: julian.cal...@gmail.com
Profile: http://www.google.com/profiles/julian.calaby/



Re: [PATCH] ath10k: Spelling and miscellaneous neatening

2016-09-03 Thread Julian Calaby
Hi All,

On Tue, Aug 30, 2016 at 3:05 AM, Joe Perches  wrote:
> Correct some trivial comment typos.
> Remove unnecessary parentheses in a long line.
> Convert a return; before the end of a void function definition to just ;
>
> Signed-off-by: Joe Perches 

This all looks correct to me. I wish you'd put the code changes in a
separate patch, however it's all noted in the commit log, so...

Reviewed-by: Julian Calaby 

Thanks,

-- 
Julian Calaby

Email: julian.cal...@gmail.com
Profile: http://www.google.com/profiles/julian.calaby/



Re: [PATCH] staging: wilc1000: fix spelling mistake: "retyring" -> "retrying"

2016-09-03 Thread Julian Calaby
Hi All,

On Sun, Aug 28, 2016 at 9:28 PM, Colin King  wrote:
> From: Colin Ian King 
>
> trivial fix to spelling mistake in dev_err message
>
> Signed-off-by: Colin Ian King 

Reviewed-by: Julian Calaby 

Thanks,

-- 
Julian Calaby

Email: julian.cal...@gmail.com
Profile: http://www.google.com/profiles/julian.calaby/



Re: [PATCH] rtl8xxxu: fix spelling mistake "firmare" -> "firmware"

2016-09-03 Thread Julian Calaby
Hi All,

On Sun, Sep 4, 2016 at 2:43 AM, Colin King  wrote:
> From: Colin Ian King 
>
> Trivial fix to spelling mistakes in dev_dbg message.
>
> Signed-off-by: Colin Ian King 

Reviewed-by: Julian Calaby 

Thanks,

-- 
Julian Calaby

Email: julian.cal...@gmail.com
Profile: http://www.google.com/profiles/julian.calaby/



Re: [RFC 2/4] mm: replace TIF_MEMDIE checks by tsk_is_oom_victim

2016-09-03 Thread Tetsuo Handa
Michal Hocko wrote:
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 9ee178ba7b71..df58733ca48e 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1899,7 +1899,7 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t 
> gfp_mask,
>* bypass the last charges so that they can exit quickly and
>* free their memory.
>*/
> - if (unlikely(test_thread_flag(TIF_MEMDIE) ||
> + if (unlikely(tsk_is_oom_victim(current) ||
>fatal_signal_pending(current) ||
>current->flags & PF_EXITING))
>   goto force;

Does this test_thread_flag(TIF_MEMDIE) (or tsk_is_oom_victim(current)) make 
sense?

If current thread is OOM-killed, SIGKILL must be pending before arriving at
do_exit() and PF_EXITING must be set after arriving at do_exit(). But I can't
find locations which do memory allocation between clearing SIGKILL and setting
PF_EXITING.

When can test_thread_flag(TIF_MEMDIE) == T (or tsk_is_oom_victim(current) == T)
&& fatal_signal_pending(current) == F && (current->flags & PF_EXITING) == 0
happen?



> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index b11977585c7b..e26529edcee3 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -1078,7 +1078,7 @@ void pagefault_out_of_memory(void)
>* be a racing OOM victim for which oom_killer_disable()
>* is waiting for.
>*/
> - WARN_ON(test_thread_flag(TIF_MEMDIE));
> + WARN_ON(tsk_is_oom_victim(current));
>   }
>  
>   mutex_unlock(&oom_lock);

Does this WARN_ON() make sense?

When some user thread called oom_killer_disable(), there are running
kernel threads but is no running user threads except the one which
called oom_killer_disable(). Since oom_killer_disable() waits for
oom_lock, out_of_memory() called from here shall not return false
before oom_killer_disable() sets oom_killer_disabled = true. Thus,
possible situation out_of_memory() called from here can return false
are limited to

 (a) the one which triggered pagefaults after returning from
 oom_killer_disable()
 (b) An OOM victim which was thawed triggered pagefaults from do_exit()
 after the one which called oom_killer_disable() released oom_lock
 (c) kernel threads which triggered pagefaults after use_mm()

. And since kernel threads are not subjected to mark_oom_victim(),
test_thread_flag(TIF_MEMDIE) == F (or tsk_is_oom_victim(current) == F)
for kernel threads. Thus, possible situation out_of_memory() called from
here can return false and we hit this WARN_ON() is limited to (b).

Even if (a) or (b) is possible, does continuously emitting backtraces
help? It seems to me that the system is under an OOM livelock situation and
we need to take a different action (e.g. try to allocate a page using
ALLOC_NO_WATERMARKS in order to resolve the pagefault, and panic()
if that fails) rather than emitting the same backtraces forever.


Re: [RFC 3/4] mm, oom: do not rely on TIF_MEMDIE for exit_oom_victim

2016-09-03 Thread Tetsuo Handa
Michal Hocko wrote:
> mark_oom_victim and exit_oom_victim are used for oom_killer_disable
> which should block as long as there any any oom victims alive. Up to now
> we have relied on TIF_MEMDIE task flag to count how many oom victim
> we have. This is not optimal because only one thread receives this flag
> at the time while the whole process (thread group) is killed and should
> die. As a result we do not thaw the whole thread group and so a multi
> threaded process can leave some threads behind in the fridge. We really
> want to thaw all the threads.
> 
> This is not all that easy because there is no reliable way to count
> threads in the process as the oom killer might race with copy_process.

What is wrong with racing with copy_process()? Threads doing copy_process()
are not frozen and thus we don't need to thaw such threads. Also, being
OOM-killed implies receiving SIGKILL. Thus, a newly created thread will also
enter do_exit().

> So marking all threads with TIF_MEMDIE and incrementing oom_victims
> accordingly is not safe. Also TIF_MEMDIE flag should just die so
> we should better come up with a different approach.
> 
> All we need to guarantee is that exit_oom_victim is called at the time
> when no further access to (possibly suspended) devices or generate other
> IO (which would clobber suspended image and only once per process)
> is possible. It seems we can rely on exit_notify for that because we
> already have to detect the last thread to do a cleanup. Let's propagate
> that information up to do_exit and only call exit_oom_victim for such
> a thread. With this in place we can safely increment oom_victims only
> once per thread group and thaw all the threads from the process.
> freezing_slow_path can also rely on tsk_is_oom_victim as well now.

If marking all threads which belong to tsk thread group with TIF_MEMDIE
is not safe (due to possible race with copy_process()), how can

rcu_read_lock();
for_each_thread(tsk, t)
__thaw_task(t);
rcu_read_unlock();

in mark_oom_victim() guarantee that all threads which belong to tsk
thread group are thawed?

Unless all threads which belong to tsk thread group in __refrigerator()
are guaranteed to be thawed, they might fail to leave __refrigerator()
in order to enter do_exit() which means that exit_oom_victim() won't be
called.

Do we want to thaw OOM victims from the beginning? If the freezer
depends on CONFIG_MMU=y, we don't need to thaw OOM victims.


Re: [RFC 2/4] mm: replace TIF_MEMDIE checks by tsk_is_oom_victim

2016-09-03 Thread Tetsuo Handa
Michal Hocko wrote:
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 9ee178ba7b71..df58733ca48e 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1899,7 +1899,7 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t 
> gfp_mask,
>* bypass the last charges so that they can exit quickly and
>* free their memory.
>*/
> - if (unlikely(test_thread_flag(TIF_MEMDIE) ||
> + if (unlikely(tsk_is_oom_victim(current) ||
>fatal_signal_pending(current) ||
>current->flags & PF_EXITING))
>   goto force;

Does this test_thread_flag(TIF_MEMDIE) (or tsk_is_oom_victim(current)) make 
sense?

If current thread is OOM-killed, SIGKILL must be pending before arriving at
do_exit() and PF_EXITING must be set after arriving at do_exit(). But I can't
find locations which do memory allocation between clearing SIGKILL and setting
PF_EXITING.

When can test_thread_flag(TIF_MEMDIE) == T (or tsk_is_oom_victim(current) == T)
&& fatal_signal_pending(current) == F && (current->flags & PF_EXITING) == 0
happen?



> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index b11977585c7b..e26529edcee3 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -1078,7 +1078,7 @@ void pagefault_out_of_memory(void)
>* be a racing OOM victim for which oom_killer_disable()
>* is waiting for.
>*/
> - WARN_ON(test_thread_flag(TIF_MEMDIE));
> + WARN_ON(tsk_is_oom_victim(current));
>   }
>  
>   mutex_unlock(&oom_lock);

Does this WARN_ON() make sense?

When some user thread called oom_killer_disable(), there are running
kernel threads but no running user threads except the one which
called oom_killer_disable(). Since oom_killer_disable() waits for
oom_lock, out_of_memory() called from here shall not return false
before oom_killer_disable() sets oom_killer_disabled = true. Thus,
the possible situations in which out_of_memory() called from here can
return false are limited to

 (a) the one which triggered pagefaults after returning from
 oom_killer_disable()
 (b) An OOM victim which was thawed triggered pagefaults from do_exit()
 after the one which called oom_killer_disable() released oom_lock
 (c) kernel threads which triggered pagefaults after use_mm()

Since kernel threads are not subject to mark_oom_victim(),
test_thread_flag(TIF_MEMDIE) == F (or tsk_is_oom_victim(current) == F) holds
for kernel threads. Thus, the only situation in which out_of_memory() called
from here can return false and we hit this WARN_ON() is (b).

Even if (a) or (b) is possible, does continuously emitting backtraces
help? It seems to me that the system is under OOM livelock situation and
we need to take a different action (e.g. try to allocate a page using
ALLOC_NO_WATERMARKS in order to make the pagefault be solved, and panic()
if failed) than emitting same backtraces forever.


Re: [RFC 1/4] mm, oom: do not rely on TIF_MEMDIE for memory reserves access

2016-09-03 Thread Tetsuo Handa
Michal Hocko wrote:
> @@ -816,7 +816,8 @@ static void oom_kill_process(struct oom_control *oc, 
> const char *message)
>  
>   /*
>* If the task is already exiting, don't alarm the sysadmin or kill
> -  * its children or threads, just set TIF_MEMDIE so it can die quickly
> +  * its children or threads, just give it access to memory reserves
> +  * so it can die quickly
>*/
>   task_lock(p);
>   if (task_will_free_mem(p)) {
> @@ -876,9 +877,9 @@ static void oom_kill_process(struct oom_control *oc, 
> const char *message)
>   mm = victim->mm;
>   atomic_inc(&mm->mm_count);
>   /*
> -  * We should send SIGKILL before setting TIF_MEMDIE in order to prevent
> -  * the OOM victim from depleting the memory reserves from the user
> -  * space under its control.
> +  * We should send SIGKILL before granting access to memory reserves
> +  * in order to prevent the OOM victim from depleting the memory
> +  * reserves from the user space under its control.
>*/

Removing TIF_MEMDIE usage inside comments can be done as a clean up
before this series.



> @@ -3309,6 +3318,22 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
>   return alloc_flags;
>  }
>  
> +static bool oom_reserves_allowed(struct task_struct *tsk)
> +{
> + if (!tsk_is_oom_victim(tsk))
> + return false;
> +
> + /*
> +  * !MMU doesn't have oom reaper so we shouldn't risk the memory reserves
> +  * depletion and shouldn't give access to memory reserves passed the
> +  * exit_mm
> +  */
> + if (!IS_ENABLED(CONFIG_MMU) && !tsk->mm)
> + return false;
> +
> + return true;
> +}
> +

Are you aware that you are making !MMU kernel allocations fail without
access to memory reserves not only after returning from exit_mm() but also
inside __mmput() (called via mmput() from exit_mm())? The comment mentions
only the code after exit_mm() returns, but the change is not limited to that.



> @@ -3558,8 +3593,8 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int 
> order,
>   goto nopage;
>   }
>  
> - /* Avoid allocations with no watermarks from looping endlessly */
> - if (test_thread_flag(TIF_MEMDIE) && !(gfp_mask & __GFP_NOFAIL))
> + /* Avoid allocations for oom victims from looping endlessly */
> + if (tsk_is_oom_victim(current) && !(gfp_mask & __GFP_NOFAIL))
>   goto nopage;

This change increases the possibility of giving up without trying ALLOC_OOM
(i.e. more allocation failure messages). Currently, only the one thread which
remotely got TIF_MEMDIE while it was between gfp_to_alloc_flags() and the
test_thread_flag(TIF_MEMDIE) check gives up without trying
ALLOC_NO_WATERMARKS, whereas with this change every thread which remotely got
current->signal->oom_mm in that window gives up without trying ALLOC_OOM.
I think we should make sure that ALLOC_OOM is tried (by using a variable
which remembers whether get_page_from_freelist(ALLOC_OOM) was tried).

We currently allow TIF_MEMDIE threads to try ALLOC_NO_WATERMARKS once and
then give up without invoking the OOM killer. This change makes
current->signal->oom_mm threads try ALLOC_OOM once and give up without
invoking the OOM killer. This means that allocations which OOM victims need
for cleaning up might fail prematurely, but we don't want to scatter
__GFP_NOFAIL around. Since there are reasonable chances of parallel memory
freeing, we don't need to give up without invoking the OOM killer again.
I think that

-   /* Avoid allocations with no watermarks from looping endlessly */
-   if (test_thread_flag(TIF_MEMDIE) && !(gfp_mask & __GFP_NOFAIL))
+#ifndef CONFIG_MMU
+   /* Avoid allocations for oom victims from looping endlessly */
+   if (tsk_is_oom_victim(current) && !(gfp_mask & __GFP_NOFAIL))
+   goto nopage;
+#endif

is possible.
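
The "remember whether ALLOC_OOM was tried" policy suggested above can be sketched in a small userspace model. This is a hypothetical simplification, not kernel code; every name here (`alloc_slowpath`, `try_alloc_with_oom_reserves`, the two globals) is illustrative only:

```c
#include <stdbool.h>

/* Hypothetical userspace model of the suggested policy: an OOM victim
 * gives up only after the ALLOC_OOM attempt itself was made, instead of
 * bailing out as soon as it is known to be a victim. None of these names
 * are real kernel symbols. */
enum alloc_result { ALLOC_FAIL, ALLOC_OK };

bool memory_available;      /* stand-in for the free lists             */
bool current_is_oom_victim; /* stand-in for tsk_is_oom_victim(current) */

static enum alloc_result try_alloc_with_oom_reserves(void)
{
    return memory_available ? ALLOC_OK : ALLOC_FAIL;
}

enum alloc_result alloc_slowpath(void)
{
    bool tried_alloc_oom = false; /* remembers the ALLOC_OOM attempt */

    if (current_is_oom_victim) {
        tried_alloc_oom = true;
        if (try_alloc_with_oom_reserves() == ALLOC_OK)
            return ALLOC_OK;
    }
    /* a victim fails here only after ALLOC_OOM was actually attempted */
    if (current_is_oom_victim && tried_alloc_oom)
        return ALLOC_FAIL;
    /* non-victims would continue with normal reclaim retries here */
    return ALLOC_FAIL;
}
```

The point of the flag is that the failure path becomes reachable for a victim only after the reserve-backed attempt has actually happened, closing the window described above.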


drivers/media/v4l2-core/videobuf2-dma-contig.c:486:2: error: implicit declaration of function 'dma_get_cache_alignment'

2016-09-03 Thread kbuild test robot
Hi Hans,

FYI, the error/warning still remains.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
head:   28e68154c5e2793123b248d38cf17b34dcb16d87
commit: c1023ba74fc77dc56dc317bd98f5060aab889ac1 [media] drivers/media/platform/Kconfig: fix VIDEO_MEDIATEK_VCODEC dependency
date:   8 weeks ago
config: m32r-allyesconfig (attached as .config)
compiler: m32r-linux-gcc (GCC) 4.9.0
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        git checkout c1023ba74fc77dc56dc317bd98f5060aab889ac1
        # save the attached .config to linux build tree
        make.cross ARCH=m32r

All errors (new ones prefixed by >>):

   drivers/media/v4l2-core/videobuf2-dma-contig.c: In function 'vb2_dc_get_userptr':
>> drivers/media/v4l2-core/videobuf2-dma-contig.c:486:2: error: implicit declaration of function 'dma_get_cache_alignment' [-Werror=implicit-function-declaration]
     unsigned long dma_align = dma_get_cache_alignment();
     ^
   cc1: some warnings being treated as errors

vim +/dma_get_cache_alignment +486 drivers/media/v4l2-core/videobuf2-dma-contig.c

774d2301 drivers/media/v4l2-core/videobuf2-dma-contig.c Marek Szyprowski 2013-06-19  470  {
774d2301 drivers/media/v4l2-core/videobuf2-dma-contig.c Marek Szyprowski 2013-06-19  471      /* really, we cannot do anything better at this point */
774d2301 drivers/media/v4l2-core/videobuf2-dma-contig.c Marek Szyprowski 2013-06-19  472      return (dma_addr_t)(pfn) << PAGE_SHIFT;
774d2301 drivers/media/v4l2-core/videobuf2-dma-contig.c Marek Szyprowski 2013-06-19  473  }
774d2301 drivers/media/v4l2-core/videobuf2-dma-contig.c Marek Szyprowski 2013-06-19  474  #endif
774d2301 drivers/media/v4l2-core/videobuf2-dma-contig.c Marek Szyprowski 2013-06-19  475  
36c0f8b3 drivers/media/v4l2-core/videobuf2-dma-contig.c Hans Verkuil 2016-04-15  476  static void *vb2_dc_get_userptr(struct device *dev, unsigned long vaddr,
cd474037 drivers/media/v4l2-core/videobuf2-dma-contig.c Hans Verkuil 2014-11-18  477      unsigned long size, enum dma_data_direction dma_dir)
1a758d4e drivers/media/video/videobuf2-dma-contig.c Pawel Osciak 2010-10-11  478  {
1a758d4e drivers/media/video/videobuf2-dma-contig.c Pawel Osciak 2010-10-11  479      struct vb2_dc_buf *buf;
fb639eb3 drivers/media/v4l2-core/videobuf2-dma-contig.c Jan Kara 2015-07-13  480      struct frame_vector *vec;
e15dab75 drivers/media/v4l2-core/videobuf2-dma-contig.c Tomasz Stanislawski 2012-06-14  481      unsigned long offset;
fb639eb3 drivers/media/v4l2-core/videobuf2-dma-contig.c Jan Kara 2015-07-13  482      int n_pages, i;
e15dab75 drivers/media/v4l2-core/videobuf2-dma-contig.c Tomasz Stanislawski 2012-06-14  483      int ret = 0;
e15dab75 drivers/media/v4l2-core/videobuf2-dma-contig.c Tomasz Stanislawski 2012-06-14  484      struct sg_table *sgt;
e15dab75 drivers/media/v4l2-core/videobuf2-dma-contig.c Tomasz Stanislawski 2012-06-14  485      unsigned long contig_size;
d81e870d drivers/media/v4l2-core/videobuf2-dma-contig.c Marek Szyprowski 2012-06-12 @486      unsigned long dma_align = dma_get_cache_alignment();
251a79f8 drivers/media/v4l2-core/videobuf2-dma-contig.c Hans Verkuil 2014-11-18  487      DEFINE_DMA_ATTRS(attrs);
251a79f8 drivers/media/v4l2-core/videobuf2-dma-contig.c Hans Verkuil 2014-11-18  488  
251a79f8 drivers/media/v4l2-core/videobuf2-dma-contig.c Hans Verkuil 2014-11-18  489      dma_set_attr(DMA_ATTR_SKIP_CPU_SYNC, &attrs);
d81e870d drivers/media/v4l2-core/videobuf2-dma-contig.c Marek Szyprowski 2012-06-12  490  
d81e870d drivers/media/v4l2-core/videobuf2-dma-contig.c Marek Szyprowski 2012-06-12  491      /* Only cache aligned DMA transfers are reliable */
d81e870d drivers/media/v4l2-core/videobuf2-dma-contig.c Marek Szyprowski 2012-06-12  492      if (!IS_ALIGNED(vaddr | size, dma_align)) {
d81e870d drivers/media/v4l2-core/videobuf2-dma-contig.c Marek Szyprowski 2012-06-12  493          pr_debug("user data must be aligned to %lu bytes\n", dma_align);
d81e870d drivers/media/v4l2-core/videobuf2-dma-contig.c Marek Szyprowski 2012-06-12  494          return ERR_PTR(-EINVAL);

:: The code at line 486 was first introduced by commit
:: d81e870d5afa1b0a95ea94c4052d3c7e973fae8c [media] v4l: vb2-dma-contig: fail if user ptr buffer is not correctly aligned

:: TO: Marek Szyprowski 
:: CC: Mauro Carvalho Chehab 

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation


.config.gz
Description: Binary data


[RFC v5 16/23] clockevents: clockevents_program_min_delta(): don't set ->next_event

2016-09-03 Thread Nicolai Stange
Currently, clockevents_program_min_delta() sets a clockevent device's
->next_event to the point in time where the minimum delta would actually
expire:

  delta = dev->min_delta_ns;
  dev->next_event = ktime_add_ns(ktime_get(), delta);

For your reference, this is so since the initial advent of
clockevents_program_min_delta() with
  commit d1748302f70b ("clockevents: Make minimum delay adjustments
configurable").

clockevents_program_min_delta() is called from clockevents_program_event()
only. More specifically, it is called if the latter's force argument is set
and, neglecting the case of device programming failure for the moment, if
the requested expiry is in the past.

On the contrary, if the expiry requested from clockevents_program_event()
is in the future but less than ->min_delta_ns away, then
- ->next_event gets set to that expiry verbatim
- but the clockevent device gets silently programmed to fire after
  ->min_delta_ns only.

Thus, in the extreme cases of expires == ktime_get() and
expires == ktime_get() + 1, the respective values of ->next_event would
differ by ->min_delta_ns while the clockevent device would actually get
programmed to fire at (almost) the same times (with force being set,
of course).

While this discontinuity of ->next_event at expires == ktime_get() is not
a problem by itself, the mere use of ->min_delta_ns in the event
programming path hinders upcoming changes making the clockevent core
NTP correction aware: both ->mult and ->min_delta_ns would need to be
updated and consumed atomically, and we'd rather avoid any locking here.

Thus, let clockevents_program_event() unconditionally set ->next_event to
the expiry time actually requested by its caller, i.e. don't set
->next_event from clockevents_program_min_delta().
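
The before/after difference in the ->next_event bookkeeping can be sketched with a toy model. Times here are plain nanosecond integers rather than real ktime_t values, and the two function names are illustrative, not the actual kernel API:

```c
#include <stdint.h>

/* Toy model of the ->next_event bookkeeping discussed above. */
typedef int64_t ktime_ns;

struct clock_event_dev {
    ktime_ns next_event;   /* what the core believes the next expiry is */
    ktime_ns min_delta_ns; /* minimum programmable delta                */
    ktime_ns programmed;   /* when the hardware will actually fire      */
};

/* Old behaviour: a forced, already-expired request overwrites
 * ->next_event with now + min_delta_ns. */
void program_event_old(struct clock_event_dev *dev, ktime_ns expires, ktime_ns now)
{
    if (expires <= now)
        expires = now + dev->min_delta_ns; /* min-delta fallback */
    dev->next_event = expires;
    dev->programmed = expires;
}

/* New behaviour: ->next_event records the caller's request verbatim,
 * even though the hardware is still programmed min_delta_ns out. */
void program_event_new(struct clock_event_dev *dev, ktime_ns expires, ktime_ns now)
{
    dev->next_event = expires;
    dev->programmed = (expires <= now) ? now + dev->min_delta_ns : expires;
}
```

In this model the discontinuity disappears: with the new behaviour, requesting `expires == now` and `expires == now + 1` yields ->next_event values that differ by exactly 1, while the hardware programming point is unchanged.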

A few notes on why this change is safe with the current consumers of
->next_event:
1.
Note that a clockevents_program_event() with a requested expiry in the
past and force being set basically means: "fire ASAP". Now, consider this
so programmed event getting handed once again to
clockevents_program_event(), i.e. that a

  clockevents_program_event(dev, dev->next_event, false)

as in __clockevents_update_freq() is done.
With this change applied, clockevents_program_event() would now properly
detect the expiry being in the past and, due to the force argument being
unset, wouldn't actually do anything.
Before this change OTOH, there would be the (very unlikely) possibility
that the requested event is still somewhere in the future and
clockevents_program_event() would silently delay the event expiration by
another ->min_delta_ns.

2.
The periodic tick handlers on oneshot-only devices use ->next_event
to calculate the followup expiry time.
tick_handle_periodic() spins on reprogramming the clockevent device
until some expiry in the future has been reached:

  ktime_t next = dev->next_event;
  ...
  for(;;) {
next = ktime_add(next, tick_period);
if (!clockevents_program_event(dev, next, false))
  return;
...
  }

Thus, tick_handle_periodic() isn't affected by this change.
For tick_handle_periodic_broadcast(), the situation is different since

  commit 2951d5c031a3 ("tick: broadcast: Prevent livelock from event
handler")

though: a loop similar to the one from tick_handle_periodic() above got
replaced by a single

  ktime_t next = ktime_add(dev->next_event, tick_period);
  clockevents_program_event(dev, next, true);

In the case that dev->next_event + tick_period happens to be less than
ktime_get() + ->min_delta_ns, without this change applied, ->next_event
would get recovered to some point in the future after a single
tick_handle_periodic_broadcast() event.
On the contrary, with this patch applied, it could potentially take some
number of tick_handle_periodic_broadcast() events, each separated by
->min_delta_ns only, until ->next_event is able to catch up with the
current ktime_get(). However, if this turns out to become a problem,
the reprogramming loop in tick_handle_periodic_broadcast() can probably
be restored easily.
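
The catch-up behaviour above can be quantified with a rough, hypothetical calculation, under the assumption that each broadcast event advances ->next_event by tick_period while real time advances by only min_delta_ns between events (this helper is not kernel code):

```c
#include <stdint.h>

/* Rough, illustrative arithmetic for the catch-up described above: each
 * tick_handle_periodic_broadcast() event advances ->next_event by
 * tick_period_ns, while real time advances by only min_delta_ns between
 * events, so the lag shrinks by (tick_period_ns - min_delta_ns) per event. */
int64_t events_to_catch_up(int64_t lag_ns, int64_t tick_period_ns,
                           int64_t min_delta_ns)
{
    int64_t gain = tick_period_ns - min_delta_ns;
    if (gain <= 0)
        return -1; /* ->next_event would never catch up */
    return (lag_ns + gain - 1) / gain; /* ceiling division */
}
```

For example, a lag of 10 ms with a 4 ms tick period and a 1 ms minimum delta needs roughly four broadcast events before ->next_event catches up with the current time.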

3.
In kernel/time/tick-broadcast.c, the broadcast receiving clockevent
devices' ->next_event is read multiple times in order to determine who's
next or who must be pinged. These uses all continue to work. Moreover,
clockevent devices getting programmed to something less than
ktime_get() + ->min_delta_ns
might not be the best candidates for a transition into C3 anyway.

4.
Finally, a "sleep length" is calculated at the very end of
tick_nohz_stop_sched_tick() as follows:

  ts->sleep_length = ktime_sub(dev->next_event, now);

AFAICS, this can happen to be negative w/o this change applied already: in
NOHZ_MODE_HIGHRES mode there can be some overdue hrtimers whose removal is
blocked because tick_nohz_stop_sched_tick() gets called with interrupts
disabled. Unfortunately, the only user, the menu cpuidle governor,
can't cope with negative sleep lengths as it casts the return 

[RFC v5 14/23] clockevents: decouple ->max_delta_ns from ->max_delta_ticks

2016-09-03 Thread Nicolai Stange
Before converting the given delta from ns to cycles by means of the
mult/shift pair, clockevents_program_event() enforces it to be less than
or equal to ->max_delta_ns. Simplified, this reads as

  delta = min(delta, dev->max_delta_ns);
  clc = (delta * dev->mult) >> dev->shift;

A device's ->max_delta_ns is chosen such that
1.) The multiplication does not overflow 64 bits.
2.) clc fits into an unsigned long.
3.) clc <= the ->max_delta_ticks allowed by hardware.

These constraints are responsible for making ->max_delta_ns depend tightly
on ->max_delta_ticks and the device's frequency, i.e. its mult/shift pair.
As it stands, any update to ->mult would require a corresponding change of
->max_delta_ns as well, and these two would have to be consumed atomically
in the clockevent programming path. This would have a negative performance
impact
as soon as we start adjusting ->mult from a different CPU with upcoming
changes making the clockevents core NTP correction aware: some kind of
locking would have been needed in the event programming path.

In order to circumvent this necessity, ->max_delta_ns could be made small
enough to cover the whole range of possible adjustments to ->mult,
eliminating the need to update it at all.

The timekeeping core never adjusts its mono clock's inverse frequency by
more than 11% (cf. timekeeping_adjust()). This means that a clockevent
device's adjusted mult will never grow beyond 1 / (1-11%) = 112.4% of its
initial value. Thus, reducing ->max_delta_ns unconditionally to
1/112.4% = 89% would do the trick. (Actually, setting it to 8/9 = 88.89%
thereof allows for mult increases of up to 12.5% = 1/8 which is good for
binary arithmetic.)

However, such a decrease of ->max_delta_ns could become problematic for
devices whose ->max_delta_ticks is small, i.e. those for whom item 3.)
prevails.

In order to work around this, I chose to make ->max_delta_ns completely
independent of ->max_delta_ticks. It is set to 8/9 of the maximum value
satisfying conditions 1.) and 2.) for the given ->mult. Item 3.) is made
explicit by introducing an extra min(clc, ->max_delta_ticks) after the ns
to cycles conversion in clockevents_program_event(). This way, the maximum
programmable delta is reduced only for those devices which have got a large
->max_delta_ticks anyway. This comes at the cost of an extra min() in the
event programming path though.

So,
- In the case of CLOCK_EVT_FEAT_NO_ADJUST not being set, set
  ->max_delta_ns to 8/9 of the maximum possible value such that items 1.)
  and 2.) hold for the given ->mult. OTOH, if CLOCK_EVT_FEAT_NO_ADJUST
  is set, set ->max_delta_ns to the full such value.

- Add a

clc = min(clc, dev->max_delta_ticks)

  after the ns to cycle conversion in clockevents_program_event().

- Move the ->max_delta_ticks member into struct clock_event_device's
  first 64 bytes in order to optimize for cacheline usage.

- Finally, preserve the semantics of /proc/timer_list by letting it display
  the minimum of ->max_delta_ns and ->max_delta_ticks converted to ns at
  the 'max_delta_ns' field.

Signed-off-by: Nicolai Stange 
---
 include/linux/clockchips.h |  4 ++--
 kernel/time/clockevents.c  | 50 +-
 kernel/time/timer_list.c   |  6 +-
 3 files changed, 56 insertions(+), 4 deletions(-)

diff --git a/include/linux/clockchips.h b/include/linux/clockchips.h
index 9a25185..be5f222 100644
--- a/include/linux/clockchips.h
+++ b/include/linux/clockchips.h
@@ -81,6 +81,7 @@ enum clock_event_state {
  * @next_event:local storage for the next event in oneshot mode
  * @max_delta_ns:  maximum delta value in ns
  * @min_delta_ns:  minimum delta value in ns
+ * @max_delta_ticks:   maximum delta value in ticks
  * @mult:  nanosecond to cycles multiplier
  * @shift: nanoseconds to cycles divisor (power of two)
  * @state_use_accessors:current state of the device, assigned by the core code
@@ -93,7 +94,6 @@ enum clock_event_state {
  * @tick_resume:   resume clkevt device
  * @broadcast: function to broadcast events
  * @min_delta_ticks:   minimum delta value in ticks stored for reconfiguration
- * @max_delta_ticks:   maximum delta value in ticks stored for reconfiguration
  * @name:  ptr to clock event name
  * @rating:variable to rate clock event devices
  * @irq:   IRQ number (only for non CPU local devices)
@@ -109,6 +109,7 @@ struct clock_event_device {
ktime_t next_event;
u64 max_delta_ns;
u64 min_delta_ns;
+   unsigned long   max_delta_ticks;
u32 mult;
u32 shift;
enum clock_event_state  state_use_accessors;
@@ -125,7 +126,6 @@ struct clock_event_device {
void(*suspend)(struct clock_event_device *);
void

Re: [Question] about patch: don't use [delayed_]work_pending()

2016-09-03 Thread Andreas Mohr
Hi,

[no properly binding reference via In-Reply-To: available thus manually 
re-creating, sorry]

https://lkml.org/lkml/2016/9/2/335

I came up with the following somewhat random thoughts:


*** this treatment is exclusive to a single use case, i.e.
not covering things consistently (API-wide)

> +++ b/kernel/power/qos.c
> @@ -482,7 +482,16 @@ void pm_qos_update_request(struct pm_qos_request *req,
>                 return;
>         }
>  
> -       cancel_delayed_work_sync(&req->work);
> +       /*
> +        * This function may be called very early during boot, for example,
> +        * from of_clk_init(), where irq needs to stay disabled.
> +        * cancel_delayed_work_sync() assumes that irq is enabled on
> +        * invocation and re-enables it on return.  Avoid calling it until
> +        * workqueue is initialized.
> +        */
> +       if (keventd_up())
> +               cancel_delayed_work_sync(&req->work);
> +
>         __pm_qos_update_request(req, new_value);
>  }
>  EXPORT_SYMBOL_GPL(pm_qos_update_request);


Reason: any other [early-boot] invoker of cancel_delayed_work_sync()
would hit the same issue,
without this fix being available locally to each of them.

This may or may not be intentional.
Just wanted to point it out.


The more global question here obviously is
how this annoying
irq handling side effect
ought to be handled cleanly / symmetrically
(and ideally with minimal effect to hotpaths,
which probably is the usual inherent outcome of
an elegant/clean solution),
especially given that it is early handling vs. normal.
However unfortunately I do not have any final ideas here ATM.



*** try_to_grab_pending() layer violation (asymmetry)
It *internally* does a local_irq_save(*flags)
and then passes these flags to the *external* user,
where the user
after having called try_to_grab_pending()
then has to invoke these same
*internal layer implementation details*
(local_irq_restore(flags))
That's just fugly asymmetric handling.
If an API layer happens to have
some certain
*internal*
"context"/"resource" requirements
[either by having an explicit API function
to establish these requirements,
or by "dirtily"/"implicitly" passing these out
via an out parameter],
then this API layer
*always* should *itself* provide
correspondingly required
resource handling functions
*at this very layer implementation*
Since the API currently is named try_to_grab_pending(),
this suggests that the signature of the
corresponding resource cleanup function
could be something like
work_grab_context_release(context_reference);

Not properly providing such a public API
will result in e.g. the following major disadvantages:
- a flood of changes necessary
  *at all API users*
  in case the usage protocol of these
  implementation details
  (local_irq_restore())
  happens to get changed
- all users of this API layer
  having to #include the specific header
  of the inner implementation details layer
  (local_irq_restore())
  rather than the cleanly plain API layer itself only
  (either
   within the API layer header in case of an inline helper, or
   within the API layer implementation internally only in case of non-inline)
  IOW, by getting this wrong,
  #include handling of the internal implementation layer's header
  is not properly under the control of the API layer implementer
  any more


Also,
comment
"try_to_grab_pending - steal work item from worklist and disable irq"
seems to have wrong order
since I would think that
"disable irq"
is the *pre-requisite* for being able to
reliably (atomically) steal work items from work list.
[[hmm, and let's hope that NMIs are (known to be?) not involved]]

HTH,

Andreas Mohr


[RFC v5 20/23] clockevents: purge ->min_delta_ns

2016-09-03 Thread Nicolai Stange
The struct clock_event_device's ->min_delta_ns member isn't used anymore.

Purge it.

In __clockevents_update_bounds(), shortcut the
  ->min_delta_ticks => ->min_delta_ns => ->min_delta_ticks_adjusted
calculation detour -- it had been made solely for the purpose of ensuring
that ->min_delta_ticks_adjusted corresponds to something >= 1us.

Signed-off-by: Nicolai Stange 
---
 include/linux/clockchips.h |  2 --
 kernel/time/clockevents.c  | 12 +---
 2 files changed, 5 insertions(+), 9 deletions(-)

diff --git a/include/linux/clockchips.h b/include/linux/clockchips.h
index 7c3a193..2ff15f03 100644
--- a/include/linux/clockchips.h
+++ b/include/linux/clockchips.h
@@ -94,7 +94,6 @@ enum clock_event_state {
  * @tick_resume:   resume clkevt device
  * @broadcast: function to broadcast events
  * @min_delta_ticks:   minimum delta value in ticks stored for reconfiguration
- * @min_delta_ns:  minimum delta value in ns
  * @name:  ptr to clock event name
  * @rating:variable to rate clock event devices
  * @irq:   IRQ number (only for non CPU local devices)
@@ -127,7 +126,6 @@ struct clock_event_device {
void(*suspend)(struct clock_event_device *);
void(*resume)(struct clock_event_device *);
unsigned long   min_delta_ticks;
-   u64 min_delta_ns;
 
const char  *name;
int rating;
diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
index 86d9f97..394b8dc 100644
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -492,14 +492,12 @@ static void __clockevents_update_bounds(struct 
clock_event_device *dev)
}
 
/*
-* cev_delta2ns() never returns values less than 1us and thus,
-* we'll never program any ced with anything less.
+* Enforce ->min_delta_ticks_adjusted to correspond to a value
+* >= 1us.
 */
-   dev->min_delta_ns = cev_delta2ns(dev->min_delta_ticks, dev, false);
-   dev->min_delta_ticks_adjusted = (unsigned long)((dev->min_delta_ns *
-   dev->mult) >> dev->shift);
-   dev->min_delta_ticks_adjusted = max(dev->min_delta_ticks_adjusted,
-   dev->min_delta_ticks);
+   dev->min_delta_ticks_adjusted =
+   max(dev->min_delta_ticks,
+   (unsigned long)((1000ULL * dev->mult) >> dev->shift));
 }
 
 /**
-- 
2.9.3



[RFC v5 17/23] clockevents: use ->min_delta_ticks_adjusted to program minimum delta

2016-09-03 Thread Nicolai Stange
The use of a clockevent device's ->min_delta_ns in the event programming
path hinders upcoming changes to the clockevent core making it NTP
correction aware: both ->mult and ->min_delta_ns would need to be
updated as well as consumed atomically, and we'd rather avoid any
locking here.

We already have ->min_delta_ticks_adjusted, which
- resembles the value of ->min_delta_ns
- and is guaranteed to always be >= the hardware's hard limit
  ->min_delta_ticks and thus can be used w/o locking, as we don't care
  about small deviations.

In both implementations of clockevents_program_min_delta(),
don't calculate the event's deadline from ->min_delta_ns, but use its
drop-in replacement ->min_delta_ticks_adjusted.

Signed-off-by: Nicolai Stange 
---
 kernel/time/clockevents.c | 15 ++-
 1 file changed, 2 insertions(+), 13 deletions(-)

diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
index 8983fee..aa7b325 100644
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -246,19 +246,14 @@ static int clockevents_increase_min_delta(struct 
clock_event_device *dev)
  */
 static int clockevents_program_min_delta(struct clock_event_device *dev)
 {
-   unsigned long long clc;
-   int64_t delta;
int i;
 
for (i = 0;;) {
-   delta = dev->min_delta_ns;
-
if (clockevent_state_shutdown(dev))
return 0;
 
dev->retries++;
-   clc = ((unsigned long long) delta * dev->mult) >> dev->shift;
-   if (dev->set_next_event((unsigned long) clc, dev) == 0)
+   if (!dev->set_next_event(dev->min_delta_ticks_adjusted, dev))
return 0;
 
if (++i > 2) {
@@ -284,17 +279,11 @@ static int clockevents_program_min_delta(struct 
clock_event_device *dev)
  */
 static int clockevents_program_min_delta(struct clock_event_device *dev)
 {
-   unsigned long long clc;
-   int64_t delta;
-
-   delta = dev->min_delta_ns;
-
if (clockevent_state_shutdown(dev))
return 0;
 
dev->retries++;
-   clc = ((unsigned long long) delta * dev->mult) >> dev->shift;
-   return dev->set_next_event((unsigned long) clc, dev);
+   return dev->set_next_event(dev->min_delta_ticks_adjusted, dev);
 }
 
 #endif /* CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST */
-- 
2.9.3



[RFC v5 21/23] clockevents: initial support for mono to raw time conversion

2016-09-03 Thread Nicolai Stange
With NOHZ_FULL and one single well-isolated, CPU consumptive task, one
would expect approximately one clockevent interrupt per second. However, on
my Intel Haswell, where the monotonic clock is driven by the TSC and the
clockevent device is the TSC deadline timer, it turns out that every
second there are two such interrupts: the first one always arrives
approximately 50us before the scheduled deadline as programmed by
tick_nohz_stop_sched_tick() through the hrtimer API. The
__hrtimer_run_queues() called from this interrupt detects that the queued
tick_sched_timer hasn't expired yet and does nothing except reprogram the
clock event device to fire again shortly after.

These too-early deadlines are explained as follows:
clockevents_program_event() programs the clockevent device to fire
after
  f_event * delta_t_progr
clockevent device cycles, where f_event is the clockevent device's hardware
frequency and delta_t_progr is the requested time interval. After that many
clockevent device cycles have elapsed, the device underlying the monotonic
clock, i.e. the monotonic raw clock, has seen f_raw / f_event times as
many cycles.
The ktime_get() called from __hrtimer_run_queues() interprets those
cycles as running at the frequency of the monotonic clock. Summarizing:
  delta_t_perc = 1/f_mono * f_raw/f_event * f_event * delta_t_progr
               = f_raw / f_mono * delta_t_progr
with f_mono being the monotonic clock's frequency and delta_t_perc being
the elapsed time interval as perceived by __hrtimer_run_queues().

Now, f_mono is not a constant, but is dynamically adjusted in
timekeeping_adjust() in order to compensate for the NTP error. With the
large delta_t_progr values of ~10^9 ns under NOHZ_FULL, the error
becomes significant and results in the double timer interrupts described
above.

Compensate for this error by multiplying the clockevent device's f_event
by f_mono/f_raw.

Namely:
- Introduce a ->mult_adjusted member to the struct clock_event_device. Its
  value is supposed to be equal to ->mult * f_mono/f_raw for devices
  which don't have the CLOCK_EVT_FEAT_NO_ADJUST flag set, equal to ->mult
  otherwise.
- Introduce the timekeeping_get_mono_mult() helper which provides
  the clockevent core with access to the timekeeping's current f_mono
  and f_raw.
- Introduce the helper __clockevents_adjust_freq() which
  sets a clockevent device's ->mult_adjusted member as appropriate. It is
  implemented with the help of the new __clockevents_calc_adjust_freq().
- Call __clockevents_adjust_freq() at clockevent device registration time
  as well as at frequency updates through clockevents_update_freq().
- Use the ->mult_adjusted rather than ->mult in the ns to cycle
  conversion made in clockevents_program_event() as well as in the
  cycle to ns conversion in cev_delta2ns().
- Finally, move ->mult out of struct clock_event_device's first cacheline.

Note that future adjustments of the monotonic clock are not taken into
account yet. Furthermore, this patch assumes that after a clockevent
device's registration, its ->mult changes only through calls to
clockevents_update_freq().

Signed-off-by: Nicolai Stange 
---
 include/linux/clockchips.h  |  6 ++--
 kernel/time/clockevents.c   | 79 ++---
 kernel/time/tick-internal.h |  1 +
 kernel/time/timekeeping.c   | 16 +
 4 files changed, 89 insertions(+), 13 deletions(-)

diff --git a/include/linux/clockchips.h b/include/linux/clockchips.h
index 2ff15f03..28f9263 100644
--- a/include/linux/clockchips.h
+++ b/include/linux/clockchips.h
@@ -82,7 +82,7 @@ enum clock_event_state {
  * @max_delta_ns:  maximum delta value in ns
  * @max_delta_ticks:   maximum delta value in ticks
  * @min_delta_ticks_adjusted:  minimum delta value, increased as needed
- * @mult:  nanosecond to cycles multiplier
+ * @mult_adjusted: adjusted multiplier compensating for NTP adjustments
  * @shift: nanoseconds to cycles divisor (power of two)
  * @state_use_accessors:current state of the device, assigned by the core code
  * @features:  features
@@ -94,6 +94,7 @@ enum clock_event_state {
  * @tick_resume:   resume clkevt device
  * @broadcast: function to broadcast events
  * @min_delta_ticks:   minimum delta value in ticks stored for reconfiguration
+ * @mult:  ns to cycles multiplier stored for reconfiguration
  * @name:  ptr to clock event name
  * @rating:variable to rate clock event devices
  * @irq:   IRQ number (only for non CPU local devices)
@@ -110,7 +111,7 @@ struct clock_event_device {
u64 max_delta_ns;
unsigned long   max_delta_ticks;
unsigned long   min_delta_ticks_adjusted;
-   u32 mult;
+   u32 mult_adjusted;
u32 shift;
enum clock_event_state  

[RFC v5 19/23] timer_list: print_tickdevice(): calculate ->min_delta_ns dynamically

2016-09-03 Thread Nicolai Stange
print_tickdevice(), assembling the per-tick device sections in
/proc/timer_list, is the last user of struct clock_event_device's
->min_delta_ns member.

In order to make this one fully obsolete while retaining userspace ABI,
calculate the displayed value of 'min_delta_ns' on the fly from
->min_delta_ticks_adjusted.
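As a rough user-space sketch of the conversion involved (hypothetical helper names mirroring the cycles = ns * mult >> shift convention; clockevent_delta2ns() itself is the in-kernel implementation):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Inverse of the clockevents cycle conversion cycles = ns * mult >> shift:
 * convert a tick count back to nanoseconds, rounding up so that a
 * subsequent ns -> ticks conversion never falls below the original count.
 */
static uint64_t ticks_to_ns(uint64_t ticks, uint32_t mult, uint32_t shift)
{
	return ((ticks << shift) + mult - 1) / mult;
}

static uint64_t ns_to_ticks(uint64_t ns, uint32_t mult, uint32_t shift)
{
	return (ns * mult) >> shift;
}
```

The round-up makes the displayed min_delta_ns conservative: converting it back to ticks can never undershoot ->min_delta_ticks_adjusted.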

Signed-off-by: Nicolai Stange 
---
 kernel/time/timer_list.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/time/timer_list.c b/kernel/time/timer_list.c
index ac20d4c..9067760 100644
--- a/kernel/time/timer_list.c
+++ b/kernel/time/timer_list.c
@@ -206,10 +206,11 @@ static void
 print_tickdevice(struct seq_file *m, struct tick_device *td, int cpu)
 {
struct clock_event_device *dev = td->evtdev;
-   u64 max_delta_ns;
+   u64 max_delta_ns, min_delta_ns;
 
max_delta_ns = clockevent_delta2ns(dev->max_delta_ticks, dev);
max_delta_ns = min(max_delta_ns, dev->max_delta_ns);
+   min_delta_ns = clockevent_delta2ns(dev->min_delta_ticks_adjusted, dev);
 
SEQ_printf(m, "Tick Device: mode: %d\n", td->mode);
if (cpu < 0)
@@ -226,7 +227,7 @@ print_tickdevice(struct seq_file *m, struct tick_device 
*td, int cpu)
SEQ_printf(m, " max_delta_ns:   %llu\n",
   (unsigned long long) max_delta_ns);
SEQ_printf(m, " min_delta_ns:   %llu\n",
-  (unsigned long long) dev->min_delta_ns);
+  (unsigned long long) min_delta_ns);
SEQ_printf(m, " mult:   %u\n", dev->mult);
SEQ_printf(m, " shift:  %u\n", dev->shift);
SEQ_printf(m, " mode:   %d\n", clockevent_get_state(dev));
-- 
2.9.3



[RFC v5 21/23] clockevents: initial support for mono to raw time conversion

2016-09-03 Thread Nicolai Stange
With NOHZ_FULL and one single well-isolated, CPU consumptive task, one
would expect approximately one clockevent interrupt per second. However, on
my Intel Haswell where the monotonic clock is the TSC monotonic clock and
the clockevent device is the TSC deadline device, it turns out that every
second, there are two such interrupts: the first one always arrives
approximately 50us before the scheduled deadline as programmed by
tick_nohz_stop_sched_tick() through the hrtimer API. The
__hrtimer_run_queues() called in this interrupt detects that the queued
tick_sched_timer hasn't expired yet and simply does nothing except
reprogramming the clock event device to fire shortly after again.

These too early programmed deadlines are explained as follows:
clockevents_program_event() programs the clockevent device to fire
after
  f_event * delta_t_progr
clockevent device cycles where f_event is the clockevent device's hardware
frequency and delta_t_progr is the requested time interval. After that many
clockevent device cycles have elapsed, the device underlying the monotonic
clock, that is, the monotonic raw clock, has seen f_raw / f_event times
as many cycles.
The ktime_get() called from __hrtimer_run_queues() interprets those
cycles to run at the frequency of the monotonic clock. Summarizing:
  delta_t_perc = 1/f_mono * f_raw/f_event * f_event * delta_t_progr
   = f_raw / f_mono * delta_t_progr
with f_mono being the monotonic clock's frequency and delta_t_perc being
the elapsed time interval as perceived by __hrtimer_run_queues().

Now, f_mono is not a constant, but is dynamically adjusted in
timekeeping_adjust() in order to compensate for the NTP error. With the
large values of delta_t_progr of 10^9ns with NOHZ_FULL, the error made
becomes significant and results in the double timer interrupts described
above.

Compensate for this error by multiplying the clockevent device's f_event
by f_mono/f_raw.

Namely:
- Introduce a ->mult_adjusted member to the struct clock_event_device. Its
  value is supposed to be equal to ->mult * f_mono/f_raw for devices
  which don't have the CLOCK_EVT_FEAT_NO_ADJUST flag set, equal to ->mult
  otherwise.
- Introduce the timekeeping_get_mono_mult() helper which provides
  the clockevent core with access to the timekeeping's current f_mono
  and f_raw.
- Introduce the helper __clockevents_adjust_freq() which
  sets a clockevent device's ->mult_adjusted member as appropriate. It is
  implemented with the help of the new __clockevents_calc_adjust_freq().
- Call __clockevents_adjust_freq() at clockevent device registration time
  as well as at frequency updates through clockevents_update_freq().
- Use the ->mult_adjusted rather than ->mult in the ns to cycle
  conversion made in clockevents_program_event() as well as in the
  cycle to ns conversion in cev_delta2ns().
- Finally, move ->mult out of struct clock_event_device's first cacheline.

Note that future adjustments of the monotonic clock are not taken into
account yet. Furthermore, this patch assumes that after a clockevent
device's registration, its ->mult changes only through calls to
clockevents_update_freq().
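The arithmetic above can be checked with a small user-space model (plain C; the 1 GHz raw frequency and +50 ppm adjustment below are made-up example values, not measurements from the Haswell system):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Model of the commit message's formula:
 *   delta_t_perc = f_raw / f_mono * delta_t_progr
 * Frequencies in Hz, intervals in ns.
 */
static uint64_t perceived_ns(uint64_t delta_progr_ns, uint64_t f_mono,
			     uint64_t f_raw)
{
	return delta_progr_ns * f_raw / f_mono;
}

/*
 * Compensation as proposed: scale the programmed interval by
 * f_mono / f_raw before it is converted to clockevent cycles.
 */
static uint64_t perceived_compensated_ns(uint64_t delta_progr_ns,
					 uint64_t f_mono, uint64_t f_raw)
{
	return perceived_ns(delta_progr_ns * f_mono / f_raw, f_mono, f_raw);
}
```

With these inputs (f_raw = 10^9 Hz, f_mono = f_raw + 50 ppm), the uncompensated 10^9 ns deadline is perceived roughly 50 us short, the same order as the early interrupts observed above, while the compensated one comes out exact.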

Signed-off-by: Nicolai Stange 
---
 include/linux/clockchips.h  |  6 ++--
 kernel/time/clockevents.c   | 79 ++---
 kernel/time/tick-internal.h |  1 +
 kernel/time/timekeeping.c   | 16 +
 4 files changed, 89 insertions(+), 13 deletions(-)

diff --git a/include/linux/clockchips.h b/include/linux/clockchips.h
index 2ff15f03..28f9263 100644
--- a/include/linux/clockchips.h
+++ b/include/linux/clockchips.h
@@ -82,7 +82,7 @@ enum clock_event_state {
  * @max_delta_ns:  maximum delta value in ns
  * @max_delta_ticks:   maximum delta value in ticks
  * @min_delta_ticks_adjusted:  minimum delta value, increased as needed
- * @mult:  nanosecond to cycles multiplier
+ * @mult_adjusted: adjusted multiplier compensating for NTP adjustments
  * @shift: nanoseconds to cycles divisor (power of two)
  * @state_use_accessors:current state of the device, assigned by the core code
  * @features:  features
@@ -94,6 +94,7 @@ enum clock_event_state {
  * @tick_resume:   resume clkevt device
  * @broadcast: function to broadcast events
  * @min_delta_ticks:   minimum delta value in ticks stored for reconfiguration
+ * @mult:  ns to cycles multiplier stored for reconfiguration
  * @name:  ptr to clock event name
  * @rating:variable to rate clock event devices
  * @irq:   IRQ number (only for non CPU local devices)
@@ -110,7 +111,7 @@ struct clock_event_device {
u64 max_delta_ns;
unsigned long   max_delta_ticks;
unsigned long   min_delta_ticks_adjusted;
-   u32 mult;
+   u32 mult_adjusted;
u32 shift;
enum clock_event_state  state_use_accessors;

[RFC v5 18/23] clockevents: min delta increment: calculate min_delta_ns from ticks

2016-09-03 Thread Nicolai Stange
The use of a clockevent device's ->min_delta_ns in the event programming
path hinders upcoming changes to the clockevent core making it NTP
correction aware: both, ->mult and ->min_delta_ns would need to get
updated as well as consumed atomically and we'd rather like to avoid any
locking here.

We already have ->min_delta_ticks_adjusted, which
- resembles the value of ->min_delta_ns
- and is guaranteed to be always >= the hardware's hard limit
  ->min_delta_ticks and thus, can be used w/o locking as we don't care
  for small deviations.

In clockevents_increase_min_delta(), don't use ->min_delta_ns but
calculate it dynamically from ->min_delta_ticks_adjusted.

As clockevents_increase_min_delta() gets invoked only rarely, the
additional division should not be an issue from a performance standpoint.
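The escalation policy itself (5 us floor, 50% growth, hard upper cap) is unchanged by this patch; modeled in user space, with an illustrative stand-in value for MIN_DELTA_LIMIT:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative cap; the kernel's actual MIN_DELTA_LIMIT may differ. */
#define FAKE_MIN_DELTA_LIMIT 200000000ULL

static uint64_t increase_min_delta(uint64_t min_delta_ns)
{
	if (min_delta_ns < 5000)
		min_delta_ns = 5000;		   /* enforce a 5 us floor */
	else
		min_delta_ns += min_delta_ns >> 1; /* grow by 50% */

	if (min_delta_ns > FAKE_MIN_DELTA_LIMIT)
		min_delta_ns = FAKE_MIN_DELTA_LIMIT;
	return min_delta_ns;
}
```

After this patch, the input to that policy is simply derived from ->min_delta_ticks_adjusted on entry instead of being carried in ->min_delta_ns.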

Signed-off-by: Nicolai Stange 
---
 kernel/time/clockevents.c | 19 +++
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
index aa7b325..86d9f97 100644
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -210,23 +210,26 @@ int clockevents_tick_resume(struct clock_event_device 
*dev)
  */
 static int clockevents_increase_min_delta(struct clock_event_device *dev)
 {
+   u64 min_delta_ns = cev_delta2ns(dev->min_delta_ticks_adjusted, dev,
+   false);
+
/* Nothing to do if we already reached the limit */
-   if (dev->min_delta_ns >= MIN_DELTA_LIMIT) {
+   if (min_delta_ns >= MIN_DELTA_LIMIT) {
printk_deferred(KERN_WARNING
"CE: Reprogramming failure. Giving up\n");
dev->next_event.tv64 = KTIME_MAX;
return -ETIME;
}
 
-   if (dev->min_delta_ns < 5000)
-   dev->min_delta_ns = 5000;
+   if (min_delta_ns < 5000)
+   min_delta_ns = 5000;
else
-   dev->min_delta_ns += dev->min_delta_ns >> 1;
+   min_delta_ns += min_delta_ns >> 1;
 
-   if (dev->min_delta_ns > MIN_DELTA_LIMIT)
-   dev->min_delta_ns = MIN_DELTA_LIMIT;
+   if (min_delta_ns > MIN_DELTA_LIMIT)
+   min_delta_ns = MIN_DELTA_LIMIT;
 
-   dev->min_delta_ticks_adjusted = (unsigned long)((dev->min_delta_ns *
+   dev->min_delta_ticks_adjusted = (unsigned long)((min_delta_ns *
dev->mult) >> dev->shift);
dev->min_delta_ticks_adjusted = max(dev->min_delta_ticks_adjusted,
dev->min_delta_ticks);
@@ -234,7 +237,7 @@ static int clockevents_increase_min_delta(struct 
clock_event_device *dev)
printk_deferred(KERN_WARNING
"CE: %s increased min_delta_ns to %llu nsec\n",
dev->name ? dev->name : "?",
-   (unsigned long long) dev->min_delta_ns);
+   (unsigned long long) min_delta_ns);
return 0;
 }
 
-- 
2.9.3



[RFC v5 22/23] clockevents: make setting of ->mult and ->mult_adjusted atomic

2016-09-03 Thread Nicolai Stange
In order to avoid races between setting a struct clock_event_device's
->mult_adjusted in clockevents_update_freq() and yet to be implemented
updates triggered from the timekeeping core, the setting of ->mult and
->mult_adjusted should be made atomic.

Protect the update in clockevents_update_freq() by locking the
clockevents_lock spinlock. Frequency updates are expected to happen
seldom, and thus taking this subsystem lock should not have any impact
on performance.

Note that the call to tick_broadcast_update_freq() is also protected
by this clockevents_lock and thus, this patch can take the
tick_broadcast_lock with the clockevents_lock being held. However,
this is supposed to be safe since the sequence
  __clockevents_unbind() -> clockevents_replace() ->
  tick_install_replacement() -> tick_setup_device() ->
  tick_device_uses_broadcast()
already does this, too.
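As a user-space analogy for why the pair must be updated under one lock rather than with interrupts merely disabled locally (a pthread mutex stands in for clockevents_lock; all names are illustrative):

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/*
 * ->mult and ->mult_adjusted are always written together under the
 * lock, so a reader taking the same lock can never observe a torn,
 * half-updated pair -- something local_irq_save() alone cannot
 * guarantee against writers on other CPUs.
 */
struct fake_cev {
	pthread_mutex_t lock;
	uint32_t mult;
	uint32_t mult_adjusted;
};

static void update_freq(struct fake_cev *dev, uint32_t mult,
			uint32_t mult_adjusted)
{
	pthread_mutex_lock(&dev->lock);
	dev->mult = mult;
	dev->mult_adjusted = mult_adjusted;
	pthread_mutex_unlock(&dev->lock);
}

static void read_pair(struct fake_cev *dev, uint32_t *mult,
		      uint32_t *mult_adjusted)
{
	pthread_mutex_lock(&dev->lock);
	*mult = dev->mult;
	*mult_adjusted = dev->mult_adjusted;
	pthread_mutex_unlock(&dev->lock);
}
```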

Signed-off-by: Nicolai Stange 
---
 kernel/time/clockevents.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
index 60adaa8..c8b2df8 100644
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -658,11 +658,11 @@ int clockevents_update_freq(struct clock_event_device 
*dev, u32 freq)
unsigned long flags;
int ret;
 
-   local_irq_save(flags);
+   raw_spin_lock_irqsave(&clockevents_lock, flags);
ret = tick_broadcast_update_freq(dev, freq);
if (ret == -ENODEV)
ret = __clockevents_update_freq(dev, freq);
-   local_irq_restore(flags);
+   raw_spin_unlock_irqrestore(&clockevents_lock, flags);
return ret;
 }
 
-- 
2.9.3



[RFC v5 23/23] timekeeping: inform clockevents about freq adjustments

2016-09-03 Thread Nicolai Stange
Upon adjustments of the monotonic clock's frequencies from the
timekeeping core, the clockevents devices' ->mult_adjusted should be
changed accordingly, too.

Introduce clockevents_adjust_all_freqs() which traverses all registered
clockevent devices and, if the CLOCK_EVT_FEAT_NO_ADJUST flag is not set,
recalculates their ->mult_adjusted based on the monotonic clock's current
frequency.

Call clockevents_adjust_all_freqs() from timekeeping_apply_adjustment().
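The caching in the traversal, recomputing the adjusted multiplier only when a device's (mult, shift) pair differs from the previous one's, can be sketched in user space. The stand-in calc_adjust() below assumes the mult * f_mono/f_raw scaling from patch 21/23 expressed via the two timekeeping multipliers; the kernel's actual formula lives in __clockevents_calc_adjust_freq():

```c
#include <assert.h>
#include <stdint.h>

struct fake_dev {
	uint32_t mult, shift, mult_adjusted;
};

static int calc_calls; /* counts how often the expensive helper runs */

static uint32_t calc_adjust(uint32_t mult, uint32_t cs_mono, uint32_t cs_raw)
{
	calc_calls++;
	return (uint32_t)((uint64_t)mult * cs_raw / cs_mono);
}

static void adjust_all(struct fake_dev *devs, int n,
		       uint32_t cs_mono, uint32_t cs_raw)
{
	uint32_t last_mult = 0, last_shift = 0, last_adj = 0;

	for (int i = 0; i < n; i++) {
		/* Recompute only when the cache key (mult, shift) changes. */
		if (devs[i].mult != last_mult || devs[i].shift != last_shift) {
			last_mult = devs[i].mult;
			last_shift = devs[i].shift;
			last_adj = calc_adjust(last_mult, cs_mono, cs_raw);
		}
		devs[i].mult_adjusted = last_adj;
	}
}
```

Since many systems register identically configured per-CPU tick devices, consecutive list entries often share the same (mult, shift) pair, making this cache effective.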

Signed-off-by: Nicolai Stange 
---
 kernel/time/clockevents.c   | 33 +
 kernel/time/tick-internal.h |  5 +
 kernel/time/timekeeping.c   |  3 +++
 3 files changed, 41 insertions(+)

diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
index c8b2df8..40ee5b2 100644
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -626,6 +626,39 @@ void __clockevents_adjust_freq(struct clock_event_device 
*dev)
mult_cs_raw);
 }
 
+void clockevents_adjust_all_freqs(u32 mult_cs_mono, u32 mult_cs_raw)
+{
+   u32 last_mult_raw = 0, last_shift = 0, last_mult_adjusted = 0;
+   u32 mult_raw, shift;
+   unsigned long flags;
+   struct clock_event_device *dev;
+
+   raw_spin_lock_irqsave(&clockevents_lock, flags);
+   list_for_each_entry(dev, &clockevent_devices, list) {
+   if (!(dev->features & CLOCK_EVT_FEAT_ONESHOT) ||
+(dev->features & CLOCK_EVT_FEAT_NO_ADJUST))
+   continue;
+
+   /*
+* The cached last_mult_adjusted is only valid if
+* shift == last_shift. Otherwise, it could exceed
+* what is allowed by ->max_delta_ns.
+*/
+   mult_raw = dev->mult;
+   shift = dev->shift;
+   if (mult_raw != last_mult_raw || shift != last_shift) {
+   last_mult_raw = mult_raw;
+   last_shift = shift;
+   last_mult_adjusted =
+   __clockevents_calc_adjust_freq(mult_raw,
+   mult_cs_mono,
+   mult_cs_raw);
+   }
+   dev->mult_adjusted = last_mult_adjusted;
+   }
+   raw_spin_unlock_irqrestore(&clockevents_lock, flags);
+}
+
 int __clockevents_update_freq(struct clock_event_device *dev, u32 freq)
 {
clockevents_config(dev, freq);
diff --git a/kernel/time/tick-internal.h b/kernel/time/tick-internal.h
index 0b29d23..2d97c42 100644
--- a/kernel/time/tick-internal.h
+++ b/kernel/time/tick-internal.h
@@ -56,6 +56,7 @@ extern int clockevents_program_event(struct 
clock_event_device *dev,
 ktime_t expires, bool force);
 extern void clockevents_handle_noop(struct clock_event_device *dev);
 extern int __clockevents_update_freq(struct clock_event_device *dev, u32 freq);
+extern void clockevents_adjust_all_freqs(u32 mult_cs_mono, u32 mult_cs_raw);
 extern void timekeeping_get_mono_mult(u32 *mult_cs_mono, u32 *mult_cs_raw);
 extern ssize_t sysfs_get_uname(const char *buf, char *dst, size_t cnt);
 
@@ -95,6 +96,10 @@ static inline void tick_set_periodic_handler(struct 
clock_event_device *dev, int
 #else /* !GENERIC_CLOCKEVENTS: */
 static inline void tick_suspend(void) { }
 static inline void tick_resume(void) { }
+
+static inline void clockevents_adjust_all_freqs(u32 mult_cs_mono,
+   u32 mult_cs_raw)
+{}
 #endif /* !GENERIC_CLOCKEVENTS */
 
 /* Oneshot related functions */
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index bd12861..6b3f2bc 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -1850,6 +1850,9 @@ static __always_inline void 
timekeeping_apply_adjustment(struct timekeeper *tk,
tk->xtime_interval += interval;
tk->tkr_mono.xtime_nsec -= offset;
tk->ntp_error -= (interval - offset) << tk->ntp_error_shift;
+
+   clockevents_adjust_all_freqs(tk->tkr_mono.mult,
+   tk->tkr_mono.clock->mult);
 }
 
 /*
-- 
2.9.3



[RFC v5 15/23] clockevents: do comparison of delta against minimum in terms of cycles

2016-09-03 Thread Nicolai Stange
Before converting the given delta from ns to cycles by means of the
mult/shift pair, clockevents_program_event() enforces it to be greater
than or equal to ->min_delta_ns. Simplified, this reads as

  delta = max(delta, dev->min_delta_ns);
  clc = (delta * dev->mult) >> dev->shift;

Note that ->min_delta_ns is more or less ->min_delta_ticks converted to
nanoseconds and thus, it depends on the device's mult/shift pair as well.
Thus, any update to ->mult would require a corresponding change of
->min_delta_ns and these two had to get consumed in the clockevent
programming path atomically. This would have a negative performance impact
as soon as we start adjusting ->mult from a different CPU with upcoming
changes making the clockevents core NTP correction aware: some kind of
locking would have been needed in the event programming path.

These costly atomic reads in the event programming path can be avoided by
doing the comparison against the lower bound in terms of cycles instead of
nanoseconds.

Now, as it stands, ->min_delta_ns is not always completely equivalent
to ->min_delta_ticks: first, it is forced to be >= 1us and second,
clockevents_program_min_delta() can increase it if
CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y.

In order to account for this, introduce ->min_delta_ticks_adjusted which
is supposed to be equivalent to ->min_delta_ns. Keep the original
->min_delta_ticks for reconfiguration.

The invariant ->min_delta_ticks <= ->min_delta_ticks_adjusted will be
guaranteed to hold at all times. This will allow for non-atomic updates of
 ->mult and ->min_delta_ticks_adjusted -- as long as we stay within a
device's allowed bounds, we don't care for small deviations.

Make clockevents_program_event() use ->min_delta_ticks_adjusted for the
enforcement of the lower bound on the delta value.

Finally, slightly reorder the members of struct clock_event_device in order
to keep the now often used ones close together.
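A user-space model of the reworked lower-bound enforcement (illustrative values; the real code path is clockevents_program_event()):

```c
#include <assert.h>
#include <stdint.h>

/*
 * ns -> cycles conversion with the minimum enforced in ticks rather
 * than in ns: no read of ->min_delta_ns (and hence no atomicity with
 * ->mult) is needed in the programming path anymore.
 */
static uint64_t program_clc(uint64_t delta_ns, uint32_t mult, uint32_t shift,
			    uint64_t min_delta_ticks_adjusted)
{
	uint64_t clc = (delta_ns * mult) >> shift;

	return clc < min_delta_ticks_adjusted ? min_delta_ticks_adjusted : clc;
}
```

The clamp is applied after the multiplication, so a concurrently updated ->mult can at worst shift the effective minimum by a small amount within the device's allowed bounds, which is exactly the deviation the commit message declares tolerable.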

Signed-off-by: Nicolai Stange 
---
 include/linux/clockchips.h |  6 --
 kernel/time/clockevents.c  | 11 ++-
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/include/linux/clockchips.h b/include/linux/clockchips.h
index be5f222..7c3a193 100644
--- a/include/linux/clockchips.h
+++ b/include/linux/clockchips.h
@@ -80,8 +80,8 @@ enum clock_event_state {
  * @set_next_ktime:set next event function using a direct ktime value
  * @next_event:local storage for the next event in oneshot mode
  * @max_delta_ns:  maximum delta value in ns
- * @min_delta_ns:  minimum delta value in ns
  * @max_delta_ticks:   maximum delta value in ticks
+ * @min_delta_ticks_adjusted:  minimum delta value, increased as needed
  * @mult:  nanosecond to cycles multiplier
  * @shift: nanoseconds to cycles divisor (power of two)
  * @state_use_accessors:current state of the device, assigned by the core code
@@ -94,6 +94,7 @@ enum clock_event_state {
  * @tick_resume:   resume clkevt device
  * @broadcast: function to broadcast events
  * @min_delta_ticks:   minimum delta value in ticks stored for reconfiguration
+ * @min_delta_ns:  minimum delta value in ns
  * @name:  ptr to clock event name
  * @rating:variable to rate clock event devices
  * @irq:   IRQ number (only for non CPU local devices)
@@ -108,8 +109,8 @@ struct clock_event_device {
int (*set_next_ktime)(ktime_t expires, struct clock_event_device *);
ktime_t next_event;
u64 max_delta_ns;
-   u64 min_delta_ns;
unsigned long   max_delta_ticks;
+   unsigned long   min_delta_ticks_adjusted;
u32 mult;
u32 shift;
enum clock_event_state  state_use_accessors;
@@ -126,6 +127,7 @@ struct clock_event_device {
void(*suspend)(struct clock_event_device *);
void(*resume)(struct clock_event_device *);
unsigned long   min_delta_ticks;
+   u64 min_delta_ns;
 
const char  *name;
int rating;
diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
index 472fcc2..f41f584 100644
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -226,6 +226,11 @@ static int clockevents_increase_min_delta(struct clock_event_device *dev)
if (dev->min_delta_ns > MIN_DELTA_LIMIT)
dev->min_delta_ns = MIN_DELTA_LIMIT;
 
+   dev->min_delta_ticks_adjusted = (unsigned long)((dev->min_delta_ns *
+   dev->mult) >> dev->shift);
+   dev->min_delta_ticks_adjusted = max(dev->min_delta_ticks_adjusted,
+   dev->min_delta_ticks);
+
printk_deferred(KERN_WARNING
"CE: %s increased min_delta_ns to 

[RFC v5 12/23] many clockevent drivers: don't set ->min_delta_ns and ->max_delta_ns

2016-09-03 Thread Nicolai Stange
Now that the clockevent core always initializes the ->*_delta_ns values
from their ->_delta_ticks counterparts, there is no point in having the
clockevent devices' drivers doing so as well.

Don't initialize ->min_delta_ns and ->max_delta_ns from the clockevent
devices' drivers.

This patch was created with the help of the Coccinelle script below.
One initialization of ->min_delta_ns in arch/tile/kernel/time.c gets
missed by this Cocci script and its removal has been manually added.

@@
expression ced;
expression ns, cycles;
@@
- ced->min_delta_ns = ns;
...
ced->min_delta_ticks = cycles;

@@
expression ced;
expression ns, cycles;
@@
- ced.min_delta_ns = ns;
...
ced.min_delta_ticks = cycles;

@@
identifier ced;
expression ns, cycles;
@@
struct clock_event_device ced = {
-   .min_delta_ns = ns,
   .min_delta_ticks = cycles,
};

@@
expression ced;
expression ns, cycles;
@@
- ced->max_delta_ns = ns;
...
ced->max_delta_ticks = cycles;

@@
expression ced;
expression ns, cycles;
@@
- ced.max_delta_ns = ns;
...
ced.max_delta_ticks = cycles;

@@
identifier ced;
expression ns, cycles;
@@
struct clock_event_device ced = {
-   .max_delta_ns = ns,
   .max_delta_ticks = cycles,
};

Signed-off-by: Nicolai Stange 
---
 arch/avr32/kernel/time.c  | 2 --
 arch/blackfin/kernel/time-ts.c| 4 
 arch/c6x/platforms/timer64.c  | 2 --
 arch/hexagon/kernel/time.c| 2 --
 arch/m68k/coldfire/pit.c  | 4 
 arch/microblaze/kernel/timer.c| 4 
 arch/mips/alchemy/common/time.c   | 2 --
 arch/mips/jz4740/time.c   | 2 --
 arch/mips/kernel/cevt-bcm1480.c   | 2 --
 arch/mips/kernel/cevt-ds1287.c| 2 --
 arch/mips/kernel/cevt-gt641xx.c   | 2 --
 arch/mips/kernel/cevt-sb1250.c| 2 --
 arch/mips/kernel/cevt-txx9.c  | 3 ---
 arch/mips/loongson32/common/time.c| 2 --
 arch/mips/loongson64/common/cs5536/cs5536_mfgpt.c | 2 --
 arch/mips/loongson64/loongson-3/hpet.c| 2 --
 arch/mips/ralink/cevt-rt3352.c| 2 --
 arch/mips/sgi-ip27/ip27-timer.c   | 2 --
 arch/mn10300/kernel/cevt-mn10300.c| 2 --
 arch/powerpc/kernel/time.c| 4 
 arch/s390/kernel/time.c   | 2 --
 arch/score/kernel/time.c  | 4 
 arch/sparc/kernel/time_32.c   | 2 --
 arch/sparc/kernel/time_64.c   | 4 
 arch/tile/kernel/time.c   | 2 --
 arch/um/kernel/time.c | 2 --
 arch/unicore32/kernel/time.c  | 4 
 arch/x86/kernel/apic/apic.c   | 8 
 arch/x86/lguest/boot.c| 2 --
 arch/x86/platform/uv/uv_time.c| 4 
 arch/x86/xen/time.c   | 4 
 drivers/clocksource/dw_apb_timer.c| 3 ---
 drivers/clocksource/metag_generic.c   | 2 --
 drivers/clocksource/numachip.c| 2 --
 drivers/clocksource/sh_cmt.c  | 2 --
 drivers/clocksource/timer-atlas7.c| 2 --
 kernel/time/tick-broadcast-hrtimer.c  | 2 --
 37 files changed, 100 deletions(-)

diff --git a/arch/avr32/kernel/time.c b/arch/avr32/kernel/time.c
index 3fff4c9..af08261 100644
--- a/arch/avr32/kernel/time.c
+++ b/arch/avr32/kernel/time.c
@@ -141,9 +141,7 @@ void __init time_init(void)
 
/* setup COMPARE clockevent */
comparator.mult = div_sc(counter_hz, NSEC_PER_SEC, comparator.shift);
-   comparator.max_delta_ns = clockevent_delta2ns((u32)~0, &comparator);
comparator.max_delta_ticks = (u32)~0;
-   comparator.min_delta_ns = clockevent_delta2ns(50, &comparator) + 1;
comparator.min_delta_ticks = 50;
comparator.cpumask = cpumask_of(0);
 
diff --git a/arch/blackfin/kernel/time-ts.c b/arch/blackfin/kernel/time-ts.c
index 4c93b6f..be43289 100644
--- a/arch/blackfin/kernel/time-ts.c
+++ b/arch/blackfin/kernel/time-ts.c
@@ -229,9 +229,7 @@ static void __init bfin_gptmr0_clockevent_init(struct clock_event_device *evt)
 
clock_tick = get_sclk();
evt->mult = div_sc(clock_tick, NSEC_PER_SEC, evt->shift);
-   evt->max_delta_ns = clockevent_delta2ns(-1, evt);
evt->max_delta_ticks = (unsigned long)-1;
-   evt->min_delta_ns = clockevent_delta2ns(100, evt);
evt->min_delta_ticks = 100;
 
evt->cpumask = cpumask_of(0);
@@ -345,9 +343,7 @@ void bfin_coretmr_clockevent_init(void)
 
clock_tick = get_cclk() / TIME_SCALE;
evt->mult = div_sc(clock_tick, NSEC_PER_SEC, evt->shift);
-   evt->max_delta_ns = clockevent_delta2ns(-1, evt);
evt->max_delta_ticks = (unsigned long)-1;
-   evt->min_delta_ns = 

[RFC v5 13/23] clockevents: introduce CLOCK_EVT_FEAT_NO_ADJUST flag

2016-09-03 Thread Nicolai Stange
Upcoming changes to the clockevent core will make it adjusting a clockevent
device's mult/shift pair in order to compensate for NTP corrections made
by the timekeeping core.

For certain devices this behaviour is unwanted. Introduce the
CLOCK_EVT_FEAT_NO_ADJUST flag that, when being set, will cause a clockevent
device to be excluded from those adjustments.

Signed-off-by: Nicolai Stange 
---
 include/linux/clockchips.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/include/linux/clockchips.h b/include/linux/clockchips.h
index a116926..9a25185 100644
--- a/include/linux/clockchips.h
+++ b/include/linux/clockchips.h
@@ -66,6 +66,12 @@ enum clock_event_state {
  */
 # define CLOCK_EVT_FEAT_HRTIMER0x80
 
+/*
+ * Clockevent device's mult/shift pair shall not get adjusted in order
+ * to compensate for NTP corrections made in the timekeeping core.
+ */
+# define CLOCK_EVT_FEAT_NO_ADJUST  0x000100
+
 /**
  * struct clock_event_device - clock event device descriptor
  * @event_handler: Assigned by the framework to be called by the low
-- 
2.9.3



[RFC v5 02/23] clocksource: sh_tmu: compute rate before registration again

2016-09-03 Thread Nicolai Stange
With the upcoming NTP correction related rate adjustments to be implemented
in the clockevents core, the latter needs to get informed about every rate
change of a clockevent device made after its registration.

Currently, sh_tmu violates this requirement in that it registers its
clockevent device with a dummy rate and sets its final rate through
clockevents_config() called from its ->set_state_oneshot() and
->set_state_periodic() functions respectively.

This patch moves the setting of the clockevent device's rate to its
registration.

Note that there has been some back and forth regarding this question with
respect to the clocksource also provided by this driver:
  commit 66f49121ffa4 ("clocksource: sh_tmu: compute mult and shift before
registration")
moves the rate determination from the clocksource's ->enable() function to
before its registration. OTOH, the later
  commit 0aeac458d9eb ("clocksource: sh_tmu: __clocksource_updatefreq_hz()
update")
basically reverts this, saying
  "Without this patch the old code uses clocksource_register() together
   with a hack that assumes a never changing clock rate."

However, I checked all current sh_tmu users in arch/sh as well as in
arch/arm/mach-shmobile carefully and right now, none of them changes any
rate in any clock tree relevant to sh_tmu after their respective
time_init(). Since all sh_tmu instances are created after time_init(), none
of them should ever observe any clock rate changes.

What's more, both, a clocksource as well as a clockevent device, can
immediately get selected for use at their registration and thus, enabled
at this point already. So it's probably safer to assume a "never changing
clock rate" here.

- Move the struct sh_tmu_channel's ->rate member to struct sh_tmu_device:
  it's a property of the underlying clock which is in turn specific to
  the sh_tmu_device.
- Determine the ->rate value in sh_tmu_setup() at device probing rather
  than at first usage.
- Set the clockevent device's rate at its registration.
- Although not strictly necessary for the upcoming clockevent core changes,
  set the clocksource's rate at its registration for consistency.

Signed-off-by: Nicolai Stange 
---

Notes:
For a detailed analysis of the current sh_tmu users, please see
https://nicst.de/ced-clk-rate-change-analysis/sh_tmu-cgitted.html

Compile-only tested on ARCH=sh and ARCH=arm.

 drivers/clocksource/sh_tmu.c | 26 +-
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/drivers/clocksource/sh_tmu.c b/drivers/clocksource/sh_tmu.c
index 469e776..8200042 100644
--- a/drivers/clocksource/sh_tmu.c
+++ b/drivers/clocksource/sh_tmu.c
@@ -46,7 +46,6 @@ struct sh_tmu_channel {
void __iomem *base;
int irq;
 
-   unsigned long rate;
unsigned long periodic;
struct clock_event_device ced;
struct clocksource cs;
@@ -59,6 +58,7 @@ struct sh_tmu_device {
 
void __iomem *mapbase;
struct clk *clk;
+   unsigned long rate;
 
enum sh_tmu_model model;
 
@@ -165,7 +165,6 @@ static int __sh_tmu_enable(struct sh_tmu_channel *ch)
sh_tmu_write(ch, TCNT, 0xffffffff);
 
/* configure channel to parent clock / 4, irq off */
-   ch->rate = clk_get_rate(ch->tmu->clk) / 4;
sh_tmu_write(ch, TCR, TCR_TPSC_CLK4);
 
/* enable channel */
@@ -271,10 +270,8 @@ static int sh_tmu_clocksource_enable(struct clocksource *cs)
return 0;
 
ret = sh_tmu_enable(ch);
-   if (!ret) {
-   __clocksource_update_freq_hz(cs, ch->rate);
+   if (!ret)
ch->cs_enabled = true;
-   }
 
return ret;
 }
@@ -334,8 +331,7 @@ static int sh_tmu_register_clocksource(struct sh_tmu_channel *ch,
dev_info(&ch->tmu->pdev->dev, "ch%u: used as clock source\n",
 ch->index);
 
-   /* Register with dummy 1 Hz value, gets updated in ->enable() */
-   clocksource_register_hz(cs, 1);
+   clocksource_register_hz(cs, ch->tmu->rate);
return 0;
 }
 
@@ -346,14 +342,10 @@ static struct sh_tmu_channel *ced_to_sh_tmu(struct clock_event_device *ced)
 
 static void sh_tmu_clock_event_start(struct sh_tmu_channel *ch, int periodic)
 {
-   struct clock_event_device *ced = &ch->ced;
-
sh_tmu_enable(ch);
 
-   clockevents_config(ced, ch->rate);
-
if (periodic) {
-   ch->periodic = (ch->rate + HZ/2) / HZ;
+   ch->periodic = (ch->tmu->rate + HZ/2) / HZ;
sh_tmu_set_next(ch, ch->periodic, 1);
}
 }
@@ -435,7 +427,7 @@ static void sh_tmu_register_clockevent(struct sh_tmu_channel *ch,
dev_info(&ch->tmu->pdev->dev, "ch%u: used for clock events\n",
 ch->index);
 
-   clockevents_config_and_register(ced, 1, 0x300, 0xffffffff);
+   clockevents_config_and_register(ced, ch->tmu->rate, 0x300, 0xffffffff);
 
ret = 

[RFC v5 09/23] arch/x86/platform/uv/uv_time: set ->min_delta_ticks and ->max_delta_ticks

2016-09-03 Thread Nicolai Stange
With the yet to come introduction of NTP correction awareness to the
clockevent core, drivers should report their valid ranges in units of
cycles to the latter.

Currently, the x86's uv rtc clockevent device is initialized as follows:

  clock_event_device_uv.min_delta_ns = NSEC_PER_SEC /
 sn_rtc_cycles_per_second;
  clock_event_device_uv.max_delta_ns = clocksource_uv.mask *
 (NSEC_PER_SEC / sn_rtc_cycles_per_second);

This translates to a ->min_delta_ticks value of 1 and a ->max_delta_ticks
value of clocksource_uv.mask.

Initialize ->min_delta_ticks and ->max_delta_ticks with these values
respectively.

Signed-off-by: Nicolai Stange 
---

Notes:
Compile-only tested through an allmodconfig build on ARCH=x86_64.

 arch/x86/platform/uv/uv_time.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/platform/uv/uv_time.c b/arch/x86/platform/uv/uv_time.c
index b333fc4..6410ee3 100644
--- a/arch/x86/platform/uv/uv_time.c
+++ b/arch/x86/platform/uv/uv_time.c
@@ -390,9 +390,11 @@ static __init int uv_rtc_setup_clock(void)
 
clock_event_device_uv.min_delta_ns = NSEC_PER_SEC /
sn_rtc_cycles_per_second;
+   clock_event_device_uv.min_delta_ticks = 1;
 
clock_event_device_uv.max_delta_ns = clocksource_uv.mask *
(NSEC_PER_SEC / sn_rtc_cycles_per_second);
+   clock_event_device_uv.max_delta_ticks = clocksource_uv.mask;
 
rc = schedule_on_each_cpu(uv_rtc_register_clockevents);
if (rc) {
-- 
2.9.3



[RFC v5 07/23] many clockevent drivers: set ->min_delta_ticks and ->max_delta_ticks

2016-09-03 Thread Nicolai Stange
The struct clock_event_device has got the ->min_delta_ns, ->max_delta_ns,
->min_delta_ticks and ->max_delta_ticks members for setting the bounds
of valid deltas to be programmed.

During operation, the clockevent core uses the ->*_delta_ns versions only.
OTOH, the ->*_delta_ticks are currently used solely for the calculations of
the ->*_delta_ns values in clockevents_config().

From the hardware's point of view, giving the valid ranges in terms of
cycles, i.e. the ->*_delta_ticks values, is the natural choice.

Right now, as the ratio of ns/cycles is fixed, it doesn't matter much which
unit of measurement is actually used by the clockevents core. (Well, apart
from the fact that nearly all drivers which set the ->*_delta_ns manually
don't round towards the correct direction).

This situation is going to change though: as NTP time correction awareness
will be introduced to the clockevents core, the ratio ns/cycles will cease
to stay fixed at all times.

In conclusion, every driver should hand its valid range in terms of an
invariant unit, i.e. cycles, to the clockevent core.

Those drivers which configure their clockevent devices via
clockevents_config_and_register() already do this.

This patch makes the vast majority of the remaining drivers do that, too.
For the sake of bisectability, the manual settings of the ->*delta_ns
by drivers will be removed by a later patch.

This patch was created with the help of the Coccinelle script below. It is
noteworthy that for constructs like

  ced.min_delta_ns = clockevent_delta2ns(cycles, &ced) + 1;

it is assumed that the +1 addition is done in order to compensate for
any rounding introduced by the conversion of cycles to ns and thus,
is irrelevant for ->min_delta_cycles.

@@
expression ced, cycles;
@@
ced->min_delta_ns = clockevent_delta2ns(cycles, ced);
+ ced->min_delta_ticks = cycles;

@min@
expression ced, cycles;
@@
ced.min_delta_ns = clockevent_delta2ns(cycles, &ced);
+ ced.min_delta_ticks = cycles;

@min_round_up depends on !min@
expression ced, cycles;
@@
ced.min_delta_ns = clockevent_delta2ns(cycles, &ced) + 1;
+ ced.min_delta_ticks = cycles;

@@
identifier ced;
expression ns;
@@
struct clock_event_device ced = {
   .mult = 1,
   .shift = 0,
   .min_delta_ns = ns,
+  .min_delta_ticks = ns,
};

@p_max_neg_const@
expression ced;
constant int cycles;
@@
ced->max_delta_ns = clockevent_delta2ns(-cycles, ced);
+ ced->max_delta_ticks = (unsigned long)-cycles;

@p_max depends on !p_max_neg_const@
expression ced, cycles;
@@
ced->max_delta_ns = clockevent_delta2ns(cycles, ced);
+ ced->max_delta_ticks = cycles;

@@
expression ced, cycles;
@@
ced.max_delta_ns = clockevent_delta2ns(cycles, &ced);
+ ced.max_delta_ticks = cycles;

@@
identifier ced;
expression ns;
@@
struct clock_event_device ced = {
   .mult = 1,
   .shift = 0,
   .max_delta_ns = ns,
+  .max_delta_ticks = ns,
};

This patch differs from what spatch created only in some whitespace
adjustments and an explicit cast added in arch/sparc/kernel/time_32.c.

Signed-off-by: Nicolai Stange 
---
 arch/avr32/kernel/time.c  | 2 ++
 arch/blackfin/kernel/time-ts.c| 4 
 arch/c6x/platforms/timer64.c  | 2 ++
 arch/hexagon/kernel/time.c| 2 ++
 arch/m68k/coldfire/pit.c  | 2 ++
 arch/microblaze/kernel/timer.c| 2 ++
 arch/mips/alchemy/common/time.c   | 4 +++-
 arch/mips/jz4740/time.c   | 2 ++
 arch/mips/kernel/cevt-bcm1480.c   | 2 ++
 arch/mips/kernel/cevt-ds1287.c| 2 ++
 arch/mips/kernel/cevt-gt641xx.c   | 2 ++
 arch/mips/kernel/cevt-sb1250.c| 2 ++
 arch/mips/kernel/cevt-txx9.c  | 2 ++
 arch/mips/loongson32/common/time.c| 2 ++
 arch/mips/loongson64/common/cs5536/cs5536_mfgpt.c | 2 ++
 arch/mips/loongson64/loongson-3/hpet.c| 2 ++
 arch/mips/ralink/cevt-rt3352.c| 2 ++
 arch/mips/sgi-ip27/ip27-timer.c   | 2 ++
 arch/mn10300/kernel/cevt-mn10300.c| 2 ++
 arch/powerpc/kernel/time.c| 2 ++
 arch/score/kernel/time.c  | 2 ++
 arch/sparc/kernel/time_32.c   | 2 ++
 arch/sparc/kernel/time_64.c   | 2 ++
 arch/um/kernel/time.c | 4 +++-
 arch/unicore32/kernel/time.c  | 2 ++
 arch/x86/kernel/apic/apic.c   | 4 
 arch/x86/lguest/boot.c| 2 ++
 arch/x86/xen/time.c   | 4 
 drivers/clocksource/dw_apb_timer.c| 2 ++
 drivers/clocksource/metag_generic.c   | 2 ++
 drivers/clocksource/numachip.c| 2 ++
 drivers/clocksource/sh_cmt.c  | 2 ++
 drivers/clocksource/timer-atlas7.c| 2 ++
 

[RFC v5 10/23] arch/tile/kernel/time: set ->min_delta_ticks and ->max_delta_ticks

2016-09-03 Thread Nicolai Stange
With the yet to come introduction of NTP correction awareness to the
clockevent core, drivers should report their valid ranges in units of
cycles to the latter.

Currently, the tile's timer clockevent device is initialized as follows:

  evt->max_delta_ns = clockevent_delta2ns(MAX_TICK, evt);

and

  .min_delta_ns = 1000,

The first one translates to a ->max_delta_ticks value of MAX_TICK.
For the latter, note that the clockevent core will superimpose a
minimum of 1us by itself -- setting ->min_delta_ticks to 1 is safe here.

Initialize ->min_delta_ticks and ->max_delta_ticks with these values.

Signed-off-by: Nicolai Stange 
---
 arch/tile/kernel/time.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/tile/kernel/time.c b/arch/tile/kernel/time.c
index 178989e..c2fd280 100644
--- a/arch/tile/kernel/time.c
+++ b/arch/tile/kernel/time.c
@@ -154,6 +154,7 @@ static DEFINE_PER_CPU(struct clock_event_device, tile_timer) = {
.name = "tile timer",
.features = CLOCK_EVT_FEAT_ONESHOT,
.min_delta_ns = 1000,
+   .min_delta_ticks = 1,
.rating = 100,
.irq = -1,
.set_next_event = tile_timer_set_next_event,
@@ -169,6 +170,7 @@ void setup_tile_timer(void)
/* Fill in fields that are speed-specific. */
clockevents_calc_mult_shift(evt, cycles_per_sec, TILE_MINSEC);
evt->max_delta_ns = clockevent_delta2ns(MAX_TICK, evt);
+   evt->max_delta_ticks = MAX_TICK;
 
/* Mark as being for this cpu only. */
evt->cpumask = cpumask_of(smp_processor_id());
-- 
2.9.3



[RFC v5 02/23] clocksource: sh_tmu: compute rate before registration again

2016-09-03 Thread Nicolai Stange
With the upcoming NTP correction related rate adjustments to be implemented
in the clockevents core, the latter needs to get informed about every rate
change of a clockevent device made after its registration.

Currently, sh_tmu violates this requirement in that it registers its
clockevent device with a dummy rate and sets its final rate through
clockevents_config() called from its ->set_state_oneshot() and
->set_state_periodic() functions respectively.

This patch moves the setting of the clockevent device's rate to its
registration.

Note that there has been some back and forth regarding this question with
respect to the clocksource also provided by this driver:
  commit 66f49121ffa4 ("clocksource: sh_tmu: compute mult and shift before
registration")
moves the rate determination from the clocksource's ->enable() function to
before its registration. OTOH, the later
  commit 0aeac458d9eb ("clocksource: sh_tmu: __clocksource_updatefreq_hz()
update")
basically reverts this, saying
  "Without this patch the old code uses clocksource_register() together
   with a hack that assumes a never changing clock rate."

However, I checked all current sh_tmu users in arch/sh as well as in
arch/arm/mach-shmobile carefully and right now, none of them changes any
rate in any clock tree relevant to sh_tmu after their respective
time_init(). Since all sh_tmu instances are created after time_init(), none
of them should ever observe any clock rate changes.

What's more, both, a clocksource as well as a clockevent device, can
immediately get selected for use at their registration and thus, enabled
at this point already. So it's probably safer to assume a "never changing
clock rate" here.

- Move the struct sh_tmu_channel's ->rate member to struct sh_tmu_device:
  it's a property of the underlying clock which is in turn specific to
  the sh_tmu_device.
- Determine the ->rate value in sh_tmu_setup() at device probing rather
  than at first usage.
- Set the clockevent device's rate at its registration.
- Although not strictly necessary for the upcoming clockevent core changes,
  set the clocksource's rate at its registration for consistency.

Signed-off-by: Nicolai Stange 
---

Notes:
For a detailed analysis of the current sh_tmu users, please see
https://nicst.de/ced-clk-rate-change-analysis/sh_tmu-cgitted.html

Compile-only tested on ARCH=sh and ARCH=arm.

 drivers/clocksource/sh_tmu.c | 26 +-
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/drivers/clocksource/sh_tmu.c b/drivers/clocksource/sh_tmu.c
index 469e776..8200042 100644
--- a/drivers/clocksource/sh_tmu.c
+++ b/drivers/clocksource/sh_tmu.c
@@ -46,7 +46,6 @@ struct sh_tmu_channel {
void __iomem *base;
int irq;
 
-   unsigned long rate;
unsigned long periodic;
struct clock_event_device ced;
struct clocksource cs;
@@ -59,6 +58,7 @@ struct sh_tmu_device {
 
void __iomem *mapbase;
struct clk *clk;
+   unsigned long rate;
 
enum sh_tmu_model model;
 
@@ -165,7 +165,6 @@ static int __sh_tmu_enable(struct sh_tmu_channel *ch)
sh_tmu_write(ch, TCNT, 0x);
 
/* configure channel to parent clock / 4, irq off */
-   ch->rate = clk_get_rate(ch->tmu->clk) / 4;
sh_tmu_write(ch, TCR, TCR_TPSC_CLK4);
 
/* enable channel */
@@ -271,10 +270,8 @@ static int sh_tmu_clocksource_enable(struct clocksource 
*cs)
return 0;
 
ret = sh_tmu_enable(ch);
-   if (!ret) {
-   __clocksource_update_freq_hz(cs, ch->rate);
+   if (!ret)
ch->cs_enabled = true;
-   }
 
return ret;
 }
@@ -334,8 +331,7 @@ static int sh_tmu_register_clocksource(struct 
sh_tmu_channel *ch,
dev_info(>tmu->pdev->dev, "ch%u: used as clock source\n",
 ch->index);
 
-   /* Register with dummy 1 Hz value, gets updated in ->enable() */
-   clocksource_register_hz(cs, 1);
+   clocksource_register_hz(cs, ch->tmu->rate);
return 0;
 }
 
@@ -346,14 +342,10 @@ static struct sh_tmu_channel *ced_to_sh_tmu(struct 
clock_event_device *ced)
 
 static void sh_tmu_clock_event_start(struct sh_tmu_channel *ch, int periodic)
 {
-   struct clock_event_device *ced = >ced;
-
sh_tmu_enable(ch);
 
-   clockevents_config(ced, ch->rate);
-
if (periodic) {
-   ch->periodic = (ch->rate + HZ/2) / HZ;
+   ch->periodic = (ch->tmu->rate + HZ/2) / HZ;
sh_tmu_set_next(ch, ch->periodic, 1);
}
 }
@@ -435,7 +427,7 @@ static void sh_tmu_register_clockevent(struct 
sh_tmu_channel *ch,
dev_info(>tmu->pdev->dev, "ch%u: used for clock events\n",
 ch->index);
 
-   clockevents_config_and_register(ced, 1, 0x300, 0x);
+   clockevents_config_and_register(ced, ch->tmu->rate, 0x300, 0x);
 
ret = request_irq(ch->irq, 

[RFC v5 09/23] arch/x86/platform/uv/uv_time: set ->min_delta_ticks and ->max_delta_ticks

2016-09-03 Thread Nicolai Stange
With the yet to come introduction of NTP correction awareness to the
clockevent core, drivers should report their valid ranges in units of
cycles to the latter.

Currently, the x86's uv rtc clockevent device is initialized as follows:

  clock_event_device_uv.min_delta_ns = NSEC_PER_SEC /
 sn_rtc_cycles_per_second;
  clock_event_device_uv.max_delta_ns = clocksource_uv.mask *
 (NSEC_PER_SEC / sn_rtc_cycles_per_second);

This translates to a ->min_delta_ticks value of 1 and a ->max_delta_ticks
value of clocksource_uv.mask.

Initialize ->min_delta_ticks and ->max_delta_ticks with these values
respectively.

Signed-off-by: Nicolai Stange 
---

Notes:
Compile-only tested through an allmodconfig build on ARCH=x86_64.

 arch/x86/platform/uv/uv_time.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/platform/uv/uv_time.c b/arch/x86/platform/uv/uv_time.c
index b333fc4..6410ee3 100644
--- a/arch/x86/platform/uv/uv_time.c
+++ b/arch/x86/platform/uv/uv_time.c
@@ -390,9 +390,11 @@ static __init int uv_rtc_setup_clock(void)
 
clock_event_device_uv.min_delta_ns = NSEC_PER_SEC /
sn_rtc_cycles_per_second;
+   clock_event_device_uv.min_delta_ticks = 1;
 
clock_event_device_uv.max_delta_ns = clocksource_uv.mask *
(NSEC_PER_SEC / sn_rtc_cycles_per_second);
+   clock_event_device_uv.max_delta_ticks = clocksource_uv.mask;
 
rc = schedule_on_each_cpu(uv_rtc_register_clockevents);
if (rc) {
-- 
2.9.3



[RFC v5 07/23] many clockevent drivers: set ->min_delta_ticks and ->max_delta_ticks

2016-09-03 Thread Nicolai Stange
The struct clock_event_device has got the ->min_delta_ns, ->max_delta_ns,
->min_delta_ticks and ->max_delta_ticks members for setting the bounds
of valid deltas to be programmed.

During operation, the clockevent core uses the ->*_delta_ns versions only.
OTOH, the ->*_delta_ticks are currently used solely for the calculations of
the ->*_delta_ns values in clockevents_config().

>From the hardware's point of view, giving the valid ranges in terms of
cycles, i.e. the ->*_delta_ticks values, is the natural choice.

Right now, as the ratio of ns/cycles is fixed, it doesn't matter much which
unit of measurement is actually used by the clockevents core. (Well, apart
from the fact that nearly all drivers which set the ->*_delta_ns manually
don't round towards the correct direction).

This situation is going to change though: as NTP time correction awareness
will be introduced to the clockevents core, the ratio ns/cycles will cease
to stay fixed at all times.

In conclusion, every driver should hand its valid range in terms of an
invariant unit, i.e. cycles, to the clockevent core.

Those drivers which configure their clockevent devices via
clockevents_config_and_register() already do this.

This patch makes the vast majority of the remaining drivers do that, too.
For the sake of bisectability, the manual settings of the ->*_delta_ns
by drivers will be removed by a later patch.

This patch was created with the help of the Coccinelle script below. It is
noteworthy that for constructs like

  ced.min_delta_ns = clockevent_delta2ns(cycles, &ced) + 1;

it is assumed that the +1 addition is done in order to compensate for
any rounding introduced by the conversion of cycles to ns and thus,
is irrelevant for ->min_delta_ticks.

@@
expression ced, cycles;
@@
ced->min_delta_ns = clockevent_delta2ns(cycles, ced);
+ ced->min_delta_ticks = cycles;

@min@
expression ced, cycles;
@@
ced.min_delta_ns = clockevent_delta2ns(cycles, &ced);
+ ced.min_delta_ticks = cycles;

@min_round_up depends on !min@
expression ced, cycles;
@@
ced.min_delta_ns = clockevent_delta2ns(cycles, &ced) + 1;
+ ced.min_delta_ticks = cycles;

@@
identifier ced;
expression ns;
@@
struct clock_event_device ced = {
   .mult = 1,
   .shift = 0,
   .min_delta_ns = ns,
+  .min_delta_ticks = ns,
};

@p_max_neg_const@
expression ced;
constant int cycles;
@@
ced->max_delta_ns = clockevent_delta2ns(-cycles, ced);
+ ced->max_delta_ticks = (unsigned long)-cycles;

@p_max depends on !p_max_neg_const@
expression ced, cycles;
@@
ced->max_delta_ns = clockevent_delta2ns(cycles, ced);
+ ced->max_delta_ticks = cycles;

@@
expression ced, cycles;
@@
ced.max_delta_ns = clockevent_delta2ns(cycles, &ced);
+ ced.max_delta_ticks = cycles;

@@
identifier ced;
expression ns;
@@
struct clock_event_device ced = {
   .mult = 1,
   .shift = 0,
   .max_delta_ns = ns,
+  .max_delta_ticks = ns,
};

This patch differs from what spatch created only in some whitespace
adjustments and an explicit cast added in arch/sparc/kernel/time_32.c.

Signed-off-by: Nicolai Stange 
---
 arch/avr32/kernel/time.c  | 2 ++
 arch/blackfin/kernel/time-ts.c| 4 
 arch/c6x/platforms/timer64.c  | 2 ++
 arch/hexagon/kernel/time.c| 2 ++
 arch/m68k/coldfire/pit.c  | 2 ++
 arch/microblaze/kernel/timer.c| 2 ++
 arch/mips/alchemy/common/time.c   | 4 +++-
 arch/mips/jz4740/time.c   | 2 ++
 arch/mips/kernel/cevt-bcm1480.c   | 2 ++
 arch/mips/kernel/cevt-ds1287.c| 2 ++
 arch/mips/kernel/cevt-gt641xx.c   | 2 ++
 arch/mips/kernel/cevt-sb1250.c| 2 ++
 arch/mips/kernel/cevt-txx9.c  | 2 ++
 arch/mips/loongson32/common/time.c| 2 ++
 arch/mips/loongson64/common/cs5536/cs5536_mfgpt.c | 2 ++
 arch/mips/loongson64/loongson-3/hpet.c| 2 ++
 arch/mips/ralink/cevt-rt3352.c| 2 ++
 arch/mips/sgi-ip27/ip27-timer.c   | 2 ++
 arch/mn10300/kernel/cevt-mn10300.c| 2 ++
 arch/powerpc/kernel/time.c| 2 ++
 arch/score/kernel/time.c  | 2 ++
 arch/sparc/kernel/time_32.c   | 2 ++
 arch/sparc/kernel/time_64.c   | 2 ++
 arch/um/kernel/time.c | 4 +++-
 arch/unicore32/kernel/time.c  | 2 ++
 arch/x86/kernel/apic/apic.c   | 4 
 arch/x86/lguest/boot.c| 2 ++
 arch/x86/xen/time.c   | 4 
 drivers/clocksource/dw_apb_timer.c| 2 ++
 drivers/clocksource/metag_generic.c   | 2 ++
 drivers/clocksource/numachip.c| 2 ++
 drivers/clocksource/sh_cmt.c  | 2 ++
 drivers/clocksource/timer-atlas7.c| 2 ++
 33 files changed, 74 

[RFC v5 10/23] arch/tile/kernel/time: set ->min_delta_ticks and ->max_delta_ticks

2016-09-03 Thread Nicolai Stange
With the yet to come introduction of NTP correction awareness to the
clockevent core, drivers should report their valid ranges in units of
cycles to the latter.

Currently, the tile's timer clockevent device is initialized as follows:

  evt->max_delta_ns = clockevent_delta2ns(MAX_TICK, evt);

and

  .min_delta_ns = 1000,

The first one translates to a ->max_delta_ticks value of MAX_TICK.
For the latter, note that the clockevent core will superimpose a
minimum of 1us by itself -- setting ->min_delta_ticks to 1 is safe here.

Initialize ->min_delta_ticks and ->max_delta_ticks with these values.

Signed-off-by: Nicolai Stange 
---
 arch/tile/kernel/time.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/tile/kernel/time.c b/arch/tile/kernel/time.c
index 178989e..c2fd280 100644
--- a/arch/tile/kernel/time.c
+++ b/arch/tile/kernel/time.c
@@ -154,6 +154,7 @@ static DEFINE_PER_CPU(struct clock_event_device, 
tile_timer) = {
.name = "tile timer",
.features = CLOCK_EVT_FEAT_ONESHOT,
.min_delta_ns = 1000,
+   .min_delta_ticks = 1,
.rating = 100,
.irq = -1,
.set_next_event = tile_timer_set_next_event,
@@ -169,6 +170,7 @@ void setup_tile_timer(void)
/* Fill in fields that are speed-specific. */
clockevents_calc_mult_shift(evt, cycles_per_sec, TILE_MINSEC);
evt->max_delta_ns = clockevent_delta2ns(MAX_TICK, evt);
+   evt->max_delta_ticks = MAX_TICK;
 
/* Mark as being for this cpu only. */
evt->cpumask = cpumask_of(smp_processor_id());
-- 
2.9.3



[RFC v5 01/23] clocksource: sh_cmt: compute rate before registration again

2016-09-03 Thread Nicolai Stange
With the upcoming NTP correction related rate adjustments to be implemented
in the clockevents core, the latter needs to get informed about every rate
change of a clockevent device made after its registration.

Currently, sh_cmt violates this requirement in that it registers its
clockevent device with a dummy rate and sets its final ->mult and ->shift
values from its ->set_state_oneshot() and ->set_state_periodic() functions
respectively.

This patch moves the setting of the clockevent device's ->mult and ->shift
values to before its registration.

Note that there has been some back and forth regarding this question with
respect to the clocksource also provided by this driver:
  commit f4d7c3565c16 ("clocksource: sh_cmt: compute mult and shift before
registration")
moves the rate determination from the clocksource's ->enable() function to
before its registration. OTOH, the later
  commit 3593f5fe40a1 ("clocksource: sh_cmt: __clocksource_updatefreq_hz()
update")
basically reverts this, saying
  "Without this patch the old code uses clocksource_register() together
   with a hack that assumes a never changing clock rate."

However, I checked all current sh_cmt users in arch/sh as well as in
arch/arm/mach-shmobile carefully and right now, none of them changes any
rate in any clock tree relevant to sh_cmt after their respective
time_init(). Since all sh_cmt instances are created after time_init(), none
of them should ever observe any clock rate changes.

What's more, both, a clocksource as well as a clockevent device, can
immediately get selected for use at their registration and thus, enabled
at this point already. So it's probably safer to assume a "never changing
clock rate" here.

- Move the struct sh_cmt_channel's ->rate member to struct sh_cmt_device:
  it's a property of the underlying clock which is in turn specific to
  the sh_cmt_device.
- Determine the ->rate value in sh_cmt_setup() at device probing rather
  than at first usage.
- Set the clockevent device's ->mult and ->shift values right before its
  registration.
- Although not strictly necessary for the upcoming clockevent core changes,
  set the clocksource's rate at its registration for consistency.

Signed-off-by: Nicolai Stange 
---

Notes:
For a detailed analysis of the current sh_cmt users, please see
https://nicst.de/ced-clk-rate-change-analysis/sh_cmt-cgitted.html

Compile-only tested on ARCH=sh and ARCH=arm.

 drivers/clocksource/sh_cmt.c | 50 
 1 file changed, 27 insertions(+), 23 deletions(-)

diff --git a/drivers/clocksource/sh_cmt.c b/drivers/clocksource/sh_cmt.c
index 103c493..fd9caa7 100644
--- a/drivers/clocksource/sh_cmt.c
+++ b/drivers/clocksource/sh_cmt.c
@@ -103,7 +103,6 @@ struct sh_cmt_channel {
unsigned long match_value;
unsigned long next_match_value;
unsigned long max_match_value;
-   unsigned long rate;
raw_spinlock_t lock;
struct clock_event_device ced;
struct clocksource cs;
@@ -118,6 +117,7 @@ struct sh_cmt_device {
 
void __iomem *mapbase;
struct clk *clk;
+   unsigned long rate;
 
raw_spinlock_t lock; /* Protect the shared start/stop register */
 
@@ -320,7 +320,7 @@ static void sh_cmt_start_stop_ch(struct sh_cmt_channel *ch, 
int start)
raw_spin_unlock_irqrestore(&ch->cmt->lock, flags);
 }
 
-static int sh_cmt_enable(struct sh_cmt_channel *ch, unsigned long *rate)
+static int sh_cmt_enable(struct sh_cmt_channel *ch)
 {
int k, ret;
 
@@ -339,17 +339,14 @@ static int sh_cmt_enable(struct sh_cmt_channel *ch, 
unsigned long *rate)
sh_cmt_start_stop_ch(ch, 0);
 
/* configure channel, periodic mode and maximum timeout */
-   if (ch->cmt->info->width == 16) {
-   *rate = clk_get_rate(ch->cmt->clk) / 512;
+   if (ch->cmt->info->width == 16)
sh_cmt_write_cmcsr(ch, SH_CMT16_CMCSR_CMIE |
   SH_CMT16_CMCSR_CKS512);
-   } else {
-   *rate = clk_get_rate(ch->cmt->clk) / 8;
+   else
sh_cmt_write_cmcsr(ch, SH_CMT32_CMCSR_CMM |
   SH_CMT32_CMCSR_CMTOUT_IE |
   SH_CMT32_CMCSR_CMR_IRQ |
   SH_CMT32_CMCSR_CKS_RCLK8);
-   }
 
sh_cmt_write_cmcor(ch, 0xffffffff);
sh_cmt_write_cmcnt(ch, 0);
@@ -572,7 +569,7 @@ static int sh_cmt_start(struct sh_cmt_channel *ch, unsigned 
long flag)
raw_spin_lock_irqsave(&ch->lock, flags);
 
if (!(ch->flags & (FLAG_CLOCKEVENT | FLAG_CLOCKSOURCE)))
-   ret = sh_cmt_enable(ch, &ch->rate);
+   ret = sh_cmt_enable(ch);
 
if (ret)
goto out;
@@ -640,10 +637,9 @@ static int sh_cmt_clocksource_enable(struct clocksource 
*cs)
ch->total_cycles = 0;
 
ret = sh_cmt_start(ch, FLAG_CLOCKSOURCE);
-

[RFC v5 00/23] adapt clockevents frequencies to mono clock

2016-09-03 Thread Nicolai Stange
Goal: avoid programming ced devices too early for large deltas; for
  details, c.f. the description of [21/23].


Previous v4 of this series can be found here:

  http://lkml.kernel.org/r/20160822233320.4548-1-nicsta...@gmail.com

Raised concerns were
1.) There should be a flag available which, when set, excludes
a ced from getting its ->mult adjusted.
2.) The ns2cyc multiplication in clockevents_program_event() can
overflow with the adjusted ->mult.

The first item is addressed by the new
[13/23] ("clockevents: introduce CLOCK_EVT_FEAT_NO_ADJUST flag"),
the second by
[14/23] ("clockevents: decouple ->max_delta_ns from ->max_delta_ticks")

A note regarding the CLOCK_EVT_FEAT_NO_ADJUST flag: we probably want
to set it for the ->mult = 1, ->shift = 0 devices. However, since the
frequency adjustments are small (and bounded), they actually translate
to noops for those ceds. Thus, setting it there would be a matter of
style and optimization and I left this out for now.


This series can be divided into logical subseries as follows:
[1-6/23]   Don't modify ced rate after registrations through mechanisms
   other than clockevents_update_freq().

[7-12/23]  Let all ced devices set their ->*_delta_ticks values and let
   the clockevent core do the rest.

[13/23]    Introduce the CLOCK_EVT_FEAT_NO_ADJUST flag

[14-20/23] Fiddle around with the bound checking code in order to
   allow for non-atomic frequency updates from a CPU different
   than where the ced is programmed.

[21-23/23] Actually do the frequency adjustments.


Tested on x86_64 and next-20160825.


Changes to v4:
 [1-12/23] Unchanged

 [13/23] ("clockevents: introduce CLOCK_EVT_FEAT_NO_ADJUST flag")
   New.

 [14/23] ("clockevents: decouple ->max_delta_ns from ->max_delta_ticks")
   New. Solves the overflow problem the former
   [13/22] ("clockevents: check a programmed delta's bounds in terms of cycles")
   from v4 introduced.

   (Note that the former
[14/22] ("clockevents: clockevents_program_event(): turn clc into unsigned 
long")
from v4 has been purged.)

 [15/23] ("clockevents: do comparison of delta against minimum in terms of 
cycles")
   This is the former
   [13/22] ("clockevents: check a programmed delta's bounds in terms of 
cycles"),
   but only for the ->min_delta_* -- the ->max_delta_* are handled by [14/23] 
now.

 [16/23] ("clockevents: clockevents_program_min_delta(): don't set 
->next_event")
   Former [15/22] unchanged.

 [17/23] ("clockevents: use ->min_delta_ticks_adjusted to program minimum 
delta")
   Former [16/22]. Trivially fix compilation error with
   CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=n.

 [18/23] ("clockevents: min delta increment: calculate min_delta_ns from ticks")
   Former [17/22] unchanged.

 [19/23] ("timer_list: print_tickdevice(): calculate ->min_delta_ns 
dynamically")
   Corresponds to former
   [18/22] ("timer_list: print_tickdevice(): calculate ->*_delta_ns 
dynamically")
   from v4, but only for ->min_delta_ns. The changes required for the display of
   ->max_delta_ns are now being made in [14/23] already.

 [20/23] ("clockevents: purge ->min_delta_ns")
   Corresponds to former
   [19/22] ("clockevents: purge ->min_delta_ns and ->max_delta_ns"),
   but with ->max_delta_ns being kept.

 [21/23] ("clockevents: initial support for mono to raw time conversion")
   Former [20/22] with the following changes:
   - Don't adjust mult for those ced's that have CLOCK_EVT_FEAT_NO_ADJUST set.
   - Don't meld __clockevents_update_bounds() into __clockevents_adjust_freq()
 anymore: devices having CLOCK_EVT_FEAT_NO_ADJUST set must get their
 bounds set as well.
   - In __clockevents_calc_adjust_freq(), make sure that the adjusted mult
 doesn't exceed the original by more than 12.5%. C.f. [14/23].
   - In timekeeping, define timekeeping_get_mono_mult() only for
 CONFIG_GENERIC_CLOCKEVENTS=y.

  [22/23] ("clockevents: make setting of ->mult and ->mult_adjusted atomic")
   Former [12/22], but with the description updated: previously, it said that
   this patch would introduce a new locking dependency. This is not true.

  [23/23] ("timekeeping: inform clockevents about freq adjustments")
Former [22/22] with the following changes:
- Don't adjust mult for those ced's that have CLOCK_EVT_FEAT_NO_ADJUST set.
- In clockevents_adjust_all_freqs(), reuse the adjusted cached mult only
  if the associated ->shift also matches.
- Introduce noop clockevents_adjust_all_freqs() in order to fix a
  compilation error with CONFIG_GENERIC_CLOCKEVENTS=n.

Nicolai Stange (23):
  clocksource: sh_cmt: compute rate before registration again
  clocksource: sh_tmu: compute rate before registration again
  clocksource: em_sti: split clock prepare and enable steps
  clocksource: em_sti: compute rate before registration
  clocksource: h8300_timer8: don't reset rate in ->set_state_oneshot()
  clockevents: make 

[RFC v5 11/23] clockevents: always initialize ->min_delta_ns and ->max_delta_ns

2016-09-03 Thread Nicolai Stange
Now that all clockevent drivers set ->min_delta_ticks and ->max_delta_ticks
independently of whether they use clockevents_config*() or not, the
clockevent core can calculate ->min_delta_ns and ->max_delta_ns from these
unconditionally.

The goal is to prepare the clockevent core for introducing NTP rate
correction awareness: as the clockevent devices' rates will get adjusted,
the ->*_delta_ns won't stay fixed but need to get changed as well.

Thus, make the clockevent core calculate the ->*_delta_ns values from
the invariant ->_delta_ticks ones at device registration.

In order to facilitate this, move the corresponding code from
clockevents_config() into its own helper function,
__clockevents_update_bounds() and invoke this where needed, in
particular from clockevents_register_device().

Note that there is a side effect affecting those drivers
- not using clockevents_config*()
- and manually initializing their ->min_delta_ns to values less than 1us.
In order to avoid "pointless noise", the clockevent core always sets
->min_delta_ns to values >= 1us, and thus, those drivers' ->min_delta_ns
is now increased. I think that this side effect is desired though.

Signed-off-by: Nicolai Stange 
---
 kernel/time/clockevents.c | 18 --
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
index e73ac7f..f352f54 100644
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -442,6 +442,19 @@ int clockevents_unbind_device(struct clock_event_device 
*ced, int cpu)
 }
 EXPORT_SYMBOL_GPL(clockevents_unbind_device);
 
+static void __clockevents_update_bounds(struct clock_event_device *dev)
+{
+   if (!(dev->features & CLOCK_EVT_FEAT_ONESHOT))
+   return;
+
+   /*
+* cev_delta2ns() never returns values less than 1us and thus,
+* we'll never program any ced with anything less.
+*/
+   dev->min_delta_ns = cev_delta2ns(dev->min_delta_ticks, dev, false);
+   dev->max_delta_ns = cev_delta2ns(dev->max_delta_ticks, dev, true);
+}
+
 /**
  * clockevents_register_device - register a clock event device
  * @dev:   device to register
@@ -458,6 +471,8 @@ void clockevents_register_device(struct clock_event_device 
*dev)
dev->cpumask = cpumask_of(smp_processor_id());
}
 
+   __clockevents_update_bounds(dev);
+
raw_spin_lock_irqsave(&clockevents_lock, flags);
 
list_add(&dev->list, &clockevent_devices);
@@ -488,8 +503,6 @@ static void clockevents_config(struct clock_event_device 
*dev, u32 freq)
sec = 600;
 
clockevents_calc_mult_shift(dev, freq, sec);
-   dev->min_delta_ns = cev_delta2ns(dev->min_delta_ticks, dev, false);
-   dev->max_delta_ns = cev_delta2ns(dev->max_delta_ticks, dev, true);
 }
 
 /**
@@ -515,6 +528,7 @@ EXPORT_SYMBOL_GPL(clockevents_config_and_register);
 int __clockevents_update_freq(struct clock_event_device *dev, u32 freq)
 {
clockevents_config(dev, freq);
+   __clockevents_update_bounds(dev);
 
if (clockevent_state_oneshot(dev))
return clockevents_program_event(dev, dev->next_event, false);
-- 
2.9.3



[RFC v5 04/23] clocksource: em_sti: compute rate before registration

2016-09-03 Thread Nicolai Stange
With the upcoming NTP correction related rate adjustments to be implemented
in the clockevents core, the latter needs to get informed about every rate
change of a clockevent device made after its registration.

Currently, em_sti violates this requirement in that it registers its
clockevent device with a dummy rate and sets its final rate through
clockevents_config() called from its ->set_state_oneshot().

This patch moves the setting of the clockevent device's rate to its
registration.

I checked all current em_sti users in arch/arm/mach-shmobile and right now,
none of them changes any rate in any clock tree relevant to em_sti after
their respective time_init(). Since all em_sti instances are created after
time_init(), none of them should ever observe any clock rate changes.

- Determine the ->rate value in em_sti_probe() at device probing rather
  than at first usage.
- Set the clockevent device's rate at its registration.
- Although not strictly necessary for the upcoming clockevent core changes,
  set the clocksource's rate at its registration for consistency.

Signed-off-by: Nicolai Stange 
---

Notes:
For a detailed analysis of the current em_sti users, please see
https://nicst.de/ced-clk-rate-change-analysis/em_sti-cgitted.html

Compile-only tested on ARCH=arm.

 drivers/clocksource/em_sti.c | 24 ++--
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/drivers/clocksource/em_sti.c b/drivers/clocksource/em_sti.c
index 46750c0..6168d18 100644
--- a/drivers/clocksource/em_sti.c
+++ b/drivers/clocksource/em_sti.c
@@ -205,13 +205,9 @@ static cycle_t em_sti_clocksource_read(struct clocksource 
*cs)
 
 static int em_sti_clocksource_enable(struct clocksource *cs)
 {
-   int ret;
struct em_sti_priv *p = cs_to_em_sti(cs);
 
-   ret = em_sti_start(p, USER_CLOCKSOURCE);
-   if (!ret)
-   __clocksource_update_freq_hz(cs, p->rate);
-   return ret;
+   return em_sti_start(p, USER_CLOCKSOURCE);
 }
 
 static void em_sti_clocksource_disable(struct clocksource *cs)
@@ -240,8 +236,7 @@ static int em_sti_register_clocksource(struct em_sti_priv 
*p)
 
dev_info(&p->pdev->dev, "used as clock source\n");
 
-   /* Register with dummy 1 Hz value, gets updated in ->enable() */
-   clocksource_register_hz(cs, 1);
+   clocksource_register_hz(cs, p->rate);
return 0;
 }
 
@@ -263,7 +258,6 @@ static int em_sti_clock_event_set_oneshot(struct 
clock_event_device *ced)
 
dev_info(&p->pdev->dev, "used for oneshot clock events\n");
em_sti_start(p, USER_CLOCKEVENT);
-   clockevents_config(>ced, p->rate);
return 0;
 }
 
@@ -294,8 +288,7 @@ static void em_sti_register_clockevent(struct em_sti_priv 
*p)
 
dev_info(&p->pdev->dev, "used for clock events\n");
 
-   /* Register with dummy 1 Hz value, gets updated in 
->set_state_oneshot() */
-   clockevents_config_and_register(ced, 1, 2, 0xffffffff);
+   clockevents_config_and_register(ced, p->rate, 2, 0xffffffff);
 }
 
 static int em_sti_probe(struct platform_device *pdev)
@@ -344,11 +337,22 @@ static int em_sti_probe(struct platform_device *pdev)
goto err_clk_put;
}
 
+   ret = clk_enable(p->clk);
+   if (ret < 0) {
+   dev_err(&p->pdev->dev, "cannot enable clock\n");
+   goto err_clk_unprepare;
+   }
+   p->rate = clk_get_rate(p->clk);
+   clk_disable(p->clk);
+
raw_spin_lock_init(&p->lock);
em_sti_register_clockevent(p);
em_sti_register_clocksource(p);
return 0;
 
+err_clk_unprepare:
+   clk_unprepare(p->clk);
+
 err_clk_put:
clk_put(p->clk);
return ret;
-- 
2.9.3



[RFC v5 11/23] clockevents: always initialize ->min_delta_ns and ->max_delta_ns

2016-09-03 Thread Nicolai Stange
Now that all clockevent drivers set ->min_delta_ticks and ->max_delta_ticks
independently of whether they use clockevents_config*() or not, the
clockevent core can calculate ->min_delta_ns and ->max_delta_ns from these
unconditionally.

The goal is to prepare the clockevent core for introducing NTP rate
correction awareness: as the clockevent devices' rates will get adjusted,
the ->*_delta_ns won't stay fixed but need to get changed as well.

Thus, make the clockevent core calculate the ->*_delta_ns values from
the invariant ->_delta_ticks ones at device registration.

In order to facilitate this, move the corresponding code from
clockevents_config() into its own helper function,
__clockevents_update_bounds() and invoke this where needed, in
particular from clockevents_register_device().

Note that there is a side effect affecting those drivers
- not using clockevents_config*()
- and manually initializing their ->min_delta_ns to values less than 1us.
In order to avoid "pointless noise", the clockevent core always sets
->min_delta_ns to values >= 1us, and thus, those drivers' ->min_delta_ns
is now increased. I think that this side effect is desired though.

Signed-off-by: Nicolai Stange 
---
 kernel/time/clockevents.c | 18 --
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
index e73ac7f..f352f54 100644
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -442,6 +442,19 @@ int clockevents_unbind_device(struct clock_event_device 
*ced, int cpu)
 }
 EXPORT_SYMBOL_GPL(clockevents_unbind_device);
 
+static void __clockevents_update_bounds(struct clock_event_device *dev)
+{
+   if (!(dev->features & CLOCK_EVT_FEAT_ONESHOT))
+   return;
+
+   /*
+* cev_delta2ns() never returns values less than 1us and thus,
+* we'll never program any ced with anything less.
+*/
+   dev->min_delta_ns = cev_delta2ns(dev->min_delta_ticks, dev, false);
+   dev->max_delta_ns = cev_delta2ns(dev->max_delta_ticks, dev, true);
+}
+
 /**
  * clockevents_register_device - register a clock event device
  * @dev:   device to register
@@ -458,6 +471,8 @@ void clockevents_register_device(struct clock_event_device *dev)
dev->cpumask = cpumask_of(smp_processor_id());
}
 
+   __clockevents_update_bounds(dev);
+
raw_spin_lock_irqsave(&clockevents_lock, flags);
 
list_add(&dev->list, &clockevent_devices);
@@ -488,8 +503,6 @@ static void clockevents_config(struct clock_event_device *dev, u32 freq)
sec = 600;
 
clockevents_calc_mult_shift(dev, freq, sec);
-   dev->min_delta_ns = cev_delta2ns(dev->min_delta_ticks, dev, false);
-   dev->max_delta_ns = cev_delta2ns(dev->max_delta_ticks, dev, true);
 }
 
 /**
@@ -515,6 +528,7 @@ EXPORT_SYMBOL_GPL(clockevents_config_and_register);
 int __clockevents_update_freq(struct clock_event_device *dev, u32 freq)
 {
clockevents_config(dev, freq);
+   __clockevents_update_bounds(dev);
 
if (clockevent_state_oneshot(dev))
return clockevents_program_event(dev, dev->next_event, false);
-- 
2.9.3



[RFC v5 04/23] clocksource: em_sti: compute rate before registration

2016-09-03 Thread Nicolai Stange
With the upcoming NTP correction related rate adjustments to be implemented
in the clockevents core, the latter needs to get informed about every rate
change of a clockevent device made after its registration.

Currently, em_sti violates this requirement in that it registers its
clockevent device with a dummy rate and sets its final rate through
clockevents_config() called from its ->set_state_oneshot().

This patch moves the setting of the clockevent device's rate to its
registration.

I checked all current em_sti users in arch/arm/mach-shmobile and right now,
none of them changes any rate in any clock tree relevant to em_sti after
their respective time_init(). Since all em_sti instances are created after
time_init(), none of them should ever observe any clock rate changes.

- Determine the ->rate value in em_sti_probe() at device probing rather
  than at first usage.
- Set the clockevent device's rate at its registration.
- Although not strictly necessary for the upcoming clockevent core changes,
  set the clocksource's rate at its registration for consistency.

Signed-off-by: Nicolai Stange 
---

Notes:
For a detailed analysis of the current em_sti users, please see
https://nicst.de/ced-clk-rate-change-analysis/em_sti-cgitted.html

Compile-only tested on ARCH=arm.

 drivers/clocksource/em_sti.c | 24 ++--
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/drivers/clocksource/em_sti.c b/drivers/clocksource/em_sti.c
index 46750c0..6168d18 100644
--- a/drivers/clocksource/em_sti.c
+++ b/drivers/clocksource/em_sti.c
@@ -205,13 +205,9 @@ static cycle_t em_sti_clocksource_read(struct clocksource *cs)
 
 static int em_sti_clocksource_enable(struct clocksource *cs)
 {
-   int ret;
struct em_sti_priv *p = cs_to_em_sti(cs);
 
-   ret = em_sti_start(p, USER_CLOCKSOURCE);
-   if (!ret)
-   __clocksource_update_freq_hz(cs, p->rate);
-   return ret;
+   return em_sti_start(p, USER_CLOCKSOURCE);
 }
 
 static void em_sti_clocksource_disable(struct clocksource *cs)
@@ -240,8 +236,7 @@ static int em_sti_register_clocksource(struct em_sti_priv *p)
 
dev_info(&p->pdev->dev, "used as clock source\n");
 
-   /* Register with dummy 1 Hz value, gets updated in ->enable() */
-   clocksource_register_hz(cs, 1);
+   clocksource_register_hz(cs, p->rate);
return 0;
 }
 
@@ -263,7 +258,6 @@ static int em_sti_clock_event_set_oneshot(struct clock_event_device *ced)
 
dev_info(&p->pdev->dev, "used for oneshot clock events\n");
em_sti_start(p, USER_CLOCKEVENT);
-   clockevents_config(&p->ced, p->rate);
return 0;
 }
 
@@ -294,8 +288,7 @@ static void em_sti_register_clockevent(struct em_sti_priv *p)
 
dev_info(&p->pdev->dev, "used for clock events\n");
 
-   /* Register with dummy 1 Hz value, gets updated in ->set_state_oneshot() */
-   clockevents_config_and_register(ced, 1, 2, 0xffffffff);
+   clockevents_config_and_register(ced, p->rate, 2, 0xffffffff);
 }
 
 static int em_sti_probe(struct platform_device *pdev)
@@ -344,11 +337,22 @@ static int em_sti_probe(struct platform_device *pdev)
goto err_clk_put;
}
 
+   ret = clk_enable(p->clk);
+   if (ret < 0) {
+   dev_err(&p->pdev->dev, "cannot enable clock\n");
+   goto err_clk_unprepare;
+   }
+   p->rate = clk_get_rate(p->clk);
+   clk_disable(p->clk);
+
raw_spin_lock_init(&p->lock);
em_sti_register_clockevent(p);
em_sti_register_clocksource(p);
return 0;
 
+err_clk_unprepare:
+   clk_unprepare(p->clk);
+
 err_clk_put:
clk_put(p->clk);
return ret;
-- 
2.9.3



[RFC v5 03/23] clocksource: em_sti: split clock prepare and enable steps

2016-09-03 Thread Nicolai Stange
Currently, the em_sti driver prepares and enables the needed clock in
em_sti_enable(), potentially called through its clockevent device's
->set_state_oneshot().

However, the clk_prepare() step may sleep whereas tick_program_event() and
thus, ->set_state_oneshot(), can be called in atomic context.

Split the clk_prepare_enable() in em_sti_enable() into two steps:
- prepare the clock at device probing via clk_prepare()
- and enable it in em_sti_enable() via clk_enable().
Slightly reorder resource initialization in em_sti_probe() in order to
facilitate error handling.

Signed-off-by: Nicolai Stange 
---

Notes:
Compile-only tested on ARCH=arm.

 drivers/clocksource/em_sti.c | 25 ++---
 1 file changed, 18 insertions(+), 7 deletions(-)

diff --git a/drivers/clocksource/em_sti.c b/drivers/clocksource/em_sti.c
index 19bb179..46750c0 100644
--- a/drivers/clocksource/em_sti.c
+++ b/drivers/clocksource/em_sti.c
@@ -78,7 +78,7 @@ static int em_sti_enable(struct em_sti_priv *p)
int ret;
 
/* enable clock */
-   ret = clk_prepare_enable(p->clk);
+   ret = clk_enable(p->clk);
if (ret) {
dev_err(&p->pdev->dev, "cannot enable clock\n");
return ret;
@@ -107,7 +107,7 @@ static void em_sti_disable(struct em_sti_priv *p)
em_sti_write(p, STI_INTENCLR, 3);
 
/* stop clock */
-   clk_disable_unprepare(p->clk);
+   clk_disable(p->clk);
 }
 
 static cycle_t em_sti_count(struct em_sti_priv *p)
@@ -303,6 +303,7 @@ static int em_sti_probe(struct platform_device *pdev)
struct em_sti_priv *p;
struct resource *res;
int irq;
+   int ret;
 
p = devm_kzalloc(&pdev->dev, sizeof(*p), GFP_KERNEL);
if (p == NULL)
@@ -323,6 +324,13 @@ static int em_sti_probe(struct platform_device *pdev)
if (IS_ERR(p->base))
return PTR_ERR(p->base);
 
+   if (devm_request_irq(&pdev->dev, irq, em_sti_interrupt,
+IRQF_TIMER | IRQF_IRQPOLL | IRQF_NOBALANCING,
+dev_name(&pdev->dev), p)) {
+   dev_err(&pdev->dev, "failed to request low IRQ\n");
+   return -ENOENT;
+   }
+
/* get hold of clock */
p->clk = devm_clk_get(&pdev->dev, "sclk");
if (IS_ERR(p->clk)) {
@@ -330,17 +338,20 @@ static int em_sti_probe(struct platform_device *pdev)
return PTR_ERR(p->clk);
}
 
-   if (devm_request_irq(&pdev->dev, irq, em_sti_interrupt,
-IRQF_TIMER | IRQF_IRQPOLL | IRQF_NOBALANCING,
-dev_name(&pdev->dev), p)) {
-   dev_err(&pdev->dev, "failed to request low IRQ\n");
-   return -ENOENT;
+   ret = clk_prepare(p->clk);
+   if (ret < 0) {
dev_err(&pdev->dev, "cannot prepare clock\n");
+   goto err_clk_put;
}
 
raw_spin_lock_init(&p->lock);
em_sti_register_clockevent(p);
em_sti_register_clocksource(p);
return 0;
+
+err_clk_put:
+   clk_put(p->clk);
+   return ret;
 }
 
 static int em_sti_remove(struct platform_device *pdev)
-- 
2.9.3



[RFC v5 05/23] clocksource: h8300_timer8: don't reset rate in ->set_state_oneshot()

2016-09-03 Thread Nicolai Stange
With the upcoming NTP correction related rate adjustments to be implemented
in the clockevents core, the latter needs to get informed about every rate
change of a clockevent device made after its registration.

Currently, h8300_timer8 violates this requirement in that it registers its
clockevent device with the correct rate, but resets its ->mult and ->rate
values in timer8_clock_event_start(), called from its ->set_state_oneshot()
function.

It seems like
  commit 4633f4cac85a ("clocksource/drivers/h8300: Cleanup startup and
remove module code."),
which introduced the rate initialization at registration, missed to remove
the manual setting of ->mult and ->shift from timer8_clock_event_start().

Purge the setting of ->mult, ->shift, ->min_delta_ns and ->max_delta_ns
from timer8_clock_event_start().

Signed-off-by: Nicolai Stange 
---

Notes:
Compile-only tested on ARCH=h8300.

 drivers/clocksource/h8300_timer8.c | 8 
 1 file changed, 8 deletions(-)

diff --git a/drivers/clocksource/h8300_timer8.c b/drivers/clocksource/h8300_timer8.c
index 546bb18..804c489 100644
--- a/drivers/clocksource/h8300_timer8.c
+++ b/drivers/clocksource/h8300_timer8.c
@@ -101,15 +101,7 @@ static inline struct timer8_priv *ced_to_priv(struct clock_event_device *ced)
 
static void timer8_clock_event_start(struct timer8_priv *p, unsigned long delta)
 {
-   struct clock_event_device *ced = &p->ced;
-
timer8_start(p);
-
-   ced->shift = 32;
-   ced->mult = div_sc(p->rate, NSEC_PER_SEC, ced->shift);
-   ced->max_delta_ns = clockevent_delta2ns(0xffff, ced);
-   ced->min_delta_ns = clockevent_delta2ns(0x0001, ced);
-
timer8_set_next(p, delta);
 }
 
-- 
2.9.3



[RFC v5 06/23] clockevents: make clockevents_config() static

2016-09-03 Thread Nicolai Stange
A clockevent device's rate should be configured before or at registration
and changed afterwards through clockevents_update_freq() only.

For the configuration at registration, we already have
clockevents_config_and_register().

Right now, there are no clockevents_config() users outside of the
clockevents core.

To mitigate the risk of drivers erroneously reconfiguring their rates
through clockevents_config() *after* device registration, make
clockevents_config() static.

Signed-off-by: Nicolai Stange 
---
 include/linux/clockchips.h | 1 -
 kernel/time/clockevents.c  | 2 +-
 2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/include/linux/clockchips.h b/include/linux/clockchips.h
index 0d442e3..a116926 100644
--- a/include/linux/clockchips.h
+++ b/include/linux/clockchips.h
@@ -182,7 +182,6 @@ extern u64 clockevent_delta2ns(unsigned long latch, struct clock_event_device *e
 extern void clockevents_register_device(struct clock_event_device *dev);
 extern int clockevents_unbind_device(struct clock_event_device *ced, int cpu);
 
-extern void clockevents_config(struct clock_event_device *dev, u32 freq);
 extern void clockevents_config_and_register(struct clock_event_device *dev,
u32 freq, unsigned long min_delta,
unsigned long max_delta);
diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
index 2c5bc77..e73ac7f 100644
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -468,7 +468,7 @@ void clockevents_register_device(struct clock_event_device *dev)
 }
 EXPORT_SYMBOL_GPL(clockevents_register_device);
 
-void clockevents_config(struct clock_event_device *dev, u32 freq)
+static void clockevents_config(struct clock_event_device *dev, u32 freq)
 {
u64 sec;
 
-- 
2.9.3



[RFC v5 08/23] arch/s390/kernel/time: set ->min_delta_ticks and ->max_delta_ticks

2016-09-03 Thread Nicolai Stange
With the yet to come introduction of NTP correction awareness to the
clockevent core, drivers should report their valid ranges in units of
cycles to the latter.

Currently, the s390's CPU timer clockevent device is initialized as
follows:

  cd->min_delta_ns= 1;
  cd->max_delta_ns= LONG_MAX;

Note that the device's time to cycle conversion factor, i.e.
cd->mult / (2^cd->shift), is approx. equal to 4.

Hence, this would translate to

  cd->min_delta_ticks = 4;
  cd->max_delta_ticks = 4 * LONG_MAX;

However, a minimum value of 1ns is in the range of noise anyway and the
clockevent core will take care of this by increasing it to 1us or so.
Furthermore, 4*LONG_MAX will overflow the unsigned long argument the
clockevent devices gets programmed with.

Thus, initialize ->min_delta_ticks with 1 and ->max_delta_ticks with
ULONG_MAX.

Signed-off-by: Nicolai Stange 
---
 arch/s390/kernel/time.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/s390/kernel/time.c b/arch/s390/kernel/time.c
index 4e99498..9e66df1 100644
--- a/arch/s390/kernel/time.c
+++ b/arch/s390/kernel/time.c
@@ -166,7 +166,9 @@ void init_cpu_timer(void)
cd->mult= 16777;
cd->shift   = 12;
cd->min_delta_ns= 1;
+   cd->min_delta_ticks = 1;
cd->max_delta_ns= LONG_MAX;
+   cd->max_delta_ticks = ULONG_MAX;
cd->rating  = 400;
cd->cpumask = cpumask_of(cpu);
cd->set_next_event  = s390_next_event;
-- 
2.9.3


