Re: [PATCH 1/2] mfd: max77693: Allow building as a module

2016-04-07 Thread Krzysztof Kozlowski
On Thu, Apr 07, 2016 at 02:29:47PM +0100, Lee Jones wrote:
> On Mon, 04 Apr 2016, Krzysztof Kozlowski wrote:
> 
> > The consumer of max77693 regulators on Trats2 board (samsung-usb2-phy
> > driver) supports deferred probing so the max77693 main MFD driver can be
> > built now as a module. This gives more flexibility and removes manual
> > ordering of init calls.
> > 
> > Suggested-by: Paul Gortmaker 
> > Cc: Paul Gortmaker 
> > Signed-off-by: Krzysztof Kozlowski 
> > ---
> >  drivers/mfd/Kconfig|  4 ++--
> >  drivers/mfd/max77693.c | 14 ++
> >  2 files changed, 4 insertions(+), 14 deletions(-)
> 
> I assume this can be taken immediately and doesn't depend on anything
> external to the set?
> 
> For my own reference:
>   Acked-by: Lee Jones 

Hi,

Yes, this can be taken as is. Only second patch (changing defconfig)
depends on this. I can take the second patch through samsung tree but
that would require a tag/branch with this... which looks like an
overkill. So maybe you would take both?

Best regards,
Krzysztof


Re: [PATCH 1/2] mfd: max77693: Allow building as a module

2016-04-07 Thread Krzysztof Kozlowski
On Thu, Apr 07, 2016 at 02:29:47PM +0100, Lee Jones wrote:
> On Mon, 04 Apr 2016, Krzysztof Kozlowski wrote:
> 
> > The consumer of max77693 regulators on Trats2 board (samsung-usb2-phy
> > driver) supports deferred probing so the max77693 main MFD driver can be
> > built now as a module. This gives more flexibility and removes manual
> > ordering of init calls.
> > 
> > Suggested-by: Paul Gortmaker 
> > Cc: Paul Gortmaker 
> > Signed-off-by: Krzysztof Kozlowski 
> > ---
> >  drivers/mfd/Kconfig|  4 ++--
> >  drivers/mfd/max77693.c | 14 ++
> >  2 files changed, 4 insertions(+), 14 deletions(-)
> 
> I assume this can be taken immediately and doesn't depend on anything
> external to the set?
> 
> For my own reference:
>   Acked-by: Lee Jones 

Hi,

Yes, this can be taken as is. Only second patch (changing defconfig)
depends on this. I can take the second patch through samsung tree but
that would require a tag/branch with this... which looks like an
overkill. So maybe you would take both?

Best regards,
Krzysztof


[PATCH] [linux-next]:ALSA: Fix a typo in timestamping.txt

2016-04-07 Thread Masanari Iida
This patch fix a spelling typo found in
Documentation/sound/alsa/timestamping.txt

Signed-off-by: Masanari Iida 
---
 Documentation/sound/alsa/timestamping.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/sound/alsa/timestamping.txt 
b/Documentation/sound/alsa/timestamping.txt
index 0b191a23f534..1b6473f393a8 100644
--- a/Documentation/sound/alsa/timestamping.txt
+++ b/Documentation/sound/alsa/timestamping.txt
@@ -129,7 +129,7 @@ will be required to issue multiple queries and perform an
 interpolation of the results
 
 In some hardware-specific configuration, the system timestamp is
-latched by a low-level audio subsytem, and the information provided
+latched by a low-level audio subsystem, and the information provided
 back to the driver. Due to potential delays in the communication with
 the hardware, there is a risk of misalignment with the avail and delay
 information. To make sure applications are not confused, a
-- 
2.8.0



[PATCH] [linux-next]:ALSA: Fix a typo in timestamping.txt

2016-04-07 Thread Masanari Iida
This patch fix a spelling typo found in
Documentation/sound/alsa/timestamping.txt

Signed-off-by: Masanari Iida 
---
 Documentation/sound/alsa/timestamping.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/sound/alsa/timestamping.txt 
b/Documentation/sound/alsa/timestamping.txt
index 0b191a23f534..1b6473f393a8 100644
--- a/Documentation/sound/alsa/timestamping.txt
+++ b/Documentation/sound/alsa/timestamping.txt
@@ -129,7 +129,7 @@ will be required to issue multiple queries and perform an
 interpolation of the results
 
 In some hardware-specific configuration, the system timestamp is
-latched by a low-level audio subsytem, and the information provided
+latched by a low-level audio subsystem, and the information provided
 back to the driver. Due to potential delays in the communication with
 the hardware, there is a risk of misalignment with the avail and delay
 information. To make sure applications are not confused, a
-- 
2.8.0



Re: [PATCH 2/2] pci, acpi: free IO resource during shutdown

2016-04-07 Thread Sinan Kaya
Hi Tomasz,

On 4/7/2016 5:41 PM, Bjorn Helgaas wrote:
>>> You say this is undoing the effect of pci_remap_iospace(), but that's
>>> > > only called by native drivers and the generic (OF) driver, not by
>>> > > pci_root.c.
>> > 
>> > See the ACPI root bridge driver above.
> If this is a fix to patches that haven't been merged yet, we need to
> squash the fix into the patches.
> 

Can you merge these to two patches to your series for the next post?

I need to remove weak on the first patch per direction from Bjorn and 
fix the function comments. You could as well do this while you are merging. 

Let me know what your preference is.

-- 
Sinan Kaya
Qualcomm Technologies, Inc. on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux 
Foundation Collaborative Project


Re: [PATCH 2/2] pci, acpi: free IO resource during shutdown

2016-04-07 Thread Sinan Kaya
Hi Tomasz,

On 4/7/2016 5:41 PM, Bjorn Helgaas wrote:
>>> You say this is undoing the effect of pci_remap_iospace(), but that's
>>> > > only called by native drivers and the generic (OF) driver, not by
>>> > > pci_root.c.
>> > 
>> > See the ACPI root bridge driver above.
> If this is a fix to patches that haven't been merged yet, we need to
> squash the fix into the patches.
> 

Can you merge these to two patches to your series for the next post?

I need to remove weak on the first patch per direction from Bjorn and 
fix the function comments. You could as well do this while you are merging. 

Let me know what your preference is.

-- 
Sinan Kaya
Qualcomm Technologies, Inc. on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux 
Foundation Collaborative Project


Re: [PATCH 17/19] perf tools: Build syscall table .c header from kernel's syscall_64.tbl

2016-04-07 Thread Wangnan (F)



On 2016/4/8 4:58, Arnaldo Carvalho de Melo wrote:

From: Arnaldo Carvalho de Melo 

We used libaudit to map ids to syscall names and vice-versa, but that
imposes a delay in supporting new syscalls, having to wait for libaudit
to get those new syscalls on its tables.

To remove that delay, for x86_64 initially, grab a copy of
arch/x86/entry/syscalls/syscall_64.tbl and use it to generate those
tables.

Syscalls currently not available in audit-libs:

   # trace -e 
copy_file_range,membarrier,mlock2,pread64,pwrite64,timerfd_create,userfaultfd
   Error:   Invalid syscall copy_file_range, membarrier, mlock2, pread64, 
pwrite64, timerfd_create, userfaultfd
   Hint:try 'perf list syscalls:sys_enter_*'
   Hint:and: 'man syscalls'
   #

With this patch:

   # trace -e 
copy_file_range,membarrier,mlock2,pread64,pwrite64,timerfd_create,userfaultfd
 8505.733 ( 0.010 ms): gnome-shell/2519 timerfd_create(flags: 524288) = 36
 8506.688 ( 0.005 ms): gnome-shell/2519 timerfd_create(flags: 524288) = 40
30023.097 ( 0.025 ms): qemu-system-x8/24629 pwrite64(fd: 18, buf: 
0x7f63ae382000, count: 4096, pos: 529592320) = 4096
31268.712 ( 0.028 ms): qemu-system-x8/24629 pwrite64(fd: 18, buf: 
0x7f63afd8b000, count: 4096, pos: 2314133504) = 4096
31268.854 ( 0.016 ms): qemu-system-x8/24629 pwrite64(fd: 18, buf: 
0x7f63afda2000, count: 4096, pos: 2314137600) = 4096

Cc: Adrian Hunter 
Cc: David Ahern 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Wang Nan 
Link: http://lkml.kernel.org/n/tip-5n4sx1wp0ig75dwcghf9m...@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo 
---



There is racing in Makefile. I see following output:

  ...
  CC   /xx/xx/perf/1.0-r8/perf-1.0/util/callchain.o
  CC   /xx/xx/perf/1.0-r8/perf-1.0/util/values.o
  CC   /xx/xx/perf/1.0-r8/perf-1.0/util/debug.o
make[3]: *** No rule to make target 
`/xx/xx/perf/1.0-r8/perf-1.0/util/syscalltbl.o'.  Stop.

make[3]: *** Waiting for unfinished jobs
  CC   /xx/xx/perf/1.0-r8/perf-1.0/util/machine.o
  CC   /xx/xx/perf/1.0-r8/perf-1.0/util/map.o
  ...

The error disappeareafter doing 'make archheaders' before 'make all'.

Thank you.




Re: [PATCH 17/19] perf tools: Build syscall table .c header from kernel's syscall_64.tbl

2016-04-07 Thread Wangnan (F)



On 2016/4/8 4:58, Arnaldo Carvalho de Melo wrote:

From: Arnaldo Carvalho de Melo 

We used libaudit to map ids to syscall names and vice-versa, but that
imposes a delay in supporting new syscalls, having to wait for libaudit
to get those new syscalls on its tables.

To remove that delay, for x86_64 initially, grab a copy of
arch/x86/entry/syscalls/syscall_64.tbl and use it to generate those
tables.

Syscalls currently not available in audit-libs:

   # trace -e 
copy_file_range,membarrier,mlock2,pread64,pwrite64,timerfd_create,userfaultfd
   Error:   Invalid syscall copy_file_range, membarrier, mlock2, pread64, 
pwrite64, timerfd_create, userfaultfd
   Hint:try 'perf list syscalls:sys_enter_*'
   Hint:and: 'man syscalls'
   #

With this patch:

   # trace -e 
copy_file_range,membarrier,mlock2,pread64,pwrite64,timerfd_create,userfaultfd
 8505.733 ( 0.010 ms): gnome-shell/2519 timerfd_create(flags: 524288) = 36
 8506.688 ( 0.005 ms): gnome-shell/2519 timerfd_create(flags: 524288) = 40
30023.097 ( 0.025 ms): qemu-system-x8/24629 pwrite64(fd: 18, buf: 
0x7f63ae382000, count: 4096, pos: 529592320) = 4096
31268.712 ( 0.028 ms): qemu-system-x8/24629 pwrite64(fd: 18, buf: 
0x7f63afd8b000, count: 4096, pos: 2314133504) = 4096
31268.854 ( 0.016 ms): qemu-system-x8/24629 pwrite64(fd: 18, buf: 
0x7f63afda2000, count: 4096, pos: 2314137600) = 4096

Cc: Adrian Hunter 
Cc: David Ahern 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Wang Nan 
Link: http://lkml.kernel.org/n/tip-5n4sx1wp0ig75dwcghf9m...@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo 
---



There is racing in Makefile. I see following output:

  ...
  CC   /xx/xx/perf/1.0-r8/perf-1.0/util/callchain.o
  CC   /xx/xx/perf/1.0-r8/perf-1.0/util/values.o
  CC   /xx/xx/perf/1.0-r8/perf-1.0/util/debug.o
make[3]: *** No rule to make target 
`/xx/xx/perf/1.0-r8/perf-1.0/util/syscalltbl.o'.  Stop.

make[3]: *** Waiting for unfinished jobs
  CC   /xx/xx/perf/1.0-r8/perf-1.0/util/machine.o
  CC   /xx/xx/perf/1.0-r8/perf-1.0/util/map.o
  ...

The error disappeareafter doing 'make archheaders' before 'make all'.

Thank you.




Re: [PATCHSET RFC cgroup/for-4.6] cgroup, sched: implement resource group and PRIO_RGRP

2016-04-07 Thread Mike Galbraith
On Thu, 2016-04-07 at 16:23 -0400, Johannes Weiner wrote:

> Mike, is that the one you referred to with one group per customer
> account? If so, would you have a pointer to where you outline it?

The usage I loosely outlined, I did in this thread.  All of the gory
details I do not have, do not want, and could not provide if I did.

-Mike


Re: [PATCHSET RFC cgroup/for-4.6] cgroup, sched: implement resource group and PRIO_RGRP

2016-04-07 Thread Mike Galbraith
On Thu, 2016-04-07 at 16:23 -0400, Johannes Weiner wrote:

> Mike, is that the one you referred to with one group per customer
> account? If so, would you have a pointer to where you outline it?

The usage I loosely outlined, I did in this thread.  All of the gory
details I do not have, do not want, and could not provide if I did.

-Mike


Re: [PATCH] Fix issue with dmesg.py and python 3.X

2016-04-07 Thread Kieran Bingham
Hi Dom,

I've just tested your patch quickly, and it generates an error on Python 2.7


On 05/04/16 19:38, Dom Cote wrote:
> When using GDB built with python 2.7,
> 
> Inferior.read_memory (address, length)
> 
> returns a buffer object. When using GDB built with python 3.X,
> it returns a memoryview object, which cannot be added to another
> memoryview object.
> 
> Replace the addition (+) of 2 python memoryview objects
> with the addition of 2 'bytes' objects.
> 
> Create a read_memoryview() function that always return a memoryview
> object.

The change to memoryview appears to work - but I don't think it needs to
be indirected by a function definition?


> Change the read_u16 function so it doesn't need to use ord()
> anymore.

This change is separate to the memoryview object, and should be in it's
own patch. One patch should fix one thing independently.

For example, the change to memoryview object appears to be functional,
and the read_u16 is not. If these changes are in separate patches, then
working changes can be accepted sooner, and tested easier.


> Tested with Python 3.4 and gdb 7.7
> 
> Signed-off-by: Dom Cote 
> ---
>  scripts/gdb/linux/dmesg.py | 9 +
>  scripts/gdb/linux/utils.py | 9 +++--
>  2 files changed, 12 insertions(+), 6 deletions(-)
> 
> diff --git a/scripts/gdb/linux/dmesg.py b/scripts/gdb/linux/dmesg.py
> index 927d0d2a3145..96f4732157d8 100644
> --- a/scripts/gdb/linux/dmesg.py
> +++ b/scripts/gdb/linux/dmesg.py
> @@ -33,11 +33,12 @@ class LxDmesg(gdb.Command):
>  if log_first_idx < log_next_idx:
>  log_buf_2nd_half = -1
>  length = log_next_idx - log_first_idx
> -log_buf = inf.read_memory(start, length)
> +log_buf = utils.read_memoryview(inf, start, length).tobytes()

This looks like it could just call memoryview() directly ...

>  else:
>  log_buf_2nd_half = log_buf_len - log_first_idx
> -log_buf = inf.read_memory(start, log_buf_2nd_half) + \
> -inf.read_memory(log_buf_addr, log_next_idx)
> +a = utils.read_memoryview(inf, start, log_buf_2nd_half)
> +b = utils.read_memoryview(inf, log_buf_addr, log_next_idx)

Likewise here I presume ...

> +log_buf = a.tobytes() + b.tobytes()
>  
>  pos = 0
>  while pos < log_buf.__len__():
> @@ -53,7 +54,7 @@ class LxDmesg(gdb.Command):
>  text = log_buf[pos + 16:pos + 16 + text_len]
>  time_stamp = utils.read_u64(log_buf[pos:pos + 8])
>  
> -for line in memoryview(text).tobytes().splitlines():
> +for line in text.splitlines():

This looks like a separate change, not related to the bug fix?
If this is an improvement, rather than a fix - it should probably also
have it's own patch. (at first glance it looks like an improvement :D )

I just haven't seen yet if it depends on the other change.

>  gdb.write("[{time:12.6f}] {line}\n".format(
>  time=time_stamp / 10.0,
>  line=line))
> diff --git a/scripts/gdb/linux/utils.py b/scripts/gdb/linux/utils.py
> index 0893b326a28b..c2b779e7bd26 100644
> --- a/scripts/gdb/linux/utils.py
> +++ b/scripts/gdb/linux/utils.py
> @@ -87,11 +87,16 @@ def get_target_endianness():
>  return target_endianness
>  
>  
> +# Compat between GDB built with python 2.7 vs 3.X
> +def read_memoryview(inf, start, length):
> +return memoryview(inf.read_memory(start, length))

This simply always returns a memoryview object, so why not just change
the respective lines, which you have already had to modify to call/use
memoryview directly?

> +
> +
>  def read_u16(buffer):
>  if get_target_endianness() == LITTLE_ENDIAN:
> -return ord(buffer[0]) + (ord(buffer[1]) << 8)
> +return buffer[0] + (buffer[1] << 8)
>  else:
> -return ord(buffer[1]) + (ord(buffer[0]) << 8)
> +return buffer[1] + (buffer[0] << 8)

This breaks for me, but returning these lines to use ord() shows that
the memoryview changes appear to work.

What was the need for changing these lines? Does it cause a break on 3.x?

This causes the following error on 2.7 for me:

(gdb) lx-dmesg
Traceback (most recent call last):
File
"/home/linuxembedded/linaro/lkd/openst-lkd/objects/arm/qemu-arm/linux/scripts/gdb/linux/dmesg.py",
line 45, in invoke
length = utils.read_u16(log_buf[pos + 8:pos + 10])
File
"/home/linuxembedded/linaro/lkd/openst-lkd/objects/arm/qemu-arm/linux/scripts/gdb/linux/utils.py",
line 97, in read_u16
return buffer[0] + (buffer[1] << 8)
TypeError: unsupported operand type(s) for <<: 'str' and 'int'
Error occurred in Python command: unsupported operand type(s) for <<:
'str' and 'int'

>  
>  
>  def read_u32(buffer):
> 


--
Regards

Kieran


Re: [PATCH] Fix issue with dmesg.py and python 3.X

2016-04-07 Thread Kieran Bingham
Hi Dom,

I've just tested your patch quickly, and it generates an error on Python 2.7


On 05/04/16 19:38, Dom Cote wrote:
> When using GDB built with python 2.7,
> 
> Inferior.read_memory (address, length)
> 
> returns a buffer object. When using GDB built with python 3.X,
> it returns a memoryview object, which cannot be added to another
> memoryview object.
> 
> Replace the addition (+) of 2 python memoryview objects
> with the addition of 2 'bytes' objects.
> 
> Create a read_memoryview() function that always return a memoryview
> object.

The change to memoryview appears to work - but I don't think it needs to
be indirected by a function definition?


> Change the read_u16 function so it doesn't need to use ord()
> anymore.

This change is separate to the memoryview object, and should be in it's
own patch. One patch should fix one thing independently.

For example, the change to memoryview object appears to be functional,
and the read_u16 is not. If these changes are in separate patches, then
working changes can be accepted sooner, and tested easier.


> Tested with Python 3.4 and gdb 7.7
> 
> Signed-off-by: Dom Cote 
> ---
>  scripts/gdb/linux/dmesg.py | 9 +
>  scripts/gdb/linux/utils.py | 9 +++--
>  2 files changed, 12 insertions(+), 6 deletions(-)
> 
> diff --git a/scripts/gdb/linux/dmesg.py b/scripts/gdb/linux/dmesg.py
> index 927d0d2a3145..96f4732157d8 100644
> --- a/scripts/gdb/linux/dmesg.py
> +++ b/scripts/gdb/linux/dmesg.py
> @@ -33,11 +33,12 @@ class LxDmesg(gdb.Command):
>  if log_first_idx < log_next_idx:
>  log_buf_2nd_half = -1
>  length = log_next_idx - log_first_idx
> -log_buf = inf.read_memory(start, length)
> +log_buf = utils.read_memoryview(inf, start, length).tobytes()

This looks like it could just call memoryview() directly ...

>  else:
>  log_buf_2nd_half = log_buf_len - log_first_idx
> -log_buf = inf.read_memory(start, log_buf_2nd_half) + \
> -inf.read_memory(log_buf_addr, log_next_idx)
> +a = utils.read_memoryview(inf, start, log_buf_2nd_half)
> +b = utils.read_memoryview(inf, log_buf_addr, log_next_idx)

Likewise here I presume ...

> +log_buf = a.tobytes() + b.tobytes()
>  
>  pos = 0
>  while pos < log_buf.__len__():
> @@ -53,7 +54,7 @@ class LxDmesg(gdb.Command):
>  text = log_buf[pos + 16:pos + 16 + text_len]
>  time_stamp = utils.read_u64(log_buf[pos:pos + 8])
>  
> -for line in memoryview(text).tobytes().splitlines():
> +for line in text.splitlines():

This looks like a separate change, not related to the bug fix?
If this is an improvement, rather than a fix - it should probably also
have it's own patch. (at first glance it looks like an improvement :D )

I just haven't seen yet if it depends on the other change.

>  gdb.write("[{time:12.6f}] {line}\n".format(
>  time=time_stamp / 10.0,
>  line=line))
> diff --git a/scripts/gdb/linux/utils.py b/scripts/gdb/linux/utils.py
> index 0893b326a28b..c2b779e7bd26 100644
> --- a/scripts/gdb/linux/utils.py
> +++ b/scripts/gdb/linux/utils.py
> @@ -87,11 +87,16 @@ def get_target_endianness():
>  return target_endianness
>  
>  
> +# Compat between GDB built with python 2.7 vs 3.X
> +def read_memoryview(inf, start, length):
> +return memoryview(inf.read_memory(start, length))

This simply always returns a memoryview object, so why not just change
the respective lines, which you have already had to modify to call/use
memoryview directly?

> +
> +
>  def read_u16(buffer):
>  if get_target_endianness() == LITTLE_ENDIAN:
> -return ord(buffer[0]) + (ord(buffer[1]) << 8)
> +return buffer[0] + (buffer[1] << 8)
>  else:
> -return ord(buffer[1]) + (ord(buffer[0]) << 8)
> +return buffer[1] + (buffer[0] << 8)

This breaks for me, but returning these lines to use ord() shows that
the memoryview changes appear to work.

What was the need for changing these lines? Does it cause a break on 3.x?

This causes the following error on 2.7 for me:

(gdb) lx-dmesg
Traceback (most recent call last):
File
"/home/linuxembedded/linaro/lkd/openst-lkd/objects/arm/qemu-arm/linux/scripts/gdb/linux/dmesg.py",
line 45, in invoke
length = utils.read_u16(log_buf[pos + 8:pos + 10])
File
"/home/linuxembedded/linaro/lkd/openst-lkd/objects/arm/qemu-arm/linux/scripts/gdb/linux/utils.py",
line 97, in read_u16
return buffer[0] + (buffer[1] << 8)
TypeError: unsupported operand type(s) for <<: 'str' and 'int'
Error occurred in Python command: unsupported operand type(s) for <<:
'str' and 'int'

>  
>  
>  def read_u32(buffer):
> 


--
Regards

Kieran


Re: [PATCH 0/3] crypto: af_alg - add TLS type encryption

2016-04-07 Thread Tom Herbert
On Thu, Apr 7, 2016 at 11:52 PM, Herbert Xu  wrote:
> On Wed, Apr 06, 2016 at 10:56:12AM -0700, Tadeusz Struk wrote:
>>
>> The intend is to enable HW acceleration of the TLS protocol.
>> The way it will work is that the user space will send a packet of data
>> via AF_ALG and HW will authenticate and encrypt it in one go.
>
> There have been suggestions to implement TLS data-path within
> the kernel.  So we should decide whether we pursue that or go
> with your approach before we start adding algorithms.
>
Yes, please see Dave Watson's patches on this.

Tom

> Cheers,
> --
> Email: Herbert Xu 
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH 0/3] crypto: af_alg - add TLS type encryption

2016-04-07 Thread Tom Herbert
On Thu, Apr 7, 2016 at 11:52 PM, Herbert Xu  wrote:
> On Wed, Apr 06, 2016 at 10:56:12AM -0700, Tadeusz Struk wrote:
>>
>> The intend is to enable HW acceleration of the TLS protocol.
>> The way it will work is that the user space will send a packet of data
>> via AF_ALG and HW will authenticate and encrypt it in one go.
>
> There have been suggestions to implement TLS data-path within
> the kernel.  So we should decide whether we pursue that or go
> with your approach before we start adding algorithms.
>
Yes, please see Dave Watson's patches on this.

Tom

> Cheers,
> --
> Email: Herbert Xu 
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH 0/3] crypto: af_alg - add TLS type encryption

2016-04-07 Thread Herbert Xu
On Wed, Apr 06, 2016 at 10:56:12AM -0700, Tadeusz Struk wrote:
> 
> The intend is to enable HW acceleration of the TLS protocol.
> The way it will work is that the user space will send a packet of data
> via AF_ALG and HW will authenticate and encrypt it in one go.

There have been suggestions to implement TLS data-path within
the kernel.  So we should decide whether we pursue that or go
with your approach before we start adding algorithms.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH 0/3] crypto: af_alg - add TLS type encryption

2016-04-07 Thread Herbert Xu
On Wed, Apr 06, 2016 at 10:56:12AM -0700, Tadeusz Struk wrote:
> 
> The intend is to enable HW acceleration of the TLS protocol.
> The way it will work is that the user space will send a packet of data
> via AF_ALG and HW will authenticate and encrypt it in one go.

There have been suggestions to implement TLS data-path within
the kernel.  So we should decide whether we pursue that or go
with your approach before we start adding algorithms.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [RFC6 PATCH v6 00/21] ILP32 for ARM64

2016-04-07 Thread Andrew Pinski
On Thu, Apr 7, 2016 at 5:18 AM, Adam Borowski  wrote:
> On Wed, 6 Apr 2016, Geert Uytterhoeven wrote:
>> On Wed, Apr 6, 2016 at 12:08 AM, Yury Norov  
>> wrote:
>>>  v6:
>>>  - time_t, __kenel_off_t and other types turned to be 32-bit
>>>for compatibility reasons (after v5 discussion);
>
> Introducing a new arch today with y2038 problems is not a good idea.
> Linus said so with appropriately pointy words in 2011.

This is the third time we had this discussion on time_t for ILP32.  I
had originally it as 32bit, then Catalin suggested I change it to
64bit and then Arnd (with his work for 2038 issue on 32bit arch) said
ILP32 should match all other 32bit targets and the other 64bit time_t
be fixed by the current work he was working on.  Now you are
suggesting we change it again.
Arnd can you please comment more on why we want 32bit time_t instead
of the 64bit one?  I Know there was some POSIX (or was it C90)
violation but I suspect there is an easy way to workaround this inside
the kernel but the discussion to move over to 32bit time_t was already
made by the time I started to look into that.

>
>> What makes you think these "applications that can’t readily be migrated to 
>> LP64
>> because they were written assuming an ILP32 data model, and that will never
>> become suitable for a LP64 data model and will remain locked into ILP32
>> operating environments" are more likely to be fixed for y2038 later, than for
>> LP64 now?
>
> Such broken applications already have plenty of bogus architecture detection
> code so you need porting anyway...

I don't disagree here; just see below ...

>
>> We're already closer to the (future) y2038 than to the (past) introduction of
>> LP64...
>>
>> These unfixable legacy applications have been spreading through x32 to
>> the shiny new arm64 server architecture (does ppc64el also have an ILP32 
>> mode,
>> or is it planned)? Lots of resources are spent on maintaining the status quo,
>> instead of on fixing the real problems.
>
> As an x32 (userland) porter, I can tell you that time_t!=long _did_ cause
> non-trivial amounts of work.  But that work is already done (at least in
> Debian), so you might as well benefit from it.

There is actually private code out there which uses timespec and
timeval to pass time over the wire; yes I know bad coding style and
all but they did it that way.  This is code which was working for x86
and we are porting it to ARM64; a data center code by the way; not
some networking code even.  This means they have not ported the code
to fully 64bit yet and they might never.

Thanks,
Andrew

>
>
> --
> A tit a day keeps the vet away.


Re: [RFC6 PATCH v6 00/21] ILP32 for ARM64

2016-04-07 Thread Andrew Pinski
On Thu, Apr 7, 2016 at 5:18 AM, Adam Borowski  wrote:
> On Wed, 6 Apr 2016, Geert Uytterhoeven wrote:
>> On Wed, Apr 6, 2016 at 12:08 AM, Yury Norov  
>> wrote:
>>>  v6:
>>>  - time_t, __kenel_off_t and other types turned to be 32-bit
>>>for compatibility reasons (after v5 discussion);
>
> Introducing a new arch today with y2038 problems is not a good idea.
> Linus said so with appropriately pointy words in 2011.

This is the third time we had this discussion on time_t for ILP32.  I
had originally it as 32bit, then Catalin suggested I change it to
64bit and then Arnd (with his work for 2038 issue on 32bit arch) said
ILP32 should match all other 32bit targets and the other 64bit time_t
be fixed by the current work he was working on.  Now you are
suggesting we change it again.
Arnd can you please comment more on why we want 32bit time_t instead
of the 64bit one?  I Know there was some POSIX (or was it C90)
violation but I suspect there is an easy way to workaround this inside
the kernel but the discussion to move over to 32bit time_t was already
made by the time I started to look into that.

>
>> What makes you think these "applications that can’t readily be migrated to 
>> LP64
>> because they were written assuming an ILP32 data model, and that will never
>> become suitable for a LP64 data model and will remain locked into ILP32
>> operating environments" are more likely to be fixed for y2038 later, than for
>> LP64 now?
>
> Such broken applications already have plenty of bogus architecture detection
> code so you need porting anyway...

I don't disagree here; just see below ...

>
>> We're already closer to the (future) y2038 than to the (past) introduction of
>> LP64...
>>
>> These unfixable legacy applications have been spreading through x32 to
>> the shiny new arm64 server architecture (does ppc64el also have an ILP32 
>> mode,
>> or is it planned)? Lots of resources are spent on maintaining the status quo,
>> instead of on fixing the real problems.
>
> As an x32 (userland) porter, I can tell you that time_t!=long _did_ cause
> non-trivial amounts of work.  But that work is already done (at least in
> Debian), so you might as well benefit from it.

There is actually private code out there which uses timespec and
timeval to pass time over the wire; yes I know bad coding style and
all but they did it that way.  This is code which was working for x86
and we are porting it to ARM64; a data center code by the way; not
some networking code even.  This means they have not ported the code
to fully 64bit yet and they might never.

Thanks,
Andrew

>
>
> --
> A tit a day keeps the vet away.


Re: [RFC v1] mm: SLAB freelist randomization

2016-04-07 Thread Kees Cook
On Thu, Apr 7, 2016 at 2:14 PM, Jesper Dangaard Brouer
 wrote:
>
> On Wed, 6 Apr 2016 14:45:30 -0700 Kees Cook  wrote:
>
>> On Wed, Apr 6, 2016 at 12:35 PM, Thomas Garnier  wrote:
> [...]
>> > re-used on slab creation for performance.
>>
>> I'd like to see some benchmark results for this so the Kconfig can
>> include the performance characteristics. I recommend using hackbench
>> and kernel build times with a before/after comparison.
>>
>
> It looks like it only happens on init, right? (Thus must bench tools
> might not be the right choice).

Oh! Yes, you're right. I entirely missed that detail. :) 0-cost
randomization! Sounds good to me. :)

-Kees

>
> My slab tools for benchmarking the fastpath is here:
>  
> https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/mm/slab_bulk_test01.c
>
> And I also carry a version of Christoph's slab bench tool:
>  
> https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/mm/slab_test.c
>
> --
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   Author of http://www.iptv-analyzer.org
>   LinkedIn: http://www.linkedin.com/in/brouer



-- 
Kees Cook
Chrome OS & Brillo Security


Re: [RFC v1] mm: SLAB freelist randomization

2016-04-07 Thread Kees Cook
On Thu, Apr 7, 2016 at 2:14 PM, Jesper Dangaard Brouer
 wrote:
>
> On Wed, 6 Apr 2016 14:45:30 -0700 Kees Cook  wrote:
>
>> On Wed, Apr 6, 2016 at 12:35 PM, Thomas Garnier  wrote:
> [...]
>> > re-used on slab creation for performance.
>>
>> I'd like to see some benchmark results for this so the Kconfig can
>> include the performance characteristics. I recommend using hackbench
>> and kernel build times with a before/after comparison.
>>
>
> It looks like it only happens on init, right? (Thus must bench tools
> might not be the right choice).

Oh! Yes, you're right. I entirely missed that detail. :) 0-cost
randomization! Sounds good to me. :)

-Kees

>
> My slab tools for benchmarking the fastpath is here:
>  
> https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/mm/slab_bulk_test01.c
>
> And I also carry a version of Christoph's slab bench tool:
>  
> https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/mm/slab_test.c
>
> --
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   Author of http://www.iptv-analyzer.org
>   LinkedIn: http://www.linkedin.com/in/brouer



-- 
Kees Cook
Chrome OS & Brillo Security


[PATCH 1/2] kbuild: rename cmd_cc_i_c to cmd_cpp_i_c

2016-04-07 Thread Masahiro Yamada
This command just preprocesses .c files into .i files, so cmd_cpp_i_c
seems more suitable.

Signed-off-by: Masahiro Yamada 
---

 scripts/Makefile.build | 6 +++---
 tools/build/Makefile.build | 8 
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/scripts/Makefile.build b/scripts/Makefile.build
index e1bc190..7e4df13 100644
--- a/scripts/Makefile.build
+++ b/scripts/Makefile.build
@@ -152,11 +152,11 @@ cmd_cc_s_c   = $(CC) $(c_flags) $(DISABLE_LTO) 
-fverbose-asm -S -o $@ $<
 $(obj)/%.s: $(src)/%.c FORCE
$(call if_changed_dep,cc_s_c)
 
-quiet_cmd_cc_i_c = CPP $(quiet_modtag) $@
-cmd_cc_i_c   = $(CPP) $(c_flags)   -o $@ $<
+quiet_cmd_cpp_i_c = CPP $(quiet_modtag) $@
+cmd_cpp_i_c   = $(CPP) $(c_flags) -o $@ $<
 
 $(obj)/%.i: $(src)/%.c FORCE
-   $(call if_changed_dep,cc_i_c)
+   $(call if_changed_dep,cpp_i_c)
 
 cmd_gensymtypes =   \
 $(CPP) -D__GENKSYMS__ $(c_flags) $< |   \
diff --git a/tools/build/Makefile.build b/tools/build/Makefile.build
index ee566e8..27f3583 100644
--- a/tools/build/Makefile.build
+++ b/tools/build/Makefile.build
@@ -58,8 +58,8 @@ quiet_cmd_mkdir = MKDIR$(dir $@)
 quiet_cmd_cc_o_c = CC   $@
   cmd_cc_o_c = $(CC) $(c_flags) -c -o $@ $<
 
-quiet_cmd_cc_i_c = CPP  $@
-  cmd_cc_i_c = $(CC) $(c_flags) -E -o $@ $<
+quiet_cmd_cpp_i_c = CPP  $@
+  cmd_cpp_i_c = $(CC) $(c_flags) -E -o $@ $<
 
 quiet_cmd_cc_s_c = AS   $@
   cmd_cc_s_c = $(CC) $(c_flags) -S -o $@ $<
@@ -83,11 +83,11 @@ $(OUTPUT)%.o: %.S FORCE
 
 $(OUTPUT)%.i: %.c FORCE
$(call rule_mkdir)
-   $(call if_changed_dep,cc_i_c)
+   $(call if_changed_dep,cpp_i_c)
 
 $(OUTPUT)%.s: %.S FORCE
$(call rule_mkdir)
-   $(call if_changed_dep,cc_i_c)
+   $(call if_changed_dep,cpp_i_c)
 
 $(OUTPUT)%.s: %.c FORCE
$(call rule_mkdir)
-- 
1.9.1



[PATCH 1/2] kbuild: rename cmd_cc_i_c to cmd_cpp_i_c

2016-04-07 Thread Masahiro Yamada
This command just preprocesses .c files into .i files, so cmd_cpp_i_c
seems more suitable.

Signed-off-by: Masahiro Yamada 
---

 scripts/Makefile.build | 6 +++---
 tools/build/Makefile.build | 8 
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/scripts/Makefile.build b/scripts/Makefile.build
index e1bc190..7e4df13 100644
--- a/scripts/Makefile.build
+++ b/scripts/Makefile.build
@@ -152,11 +152,11 @@ cmd_cc_s_c   = $(CC) $(c_flags) $(DISABLE_LTO) 
-fverbose-asm -S -o $@ $<
 $(obj)/%.s: $(src)/%.c FORCE
$(call if_changed_dep,cc_s_c)
 
-quiet_cmd_cc_i_c = CPP $(quiet_modtag) $@
-cmd_cc_i_c   = $(CPP) $(c_flags)   -o $@ $<
+quiet_cmd_cpp_i_c = CPP $(quiet_modtag) $@
+cmd_cpp_i_c   = $(CPP) $(c_flags) -o $@ $<
 
 $(obj)/%.i: $(src)/%.c FORCE
-   $(call if_changed_dep,cc_i_c)
+   $(call if_changed_dep,cpp_i_c)
 
 cmd_gensymtypes =   \
 $(CPP) -D__GENKSYMS__ $(c_flags) $< |   \
diff --git a/tools/build/Makefile.build b/tools/build/Makefile.build
index ee566e8..27f3583 100644
--- a/tools/build/Makefile.build
+++ b/tools/build/Makefile.build
@@ -58,8 +58,8 @@ quiet_cmd_mkdir = MKDIR$(dir $@)
 quiet_cmd_cc_o_c = CC   $@
   cmd_cc_o_c = $(CC) $(c_flags) -c -o $@ $<
 
-quiet_cmd_cc_i_c = CPP  $@
-  cmd_cc_i_c = $(CC) $(c_flags) -E -o $@ $<
+quiet_cmd_cpp_i_c = CPP  $@
+  cmd_cpp_i_c = $(CC) $(c_flags) -E -o $@ $<
 
 quiet_cmd_cc_s_c = AS   $@
   cmd_cc_s_c = $(CC) $(c_flags) -S -o $@ $<
@@ -83,11 +83,11 @@ $(OUTPUT)%.o: %.S FORCE
 
 $(OUTPUT)%.i: %.c FORCE
$(call rule_mkdir)
-   $(call if_changed_dep,cc_i_c)
+   $(call if_changed_dep,cpp_i_c)
 
 $(OUTPUT)%.s: %.S FORCE
$(call rule_mkdir)
-   $(call if_changed_dep,cc_i_c)
+   $(call if_changed_dep,cpp_i_c)
 
 $(OUTPUT)%.s: %.c FORCE
$(call rule_mkdir)
-- 
1.9.1



[PATCH 2/2] kbuild: rename cmd_as_s_S to cmd_cpp_s_S

2016-04-07 Thread Masahiro Yamada
This command just preprocesses .S files into .s files, so cmd_cpp_s_S
seems more suitable.

Signed-off-by: Masahiro Yamada 
---

 scripts/Makefile.build | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/scripts/Makefile.build b/scripts/Makefile.build
index 7e4df13..e65a3e9 100644
--- a/scripts/Makefile.build
+++ b/scripts/Makefile.build
@@ -314,11 +314,11 @@ modkern_aflags := $(KBUILD_AFLAGS_KERNEL) $(AFLAGS_KERNEL)
 $(real-objs-m)  : modkern_aflags := $(KBUILD_AFLAGS_MODULE) 
$(AFLAGS_MODULE)
 $(real-objs-m:.o=.s): modkern_aflags := $(KBUILD_AFLAGS_MODULE) 
$(AFLAGS_MODULE)
 
-quiet_cmd_as_s_S = CPP $(quiet_modtag) $@
-cmd_as_s_S   = $(CPP) $(a_flags)   -o $@ $<
+quiet_cmd_cpp_s_S = CPP $(quiet_modtag) $@
+cmd_cpp_s_S   = $(CPP) $(a_flags) -o $@ $<
 
 $(obj)/%.s: $(src)/%.S FORCE
-   $(call if_changed_dep,as_s_S)
+   $(call if_changed_dep,cpp_s_S)
 
 quiet_cmd_as_o_S = AS $(quiet_modtag)  $@
 cmd_as_o_S   = $(CC) $(a_flags) -c -o $@ $<
-- 
1.9.1



[PATCH 2/2] kbuild: rename cmd_as_s_S to cmd_cpp_s_S

2016-04-07 Thread Masahiro Yamada
This command just preprocesses .S files into .s files, so cmd_cpp_s_S
seems more suitable.

Signed-off-by: Masahiro Yamada 
---

 scripts/Makefile.build | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/scripts/Makefile.build b/scripts/Makefile.build
index 7e4df13..e65a3e9 100644
--- a/scripts/Makefile.build
+++ b/scripts/Makefile.build
@@ -314,11 +314,11 @@ modkern_aflags := $(KBUILD_AFLAGS_KERNEL) $(AFLAGS_KERNEL)
 $(real-objs-m)  : modkern_aflags := $(KBUILD_AFLAGS_MODULE) 
$(AFLAGS_MODULE)
 $(real-objs-m:.o=.s): modkern_aflags := $(KBUILD_AFLAGS_MODULE) 
$(AFLAGS_MODULE)
 
-quiet_cmd_as_s_S = CPP $(quiet_modtag) $@
-cmd_as_s_S   = $(CPP) $(a_flags)   -o $@ $<
+quiet_cmd_cpp_s_S = CPP $(quiet_modtag) $@
+cmd_cpp_s_S   = $(CPP) $(a_flags) -o $@ $<
 
 $(obj)/%.s: $(src)/%.S FORCE
-   $(call if_changed_dep,as_s_S)
+   $(call if_changed_dep,cpp_s_S)
 
 quiet_cmd_as_o_S = AS $(quiet_modtag)  $@
 cmd_as_o_S   = $(CC) $(a_flags) -c -o $@ $<
-- 
1.9.1



Re: [PATCH] mtd: nand: s3c2410: fix bug in s3c2410_nand_correct_data()

2016-04-07 Thread Boris Brezillon
On Fri, 8 Apr 2016 09:51:04 +0800
Zeng Zhaoxiu  wrote:

> 
> 
> 在 2016年04月08日 08:18, Boris Brezillon 写道:
> > Hi Zeng,
> >
> > On Fri,  8 Apr 2016 00:48:17 +0800
> > zengzhao...@163.com wrote:
> >
> >> From: Zeng Zhaoxiu 
> >>
> >> If there is only one bit difference in the ECC, the function should return 
> >> 1.
> >> The result of "diff0 & ~(1< >> actually returns -1.
> >>
> >> Here, we can use the simple expression "(diff0 & (diff0 - 1)) == 0" to 
> >> determine
> >> whether the diff0 has only one 1-bit.
> > Missing Signed-off-by here.
> >
> >> ---
> >>   drivers/mtd/nand/s3c2410.c | 2 +-
> >>   1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/mtd/nand/s3c2410.c b/drivers/mtd/nand/s3c2410.c
> >> index 9c9397b..c9698cf 100644
> >> --- a/drivers/mtd/nand/s3c2410.c
> >> +++ b/drivers/mtd/nand/s3c2410.c
> >> @@ -542,7 +542,7 @@ static int s3c2410_nand_correct_data(struct mtd_info 
> >> *mtd, u_char *dat,
> >>diff0 |= (diff1 << 8);
> >>diff0 |= (diff2 << 16);
> >>   
> >> -  if ((diff0 & ~(1< >> +  if ((diff0 & (diff0 - 1)) == 0)
> > Or just
> >
> > if (hweight_long((unsigned long)diff0) == 1)
> >
> > which is doing exactly what the comment says.
> >
> > BTW, I don't understand why the current code is wrong? To me, it seems
> > it's correctly detecting the case where only a single bit is different.
> > What are you trying to fix exactly?
> >
> > Best Regards,
> >
> > Boris
> >
> 
> For example, assuming diff0 is 1, then fls(diff0) is equal to 1, then "~(1 << 
> fls(diff0))" is equal to 0xfffd,
> then the result of "(diff0 & ~(1 << fls(diff0)))" is 1 , not we expected 0.
> 
> __fls(diff0) and "(fls(diff0) - 1)" are all right, but fls(diff0) is wrong.
> 

Indeed, I forgot that fls() was returning (position + 1). Anyway, I
still think using hweight clarifies what you really want to test.

-- 
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com


Re: [PATCH] mtd: nand: s3c2410: fix bug in s3c2410_nand_correct_data()

2016-04-07 Thread Boris Brezillon
On Fri, 8 Apr 2016 09:51:04 +0800
Zeng Zhaoxiu  wrote:

> 
> 
> 在 2016年04月08日 08:18, Boris Brezillon 写道:
> > Hi Zeng,
> >
> > On Fri,  8 Apr 2016 00:48:17 +0800
> > zengzhao...@163.com wrote:
> >
> >> From: Zeng Zhaoxiu 
> >>
> >> If there is only one bit difference in the ECC, the function should return 
> >> 1.
> >> The result of "diff0 & ~(1< >> actually returns -1.
> >>
> >> Here, we can use the simple expression "(diff0 & (diff0 - 1)) == 0" to 
> >> determine
> >> whether the diff0 has only one 1-bit.
> > Missing Signed-off-by here.
> >
> >> ---
> >>   drivers/mtd/nand/s3c2410.c | 2 +-
> >>   1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/mtd/nand/s3c2410.c b/drivers/mtd/nand/s3c2410.c
> >> index 9c9397b..c9698cf 100644
> >> --- a/drivers/mtd/nand/s3c2410.c
> >> +++ b/drivers/mtd/nand/s3c2410.c
> >> @@ -542,7 +542,7 @@ static int s3c2410_nand_correct_data(struct mtd_info 
> >> *mtd, u_char *dat,
> >>diff0 |= (diff1 << 8);
> >>diff0 |= (diff2 << 16);
> >>   
> >> -  if ((diff0 & ~(1< >> +  if ((diff0 & (diff0 - 1)) == 0)
> > Or just
> >
> > if (hweight_long((unsigned long)diff0) == 1)
> >
> > which is doing exactly what the comment says.
> >
> > BTW, I don't understand why the current code is wrong? To me, it seems
> > it's correctly detecting the case where only a single bit is different.
> > What are you trying to fix exactly?
> >
> > Best Regards,
> >
> > Boris
> >
> 
> For example, assuming diff0 is 1, then fls(diff0) is equal to 1, then "~(1 << 
> fls(diff0))" is equal to 0xfffd,
> then the result of "(diff0 & ~(1 << fls(diff0)))" is 1 , not we expected 0.
> 
> __fls(diff0) and "(fls(diff0) - 1)" are all right, but fls(diff0) is wrong.
> 

Indeed, I forgot that fls() was returning (position + 1). Anyway, I
still think using hweight clarifies what you really want to test.

-- 
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com


[PATCH] kbuild: drop redundant "PHONY += FORCE"

2016-04-07 Thread Masahiro Yamada
"PHONY += FORCE" is already cared by scripts/Makefile.build,
which these files are included from.

Signed-off-by: Masahiro Yamada 
---

Please apply the following to avoid conflicts.
https://patchwork.kernel.org/patch/8572451/


 arch/arm/boot/bootp/Makefile | 2 +-
 arch/ia64/Makefile   | 2 +-
 arch/unicore32/boot/Makefile | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/arm/boot/bootp/Makefile b/arch/arm/boot/bootp/Makefile
index 52a543b..5e4acd2 100644
--- a/arch/arm/boot/bootp/Makefile
+++ b/arch/arm/boot/bootp/Makefile
@@ -25,4 +25,4 @@ $(obj)/kernel.o: arch/arm/boot/zImage FORCE
 
 $(obj)/initrd.o: $(INITRD) FORCE
 
-PHONY += $(INITRD) FORCE
+PHONY += $(INITRD)
diff --git a/arch/ia64/Makefile b/arch/ia64/Makefile
index 648f1ce..c100d78 100644
--- a/arch/ia64/Makefile
+++ b/arch/ia64/Makefile
@@ -96,7 +96,7 @@ define archhelp
 endef
 
 archprepare: make_nr_irqs_h
-PHONY += make_nr_irqs_h FORCE
+PHONY += make_nr_irqs_h
 
 make_nr_irqs_h:
$(Q)$(MAKE) $(build)=arch/ia64/kernel include/generated/nr-irqs.h
diff --git a/arch/unicore32/boot/Makefile b/arch/unicore32/boot/Makefile
index ec7fb70..8288550 100644
--- a/arch/unicore32/boot/Makefile
+++ b/arch/unicore32/boot/Makefile
@@ -31,7 +31,7 @@ $(obj)/uImage: $(obj)/zImage FORCE
$(call if_changed,uimage)
@echo '  Image $@ is ready'
 
-PHONY += initrd FORCE
+PHONY += initrd
 initrd:
@test "$(INITRD)" != "" || \
(echo You must specify INITRD; exit -1)
-- 
1.9.1



[PATCH] kbuild: drop redundant "PHONY += FORCE"

2016-04-07 Thread Masahiro Yamada
"PHONY += FORCE" is already cared by scripts/Makefile.build,
which these files are included from.

Signed-off-by: Masahiro Yamada 
---

Please apply the following to avoid conflicts.
https://patchwork.kernel.org/patch/8572451/


 arch/arm/boot/bootp/Makefile | 2 +-
 arch/ia64/Makefile   | 2 +-
 arch/unicore32/boot/Makefile | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/arm/boot/bootp/Makefile b/arch/arm/boot/bootp/Makefile
index 52a543b..5e4acd2 100644
--- a/arch/arm/boot/bootp/Makefile
+++ b/arch/arm/boot/bootp/Makefile
@@ -25,4 +25,4 @@ $(obj)/kernel.o: arch/arm/boot/zImage FORCE
 
 $(obj)/initrd.o: $(INITRD) FORCE
 
-PHONY += $(INITRD) FORCE
+PHONY += $(INITRD)
diff --git a/arch/ia64/Makefile b/arch/ia64/Makefile
index 648f1ce..c100d78 100644
--- a/arch/ia64/Makefile
+++ b/arch/ia64/Makefile
@@ -96,7 +96,7 @@ define archhelp
 endef
 
 archprepare: make_nr_irqs_h
-PHONY += make_nr_irqs_h FORCE
+PHONY += make_nr_irqs_h
 
 make_nr_irqs_h:
$(Q)$(MAKE) $(build)=arch/ia64/kernel include/generated/nr-irqs.h
diff --git a/arch/unicore32/boot/Makefile b/arch/unicore32/boot/Makefile
index ec7fb70..8288550 100644
--- a/arch/unicore32/boot/Makefile
+++ b/arch/unicore32/boot/Makefile
@@ -31,7 +31,7 @@ $(obj)/uImage: $(obj)/zImage FORCE
$(call if_changed,uimage)
@echo '  Image $@ is ready'
 
-PHONY += initrd FORCE
+PHONY += initrd
 initrd:
@test "$(INITRD)" != "" || \
(echo You must specify INITRD; exit -1)
-- 
1.9.1



RE: [PATCH v8 net-next 1/1] hv_sock: introduce Hyper-V Sockets

2016-04-07 Thread Dexuan Cui
> From: Joe Perches [mailto:j...@perches.com]
> Sent: Friday, April 8, 2016 9:15
> On Thu, 2016-04-07 at 18:36 -0700, Dexuan Cui wrote:
> > diff --git a/include/net/af_hvsock.h b/include/net/af_hvsock.h
> []
> > +#define VMBUS_RINGBUFFER_SIZE_HVSOCK_RECV (5 * PAGE_SIZE)
> > +#define VMBUS_RINGBUFFER_SIZE_HVSOCK_SEND (5 * PAGE_SIZE)
> > +
> > +#define HVSOCK_RCV_BUF_SZ
>   VMBUS_RINGBUFFER_SIZE_HVSOCK_RECV
> > +#define HVSOCK_SND_BUF_SZ  PAGE_SIZE
> []
> > +struct hvsock_sock {
> []
> > +   struct {
> > +   struct vmpipe_proto_header hdr;
> > +   char buf[HVSOCK_SND_BUF_SZ];
> > +   } __packed send;
> > +
> > +   struct {
> > +   struct vmpipe_proto_header hdr;
> > +   char buf[HVSOCK_RCV_BUF_SZ];
> > +   unsigned int data_len;
> > +   unsigned int data_offset;
> > +   } __packed recv;
> > +};
> 
> These bufs are not page aligned and so can span pages.
> 
> Is there any value in allocating these bufs separately
> as pages instead of as a kmalloc?

The bufs are not required to be page aligned.
Here the 'hdr' and the 'buf' must be consecutive, i.e., the 'buf' must be
an array rather than a pointer: please see hvsock_send_data().

It looks to me there is no big value to make sure the 'buf' is page
aligned: on x86_64, at least it should already be 8-byte aligned due to the
adjacent channel pointer, so memcpy_from_msg() should work
enough good and in hvsock_send_data() -> vmbus_sendpacket(),
we don't copy the 'buf'.

Thanks,
-- Dexuan


RE: [PATCH v8 net-next 1/1] hv_sock: introduce Hyper-V Sockets

2016-04-07 Thread Dexuan Cui
> From: Joe Perches [mailto:j...@perches.com]
> Sent: Friday, April 8, 2016 9:15
> On Thu, 2016-04-07 at 18:36 -0700, Dexuan Cui wrote:
> > diff --git a/include/net/af_hvsock.h b/include/net/af_hvsock.h
> []
> > +#define VMBUS_RINGBUFFER_SIZE_HVSOCK_RECV (5 * PAGE_SIZE)
> > +#define VMBUS_RINGBUFFER_SIZE_HVSOCK_SEND (5 * PAGE_SIZE)
> > +
> > +#define HVSOCK_RCV_BUF_SZ
>   VMBUS_RINGBUFFER_SIZE_HVSOCK_RECV
> > +#define HVSOCK_SND_BUF_SZ  PAGE_SIZE
> []
> > +struct hvsock_sock {
> []
> > +   struct {
> > +   struct vmpipe_proto_header hdr;
> > +   char buf[HVSOCK_SND_BUF_SZ];
> > +   } __packed send;
> > +
> > +   struct {
> > +   struct vmpipe_proto_header hdr;
> > +   char buf[HVSOCK_RCV_BUF_SZ];
> > +   unsigned int data_len;
> > +   unsigned int data_offset;
> > +   } __packed recv;
> > +};
> 
> These bufs are not page aligned and so can span pages.
> 
> Is there any value in allocating these bufs separately
> as pages instead of as a kmalloc?

The bufs are not required to be page aligned.
Here the 'hdr' and the 'buf' must be consecutive, i.e., the 'buf' must be
an array rather than a pointer: please see hvsock_send_data().

It looks to me there is no big value to make sure the 'buf' is page
aligned: on x86_64, at least it should already be 8-byte aligned due to the
adjacent channel pointer, so memcpy_from_msg() should work
enough good and in hvsock_send_data() -> vmbus_sendpacket(),
we don't copy the 'buf'.

Thanks,
-- Dexuan


Re: [RFC PATCH 0/3] restartable sequences v2: fast user-space percpu critical sections

2016-04-07 Thread Mathieu Desnoyers
- On Apr 7, 2016, at 9:21 PM, Andy Lutomirski l...@amacapital.net wrote:

> On Thu, Apr 7, 2016 at 6:11 PM, Mathieu Desnoyers
>  wrote:
>> - On Apr 7, 2016, at 6:05 PM, Andy Lutomirski l...@amacapital.net wrote:
>>
>>> On Thu, Apr 7, 2016 at 1:11 PM, Peter Zijlstra  wrote:
 On Thu, Apr 07, 2016 at 09:43:33AM -0700, Andy Lutomirski wrote:
>> [...]

> it's inherently debuggable,

 It is more debuggable, agreed.

> and it allows multiple independent
> rseq-protected things to coexist without forcing each other to abort.
>>
>> [...]
>>
>> My understanding is that the main goal of this rather more complex
>> proposal is to make interaction with debuggers more straightforward in
>> cases of single-stepping through the rseq critical section.
> 
> The things I like about my proposal are both that you can single-step
> through it just like any other code as long as you pin the thread to a
> CPU and that it doesn't make preemption magical.  (Of course, you can
> *force* it to do something on resume and/or preemption by sticking a
> bogus value in the expected event count field, but that's not the
> intended use.  Hmm, I guess it does need to hook preemption and/or
> resume for all processes that enable the thing so it can know to check
> for an enabled post_commit_rip, just like all the other proposals.)
> 
> Also, mine lets you have a fairly long-running critical section that
> doesn't get aborted under heavy load and can interleave with other
> critical sections that don't conflict.

Yes, those would be nice advantages. I'll have to do a few more
pseudo-code and execution scenarios to get a better understanding of
your idea.

> 
>>
>> I recently came up with a scheme that should allow us to handle such
>> situations in a fashion similar to debuggers handling ll/sc
>> restartable sequences of instructions on e.g. powerpc. The good news
>> is that my scheme does not require anything at the kernel level.
>>
>> The idea is simple: the userspace rseq critical sections now
>> become marked by 3 inline functions (rather than 2 in Paul's proposal):
>>
>> rseq_start(void *rseq_key)
>> rseq_finish(void *rseq_key)
>> rseq_abort(void *rseq_key)
> 
> How do you use this thing?  What are its semantics?

You define one rseq_key variable (dummy 1 byte variable, can be an
empty structure) for each rseq critical section you have in your
program.

A rseq critical section will typically have one entry point (rseq_start),
and one exit point (rseq_finish). I'm saying "typically" because there
may be more than one entry point, and more than one exit point per
critical section.

Entry and exit points mark the beginning and end of each rseq critical
section. rseq_start loads the sequence counter from the TLS and copies
it onto the stack. It then gets passed to rseq_finish() to be compared
with the final seqnum TLS value just before the commit. rseq_finish is
the one responsible for storing into the post_commit_instr field of the
TLS and populating rcx with the failure insn label address. rseq_finish()
does the commit.

And there is rseq_abort(), which would need to be called if we just want
to exit from a rseq critical section without doing the commit (no matching
call to rseq_finish after a rseq_start).

Each of rseq_start, finish, and abort would need to receive a pointer
to the rseq_key as parameter.

rseq_start would return the sequence number read from the TLS.

rseq_finish would also receive as parameter that sequence number that has
been returned by rseq_start.

Does it make sense ?

Thanks,

Mathieu


> 
> --Andy
> 
>>
>> We associate each critical section with a unique "key" (dummy
>> 1 byte object in the process address space), so we can group
>> them. The new "rseq_abort" would mark exit points that would
>> exit the critical section without executing the final commit
>> instruction.
>>
>> Within each of rseq_start, rseq_finish and rseq_abort,
>> we declare a non-loadable section that gets populated
>> with the following tuples:
>>
>> (RSEQ_TYPE, insn address, rseq_key)
>>
>> Where RSEQ_TYPE is either RSEQ_START, RSEQ_FINISH, or RSEQ_ABORT.
>>
>> That special section would be found in the executable by the
>> debugger, which can then skip over entire restartable critical
>> sections when it encounters them by placing breakpoints at
>> all exit points (finish and cancel) associated to the same
>> rseq_key as the entry point (start).
>>
>> This way we don't need to complexify the runtime code, neither
>> at kernel nor user-space level, and we get debuggability using
>> a trick similar to what ll/sc architectures already need to do.
>>
>> Of course, this requires extending gdb, which should not be
>> a show-stopper.
>>
>> Thoughts ?
>>
>> Thanks,
>>
>> Mathieu
>>
>> --
>> Mathieu Desnoyers
>> EfficiOS Inc.
>> http://www.efficios.com
> 
> 
> 
> --
> Andy Lutomirski
> AMA Capital Management, LLC

-- 
Mathieu Desnoyers
EfficiOS Inc.

Re: [RFC PATCH 0/3] restartable sequences v2: fast user-space percpu critical sections

2016-04-07 Thread Mathieu Desnoyers
- On Apr 7, 2016, at 9:21 PM, Andy Lutomirski l...@amacapital.net wrote:

> On Thu, Apr 7, 2016 at 6:11 PM, Mathieu Desnoyers
>  wrote:
>> - On Apr 7, 2016, at 6:05 PM, Andy Lutomirski l...@amacapital.net wrote:
>>
>>> On Thu, Apr 7, 2016 at 1:11 PM, Peter Zijlstra  wrote:
 On Thu, Apr 07, 2016 at 09:43:33AM -0700, Andy Lutomirski wrote:
>> [...]

> it's inherently debuggable,

 It is more debuggable, agreed.

> and it allows multiple independent
> rseq-protected things to coexist without forcing each other to abort.
>>
>> [...]
>>
>> My understanding is that the main goal of this rather more complex
>> proposal is to make interaction with debuggers more straightforward in
>> cases of single-stepping through the rseq critical section.
> 
> The things I like about my proposal are both that you can single-step
> through it just like any other code as long as you pin the thread to a
> CPU and that it doesn't make preemption magical.  (Of course, you can
> *force* it to do something on resume and/or preemption by sticking a
> bogus value in the expected event count field, but that's not the
> intended use.  Hmm, I guess it does need to hook preemption and/or
> resume for all processes that enable the thing so it can know to check
> for an enabled post_commit_rip, just like all the other proposals.)
> 
> Also, mine lets you have a fairly long-running critical section that
> doesn't get aborted under heavy load and can interleave with other
> critical sections that don't conflict.

Yes, those would be nice advantages. I'll have to do a few more
pseudo-code and execution scenarios to get a better understanding of
your idea.

> 
>>
>> I recently came up with a scheme that should allow us to handle such
>> situations in a fashion similar to debuggers handling ll/sc
>> restartable sequences of instructions on e.g. powerpc. The good news
>> is that my scheme does not require anything at the kernel level.
>>
>> The idea is simple: the userspace rseq critical sections now
>> become marked by 3 inline functions (rather than 2 in Paul's proposal):
>>
>> rseq_start(void *rseq_key)
>> rseq_finish(void *rseq_key)
>> rseq_abort(void *rseq_key)
> 
> How do you use this thing?  What are its semantics?

You define one rseq_key variable (dummy 1 byte variable, can be an
empty structure) for each rseq critical section you have in your
program.

A rseq critical section will typically have one entry point (rseq_start),
and one exit point (rseq_finish). I'm saying "typically" because there
may be more than one entry point, and more than one exit point per
critical section.

Entry and exit points mark the beginning and end of each rseq critical
section. rseq_start loads the sequence counter from the TLS and copies
it onto the stack. It then gets passed to rseq_finish() to be compared
with the final seqnum TLS value just before the commit. rseq_finish is
the one responsible for storing into the post_commit_instr field of the
TLS and populating rcx with the failure insn label address. rseq_finish()
does the commit.

And there is rseq_abort(), which would need to be called if we just want
to exit from a rseq critical section without doing the commit (no matching
call to rseq_finish after a rseq_start).

Each of rseq_start, finish, and abort would need to receive a pointer
to the rseq_key as parameter.

rseq_start would return the sequence number read from the TLS.

rseq_finish would also receive as parameter that sequence number that has
been returned by rseq_start.

Does it make sense ?

Thanks,

Mathieu


> 
> --Andy
> 
>>
>> We associate each critical section with a unique "key" (dummy
>> 1 byte object in the process address space), so we can group
>> them. The new "rseq_abort" would mark exit points that would
>> exit the critical section without executing the final commit
>> instruction.
>>
>> Within each of rseq_start, rseq_finish and rseq_abort,
>> we declare a non-loadable section that gets populated
>> with the following tuples:
>>
>> (RSEQ_TYPE, insn address, rseq_key)
>>
>> Where RSEQ_TYPE is either RSEQ_START, RSEQ_FINISH, or RSEQ_ABORT.
>>
>> That special section would be found in the executable by the
>> debugger, which can then skip over entire restartable critical
>> sections when it encounters them by placing breakpoints at
>> all exit points (finish and cancel) associated to the same
>> rseq_key as the entry point (start).
>>
>> This way we don't need to complexify the runtime code, neither
>> at kernel nor user-space level, and we get debuggability using
>> a trick similar to what ll/sc architectures already need to do.
>>
>> Of course, this requires extending gdb, which should not be
>> a show-stopper.
>>
>> Thoughts ?
>>
>> Thanks,
>>
>> Mathieu
>>
>> --
>> Mathieu Desnoyers
>> EfficiOS Inc.
>> http://www.efficios.com
> 
> 
> 
> --
> Andy Lutomirski
> AMA Capital Management, LLC

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [PATCH v2] clk: let clk_disable() return immediately if clk is NULL or error

2016-04-07 Thread Masahiro Yamada
Hi Stephen,


2016-04-08 9:33 GMT+09:00 Stephen Boyd :
> On 04/05, Masahiro Yamada wrote:
>> The clk_disable() in the common clock framework (drivers/clk/clk.c)
>> returns immediately if a given clk is NULL or an error pointer.  It
>> allows clock consumers to call clk_disable() without IS_ERR_OR_NULL
>> checking if drivers are only used with the common clock framework.
>>
>> Unfortunately, NULL/error checking is missing from some of non-common
>> clk_disable() implementations.  This prevents us from completely
>> dropping NULL/error checking from callers.  Let's make it tree-wide
>> consistent by adding IS_ERR_OR_NULL(clk) to all callees.
>>
>> Signed-off-by: Masahiro Yamada 
>> Acked-by: Greg Ungerer 
>> Acked-by: Wan Zongshun 
>> ---
>>
>> Stephen,
>>
>> This patch has been unapplied for a long time.
>>
>> Please let me know if there is something wrong with this patch.
>>
>
> I'm mostly confused why we wouldn't want to encourage people to
> call clk_disable or unprepare on a clk that's an error pointer.
> Typically an error pointer should be dealt with, instead of
> silently ignored, so why wasn't it dealt with by passing it up
> the probe() path?
>


This makes our driver programming life easier.


For example, let's see drivers/tty/serial/8250/8250_of.c


The "clock-frequency" DT property takes precedence over "clocks" property.
So, it is valid to probe the driver with a NULL pointer for info->clk.


if (of_property_read_u32(np, "clock-frequency", )) {

/* Get clk rate through clk driver if present */
info->clk = devm_clk_get(>dev, NULL);
if (IS_ERR(info->clk)) {
dev_warn(>dev,
"clk or clock-frequency not defined\n");
return PTR_ERR(info->clk);
}

ret = clk_prepare_enable(info->clk);
if (ret < 0)
return ret;

clk = clk_get_rate(info->clk);
}


As a result, we need to make sure the clk pointer is valid
before calling clk_disable_unprepare().


If we could support pointer checking in callees, we would be able to
clean-up lots of clock consumers.



-- 
Best Regards
Masahiro Yamada


Re: [PATCH v2] clk: let clk_disable() return immediately if clk is NULL or error

2016-04-07 Thread Masahiro Yamada
Hi Stephen,


2016-04-08 9:33 GMT+09:00 Stephen Boyd :
> On 04/05, Masahiro Yamada wrote:
>> The clk_disable() in the common clock framework (drivers/clk/clk.c)
>> returns immediately if a given clk is NULL or an error pointer.  It
>> allows clock consumers to call clk_disable() without IS_ERR_OR_NULL
>> checking if drivers are only used with the common clock framework.
>>
>> Unfortunately, NULL/error checking is missing from some of non-common
>> clk_disable() implementations.  This prevents us from completely
>> dropping NULL/error checking from callers.  Let's make it tree-wide
>> consistent by adding IS_ERR_OR_NULL(clk) to all callees.
>>
>> Signed-off-by: Masahiro Yamada 
>> Acked-by: Greg Ungerer 
>> Acked-by: Wan Zongshun 
>> ---
>>
>> Stephen,
>>
>> This patch has been unapplied for a long time.
>>
>> Please let me know if there is something wrong with this patch.
>>
>
> I'm mostly confused why we wouldn't want to encourage people to
> call clk_disable or unprepare on a clk that's an error pointer.
> Typically an error pointer should be dealt with, instead of
> silently ignored, so why wasn't it dealt with by passing it up
> the probe() path?
>


This makes our driver programming life easier.


For example, let's see drivers/tty/serial/8250/8250_of.c


The "clock-frequency" DT property takes precedence over "clocks" property.
So, it is valid to probe the driver with a NULL pointer for info->clk.


if (of_property_read_u32(np, "clock-frequency", )) {

/* Get clk rate through clk driver if present */
info->clk = devm_clk_get(>dev, NULL);
if (IS_ERR(info->clk)) {
dev_warn(>dev,
"clk or clock-frequency not defined\n");
return PTR_ERR(info->clk);
}

ret = clk_prepare_enable(info->clk);
if (ret < 0)
return ret;

clk = clk_get_rate(info->clk);
}


As a result, we need to make sure the clk pointer is valid
before calling clk_disable_unprepare().


If we could support pointer checking in callees, we would be able to
clean-up lots of clock consumers.



-- 
Best Regards
Masahiro Yamada


Re: [PATCH] mtd: nand: s3c2410: fix bug in s3c2410_nand_correct_data()

2016-04-07 Thread Zeng Zhaoxiu



在 2016年04月08日 08:18, Boris Brezillon 写道:

Hi Zeng,

On Fri,  8 Apr 2016 00:48:17 +0800
zengzhao...@163.com wrote:


From: Zeng Zhaoxiu 

If there is only one bit difference in the ECC, the function should return 1.
The result of "diff0 & ~(1<

Re: [PATCH] mtd: nand: s3c2410: fix bug in s3c2410_nand_correct_data()

2016-04-07 Thread Zeng Zhaoxiu



在 2016年04月08日 08:18, Boris Brezillon 写道:

Hi Zeng,

On Fri,  8 Apr 2016 00:48:17 +0800
zengzhao...@163.com wrote:


From: Zeng Zhaoxiu 

If there is only one bit difference in the ECC, the function should return 1.
The result of "diff0 & ~(1<
Missing Signed-off-by here.


---
  drivers/mtd/nand/s3c2410.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/mtd/nand/s3c2410.c b/drivers/mtd/nand/s3c2410.c
index 9c9397b..c9698cf 100644
--- a/drivers/mtd/nand/s3c2410.c
+++ b/drivers/mtd/nand/s3c2410.c
@@ -542,7 +542,7 @@ static int s3c2410_nand_correct_data(struct mtd_info *mtd, 
u_char *dat,
diff0 |= (diff1 << 8);
diff0 |= (diff2 << 16);
  
-	if ((diff0 & ~(1<
+   if ((diff0 & (diff0 - 1)) == 0)

Or just

if (hweight_long((unsigned long)diff0) == 1)

which is doing exactly what the comment says.

BTW, I don't understand why the current code is wrong? To me, it seems
it's correctly detecting the case where only a single bit is different.
What are you trying to fix exactly?

Best Regards,

Boris



For example, assuming diff0 is 1, then fls(diff0) is equal to 1, then "~(1 << 
fls(diff0))" is equal to 0xfffd,
then the result of "(diff0 & ~(1 << fls(diff0)))" is 1 , not we expected 0.

__fls(diff0) and "(fls(diff0) - 1)" are all right, but fls(diff0) is wrong.



Re: [PATCH v1 09/12] serial: 8250_lpss: split LPSS driver to separate module

2016-04-07 Thread Peter Hurley
On 04/07/2016 01:37 PM, Andy Shevchenko wrote:
> Intes SoCs, such as Braswell, have DesignWare UART. Split out to separate
> module which also will be used for Intel Quark later.

What's the rationale?

And this really isn't a split; this patch introduces a number of significant
changes from the pci version.


> Signed-off-by: Andy Shevchenko 
> ---
>  drivers/tty/serial/8250/8250_lpss.c | 279 
> 
>  drivers/tty/serial/8250/8250_pci.c  | 227 ++---
>  drivers/tty/serial/8250/Kconfig |  14 +-
>  drivers/tty/serial/8250/Makefile|   1 +
>  4 files changed, 301 insertions(+), 220 deletions(-)
>  create mode 100644 drivers/tty/serial/8250/8250_lpss.c
> 
> diff --git a/drivers/tty/serial/8250/8250_lpss.c 
> b/drivers/tty/serial/8250/8250_lpss.c
> new file mode 100644
> index 000..bca4adb
> --- /dev/null
> +++ b/drivers/tty/serial/8250/8250_lpss.c
> @@ -0,0 +1,279 @@
> +/*
> + * 8250_lpss.c - Driver for UART on Intel Braswell and various other Intel 
> SoCs
> + *
> + * Copyright (C) 2016 Intel Corporation
> + * Author: Andy Shevchenko 
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +#include 
> +
> +#include "8250.h"
> +
> +#define PCI_DEVICE_ID_INTEL_BYT_UART10x0f0a
> +#define PCI_DEVICE_ID_INTEL_BYT_UART20x0f0c
> +
> +#define PCI_DEVICE_ID_INTEL_BSW_UART10x228a
> +#define PCI_DEVICE_ID_INTEL_BSW_UART20x228c
> +
> +#define PCI_DEVICE_ID_INTEL_BDW_UART10x9ce3
> +#define PCI_DEVICE_ID_INTEL_BDW_UART20x9ce4
> +
> +/* Intel LPSS specific registers */
> +
> +#define BYT_PRV_CLK  0x800
> +#define BYT_PRV_CLK_EN   BIT(0)
> +#define BYT_PRV_CLK_M_VAL_SHIFT  1
> +#define BYT_PRV_CLK_N_VAL_SHIFT  16
> +#define BYT_PRV_CLK_UPDATE   BIT(31)
> +
> +#define BYT_TX_OVF_INT   0x820
> +#define BYT_TX_OVF_INT_MASK  BIT(1)
> +
> +struct lpss8250;
> +
> +struct lpss8250_board {
> + unsigned long freq;
> + unsigned int base_baud;
> + int (*setup)(struct lpss8250 *, struct uart_port *p);
> +};

New concept.

> +
> +struct lpss8250 {
> + int line;
> + struct lpss8250_board *board;
> +
> + /* DMA parameters */
> + struct uart_8250_dma dma;
> + struct dw_dma_slave dma_param;
> + u8 dma_maxburst;
> +};
> +
> +/*/

Please remove.

> +
> +static void lpss8250_set_termios(struct uart_port *p,
> +  struct ktermios *termios,
> +  struct ktermios *old)
> +{
> + unsigned int baud = tty_termios_baud_rate(termios);
> + struct lpss8250 *lpss = p->private_data;
> + unsigned long fref = lpss->board->freq, fuart = baud * 16;
> + unsigned long w = BIT(15) - 1;
> + unsigned long m, n;
> + u32 reg;
> +
> + /* Get Fuart closer to Fref */
> + fuart *= rounddown_pow_of_two(fref / fuart);
> +
> + /*
> +  * For baud rates 0.5M, 1M, 1.5M, 2M, 2.5M, 3M, 3.5M and 4M the
> +  * dividers must be adjusted.
> +  *
> +  * uartclk = (m / n) * 100 MHz, where m <= n
> +  */
> + rational_best_approximation(fuart, fref, w, w, , );
> + p->uartclk = fuart;
> +
> + /* Reset the clock */
> + reg = (m << BYT_PRV_CLK_M_VAL_SHIFT) | (n << BYT_PRV_CLK_N_VAL_SHIFT);
> + writel(reg, p->membase + BYT_PRV_CLK);
> + reg |= BYT_PRV_CLK_EN | BYT_PRV_CLK_UPDATE;
> + writel(reg, p->membase + BYT_PRV_CLK);
> +
> + p->status &= ~UPSTAT_AUTOCTS;
> + if (termios->c_cflag & CRTSCTS)
> + p->status |= UPSTAT_AUTOCTS;
> +
> + serial8250_do_set_termios(p, termios, old);
> +}
> +
> +/*/

Please remove.

> +
> +static int byt_serial_setup(struct lpss8250 *lpss, struct uart_port *port)

This would have been much easier to review if you had simply moved it here
and done the rework in a follow-on patch.


> +{
> + struct dw_dma_slave *param = >dma_param;
> + struct uart_8250_port *up = up_to_u8250p(port);
> + struct pci_dev *pdev = to_pci_dev(port->dev);
> + struct pci_dev *dma_dev = pci_get_slot(pdev->bus, 
> PCI_DEVFN(PCI_SLOT(pdev->devfn), 0));
> +
> + switch (pdev->device) {
> + case PCI_DEVICE_ID_INTEL_BYT_UART1:
> + case PCI_DEVICE_ID_INTEL_BSW_UART1:
> + case PCI_DEVICE_ID_INTEL_BDW_UART1:
> + param->src_id = 3;
> + param->dst_id = 2;
> + break;
> + case PCI_DEVICE_ID_INTEL_BYT_UART2:
> + case PCI_DEVICE_ID_INTEL_BSW_UART2:
> + case 

Re: [PATCH v1 09/12] serial: 8250_lpss: split LPSS driver to separate module

2016-04-07 Thread Peter Hurley
On 04/07/2016 01:37 PM, Andy Shevchenko wrote:
> Intes SoCs, such as Braswell, have DesignWare UART. Split out to separate
> module which also will be used for Intel Quark later.

What's the rationale?

And this really isn't a split; this patch introduces a number of significant
changes from the pci version.


> Signed-off-by: Andy Shevchenko 
> ---
>  drivers/tty/serial/8250/8250_lpss.c | 279 
> 
>  drivers/tty/serial/8250/8250_pci.c  | 227 ++---
>  drivers/tty/serial/8250/Kconfig |  14 +-
>  drivers/tty/serial/8250/Makefile|   1 +
>  4 files changed, 301 insertions(+), 220 deletions(-)
>  create mode 100644 drivers/tty/serial/8250/8250_lpss.c
> 
> diff --git a/drivers/tty/serial/8250/8250_lpss.c 
> b/drivers/tty/serial/8250/8250_lpss.c
> new file mode 100644
> index 000..bca4adb
> --- /dev/null
> +++ b/drivers/tty/serial/8250/8250_lpss.c
> @@ -0,0 +1,279 @@
> +/*
> + * 8250_lpss.c - Driver for UART on Intel Braswell and various other Intel 
> SoCs
> + *
> + * Copyright (C) 2016 Intel Corporation
> + * Author: Andy Shevchenko 
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +#include 
> +
> +#include "8250.h"
> +
> +#define PCI_DEVICE_ID_INTEL_BYT_UART10x0f0a
> +#define PCI_DEVICE_ID_INTEL_BYT_UART20x0f0c
> +
> +#define PCI_DEVICE_ID_INTEL_BSW_UART10x228a
> +#define PCI_DEVICE_ID_INTEL_BSW_UART20x228c
> +
> +#define PCI_DEVICE_ID_INTEL_BDW_UART10x9ce3
> +#define PCI_DEVICE_ID_INTEL_BDW_UART20x9ce4
> +
> +/* Intel LPSS specific registers */
> +
> +#define BYT_PRV_CLK  0x800
> +#define BYT_PRV_CLK_EN   BIT(0)
> +#define BYT_PRV_CLK_M_VAL_SHIFT  1
> +#define BYT_PRV_CLK_N_VAL_SHIFT  16
> +#define BYT_PRV_CLK_UPDATE   BIT(31)
> +
> +#define BYT_TX_OVF_INT   0x820
> +#define BYT_TX_OVF_INT_MASK  BIT(1)
> +
> +struct lpss8250;
> +
> +struct lpss8250_board {
> + unsigned long freq;
> + unsigned int base_baud;
> + int (*setup)(struct lpss8250 *, struct uart_port *p);
> +};

New concept.

> +
> +struct lpss8250 {
> + int line;
> + struct lpss8250_board *board;
> +
> + /* DMA parameters */
> + struct uart_8250_dma dma;
> + struct dw_dma_slave dma_param;
> + u8 dma_maxburst;
> +};
> +
> +/*/

Please remove.

> +
> +static void lpss8250_set_termios(struct uart_port *p,
> +  struct ktermios *termios,
> +  struct ktermios *old)
> +{
> + unsigned int baud = tty_termios_baud_rate(termios);
> + struct lpss8250 *lpss = p->private_data;
> + unsigned long fref = lpss->board->freq, fuart = baud * 16;
> + unsigned long w = BIT(15) - 1;
> + unsigned long m, n;
> + u32 reg;
> +
> + /* Get Fuart closer to Fref */
> + fuart *= rounddown_pow_of_two(fref / fuart);
> +
> + /*
> +  * For baud rates 0.5M, 1M, 1.5M, 2M, 2.5M, 3M, 3.5M and 4M the
> +  * dividers must be adjusted.
> +  *
> +  * uartclk = (m / n) * 100 MHz, where m <= n
> +  */
> + rational_best_approximation(fuart, fref, w, w, , );
> + p->uartclk = fuart;
> +
> + /* Reset the clock */
> + reg = (m << BYT_PRV_CLK_M_VAL_SHIFT) | (n << BYT_PRV_CLK_N_VAL_SHIFT);
> + writel(reg, p->membase + BYT_PRV_CLK);
> + reg |= BYT_PRV_CLK_EN | BYT_PRV_CLK_UPDATE;
> + writel(reg, p->membase + BYT_PRV_CLK);
> +
> + p->status &= ~UPSTAT_AUTOCTS;
> + if (termios->c_cflag & CRTSCTS)
> + p->status |= UPSTAT_AUTOCTS;
> +
> + serial8250_do_set_termios(p, termios, old);
> +}
> +
> +/*/

Please remove.

> +
> +static int byt_serial_setup(struct lpss8250 *lpss, struct uart_port *port)

This would have been much easier to review if you had simply moved it here
and done the rework in a follow-on patch.


> +{
> + struct dw_dma_slave *param = >dma_param;
> + struct uart_8250_port *up = up_to_u8250p(port);
> + struct pci_dev *pdev = to_pci_dev(port->dev);
> + struct pci_dev *dma_dev = pci_get_slot(pdev->bus, 
> PCI_DEVFN(PCI_SLOT(pdev->devfn), 0));
> +
> + switch (pdev->device) {
> + case PCI_DEVICE_ID_INTEL_BYT_UART1:
> + case PCI_DEVICE_ID_INTEL_BSW_UART1:
> + case PCI_DEVICE_ID_INTEL_BDW_UART1:
> + param->src_id = 3;
> + param->dst_id = 2;
> + break;
> + case PCI_DEVICE_ID_INTEL_BYT_UART2:
> + case PCI_DEVICE_ID_INTEL_BSW_UART2:
> + case PCI_DEVICE_ID_INTEL_BDW_UART2:
> + param->src_id = 5;
> + param->dst_id 

Re: [PATCH 2/2] net-ath9k_htc: Replace a variable initialisation by an assignment in ath9k_htc_set_channel()

2016-04-07 Thread Julian Calaby
Hi Kalle,

On Sat, Jan 2, 2016 at 5:25 AM, SF Markus Elfring
 wrote:
> From: Markus Elfring 
> Date: Fri, 1 Jan 2016 19:09:32 +0100
>
> Replace an explicit initialisation for one local variable at the beginning
> by a conditional assignment.
>
> Signed-off-by: Markus Elfring 

This looks sane to me.

Reviewed-by: Julian Calaby 

Thanks,

Julian Calaby

> ---
>  drivers/net/wireless/ath/ath9k/htc_drv_main.c | 7 ++-
>  1 file changed, 2 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/net/wireless/ath/ath9k/htc_drv_main.c 
> b/drivers/net/wireless/ath/ath9k/htc_drv_main.c
> index a680a97..30bd59e 100644
> --- a/drivers/net/wireless/ath/ath9k/htc_drv_main.c
> +++ b/drivers/net/wireless/ath/ath9k/htc_drv_main.c
> @@ -246,7 +246,7 @@ static int ath9k_htc_set_channel(struct ath9k_htc_priv 
> *priv,
> struct ieee80211_conf *conf = >hw->conf;
> bool fastcc;
> struct ieee80211_channel *channel = hw->conf.chandef.chan;
> -   struct ath9k_hw_cal_data *caldata = NULL;
> +   struct ath9k_hw_cal_data *caldata;
> enum htc_phymode mode;
> __be16 htc_mode;
> u8 cmd_rsp;
> @@ -274,10 +274,7 @@ static int ath9k_htc_set_channel(struct ath9k_htc_priv 
> *priv,
> priv->ah->curchan->channel,
> channel->center_freq, conf_is_ht(conf), conf_is_ht40(conf),
> fastcc);
> -
> -   if (!fastcc)
> -   caldata = >caldata;
> -
> +   caldata = fastcc ? NULL : >caldata;
> ret = ath9k_hw_reset(ah, hchan, caldata, fastcc);
> if (ret) {
> ath_err(common,
> --
> 2.6.3
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Julian Calaby

Email: julian.cal...@gmail.com
Profile: http://www.google.com/profiles/julian.calaby/


Re: [PATCH 2/2] net-ath9k_htc: Replace a variable initialisation by an assignment in ath9k_htc_set_channel()

2016-04-07 Thread Julian Calaby
Hi Kalle,

On Sat, Jan 2, 2016 at 5:25 AM, SF Markus Elfring
 wrote:
> From: Markus Elfring 
> Date: Fri, 1 Jan 2016 19:09:32 +0100
>
> Replace an explicit initialisation for one local variable at the beginning
> by a conditional assignment.
>
> Signed-off-by: Markus Elfring 

This looks sane to me.

Reviewed-by: Julian Calaby 

Thanks,

Julian Calaby

> ---
>  drivers/net/wireless/ath/ath9k/htc_drv_main.c | 7 ++-
>  1 file changed, 2 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/net/wireless/ath/ath9k/htc_drv_main.c 
> b/drivers/net/wireless/ath/ath9k/htc_drv_main.c
> index a680a97..30bd59e 100644
> --- a/drivers/net/wireless/ath/ath9k/htc_drv_main.c
> +++ b/drivers/net/wireless/ath/ath9k/htc_drv_main.c
> @@ -246,7 +246,7 @@ static int ath9k_htc_set_channel(struct ath9k_htc_priv 
> *priv,
> struct ieee80211_conf *conf = >hw->conf;
> bool fastcc;
> struct ieee80211_channel *channel = hw->conf.chandef.chan;
> -   struct ath9k_hw_cal_data *caldata = NULL;
> +   struct ath9k_hw_cal_data *caldata;
> enum htc_phymode mode;
> __be16 htc_mode;
> u8 cmd_rsp;
> @@ -274,10 +274,7 @@ static int ath9k_htc_set_channel(struct ath9k_htc_priv 
> *priv,
> priv->ah->curchan->channel,
> channel->center_freq, conf_is_ht(conf), conf_is_ht40(conf),
> fastcc);
> -
> -   if (!fastcc)
> -   caldata = >caldata;
> -
> +   caldata = fastcc ? NULL : >caldata;
> ret = ath9k_hw_reset(ah, hchan, caldata, fastcc);
> if (ret) {
> ath_err(common,
> --
> 2.6.3
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Julian Calaby

Email: julian.cal...@gmail.com
Profile: http://www.google.com/profiles/julian.calaby/


Re: [PATCH v3 6/7] x86/cpu: Add Erratum 88 detection on AMD

2016-04-07 Thread Andy Lutomirski
On Thu, Apr 7, 2016 at 5:31 PM, Andy Lutomirski  wrote:
> From: Borislav Petkov 
>
> Erratum 88 affects old AMD K8s, where a SWAPGS fails to cause an input
> dependency on GS. Therefore, we need to MFENCE before it.
>
> But that MFENCE is expensive and unnecessary on the remaining x86 CPUs
> out there so patch it out on the CPUs which don't require it.

This is basically identical to:

https://lkml.kernel.org/g/1458576969-13309-4-git-send-email-a...@firstfloor.org

Whoops!  I thought I'd seen that somewhere but I couldn't spot it.

Ingo, etc: we should probably apply one of those patches with a
-stable tag (to mitigate the otherwise potentially unpleasant
performance regression in here), but I don't really care which one.
Andi's has a name for the bug that seems nicer by one character to me,
but it would have to be (trivally) rebased.

--Andy

>
> Signed-off-by: Borislav Petkov 
> Cc: 
> Signed-off-by: Andy Lutomirski  ---
>  arch/x86/entry/entry_64.S  | 2 +-
>  arch/x86/include/asm/cpufeatures.h | 2 ++
>  arch/x86/kernel/cpu/amd.c  | 1 +
>  3 files changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index 858b555e274b..64d2033d1e49 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -783,7 +783,7 @@ ENTRY(native_load_gs_index)
> SWAPGS
>  gs_change:
> movl%edi, %gs
> -2: mfence  /* workaround */
> +2: ALTERNATIVE "", "mfence", X86_BUG_SWAPGS_FENCE
> SWAPGS
> popfq
> ret
> diff --git a/arch/x86/include/asm/cpufeatures.h 
> b/arch/x86/include/asm/cpufeatures.h
> index 2a052302bc43..7bfb6b70c745 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -295,6 +295,8 @@
>  #define X86_BUG_CLFLUSH_MONITORX86_BUG(7) /* AAI65, CLFLUSH required 
> before MONITOR */
>  #define X86_BUG_SYSRET_SS_ATTRSX86_BUG(8) /* SYSRET doesn't fix up 
> SS attrs */
>  #define X86_BUG_NULL_SEG   X86_BUG(9) /* Nulling a selector preserves 
> the base */
> +#define X86_BUG_SWAPGS_FENCE   X86_BUG(10) /* SWAPGS without input dep on GS 
> */
> +
>
>  #ifdef CONFIG_X86_32
>  /*
> diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
> index 6e47e3a916f1..b7cc9efe08b5 100644
> --- a/arch/x86/kernel/cpu/amd.c
> +++ b/arch/x86/kernel/cpu/amd.c
> @@ -632,6 +632,7 @@ static void init_amd_k8(struct cpuinfo_x86 *c)
>  */
> msr_set_bit(MSR_K7_HWCR, 6);
>  #endif
> +   set_cpu_bug(c, X86_BUG_SWAPGS_FENCE);
>  }
>
>  static void init_amd_gh(struct cpuinfo_x86 *c)
> --
> 2.5.5
>



-- 
Andy Lutomirski
AMA Capital Management, LLC


Re: [PATCH v3 6/7] x86/cpu: Add Erratum 88 detection on AMD

2016-04-07 Thread Andy Lutomirski
On Thu, Apr 7, 2016 at 5:31 PM, Andy Lutomirski  wrote:
> From: Borislav Petkov 
>
> Erratum 88 affects old AMD K8s, where a SWAPGS fails to cause an input
> dependency on GS. Therefore, we need to MFENCE before it.
>
> But that MFENCE is expensive and unnecessary on the remaining x86 CPUs
> out there so patch it out on the CPUs which don't require it.

This is basically identical to:

https://lkml.kernel.org/g/1458576969-13309-4-git-send-email-a...@firstfloor.org

Whoops!  I thought I'd seen that somewhere but I couldn't spot it.

Ingo, etc: we should probably apply one of those patches with a
-stable tag (to mitigate the otherwise potentially unpleasant
performance regression in here), but I don't really care which one.
Andi's has a name for the bug that seems nicer by one character to me,
but it would have to be (trivally) rebased.

--Andy

>
> Signed-off-by: Borislav Petkov 
> Cc: 
> Signed-off-by: Andy Lutomirski  ---
>  arch/x86/entry/entry_64.S  | 2 +-
>  arch/x86/include/asm/cpufeatures.h | 2 ++
>  arch/x86/kernel/cpu/amd.c  | 1 +
>  3 files changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index 858b555e274b..64d2033d1e49 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -783,7 +783,7 @@ ENTRY(native_load_gs_index)
> SWAPGS
>  gs_change:
> movl%edi, %gs
> -2: mfence  /* workaround */
> +2: ALTERNATIVE "", "mfence", X86_BUG_SWAPGS_FENCE
> SWAPGS
> popfq
> ret
> diff --git a/arch/x86/include/asm/cpufeatures.h 
> b/arch/x86/include/asm/cpufeatures.h
> index 2a052302bc43..7bfb6b70c745 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -295,6 +295,8 @@
>  #define X86_BUG_CLFLUSH_MONITORX86_BUG(7) /* AAI65, CLFLUSH required 
> before MONITOR */
>  #define X86_BUG_SYSRET_SS_ATTRSX86_BUG(8) /* SYSRET doesn't fix up 
> SS attrs */
>  #define X86_BUG_NULL_SEG   X86_BUG(9) /* Nulling a selector preserves 
> the base */
> +#define X86_BUG_SWAPGS_FENCE   X86_BUG(10) /* SWAPGS without input dep on GS 
> */
> +
>
>  #ifdef CONFIG_X86_32
>  /*
> diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
> index 6e47e3a916f1..b7cc9efe08b5 100644
> --- a/arch/x86/kernel/cpu/amd.c
> +++ b/arch/x86/kernel/cpu/amd.c
> @@ -632,6 +632,7 @@ static void init_amd_k8(struct cpuinfo_x86 *c)
>  */
> msr_set_bit(MSR_K7_HWCR, 6);
>  #endif
> +   set_cpu_bug(c, X86_BUG_SWAPGS_FENCE);
>  }
>
>  static void init_amd_gh(struct cpuinfo_x86 *c)
> --
> 2.5.5
>



-- 
Andy Lutomirski
AMA Capital Management, LLC


Re: x32 processes, with CONFIG_X86_X32 not set

2016-04-07 Thread Dave Jones
On Thu, Apr 07, 2016 at 05:20:50PM -0700, Andy Lutomirski wrote:
 > >  > > I had a trinity process get stuck last overnight.
 > >  > > The reason for it getting stuck is my bug (I think), but
 > >  > > there's an odd unrelated thing I noticed while debugging this..
 > >  > >
 > >  > > $ strace -p 20966
 > >  > > strace: Process 20966 attached
 > >  > > strace: [ Process PID=20966 runs in x32 mode. ]
 > >  > >
 > >  > > So I don't use that new-fangled x32 stuff.
 > >  > > I don't even have CONFIG_X86_X32 compiled in.
 > >  > >
 > >  > > Is this strace getting confused, or did we somehow screw
 > >  > > up the syscall entry code ?
 > >  > >
 > >  > > Dave
 > >  > >
 > >  >
 > >  > I think you're just seeing an oddity of how x32 works.  Unlike
 > >  > "compat", x32-ness of the current syscall isn't a special magic state
 > >  > variable; it's just but 31 in the syscall nr.  So trying to do an x32
 > >  > syscall on a non-x32 syscall should still show bit 31 set to ptracers,
 > >  > and the strace probably decodes this as being in x32 mode.
 > >
 > > But this is an x86-64 binary, and it's the main process, not one of the 
 > > fuzzing
 > > child processes. It shouldn't be even trying to do anything weird.
 > > It creates a bunch of fd's, then enters a loop forking/reaping children.
 > > (In this case it actually hung while creating the fd's)
 > >
 > > Trinity doesn't actually have any knowledge of x32 at all, mostly because
 > > it's been irrelevant to me (and most other people).
 > >
 > 
 > Hmm.  Do you have the next couple lines of strace output by any
 > chance?  I'm wondering if this is a classic bug/misfeature/confusion
 > in the way that orig_ax works.

I don't. That box got rebooted a couple dozen times since then.
FWIW, I've not seen this happen since. Very strange.

Dave


Re: x32 processes, with CONFIG_X86_X32 not set

2016-04-07 Thread Dave Jones
On Thu, Apr 07, 2016 at 05:20:50PM -0700, Andy Lutomirski wrote:
 > >  > > I had a trinity process get stuck last overnight.
 > >  > > The reason for it getting stuck is my bug (I think), but
 > >  > > there's an odd unrelated thing I noticed while debugging this..
 > >  > >
 > >  > > $ strace -p 20966
 > >  > > strace: Process 20966 attached
 > >  > > strace: [ Process PID=20966 runs in x32 mode. ]
 > >  > >
 > >  > > So I don't use that new-fangled x32 stuff.
 > >  > > I don't even have CONFIG_X86_X32 compiled in.
 > >  > >
 > >  > > Is this strace getting confused, or did we somehow screw
 > >  > > up the syscall entry code ?
 > >  > >
 > >  > > Dave
 > >  > >
 > >  >
 > >  > I think you're just seeing an oddity of how x32 works.  Unlike
 > >  > "compat", x32-ness of the current syscall isn't a special magic state
 > >  > variable; it's just but 31 in the syscall nr.  So trying to do an x32
 > >  > syscall on a non-x32 syscall should still show bit 31 set to ptracers,
 > >  > and the strace probably decodes this as being in x32 mode.
 > >
 > > But this is an x86-64 binary, and it's the main process, not one of the 
 > > fuzzing
 > > child processes. It shouldn't be even trying to do anything weird.
 > > It creates a bunch of fd's, then enters a loop forking/reaping children.
 > > (In this case it actually hung while creating the fd's)
 > >
 > > Trinity doesn't actually have any knowledge of x32 at all, mostly because
 > > it's been irrelevant to me (and most other people).
 > >
 > 
 > Hmm.  Do you have the next couple lines of strace output by any
 > chance?  I'm wondering if this is a classic bug/misfeature/confusion
 > in the way that orig_ax works.

I don't. That box got rebooted a couple dozen times since then.
FWIW, I've not seen this happen since. Very strange.

Dave


Re: [PATCH v1 06/10] device property: switch to use UUID API

2016-04-07 Thread Huang, Ying
Andy Shevchenko  writes:

> On Fri, 2016-02-26 at 16:11 +0200, Andy Shevchenko wrote:
>> On Thu, 2016-02-18 at 01:03 +0100, Rafael J. Wysocki wrote:
>> > 
>> > On Wednesday, February 17, 2016 02:17:24 PM Andy Shevchenko wrote:
>> > > 
>> > > Switch to use a generic UUID API instead of custom approach. It
>> > > allows to
>> > > define UUIDs, compare them, and validate.
>> []
>> 
>
> Summon initial author of the UUID library.
>
> Summary: the API of comparison functions is rather strange. What the
> point to not take pointers directly? (Moreover I hope compiler too
> clever not to make a copy of constant arguments there)
>
> I could only imagine the case you are trying to avoid temporary
> variables for constants like NULL_UUID.
>
> Issue with this is the ugliness in the users of that, in particularly
> present in ACPI (drivers/acpi/apei/ghes.c).
>
> I would like to have more clear interface for that. Perhaps we may add
> something like
>
> cmp_p(pointer, non-pointer);
> cmp_pp(pointer, pointer);
>
> to not break existing API for now.
>
> It would be useful for many cases in the kernel.

You can take a look at the drivers/acpi/apei/erst.c for uuid_le_cmp
usage.

#define CPER_CREATOR_PSTORE \
UUID_LE(0x75a574e3, 0x5052, 0x4b29, 0x8a, 0x8e, 0xbe, 0x2c, \
0x64, 0x90, 0xb8, 0x9d)

if (uuid_le_cmp(rcd->hdr.creator_id, CPER_CREATOR_PSTORE) != 0)
goto skip;

Looks better?

This is the typical use case in mind when I write the uuid.h.

As for uuid_le_cmp usage in drivers/acpi/apei/ghes.c,

if (!uuid_le_cmp(*(uuid_le *)gdata->section_type,
 CPER_SEC_PLATFORM_MEM)) {

The code looks not good mainly because acpi_hest_generic_data is not
defined with uuid_le in mind.

struct acpi_hest_generic_data {
u8 section_type[16];
u32 error_severity;
u16 revision;
u8 validation_bits;
u8 flags;
u32 error_data_length;
u8 fru_id[16];
u8 fru_text[20];
};

If section_type was defined as uuid_le instead of u8[16], the
uuid_le_cmp usage would look better.  So I suggest to use uuid_le/be in
data structure definition in new code if possible.

Best Regards,
Huang, Ying

>> > 
>> > > 
>> > > +static const uuid_le ads_uuid =
>> > > +UUID_LE(0xdbb8e3e6, 0x5886, 0x4ba6,
>> > > +0x87, 0x95, 0x13, 0x19, 0xf5, 0x2a, 0x96, 0x6b);
>> > >  
>> > >  static bool acpi_enumerate_nondev_subnodes(acpi_handle scope,
>> > >     const union
>> > > acpi_object
>> > > *desc,
>> > > @@ -138,7 +136,7 @@ static bool
>> > > acpi_enumerate_nondev_subnodes(acpi_handle scope,
>> > >  || links->type != ACPI_TYPE_PACKAGE)
>> > >  break;
>> > >  
>> > > -if (memcmp(uuid->buffer.pointer, ads_uuid,
>> > > sizeof(ads_uuid)))
>> > > +if (uuid_le_cmp(*(uuid_le *)uuid->buffer.pointer,
>> > > ads_uuid))
>> > Maybe it's too late, but I don't quite understand the pointer
>> > manipulations here.
>> > 
>> > I can see why you need a type conversion (although it looks ugly),
>> > but why do you
>> > need to dereference it too?
>> The function takes that kind of type on input. The other variants are
>> not compiled.
>> Perhaps we better change uuid_{lb}e_cmp() first to take normal
>> pointers, though I think the initial idea was to get type checking at
>> compile time.
>> 


Re: [PATCH v1 06/10] device property: switch to use UUID API

2016-04-07 Thread Huang, Ying
Andy Shevchenko  writes:

> On Fri, 2016-02-26 at 16:11 +0200, Andy Shevchenko wrote:
>> On Thu, 2016-02-18 at 01:03 +0100, Rafael J. Wysocki wrote:
>> > 
>> > On Wednesday, February 17, 2016 02:17:24 PM Andy Shevchenko wrote:
>> > > 
>> > > Switch to use a generic UUID API instead of custom approach. It
>> > > allows to
>> > > define UUIDs, compare them, and validate.
>> []
>> 
>
> Summon initial author of the UUID library.
>
> Summary: the API of comparison functions is rather strange. What the
> point to not take pointers directly? (Moreover I hope compiler too
> clever not to make a copy of constant arguments there)
>
> I could only imagine the case you are trying to avoid temporary
> variables for constants like NULL_UUID.
>
> Issue with this is the ugliness in the users of that, in particularly
> present in ACPI (drivers/acpi/apei/ghes.c).
>
> I would like to have more clear interface for that. Perhaps we may add
> something like
>
> cmp_p(pointer, non-pointer);
> cmp_pp(pointer, pointer);
>
> to not break existing API for now.
>
> It would be useful for many cases in the kernel.

You can take a look at the drivers/acpi/apei/erst.c for uuid_le_cmp
usage.

#define CPER_CREATOR_PSTORE \
UUID_LE(0x75a574e3, 0x5052, 0x4b29, 0x8a, 0x8e, 0xbe, 0x2c, \
0x64, 0x90, 0xb8, 0x9d)

if (uuid_le_cmp(rcd->hdr.creator_id, CPER_CREATOR_PSTORE) != 0)
goto skip;

Looks better?

This is the typical use case in mind when I write the uuid.h.

As for uuid_le_cmp usage in drivers/acpi/apei/ghes.c,

if (!uuid_le_cmp(*(uuid_le *)gdata->section_type,
 CPER_SEC_PLATFORM_MEM)) {

The code looks not good mainly because acpi_hest_generic_data is not
defined with uuid_le in mind.

struct acpi_hest_generic_data {
u8 section_type[16];
u32 error_severity;
u16 revision;
u8 validation_bits;
u8 flags;
u32 error_data_length;
u8 fru_id[16];
u8 fru_text[20];
};

If section_type was defined as uuid_le instead of u8[16], the
uuid_le_cmp usage would look better.  So I suggest to use uuid_le/be in
data structure definition in new code if possible.

Best Regards,
Huang, Ying

>> > 
>> > > 
>> > > +static const uuid_le ads_uuid =
>> > > +UUID_LE(0xdbb8e3e6, 0x5886, 0x4ba6,
>> > > +0x87, 0x95, 0x13, 0x19, 0xf5, 0x2a, 0x96, 0x6b);
>> > >  
>> > >  static bool acpi_enumerate_nondev_subnodes(acpi_handle scope,
>> > >     const union
>> > > acpi_object
>> > > *desc,
>> > > @@ -138,7 +136,7 @@ static bool
>> > > acpi_enumerate_nondev_subnodes(acpi_handle scope,
>> > >  || links->type != ACPI_TYPE_PACKAGE)
>> > >  break;
>> > >  
>> > > -if (memcmp(uuid->buffer.pointer, ads_uuid,
>> > > sizeof(ads_uuid)))
>> > > +if (uuid_le_cmp(*(uuid_le *)uuid->buffer.pointer,
>> > > ads_uuid))
>> > Maybe it's too late, but I don't quite understand the pointer
>> > manipulations here.
>> > 
>> > I can see why you need a type conversion (although it looks ugly),
>> > but why do you
>> > need to dereference it too?
>> The function takes that kind of type on input. The other variants are
>> not compiled.
>> Perhaps we better change uuid_{lb}e_cmp() first to take normal
>> pointers, though I think the initial idea was to get type checking at
>> compile time.
>> 


Re: [RFC PATCH 0/3] restartable sequences v2: fast user-space percpu critical sections

2016-04-07 Thread Andy Lutomirski
On Thu, Apr 7, 2016 at 6:11 PM, Mathieu Desnoyers
 wrote:
> - On Apr 7, 2016, at 6:05 PM, Andy Lutomirski l...@amacapital.net wrote:
>
>> On Thu, Apr 7, 2016 at 1:11 PM, Peter Zijlstra  wrote:
>>> On Thu, Apr 07, 2016 at 09:43:33AM -0700, Andy Lutomirski wrote:
> [...]
>>>
 it's inherently debuggable,
>>>
>>> It is more debuggable, agreed.
>>>
 and it allows multiple independent
 rseq-protected things to coexist without forcing each other to abort.
>
> [...]
>
> My understanding is that the main goal of this rather more complex
> proposal is to make interaction with debuggers more straightforward in
> cases of single-stepping through the rseq critical section.

The things I like about my proposal are both that you can single-step
through it just like any other code as long as you pin the thread to a
CPU and that it doesn't make preemption magical.  (Of course, you can
*force* it to do something on resume and/or preemption by sticking a
bogus value in the expected event count field, but that's not the
intended use.  Hmm, I guess it does need to hook preemption and/or
resume for all processes that enable the thing so it can know to check
for an enabled post_commit_rip, just like all the other proposals.)

Also, mine lets you have a fairly long-running critical section that
doesn't get aborted under heavy load and can interleave with other
critical sections that don't conflict.

>
> I recently came up with a scheme that should allow us to handle such
> situations in a fashion similar to debuggers handling ll/sc
> restartable sequences of instructions on e.g. powerpc. The good news
> is that my scheme does not require anything at the kernel level.
>
> The idea is simple: the userspace rseq critical sections now
> become marked by 3 inline functions (rather than 2 in Paul's proposal):
>
> rseq_start(void *rseq_key)
> rseq_finish(void *rseq_key)
> rseq_abort(void *rseq_key)

How do you use this thing?  What are its semantics?

--Andy

>
> We associate each critical section with a unique "key" (dummy
> 1 byte object in the process address space), so we can group
> them. The new "rseq_abort" would mark exit points that would
> exit the critical section without executing the final commit
> instruction.
>
> Within each of rseq_start, rseq_finish and rseq_abort,
> we declare a non-loadable section that gets populated
> with the following tuples:
>
> (RSEQ_TYPE, insn address, rseq_key)
>
> Where RSEQ_TYPE is either RSEQ_START, RSEQ_FINISH, or RSEQ_ABORT.
>
> That special section would be found in the executable by the
> debugger, which can then skip over entire restartable critical
> sections when it encounters them by placing breakpoints at
> all exit points (finish and cancel) associated to the same
> rseq_key as the entry point (start).
>
> This way we don't need to complexify the runtime code, neither
> at kernel nor user-space level, and we get debuggability using
> a trick similar to what ll/sc architectures already need to do.
>
> Of course, this requires extending gdb, which should not be
> a show-stopper.
>
> Thoughts ?
>
> Thanks,
>
> Mathieu
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com



-- 
Andy Lutomirski
AMA Capital Management, LLC


Re: [RFC PATCH 0/3] restartable sequences v2: fast user-space percpu critical sections

2016-04-07 Thread Andy Lutomirski
On Thu, Apr 7, 2016 at 6:11 PM, Mathieu Desnoyers
 wrote:
> - On Apr 7, 2016, at 6:05 PM, Andy Lutomirski l...@amacapital.net wrote:
>
>> On Thu, Apr 7, 2016 at 1:11 PM, Peter Zijlstra  wrote:
>>> On Thu, Apr 07, 2016 at 09:43:33AM -0700, Andy Lutomirski wrote:
> [...]
>>>
 it's inherently debuggable,
>>>
>>> It is more debuggable, agreed.
>>>
 and it allows multiple independent
 rseq-protected things to coexist without forcing each other to abort.
>
> [...]
>
> My understanding is that the main goal of this rather more complex
> proposal is to make interaction with debuggers more straightforward in
> cases of single-stepping through the rseq critical section.

The things I like about my proposal are both that you can single-step
through it just like any other code as long as you pin the thread to a
CPU and that it doesn't make preemption magical.  (Of course, you can
*force* it to do something on resume and/or preemption by sticking a
bogus value in the expected event count field, but that's not the
intended use.  Hmm, I guess it does need to hook preemption and/or
resume for all processes that enable the thing so it can know to check
for an enabled post_commit_rip, just like all the other proposals.)

Also, mine lets you have a fairly long-running critical section that
doesn't get aborted under heavy load and can interleave with other
critical sections that don't conflict.

>
> I recently came up with a scheme that should allow us to handle such
> situations in a fashion similar to debuggers handling ll/sc
> restartable sequences of instructions on e.g. powerpc. The good news
> is that my scheme does not require anything at the kernel level.
>
> The idea is simple: the userspace rseq critical sections now
> become marked by 3 inline functions (rather than 2 in Paul's proposal):
>
> rseq_start(void *rseq_key)
> rseq_finish(void *rseq_key)
> rseq_abort(void *rseq_key)

How do you use this thing?  What are its semantics?

--Andy

>
> We associate each critical section with a unique "key" (dummy
> 1 byte object in the process address space), so we can group
> them. The new "rseq_abort" would mark exit points that would
> exit the critical section without executing the final commit
> instruction.
>
> Within each of rseq_start, rseq_finish and rseq_abort,
> we declare a non-loadable section that gets populated
> with the following tuples:
>
> (RSEQ_TYPE, insn address, rseq_key)
>
> Where RSEQ_TYPE is either RSEQ_START, RSEQ_FINISH, or RSEQ_ABORT.
>
> That special section would be found in the executable by the
> debugger, which can then skip over entire restartable critical
> sections when it encounters them by placing breakpoints at
> all exit points (finish and cancel) associated to the same
> rseq_key as the entry point (start).
>
> This way we don't need to complexify the runtime code, neither
> at kernel nor user-space level, and we get debuggability using
> a trick similar to what ll/sc architectures already need to do.
>
> Of course, this requires extending gdb, which should not be
> a show-stopper.
>
> Thoughts ?
>
> Thanks,
>
> Mathieu
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com



-- 
Andy Lutomirski
AMA Capital Management, LLC


[PATCH v11 50/60] PCI: Unify calculate_size() for io port and MMIO

2016-04-07 Thread Yinghai Lu
Now calculate_memsize() and calculate_iosize() is the same.

Change them to calculate_size().

Signed-off-by: Yinghai Lu 
---
 drivers/pci/setup-bus.c | 25 +
 1 file changed, 5 insertions(+), 20 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 930dcbd..b071035 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1103,22 +1103,7 @@ static struct resource *find_free_bus_resource(struct 
pci_bus *bus,
return NULL;
 }
 
-static resource_size_t calculate_iosize(resource_size_t size,
-   resource_size_t min_size,
-   resource_size_t old_size,
-   resource_size_t align)
-{
-   if (size < min_size)
-   size = min_size;
-   if (old_size == 1)
-   old_size = 0;
-   size = ALIGN(size, align);
-   if (size < old_size)
-   size = old_size;
-   return size;
-}
-
-static resource_size_t calculate_memsize(resource_size_t size,
+static resource_size_t calculate_size(resource_size_t size,
resource_size_t min_size,
resource_size_t old_size,
resource_size_t align)
@@ -1244,14 +1229,14 @@ static void pbus_size_io(struct pci_bus *bus, 
resource_size_t min_size,
 
size = size_aligned_for_isa(size);
size += size1;
-   size0 = calculate_iosize(size, min_size,
+   size0 = calculate_size(size, min_size,
resource_size(b_res), min_align);
sum_add_size = size_aligned_for_isa(sum_add_size);
sum_add_size += sum_add_size1;
if (sum_add_size < min_sum_size)
sum_add_size = min_sum_size;
size1 = !realloc_head ? size0 :
-   calculate_iosize(sum_add_size, min_size,
+   calculate_size(sum_add_size, min_size,
resource_size(b_res), min_align);
if (!size0 && !size1) {
if (b_res->start || b_res->end)
@@ -1580,7 +1565,7 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned 
long mask,
if (size || min_size) {
min_align = calculate_mem_align(_test_list, max_align,
size, window_align);
-   size0 = calculate_memsize(size, min_size,
+   size0 = calculate_size(size, min_size,
  resource_size(b_res), min_align);
}
free_align_test_list(_test_list);
@@ -1605,7 +1590,7 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned 
long mask,
min_add_align = calculate_mem_align(_test_add_list,
max_add_align, sum_add_size,
window_align);
-   size1 = calculate_memsize(sum_add_size, min_size,
+   size1 = calculate_size(sum_add_size, min_size,
 resource_size(b_res), min_add_align);
}
free_align_test_list(_test_add_list);
-- 
1.8.4.5



[PATCH v11 50/60] PCI: Unify calculate_size() for io port and MMIO

2016-04-07 Thread Yinghai Lu
Now calculate_memsize() and calculate_iosize() is the same.

Change them to calculate_size().

Signed-off-by: Yinghai Lu 
---
 drivers/pci/setup-bus.c | 25 +
 1 file changed, 5 insertions(+), 20 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 930dcbd..b071035 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1103,22 +1103,7 @@ static struct resource *find_free_bus_resource(struct 
pci_bus *bus,
return NULL;
 }
 
-static resource_size_t calculate_iosize(resource_size_t size,
-   resource_size_t min_size,
-   resource_size_t old_size,
-   resource_size_t align)
-{
-   if (size < min_size)
-   size = min_size;
-   if (old_size == 1)
-   old_size = 0;
-   size = ALIGN(size, align);
-   if (size < old_size)
-   size = old_size;
-   return size;
-}
-
-static resource_size_t calculate_memsize(resource_size_t size,
+static resource_size_t calculate_size(resource_size_t size,
resource_size_t min_size,
resource_size_t old_size,
resource_size_t align)
@@ -1244,14 +1229,14 @@ static void pbus_size_io(struct pci_bus *bus, 
resource_size_t min_size,
 
size = size_aligned_for_isa(size);
size += size1;
-   size0 = calculate_iosize(size, min_size,
+   size0 = calculate_size(size, min_size,
resource_size(b_res), min_align);
sum_add_size = size_aligned_for_isa(sum_add_size);
sum_add_size += sum_add_size1;
if (sum_add_size < min_sum_size)
sum_add_size = min_sum_size;
size1 = !realloc_head ? size0 :
-   calculate_iosize(sum_add_size, min_size,
+   calculate_size(sum_add_size, min_size,
resource_size(b_res), min_align);
if (!size0 && !size1) {
if (b_res->start || b_res->end)
@@ -1580,7 +1565,7 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned 
long mask,
if (size || min_size) {
min_align = calculate_mem_align(_test_list, max_align,
size, window_align);
-   size0 = calculate_memsize(size, min_size,
+   size0 = calculate_size(size, min_size,
  resource_size(b_res), min_align);
}
free_align_test_list(_test_list);
@@ -1605,7 +1590,7 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned 
long mask,
min_add_align = calculate_mem_align(_test_add_list,
max_add_align, sum_add_size,
window_align);
-   size1 = calculate_memsize(sum_add_size, min_size,
+   size1 = calculate_size(sum_add_size, min_size,
 resource_size(b_res), min_add_align);
}
free_align_test_list(_test_add_list);
-- 
1.8.4.5



Re: [PATCH v8 net-next 1/1] hv_sock: introduce Hyper-V Sockets

2016-04-07 Thread Joe Perches
On Thu, 2016-04-07 at 18:36 -0700, Dexuan Cui wrote:
> Hyper-V Sockets (hv_sock) supplies a byte-stream based communication
> mechanism between the host and the guest. It's somewhat like TCP over
> VMBus, but the transportation layer (VMBus) is much simpler than IP.
[]
> diff --git a/include/net/af_hvsock.h b/include/net/af_hvsock.h
[]
> +#define VMBUS_RINGBUFFER_SIZE_HVSOCK_RECV (5 * PAGE_SIZE)
> +#define VMBUS_RINGBUFFER_SIZE_HVSOCK_SEND (5 * PAGE_SIZE)
> +
> +#define HVSOCK_RCV_BUF_SZVMBUS_RINGBUFFER_SIZE_HVSOCK_RECV
> +#define HVSOCK_SND_BUF_SZPAGE_SIZE
[]
> +struct hvsock_sock {
[]
> + struct {
> + struct vmpipe_proto_header hdr;
> + char buf[HVSOCK_SND_BUF_SZ];
> + } __packed send;
> +
> + struct {
> + struct vmpipe_proto_header hdr;
> + char buf[HVSOCK_RCV_BUF_SZ];
> + unsigned int data_len;
> + unsigned int data_offset;
> + } __packed recv;
> +};

These bufs are not page aligned and so can span pages.

Is there any value in allocating these bufs separately
as pages instead of as a kmalloc?



Re: [PATCH v8 net-next 1/1] hv_sock: introduce Hyper-V Sockets

2016-04-07 Thread Joe Perches
On Thu, 2016-04-07 at 18:36 -0700, Dexuan Cui wrote:
> Hyper-V Sockets (hv_sock) supplies a byte-stream based communication
> mechanism between the host and the guest. It's somewhat like TCP over
> VMBus, but the transportation layer (VMBus) is much simpler than IP.
[]
> diff --git a/include/net/af_hvsock.h b/include/net/af_hvsock.h
[]
> +#define VMBUS_RINGBUFFER_SIZE_HVSOCK_RECV (5 * PAGE_SIZE)
> +#define VMBUS_RINGBUFFER_SIZE_HVSOCK_SEND (5 * PAGE_SIZE)
> +
> +#define HVSOCK_RCV_BUF_SZVMBUS_RINGBUFFER_SIZE_HVSOCK_RECV
> +#define HVSOCK_SND_BUF_SZPAGE_SIZE
[]
> +struct hvsock_sock {
[]
> + struct {
> + struct vmpipe_proto_header hdr;
> + char buf[HVSOCK_SND_BUF_SZ];
> + } __packed send;
> +
> + struct {
> + struct vmpipe_proto_header hdr;
> + char buf[HVSOCK_RCV_BUF_SZ];
> + unsigned int data_len;
> + unsigned int data_offset;
> + } __packed recv;
> +};

These bufs are not page aligned and so can span pages.

Is there any value in allocating these bufs separately
as pages instead of as a kmalloc?



Re: [PATCH v4 00/14] x86: remove paravirt_enabled

2016-04-07 Thread Luis R. Rodriguez
On Wed, Apr 06, 2016 at 05:06:20PM -0700, Luis R. Rodriguez wrote:
> Now that Andy's ASM paravirt_enabled() use is merged 

Sorry I should have provided more context, I meant that now
that Andy's ASM paravirt_enabled() removal is merged:

This is the ASM hack that Andy removed:
https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commit/?h=x86/fsgsbase=58a5aac5331388a175a42b6ed2154f0559cefb21

This puts a nail on coffin for the ASM hack:
https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commit/?h=x86/fsgsbase=0dd0036f6e07f741a1356b424b84a3164b6e59cf

  Luis


Re: [PATCH] phy: rockchip-emmc: fix compile issue on arm64 platform

2016-04-07 Thread Shawn Lin

在 2016/4/7 21:31, Kishon Vijay Abraham I 写道:

Hi,

On Thursday 07 April 2016 06:30 PM, Kishon Vijay Abraham I wrote:

Hi,

On Tuesday 08 March 2016 01:54 PM, Shawn Lin wrote:

This patch rename "reg" property to "reg_offset".
We rename it to fix the compile issue on ARM64 platform:
(reg_format): "reg" property in /phy has invalid length (4 bytes)
(#address-cells == 2, #size-cells == 2)


Is the same node used for both ARM32 and ARM64 platforms?

Thanks
Kishon


This's because "reg" is very special one which should keep the
*-cells with its parent node and can't be overwrited even if we
do that explicitly. On 32-bit plafform, the default *-cells
fit for what we assign to "reg". But that's not correct for 64-bit
platform. So we can see two possible solutions to fix this problem:
A) make phy-rockchip-emmc as a child phy node and overwrite its
parent's #address-cells and #size-cells.
B) avoid using this special property.

we use it just for passing on a offset for different Socs, and there's
no requirement to change the code to make phy-rockchip-emmc as a child
node. so choose option B) is sane.


I just looked at the Heiko's patch and it makes more sense to have the binding
that he described in his patch [1]. Can you fix it accordingly?


yes, Heiko's patch is more reasonable to me. With his patch applied,
this issue is gone if assigning address-cells in dt for grf.

So we can drop this patch.



I seem to have only this patch and Heiko's patch for this -rc cycle. Once you
send your patch, I can send a pull request to Greg.

Thanks
Kishon

[1]- > https://patchwork.ozlabs.org/patch/601580/


Signed-off-by: Shawn Lin 
---

  Documentation/devicetree/bindings/phy/rockchip-emmc-phy.txt | 4 ++--
  drivers/phy/phy-rockchip-emmc.c | 2 +-
  2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/Documentation/devicetree/bindings/phy/rockchip-emmc-phy.txt 
b/Documentation/devicetree/bindings/phy/rockchip-emmc-phy.txt
index 61916f1..ed964ef 100644
--- a/Documentation/devicetree/bindings/phy/rockchip-emmc-phy.txt
+++ b/Documentation/devicetree/bindings/phy/rockchip-emmc-phy.txt
@@ -6,7 +6,7 @@ Required properties:
   - rockchip,grf : phandle to the syscon managing the "general
 register files"
   - #phy-cells: must be 0
- - reg: PHY configure reg address offset in "general
+ - reg_offset: PHY configure reg address offset in "general
 register files"

  Example:
@@ -14,6 +14,6 @@ Example:
  emmcphy: phy {
compatible = "rockchip,rk3399-emmc-phy";
rockchip,grf = <>;
-   reg = <0xf780>;
+   reg_offset = <0xf780>;
#phy-cells = <0>;
  };
diff --git a/drivers/phy/phy-rockchip-emmc.c b/drivers/phy/phy-rockchip-emmc.c
index 887b4c2..3f55c0d 100644
--- a/drivers/phy/phy-rockchip-emmc.c
+++ b/drivers/phy/phy-rockchip-emmc.c
@@ -186,7 +186,7 @@ static int rockchip_emmc_phy_probe(struct platform_device 
*pdev)
if (!rk_phy)
return -ENOMEM;

-   if (of_property_read_u32(dev->of_node, "reg", _offset)) {
+   if (of_property_read_u32(dev->of_node, "reg_offset", _offset)) {
dev_err(dev, "missing reg property in node %s\n",
dev->of_node->name);
return -EINVAL;








--
Best Regards
Shawn Lin



Re: [PATCH v4 00/14] x86: remove paravirt_enabled

2016-04-07 Thread Luis R. Rodriguez
On Wed, Apr 06, 2016 at 05:06:20PM -0700, Luis R. Rodriguez wrote:
> Now that Andy's ASM paravirt_enabled() use is merged 

Sorry I should have provided more context, I meant that now
that Andy's ASM paravirt_enabled() removal is merged:

This is the ASM hack that Andy removed:
https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commit/?h=x86/fsgsbase=58a5aac5331388a175a42b6ed2154f0559cefb21

This puts a nail on coffin for the ASM hack:
https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commit/?h=x86/fsgsbase=0dd0036f6e07f741a1356b424b84a3164b6e59cf

  Luis


Re: [PATCH] phy: rockchip-emmc: fix compile issue on arm64 platform

2016-04-07 Thread Shawn Lin

在 2016/4/7 21:31, Kishon Vijay Abraham I 写道:

Hi,

On Thursday 07 April 2016 06:30 PM, Kishon Vijay Abraham I wrote:

Hi,

On Tuesday 08 March 2016 01:54 PM, Shawn Lin wrote:

This patch rename "reg" property to "reg_offset".
We rename it to fix the compile issue on ARM64 platform:
(reg_format): "reg" property in /phy has invalid length (4 bytes)
(#address-cells == 2, #size-cells == 2)


Is the same node used for both ARM32 and ARM64 platforms?

Thanks
Kishon


This's because "reg" is very special one which should keep the
*-cells with its parent node and can't be overwrited even if we
do that explicitly. On 32-bit plafform, the default *-cells
fit for what we assign to "reg". But that's not correct for 64-bit
platform. So we can see two possible solutions to fix this problem:
A) make phy-rockchip-emmc as a child phy node and overwrite its
parent's #address-cells and #size-cells.
B) avoid using this special property.

we use it just for passing on a offset for different Socs, and there's
no requirement to change the code to make phy-rockchip-emmc as a child
node. so choose option B) is sane.


I just looked at the Heiko's patch and it makes more sense to have the binding
that he described in his patch [1]. Can you fix it accordingly?


yes, Heiko's patch is more reasonable to me. With his patch applied,
this issue is gone if assigning address-cells in dt for grf.

So we can drop this patch.



I seem to have only this patch and Heiko's patch for this -rc cycle. Once you
send your patch, I can send a pull request to Greg.

Thanks
Kishon

[1]- > https://patchwork.ozlabs.org/patch/601580/


Signed-off-by: Shawn Lin 
---

  Documentation/devicetree/bindings/phy/rockchip-emmc-phy.txt | 4 ++--
  drivers/phy/phy-rockchip-emmc.c | 2 +-
  2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/Documentation/devicetree/bindings/phy/rockchip-emmc-phy.txt 
b/Documentation/devicetree/bindings/phy/rockchip-emmc-phy.txt
index 61916f1..ed964ef 100644
--- a/Documentation/devicetree/bindings/phy/rockchip-emmc-phy.txt
+++ b/Documentation/devicetree/bindings/phy/rockchip-emmc-phy.txt
@@ -6,7 +6,7 @@ Required properties:
   - rockchip,grf : phandle to the syscon managing the "general
 register files"
   - #phy-cells: must be 0
- - reg: PHY configure reg address offset in "general
+ - reg_offset: PHY configure reg address offset in "general
 register files"

  Example:
@@ -14,6 +14,6 @@ Example:
  emmcphy: phy {
compatible = "rockchip,rk3399-emmc-phy";
rockchip,grf = <>;
-   reg = <0xf780>;
+   reg_offset = <0xf780>;
#phy-cells = <0>;
  };
diff --git a/drivers/phy/phy-rockchip-emmc.c b/drivers/phy/phy-rockchip-emmc.c
index 887b4c2..3f55c0d 100644
--- a/drivers/phy/phy-rockchip-emmc.c
+++ b/drivers/phy/phy-rockchip-emmc.c
@@ -186,7 +186,7 @@ static int rockchip_emmc_phy_probe(struct platform_device 
*pdev)
if (!rk_phy)
return -ENOMEM;

-   if (of_property_read_u32(dev->of_node, "reg", _offset)) {
+   if (of_property_read_u32(dev->of_node, "reg_offset", _offset)) {
dev_err(dev, "missing reg property in node %s\n",
dev->of_node->name);
return -EINVAL;








--
Best Regards
Shawn Lin



Re: [RFC PATCH 0/3] restartable sequences v2: fast user-space percpu critical sections

2016-04-07 Thread Mathieu Desnoyers
- On Apr 7, 2016, at 6:05 PM, Andy Lutomirski l...@amacapital.net wrote:

> On Thu, Apr 7, 2016 at 1:11 PM, Peter Zijlstra  wrote:
>> On Thu, Apr 07, 2016 at 09:43:33AM -0700, Andy Lutomirski wrote:
[...]
>>
>>> it's inherently debuggable,
>>
>> It is more debuggable, agreed.
>>
>>> and it allows multiple independent
>>> rseq-protected things to coexist without forcing each other to abort.

[...]

My understanding is that the main goal of this rather more complex
proposal is to make interaction with debuggers more straightforward in
cases of single-stepping through the rseq critical section.

I recently came up with a scheme that should allow us to handle such
situations in a fashion similar to debuggers handling ll/sc
restartable sequences of instructions on e.g. powerpc. The good news
is that my scheme does not require anything at the kernel level.

The idea is simple: the userspace rseq critical sections now
become marked by 3 inline functions (rather than 2 in Paul's proposal):

rseq_start(void *rseq_key)
rseq_finish(void *rseq_key)
rseq_abort(void *rseq_key)

We associate each critical section with a unique "key" (dummy
1 byte object in the process address space), so we can group
them. The new "rseq_abort" would mark exit points that would
exit the critical section without executing the final commit
instruction.

Within each of rseq_start, rseq_finish and rseq_abort,
we declare a non-loadable section that gets populated
with the following tuples:

(RSEQ_TYPE, insn address, rseq_key)

Where RSEQ_TYPE is either RSEQ_START, RSEQ_FINISH, or RSEQ_ABORT.

That special section would be found in the executable by the
debugger, which can then skip over entire restartable critical
sections when it encounters them by placing breakpoints at
all exit points (finish and cancel) associated to the same
rseq_key as the entry point (start).

This way we don't need to complexify the runtime code, neither
at kernel nor user-space level, and we get debuggability using
a trick similar to what ll/sc architectures already need to do.

Of course, this requires extending gdb, which should not be
a show-stopper.

Thoughts ?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [RFC PATCH 0/3] restartable sequences v2: fast user-space percpu critical sections

2016-04-07 Thread Mathieu Desnoyers
- On Apr 7, 2016, at 6:05 PM, Andy Lutomirski l...@amacapital.net wrote:

> On Thu, Apr 7, 2016 at 1:11 PM, Peter Zijlstra  wrote:
>> On Thu, Apr 07, 2016 at 09:43:33AM -0700, Andy Lutomirski wrote:
[...]
>>
>>> it's inherently debuggable,
>>
>> It is more debuggable, agreed.
>>
>>> and it allows multiple independent
>>> rseq-protected things to coexist without forcing each other to abort.

[...]

My understanding is that the main goal of this rather more complex
proposal is to make interaction with debuggers more straightforward in
cases of single-stepping through the rseq critical section.

I recently came up with a scheme that should allow us to handle such
situations in a fashion similar to debuggers handling ll/sc
restartable sequences of instructions on e.g. powerpc. The good news
is that my scheme does not require anything at the kernel level.

The idea is simple: the userspace rseq critical sections now
become marked by 3 inline functions (rather than 2 in Paul's proposal):

rseq_start(void *rseq_key)
rseq_finish(void *rseq_key)
rseq_abort(void *rseq_key)

We associate each critical section with a unique "key" (dummy
1 byte object in the process address space), so we can group
them. The new "rseq_abort" would mark exit points that would
exit the critical section without executing the final commit
instruction.

Within each of rseq_start, rseq_finish and rseq_abort,
we declare a non-loadable section that gets populated
with the following tuples:

(RSEQ_TYPE, insn address, rseq_key)

Where RSEQ_TYPE is either RSEQ_START, RSEQ_FINISH, or RSEQ_ABORT.

That special section would be found in the executable by the
debugger, which can then skip over entire restartable critical
sections when it encounters them by placing breakpoints at
all exit points (finish and cancel) associated to the same
rseq_key as the entry point (start).

This way we don't need to complexify the runtime code, neither
at kernel nor user-space level, and we get debuggability using
a trick similar to what ll/sc architectures already need to do.

Of course, this requires extending gdb, which should not be
a show-stopper.

Thoughts ?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


[PATCH 1/3] sched: Gather cpu load functions under a more conventional namespace

2016-04-07 Thread Frederic Weisbecker
The cpu load update related functions have a weak naming convention
currently, starting with update_cpu_load_*() which isn't ideal as
"update" is a very generic concept.

Since two of these functions are public already (and a third is to come)
that's enough to introduce a more conventional naming scheme. So let's
do the following rename instead:

update_cpu_load_*() -> cpu_load_update_*()

Cc: Byungchul Park 
Cc: Chris Metcalf 
Cc: Christoph Lameter 
Cc: Luiz Capitulino 
Cc: Mike Galbraith 
Cc: Paul E. McKenney 
Cc: Peter Zijlstra 
Cc: Rik van Riel 
Cc: Thomas Gleixner 
Signed-off-by: Frederic Weisbecker 
---
 Documentation/trace/ftrace.txt | 10 +-
 include/linux/sched.h  |  4 ++--
 kernel/sched/core.c|  2 +-
 kernel/sched/fair.c| 24 
 kernel/sched/sched.h   |  4 ++--
 kernel/time/tick-sched.c   |  2 +-
 6 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/Documentation/trace/ftrace.txt b/Documentation/trace/ftrace.txt
index f52f297..9857606 100644
--- a/Documentation/trace/ftrace.txt
+++ b/Documentation/trace/ftrace.txt
@@ -1562,12 +1562,12 @@ Doing the same with chrt -r 5 and function-trace set.
   -0   3dN.1   12us : menu_hrtimer_cancel <-tick_nohz_idle_exit
   -0   3dN.1   12us : ktime_get <-tick_nohz_idle_exit
   -0   3dN.1   12us : tick_do_update_jiffies64 <-tick_nohz_idle_exit
-  -0   3dN.1   13us : update_cpu_load_nohz <-tick_nohz_idle_exit
-  -0   3dN.1   13us : _raw_spin_lock <-update_cpu_load_nohz
+  -0   3dN.1   13us : cpu_load_update_nohz <-tick_nohz_idle_exit
+  -0   3dN.1   13us : _raw_spin_lock <-cpu_load_update_nohz
   -0   3dN.1   13us : add_preempt_count <-_raw_spin_lock
-  -0   3dN.2   13us : __update_cpu_load <-update_cpu_load_nohz
-  -0   3dN.2   14us : sched_avg_update <-__update_cpu_load
-  -0   3dN.2   14us : _raw_spin_unlock <-update_cpu_load_nohz
+  -0   3dN.2   13us : __cpu_load_update <-cpu_load_update_nohz
+  -0   3dN.2   14us : sched_avg_update <-__cpu_load_update
+  -0   3dN.2   14us : _raw_spin_unlock <-cpu_load_update_nohz
   -0   3dN.2   14us : sub_preempt_count <-_raw_spin_unlock
   -0   3dN.1   15us : calc_load_exit_idle <-tick_nohz_idle_exit
   -0   3dN.1   15us : touch_softlockup_watchdog <-tick_nohz_idle_exit
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 52c4847..aa5ee0d 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -178,9 +178,9 @@ extern void get_iowait_load(unsigned long *nr_waiters, 
unsigned long *load);
 extern void calc_global_load(unsigned long ticks);
 
 #if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ_COMMON)
-extern void update_cpu_load_nohz(int active);
+extern void cpu_load_update_nohz(int active);
 #else
-static inline void update_cpu_load_nohz(int active) { }
+static inline void cpu_load_update_nohz(int active) { }
 #endif
 
 extern void dump_cpu_task(int cpu);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8b489fc..4c522a7 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2915,7 +2915,7 @@ void scheduler_tick(void)
raw_spin_lock(>lock);
update_rq_clock(rq);
curr->sched_class->task_tick(rq, curr, 0);
-   update_cpu_load_active(rq);
+   cpu_load_update_active(rq);
calc_global_load_tick(rq);
raw_spin_unlock(>lock);
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0fe30e66..f33764d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4491,7 +4491,7 @@ decay_load_missed(unsigned long load, unsigned long 
missed_updates, int idx)
 }
 
 /**
- * __update_cpu_load - update the rq->cpu_load[] statistics
+ * __cpu_load_update - update the rq->cpu_load[] statistics
  * @this_rq: The rq to update statistics for
  * @this_load: The current load
  * @pending_updates: The number of missed updates
@@ -4526,7 +4526,7 @@ decay_load_missed(unsigned long load, unsigned long 
missed_updates, int idx)
  * see decay_load_misses(). For NOHZ_FULL we get to subtract and add the extra
  * term. See the @active paramter.
  */
-static void __update_cpu_load(struct rq *this_rq, unsigned long this_load,
+static void __cpu_load_update(struct rq *this_rq, unsigned long this_load,
  unsigned long pending_updates, int active)
 {
unsigned long tickless_load = active ? this_rq->cpu_load[0] : 0;
@@ -4574,7 +4574,7 @@ static unsigned long weighted_cpuload(const int cpu)
 }
 
 #ifdef CONFIG_NO_HZ_COMMON
-static void __update_cpu_load_nohz(struct rq *this_rq,
+static void __cpu_load_update_nohz(struct rq *this_rq,
   unsigned long curr_jiffies,
   unsigned long load,
 

[PATCH 1/3] sched: Gather cpu load functions under a more conventional namespace

2016-04-07 Thread Frederic Weisbecker
The cpu load update related functions have a weak naming convention
currently, starting with update_cpu_load_*() which isn't ideal as
"update" is a very generic concept.

Since two of these functions are public already (and a third is to come)
that's enough to introduce a more conventional naming scheme. So let's
do the following rename instead:

update_cpu_load_*() -> cpu_load_update_*()

Cc: Byungchul Park 
Cc: Chris Metcalf 
Cc: Christoph Lameter 
Cc: Luiz Capitulino 
Cc: Mike Galbraith 
Cc: Paul E. McKenney 
Cc: Peter Zijlstra 
Cc: Rik van Riel 
Cc: Thomas Gleixner 
Signed-off-by: Frederic Weisbecker 
---
 Documentation/trace/ftrace.txt | 10 +-
 include/linux/sched.h  |  4 ++--
 kernel/sched/core.c|  2 +-
 kernel/sched/fair.c| 24 
 kernel/sched/sched.h   |  4 ++--
 kernel/time/tick-sched.c   |  2 +-
 6 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/Documentation/trace/ftrace.txt b/Documentation/trace/ftrace.txt
index f52f297..9857606 100644
--- a/Documentation/trace/ftrace.txt
+++ b/Documentation/trace/ftrace.txt
@@ -1562,12 +1562,12 @@ Doing the same with chrt -r 5 and function-trace set.
   -0   3dN.1   12us : menu_hrtimer_cancel <-tick_nohz_idle_exit
   -0   3dN.1   12us : ktime_get <-tick_nohz_idle_exit
   -0   3dN.1   12us : tick_do_update_jiffies64 <-tick_nohz_idle_exit
-  -0   3dN.1   13us : update_cpu_load_nohz <-tick_nohz_idle_exit
-  -0   3dN.1   13us : _raw_spin_lock <-update_cpu_load_nohz
+  -0   3dN.1   13us : cpu_load_update_nohz <-tick_nohz_idle_exit
+  -0   3dN.1   13us : _raw_spin_lock <-cpu_load_update_nohz
   -0   3dN.1   13us : add_preempt_count <-_raw_spin_lock
-  -0   3dN.2   13us : __update_cpu_load <-update_cpu_load_nohz
-  -0   3dN.2   14us : sched_avg_update <-__update_cpu_load
-  -0   3dN.2   14us : _raw_spin_unlock <-update_cpu_load_nohz
+  -0   3dN.2   13us : __cpu_load_update <-cpu_load_update_nohz
+  -0   3dN.2   14us : sched_avg_update <-__cpu_load_update
+  -0   3dN.2   14us : _raw_spin_unlock <-cpu_load_update_nohz
   -0   3dN.2   14us : sub_preempt_count <-_raw_spin_unlock
   -0   3dN.1   15us : calc_load_exit_idle <-tick_nohz_idle_exit
   -0   3dN.1   15us : touch_softlockup_watchdog <-tick_nohz_idle_exit
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 52c4847..aa5ee0d 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -178,9 +178,9 @@ extern void get_iowait_load(unsigned long *nr_waiters, 
unsigned long *load);
 extern void calc_global_load(unsigned long ticks);
 
 #if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ_COMMON)
-extern void update_cpu_load_nohz(int active);
+extern void cpu_load_update_nohz(int active);
 #else
-static inline void update_cpu_load_nohz(int active) { }
+static inline void cpu_load_update_nohz(int active) { }
 #endif
 
 extern void dump_cpu_task(int cpu);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8b489fc..4c522a7 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2915,7 +2915,7 @@ void scheduler_tick(void)
raw_spin_lock(>lock);
update_rq_clock(rq);
curr->sched_class->task_tick(rq, curr, 0);
-   update_cpu_load_active(rq);
+   cpu_load_update_active(rq);
calc_global_load_tick(rq);
raw_spin_unlock(>lock);
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0fe30e66..f33764d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4491,7 +4491,7 @@ decay_load_missed(unsigned long load, unsigned long 
missed_updates, int idx)
 }
 
 /**
- * __update_cpu_load - update the rq->cpu_load[] statistics
+ * __cpu_load_update - update the rq->cpu_load[] statistics
  * @this_rq: The rq to update statistics for
  * @this_load: The current load
  * @pending_updates: The number of missed updates
@@ -4526,7 +4526,7 @@ decay_load_missed(unsigned long load, unsigned long 
missed_updates, int idx)
  * see decay_load_misses(). For NOHZ_FULL we get to subtract and add the extra
  * term. See the @active paramter.
  */
-static void __update_cpu_load(struct rq *this_rq, unsigned long this_load,
+static void __cpu_load_update(struct rq *this_rq, unsigned long this_load,
  unsigned long pending_updates, int active)
 {
unsigned long tickless_load = active ? this_rq->cpu_load[0] : 0;
@@ -4574,7 +4574,7 @@ static unsigned long weighted_cpuload(const int cpu)
 }
 
 #ifdef CONFIG_NO_HZ_COMMON
-static void __update_cpu_load_nohz(struct rq *this_rq,
+static void __cpu_load_update_nohz(struct rq *this_rq,
   unsigned long curr_jiffies,
   unsigned long load,
   int active)
@@ -4589,7 +4589,7 @@ static void __update_cpu_load_nohz(struct rq *this_rq,
 * In the NOHZ_FULL case, we were non-idle, we should consider
 * its weighted 

Re: [PATCH v6 01/12] usb: hcd: Initialize hcd->flags to 0

2016-04-07 Thread Peter Chen
On Thu, Apr 07, 2016 at 01:40:21PM +0300, Roger Quadros wrote:
> On 07/04/16 12:42, Peter Chen wrote:
> > On Wed, Apr 06, 2016 at 09:32:22AM +0300, Roger Quadros wrote:
> >> On 06/04/16 09:09, Felipe Balbi wrote:
> >>>
> >>> Hi,
> >>>
> >>> Roger Quadros  writes:
>  diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c
>  index 2ca2cef..6b1930d 100644
>  --- a/drivers/usb/core/hcd.c
>  +++ b/drivers/usb/core/hcd.c
>  @@ -2706,6 +2706,7 @@ int usb_add_hcd(struct usb_hcd *hcd,
>   int retval;
>   struct usb_device *rhdev;
>   
>  +hcd->flags = 0;
> >>>
> > 
> > I am not sure if this usb_add(remove)_hcd pair is safe and clean enough
> > for start/stop host role. From my point, we may need to do like
> > .probe/.remove host platform driver interface. In that case, we can make
> 
> probe and remove are meant to be called from bus layer.
> I do not see a way how OTG framework can call probe/remove of HCD driver.
> Some HCDs may be platform devices, some PCI, so different entities are calling
> the HCD .probe hook.
> 
> > sure the clocks and regulators are off, and hcd will be zero-initialized
> 
> why can't we make that sure that is taken care of within the hcd_ops?
> Why should some driver keep its regulators and clocks enabled when hcd is 
> stopped?
> It doesn't need to. If it is doing so now, it needs to be fixed.
> 

Well, you may misunderstand me. I mean your hcd_ops->start or ->stop
is hard to be a general one which only calls usb_hcd_add or
usb_hcd_remove. It needs to implement like .probe or .remove at platform
driver, some example code like host_start and host_stop at
drivers/usb/chipidea/host.c.

-- 

Best Regards,
Peter Chen


Re: [PATCH v6 01/12] usb: hcd: Initialize hcd->flags to 0

2016-04-07 Thread Peter Chen
On Thu, Apr 07, 2016 at 01:40:21PM +0300, Roger Quadros wrote:
> On 07/04/16 12:42, Peter Chen wrote:
> > On Wed, Apr 06, 2016 at 09:32:22AM +0300, Roger Quadros wrote:
> >> On 06/04/16 09:09, Felipe Balbi wrote:
> >>>
> >>> Hi,
> >>>
> >>> Roger Quadros  writes:
>  diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c
>  index 2ca2cef..6b1930d 100644
>  --- a/drivers/usb/core/hcd.c
>  +++ b/drivers/usb/core/hcd.c
>  @@ -2706,6 +2706,7 @@ int usb_add_hcd(struct usb_hcd *hcd,
>   int retval;
>   struct usb_device *rhdev;
>   
>  +hcd->flags = 0;
> >>>
> > 
> > I am not sure if this usb_add(remove)_hcd pair is safe and clean enough
> > for start/stop host role. From my point, we may need to do like
> > .probe/.remove host platform driver interface. In that case, we can make
> 
> probe and remove are meant to be called from bus layer.
> I do not see a way how OTG framework can call probe/remove of HCD driver.
> Some HCDs may be platform devices, some PCI, so different entities are calling
> the HCD .probe hook.
> 
> > sure the clocks and regulators are off, and hcd will be zero-initialized
> 
> why can't we make that sure that is taken care of within the hcd_ops?
> Why should some driver keep its regulators and clocks enabled when hcd is 
> stopped?
> It doesn't need to. If it is doing so now, it needs to be fixed.
> 

Well, you may misunderstand me. I mean your hcd_ops->start or ->stop
is hard to be a general one which only calls usb_hcd_add or
usb_hcd_remove. It needs to implement like .probe or .remove at platform
driver, some example code like host_start and host_stop at
drivers/usb/chipidea/host.c.

-- 

Best Regards,
Peter Chen


[PATCH 3/3] sched: Optimize !CONFIG_NO_HZ_COMMON cpu load updates

2016-04-07 Thread Frederic Weisbecker
Some code in cpu load update only concern NO_HZ configs but it is
built on all configurations. When NO_HZ isn't built, that code is harmless
but just happens to take some useless ressources in CPU and memory:

1) one useless field in struct rq
2) jiffies record on every tick that is never used (cpu_load_update_periodic)
3) decay_load_missed is called two times on every tick to eventually
   return immediately with no action taken. And that function is dead
   code.

For pure optimization purposes, lets conditionally build the NO_HZ
related code.

Cc: Byungchul Park 
Cc: Chris Metcalf 
Cc: Christoph Lameter 
Cc: Ingo Molnar 
Cc: Luiz Capitulino 
Cc: Mike Galbraith 
Cc: Paul E. McKenney 
Cc: Peter Zijlstra 
Cc: Rik van Riel 
Cc: Thomas Gleixner 
Signed-off-by: Frederic Weisbecker 
---
 kernel/sched/core.c  |  3 ++-
 kernel/sched/fair.c  | 43 ---
 kernel/sched/sched.h |  6 --
 3 files changed, 38 insertions(+), 14 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 4c522a7..59a2821 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7327,8 +7327,9 @@ void __init sched_init(void)
 
for (j = 0; j < CPU_LOAD_IDX_MAX; j++)
rq->cpu_load[j] = 0;
-
+#ifdef CONFIG_NO_HZ_COMMON
rq->last_load_update_tick = jiffies;
+#endif
 
 #ifdef CONFIG_SMP
rq->sd = NULL;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1dd864d..4618e5b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4423,6 +4423,7 @@ static void dequeue_task_fair(struct rq *rq, struct 
task_struct *p, int flags)
 }
 
 #ifdef CONFIG_SMP
+#ifdef CONFIG_NO_HZ_COMMON
 
 /*
  * per rq 'load' arrray crap; XXX kill this.
@@ -4490,6 +4491,33 @@ decay_load_missed(unsigned long load, unsigned long 
missed_updates, int idx)
return load;
 }
 
+static unsigned long
+cpu_load_update_missed(unsigned long old_load, unsigned long tickless_load,
+  unsigned long pending_updates, int idx)
+{
+   old_load = decay_load_missed(old_load, pending_updates - 1, idx);
+   if (tickless_load) {
+   old_load -= decay_load_missed(tickless_load, pending_updates - 
1, idx);
+   /*
+* old_load can never be a negative value because a
+* decayed tickless_load cannot be greater than the
+* original tickless_load.
+*/
+   old_load += tickless_load;
+   }
+   return old_load;
+}
+#else /* !CONFIG_NO_HZ_COMMON */
+
+static inline unsigned long
+cpu_load_update_missed(unsigned long old_load, unsigned long tickless_load,
+  unsigned long pending_updates, int idx)
+{
+   return old_load;
+}
+
+#endif /* CONFIG_NO_HZ_COMMON */
+
 /**
  * __cpu_load_update - update the rq->cpu_load[] statistics
  * @this_rq: The rq to update statistics for
@@ -4540,17 +4568,8 @@ static void cpu_load_update(struct rq *this_rq, unsigned 
long this_load,
 
/* scale is effectively 1 << i now, and >> i divides by scale */
 
-   old_load = this_rq->cpu_load[i];
-   old_load = decay_load_missed(old_load, pending_updates - 1, i);
-   if (tickless_load) {
-   old_load -= decay_load_missed(tickless_load, 
pending_updates - 1, i);
-   /*
-* old_load can never be a negative value because a
-* decayed tickless_load cannot be greater than the
-* original tickless_load.
-*/
-   old_load += tickless_load;
-   }
+   old_load = cpu_load_update_missed(this_rq->cpu_load[i],
+ tickless_load, 
pending_updates, i);
new_load = this_load;
/*
 * Round up the averaging division if load is increasing. This
@@ -4661,8 +4680,10 @@ static inline void cpu_load_update_nohz(struct rq 
*this_rq,
 
 static void cpu_load_update_periodic(struct rq *this_rq, unsigned long load)
 {
+#ifdef CONFIG_NO_HZ_COMMON
/* See the mess around cpu_load_update_nohz(). */
this_rq->last_load_update_tick = READ_ONCE(jiffies);
+#endif
cpu_load_update(this_rq, load, 1);
 }
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 1802013..2302bb6 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -585,11 +585,13 @@ struct rq {
 #endif
#define CPU_LOAD_IDX_MAX 5
unsigned long cpu_load[CPU_LOAD_IDX_MAX];
+#ifdef CONFIG_NO_HZ_COMMON
+#ifdef CONFIG_SMP
unsigned long last_load_update_tick;
-#ifdef CONFIG_NO_HZ_COMMON
+#endif /* CONFIG_SMP */
 

[PATCH 3/3] sched: Optimize !CONFIG_NO_HZ_COMMON cpu load updates

2016-04-07 Thread Frederic Weisbecker
Some code in cpu load update only concern NO_HZ configs but it is
built on all configurations. When NO_HZ isn't built, that code is harmless
but just happens to take some useless ressources in CPU and memory:

1) one useless field in struct rq
2) jiffies record on every tick that is never used (cpu_load_update_periodic)
3) decay_load_missed is called two times on every tick to eventually
   return immediately with no action taken. And that function is dead
   code.

For pure optimization purposes, lets conditionally build the NO_HZ
related code.

Cc: Byungchul Park 
Cc: Chris Metcalf 
Cc: Christoph Lameter 
Cc: Ingo Molnar 
Cc: Luiz Capitulino 
Cc: Mike Galbraith 
Cc: Paul E. McKenney 
Cc: Peter Zijlstra 
Cc: Rik van Riel 
Cc: Thomas Gleixner 
Signed-off-by: Frederic Weisbecker 
---
 kernel/sched/core.c  |  3 ++-
 kernel/sched/fair.c  | 43 ---
 kernel/sched/sched.h |  6 --
 3 files changed, 38 insertions(+), 14 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 4c522a7..59a2821 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7327,8 +7327,9 @@ void __init sched_init(void)
 
for (j = 0; j < CPU_LOAD_IDX_MAX; j++)
rq->cpu_load[j] = 0;
-
+#ifdef CONFIG_NO_HZ_COMMON
rq->last_load_update_tick = jiffies;
+#endif
 
 #ifdef CONFIG_SMP
rq->sd = NULL;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1dd864d..4618e5b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4423,6 +4423,7 @@ static void dequeue_task_fair(struct rq *rq, struct 
task_struct *p, int flags)
 }
 
 #ifdef CONFIG_SMP
+#ifdef CONFIG_NO_HZ_COMMON
 
 /*
  * per rq 'load' arrray crap; XXX kill this.
@@ -4490,6 +4491,33 @@ decay_load_missed(unsigned long load, unsigned long 
missed_updates, int idx)
return load;
 }
 
+static unsigned long
+cpu_load_update_missed(unsigned long old_load, unsigned long tickless_load,
+  unsigned long pending_updates, int idx)
+{
+   old_load = decay_load_missed(old_load, pending_updates - 1, idx);
+   if (tickless_load) {
+   old_load -= decay_load_missed(tickless_load, pending_updates - 
1, idx);
+   /*
+* old_load can never be a negative value because a
+* decayed tickless_load cannot be greater than the
+* original tickless_load.
+*/
+   old_load += tickless_load;
+   }
+   return old_load;
+}
+#else /* !CONFIG_NO_HZ_COMMON */
+
+static inline unsigned long
+cpu_load_update_missed(unsigned long old_load, unsigned long tickless_load,
+  unsigned long pending_updates, int idx)
+{
+   return old_load;
+}
+
+#endif /* CONFIG_NO_HZ_COMMON */
+
 /**
  * __cpu_load_update - update the rq->cpu_load[] statistics
  * @this_rq: The rq to update statistics for
@@ -4540,17 +4568,8 @@ static void cpu_load_update(struct rq *this_rq, unsigned 
long this_load,
 
/* scale is effectively 1 << i now, and >> i divides by scale */
 
-   old_load = this_rq->cpu_load[i];
-   old_load = decay_load_missed(old_load, pending_updates - 1, i);
-   if (tickless_load) {
-   old_load -= decay_load_missed(tickless_load, 
pending_updates - 1, i);
-   /*
-* old_load can never be a negative value because a
-* decayed tickless_load cannot be greater than the
-* original tickless_load.
-*/
-   old_load += tickless_load;
-   }
+   old_load = cpu_load_update_missed(this_rq->cpu_load[i],
+ tickless_load, 
pending_updates, i);
new_load = this_load;
/*
 * Round up the averaging division if load is increasing. This
@@ -4661,8 +4680,10 @@ static inline void cpu_load_update_nohz(struct rq 
*this_rq,
 
 static void cpu_load_update_periodic(struct rq *this_rq, unsigned long load)
 {
+#ifdef CONFIG_NO_HZ_COMMON
/* See the mess around cpu_load_update_nohz(). */
this_rq->last_load_update_tick = READ_ONCE(jiffies);
+#endif
cpu_load_update(this_rq, load, 1);
 }
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 1802013..2302bb6 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -585,11 +585,13 @@ struct rq {
 #endif
#define CPU_LOAD_IDX_MAX 5
unsigned long cpu_load[CPU_LOAD_IDX_MAX];
+#ifdef CONFIG_NO_HZ_COMMON
+#ifdef CONFIG_SMP
unsigned long last_load_update_tick;
-#ifdef CONFIG_NO_HZ_COMMON
+#endif /* CONFIG_SMP */
u64 nohz_stamp;
unsigned long nohz_flags;
-#endif
+#endif /* CONFIG_NO_HZ_COMMON */
 #ifdef CONFIG_NO_HZ_FULL
unsigned long last_sched_tick;
 #endif
-- 
2.7.0



[PATCH 2/3] sched: Correctly handle nohz ticks cpu load accounting

2016-04-07 Thread Frederic Weisbecker
Ticks can happen while the CPU is in dynticks-idle or dynticks-singletask
mode. In fact "nohz" or "dynticks" only mean that we exit the periodic
mode and we try to minimize the ticks as much as possible. The nohz
subsystem uses a confusing terminology with the internal state
"ts->tick_stopped" which is also available through its public interface
with tick_nohz_tick_stopped(). This is a misnomer as the tick is instead
reduced with the best effort rather than stopped. In the best case the
tick can indeed be actually stopped but there is no guarantee about that.
If a timer needs to fire one second later, a tick will fire while the
CPU is in nohz mode and this is a very common scenario.

Now this confusion happens to be a problem with cpu load updates:
cpu_load_update_active() doesn't handle nohz ticks correctly because it
assumes that ticks are completely stopped in nohz mode and that
cpu_load_update_active() can't be called in dynticks mode. When that
happens, the whole previous tickless load is ignored and the function
just records the load for the current tick, ignoring potentially long
idle periods behind.

In order to solve this, we could account the current load for the
previous nohz time but there is a risk that we account the load of a
task that got freshly enqueued for the whole nohz period.

So instead, lets record the dynticks load on nohz frame entry so we know
what to record in case of nohz ticks, then use this record to account
the tickless load on nohz ticks and nohz frame end.

Cc: Byungchul Park 
Cc: Chris Metcalf 
Cc: Christoph Lameter 
Cc: Ingo Molnar 
Cc: Luiz Capitulino 
Cc: Mike Galbraith 
Cc: Paul E. McKenney 
Cc: Peter Zijlstra 
Cc: Rik van Riel 
Cc: Thomas Gleixner 
Signed-off-by: Frederic Weisbecker 
---
 include/linux/sched.h|  6 ++-
 kernel/sched/fair.c  | 95 +++-
 kernel/time/tick-sched.c |  9 +++--
 3 files changed, 70 insertions(+), 40 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index aa5ee0d..07b1b1e 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -178,9 +178,11 @@ extern void get_iowait_load(unsigned long *nr_waiters, 
unsigned long *load);
 extern void calc_global_load(unsigned long ticks);
 
 #if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ_COMMON)
-extern void cpu_load_update_nohz(int active);
+extern void cpu_load_update_nohz_start(void);
+extern void cpu_load_update_nohz_stop(void);
 #else
-static inline void cpu_load_update_nohz(int active) { }
+static inline void cpu_load_update_nohz_start(void) { }
+static inline void cpu_load_update_nohz_stop(void) { }
 #endif
 
 extern void dump_cpu_task(int cpu);
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index f33764d..1dd864d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4495,7 +4495,6 @@ decay_load_missed(unsigned long load, unsigned long 
missed_updates, int idx)
  * @this_rq: The rq to update statistics for
  * @this_load: The current load
  * @pending_updates: The number of missed updates
- * @active: !0 for NOHZ_FULL
  *
  * Update rq->cpu_load[] statistics. This function is usually called every
  * scheduler tick (TICK_NSEC).
@@ -4524,12 +4523,12 @@ decay_load_missed(unsigned long load, unsigned long 
missed_updates, int idx)
  *   load[i]_n = (1 - 1/2^i)^n * load[i]_0
  *
  * see decay_load_misses(). For NOHZ_FULL we get to subtract and add the extra
- * term. See the @active paramter.
+ * term.
  */
-static void __cpu_load_update(struct rq *this_rq, unsigned long this_load,
- unsigned long pending_updates, int active)
+static void cpu_load_update(struct rq *this_rq, unsigned long this_load,
+   unsigned long pending_updates)
 {
-   unsigned long tickless_load = active ? this_rq->cpu_load[0] : 0;
+   unsigned long tickless_load = this_rq->cpu_load[0];
int i, scale;
 
this_rq->nr_load_updates++;
@@ -4574,10 +4573,23 @@ static unsigned long weighted_cpuload(const int cpu)
 }
 
 #ifdef CONFIG_NO_HZ_COMMON
-static void __cpu_load_update_nohz(struct rq *this_rq,
-  unsigned long curr_jiffies,
-  unsigned long load,
-  int active)
+/*
+ * There is no sane way to deal with nohz on smp when using jiffies because the
+ * cpu doing the jiffies update might drift wrt the cpu doing the jiffy reading
+ * causing off-by-one errors in observed deltas; {0,2} instead of {1,1}.
+ *
+ * Therefore we need to avoid the delta approach from the regular tick when
+ * possible since that would seriously skew the load calculation. This is why 
we
+ * use cpu_load_update_periodic() for CPUs out of nohz. However we'll rely on
+ * jiffies 

[PATCH 0/3] sched: Fix/improve nohz cpu load updates v2

2016-04-07 Thread Frederic Weisbecker
This series tries to fix issues against CFS cpu load accounting in
nohz configs (both idle and full nohz). Some optimizations coming along.

Following Peterz reviews, I've tried to improve the changelogs. Comments
have been updated as well and some changes have been refactored.

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
sched/nohz-v2

HEAD: 55cb7c1a25acb140253ca4e55f8d43cd31404b55

Thanks,
Frederic
---

Frederic Weisbecker (3):
  sched: Gather cpu load functions under a more conventional namespace
  sched: Correctly handle nohz ticks cpu load accounting
  sched: Optimize !CONFIG_NO_HZ_COMMON cpu load updates


 Documentation/trace/ftrace.txt |  10 +--
 include/linux/sched.h  |   6 +-
 kernel/sched/core.c|   5 +-
 kernel/sched/fair.c| 146 +++--
 kernel/sched/sched.h   |  10 +--
 kernel/time/tick-sched.c   |   9 +--
 6 files changed, 120 insertions(+), 66 deletions(-)


[PATCH 2/3] sched: Correctly handle nohz ticks cpu load accounting

2016-04-07 Thread Frederic Weisbecker
Ticks can happen while the CPU is in dynticks-idle or dynticks-singletask
mode. In fact "nohz" or "dynticks" only mean that we exit the periodic
mode and we try to minimize the ticks as much as possible. The nohz
subsystem uses a confusing terminology with the internal state
"ts->tick_stopped" which is also available through its public interface
with tick_nohz_tick_stopped(). This is a misnomer as the tick is instead
reduced with the best effort rather than stopped. In the best case the
tick can indeed be actually stopped but there is no guarantee about that.
If a timer needs to fire one second later, a tick will fire while the
CPU is in nohz mode and this is a very common scenario.

Now this confusion happens to be a problem with cpu load updates:
cpu_load_update_active() doesn't handle nohz ticks correctly because it
assumes that ticks are completely stopped in nohz mode and that
cpu_load_update_active() can't be called in dynticks mode. When that
happens, the whole previous tickless load is ignored and the function
just records the load for the current tick, ignoring potentially long
idle periods behind.

In order to solve this, we could account the current load for the
previous nohz time but there is a risk that we account the load of a
task that got freshly enqueued for the whole nohz period.

So instead, lets record the dynticks load on nohz frame entry so we know
what to record in case of nohz ticks, then use this record to account
the tickless load on nohz ticks and nohz frame end.

Cc: Byungchul Park 
Cc: Chris Metcalf 
Cc: Christoph Lameter 
Cc: Ingo Molnar 
Cc: Luiz Capitulino 
Cc: Mike Galbraith 
Cc: Paul E. McKenney 
Cc: Peter Zijlstra 
Cc: Rik van Riel 
Cc: Thomas Gleixner 
Signed-off-by: Frederic Weisbecker 
---
 include/linux/sched.h|  6 ++-
 kernel/sched/fair.c  | 95 +++-
 kernel/time/tick-sched.c |  9 +++--
 3 files changed, 70 insertions(+), 40 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index aa5ee0d..07b1b1e 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -178,9 +178,11 @@ extern void get_iowait_load(unsigned long *nr_waiters, 
unsigned long *load);
 extern void calc_global_load(unsigned long ticks);
 
 #if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ_COMMON)
-extern void cpu_load_update_nohz(int active);
+extern void cpu_load_update_nohz_start(void);
+extern void cpu_load_update_nohz_stop(void);
 #else
-static inline void cpu_load_update_nohz(int active) { }
+static inline void cpu_load_update_nohz_start(void) { }
+static inline void cpu_load_update_nohz_stop(void) { }
 #endif
 
 extern void dump_cpu_task(int cpu);
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index f33764d..1dd864d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4495,7 +4495,6 @@ decay_load_missed(unsigned long load, unsigned long 
missed_updates, int idx)
  * @this_rq: The rq to update statistics for
  * @this_load: The current load
  * @pending_updates: The number of missed updates
- * @active: !0 for NOHZ_FULL
  *
  * Update rq->cpu_load[] statistics. This function is usually called every
  * scheduler tick (TICK_NSEC).
@@ -4524,12 +4523,12 @@ decay_load_missed(unsigned long load, unsigned long 
missed_updates, int idx)
  *   load[i]_n = (1 - 1/2^i)^n * load[i]_0
  *
  * see decay_load_misses(). For NOHZ_FULL we get to subtract and add the extra
- * term. See the @active paramter.
+ * term.
  */
-static void __cpu_load_update(struct rq *this_rq, unsigned long this_load,
- unsigned long pending_updates, int active)
+static void cpu_load_update(struct rq *this_rq, unsigned long this_load,
+   unsigned long pending_updates)
 {
-   unsigned long tickless_load = active ? this_rq->cpu_load[0] : 0;
+   unsigned long tickless_load = this_rq->cpu_load[0];
int i, scale;
 
this_rq->nr_load_updates++;
@@ -4574,10 +4573,23 @@ static unsigned long weighted_cpuload(const int cpu)
 }
 
 #ifdef CONFIG_NO_HZ_COMMON
-static void __cpu_load_update_nohz(struct rq *this_rq,
-  unsigned long curr_jiffies,
-  unsigned long load,
-  int active)
+/*
+ * There is no sane way to deal with nohz on smp when using jiffies because the
+ * cpu doing the jiffies update might drift wrt the cpu doing the jiffy reading
+ * causing off-by-one errors in observed deltas; {0,2} instead of {1,1}.
+ *
+ * Therefore we need to avoid the delta approach from the regular tick when
+ * possible since that would seriously skew the load calculation. This is why 
we
+ * use cpu_load_update_periodic() for CPUs out of nohz. However we'll rely on
+ * jiffies deltas for updates happening while in nohz mode (idle ticks, idle
+ * loop exit, nohz_idle_balance, nohz full exit...)
+ *
+ * This means we might still be one tick off for nohz periods.
+ */
+
+static void 

[PATCH 0/3] sched: Fix/improve nohz cpu load updates v2

2016-04-07 Thread Frederic Weisbecker
This series tries to fix issues against CFS cpu load accounting in
nohz configs (both idle and full nohz). Some optimizations coming along.

Following Peterz reviews, I've tried to improve the changelogs. Comments
have been updated as well and some changes have been refactored.

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
sched/nohz-v2

HEAD: 55cb7c1a25acb140253ca4e55f8d43cd31404b55

Thanks,
Frederic
---

Frederic Weisbecker (3):
  sched: Gather cpu load functions under a more conventional namespace
  sched: Correctly handle nohz ticks cpu load accounting
  sched: Optimize !CONFIG_NO_HZ_COMMON cpu load updates


 Documentation/trace/ftrace.txt |  10 +--
 include/linux/sched.h  |   6 +-
 kernel/sched/core.c|   5 +-
 kernel/sched/fair.c| 146 +++--
 kernel/sched/sched.h   |  10 +--
 kernel/time/tick-sched.c   |   9 +--
 6 files changed, 120 insertions(+), 66 deletions(-)


Re: [PATCH v2 net-next 00/10] allow bpf attach to tracepoints

2016-04-07 Thread David Miller

Series applied, thanks Alexei.


Re: [PATCH v2 net-next 00/10] allow bpf attach to tracepoints

2016-04-07 Thread David Miller

Series applied, thanks Alexei.


Re: [PATCH v11 44/60] PCI: Add alt_size ressource allocation support

2016-04-07 Thread Linus Torvalds
On Thu, Apr 7, 2016 at 5:15 PM, Yinghai Lu  wrote:
> On system with several pcie switches, BIOS allocate very tight resources
> to the bridge bar, and it is not aligned to min_align as kernel allocation
> code.

Ok, this came in after I already replied to the other ones.

I'm not excited about the whole "alternate aligment".

Maybe the kernel should just accept the smaller alignment. If the
minimum alignment we use is bigger than necessary, then we're just
wrong about it, and perhaps we should just use the smaller alignment
that the bios used.

So instead of adding this notion of alternate alignment, maybe we
should just not be so uptight about our own minimum alignment
requirements?

  Linus


Re: [PATCH v11 44/60] PCI: Add alt_size ressource allocation support

2016-04-07 Thread Linus Torvalds
On Thu, Apr 7, 2016 at 5:15 PM, Yinghai Lu  wrote:
> On system with several pcie switches, BIOS allocate very tight resources
> to the bridge bar, and it is not aligned to min_align as kernel allocation
> code.

Ok, this came in after I already replied to the other ones.

I'm not excited about the whole "alternate aligment".

Maybe the kernel should just accept the smaller alignment. If the
minimum alignment we use is bigger than necessary, then we're just
wrong about it, and perhaps we should just use the smaller alignment
that the bios used.

So instead of adding this notion of alternate alignment, maybe we
should just not be so uptight about our own minimum alignment
requirements?

  Linus


Re: [PATCH V3] net: emac: emac gigabit ethernet controller driver

2016-04-07 Thread Andrew Lunn
On Thu, Apr 07, 2016 at 04:43:47PM -0500, Timur Tabi wrote:
> On my platform, firmware (UEFI) configures all of the GPIOs.  I need
> to get confirmation, but it appears that we don't actually make any
> GPIO calls at all.  I see code that looks like this:
> 
>   for (i = 0; (!adpt->no_mdio_gpio) && i < EMAC_NUM_GPIO; i++) {
>   gpio_info = >gpio_info[i];
>   retval = of_get_named_gpio(node, gpio_info->name, 0);
>   if (retval < 0)
>   return retval;
> 
> And on our ACPI system, adpt->no_mdio_gpio is always true:
> 
>   /* Assume GPIOs required for MDC/MDIO are enabled in firmware */
>   adpt->no_mdio_gpio = true;

There are two different things here. One is configuring the pin to be
a GPIO. The second is using the GPIO as a GPIO. In this case,
bit-banging the MDIO bus.

The firmware could be doing the configuration, setting the pin as a
GPIO. However, the firmware cannot be doing the MDIO bit-banging to
make an MDIO bus available. Linux has to do that.

Or it could be we have all completely misunderstood the hardware, and
we are not doing bit-banging GPIO MDIO. There is a real MDIO
controller there, we don't use these pins as GPIOs, etc

   Andrew


Re: [PATCH V3] net: emac: emac gigabit ethernet controller driver

2016-04-07 Thread Andrew Lunn
On Thu, Apr 07, 2016 at 04:43:47PM -0500, Timur Tabi wrote:
> On my platform, firmware (UEFI) configures all of the GPIOs.  I need
> to get confirmation, but it appears that we don't actually make any
> GPIO calls at all.  I see code that looks like this:
> 
>   for (i = 0; (!adpt->no_mdio_gpio) && i < EMAC_NUM_GPIO; i++) {
>   gpio_info = >gpio_info[i];
>   retval = of_get_named_gpio(node, gpio_info->name, 0);
>   if (retval < 0)
>   return retval;
> 
> And on our ACPI system, adpt->no_mdio_gpio is always true:
> 
>   /* Assume GPIOs required for MDC/MDIO are enabled in firmware */
>   adpt->no_mdio_gpio = true;

There are two different things here. One is configuring the pin to be
a GPIO. The second is using the GPIO as a GPIO. In this case,
bit-banging the MDIO bus.

The firmware could be doing the configuration, setting the pin as a
GPIO. However, the firmware cannot be doing the MDIO bit-banging to
make an MDIO bus available. Linux has to do that.

Or it could be we have all completely misunderstood the hardware, and
we are not doing bit-banging GPIO MDIO. There is a real MDIO
controller there, we don't use these pins as GPIOs, etc

   Andrew


Re: [PATCH v11 00/60] PCI: Resource allocation cleanup for v4.7

2016-04-07 Thread Linus Torvalds
On Thu, Apr 7, 2016 at 5:15 PM, Yinghai Lu  wrote:
>
> After 5b28541552ef (PCI: Restrict 64-bit prefetchable bridge windows
> to 64-bit resources), we have several reports on resource allocation
> failure, and we try to fix the problem with resource clip, and find
> more problems.

Quite frankly, that commit 5b28541552ef is two years old by now, and
went into 3.16.

So whatever problems it caused are kind of water under the bridge.

It worries me a *lot* when there is then a 60-patch series to fix
those alleged problems, because my natural worry ends up being that
the series is as likely to introduce new issues as it is to fix
something that clearly people have been living with for a while now.

I'm not saying that this series is bad, but I *am* saying that at this
point, I'd much rather see this be done as much smaller series, and
merged slowly and carefully. I'm not seeing a lot of people reviewing
the code, but even *with* code review, things like "let's start
allocating from the top of the resource" tends to make me really
really nervous. Because those kinds of patches tend to show problems
even if the code itself was perfectly bug-free, just because it
changes some layout issue and hits some hardware weakness or
undocumented resource allocation issue.

Using tricks like a __weak pcibios_align_end_resource() to only
trigger on one single architecture (despite the naming) makes those
things even subtler.

So please, try to split this up sanely, and let's merge it in sane
pieces. I see that you have that M7101 quirk removal randomly in the
middle of this series, for example, and it doesn't seem to be the only
such random patch. That's entirely independent of all the other
patches in the series (and I thought I acked it already, but
whatever).

Put another way: this is less of a real targeted series, and more of a
random collection of patches. Very few people have the background to
review them, and basically nobody has the ability to test them
(although _individual_ parts of it are obviously testable).

I'd realyl like to see the misc per-device ones separated, for
example. Same for the pure cleanup ones that obviously don't change
any actual semantics. There's a number of those there. And then the
ones that do change real and generic pci allocation code need to be
done in smaller batches so that you don't have "everything changes at
once".

I tried to scan the patches, and I didn't find anything actually
_wrong_. Several looked like "that's an obvious improvement". But
several looked fairly complex, and all _together_ just looked pretty
scary.

  Linus


Re: [PATCH v11 00/60] PCI: Resource allocation cleanup for v4.7

2016-04-07 Thread Linus Torvalds
On Thu, Apr 7, 2016 at 5:15 PM, Yinghai Lu  wrote:
>
> After 5b28541552ef (PCI: Restrict 64-bit prefetchable bridge windows
> to 64-bit resources), we have several reports on resource allocation
> failure, and we try to fix the problem with resource clip, and find
> more problems.

Quite frankly, that commit 5b28541552ef is two years old by now, and
went into 3.16.

So whatever problems it caused are kind of water under the bridge.

It worries me a *lot* when there is then a 60-patch series to fix
those alleged problems, because my natural worry ends up being that
the series is as likely to introduce new issues as it is to fix
something that clearly people have been living with for a while now.

I'm not saying that this series is bad, but I *am* saying that at this
point, I'd much rather see this be done as much smaller series, and
merged slowly and carefully. I'm not seeing a lot of people reviewing
the code, but even *with* code review, things like "let's start
allocating from the top of the resource" tends to make me really
really nervous. Because those kinds of patches tend to show problems
even if the code itself was perfectly bug-free, just because it
changes some layout issue and hits some hardware weakness or
undocumented resource allocation issue.

Using tricks like a __weak pcibios_align_end_resource() to only
trigger on one single architecture (despite the naming) makes those
things even subtler.

So please, try to split this up sanely, and let's merge it in sane
pieces. I see that you have that M7101 quirk removal randomly in the
middle of this series, for example, and it doesn't seem to be the only
such random patch. That's entirely independent of all the other
patches in the series (and I thought I acked it already, but
whatever).

Put another way: this is less of a real targeted series, and more of a
random collection of patches. Very few people have the background to
review them, and basically nobody has the ability to test them
(although _individual_ parts of it are obviously testable).

I'd realyl like to see the misc per-device ones separated, for
example. Same for the pure cleanup ones that obviously don't change
any actual semantics. There's a number of those there. And then the
ones that do change real and generic pci allocation code need to be
done in smaller batches so that you don't have "everything changes at
once".

I tried to scan the patches, and I didn't find anything actually
_wrong_. Several looked like "that's an obvious improvement". But
several looked fairly complex, and all _together_ just looked pretty
scary.

  Linus


Re: [PATCH 04/10] iio: stx104: Change STX104 dependency to ISA_BUS

2016-04-07 Thread Guenter Roeck
On Thu, Apr 07, 2016 at 10:47:25AM -0400, William Breathitt Gray wrote:
> The Apex Embedded Systems STX104 may be used on 64-bit X86 systems. This
> patch allows the Apex Embedded Systems STX104 DAC driver to be compiled
> for both 32-bit and 64-bit X86 systems.
> 
> Signed-off-by: William Breathitt Gray 
> ---
>  drivers/iio/dac/Kconfig | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/iio/dac/Kconfig b/drivers/iio/dac/Kconfig
> index a995139..df4b55d 100644
> --- a/drivers/iio/dac/Kconfig
> +++ b/drivers/iio/dac/Kconfig
> @@ -210,7 +210,7 @@ config MCP4922
>  
>  config STX104
>   tristate "Apex Embedded Systems STX104 DAC driver"
> - depends on ISA
> + depends on X86 && ISA_BUS

This means for this and other similar drivers that the driver is no longer
supported on architectures which support ISA but not the newly introduced
ISA_BUS. Affected architectures are alpha, arm, m32r, m68k, mips, powerpc,
and parisc.

A typical example is SCSI_AHA1542, which is no longer supported on those
architectures. It builds, but isa_register_driver() will be a dummy and fail.
Actually, this is true for _all_ drivers calling isa_register_driver().

I hope this is understood and doesn't cause any problems.

Thanks,
Guenter

>   help
> Say yes here to build support for the 2-channel DAC on the Apex
> Embedded Systems STX104 integrated analog PC/104 card. The base port
> -- 
> 2.7.3
> 


Re: [PATCH 04/10] iio: stx104: Change STX104 dependency to ISA_BUS

2016-04-07 Thread Guenter Roeck
On Thu, Apr 07, 2016 at 10:47:25AM -0400, William Breathitt Gray wrote:
> The Apex Embedded Systems STX104 may be used on 64-bit X86 systems. This
> patch allows the Apex Embedded Systems STX104 DAC driver to be compiled
> for both 32-bit and 64-bit X86 systems.
> 
> Signed-off-by: William Breathitt Gray 
> ---
>  drivers/iio/dac/Kconfig | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/iio/dac/Kconfig b/drivers/iio/dac/Kconfig
> index a995139..df4b55d 100644
> --- a/drivers/iio/dac/Kconfig
> +++ b/drivers/iio/dac/Kconfig
> @@ -210,7 +210,7 @@ config MCP4922
>  
>  config STX104
>   tristate "Apex Embedded Systems STX104 DAC driver"
> - depends on ISA
> + depends on X86 && ISA_BUS

This means for this and other similar drivers that the driver is no longer
supported on architectures which support ISA but not the newly introduced
ISA_BUS. Affected architectures are alpha, arm, m32r, m68k, mips, powerpc,
and parisc.

A typical example is SCSI_AHA1542, which is no longer supported on those
architectures. It builds, but isa_register_driver() will be a dummy and fail.
Actually, this is true for _all_ drivers calling isa_register_driver().

I hope this is understood and doesn't cause any problems.

Thanks,
Guenter

>   help
> Say yes here to build support for the 2-channel DAC on the Apex
> Embedded Systems STX104 integrated analog PC/104 card. The base port
> -- 
> 2.7.3
> 


[PATCH v11 05/60] sparc/PCI: Reserve legacy mmio after PCI mmio

2016-04-07 Thread Yinghai Lu
On one system found bunch of claim resource fail from pci device.
pci_sun4v f02b894c: PCI host bridge to bus :00
pci_bus :00: root bus resource [io  0x2007e-0x2007e0fff] (bus 
address [0x-0xfff])
pci_bus :00: root bus resource [mem 0x2-0x27eff] (bus 
address [0x-0x7eff])
pci_bus :00: root bus resource [mem 0x20001-0x20007] (bus 
address [0x1-0x7])
...
PCI: Claiming :00:02.0: Resource 14: 0002..0002004f 
[200]
pci :00:02.0: can't claim BAR 14 [mem 0x2-0x2004f]: 
address conflict with Video RAM area [??? 0x2000a-0x2000b flags 
0x8000]
pci :02:00.0: can't claim BAR 0 [mem 0x2-0x2000f]: no 
compatible bridge window
PCI: Claiming :02:00.0: Resource 3: 00020010..000200103fff [200]
pci :02:00.0: can't claim BAR 3 [mem 0x20010-0x200103fff]: no 
compatible bridge window
PCI: Claiming :02:00.1: Resource 0: 00020020..0002002f [200]
pci :02:00.1: can't claim BAR 0 [mem 0x20020-0x2002f]: no 
compatible bridge window
PCI: Claiming :02:00.1: Resource 3: 000200104000..000200107fff [200]
pci :02:00.1: can't claim BAR 3 [mem 0x200104000-0x200107fff]: no 
compatible bridge window
PCI: Claiming :02:00.2: Resource 0: 00020030..0002003f [200]
pci :02:00.2: can't claim BAR 0 [mem 0x20030-0x2003f]: no 
compatible bridge window
PCI: Claiming :02:00.2: Resource 3: 000200108000..00020010bfff [200]
pci :02:00.2: can't claim BAR 3 [mem 0x200108000-0x20010bfff]: no 
compatible bridge window
PCI: Claiming :02:00.3: Resource 0: 00020040..0002004f [200]
pci :02:00.3: can't claim BAR 0 [mem 0x20040-0x2004f]: no 
compatible bridge window
PCI: Claiming :02:00.3: Resource 3: 00020010c000..00020010 [200]
pci :02:00.3: can't claim BAR 3 [mem 0x20010c000-0x20010]: no 
compatible bridge window

The bridge 00:02.0 resource does not get reserved as Video RAM take the 
position early,
and following children resources reservation all fail.

Move down Video RAM area reservation after pci mmio get reserved,
so we leave pci driver to use those regions.

-v5: merge simplify one and use pcibios_bus_to_resource()

-v6: use pci_find_bus_resource()

Signed-off-by: Yinghai Lu 
Tested-by: Khalid Aziz 
---
 arch/sparc/kernel/pci.c|  1 +
 arch/sparc/kernel/pci_common.c | 59 ++
 arch/sparc/kernel/pci_impl.h   |  1 +
 3 files changed, 33 insertions(+), 28 deletions(-)

diff --git a/arch/sparc/kernel/pci.c b/arch/sparc/kernel/pci.c
index 4606dc1..9c6daad 100644
--- a/arch/sparc/kernel/pci.c
+++ b/arch/sparc/kernel/pci.c
@@ -677,6 +677,7 @@ struct pci_bus *pci_scan_one_pbm(struct pci_pbm_info *pbm,
pci_bus_register_of_sysfs(bus);
 
pci_claim_bus_resources(bus);
+   pci_register_legacy_regions(bus);
pci_bus_add_devices(bus);
return bus;
 }
diff --git a/arch/sparc/kernel/pci_common.c b/arch/sparc/kernel/pci_common.c
index 76998f8..1ebc7ff 100644
--- a/arch/sparc/kernel/pci_common.c
+++ b/arch/sparc/kernel/pci_common.c
@@ -328,41 +328,46 @@ void pci_get_pbm_props(struct pci_pbm_info *pbm)
}
 }
 
-static void pci_register_legacy_regions(struct resource *io_res,
-   struct resource *mem_res)
+static void pci_register_region(struct pci_bus *bus, const char *name,
+   resource_size_t rstart, resource_size_t size)
 {
-   struct resource *p;
+   struct resource *res, *conflict, *bus_res;
+   struct pci_bus_region region;
 
-   /* VGA Video RAM. */
-   p = kzalloc(sizeof(*p), GFP_KERNEL);
-   if (!p)
+   res = kzalloc(sizeof(*res), GFP_KERNEL);
+   if (!res)
return;
 
-   p->name = "Video RAM area";
-   p->start = mem_res->start + 0xaUL;
-   p->end = p->start + 0x1UL;
-   p->flags = IORESOURCE_BUSY;
-   request_resource(mem_res, p);
+   res->flags = IORESOURCE_MEM;
 
-   p = kzalloc(sizeof(*p), GFP_KERNEL);
-   if (!p)
+   region.start = rstart;
+   region.end = rstart + size - 1UL;
+   pcibios_bus_to_resource(bus, res, );
+   bus_res = pci_find_bus_resource(bus, res);
+   if (!bus_res) {
+   kfree(res);
return;
+   }
+
+   res->name = name;
+   res->flags |= IORESOURCE_BUSY;
+   conflict = request_resource_conflict(bus_res, res);
+   if (conflict) {
+   dev_printk(KERN_DEBUG, >dev,
+   " can't claim %s %pR: address conflict with %s %pR\n",
+   res->name, res, conflict->name, conflict);
+   kfree(res);
+   }
+}
 
-   p->name = "System 

[PATCH v11 05/60] sparc/PCI: Reserve legacy mmio after PCI mmio

2016-04-07 Thread Yinghai Lu
On one system found bunch of claim resource fail from pci device.
pci_sun4v f02b894c: PCI host bridge to bus :00
pci_bus :00: root bus resource [io  0x2007e-0x2007e0fff] (bus 
address [0x-0xfff])
pci_bus :00: root bus resource [mem 0x2-0x27eff] (bus 
address [0x-0x7eff])
pci_bus :00: root bus resource [mem 0x20001-0x20007] (bus 
address [0x1-0x7])
...
PCI: Claiming :00:02.0: Resource 14: 0002..0002004f 
[200]
pci :00:02.0: can't claim BAR 14 [mem 0x2-0x2004f]: 
address conflict with Video RAM area [??? 0x2000a-0x2000b flags 
0x8000]
pci :02:00.0: can't claim BAR 0 [mem 0x2-0x2000f]: no 
compatible bridge window
PCI: Claiming :02:00.0: Resource 3: 00020010..000200103fff [200]
pci :02:00.0: can't claim BAR 3 [mem 0x20010-0x200103fff]: no 
compatible bridge window
PCI: Claiming :02:00.1: Resource 0: 00020020..0002002f [200]
pci :02:00.1: can't claim BAR 0 [mem 0x20020-0x2002f]: no 
compatible bridge window
PCI: Claiming :02:00.1: Resource 3: 000200104000..000200107fff [200]
pci :02:00.1: can't claim BAR 3 [mem 0x200104000-0x200107fff]: no 
compatible bridge window
PCI: Claiming :02:00.2: Resource 0: 00020030..0002003f [200]
pci :02:00.2: can't claim BAR 0 [mem 0x20030-0x2003f]: no 
compatible bridge window
PCI: Claiming :02:00.2: Resource 3: 000200108000..00020010bfff [200]
pci :02:00.2: can't claim BAR 3 [mem 0x200108000-0x20010bfff]: no 
compatible bridge window
PCI: Claiming :02:00.3: Resource 0: 00020040..0002004f [200]
pci :02:00.3: can't claim BAR 0 [mem 0x20040-0x2004f]: no 
compatible bridge window
PCI: Claiming :02:00.3: Resource 3: 00020010c000..00020010 [200]
pci :02:00.3: can't claim BAR 3 [mem 0x20010c000-0x20010]: no 
compatible bridge window

The bridge 00:02.0 resource does not get reserved as Video RAM take the 
position early,
and following children resources reservation all fail.

Move down Video RAM area reservation after pci mmio get reserved,
so we leave pci driver to use those regions.

-v5: merge simplify one and use pcibios_bus_to_resource()

-v6: use pci_find_bus_resource()

Signed-off-by: Yinghai Lu 
Tested-by: Khalid Aziz 
---
 arch/sparc/kernel/pci.c|  1 +
 arch/sparc/kernel/pci_common.c | 59 ++
 arch/sparc/kernel/pci_impl.h   |  1 +
 3 files changed, 33 insertions(+), 28 deletions(-)

diff --git a/arch/sparc/kernel/pci.c b/arch/sparc/kernel/pci.c
index 4606dc1..9c6daad 100644
--- a/arch/sparc/kernel/pci.c
+++ b/arch/sparc/kernel/pci.c
@@ -677,6 +677,7 @@ struct pci_bus *pci_scan_one_pbm(struct pci_pbm_info *pbm,
pci_bus_register_of_sysfs(bus);
 
pci_claim_bus_resources(bus);
+   pci_register_legacy_regions(bus);
pci_bus_add_devices(bus);
return bus;
 }
diff --git a/arch/sparc/kernel/pci_common.c b/arch/sparc/kernel/pci_common.c
index 76998f8..1ebc7ff 100644
--- a/arch/sparc/kernel/pci_common.c
+++ b/arch/sparc/kernel/pci_common.c
@@ -328,41 +328,46 @@ void pci_get_pbm_props(struct pci_pbm_info *pbm)
}
 }
 
-static void pci_register_legacy_regions(struct resource *io_res,
-   struct resource *mem_res)
+static void pci_register_region(struct pci_bus *bus, const char *name,
+   resource_size_t rstart, resource_size_t size)
 {
-   struct resource *p;
+   struct resource *res, *conflict, *bus_res;
+   struct pci_bus_region region;
 
-   /* VGA Video RAM. */
-   p = kzalloc(sizeof(*p), GFP_KERNEL);
-   if (!p)
+   res = kzalloc(sizeof(*res), GFP_KERNEL);
+   if (!res)
return;
 
-   p->name = "Video RAM area";
-   p->start = mem_res->start + 0xaUL;
-   p->end = p->start + 0x1UL;
-   p->flags = IORESOURCE_BUSY;
-   request_resource(mem_res, p);
+   res->flags = IORESOURCE_MEM;
 
-   p = kzalloc(sizeof(*p), GFP_KERNEL);
-   if (!p)
+   region.start = rstart;
+   region.end = rstart + size - 1UL;
+   pcibios_bus_to_resource(bus, res, );
+   bus_res = pci_find_bus_resource(bus, res);
+   if (!bus_res) {
+   kfree(res);
return;
+   }
+
+   res->name = name;
+   res->flags |= IORESOURCE_BUSY;
+   conflict = request_resource_conflict(bus_res, res);
+   if (conflict) {
+   dev_printk(KERN_DEBUG, >dev,
+   " can't claim %s %pR: address conflict with %s %pR\n",
+   res->name, res, conflict->name, conflict);
+   kfree(res);
+   }
+}
 
-   p->name = "System ROM";
-   p->start = mem_res->start + 

Re: [PATCH 06/10] watchdog: ebc-c384_wdt: Utilize the ISA bus driver

2016-04-07 Thread Guenter Roeck
On Thu, Apr 07, 2016 at 10:47:27AM -0400, William Breathitt Gray wrote:
> The WinSystems EBC-C384 watchdog timer is controlled via ISA bus
> communication. As such, the ISA bus driver is more appropriate than the
> platform driver for the WinSystems EBC-C384 watchdog timer driver.
> 
> Signed-off-by: William Breathitt Gray 
> ---
>  drivers/watchdog/Kconfig|  2 +-
>  drivers/watchdog/ebc-c384_wdt.c | 43 
> ++---
>  2 files changed, 11 insertions(+), 34 deletions(-)
> 
> diff --git a/drivers/watchdog/Kconfig b/drivers/watchdog/Kconfig
> index fb94765..b10761d 100644
> --- a/drivers/watchdog/Kconfig
> +++ b/drivers/watchdog/Kconfig
> @@ -738,7 +738,7 @@ config ALIM7101_WDT
>  
>  config EBC_C384_WDT
>   tristate "WinSystems EBC-C384 Watchdog Timer"
> - depends on X86
> + depends on X86 && ISA_BUS

I am a bit concerend that the newly introduced ISA_BUS is not automatically
enabled. Effectively this means that all drivers depending on it will
be disabled until someone enables ISA_BUS in the distribution.

Is this a concern for anyone but me ?

Anyway, since you are the driver maintainer, I assume that you are ok
with it, so

Acked-by: Guenter Roeck 

Side note for Wim: ISA_BUS was introduced with commit b3c1be1b789c
("base: isa: Remove X86_32 dependency") in -next.

Guenter

>   select WATCHDOG_CORE
>   help
> Enables watchdog timer support for the watchdog timer on the
> diff --git a/drivers/watchdog/ebc-c384_wdt.c b/drivers/watchdog/ebc-c384_wdt.c
> index 77fda0b..4b849b8 100644
> --- a/drivers/watchdog/ebc-c384_wdt.c
> +++ b/drivers/watchdog/ebc-c384_wdt.c
> @@ -16,10 +16,10 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> -#include 
>  #include 
>  #include 
>  
> @@ -95,9 +95,8 @@ static const struct watchdog_info ebc_c384_wdt_info = {
>   .identity = MODULE_NAME
>  };
>  
> -static int __init ebc_c384_wdt_probe(struct platform_device *pdev)
> +static int ebc_c384_wdt_probe(struct device *dev, unsigned int id)
>  {
> - struct device *dev = >dev;
>   struct watchdog_device *wdd;
>  
>   if (!devm_request_region(dev, BASE_ADDR, ADDR_EXTENT, dev_name(dev))) {
> @@ -122,61 +121,39 @@ static int __init ebc_c384_wdt_probe(struct 
> platform_device *pdev)
>   dev_warn(dev, "Invalid timeout (%u seconds), using default (%u 
> seconds)\n",
>   timeout, WATCHDOG_TIMEOUT);
>  
> - platform_set_drvdata(pdev, wdd);
> + dev_set_drvdata(dev, wdd);
>  
>   return watchdog_register_device(wdd);
>  }
>  
> -static int ebc_c384_wdt_remove(struct platform_device *pdev)
> +static int ebc_c384_wdt_remove(struct device *dev, unsigned int id)
>  {
> - struct watchdog_device *wdd = platform_get_drvdata(pdev);
> + struct watchdog_device *wdd = dev_get_drvdata(dev);
>  
>   watchdog_unregister_device(wdd);
>  
>   return 0;
>  }
>  
> -static struct platform_driver ebc_c384_wdt_driver = {
> +static struct isa_driver ebc_c384_wdt_driver = {
> + .probe = ebc_c384_wdt_probe,
>   .driver = {
>   .name = MODULE_NAME
>   },
>   .remove = ebc_c384_wdt_remove
>  };
>  
> -static struct platform_device *ebc_c384_wdt_device;
> -
>  static int __init ebc_c384_wdt_init(void)
>  {
> - int err;
> -
>   if (!dmi_match(DMI_BOARD_NAME, "EBC-C384 SBC"))
>   return -ENODEV;
>  
> - ebc_c384_wdt_device = platform_device_alloc(MODULE_NAME, -1);
> - if (!ebc_c384_wdt_device)
> - return -ENOMEM;
> -
> - err = platform_device_add(ebc_c384_wdt_device);
> - if (err)
> - goto err_platform_device;
> -
> - err = platform_driver_probe(_c384_wdt_driver, ebc_c384_wdt_probe);
> - if (err)
> - goto err_platform_driver;
> -
> - return 0;
> -
> -err_platform_driver:
> - platform_device_del(ebc_c384_wdt_device);
> -err_platform_device:
> - platform_device_put(ebc_c384_wdt_device);
> - return err;
> + return isa_register_driver(_c384_wdt_driver, 1);
>  }
>  
>  static void __exit ebc_c384_wdt_exit(void)
>  {
> - platform_device_unregister(ebc_c384_wdt_device);
> - platform_driver_unregister(_c384_wdt_driver);
> + isa_unregister_driver(_c384_wdt_driver);
>  }
>  
>  module_init(ebc_c384_wdt_init);
> @@ -185,4 +162,4 @@ module_exit(ebc_c384_wdt_exit);
>  MODULE_AUTHOR("William Breathitt Gray ");
>  MODULE_DESCRIPTION("WinSystems EBC-C384 watchdog timer driver");
>  MODULE_LICENSE("GPL v2");
> -MODULE_ALIAS("platform:" MODULE_NAME);
> +MODULE_ALIAS("isa:" MODULE_NAME);
> -- 
> 2.7.3
> 


[PATCH v11 44/60] PCI: Add alt_size ressource allocation support

2016-04-07 Thread Yinghai Lu
On system with several pcie switches, BIOS allocate very tight resources
to the bridge bar, and it is not aligned to min_align as kernel allocation
code.

For example:
  02:03.0---0c:00.0---0d:04.0---18:00.0

  18:00.0 need 0x1000, and 0x0001.
BIOS only allocate 0x1010 to 0d:04.0 and above bridges.
Later after using /sys/bus/pci/devices/:0c:00.0/remove to remove 0c:00.0,
rescan with /sys/bus/pci/rescan can not allocate 0x1800 to 0c:00.0.
as current min_align solution will need 0x1800.

Another example:
  00:1c.0---02:00.0---03:01.0---04:00.0---05:19.0---06:00.0

  06:00.0 need 0x400 and 0x80.
BIOS only allocate 0x480 to 05:19.0 and 04:00.0.
when 05:19.0 get removed via /sys/bus/pci/devices/:05:19.0/remove,
rescan with /sys/bus/pci/rescan will fail.
 pci :05:19.0: BAR 14: no space for [mem size 0x0600]
 pci :05:19.0: BAR 14: failed to assign [mem size 0x0600]
 pci :06:00.0: BAR 2: no space for [mem size 0x0400 64bit]
 pci :06:00.0: BAR 2: failed to assign [mem size 0x0400 64bit]
 pci :06:00.0: BAR 0: no space for [mem size 0x0080]
 pci :06:00.0: BAR 0: failed to assign [mem size 0x0080]
current code try to use align 0x200 and size 0x600, but parent
bridge only have 0x480.

Introduce alt_align/alt_size and store them in realloc list in addition
to addon info, and will try it after min_align/min_size allocation fails.

The alt_align is max_align, and alt_size is aligned size with bridge
minimum window alignment.

On my test setup:
  00:1c.7---61:00.0---62:00.0

  62:00.0 needs 0x80 and 0x2, and 00:1c.7 only have 9M allocated
for mmio, with this patch we have

 pci :61:00.0: bridge window [mem 0x0040-0x00ff] to [bus 62]
   add_size 0 add_align 0 alt_size 90 alt_align 80
   req_size c0 req_align 40
 pci :61:00.0: BAR 14: no space for [mem size 0x00c0]
 pci :61:00.0: BAR 14: failed to assign [mem size 0x00c0]
 pci :61:00.0: BAR 14: assigned [mem 0xdf00-0xdf8f]
 pci :62:00.0: BAR 0: assigned [mem 0xdf00-0xdf7f pref]
 pci :62:00.0: BAR 1: assigned [mem 0xdf80-0xdf81]
 pci :61:00.0: PCI bridge to [bus 62]
 pci :61:00.0:   bridge window [io  0x6000-0x6fff]
 pci :61:00.0:   bridge window [mem 0xdf00-0xdf8f]
 pci :00:1c.7: PCI bridge to [bus 61-68]
 pci :00:1c.7:   bridge window [io  0x6000-0x6fff]
 pci :00:1c.7:   bridge window [mem 0xdf00-0xdf8f]

So for 61:00.0 first try with 12M fails, and second try with 9M the
alt_size works. Later 62:00.0 get correct resource allocated too.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=100451
Reported-by: Yijing Wang 
Tested-by: Yijing Wang 
Signed-off-by: Yinghai Lu 
---
 drivers/pci/setup-bus.c | 203 +---
 1 file changed, 191 insertions(+), 12 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 6c58b4a..f3e9873 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -322,7 +322,7 @@ static void reassign_resources_sorted(struct list_head 
*realloc_head,
 {
struct resource *res;
struct pci_dev_resource *add_res, *tmp;
-   resource_size_t add_size, align;
+   resource_size_t add_size, align, r_size;
int idx;
 
list_for_each_entry_safe(add_res, tmp, realloc_head, list) {
@@ -338,12 +338,23 @@ static void reassign_resources_sorted(struct list_head 
*realloc_head,
idx = res - _res->dev->resource[0];
add_size = add_res->add_size;
align = add_res->min_align;
-   if (!resource_size(res)) {
+   if (!add_size || !align) /* alt_size only */
+   goto out;
+
+   r_size = resource_size(res);
+   if (!r_size) {
res->start = align;
res->end = res->start + add_size - 1;
if (pci_assign_resource(add_res->dev, idx))
reset_resource(res);
} else {
+   /* could just assigned with alt, add difference ? */
+   resource_size_t size;
+
+   size = add_res->end - add_res->start + 1;
+   if (r_size < size)
+   add_size += size - r_size;
+
res->flags |= add_res->flags &
 (IORESOURCE_STARTALIGN|IORESOURCE_SIZEALIGN);
if (pci_reassign_resource(add_res->dev, idx,
@@ -582,6 +593,104 @@ static bool 
__assign_resources_required_optional_sorted(struct list_head *head,
return false;
 }
 
+static bool __has_alt(struct list_head *head,
+   struct list_head *realloc_head)
+{
+   int alt_count = 0;
+   struct pci_dev_resource *dev_res, *alt_res;
+
+   

Re: [PATCH 06/10] watchdog: ebc-c384_wdt: Utilize the ISA bus driver

2016-04-07 Thread Guenter Roeck
On Thu, Apr 07, 2016 at 10:47:27AM -0400, William Breathitt Gray wrote:
> The WinSystems EBC-C384 watchdog timer is controlled via ISA bus
> communication. As such, the ISA bus driver is more appropriate than the
> platform driver for the WinSystems EBC-C384 watchdog timer driver.
> 
> Signed-off-by: William Breathitt Gray 
> ---
>  drivers/watchdog/Kconfig|  2 +-
>  drivers/watchdog/ebc-c384_wdt.c | 43 
> ++---
>  2 files changed, 11 insertions(+), 34 deletions(-)
> 
> diff --git a/drivers/watchdog/Kconfig b/drivers/watchdog/Kconfig
> index fb94765..b10761d 100644
> --- a/drivers/watchdog/Kconfig
> +++ b/drivers/watchdog/Kconfig
> @@ -738,7 +738,7 @@ config ALIM7101_WDT
>  
>  config EBC_C384_WDT
>   tristate "WinSystems EBC-C384 Watchdog Timer"
> - depends on X86
> + depends on X86 && ISA_BUS

I am a bit concerend that the newly introduced ISA_BUS is not automatically
enabled. Effectively this means that all drivers depending on it will
be disabled until someone enables ISA_BUS in the distribution.

Is this a concern for anyone but me ?

Anyway, since you are the driver maintainer, I assume that you are ok
with it, so

Acked-by: Guenter Roeck 

Side note for Wim: ISA_BUS was introduced with commit b3c1be1b789c
("base: isa: Remove X86_32 dependency") in -next.

Guenter

>   select WATCHDOG_CORE
>   help
> Enables watchdog timer support for the watchdog timer on the
> diff --git a/drivers/watchdog/ebc-c384_wdt.c b/drivers/watchdog/ebc-c384_wdt.c
> index 77fda0b..4b849b8 100644
> --- a/drivers/watchdog/ebc-c384_wdt.c
> +++ b/drivers/watchdog/ebc-c384_wdt.c
> @@ -16,10 +16,10 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> -#include 
>  #include 
>  #include 
>  
> @@ -95,9 +95,8 @@ static const struct watchdog_info ebc_c384_wdt_info = {
>   .identity = MODULE_NAME
>  };
>  
> -static int __init ebc_c384_wdt_probe(struct platform_device *pdev)
> +static int ebc_c384_wdt_probe(struct device *dev, unsigned int id)
>  {
> - struct device *dev = >dev;
>   struct watchdog_device *wdd;
>  
>   if (!devm_request_region(dev, BASE_ADDR, ADDR_EXTENT, dev_name(dev))) {
> @@ -122,61 +121,39 @@ static int __init ebc_c384_wdt_probe(struct 
> platform_device *pdev)
>   dev_warn(dev, "Invalid timeout (%u seconds), using default (%u 
> seconds)\n",
>   timeout, WATCHDOG_TIMEOUT);
>  
> - platform_set_drvdata(pdev, wdd);
> + dev_set_drvdata(dev, wdd);
>  
>   return watchdog_register_device(wdd);
>  }
>  
> -static int ebc_c384_wdt_remove(struct platform_device *pdev)
> +static int ebc_c384_wdt_remove(struct device *dev, unsigned int id)
>  {
> - struct watchdog_device *wdd = platform_get_drvdata(pdev);
> + struct watchdog_device *wdd = dev_get_drvdata(dev);
>  
>   watchdog_unregister_device(wdd);
>  
>   return 0;
>  }
>  
> -static struct platform_driver ebc_c384_wdt_driver = {
> +static struct isa_driver ebc_c384_wdt_driver = {
> + .probe = ebc_c384_wdt_probe,
>   .driver = {
>   .name = MODULE_NAME
>   },
>   .remove = ebc_c384_wdt_remove
>  };
>  
> -static struct platform_device *ebc_c384_wdt_device;
> -
>  static int __init ebc_c384_wdt_init(void)
>  {
> - int err;
> -
>   if (!dmi_match(DMI_BOARD_NAME, "EBC-C384 SBC"))
>   return -ENODEV;
>  
> - ebc_c384_wdt_device = platform_device_alloc(MODULE_NAME, -1);
> - if (!ebc_c384_wdt_device)
> - return -ENOMEM;
> -
> - err = platform_device_add(ebc_c384_wdt_device);
> - if (err)
> - goto err_platform_device;
> -
> - err = platform_driver_probe(_c384_wdt_driver, ebc_c384_wdt_probe);
> - if (err)
> - goto err_platform_driver;
> -
> - return 0;
> -
> -err_platform_driver:
> - platform_device_del(ebc_c384_wdt_device);
> -err_platform_device:
> - platform_device_put(ebc_c384_wdt_device);
> - return err;
> + return isa_register_driver(_c384_wdt_driver, 1);
>  }
>  
>  static void __exit ebc_c384_wdt_exit(void)
>  {
> - platform_device_unregister(ebc_c384_wdt_device);
> - platform_driver_unregister(_c384_wdt_driver);
> + isa_unregister_driver(_c384_wdt_driver);
>  }
>  
>  module_init(ebc_c384_wdt_init);
> @@ -185,4 +162,4 @@ module_exit(ebc_c384_wdt_exit);
>  MODULE_AUTHOR("William Breathitt Gray ");
>  MODULE_DESCRIPTION("WinSystems EBC-C384 watchdog timer driver");
>  MODULE_LICENSE("GPL v2");
> -MODULE_ALIAS("platform:" MODULE_NAME);
> +MODULE_ALIAS("isa:" MODULE_NAME);
> -- 
> 2.7.3
> 


[PATCH v11 44/60] PCI: Add alt_size ressource allocation support

2016-04-07 Thread Yinghai Lu
On system with several pcie switches, BIOS allocate very tight resources
to the bridge bar, and it is not aligned to min_align as kernel allocation
code.

For example:
  02:03.0---0c:00.0---0d:04.0---18:00.0

  18:00.0 need 0x1000, and 0x0001.
BIOS only allocate 0x1010 to 0d:04.0 and above bridges.
Later after using /sys/bus/pci/devices/:0c:00.0/remove to remove 0c:00.0,
rescan with /sys/bus/pci/rescan can not allocate 0x1800 to 0c:00.0.
as current min_align solution will need 0x1800.

Another example:
  00:1c.0---02:00.0---03:01.0---04:00.0---05:19.0---06:00.0

  06:00.0 need 0x400 and 0x80.
BIOS only allocate 0x480 to 05:19.0 and 04:00.0.
when 05:19.0 get removed via /sys/bus/pci/devices/:05:19.0/remove,
rescan with /sys/bus/pci/rescan will fail.
 pci :05:19.0: BAR 14: no space for [mem size 0x0600]
 pci :05:19.0: BAR 14: failed to assign [mem size 0x0600]
 pci :06:00.0: BAR 2: no space for [mem size 0x0400 64bit]
 pci :06:00.0: BAR 2: failed to assign [mem size 0x0400 64bit]
 pci :06:00.0: BAR 0: no space for [mem size 0x0080]
 pci :06:00.0: BAR 0: failed to assign [mem size 0x0080]
current code try to use align 0x200 and size 0x600, but parent
bridge only have 0x480.

Introduce alt_align/alt_size and store them in realloc list in addition
to addon info, and will try it after min_align/min_size allocation fails.

The alt_align is max_align, and alt_size is aligned size with bridge
minimum window alignment.

On my test setup:
  00:1c.7---61:00.0---62:00.0

  62:00.0 needs 0x80 and 0x2, and 00:1c.7 only have 9M allocated
for mmio, with this patch we have

 pci :61:00.0: bridge window [mem 0x0040-0x00ff] to [bus 62]
   add_size 0 add_align 0 alt_size 90 alt_align 80
   req_size c0 req_align 40
 pci :61:00.0: BAR 14: no space for [mem size 0x00c0]
 pci :61:00.0: BAR 14: failed to assign [mem size 0x00c0]
 pci :61:00.0: BAR 14: assigned [mem 0xdf00-0xdf8f]
 pci :62:00.0: BAR 0: assigned [mem 0xdf00-0xdf7f pref]
 pci :62:00.0: BAR 1: assigned [mem 0xdf80-0xdf81]
 pci :61:00.0: PCI bridge to [bus 62]
 pci :61:00.0:   bridge window [io  0x6000-0x6fff]
 pci :61:00.0:   bridge window [mem 0xdf00-0xdf8f]
 pci :00:1c.7: PCI bridge to [bus 61-68]
 pci :00:1c.7:   bridge window [io  0x6000-0x6fff]
 pci :00:1c.7:   bridge window [mem 0xdf00-0xdf8f]

So for 61:00.0 first try with 12M fails, and second try with 9M the
alt_size works. Later 62:00.0 get correct resource allocated too.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=100451
Reported-by: Yijing Wang 
Tested-by: Yijing Wang 
Signed-off-by: Yinghai Lu 
---
 drivers/pci/setup-bus.c | 203 +---
 1 file changed, 191 insertions(+), 12 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 6c58b4a..f3e9873 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -322,7 +322,7 @@ static void reassign_resources_sorted(struct list_head 
*realloc_head,
 {
struct resource *res;
struct pci_dev_resource *add_res, *tmp;
-   resource_size_t add_size, align;
+   resource_size_t add_size, align, r_size;
int idx;
 
list_for_each_entry_safe(add_res, tmp, realloc_head, list) {
@@ -338,12 +338,23 @@ static void reassign_resources_sorted(struct list_head 
*realloc_head,
idx = res - _res->dev->resource[0];
add_size = add_res->add_size;
align = add_res->min_align;
-   if (!resource_size(res)) {
+   if (!add_size || !align) /* alt_size only */
+   goto out;
+
+   r_size = resource_size(res);
+   if (!r_size) {
res->start = align;
res->end = res->start + add_size - 1;
if (pci_assign_resource(add_res->dev, idx))
reset_resource(res);
} else {
+   /* could just assigned with alt, add difference ? */
+   resource_size_t size;
+
+   size = add_res->end - add_res->start + 1;
+   if (r_size < size)
+   add_size += size - r_size;
+
res->flags |= add_res->flags &
 (IORESOURCE_STARTALIGN|IORESOURCE_SIZEALIGN);
if (pci_reassign_resource(add_res->dev, idx,
@@ -582,6 +593,104 @@ static bool 
__assign_resources_required_optional_sorted(struct list_head *head,
return false;
 }
 
+static bool __has_alt(struct list_head *head,
+   struct list_head *realloc_head)
+{
+   int alt_count = 0;
+   struct pci_dev_resource *dev_res, *alt_res;
+
+   if (!realloc_head)
+   return false;
+
+   /* 

[PATCH v11 14/60] PCI: Add has_mem64 for struct host_bridge

2016-04-07 Thread Yinghai Lu
Add has_mem64 for struct host_bridge, on root bus that does not support
mmio64 above 4g, will not set that.

We will use that info next two following patches:
1. Don't treat non-pref mmio64 as pref mmio, so will not put
   it under bridge's pref range when rescan the devices
2. will keep pref mmio64 and pref mmio32 under bridge pref bar.

Signed-off-by: Yinghai Lu 
Tested-by: Khalid Aziz 
---
 drivers/pci/probe.c | 7 +++
 include/linux/pci.h | 1 +
 2 files changed, 8 insertions(+)

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 48e6f29..6b079f4 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2230,6 +2230,13 @@ struct pci_bus *pci_create_root_bus(struct device 
*parent, int bus,
} else
bus_addr[0] = '\0';
dev_info(>dev, "root bus resource %pR%s\n", res, bus_addr);
+
+   if (resource_type(res) == IORESOURCE_MEM) {
+   if ((res->end - offset) > 0x)
+   bridge->has_mem64 = 1;
+   if ((res->start - offset) > 0x)
+   res->flags |= IORESOURCE_MEM_64;
+   }
}
 
down_write(_bus_sem);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 1527735..979be25 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -408,6 +408,7 @@ struct pci_host_bridge {
void (*release_fn)(struct pci_host_bridge *);
void *release_data;
unsigned int ignore_reset_delay:1;  /* for entire hierarchy */
+   unsigned int has_mem64:1;
/* Resource alignment requirements */
resource_size_t (*align_resource)(struct pci_dev *dev,
const struct resource *res,
-- 
1.8.4.5



[PATCH v11 14/60] PCI: Add has_mem64 for struct host_bridge

2016-04-07 Thread Yinghai Lu
Add has_mem64 for struct host_bridge, on root bus that does not support
mmio64 above 4g, will not set that.

We will use that info next two following patches:
1. Don't treat non-pref mmio64 as pref mmio, so will not put
   it under bridge's pref range when rescan the devices
2. will keep pref mmio64 and pref mmio32 under bridge pref bar.

Signed-off-by: Yinghai Lu 
Tested-by: Khalid Aziz 
---
 drivers/pci/probe.c | 7 +++
 include/linux/pci.h | 1 +
 2 files changed, 8 insertions(+)

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 48e6f29..6b079f4 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2230,6 +2230,13 @@ struct pci_bus *pci_create_root_bus(struct device 
*parent, int bus,
} else
bus_addr[0] = '\0';
dev_info(>dev, "root bus resource %pR%s\n", res, bus_addr);
+
+   if (resource_type(res) == IORESOURCE_MEM) {
+   if ((res->end - offset) > 0x)
+   bridge->has_mem64 = 1;
+   if ((res->start - offset) > 0x)
+   res->flags |= IORESOURCE_MEM_64;
+   }
}
 
down_write(_bus_sem);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 1527735..979be25 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -408,6 +408,7 @@ struct pci_host_bridge {
void (*release_fn)(struct pci_host_bridge *);
void *release_data;
unsigned int ignore_reset_delay:1;  /* for entire hierarchy */
+   unsigned int has_mem64:1;
/* Resource alignment requirements */
resource_size_t (*align_resource)(struct pci_dev *dev,
const struct resource *res,
-- 
1.8.4.5



[PATCH v11 04/60] sparc/PCI: Use correct offset for bus address to resource

2016-04-07 Thread Yinghai Lu
After we added 64bit mmio parsing, we got some "no compatible bridge window"
warning on anther new model that support 64bit resource.

It turns out that we can not use mem_space.start as 64bit mem space
offset, aka there is mem_space.start != offset.

Use child_phys_addr to calculate exact offset and record offset in
pbm.

After patch we get correct offset.

/pci@305: PCI IO [io  0x2007e-0x2007e0fff] offset 2007e
/pci@305: PCI MEM [mem 0x20010-0x27eff] offset 2
/pci@305: PCI MEM64 [mem 0x20001-0x2000d] offset 2
...
pci_sun4v f02ae7f8: PCI host bridge to bus :00
pci_bus :00: root bus resource [io  0x2007e-0x2007e0fff] (bus 
address [0x-0xfff])
pci_bus :00: root bus resource [mem 0x20010-0x27eff] (bus 
address [0x0010-0x7eff])
pci_bus :00: root bus resource [mem 0x20001-0x2000d] (bus 
address [0x1-0xd])

-v3: put back mem64_offset, as we found T4 has mem_offset != mem64_offset
 check overlapping between mem64_space and mem_space.

-v5: use pcibios_bus_to_region() requested by Bjorn.

-v6: use pci_find_bus_resource().

Signed-off-by: Yinghai Lu 
Tested-by: Khalid Aziz 
---
 arch/sparc/kernel/pci.c| 54 --
 arch/sparc/kernel/pci_common.c | 32 ++---
 arch/sparc/kernel/pci_impl.h   |  4 
 3 files changed, 54 insertions(+), 36 deletions(-)

diff --git a/arch/sparc/kernel/pci.c b/arch/sparc/kernel/pci.c
index badf095..4606dc1 100644
--- a/arch/sparc/kernel/pci.c
+++ b/arch/sparc/kernel/pci.c
@@ -654,12 +654,12 @@ struct pci_bus *pci_scan_one_pbm(struct pci_pbm_info *pbm,
printk("PCI: Scanning PBM %s\n", node->full_name);
 
pci_add_resource_offset(, >io_space,
-   pbm->io_space.start);
+   pbm->io_offset);
pci_add_resource_offset(, >mem_space,
-   pbm->mem_space.start);
+   pbm->mem_offset);
if (pbm->mem64_space.flags)
pci_add_resource_offset(, >mem64_space,
-   pbm->mem_space.start);
+   pbm->mem64_offset);
pbm->busn.start = pbm->pci_first_busno;
pbm->busn.end   = pbm->pci_last_busno;
pbm->busn.flags = IORESOURCE_BUS;
@@ -733,30 +733,32 @@ int pcibios_enable_device(struct pci_dev *dev, int mask)
 static int __pci_mmap_make_offset_bus(struct pci_dev *pdev, struct 
vm_area_struct *vma,
  enum pci_mmap_state mmap_state)
 {
-   struct pci_pbm_info *pbm = pdev->dev.archdata.host_controller;
-   unsigned long space_size, user_offset, user_size;
-
-   if (mmap_state == pci_mmap_io) {
-   space_size = resource_size(>io_space);
-   } else {
-   space_size = resource_size(>mem_space);
-   }
+   unsigned long user_offset, user_size;
+   struct resource res, *root_bus_res;
+   struct pci_bus_region region;
+   struct pci_bus *bus;
 
/* Make sure the request is in range. */
user_offset = vma->vm_pgoff << PAGE_SHIFT;
user_size = vma->vm_end - vma->vm_start;
 
-   if (user_offset >= space_size ||
-   (user_offset + user_size) > space_size)
+   region.start = user_offset;
+   region.end = user_offset + user_size - 1;
+   memset(, 0, sizeof(res));
+   if (mmap_state == pci_mmap_io)
+   res.flags = IORESOURCE_IO;
+   else
+   res.flags = IORESOURCE_MEM;
+
+   pcibios_bus_to_resource(pdev->bus, , );
+   bus = pdev->bus;
+   while (bus->parent)
+   bus = bus->parent;
+   root_bus_res = pci_find_bus_resource(bus, );
+   if (!root_bus_res)
return -EINVAL;
 
-   if (mmap_state == pci_mmap_io) {
-   vma->vm_pgoff = (pbm->io_space.start +
-user_offset) >> PAGE_SHIFT;
-   } else {
-   vma->vm_pgoff = (pbm->mem_space.start +
-user_offset) >> PAGE_SHIFT;
-   }
+   vma->vm_pgoff = res.start >> PAGE_SHIFT;
 
return 0;
 }
@@ -977,16 +979,12 @@ void pci_resource_to_user(const struct pci_dev *pdev, int 
bar,
  const struct resource *rp, resource_size_t *start,
  resource_size_t *end)
 {
-   struct pci_pbm_info *pbm = pdev->dev.archdata.host_controller;
-   unsigned long offset;
+   struct pci_bus_region region;
 
-   if (rp->flags & IORESOURCE_IO)
-   offset = pbm->io_space.start;
-   else
-   offset = pbm->mem_space.start;
+   pcibios_resource_to_bus(pdev->bus, , (struct resource *)rp);
 
-   *start = rp->start - offset;
-   *end = rp->end - offset;
+   *start = region.start;

[PATCH v11 03/60] PCI: Add pci_find_bus_resource()

2016-04-07 Thread Yinghai Lu
Add pci_find_bus_resource() to return bus resource for input resource.

In some case, we may only have bus instead of dev.
It is same as pci_find_parent_resource, but take bus as input.

Signed-off-by: Yinghai Lu 
---
 drivers/pci/pci.c   | 27 ---
 include/linux/pci.h |  2 ++
 2 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 25e0327..313dea5 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -414,18 +414,9 @@ int pci_find_ht_capability(struct pci_dev *dev, int ht_cap)
 }
 EXPORT_SYMBOL_GPL(pci_find_ht_capability);
 
-/**
- * pci_find_parent_resource - return resource region of parent bus of given 
region
- * @dev: PCI device structure contains resources to be searched
- * @res: child resource record for which parent is sought
- *
- *  For given resource region of given device, return the resource
- *  region of parent bus the given region is contained in.
- */
-struct resource *pci_find_parent_resource(const struct pci_dev *dev,
- struct resource *res)
+struct resource *pci_find_bus_resource(const struct pci_bus *bus,
+   struct resource *res)
 {
-   const struct pci_bus *bus = dev->bus;
struct resource *r;
int i;
 
@@ -455,6 +446,20 @@ struct resource *pci_find_parent_resource(const struct 
pci_dev *dev,
}
return NULL;
 }
+
+/**
+ * pci_find_parent_resource - return resource region of parent bus of given 
region
+ * @dev: PCI device structure contains resources to be searched
+ * @res: child resource record for which parent is sought
+ *
+ *  For given resource region of given device, return the resource
+ *  region of parent bus the given region is contained in.
+ */
+struct resource *pci_find_parent_resource(const struct pci_dev *dev,
+ struct resource *res)
+{
+   return pci_find_bus_resource(dev->bus, res);
+}
 EXPORT_SYMBOL(pci_find_parent_resource);
 
 /**
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 004b813..795b4c7 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -807,6 +807,8 @@ void pcibios_resource_to_bus(struct pci_bus *bus, struct 
pci_bus_region *region,
 struct resource *res);
 void pcibios_bus_to_resource(struct pci_bus *bus, struct resource *res,
 struct pci_bus_region *region);
+struct resource *pci_find_bus_resource(const struct pci_bus *bus,
+   struct resource *res);
 void pcibios_scan_specific_bus(int busn);
 struct pci_bus *pci_find_bus(int domain, int busnr);
 void pci_bus_add_devices(const struct pci_bus *bus);
-- 
1.8.4.5



[PATCH v11 04/60] sparc/PCI: Use correct offset for bus address to resource

2016-04-07 Thread Yinghai Lu
After we added 64bit mmio parsing, we got some "no compatible bridge window"
warning on anther new model that support 64bit resource.

It turns out that we can not use mem_space.start as 64bit mem space
offset, aka there is mem_space.start != offset.

Use child_phys_addr to calculate exact offset and record offset in
pbm.

After patch we get correct offset.

/pci@305: PCI IO [io  0x2007e-0x2007e0fff] offset 2007e
/pci@305: PCI MEM [mem 0x20010-0x27eff] offset 2
/pci@305: PCI MEM64 [mem 0x20001-0x2000d] offset 2
...
pci_sun4v f02ae7f8: PCI host bridge to bus :00
pci_bus :00: root bus resource [io  0x2007e-0x2007e0fff] (bus 
address [0x-0xfff])
pci_bus :00: root bus resource [mem 0x20010-0x27eff] (bus 
address [0x0010-0x7eff])
pci_bus :00: root bus resource [mem 0x20001-0x2000d] (bus 
address [0x1-0xd])

-v3: put back mem64_offset, as we found T4 has mem_offset != mem64_offset
 check overlapping between mem64_space and mem_space.

-v5: use pcibios_bus_to_region() requested by Bjorn.

-v6: use pci_find_bus_resource().

Signed-off-by: Yinghai Lu 
Tested-by: Khalid Aziz 
---
 arch/sparc/kernel/pci.c| 54 --
 arch/sparc/kernel/pci_common.c | 32 ++---
 arch/sparc/kernel/pci_impl.h   |  4 
 3 files changed, 54 insertions(+), 36 deletions(-)

diff --git a/arch/sparc/kernel/pci.c b/arch/sparc/kernel/pci.c
index badf095..4606dc1 100644
--- a/arch/sparc/kernel/pci.c
+++ b/arch/sparc/kernel/pci.c
@@ -654,12 +654,12 @@ struct pci_bus *pci_scan_one_pbm(struct pci_pbm_info *pbm,
printk("PCI: Scanning PBM %s\n", node->full_name);
 
pci_add_resource_offset(, >io_space,
-   pbm->io_space.start);
+   pbm->io_offset);
pci_add_resource_offset(, >mem_space,
-   pbm->mem_space.start);
+   pbm->mem_offset);
if (pbm->mem64_space.flags)
pci_add_resource_offset(, >mem64_space,
-   pbm->mem_space.start);
+   pbm->mem64_offset);
pbm->busn.start = pbm->pci_first_busno;
pbm->busn.end   = pbm->pci_last_busno;
pbm->busn.flags = IORESOURCE_BUS;
@@ -733,30 +733,32 @@ int pcibios_enable_device(struct pci_dev *dev, int mask)
 static int __pci_mmap_make_offset_bus(struct pci_dev *pdev, struct 
vm_area_struct *vma,
  enum pci_mmap_state mmap_state)
 {
-   struct pci_pbm_info *pbm = pdev->dev.archdata.host_controller;
-   unsigned long space_size, user_offset, user_size;
-
-   if (mmap_state == pci_mmap_io) {
-   space_size = resource_size(>io_space);
-   } else {
-   space_size = resource_size(>mem_space);
-   }
+   unsigned long user_offset, user_size;
+   struct resource res, *root_bus_res;
+   struct pci_bus_region region;
+   struct pci_bus *bus;
 
/* Make sure the request is in range. */
user_offset = vma->vm_pgoff << PAGE_SHIFT;
user_size = vma->vm_end - vma->vm_start;
 
-   if (user_offset >= space_size ||
-   (user_offset + user_size) > space_size)
+   region.start = user_offset;
+   region.end = user_offset + user_size - 1;
+   memset(, 0, sizeof(res));
+   if (mmap_state == pci_mmap_io)
+   res.flags = IORESOURCE_IO;
+   else
+   res.flags = IORESOURCE_MEM;
+
+   pcibios_bus_to_resource(pdev->bus, , );
+   bus = pdev->bus;
+   while (bus->parent)
+   bus = bus->parent;
+   root_bus_res = pci_find_bus_resource(bus, );
+   if (!root_bus_res)
return -EINVAL;
 
-   if (mmap_state == pci_mmap_io) {
-   vma->vm_pgoff = (pbm->io_space.start +
-user_offset) >> PAGE_SHIFT;
-   } else {
-   vma->vm_pgoff = (pbm->mem_space.start +
-user_offset) >> PAGE_SHIFT;
-   }
+   vma->vm_pgoff = res.start >> PAGE_SHIFT;
 
return 0;
 }
@@ -977,16 +979,12 @@ void pci_resource_to_user(const struct pci_dev *pdev, int 
bar,
  const struct resource *rp, resource_size_t *start,
  resource_size_t *end)
 {
-   struct pci_pbm_info *pbm = pdev->dev.archdata.host_controller;
-   unsigned long offset;
+   struct pci_bus_region region;
 
-   if (rp->flags & IORESOURCE_IO)
-   offset = pbm->io_space.start;
-   else
-   offset = pbm->mem_space.start;
+   pcibios_resource_to_bus(pdev->bus, , (struct resource *)rp);
 
-   *start = rp->start - offset;
-   *end = rp->end - offset;
+   *start = region.start;
+   *end = region.end;
 }
 
 void 

[PATCH v11 03/60] PCI: Add pci_find_bus_resource()

2016-04-07 Thread Yinghai Lu
Add pci_find_bus_resource() to return bus resource for input resource.

In some case, we may only have bus instead of dev.
It is same as pci_find_parent_resource, but take bus as input.

Signed-off-by: Yinghai Lu 
---
 drivers/pci/pci.c   | 27 ---
 include/linux/pci.h |  2 ++
 2 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 25e0327..313dea5 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -414,18 +414,9 @@ int pci_find_ht_capability(struct pci_dev *dev, int ht_cap)
 }
 EXPORT_SYMBOL_GPL(pci_find_ht_capability);
 
-/**
- * pci_find_parent_resource - return resource region of parent bus of given 
region
- * @dev: PCI device structure contains resources to be searched
- * @res: child resource record for which parent is sought
- *
- *  For given resource region of given device, return the resource
- *  region of parent bus the given region is contained in.
- */
-struct resource *pci_find_parent_resource(const struct pci_dev *dev,
- struct resource *res)
+struct resource *pci_find_bus_resource(const struct pci_bus *bus,
+   struct resource *res)
 {
-   const struct pci_bus *bus = dev->bus;
struct resource *r;
int i;
 
@@ -455,6 +446,20 @@ struct resource *pci_find_parent_resource(const struct 
pci_dev *dev,
}
return NULL;
 }
+
+/**
+ * pci_find_parent_resource - return resource region of parent bus of given 
region
+ * @dev: PCI device structure contains resources to be searched
+ * @res: child resource record for which parent is sought
+ *
+ *  For given resource region of given device, return the resource
+ *  region of parent bus the given region is contained in.
+ */
+struct resource *pci_find_parent_resource(const struct pci_dev *dev,
+ struct resource *res)
+{
+   return pci_find_bus_resource(dev->bus, res);
+}
 EXPORT_SYMBOL(pci_find_parent_resource);
 
 /**
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 004b813..795b4c7 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -807,6 +807,8 @@ void pcibios_resource_to_bus(struct pci_bus *bus, struct 
pci_bus_region *region,
 struct resource *res);
 void pcibios_bus_to_resource(struct pci_bus *bus, struct resource *res,
 struct pci_bus_region *region);
+struct resource *pci_find_bus_resource(const struct pci_bus *bus,
+   struct resource *res);
 void pcibios_scan_specific_bus(int busn);
 struct pci_bus *pci_find_bus(int domain, int busnr);
 void pci_bus_add_devices(const struct pci_bus *bus);
-- 
1.8.4.5



[PATCH v11 53/60] PCI: Kill macro checking for bus io port sizing

2016-04-07 Thread Yinghai Lu
We can use new generic version skip_isa_ioresource_align() instead
of macro, and then kill the marco.

Signed-off-by: Yinghai Lu 
---
 drivers/pci/setup-bus.c | 17 +++--
 1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 5ba4bf5..65a41e7 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1159,15 +1159,12 @@ int skip_isa_ioresource_align(struct pci_bus *bus)
return 0;
 }
 
-static resource_size_t size_aligned_for_isa(resource_size_t size)
+static resource_size_t size_aligned_for_isa(resource_size_t size,
+   struct pci_bus *bus)
 {
-   /*
-* To be fixed in 2.5: we should have sort of HAVE_ISA
-*  flag in the struct pci_bus.
-*/
-#if defined(CONFIG_ISA) || defined(CONFIG_EISA)
-   size = (size & 0xff) + ((size & ~0xffUL) << 2);
-#endif
+   if (!skip_isa_ioresource_align(bus))
+   size = (size & 0xff) + ((size & ~0xffUL) << 2);
+
return size;
 }
 
@@ -1236,12 +1233,12 @@ static void pbus_size_io(struct pci_bus *bus, 
resource_size_t min_size,
}
}
 
-   size = size_aligned_for_isa(size);
+   size = size_aligned_for_isa(size, bus);
size += size1;
if (size || min_size)
size0 = calculate_size(size, min_size,
resource_size(b_res), min_align);
-   sum_add_size = size_aligned_for_isa(sum_add_size);
+   sum_add_size = size_aligned_for_isa(sum_add_size, bus);
sum_add_size += sum_add_size1;
if (sum_add_size < min_sum_size)
sum_add_size = min_sum_size;
-- 
1.8.4.5



[PATCH v11 01/60] PCI: Fix iomem_is_exclusive() checking in pci_mmap_resource()

2016-04-07 Thread Yinghai Lu
From: Bjorn Helgaas 

iomem_is_exclusive() requires a CPU physical address, but on some arches we
supplied a PCI bus address instead.

On most arches, pci_resource_to_user(res) returns "res->start", which is a
CPU physical address.  But on microblaze, mips, powerpc, and sparc, it
returns the PCI bus address corresponding to "res->start".

The result is that pci_mmap_resource() may fail when it shouldn't (if the
bus address happens to match an existing resource), or it may succeed when
it should fail (if the resource is exclusive but the bus address doesn't
match it).

Call iomem_is_exclusive() with "res->start", which is always a CPU physical
address, not the result of pci_resource_to_user().

Fixes: e8de1481fd71 ("resource: allow MMIO exclusivity for device drivers")
Suggested-by: Yinghai Lu 
Signed-off-by: Bjorn Helgaas 
CC: Arjan van de Ven 
---
 drivers/pci/pci-sysfs.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index e982010..cbb13be 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -1008,6 +1008,9 @@ static int pci_mmap_resource(struct kobject *kobj, struct 
bin_attribute *attr,
if (i >= PCI_ROM_RESOURCE)
return -ENODEV;
 
+   if (res->flags & IORESOURCE_MEM && iomem_is_exclusive(res->start))
+   return -EINVAL;
+
if (!pci_mmap_fits(pdev, i, vma, PCI_MMAP_SYSFS)) {
WARN(1, "process \"%s\" tried to map 0x%08lx bytes at page 
0x%08lx on %s BAR %d (start 0x%16Lx, size 0x%16Lx)\n",
current->comm, vma->vm_end-vma->vm_start, vma->vm_pgoff,
@@ -1024,10 +1027,6 @@ static int pci_mmap_resource(struct kobject *kobj, 
struct bin_attribute *attr,
pci_resource_to_user(pdev, i, res, , );
vma->vm_pgoff += start >> PAGE_SHIFT;
mmap_type = res->flags & IORESOURCE_MEM ? pci_mmap_mem : pci_mmap_io;
-
-   if (res->flags & IORESOURCE_MEM && iomem_is_exclusive(start))
-   return -EINVAL;
-
return pci_mmap_page_range(pdev, vma, mmap_type, write_combine);
 }
 
-- 
1.8.4.5



[PATCH v11 01/60] PCI: Fix iomem_is_exclusive() checking in pci_mmap_resource()

2016-04-07 Thread Yinghai Lu
From: Bjorn Helgaas 

iomem_is_exclusive() requires a CPU physical address, but on some arches we
supplied a PCI bus address instead.

On most arches, pci_resource_to_user(res) returns "res->start", which is a
CPU physical address.  But on microblaze, mips, powerpc, and sparc, it
returns the PCI bus address corresponding to "res->start".

The result is that pci_mmap_resource() may fail when it shouldn't (if the
bus address happens to match an existing resource), or it may succeed when
it should fail (if the resource is exclusive but the bus address doesn't
match it).

Call iomem_is_exclusive() with "res->start", which is always a CPU physical
address, not the result of pci_resource_to_user().

Fixes: e8de1481fd71 ("resource: allow MMIO exclusivity for device drivers")
Suggested-by: Yinghai Lu 
Signed-off-by: Bjorn Helgaas 
CC: Arjan van de Ven 
---
 drivers/pci/pci-sysfs.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index e982010..cbb13be 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -1008,6 +1008,9 @@ static int pci_mmap_resource(struct kobject *kobj, struct 
bin_attribute *attr,
if (i >= PCI_ROM_RESOURCE)
return -ENODEV;
 
+   if (res->flags & IORESOURCE_MEM && iomem_is_exclusive(res->start))
+   return -EINVAL;
+
if (!pci_mmap_fits(pdev, i, vma, PCI_MMAP_SYSFS)) {
WARN(1, "process \"%s\" tried to map 0x%08lx bytes at page 
0x%08lx on %s BAR %d (start 0x%16Lx, size 0x%16Lx)\n",
current->comm, vma->vm_end-vma->vm_start, vma->vm_pgoff,
@@ -1024,10 +1027,6 @@ static int pci_mmap_resource(struct kobject *kobj, 
struct bin_attribute *attr,
pci_resource_to_user(pdev, i, res, , );
vma->vm_pgoff += start >> PAGE_SHIFT;
mmap_type = res->flags & IORESOURCE_MEM ? pci_mmap_mem : pci_mmap_io;
-
-   if (res->flags & IORESOURCE_MEM && iomem_is_exclusive(start))
-   return -EINVAL;
-
return pci_mmap_page_range(pdev, vma, mmap_type, write_combine);
 }
 
-- 
1.8.4.5



[PATCH v11 53/60] PCI: Kill macro checking for bus io port sizing

2016-04-07 Thread Yinghai Lu
We can use new generic version skip_isa_ioresource_align() instead
of macro, and then kill the marco.

Signed-off-by: Yinghai Lu 
---
 drivers/pci/setup-bus.c | 17 +++--
 1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 5ba4bf5..65a41e7 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1159,15 +1159,12 @@ int skip_isa_ioresource_align(struct pci_bus *bus)
return 0;
 }
 
-static resource_size_t size_aligned_for_isa(resource_size_t size)
+static resource_size_t size_aligned_for_isa(resource_size_t size,
+   struct pci_bus *bus)
 {
-   /*
-* To be fixed in 2.5: we should have sort of HAVE_ISA
-*  flag in the struct pci_bus.
-*/
-#if defined(CONFIG_ISA) || defined(CONFIG_EISA)
-   size = (size & 0xff) + ((size & ~0xffUL) << 2);
-#endif
+   if (!skip_isa_ioresource_align(bus))
+   size = (size & 0xff) + ((size & ~0xffUL) << 2);
+
return size;
 }
 
@@ -1236,12 +1233,12 @@ static void pbus_size_io(struct pci_bus *bus, 
resource_size_t min_size,
}
}
 
-   size = size_aligned_for_isa(size);
+   size = size_aligned_for_isa(size, bus);
size += size1;
if (size || min_size)
size0 = calculate_size(size, min_size,
resource_size(b_res), min_align);
-   sum_add_size = size_aligned_for_isa(sum_add_size);
+   sum_add_size = size_aligned_for_isa(sum_add_size, bus);
sum_add_size += sum_add_size1;
if (sum_add_size < min_sum_size)
sum_add_size = min_sum_size;
-- 
1.8.4.5



[PATCH v11 30/60] PCI: Reorder resources list for required/optional resources

2016-04-07 Thread Yinghai Lu
We try to allocate required+optional before allocate required only and
expand with optional.

At first we update size and alignment for required+optional resource.
And after that we reorder them with new alignment, but current we only
do that STARTALIGN ones.

For SIZEALIGN type resource, after add back add_size, the alignment
get changed, so need to do sorting like STARTALIGN type resources.

Also we need to reorder the sorting back after we restore
resource to required only when required+optional fail to allocate for all.

So move out the reordering code from the loop to separated function,
and call it two times accordingly.

Signed-off-by: Yinghai Lu 
---
 drivers/pci/setup-bus.c | 62 +
 1 file changed, 37 insertions(+), 25 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index f6b7b8f..aed62cc 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -285,6 +285,31 @@ static inline void reset_resource(struct resource *res)
res->flags = 0;
 }
 
+static void sort_resources(struct list_head *head)
+{
+   struct pci_dev_resource *res1, *tmp_res, *res2;
+
+   list_for_each_entry_safe(res1, tmp_res, head, list) {
+   resource_size_t align1, size1, align2, size2;
+
+   align1 = pci_resource_alignment(res1->dev, res1->res);
+   size1 = resource_size(res1->res);
+
+   /* reorder it */
+   list_for_each_entry(res2, head, list) {
+   if (res2 == res1)
+   break;
+
+   align2 = pci_resource_alignment(res2->dev, res2->res);
+   size2 = resource_size(res2->res);
+   if (is_before(align1, size1, align2, size2)) {
+   list_move_tail(>list, >list);
+   break;
+   }
+   }
+   }
+}
+
 /**
  * reassign_resources_sorted() - satisfy any additional resource requests
  *
@@ -453,9 +478,9 @@ static void __assign_resources_sorted(struct list_head 
*head,
LIST_HEAD(save_head);
LIST_HEAD(local_fail_head);
struct pci_dev_resource *save_res;
-   struct pci_dev_resource *dev_res, *tmp_res, *dev_res2;
+   struct pci_dev_resource *dev_res, *tmp_res;
unsigned long fail_type;
-   resource_size_t add_align, align;
+   resource_size_t add_align;
 
/* Check if optional add_size is there */
if (!realloc_head || list_empty(realloc_head))
@@ -470,47 +495,32 @@ static void __assign_resources_sorted(struct list_head 
*head,
}
 
/* Update res in head list with add_size in realloc_head list */
-   list_for_each_entry_safe(dev_res, tmp_res, head, list) {
+   list_for_each_entry(dev_res, head, list) {
dev_res->res->end += get_res_add_size(realloc_head,
dev_res->res);
 
/*
 * There are two kinds of additional resources in the list:
-* 1. bridge resource  -- IORESOURCE_STARTALIGN
-* 2. SR-IOV resource   -- IORESOURCE_SIZEALIGN
-* Here just fix the additional alignment for bridge
+* 1. bridge resource with IORESOURCE_STARTALIGN
+*need to update start to change alignment
+* 2. resource with IORESOURCE_SIZEALIGN
+*update size above already change alignment.
 */
if (!(dev_res->res->flags & IORESOURCE_STARTALIGN))
continue;
 
add_align = get_res_add_align(realloc_head, dev_res->res);
 
-   /*
-* The "head" list is sorted by the alignment to make sure
-* resources with bigger alignment will be assigned first.
-* After we change the alignment of a dev_res in "head" list,
-* we need to reorder the list by alignment to make it
-* consistent.
-*/
-   if (add_align > dev_res->res->start) {
+   if (add_align) {
resource_size_t r_size = resource_size(dev_res->res);
 
dev_res->res->start = add_align;
dev_res->res->end = add_align + r_size - 1;
-
-   list_for_each_entry(dev_res2, head, list) {
-   align = pci_resource_alignment(dev_res2->dev,
-  dev_res2->res);
-   if (add_align > align) {
-   list_move_tail(_res->list,
-  _res2->list);
-   break;
-   }
-   }
}
-
}
 
+   

Re: [PATCH v2] clk: let clk_disable() return immediately if clk is NULL or error

2016-04-07 Thread Stephen Boyd
On 04/05, Masahiro Yamada wrote:
> The clk_disable() in the common clock framework (drivers/clk/clk.c)
> returns immediately if a given clk is NULL or an error pointer.  It
> allows clock consumers to call clk_disable() without IS_ERR_OR_NULL
> checking if drivers are only used with the common clock framework.
> 
> Unfortunately, NULL/error checking is missing from some of non-common
> clk_disable() implementations.  This prevents us from completely
> dropping NULL/error checking from callers.  Let's make it tree-wide
> consistent by adding IS_ERR_OR_NULL(clk) to all callees.
> 
> Signed-off-by: Masahiro Yamada 
> Acked-by: Greg Ungerer 
> Acked-by: Wan Zongshun 
> ---
> 
> Stephen,
> 
> This patch has been unapplied for a long time.
> 
> Please let me know if there is something wrong with this patch.
> 

I'm mostly confused why we wouldn't want to encourage people to
call clk_disable or unprepare on a clk that's an error pointer.
Typically an error pointer should be dealt with, instead of
silently ignored, so why wasn't it dealt with by passing it up
the probe() path?

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


[PATCH v11 30/60] PCI: Reorder resources list for required/optional resources

2016-04-07 Thread Yinghai Lu
We try to allocate required+optional before allocate required only and
expand with optional.

At first we update size and alignment for required+optional resource.
And after that we reorder them with new alignment, but current we only
do that STARTALIGN ones.

For SIZEALIGN type resource, after add back add_size, the alignment
get changed, so need to do sorting like STARTALIGN type resources.

Also we need to reorder the sorting back after we restore
resource to required only when required+optional fail to allocate for all.

So move out the reordering code from the loop to separated function,
and call it two times accordingly.

Signed-off-by: Yinghai Lu 
---
 drivers/pci/setup-bus.c | 62 +
 1 file changed, 37 insertions(+), 25 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index f6b7b8f..aed62cc 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -285,6 +285,31 @@ static inline void reset_resource(struct resource *res)
res->flags = 0;
 }
 
+static void sort_resources(struct list_head *head)
+{
+   struct pci_dev_resource *res1, *tmp_res, *res2;
+
+   list_for_each_entry_safe(res1, tmp_res, head, list) {
+   resource_size_t align1, size1, align2, size2;
+
+   align1 = pci_resource_alignment(res1->dev, res1->res);
+   size1 = resource_size(res1->res);
+
+   /* reorder it */
+   list_for_each_entry(res2, head, list) {
+   if (res2 == res1)
+   break;
+
+   align2 = pci_resource_alignment(res2->dev, res2->res);
+   size2 = resource_size(res2->res);
+   if (is_before(align1, size1, align2, size2)) {
+   list_move_tail(>list, >list);
+   break;
+   }
+   }
+   }
+}
+
 /**
  * reassign_resources_sorted() - satisfy any additional resource requests
  *
@@ -453,9 +478,9 @@ static void __assign_resources_sorted(struct list_head 
*head,
LIST_HEAD(save_head);
LIST_HEAD(local_fail_head);
struct pci_dev_resource *save_res;
-   struct pci_dev_resource *dev_res, *tmp_res, *dev_res2;
+   struct pci_dev_resource *dev_res, *tmp_res;
unsigned long fail_type;
-   resource_size_t add_align, align;
+   resource_size_t add_align;
 
/* Check if optional add_size is there */
if (!realloc_head || list_empty(realloc_head))
@@ -470,47 +495,32 @@ static void __assign_resources_sorted(struct list_head 
*head,
}
 
/* Update res in head list with add_size in realloc_head list */
-   list_for_each_entry_safe(dev_res, tmp_res, head, list) {
+   list_for_each_entry(dev_res, head, list) {
dev_res->res->end += get_res_add_size(realloc_head,
dev_res->res);
 
/*
 * There are two kinds of additional resources in the list:
-* 1. bridge resource  -- IORESOURCE_STARTALIGN
-* 2. SR-IOV resource   -- IORESOURCE_SIZEALIGN
-* Here just fix the additional alignment for bridge
+* 1. bridge resource with IORESOURCE_STARTALIGN
+*need to update start to change alignment
+* 2. resource with IORESOURCE_SIZEALIGN
+*update size above already change alignment.
 */
if (!(dev_res->res->flags & IORESOURCE_STARTALIGN))
continue;
 
add_align = get_res_add_align(realloc_head, dev_res->res);
 
-   /*
-* The "head" list is sorted by the alignment to make sure
-* resources with bigger alignment will be assigned first.
-* After we change the alignment of a dev_res in "head" list,
-* we need to reorder the list by alignment to make it
-* consistent.
-*/
-   if (add_align > dev_res->res->start) {
+   if (add_align) {
resource_size_t r_size = resource_size(dev_res->res);
 
dev_res->res->start = add_align;
dev_res->res->end = add_align + r_size - 1;
-
-   list_for_each_entry(dev_res2, head, list) {
-   align = pci_resource_alignment(dev_res2->dev,
-  dev_res2->res);
-   if (add_align > align) {
-   list_move_tail(_res->list,
-  _res2->list);
-   break;
-   }
-   }
}
-
}
 
+   sort_resources(head);
+
   

Re: [PATCH v2] clk: let clk_disable() return immediately if clk is NULL or error

2016-04-07 Thread Stephen Boyd
On 04/05, Masahiro Yamada wrote:
> The clk_disable() in the common clock framework (drivers/clk/clk.c)
> returns immediately if a given clk is NULL or an error pointer.  It
> allows clock consumers to call clk_disable() without IS_ERR_OR_NULL
> checking if drivers are only used with the common clock framework.
> 
> Unfortunately, NULL/error checking is missing from some of non-common
> clk_disable() implementations.  This prevents us from completely
> dropping NULL/error checking from callers.  Let's make it tree-wide
> consistent by adding IS_ERR_OR_NULL(clk) to all callees.
> 
> Signed-off-by: Masahiro Yamada 
> Acked-by: Greg Ungerer 
> Acked-by: Wan Zongshun 
> ---
> 
> Stephen,
> 
> This patch has been unapplied for a long time.
> 
> Please let me know if there is something wrong with this patch.
> 

I'm mostly confused why we wouldn't want to encourage people to
call clk_disable or unprepare on a clk that's an error pointer.
Typically an error pointer should be dealt with, instead of
silently ignored, so why wasn't it dealt with by passing it up
the probe() path?

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


[PATCH v3 7/7] x86/entry: Make gs_change a local label

2016-04-07 Thread Andy Lutomirski
From: Borislav Petkov 

... so that it doesn't appear in objdump output.

Signed-off-by: Borislav Petkov 
Signed-off-by: Andy Lutomirski 
---
 arch/x86/entry/entry_64.S | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 64d2033d1e49..1693c17dbf81 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -781,7 +781,7 @@ ENTRY(native_load_gs_index)
pushfq
DISABLE_INTERRUPTS(CLBR_ANY & ~CLBR_RDI)
SWAPGS
-gs_change:
+.Lgs_change:
movl%edi, %gs
 2: ALTERNATIVE "", "mfence", X86_BUG_SWAPGS_FENCE
SWAPGS
@@ -789,7 +789,7 @@ gs_change:
ret
 END(native_load_gs_index)
 
-   _ASM_EXTABLE(gs_change, bad_gs)
+   _ASM_EXTABLE(.Lgs_change, bad_gs)
.section .fixup, "ax"
/* running with kernelgs */
 bad_gs:
@@ -1019,13 +1019,13 @@ ENTRY(error_entry)
movl%ecx, %eax  /* zero extend */
cmpq%rax, RIP+8(%rsp)
je  .Lbstep_iret
-   cmpq$gs_change, RIP+8(%rsp)
+   cmpq$.Lgs_change, RIP+8(%rsp)
jne .Lerror_entry_done
 
/*
-* hack: gs_change can fail with user gsbase.  If this happens, fix up
+* hack: .Lgs_change can fail with user gsbase.  If this happens, fix up
 * gsbase and proceed.  We'll fix up the exception and land in
-* gs_change's error handler with kernel gsbase.
+* .Lgs_change's error handler with kernel gsbase.
 */
jmp .Lerror_entry_from_usermode_swapgs
 
-- 
2.5.5



[PATCH v3 1/7] selftests/x86: Test the FSBASE/GSBASE API and context switching

2016-04-07 Thread Andy Lutomirski
This catches two distinct bugs in the current code.  I'll fix them.

Signed-off-by: Andy Lutomirski 
---
 tools/testing/selftests/x86/Makefile   |   1 +
 tools/testing/selftests/x86/fsgsbase.c | 398 +
 2 files changed, 399 insertions(+)
 create mode 100644 tools/testing/selftests/x86/fsgsbase.c

diff --git a/tools/testing/selftests/x86/Makefile 
b/tools/testing/selftests/x86/Makefile
index b47ebd170690..c73425de3cfe 100644
--- a/tools/testing/selftests/x86/Makefile
+++ b/tools/testing/selftests/x86/Makefile
@@ -9,6 +9,7 @@ TARGETS_C_BOTHBITS := single_step_syscall sysret_ss_attrs 
syscall_nt ptrace_sysc
 TARGETS_C_32BIT_ONLY := entry_from_vm86 syscall_arg_fault test_syscall_vdso 
unwind_vdso \
test_FCMOV test_FCOMI test_FISTTP \
vdso_restorer
+TARGETS_C_64BIT_ONLY := fsgsbase
 
 TARGETS_C_32BIT_ALL := $(TARGETS_C_BOTHBITS) $(TARGETS_C_32BIT_ONLY)
 TARGETS_C_64BIT_ALL := $(TARGETS_C_BOTHBITS) $(TARGETS_C_64BIT_ONLY)
diff --git a/tools/testing/selftests/x86/fsgsbase.c 
b/tools/testing/selftests/x86/fsgsbase.c
new file mode 100644
index ..b7733055a359
--- /dev/null
+++ b/tools/testing/selftests/x86/fsgsbase.c
@@ -0,0 +1,398 @@
+/*
+ * fsgsbase.c, an fsgsbase test
+ * Copyright (c) 2014-2016 Andy Lutomirski
+ * GPL v2
+ */
+
+#define _GNU_SOURCE
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#ifndef __x86_64__
+# error This test is 64-bit only
+#endif
+
+static volatile sig_atomic_t want_segv;
+static volatile unsigned long segv_addr;
+
+static int nerrs;
+
+static void sethandler(int sig, void (*handler)(int, siginfo_t *, void *),
+  int flags)
+{
+   struct sigaction sa;
+   memset(, 0, sizeof(sa));
+   sa.sa_sigaction = handler;
+   sa.sa_flags = SA_SIGINFO | flags;
+   sigemptyset(_mask);
+   if (sigaction(sig, , 0))
+   err(1, "sigaction");
+}
+
+static void clearhandler(int sig)
+{
+   struct sigaction sa;
+   memset(, 0, sizeof(sa));
+   sa.sa_handler = SIG_DFL;
+   sigemptyset(_mask);
+   if (sigaction(sig, , 0))
+   err(1, "sigaction");
+}
+
+static void sigsegv(int sig, siginfo_t *si, void *ctx_void)
+{
+   ucontext_t *ctx = (ucontext_t*)ctx_void;
+
+   if (!want_segv) {
+   clearhandler(SIGSEGV);
+   return;  /* Crash cleanly. */
+   }
+
+   want_segv = false;
+   segv_addr = (unsigned long)si->si_addr;
+
+   ctx->uc_mcontext.gregs[REG_RIP] += 4;   /* Skip the faulting mov */
+
+}
+
+enum which_base { FS, GS };
+
+static unsigned long read_base(enum which_base which)
+{
+   unsigned long offset;
+   /*
+* Unless we have FSGSBASE, there's no direct way to do this from
+* user mode.  We can get at it indirectly using signals, though.
+*/
+
+   want_segv = true;
+
+   offset = 0;
+   if (which == FS) {
+   /* Use a constant-length instruction here. */
+   asm volatile ("mov %%fs:(%%rcx), %%rax" : : "c" (offset) : 
"rax");
+   } else {
+   asm volatile ("mov %%gs:(%%rcx), %%rax" : : "c" (offset) : 
"rax");
+   }
+   if (!want_segv)
+   return segv_addr + offset;
+
+   /*
+* If that didn't segfault, try the other end of the address space.
+* Unless we get really unlucky and run into the vsyscall page, this
+* is guaranteed to segfault.
+*/
+
+   offset = (ULONG_MAX >> 1) + 1;
+   if (which == FS) {
+   asm volatile ("mov %%fs:(%%rcx), %%rax"
+ : : "c" (offset) : "rax");
+   } else {
+   asm volatile ("mov %%gs:(%%rcx), %%rax"
+ : : "c" (offset) : "rax");
+   }
+   if (!want_segv)
+   return segv_addr + offset;
+
+   abort();
+}
+
+static void check_gs_value(unsigned long value)
+{
+   unsigned long base;
+   unsigned short sel;
+
+   printf("[RUN]\tARCH_SET_GS to 0x%lx\n", value);
+   if (syscall(SYS_arch_prctl, ARCH_SET_GS, value) != 0)
+   err(1, "ARCH_SET_GS");
+
+   asm volatile ("mov %%gs, %0" : "=rm" (sel));
+   base = read_base(GS);
+   if (base == value) {
+   printf("[OK]\tGSBASE was set as expected (selector 0x%hx)\n",
+  sel);
+   } else {
+   nerrs++;
+   printf("[FAIL]\tGSBASE was not as expected: got 0x%lx (selector 
0x%hx)\n",
+  base, sel);
+   }
+
+   if (syscall(SYS_arch_prctl, ARCH_GET_GS, ) != 0)
+   err(1, "ARCH_GET_GS");
+   if (base == value) {
+   printf("[OK]\tARCH_GET_GS worked as expected (selector 
0x%hx)\n",
+  sel);
+   } else {
+   nerrs++;
+ 

<    1   2   3   4   5   6   7   8   9   10   >