Re: [PATCH 1/2] mfd: max77693: Allow building as a module
On Thu, Apr 07, 2016 at 02:29:47PM +0100, Lee Jones wrote: > On Mon, 04 Apr 2016, Krzysztof Kozlowski wrote: > > > The consumer of max77693 regulators on Trats2 board (samsung-usb2-phy > > driver) supports deferred probing so the max77693 main MFD driver can be > > built now as a module. This gives more flexibility and removes manual > > ordering of init calls. > > > > Suggested-by: Paul Gortmaker> > Cc: Paul Gortmaker > > Signed-off-by: Krzysztof Kozlowski > > --- > > drivers/mfd/Kconfig| 4 ++-- > > drivers/mfd/max77693.c | 14 ++ > > 2 files changed, 4 insertions(+), 14 deletions(-) > > I assume this can be taken immediately and doesn't depend on anything > external to the set? > > For my own reference: > Acked-by: Lee Jones Hi, Yes, this can be taken as is. Only second patch (changing defconfig) depends on this. I can take the second patch through samsung tree but that would require a tag/branch with this... which looks like an overkill. So maybe you would take both? Best regards, Krzysztof
Re: [PATCH 1/2] mfd: max77693: Allow building as a module
On Thu, Apr 07, 2016 at 02:29:47PM +0100, Lee Jones wrote: > On Mon, 04 Apr 2016, Krzysztof Kozlowski wrote: > > > The consumer of max77693 regulators on Trats2 board (samsung-usb2-phy > > driver) supports deferred probing so the max77693 main MFD driver can be > > built now as a module. This gives more flexibility and removes manual > > ordering of init calls. > > > > Suggested-by: Paul Gortmaker > > Cc: Paul Gortmaker > > Signed-off-by: Krzysztof Kozlowski > > --- > > drivers/mfd/Kconfig| 4 ++-- > > drivers/mfd/max77693.c | 14 ++ > > 2 files changed, 4 insertions(+), 14 deletions(-) > > I assume this can be taken immediately and doesn't depend on anything > external to the set? > > For my own reference: > Acked-by: Lee Jones Hi, Yes, this can be taken as is. Only second patch (changing defconfig) depends on this. I can take the second patch through samsung tree but that would require a tag/branch with this... which looks like an overkill. So maybe you would take both? Best regards, Krzysztof
[PATCH] [linux-next]:ALSA: Fix a typo in timestamping.txt
This patch fix a spelling typo found in Documentation/sound/alsa/timestamping.txt Signed-off-by: Masanari Iida--- Documentation/sound/alsa/timestamping.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/sound/alsa/timestamping.txt b/Documentation/sound/alsa/timestamping.txt index 0b191a23f534..1b6473f393a8 100644 --- a/Documentation/sound/alsa/timestamping.txt +++ b/Documentation/sound/alsa/timestamping.txt @@ -129,7 +129,7 @@ will be required to issue multiple queries and perform an interpolation of the results In some hardware-specific configuration, the system timestamp is -latched by a low-level audio subsytem, and the information provided +latched by a low-level audio subsystem, and the information provided back to the driver. Due to potential delays in the communication with the hardware, there is a risk of misalignment with the avail and delay information. To make sure applications are not confused, a -- 2.8.0
[PATCH] [linux-next]:ALSA: Fix a typo in timestamping.txt
This patch fix a spelling typo found in Documentation/sound/alsa/timestamping.txt Signed-off-by: Masanari Iida --- Documentation/sound/alsa/timestamping.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/sound/alsa/timestamping.txt b/Documentation/sound/alsa/timestamping.txt index 0b191a23f534..1b6473f393a8 100644 --- a/Documentation/sound/alsa/timestamping.txt +++ b/Documentation/sound/alsa/timestamping.txt @@ -129,7 +129,7 @@ will be required to issue multiple queries and perform an interpolation of the results In some hardware-specific configuration, the system timestamp is -latched by a low-level audio subsytem, and the information provided +latched by a low-level audio subsystem, and the information provided back to the driver. Due to potential delays in the communication with the hardware, there is a risk of misalignment with the avail and delay information. To make sure applications are not confused, a -- 2.8.0
Re: [PATCH 2/2] pci, acpi: free IO resource during shutdown
Hi Tomasz, On 4/7/2016 5:41 PM, Bjorn Helgaas wrote: >>> You say this is undoing the effect of pci_remap_iospace(), but that's >>> > > only called by native drivers and the generic (OF) driver, not by >>> > > pci_root.c. >> > >> > See the ACPI root bridge driver above. > If this is a fix to patches that haven't been merged yet, we need to > squash the fix into the patches. > Can you merge these to two patches to your series for the next post? I need to remove weak on the first patch per direction from Bjorn and fix the function comments. You could as well do this while you are merging. Let me know what your preference is. -- Sinan Kaya Qualcomm Technologies, Inc. on behalf of Qualcomm Innovation Center, Inc. Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Re: [PATCH 2/2] pci, acpi: free IO resource during shutdown
Hi Tomasz, On 4/7/2016 5:41 PM, Bjorn Helgaas wrote: >>> You say this is undoing the effect of pci_remap_iospace(), but that's >>> > > only called by native drivers and the generic (OF) driver, not by >>> > > pci_root.c. >> > >> > See the ACPI root bridge driver above. > If this is a fix to patches that haven't been merged yet, we need to > squash the fix into the patches. > Can you merge these to two patches to your series for the next post? I need to remove weak on the first patch per direction from Bjorn and fix the function comments. You could as well do this while you are merging. Let me know what your preference is. -- Sinan Kaya Qualcomm Technologies, Inc. on behalf of Qualcomm Innovation Center, Inc. Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Re: [PATCH 17/19] perf tools: Build syscall table .c header from kernel's syscall_64.tbl
On 2016/4/8 4:58, Arnaldo Carvalho de Melo wrote: From: Arnaldo Carvalho de MeloWe used libaudit to map ids to syscall names and vice-versa, but that imposes a delay in supporting new syscalls, having to wait for libaudit to get those new syscalls on its tables. To remove that delay, for x86_64 initially, grab a copy of arch/x86/entry/syscalls/syscall_64.tbl and use it to generate those tables. Syscalls currently not available in audit-libs: # trace -e copy_file_range,membarrier,mlock2,pread64,pwrite64,timerfd_create,userfaultfd Error: Invalid syscall copy_file_range, membarrier, mlock2, pread64, pwrite64, timerfd_create, userfaultfd Hint:try 'perf list syscalls:sys_enter_*' Hint:and: 'man syscalls' # With this patch: # trace -e copy_file_range,membarrier,mlock2,pread64,pwrite64,timerfd_create,userfaultfd 8505.733 ( 0.010 ms): gnome-shell/2519 timerfd_create(flags: 524288) = 36 8506.688 ( 0.005 ms): gnome-shell/2519 timerfd_create(flags: 524288) = 40 30023.097 ( 0.025 ms): qemu-system-x8/24629 pwrite64(fd: 18, buf: 0x7f63ae382000, count: 4096, pos: 529592320) = 4096 31268.712 ( 0.028 ms): qemu-system-x8/24629 pwrite64(fd: 18, buf: 0x7f63afd8b000, count: 4096, pos: 2314133504) = 4096 31268.854 ( 0.016 ms): qemu-system-x8/24629 pwrite64(fd: 18, buf: 0x7f63afda2000, count: 4096, pos: 2314137600) = 4096 Cc: Adrian Hunter Cc: David Ahern Cc: Jiri Olsa Cc: Namhyung Kim Cc: Wang Nan Link: http://lkml.kernel.org/n/tip-5n4sx1wp0ig75dwcghf9m...@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo --- There is racing in Makefile. I see following output: ... CC /xx/xx/perf/1.0-r8/perf-1.0/util/callchain.o CC /xx/xx/perf/1.0-r8/perf-1.0/util/values.o CC /xx/xx/perf/1.0-r8/perf-1.0/util/debug.o make[3]: *** No rule to make target `/xx/xx/perf/1.0-r8/perf-1.0/util/syscalltbl.o'. Stop. make[3]: *** Waiting for unfinished jobs CC /xx/xx/perf/1.0-r8/perf-1.0/util/machine.o CC /xx/xx/perf/1.0-r8/perf-1.0/util/map.o ... The error disappeareafter doing 'make archheaders' before 'make all'. Thank you.
Re: [PATCH 17/19] perf tools: Build syscall table .c header from kernel's syscall_64.tbl
On 2016/4/8 4:58, Arnaldo Carvalho de Melo wrote: From: Arnaldo Carvalho de Melo We used libaudit to map ids to syscall names and vice-versa, but that imposes a delay in supporting new syscalls, having to wait for libaudit to get those new syscalls on its tables. To remove that delay, for x86_64 initially, grab a copy of arch/x86/entry/syscalls/syscall_64.tbl and use it to generate those tables. Syscalls currently not available in audit-libs: # trace -e copy_file_range,membarrier,mlock2,pread64,pwrite64,timerfd_create,userfaultfd Error: Invalid syscall copy_file_range, membarrier, mlock2, pread64, pwrite64, timerfd_create, userfaultfd Hint:try 'perf list syscalls:sys_enter_*' Hint:and: 'man syscalls' # With this patch: # trace -e copy_file_range,membarrier,mlock2,pread64,pwrite64,timerfd_create,userfaultfd 8505.733 ( 0.010 ms): gnome-shell/2519 timerfd_create(flags: 524288) = 36 8506.688 ( 0.005 ms): gnome-shell/2519 timerfd_create(flags: 524288) = 40 30023.097 ( 0.025 ms): qemu-system-x8/24629 pwrite64(fd: 18, buf: 0x7f63ae382000, count: 4096, pos: 529592320) = 4096 31268.712 ( 0.028 ms): qemu-system-x8/24629 pwrite64(fd: 18, buf: 0x7f63afd8b000, count: 4096, pos: 2314133504) = 4096 31268.854 ( 0.016 ms): qemu-system-x8/24629 pwrite64(fd: 18, buf: 0x7f63afda2000, count: 4096, pos: 2314137600) = 4096 Cc: Adrian Hunter Cc: David Ahern Cc: Jiri Olsa Cc: Namhyung Kim Cc: Wang Nan Link: http://lkml.kernel.org/n/tip-5n4sx1wp0ig75dwcghf9m...@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo --- There is racing in Makefile. I see following output: ... CC /xx/xx/perf/1.0-r8/perf-1.0/util/callchain.o CC /xx/xx/perf/1.0-r8/perf-1.0/util/values.o CC /xx/xx/perf/1.0-r8/perf-1.0/util/debug.o make[3]: *** No rule to make target `/xx/xx/perf/1.0-r8/perf-1.0/util/syscalltbl.o'. Stop. make[3]: *** Waiting for unfinished jobs CC /xx/xx/perf/1.0-r8/perf-1.0/util/machine.o CC /xx/xx/perf/1.0-r8/perf-1.0/util/map.o ... The error disappeareafter doing 'make archheaders' before 'make all'. Thank you.
Re: [PATCHSET RFC cgroup/for-4.6] cgroup, sched: implement resource group and PRIO_RGRP
On Thu, 2016-04-07 at 16:23 -0400, Johannes Weiner wrote: > Mike, is that the one you referred to with one group per customer > account? If so, would you have a pointer to where you outline it? The usage I loosely outlined, I did in this thread. All of the gory details I do not have, do not want, and could not provide if I did. -Mike
Re: [PATCHSET RFC cgroup/for-4.6] cgroup, sched: implement resource group and PRIO_RGRP
On Thu, 2016-04-07 at 16:23 -0400, Johannes Weiner wrote: > Mike, is that the one you referred to with one group per customer > account? If so, would you have a pointer to where you outline it? The usage I loosely outlined, I did in this thread. All of the gory details I do not have, do not want, and could not provide if I did. -Mike
Re: [PATCH] Fix issue with dmesg.py and python 3.X
Hi Dom, I've just tested your patch quickly, and it generates an error on Python 2.7 On 05/04/16 19:38, Dom Cote wrote: > When using GDB built with python 2.7, > > Inferior.read_memory (address, length) > > returns a buffer object. When using GDB built with python 3.X, > it returns a memoryview object, which cannot be added to another > memoryview object. > > Replace the addition (+) of 2 python memoryview objects > with the addition of 2 'bytes' objects. > > Create a read_memoryview() function that always return a memoryview > object. The change to memoryview appears to work - but I don't think it needs to be indirected by a function definition? > Change the read_u16 function so it doesn't need to use ord() > anymore. This change is separate to the memoryview object, and should be in it's own patch. One patch should fix one thing independently. For example, the change to memoryview object appears to be functional, and the read_u16 is not. If these changes are in separate patches, then working changes can be accepted sooner, and tested easier. > Tested with Python 3.4 and gdb 7.7 > > Signed-off-by: Dom Cote> --- > scripts/gdb/linux/dmesg.py | 9 + > scripts/gdb/linux/utils.py | 9 +++-- > 2 files changed, 12 insertions(+), 6 deletions(-) > > diff --git a/scripts/gdb/linux/dmesg.py b/scripts/gdb/linux/dmesg.py > index 927d0d2a3145..96f4732157d8 100644 > --- a/scripts/gdb/linux/dmesg.py > +++ b/scripts/gdb/linux/dmesg.py > @@ -33,11 +33,12 @@ class LxDmesg(gdb.Command): > if log_first_idx < log_next_idx: > log_buf_2nd_half = -1 > length = log_next_idx - log_first_idx > -log_buf = inf.read_memory(start, length) > +log_buf = utils.read_memoryview(inf, start, length).tobytes() This looks like it could just call memoryview() directly ... > else: > log_buf_2nd_half = log_buf_len - log_first_idx > -log_buf = inf.read_memory(start, log_buf_2nd_half) + \ > -inf.read_memory(log_buf_addr, log_next_idx) > +a = utils.read_memoryview(inf, start, log_buf_2nd_half) > +b = utils.read_memoryview(inf, log_buf_addr, log_next_idx) Likewise here I presume ... > +log_buf = a.tobytes() + b.tobytes() > > pos = 0 > while pos < log_buf.__len__(): > @@ -53,7 +54,7 @@ class LxDmesg(gdb.Command): > text = log_buf[pos + 16:pos + 16 + text_len] > time_stamp = utils.read_u64(log_buf[pos:pos + 8]) > > -for line in memoryview(text).tobytes().splitlines(): > +for line in text.splitlines(): This looks like a separate change, not related to the bug fix? If this is an improvement, rather than a fix - it should probably also have it's own patch. (at first glance it looks like an improvement :D ) I just haven't seen yet if it depends on the other change. > gdb.write("[{time:12.6f}] {line}\n".format( > time=time_stamp / 10.0, > line=line)) > diff --git a/scripts/gdb/linux/utils.py b/scripts/gdb/linux/utils.py > index 0893b326a28b..c2b779e7bd26 100644 > --- a/scripts/gdb/linux/utils.py > +++ b/scripts/gdb/linux/utils.py > @@ -87,11 +87,16 @@ def get_target_endianness(): > return target_endianness > > > +# Compat between GDB built with python 2.7 vs 3.X > +def read_memoryview(inf, start, length): > +return memoryview(inf.read_memory(start, length)) This simply always returns a memoryview object, so why not just change the respective lines, which you have already had to modify to call/use memoryview directly? > + > + > def read_u16(buffer): > if get_target_endianness() == LITTLE_ENDIAN: > -return ord(buffer[0]) + (ord(buffer[1]) << 8) > +return buffer[0] + (buffer[1] << 8) > else: > -return ord(buffer[1]) + (ord(buffer[0]) << 8) > +return buffer[1] + (buffer[0] << 8) This breaks for me, but returning these lines to use ord() shows that the memoryview changes appear to work. What was the need for changing these lines? Does it cause a break on 3.x? This causes the following error on 2.7 for me: (gdb) lx-dmesg Traceback (most recent call last): File "/home/linuxembedded/linaro/lkd/openst-lkd/objects/arm/qemu-arm/linux/scripts/gdb/linux/dmesg.py", line 45, in invoke length = utils.read_u16(log_buf[pos + 8:pos + 10]) File "/home/linuxembedded/linaro/lkd/openst-lkd/objects/arm/qemu-arm/linux/scripts/gdb/linux/utils.py", line 97, in read_u16 return buffer[0] + (buffer[1] << 8) TypeError: unsupported operand type(s) for <<: 'str' and 'int' Error occurred in Python command: unsupported operand type(s) for <<: 'str' and 'int' > > > def read_u32(buffer): > -- Regards Kieran
Re: [PATCH] Fix issue with dmesg.py and python 3.X
Hi Dom, I've just tested your patch quickly, and it generates an error on Python 2.7 On 05/04/16 19:38, Dom Cote wrote: > When using GDB built with python 2.7, > > Inferior.read_memory (address, length) > > returns a buffer object. When using GDB built with python 3.X, > it returns a memoryview object, which cannot be added to another > memoryview object. > > Replace the addition (+) of 2 python memoryview objects > with the addition of 2 'bytes' objects. > > Create a read_memoryview() function that always return a memoryview > object. The change to memoryview appears to work - but I don't think it needs to be indirected by a function definition? > Change the read_u16 function so it doesn't need to use ord() > anymore. This change is separate to the memoryview object, and should be in it's own patch. One patch should fix one thing independently. For example, the change to memoryview object appears to be functional, and the read_u16 is not. If these changes are in separate patches, then working changes can be accepted sooner, and tested easier. > Tested with Python 3.4 and gdb 7.7 > > Signed-off-by: Dom Cote > --- > scripts/gdb/linux/dmesg.py | 9 + > scripts/gdb/linux/utils.py | 9 +++-- > 2 files changed, 12 insertions(+), 6 deletions(-) > > diff --git a/scripts/gdb/linux/dmesg.py b/scripts/gdb/linux/dmesg.py > index 927d0d2a3145..96f4732157d8 100644 > --- a/scripts/gdb/linux/dmesg.py > +++ b/scripts/gdb/linux/dmesg.py > @@ -33,11 +33,12 @@ class LxDmesg(gdb.Command): > if log_first_idx < log_next_idx: > log_buf_2nd_half = -1 > length = log_next_idx - log_first_idx > -log_buf = inf.read_memory(start, length) > +log_buf = utils.read_memoryview(inf, start, length).tobytes() This looks like it could just call memoryview() directly ... > else: > log_buf_2nd_half = log_buf_len - log_first_idx > -log_buf = inf.read_memory(start, log_buf_2nd_half) + \ > -inf.read_memory(log_buf_addr, log_next_idx) > +a = utils.read_memoryview(inf, start, log_buf_2nd_half) > +b = utils.read_memoryview(inf, log_buf_addr, log_next_idx) Likewise here I presume ... > +log_buf = a.tobytes() + b.tobytes() > > pos = 0 > while pos < log_buf.__len__(): > @@ -53,7 +54,7 @@ class LxDmesg(gdb.Command): > text = log_buf[pos + 16:pos + 16 + text_len] > time_stamp = utils.read_u64(log_buf[pos:pos + 8]) > > -for line in memoryview(text).tobytes().splitlines(): > +for line in text.splitlines(): This looks like a separate change, not related to the bug fix? If this is an improvement, rather than a fix - it should probably also have it's own patch. (at first glance it looks like an improvement :D ) I just haven't seen yet if it depends on the other change. > gdb.write("[{time:12.6f}] {line}\n".format( > time=time_stamp / 10.0, > line=line)) > diff --git a/scripts/gdb/linux/utils.py b/scripts/gdb/linux/utils.py > index 0893b326a28b..c2b779e7bd26 100644 > --- a/scripts/gdb/linux/utils.py > +++ b/scripts/gdb/linux/utils.py > @@ -87,11 +87,16 @@ def get_target_endianness(): > return target_endianness > > > +# Compat between GDB built with python 2.7 vs 3.X > +def read_memoryview(inf, start, length): > +return memoryview(inf.read_memory(start, length)) This simply always returns a memoryview object, so why not just change the respective lines, which you have already had to modify to call/use memoryview directly? > + > + > def read_u16(buffer): > if get_target_endianness() == LITTLE_ENDIAN: > -return ord(buffer[0]) + (ord(buffer[1]) << 8) > +return buffer[0] + (buffer[1] << 8) > else: > -return ord(buffer[1]) + (ord(buffer[0]) << 8) > +return buffer[1] + (buffer[0] << 8) This breaks for me, but returning these lines to use ord() shows that the memoryview changes appear to work. What was the need for changing these lines? Does it cause a break on 3.x? This causes the following error on 2.7 for me: (gdb) lx-dmesg Traceback (most recent call last): File "/home/linuxembedded/linaro/lkd/openst-lkd/objects/arm/qemu-arm/linux/scripts/gdb/linux/dmesg.py", line 45, in invoke length = utils.read_u16(log_buf[pos + 8:pos + 10]) File "/home/linuxembedded/linaro/lkd/openst-lkd/objects/arm/qemu-arm/linux/scripts/gdb/linux/utils.py", line 97, in read_u16 return buffer[0] + (buffer[1] << 8) TypeError: unsupported operand type(s) for <<: 'str' and 'int' Error occurred in Python command: unsupported operand type(s) for <<: 'str' and 'int' > > > def read_u32(buffer): > -- Regards Kieran
Re: [PATCH 0/3] crypto: af_alg - add TLS type encryption
On Thu, Apr 7, 2016 at 11:52 PM, Herbert Xuwrote: > On Wed, Apr 06, 2016 at 10:56:12AM -0700, Tadeusz Struk wrote: >> >> The intend is to enable HW acceleration of the TLS protocol. >> The way it will work is that the user space will send a packet of data >> via AF_ALG and HW will authenticate and encrypt it in one go. > > There have been suggestions to implement TLS data-path within > the kernel. So we should decide whether we pursue that or go > with your approach before we start adding algorithms. > Yes, please see Dave Watson's patches on this. Tom > Cheers, > -- > Email: Herbert Xu > Home Page: http://gondor.apana.org.au/~herbert/ > PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH 0/3] crypto: af_alg - add TLS type encryption
On Thu, Apr 7, 2016 at 11:52 PM, Herbert Xu wrote: > On Wed, Apr 06, 2016 at 10:56:12AM -0700, Tadeusz Struk wrote: >> >> The intend is to enable HW acceleration of the TLS protocol. >> The way it will work is that the user space will send a packet of data >> via AF_ALG and HW will authenticate and encrypt it in one go. > > There have been suggestions to implement TLS data-path within > the kernel. So we should decide whether we pursue that or go > with your approach before we start adding algorithms. > Yes, please see Dave Watson's patches on this. Tom > Cheers, > -- > Email: Herbert Xu > Home Page: http://gondor.apana.org.au/~herbert/ > PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH 0/3] crypto: af_alg - add TLS type encryption
On Wed, Apr 06, 2016 at 10:56:12AM -0700, Tadeusz Struk wrote: > > The intend is to enable HW acceleration of the TLS protocol. > The way it will work is that the user space will send a packet of data > via AF_ALG and HW will authenticate and encrypt it in one go. There have been suggestions to implement TLS data-path within the kernel. So we should decide whether we pursue that or go with your approach before we start adding algorithms. Cheers, -- Email: Herbert XuHome Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH 0/3] crypto: af_alg - add TLS type encryption
On Wed, Apr 06, 2016 at 10:56:12AM -0700, Tadeusz Struk wrote: > > The intend is to enable HW acceleration of the TLS protocol. > The way it will work is that the user space will send a packet of data > via AF_ALG and HW will authenticate and encrypt it in one go. There have been suggestions to implement TLS data-path within the kernel. So we should decide whether we pursue that or go with your approach before we start adding algorithms. Cheers, -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [RFC6 PATCH v6 00/21] ILP32 for ARM64
On Thu, Apr 7, 2016 at 5:18 AM, Adam Borowskiwrote: > On Wed, 6 Apr 2016, Geert Uytterhoeven wrote: >> On Wed, Apr 6, 2016 at 12:08 AM, Yury Norov >> wrote: >>> v6: >>> - time_t, __kenel_off_t and other types turned to be 32-bit >>>for compatibility reasons (after v5 discussion); > > Introducing a new arch today with y2038 problems is not a good idea. > Linus said so with appropriately pointy words in 2011. This is the third time we had this discussion on time_t for ILP32. I had originally it as 32bit, then Catalin suggested I change it to 64bit and then Arnd (with his work for 2038 issue on 32bit arch) said ILP32 should match all other 32bit targets and the other 64bit time_t be fixed by the current work he was working on. Now you are suggesting we change it again. Arnd can you please comment more on why we want 32bit time_t instead of the 64bit one? I Know there was some POSIX (or was it C90) violation but I suspect there is an easy way to workaround this inside the kernel but the discussion to move over to 32bit time_t was already made by the time I started to look into that. > >> What makes you think these "applications that can’t readily be migrated to >> LP64 >> because they were written assuming an ILP32 data model, and that will never >> become suitable for a LP64 data model and will remain locked into ILP32 >> operating environments" are more likely to be fixed for y2038 later, than for >> LP64 now? > > Such broken applications already have plenty of bogus architecture detection > code so you need porting anyway... I don't disagree here; just see below ... > >> We're already closer to the (future) y2038 than to the (past) introduction of >> LP64... >> >> These unfixable legacy applications have been spreading through x32 to >> the shiny new arm64 server architecture (does ppc64el also have an ILP32 >> mode, >> or is it planned)? Lots of resources are spent on maintaining the status quo, >> instead of on fixing the real problems. > > As an x32 (userland) porter, I can tell you that time_t!=long _did_ cause > non-trivial amounts of work. But that work is already done (at least in > Debian), so you might as well benefit from it. There is actually private code out there which uses timespec and timeval to pass time over the wire; yes I know bad coding style and all but they did it that way. This is code which was working for x86 and we are porting it to ARM64; a data center code by the way; not some networking code even. This means they have not ported the code to fully 64bit yet and they might never. Thanks, Andrew > > > -- > A tit a day keeps the vet away.
Re: [RFC6 PATCH v6 00/21] ILP32 for ARM64
On Thu, Apr 7, 2016 at 5:18 AM, Adam Borowski wrote: > On Wed, 6 Apr 2016, Geert Uytterhoeven wrote: >> On Wed, Apr 6, 2016 at 12:08 AM, Yury Norov >> wrote: >>> v6: >>> - time_t, __kenel_off_t and other types turned to be 32-bit >>>for compatibility reasons (after v5 discussion); > > Introducing a new arch today with y2038 problems is not a good idea. > Linus said so with appropriately pointy words in 2011. This is the third time we had this discussion on time_t for ILP32. I had originally it as 32bit, then Catalin suggested I change it to 64bit and then Arnd (with his work for 2038 issue on 32bit arch) said ILP32 should match all other 32bit targets and the other 64bit time_t be fixed by the current work he was working on. Now you are suggesting we change it again. Arnd can you please comment more on why we want 32bit time_t instead of the 64bit one? I Know there was some POSIX (or was it C90) violation but I suspect there is an easy way to workaround this inside the kernel but the discussion to move over to 32bit time_t was already made by the time I started to look into that. > >> What makes you think these "applications that can’t readily be migrated to >> LP64 >> because they were written assuming an ILP32 data model, and that will never >> become suitable for a LP64 data model and will remain locked into ILP32 >> operating environments" are more likely to be fixed for y2038 later, than for >> LP64 now? > > Such broken applications already have plenty of bogus architecture detection > code so you need porting anyway... I don't disagree here; just see below ... > >> We're already closer to the (future) y2038 than to the (past) introduction of >> LP64... >> >> These unfixable legacy applications have been spreading through x32 to >> the shiny new arm64 server architecture (does ppc64el also have an ILP32 >> mode, >> or is it planned)? Lots of resources are spent on maintaining the status quo, >> instead of on fixing the real problems. > > As an x32 (userland) porter, I can tell you that time_t!=long _did_ cause > non-trivial amounts of work. But that work is already done (at least in > Debian), so you might as well benefit from it. There is actually private code out there which uses timespec and timeval to pass time over the wire; yes I know bad coding style and all but they did it that way. This is code which was working for x86 and we are porting it to ARM64; a data center code by the way; not some networking code even. This means they have not ported the code to fully 64bit yet and they might never. Thanks, Andrew > > > -- > A tit a day keeps the vet away.
Re: [RFC v1] mm: SLAB freelist randomization
On Thu, Apr 7, 2016 at 2:14 PM, Jesper Dangaard Brouerwrote: > > On Wed, 6 Apr 2016 14:45:30 -0700 Kees Cook wrote: > >> On Wed, Apr 6, 2016 at 12:35 PM, Thomas Garnier wrote: > [...] >> > re-used on slab creation for performance. >> >> I'd like to see some benchmark results for this so the Kconfig can >> include the performance characteristics. I recommend using hackbench >> and kernel build times with a before/after comparison. >> > > It looks like it only happens on init, right? (Thus must bench tools > might not be the right choice). Oh! Yes, you're right. I entirely missed that detail. :) 0-cost randomization! Sounds good to me. :) -Kees > > My slab tools for benchmarking the fastpath is here: > > https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/mm/slab_bulk_test01.c > > And I also carry a version of Christoph's slab bench tool: > > https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/mm/slab_test.c > > -- > Best regards, > Jesper Dangaard Brouer > MSc.CS, Principal Kernel Engineer at Red Hat > Author of http://www.iptv-analyzer.org > LinkedIn: http://www.linkedin.com/in/brouer -- Kees Cook Chrome OS & Brillo Security
Re: [RFC v1] mm: SLAB freelist randomization
On Thu, Apr 7, 2016 at 2:14 PM, Jesper Dangaard Brouer wrote: > > On Wed, 6 Apr 2016 14:45:30 -0700 Kees Cook wrote: > >> On Wed, Apr 6, 2016 at 12:35 PM, Thomas Garnier wrote: > [...] >> > re-used on slab creation for performance. >> >> I'd like to see some benchmark results for this so the Kconfig can >> include the performance characteristics. I recommend using hackbench >> and kernel build times with a before/after comparison. >> > > It looks like it only happens on init, right? (Thus must bench tools > might not be the right choice). Oh! Yes, you're right. I entirely missed that detail. :) 0-cost randomization! Sounds good to me. :) -Kees > > My slab tools for benchmarking the fastpath is here: > > https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/mm/slab_bulk_test01.c > > And I also carry a version of Christoph's slab bench tool: > > https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/mm/slab_test.c > > -- > Best regards, > Jesper Dangaard Brouer > MSc.CS, Principal Kernel Engineer at Red Hat > Author of http://www.iptv-analyzer.org > LinkedIn: http://www.linkedin.com/in/brouer -- Kees Cook Chrome OS & Brillo Security
[PATCH 1/2] kbuild: rename cmd_cc_i_c to cmd_cpp_i_c
This command just preprocesses .c files into .i files, so cmd_cpp_i_c seems more suitable. Signed-off-by: Masahiro Yamada--- scripts/Makefile.build | 6 +++--- tools/build/Makefile.build | 8 2 files changed, 7 insertions(+), 7 deletions(-) diff --git a/scripts/Makefile.build b/scripts/Makefile.build index e1bc190..7e4df13 100644 --- a/scripts/Makefile.build +++ b/scripts/Makefile.build @@ -152,11 +152,11 @@ cmd_cc_s_c = $(CC) $(c_flags) $(DISABLE_LTO) -fverbose-asm -S -o $@ $< $(obj)/%.s: $(src)/%.c FORCE $(call if_changed_dep,cc_s_c) -quiet_cmd_cc_i_c = CPP $(quiet_modtag) $@ -cmd_cc_i_c = $(CPP) $(c_flags) -o $@ $< +quiet_cmd_cpp_i_c = CPP $(quiet_modtag) $@ +cmd_cpp_i_c = $(CPP) $(c_flags) -o $@ $< $(obj)/%.i: $(src)/%.c FORCE - $(call if_changed_dep,cc_i_c) + $(call if_changed_dep,cpp_i_c) cmd_gensymtypes = \ $(CPP) -D__GENKSYMS__ $(c_flags) $< | \ diff --git a/tools/build/Makefile.build b/tools/build/Makefile.build index ee566e8..27f3583 100644 --- a/tools/build/Makefile.build +++ b/tools/build/Makefile.build @@ -58,8 +58,8 @@ quiet_cmd_mkdir = MKDIR$(dir $@) quiet_cmd_cc_o_c = CC $@ cmd_cc_o_c = $(CC) $(c_flags) -c -o $@ $< -quiet_cmd_cc_i_c = CPP $@ - cmd_cc_i_c = $(CC) $(c_flags) -E -o $@ $< +quiet_cmd_cpp_i_c = CPP $@ + cmd_cpp_i_c = $(CC) $(c_flags) -E -o $@ $< quiet_cmd_cc_s_c = AS $@ cmd_cc_s_c = $(CC) $(c_flags) -S -o $@ $< @@ -83,11 +83,11 @@ $(OUTPUT)%.o: %.S FORCE $(OUTPUT)%.i: %.c FORCE $(call rule_mkdir) - $(call if_changed_dep,cc_i_c) + $(call if_changed_dep,cpp_i_c) $(OUTPUT)%.s: %.S FORCE $(call rule_mkdir) - $(call if_changed_dep,cc_i_c) + $(call if_changed_dep,cpp_i_c) $(OUTPUT)%.s: %.c FORCE $(call rule_mkdir) -- 1.9.1
[PATCH 1/2] kbuild: rename cmd_cc_i_c to cmd_cpp_i_c
This command just preprocesses .c files into .i files, so cmd_cpp_i_c seems more suitable. Signed-off-by: Masahiro Yamada --- scripts/Makefile.build | 6 +++--- tools/build/Makefile.build | 8 2 files changed, 7 insertions(+), 7 deletions(-) diff --git a/scripts/Makefile.build b/scripts/Makefile.build index e1bc190..7e4df13 100644 --- a/scripts/Makefile.build +++ b/scripts/Makefile.build @@ -152,11 +152,11 @@ cmd_cc_s_c = $(CC) $(c_flags) $(DISABLE_LTO) -fverbose-asm -S -o $@ $< $(obj)/%.s: $(src)/%.c FORCE $(call if_changed_dep,cc_s_c) -quiet_cmd_cc_i_c = CPP $(quiet_modtag) $@ -cmd_cc_i_c = $(CPP) $(c_flags) -o $@ $< +quiet_cmd_cpp_i_c = CPP $(quiet_modtag) $@ +cmd_cpp_i_c = $(CPP) $(c_flags) -o $@ $< $(obj)/%.i: $(src)/%.c FORCE - $(call if_changed_dep,cc_i_c) + $(call if_changed_dep,cpp_i_c) cmd_gensymtypes = \ $(CPP) -D__GENKSYMS__ $(c_flags) $< | \ diff --git a/tools/build/Makefile.build b/tools/build/Makefile.build index ee566e8..27f3583 100644 --- a/tools/build/Makefile.build +++ b/tools/build/Makefile.build @@ -58,8 +58,8 @@ quiet_cmd_mkdir = MKDIR$(dir $@) quiet_cmd_cc_o_c = CC $@ cmd_cc_o_c = $(CC) $(c_flags) -c -o $@ $< -quiet_cmd_cc_i_c = CPP $@ - cmd_cc_i_c = $(CC) $(c_flags) -E -o $@ $< +quiet_cmd_cpp_i_c = CPP $@ + cmd_cpp_i_c = $(CC) $(c_flags) -E -o $@ $< quiet_cmd_cc_s_c = AS $@ cmd_cc_s_c = $(CC) $(c_flags) -S -o $@ $< @@ -83,11 +83,11 @@ $(OUTPUT)%.o: %.S FORCE $(OUTPUT)%.i: %.c FORCE $(call rule_mkdir) - $(call if_changed_dep,cc_i_c) + $(call if_changed_dep,cpp_i_c) $(OUTPUT)%.s: %.S FORCE $(call rule_mkdir) - $(call if_changed_dep,cc_i_c) + $(call if_changed_dep,cpp_i_c) $(OUTPUT)%.s: %.c FORCE $(call rule_mkdir) -- 1.9.1
[PATCH 2/2] kbuild: rename cmd_as_s_S to cmd_cpp_s_S
This command just preprocesses .S files into .s files, so cmd_cpp_s_S seems more suitable. Signed-off-by: Masahiro Yamada--- scripts/Makefile.build | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/scripts/Makefile.build b/scripts/Makefile.build index 7e4df13..e65a3e9 100644 --- a/scripts/Makefile.build +++ b/scripts/Makefile.build @@ -314,11 +314,11 @@ modkern_aflags := $(KBUILD_AFLAGS_KERNEL) $(AFLAGS_KERNEL) $(real-objs-m) : modkern_aflags := $(KBUILD_AFLAGS_MODULE) $(AFLAGS_MODULE) $(real-objs-m:.o=.s): modkern_aflags := $(KBUILD_AFLAGS_MODULE) $(AFLAGS_MODULE) -quiet_cmd_as_s_S = CPP $(quiet_modtag) $@ -cmd_as_s_S = $(CPP) $(a_flags) -o $@ $< +quiet_cmd_cpp_s_S = CPP $(quiet_modtag) $@ +cmd_cpp_s_S = $(CPP) $(a_flags) -o $@ $< $(obj)/%.s: $(src)/%.S FORCE - $(call if_changed_dep,as_s_S) + $(call if_changed_dep,cpp_s_S) quiet_cmd_as_o_S = AS $(quiet_modtag) $@ cmd_as_o_S = $(CC) $(a_flags) -c -o $@ $< -- 1.9.1
[PATCH 2/2] kbuild: rename cmd_as_s_S to cmd_cpp_s_S
This command just preprocesses .S files into .s files, so cmd_cpp_s_S seems more suitable. Signed-off-by: Masahiro Yamada --- scripts/Makefile.build | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/scripts/Makefile.build b/scripts/Makefile.build index 7e4df13..e65a3e9 100644 --- a/scripts/Makefile.build +++ b/scripts/Makefile.build @@ -314,11 +314,11 @@ modkern_aflags := $(KBUILD_AFLAGS_KERNEL) $(AFLAGS_KERNEL) $(real-objs-m) : modkern_aflags := $(KBUILD_AFLAGS_MODULE) $(AFLAGS_MODULE) $(real-objs-m:.o=.s): modkern_aflags := $(KBUILD_AFLAGS_MODULE) $(AFLAGS_MODULE) -quiet_cmd_as_s_S = CPP $(quiet_modtag) $@ -cmd_as_s_S = $(CPP) $(a_flags) -o $@ $< +quiet_cmd_cpp_s_S = CPP $(quiet_modtag) $@ +cmd_cpp_s_S = $(CPP) $(a_flags) -o $@ $< $(obj)/%.s: $(src)/%.S FORCE - $(call if_changed_dep,as_s_S) + $(call if_changed_dep,cpp_s_S) quiet_cmd_as_o_S = AS $(quiet_modtag) $@ cmd_as_o_S = $(CC) $(a_flags) -c -o $@ $< -- 1.9.1
Re: [PATCH] mtd: nand: s3c2410: fix bug in s3c2410_nand_correct_data()
On Fri, 8 Apr 2016 09:51:04 +0800 Zeng Zhaoxiuwrote: > > > 在 2016年04月08日 08:18, Boris Brezillon 写道: > > Hi Zeng, > > > > On Fri, 8 Apr 2016 00:48:17 +0800 > > zengzhao...@163.com wrote: > > > >> From: Zeng Zhaoxiu > >> > >> If there is only one bit difference in the ECC, the function should return > >> 1. > >> The result of "diff0 & ~(1< >> actually returns -1. > >> > >> Here, we can use the simple expression "(diff0 & (diff0 - 1)) == 0" to > >> determine > >> whether the diff0 has only one 1-bit. > > Missing Signed-off-by here. > > > >> --- > >> drivers/mtd/nand/s3c2410.c | 2 +- > >> 1 file changed, 1 insertion(+), 1 deletion(-) > >> > >> diff --git a/drivers/mtd/nand/s3c2410.c b/drivers/mtd/nand/s3c2410.c > >> index 9c9397b..c9698cf 100644 > >> --- a/drivers/mtd/nand/s3c2410.c > >> +++ b/drivers/mtd/nand/s3c2410.c > >> @@ -542,7 +542,7 @@ static int s3c2410_nand_correct_data(struct mtd_info > >> *mtd, u_char *dat, > >>diff0 |= (diff1 << 8); > >>diff0 |= (diff2 << 16); > >> > >> - if ((diff0 & ~(1< >> + if ((diff0 & (diff0 - 1)) == 0) > > Or just > > > > if (hweight_long((unsigned long)diff0) == 1) > > > > which is doing exactly what the comment says. > > > > BTW, I don't understand why the current code is wrong? To me, it seems > > it's correctly detecting the case where only a single bit is different. > > What are you trying to fix exactly? > > > > Best Regards, > > > > Boris > > > > For example, assuming diff0 is 1, then fls(diff0) is equal to 1, then "~(1 << > fls(diff0))" is equal to 0xfffd, > then the result of "(diff0 & ~(1 << fls(diff0)))" is 1 , not we expected 0. > > __fls(diff0) and "(fls(diff0) - 1)" are all right, but fls(diff0) is wrong. > Indeed, I forgot that fls() was returning (position + 1). Anyway, I still think using hweight clarifies what you really want to test. -- Boris Brezillon, Free Electrons Embedded Linux and Kernel engineering http://free-electrons.com
Re: [PATCH] mtd: nand: s3c2410: fix bug in s3c2410_nand_correct_data()
On Fri, 8 Apr 2016 09:51:04 +0800 Zeng Zhaoxiu wrote: > > > 在 2016年04月08日 08:18, Boris Brezillon 写道: > > Hi Zeng, > > > > On Fri, 8 Apr 2016 00:48:17 +0800 > > zengzhao...@163.com wrote: > > > >> From: Zeng Zhaoxiu > >> > >> If there is only one bit difference in the ECC, the function should return > >> 1. > >> The result of "diff0 & ~(1< >> actually returns -1. > >> > >> Here, we can use the simple expression "(diff0 & (diff0 - 1)) == 0" to > >> determine > >> whether the diff0 has only one 1-bit. > > Missing Signed-off-by here. > > > >> --- > >> drivers/mtd/nand/s3c2410.c | 2 +- > >> 1 file changed, 1 insertion(+), 1 deletion(-) > >> > >> diff --git a/drivers/mtd/nand/s3c2410.c b/drivers/mtd/nand/s3c2410.c > >> index 9c9397b..c9698cf 100644 > >> --- a/drivers/mtd/nand/s3c2410.c > >> +++ b/drivers/mtd/nand/s3c2410.c > >> @@ -542,7 +542,7 @@ static int s3c2410_nand_correct_data(struct mtd_info > >> *mtd, u_char *dat, > >>diff0 |= (diff1 << 8); > >>diff0 |= (diff2 << 16); > >> > >> - if ((diff0 & ~(1< >> + if ((diff0 & (diff0 - 1)) == 0) > > Or just > > > > if (hweight_long((unsigned long)diff0) == 1) > > > > which is doing exactly what the comment says. > > > > BTW, I don't understand why the current code is wrong? To me, it seems > > it's correctly detecting the case where only a single bit is different. > > What are you trying to fix exactly? > > > > Best Regards, > > > > Boris > > > > For example, assuming diff0 is 1, then fls(diff0) is equal to 1, then "~(1 << > fls(diff0))" is equal to 0xfffd, > then the result of "(diff0 & ~(1 << fls(diff0)))" is 1 , not we expected 0. > > __fls(diff0) and "(fls(diff0) - 1)" are all right, but fls(diff0) is wrong. > Indeed, I forgot that fls() was returning (position + 1). Anyway, I still think using hweight clarifies what you really want to test. -- Boris Brezillon, Free Electrons Embedded Linux and Kernel engineering http://free-electrons.com
[PATCH] kbuild: drop redundant "PHONY += FORCE"
"PHONY += FORCE" is already cared by scripts/Makefile.build, which these files are included from. Signed-off-by: Masahiro Yamada--- Please apply the following to avoid conflicts. https://patchwork.kernel.org/patch/8572451/ arch/arm/boot/bootp/Makefile | 2 +- arch/ia64/Makefile | 2 +- arch/unicore32/boot/Makefile | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/arm/boot/bootp/Makefile b/arch/arm/boot/bootp/Makefile index 52a543b..5e4acd2 100644 --- a/arch/arm/boot/bootp/Makefile +++ b/arch/arm/boot/bootp/Makefile @@ -25,4 +25,4 @@ $(obj)/kernel.o: arch/arm/boot/zImage FORCE $(obj)/initrd.o: $(INITRD) FORCE -PHONY += $(INITRD) FORCE +PHONY += $(INITRD) diff --git a/arch/ia64/Makefile b/arch/ia64/Makefile index 648f1ce..c100d78 100644 --- a/arch/ia64/Makefile +++ b/arch/ia64/Makefile @@ -96,7 +96,7 @@ define archhelp endef archprepare: make_nr_irqs_h -PHONY += make_nr_irqs_h FORCE +PHONY += make_nr_irqs_h make_nr_irqs_h: $(Q)$(MAKE) $(build)=arch/ia64/kernel include/generated/nr-irqs.h diff --git a/arch/unicore32/boot/Makefile b/arch/unicore32/boot/Makefile index ec7fb70..8288550 100644 --- a/arch/unicore32/boot/Makefile +++ b/arch/unicore32/boot/Makefile @@ -31,7 +31,7 @@ $(obj)/uImage: $(obj)/zImage FORCE $(call if_changed,uimage) @echo ' Image $@ is ready' -PHONY += initrd FORCE +PHONY += initrd initrd: @test "$(INITRD)" != "" || \ (echo You must specify INITRD; exit -1) -- 1.9.1
[PATCH] kbuild: drop redundant "PHONY += FORCE"
"PHONY += FORCE" is already cared by scripts/Makefile.build, which these files are included from. Signed-off-by: Masahiro Yamada --- Please apply the following to avoid conflicts. https://patchwork.kernel.org/patch/8572451/ arch/arm/boot/bootp/Makefile | 2 +- arch/ia64/Makefile | 2 +- arch/unicore32/boot/Makefile | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/arm/boot/bootp/Makefile b/arch/arm/boot/bootp/Makefile index 52a543b..5e4acd2 100644 --- a/arch/arm/boot/bootp/Makefile +++ b/arch/arm/boot/bootp/Makefile @@ -25,4 +25,4 @@ $(obj)/kernel.o: arch/arm/boot/zImage FORCE $(obj)/initrd.o: $(INITRD) FORCE -PHONY += $(INITRD) FORCE +PHONY += $(INITRD) diff --git a/arch/ia64/Makefile b/arch/ia64/Makefile index 648f1ce..c100d78 100644 --- a/arch/ia64/Makefile +++ b/arch/ia64/Makefile @@ -96,7 +96,7 @@ define archhelp endef archprepare: make_nr_irqs_h -PHONY += make_nr_irqs_h FORCE +PHONY += make_nr_irqs_h make_nr_irqs_h: $(Q)$(MAKE) $(build)=arch/ia64/kernel include/generated/nr-irqs.h diff --git a/arch/unicore32/boot/Makefile b/arch/unicore32/boot/Makefile index ec7fb70..8288550 100644 --- a/arch/unicore32/boot/Makefile +++ b/arch/unicore32/boot/Makefile @@ -31,7 +31,7 @@ $(obj)/uImage: $(obj)/zImage FORCE $(call if_changed,uimage) @echo ' Image $@ is ready' -PHONY += initrd FORCE +PHONY += initrd initrd: @test "$(INITRD)" != "" || \ (echo You must specify INITRD; exit -1) -- 1.9.1
RE: [PATCH v8 net-next 1/1] hv_sock: introduce Hyper-V Sockets
> From: Joe Perches [mailto:j...@perches.com] > Sent: Friday, April 8, 2016 9:15 > On Thu, 2016-04-07 at 18:36 -0700, Dexuan Cui wrote: > > diff --git a/include/net/af_hvsock.h b/include/net/af_hvsock.h > [] > > +#define VMBUS_RINGBUFFER_SIZE_HVSOCK_RECV (5 * PAGE_SIZE) > > +#define VMBUS_RINGBUFFER_SIZE_HVSOCK_SEND (5 * PAGE_SIZE) > > + > > +#define HVSOCK_RCV_BUF_SZ > VMBUS_RINGBUFFER_SIZE_HVSOCK_RECV > > +#define HVSOCK_SND_BUF_SZ PAGE_SIZE > [] > > +struct hvsock_sock { > [] > > + struct { > > + struct vmpipe_proto_header hdr; > > + char buf[HVSOCK_SND_BUF_SZ]; > > + } __packed send; > > + > > + struct { > > + struct vmpipe_proto_header hdr; > > + char buf[HVSOCK_RCV_BUF_SZ]; > > + unsigned int data_len; > > + unsigned int data_offset; > > + } __packed recv; > > +}; > > These bufs are not page aligned and so can span pages. > > Is there any value in allocating these bufs separately > as pages instead of as a kmalloc? The bufs are not required to be page aligned. Here the 'hdr' and the 'buf' must be consecutive, i.e., the 'buf' must be an array rather than a pointer: please see hvsock_send_data(). It looks to me there is no big value to make sure the 'buf' is page aligned: on x86_64, at least it should already be 8-byte aligned due to the adjacent channel pointer, so memcpy_from_msg() should work enough good and in hvsock_send_data() -> vmbus_sendpacket(), we don't copy the 'buf'. Thanks, -- Dexuan
RE: [PATCH v8 net-next 1/1] hv_sock: introduce Hyper-V Sockets
> From: Joe Perches [mailto:j...@perches.com] > Sent: Friday, April 8, 2016 9:15 > On Thu, 2016-04-07 at 18:36 -0700, Dexuan Cui wrote: > > diff --git a/include/net/af_hvsock.h b/include/net/af_hvsock.h > [] > > +#define VMBUS_RINGBUFFER_SIZE_HVSOCK_RECV (5 * PAGE_SIZE) > > +#define VMBUS_RINGBUFFER_SIZE_HVSOCK_SEND (5 * PAGE_SIZE) > > + > > +#define HVSOCK_RCV_BUF_SZ > VMBUS_RINGBUFFER_SIZE_HVSOCK_RECV > > +#define HVSOCK_SND_BUF_SZ PAGE_SIZE > [] > > +struct hvsock_sock { > [] > > + struct { > > + struct vmpipe_proto_header hdr; > > + char buf[HVSOCK_SND_BUF_SZ]; > > + } __packed send; > > + > > + struct { > > + struct vmpipe_proto_header hdr; > > + char buf[HVSOCK_RCV_BUF_SZ]; > > + unsigned int data_len; > > + unsigned int data_offset; > > + } __packed recv; > > +}; > > These bufs are not page aligned and so can span pages. > > Is there any value in allocating these bufs separately > as pages instead of as a kmalloc? The bufs are not required to be page aligned. Here the 'hdr' and the 'buf' must be consecutive, i.e., the 'buf' must be an array rather than a pointer: please see hvsock_send_data(). It looks to me there is no big value to make sure the 'buf' is page aligned: on x86_64, at least it should already be 8-byte aligned due to the adjacent channel pointer, so memcpy_from_msg() should work enough good and in hvsock_send_data() -> vmbus_sendpacket(), we don't copy the 'buf'. Thanks, -- Dexuan
Re: [RFC PATCH 0/3] restartable sequences v2: fast user-space percpu critical sections
- On Apr 7, 2016, at 9:21 PM, Andy Lutomirski l...@amacapital.net wrote: > On Thu, Apr 7, 2016 at 6:11 PM, Mathieu Desnoyers >wrote: >> - On Apr 7, 2016, at 6:05 PM, Andy Lutomirski l...@amacapital.net wrote: >> >>> On Thu, Apr 7, 2016 at 1:11 PM, Peter Zijlstra wrote: On Thu, Apr 07, 2016 at 09:43:33AM -0700, Andy Lutomirski wrote: >> [...] > it's inherently debuggable, It is more debuggable, agreed. > and it allows multiple independent > rseq-protected things to coexist without forcing each other to abort. >> >> [...] >> >> My understanding is that the main goal of this rather more complex >> proposal is to make interaction with debuggers more straightforward in >> cases of single-stepping through the rseq critical section. > > The things I like about my proposal are both that you can single-step > through it just like any other code as long as you pin the thread to a > CPU and that it doesn't make preemption magical. (Of course, you can > *force* it to do something on resume and/or preemption by sticking a > bogus value in the expected event count field, but that's not the > intended use. Hmm, I guess it does need to hook preemption and/or > resume for all processes that enable the thing so it can know to check > for an enabled post_commit_rip, just like all the other proposals.) > > Also, mine lets you have a fairly long-running critical section that > doesn't get aborted under heavy load and can interleave with other > critical sections that don't conflict. Yes, those would be nice advantages. I'll have to do a few more pseudo-code and execution scenarios to get a better understanding of your idea. > >> >> I recently came up with a scheme that should allow us to handle such >> situations in a fashion similar to debuggers handling ll/sc >> restartable sequences of instructions on e.g. powerpc. The good news >> is that my scheme does not require anything at the kernel level. >> >> The idea is simple: the userspace rseq critical sections now >> become marked by 3 inline functions (rather than 2 in Paul's proposal): >> >> rseq_start(void *rseq_key) >> rseq_finish(void *rseq_key) >> rseq_abort(void *rseq_key) > > How do you use this thing? What are its semantics? You define one rseq_key variable (dummy 1 byte variable, can be an empty structure) for each rseq critical section you have in your program. A rseq critical section will typically have one entry point (rseq_start), and one exit point (rseq_finish). I'm saying "typically" because there may be more than one entry point, and more than one exit point per critical section. Entry and exit points mark the beginning and end of each rseq critical section. rseq_start loads the sequence counter from the TLS and copies it onto the stack. It then gets passed to rseq_finish() to be compared with the final seqnum TLS value just before the commit. rseq_finish is the one responsible for storing into the post_commit_instr field of the TLS and populating rcx with the failure insn label address. rseq_finish() does the commit. And there is rseq_abort(), which would need to be called if we just want to exit from a rseq critical section without doing the commit (no matching call to rseq_finish after a rseq_start). Each of rseq_start, finish, and abort would need to receive a pointer to the rseq_key as parameter. rseq_start would return the sequence number read from the TLS. rseq_finish would also receive as parameter that sequence number that has been returned by rseq_start. Does it make sense ? Thanks, Mathieu > > --Andy > >> >> We associate each critical section with a unique "key" (dummy >> 1 byte object in the process address space), so we can group >> them. The new "rseq_abort" would mark exit points that would >> exit the critical section without executing the final commit >> instruction. >> >> Within each of rseq_start, rseq_finish and rseq_abort, >> we declare a non-loadable section that gets populated >> with the following tuples: >> >> (RSEQ_TYPE, insn address, rseq_key) >> >> Where RSEQ_TYPE is either RSEQ_START, RSEQ_FINISH, or RSEQ_ABORT. >> >> That special section would be found in the executable by the >> debugger, which can then skip over entire restartable critical >> sections when it encounters them by placing breakpoints at >> all exit points (finish and cancel) associated to the same >> rseq_key as the entry point (start). >> >> This way we don't need to complexify the runtime code, neither >> at kernel nor user-space level, and we get debuggability using >> a trick similar to what ll/sc architectures already need to do. >> >> Of course, this requires extending gdb, which should not be >> a show-stopper. >> >> Thoughts ? >> >> Thanks, >> >> Mathieu >> >> -- >> Mathieu Desnoyers >> EfficiOS Inc. >> http://www.efficios.com > > > > -- > Andy Lutomirski > AMA Capital Management, LLC -- Mathieu Desnoyers EfficiOS Inc.
Re: [RFC PATCH 0/3] restartable sequences v2: fast user-space percpu critical sections
- On Apr 7, 2016, at 9:21 PM, Andy Lutomirski l...@amacapital.net wrote: > On Thu, Apr 7, 2016 at 6:11 PM, Mathieu Desnoyers > wrote: >> - On Apr 7, 2016, at 6:05 PM, Andy Lutomirski l...@amacapital.net wrote: >> >>> On Thu, Apr 7, 2016 at 1:11 PM, Peter Zijlstra wrote: On Thu, Apr 07, 2016 at 09:43:33AM -0700, Andy Lutomirski wrote: >> [...] > it's inherently debuggable, It is more debuggable, agreed. > and it allows multiple independent > rseq-protected things to coexist without forcing each other to abort. >> >> [...] >> >> My understanding is that the main goal of this rather more complex >> proposal is to make interaction with debuggers more straightforward in >> cases of single-stepping through the rseq critical section. > > The things I like about my proposal are both that you can single-step > through it just like any other code as long as you pin the thread to a > CPU and that it doesn't make preemption magical. (Of course, you can > *force* it to do something on resume and/or preemption by sticking a > bogus value in the expected event count field, but that's not the > intended use. Hmm, I guess it does need to hook preemption and/or > resume for all processes that enable the thing so it can know to check > for an enabled post_commit_rip, just like all the other proposals.) > > Also, mine lets you have a fairly long-running critical section that > doesn't get aborted under heavy load and can interleave with other > critical sections that don't conflict. Yes, those would be nice advantages. I'll have to do a few more pseudo-code and execution scenarios to get a better understanding of your idea. > >> >> I recently came up with a scheme that should allow us to handle such >> situations in a fashion similar to debuggers handling ll/sc >> restartable sequences of instructions on e.g. powerpc. The good news >> is that my scheme does not require anything at the kernel level. >> >> The idea is simple: the userspace rseq critical sections now >> become marked by 3 inline functions (rather than 2 in Paul's proposal): >> >> rseq_start(void *rseq_key) >> rseq_finish(void *rseq_key) >> rseq_abort(void *rseq_key) > > How do you use this thing? What are its semantics? You define one rseq_key variable (dummy 1 byte variable, can be an empty structure) for each rseq critical section you have in your program. A rseq critical section will typically have one entry point (rseq_start), and one exit point (rseq_finish). I'm saying "typically" because there may be more than one entry point, and more than one exit point per critical section. Entry and exit points mark the beginning and end of each rseq critical section. rseq_start loads the sequence counter from the TLS and copies it onto the stack. It then gets passed to rseq_finish() to be compared with the final seqnum TLS value just before the commit. rseq_finish is the one responsible for storing into the post_commit_instr field of the TLS and populating rcx with the failure insn label address. rseq_finish() does the commit. And there is rseq_abort(), which would need to be called if we just want to exit from a rseq critical section without doing the commit (no matching call to rseq_finish after a rseq_start). Each of rseq_start, finish, and abort would need to receive a pointer to the rseq_key as parameter. rseq_start would return the sequence number read from the TLS. rseq_finish would also receive as parameter that sequence number that has been returned by rseq_start. Does it make sense ? Thanks, Mathieu > > --Andy > >> >> We associate each critical section with a unique "key" (dummy >> 1 byte object in the process address space), so we can group >> them. The new "rseq_abort" would mark exit points that would >> exit the critical section without executing the final commit >> instruction. >> >> Within each of rseq_start, rseq_finish and rseq_abort, >> we declare a non-loadable section that gets populated >> with the following tuples: >> >> (RSEQ_TYPE, insn address, rseq_key) >> >> Where RSEQ_TYPE is either RSEQ_START, RSEQ_FINISH, or RSEQ_ABORT. >> >> That special section would be found in the executable by the >> debugger, which can then skip over entire restartable critical >> sections when it encounters them by placing breakpoints at >> all exit points (finish and cancel) associated to the same >> rseq_key as the entry point (start). >> >> This way we don't need to complexify the runtime code, neither >> at kernel nor user-space level, and we get debuggability using >> a trick similar to what ll/sc architectures already need to do. >> >> Of course, this requires extending gdb, which should not be >> a show-stopper. >> >> Thoughts ? >> >> Thanks, >> >> Mathieu >> >> -- >> Mathieu Desnoyers >> EfficiOS Inc. >> http://www.efficios.com > > > > -- > Andy Lutomirski > AMA Capital Management, LLC -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com
Re: [PATCH v2] clk: let clk_disable() return immediately if clk is NULL or error
Hi Stephen, 2016-04-08 9:33 GMT+09:00 Stephen Boyd: > On 04/05, Masahiro Yamada wrote: >> The clk_disable() in the common clock framework (drivers/clk/clk.c) >> returns immediately if a given clk is NULL or an error pointer. It >> allows clock consumers to call clk_disable() without IS_ERR_OR_NULL >> checking if drivers are only used with the common clock framework. >> >> Unfortunately, NULL/error checking is missing from some of non-common >> clk_disable() implementations. This prevents us from completely >> dropping NULL/error checking from callers. Let's make it tree-wide >> consistent by adding IS_ERR_OR_NULL(clk) to all callees. >> >> Signed-off-by: Masahiro Yamada >> Acked-by: Greg Ungerer >> Acked-by: Wan Zongshun >> --- >> >> Stephen, >> >> This patch has been unapplied for a long time. >> >> Please let me know if there is something wrong with this patch. >> > > I'm mostly confused why we wouldn't want to encourage people to > call clk_disable or unprepare on a clk that's an error pointer. > Typically an error pointer should be dealt with, instead of > silently ignored, so why wasn't it dealt with by passing it up > the probe() path? > This makes our driver programming life easier. For example, let's see drivers/tty/serial/8250/8250_of.c The "clock-frequency" DT property takes precedence over "clocks" property. So, it is valid to probe the driver with a NULL pointer for info->clk. if (of_property_read_u32(np, "clock-frequency", )) { /* Get clk rate through clk driver if present */ info->clk = devm_clk_get(>dev, NULL); if (IS_ERR(info->clk)) { dev_warn(>dev, "clk or clock-frequency not defined\n"); return PTR_ERR(info->clk); } ret = clk_prepare_enable(info->clk); if (ret < 0) return ret; clk = clk_get_rate(info->clk); } As a result, we need to make sure the clk pointer is valid before calling clk_disable_unprepare(). If we could support pointer checking in callees, we would be able to clean-up lots of clock consumers. -- Best Regards Masahiro Yamada
Re: [PATCH v2] clk: let clk_disable() return immediately if clk is NULL or error
Hi Stephen, 2016-04-08 9:33 GMT+09:00 Stephen Boyd : > On 04/05, Masahiro Yamada wrote: >> The clk_disable() in the common clock framework (drivers/clk/clk.c) >> returns immediately if a given clk is NULL or an error pointer. It >> allows clock consumers to call clk_disable() without IS_ERR_OR_NULL >> checking if drivers are only used with the common clock framework. >> >> Unfortunately, NULL/error checking is missing from some of non-common >> clk_disable() implementations. This prevents us from completely >> dropping NULL/error checking from callers. Let's make it tree-wide >> consistent by adding IS_ERR_OR_NULL(clk) to all callees. >> >> Signed-off-by: Masahiro Yamada >> Acked-by: Greg Ungerer >> Acked-by: Wan Zongshun >> --- >> >> Stephen, >> >> This patch has been unapplied for a long time. >> >> Please let me know if there is something wrong with this patch. >> > > I'm mostly confused why we wouldn't want to encourage people to > call clk_disable or unprepare on a clk that's an error pointer. > Typically an error pointer should be dealt with, instead of > silently ignored, so why wasn't it dealt with by passing it up > the probe() path? > This makes our driver programming life easier. For example, let's see drivers/tty/serial/8250/8250_of.c The "clock-frequency" DT property takes precedence over "clocks" property. So, it is valid to probe the driver with a NULL pointer for info->clk. if (of_property_read_u32(np, "clock-frequency", )) { /* Get clk rate through clk driver if present */ info->clk = devm_clk_get(>dev, NULL); if (IS_ERR(info->clk)) { dev_warn(>dev, "clk or clock-frequency not defined\n"); return PTR_ERR(info->clk); } ret = clk_prepare_enable(info->clk); if (ret < 0) return ret; clk = clk_get_rate(info->clk); } As a result, we need to make sure the clk pointer is valid before calling clk_disable_unprepare(). If we could support pointer checking in callees, we would be able to clean-up lots of clock consumers. -- Best Regards Masahiro Yamada
Re: [PATCH] mtd: nand: s3c2410: fix bug in s3c2410_nand_correct_data()
在 2016年04月08日 08:18, Boris Brezillon 写道: Hi Zeng, On Fri, 8 Apr 2016 00:48:17 +0800 zengzhao...@163.com wrote: From: Zeng ZhaoxiuIf there is only one bit difference in the ECC, the function should return 1. The result of "diff0 & ~(1<
Re: [PATCH] mtd: nand: s3c2410: fix bug in s3c2410_nand_correct_data()
在 2016年04月08日 08:18, Boris Brezillon 写道: Hi Zeng, On Fri, 8 Apr 2016 00:48:17 +0800 zengzhao...@163.com wrote: From: Zeng Zhaoxiu If there is only one bit difference in the ECC, the function should return 1. The result of "diff0 & ~(1< Missing Signed-off-by here. --- drivers/mtd/nand/s3c2410.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/mtd/nand/s3c2410.c b/drivers/mtd/nand/s3c2410.c index 9c9397b..c9698cf 100644 --- a/drivers/mtd/nand/s3c2410.c +++ b/drivers/mtd/nand/s3c2410.c @@ -542,7 +542,7 @@ static int s3c2410_nand_correct_data(struct mtd_info *mtd, u_char *dat, diff0 |= (diff1 << 8); diff0 |= (diff2 << 16); - if ((diff0 & ~(1< + if ((diff0 & (diff0 - 1)) == 0) Or just if (hweight_long((unsigned long)diff0) == 1) which is doing exactly what the comment says. BTW, I don't understand why the current code is wrong? To me, it seems it's correctly detecting the case where only a single bit is different. What are you trying to fix exactly? Best Regards, Boris For example, assuming diff0 is 1, then fls(diff0) is equal to 1, then "~(1 << fls(diff0))" is equal to 0xfffd, then the result of "(diff0 & ~(1 << fls(diff0)))" is 1 , not we expected 0. __fls(diff0) and "(fls(diff0) - 1)" are all right, but fls(diff0) is wrong.
Re: [PATCH v1 09/12] serial: 8250_lpss: split LPSS driver to separate module
On 04/07/2016 01:37 PM, Andy Shevchenko wrote: > Intes SoCs, such as Braswell, have DesignWare UART. Split out to separate > module which also will be used for Intel Quark later. What's the rationale? And this really isn't a split; this patch introduces a number of significant changes from the pci version. > Signed-off-by: Andy Shevchenko> --- > drivers/tty/serial/8250/8250_lpss.c | 279 > > drivers/tty/serial/8250/8250_pci.c | 227 ++--- > drivers/tty/serial/8250/Kconfig | 14 +- > drivers/tty/serial/8250/Makefile| 1 + > 4 files changed, 301 insertions(+), 220 deletions(-) > create mode 100644 drivers/tty/serial/8250/8250_lpss.c > > diff --git a/drivers/tty/serial/8250/8250_lpss.c > b/drivers/tty/serial/8250/8250_lpss.c > new file mode 100644 > index 000..bca4adb > --- /dev/null > +++ b/drivers/tty/serial/8250/8250_lpss.c > @@ -0,0 +1,279 @@ > +/* > + * 8250_lpss.c - Driver for UART on Intel Braswell and various other Intel > SoCs > + * > + * Copyright (C) 2016 Intel Corporation > + * Author: Andy Shevchenko > + * > + * This program is free software; you can redistribute it and/or modify > + * it under the terms of the GNU General Public License version 2 as > + * published by the Free Software Foundation. > + */ > + > +#include > +#include > +#include > +#include > + > +#include > +#include > + > +#include "8250.h" > + > +#define PCI_DEVICE_ID_INTEL_BYT_UART10x0f0a > +#define PCI_DEVICE_ID_INTEL_BYT_UART20x0f0c > + > +#define PCI_DEVICE_ID_INTEL_BSW_UART10x228a > +#define PCI_DEVICE_ID_INTEL_BSW_UART20x228c > + > +#define PCI_DEVICE_ID_INTEL_BDW_UART10x9ce3 > +#define PCI_DEVICE_ID_INTEL_BDW_UART20x9ce4 > + > +/* Intel LPSS specific registers */ > + > +#define BYT_PRV_CLK 0x800 > +#define BYT_PRV_CLK_EN BIT(0) > +#define BYT_PRV_CLK_M_VAL_SHIFT 1 > +#define BYT_PRV_CLK_N_VAL_SHIFT 16 > +#define BYT_PRV_CLK_UPDATE BIT(31) > + > +#define BYT_TX_OVF_INT 0x820 > +#define BYT_TX_OVF_INT_MASK BIT(1) > + > +struct lpss8250; > + > +struct lpss8250_board { > + unsigned long freq; > + unsigned int base_baud; > + int (*setup)(struct lpss8250 *, struct uart_port *p); > +}; New concept. > + > +struct lpss8250 { > + int line; > + struct lpss8250_board *board; > + > + /* DMA parameters */ > + struct uart_8250_dma dma; > + struct dw_dma_slave dma_param; > + u8 dma_maxburst; > +}; > + > +/*/ Please remove. > + > +static void lpss8250_set_termios(struct uart_port *p, > + struct ktermios *termios, > + struct ktermios *old) > +{ > + unsigned int baud = tty_termios_baud_rate(termios); > + struct lpss8250 *lpss = p->private_data; > + unsigned long fref = lpss->board->freq, fuart = baud * 16; > + unsigned long w = BIT(15) - 1; > + unsigned long m, n; > + u32 reg; > + > + /* Get Fuart closer to Fref */ > + fuart *= rounddown_pow_of_two(fref / fuart); > + > + /* > + * For baud rates 0.5M, 1M, 1.5M, 2M, 2.5M, 3M, 3.5M and 4M the > + * dividers must be adjusted. > + * > + * uartclk = (m / n) * 100 MHz, where m <= n > + */ > + rational_best_approximation(fuart, fref, w, w, , ); > + p->uartclk = fuart; > + > + /* Reset the clock */ > + reg = (m << BYT_PRV_CLK_M_VAL_SHIFT) | (n << BYT_PRV_CLK_N_VAL_SHIFT); > + writel(reg, p->membase + BYT_PRV_CLK); > + reg |= BYT_PRV_CLK_EN | BYT_PRV_CLK_UPDATE; > + writel(reg, p->membase + BYT_PRV_CLK); > + > + p->status &= ~UPSTAT_AUTOCTS; > + if (termios->c_cflag & CRTSCTS) > + p->status |= UPSTAT_AUTOCTS; > + > + serial8250_do_set_termios(p, termios, old); > +} > + > +/*/ Please remove. > + > +static int byt_serial_setup(struct lpss8250 *lpss, struct uart_port *port) This would have been much easier to review if you had simply moved it here and done the rework in a follow-on patch. > +{ > + struct dw_dma_slave *param = >dma_param; > + struct uart_8250_port *up = up_to_u8250p(port); > + struct pci_dev *pdev = to_pci_dev(port->dev); > + struct pci_dev *dma_dev = pci_get_slot(pdev->bus, > PCI_DEVFN(PCI_SLOT(pdev->devfn), 0)); > + > + switch (pdev->device) { > + case PCI_DEVICE_ID_INTEL_BYT_UART1: > + case PCI_DEVICE_ID_INTEL_BSW_UART1: > + case PCI_DEVICE_ID_INTEL_BDW_UART1: > + param->src_id = 3; > + param->dst_id = 2; > + break; > + case PCI_DEVICE_ID_INTEL_BYT_UART2: > + case PCI_DEVICE_ID_INTEL_BSW_UART2: > + case
Re: [PATCH v1 09/12] serial: 8250_lpss: split LPSS driver to separate module
On 04/07/2016 01:37 PM, Andy Shevchenko wrote: > Intes SoCs, such as Braswell, have DesignWare UART. Split out to separate > module which also will be used for Intel Quark later. What's the rationale? And this really isn't a split; this patch introduces a number of significant changes from the pci version. > Signed-off-by: Andy Shevchenko > --- > drivers/tty/serial/8250/8250_lpss.c | 279 > > drivers/tty/serial/8250/8250_pci.c | 227 ++--- > drivers/tty/serial/8250/Kconfig | 14 +- > drivers/tty/serial/8250/Makefile| 1 + > 4 files changed, 301 insertions(+), 220 deletions(-) > create mode 100644 drivers/tty/serial/8250/8250_lpss.c > > diff --git a/drivers/tty/serial/8250/8250_lpss.c > b/drivers/tty/serial/8250/8250_lpss.c > new file mode 100644 > index 000..bca4adb > --- /dev/null > +++ b/drivers/tty/serial/8250/8250_lpss.c > @@ -0,0 +1,279 @@ > +/* > + * 8250_lpss.c - Driver for UART on Intel Braswell and various other Intel > SoCs > + * > + * Copyright (C) 2016 Intel Corporation > + * Author: Andy Shevchenko > + * > + * This program is free software; you can redistribute it and/or modify > + * it under the terms of the GNU General Public License version 2 as > + * published by the Free Software Foundation. > + */ > + > +#include > +#include > +#include > +#include > + > +#include > +#include > + > +#include "8250.h" > + > +#define PCI_DEVICE_ID_INTEL_BYT_UART10x0f0a > +#define PCI_DEVICE_ID_INTEL_BYT_UART20x0f0c > + > +#define PCI_DEVICE_ID_INTEL_BSW_UART10x228a > +#define PCI_DEVICE_ID_INTEL_BSW_UART20x228c > + > +#define PCI_DEVICE_ID_INTEL_BDW_UART10x9ce3 > +#define PCI_DEVICE_ID_INTEL_BDW_UART20x9ce4 > + > +/* Intel LPSS specific registers */ > + > +#define BYT_PRV_CLK 0x800 > +#define BYT_PRV_CLK_EN BIT(0) > +#define BYT_PRV_CLK_M_VAL_SHIFT 1 > +#define BYT_PRV_CLK_N_VAL_SHIFT 16 > +#define BYT_PRV_CLK_UPDATE BIT(31) > + > +#define BYT_TX_OVF_INT 0x820 > +#define BYT_TX_OVF_INT_MASK BIT(1) > + > +struct lpss8250; > + > +struct lpss8250_board { > + unsigned long freq; > + unsigned int base_baud; > + int (*setup)(struct lpss8250 *, struct uart_port *p); > +}; New concept. > + > +struct lpss8250 { > + int line; > + struct lpss8250_board *board; > + > + /* DMA parameters */ > + struct uart_8250_dma dma; > + struct dw_dma_slave dma_param; > + u8 dma_maxburst; > +}; > + > +/*/ Please remove. > + > +static void lpss8250_set_termios(struct uart_port *p, > + struct ktermios *termios, > + struct ktermios *old) > +{ > + unsigned int baud = tty_termios_baud_rate(termios); > + struct lpss8250 *lpss = p->private_data; > + unsigned long fref = lpss->board->freq, fuart = baud * 16; > + unsigned long w = BIT(15) - 1; > + unsigned long m, n; > + u32 reg; > + > + /* Get Fuart closer to Fref */ > + fuart *= rounddown_pow_of_two(fref / fuart); > + > + /* > + * For baud rates 0.5M, 1M, 1.5M, 2M, 2.5M, 3M, 3.5M and 4M the > + * dividers must be adjusted. > + * > + * uartclk = (m / n) * 100 MHz, where m <= n > + */ > + rational_best_approximation(fuart, fref, w, w, , ); > + p->uartclk = fuart; > + > + /* Reset the clock */ > + reg = (m << BYT_PRV_CLK_M_VAL_SHIFT) | (n << BYT_PRV_CLK_N_VAL_SHIFT); > + writel(reg, p->membase + BYT_PRV_CLK); > + reg |= BYT_PRV_CLK_EN | BYT_PRV_CLK_UPDATE; > + writel(reg, p->membase + BYT_PRV_CLK); > + > + p->status &= ~UPSTAT_AUTOCTS; > + if (termios->c_cflag & CRTSCTS) > + p->status |= UPSTAT_AUTOCTS; > + > + serial8250_do_set_termios(p, termios, old); > +} > + > +/*/ Please remove. > + > +static int byt_serial_setup(struct lpss8250 *lpss, struct uart_port *port) This would have been much easier to review if you had simply moved it here and done the rework in a follow-on patch. > +{ > + struct dw_dma_slave *param = >dma_param; > + struct uart_8250_port *up = up_to_u8250p(port); > + struct pci_dev *pdev = to_pci_dev(port->dev); > + struct pci_dev *dma_dev = pci_get_slot(pdev->bus, > PCI_DEVFN(PCI_SLOT(pdev->devfn), 0)); > + > + switch (pdev->device) { > + case PCI_DEVICE_ID_INTEL_BYT_UART1: > + case PCI_DEVICE_ID_INTEL_BSW_UART1: > + case PCI_DEVICE_ID_INTEL_BDW_UART1: > + param->src_id = 3; > + param->dst_id = 2; > + break; > + case PCI_DEVICE_ID_INTEL_BYT_UART2: > + case PCI_DEVICE_ID_INTEL_BSW_UART2: > + case PCI_DEVICE_ID_INTEL_BDW_UART2: > + param->src_id = 5; > + param->dst_id
Re: [PATCH 2/2] net-ath9k_htc: Replace a variable initialisation by an assignment in ath9k_htc_set_channel()
Hi Kalle, On Sat, Jan 2, 2016 at 5:25 AM, SF Markus Elfringwrote: > From: Markus Elfring > Date: Fri, 1 Jan 2016 19:09:32 +0100 > > Replace an explicit initialisation for one local variable at the beginning > by a conditional assignment. > > Signed-off-by: Markus Elfring This looks sane to me. Reviewed-by: Julian Calaby Thanks, Julian Calaby > --- > drivers/net/wireless/ath/ath9k/htc_drv_main.c | 7 ++- > 1 file changed, 2 insertions(+), 5 deletions(-) > > diff --git a/drivers/net/wireless/ath/ath9k/htc_drv_main.c > b/drivers/net/wireless/ath/ath9k/htc_drv_main.c > index a680a97..30bd59e 100644 > --- a/drivers/net/wireless/ath/ath9k/htc_drv_main.c > +++ b/drivers/net/wireless/ath/ath9k/htc_drv_main.c > @@ -246,7 +246,7 @@ static int ath9k_htc_set_channel(struct ath9k_htc_priv > *priv, > struct ieee80211_conf *conf = >hw->conf; > bool fastcc; > struct ieee80211_channel *channel = hw->conf.chandef.chan; > - struct ath9k_hw_cal_data *caldata = NULL; > + struct ath9k_hw_cal_data *caldata; > enum htc_phymode mode; > __be16 htc_mode; > u8 cmd_rsp; > @@ -274,10 +274,7 @@ static int ath9k_htc_set_channel(struct ath9k_htc_priv > *priv, > priv->ah->curchan->channel, > channel->center_freq, conf_is_ht(conf), conf_is_ht40(conf), > fastcc); > - > - if (!fastcc) > - caldata = >caldata; > - > + caldata = fastcc ? NULL : >caldata; > ret = ath9k_hw_reset(ah, hchan, caldata, fastcc); > if (ret) { > ath_err(common, > -- > 2.6.3 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-wireless" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Julian Calaby Email: julian.cal...@gmail.com Profile: http://www.google.com/profiles/julian.calaby/
Re: [PATCH 2/2] net-ath9k_htc: Replace a variable initialisation by an assignment in ath9k_htc_set_channel()
Hi Kalle, On Sat, Jan 2, 2016 at 5:25 AM, SF Markus Elfring wrote: > From: Markus Elfring > Date: Fri, 1 Jan 2016 19:09:32 +0100 > > Replace an explicit initialisation for one local variable at the beginning > by a conditional assignment. > > Signed-off-by: Markus Elfring This looks sane to me. Reviewed-by: Julian Calaby Thanks, Julian Calaby > --- > drivers/net/wireless/ath/ath9k/htc_drv_main.c | 7 ++- > 1 file changed, 2 insertions(+), 5 deletions(-) > > diff --git a/drivers/net/wireless/ath/ath9k/htc_drv_main.c > b/drivers/net/wireless/ath/ath9k/htc_drv_main.c > index a680a97..30bd59e 100644 > --- a/drivers/net/wireless/ath/ath9k/htc_drv_main.c > +++ b/drivers/net/wireless/ath/ath9k/htc_drv_main.c > @@ -246,7 +246,7 @@ static int ath9k_htc_set_channel(struct ath9k_htc_priv > *priv, > struct ieee80211_conf *conf = >hw->conf; > bool fastcc; > struct ieee80211_channel *channel = hw->conf.chandef.chan; > - struct ath9k_hw_cal_data *caldata = NULL; > + struct ath9k_hw_cal_data *caldata; > enum htc_phymode mode; > __be16 htc_mode; > u8 cmd_rsp; > @@ -274,10 +274,7 @@ static int ath9k_htc_set_channel(struct ath9k_htc_priv > *priv, > priv->ah->curchan->channel, > channel->center_freq, conf_is_ht(conf), conf_is_ht40(conf), > fastcc); > - > - if (!fastcc) > - caldata = >caldata; > - > + caldata = fastcc ? NULL : >caldata; > ret = ath9k_hw_reset(ah, hchan, caldata, fastcc); > if (ret) { > ath_err(common, > -- > 2.6.3 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-wireless" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Julian Calaby Email: julian.cal...@gmail.com Profile: http://www.google.com/profiles/julian.calaby/
Re: [PATCH v3 6/7] x86/cpu: Add Erratum 88 detection on AMD
On Thu, Apr 7, 2016 at 5:31 PM, Andy Lutomirskiwrote: > From: Borislav Petkov > > Erratum 88 affects old AMD K8s, where a SWAPGS fails to cause an input > dependency on GS. Therefore, we need to MFENCE before it. > > But that MFENCE is expensive and unnecessary on the remaining x86 CPUs > out there so patch it out on the CPUs which don't require it. This is basically identical to: https://lkml.kernel.org/g/1458576969-13309-4-git-send-email-a...@firstfloor.org Whoops! I thought I'd seen that somewhere but I couldn't spot it. Ingo, etc: we should probably apply one of those patches with a -stable tag (to mitigate the otherwise potentially unpleasant performance regression in here), but I don't really care which one. Andi's has a name for the bug that seems nicer by one character to me, but it would have to be (trivally) rebased. --Andy > > Signed-off-by: Borislav Petkov > Cc: > Signed-off-by: Andy Lutomirski --- > arch/x86/entry/entry_64.S | 2 +- > arch/x86/include/asm/cpufeatures.h | 2 ++ > arch/x86/kernel/cpu/amd.c | 1 + > 3 files changed, 4 insertions(+), 1 deletion(-) > > diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S > index 858b555e274b..64d2033d1e49 100644 > --- a/arch/x86/entry/entry_64.S > +++ b/arch/x86/entry/entry_64.S > @@ -783,7 +783,7 @@ ENTRY(native_load_gs_index) > SWAPGS > gs_change: > movl%edi, %gs > -2: mfence /* workaround */ > +2: ALTERNATIVE "", "mfence", X86_BUG_SWAPGS_FENCE > SWAPGS > popfq > ret > diff --git a/arch/x86/include/asm/cpufeatures.h > b/arch/x86/include/asm/cpufeatures.h > index 2a052302bc43..7bfb6b70c745 100644 > --- a/arch/x86/include/asm/cpufeatures.h > +++ b/arch/x86/include/asm/cpufeatures.h > @@ -295,6 +295,8 @@ > #define X86_BUG_CLFLUSH_MONITORX86_BUG(7) /* AAI65, CLFLUSH required > before MONITOR */ > #define X86_BUG_SYSRET_SS_ATTRSX86_BUG(8) /* SYSRET doesn't fix up > SS attrs */ > #define X86_BUG_NULL_SEG X86_BUG(9) /* Nulling a selector preserves > the base */ > +#define X86_BUG_SWAPGS_FENCE X86_BUG(10) /* SWAPGS without input dep on GS > */ > + > > #ifdef CONFIG_X86_32 > /* > diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c > index 6e47e3a916f1..b7cc9efe08b5 100644 > --- a/arch/x86/kernel/cpu/amd.c > +++ b/arch/x86/kernel/cpu/amd.c > @@ -632,6 +632,7 @@ static void init_amd_k8(struct cpuinfo_x86 *c) > */ > msr_set_bit(MSR_K7_HWCR, 6); > #endif > + set_cpu_bug(c, X86_BUG_SWAPGS_FENCE); > } > > static void init_amd_gh(struct cpuinfo_x86 *c) > -- > 2.5.5 > -- Andy Lutomirski AMA Capital Management, LLC
Re: [PATCH v3 6/7] x86/cpu: Add Erratum 88 detection on AMD
On Thu, Apr 7, 2016 at 5:31 PM, Andy Lutomirski wrote: > From: Borislav Petkov > > Erratum 88 affects old AMD K8s, where a SWAPGS fails to cause an input > dependency on GS. Therefore, we need to MFENCE before it. > > But that MFENCE is expensive and unnecessary on the remaining x86 CPUs > out there so patch it out on the CPUs which don't require it. This is basically identical to: https://lkml.kernel.org/g/1458576969-13309-4-git-send-email-a...@firstfloor.org Whoops! I thought I'd seen that somewhere but I couldn't spot it. Ingo, etc: we should probably apply one of those patches with a -stable tag (to mitigate the otherwise potentially unpleasant performance regression in here), but I don't really care which one. Andi's has a name for the bug that seems nicer by one character to me, but it would have to be (trivally) rebased. --Andy > > Signed-off-by: Borislav Petkov > Cc: > Signed-off-by: Andy Lutomirski --- > arch/x86/entry/entry_64.S | 2 +- > arch/x86/include/asm/cpufeatures.h | 2 ++ > arch/x86/kernel/cpu/amd.c | 1 + > 3 files changed, 4 insertions(+), 1 deletion(-) > > diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S > index 858b555e274b..64d2033d1e49 100644 > --- a/arch/x86/entry/entry_64.S > +++ b/arch/x86/entry/entry_64.S > @@ -783,7 +783,7 @@ ENTRY(native_load_gs_index) > SWAPGS > gs_change: > movl%edi, %gs > -2: mfence /* workaround */ > +2: ALTERNATIVE "", "mfence", X86_BUG_SWAPGS_FENCE > SWAPGS > popfq > ret > diff --git a/arch/x86/include/asm/cpufeatures.h > b/arch/x86/include/asm/cpufeatures.h > index 2a052302bc43..7bfb6b70c745 100644 > --- a/arch/x86/include/asm/cpufeatures.h > +++ b/arch/x86/include/asm/cpufeatures.h > @@ -295,6 +295,8 @@ > #define X86_BUG_CLFLUSH_MONITORX86_BUG(7) /* AAI65, CLFLUSH required > before MONITOR */ > #define X86_BUG_SYSRET_SS_ATTRSX86_BUG(8) /* SYSRET doesn't fix up > SS attrs */ > #define X86_BUG_NULL_SEG X86_BUG(9) /* Nulling a selector preserves > the base */ > +#define X86_BUG_SWAPGS_FENCE X86_BUG(10) /* SWAPGS without input dep on GS > */ > + > > #ifdef CONFIG_X86_32 > /* > diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c > index 6e47e3a916f1..b7cc9efe08b5 100644 > --- a/arch/x86/kernel/cpu/amd.c > +++ b/arch/x86/kernel/cpu/amd.c > @@ -632,6 +632,7 @@ static void init_amd_k8(struct cpuinfo_x86 *c) > */ > msr_set_bit(MSR_K7_HWCR, 6); > #endif > + set_cpu_bug(c, X86_BUG_SWAPGS_FENCE); > } > > static void init_amd_gh(struct cpuinfo_x86 *c) > -- > 2.5.5 > -- Andy Lutomirski AMA Capital Management, LLC
Re: x32 processes, with CONFIG_X86_X32 not set
On Thu, Apr 07, 2016 at 05:20:50PM -0700, Andy Lutomirski wrote: > > > > I had a trinity process get stuck last overnight. > > > > The reason for it getting stuck is my bug (I think), but > > > > there's an odd unrelated thing I noticed while debugging this.. > > > > > > > > $ strace -p 20966 > > > > strace: Process 20966 attached > > > > strace: [ Process PID=20966 runs in x32 mode. ] > > > > > > > > So I don't use that new-fangled x32 stuff. > > > > I don't even have CONFIG_X86_X32 compiled in. > > > > > > > > Is this strace getting confused, or did we somehow screw > > > > up the syscall entry code ? > > > > > > > > Dave > > > > > > > > > > I think you're just seeing an oddity of how x32 works. Unlike > > > "compat", x32-ness of the current syscall isn't a special magic state > > > variable; it's just but 31 in the syscall nr. So trying to do an x32 > > > syscall on a non-x32 syscall should still show bit 31 set to ptracers, > > > and the strace probably decodes this as being in x32 mode. > > > > But this is an x86-64 binary, and it's the main process, not one of the > > fuzzing > > child processes. It shouldn't be even trying to do anything weird. > > It creates a bunch of fd's, then enters a loop forking/reaping children. > > (In this case it actually hung while creating the fd's) > > > > Trinity doesn't actually have any knowledge of x32 at all, mostly because > > it's been irrelevant to me (and most other people). > > > > Hmm. Do you have the next couple lines of strace output by any > chance? I'm wondering if this is a classic bug/misfeature/confusion > in the way that orig_ax works. I don't. That box got rebooted a couple dozen times since then. FWIW, I've not seen this happen since. Very strange. Dave
Re: x32 processes, with CONFIG_X86_X32 not set
On Thu, Apr 07, 2016 at 05:20:50PM -0700, Andy Lutomirski wrote: > > > > I had a trinity process get stuck last overnight. > > > > The reason for it getting stuck is my bug (I think), but > > > > there's an odd unrelated thing I noticed while debugging this.. > > > > > > > > $ strace -p 20966 > > > > strace: Process 20966 attached > > > > strace: [ Process PID=20966 runs in x32 mode. ] > > > > > > > > So I don't use that new-fangled x32 stuff. > > > > I don't even have CONFIG_X86_X32 compiled in. > > > > > > > > Is this strace getting confused, or did we somehow screw > > > > up the syscall entry code ? > > > > > > > > Dave > > > > > > > > > > I think you're just seeing an oddity of how x32 works. Unlike > > > "compat", x32-ness of the current syscall isn't a special magic state > > > variable; it's just but 31 in the syscall nr. So trying to do an x32 > > > syscall on a non-x32 syscall should still show bit 31 set to ptracers, > > > and the strace probably decodes this as being in x32 mode. > > > > But this is an x86-64 binary, and it's the main process, not one of the > > fuzzing > > child processes. It shouldn't be even trying to do anything weird. > > It creates a bunch of fd's, then enters a loop forking/reaping children. > > (In this case it actually hung while creating the fd's) > > > > Trinity doesn't actually have any knowledge of x32 at all, mostly because > > it's been irrelevant to me (and most other people). > > > > Hmm. Do you have the next couple lines of strace output by any > chance? I'm wondering if this is a classic bug/misfeature/confusion > in the way that orig_ax works. I don't. That box got rebooted a couple dozen times since then. FWIW, I've not seen this happen since. Very strange. Dave
Re: [PATCH v1 06/10] device property: switch to use UUID API
Andy Shevchenkowrites: > On Fri, 2016-02-26 at 16:11 +0200, Andy Shevchenko wrote: >> On Thu, 2016-02-18 at 01:03 +0100, Rafael J. Wysocki wrote: >> > >> > On Wednesday, February 17, 2016 02:17:24 PM Andy Shevchenko wrote: >> > > >> > > Switch to use a generic UUID API instead of custom approach. It >> > > allows to >> > > define UUIDs, compare them, and validate. >> [] >> > > Summon initial author of the UUID library. > > Summary: the API of comparison functions is rather strange. What the > point to not take pointers directly? (Moreover I hope compiler too > clever not to make a copy of constant arguments there) > > I could only imagine the case you are trying to avoid temporary > variables for constants like NULL_UUID. > > Issue with this is the ugliness in the users of that, in particularly > present in ACPI (drivers/acpi/apei/ghes.c). > > I would like to have more clear interface for that. Perhaps we may add > something like > > cmp_p(pointer, non-pointer); > cmp_pp(pointer, pointer); > > to not break existing API for now. > > It would be useful for many cases in the kernel. You can take a look at the drivers/acpi/apei/erst.c for uuid_le_cmp usage. #define CPER_CREATOR_PSTORE \ UUID_LE(0x75a574e3, 0x5052, 0x4b29, 0x8a, 0x8e, 0xbe, 0x2c, \ 0x64, 0x90, 0xb8, 0x9d) if (uuid_le_cmp(rcd->hdr.creator_id, CPER_CREATOR_PSTORE) != 0) goto skip; Looks better? This is the typical use case in mind when I write the uuid.h. As for uuid_le_cmp usage in drivers/acpi/apei/ghes.c, if (!uuid_le_cmp(*(uuid_le *)gdata->section_type, CPER_SEC_PLATFORM_MEM)) { The code looks not good mainly because acpi_hest_generic_data is not defined with uuid_le in mind. struct acpi_hest_generic_data { u8 section_type[16]; u32 error_severity; u16 revision; u8 validation_bits; u8 flags; u32 error_data_length; u8 fru_id[16]; u8 fru_text[20]; }; If section_type was defined as uuid_le instead of u8[16], the uuid_le_cmp usage would look better. So I suggest to use uuid_le/be in data structure definition in new code if possible. Best Regards, Huang, Ying >> > >> > > >> > > +static const uuid_le ads_uuid = >> > > +UUID_LE(0xdbb8e3e6, 0x5886, 0x4ba6, >> > > +0x87, 0x95, 0x13, 0x19, 0xf5, 0x2a, 0x96, 0x6b); >> > > >> > > static bool acpi_enumerate_nondev_subnodes(acpi_handle scope, >> > > const union >> > > acpi_object >> > > *desc, >> > > @@ -138,7 +136,7 @@ static bool >> > > acpi_enumerate_nondev_subnodes(acpi_handle scope, >> > > || links->type != ACPI_TYPE_PACKAGE) >> > > break; >> > > >> > > -if (memcmp(uuid->buffer.pointer, ads_uuid, >> > > sizeof(ads_uuid))) >> > > +if (uuid_le_cmp(*(uuid_le *)uuid->buffer.pointer, >> > > ads_uuid)) >> > Maybe it's too late, but I don't quite understand the pointer >> > manipulations here. >> > >> > I can see why you need a type conversion (although it looks ugly), >> > but why do you >> > need to dereference it too? >> The function takes that kind of type on input. The other variants are >> not compiled. >> Perhaps we better change uuid_{lb}e_cmp() first to take normal >> pointers, though I think the initial idea was to get type checking at >> compile time. >>
Re: [PATCH v1 06/10] device property: switch to use UUID API
Andy Shevchenko writes: > On Fri, 2016-02-26 at 16:11 +0200, Andy Shevchenko wrote: >> On Thu, 2016-02-18 at 01:03 +0100, Rafael J. Wysocki wrote: >> > >> > On Wednesday, February 17, 2016 02:17:24 PM Andy Shevchenko wrote: >> > > >> > > Switch to use a generic UUID API instead of custom approach. It >> > > allows to >> > > define UUIDs, compare them, and validate. >> [] >> > > Summon initial author of the UUID library. > > Summary: the API of comparison functions is rather strange. What the > point to not take pointers directly? (Moreover I hope compiler too > clever not to make a copy of constant arguments there) > > I could only imagine the case you are trying to avoid temporary > variables for constants like NULL_UUID. > > Issue with this is the ugliness in the users of that, in particularly > present in ACPI (drivers/acpi/apei/ghes.c). > > I would like to have more clear interface for that. Perhaps we may add > something like > > cmp_p(pointer, non-pointer); > cmp_pp(pointer, pointer); > > to not break existing API for now. > > It would be useful for many cases in the kernel. You can take a look at the drivers/acpi/apei/erst.c for uuid_le_cmp usage. #define CPER_CREATOR_PSTORE \ UUID_LE(0x75a574e3, 0x5052, 0x4b29, 0x8a, 0x8e, 0xbe, 0x2c, \ 0x64, 0x90, 0xb8, 0x9d) if (uuid_le_cmp(rcd->hdr.creator_id, CPER_CREATOR_PSTORE) != 0) goto skip; Looks better? This is the typical use case in mind when I write the uuid.h. As for uuid_le_cmp usage in drivers/acpi/apei/ghes.c, if (!uuid_le_cmp(*(uuid_le *)gdata->section_type, CPER_SEC_PLATFORM_MEM)) { The code looks not good mainly because acpi_hest_generic_data is not defined with uuid_le in mind. struct acpi_hest_generic_data { u8 section_type[16]; u32 error_severity; u16 revision; u8 validation_bits; u8 flags; u32 error_data_length; u8 fru_id[16]; u8 fru_text[20]; }; If section_type was defined as uuid_le instead of u8[16], the uuid_le_cmp usage would look better. So I suggest to use uuid_le/be in data structure definition in new code if possible. Best Regards, Huang, Ying >> > >> > > >> > > +static const uuid_le ads_uuid = >> > > +UUID_LE(0xdbb8e3e6, 0x5886, 0x4ba6, >> > > +0x87, 0x95, 0x13, 0x19, 0xf5, 0x2a, 0x96, 0x6b); >> > > >> > > static bool acpi_enumerate_nondev_subnodes(acpi_handle scope, >> > > const union >> > > acpi_object >> > > *desc, >> > > @@ -138,7 +136,7 @@ static bool >> > > acpi_enumerate_nondev_subnodes(acpi_handle scope, >> > > || links->type != ACPI_TYPE_PACKAGE) >> > > break; >> > > >> > > -if (memcmp(uuid->buffer.pointer, ads_uuid, >> > > sizeof(ads_uuid))) >> > > +if (uuid_le_cmp(*(uuid_le *)uuid->buffer.pointer, >> > > ads_uuid)) >> > Maybe it's too late, but I don't quite understand the pointer >> > manipulations here. >> > >> > I can see why you need a type conversion (although it looks ugly), >> > but why do you >> > need to dereference it too? >> The function takes that kind of type on input. The other variants are >> not compiled. >> Perhaps we better change uuid_{lb}e_cmp() first to take normal >> pointers, though I think the initial idea was to get type checking at >> compile time. >>
Re: [RFC PATCH 0/3] restartable sequences v2: fast user-space percpu critical sections
On Thu, Apr 7, 2016 at 6:11 PM, Mathieu Desnoyerswrote: > - On Apr 7, 2016, at 6:05 PM, Andy Lutomirski l...@amacapital.net wrote: > >> On Thu, Apr 7, 2016 at 1:11 PM, Peter Zijlstra wrote: >>> On Thu, Apr 07, 2016 at 09:43:33AM -0700, Andy Lutomirski wrote: > [...] >>> it's inherently debuggable, >>> >>> It is more debuggable, agreed. >>> and it allows multiple independent rseq-protected things to coexist without forcing each other to abort. > > [...] > > My understanding is that the main goal of this rather more complex > proposal is to make interaction with debuggers more straightforward in > cases of single-stepping through the rseq critical section. The things I like about my proposal are both that you can single-step through it just like any other code as long as you pin the thread to a CPU and that it doesn't make preemption magical. (Of course, you can *force* it to do something on resume and/or preemption by sticking a bogus value in the expected event count field, but that's not the intended use. Hmm, I guess it does need to hook preemption and/or resume for all processes that enable the thing so it can know to check for an enabled post_commit_rip, just like all the other proposals.) Also, mine lets you have a fairly long-running critical section that doesn't get aborted under heavy load and can interleave with other critical sections that don't conflict. > > I recently came up with a scheme that should allow us to handle such > situations in a fashion similar to debuggers handling ll/sc > restartable sequences of instructions on e.g. powerpc. The good news > is that my scheme does not require anything at the kernel level. > > The idea is simple: the userspace rseq critical sections now > become marked by 3 inline functions (rather than 2 in Paul's proposal): > > rseq_start(void *rseq_key) > rseq_finish(void *rseq_key) > rseq_abort(void *rseq_key) How do you use this thing? What are its semantics? --Andy > > We associate each critical section with a unique "key" (dummy > 1 byte object in the process address space), so we can group > them. The new "rseq_abort" would mark exit points that would > exit the critical section without executing the final commit > instruction. > > Within each of rseq_start, rseq_finish and rseq_abort, > we declare a non-loadable section that gets populated > with the following tuples: > > (RSEQ_TYPE, insn address, rseq_key) > > Where RSEQ_TYPE is either RSEQ_START, RSEQ_FINISH, or RSEQ_ABORT. > > That special section would be found in the executable by the > debugger, which can then skip over entire restartable critical > sections when it encounters them by placing breakpoints at > all exit points (finish and cancel) associated to the same > rseq_key as the entry point (start). > > This way we don't need to complexify the runtime code, neither > at kernel nor user-space level, and we get debuggability using > a trick similar to what ll/sc architectures already need to do. > > Of course, this requires extending gdb, which should not be > a show-stopper. > > Thoughts ? > > Thanks, > > Mathieu > > -- > Mathieu Desnoyers > EfficiOS Inc. > http://www.efficios.com -- Andy Lutomirski AMA Capital Management, LLC
Re: [RFC PATCH 0/3] restartable sequences v2: fast user-space percpu critical sections
On Thu, Apr 7, 2016 at 6:11 PM, Mathieu Desnoyers wrote: > - On Apr 7, 2016, at 6:05 PM, Andy Lutomirski l...@amacapital.net wrote: > >> On Thu, Apr 7, 2016 at 1:11 PM, Peter Zijlstra wrote: >>> On Thu, Apr 07, 2016 at 09:43:33AM -0700, Andy Lutomirski wrote: > [...] >>> it's inherently debuggable, >>> >>> It is more debuggable, agreed. >>> and it allows multiple independent rseq-protected things to coexist without forcing each other to abort. > > [...] > > My understanding is that the main goal of this rather more complex > proposal is to make interaction with debuggers more straightforward in > cases of single-stepping through the rseq critical section. The things I like about my proposal are both that you can single-step through it just like any other code as long as you pin the thread to a CPU and that it doesn't make preemption magical. (Of course, you can *force* it to do something on resume and/or preemption by sticking a bogus value in the expected event count field, but that's not the intended use. Hmm, I guess it does need to hook preemption and/or resume for all processes that enable the thing so it can know to check for an enabled post_commit_rip, just like all the other proposals.) Also, mine lets you have a fairly long-running critical section that doesn't get aborted under heavy load and can interleave with other critical sections that don't conflict. > > I recently came up with a scheme that should allow us to handle such > situations in a fashion similar to debuggers handling ll/sc > restartable sequences of instructions on e.g. powerpc. The good news > is that my scheme does not require anything at the kernel level. > > The idea is simple: the userspace rseq critical sections now > become marked by 3 inline functions (rather than 2 in Paul's proposal): > > rseq_start(void *rseq_key) > rseq_finish(void *rseq_key) > rseq_abort(void *rseq_key) How do you use this thing? What are its semantics? --Andy > > We associate each critical section with a unique "key" (dummy > 1 byte object in the process address space), so we can group > them. The new "rseq_abort" would mark exit points that would > exit the critical section without executing the final commit > instruction. > > Within each of rseq_start, rseq_finish and rseq_abort, > we declare a non-loadable section that gets populated > with the following tuples: > > (RSEQ_TYPE, insn address, rseq_key) > > Where RSEQ_TYPE is either RSEQ_START, RSEQ_FINISH, or RSEQ_ABORT. > > That special section would be found in the executable by the > debugger, which can then skip over entire restartable critical > sections when it encounters them by placing breakpoints at > all exit points (finish and cancel) associated to the same > rseq_key as the entry point (start). > > This way we don't need to complexify the runtime code, neither > at kernel nor user-space level, and we get debuggability using > a trick similar to what ll/sc architectures already need to do. > > Of course, this requires extending gdb, which should not be > a show-stopper. > > Thoughts ? > > Thanks, > > Mathieu > > -- > Mathieu Desnoyers > EfficiOS Inc. > http://www.efficios.com -- Andy Lutomirski AMA Capital Management, LLC
[PATCH v11 50/60] PCI: Unify calculate_size() for io port and MMIO
Now calculate_memsize() and calculate_iosize() is the same. Change them to calculate_size(). Signed-off-by: Yinghai Lu--- drivers/pci/setup-bus.c | 25 + 1 file changed, 5 insertions(+), 20 deletions(-) diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c index 930dcbd..b071035 100644 --- a/drivers/pci/setup-bus.c +++ b/drivers/pci/setup-bus.c @@ -1103,22 +1103,7 @@ static struct resource *find_free_bus_resource(struct pci_bus *bus, return NULL; } -static resource_size_t calculate_iosize(resource_size_t size, - resource_size_t min_size, - resource_size_t old_size, - resource_size_t align) -{ - if (size < min_size) - size = min_size; - if (old_size == 1) - old_size = 0; - size = ALIGN(size, align); - if (size < old_size) - size = old_size; - return size; -} - -static resource_size_t calculate_memsize(resource_size_t size, +static resource_size_t calculate_size(resource_size_t size, resource_size_t min_size, resource_size_t old_size, resource_size_t align) @@ -1244,14 +1229,14 @@ static void pbus_size_io(struct pci_bus *bus, resource_size_t min_size, size = size_aligned_for_isa(size); size += size1; - size0 = calculate_iosize(size, min_size, + size0 = calculate_size(size, min_size, resource_size(b_res), min_align); sum_add_size = size_aligned_for_isa(sum_add_size); sum_add_size += sum_add_size1; if (sum_add_size < min_sum_size) sum_add_size = min_sum_size; size1 = !realloc_head ? size0 : - calculate_iosize(sum_add_size, min_size, + calculate_size(sum_add_size, min_size, resource_size(b_res), min_align); if (!size0 && !size1) { if (b_res->start || b_res->end) @@ -1580,7 +1565,7 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask, if (size || min_size) { min_align = calculate_mem_align(_test_list, max_align, size, window_align); - size0 = calculate_memsize(size, min_size, + size0 = calculate_size(size, min_size, resource_size(b_res), min_align); } free_align_test_list(_test_list); @@ -1605,7 +1590,7 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask, min_add_align = calculate_mem_align(_test_add_list, max_add_align, sum_add_size, window_align); - size1 = calculate_memsize(sum_add_size, min_size, + size1 = calculate_size(sum_add_size, min_size, resource_size(b_res), min_add_align); } free_align_test_list(_test_add_list); -- 1.8.4.5
[PATCH v11 50/60] PCI: Unify calculate_size() for io port and MMIO
Now calculate_memsize() and calculate_iosize() is the same. Change them to calculate_size(). Signed-off-by: Yinghai Lu --- drivers/pci/setup-bus.c | 25 + 1 file changed, 5 insertions(+), 20 deletions(-) diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c index 930dcbd..b071035 100644 --- a/drivers/pci/setup-bus.c +++ b/drivers/pci/setup-bus.c @@ -1103,22 +1103,7 @@ static struct resource *find_free_bus_resource(struct pci_bus *bus, return NULL; } -static resource_size_t calculate_iosize(resource_size_t size, - resource_size_t min_size, - resource_size_t old_size, - resource_size_t align) -{ - if (size < min_size) - size = min_size; - if (old_size == 1) - old_size = 0; - size = ALIGN(size, align); - if (size < old_size) - size = old_size; - return size; -} - -static resource_size_t calculate_memsize(resource_size_t size, +static resource_size_t calculate_size(resource_size_t size, resource_size_t min_size, resource_size_t old_size, resource_size_t align) @@ -1244,14 +1229,14 @@ static void pbus_size_io(struct pci_bus *bus, resource_size_t min_size, size = size_aligned_for_isa(size); size += size1; - size0 = calculate_iosize(size, min_size, + size0 = calculate_size(size, min_size, resource_size(b_res), min_align); sum_add_size = size_aligned_for_isa(sum_add_size); sum_add_size += sum_add_size1; if (sum_add_size < min_sum_size) sum_add_size = min_sum_size; size1 = !realloc_head ? size0 : - calculate_iosize(sum_add_size, min_size, + calculate_size(sum_add_size, min_size, resource_size(b_res), min_align); if (!size0 && !size1) { if (b_res->start || b_res->end) @@ -1580,7 +1565,7 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask, if (size || min_size) { min_align = calculate_mem_align(_test_list, max_align, size, window_align); - size0 = calculate_memsize(size, min_size, + size0 = calculate_size(size, min_size, resource_size(b_res), min_align); } free_align_test_list(_test_list); @@ -1605,7 +1590,7 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask, min_add_align = calculate_mem_align(_test_add_list, max_add_align, sum_add_size, window_align); - size1 = calculate_memsize(sum_add_size, min_size, + size1 = calculate_size(sum_add_size, min_size, resource_size(b_res), min_add_align); } free_align_test_list(_test_add_list); -- 1.8.4.5
Re: [PATCH v8 net-next 1/1] hv_sock: introduce Hyper-V Sockets
On Thu, 2016-04-07 at 18:36 -0700, Dexuan Cui wrote: > Hyper-V Sockets (hv_sock) supplies a byte-stream based communication > mechanism between the host and the guest. It's somewhat like TCP over > VMBus, but the transportation layer (VMBus) is much simpler than IP. [] > diff --git a/include/net/af_hvsock.h b/include/net/af_hvsock.h [] > +#define VMBUS_RINGBUFFER_SIZE_HVSOCK_RECV (5 * PAGE_SIZE) > +#define VMBUS_RINGBUFFER_SIZE_HVSOCK_SEND (5 * PAGE_SIZE) > + > +#define HVSOCK_RCV_BUF_SZVMBUS_RINGBUFFER_SIZE_HVSOCK_RECV > +#define HVSOCK_SND_BUF_SZPAGE_SIZE [] > +struct hvsock_sock { [] > + struct { > + struct vmpipe_proto_header hdr; > + char buf[HVSOCK_SND_BUF_SZ]; > + } __packed send; > + > + struct { > + struct vmpipe_proto_header hdr; > + char buf[HVSOCK_RCV_BUF_SZ]; > + unsigned int data_len; > + unsigned int data_offset; > + } __packed recv; > +}; These bufs are not page aligned and so can span pages. Is there any value in allocating these bufs separately as pages instead of as a kmalloc?
Re: [PATCH v8 net-next 1/1] hv_sock: introduce Hyper-V Sockets
On Thu, 2016-04-07 at 18:36 -0700, Dexuan Cui wrote: > Hyper-V Sockets (hv_sock) supplies a byte-stream based communication > mechanism between the host and the guest. It's somewhat like TCP over > VMBus, but the transportation layer (VMBus) is much simpler than IP. [] > diff --git a/include/net/af_hvsock.h b/include/net/af_hvsock.h [] > +#define VMBUS_RINGBUFFER_SIZE_HVSOCK_RECV (5 * PAGE_SIZE) > +#define VMBUS_RINGBUFFER_SIZE_HVSOCK_SEND (5 * PAGE_SIZE) > + > +#define HVSOCK_RCV_BUF_SZVMBUS_RINGBUFFER_SIZE_HVSOCK_RECV > +#define HVSOCK_SND_BUF_SZPAGE_SIZE [] > +struct hvsock_sock { [] > + struct { > + struct vmpipe_proto_header hdr; > + char buf[HVSOCK_SND_BUF_SZ]; > + } __packed send; > + > + struct { > + struct vmpipe_proto_header hdr; > + char buf[HVSOCK_RCV_BUF_SZ]; > + unsigned int data_len; > + unsigned int data_offset; > + } __packed recv; > +}; These bufs are not page aligned and so can span pages. Is there any value in allocating these bufs separately as pages instead of as a kmalloc?
Re: [PATCH v4 00/14] x86: remove paravirt_enabled
On Wed, Apr 06, 2016 at 05:06:20PM -0700, Luis R. Rodriguez wrote: > Now that Andy's ASM paravirt_enabled() use is merged Sorry I should have provided more context, I meant that now that Andy's ASM paravirt_enabled() removal is merged: This is the ASM hack that Andy removed: https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commit/?h=x86/fsgsbase=58a5aac5331388a175a42b6ed2154f0559cefb21 This puts a nail on coffin for the ASM hack: https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commit/?h=x86/fsgsbase=0dd0036f6e07f741a1356b424b84a3164b6e59cf Luis
Re: [PATCH] phy: rockchip-emmc: fix compile issue on arm64 platform
在 2016/4/7 21:31, Kishon Vijay Abraham I 写道: Hi, On Thursday 07 April 2016 06:30 PM, Kishon Vijay Abraham I wrote: Hi, On Tuesday 08 March 2016 01:54 PM, Shawn Lin wrote: This patch rename "reg" property to "reg_offset". We rename it to fix the compile issue on ARM64 platform: (reg_format): "reg" property in /phy has invalid length (4 bytes) (#address-cells == 2, #size-cells == 2) Is the same node used for both ARM32 and ARM64 platforms? Thanks Kishon This's because "reg" is very special one which should keep the *-cells with its parent node and can't be overwrited even if we do that explicitly. On 32-bit plafform, the default *-cells fit for what we assign to "reg". But that's not correct for 64-bit platform. So we can see two possible solutions to fix this problem: A) make phy-rockchip-emmc as a child phy node and overwrite its parent's #address-cells and #size-cells. B) avoid using this special property. we use it just for passing on a offset for different Socs, and there's no requirement to change the code to make phy-rockchip-emmc as a child node. so choose option B) is sane. I just looked at the Heiko's patch and it makes more sense to have the binding that he described in his patch [1]. Can you fix it accordingly? yes, Heiko's patch is more reasonable to me. With his patch applied, this issue is gone if assigning address-cells in dt for grf. So we can drop this patch. I seem to have only this patch and Heiko's patch for this -rc cycle. Once you send your patch, I can send a pull request to Greg. Thanks Kishon [1]- > https://patchwork.ozlabs.org/patch/601580/ Signed-off-by: Shawn Lin--- Documentation/devicetree/bindings/phy/rockchip-emmc-phy.txt | 4 ++-- drivers/phy/phy-rockchip-emmc.c | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/Documentation/devicetree/bindings/phy/rockchip-emmc-phy.txt b/Documentation/devicetree/bindings/phy/rockchip-emmc-phy.txt index 61916f1..ed964ef 100644 --- a/Documentation/devicetree/bindings/phy/rockchip-emmc-phy.txt +++ b/Documentation/devicetree/bindings/phy/rockchip-emmc-phy.txt @@ -6,7 +6,7 @@ Required properties: - rockchip,grf : phandle to the syscon managing the "general register files" - #phy-cells: must be 0 - - reg: PHY configure reg address offset in "general + - reg_offset: PHY configure reg address offset in "general register files" Example: @@ -14,6 +14,6 @@ Example: emmcphy: phy { compatible = "rockchip,rk3399-emmc-phy"; rockchip,grf = <>; - reg = <0xf780>; + reg_offset = <0xf780>; #phy-cells = <0>; }; diff --git a/drivers/phy/phy-rockchip-emmc.c b/drivers/phy/phy-rockchip-emmc.c index 887b4c2..3f55c0d 100644 --- a/drivers/phy/phy-rockchip-emmc.c +++ b/drivers/phy/phy-rockchip-emmc.c @@ -186,7 +186,7 @@ static int rockchip_emmc_phy_probe(struct platform_device *pdev) if (!rk_phy) return -ENOMEM; - if (of_property_read_u32(dev->of_node, "reg", _offset)) { + if (of_property_read_u32(dev->of_node, "reg_offset", _offset)) { dev_err(dev, "missing reg property in node %s\n", dev->of_node->name); return -EINVAL; -- Best Regards Shawn Lin
Re: [PATCH v4 00/14] x86: remove paravirt_enabled
On Wed, Apr 06, 2016 at 05:06:20PM -0700, Luis R. Rodriguez wrote: > Now that Andy's ASM paravirt_enabled() use is merged Sorry I should have provided more context, I meant that now that Andy's ASM paravirt_enabled() removal is merged: This is the ASM hack that Andy removed: https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commit/?h=x86/fsgsbase=58a5aac5331388a175a42b6ed2154f0559cefb21 This puts a nail on coffin for the ASM hack: https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commit/?h=x86/fsgsbase=0dd0036f6e07f741a1356b424b84a3164b6e59cf Luis
Re: [PATCH] phy: rockchip-emmc: fix compile issue on arm64 platform
在 2016/4/7 21:31, Kishon Vijay Abraham I 写道: Hi, On Thursday 07 April 2016 06:30 PM, Kishon Vijay Abraham I wrote: Hi, On Tuesday 08 March 2016 01:54 PM, Shawn Lin wrote: This patch rename "reg" property to "reg_offset". We rename it to fix the compile issue on ARM64 platform: (reg_format): "reg" property in /phy has invalid length (4 bytes) (#address-cells == 2, #size-cells == 2) Is the same node used for both ARM32 and ARM64 platforms? Thanks Kishon This's because "reg" is very special one which should keep the *-cells with its parent node and can't be overwrited even if we do that explicitly. On 32-bit plafform, the default *-cells fit for what we assign to "reg". But that's not correct for 64-bit platform. So we can see two possible solutions to fix this problem: A) make phy-rockchip-emmc as a child phy node and overwrite its parent's #address-cells and #size-cells. B) avoid using this special property. we use it just for passing on a offset for different Socs, and there's no requirement to change the code to make phy-rockchip-emmc as a child node. so choose option B) is sane. I just looked at the Heiko's patch and it makes more sense to have the binding that he described in his patch [1]. Can you fix it accordingly? yes, Heiko's patch is more reasonable to me. With his patch applied, this issue is gone if assigning address-cells in dt for grf. So we can drop this patch. I seem to have only this patch and Heiko's patch for this -rc cycle. Once you send your patch, I can send a pull request to Greg. Thanks Kishon [1]- > https://patchwork.ozlabs.org/patch/601580/ Signed-off-by: Shawn Lin --- Documentation/devicetree/bindings/phy/rockchip-emmc-phy.txt | 4 ++-- drivers/phy/phy-rockchip-emmc.c | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/Documentation/devicetree/bindings/phy/rockchip-emmc-phy.txt b/Documentation/devicetree/bindings/phy/rockchip-emmc-phy.txt index 61916f1..ed964ef 100644 --- a/Documentation/devicetree/bindings/phy/rockchip-emmc-phy.txt +++ b/Documentation/devicetree/bindings/phy/rockchip-emmc-phy.txt @@ -6,7 +6,7 @@ Required properties: - rockchip,grf : phandle to the syscon managing the "general register files" - #phy-cells: must be 0 - - reg: PHY configure reg address offset in "general + - reg_offset: PHY configure reg address offset in "general register files" Example: @@ -14,6 +14,6 @@ Example: emmcphy: phy { compatible = "rockchip,rk3399-emmc-phy"; rockchip,grf = <>; - reg = <0xf780>; + reg_offset = <0xf780>; #phy-cells = <0>; }; diff --git a/drivers/phy/phy-rockchip-emmc.c b/drivers/phy/phy-rockchip-emmc.c index 887b4c2..3f55c0d 100644 --- a/drivers/phy/phy-rockchip-emmc.c +++ b/drivers/phy/phy-rockchip-emmc.c @@ -186,7 +186,7 @@ static int rockchip_emmc_phy_probe(struct platform_device *pdev) if (!rk_phy) return -ENOMEM; - if (of_property_read_u32(dev->of_node, "reg", _offset)) { + if (of_property_read_u32(dev->of_node, "reg_offset", _offset)) { dev_err(dev, "missing reg property in node %s\n", dev->of_node->name); return -EINVAL; -- Best Regards Shawn Lin
Re: [RFC PATCH 0/3] restartable sequences v2: fast user-space percpu critical sections
- On Apr 7, 2016, at 6:05 PM, Andy Lutomirski l...@amacapital.net wrote: > On Thu, Apr 7, 2016 at 1:11 PM, Peter Zijlstrawrote: >> On Thu, Apr 07, 2016 at 09:43:33AM -0700, Andy Lutomirski wrote: [...] >> >>> it's inherently debuggable, >> >> It is more debuggable, agreed. >> >>> and it allows multiple independent >>> rseq-protected things to coexist without forcing each other to abort. [...] My understanding is that the main goal of this rather more complex proposal is to make interaction with debuggers more straightforward in cases of single-stepping through the rseq critical section. I recently came up with a scheme that should allow us to handle such situations in a fashion similar to debuggers handling ll/sc restartable sequences of instructions on e.g. powerpc. The good news is that my scheme does not require anything at the kernel level. The idea is simple: the userspace rseq critical sections now become marked by 3 inline functions (rather than 2 in Paul's proposal): rseq_start(void *rseq_key) rseq_finish(void *rseq_key) rseq_abort(void *rseq_key) We associate each critical section with a unique "key" (dummy 1 byte object in the process address space), so we can group them. The new "rseq_abort" would mark exit points that would exit the critical section without executing the final commit instruction. Within each of rseq_start, rseq_finish and rseq_abort, we declare a non-loadable section that gets populated with the following tuples: (RSEQ_TYPE, insn address, rseq_key) Where RSEQ_TYPE is either RSEQ_START, RSEQ_FINISH, or RSEQ_ABORT. That special section would be found in the executable by the debugger, which can then skip over entire restartable critical sections when it encounters them by placing breakpoints at all exit points (finish and cancel) associated to the same rseq_key as the entry point (start). This way we don't need to complexify the runtime code, neither at kernel nor user-space level, and we get debuggability using a trick similar to what ll/sc architectures already need to do. Of course, this requires extending gdb, which should not be a show-stopper. Thoughts ? Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com
Re: [RFC PATCH 0/3] restartable sequences v2: fast user-space percpu critical sections
- On Apr 7, 2016, at 6:05 PM, Andy Lutomirski l...@amacapital.net wrote: > On Thu, Apr 7, 2016 at 1:11 PM, Peter Zijlstra wrote: >> On Thu, Apr 07, 2016 at 09:43:33AM -0700, Andy Lutomirski wrote: [...] >> >>> it's inherently debuggable, >> >> It is more debuggable, agreed. >> >>> and it allows multiple independent >>> rseq-protected things to coexist without forcing each other to abort. [...] My understanding is that the main goal of this rather more complex proposal is to make interaction with debuggers more straightforward in cases of single-stepping through the rseq critical section. I recently came up with a scheme that should allow us to handle such situations in a fashion similar to debuggers handling ll/sc restartable sequences of instructions on e.g. powerpc. The good news is that my scheme does not require anything at the kernel level. The idea is simple: the userspace rseq critical sections now become marked by 3 inline functions (rather than 2 in Paul's proposal): rseq_start(void *rseq_key) rseq_finish(void *rseq_key) rseq_abort(void *rseq_key) We associate each critical section with a unique "key" (dummy 1 byte object in the process address space), so we can group them. The new "rseq_abort" would mark exit points that would exit the critical section without executing the final commit instruction. Within each of rseq_start, rseq_finish and rseq_abort, we declare a non-loadable section that gets populated with the following tuples: (RSEQ_TYPE, insn address, rseq_key) Where RSEQ_TYPE is either RSEQ_START, RSEQ_FINISH, or RSEQ_ABORT. That special section would be found in the executable by the debugger, which can then skip over entire restartable critical sections when it encounters them by placing breakpoints at all exit points (finish and cancel) associated to the same rseq_key as the entry point (start). This way we don't need to complexify the runtime code, neither at kernel nor user-space level, and we get debuggability using a trick similar to what ll/sc architectures already need to do. Of course, this requires extending gdb, which should not be a show-stopper. Thoughts ? Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com
[PATCH 1/3] sched: Gather cpu load functions under a more conventional namespace
The cpu load update related functions have a weak naming convention currently, starting with update_cpu_load_*() which isn't ideal as "update" is a very generic concept. Since two of these functions are public already (and a third is to come) that's enough to introduce a more conventional naming scheme. So let's do the following rename instead: update_cpu_load_*() -> cpu_load_update_*() Cc: Byungchul ParkCc: Chris Metcalf Cc: Christoph Lameter Cc: Luiz Capitulino Cc: Mike Galbraith Cc: Paul E. McKenney Cc: Peter Zijlstra Cc: Rik van Riel Cc: Thomas Gleixner Signed-off-by: Frederic Weisbecker --- Documentation/trace/ftrace.txt | 10 +- include/linux/sched.h | 4 ++-- kernel/sched/core.c| 2 +- kernel/sched/fair.c| 24 kernel/sched/sched.h | 4 ++-- kernel/time/tick-sched.c | 2 +- 6 files changed, 23 insertions(+), 23 deletions(-) diff --git a/Documentation/trace/ftrace.txt b/Documentation/trace/ftrace.txt index f52f297..9857606 100644 --- a/Documentation/trace/ftrace.txt +++ b/Documentation/trace/ftrace.txt @@ -1562,12 +1562,12 @@ Doing the same with chrt -r 5 and function-trace set. -0 3dN.1 12us : menu_hrtimer_cancel <-tick_nohz_idle_exit -0 3dN.1 12us : ktime_get <-tick_nohz_idle_exit -0 3dN.1 12us : tick_do_update_jiffies64 <-tick_nohz_idle_exit - -0 3dN.1 13us : update_cpu_load_nohz <-tick_nohz_idle_exit - -0 3dN.1 13us : _raw_spin_lock <-update_cpu_load_nohz + -0 3dN.1 13us : cpu_load_update_nohz <-tick_nohz_idle_exit + -0 3dN.1 13us : _raw_spin_lock <-cpu_load_update_nohz -0 3dN.1 13us : add_preempt_count <-_raw_spin_lock - -0 3dN.2 13us : __update_cpu_load <-update_cpu_load_nohz - -0 3dN.2 14us : sched_avg_update <-__update_cpu_load - -0 3dN.2 14us : _raw_spin_unlock <-update_cpu_load_nohz + -0 3dN.2 13us : __cpu_load_update <-cpu_load_update_nohz + -0 3dN.2 14us : sched_avg_update <-__cpu_load_update + -0 3dN.2 14us : _raw_spin_unlock <-cpu_load_update_nohz -0 3dN.2 14us : sub_preempt_count <-_raw_spin_unlock -0 3dN.1 15us : calc_load_exit_idle <-tick_nohz_idle_exit -0 3dN.1 15us : touch_softlockup_watchdog <-tick_nohz_idle_exit diff --git a/include/linux/sched.h b/include/linux/sched.h index 52c4847..aa5ee0d 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -178,9 +178,9 @@ extern void get_iowait_load(unsigned long *nr_waiters, unsigned long *load); extern void calc_global_load(unsigned long ticks); #if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ_COMMON) -extern void update_cpu_load_nohz(int active); +extern void cpu_load_update_nohz(int active); #else -static inline void update_cpu_load_nohz(int active) { } +static inline void cpu_load_update_nohz(int active) { } #endif extern void dump_cpu_task(int cpu); diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 8b489fc..4c522a7 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -2915,7 +2915,7 @@ void scheduler_tick(void) raw_spin_lock(>lock); update_rq_clock(rq); curr->sched_class->task_tick(rq, curr, 0); - update_cpu_load_active(rq); + cpu_load_update_active(rq); calc_global_load_tick(rq); raw_spin_unlock(>lock); diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 0fe30e66..f33764d 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -4491,7 +4491,7 @@ decay_load_missed(unsigned long load, unsigned long missed_updates, int idx) } /** - * __update_cpu_load - update the rq->cpu_load[] statistics + * __cpu_load_update - update the rq->cpu_load[] statistics * @this_rq: The rq to update statistics for * @this_load: The current load * @pending_updates: The number of missed updates @@ -4526,7 +4526,7 @@ decay_load_missed(unsigned long load, unsigned long missed_updates, int idx) * see decay_load_misses(). For NOHZ_FULL we get to subtract and add the extra * term. See the @active paramter. */ -static void __update_cpu_load(struct rq *this_rq, unsigned long this_load, +static void __cpu_load_update(struct rq *this_rq, unsigned long this_load, unsigned long pending_updates, int active) { unsigned long tickless_load = active ? this_rq->cpu_load[0] : 0; @@ -4574,7 +4574,7 @@ static unsigned long weighted_cpuload(const int cpu) } #ifdef CONFIG_NO_HZ_COMMON -static void __update_cpu_load_nohz(struct rq *this_rq, +static void __cpu_load_update_nohz(struct rq *this_rq, unsigned long curr_jiffies, unsigned long load,
[PATCH 1/3] sched: Gather cpu load functions under a more conventional namespace
The cpu load update related functions have a weak naming convention currently, starting with update_cpu_load_*() which isn't ideal as "update" is a very generic concept. Since two of these functions are public already (and a third is to come) that's enough to introduce a more conventional naming scheme. So let's do the following rename instead: update_cpu_load_*() -> cpu_load_update_*() Cc: Byungchul Park Cc: Chris Metcalf Cc: Christoph Lameter Cc: Luiz Capitulino Cc: Mike Galbraith Cc: Paul E. McKenney Cc: Peter Zijlstra Cc: Rik van Riel Cc: Thomas Gleixner Signed-off-by: Frederic Weisbecker --- Documentation/trace/ftrace.txt | 10 +- include/linux/sched.h | 4 ++-- kernel/sched/core.c| 2 +- kernel/sched/fair.c| 24 kernel/sched/sched.h | 4 ++-- kernel/time/tick-sched.c | 2 +- 6 files changed, 23 insertions(+), 23 deletions(-) diff --git a/Documentation/trace/ftrace.txt b/Documentation/trace/ftrace.txt index f52f297..9857606 100644 --- a/Documentation/trace/ftrace.txt +++ b/Documentation/trace/ftrace.txt @@ -1562,12 +1562,12 @@ Doing the same with chrt -r 5 and function-trace set. -0 3dN.1 12us : menu_hrtimer_cancel <-tick_nohz_idle_exit -0 3dN.1 12us : ktime_get <-tick_nohz_idle_exit -0 3dN.1 12us : tick_do_update_jiffies64 <-tick_nohz_idle_exit - -0 3dN.1 13us : update_cpu_load_nohz <-tick_nohz_idle_exit - -0 3dN.1 13us : _raw_spin_lock <-update_cpu_load_nohz + -0 3dN.1 13us : cpu_load_update_nohz <-tick_nohz_idle_exit + -0 3dN.1 13us : _raw_spin_lock <-cpu_load_update_nohz -0 3dN.1 13us : add_preempt_count <-_raw_spin_lock - -0 3dN.2 13us : __update_cpu_load <-update_cpu_load_nohz - -0 3dN.2 14us : sched_avg_update <-__update_cpu_load - -0 3dN.2 14us : _raw_spin_unlock <-update_cpu_load_nohz + -0 3dN.2 13us : __cpu_load_update <-cpu_load_update_nohz + -0 3dN.2 14us : sched_avg_update <-__cpu_load_update + -0 3dN.2 14us : _raw_spin_unlock <-cpu_load_update_nohz -0 3dN.2 14us : sub_preempt_count <-_raw_spin_unlock -0 3dN.1 15us : calc_load_exit_idle <-tick_nohz_idle_exit -0 3dN.1 15us : touch_softlockup_watchdog <-tick_nohz_idle_exit diff --git a/include/linux/sched.h b/include/linux/sched.h index 52c4847..aa5ee0d 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -178,9 +178,9 @@ extern void get_iowait_load(unsigned long *nr_waiters, unsigned long *load); extern void calc_global_load(unsigned long ticks); #if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ_COMMON) -extern void update_cpu_load_nohz(int active); +extern void cpu_load_update_nohz(int active); #else -static inline void update_cpu_load_nohz(int active) { } +static inline void cpu_load_update_nohz(int active) { } #endif extern void dump_cpu_task(int cpu); diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 8b489fc..4c522a7 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -2915,7 +2915,7 @@ void scheduler_tick(void) raw_spin_lock(>lock); update_rq_clock(rq); curr->sched_class->task_tick(rq, curr, 0); - update_cpu_load_active(rq); + cpu_load_update_active(rq); calc_global_load_tick(rq); raw_spin_unlock(>lock); diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 0fe30e66..f33764d 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -4491,7 +4491,7 @@ decay_load_missed(unsigned long load, unsigned long missed_updates, int idx) } /** - * __update_cpu_load - update the rq->cpu_load[] statistics + * __cpu_load_update - update the rq->cpu_load[] statistics * @this_rq: The rq to update statistics for * @this_load: The current load * @pending_updates: The number of missed updates @@ -4526,7 +4526,7 @@ decay_load_missed(unsigned long load, unsigned long missed_updates, int idx) * see decay_load_misses(). For NOHZ_FULL we get to subtract and add the extra * term. See the @active paramter. */ -static void __update_cpu_load(struct rq *this_rq, unsigned long this_load, +static void __cpu_load_update(struct rq *this_rq, unsigned long this_load, unsigned long pending_updates, int active) { unsigned long tickless_load = active ? this_rq->cpu_load[0] : 0; @@ -4574,7 +4574,7 @@ static unsigned long weighted_cpuload(const int cpu) } #ifdef CONFIG_NO_HZ_COMMON -static void __update_cpu_load_nohz(struct rq *this_rq, +static void __cpu_load_update_nohz(struct rq *this_rq, unsigned long curr_jiffies, unsigned long load, int active) @@ -4589,7 +4589,7 @@ static void __update_cpu_load_nohz(struct rq *this_rq, * In the NOHZ_FULL case, we were non-idle, we should consider * its weighted
Re: [PATCH v6 01/12] usb: hcd: Initialize hcd->flags to 0
On Thu, Apr 07, 2016 at 01:40:21PM +0300, Roger Quadros wrote: > On 07/04/16 12:42, Peter Chen wrote: > > On Wed, Apr 06, 2016 at 09:32:22AM +0300, Roger Quadros wrote: > >> On 06/04/16 09:09, Felipe Balbi wrote: > >>> > >>> Hi, > >>> > >>> Roger Quadroswrites: > diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c > index 2ca2cef..6b1930d 100644 > --- a/drivers/usb/core/hcd.c > +++ b/drivers/usb/core/hcd.c > @@ -2706,6 +2706,7 @@ int usb_add_hcd(struct usb_hcd *hcd, > int retval; > struct usb_device *rhdev; > > +hcd->flags = 0; > >>> > > > > I am not sure if this usb_add(remove)_hcd pair is safe and clean enough > > for start/stop host role. From my point, we may need to do like > > .probe/.remove host platform driver interface. In that case, we can make > > probe and remove are meant to be called from bus layer. > I do not see a way how OTG framework can call probe/remove of HCD driver. > Some HCDs may be platform devices, some PCI, so different entities are calling > the HCD .probe hook. > > > sure the clocks and regulators are off, and hcd will be zero-initialized > > why can't we make that sure that is taken care of within the hcd_ops? > Why should some driver keep its regulators and clocks enabled when hcd is > stopped? > It doesn't need to. If it is doing so now, it needs to be fixed. > Well, you may misunderstand me. I mean your hcd_ops->start or ->stop is hard to be a general one which only calls usb_hcd_add or usb_hcd_remove. It needs to implement like .probe or .remove at platform driver, some example code like host_start and host_stop at drivers/usb/chipidea/host.c. -- Best Regards, Peter Chen
Re: [PATCH v6 01/12] usb: hcd: Initialize hcd->flags to 0
On Thu, Apr 07, 2016 at 01:40:21PM +0300, Roger Quadros wrote: > On 07/04/16 12:42, Peter Chen wrote: > > On Wed, Apr 06, 2016 at 09:32:22AM +0300, Roger Quadros wrote: > >> On 06/04/16 09:09, Felipe Balbi wrote: > >>> > >>> Hi, > >>> > >>> Roger Quadros writes: > diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c > index 2ca2cef..6b1930d 100644 > --- a/drivers/usb/core/hcd.c > +++ b/drivers/usb/core/hcd.c > @@ -2706,6 +2706,7 @@ int usb_add_hcd(struct usb_hcd *hcd, > int retval; > struct usb_device *rhdev; > > +hcd->flags = 0; > >>> > > > > I am not sure if this usb_add(remove)_hcd pair is safe and clean enough > > for start/stop host role. From my point, we may need to do like > > .probe/.remove host platform driver interface. In that case, we can make > > probe and remove are meant to be called from bus layer. > I do not see a way how OTG framework can call probe/remove of HCD driver. > Some HCDs may be platform devices, some PCI, so different entities are calling > the HCD .probe hook. > > > sure the clocks and regulators are off, and hcd will be zero-initialized > > why can't we make that sure that is taken care of within the hcd_ops? > Why should some driver keep its regulators and clocks enabled when hcd is > stopped? > It doesn't need to. If it is doing so now, it needs to be fixed. > Well, you may misunderstand me. I mean your hcd_ops->start or ->stop is hard to be a general one which only calls usb_hcd_add or usb_hcd_remove. It needs to implement like .probe or .remove at platform driver, some example code like host_start and host_stop at drivers/usb/chipidea/host.c. -- Best Regards, Peter Chen
[PATCH 3/3] sched: Optimize !CONFIG_NO_HZ_COMMON cpu load updates
Some code in cpu load update only concern NO_HZ configs but it is built on all configurations. When NO_HZ isn't built, that code is harmless but just happens to take some useless ressources in CPU and memory: 1) one useless field in struct rq 2) jiffies record on every tick that is never used (cpu_load_update_periodic) 3) decay_load_missed is called two times on every tick to eventually return immediately with no action taken. And that function is dead code. For pure optimization purposes, lets conditionally build the NO_HZ related code. Cc: Byungchul ParkCc: Chris Metcalf Cc: Christoph Lameter Cc: Ingo Molnar Cc: Luiz Capitulino Cc: Mike Galbraith Cc: Paul E. McKenney Cc: Peter Zijlstra Cc: Rik van Riel Cc: Thomas Gleixner Signed-off-by: Frederic Weisbecker --- kernel/sched/core.c | 3 ++- kernel/sched/fair.c | 43 --- kernel/sched/sched.h | 6 -- 3 files changed, 38 insertions(+), 14 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 4c522a7..59a2821 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -7327,8 +7327,9 @@ void __init sched_init(void) for (j = 0; j < CPU_LOAD_IDX_MAX; j++) rq->cpu_load[j] = 0; - +#ifdef CONFIG_NO_HZ_COMMON rq->last_load_update_tick = jiffies; +#endif #ifdef CONFIG_SMP rq->sd = NULL; diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 1dd864d..4618e5b 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -4423,6 +4423,7 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags) } #ifdef CONFIG_SMP +#ifdef CONFIG_NO_HZ_COMMON /* * per rq 'load' arrray crap; XXX kill this. @@ -4490,6 +4491,33 @@ decay_load_missed(unsigned long load, unsigned long missed_updates, int idx) return load; } +static unsigned long +cpu_load_update_missed(unsigned long old_load, unsigned long tickless_load, + unsigned long pending_updates, int idx) +{ + old_load = decay_load_missed(old_load, pending_updates - 1, idx); + if (tickless_load) { + old_load -= decay_load_missed(tickless_load, pending_updates - 1, idx); + /* +* old_load can never be a negative value because a +* decayed tickless_load cannot be greater than the +* original tickless_load. +*/ + old_load += tickless_load; + } + return old_load; +} +#else /* !CONFIG_NO_HZ_COMMON */ + +static inline unsigned long +cpu_load_update_missed(unsigned long old_load, unsigned long tickless_load, + unsigned long pending_updates, int idx) +{ + return old_load; +} + +#endif /* CONFIG_NO_HZ_COMMON */ + /** * __cpu_load_update - update the rq->cpu_load[] statistics * @this_rq: The rq to update statistics for @@ -4540,17 +4568,8 @@ static void cpu_load_update(struct rq *this_rq, unsigned long this_load, /* scale is effectively 1 << i now, and >> i divides by scale */ - old_load = this_rq->cpu_load[i]; - old_load = decay_load_missed(old_load, pending_updates - 1, i); - if (tickless_load) { - old_load -= decay_load_missed(tickless_load, pending_updates - 1, i); - /* -* old_load can never be a negative value because a -* decayed tickless_load cannot be greater than the -* original tickless_load. -*/ - old_load += tickless_load; - } + old_load = cpu_load_update_missed(this_rq->cpu_load[i], + tickless_load, pending_updates, i); new_load = this_load; /* * Round up the averaging division if load is increasing. This @@ -4661,8 +4680,10 @@ static inline void cpu_load_update_nohz(struct rq *this_rq, static void cpu_load_update_periodic(struct rq *this_rq, unsigned long load) { +#ifdef CONFIG_NO_HZ_COMMON /* See the mess around cpu_load_update_nohz(). */ this_rq->last_load_update_tick = READ_ONCE(jiffies); +#endif cpu_load_update(this_rq, load, 1); } diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index 1802013..2302bb6 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -585,11 +585,13 @@ struct rq { #endif #define CPU_LOAD_IDX_MAX 5 unsigned long cpu_load[CPU_LOAD_IDX_MAX]; +#ifdef CONFIG_NO_HZ_COMMON +#ifdef CONFIG_SMP unsigned long last_load_update_tick; -#ifdef CONFIG_NO_HZ_COMMON +#endif /* CONFIG_SMP */
[PATCH 3/3] sched: Optimize !CONFIG_NO_HZ_COMMON cpu load updates
Some code in cpu load update only concern NO_HZ configs but it is built on all configurations. When NO_HZ isn't built, that code is harmless but just happens to take some useless ressources in CPU and memory: 1) one useless field in struct rq 2) jiffies record on every tick that is never used (cpu_load_update_periodic) 3) decay_load_missed is called two times on every tick to eventually return immediately with no action taken. And that function is dead code. For pure optimization purposes, lets conditionally build the NO_HZ related code. Cc: Byungchul Park Cc: Chris Metcalf Cc: Christoph Lameter Cc: Ingo Molnar Cc: Luiz Capitulino Cc: Mike Galbraith Cc: Paul E. McKenney Cc: Peter Zijlstra Cc: Rik van Riel Cc: Thomas Gleixner Signed-off-by: Frederic Weisbecker --- kernel/sched/core.c | 3 ++- kernel/sched/fair.c | 43 --- kernel/sched/sched.h | 6 -- 3 files changed, 38 insertions(+), 14 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 4c522a7..59a2821 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -7327,8 +7327,9 @@ void __init sched_init(void) for (j = 0; j < CPU_LOAD_IDX_MAX; j++) rq->cpu_load[j] = 0; - +#ifdef CONFIG_NO_HZ_COMMON rq->last_load_update_tick = jiffies; +#endif #ifdef CONFIG_SMP rq->sd = NULL; diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 1dd864d..4618e5b 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -4423,6 +4423,7 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags) } #ifdef CONFIG_SMP +#ifdef CONFIG_NO_HZ_COMMON /* * per rq 'load' arrray crap; XXX kill this. @@ -4490,6 +4491,33 @@ decay_load_missed(unsigned long load, unsigned long missed_updates, int idx) return load; } +static unsigned long +cpu_load_update_missed(unsigned long old_load, unsigned long tickless_load, + unsigned long pending_updates, int idx) +{ + old_load = decay_load_missed(old_load, pending_updates - 1, idx); + if (tickless_load) { + old_load -= decay_load_missed(tickless_load, pending_updates - 1, idx); + /* +* old_load can never be a negative value because a +* decayed tickless_load cannot be greater than the +* original tickless_load. +*/ + old_load += tickless_load; + } + return old_load; +} +#else /* !CONFIG_NO_HZ_COMMON */ + +static inline unsigned long +cpu_load_update_missed(unsigned long old_load, unsigned long tickless_load, + unsigned long pending_updates, int idx) +{ + return old_load; +} + +#endif /* CONFIG_NO_HZ_COMMON */ + /** * __cpu_load_update - update the rq->cpu_load[] statistics * @this_rq: The rq to update statistics for @@ -4540,17 +4568,8 @@ static void cpu_load_update(struct rq *this_rq, unsigned long this_load, /* scale is effectively 1 << i now, and >> i divides by scale */ - old_load = this_rq->cpu_load[i]; - old_load = decay_load_missed(old_load, pending_updates - 1, i); - if (tickless_load) { - old_load -= decay_load_missed(tickless_load, pending_updates - 1, i); - /* -* old_load can never be a negative value because a -* decayed tickless_load cannot be greater than the -* original tickless_load. -*/ - old_load += tickless_load; - } + old_load = cpu_load_update_missed(this_rq->cpu_load[i], + tickless_load, pending_updates, i); new_load = this_load; /* * Round up the averaging division if load is increasing. This @@ -4661,8 +4680,10 @@ static inline void cpu_load_update_nohz(struct rq *this_rq, static void cpu_load_update_periodic(struct rq *this_rq, unsigned long load) { +#ifdef CONFIG_NO_HZ_COMMON /* See the mess around cpu_load_update_nohz(). */ this_rq->last_load_update_tick = READ_ONCE(jiffies); +#endif cpu_load_update(this_rq, load, 1); } diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index 1802013..2302bb6 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -585,11 +585,13 @@ struct rq { #endif #define CPU_LOAD_IDX_MAX 5 unsigned long cpu_load[CPU_LOAD_IDX_MAX]; +#ifdef CONFIG_NO_HZ_COMMON +#ifdef CONFIG_SMP unsigned long last_load_update_tick; -#ifdef CONFIG_NO_HZ_COMMON +#endif /* CONFIG_SMP */ u64 nohz_stamp; unsigned long nohz_flags; -#endif +#endif /* CONFIG_NO_HZ_COMMON */ #ifdef CONFIG_NO_HZ_FULL unsigned long last_sched_tick; #endif -- 2.7.0
[PATCH 2/3] sched: Correctly handle nohz ticks cpu load accounting
Ticks can happen while the CPU is in dynticks-idle or dynticks-singletask mode. In fact "nohz" or "dynticks" only mean that we exit the periodic mode and we try to minimize the ticks as much as possible. The nohz subsystem uses a confusing terminology with the internal state "ts->tick_stopped" which is also available through its public interface with tick_nohz_tick_stopped(). This is a misnomer as the tick is instead reduced with the best effort rather than stopped. In the best case the tick can indeed be actually stopped but there is no guarantee about that. If a timer needs to fire one second later, a tick will fire while the CPU is in nohz mode and this is a very common scenario. Now this confusion happens to be a problem with cpu load updates: cpu_load_update_active() doesn't handle nohz ticks correctly because it assumes that ticks are completely stopped in nohz mode and that cpu_load_update_active() can't be called in dynticks mode. When that happens, the whole previous tickless load is ignored and the function just records the load for the current tick, ignoring potentially long idle periods behind. In order to solve this, we could account the current load for the previous nohz time but there is a risk that we account the load of a task that got freshly enqueued for the whole nohz period. So instead, lets record the dynticks load on nohz frame entry so we know what to record in case of nohz ticks, then use this record to account the tickless load on nohz ticks and nohz frame end. Cc: Byungchul ParkCc: Chris Metcalf Cc: Christoph Lameter Cc: Ingo Molnar Cc: Luiz Capitulino Cc: Mike Galbraith Cc: Paul E. McKenney Cc: Peter Zijlstra Cc: Rik van Riel Cc: Thomas Gleixner Signed-off-by: Frederic Weisbecker --- include/linux/sched.h| 6 ++- kernel/sched/fair.c | 95 +++- kernel/time/tick-sched.c | 9 +++-- 3 files changed, 70 insertions(+), 40 deletions(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index aa5ee0d..07b1b1e 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -178,9 +178,11 @@ extern void get_iowait_load(unsigned long *nr_waiters, unsigned long *load); extern void calc_global_load(unsigned long ticks); #if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ_COMMON) -extern void cpu_load_update_nohz(int active); +extern void cpu_load_update_nohz_start(void); +extern void cpu_load_update_nohz_stop(void); #else -static inline void cpu_load_update_nohz(int active) { } +static inline void cpu_load_update_nohz_start(void) { } +static inline void cpu_load_update_nohz_stop(void) { } #endif extern void dump_cpu_task(int cpu); diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index f33764d..1dd864d 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -4495,7 +4495,6 @@ decay_load_missed(unsigned long load, unsigned long missed_updates, int idx) * @this_rq: The rq to update statistics for * @this_load: The current load * @pending_updates: The number of missed updates - * @active: !0 for NOHZ_FULL * * Update rq->cpu_load[] statistics. This function is usually called every * scheduler tick (TICK_NSEC). @@ -4524,12 +4523,12 @@ decay_load_missed(unsigned long load, unsigned long missed_updates, int idx) * load[i]_n = (1 - 1/2^i)^n * load[i]_0 * * see decay_load_misses(). For NOHZ_FULL we get to subtract and add the extra - * term. See the @active paramter. + * term. */ -static void __cpu_load_update(struct rq *this_rq, unsigned long this_load, - unsigned long pending_updates, int active) +static void cpu_load_update(struct rq *this_rq, unsigned long this_load, + unsigned long pending_updates) { - unsigned long tickless_load = active ? this_rq->cpu_load[0] : 0; + unsigned long tickless_load = this_rq->cpu_load[0]; int i, scale; this_rq->nr_load_updates++; @@ -4574,10 +4573,23 @@ static unsigned long weighted_cpuload(const int cpu) } #ifdef CONFIG_NO_HZ_COMMON -static void __cpu_load_update_nohz(struct rq *this_rq, - unsigned long curr_jiffies, - unsigned long load, - int active) +/* + * There is no sane way to deal with nohz on smp when using jiffies because the + * cpu doing the jiffies update might drift wrt the cpu doing the jiffy reading + * causing off-by-one errors in observed deltas; {0,2} instead of {1,1}. + * + * Therefore we need to avoid the delta approach from the regular tick when + * possible since that would seriously skew the load calculation. This is why we + * use cpu_load_update_periodic() for CPUs out of nohz. However we'll rely on + * jiffies
[PATCH 0/3] sched: Fix/improve nohz cpu load updates v2
This series tries to fix issues against CFS cpu load accounting in nohz configs (both idle and full nohz). Some optimizations coming along. Following Peterz reviews, I've tried to improve the changelogs. Comments have been updated as well and some changes have been refactored. git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git sched/nohz-v2 HEAD: 55cb7c1a25acb140253ca4e55f8d43cd31404b55 Thanks, Frederic --- Frederic Weisbecker (3): sched: Gather cpu load functions under a more conventional namespace sched: Correctly handle nohz ticks cpu load accounting sched: Optimize !CONFIG_NO_HZ_COMMON cpu load updates Documentation/trace/ftrace.txt | 10 +-- include/linux/sched.h | 6 +- kernel/sched/core.c| 5 +- kernel/sched/fair.c| 146 +++-- kernel/sched/sched.h | 10 +-- kernel/time/tick-sched.c | 9 +-- 6 files changed, 120 insertions(+), 66 deletions(-)
[PATCH 2/3] sched: Correctly handle nohz ticks cpu load accounting
Ticks can happen while the CPU is in dynticks-idle or dynticks-singletask mode. In fact "nohz" or "dynticks" only mean that we exit the periodic mode and we try to minimize the ticks as much as possible. The nohz subsystem uses a confusing terminology with the internal state "ts->tick_stopped" which is also available through its public interface with tick_nohz_tick_stopped(). This is a misnomer as the tick is instead reduced with the best effort rather than stopped. In the best case the tick can indeed be actually stopped but there is no guarantee about that. If a timer needs to fire one second later, a tick will fire while the CPU is in nohz mode and this is a very common scenario. Now this confusion happens to be a problem with cpu load updates: cpu_load_update_active() doesn't handle nohz ticks correctly because it assumes that ticks are completely stopped in nohz mode and that cpu_load_update_active() can't be called in dynticks mode. When that happens, the whole previous tickless load is ignored and the function just records the load for the current tick, ignoring potentially long idle periods behind. In order to solve this, we could account the current load for the previous nohz time but there is a risk that we account the load of a task that got freshly enqueued for the whole nohz period. So instead, lets record the dynticks load on nohz frame entry so we know what to record in case of nohz ticks, then use this record to account the tickless load on nohz ticks and nohz frame end. Cc: Byungchul Park Cc: Chris Metcalf Cc: Christoph Lameter Cc: Ingo Molnar Cc: Luiz Capitulino Cc: Mike Galbraith Cc: Paul E. McKenney Cc: Peter Zijlstra Cc: Rik van Riel Cc: Thomas Gleixner Signed-off-by: Frederic Weisbecker --- include/linux/sched.h| 6 ++- kernel/sched/fair.c | 95 +++- kernel/time/tick-sched.c | 9 +++-- 3 files changed, 70 insertions(+), 40 deletions(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index aa5ee0d..07b1b1e 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -178,9 +178,11 @@ extern void get_iowait_load(unsigned long *nr_waiters, unsigned long *load); extern void calc_global_load(unsigned long ticks); #if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ_COMMON) -extern void cpu_load_update_nohz(int active); +extern void cpu_load_update_nohz_start(void); +extern void cpu_load_update_nohz_stop(void); #else -static inline void cpu_load_update_nohz(int active) { } +static inline void cpu_load_update_nohz_start(void) { } +static inline void cpu_load_update_nohz_stop(void) { } #endif extern void dump_cpu_task(int cpu); diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index f33764d..1dd864d 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -4495,7 +4495,6 @@ decay_load_missed(unsigned long load, unsigned long missed_updates, int idx) * @this_rq: The rq to update statistics for * @this_load: The current load * @pending_updates: The number of missed updates - * @active: !0 for NOHZ_FULL * * Update rq->cpu_load[] statistics. This function is usually called every * scheduler tick (TICK_NSEC). @@ -4524,12 +4523,12 @@ decay_load_missed(unsigned long load, unsigned long missed_updates, int idx) * load[i]_n = (1 - 1/2^i)^n * load[i]_0 * * see decay_load_misses(). For NOHZ_FULL we get to subtract and add the extra - * term. See the @active paramter. + * term. */ -static void __cpu_load_update(struct rq *this_rq, unsigned long this_load, - unsigned long pending_updates, int active) +static void cpu_load_update(struct rq *this_rq, unsigned long this_load, + unsigned long pending_updates) { - unsigned long tickless_load = active ? this_rq->cpu_load[0] : 0; + unsigned long tickless_load = this_rq->cpu_load[0]; int i, scale; this_rq->nr_load_updates++; @@ -4574,10 +4573,23 @@ static unsigned long weighted_cpuload(const int cpu) } #ifdef CONFIG_NO_HZ_COMMON -static void __cpu_load_update_nohz(struct rq *this_rq, - unsigned long curr_jiffies, - unsigned long load, - int active) +/* + * There is no sane way to deal with nohz on smp when using jiffies because the + * cpu doing the jiffies update might drift wrt the cpu doing the jiffy reading + * causing off-by-one errors in observed deltas; {0,2} instead of {1,1}. + * + * Therefore we need to avoid the delta approach from the regular tick when + * possible since that would seriously skew the load calculation. This is why we + * use cpu_load_update_periodic() for CPUs out of nohz. However we'll rely on + * jiffies deltas for updates happening while in nohz mode (idle ticks, idle + * loop exit, nohz_idle_balance, nohz full exit...) + * + * This means we might still be one tick off for nohz periods. + */ + +static void
[PATCH 0/3] sched: Fix/improve nohz cpu load updates v2
This series tries to fix issues against CFS cpu load accounting in nohz configs (both idle and full nohz). Some optimizations coming along. Following Peterz reviews, I've tried to improve the changelogs. Comments have been updated as well and some changes have been refactored. git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git sched/nohz-v2 HEAD: 55cb7c1a25acb140253ca4e55f8d43cd31404b55 Thanks, Frederic --- Frederic Weisbecker (3): sched: Gather cpu load functions under a more conventional namespace sched: Correctly handle nohz ticks cpu load accounting sched: Optimize !CONFIG_NO_HZ_COMMON cpu load updates Documentation/trace/ftrace.txt | 10 +-- include/linux/sched.h | 6 +- kernel/sched/core.c| 5 +- kernel/sched/fair.c| 146 +++-- kernel/sched/sched.h | 10 +-- kernel/time/tick-sched.c | 9 +-- 6 files changed, 120 insertions(+), 66 deletions(-)
Re: [PATCH v2 net-next 00/10] allow bpf attach to tracepoints
Series applied, thanks Alexei.
Re: [PATCH v2 net-next 00/10] allow bpf attach to tracepoints
Series applied, thanks Alexei.
Re: [PATCH v11 44/60] PCI: Add alt_size ressource allocation support
On Thu, Apr 7, 2016 at 5:15 PM, Yinghai Luwrote: > On system with several pcie switches, BIOS allocate very tight resources > to the bridge bar, and it is not aligned to min_align as kernel allocation > code. Ok, this came in after I already replied to the other ones. I'm not excited about the whole "alternate aligment". Maybe the kernel should just accept the smaller alignment. If the minimum alignment we use is bigger than necessary, then we're just wrong about it, and perhaps we should just use the smaller alignment that the bios used. So instead of adding this notion of alternate alignment, maybe we should just not be so uptight about our own minimum alignment requirements? Linus
Re: [PATCH v11 44/60] PCI: Add alt_size ressource allocation support
On Thu, Apr 7, 2016 at 5:15 PM, Yinghai Lu wrote: > On system with several pcie switches, BIOS allocate very tight resources > to the bridge bar, and it is not aligned to min_align as kernel allocation > code. Ok, this came in after I already replied to the other ones. I'm not excited about the whole "alternate aligment". Maybe the kernel should just accept the smaller alignment. If the minimum alignment we use is bigger than necessary, then we're just wrong about it, and perhaps we should just use the smaller alignment that the bios used. So instead of adding this notion of alternate alignment, maybe we should just not be so uptight about our own minimum alignment requirements? Linus
Re: [PATCH V3] net: emac: emac gigabit ethernet controller driver
On Thu, Apr 07, 2016 at 04:43:47PM -0500, Timur Tabi wrote: > On my platform, firmware (UEFI) configures all of the GPIOs. I need > to get confirmation, but it appears that we don't actually make any > GPIO calls at all. I see code that looks like this: > > for (i = 0; (!adpt->no_mdio_gpio) && i < EMAC_NUM_GPIO; i++) { > gpio_info = >gpio_info[i]; > retval = of_get_named_gpio(node, gpio_info->name, 0); > if (retval < 0) > return retval; > > And on our ACPI system, adpt->no_mdio_gpio is always true: > > /* Assume GPIOs required for MDC/MDIO are enabled in firmware */ > adpt->no_mdio_gpio = true; There are two different things here. One is configuring the pin to be a GPIO. The second is using the GPIO as a GPIO. In this case, bit-banging the MDIO bus. The firmware could be doing the configuration, setting the pin as a GPIO. However, the firmware cannot be doing the MDIO bit-banging to make an MDIO bus available. Linux has to do that. Or it could be we have all completely misunderstood the hardware, and we are not doing bit-banging GPIO MDIO. There is a real MDIO controller there, we don't use these pins as GPIOs, etc Andrew
Re: [PATCH V3] net: emac: emac gigabit ethernet controller driver
On Thu, Apr 07, 2016 at 04:43:47PM -0500, Timur Tabi wrote: > On my platform, firmware (UEFI) configures all of the GPIOs. I need > to get confirmation, but it appears that we don't actually make any > GPIO calls at all. I see code that looks like this: > > for (i = 0; (!adpt->no_mdio_gpio) && i < EMAC_NUM_GPIO; i++) { > gpio_info = >gpio_info[i]; > retval = of_get_named_gpio(node, gpio_info->name, 0); > if (retval < 0) > return retval; > > And on our ACPI system, adpt->no_mdio_gpio is always true: > > /* Assume GPIOs required for MDC/MDIO are enabled in firmware */ > adpt->no_mdio_gpio = true; There are two different things here. One is configuring the pin to be a GPIO. The second is using the GPIO as a GPIO. In this case, bit-banging the MDIO bus. The firmware could be doing the configuration, setting the pin as a GPIO. However, the firmware cannot be doing the MDIO bit-banging to make an MDIO bus available. Linux has to do that. Or it could be we have all completely misunderstood the hardware, and we are not doing bit-banging GPIO MDIO. There is a real MDIO controller there, we don't use these pins as GPIOs, etc Andrew
Re: [PATCH v11 00/60] PCI: Resource allocation cleanup for v4.7
On Thu, Apr 7, 2016 at 5:15 PM, Yinghai Luwrote: > > After 5b28541552ef (PCI: Restrict 64-bit prefetchable bridge windows > to 64-bit resources), we have several reports on resource allocation > failure, and we try to fix the problem with resource clip, and find > more problems. Quite frankly, that commit 5b28541552ef is two years old by now, and went into 3.16. So whatever problems it caused are kind of water under the bridge. It worries me a *lot* when there is then a 60-patch series to fix those alleged problems, because my natural worry ends up being that the series is as likely to introduce new issues as it is to fix something that clearly people have been living with for a while now. I'm not saying that this series is bad, but I *am* saying that at this point, I'd much rather see this be done as much smaller series, and merged slowly and carefully. I'm not seeing a lot of people reviewing the code, but even *with* code review, things like "let's start allocating from the top of the resource" tends to make me really really nervous. Because those kinds of patches tend to show problems even if the code itself was perfectly bug-free, just because it changes some layout issue and hits some hardware weakness or undocumented resource allocation issue. Using tricks like a __weak pcibios_align_end_resource() to only trigger on one single architecture (despite the naming) makes those things even subtler. So please, try to split this up sanely, and let's merge it in sane pieces. I see that you have that M7101 quirk removal randomly in the middle of this series, for example, and it doesn't seem to be the only such random patch. That's entirely independent of all the other patches in the series (and I thought I acked it already, but whatever). Put another way: this is less of a real targeted series, and more of a random collection of patches. Very few people have the background to review them, and basically nobody has the ability to test them (although _individual_ parts of it are obviously testable). I'd realyl like to see the misc per-device ones separated, for example. Same for the pure cleanup ones that obviously don't change any actual semantics. There's a number of those there. And then the ones that do change real and generic pci allocation code need to be done in smaller batches so that you don't have "everything changes at once". I tried to scan the patches, and I didn't find anything actually _wrong_. Several looked like "that's an obvious improvement". But several looked fairly complex, and all _together_ just looked pretty scary. Linus
Re: [PATCH v11 00/60] PCI: Resource allocation cleanup for v4.7
On Thu, Apr 7, 2016 at 5:15 PM, Yinghai Lu wrote: > > After 5b28541552ef (PCI: Restrict 64-bit prefetchable bridge windows > to 64-bit resources), we have several reports on resource allocation > failure, and we try to fix the problem with resource clip, and find > more problems. Quite frankly, that commit 5b28541552ef is two years old by now, and went into 3.16. So whatever problems it caused are kind of water under the bridge. It worries me a *lot* when there is then a 60-patch series to fix those alleged problems, because my natural worry ends up being that the series is as likely to introduce new issues as it is to fix something that clearly people have been living with for a while now. I'm not saying that this series is bad, but I *am* saying that at this point, I'd much rather see this be done as much smaller series, and merged slowly and carefully. I'm not seeing a lot of people reviewing the code, but even *with* code review, things like "let's start allocating from the top of the resource" tends to make me really really nervous. Because those kinds of patches tend to show problems even if the code itself was perfectly bug-free, just because it changes some layout issue and hits some hardware weakness or undocumented resource allocation issue. Using tricks like a __weak pcibios_align_end_resource() to only trigger on one single architecture (despite the naming) makes those things even subtler. So please, try to split this up sanely, and let's merge it in sane pieces. I see that you have that M7101 quirk removal randomly in the middle of this series, for example, and it doesn't seem to be the only such random patch. That's entirely independent of all the other patches in the series (and I thought I acked it already, but whatever). Put another way: this is less of a real targeted series, and more of a random collection of patches. Very few people have the background to review them, and basically nobody has the ability to test them (although _individual_ parts of it are obviously testable). I'd realyl like to see the misc per-device ones separated, for example. Same for the pure cleanup ones that obviously don't change any actual semantics. There's a number of those there. And then the ones that do change real and generic pci allocation code need to be done in smaller batches so that you don't have "everything changes at once". I tried to scan the patches, and I didn't find anything actually _wrong_. Several looked like "that's an obvious improvement". But several looked fairly complex, and all _together_ just looked pretty scary. Linus
Re: [PATCH 04/10] iio: stx104: Change STX104 dependency to ISA_BUS
On Thu, Apr 07, 2016 at 10:47:25AM -0400, William Breathitt Gray wrote: > The Apex Embedded Systems STX104 may be used on 64-bit X86 systems. This > patch allows the Apex Embedded Systems STX104 DAC driver to be compiled > for both 32-bit and 64-bit X86 systems. > > Signed-off-by: William Breathitt Gray> --- > drivers/iio/dac/Kconfig | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/iio/dac/Kconfig b/drivers/iio/dac/Kconfig > index a995139..df4b55d 100644 > --- a/drivers/iio/dac/Kconfig > +++ b/drivers/iio/dac/Kconfig > @@ -210,7 +210,7 @@ config MCP4922 > > config STX104 > tristate "Apex Embedded Systems STX104 DAC driver" > - depends on ISA > + depends on X86 && ISA_BUS This means for this and other similar drivers that the driver is no longer supported on architectures which support ISA but not the newly introduced ISA_BUS. Affected architectures are alpha, arm, m32r, m68k, mips, powerpc, and parisc. A typical example is SCSI_AHA1542, which is no longer supported on those architectures. It builds, but isa_register_driver() will be a dummy and fail. Actually, this is true for _all_ drivers calling isa_register_driver(). I hope this is understood and doesn't cause any problems. Thanks, Guenter > help > Say yes here to build support for the 2-channel DAC on the Apex > Embedded Systems STX104 integrated analog PC/104 card. The base port > -- > 2.7.3 >
Re: [PATCH 04/10] iio: stx104: Change STX104 dependency to ISA_BUS
On Thu, Apr 07, 2016 at 10:47:25AM -0400, William Breathitt Gray wrote: > The Apex Embedded Systems STX104 may be used on 64-bit X86 systems. This > patch allows the Apex Embedded Systems STX104 DAC driver to be compiled > for both 32-bit and 64-bit X86 systems. > > Signed-off-by: William Breathitt Gray > --- > drivers/iio/dac/Kconfig | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/iio/dac/Kconfig b/drivers/iio/dac/Kconfig > index a995139..df4b55d 100644 > --- a/drivers/iio/dac/Kconfig > +++ b/drivers/iio/dac/Kconfig > @@ -210,7 +210,7 @@ config MCP4922 > > config STX104 > tristate "Apex Embedded Systems STX104 DAC driver" > - depends on ISA > + depends on X86 && ISA_BUS This means for this and other similar drivers that the driver is no longer supported on architectures which support ISA but not the newly introduced ISA_BUS. Affected architectures are alpha, arm, m32r, m68k, mips, powerpc, and parisc. A typical example is SCSI_AHA1542, which is no longer supported on those architectures. It builds, but isa_register_driver() will be a dummy and fail. Actually, this is true for _all_ drivers calling isa_register_driver(). I hope this is understood and doesn't cause any problems. Thanks, Guenter > help > Say yes here to build support for the 2-channel DAC on the Apex > Embedded Systems STX104 integrated analog PC/104 card. The base port > -- > 2.7.3 >
[PATCH v11 05/60] sparc/PCI: Reserve legacy mmio after PCI mmio
On one system found bunch of claim resource fail from pci device. pci_sun4v f02b894c: PCI host bridge to bus :00 pci_bus :00: root bus resource [io 0x2007e-0x2007e0fff] (bus address [0x-0xfff]) pci_bus :00: root bus resource [mem 0x2-0x27eff] (bus address [0x-0x7eff]) pci_bus :00: root bus resource [mem 0x20001-0x20007] (bus address [0x1-0x7]) ... PCI: Claiming :00:02.0: Resource 14: 0002..0002004f [200] pci :00:02.0: can't claim BAR 14 [mem 0x2-0x2004f]: address conflict with Video RAM area [??? 0x2000a-0x2000b flags 0x8000] pci :02:00.0: can't claim BAR 0 [mem 0x2-0x2000f]: no compatible bridge window PCI: Claiming :02:00.0: Resource 3: 00020010..000200103fff [200] pci :02:00.0: can't claim BAR 3 [mem 0x20010-0x200103fff]: no compatible bridge window PCI: Claiming :02:00.1: Resource 0: 00020020..0002002f [200] pci :02:00.1: can't claim BAR 0 [mem 0x20020-0x2002f]: no compatible bridge window PCI: Claiming :02:00.1: Resource 3: 000200104000..000200107fff [200] pci :02:00.1: can't claim BAR 3 [mem 0x200104000-0x200107fff]: no compatible bridge window PCI: Claiming :02:00.2: Resource 0: 00020030..0002003f [200] pci :02:00.2: can't claim BAR 0 [mem 0x20030-0x2003f]: no compatible bridge window PCI: Claiming :02:00.2: Resource 3: 000200108000..00020010bfff [200] pci :02:00.2: can't claim BAR 3 [mem 0x200108000-0x20010bfff]: no compatible bridge window PCI: Claiming :02:00.3: Resource 0: 00020040..0002004f [200] pci :02:00.3: can't claim BAR 0 [mem 0x20040-0x2004f]: no compatible bridge window PCI: Claiming :02:00.3: Resource 3: 00020010c000..00020010 [200] pci :02:00.3: can't claim BAR 3 [mem 0x20010c000-0x20010]: no compatible bridge window The bridge 00:02.0 resource does not get reserved as Video RAM take the position early, and following children resources reservation all fail. Move down Video RAM area reservation after pci mmio get reserved, so we leave pci driver to use those regions. -v5: merge simplify one and use pcibios_bus_to_resource() -v6: use pci_find_bus_resource() Signed-off-by: Yinghai LuTested-by: Khalid Aziz --- arch/sparc/kernel/pci.c| 1 + arch/sparc/kernel/pci_common.c | 59 ++ arch/sparc/kernel/pci_impl.h | 1 + 3 files changed, 33 insertions(+), 28 deletions(-) diff --git a/arch/sparc/kernel/pci.c b/arch/sparc/kernel/pci.c index 4606dc1..9c6daad 100644 --- a/arch/sparc/kernel/pci.c +++ b/arch/sparc/kernel/pci.c @@ -677,6 +677,7 @@ struct pci_bus *pci_scan_one_pbm(struct pci_pbm_info *pbm, pci_bus_register_of_sysfs(bus); pci_claim_bus_resources(bus); + pci_register_legacy_regions(bus); pci_bus_add_devices(bus); return bus; } diff --git a/arch/sparc/kernel/pci_common.c b/arch/sparc/kernel/pci_common.c index 76998f8..1ebc7ff 100644 --- a/arch/sparc/kernel/pci_common.c +++ b/arch/sparc/kernel/pci_common.c @@ -328,41 +328,46 @@ void pci_get_pbm_props(struct pci_pbm_info *pbm) } } -static void pci_register_legacy_regions(struct resource *io_res, - struct resource *mem_res) +static void pci_register_region(struct pci_bus *bus, const char *name, + resource_size_t rstart, resource_size_t size) { - struct resource *p; + struct resource *res, *conflict, *bus_res; + struct pci_bus_region region; - /* VGA Video RAM. */ - p = kzalloc(sizeof(*p), GFP_KERNEL); - if (!p) + res = kzalloc(sizeof(*res), GFP_KERNEL); + if (!res) return; - p->name = "Video RAM area"; - p->start = mem_res->start + 0xaUL; - p->end = p->start + 0x1UL; - p->flags = IORESOURCE_BUSY; - request_resource(mem_res, p); + res->flags = IORESOURCE_MEM; - p = kzalloc(sizeof(*p), GFP_KERNEL); - if (!p) + region.start = rstart; + region.end = rstart + size - 1UL; + pcibios_bus_to_resource(bus, res, ); + bus_res = pci_find_bus_resource(bus, res); + if (!bus_res) { + kfree(res); return; + } + + res->name = name; + res->flags |= IORESOURCE_BUSY; + conflict = request_resource_conflict(bus_res, res); + if (conflict) { + dev_printk(KERN_DEBUG, >dev, + " can't claim %s %pR: address conflict with %s %pR\n", + res->name, res, conflict->name, conflict); + kfree(res); + } +} - p->name = "System
[PATCH v11 05/60] sparc/PCI: Reserve legacy mmio after PCI mmio
On one system found bunch of claim resource fail from pci device. pci_sun4v f02b894c: PCI host bridge to bus :00 pci_bus :00: root bus resource [io 0x2007e-0x2007e0fff] (bus address [0x-0xfff]) pci_bus :00: root bus resource [mem 0x2-0x27eff] (bus address [0x-0x7eff]) pci_bus :00: root bus resource [mem 0x20001-0x20007] (bus address [0x1-0x7]) ... PCI: Claiming :00:02.0: Resource 14: 0002..0002004f [200] pci :00:02.0: can't claim BAR 14 [mem 0x2-0x2004f]: address conflict with Video RAM area [??? 0x2000a-0x2000b flags 0x8000] pci :02:00.0: can't claim BAR 0 [mem 0x2-0x2000f]: no compatible bridge window PCI: Claiming :02:00.0: Resource 3: 00020010..000200103fff [200] pci :02:00.0: can't claim BAR 3 [mem 0x20010-0x200103fff]: no compatible bridge window PCI: Claiming :02:00.1: Resource 0: 00020020..0002002f [200] pci :02:00.1: can't claim BAR 0 [mem 0x20020-0x2002f]: no compatible bridge window PCI: Claiming :02:00.1: Resource 3: 000200104000..000200107fff [200] pci :02:00.1: can't claim BAR 3 [mem 0x200104000-0x200107fff]: no compatible bridge window PCI: Claiming :02:00.2: Resource 0: 00020030..0002003f [200] pci :02:00.2: can't claim BAR 0 [mem 0x20030-0x2003f]: no compatible bridge window PCI: Claiming :02:00.2: Resource 3: 000200108000..00020010bfff [200] pci :02:00.2: can't claim BAR 3 [mem 0x200108000-0x20010bfff]: no compatible bridge window PCI: Claiming :02:00.3: Resource 0: 00020040..0002004f [200] pci :02:00.3: can't claim BAR 0 [mem 0x20040-0x2004f]: no compatible bridge window PCI: Claiming :02:00.3: Resource 3: 00020010c000..00020010 [200] pci :02:00.3: can't claim BAR 3 [mem 0x20010c000-0x20010]: no compatible bridge window The bridge 00:02.0 resource does not get reserved as Video RAM take the position early, and following children resources reservation all fail. Move down Video RAM area reservation after pci mmio get reserved, so we leave pci driver to use those regions. -v5: merge simplify one and use pcibios_bus_to_resource() -v6: use pci_find_bus_resource() Signed-off-by: Yinghai Lu Tested-by: Khalid Aziz --- arch/sparc/kernel/pci.c| 1 + arch/sparc/kernel/pci_common.c | 59 ++ arch/sparc/kernel/pci_impl.h | 1 + 3 files changed, 33 insertions(+), 28 deletions(-) diff --git a/arch/sparc/kernel/pci.c b/arch/sparc/kernel/pci.c index 4606dc1..9c6daad 100644 --- a/arch/sparc/kernel/pci.c +++ b/arch/sparc/kernel/pci.c @@ -677,6 +677,7 @@ struct pci_bus *pci_scan_one_pbm(struct pci_pbm_info *pbm, pci_bus_register_of_sysfs(bus); pci_claim_bus_resources(bus); + pci_register_legacy_regions(bus); pci_bus_add_devices(bus); return bus; } diff --git a/arch/sparc/kernel/pci_common.c b/arch/sparc/kernel/pci_common.c index 76998f8..1ebc7ff 100644 --- a/arch/sparc/kernel/pci_common.c +++ b/arch/sparc/kernel/pci_common.c @@ -328,41 +328,46 @@ void pci_get_pbm_props(struct pci_pbm_info *pbm) } } -static void pci_register_legacy_regions(struct resource *io_res, - struct resource *mem_res) +static void pci_register_region(struct pci_bus *bus, const char *name, + resource_size_t rstart, resource_size_t size) { - struct resource *p; + struct resource *res, *conflict, *bus_res; + struct pci_bus_region region; - /* VGA Video RAM. */ - p = kzalloc(sizeof(*p), GFP_KERNEL); - if (!p) + res = kzalloc(sizeof(*res), GFP_KERNEL); + if (!res) return; - p->name = "Video RAM area"; - p->start = mem_res->start + 0xaUL; - p->end = p->start + 0x1UL; - p->flags = IORESOURCE_BUSY; - request_resource(mem_res, p); + res->flags = IORESOURCE_MEM; - p = kzalloc(sizeof(*p), GFP_KERNEL); - if (!p) + region.start = rstart; + region.end = rstart + size - 1UL; + pcibios_bus_to_resource(bus, res, ); + bus_res = pci_find_bus_resource(bus, res); + if (!bus_res) { + kfree(res); return; + } + + res->name = name; + res->flags |= IORESOURCE_BUSY; + conflict = request_resource_conflict(bus_res, res); + if (conflict) { + dev_printk(KERN_DEBUG, >dev, + " can't claim %s %pR: address conflict with %s %pR\n", + res->name, res, conflict->name, conflict); + kfree(res); + } +} - p->name = "System ROM"; - p->start = mem_res->start +
Re: [PATCH 06/10] watchdog: ebc-c384_wdt: Utilize the ISA bus driver
On Thu, Apr 07, 2016 at 10:47:27AM -0400, William Breathitt Gray wrote: > The WinSystems EBC-C384 watchdog timer is controlled via ISA bus > communication. As such, the ISA bus driver is more appropriate than the > platform driver for the WinSystems EBC-C384 watchdog timer driver. > > Signed-off-by: William Breathitt Gray> --- > drivers/watchdog/Kconfig| 2 +- > drivers/watchdog/ebc-c384_wdt.c | 43 > ++--- > 2 files changed, 11 insertions(+), 34 deletions(-) > > diff --git a/drivers/watchdog/Kconfig b/drivers/watchdog/Kconfig > index fb94765..b10761d 100644 > --- a/drivers/watchdog/Kconfig > +++ b/drivers/watchdog/Kconfig > @@ -738,7 +738,7 @@ config ALIM7101_WDT > > config EBC_C384_WDT > tristate "WinSystems EBC-C384 Watchdog Timer" > - depends on X86 > + depends on X86 && ISA_BUS I am a bit concerend that the newly introduced ISA_BUS is not automatically enabled. Effectively this means that all drivers depending on it will be disabled until someone enables ISA_BUS in the distribution. Is this a concern for anyone but me ? Anyway, since you are the driver maintainer, I assume that you are ok with it, so Acked-by: Guenter Roeck Side note for Wim: ISA_BUS was introduced with commit b3c1be1b789c ("base: isa: Remove X86_32 dependency") in -next. Guenter > select WATCHDOG_CORE > help > Enables watchdog timer support for the watchdog timer on the > diff --git a/drivers/watchdog/ebc-c384_wdt.c b/drivers/watchdog/ebc-c384_wdt.c > index 77fda0b..4b849b8 100644 > --- a/drivers/watchdog/ebc-c384_wdt.c > +++ b/drivers/watchdog/ebc-c384_wdt.c > @@ -16,10 +16,10 @@ > #include > #include > #include > +#include > #include > #include > #include > -#include > #include > #include > > @@ -95,9 +95,8 @@ static const struct watchdog_info ebc_c384_wdt_info = { > .identity = MODULE_NAME > }; > > -static int __init ebc_c384_wdt_probe(struct platform_device *pdev) > +static int ebc_c384_wdt_probe(struct device *dev, unsigned int id) > { > - struct device *dev = >dev; > struct watchdog_device *wdd; > > if (!devm_request_region(dev, BASE_ADDR, ADDR_EXTENT, dev_name(dev))) { > @@ -122,61 +121,39 @@ static int __init ebc_c384_wdt_probe(struct > platform_device *pdev) > dev_warn(dev, "Invalid timeout (%u seconds), using default (%u > seconds)\n", > timeout, WATCHDOG_TIMEOUT); > > - platform_set_drvdata(pdev, wdd); > + dev_set_drvdata(dev, wdd); > > return watchdog_register_device(wdd); > } > > -static int ebc_c384_wdt_remove(struct platform_device *pdev) > +static int ebc_c384_wdt_remove(struct device *dev, unsigned int id) > { > - struct watchdog_device *wdd = platform_get_drvdata(pdev); > + struct watchdog_device *wdd = dev_get_drvdata(dev); > > watchdog_unregister_device(wdd); > > return 0; > } > > -static struct platform_driver ebc_c384_wdt_driver = { > +static struct isa_driver ebc_c384_wdt_driver = { > + .probe = ebc_c384_wdt_probe, > .driver = { > .name = MODULE_NAME > }, > .remove = ebc_c384_wdt_remove > }; > > -static struct platform_device *ebc_c384_wdt_device; > - > static int __init ebc_c384_wdt_init(void) > { > - int err; > - > if (!dmi_match(DMI_BOARD_NAME, "EBC-C384 SBC")) > return -ENODEV; > > - ebc_c384_wdt_device = platform_device_alloc(MODULE_NAME, -1); > - if (!ebc_c384_wdt_device) > - return -ENOMEM; > - > - err = platform_device_add(ebc_c384_wdt_device); > - if (err) > - goto err_platform_device; > - > - err = platform_driver_probe(_c384_wdt_driver, ebc_c384_wdt_probe); > - if (err) > - goto err_platform_driver; > - > - return 0; > - > -err_platform_driver: > - platform_device_del(ebc_c384_wdt_device); > -err_platform_device: > - platform_device_put(ebc_c384_wdt_device); > - return err; > + return isa_register_driver(_c384_wdt_driver, 1); > } > > static void __exit ebc_c384_wdt_exit(void) > { > - platform_device_unregister(ebc_c384_wdt_device); > - platform_driver_unregister(_c384_wdt_driver); > + isa_unregister_driver(_c384_wdt_driver); > } > > module_init(ebc_c384_wdt_init); > @@ -185,4 +162,4 @@ module_exit(ebc_c384_wdt_exit); > MODULE_AUTHOR("William Breathitt Gray "); > MODULE_DESCRIPTION("WinSystems EBC-C384 watchdog timer driver"); > MODULE_LICENSE("GPL v2"); > -MODULE_ALIAS("platform:" MODULE_NAME); > +MODULE_ALIAS("isa:" MODULE_NAME); > -- > 2.7.3 >
[PATCH v11 44/60] PCI: Add alt_size ressource allocation support
On system with several pcie switches, BIOS allocate very tight resources to the bridge bar, and it is not aligned to min_align as kernel allocation code. For example: 02:03.0---0c:00.0---0d:04.0---18:00.0 18:00.0 need 0x1000, and 0x0001. BIOS only allocate 0x1010 to 0d:04.0 and above bridges. Later after using /sys/bus/pci/devices/:0c:00.0/remove to remove 0c:00.0, rescan with /sys/bus/pci/rescan can not allocate 0x1800 to 0c:00.0. as current min_align solution will need 0x1800. Another example: 00:1c.0---02:00.0---03:01.0---04:00.0---05:19.0---06:00.0 06:00.0 need 0x400 and 0x80. BIOS only allocate 0x480 to 05:19.0 and 04:00.0. when 05:19.0 get removed via /sys/bus/pci/devices/:05:19.0/remove, rescan with /sys/bus/pci/rescan will fail. pci :05:19.0: BAR 14: no space for [mem size 0x0600] pci :05:19.0: BAR 14: failed to assign [mem size 0x0600] pci :06:00.0: BAR 2: no space for [mem size 0x0400 64bit] pci :06:00.0: BAR 2: failed to assign [mem size 0x0400 64bit] pci :06:00.0: BAR 0: no space for [mem size 0x0080] pci :06:00.0: BAR 0: failed to assign [mem size 0x0080] current code try to use align 0x200 and size 0x600, but parent bridge only have 0x480. Introduce alt_align/alt_size and store them in realloc list in addition to addon info, and will try it after min_align/min_size allocation fails. The alt_align is max_align, and alt_size is aligned size with bridge minimum window alignment. On my test setup: 00:1c.7---61:00.0---62:00.0 62:00.0 needs 0x80 and 0x2, and 00:1c.7 only have 9M allocated for mmio, with this patch we have pci :61:00.0: bridge window [mem 0x0040-0x00ff] to [bus 62] add_size 0 add_align 0 alt_size 90 alt_align 80 req_size c0 req_align 40 pci :61:00.0: BAR 14: no space for [mem size 0x00c0] pci :61:00.0: BAR 14: failed to assign [mem size 0x00c0] pci :61:00.0: BAR 14: assigned [mem 0xdf00-0xdf8f] pci :62:00.0: BAR 0: assigned [mem 0xdf00-0xdf7f pref] pci :62:00.0: BAR 1: assigned [mem 0xdf80-0xdf81] pci :61:00.0: PCI bridge to [bus 62] pci :61:00.0: bridge window [io 0x6000-0x6fff] pci :61:00.0: bridge window [mem 0xdf00-0xdf8f] pci :00:1c.7: PCI bridge to [bus 61-68] pci :00:1c.7: bridge window [io 0x6000-0x6fff] pci :00:1c.7: bridge window [mem 0xdf00-0xdf8f] So for 61:00.0 first try with 12M fails, and second try with 9M the alt_size works. Later 62:00.0 get correct resource allocated too. Link: https://bugzilla.kernel.org/show_bug.cgi?id=100451 Reported-by: Yijing WangTested-by: Yijing Wang Signed-off-by: Yinghai Lu --- drivers/pci/setup-bus.c | 203 +--- 1 file changed, 191 insertions(+), 12 deletions(-) diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c index 6c58b4a..f3e9873 100644 --- a/drivers/pci/setup-bus.c +++ b/drivers/pci/setup-bus.c @@ -322,7 +322,7 @@ static void reassign_resources_sorted(struct list_head *realloc_head, { struct resource *res; struct pci_dev_resource *add_res, *tmp; - resource_size_t add_size, align; + resource_size_t add_size, align, r_size; int idx; list_for_each_entry_safe(add_res, tmp, realloc_head, list) { @@ -338,12 +338,23 @@ static void reassign_resources_sorted(struct list_head *realloc_head, idx = res - _res->dev->resource[0]; add_size = add_res->add_size; align = add_res->min_align; - if (!resource_size(res)) { + if (!add_size || !align) /* alt_size only */ + goto out; + + r_size = resource_size(res); + if (!r_size) { res->start = align; res->end = res->start + add_size - 1; if (pci_assign_resource(add_res->dev, idx)) reset_resource(res); } else { + /* could just assigned with alt, add difference ? */ + resource_size_t size; + + size = add_res->end - add_res->start + 1; + if (r_size < size) + add_size += size - r_size; + res->flags |= add_res->flags & (IORESOURCE_STARTALIGN|IORESOURCE_SIZEALIGN); if (pci_reassign_resource(add_res->dev, idx, @@ -582,6 +593,104 @@ static bool __assign_resources_required_optional_sorted(struct list_head *head, return false; } +static bool __has_alt(struct list_head *head, + struct list_head *realloc_head) +{ + int alt_count = 0; + struct pci_dev_resource *dev_res, *alt_res; + +
Re: [PATCH 06/10] watchdog: ebc-c384_wdt: Utilize the ISA bus driver
On Thu, Apr 07, 2016 at 10:47:27AM -0400, William Breathitt Gray wrote: > The WinSystems EBC-C384 watchdog timer is controlled via ISA bus > communication. As such, the ISA bus driver is more appropriate than the > platform driver for the WinSystems EBC-C384 watchdog timer driver. > > Signed-off-by: William Breathitt Gray > --- > drivers/watchdog/Kconfig| 2 +- > drivers/watchdog/ebc-c384_wdt.c | 43 > ++--- > 2 files changed, 11 insertions(+), 34 deletions(-) > > diff --git a/drivers/watchdog/Kconfig b/drivers/watchdog/Kconfig > index fb94765..b10761d 100644 > --- a/drivers/watchdog/Kconfig > +++ b/drivers/watchdog/Kconfig > @@ -738,7 +738,7 @@ config ALIM7101_WDT > > config EBC_C384_WDT > tristate "WinSystems EBC-C384 Watchdog Timer" > - depends on X86 > + depends on X86 && ISA_BUS I am a bit concerend that the newly introduced ISA_BUS is not automatically enabled. Effectively this means that all drivers depending on it will be disabled until someone enables ISA_BUS in the distribution. Is this a concern for anyone but me ? Anyway, since you are the driver maintainer, I assume that you are ok with it, so Acked-by: Guenter Roeck Side note for Wim: ISA_BUS was introduced with commit b3c1be1b789c ("base: isa: Remove X86_32 dependency") in -next. Guenter > select WATCHDOG_CORE > help > Enables watchdog timer support for the watchdog timer on the > diff --git a/drivers/watchdog/ebc-c384_wdt.c b/drivers/watchdog/ebc-c384_wdt.c > index 77fda0b..4b849b8 100644 > --- a/drivers/watchdog/ebc-c384_wdt.c > +++ b/drivers/watchdog/ebc-c384_wdt.c > @@ -16,10 +16,10 @@ > #include > #include > #include > +#include > #include > #include > #include > -#include > #include > #include > > @@ -95,9 +95,8 @@ static const struct watchdog_info ebc_c384_wdt_info = { > .identity = MODULE_NAME > }; > > -static int __init ebc_c384_wdt_probe(struct platform_device *pdev) > +static int ebc_c384_wdt_probe(struct device *dev, unsigned int id) > { > - struct device *dev = >dev; > struct watchdog_device *wdd; > > if (!devm_request_region(dev, BASE_ADDR, ADDR_EXTENT, dev_name(dev))) { > @@ -122,61 +121,39 @@ static int __init ebc_c384_wdt_probe(struct > platform_device *pdev) > dev_warn(dev, "Invalid timeout (%u seconds), using default (%u > seconds)\n", > timeout, WATCHDOG_TIMEOUT); > > - platform_set_drvdata(pdev, wdd); > + dev_set_drvdata(dev, wdd); > > return watchdog_register_device(wdd); > } > > -static int ebc_c384_wdt_remove(struct platform_device *pdev) > +static int ebc_c384_wdt_remove(struct device *dev, unsigned int id) > { > - struct watchdog_device *wdd = platform_get_drvdata(pdev); > + struct watchdog_device *wdd = dev_get_drvdata(dev); > > watchdog_unregister_device(wdd); > > return 0; > } > > -static struct platform_driver ebc_c384_wdt_driver = { > +static struct isa_driver ebc_c384_wdt_driver = { > + .probe = ebc_c384_wdt_probe, > .driver = { > .name = MODULE_NAME > }, > .remove = ebc_c384_wdt_remove > }; > > -static struct platform_device *ebc_c384_wdt_device; > - > static int __init ebc_c384_wdt_init(void) > { > - int err; > - > if (!dmi_match(DMI_BOARD_NAME, "EBC-C384 SBC")) > return -ENODEV; > > - ebc_c384_wdt_device = platform_device_alloc(MODULE_NAME, -1); > - if (!ebc_c384_wdt_device) > - return -ENOMEM; > - > - err = platform_device_add(ebc_c384_wdt_device); > - if (err) > - goto err_platform_device; > - > - err = platform_driver_probe(_c384_wdt_driver, ebc_c384_wdt_probe); > - if (err) > - goto err_platform_driver; > - > - return 0; > - > -err_platform_driver: > - platform_device_del(ebc_c384_wdt_device); > -err_platform_device: > - platform_device_put(ebc_c384_wdt_device); > - return err; > + return isa_register_driver(_c384_wdt_driver, 1); > } > > static void __exit ebc_c384_wdt_exit(void) > { > - platform_device_unregister(ebc_c384_wdt_device); > - platform_driver_unregister(_c384_wdt_driver); > + isa_unregister_driver(_c384_wdt_driver); > } > > module_init(ebc_c384_wdt_init); > @@ -185,4 +162,4 @@ module_exit(ebc_c384_wdt_exit); > MODULE_AUTHOR("William Breathitt Gray "); > MODULE_DESCRIPTION("WinSystems EBC-C384 watchdog timer driver"); > MODULE_LICENSE("GPL v2"); > -MODULE_ALIAS("platform:" MODULE_NAME); > +MODULE_ALIAS("isa:" MODULE_NAME); > -- > 2.7.3 >
[PATCH v11 44/60] PCI: Add alt_size ressource allocation support
On system with several pcie switches, BIOS allocate very tight resources to the bridge bar, and it is not aligned to min_align as kernel allocation code. For example: 02:03.0---0c:00.0---0d:04.0---18:00.0 18:00.0 need 0x1000, and 0x0001. BIOS only allocate 0x1010 to 0d:04.0 and above bridges. Later after using /sys/bus/pci/devices/:0c:00.0/remove to remove 0c:00.0, rescan with /sys/bus/pci/rescan can not allocate 0x1800 to 0c:00.0. as current min_align solution will need 0x1800. Another example: 00:1c.0---02:00.0---03:01.0---04:00.0---05:19.0---06:00.0 06:00.0 need 0x400 and 0x80. BIOS only allocate 0x480 to 05:19.0 and 04:00.0. when 05:19.0 get removed via /sys/bus/pci/devices/:05:19.0/remove, rescan with /sys/bus/pci/rescan will fail. pci :05:19.0: BAR 14: no space for [mem size 0x0600] pci :05:19.0: BAR 14: failed to assign [mem size 0x0600] pci :06:00.0: BAR 2: no space for [mem size 0x0400 64bit] pci :06:00.0: BAR 2: failed to assign [mem size 0x0400 64bit] pci :06:00.0: BAR 0: no space for [mem size 0x0080] pci :06:00.0: BAR 0: failed to assign [mem size 0x0080] current code try to use align 0x200 and size 0x600, but parent bridge only have 0x480. Introduce alt_align/alt_size and store them in realloc list in addition to addon info, and will try it after min_align/min_size allocation fails. The alt_align is max_align, and alt_size is aligned size with bridge minimum window alignment. On my test setup: 00:1c.7---61:00.0---62:00.0 62:00.0 needs 0x80 and 0x2, and 00:1c.7 only have 9M allocated for mmio, with this patch we have pci :61:00.0: bridge window [mem 0x0040-0x00ff] to [bus 62] add_size 0 add_align 0 alt_size 90 alt_align 80 req_size c0 req_align 40 pci :61:00.0: BAR 14: no space for [mem size 0x00c0] pci :61:00.0: BAR 14: failed to assign [mem size 0x00c0] pci :61:00.0: BAR 14: assigned [mem 0xdf00-0xdf8f] pci :62:00.0: BAR 0: assigned [mem 0xdf00-0xdf7f pref] pci :62:00.0: BAR 1: assigned [mem 0xdf80-0xdf81] pci :61:00.0: PCI bridge to [bus 62] pci :61:00.0: bridge window [io 0x6000-0x6fff] pci :61:00.0: bridge window [mem 0xdf00-0xdf8f] pci :00:1c.7: PCI bridge to [bus 61-68] pci :00:1c.7: bridge window [io 0x6000-0x6fff] pci :00:1c.7: bridge window [mem 0xdf00-0xdf8f] So for 61:00.0 first try with 12M fails, and second try with 9M the alt_size works. Later 62:00.0 get correct resource allocated too. Link: https://bugzilla.kernel.org/show_bug.cgi?id=100451 Reported-by: Yijing Wang Tested-by: Yijing Wang Signed-off-by: Yinghai Lu --- drivers/pci/setup-bus.c | 203 +--- 1 file changed, 191 insertions(+), 12 deletions(-) diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c index 6c58b4a..f3e9873 100644 --- a/drivers/pci/setup-bus.c +++ b/drivers/pci/setup-bus.c @@ -322,7 +322,7 @@ static void reassign_resources_sorted(struct list_head *realloc_head, { struct resource *res; struct pci_dev_resource *add_res, *tmp; - resource_size_t add_size, align; + resource_size_t add_size, align, r_size; int idx; list_for_each_entry_safe(add_res, tmp, realloc_head, list) { @@ -338,12 +338,23 @@ static void reassign_resources_sorted(struct list_head *realloc_head, idx = res - _res->dev->resource[0]; add_size = add_res->add_size; align = add_res->min_align; - if (!resource_size(res)) { + if (!add_size || !align) /* alt_size only */ + goto out; + + r_size = resource_size(res); + if (!r_size) { res->start = align; res->end = res->start + add_size - 1; if (pci_assign_resource(add_res->dev, idx)) reset_resource(res); } else { + /* could just assigned with alt, add difference ? */ + resource_size_t size; + + size = add_res->end - add_res->start + 1; + if (r_size < size) + add_size += size - r_size; + res->flags |= add_res->flags & (IORESOURCE_STARTALIGN|IORESOURCE_SIZEALIGN); if (pci_reassign_resource(add_res->dev, idx, @@ -582,6 +593,104 @@ static bool __assign_resources_required_optional_sorted(struct list_head *head, return false; } +static bool __has_alt(struct list_head *head, + struct list_head *realloc_head) +{ + int alt_count = 0; + struct pci_dev_resource *dev_res, *alt_res; + + if (!realloc_head) + return false; + + /*
[PATCH v11 14/60] PCI: Add has_mem64 for struct host_bridge
Add has_mem64 for struct host_bridge, on root bus that does not support mmio64 above 4g, will not set that. We will use that info next two following patches: 1. Don't treat non-pref mmio64 as pref mmio, so will not put it under bridge's pref range when rescan the devices 2. will keep pref mmio64 and pref mmio32 under bridge pref bar. Signed-off-by: Yinghai LuTested-by: Khalid Aziz --- drivers/pci/probe.c | 7 +++ include/linux/pci.h | 1 + 2 files changed, 8 insertions(+) diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c index 48e6f29..6b079f4 100644 --- a/drivers/pci/probe.c +++ b/drivers/pci/probe.c @@ -2230,6 +2230,13 @@ struct pci_bus *pci_create_root_bus(struct device *parent, int bus, } else bus_addr[0] = '\0'; dev_info(>dev, "root bus resource %pR%s\n", res, bus_addr); + + if (resource_type(res) == IORESOURCE_MEM) { + if ((res->end - offset) > 0x) + bridge->has_mem64 = 1; + if ((res->start - offset) > 0x) + res->flags |= IORESOURCE_MEM_64; + } } down_write(_bus_sem); diff --git a/include/linux/pci.h b/include/linux/pci.h index 1527735..979be25 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -408,6 +408,7 @@ struct pci_host_bridge { void (*release_fn)(struct pci_host_bridge *); void *release_data; unsigned int ignore_reset_delay:1; /* for entire hierarchy */ + unsigned int has_mem64:1; /* Resource alignment requirements */ resource_size_t (*align_resource)(struct pci_dev *dev, const struct resource *res, -- 1.8.4.5
[PATCH v11 14/60] PCI: Add has_mem64 for struct host_bridge
Add has_mem64 for struct host_bridge, on root bus that does not support mmio64 above 4g, will not set that. We will use that info next two following patches: 1. Don't treat non-pref mmio64 as pref mmio, so will not put it under bridge's pref range when rescan the devices 2. will keep pref mmio64 and pref mmio32 under bridge pref bar. Signed-off-by: Yinghai Lu Tested-by: Khalid Aziz --- drivers/pci/probe.c | 7 +++ include/linux/pci.h | 1 + 2 files changed, 8 insertions(+) diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c index 48e6f29..6b079f4 100644 --- a/drivers/pci/probe.c +++ b/drivers/pci/probe.c @@ -2230,6 +2230,13 @@ struct pci_bus *pci_create_root_bus(struct device *parent, int bus, } else bus_addr[0] = '\0'; dev_info(>dev, "root bus resource %pR%s\n", res, bus_addr); + + if (resource_type(res) == IORESOURCE_MEM) { + if ((res->end - offset) > 0x) + bridge->has_mem64 = 1; + if ((res->start - offset) > 0x) + res->flags |= IORESOURCE_MEM_64; + } } down_write(_bus_sem); diff --git a/include/linux/pci.h b/include/linux/pci.h index 1527735..979be25 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -408,6 +408,7 @@ struct pci_host_bridge { void (*release_fn)(struct pci_host_bridge *); void *release_data; unsigned int ignore_reset_delay:1; /* for entire hierarchy */ + unsigned int has_mem64:1; /* Resource alignment requirements */ resource_size_t (*align_resource)(struct pci_dev *dev, const struct resource *res, -- 1.8.4.5
[PATCH v11 04/60] sparc/PCI: Use correct offset for bus address to resource
After we added 64bit mmio parsing, we got some "no compatible bridge window" warning on anther new model that support 64bit resource. It turns out that we can not use mem_space.start as 64bit mem space offset, aka there is mem_space.start != offset. Use child_phys_addr to calculate exact offset and record offset in pbm. After patch we get correct offset. /pci@305: PCI IO [io 0x2007e-0x2007e0fff] offset 2007e /pci@305: PCI MEM [mem 0x20010-0x27eff] offset 2 /pci@305: PCI MEM64 [mem 0x20001-0x2000d] offset 2 ... pci_sun4v f02ae7f8: PCI host bridge to bus :00 pci_bus :00: root bus resource [io 0x2007e-0x2007e0fff] (bus address [0x-0xfff]) pci_bus :00: root bus resource [mem 0x20010-0x27eff] (bus address [0x0010-0x7eff]) pci_bus :00: root bus resource [mem 0x20001-0x2000d] (bus address [0x1-0xd]) -v3: put back mem64_offset, as we found T4 has mem_offset != mem64_offset check overlapping between mem64_space and mem_space. -v5: use pcibios_bus_to_region() requested by Bjorn. -v6: use pci_find_bus_resource(). Signed-off-by: Yinghai LuTested-by: Khalid Aziz --- arch/sparc/kernel/pci.c| 54 -- arch/sparc/kernel/pci_common.c | 32 ++--- arch/sparc/kernel/pci_impl.h | 4 3 files changed, 54 insertions(+), 36 deletions(-) diff --git a/arch/sparc/kernel/pci.c b/arch/sparc/kernel/pci.c index badf095..4606dc1 100644 --- a/arch/sparc/kernel/pci.c +++ b/arch/sparc/kernel/pci.c @@ -654,12 +654,12 @@ struct pci_bus *pci_scan_one_pbm(struct pci_pbm_info *pbm, printk("PCI: Scanning PBM %s\n", node->full_name); pci_add_resource_offset(, >io_space, - pbm->io_space.start); + pbm->io_offset); pci_add_resource_offset(, >mem_space, - pbm->mem_space.start); + pbm->mem_offset); if (pbm->mem64_space.flags) pci_add_resource_offset(, >mem64_space, - pbm->mem_space.start); + pbm->mem64_offset); pbm->busn.start = pbm->pci_first_busno; pbm->busn.end = pbm->pci_last_busno; pbm->busn.flags = IORESOURCE_BUS; @@ -733,30 +733,32 @@ int pcibios_enable_device(struct pci_dev *dev, int mask) static int __pci_mmap_make_offset_bus(struct pci_dev *pdev, struct vm_area_struct *vma, enum pci_mmap_state mmap_state) { - struct pci_pbm_info *pbm = pdev->dev.archdata.host_controller; - unsigned long space_size, user_offset, user_size; - - if (mmap_state == pci_mmap_io) { - space_size = resource_size(>io_space); - } else { - space_size = resource_size(>mem_space); - } + unsigned long user_offset, user_size; + struct resource res, *root_bus_res; + struct pci_bus_region region; + struct pci_bus *bus; /* Make sure the request is in range. */ user_offset = vma->vm_pgoff << PAGE_SHIFT; user_size = vma->vm_end - vma->vm_start; - if (user_offset >= space_size || - (user_offset + user_size) > space_size) + region.start = user_offset; + region.end = user_offset + user_size - 1; + memset(, 0, sizeof(res)); + if (mmap_state == pci_mmap_io) + res.flags = IORESOURCE_IO; + else + res.flags = IORESOURCE_MEM; + + pcibios_bus_to_resource(pdev->bus, , ); + bus = pdev->bus; + while (bus->parent) + bus = bus->parent; + root_bus_res = pci_find_bus_resource(bus, ); + if (!root_bus_res) return -EINVAL; - if (mmap_state == pci_mmap_io) { - vma->vm_pgoff = (pbm->io_space.start + -user_offset) >> PAGE_SHIFT; - } else { - vma->vm_pgoff = (pbm->mem_space.start + -user_offset) >> PAGE_SHIFT; - } + vma->vm_pgoff = res.start >> PAGE_SHIFT; return 0; } @@ -977,16 +979,12 @@ void pci_resource_to_user(const struct pci_dev *pdev, int bar, const struct resource *rp, resource_size_t *start, resource_size_t *end) { - struct pci_pbm_info *pbm = pdev->dev.archdata.host_controller; - unsigned long offset; + struct pci_bus_region region; - if (rp->flags & IORESOURCE_IO) - offset = pbm->io_space.start; - else - offset = pbm->mem_space.start; + pcibios_resource_to_bus(pdev->bus, , (struct resource *)rp); - *start = rp->start - offset; - *end = rp->end - offset; + *start = region.start;
[PATCH v11 03/60] PCI: Add pci_find_bus_resource()
Add pci_find_bus_resource() to return bus resource for input resource. In some case, we may only have bus instead of dev. It is same as pci_find_parent_resource, but take bus as input. Signed-off-by: Yinghai Lu--- drivers/pci/pci.c | 27 --- include/linux/pci.h | 2 ++ 2 files changed, 18 insertions(+), 11 deletions(-) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 25e0327..313dea5 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -414,18 +414,9 @@ int pci_find_ht_capability(struct pci_dev *dev, int ht_cap) } EXPORT_SYMBOL_GPL(pci_find_ht_capability); -/** - * pci_find_parent_resource - return resource region of parent bus of given region - * @dev: PCI device structure contains resources to be searched - * @res: child resource record for which parent is sought - * - * For given resource region of given device, return the resource - * region of parent bus the given region is contained in. - */ -struct resource *pci_find_parent_resource(const struct pci_dev *dev, - struct resource *res) +struct resource *pci_find_bus_resource(const struct pci_bus *bus, + struct resource *res) { - const struct pci_bus *bus = dev->bus; struct resource *r; int i; @@ -455,6 +446,20 @@ struct resource *pci_find_parent_resource(const struct pci_dev *dev, } return NULL; } + +/** + * pci_find_parent_resource - return resource region of parent bus of given region + * @dev: PCI device structure contains resources to be searched + * @res: child resource record for which parent is sought + * + * For given resource region of given device, return the resource + * region of parent bus the given region is contained in. + */ +struct resource *pci_find_parent_resource(const struct pci_dev *dev, + struct resource *res) +{ + return pci_find_bus_resource(dev->bus, res); +} EXPORT_SYMBOL(pci_find_parent_resource); /** diff --git a/include/linux/pci.h b/include/linux/pci.h index 004b813..795b4c7 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -807,6 +807,8 @@ void pcibios_resource_to_bus(struct pci_bus *bus, struct pci_bus_region *region, struct resource *res); void pcibios_bus_to_resource(struct pci_bus *bus, struct resource *res, struct pci_bus_region *region); +struct resource *pci_find_bus_resource(const struct pci_bus *bus, + struct resource *res); void pcibios_scan_specific_bus(int busn); struct pci_bus *pci_find_bus(int domain, int busnr); void pci_bus_add_devices(const struct pci_bus *bus); -- 1.8.4.5
[PATCH v11 04/60] sparc/PCI: Use correct offset for bus address to resource
After we added 64bit mmio parsing, we got some "no compatible bridge window" warning on anther new model that support 64bit resource. It turns out that we can not use mem_space.start as 64bit mem space offset, aka there is mem_space.start != offset. Use child_phys_addr to calculate exact offset and record offset in pbm. After patch we get correct offset. /pci@305: PCI IO [io 0x2007e-0x2007e0fff] offset 2007e /pci@305: PCI MEM [mem 0x20010-0x27eff] offset 2 /pci@305: PCI MEM64 [mem 0x20001-0x2000d] offset 2 ... pci_sun4v f02ae7f8: PCI host bridge to bus :00 pci_bus :00: root bus resource [io 0x2007e-0x2007e0fff] (bus address [0x-0xfff]) pci_bus :00: root bus resource [mem 0x20010-0x27eff] (bus address [0x0010-0x7eff]) pci_bus :00: root bus resource [mem 0x20001-0x2000d] (bus address [0x1-0xd]) -v3: put back mem64_offset, as we found T4 has mem_offset != mem64_offset check overlapping between mem64_space and mem_space. -v5: use pcibios_bus_to_region() requested by Bjorn. -v6: use pci_find_bus_resource(). Signed-off-by: Yinghai Lu Tested-by: Khalid Aziz --- arch/sparc/kernel/pci.c| 54 -- arch/sparc/kernel/pci_common.c | 32 ++--- arch/sparc/kernel/pci_impl.h | 4 3 files changed, 54 insertions(+), 36 deletions(-) diff --git a/arch/sparc/kernel/pci.c b/arch/sparc/kernel/pci.c index badf095..4606dc1 100644 --- a/arch/sparc/kernel/pci.c +++ b/arch/sparc/kernel/pci.c @@ -654,12 +654,12 @@ struct pci_bus *pci_scan_one_pbm(struct pci_pbm_info *pbm, printk("PCI: Scanning PBM %s\n", node->full_name); pci_add_resource_offset(, >io_space, - pbm->io_space.start); + pbm->io_offset); pci_add_resource_offset(, >mem_space, - pbm->mem_space.start); + pbm->mem_offset); if (pbm->mem64_space.flags) pci_add_resource_offset(, >mem64_space, - pbm->mem_space.start); + pbm->mem64_offset); pbm->busn.start = pbm->pci_first_busno; pbm->busn.end = pbm->pci_last_busno; pbm->busn.flags = IORESOURCE_BUS; @@ -733,30 +733,32 @@ int pcibios_enable_device(struct pci_dev *dev, int mask) static int __pci_mmap_make_offset_bus(struct pci_dev *pdev, struct vm_area_struct *vma, enum pci_mmap_state mmap_state) { - struct pci_pbm_info *pbm = pdev->dev.archdata.host_controller; - unsigned long space_size, user_offset, user_size; - - if (mmap_state == pci_mmap_io) { - space_size = resource_size(>io_space); - } else { - space_size = resource_size(>mem_space); - } + unsigned long user_offset, user_size; + struct resource res, *root_bus_res; + struct pci_bus_region region; + struct pci_bus *bus; /* Make sure the request is in range. */ user_offset = vma->vm_pgoff << PAGE_SHIFT; user_size = vma->vm_end - vma->vm_start; - if (user_offset >= space_size || - (user_offset + user_size) > space_size) + region.start = user_offset; + region.end = user_offset + user_size - 1; + memset(, 0, sizeof(res)); + if (mmap_state == pci_mmap_io) + res.flags = IORESOURCE_IO; + else + res.flags = IORESOURCE_MEM; + + pcibios_bus_to_resource(pdev->bus, , ); + bus = pdev->bus; + while (bus->parent) + bus = bus->parent; + root_bus_res = pci_find_bus_resource(bus, ); + if (!root_bus_res) return -EINVAL; - if (mmap_state == pci_mmap_io) { - vma->vm_pgoff = (pbm->io_space.start + -user_offset) >> PAGE_SHIFT; - } else { - vma->vm_pgoff = (pbm->mem_space.start + -user_offset) >> PAGE_SHIFT; - } + vma->vm_pgoff = res.start >> PAGE_SHIFT; return 0; } @@ -977,16 +979,12 @@ void pci_resource_to_user(const struct pci_dev *pdev, int bar, const struct resource *rp, resource_size_t *start, resource_size_t *end) { - struct pci_pbm_info *pbm = pdev->dev.archdata.host_controller; - unsigned long offset; + struct pci_bus_region region; - if (rp->flags & IORESOURCE_IO) - offset = pbm->io_space.start; - else - offset = pbm->mem_space.start; + pcibios_resource_to_bus(pdev->bus, , (struct resource *)rp); - *start = rp->start - offset; - *end = rp->end - offset; + *start = region.start; + *end = region.end; } void
[PATCH v11 03/60] PCI: Add pci_find_bus_resource()
Add pci_find_bus_resource() to return bus resource for input resource. In some case, we may only have bus instead of dev. It is same as pci_find_parent_resource, but take bus as input. Signed-off-by: Yinghai Lu --- drivers/pci/pci.c | 27 --- include/linux/pci.h | 2 ++ 2 files changed, 18 insertions(+), 11 deletions(-) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 25e0327..313dea5 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -414,18 +414,9 @@ int pci_find_ht_capability(struct pci_dev *dev, int ht_cap) } EXPORT_SYMBOL_GPL(pci_find_ht_capability); -/** - * pci_find_parent_resource - return resource region of parent bus of given region - * @dev: PCI device structure contains resources to be searched - * @res: child resource record for which parent is sought - * - * For given resource region of given device, return the resource - * region of parent bus the given region is contained in. - */ -struct resource *pci_find_parent_resource(const struct pci_dev *dev, - struct resource *res) +struct resource *pci_find_bus_resource(const struct pci_bus *bus, + struct resource *res) { - const struct pci_bus *bus = dev->bus; struct resource *r; int i; @@ -455,6 +446,20 @@ struct resource *pci_find_parent_resource(const struct pci_dev *dev, } return NULL; } + +/** + * pci_find_parent_resource - return resource region of parent bus of given region + * @dev: PCI device structure contains resources to be searched + * @res: child resource record for which parent is sought + * + * For given resource region of given device, return the resource + * region of parent bus the given region is contained in. + */ +struct resource *pci_find_parent_resource(const struct pci_dev *dev, + struct resource *res) +{ + return pci_find_bus_resource(dev->bus, res); +} EXPORT_SYMBOL(pci_find_parent_resource); /** diff --git a/include/linux/pci.h b/include/linux/pci.h index 004b813..795b4c7 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -807,6 +807,8 @@ void pcibios_resource_to_bus(struct pci_bus *bus, struct pci_bus_region *region, struct resource *res); void pcibios_bus_to_resource(struct pci_bus *bus, struct resource *res, struct pci_bus_region *region); +struct resource *pci_find_bus_resource(const struct pci_bus *bus, + struct resource *res); void pcibios_scan_specific_bus(int busn); struct pci_bus *pci_find_bus(int domain, int busnr); void pci_bus_add_devices(const struct pci_bus *bus); -- 1.8.4.5
[PATCH v11 53/60] PCI: Kill macro checking for bus io port sizing
We can use new generic version skip_isa_ioresource_align() instead of macro, and then kill the marco. Signed-off-by: Yinghai Lu--- drivers/pci/setup-bus.c | 17 +++-- 1 file changed, 7 insertions(+), 10 deletions(-) diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c index 5ba4bf5..65a41e7 100644 --- a/drivers/pci/setup-bus.c +++ b/drivers/pci/setup-bus.c @@ -1159,15 +1159,12 @@ int skip_isa_ioresource_align(struct pci_bus *bus) return 0; } -static resource_size_t size_aligned_for_isa(resource_size_t size) +static resource_size_t size_aligned_for_isa(resource_size_t size, + struct pci_bus *bus) { - /* -* To be fixed in 2.5: we should have sort of HAVE_ISA -* flag in the struct pci_bus. -*/ -#if defined(CONFIG_ISA) || defined(CONFIG_EISA) - size = (size & 0xff) + ((size & ~0xffUL) << 2); -#endif + if (!skip_isa_ioresource_align(bus)) + size = (size & 0xff) + ((size & ~0xffUL) << 2); + return size; } @@ -1236,12 +1233,12 @@ static void pbus_size_io(struct pci_bus *bus, resource_size_t min_size, } } - size = size_aligned_for_isa(size); + size = size_aligned_for_isa(size, bus); size += size1; if (size || min_size) size0 = calculate_size(size, min_size, resource_size(b_res), min_align); - sum_add_size = size_aligned_for_isa(sum_add_size); + sum_add_size = size_aligned_for_isa(sum_add_size, bus); sum_add_size += sum_add_size1; if (sum_add_size < min_sum_size) sum_add_size = min_sum_size; -- 1.8.4.5
[PATCH v11 01/60] PCI: Fix iomem_is_exclusive() checking in pci_mmap_resource()
From: Bjorn Helgaasiomem_is_exclusive() requires a CPU physical address, but on some arches we supplied a PCI bus address instead. On most arches, pci_resource_to_user(res) returns "res->start", which is a CPU physical address. But on microblaze, mips, powerpc, and sparc, it returns the PCI bus address corresponding to "res->start". The result is that pci_mmap_resource() may fail when it shouldn't (if the bus address happens to match an existing resource), or it may succeed when it should fail (if the resource is exclusive but the bus address doesn't match it). Call iomem_is_exclusive() with "res->start", which is always a CPU physical address, not the result of pci_resource_to_user(). Fixes: e8de1481fd71 ("resource: allow MMIO exclusivity for device drivers") Suggested-by: Yinghai Lu Signed-off-by: Bjorn Helgaas CC: Arjan van de Ven --- drivers/pci/pci-sysfs.c | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c index e982010..cbb13be 100644 --- a/drivers/pci/pci-sysfs.c +++ b/drivers/pci/pci-sysfs.c @@ -1008,6 +1008,9 @@ static int pci_mmap_resource(struct kobject *kobj, struct bin_attribute *attr, if (i >= PCI_ROM_RESOURCE) return -ENODEV; + if (res->flags & IORESOURCE_MEM && iomem_is_exclusive(res->start)) + return -EINVAL; + if (!pci_mmap_fits(pdev, i, vma, PCI_MMAP_SYSFS)) { WARN(1, "process \"%s\" tried to map 0x%08lx bytes at page 0x%08lx on %s BAR %d (start 0x%16Lx, size 0x%16Lx)\n", current->comm, vma->vm_end-vma->vm_start, vma->vm_pgoff, @@ -1024,10 +1027,6 @@ static int pci_mmap_resource(struct kobject *kobj, struct bin_attribute *attr, pci_resource_to_user(pdev, i, res, , ); vma->vm_pgoff += start >> PAGE_SHIFT; mmap_type = res->flags & IORESOURCE_MEM ? pci_mmap_mem : pci_mmap_io; - - if (res->flags & IORESOURCE_MEM && iomem_is_exclusive(start)) - return -EINVAL; - return pci_mmap_page_range(pdev, vma, mmap_type, write_combine); } -- 1.8.4.5
[PATCH v11 01/60] PCI: Fix iomem_is_exclusive() checking in pci_mmap_resource()
From: Bjorn Helgaas iomem_is_exclusive() requires a CPU physical address, but on some arches we supplied a PCI bus address instead. On most arches, pci_resource_to_user(res) returns "res->start", which is a CPU physical address. But on microblaze, mips, powerpc, and sparc, it returns the PCI bus address corresponding to "res->start". The result is that pci_mmap_resource() may fail when it shouldn't (if the bus address happens to match an existing resource), or it may succeed when it should fail (if the resource is exclusive but the bus address doesn't match it). Call iomem_is_exclusive() with "res->start", which is always a CPU physical address, not the result of pci_resource_to_user(). Fixes: e8de1481fd71 ("resource: allow MMIO exclusivity for device drivers") Suggested-by: Yinghai Lu Signed-off-by: Bjorn Helgaas CC: Arjan van de Ven --- drivers/pci/pci-sysfs.c | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c index e982010..cbb13be 100644 --- a/drivers/pci/pci-sysfs.c +++ b/drivers/pci/pci-sysfs.c @@ -1008,6 +1008,9 @@ static int pci_mmap_resource(struct kobject *kobj, struct bin_attribute *attr, if (i >= PCI_ROM_RESOURCE) return -ENODEV; + if (res->flags & IORESOURCE_MEM && iomem_is_exclusive(res->start)) + return -EINVAL; + if (!pci_mmap_fits(pdev, i, vma, PCI_MMAP_SYSFS)) { WARN(1, "process \"%s\" tried to map 0x%08lx bytes at page 0x%08lx on %s BAR %d (start 0x%16Lx, size 0x%16Lx)\n", current->comm, vma->vm_end-vma->vm_start, vma->vm_pgoff, @@ -1024,10 +1027,6 @@ static int pci_mmap_resource(struct kobject *kobj, struct bin_attribute *attr, pci_resource_to_user(pdev, i, res, , ); vma->vm_pgoff += start >> PAGE_SHIFT; mmap_type = res->flags & IORESOURCE_MEM ? pci_mmap_mem : pci_mmap_io; - - if (res->flags & IORESOURCE_MEM && iomem_is_exclusive(start)) - return -EINVAL; - return pci_mmap_page_range(pdev, vma, mmap_type, write_combine); } -- 1.8.4.5
[PATCH v11 53/60] PCI: Kill macro checking for bus io port sizing
We can use new generic version skip_isa_ioresource_align() instead of macro, and then kill the marco. Signed-off-by: Yinghai Lu --- drivers/pci/setup-bus.c | 17 +++-- 1 file changed, 7 insertions(+), 10 deletions(-) diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c index 5ba4bf5..65a41e7 100644 --- a/drivers/pci/setup-bus.c +++ b/drivers/pci/setup-bus.c @@ -1159,15 +1159,12 @@ int skip_isa_ioresource_align(struct pci_bus *bus) return 0; } -static resource_size_t size_aligned_for_isa(resource_size_t size) +static resource_size_t size_aligned_for_isa(resource_size_t size, + struct pci_bus *bus) { - /* -* To be fixed in 2.5: we should have sort of HAVE_ISA -* flag in the struct pci_bus. -*/ -#if defined(CONFIG_ISA) || defined(CONFIG_EISA) - size = (size & 0xff) + ((size & ~0xffUL) << 2); -#endif + if (!skip_isa_ioresource_align(bus)) + size = (size & 0xff) + ((size & ~0xffUL) << 2); + return size; } @@ -1236,12 +1233,12 @@ static void pbus_size_io(struct pci_bus *bus, resource_size_t min_size, } } - size = size_aligned_for_isa(size); + size = size_aligned_for_isa(size, bus); size += size1; if (size || min_size) size0 = calculate_size(size, min_size, resource_size(b_res), min_align); - sum_add_size = size_aligned_for_isa(sum_add_size); + sum_add_size = size_aligned_for_isa(sum_add_size, bus); sum_add_size += sum_add_size1; if (sum_add_size < min_sum_size) sum_add_size = min_sum_size; -- 1.8.4.5
[PATCH v11 30/60] PCI: Reorder resources list for required/optional resources
We try to allocate required+optional before allocate required only and expand with optional. At first we update size and alignment for required+optional resource. And after that we reorder them with new alignment, but current we only do that STARTALIGN ones. For SIZEALIGN type resource, after add back add_size, the alignment get changed, so need to do sorting like STARTALIGN type resources. Also we need to reorder the sorting back after we restore resource to required only when required+optional fail to allocate for all. So move out the reordering code from the loop to separated function, and call it two times accordingly. Signed-off-by: Yinghai Lu--- drivers/pci/setup-bus.c | 62 + 1 file changed, 37 insertions(+), 25 deletions(-) diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c index f6b7b8f..aed62cc 100644 --- a/drivers/pci/setup-bus.c +++ b/drivers/pci/setup-bus.c @@ -285,6 +285,31 @@ static inline void reset_resource(struct resource *res) res->flags = 0; } +static void sort_resources(struct list_head *head) +{ + struct pci_dev_resource *res1, *tmp_res, *res2; + + list_for_each_entry_safe(res1, tmp_res, head, list) { + resource_size_t align1, size1, align2, size2; + + align1 = pci_resource_alignment(res1->dev, res1->res); + size1 = resource_size(res1->res); + + /* reorder it */ + list_for_each_entry(res2, head, list) { + if (res2 == res1) + break; + + align2 = pci_resource_alignment(res2->dev, res2->res); + size2 = resource_size(res2->res); + if (is_before(align1, size1, align2, size2)) { + list_move_tail(>list, >list); + break; + } + } + } +} + /** * reassign_resources_sorted() - satisfy any additional resource requests * @@ -453,9 +478,9 @@ static void __assign_resources_sorted(struct list_head *head, LIST_HEAD(save_head); LIST_HEAD(local_fail_head); struct pci_dev_resource *save_res; - struct pci_dev_resource *dev_res, *tmp_res, *dev_res2; + struct pci_dev_resource *dev_res, *tmp_res; unsigned long fail_type; - resource_size_t add_align, align; + resource_size_t add_align; /* Check if optional add_size is there */ if (!realloc_head || list_empty(realloc_head)) @@ -470,47 +495,32 @@ static void __assign_resources_sorted(struct list_head *head, } /* Update res in head list with add_size in realloc_head list */ - list_for_each_entry_safe(dev_res, tmp_res, head, list) { + list_for_each_entry(dev_res, head, list) { dev_res->res->end += get_res_add_size(realloc_head, dev_res->res); /* * There are two kinds of additional resources in the list: -* 1. bridge resource -- IORESOURCE_STARTALIGN -* 2. SR-IOV resource -- IORESOURCE_SIZEALIGN -* Here just fix the additional alignment for bridge +* 1. bridge resource with IORESOURCE_STARTALIGN +*need to update start to change alignment +* 2. resource with IORESOURCE_SIZEALIGN +*update size above already change alignment. */ if (!(dev_res->res->flags & IORESOURCE_STARTALIGN)) continue; add_align = get_res_add_align(realloc_head, dev_res->res); - /* -* The "head" list is sorted by the alignment to make sure -* resources with bigger alignment will be assigned first. -* After we change the alignment of a dev_res in "head" list, -* we need to reorder the list by alignment to make it -* consistent. -*/ - if (add_align > dev_res->res->start) { + if (add_align) { resource_size_t r_size = resource_size(dev_res->res); dev_res->res->start = add_align; dev_res->res->end = add_align + r_size - 1; - - list_for_each_entry(dev_res2, head, list) { - align = pci_resource_alignment(dev_res2->dev, - dev_res2->res); - if (add_align > align) { - list_move_tail(_res->list, - _res2->list); - break; - } - } } - } +
Re: [PATCH v2] clk: let clk_disable() return immediately if clk is NULL or error
On 04/05, Masahiro Yamada wrote: > The clk_disable() in the common clock framework (drivers/clk/clk.c) > returns immediately if a given clk is NULL or an error pointer. It > allows clock consumers to call clk_disable() without IS_ERR_OR_NULL > checking if drivers are only used with the common clock framework. > > Unfortunately, NULL/error checking is missing from some of non-common > clk_disable() implementations. This prevents us from completely > dropping NULL/error checking from callers. Let's make it tree-wide > consistent by adding IS_ERR_OR_NULL(clk) to all callees. > > Signed-off-by: Masahiro Yamada> Acked-by: Greg Ungerer > Acked-by: Wan Zongshun > --- > > Stephen, > > This patch has been unapplied for a long time. > > Please let me know if there is something wrong with this patch. > I'm mostly confused why we wouldn't want to encourage people to call clk_disable or unprepare on a clk that's an error pointer. Typically an error pointer should be dealt with, instead of silently ignored, so why wasn't it dealt with by passing it up the probe() path? -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
[PATCH v11 30/60] PCI: Reorder resources list for required/optional resources
We try to allocate required+optional before allocate required only and expand with optional. At first we update size and alignment for required+optional resource. And after that we reorder them with new alignment, but current we only do that STARTALIGN ones. For SIZEALIGN type resource, after add back add_size, the alignment get changed, so need to do sorting like STARTALIGN type resources. Also we need to reorder the sorting back after we restore resource to required only when required+optional fail to allocate for all. So move out the reordering code from the loop to separated function, and call it two times accordingly. Signed-off-by: Yinghai Lu --- drivers/pci/setup-bus.c | 62 + 1 file changed, 37 insertions(+), 25 deletions(-) diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c index f6b7b8f..aed62cc 100644 --- a/drivers/pci/setup-bus.c +++ b/drivers/pci/setup-bus.c @@ -285,6 +285,31 @@ static inline void reset_resource(struct resource *res) res->flags = 0; } +static void sort_resources(struct list_head *head) +{ + struct pci_dev_resource *res1, *tmp_res, *res2; + + list_for_each_entry_safe(res1, tmp_res, head, list) { + resource_size_t align1, size1, align2, size2; + + align1 = pci_resource_alignment(res1->dev, res1->res); + size1 = resource_size(res1->res); + + /* reorder it */ + list_for_each_entry(res2, head, list) { + if (res2 == res1) + break; + + align2 = pci_resource_alignment(res2->dev, res2->res); + size2 = resource_size(res2->res); + if (is_before(align1, size1, align2, size2)) { + list_move_tail(>list, >list); + break; + } + } + } +} + /** * reassign_resources_sorted() - satisfy any additional resource requests * @@ -453,9 +478,9 @@ static void __assign_resources_sorted(struct list_head *head, LIST_HEAD(save_head); LIST_HEAD(local_fail_head); struct pci_dev_resource *save_res; - struct pci_dev_resource *dev_res, *tmp_res, *dev_res2; + struct pci_dev_resource *dev_res, *tmp_res; unsigned long fail_type; - resource_size_t add_align, align; + resource_size_t add_align; /* Check if optional add_size is there */ if (!realloc_head || list_empty(realloc_head)) @@ -470,47 +495,32 @@ static void __assign_resources_sorted(struct list_head *head, } /* Update res in head list with add_size in realloc_head list */ - list_for_each_entry_safe(dev_res, tmp_res, head, list) { + list_for_each_entry(dev_res, head, list) { dev_res->res->end += get_res_add_size(realloc_head, dev_res->res); /* * There are two kinds of additional resources in the list: -* 1. bridge resource -- IORESOURCE_STARTALIGN -* 2. SR-IOV resource -- IORESOURCE_SIZEALIGN -* Here just fix the additional alignment for bridge +* 1. bridge resource with IORESOURCE_STARTALIGN +*need to update start to change alignment +* 2. resource with IORESOURCE_SIZEALIGN +*update size above already change alignment. */ if (!(dev_res->res->flags & IORESOURCE_STARTALIGN)) continue; add_align = get_res_add_align(realloc_head, dev_res->res); - /* -* The "head" list is sorted by the alignment to make sure -* resources with bigger alignment will be assigned first. -* After we change the alignment of a dev_res in "head" list, -* we need to reorder the list by alignment to make it -* consistent. -*/ - if (add_align > dev_res->res->start) { + if (add_align) { resource_size_t r_size = resource_size(dev_res->res); dev_res->res->start = add_align; dev_res->res->end = add_align + r_size - 1; - - list_for_each_entry(dev_res2, head, list) { - align = pci_resource_alignment(dev_res2->dev, - dev_res2->res); - if (add_align > align) { - list_move_tail(_res->list, - _res2->list); - break; - } - } } - } + sort_resources(head); +
Re: [PATCH v2] clk: let clk_disable() return immediately if clk is NULL or error
On 04/05, Masahiro Yamada wrote: > The clk_disable() in the common clock framework (drivers/clk/clk.c) > returns immediately if a given clk is NULL or an error pointer. It > allows clock consumers to call clk_disable() without IS_ERR_OR_NULL > checking if drivers are only used with the common clock framework. > > Unfortunately, NULL/error checking is missing from some of non-common > clk_disable() implementations. This prevents us from completely > dropping NULL/error checking from callers. Let's make it tree-wide > consistent by adding IS_ERR_OR_NULL(clk) to all callees. > > Signed-off-by: Masahiro Yamada > Acked-by: Greg Ungerer > Acked-by: Wan Zongshun > --- > > Stephen, > > This patch has been unapplied for a long time. > > Please let me know if there is something wrong with this patch. > I'm mostly confused why we wouldn't want to encourage people to call clk_disable or unprepare on a clk that's an error pointer. Typically an error pointer should be dealt with, instead of silently ignored, so why wasn't it dealt with by passing it up the probe() path? -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
[PATCH v3 7/7] x86/entry: Make gs_change a local label
From: Borislav Petkov... so that it doesn't appear in objdump output. Signed-off-by: Borislav Petkov Signed-off-by: Andy Lutomirski --- arch/x86/entry/entry_64.S | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S index 64d2033d1e49..1693c17dbf81 100644 --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -781,7 +781,7 @@ ENTRY(native_load_gs_index) pushfq DISABLE_INTERRUPTS(CLBR_ANY & ~CLBR_RDI) SWAPGS -gs_change: +.Lgs_change: movl%edi, %gs 2: ALTERNATIVE "", "mfence", X86_BUG_SWAPGS_FENCE SWAPGS @@ -789,7 +789,7 @@ gs_change: ret END(native_load_gs_index) - _ASM_EXTABLE(gs_change, bad_gs) + _ASM_EXTABLE(.Lgs_change, bad_gs) .section .fixup, "ax" /* running with kernelgs */ bad_gs: @@ -1019,13 +1019,13 @@ ENTRY(error_entry) movl%ecx, %eax /* zero extend */ cmpq%rax, RIP+8(%rsp) je .Lbstep_iret - cmpq$gs_change, RIP+8(%rsp) + cmpq$.Lgs_change, RIP+8(%rsp) jne .Lerror_entry_done /* -* hack: gs_change can fail with user gsbase. If this happens, fix up +* hack: .Lgs_change can fail with user gsbase. If this happens, fix up * gsbase and proceed. We'll fix up the exception and land in -* gs_change's error handler with kernel gsbase. +* .Lgs_change's error handler with kernel gsbase. */ jmp .Lerror_entry_from_usermode_swapgs -- 2.5.5
[PATCH v3 1/7] selftests/x86: Test the FSBASE/GSBASE API and context switching
This catches two distinct bugs in the current code. I'll fix them. Signed-off-by: Andy Lutomirski--- tools/testing/selftests/x86/Makefile | 1 + tools/testing/selftests/x86/fsgsbase.c | 398 + 2 files changed, 399 insertions(+) create mode 100644 tools/testing/selftests/x86/fsgsbase.c diff --git a/tools/testing/selftests/x86/Makefile b/tools/testing/selftests/x86/Makefile index b47ebd170690..c73425de3cfe 100644 --- a/tools/testing/selftests/x86/Makefile +++ b/tools/testing/selftests/x86/Makefile @@ -9,6 +9,7 @@ TARGETS_C_BOTHBITS := single_step_syscall sysret_ss_attrs syscall_nt ptrace_sysc TARGETS_C_32BIT_ONLY := entry_from_vm86 syscall_arg_fault test_syscall_vdso unwind_vdso \ test_FCMOV test_FCOMI test_FISTTP \ vdso_restorer +TARGETS_C_64BIT_ONLY := fsgsbase TARGETS_C_32BIT_ALL := $(TARGETS_C_BOTHBITS) $(TARGETS_C_32BIT_ONLY) TARGETS_C_64BIT_ALL := $(TARGETS_C_BOTHBITS) $(TARGETS_C_64BIT_ONLY) diff --git a/tools/testing/selftests/x86/fsgsbase.c b/tools/testing/selftests/x86/fsgsbase.c new file mode 100644 index ..b7733055a359 --- /dev/null +++ b/tools/testing/selftests/x86/fsgsbase.c @@ -0,0 +1,398 @@ +/* + * fsgsbase.c, an fsgsbase test + * Copyright (c) 2014-2016 Andy Lutomirski + * GPL v2 + */ + +#define _GNU_SOURCE +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#ifndef __x86_64__ +# error This test is 64-bit only +#endif + +static volatile sig_atomic_t want_segv; +static volatile unsigned long segv_addr; + +static int nerrs; + +static void sethandler(int sig, void (*handler)(int, siginfo_t *, void *), + int flags) +{ + struct sigaction sa; + memset(, 0, sizeof(sa)); + sa.sa_sigaction = handler; + sa.sa_flags = SA_SIGINFO | flags; + sigemptyset(_mask); + if (sigaction(sig, , 0)) + err(1, "sigaction"); +} + +static void clearhandler(int sig) +{ + struct sigaction sa; + memset(, 0, sizeof(sa)); + sa.sa_handler = SIG_DFL; + sigemptyset(_mask); + if (sigaction(sig, , 0)) + err(1, "sigaction"); +} + +static void sigsegv(int sig, siginfo_t *si, void *ctx_void) +{ + ucontext_t *ctx = (ucontext_t*)ctx_void; + + if (!want_segv) { + clearhandler(SIGSEGV); + return; /* Crash cleanly. */ + } + + want_segv = false; + segv_addr = (unsigned long)si->si_addr; + + ctx->uc_mcontext.gregs[REG_RIP] += 4; /* Skip the faulting mov */ + +} + +enum which_base { FS, GS }; + +static unsigned long read_base(enum which_base which) +{ + unsigned long offset; + /* +* Unless we have FSGSBASE, there's no direct way to do this from +* user mode. We can get at it indirectly using signals, though. +*/ + + want_segv = true; + + offset = 0; + if (which == FS) { + /* Use a constant-length instruction here. */ + asm volatile ("mov %%fs:(%%rcx), %%rax" : : "c" (offset) : "rax"); + } else { + asm volatile ("mov %%gs:(%%rcx), %%rax" : : "c" (offset) : "rax"); + } + if (!want_segv) + return segv_addr + offset; + + /* +* If that didn't segfault, try the other end of the address space. +* Unless we get really unlucky and run into the vsyscall page, this +* is guaranteed to segfault. +*/ + + offset = (ULONG_MAX >> 1) + 1; + if (which == FS) { + asm volatile ("mov %%fs:(%%rcx), %%rax" + : : "c" (offset) : "rax"); + } else { + asm volatile ("mov %%gs:(%%rcx), %%rax" + : : "c" (offset) : "rax"); + } + if (!want_segv) + return segv_addr + offset; + + abort(); +} + +static void check_gs_value(unsigned long value) +{ + unsigned long base; + unsigned short sel; + + printf("[RUN]\tARCH_SET_GS to 0x%lx\n", value); + if (syscall(SYS_arch_prctl, ARCH_SET_GS, value) != 0) + err(1, "ARCH_SET_GS"); + + asm volatile ("mov %%gs, %0" : "=rm" (sel)); + base = read_base(GS); + if (base == value) { + printf("[OK]\tGSBASE was set as expected (selector 0x%hx)\n", + sel); + } else { + nerrs++; + printf("[FAIL]\tGSBASE was not as expected: got 0x%lx (selector 0x%hx)\n", + base, sel); + } + + if (syscall(SYS_arch_prctl, ARCH_GET_GS, ) != 0) + err(1, "ARCH_GET_GS"); + if (base == value) { + printf("[OK]\tARCH_GET_GS worked as expected (selector 0x%hx)\n", + sel); + } else { + nerrs++; +