Re: [powerpc:next-test 70/71] include/asm-generic/memory_model.h:55:54: error: 'vmemmap' undeclared; did you mean 'mem_map'?

2018-12-11 Thread Christophe Leroy

This one is already fixed in v2 and v3 by the addition of asm/pgtable.h

Christophe

Le 12/12/2018 à 06:36, kbuild test robot a écrit :

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
next-test
head:   13fd174be1903cc069bb73b0d53a0420b8f6d778
commit: 3a0f466ce2402cc090c97ccc1285508f688d8a8e [70/71] powerpc: Implement 
CONFIG_DEBUG_VIRTUAL
config: powerpc-ppc64e_defconfig (attached as .config)
compiler: powerpc64-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
 wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
 chmod +x ~/bin/make.cross
 git checkout 3a0f466ce2402cc090c97ccc1285508f688d8a8e
 # save the attached .config to linux build tree
 GCC_VERSION=7.2.0 make.cross ARCH=powerpc

All errors (new ones prefixed by >>):

In file included from arch/powerpc/include/asm/page.h:339:0,
 from arch/powerpc/include/asm/thread_info.h:29,
 from include/linux/thread_info.h:38,
 from include/asm-generic/preempt.h:5,
 from ./arch/powerpc/include/generated/asm/preempt.h:1,
 from include/linux/preempt.h:81,
 from include/linux/spinlock.h:51,
 from include/linux/seqlock.h:36,
 from include/linux/time.h:6,
 from include/linux/stat.h:19,
 from include/linux/module.h:10,
 from init/do_mounts.c:1:
arch/powerpc/include/asm/io.h: In function 'page_to_phys':

include/asm-generic/memory_model.h:55:54: error: 'vmemmap' undeclared (first 
use in this function); did you mean 'mem_map'?

 #define __page_to_pfn(page) (unsigned long)((page) - vmemmap)
  ^
include/asm-generic/memory_model.h:81:21: note: in expansion of macro 
'__page_to_pfn'
 #define page_to_pfn __page_to_pfn
 ^
arch/powerpc/include/asm/io.h:835:22: note: in expansion of macro 
'page_to_pfn'
  unsigned long pfn = page_to_pfn(page);
  ^~~
include/asm-generic/memory_model.h:55:54: note: each undeclared identifier 
is reported only once for each function it appears in
 #define __page_to_pfn(page) (unsigned long)((page) - vmemmap)
  ^
include/asm-generic/memory_model.h:81:21: note: in expansion of macro 
'__page_to_pfn'
 #define page_to_pfn __page_to_pfn
 ^
arch/powerpc/include/asm/io.h:835:22: note: in expansion of macro 
'page_to_pfn'
  unsigned long pfn = page_to_pfn(page);
  ^~~

vim +55 include/asm-generic/memory_model.h

8f6aac41 Christoph Lameter  2007-10-16  52
af901ca1 André Goddard Rosa 2009-11-14  53  /* memmap is virtually contiguous.  
*/
8f6aac41 Christoph Lameter  2007-10-16  54  #define __pfn_to_page(pfn)  
(vmemmap + (pfn))
32272a26 Martin Schwidefsky 2008-12-25 @55  #define __page_to_pfn(page) 
(unsigned long)((page) - vmemmap)
8f6aac41 Christoph Lameter  2007-10-16  56

:: The code at line 55 was first introduced by commit
:: 32272a26974d2027384fd4010cd1780fca425d94 [S390] __page_to_pfn warnings

:: TO: Martin Schwidefsky 
:: CC: Martin Schwidefsky 

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation



Re: [PATCH v3] powerpc: implement CONFIG_DEBUG_VIRTUAL

2018-12-11 Thread Christophe Leroy




Le 12/12/2018 à 01:23, Michael Ellerman a écrit :

Christophe Leroy  writes:


This patch implements CONFIG_DEBUG_VIRTUAL to warn about
incorrect use of virt_to_phys() and page_to_phys()

Below is the result of test_debug_virtual:

[1.438746] WARNING: CPU: 0 PID: 1 at ./arch/powerpc/include/asm/io.h:808 
test_debug_virtual_init+0x3c/0xd4
[1.448156] CPU: 0 PID: 1 Comm: swapper Not tainted 
4.20.0-rc5-00560-g6bfb52e23a00-dirty #532
[1.457259] NIP:  c066c550 LR: c0650ccc CTR: c066c514
[1.462257] REGS: c900bdb0 TRAP: 0700   Not tainted  
(4.20.0-rc5-00560-g6bfb52e23a00-dirty)
[1.471184] MSR:  00029032   CR: 48000422  XER: 2000
[1.477811]
[1.477811] GPR00: c0650ccc c900be60 c60d  006000c0 c900 
9032 c7fa0020
[1.477811] GPR08: 2400 0001 0900  c07b5d04  
c00037d8 
[1.477811] GPR16:     c076 c074 
0092 c0685bb0
[1.477811] GPR24: c065042c c068a734 c0685b8c 0006  c076 
c075c3c0 
[1.512711] NIP [c066c550] test_debug_virtual_init+0x3c/0xd4
[1.518315] LR [c0650ccc] do_one_initcall+0x8c/0x1cc
[1.523163] Call Trace:
[1.525595] [c900be60] [c0567340] 0xc0567340 (unreliable)
[1.530954] [c900be90] [c0650ccc] do_one_initcall+0x8c/0x1cc
[1.536551] [c900bef0] [c0651000] kernel_init_freeable+0x1f4/0x2cc
[1.542658] [c900bf30] [c00037ec] kernel_init+0x14/0x110
[1.547913] [c900bf40] [c000e1d0] ret_from_kernel_thread+0x14/0x1c
[1.553971] Instruction dump:
[1.556909] 3ca50100 bfa10024 54a5000e 3fa0c076 7c0802a6 3d454000 813dc204 
554893be
[1.564566] 7d294010 7d294910 90010034 39290001 <0f09> 7c3e0b78 955e0008 
3fe0c062
[1.572425] ---[ end trace 6f6984225b280ad6 ]---
[1.577467] PA: 0x0900 for VA: 0xc900
[1.581799] PA: 0x061e8f50 for VA: 0xc61e8f50

Signed-off-by: Christophe Leroy 
---
  v3: Added missing linux/mm.h
  I realised that a driver may use DMA on stack after checking with 
virt_addr_valid(), so the new
  verification might induce false positives. I remove it for now, will add 
it again later in a more
  controled way.


What is this comment referring to?

I can't see any difference to v2 except the linux/mm.h include.


v2 was:


@@ -804,6 +806,11 @@ extern void __iounmap_at(void *ea, unsigned long size);
  */
 static inline unsigned long virt_to_phys(volatile void * address)
 {
+   if (IS_ENABLED(CONFIG_DEBUG_VIRTUAL) &&
+   !WARN_ON(IS_ENABLED(CONFIG_HAVE_ARCH_VMAP_STACK) && current->pid &&
+object_is_on_stack((const void*)address)))
+   WARN_ON(!virt_addr_valid(address));
+
return __pa((unsigned long)address);
 }


v3 is: (same as v1)


@@ -804,6 +806,8 @@ extern void __iounmap_at(void *ea, unsigned long size);
  */
 static inline unsigned long virt_to_phys(volatile void * address)
 {
+   WARN_ON(IS_ENABLED(CONFIG_DEBUG_VIRTUAL) && !virt_addr_valid(address));
+
return __pa((unsigned long)address);
 }


The idea in v2 was to detect objects on stack used for DMA before 
activating CONFIG_VMAP_STACK, but if the driver uses virt_addr_valid() 
to decide if it can DMA map it, then we'll get false positives.
So I think this should be added with a dedicated DEBUG CONFIG option, 
not implicitely.


Christophe


Re: use generic DMA mapping code in powerpc V4

2018-12-11 Thread Christian Zigotzky



> On 12. Dec 2018, at 01:47, Benjamin Herrenschmidt  
> wrote:
> 
>> On Tue, 2018-12-11 at 19:17 +0100, Christian Zigotzky wrote:
>> X5000 (P5020 board): U-Boot loads the kernel and the dtb file. Then the 
>> kernel starts but it doesn't find any hard disks (partitions). That 
>> means this is also the bad commit for the P5020 board.
> 
> What are the disks hanging off ? A PCIe device of some sort ?
> 
> Can you send good & bad dmesg logs ?
> 
> Ben.
> 
> 
Unfortunately not. It doesn’t detect any hard disk. That means the kernel ring 
buffer won’t save to log files. I don’t have a serial null modem cable for 
seeing all output. We use SATA disks. I will investigate more in this problem 
with rollback the files in the bad commit.

— Christian





[powerpc:next-test 70/71] include/asm-generic/memory_model.h:55:54: error: 'vmemmap' undeclared; did you mean 'mem_map'?

2018-12-11 Thread kbuild test robot
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
next-test
head:   13fd174be1903cc069bb73b0d53a0420b8f6d778
commit: 3a0f466ce2402cc090c97ccc1285508f688d8a8e [70/71] powerpc: Implement 
CONFIG_DEBUG_VIRTUAL
config: powerpc-ppc64e_defconfig (attached as .config)
compiler: powerpc64-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
git checkout 3a0f466ce2402cc090c97ccc1285508f688d8a8e
# save the attached .config to linux build tree
GCC_VERSION=7.2.0 make.cross ARCH=powerpc 

All errors (new ones prefixed by >>):

   In file included from arch/powerpc/include/asm/page.h:339:0,
from arch/powerpc/include/asm/thread_info.h:29,
from include/linux/thread_info.h:38,
from include/asm-generic/preempt.h:5,
from ./arch/powerpc/include/generated/asm/preempt.h:1,
from include/linux/preempt.h:81,
from include/linux/spinlock.h:51,
from include/linux/seqlock.h:36,
from include/linux/time.h:6,
from include/linux/stat.h:19,
from include/linux/module.h:10,
from init/do_mounts.c:1:
   arch/powerpc/include/asm/io.h: In function 'page_to_phys':
>> include/asm-generic/memory_model.h:55:54: error: 'vmemmap' undeclared (first 
>> use in this function); did you mean 'mem_map'?
#define __page_to_pfn(page) (unsigned long)((page) - vmemmap)
 ^
   include/asm-generic/memory_model.h:81:21: note: in expansion of macro 
'__page_to_pfn'
#define page_to_pfn __page_to_pfn
^
   arch/powerpc/include/asm/io.h:835:22: note: in expansion of macro 
'page_to_pfn'
 unsigned long pfn = page_to_pfn(page);
 ^~~
   include/asm-generic/memory_model.h:55:54: note: each undeclared identifier 
is reported only once for each function it appears in
#define __page_to_pfn(page) (unsigned long)((page) - vmemmap)
 ^
   include/asm-generic/memory_model.h:81:21: note: in expansion of macro 
'__page_to_pfn'
#define page_to_pfn __page_to_pfn
^
   arch/powerpc/include/asm/io.h:835:22: note: in expansion of macro 
'page_to_pfn'
 unsigned long pfn = page_to_pfn(page);
 ^~~

vim +55 include/asm-generic/memory_model.h

8f6aac41 Christoph Lameter  2007-10-16  52  
af901ca1 André Goddard Rosa 2009-11-14  53  /* memmap is virtually contiguous.  
*/
8f6aac41 Christoph Lameter  2007-10-16  54  #define __pfn_to_page(pfn)  
(vmemmap + (pfn))
32272a26 Martin Schwidefsky 2008-12-25 @55  #define __page_to_pfn(page) 
(unsigned long)((page) - vmemmap)
8f6aac41 Christoph Lameter  2007-10-16  56  

:: The code at line 55 was first introduced by commit
:: 32272a26974d2027384fd4010cd1780fca425d94 [S390] __page_to_pfn warnings

:: TO: Martin Schwidefsky 
:: CC: Martin Schwidefsky 

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


Re: [Resend PATCH V5 7/10] KVM: Make kvm_set_spte_hva() return int

2018-12-11 Thread Paul Mackerras
On Thu, Dec 06, 2018 at 09:21:10PM +0800, lantianyu1...@gmail.com wrote:
> From: Lan Tianyu 
> 
> The patch is to make kvm_set_spte_hva() return int and caller can
> check return value to determine flush tlb or not.

It would be helpful if the patch description told the reader which
return value(s) mean that the caller should flush the tlb.  I would
guess that non-zero means to do the flush, but you should make that
explicit.

> Signed-off-by: Lan Tianyu 

For the powerpc bits:

Acked-by: Paul Mackerras 


Re: [PATCH 10/10] perf/doc: update design.txt for exclude_{host|guest} flags

2018-12-11 Thread Michael Ellerman
Andrew Murray  writes:
> On Tue, Dec 11, 2018 at 10:06:53PM +1100, Michael Ellerman wrote:
>> [ Reviving old thread. ]
>> 
>> Andrew Murray  writes:
>> > On Tue, Nov 20, 2018 at 10:31:36PM +1100, Michael Ellerman wrote:
>> >> Andrew Murray  writes:
>> >> 
>> >> > Update design.txt to reflect the presence of the exclude_host
>> >> > and exclude_guest perf flags.
>> >> >
>> >> > Signed-off-by: Andrew Murray 
>> >> > ---
>> >> >  tools/perf/design.txt | 4 
>> >> >  1 file changed, 4 insertions(+)
>> >> >
>> >> > diff --git a/tools/perf/design.txt b/tools/perf/design.txt
>> >> > index a28dca2..7de7d83 100644
>> >> > --- a/tools/perf/design.txt
>> >> > +++ b/tools/perf/design.txt
>> >> > @@ -222,6 +222,10 @@ The 'exclude_user', 'exclude_kernel' and 
>> >> > 'exclude_hv' bits provide a
>> >> >  way to request that counting of events be restricted to times when the
>> >> >  CPU is in user, kernel and/or hypervisor mode.
>> >> >  
>> >> > +Furthermore the 'exclude_host' and 'exclude_guest' bits provide a way
>> >> > +to request counting of events restricted to guest and host contexts 
>> >> > when
>> >> > +using virtualisation.
>> >> 
>> >> How does exclude_host differ from exclude_hv ?
>> >
>> > I believe exclude_host / exclude_guest are intented to distinguish
>> > between host and guest in the hosted hypervisor context (KVM).
>> 
>> OK yeah, from the perf-list man page:
>> 
>>u - user-space counting
>>k - kernel counting
>>h - hypervisor counting
>>I - non idle counting
>>G - guest counting (in KVM guests)
>>H - host counting (not in KVM guests)
>> 
>> > Whereas exclude_hv allows to distinguish between guest and
>> > hypervisor in the bare-metal type hypervisors.
>> 
>> Except that's exactly not how we use them on powerpc :)
>> 
>> We use exclude_hv to exclude "the hypervisor", regardless of whether
>> it's KVM or PowerVM (which is a bare-metal hypervisor).
>> 
>> We don't use exclude_host / exclude_guest at all, which I guess is a
>> bug, except I didn't know they existed until this thread.
>> 
>> eg, in a KVM guest:
>> 
>>   $ perf record -e cycles:G /bin/bash -c "for i in {0..10}; do :;done"
>>   $ perf report -D | grep -Fc "dso: [hypervisor]"
>>   16
>> 
>> 
>> > In the case of arm64 - if VHE extensions are present then the host
>> > kernel will run at a higher privilege to the guest kernel, in which
>> > case there is no distinction between hypervisor and host so we ignore
>> > exclude_hv. But where VHE extensions are not present then the host
>> > kernel runs at the same privilege level as the guest and we use a
>> > higher privilege level to switch between them - in this case we can
>> > use exclude_hv to discount that hypervisor role of switching between
>> > guests.
>> 
>> I couldn't find any arm64 perf code using exclude_host/guest at all?
>
> Correct - but this is in flight as I am currently adding support for this
> see [1].

OK, so at least that will be consistent across arm64 & x86.

>> And I don't see any x86 code using exclude_hv.
>
> I can't find any either.

I think that's because they don't need it, because they don't let guests
program the PMU directly. It's all handled by the host and the host
doesn't let the guest count host cycles anyway. But I could be wrong I'm
no x86 expert.

>> But maybe that's OK, I just worry this is confusing for users.
>
> There is some extra context regarding this where exclude_guest/exclude_host
> was added, see [2]

Good find. I had looked at that commit, but the thread on the list is
more informative.

In fact there was even a man page update! Never occurred to me look
there :P

http://man7.org/linux/man-pages/man2/perf_event_open.2.html

   exclude_host (since Linux 3.2)
  When conducting measurements that include processes running VM
  instances (i.e., have executed a KVM_RUN ioctl(2)), only mea‐
  sure events happening inside a guest instance.  This is only
  meaningful outside the guests; this setting does not change
  counts gathered inside of a guest.  Currently, this function‐
  ality is x86 only.

   exclude_guest (since Linux 3.2)
  When conducting measurements that include processes running VM
  instances (i.e., have executed a KVM_RUN ioctl(2)), do not
  measure events happening inside guest instances.  This is only
  meaningful outside the guests; this setting does not change
  counts gathered inside of a guest.  Currently, this function‐
  ality is x86 only.


Which makes things much clearer.

Perhaps you want to add a reference to the man page in your text,
something like?

  Furthermore the 'exclude_host' and 'exclude_guest' bits provide a way
  to request counting of events restricted to guest and host contexts when
  using virtualisation. See the perf_event_open(2) man page for more
  detail.


cheers


Re: [PATCH v4] kbuild: Add support for DT binding schema checks

2018-12-11 Thread Masahiro Yamada
On Wed, Dec 12, 2018 at 5:24 AM Rob Herring  wrote:
>
> This adds the build infrastructure for checking DT binding schema
> documents and validating dts files using the binding schema.
>
> Check DT binding schema documents:
> make dt_binding_check
>
> Build dts files and check using DT binding schema:
> make dtbs_check
>
> Optionally, DT_SCHEMA_FILES can be passed in with a schema file(s) to
> use for validation. This makes it easier to find and fix errors
> generated by a specific schema.
>
> Currently, the validation targets are separate from a normal build to
> avoid a hard dependency on the external DT schema project and because
> there are lots of warnings generated.
>
> Cc: Jonathan Corbet 
> Cc: Mark Rutland 
> Cc: Masahiro Yamada 
> Cc: Michal Marek 
> Cc: linux-...@vger.kernel.org
> Cc: devicet...@vger.kernel.org
> Cc: linux-kbu...@vger.kernel.org
> Signed-off-by: Rob Herring 
> ---
> v4:
> - Rework libyaml check and error message with Masahiro's version
> - Simplify build rules and dependencies
>


Acked-by: Masahiro Yamada 


Thanks.


-- 
Best Regards
Masahiro Yamada


Re: [PATCH v3] kbuild: Add support for DT binding schema checks

2018-12-11 Thread Masahiro Yamada
On Wed, Dec 12, 2018 at 3:36 AM Rob Herring  wrote:
>
> On Tue, Dec 11, 2018 at 10:03 AM Masahiro Yamada
>  wrote:
> >
> > On Wed, Dec 12, 2018 at 12:13 AM Rob Herring  wrote:
> >
> > >
> > > > > +$(obj)/%.example.dts: $(src)/%.yaml FORCE
> > > > > +   $(call if_changed,chk_binding)
> > > > > +
> > > > > +DT_TMP_SCHEMA := .schema.yaml.tmp
> > > >
> > > >
> > > > BTW, why does this file start with a period?
> > > > What is the meaning of '.tmp' extension?
> > >
> > > Nothing really. Just named it something so it gets cleaned and ignored by 
> > > git.
> >
> >
> > It is cleaned whatever file name you use.
> >
> >
> > See scripts/Makefile.clean
> >
> > __clean-files   := $(extra-y) $(extra-m) $(extra-)   \
> >$(always) $(targets) $(clean-files)   \
> >$(hostprogs-y) $(hostprogs-m) $(hostprogs-) \
> >$(hostlibs-y) $(hostlibs-m) $(hostlibs-) \
> >$(hostcxxlibs-y) $(hostcxxlibs-m)
> >
> >
> > $(extra-y) is cleaned.
>
> True.
>
> >
> >
> > You are adding *.example.dts to .gitignore
> >
> > Why not "schema.yaml" ?
>
> Okay. I'll do "processed-schema.yaml" to give a bit better name of
> what it contains.
>
> >
> > > > > +extra-y += $(DT_TMP_SCHEMA)
> > > > > +
> > > > > +quiet_cmd_mk_schema = SCHEMA  $@
> > > > > +  cmd_mk_schema = mkdir -p $(obj); \
> > > > > +  rm -f $@; \
> > > > > +  $(DT_MK_SCHEMA) $(DT_MK_SCHEMA_FLAGS) -o $@ 
> > > > > $(filter-out FORCE, $^)
> > > >
> > > >
> > > > "mkdir -p $(obj)" is redundant.
> > > >
> > > >
> > > > Why is 'rm -f $@' necessary ?
> > > > Can't dt-mk-schema overwrite the output file?
> > >
> > > It is for error case when the output file is not generated. I can
> > > handle this within dt-mk-schema instead.
> > > > > +DT_DOCS = $(shell cd $(srctree)/$(src) && find * -name '*.yaml')
> > > > > +DT_SCHEMA_FILES ?= $(addprefix $(src)/,$(DT_DOCS))
> > > > > +
> > > > > +extra-y += $(patsubst $(src)/%.yaml,%.example.dts, 
> > > > > $(DT_SCHEMA_FILES))
> > > > > +extra-y += $(patsubst $(src)/%.yaml,%.example.dtb, 
> > > > > $(DT_SCHEMA_FILES))
> > > >
> > > >
> > > >
> > > > I assume you intentionally did not do like this:
> > > >
> > > > extra-y += $(patsubst %.yaml,%.example.dtb, $(DT_DOCS))
> > > >
> > > > From the commit description, DT_SCHEMA_FILES might be overridden by a 
> > > > user.
> > > > So, I think this is OK.
> > > >
> > > >
> > > >
> > > >
> > > > > +$(obj)/$(DT_TMP_SCHEMA): | $(addprefix $(obj)/,$(patsubst 
> > > > > $(src)/%.yaml,%.example.dtb, $(DT_SCHEMA_FILES)))
> > > >
> > > > I do not understand this line.
> > > > Why is it necessary?
> > > >
> > > > *.example.dtb files are generated anyway
> > > > since they are listed in extra-y.
> > >
> > > It is enforcing the ordering. Without it, the binding checks and
> > > building .schema.yaml.tmp happen in parallel because both only have
> > > the source files as dependencies. The '|' keeps the dependencies out
> > > of the dependency list($^).
> >
> >
> > What kind problem would you see if
> > the binding checks and building .schema.yaml.tmp
> > happen in parallel?
>
> In case of no errors in the binding docs, it doesn't matter. If there
> are errors, I don't want the dtbs validation to run if any schema
> doesn't validate. However, I played around with this a bit more and it
> seems like having the examples' dts/dtb in extra-y prevents that from
> happening. Does that match your expections?

Exactly.

If any error occurs in Documentation/devicetree/bindings/Makefile,
Make terminates before proceeding to the dtbs_check stage.



-- 
Best Regards
Masahiro Yamada


Re: [PATCH] PCI: Add no-D3 quirk for Mellanox ConnectX-[45]

2018-12-11 Thread Bjorn Helgaas
On Tue, Dec 11, 2018 at 6:38 PM David Gibson
 wrote:
>
> On Tue, Dec 11, 2018 at 08:01:43AM -0600, Bjorn Helgaas wrote:
> > Hi David,
> >
> > I see you're still working on this, but if you do end up going this
> > direction eventually, would you mind splitting this into two patches:
> > 1) rename the quirk to make it more generic (but not changing any
> > behavior), and 2) add the ConnectX devices to the quirk.  That way
> > the ConnectX change is smaller and more easily
> > understood/reverted/etc.
>
> Sure.  Would it make sense to send (1) as an independent cleanup,
> while I'm still working out exactly what (if anything) we need for
> (2)?

You could, but I don't think there's really much benefit in doing the
first without the second, and I think there is some value in handling
both patches at the same time.


Re: [PATCH v2] ocxl: Fix endiannes bug in read_afu_name()

2018-12-11 Thread Andrew Donnellan

On 12/12/18 4:58 am, Greg Kurz wrote:

The AFU Descriptor Template in the PCI config space has a Name Space
field which is a 24 Byte ASCII character string of descriptive name
space for the AFU. The OCXL driver read the string four characters at
a time with pci_read_config_dword().

This optimization is valid on a little-endian system since this is PCI,
but a big-endian system ends up with each subset of four characters in
reverse order.

This could be fixed by switching to read characters one by one. Another
option is to swap the bytes if we're big-endian.

Go for the latter with le32_to_cpu().

Cc: sta...@vger.kernel.org  # v4.16
Signed-off-by: Greg Kurz 
Acked-by: Frederic Barrat 


Acked-by: Andrew Donnellan 


---
v2: - silence sparse with (__force __le32) cast
 - new changelog
---
  drivers/misc/ocxl/config.c |2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/misc/ocxl/config.c b/drivers/misc/ocxl/config.c
index 57a6bb1fd3c9..8f2c5d8bd2ee 100644
--- a/drivers/misc/ocxl/config.c
+++ b/drivers/misc/ocxl/config.c
@@ -318,7 +318,7 @@ static int read_afu_name(struct pci_dev *dev, struct 
ocxl_fn_config *fn,
if (rc)
return rc;
ptr = (u32 *) >name[i];
-   *ptr = val;
+   *ptr = le32_to_cpu((__force __le32) val);
}
afu->name[OCXL_AFU_NAME_SZ - 1] = '\0'; /* play safe */
return 0;



--
Andrew Donnellan  OzLabs, ADL Canberra
andrew.donnel...@au1.ibm.com  IBM Australia Limited



RE: [PATCH 1/2 v3] powerpc/fsl: Use new clockgen binding

2018-12-11 Thread Andy Tang


> -Original Message-
> From: Scott Wood 
> Sent: 2018年11月26日 9:19
> To: Andy Tang 
> Cc: mturque...@baylibre.com; sb...@kernel.org; robh...@kernel.org;
> mark.rutl...@arm.com; b...@kernel.crashing.org; pau...@samba.org;
> m...@ellerman.id.au; linux-...@vger.kernel.org;
> devicet...@vger.kernel.org; linux-ker...@vger.kernel.org;
> linuxppc-dev@lists.ozlabs.org
> Subject: Re: [PATCH 1/2 v3] powerpc/fsl: Use new clockgen binding
> 
> On Wed, 2018-10-31 at 14:57 +0800, Yuantian Tang wrote:
> > From: Scott Wood 
> >
> > The driver retains compatibility with old device trees, but we don't
> > want the old nodes lying around to be copied, or used as a reference
> > (some of the mux options are incorrect), or even just being clutter.
> >
> >
> > +sysclk: sysclk {
> > +   compatible = "fixed-clock";
> > +   #clock-cells = <0>;
> > +   clock-frequency = <1>;
> > +   clock-output-names = "sysclk";
> > +};
> > +
> >  clockgen: global-utilities@e1000 {
> 
> The U-Boot fixup won't work with this.  U-Boot patches the frequency
> directly into the clockgen node (BTW, this is another reason to preserve
> the generic
> 1.0/2.0 compatible string).  The new binding does not require an input
> clock node when it is provided as clock-frequency directly in the clockgen
> node -- and the sysclk node was not in my original patch (nor did you note
> that you made changes from that original).  Why did you add it?
> 
> I would just remove it when applying, but I'm concerned that this indicates
> a lack of testing (and I don't have the hardware access to test it myself,
> except on t4240) -- unless the 100 MHz sysclk just happened to be correct
> on the machines you tested (which would also be a test coverage
> problem)?
[Andy] You are right. Sysclk may not be useful anymore. 
Uboot will fixup the clockgen node correctly. Please apply this patch without 
sysclk. We will
test it and catch the error if the clock is not fixed correctly.

BTW, which git tree are you going to apply it on? This one?
https://git.kernel.org/pub/scm/linux/kernel/git/scottwood/linux.git/log/?h=next

BR,
Andy
> 
> -Scott
> 



Re: use generic DMA mapping code in powerpc V4

2018-12-11 Thread Benjamin Herrenschmidt
On Tue, 2018-12-11 at 19:17 +0100, Christian Zigotzky wrote:
> X5000 (P5020 board): U-Boot loads the kernel and the dtb file. Then the 
> kernel starts but it doesn't find any hard disks (partitions). That 
> means this is also the bad commit for the P5020 board.

What are the disks hanging off ? A PCIe device of some sort ?

Can you send good & bad dmesg logs ?

Ben.




Re: [PATCH] PCI: Add no-D3 quirk for Mellanox ConnectX-[45]

2018-12-11 Thread David Gibson
On Tue, Dec 11, 2018 at 08:01:43AM -0600, Bjorn Helgaas wrote:
> Hi David,
> 
> I see you're still working on this, but if you do end up going this
> direction eventually, would you mind splitting this into two patches:
> 1) rename the quirk to make it more generic (but not changing any
> behavior), and 2) add the ConnectX devices to the quirk.  That way
> the ConnectX change is smaller and more easily
> understood/reverted/etc.

Sure.  Would it make sense to send (1) as an independent cleanup,
while I'm still working out exactly what (if anything) we need for
(2)?

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH v3] powerpc: implement CONFIG_DEBUG_VIRTUAL

2018-12-11 Thread Michael Ellerman
Christophe Leroy  writes:

> This patch implements CONFIG_DEBUG_VIRTUAL to warn about
> incorrect use of virt_to_phys() and page_to_phys()
>
> Below is the result of test_debug_virtual:
>
> [1.438746] WARNING: CPU: 0 PID: 1 at ./arch/powerpc/include/asm/io.h:808 
> test_debug_virtual_init+0x3c/0xd4
> [1.448156] CPU: 0 PID: 1 Comm: swapper Not tainted 
> 4.20.0-rc5-00560-g6bfb52e23a00-dirty #532
> [1.457259] NIP:  c066c550 LR: c0650ccc CTR: c066c514
> [1.462257] REGS: c900bdb0 TRAP: 0700   Not tainted  
> (4.20.0-rc5-00560-g6bfb52e23a00-dirty)
> [1.471184] MSR:  00029032   CR: 48000422  XER: 2000
> [1.477811]
> [1.477811] GPR00: c0650ccc c900be60 c60d  006000c0 c900 
> 9032 c7fa0020
> [1.477811] GPR08: 2400 0001 0900  c07b5d04  
> c00037d8 
> [1.477811] GPR16:     c076 c074 
> 0092 c0685bb0
> [1.477811] GPR24: c065042c c068a734 c0685b8c 0006  c076 
> c075c3c0 
> [1.512711] NIP [c066c550] test_debug_virtual_init+0x3c/0xd4
> [1.518315] LR [c0650ccc] do_one_initcall+0x8c/0x1cc
> [1.523163] Call Trace:
> [1.525595] [c900be60] [c0567340] 0xc0567340 (unreliable)
> [1.530954] [c900be90] [c0650ccc] do_one_initcall+0x8c/0x1cc
> [1.536551] [c900bef0] [c0651000] kernel_init_freeable+0x1f4/0x2cc
> [1.542658] [c900bf30] [c00037ec] kernel_init+0x14/0x110
> [1.547913] [c900bf40] [c000e1d0] ret_from_kernel_thread+0x14/0x1c
> [1.553971] Instruction dump:
> [1.556909] 3ca50100 bfa10024 54a5000e 3fa0c076 7c0802a6 3d454000 813dc204 
> 554893be
> [1.564566] 7d294010 7d294910 90010034 39290001 <0f09> 7c3e0b78 
> 955e0008 3fe0c062
> [1.572425] ---[ end trace 6f6984225b280ad6 ]---
> [1.577467] PA: 0x0900 for VA: 0xc900
> [1.581799] PA: 0x061e8f50 for VA: 0xc61e8f50
>
> Signed-off-by: Christophe Leroy 
> ---
>  v3: Added missing linux/mm.h
>  I realised that a driver may use DMA on stack after checking with 
> virt_addr_valid(), so the new
>  verification might induce false positives. I remove it for now, will add 
> it again later in a more
>  controled way.

What is this comment referring to?

I can't see any difference to v2 except the linux/mm.h include.

cheers


Re: Snowpatch

2018-12-11 Thread Russell Currey


On Wed, Dec 12, 2018, at 8:41 AM, Christophe Leroy wrote:
> snowpatch is pretty surprising on 
> https://patchwork.ozlabs.org/patch/1003954/
> 
> How can such a trivial patch for 8xx alter compilation of all other 
> targets that much (ie add or remove sparse warnings) ?

It was a bug I fixed as soon as I noticed and shouldn't happen any more.

> 
> Christophe


[PATCH v3] powerpc: implement CONFIG_DEBUG_VIRTUAL

2018-12-11 Thread Christophe Leroy
This patch implements CONFIG_DEBUG_VIRTUAL to warn about
incorrect use of virt_to_phys() and page_to_phys()

Below is the result of test_debug_virtual:

[1.438746] WARNING: CPU: 0 PID: 1 at ./arch/powerpc/include/asm/io.h:808 
test_debug_virtual_init+0x3c/0xd4
[1.448156] CPU: 0 PID: 1 Comm: swapper Not tainted 
4.20.0-rc5-00560-g6bfb52e23a00-dirty #532
[1.457259] NIP:  c066c550 LR: c0650ccc CTR: c066c514
[1.462257] REGS: c900bdb0 TRAP: 0700   Not tainted  
(4.20.0-rc5-00560-g6bfb52e23a00-dirty)
[1.471184] MSR:  00029032   CR: 48000422  XER: 2000
[1.477811]
[1.477811] GPR00: c0650ccc c900be60 c60d  006000c0 c900 
9032 c7fa0020
[1.477811] GPR08: 2400 0001 0900  c07b5d04  
c00037d8 
[1.477811] GPR16:     c076 c074 
0092 c0685bb0
[1.477811] GPR24: c065042c c068a734 c0685b8c 0006  c076 
c075c3c0 
[1.512711] NIP [c066c550] test_debug_virtual_init+0x3c/0xd4
[1.518315] LR [c0650ccc] do_one_initcall+0x8c/0x1cc
[1.523163] Call Trace:
[1.525595] [c900be60] [c0567340] 0xc0567340 (unreliable)
[1.530954] [c900be90] [c0650ccc] do_one_initcall+0x8c/0x1cc
[1.536551] [c900bef0] [c0651000] kernel_init_freeable+0x1f4/0x2cc
[1.542658] [c900bf30] [c00037ec] kernel_init+0x14/0x110
[1.547913] [c900bf40] [c000e1d0] ret_from_kernel_thread+0x14/0x1c
[1.553971] Instruction dump:
[1.556909] 3ca50100 bfa10024 54a5000e 3fa0c076 7c0802a6 3d454000 813dc204 
554893be
[1.564566] 7d294010 7d294910 90010034 39290001 <0f09> 7c3e0b78 955e0008 
3fe0c062
[1.572425] ---[ end trace 6f6984225b280ad6 ]---
[1.577467] PA: 0x0900 for VA: 0xc900
[1.581799] PA: 0x061e8f50 for VA: 0xc61e8f50

Signed-off-by: Christophe Leroy 
---
 v3: Added missing linux/mm.h
 I realised that a driver may use DMA on stack after checking with 
virt_addr_valid(), so the new
 verification might induce false positives. I remove it for now, will add 
it again later in a more
 controled way.

 v2: Using asm/pgtable.h to avoid build failure on ppc64e.
 Added a verification that the object is not in stack to catch problems 
before activing VMAP_STACK.

 arch/powerpc/Kconfig  |  1 +
 arch/powerpc/include/asm/io.h | 13 -
 arch/powerpc/mm/pgtable_32.c  |  2 +-
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index e312e92e3381..94b46624068d 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -128,6 +128,7 @@ config PPC
#
# Please keep this list sorted alphabetically.
#
+   select ARCH_HAS_DEBUG_VIRTUAL
select ARCH_HAS_DEVMEM_IS_ALLOWED
select ARCH_HAS_DMA_SET_COHERENT_MASK
select ARCH_HAS_ELF_RANDOMIZE
diff --git a/arch/powerpc/include/asm/io.h b/arch/powerpc/include/asm/io.h
index e746becd9d6f..7f19fbd3ba55 100644
--- a/arch/powerpc/include/asm/io.h
+++ b/arch/powerpc/include/asm/io.h
@@ -29,12 +29,14 @@ extern struct pci_dev *isa_bridge_pcidev;
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
 #include 
 #include 
 #include 
+#include 
 
 #ifdef CONFIG_PPC64
 #include 
@@ -804,6 +806,8 @@ extern void __iounmap_at(void *ea, unsigned long size);
  */
 static inline unsigned long virt_to_phys(volatile void * address)
 {
+   WARN_ON(IS_ENABLED(CONFIG_DEBUG_VIRTUAL) && !virt_addr_valid(address));
+
return __pa((unsigned long)address);
 }
 
@@ -827,7 +831,14 @@ static inline void * phys_to_virt(unsigned long address)
 /*
  * Change "struct page" to physical address.
  */
-#define page_to_phys(page) ((phys_addr_t)page_to_pfn(page) << PAGE_SHIFT)
+static inline phys_addr_t page_to_phys(struct page *page)
+{
+   unsigned long pfn = page_to_pfn(page);
+
+   WARN_ON(IS_ENABLED(CONFIG_DEBUG_VIRTUAL) && !pfn_valid(pfn));
+
+   return PFN_PHYS(pfn);
+}
 
 /*
  * 32 bits still uses virt_to_bus() for it's implementation of DMA
diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
index 4fc77a99c9bf..68d204a45cd0 100644
--- a/arch/powerpc/mm/pgtable_32.c
+++ b/arch/powerpc/mm/pgtable_32.c
@@ -143,7 +143,7 @@ __ioremap_caller(phys_addr_t addr, unsigned long size, 
pgprot_t prot, void *call
 * Don't allow anybody to remap normal RAM that we're using.
 * mem_init() sets high_memory so only do the check after that.
 */
-   if (slab_is_available() && (p < virt_to_phys(high_memory)) &&
+   if (slab_is_available() && virt_addr_valid(p) &&
page_is_ram(__phys_to_pfn(p))) {
printk("__ioremap(): phys addr 0x%llx is RAM lr %ps\n",
   (unsigned long long)p, __builtin_return_address(0));
-- 
2.13.3



Re: [powerpc:next-test 70/71] arch/powerpc/include/asm/io.h:835:22: note: in expansion of macro 'page_to_pfn'

2018-12-11 Thread Christophe Leroy

Missing linux/mm.h

Will fix it in the coming patch.

Christophe

Le 11/12/2018 à 22:55, kbuild test robot a écrit :

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
next-test
head:   13fd174be1903cc069bb73b0d53a0420b8f6d778
commit: 3a0f466ce2402cc090c97ccc1285508f688d8a8e [70/71] powerpc: Implement 
CONFIG_DEBUG_VIRTUAL
config: powerpc-ps3_petitboot_nfs_defconfig (attached as .config)
compiler: powerpc64-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
 wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
 chmod +x ~/bin/make.cross
 git checkout 3a0f466ce2402cc090c97ccc1285508f688d8a8e
 # save the attached .config to linux build tree
 GCC_VERSION=7.2.0 make.cross ARCH=powerpc

All warnings (new ones prefixed by >>):

In file included from arch/powerpc/include/asm/page.h:339:0,
 from arch/powerpc/include/asm/thread_info.h:29,
 from include/linux/thread_info.h:38,
 from include/asm-generic/preempt.h:5,
 from ./arch/powerpc/include/generated/asm/preempt.h:1,
 from include/linux/preempt.h:81,
 from include/linux/spinlock.h:51,
 from include/linux/seqlock.h:36,
 from include/linux/time.h:6,
 from include/linux/jiffies.h:9,
 from drivers/net/loopback.c:32:
arch/powerpc/include/asm/io.h: In function 'page_to_phys':
include/asm-generic/memory_model.h:64:14: error: implicit declaration of 
function 'page_to_section'; did you mean '__nr_to_section'? 
[-Werror=implicit-function-declaration]
  int __sec = page_to_section(__pg);   \
  ^
include/asm-generic/memory_model.h:81:21: note: in expansion of macro 
'__page_to_pfn'
 #define page_to_pfn __page_to_pfn
 ^

arch/powerpc/include/asm/io.h:835:22: note: in expansion of macro 'page_to_pfn'

  unsigned long pfn = page_to_pfn(page);
  ^~~
In file included from include/linux/scatterlist.h:8:0,
 from include/linux/dma-mapping.h:11,
 from include/linux/skbuff.h:34,
 from include/net/net_namespace.h:36,
 from include/linux/inet.h:46,
 from drivers/net/loopback.c:46:
include/linux/mm.h: At top level:
include/linux/mm.h:1121:29: error: conflicting types for 'page_to_section'
 static inline unsigned long page_to_section(const struct page *page)
 ^~~
In file included from arch/powerpc/include/asm/page.h:339:0,
 from arch/powerpc/include/asm/thread_info.h:29,
 from include/linux/thread_info.h:38,
 from include/asm-generic/preempt.h:5,
 from ./arch/powerpc/include/generated/asm/preempt.h:1,
 from include/linux/preempt.h:81,
 from include/linux/spinlock.h:51,
 from include/linux/seqlock.h:36,
 from include/linux/time.h:6,
 from include/linux/jiffies.h:9,
 from drivers/net/loopback.c:32:
include/asm-generic/memory_model.h:64:14: note: previous implicit 
declaration of 'page_to_section' was here
  int __sec = page_to_section(__pg);   \
  ^
include/asm-generic/memory_model.h:81:21: note: in expansion of macro 
'__page_to_pfn'
 #define page_to_pfn __page_to_pfn
 ^

arch/powerpc/include/asm/io.h:835:22: note: in expansion of macro 'page_to_pfn'

  unsigned long pfn = page_to_pfn(page);
  ^~~
cc1: some warnings being treated as errors
--
In file included from arch/powerpc/include/asm/page.h:339:0,
 from arch/powerpc/include/asm/book3s/64/mmu.h:5,
 from arch/powerpc/include/asm/mmu.h:328,
 from arch/powerpc/include/asm/lppaca.h:36,
 from arch/powerpc/include/asm/paca.h:21,
 from arch/powerpc/include/asm/current.h:16,
 from include/linux/mutex.h:14,
 from include/linux/kernfs.h:13,
 from include/linux/sysfs.h:16,
 from include/linux/kobject.h:20,
 from include/linux/device.h:16,
 from drivers/of/irq.c:19:
arch/powerpc/include/asm/io.h: In function 'page_to_phys':
include/asm-generic/memory_model.h:64:14: error: implicit declaration of 
function 'page_to_section'; did you mean '__nr_to_section'? 
[-Werror=implicit-function-declaration]
  int __sec = page_to_section(__pg);   \
  ^
include/asm-generic/memory_model.h:81:21: note: in expansion of macro 

Re: [RFC PATCH v2 11/11] powerpc/book3s32: Implement Kernel Userspace Access Protection

2018-12-11 Thread Christophe Leroy




On 12/11/2018 05:25 AM, Russell Currey wrote:

On Wed, 2018-11-28 at 09:27 +, Christophe Leroy wrote:

This patch implements Kernel Userspace Access Protection for
book3s/32.

Due to limitations of the processor page protection capabilities,
the protection is only against writing. read protection cannot be
achieved using page protection.

In order to provide the protection, Ku and Ks keys are modified in
Userspace Segment registers, and different PP bits are used to:

PP01 provides RW for Key 0 and RO for Key 1
PP10 provides RW for all
PP11 provides RO for all

Today PP10 is used for RW pages and PP11 for RO pages. This patch
modifies page protection to PP01 for RW pages.

Then segment registers are set to Ku 0 and Ks 1. When kernel needs
to write to RW pages, the associated segment register is changed to
Ks 0 in order to allow write access to the kernel.

In order to avoid having the read all segment registers when
locking/unlocking the access, some data is kept in the thread_struct
and saved on stack on exceptions. The field identifies both the
first unlocked segment and the first segment following the last
unlocked one. When no segment is unlocked, it contains value 0.

Signed-off-by: Christophe Leroy 


Hey Christophe, I tried to test this and got a machine check after the
kernel starts init.


A program check you mean ?



Vector: 700 (Program Check) at [ef0b5e70]
 pc: 0ca4
 lr: b7e1a030
 sp: ef0b5f30
msr: 81002
   current = 0xef0b8000
 pid   = 1, comm = init

Testing with mac99 model in qemu.


That's pretty surprising. At 0xca4 there is nothing particular for me. 
This is a handler for system call. Do you have the same ?
How can this trigger a program check ? According to the MSR, the check 
is due to an illegal instruction (bit 12). An we are with MMU off.


cc00 :
cc00:   7d 50 43 a6 mtsprg  0,r10
cc04:   7d 71 43 a6 mtsprg  1,r11
cc08:   7d 40 00 26 mfcrr10
cc0c:   7d 7b 02 a6 mfsrr1  r11
cc10:   71 6b 40 00 andi.   r11,r11,16384
cc14:   3d 61 40 00 addis   r11,r1,16384
cc18:   41 82 00 14 beq cc2c 
cc1c:   7d 73 42 a6 mfsprg  r11,3
cc20:   81 6b fb d8 lwz r11,-1064(r11)
cc24:   39 6b 20 00 addir11,r11,8192
cc28:   3d 6b 40 00 addis   r11,r11,16384
cc2c:   39 6b ff 40 addir11,r11,-192
cc30:   91 4b 00 a8 stw r10,168(r11)
cc34:   91 8b 00 40 stw r12,64(r11)
cc38:   91 2b 00 34 stw r9,52(r11)
cc3c:   7d 50 42 a6 mfsprg  r10,0
cc40:   91 4b 00 38 stw r10,56(r11)
cc44:   7d 91 42 a6 mfsprg  r12,1
cc48:   91 8b 00 3c stw r12,60(r11)
cc4c:   7d 48 02 a6 mflrr10
cc50:   91 4b 00 a0 stw r10,160(r11)
cc54:   7d 9a 02 a6 mfsrr0  r12
cc58:   7d 3b 02 a6 mfsrr1  r9
cc5c:   90 2b 00 14 stw r1,20(r11)
cc60:   90 2b 00 00 stw r1,0(r11)
cc64:   3c 2b c0 00 addis   r1,r11,-16384
cc68:   39 40 10 02 li  r10,4098
cc6c:   7d 40 01 24 mtmsr   r10
cc70:   90 0b 00 10 stw r0,16(r11)
cc74:   3d 40 72 65 lis r10,29285
cc78:   39 4a 67 73 addir10,r10,26483
cc7c:   91 4b 00 08 stw r10,8(r11)
cc80:   90 6b 00 1c stw r3,28(r11)
cc84:   90 8b 00 20 stw r4,32(r11)
cc88:   90 ab 00 24 stw r5,36(r11)
cc8c:   90 cb 00 28 stw r6,40(r11)
cc90:   90 eb 00 2c stw r7,44(r11)
cc94:   91 0b 00 30 stw r8,48(r11)
cc98:   39 40 0c 01 li  r10,3073
cc9c:   91 4b 00 b0 stw r10,176(r11)
cca0:   39 40 10 32 li  r10,4146
cca4:   51 2a 04 20 rlwimi  r10,r9,0,16,16
cca8:   48 01 13 5d bl  c0012004 

Christophe



- Russell



Re: [PATCH] pseries/hotplug: Add more delay in pseries_cpu_die while waiting for rtas-stop

2018-12-11 Thread Michael Bringmann
Note from Scott Mayes on latest crash:

Michael,

Since the partition crashed, I was able to get the last .2 seconds worth of 
RTAS call trace leading up to the crash.

Best I could tell from that bit of trace was that the removal of a processor 
involved the following steps:
-- Call to stop-self for a given thread
-- Repeated calls to query-cpu-stopped-state (which eventually indicated the 
thread was stopped)
-- Call to get-sensor-state for the thread to check its entity-state (9003) 
sensor which returned 'dr-entity-present'
-- Call to set-indicator to set the isolation-state (9001) indicator to ISOLATE 
state
-- Call to set-indicator to set the allocation-state (9003) indicator to 
UNUSABLE state

I noticed one example of thread x28 getting through all of these steps just 
fine, but for thread x20, although the
query-cpu-stopped state returned 0 status (STOPPED), a subsequent call to 
set-indicator to ISOLATE
failed.  This failure was near the end of the trace, but was not the very last 
RTAS call made in the trace.
The set-indicator failure reported to Linux was a -9001 (Valid outstanding 
translation) which was mapped
from a 0x502 (Invalid thread state) return code from PHYP's H_SET_DR_STATE 
h-call.

On 12/10/2018 02:31 PM, Thiago Jung Bauermann wrote:
> 
> Hello Michael,
> 
> Michael Bringmann  writes:
> 
>> I have asked Scott Mayes to take a look at one of these crashes from
>> the phyp side.  I will let you know if he finds anything notable.
> 
> Thanks! It might make sense to test whether booting with
> cede_offline=off makes the bug go away.

Scott is looking at the system.  I will try once he is finished.

> 
> One suspicion I have is regarding the code handling CPU_STATE_INACTIVE.
>>From what I understand, it is a powerpc-specific CPU state and from the
> perspective of the generic CPU hotplug state machine, inactive CPUs are
> already fully offline. Which means that the locking performed by the
> generic code state machine doesn't apply to transitioning CPUs from
> INACTIVE to OFFLINE state. Perhaps the bug is that there is more than
> one CPU making that transition at the same time? That would cause two
> CPUs to call RTAS stop-self.
> 
> I haven't checked whether this is really possible or not, though. It's
> just a conjecture.

Michael

> 
> --
> Thiago Jung Bauermann
> IBM Linux Technology Center
> 
> 

-- 
Michael W. Bringmann
Linux Technology Center
IBM Corporation
Tie-Line  363-5196
External: (512) 286-5196
Cell:   (512) 466-0650
m...@linux.vnet.ibm.com



[RFC 3/3] powerpc/numa: Apply mapping between HW and kernel cpus

2018-12-11 Thread Michael Bringmann
Apply new interface to map external powerpc cpus across multiple
nodes to a range of kernel cpu values.  Mapping is intended to
prevent confusion within the kernel about the cpu+node mapping, and
the changes in configuration that may happen due to powerpc LPAR
migration or other associativity changes during the lifetime of a
system.  These interfaces exchange the thread_index provided by the
'ibm,ppc-interrupt-server#s' properties, for an internal index to
be used by kernel scheduling interfaces.

Signed-off-by: Michael Bringmann 
---
 arch/powerpc/mm/numa.c   |   45 +-
 arch/powerpc/platforms/pseries/hotplug-cpu.c |   15 +++--
 2 files changed, 41 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 7d6bba264..59d7cd9 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -1063,7 +1063,8 @@ u64 memory_hotplug_max(void)
 
 struct topology_update_data {
struct topology_update_data *next;
-   unsigned int cpu;
+   unsigned int old_cpu;
+   unsigned int new_cpu;
int old_nid;
int new_nid;
 };
@@ -1253,13 +1254,13 @@ static int update_cpu_topology(void *data)
 
for (update = data; update; update = update->next) {
int new_nid = update->new_nid;
-   if (cpu != update->cpu)
+   if (cpu != update->new_cpu)
continue;
 
-   unmap_cpu_from_node(cpu);
-   map_cpu_to_node(cpu, new_nid);
-   set_cpu_numa_node(cpu, new_nid);
-   set_cpu_numa_mem(cpu, local_memory_node(new_nid));
+   unmap_cpu_from_node(update->old_cpu);
+   map_cpu_to_node(update->new_cpu, new_nid);
+   set_cpu_numa_node(update->new_cpu, new_nid);
+   set_cpu_numa_mem(update->new_cpu, local_memory_node(new_nid));
vdso_getcpu_init();
}
 
@@ -1283,7 +1284,7 @@ static int update_lookup_table(void *data)
int nid, base, j;
 
nid = update->new_nid;
-   base = cpu_first_thread_sibling(update->cpu);
+   base = cpu_first_thread_sibling(update->new_cpu);
 
for (j = 0; j < threads_per_core; j++) {
update_numa_cpu_lookup_table(base + j, nid);
@@ -1305,7 +1306,7 @@ int numa_update_cpu_topology(bool cpus_locked)
struct topology_update_data *updates, *ud;
cpumask_t updated_cpus;
struct device *dev;
-   int weight, new_nid, i = 0;
+   int weight, new_nid, i = 0, ii;
 
if (!prrn_enabled && !vphn_enabled && topology_inited)
return 0;
@@ -1349,12 +1350,16 @@ int numa_update_cpu_topology(bool cpus_locked)
continue;
}
 
+   ii = 0;
for_each_cpu(sibling, cpu_sibling_mask(cpu)) {
ud = [i++];
ud->next = [i];
-   ud->cpu = sibling;
ud->new_nid = new_nid;
ud->old_nid = numa_cpu_lookup_table[sibling];
+   ud->old_cpu = sibling;
+   ud->new_cpu = cpuremap_map_cpu(
+   get_hard_smp_processor_id(sibling),
+   ii++, new_nid);
cpumask_set_cpu(sibling, _cpus);
}
cpu = cpu_last_thread_sibling(cpu);
@@ -1370,9 +1375,10 @@ int numa_update_cpu_topology(bool cpus_locked)
pr_debug("Topology update for the following CPUs:\n");
if (cpumask_weight(_cpus)) {
for (ud = [0]; ud; ud = ud->next) {
-   pr_debug("cpu %d moving from node %d "
- "to %d\n", ud->cpu,
- ud->old_nid, ud->new_nid);
+   pr_debug("cpu %d, node %d moving to"
+" cpu %d, node %d\n",
+ud->old_cpu, ud->old_nid,
+ud->new_cpu, ud->new_nid);
}
}
 
@@ -1409,13 +1415,20 @@ int numa_update_cpu_topology(bool cpus_locked)
 cpumask_of(raw_smp_processor_id()));
 
for (ud = [0]; ud; ud = ud->next) {
-   unregister_cpu_under_node(ud->cpu, ud->old_nid);
-   register_cpu_under_node(ud->cpu, ud->new_nid);
+   unregister_cpu_under_node(ud->old_cpu, ud->old_nid);
+   register_cpu_under_node(ud->new_cpu, ud->new_nid);
 
-   dev = get_cpu_device(ud->cpu);
+   dev = get_cpu_device(ud->old_cpu);
if (dev)
kobject_uevent(>kobj, KOBJ_CHANGE);
-   cpumask_clear_cpu(ud->cpu, _associativity_changes_mask);
+   cpumask_clear_cpu(ud->old_cpu, _associativity_changes_mask);
+   if (ud->old_cpu 

[RFC 2/3] powerpc/numa: Define mapping between HW and kernel cpus

2018-12-11 Thread Michael Bringmann
Define interface to map external powerpc cpus across multiple nodes
to a range of kernel cpu values.  Mapping is intended to prevent
confusion within the kernel about the cpu+node mapping, and the
changes in configuration that may happen due to powerpc LPAR
migration or other associativity changes during the lifetime of a
system.  These interfaces will be used entirely within the powerpc
kernel code to maintain separation between the machine and kernel
contexts.

Signed-off-by: Michael Bringmann 
---
 arch/powerpc/include/asm/topology.h   |   31 +++
 arch/powerpc/platforms/pseries/Kconfig|   10 ++
 arch/powerpc/platforms/pseries/Makefile   |1 
 arch/powerpc/platforms/pseries/cpuremap.c |  131 +
 4 files changed, 173 insertions(+)
 create mode 100644 arch/powerpc/platforms/pseries/cpuremap.c

diff --git a/arch/powerpc/include/asm/topology.h 
b/arch/powerpc/include/asm/topology.h
index 4621f40..db11969 100644
--- a/arch/powerpc/include/asm/topology.h
+++ b/arch/powerpc/include/asm/topology.h
@@ -135,5 +135,36 @@ static inline void shared_proc_topology_init(void) {}
 #endif
 #endif
 
+#define CPUREMAP_NO_CPU(~0)
+#define CPUREMAP_NO_THREAD (~0)
+
+#ifdef CONFIG_CPUREMAP
+extern int cpuremap_thread_to_cpu(int thread_index);
+   /* Return CPUREMAP_NO_CPU if not found */
+extern int cpuremap_map_cpu(int thread_index, int in_core_ndx, int node);
+   /* Return CPUREMAP_NO_CPU if fails */
+extern int cpuremap_reserve_cpu(int cpu);
+   /* Return CPUREMAP_NO_CPU if fails */
+extern int cpuremap_release_cpu(int cpu);
+   /* Return CPUREMAP_NO_CPU if fails */
+extern int cpuremap_cpu_to_thread(int cpu);
+   /* Return CPUREMAP_NO_THREAD if not found */
+extern void cpuremap_init(void);
+   /* Identify necessary constants & alloc memory at boot */
+#else
+static inline int cpuremap_thread_to_cpu(int thread_index)
+{
+   return thread_index;
+}
+static inline int cpuremap_map_cpu(int thread_index, int in_core_ndx, int node)
+{
+   return thread_index;
+}
+static inline int cpuremap_reserve_cpu(int cpu) { return cpu; }
+static inline int cpuremap_release_cpu(int cpu) { return cpu; }
+static inline int cpuremap_cpu_to_thread(int cpu) { return cpu; }
+static inline void cpuremap_init(void) {}
+#endif
+
 #endif /* __KERNEL__ */
 #endif /* _ASM_POWERPC_TOPOLOGY_H */
diff --git a/arch/powerpc/platforms/pseries/Kconfig 
b/arch/powerpc/platforms/pseries/Kconfig
index 2e4bd32..c35009f 100644
--- a/arch/powerpc/platforms/pseries/Kconfig
+++ b/arch/powerpc/platforms/pseries/Kconfig
@@ -145,3 +145,13 @@ config PAPR_SCM
tristate "Support for the PAPR Storage Class Memory interface"
help
  Enable access to hypervisor provided storage class memory.
+  Enable access to hypervisor provided storage class memory.
+
+config CPUREMAP
+bool "Support for mapping hw cpu+node to kernel index"
+depends on SMP && (PPC_PSERIES)
+---help---
+  Say Y here to be able to remap hw cpu+node to standardized
+  kernel CPUs at runtime on Pseries machines.
+
+  Say N if you are unsure.
diff --git a/arch/powerpc/platforms/pseries/Makefile 
b/arch/powerpc/platforms/pseries/Makefile
index a43ec84..ad49d8e 100644
--- a/arch/powerpc/platforms/pseries/Makefile
+++ b/arch/powerpc/platforms/pseries/Makefile
@@ -13,6 +13,7 @@ obj-$(CONFIG_KEXEC_CORE)  += kexec.o
 obj-$(CONFIG_PSERIES_ENERGY)   += pseries_energy.o
 
 obj-$(CONFIG_HOTPLUG_CPU)  += hotplug-cpu.o
+obj-$(CONFIG_CPUREMAP) += cpuremap.o
 obj-$(CONFIG_MEMORY_HOTPLUG)   += hotplug-memory.o pmem.o
 
 obj-$(CONFIG_HVC_CONSOLE)  += hvconsole.o
diff --git a/arch/powerpc/platforms/pseries/cpuremap.c 
b/arch/powerpc/platforms/pseries/cpuremap.c
new file mode 100644
index 000..86fdf12
--- /dev/null
+++ b/arch/powerpc/platforms/pseries/cpuremap.c
@@ -0,0 +1,131 @@
+// SPDX-License-Identifier: GPL-2.0
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct cpuremap_cpu {
+   int thread_index;
+   /* Set to thread_index from ibm,ppc-interrupt-server#s arrays
+* Don't clear when release'ed
+*/
+   int node;
+   bool in_use;
+   /* Set to true when reserve'ed
+* Don't clear when release'ed
+   */
+};
+
+struct cpuremap_struct {
+   int num_nodes;
+   int num_cores;
+   int num_threads_per_core;
+   struct cpuremap_cpu *threads;
+} cpuremap_data;
+
+
+void cpuremap_init(void)
+{
+   int i, k;
+
+   /* Identify necessary constants & alloc memory at boot */
+   cpuremap_data.num_threads_per_core = 8;
+   cpuremap_data.num_cores = 32;
+   cpuremap_data.num_nodes =
+   nr_cpu_ids /
+   (cpuremap_data.num_threads_per_core * cpuremap_data.num_cores);
+   cpuremap_data.threads = kcalloc(nr_cpu_ids, 

[RFC 1/3] powerpc/numa: Conditionally online new nodes

2018-12-11 Thread Michael Bringmann
Add argument to allow caller to determine whether the node identified
for a cpu after an associativity / affinity change should be inited.

Signed-off-by: Michael Bringmann 
---
 arch/powerpc/include/asm/topology.h  |2 +-
 arch/powerpc/mm/numa.c   |6 +++---
 arch/powerpc/platforms/pseries/hotplug-cpu.c |2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/topology.h 
b/arch/powerpc/include/asm/topology.h
index a4a718d..4621f40 100644
--- a/arch/powerpc/include/asm/topology.h
+++ b/arch/powerpc/include/asm/topology.h
@@ -90,7 +90,7 @@ static inline void update_numa_cpu_lookup_table(unsigned int 
cpu, int node) {}
 extern int start_topology_update(void);
 extern int stop_topology_update(void);
 extern int prrn_is_enabled(void);
-extern int find_and_online_cpu_nid(int cpu);
+extern int find_and_online_cpu_nid(int cpu, bool must_online);
 extern int timed_topology_update(int nsecs);
 extern void __init shared_proc_topology_init(void);
 #else
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 87f0dd0..7d6bba264 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -1197,7 +1197,7 @@ static long vphn_get_associativity(unsigned long cpu,
return rc;
 }
 
-int find_and_online_cpu_nid(int cpu)
+int find_and_online_cpu_nid(int cpu, bool must_online)
 {
__be32 associativity[VPHN_ASSOC_BUFSIZE] = {0};
int new_nid;
@@ -1210,7 +1210,7 @@ int find_and_online_cpu_nid(int cpu)
if (new_nid < 0 || !node_possible(new_nid))
new_nid = first_online_node;
 
-   if (NODE_DATA(new_nid) == NULL) {
+   if (must_online && (NODE_DATA(new_nid) == NULL)) {
 #ifdef CONFIG_MEMORY_HOTPLUG
/*
 * Need to ensure that NODE_DATA is initialized for a node from
@@ -1337,7 +1337,7 @@ int numa_update_cpu_topology(bool cpus_locked)
continue;
}
 
-   new_nid = find_and_online_cpu_nid(cpu);
+   new_nid = find_and_online_cpu_nid(cpu, true);
 
if (new_nid == numa_cpu_lookup_table[cpu]) {
cpumask_andnot(_associativity_changes_mask,
diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c 
b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index 2f8e621..620cb57 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -366,7 +366,7 @@ static int dlpar_online_cpu(struct device_node *dn)
!= CPU_STATE_OFFLINE);
cpu_maps_update_done();
timed_topology_update(1);
-   find_and_online_cpu_nid(cpu);
+   find_and_online_cpu_nid(cpu, true);
rc = device_online(get_cpu_device(cpu));
if (rc)
goto out;



[RFC 0/3] powerpc/pseries: Remap hw to kernel cpu indexes

2018-12-11 Thread Michael Bringmann
Define and apply new interface to map hardware-specific powerpc cpu
ids to a kernel specific range of cpu values.  Mapping is intended
to prevent confusion within the kernel about the cpu+node mapping,
and the changes in configuration that may happen due to powerpc LPAR
migration or other associativity changes during the lifetime of a
system.  These interfaces exchange the thread_index provided by the
'ibm,ppc-interrupt-server#s' properties, for an internal index to
be used by kernel scheduling interfaces.

Signed-off-by: Michael Bringmann 

Michael Bringmann (3):
  powerpc/numa: Conditionally online new nodes
  powerpc/numa: Define mapping between HW and kernel cpus
  powerpc/numa: Apply mapping between HW and kernel cpu



[powerpc:next-test 70/71] arch/powerpc/include/asm/io.h:835:22: note: in expansion of macro 'page_to_pfn'

2018-12-11 Thread kbuild test robot
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
next-test
head:   13fd174be1903cc069bb73b0d53a0420b8f6d778
commit: 3a0f466ce2402cc090c97ccc1285508f688d8a8e [70/71] powerpc: Implement 
CONFIG_DEBUG_VIRTUAL
config: powerpc-ps3_petitboot_nfs_defconfig (attached as .config)
compiler: powerpc64-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
git checkout 3a0f466ce2402cc090c97ccc1285508f688d8a8e
# save the attached .config to linux build tree
GCC_VERSION=7.2.0 make.cross ARCH=powerpc 

All warnings (new ones prefixed by >>):

   In file included from arch/powerpc/include/asm/page.h:339:0,
from arch/powerpc/include/asm/thread_info.h:29,
from include/linux/thread_info.h:38,
from include/asm-generic/preempt.h:5,
from ./arch/powerpc/include/generated/asm/preempt.h:1,
from include/linux/preempt.h:81,
from include/linux/spinlock.h:51,
from include/linux/seqlock.h:36,
from include/linux/time.h:6,
from include/linux/jiffies.h:9,
from drivers/net/loopback.c:32:
   arch/powerpc/include/asm/io.h: In function 'page_to_phys':
   include/asm-generic/memory_model.h:64:14: error: implicit declaration of 
function 'page_to_section'; did you mean '__nr_to_section'? 
[-Werror=implicit-function-declaration]
 int __sec = page_to_section(__pg);   \
 ^
   include/asm-generic/memory_model.h:81:21: note: in expansion of macro 
'__page_to_pfn'
#define page_to_pfn __page_to_pfn
^
>> arch/powerpc/include/asm/io.h:835:22: note: in expansion of macro 
>> 'page_to_pfn'
 unsigned long pfn = page_to_pfn(page);
 ^~~
   In file included from include/linux/scatterlist.h:8:0,
from include/linux/dma-mapping.h:11,
from include/linux/skbuff.h:34,
from include/net/net_namespace.h:36,
from include/linux/inet.h:46,
from drivers/net/loopback.c:46:
   include/linux/mm.h: At top level:
   include/linux/mm.h:1121:29: error: conflicting types for 'page_to_section'
static inline unsigned long page_to_section(const struct page *page)
^~~
   In file included from arch/powerpc/include/asm/page.h:339:0,
from arch/powerpc/include/asm/thread_info.h:29,
from include/linux/thread_info.h:38,
from include/asm-generic/preempt.h:5,
from ./arch/powerpc/include/generated/asm/preempt.h:1,
from include/linux/preempt.h:81,
from include/linux/spinlock.h:51,
from include/linux/seqlock.h:36,
from include/linux/time.h:6,
from include/linux/jiffies.h:9,
from drivers/net/loopback.c:32:
   include/asm-generic/memory_model.h:64:14: note: previous implicit 
declaration of 'page_to_section' was here
 int __sec = page_to_section(__pg);   \
 ^
   include/asm-generic/memory_model.h:81:21: note: in expansion of macro 
'__page_to_pfn'
#define page_to_pfn __page_to_pfn
^
>> arch/powerpc/include/asm/io.h:835:22: note: in expansion of macro 
>> 'page_to_pfn'
 unsigned long pfn = page_to_pfn(page);
 ^~~
   cc1: some warnings being treated as errors
--
   In file included from arch/powerpc/include/asm/page.h:339:0,
from arch/powerpc/include/asm/book3s/64/mmu.h:5,
from arch/powerpc/include/asm/mmu.h:328,
from arch/powerpc/include/asm/lppaca.h:36,
from arch/powerpc/include/asm/paca.h:21,
from arch/powerpc/include/asm/current.h:16,
from include/linux/mutex.h:14,
from include/linux/kernfs.h:13,
from include/linux/sysfs.h:16,
from include/linux/kobject.h:20,
from include/linux/device.h:16,
from drivers/of/irq.c:19:
   arch/powerpc/include/asm/io.h: In function 'page_to_phys':
   include/asm-generic/memory_model.h:64:14: error: implicit declaration of 
function 'page_to_section'; did you mean '__nr_to_section'? 
[-Werror=implicit-function-declaration]
 int __sec = page_to_section(__pg);   \
 ^
   include/asm-generic/memory_model.h:81:21: note: in expansion of macro 
'__page_to_pfn'
#define page_to_pfn __page_to_pfn
^
>> arch/powerpc/include/asm/io.h:835:22: note: in expansion of macro 
>> 'page_to_pfn'
 

Snowpatch

2018-12-11 Thread Christophe Leroy
snowpatch is pretty surprising on 
https://patchwork.ozlabs.org/patch/1003954/


How can such a trivial patch for 8xx alter compilation of all other 
targets that much (ie add or remove sparse warnings) ?


Christophe


Re: [PATCH] i2c: powermac: Use of_node_name_eq for node name comparisons

2018-12-11 Thread Wolfram Sang
On Wed, Dec 05, 2018 at 01:50:24PM -0600, Rob Herring wrote:
> Convert string compares of DT node names to use of_node_name_eq helper
> instead. This removes direct access to the node name pointer.
> 
> Cc: Benjamin Herrenschmidt 
> Cc: Paul Mackerras 
> Cc: Michael Ellerman 
> Cc: linux-...@vger.kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Signed-off-by: Rob Herring 

Applied to for-next, thanks!



signature.asc
Description: PGP signature


[PATCH v4] kbuild: Add support for DT binding schema checks

2018-12-11 Thread Rob Herring
This adds the build infrastructure for checking DT binding schema
documents and validating dts files using the binding schema.

Check DT binding schema documents:
make dt_binding_check

Build dts files and check using DT binding schema:
make dtbs_check

Optionally, DT_SCHEMA_FILES can be passed in with a schema file(s) to
use for validation. This makes it easier to find and fix errors
generated by a specific schema.

Currently, the validation targets are separate from a normal build to
avoid a hard dependency on the external DT schema project and because
there are lots of warnings generated.

Cc: Jonathan Corbet 
Cc: Mark Rutland 
Cc: Masahiro Yamada 
Cc: Michal Marek 
Cc: linux-...@vger.kernel.org
Cc: devicet...@vger.kernel.org
Cc: linux-kbu...@vger.kernel.org
Signed-off-by: Rob Herring 
---
v4:
- Rework libyaml check and error message with Masahiro's version
- Simplify build rules and dependencies


 .gitignore   |  1 +
 Documentation/Makefile   |  2 +-
 Documentation/devicetree/bindings/.gitignore |  2 ++
 Documentation/devicetree/bindings/Makefile   | 27 
 Makefile | 13 +++---
 scripts/Makefile.lib | 24 +++--
 scripts/dtc/Makefile |  4 +++
 7 files changed, 67 insertions(+), 6 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/.gitignore
 create mode 100644 Documentation/devicetree/bindings/Makefile

diff --git a/.gitignore b/.gitignore
index 97ba6b79834c..a20ac26aa2f5 100644
--- a/.gitignore
+++ b/.gitignore
@@ -15,6 +15,7 @@
 *.bin
 *.bz2
 *.c.[012]*.*
+*.dt.yaml
 *.dtb
 *.dtb.S
 *.dwo
diff --git a/Documentation/Makefile b/Documentation/Makefile
index 2ca77ad0f238..9786957c6a35 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -2,7 +2,7 @@
 # Makefile for Sphinx documentation
 #
 
-subdir-y :=
+subdir-y := devicetree/bindings/
 
 # You can set these variables from the command line.
 SPHINXBUILD   = sphinx-build
diff --git a/Documentation/devicetree/bindings/.gitignore 
b/Documentation/devicetree/bindings/.gitignore
new file mode 100644
index ..ef82fcfcccab
--- /dev/null
+++ b/Documentation/devicetree/bindings/.gitignore
@@ -0,0 +1,2 @@
+*.example.dts
+processed-schema.yaml
diff --git a/Documentation/devicetree/bindings/Makefile 
b/Documentation/devicetree/bindings/Makefile
new file mode 100644
index ..6e5cef0ed6fb
--- /dev/null
+++ b/Documentation/devicetree/bindings/Makefile
@@ -0,0 +1,27 @@
+# SPDX-License-Identifier: GPL-2.0
+DT_DOC_CHECKER ?= dt-doc-validate
+DT_EXTRACT_EX ?= dt-extract-example
+DT_MK_SCHEMA ?= dt-mk-schema
+DT_MK_SCHEMA_FLAGS := $(if $(DT_SCHEMA_FILES), -u)
+
+quiet_cmd_chk_binding = CHKDT   $(patsubst $(srctree)/%,%,$<)
+  cmd_chk_binding = $(DT_DOC_CHECKER) $< ; \
+$(DT_EXTRACT_EX) $< > $@
+
+$(obj)/%.example.dts: $(src)/%.yaml FORCE
+   $(call if_changed,chk_binding)
+
+DT_TMP_SCHEMA := processed-schema.yaml
+extra-y += $(DT_TMP_SCHEMA)
+
+quiet_cmd_mk_schema = SCHEMA  $@
+  cmd_mk_schema = $(DT_MK_SCHEMA) $(DT_MK_SCHEMA_FLAGS) -o $@ $(filter-out 
FORCE, $^)
+
+DT_DOCS = $(shell cd $(srctree)/$(src) && find * -name '*.yaml')
+DT_SCHEMA_FILES ?= $(addprefix $(src)/,$(DT_DOCS))
+
+extra-y += $(patsubst $(src)/%.yaml,%.example.dts, $(DT_SCHEMA_FILES))
+extra-y += $(patsubst $(src)/%.yaml,%.example.dtb, $(DT_SCHEMA_FILES))
+
+$(obj)/$(DT_TMP_SCHEMA): $(DT_SCHEMA_FILES) FORCE
+   $(call if_changed,mk_schema)
diff --git a/Makefile b/Makefile
index 2f36db897895..a3e2db2a3119 100644
--- a/Makefile
+++ b/Makefile
@@ -1232,10 +1232,13 @@ ifneq ($(dtstree),)
 %.dtb: prepare3 scripts_dtc
$(Q)$(MAKE) $(build)=$(dtstree) $(dtstree)/$@
 
-PHONY += dtbs dtbs_install
-dtbs: prepare3 scripts_dtc
+PHONY += dtbs dtbs_install dt_binding_check
+dtbs dtbs_check: prepare3 scripts_dtc
$(Q)$(MAKE) $(build)=$(dtstree)
 
+dtbs_check: export CHECK_DTBS=1
+dtbs_check: dt_binding_check
+
 dtbs_install:
$(Q)$(MAKE) $(dtbinst)=$(dtstree)
 
@@ -1249,6 +1252,9 @@ PHONY += scripts_dtc
 scripts_dtc: scripts_basic
$(Q)$(MAKE) $(build)=scripts/dtc
 
+dt_binding_check: scripts_dtc
+   $(Q)$(MAKE) $(build)=Documentation/devicetree/bindings
+
 # ---
 # Modules
 
@@ -1611,7 +1617,8 @@ clean: $(clean-dirs)
$(call cmd,rmfiles)
@find $(if $(KBUILD_EXTMOD), $(KBUILD_EXTMOD), .) $(RCS_FIND_IGNORE) \
\( -name '*.[aios]' -o -name '*.ko' -o -name '.*.cmd' \
-   -o -name '*.ko.*' -o -name '*.dtb' -o -name '*.dtb.S' \
+   -o -name '*.ko.*' \
+   -o -name '*.dtb' -o -name '*.dtb.S' -o -name '*.dt.yaml' \
-o -name '*.dwo' -o -name '*.lst' \
-o -name '*.su'  \
-o -name '.*.d' -o -name '.*.tmp' -o -name '*.mod.c' \
diff --git 

Re: [PATCH v3] kbuild: Add support for DT binding schema checks

2018-12-11 Thread Rob Herring
On Tue, Dec 11, 2018 at 10:03 AM Masahiro Yamada
 wrote:
>
> On Wed, Dec 12, 2018 at 12:13 AM Rob Herring  wrote:
>
> >
> > > > +$(obj)/%.example.dts: $(src)/%.yaml FORCE
> > > > +   $(call if_changed,chk_binding)
> > > > +
> > > > +DT_TMP_SCHEMA := .schema.yaml.tmp
> > >
> > >
> > > BTW, why does this file start with a period?
> > > What is the meaning of '.tmp' extension?
> >
> > Nothing really. Just named it something so it gets cleaned and ignored by 
> > git.
>
>
> It is cleaned whatever file name you use.
>
>
> See scripts/Makefile.clean
>
> __clean-files   := $(extra-y) $(extra-m) $(extra-)   \
>$(always) $(targets) $(clean-files)   \
>$(hostprogs-y) $(hostprogs-m) $(hostprogs-) \
>$(hostlibs-y) $(hostlibs-m) $(hostlibs-) \
>$(hostcxxlibs-y) $(hostcxxlibs-m)
>
>
> $(extra-y) is cleaned.

True.

>
>
> You are adding *.example.dts to .gitignore
>
> Why not "schema.yaml" ?

Okay. I'll do "processed-schema.yaml" to give a bit better name of
what it contains.

>
> > > > +extra-y += $(DT_TMP_SCHEMA)
> > > > +
> > > > +quiet_cmd_mk_schema = SCHEMA  $@
> > > > +  cmd_mk_schema = mkdir -p $(obj); \
> > > > +  rm -f $@; \
> > > > +  $(DT_MK_SCHEMA) $(DT_MK_SCHEMA_FLAGS) -o $@ 
> > > > $(filter-out FORCE, $^)
> > >
> > >
> > > "mkdir -p $(obj)" is redundant.
> > >
> > >
> > > Why is 'rm -f $@' necessary ?
> > > Can't dt-mk-schema overwrite the output file?
> >
> > It is for error case when the output file is not generated. I can
> > handle this within dt-mk-schema instead.
> > > > +DT_DOCS = $(shell cd $(srctree)/$(src) && find * -name '*.yaml')
> > > > +DT_SCHEMA_FILES ?= $(addprefix $(src)/,$(DT_DOCS))
> > > > +
> > > > +extra-y += $(patsubst $(src)/%.yaml,%.example.dts, $(DT_SCHEMA_FILES))
> > > > +extra-y += $(patsubst $(src)/%.yaml,%.example.dtb, $(DT_SCHEMA_FILES))
> > >
> > >
> > >
> > > I assume you intentionally did not do like this:
> > >
> > > extra-y += $(patsubst %.yaml,%.example.dtb, $(DT_DOCS))
> > >
> > > From the commit description, DT_SCHEMA_FILES might be overridden by a 
> > > user.
> > > So, I think this is OK.
> > >
> > >
> > >
> > >
> > > > +$(obj)/$(DT_TMP_SCHEMA): | $(addprefix $(obj)/,$(patsubst 
> > > > $(src)/%.yaml,%.example.dtb, $(DT_SCHEMA_FILES)))
> > >
> > > I do not understand this line.
> > > Why is it necessary?
> > >
> > > *.example.dtb files are generated anyway
> > > since they are listed in extra-y.
> >
> > It is enforcing the ordering. Without it, the binding checks and
> > building .schema.yaml.tmp happen in parallel because both only have
> > the source files as dependencies. The '|' keeps the dependencies out
> > of the dependency list($^).
>
>
> What kind problem would you see if
> the binding checks and building .schema.yaml.tmp
> happen in parallel?

In case of no errors in the binding docs, it doesn't matter. If there
are errors, I don't want the dtbs validation to run if any schema
doesn't validate. However, I played around with this a bit more and it
seems like having the examples' dts/dtb in extra-y prevents that from
happening. Does that match your expections? If so, then I think we can
remove the dependency.

Here's an example.

  SCHEMA  Documentation/devicetree/bindings/.schema.yaml.tmp
  CHKDT   Documentation/devicetree/bindings/arm/vt8500.yaml
  CHKDT   Documentation/devicetree/bindings/arm/zte.yaml
  CHKDT   Documentation/devicetree/bindings/arm/altera/socfpga-clk-manager.yaml
  CHKDT   Documentation/devicetree/bindings/arm/ti/ti,davinci.yaml
  CHKDT   Documentation/devicetree/bindings/arm/ti/nspire.yaml
  CHKDT   Documentation/devicetree/bindings/arm/spear.yaml
  CHKDT   Documentation/devicetree/bindings/arm/altera.yaml
warning: no schema found in file:
../Documentation/devicetree/bindings/arm/xilinx.yaml
/home/rob/proj/git/linux-2.6/Documentation/devicetree/bindings/arm/xilinx.yaml:
ignoring, error in schema 'onOf'
  CHKDT   Documentation/devicetree/bindings/arm/sti.yaml
  CHKDT   Documentation/devicetree/bindings/arm/qcom.yaml
  CHKDT   Documentation/devicetree/bindings/arm/primecell.yaml
  CHKDT   Documentation/devicetree/bindings/arm/cpus.yaml
  CHKDT   Documentation/devicetree/bindings/arm/sirf.yaml
  CHKDT   Documentation/devicetree/bindings/arm/calxeda.yaml
  CHKDT   Documentation/devicetree/bindings/arm/xilinx.yaml
  CHKDT   Documentation/devicetree/bindings/example-schema.yaml
/home/rob/proj/git/linux-2.6/Documentation/devicetree/bindings/arm/xilinx.yaml:
properties:compatible:onOf: 'onOf' is not one of ['$ref',
'additionalItems', 'allOf', 'const', 'contains', 'default',
'dependencies', 'description', 'enum', 'items', 'minItems', 'minimum',
'maxItems', 'maximum', 'not', 'oneOf', 'pattern', 'patternProperties',
'properties', 'required', 'type', 'typeSize']
make[2]: *** [../Documentation/devicetree/bindings/Makefile:12:
Documentation/devicetree/bindings/arm/xilinx.example.dts] Error 1

Rob


Re: use generic DMA mapping code in powerpc V4

2018-12-11 Thread Christian Zigotzky
Next step: 7decbcfc656805603ab97206b3f816f26cd2cf7d (powerpc/dma: use 
generic direct and swiotlb ops)


git checkout 7decbcfc656805603ab97206b3f816f26cd2cf7d

We have the bad commit! :-) The PASEMI onboard ethernet doesn't work 
with this commit anymore.


Error messages:

[  367.627623] pci :00:1a.0: dma_direct_map_page: overflow 
0x00026bcb5002+110 of device mask  bus mask 0
[  367.627631] pci :00:1a.0: dma_direct_map_page: overflow 
0x00026bcb5002+110 of device mask  bus mask 0
[  367.627639] pci :00:1a.0: dma_direct_map_page: overflow 
0x00026bcb5002+110 of device mask  bus mask 0
[  367.627647] pci :00:1a.0: dma_direct_map_page: overflow 
0x00026bcb5002+110 of device mask  bus mask 0


pci :00:1a.0 = 00:1a.0 DMA controller: PA Semi, Inc PWRficient DMA 
Controller (rev 12)


X5000 (P5020 board): U-Boot loads the kernel and the dtb file. Then the 
kernel starts but it doesn't find any hard disks (partitions). That 
means this is also the bad commit for the P5020 board.


Link to the bad commit: 
http://git.infradead.org/users/hch/misc.git/commit/7decbcfc656805603ab97206b3f816f26cd2cf7d


Link to the Git: 
http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/powerpc-dma.5


The commit before (977706f9755d2d697aa6f45b4f9f0e07516efeda - 
powerpc/dma: remove dma_nommu_mmap_coherent) works without any problems.


-- Christian




[PATCH v2] ocxl: Fix endiannes bug in read_afu_name()

2018-12-11 Thread Greg Kurz
The AFU Descriptor Template in the PCI config space has a Name Space
field which is a 24 Byte ASCII character string of descriptive name
space for the AFU. The OCXL driver read the string four characters at
a time with pci_read_config_dword().

This optimization is valid on a little-endian system since this is PCI,
but a big-endian system ends up with each subset of four characters in
reverse order.

This could be fixed by switching to read characters one by one. Another
option is to swap the bytes if we're big-endian.

Go for the latter with le32_to_cpu().

Cc: sta...@vger.kernel.org  # v4.16
Signed-off-by: Greg Kurz 
Acked-by: Frederic Barrat 
---
v2: - silence sparse with (__force __le32) cast
- new changelog
---
 drivers/misc/ocxl/config.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/misc/ocxl/config.c b/drivers/misc/ocxl/config.c
index 57a6bb1fd3c9..8f2c5d8bd2ee 100644
--- a/drivers/misc/ocxl/config.c
+++ b/drivers/misc/ocxl/config.c
@@ -318,7 +318,7 @@ static int read_afu_name(struct pci_dev *dev, struct 
ocxl_fn_config *fn,
if (rc)
return rc;
ptr = (u32 *) >name[i];
-   *ptr = val;
+   *ptr = le32_to_cpu((__force __le32) val);
}
afu->name[OCXL_AFU_NAME_SZ - 1] = '\0'; /* play safe */
return 0;



Re: [PATCH v03] powerpc/mobility: Fix node detach/rename problem

2018-12-11 Thread Michael Bringmann
--- Snip ---

>>
>> The mobility.c code continues on during the second migration, accepts
>> the definitions of the new nodes from the PHYP and ends up renaming
>> the new properties e.g.
>>
>> [ 4565.827296] Duplicate name in base, renamed to "ibm,platform-facilities#1"
>>
>> There is no check like 'of_node_check_flag(np, OF_DETACHED)' within
>> of_find_node_by_phandle to skip nodes that are detached, but still
>> present due to caching or use count considerations.  Also, note that
>> of_find_node_by_phandle also uses a 'phandle_cache' which does not
>> appear to be updated when of_detach_node() is invoked.
> 
> This seems like the real bug. Since the phandle cache was added we can
> now find detached nodes when we shouldn't be able to.
> 
> Does the patch below work?
> 
> cheers
> 
> diff --git a/drivers/of/base.c b/drivers/of/base.c
> index 09692c9b32a7..d8e4534c0686 100644
> --- a/drivers/of/base.c
> +++ b/drivers/of/base.c
> @@ -1190,6 +1190,10 @@ struct device_node *of_find_node_by_phandle(phandle 
> handle)
>   if (phandle_cache[masked_handle] &&
>   handle == phandle_cache[masked_handle]->phandle)
>   np = phandle_cache[masked_handle];
> +
> + /* If we find a detached node, remove it */
> + if (of_node_check_flag(np, OF_DETACHED))
> + np = phandle_cache[masked_handle] = NULL;
>   }
> 
>   if (!np) {
> 
> 

I think this would be a bit better for cases where masked values overlap:

diff --git a/drivers/of/base.c b/drivers/of/base.c
index 09692c9..ec79129 100644
--- a/drivers/of/base.c
+++ b/drivers/of/base.c
@@ -1188,8 +1188,13 @@ struct device_node *of_find_node_by_phandle(phandle 
handle)
 
if (phandle_cache) {
if (phandle_cache[masked_handle] &&
-   handle == phandle_cache[masked_handle]->phandle)
-   np = phandle_cache[masked_handle];
+   handle == phandle_cache[masked_handle]->phandle) {
+   np = phandle_cache[masked_handle];
+
+   /* If we find a detached node, remove it */
+   if (of_node_check_flag(np, OF_DETACHED))
+   np = phandle_cache[masked_handle] = NULL;
+   }
}
 
if (!np) {


Will try it out.  Wouldn't it be better to do this when the node is detached
in drivers/of/dynamic.c:__of_detach_node()?

Thanks.
Michael

-- 
Michael W. Bringmann
Linux Technology Center
IBM Corporation
Tie-Line  363-5196
External: (512) 286-5196
Cell:   (512) 466-0650
m...@linux.vnet.ibm.com



Re: [PATCH v03] powerpc/mobility: Fix node detach/rename problem

2018-12-11 Thread Rob Herring
On Tue, Dec 11, 2018 at 7:29 AM Michael Ellerman  wrote:
>
> Hi Michael,
>
> Please Cc the device tree folks on device tree patches, and also the
> original author of the patch that added the code you're modifying.
>
> So I've added:
>   robh...@kernel.org
>   frowand.l...@gmail.com
>   devicet...@vger.kernel.org
>   linux-ker...@vger.kernel.org
>
> Michael Bringmann  writes:
> > The PPC mobility code receives RTAS requests to delete nodes with
> > platform-/hardware-specific attributes when restarting the kernel
> > after a migration.  My example is for migration between a P8 Alpine
> > and a P8 Brazos.   Nodes to be deleted include 'ibm,random-v1',
> > 'ibm,platform-facilities', 'ibm,sym-encryption-v1', and,
> > 'ibm,compression-v1'.
> >
> > The mobility.c code calls 'of_detach_node' for the nodes and their
> > children.  This makes calls to detach the properties and to remove
> > the associated sysfs/kernfs files.
> >
> > Then new copies of the same nodes are next provided by the PHYP,
> > local copies are built, and a pointer to the 'struct device_node'
> > is passed to of_attach_node.  Before the call to of_attach_node,
> > the phandle is initialized to 0 when the data structure is alloced.
> > During the call to of_attach_node, it calls __of_attach_node which
> > pulls the actual name and phandle from just created sub-properties
> > named something like 'name' and 'ibm,phandle'.
> >
> > This is all fine for the first migration.  The problem occurs with
> > the second and subsequent migrations when the PHYP on the new system
> > wants to replace the same set of nodes again, referenced with the
> > same names and phandle values.
> >
> > On the second and subsequent migrations, the PHYP tells the system
> > to again delete the nodes 'ibm,platform-facilities', 'ibm,random-v1',
> > 'ibm,compression-v1', 'ibm,sym-encryption-v1'.  It specifies these
> > nodes by its known set of phandle values -- the same handles used
> > by the PHYP on the source system are known on the target system.
> > The mobility.c code calls of_find_node_by_phandle() with these values
> > and ends up locating the first instance of each node that was added
> > during the original boot, instead of the second instance of each node
> > created after the first migration.  The detach during the second
> > migration fails with errors like,
> >
> > [ 4565.030704] WARNING: CPU: 3 PID: 4787 at drivers/of/dynamic.c:252 
> > __of_detach_node+0x8/0xa0
> > [ 4565.030708] Modules linked in: nfsv3 nfs_acl nfs tcp_diag udp_diag 
> > inet_diag unix_diag af_packet_diag netlink_diag lockd grace fscache sunrpc 
> > xts vmx_crypto sg pseries_rng binfmt_misc ip_tables xfs libcrc32c sd_mod 
> > ibmveth ibmvscsi scsi_transport_srp dm_mirror dm_region_hash dm_log dm_mod
> > [ 4565.030733] CPU: 3 PID: 4787 Comm: drmgr Tainted: GW 
> > 4.18.0-rc1-wi107836-v05-120+ #201
> > [ 4565.030737] NIP:  c07c1ea8 LR: c07c1fb4 CTR: 
> > 00655170
> > [ 4565.030741] REGS: c003f302b690 TRAP: 0700   Tainted: GW  
> > (4.18.0-rc1-wi107836-v05-120+)
> > [ 4565.030745] MSR:  80010282b033 
> >   CR: 22288822  XER: 000a
> > [ 4565.030757] CFAR: c07c1fb0 IRQMASK: 1
> > [ 4565.030757] GPR00: c07c1fa4 c003f302b910 c114bf00 
> > c0038e68
> > [ 4565.030757] GPR04: 0001  80c008e0b4b8 
> > 
> > [ 4565.030757] GPR08:  0001 8003 
> > 2843
> > [ 4565.030757] GPR12: 8800 c0001ec9ae00 4000 
> > 
> > [ 4565.030757] GPR16:  0008  
> > f6ff
> > [ 4565.030757] GPR20: 0007  c003e9f1f034 
> > 0001
> > [ 4565.030757] GPR24:    
> > 
> > [ 4565.030757] GPR28: c1549d28 c1134828 c0038e68 
> > c003f302b930
> > [ 4565.030804] NIP [c07c1ea8] __of_detach_node+0x8/0xa0
> > [ 4565.030808] LR [c07c1fb4] of_detach_node+0x74/0xd0
> > [ 4565.030811] Call Trace:
> > [ 4565.030815] [c003f302b910] [c07c1fa4] 
> > of_detach_node+0x64/0xd0 (unreliable)
> > [ 4565.030821] [c003f302b980] [c00c33c4] 
> > dlpar_detach_node+0xb4/0x150
> > [ 4565.030826] [c003f302ba10] [c00c3ffc] 
> > delete_dt_node+0x3c/0x80
> > [ 4565.030831] [c003f302ba40] [c00c4380] 
> > pseries_devicetree_update+0x150/0x4f0
> > [ 4565.030836] [c003f302bb70] [c00c479c] 
> > post_mobility_fixup+0x7c/0xf0
> > [ 4565.030841] [c003f302bbe0] [c00c4908] 
> > migration_store+0xf8/0x130
> > [ 4565.030847] [c003f302bc70] [c0998160] 
> > kobj_attr_store+0x30/0x60
> > [ 4565.030852] [c003f302bc90] [c0412f14] 
> > sysfs_kf_write+0x64/0xa0
> > [ 4565.030857] [c003f302bcb0] [c0411cac] 
> > 

Re: [PATCH v3] kbuild: Add support for DT binding schema checks

2018-12-11 Thread Masahiro Yamada
On Wed, Dec 12, 2018 at 12:13 AM Rob Herring  wrote:

>
> > > +$(obj)/%.example.dts: $(src)/%.yaml FORCE
> > > +   $(call if_changed,chk_binding)
> > > +
> > > +DT_TMP_SCHEMA := .schema.yaml.tmp
> >
> >
> > BTW, why does this file start with a period?
> > What is the meaning of '.tmp' extension?
>
> Nothing really. Just named it something so it gets cleaned and ignored by git.


It is cleaned whatever file name you use.


See scripts/Makefile.clean

__clean-files   := $(extra-y) $(extra-m) $(extra-)   \
   $(always) $(targets) $(clean-files)   \
   $(hostprogs-y) $(hostprogs-m) $(hostprogs-) \
   $(hostlibs-y) $(hostlibs-m) $(hostlibs-) \
   $(hostcxxlibs-y) $(hostcxxlibs-m)


$(extra-y) is cleaned.



You are adding *.example.dts to .gitignore

Why not "schema.yaml" ?




> > > +extra-y += $(DT_TMP_SCHEMA)
> > > +
> > > +quiet_cmd_mk_schema = SCHEMA  $@
> > > +  cmd_mk_schema = mkdir -p $(obj); \
> > > +  rm -f $@; \
> > > +  $(DT_MK_SCHEMA) $(DT_MK_SCHEMA_FLAGS) -o $@ 
> > > $(filter-out FORCE, $^)
> >
> >
> > "mkdir -p $(obj)" is redundant.
> >
> >
> > Why is 'rm -f $@' necessary ?
> > Can't dt-mk-schema overwrite the output file?
>
> It is for error case when the output file is not generated. I can
> handle this within dt-mk-schema instead.
> > > +DT_DOCS = $(shell cd $(srctree)/$(src) && find * -name '*.yaml')
> > > +DT_SCHEMA_FILES ?= $(addprefix $(src)/,$(DT_DOCS))
> > > +
> > > +extra-y += $(patsubst $(src)/%.yaml,%.example.dts, $(DT_SCHEMA_FILES))
> > > +extra-y += $(patsubst $(src)/%.yaml,%.example.dtb, $(DT_SCHEMA_FILES))
> >
> >
> >
> > I assume you intentionally did not do like this:
> >
> > extra-y += $(patsubst %.yaml,%.example.dtb, $(DT_DOCS))
> >
> > From the commit description, DT_SCHEMA_FILES might be overridden by a user.
> > So, I think this is OK.
> >
> >
> >
> >
> > > +$(obj)/$(DT_TMP_SCHEMA): | $(addprefix $(obj)/,$(patsubst 
> > > $(src)/%.yaml,%.example.dtb, $(DT_SCHEMA_FILES)))
> >
> > I do not understand this line.
> > Why is it necessary?
> >
> > *.example.dtb files are generated anyway
> > since they are listed in extra-y.
>
> It is enforcing the ordering. Without it, the binding checks and
> building .schema.yaml.tmp happen in parallel because both only have
> the source files as dependencies. The '|' keeps the dependencies out
> of the dependency list($^).


What kind problem would you see if
the binding checks and building .schema.yaml.tmp
happen in parallel?



-- 
Best Regards
Masahiro Yamada


Re: [PATCH v03] powerpc/mobility: Fix node detach/rename problem

2018-12-11 Thread Michael Bringmann



On 12/11/2018 07:29 AM, Michael Ellerman wrote:
> Hi Michael,
> 
> Please Cc the device tree folks on device tree patches, and also the
> original author of the patch that added the code you're modifying.
> 
> So I've added:
>   robh...@kernel.org
>   frowand.l...@gmail.com
>   devicet...@vger.kernel.org
>   linux-ker...@vger.kernel.org

Thanks.

> 
> Michael Bringmann  writes:
>> The PPC mobility code receives RTAS requests to delete nodes with
>> platform-/hardware-specific attributes when restarting the kernel
>> after a migration.  My example is for migration between a P8 Alpine
>> and a P8 Brazos.   Nodes to be deleted include 'ibm,random-v1',
>> 'ibm,platform-facilities', 'ibm,sym-encryption-v1', and,
>> 'ibm,compression-v1'.
>>
>> The mobility.c code calls 'of_detach_node' for the nodes and their
>> children.  This makes calls to detach the properties and to remove
>> the associated sysfs/kernfs files.
>>
>> Then new copies of the same nodes are next provided by the PHYP,
>> local copies are built, and a pointer to the 'struct device_node'
>> is passed to of_attach_node.  Before the call to of_attach_node,
>> the phandle is initialized to 0 when the data structure is alloced.
>> During the call to of_attach_node, it calls __of_attach_node which
>> pulls the actual name and phandle from just created sub-properties
>> named something like 'name' and 'ibm,phandle'.
>>
>> This is all fine for the first migration.  The problem occurs with
>> the second and subsequent migrations when the PHYP on the new system
>> wants to replace the same set of nodes again, referenced with the
>> same names and phandle values.
>>
>> On the second and subsequent migrations, the PHYP tells the system
>> to again delete the nodes 'ibm,platform-facilities', 'ibm,random-v1',
>> 'ibm,compression-v1', 'ibm,sym-encryption-v1'.  It specifies these
>> nodes by its known set of phandle values -- the same handles used
>> by the PHYP on the source system are known on the target system.
>> The mobility.c code calls of_find_node_by_phandle() with these values
>> and ends up locating the first instance of each node that was added
>> during the original boot, instead of the second instance of each node
>> created after the first migration.  The detach during the second
>> migration fails with errors like,
>>
>> [ 4565.030704] WARNING: CPU: 3 PID: 4787 at drivers/of/dynamic.c:252 
>> __of_detach_node+0x8/0xa0
>> [ 4565.030708] Modules linked in: nfsv3 nfs_acl nfs tcp_diag udp_diag 
>> inet_diag unix_diag af_packet_diag netlink_diag lockd grace fscache sunrpc 
>> xts vmx_crypto sg pseries_rng binfmt_misc ip_tables xfs libcrc32c sd_mod 
>> ibmveth ibmvscsi scsi_transport_srp dm_mirror dm_region_hash dm_log dm_mod
>> [ 4565.030733] CPU: 3 PID: 4787 Comm: drmgr Tainted: GW 
>> 4.18.0-rc1-wi107836-v05-120+ #201
>> [ 4565.030737] NIP:  c07c1ea8 LR: c07c1fb4 CTR: 
>> 00655170
>> [ 4565.030741] REGS: c003f302b690 TRAP: 0700   Tainted: GW   
>>(4.18.0-rc1-wi107836-v05-120+)
>> [ 4565.030745] MSR:  80010282b033 
>>   CR: 22288822  XER: 000a
>> [ 4565.030757] CFAR: c07c1fb0 IRQMASK: 1
>> [ 4565.030757] GPR00: c07c1fa4 c003f302b910 c114bf00 
>> c0038e68
>> [ 4565.030757] GPR04: 0001  80c008e0b4b8 
>> 
>> [ 4565.030757] GPR08:  0001 8003 
>> 2843
>> [ 4565.030757] GPR12: 8800 c0001ec9ae00 4000 
>> 
>> [ 4565.030757] GPR16:  0008  
>> f6ff
>> [ 4565.030757] GPR20: 0007  c003e9f1f034 
>> 0001
>> [ 4565.030757] GPR24:    
>> 
>> [ 4565.030757] GPR28: c1549d28 c1134828 c0038e68 
>> c003f302b930
>> [ 4565.030804] NIP [c07c1ea8] __of_detach_node+0x8/0xa0
>> [ 4565.030808] LR [c07c1fb4] of_detach_node+0x74/0xd0
>> [ 4565.030811] Call Trace:
>> [ 4565.030815] [c003f302b910] [c07c1fa4] 
>> of_detach_node+0x64/0xd0 (unreliable)
>> [ 4565.030821] [c003f302b980] [c00c33c4] 
>> dlpar_detach_node+0xb4/0x150
>> [ 4565.030826] [c003f302ba10] [c00c3ffc] delete_dt_node+0x3c/0x80
>> [ 4565.030831] [c003f302ba40] [c00c4380] 
>> pseries_devicetree_update+0x150/0x4f0
>> [ 4565.030836] [c003f302bb70] [c00c479c] 
>> post_mobility_fixup+0x7c/0xf0
>> [ 4565.030841] [c003f302bbe0] [c00c4908] 
>> migration_store+0xf8/0x130
>> [ 4565.030847] [c003f302bc70] [c0998160] 
>> kobj_attr_store+0x30/0x60
>> [ 4565.030852] [c003f302bc90] [c0412f14] sysfs_kf_write+0x64/0xa0
>> [ 4565.030857] [c003f302bcb0] [c0411cac] 
>> kernfs_fop_write+0x16c/0x240
>> [ 4565.030862] [c003f302bd00] [c0355f20] __vfs_write+0x40/0x220
>> [ 

Re: [PATCH v2.2 24/34] dt-bindings: arm: Convert Rockchip board/soc bindings to json-schema

2018-12-11 Thread Rob Herring
On Mon, Dec 10, 2018 at 4:45 PM Heiko Stuebner  wrote:
>
> From: Rob Herring 
>
> Convert Rockchip SoC bindings to DT schema format using json-schema.
>
> Cc: Mark Rutland 
> Cc: Heiko Stuebner 
> Cc: devicet...@vger.kernel.org
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: linux-rockc...@lists.infradead.org
> Signed-off-by: Rob Herring 
> [move to per-board entries and added recently added boards]
> Signed-off-by: Heiko Stuebner 
> ---
> Hi Rob,
>
> thanks for the libyaml hint, now dtc does check my dts nicely and
> emits quite a number of little complaints ;-) .
>
> Also that suggestion to move the original firefly release+beta boards
> together was great and I just did that.
>
> Should look ok now, if so I'll apply it tomorrow.

LGTM


Re: [PATCH v3] kbuild: Add support for DT binding schema checks

2018-12-11 Thread Rob Herring
On Mon, Dec 10, 2018 at 10:39 PM Masahiro Yamada
 wrote:
>
> Hi Rob,
>
> On Tue, Dec 11, 2018 at 5:50 AM Rob Herring  wrote:
> >
> > This adds the build infrastructure for checking DT binding schema
> > documents and validating dts files using the binding schema.
> >
> > Check DT binding schema documents:
> > make dt_binding_check
> >
> > Build dts files and check using DT binding schema:
> > make dtbs_check
> >
> > Optionally, DT_SCHEMA_FILES can passed in with a schema file(s) to use
>
>
> Perhaps, "can be passed" ?

Yes.

[...]

> > +quiet_cmd_chk_binding = CHKDT   $<
>
>
> In convention, the short log displays the target name
> instead of the prerequisite.

Yes, but here there is no target. We're just running a check on the source.


> If O=foo/bar is given, $< shows the full path,
> then log will look like follows:
>
>
>   CHKDT   
> /home/masahiro/ref/linux-devicetree/Documentation/devicetree/bindings/arm/calxeda.yaml
>   DTC Documentation/devicetree/bindings/arm/calxeda.example.dtb
>   CHKDT   
> /home/masahiro/ref/linux-devicetree/Documentation/devicetree/bindings/arm/qcom.yaml
>   DTC Documentation/devicetree/bindings/arm/qcom.example.dtb
>   CHKDT   
> /home/masahiro/ref/linux-devicetree/Documentation/devicetree/bindings/arm/xilinx.yaml
>   DTC Documentation/devicetree/bindings/arm/xilinx.example.dtb
>   CHKDT   
> /home/masahiro/ref/linux-devicetree/Documentation/devicetree/bindings/arm/ti/nspire.yaml
>   DTC Documentation/devicetree/bindings/arm/ti/nspire.example.dtb
>   CHKDT   
> /home/masahiro/ref/linux-devicetree/Documentation/devicetree/bindings/arm/ti/ti,davinci.yaml
>   DTC Documentation/devicetree/bindings/arm/ti/ti,davinci.example.dtb
>   CHKDT   
> /home/masahiro/ref/linux-devicetree/Documentation/devicetree/bindings/arm/altera.yaml
>
> You might think it is ugly.

I've changed it to this:

quiet_cmd_chk_binding = CHKDT   $(patsubst $(srctree)/%,%,$<)

[...]

> > +$(obj)/%.example.dts: $(src)/%.yaml FORCE
> > +   $(call if_changed,chk_binding)
> > +
> > +DT_TMP_SCHEMA := .schema.yaml.tmp
>
>
> BTW, why does this file start with a period?
> What is the meaning of '.tmp' extension?

Nothing really. Just named it something so it gets cleaned and ignored by git.

> > +extra-y += $(DT_TMP_SCHEMA)
> > +
> > +quiet_cmd_mk_schema = SCHEMA  $@
> > +  cmd_mk_schema = mkdir -p $(obj); \
> > +  rm -f $@; \
> > +  $(DT_MK_SCHEMA) $(DT_MK_SCHEMA_FLAGS) -o $@ 
> > $(filter-out FORCE, $^)
>
>
> "mkdir -p $(obj)" is redundant.
>
>
> Why is 'rm -f $@' necessary ?
> Can't dt-mk-schema overwrite the output file?

It is for error case when the output file is not generated. I can
handle this within dt-mk-schema instead.

> > +DT_DOCS = $(shell cd $(srctree)/$(src) && find * -name '*.yaml')
> > +DT_SCHEMA_FILES ?= $(addprefix $(src)/,$(DT_DOCS))
> > +
> > +extra-y += $(patsubst $(src)/%.yaml,%.example.dts, $(DT_SCHEMA_FILES))
> > +extra-y += $(patsubst $(src)/%.yaml,%.example.dtb, $(DT_SCHEMA_FILES))
>
>
>
> I assume you intentionally did not do like this:
>
> extra-y += $(patsubst %.yaml,%.example.dtb, $(DT_DOCS))
>
> From the commit description, DT_SCHEMA_FILES might be overridden by a user.
> So, I think this is OK.
>
>
>
>
> > +$(obj)/$(DT_TMP_SCHEMA): | $(addprefix $(obj)/,$(patsubst 
> > $(src)/%.yaml,%.example.dtb, $(DT_SCHEMA_FILES)))
>
> I do not understand this line.
> Why is it necessary?
>
> *.example.dtb files are generated anyway
> since they are listed in extra-y.

It is enforcing the ordering. Without it, the binding checks and
building .schema.yaml.tmp happen in parallel because both only have
the source files as dependencies. The '|' keeps the dependencies out
of the dependency list($^).

[...]

> > +cmd_dtc_yaml_chk = \
> > +   if ! echo "/dts-v1/; /{};" | $(DTC) -I dts -O yaml > /dev/null ; 
> > then \
> > +   echo "*"; \
> > +   echo "* dtc needs libyaml for DT schema validation 
> > support."; \
> > +   echo "* Install the necessary libyaml development package 
> > from your distro."; \
> > +   echo "*"; \
> > +   false; \
> > +   fi; \
> > +   touch $@
> > +
> > +$(objtree)/scripts/dtc/.dtc-yaml-chk.tmp: $(DTC) FORCE
> > +   $(call if_changed,dtc_yaml_chk)
>
>
> Hmm, this is kind of ugly.
>
> I'd rather check this in scripts/dtc/Makefile
>
>
>
> How about something like below?

Looks good.

>  ifeq ($(wildcard /usr/include/yaml.h),)
> +ifeq ($(CHECK_DTBS),1)

I'm going to change this to "ifneq ($CHECK_DTBS),)" in case we start
supporting more than 1 value such as warning levels.

> +$(error dtc needs libyaml for DT schema validation support. \
> +Install the necessary libyaml development package.)
> +endif
>  HOST_EXTRACFLAGS += -DNO_YAML
>  else
>  dtc-objs   += yamltree.o
>
>
>
>
>
> One drawback of this approach is,
> this is not checked if you are using out-of-tree DTC.
> Probably, Rob is the only person that overrides DTC.


Re: use generic DMA mapping code in powerpc V4

2018-12-11 Thread Christian Zigotzky
Next step: 977706f9755d2d697aa6f45b4f9f0e07516efeda (powerpc/dma: remove 
dma_nommu_mmap_coherent)


Result: The P5020 board boots and the PASEMI onboard ethernet works.

-- Christian


On 10 December 2018 at 4:54PM, Christian Zigotzky wrote:
Next step: 64ecd2c160bbef31465c4d34efc0f076a2aad4df (powerpc/dma: use 
phys_to_dma instead of get_dma_offset)


The P5020 board boots and the PASEMI onboard ethernet works.

-- Christian


On 09 December 2018 at 7:26PM, Christian Zigotzky wrote:
Next step: c1bfcad4b0cf38ce5b00f7ad880d3a13484c123a (dma-mapping, 
powerpc: simplify the arch dma_set_mask override)


Result: No problems with the PASEMI onboard ethernet and with booting 
the X5000 (P5020 board).


-- Christian


On 09 December 2018 at 3:20PM, Christian Zigotzky wrote:
Next step: 602307b034734ce77a05da4b99333a2eaf6b6482 
(powerpc/fsl_pci: simplify fsl_pci_dma_set_mask)


git checkout 602307b034734ce77a05da4b99333a2eaf6b6482

The PASEMI onboard ethernet works and the X5000 boots.

-- Christian


On 08 December 2018 at 2:47PM, Christian Zigotzky wrote:
Next step: e15cd8173ef85e9cc3e2a9c7cc2982f5c1355615 (powerpc/dma: 
fix an off-by-one in dma_capable)


git checkout e15cd8173ef85e9cc3e2a9c7cc2982f5c1355615

The PASEMI onboard ethernet also works with this commit and the 
X5000 boots without any problems.


-- Christian


On 08 December 2018 at 11:29AM, Christian Zigotzky wrote:
Next step: 7ebc44c535f6bd726d553756d38b137acc718443 (powerpc/dma: 
remove max_direct_dma_addr)


git checkout 7ebc44c535f6bd726d553756d38b137acc718443

OK, the PASEMI onboard ethernet works and the P5020 board boots.

-- Christian


On 07 December 2018 at 7:33PM, Christian Zigotzky wrote:
Next step: 13c1fdec5682b6e13257277fa16aa31f342d167d (powerpc/dma: 
move pci_dma_dev_setup_swiotlb to fsl_pci.c)


git checkout 13c1fdec5682b6e13257277fa16aa31f342d167d

Result: The PASEMI onboard ethernet works and the P5020 board boots.

— Christian




















Re: [PATCH] powerpc/ipic: Fix a bounds check in ipic_set_priority()

2018-12-11 Thread Dan Carpenter
On Thu, Dec 06, 2018 at 09:12:12AM +0100, Julia Lawall wrote:
> 
> 
> On Thu, 6 Dec 2018, Christophe LEROY wrote:
> 
> >
> >
> > Le 05/12/2018 à 04:26, Michael Ellerman a écrit :
> > > Hi Dan,
> > >
> > > Thanks for the patch.
> > >
> > > Dan Carpenter  writes:
> > > > The ipic_info[] array only has 95 elements so I have made the bounds
> > > > check smaller to prevent a read overflow.  It was Smatch that found
> > > > this issue:
> > > >
> > > >  arch/powerpc/sysdev/ipic.c:784 ipic_set_priority()
> > > >  error: buffer overflow 'ipic_info' 95 <= 127
> > > >
> > > > Signed-off-by: Dan Carpenter 
> > > > ---
> > > > I wasn't able to find any callers of this code.  Maybe we removed the
> > > > last one in commit b9f0f1bb2bca ("[POWERPC] Adapt ipic driver to new
> > > > host_ops interface, add set_irq_type to set IRQ sense").  So perhaps we
> > > > should just remove it.  I'm not really comfortable doing that myself,
> > > > because I don't know the code well enough and can't build test
> > > > it properly.
> > >
> > > Hah wow, last usage removed in 2006!
> > >
> > > I don't see any mention of it since then, so I'll remove it. If it
> > > breaks something we can put it back.
> > >
> > > Can smatch help us find things like this that are defined non-static but
> > > never used?
> > >
> >
> > I think we have to do that carrefully. Some of those functions might be used
> > by out-of-tree boards.
> >
> > I'm thinking especially at ipic_get_mcp_status() and ipic_set_mcp_status().
> > They are used in my 832x boards's machine check handler to know when a 
> > machine
> > check is a timeout from the 832x watchdog.
> 
> The message I have gotten in the past is that the Linux kernel doesn't
> support code that is not used in the Linux kernel.  However, if I were to
> do this, I would send the code to the individual maintainers, who
> presumably would know what is actually needed and what is not.
> 
> Perhaps a good sanity check would be if the code has been used in the
> past.  If there was a use in the past that has been removed, then perhaps
> it is more likely that the function was intended for internal kernel use
> rather than the case that you are describing.

Yeah.  That's a good idea.  I've been encouraging people who remove
"written to but not used" variables to say in the commit message
"variable foo has not been used since commit 123412341243 ("blah blah")."
It helps me review the patch as well.

regards,
dan carpenter



Re: [PATCH] PCI: Add no-D3 quirk for Mellanox ConnectX-[45]

2018-12-11 Thread Bjorn Helgaas
Hi David,

I see you're still working on this, but if you do end up going this
direction eventually, would you mind splitting this into two patches:
1) rename the quirk to make it more generic (but not changing any
behavior), and 2) add the ConnectX devices to the quirk.  That way
the ConnectX change is smaller and more easily understood/reverted/etc.

On Thu, Dec 06, 2018 at 03:19:51PM +1100, David Gibson wrote:
> Mellanox ConnectX-5 IB cards (MT27800) seem to cause a call trace when
> unbound from their regular driver and attached to vfio-pci in order to pass
> them through to a guest.
> 
> This goes away if the disable_idle_d3 option is used, so it looks like a
> problem with the hardware handling D3 state.  To fix that more permanently,
> use a device quirk to disable D3 state for these devices.
> 
> We do this by renaming the existing quirk_no_ata_d3() more generally and
> attaching it to the ConnectX-[45] devices (0x15b3:0x1013).
> 
> Signed-off-by: David Gibson 
> ---
>  drivers/pci/quirks.c | 17 +++--
>  1 file changed, 11 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 4700d24e5d55..add3f516ca12 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -1315,23 +1315,24 @@ static void quirk_ide_samemode(struct pci_dev *pdev)
>  }
>  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82801CA_10, 
> quirk_ide_samemode);
>  
> -/* Some ATA devices break if put into D3 */
> -static void quirk_no_ata_d3(struct pci_dev *pdev)
> +/* Some devices (including a number of ATA cards) break if put into D3 */
> +static void quirk_no_d3(struct pci_dev *pdev)
>  {
>   pdev->dev_flags |= PCI_DEV_FLAGS_NO_D3;
>  }
> +
>  /* Quirk the legacy ATA devices only. The AHCI ones are ok */
>  DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_SERVERWORKS, PCI_ANY_ID,
> - PCI_CLASS_STORAGE_IDE, 8, quirk_no_ata_d3);
> + PCI_CLASS_STORAGE_IDE, 8, quirk_no_d3);
>  DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_ATI, PCI_ANY_ID,
> - PCI_CLASS_STORAGE_IDE, 8, quirk_no_ata_d3);
> + PCI_CLASS_STORAGE_IDE, 8, quirk_no_d3);
>  /* ALi loses some register settings that we cannot then restore */
>  DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_AL, PCI_ANY_ID,
> - PCI_CLASS_STORAGE_IDE, 8, quirk_no_ata_d3);
> + PCI_CLASS_STORAGE_IDE, 8, quirk_no_d3);
>  /* VIA comes back fine but we need to keep it alive or ACPI GTM failures
> occur when mode detecting */
>  DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_VIA, PCI_ANY_ID,
> - PCI_CLASS_STORAGE_IDE, 8, quirk_no_ata_d3);
> + PCI_CLASS_STORAGE_IDE, 8, quirk_no_d3);
>  
>  /*
>   * This was originally an Alpha-specific thing, but it really fits here.
> @@ -3367,6 +3368,10 @@ static void mellanox_check_broken_intx_masking(struct 
> pci_dev *pdev)
>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MELLANOX, PCI_ANY_ID,
>   mellanox_check_broken_intx_masking);
>  
> +/* Mellanox MT27800 (ConnectX-5) IB card seems to break with D3
> + * In particular this shows up when the device is bound to the vfio-pci 
> driver */

Follow usual multiline comment style, i.e.,

  /*
   * text ...
   * more text ...
   */

> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_MELLANOX, 
> PCI_DEVICE_ID_MELLANOX_CONNECTX4, quirk_no_d3)
> +
>  static void quirk_no_bus_reset(struct pci_dev *dev)
>  {
>   dev->dev_flags |= PCI_DEV_FLAGS_NO_BUS_RESET;
> -- 
> 2.19.2
> 


Re: [PATCH 10/10] perf/doc: update design.txt for exclude_{host|guest} flags

2018-12-11 Thread Andrew Murray
On Tue, Dec 11, 2018 at 10:06:53PM +1100, Michael Ellerman wrote:
> [ Reviving old thread. ]
> 
> Andrew Murray  writes:
> > On Tue, Nov 20, 2018 at 10:31:36PM +1100, Michael Ellerman wrote:
> >> Andrew Murray  writes:
> >> 
> >> > Update design.txt to reflect the presence of the exclude_host
> >> > and exclude_guest perf flags.
> >> >
> >> > Signed-off-by: Andrew Murray 
> >> > ---
> >> >  tools/perf/design.txt | 4 
> >> >  1 file changed, 4 insertions(+)
> >> >
> >> > diff --git a/tools/perf/design.txt b/tools/perf/design.txt
> >> > index a28dca2..7de7d83 100644
> >> > --- a/tools/perf/design.txt
> >> > +++ b/tools/perf/design.txt
> >> > @@ -222,6 +222,10 @@ The 'exclude_user', 'exclude_kernel' and 
> >> > 'exclude_hv' bits provide a
> >> >  way to request that counting of events be restricted to times when the
> >> >  CPU is in user, kernel and/or hypervisor mode.
> >> >  
> >> > +Furthermore the 'exclude_host' and 'exclude_guest' bits provide a way
> >> > +to request counting of events restricted to guest and host contexts when
> >> > +using virtualisation.
> >> 
> >> How does exclude_host differ from exclude_hv ?
> >
> > I believe exclude_host / exclude_guest are intented to distinguish
> > between host and guest in the hosted hypervisor context (KVM).
> 
> OK yeah, from the perf-list man page:
> 
>u - user-space counting
>k - kernel counting
>h - hypervisor counting
>I - non idle counting
>G - guest counting (in KVM guests)
>H - host counting (not in KVM guests)
> 
> > Whereas exclude_hv allows to distinguish between guest and
> > hypervisor in the bare-metal type hypervisors.
> 
> Except that's exactly not how we use them on powerpc :)
> 
> We use exclude_hv to exclude "the hypervisor", regardless of whether
> it's KVM or PowerVM (which is a bare-metal hypervisor).
> 
> We don't use exclude_host / exclude_guest at all, which I guess is a
> bug, except I didn't know they existed until this thread.
> 
> eg, in a KVM guest:
> 
>   $ perf record -e cycles:G /bin/bash -c "for i in {0..10}; do :;done"
>   $ perf report -D | grep -Fc "dso: [hypervisor]"
>   16
> 
> 
> > In the case of arm64 - if VHE extensions are present then the host
> > kernel will run at a higher privilege to the guest kernel, in which
> > case there is no distinction between hypervisor and host so we ignore
> > exclude_hv. But where VHE extensions are not present then the host
> > kernel runs at the same privilege level as the guest and we use a
> > higher privilege level to switch between them - in this case we can
> > use exclude_hv to discount that hypervisor role of switching between
> > guests.
> 
> I couldn't find any arm64 perf code using exclude_host/guest at all?

Correct - but this is in flight as I am currently adding support for this
see [1].

> 
> And I don't see any x86 code using exclude_hv.

I can't find any either.

> 
> But maybe that's OK, I just worry this is confusing for users.

There is some extra context regarding this where exclude_guest/exclude_host
was added, see [2] and where exclude_hv was added, see [3]

Generally it seems that exclude_guest/exclude_host relies upon switching
counters off/on on guest/host switch code (which works well in the nested
virt case). Whereas exclude_hv tends to rely solely on hardware capability
based on privilege level (which works well in the bare metal case where
the guest doesn't run at same privilege as the host).

I think from the user perspective exclude_hv allows you to see your overhead
if you are a guest (i.e. work done by bare metal hypervisor associated with
you as the guest). Whereas exclude_guest/exclude_host doesn't allow you to
see events above you (i.e. the kernel hypervisor) if you are the guest...

At least that's how I read this, I've copied in others that may provide
more authoritative feedback.

[1] https://lists.cs.columbia.edu/pipermail/kvmarm/2018-December/033698.html
[2] https://www.spinics.net/lists/kvm/msg53996.html
[3] https://lore.kernel.org/patchwork/patch/143918/

Thanks,

Andrew Murray

> 
> cheers


Re: [v5] powerpc/ptrace: replace ptrace_report_syscall() with a tracehook call

2018-12-11 Thread Michael Ellerman
On Fri, 2018-12-07 at 15:56:05 UTC, "Dmitry V. Levin" wrote:
> From: Elvira Khabirova 
> 
> Arch code should use tracehook_*() helpers, as documented
> in include/linux/tracehook.h,
> ptrace_report_syscall() is not expected to be used outside that file.
> 
> The patch does not look very nice, but at least it is correct
> and opens the way for PTRACE_GET_SYSCALL_INFO API.
> 
> Co-authored-by: Dmitry V. Levin 
> Fixes: 5521eb4bca2d ("powerpc/ptrace: Add support for PTRACE_SYSEMU")
> Cc: Michael Ellerman 
> Cc: Benjamin Herrenschmidt 
> Cc: Paul Mackerras 
> Cc: Oleg Nesterov 
> Cc: Breno Leitao 
> Cc: Andy Lutomirski 
> Cc: Eugene Syromyatnikov 
> Cc: linuxppc-dev@lists.ozlabs.org
> Signed-off-by: Elvira Khabirova 
> Signed-off-by: Dmitry V. Levin 

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/a225f1567405558fb5410e9b2b9080

cheers


Re: [PATCH v03] powerpc/mobility: Fix node detach/rename problem

2018-12-11 Thread Michael Ellerman
Hi Michael,

Please Cc the device tree folks on device tree patches, and also the
original author of the patch that added the code you're modifying.

So I've added:
  robh...@kernel.org
  frowand.l...@gmail.com
  devicet...@vger.kernel.org
  linux-ker...@vger.kernel.org

Michael Bringmann  writes:
> The PPC mobility code receives RTAS requests to delete nodes with
> platform-/hardware-specific attributes when restarting the kernel
> after a migration.  My example is for migration between a P8 Alpine
> and a P8 Brazos.   Nodes to be deleted include 'ibm,random-v1',
> 'ibm,platform-facilities', 'ibm,sym-encryption-v1', and,
> 'ibm,compression-v1'.
>
> The mobility.c code calls 'of_detach_node' for the nodes and their
> children.  This makes calls to detach the properties and to remove
> the associated sysfs/kernfs files.
>
> Then new copies of the same nodes are next provided by the PHYP,
> local copies are built, and a pointer to the 'struct device_node'
> is passed to of_attach_node.  Before the call to of_attach_node,
> the phandle is initialized to 0 when the data structure is alloced.
> During the call to of_attach_node, it calls __of_attach_node which
> pulls the actual name and phandle from just created sub-properties
> named something like 'name' and 'ibm,phandle'.
>
> This is all fine for the first migration.  The problem occurs with
> the second and subsequent migrations when the PHYP on the new system
> wants to replace the same set of nodes again, referenced with the
> same names and phandle values.
>
> On the second and subsequent migrations, the PHYP tells the system
> to again delete the nodes 'ibm,platform-facilities', 'ibm,random-v1',
> 'ibm,compression-v1', 'ibm,sym-encryption-v1'.  It specifies these
> nodes by its known set of phandle values -- the same handles used
> by the PHYP on the source system are known on the target system.
> The mobility.c code calls of_find_node_by_phandle() with these values
> and ends up locating the first instance of each node that was added
> during the original boot, instead of the second instance of each node
> created after the first migration.  The detach during the second
> migration fails with errors like,
>
> [ 4565.030704] WARNING: CPU: 3 PID: 4787 at drivers/of/dynamic.c:252 
> __of_detach_node+0x8/0xa0
> [ 4565.030708] Modules linked in: nfsv3 nfs_acl nfs tcp_diag udp_diag 
> inet_diag unix_diag af_packet_diag netlink_diag lockd grace fscache sunrpc 
> xts vmx_crypto sg pseries_rng binfmt_misc ip_tables xfs libcrc32c sd_mod 
> ibmveth ibmvscsi scsi_transport_srp dm_mirror dm_region_hash dm_log dm_mod
> [ 4565.030733] CPU: 3 PID: 4787 Comm: drmgr Tainted: GW 
> 4.18.0-rc1-wi107836-v05-120+ #201
> [ 4565.030737] NIP:  c07c1ea8 LR: c07c1fb4 CTR: 
> 00655170
> [ 4565.030741] REGS: c003f302b690 TRAP: 0700   Tainted: GW
>   (4.18.0-rc1-wi107836-v05-120+)
> [ 4565.030745] MSR:  80010282b033  
>  CR: 22288822  XER: 000a
> [ 4565.030757] CFAR: c07c1fb0 IRQMASK: 1
> [ 4565.030757] GPR00: c07c1fa4 c003f302b910 c114bf00 
> c0038e68
> [ 4565.030757] GPR04: 0001  80c008e0b4b8 
> 
> [ 4565.030757] GPR08:  0001 8003 
> 2843
> [ 4565.030757] GPR12: 8800 c0001ec9ae00 4000 
> 
> [ 4565.030757] GPR16:  0008  
> f6ff
> [ 4565.030757] GPR20: 0007  c003e9f1f034 
> 0001
> [ 4565.030757] GPR24:    
> 
> [ 4565.030757] GPR28: c1549d28 c1134828 c0038e68 
> c003f302b930
> [ 4565.030804] NIP [c07c1ea8] __of_detach_node+0x8/0xa0
> [ 4565.030808] LR [c07c1fb4] of_detach_node+0x74/0xd0
> [ 4565.030811] Call Trace:
> [ 4565.030815] [c003f302b910] [c07c1fa4] of_detach_node+0x64/0xd0 
> (unreliable)
> [ 4565.030821] [c003f302b980] [c00c33c4] 
> dlpar_detach_node+0xb4/0x150
> [ 4565.030826] [c003f302ba10] [c00c3ffc] delete_dt_node+0x3c/0x80
> [ 4565.030831] [c003f302ba40] [c00c4380] 
> pseries_devicetree_update+0x150/0x4f0
> [ 4565.030836] [c003f302bb70] [c00c479c] 
> post_mobility_fixup+0x7c/0xf0
> [ 4565.030841] [c003f302bbe0] [c00c4908] 
> migration_store+0xf8/0x130
> [ 4565.030847] [c003f302bc70] [c0998160] kobj_attr_store+0x30/0x60
> [ 4565.030852] [c003f302bc90] [c0412f14] sysfs_kf_write+0x64/0xa0
> [ 4565.030857] [c003f302bcb0] [c0411cac] 
> kernfs_fop_write+0x16c/0x240
> [ 4565.030862] [c003f302bd00] [c0355f20] __vfs_write+0x40/0x220
> [ 4565.030867] [c003f302bd90] [c0356358] vfs_write+0xc8/0x240
> [ 4565.030872] [c003f302bde0] [c03566cc] ksys_write+0x5c/0x100
> [ 4565.030880] 

Re: [PATCH] bpf: fix overflow of bpf_jit_limit when PAGE_SIZE >= 64K

2018-12-11 Thread Michael Ellerman
Daniel Borkmann  writes:
< snip >
>
> I would actually just like to get rid of the BPF_JIT_LIMIT_DEFAULT
> define also given for 4.21 arm64 will have its own dedicated area for
> JIT allocations where neither the above limit nor the MODULES_END/
> MODULES_VADDR one would fit and I don't want to make this even more
> ugly with adding further cases into the core. Would the below variant
> work for you?

I haven't actually hit the bug so I won't ack/tested-by this, but it
looks fine to me.

cheers

> From da9daf462d41ce5506c6b6318a9fa3d6d8a64f6c Mon Sep 17 00:00:00 2001
> From: Daniel Borkmann 
> Date: Mon, 10 Dec 2018 14:30:27 +0100
> Subject: [PATCH bpf] bpf: fix bpf_jit_limit knob for PAGE_SIZE >= 64K
>
> Michael and Sandipan report:
>
>   Commit ede95a63b5 introduced a bpf_jit_limit tuneable to limit BPF
>   JIT allocations. At compile time it defaults to PAGE_SIZE * 4,
>   and is adjusted again at init time if MODULES_VADDR is defined.
>
>   For ppc64 kernels, MODULES_VADDR isn't defined, so we're stuck with
>   the compile-time default at boot-time, which is 0x9c40 when
>   using 64K page size. This overflows the signed 32-bit bpf_jit_limit
>   value:
>
>   root@ubuntu:/tmp# cat /proc/sys/net/core/bpf_jit_limit
>   -1673527296
>
>   and can cause various unexpected failures throughout the network
>   stack. In one case `strace dhclient eth0` reported:
>
>   setsockopt(5, SOL_SOCKET, SO_ATTACH_FILTER, {len=11, filter=0x105dd27f8},
>  16) = -1 ENOTSUPP (Unknown error 524)
>
>   and similar failures can be seen with tools like tcpdump. This doesn't
>   always reproduce however, and I'm not sure why. The more consistent
>   failure I've seen is an Ubuntu 18.04 KVM guest booted on a POWER9
>   host would time out on systemd/netplan configuring a virtio-net NIC
>   with no noticeable errors in the logs.
>
> Given this and also given that in near future some architectures like
> arm64 will have a custom area for BPF JIT image allocations we should
> get rid of the BPF_JIT_LIMIT_DEFAULT fallback / default entirely. For
> 4.21, we have an overridable bpf_jit_alloc_exec(), bpf_jit_free_exec()
> so therefore add another overridable bpf_jit_alloc_exec_limit() helper
> function which returns the possible size of the memory area for deriving
> the default heuristic in bpf_jit_charge_init().
>
> Like bpf_jit_alloc_exec() and bpf_jit_free_exec(), the new
> bpf_jit_alloc_exec_limit() assumes that module_alloc() is the default
> JIT memory provider, and therefore in case archs implement their custom
> module_alloc() we use MODULES_{END,_VADDR} for limits and otherwise for
> vmalloc_exec() cases like on ppc64 we use VMALLOC_{END,_START}.
>
> Fixes: ede95a63b5e8 ("bpf: add bpf_jit_limit knob to restrict unpriv 
> allocations")
> Reported-by: Sandipan Das 
> Reported-by: Michael Roth 
> Signed-off-by: Daniel Borkmann 
> ---
>  kernel/bpf/core.c | 19 ++-
>  1 file changed, 14 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index b1a3545..6c2332e 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -365,13 +365,11 @@ void bpf_prog_kallsyms_del_all(struct bpf_prog *fp)
>  }
>
>  #ifdef CONFIG_BPF_JIT
> -# define BPF_JIT_LIMIT_DEFAULT   (PAGE_SIZE * 4)
> -
>  /* All BPF JIT sysctl knobs here. */
>  int bpf_jit_enable   __read_mostly = IS_BUILTIN(CONFIG_BPF_JIT_ALWAYS_ON);
>  int bpf_jit_harden   __read_mostly;
>  int bpf_jit_kallsyms __read_mostly;
> -int bpf_jit_limit__read_mostly = BPF_JIT_LIMIT_DEFAULT;
> +int bpf_jit_limit__read_mostly;
>
>  static __always_inline void
>  bpf_get_prog_addr_region(const struct bpf_prog *prog,
> @@ -580,16 +578,27 @@ int bpf_get_kallsym(unsigned int symnum, unsigned long 
> *value, char *type,
>
>  static atomic_long_t bpf_jit_current;
>
> +/* Can be overridden by an arch's JIT compiler if it has a custom,
> + * dedicated BPF backend memory area, or if neither of the two
> + * below apply.
> + */
> +u64 __weak bpf_jit_alloc_exec_limit(void)
> +{
>  #if defined(MODULES_VADDR)
> + return MODULES_END - MODULES_VADDR;
> +#else
> + return VMALLOC_END - VMALLOC_START;
> +#endif
> +}
> +
>  static int __init bpf_jit_charge_init(void)
>  {
>   /* Only used as heuristic here to derive limit. */
> - bpf_jit_limit = min_t(u64, round_up((MODULES_END - MODULES_VADDR) >> 2,
> + bpf_jit_limit = min_t(u64, round_up(bpf_jit_alloc_exec_limit() >> 2,
>   PAGE_SIZE), INT_MAX);
>   return 0;
>  }
>  pure_initcall(bpf_jit_charge_init);
> -#endif
>
>  static int bpf_jit_charge_modmem(u32 pages)
>  {
> -- 
> 2.9.5


Re: [PATCH] powerpc: Fix bogus usage of MSR_RI on BookE and 40x

2018-12-11 Thread Christophe Leroy




Le 11/12/2018 à 02:57, Benjamin Herrenschmidt a écrit :

BookE and 40x processors lack the MSR:RI bit. However, we have a
few common code places that rely on it.

This fixes it by not defining MSR_RI on those processor types and
using the appropriate ifdef's in those locations.

Signed-off-by: Benjamin Herrenschmidt 
---
  arch/powerpc/include/asm/reg.h   | 2 ++
  arch/powerpc/include/asm/reg_booke.h | 4 ++--
  arch/powerpc/kernel/process.c| 2 +-
  arch/powerpc/kernel/traps.c  | 8 ++--
  arch/powerpc/lib/sstep.c | 2 ++
  5 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index de52c31..41d0b2e 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -110,7 +110,9 @@
  #ifndef MSR_PMM
  #define MSR_PMM   __MASK(MSR_PMM_LG)  /* Performance monitor 
*/
  #endif
+#if !defined(CONFIG_BOOKE) && !defined(CONFIG_4xx)
  #define MSR_RI__MASK(MSR_RI_LG)   /* Recoverable 
Exception */
+#endif
  #define MSR_LE__MASK(MSR_LE_LG)   /* Little Endian */
  
  #define MSR_TM		__MASK(MSR_TM_LG)	/* Transactional Mem Available */

diff --git a/arch/powerpc/include/asm/reg_booke.h 
b/arch/powerpc/include/asm/reg_booke.h
index eb2a33d..06967f1 100644
--- a/arch/powerpc/include/asm/reg_booke.h
+++ b/arch/powerpc/include/asm/reg_booke.h
@@ -46,10 +46,10 @@
  #define MSR_USER32(MSR_ | MSR_PR | MSR_EE)
  #define MSR_USER64(MSR_USER32 | MSR_64BIT)
  #elif defined (CONFIG_40x)
-#define MSR_KERNEL (MSR_ME|MSR_RI|MSR_IR|MSR_DR|MSR_CE)
+#define MSR_KERNEL (MSR_ME|MSR_IR|MSR_DR|MSR_CE)
  #define MSR_USER  (MSR_KERNEL|MSR_PR|MSR_EE)
  #else
-#define MSR_KERNEL (MSR_ME|MSR_RI|MSR_CE)
+#define MSR_KERNEL (MSR_ME|MSR_CE)
  #define MSR_USER  (MSR_KERNEL|MSR_PR|MSR_EE)
  #endif
  
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c

index 96f3473..77679a7 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1359,7 +1359,7 @@ static struct regbit msr_bits[] = {
{MSR_IR,"IR"},
{MSR_DR,"DR"},
{MSR_PMM,   "PMM"},
-#ifndef CONFIG_BOOKE
+#if !defined(CONFIG_BOOKE) && !defined(CONFIG_40x)
{MSR_RI,"RI"},
{MSR_LE,"LE"},
  #endif
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 9a86572..2d00696 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -429,10 +429,11 @@ void system_reset_exception(struct pt_regs *regs)
if (get_paca()->in_nmi > 1)
nmi_panic(regs, "Unrecoverable nested System Reset");
  #endif
+#ifdef MSR_RI
/* Must die if the interrupt is not recoverable */
if (!(regs->msr & MSR_RI))
nmi_panic(regs, "Unrecoverable System Reset");
-
+#endif


Could we have a helper in .h file instead of #ifdefs ?

For instance something like

#if defined(CONFIG_4xx) || defined(CONFIG_BOOKE)
static inline bool exception_is_recoverable(u32 msr)
{
return msr & MSR_RI;
}
#else
static inline bool exception_is_recoverable(u32 msr)
{
return true;
}
#endif

Or reuse the function unrecoverable_excp() defined in xmon.c ?


if (!nested)
nmi_exit();
  
@@ -478,7 +479,9 @@ static inline int check_io_access(struct pt_regs *regs)

printk(KERN_DEBUG "%s bad port %lx at %p\n",
   (*nip & 0x100)? "OUT to": "IN from",
   regs->gpr[rb] - _IO_BASE, nip);
+#ifdef MSR_RI
regs->msr |= MSR_RI;
+#endif


Same here, a helper, something like that, to also be used in 
arch/powerpc/sysdev/fsl_rio.c ?


#if defined(CONFIG_4xx) || defined(CONFIG_BOOKE)
static inline void exception_set_recoverable(struct pt_regs *regs)
{
regs->msr |= MSR_RI;
}
#else
static inline void exception_set_recoverable(struct pt_regs *regs)
{
}
#endif


regs->nip = extable_fixup(entry);
return 1;
}
@@ -763,10 +766,11 @@ void machine_check_exception(struct pt_regs *regs)
if (check_io_access(regs))
goto bail;
  
+#ifdef MSR_RI

/* Must die if the interrupt is not recoverable */
if (!(regs->msr & MSR_RI))
nmi_panic(regs, "Unrecoverable Machine check");
-
+#endif


Same


if (!nested)
nmi_exit();
  
diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c

index d81568f..c03c453 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -3059,9 +3059,11 @@ int emulate_step(struct pt_regs *regs, unsigned int 
instr)
  
  	case MTMSR:

val = regs->gpr[op.reg];
+#ifdef MSR_RI
if ((val & MSR_RI) == 0)


Could be something like the following instead of this #ifdef stuff ?

		if (!IS_ENABLED(CONFIG_4xx) && 

[RFC PATCH v1 3/3] powerpc/8xx: Enable CONFIG_VMAP_STACK

2018-12-11 Thread Christophe Leroy
This patch enables CONFIG_VMAP_STACK. For that, a few changes are
done in head_8xx.S to re-activation DATA MMU Translation before
accessing to the stack.

Due to the growing of exception prolog, a few rearrangement is also
done in a few exception handlers.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/Kconfig   |  1 +
 arch/powerpc/kernel/head_8xx.S | 87 +-
 2 files changed, 79 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 94b46624068d..323b8a1efb3e 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -180,6 +180,7 @@ config PPC
select HAVE_ARCH_MMAP_RND_COMPAT_BITS   if COMPAT
select HAVE_ARCH_SECCOMP_FILTER
select HAVE_ARCH_TRACEHOOK
+   select HAVE_ARCH_VMAP_STACK if PPC_8xx
select HAVE_CBPF_JITif !PPC64
select HAVE_STACKPROTECTOR  if PPC64 && 
$(cc-option,-mstack-protector-guard=tls -mstack-protector-guard-reg=r13)
select HAVE_STACKPROTECTOR  if PPC32 && 
$(cc-option,-mstack-protector-guard=tls -mstack-protector-guard-reg=r2)
diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 48996a424075..ded66a6fdfeb 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -136,6 +136,53 @@ instruction_counter:
EXCEPTION_PROLOG_1; \
EXCEPTION_PROLOG_2
 
+#ifdef CONFIG_VMAP_STACK
+#define EXCEPTION_PROLOG_1 \
+   mtspr   SPRN_SPRG_SCRATCH2, r12;\
+   mfspr   r12, SPRN_SPRG_THREAD;  \
+   mfspr   r11, SPRN_SRR0; \
+   stw r11, SRR0(r12); \
+   mfspr   r11, SPRN_DAR;  \
+   stw r11, DAR(r12);  \
+   mfspr   r11,SPRN_SRR1;  /* check whether user or kernel */ \
+   stw r11, SRR1(r12); \
+   andi.   r11,r11,MSR_PR
+
+#define EXCEPTION_PROLOG_2 \
+   li  r11, MSR_KERNEL & ~(MSR_IR | MSR_RI); /* can take DTLB miss */ \
+   mtmsr   r11;\
+   tovirt(r12, r12);   \
+   subir11, r1, INT_FRAME_SIZE;/* use r1 if kernel */ \
+   beq 1f; \
+   lwz r11, TASK_STACK-THREAD(r12);\
+   addir11, r11, THREAD_SIZE - INT_FRAME_SIZE; \
+1: stw r10,_CCR(r11);  /* save registers */ \
+   stw r9,GPR9(r11);   \
+   mfspr   r10,SPRN_SPRG_SCRATCH0; \
+   stw r10,GPR10(r11); \
+   mfspr   r10,SPRN_SPRG_SCRATCH1; \
+   stw r10,GPR11(r11); \
+   mfspr   r10,SPRN_SPRG_SCRATCH2; \
+   stw r10,GPR12(r11); \
+   mflrr10;\
+   stw r10,_LINK(r11); \
+   lwz r10, DAR(r12);  \
+   stw r10, _DAR(r11); \
+   lwz r9, SRR1(r12);  \
+   lwz r12, SRR0(r12); \
+   stw r1,GPR1(r11);   \
+   stw r1,0(r11);  \
+   mr  r1, r11;/* set new kernel sp */ \
+   li  r10,MSR_KERNEL & ~MSR_IR; /* can take exceptions */ \
+   mtmsr   r10;\
+   stw r0,GPR0(r11);   \
+   lis r10, STACK_FRAME_REGS_MARKER@ha; /* exception frame marker */ \
+   addir10, r10, STACK_FRAME_REGS_MARKER@l; \
+   stw r10, 8(r11);\
+   SAVE_4GPRS(3, r11); \
+   SAVE_2GPRS(7, r11)
+
+#else
 #define EXCEPTION_PROLOG_1 \
mfspr   r11,SPRN_SRR1;  /* check whether user or kernel */ \
andi.   r11,r11,MSR_PR; \
@@ -172,6 +219,8 @@ instruction_counter:
SAVE_4GPRS(3, r11); \
SAVE_2GPRS(7, r11)
 
+#endif
+
 /*
  * Note: code which follows this uses cr0.eq (set if from kernel),
  * r11, r12 (SRR0), and r9 (SRR1).
@@ -226,8 +275,12 @@ i##n:  
\
. = 0x200
 MachineCheck:
EXCEPTION_PROLOG
+#ifdef CONFIG_VMAP_STACK
+   lwz r4, _DAR(r11)
+#else
mfspr r4,SPRN_DAR
stw r4,_DAR(r11)
+#endif
li r5,RPN_PATTERN
mtspr SPRN_DAR,r5   /* Tag DAR, to be used in DTLB Error */
mfspr r5,SPRN_DSISR
@@ -254,8 +307,12 @@ InstructionAccess:
. = 0x600
 Alignment:
EXCEPTION_PROLOG
+#ifdef CONFIG_VMAP_STACK
+   lwz r4, _DAR(r11)
+#else
mfspr   r4,SPRN_DAR
stw r4,_DAR(r11)
+#endif
li  r5,RPN_PATTERN
mtspr   SPRN_DAR,r5 /* Tag DAR, to be used in DTLB Error */
mfspr   r5,SPRN_DSISR
@@ -573,20 +630,31 @@ DataTLBError:
beq-FixupDAR/* must be a buggy dcbX, icbi insn. */
 DARFixed:/* Return from dcbx instruction bug workaround */
EXCEPTION_PROLOG_1
+#ifdef CONFIG_VMAP_STACK
+   li  r11, RPN_PATTERN
+   mtspr   SPRN_DAR, r11   /* Tag DAR, to be used in DTLB Error */
+#endif
EXCEPTION_PROLOG_2
mfspr   r5,SPRN_DSISR
stw r5,_DSISR(r11)
+#ifdef CONFIG_VMAP_STACK
+   lwz r4, _DAR(r11)
+#else
mfspr   r4,SPRN_DAR
+#endif
andis.  

[RFC PATCH v1 2/3] powerpc/8xx: Use alternative scratch registers in DTLB miss handler

2018-12-11 Thread Christophe Leroy
In preparation of handling CONFIG_VMAP_STACK, we need DTLB miss handler
to use different scratch registers than other exception handlers in
order to not jeopardise exception entry on stack DTLB misses.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/head_8xx.S | 27 ++-
 arch/powerpc/perf/8xx-pmu.c| 12 
 2 files changed, 22 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 5f5f89e87e3a..48996a424075 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -421,8 +421,8 @@ ITLBMissLinear:
 
. = 0x1200
 DataStoreTLBMiss:
-   mtspr   SPRN_SPRG_SCRATCH0, r10
-   mtspr   SPRN_SPRG_SCRATCH1, r11
+   mtspr   SPRN_DAR, r10
+   mtspr   SPRN_M_TW, r11
mfcrr11
 
/* If we are faulting a kernel address, we have to use the
@@ -487,10 +487,10 @@ DataStoreTLBMiss:
mtspr   SPRN_MD_RPN, r10/* Update TLB entry */
 
/* Restore registers */
-   mtspr   SPRN_DAR, r11   /* Tag DAR */
 
-0: mfspr   r10, SPRN_SPRG_SCRATCH0
-   mfspr   r11, SPRN_SPRG_SCRATCH1
+0: mfspr   r10, SPRN_DAR
+   mtspr   SPRN_DAR, r11   /* Tag DAR */
+   mfspr   r11, SPRN_M_TW
rfi
patch_site  0b, patch__dtlbmiss_exit_1
 
@@ -499,8 +499,9 @@ DataStoreTLBMiss:
 0: lwz r10, (dtlb_miss_counter - PAGE_OFFSET)@l(0)
addir10, r10, 1
stw r10, (dtlb_miss_counter - PAGE_OFFSET)@l(0)
-   mfspr   r10, SPRN_SPRG_SCRATCH0
-   mfspr   r11, SPRN_SPRG_SCRATCH1
+   mfspr   r10, SPRN_DAR
+   mtspr   SPRN_DAR, r11   /* Tag DAR */
+   mfspr   r11, SPRN_M_TW
rfi
 #endif
 
@@ -516,10 +517,10 @@ DTLBMissIMMR:
mtspr   SPRN_MD_RPN, r10/* Update TLB entry */
 
li  r11, RPN_PATTERN
-   mtspr   SPRN_DAR, r11   /* Tag DAR */
 
-0: mfspr   r10, SPRN_SPRG_SCRATCH0
-   mfspr   r11, SPRN_SPRG_SCRATCH1
+0: mfspr   r10, SPRN_DAR
+   mtspr   SPRN_DAR, r11   /* Tag DAR */
+   mfspr   r11, SPRN_M_TW
rfi
patch_site  0b, patch__dtlbmiss_exit_2
 
@@ -534,10 +535,10 @@ DTLBMissLinear:
mtspr   SPRN_MD_RPN, r10/* Update TLB entry */
 
li  r11, RPN_PATTERN
-   mtspr   SPRN_DAR, r11   /* Tag DAR */
 
-0: mfspr   r10, SPRN_SPRG_SCRATCH0
-   mfspr   r11, SPRN_SPRG_SCRATCH1
+0: mfspr   r10, SPRN_DAR
+   mtspr   SPRN_DAR, r11   /* Tag DAR */
+   mfspr   r11, SPRN_M_TW
rfi
patch_site  0b, patch__dtlbmiss_exit_3
 
diff --git a/arch/powerpc/perf/8xx-pmu.c b/arch/powerpc/perf/8xx-pmu.c
index e38f74e9e7a4..4556c8837575 100644
--- a/arch/powerpc/perf/8xx-pmu.c
+++ b/arch/powerpc/perf/8xx-pmu.c
@@ -161,10 +161,6 @@ static void mpc8xx_pmu_read(struct perf_event *event)
 
 static void mpc8xx_pmu_del(struct perf_event *event, int flags)
 {
-   /* mfspr r10, SPRN_SPRG_SCRATCH0 */
-   unsigned int insn = PPC_INST_MFSPR | __PPC_RS(R10) |
-   __PPC_SPR(SPRN_SPRG_SCRATCH0);
-
mpc8xx_pmu_read(event);
 
/* If it was the last user, stop counting to avoid useles overhead */
@@ -177,6 +173,10 @@ static void mpc8xx_pmu_del(struct perf_event *event, int 
flags)
break;
case PERF_8xx_ID_ITLB_LOAD_MISS:
if (atomic_dec_return(_miss_ref) == 0) {
+   /* mfspr r10, SPRN_SPRG_SCRATCH0 */
+   unsigned int insn = PPC_INST_MFSPR | __PPC_RS(R10) |
+   __PPC_SPR(SPRN_SPRG_SCRATCH0);
+
patch_instruction_site(__itlbmiss_exit_1, insn);
 #ifndef CONFIG_PIN_TLB_TEXT
patch_instruction_site(__itlbmiss_exit_2, insn);
@@ -185,6 +185,10 @@ static void mpc8xx_pmu_del(struct perf_event *event, int 
flags)
break;
case PERF_8xx_ID_DTLB_LOAD_MISS:
if (atomic_dec_return(_miss_ref) == 0) {
+   /* mfspr r10, SPRN_DAR */
+   unsigned int insn = PPC_INST_MFSPR | __PPC_RS(R10) |
+   __PPC_SPR(SPRN_DAR);
+
patch_instruction_site(__dtlbmiss_exit_1, insn);
patch_instruction_site(__dtlbmiss_exit_2, insn);
patch_instruction_site(__dtlbmiss_exit_3, insn);
-- 
2.13.3



[RFC PATCH v1 1/3] powerpc/32: prepare for CONFIG_VMAP_STACK

2018-12-11 Thread Christophe Leroy
To support CONFIG_VMAP_STACK, the kernel must be able to activate
Data MMU Translation for accessing the stack. Before doing that
it must save SRR0, SRR1 and DAR in order to not loose them in
case there is a Data TLB Miss once the translation is reactivated.

This patch defines fields in the thread struct for saving those
registers. It also prepares entry_32.S to handle exception entry
with Data MMU Translation enabled.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/processor.h   |  5 +
 arch/powerpc/include/asm/thread_info.h |  5 +
 arch/powerpc/kernel/asm-offsets.c  |  5 +
 arch/powerpc/kernel/entry_32.S | 16 
 4 files changed, 31 insertions(+)

diff --git a/arch/powerpc/include/asm/processor.h 
b/arch/powerpc/include/asm/processor.h
index 8179b64871ed..e839a1231b17 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -218,6 +218,11 @@ struct thread_struct {
 #ifdef CONFIG_PPC32
void*pgdir; /* root of page-table tree */
unsigned long   ksp_limit;  /* if ksp <= ksp_limit stack overflow */
+#ifdef CONFIG_VMAP_STACK
+   unsigned long   dar;
+   unsigned long   srr0;
+   unsigned long   srr1;
+#endif
 #endif
/* Debug Registers */
struct debug_reg debug;
diff --git a/arch/powerpc/include/asm/thread_info.h 
b/arch/powerpc/include/asm/thread_info.h
index 8e1d0195ac36..488d5c4670ff 100644
--- a/arch/powerpc/include/asm/thread_info.h
+++ b/arch/powerpc/include/asm/thread_info.h
@@ -10,10 +10,15 @@
 #define _ASM_POWERPC_THREAD_INFO_H
 
 #include 
+#include 
 
 #ifdef __KERNEL__
 
+#if defined(CONFIG_VMAP_STACK) && CONFIG_THREAD_SHIFT < PAGE_SHIFT
+#define THREAD_SHIFT   PAGE_SHIFT
+#else
 #define THREAD_SHIFT   CONFIG_THREAD_SHIFT
+#endif
 
 #define THREAD_SIZE(1 << THREAD_SHIFT)
 
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 03439785c2ea..985523ef23e8 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -128,6 +128,11 @@ int main(void)
OFFSET(KSP_VSID, thread_struct, ksp_vsid);
 #else /* CONFIG_PPC64 */
OFFSET(PGDIR, thread_struct, pgdir);
+#ifdef CONFIG_VMAP_STACK
+   OFFSET(SRR0, thread_struct, srr0);
+   OFFSET(SRR1, thread_struct, srr1);
+   OFFSET(DAR, thread_struct, dar);
+#endif
 #ifdef CONFIG_SPE
OFFSET(THREAD_EVR0, thread_struct, evr[0]);
OFFSET(THREAD_ACC, thread_struct, acc);
diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index 52a061f14c7d..6e2c45fdd2c0 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -150,8 +150,13 @@ transfer_to_handler:
stw r12,_CTR(r11)
stw r2,_XER(r11)
mfspr   r12,SPRN_SPRG_THREAD
+#ifdef CONFIG_VMAP_STACK
+   tovirt(r12, r12)
+#endif
addir2,r12,-THREAD
+#ifndef CONFIG_VMAP_STACK
tovirt(r2,r2)   /* set r2 to current */
+#endif
beq 2f  /* if from user, fix up THREAD.regs */
addir11,r1,STACK_FRAME_OVERHEAD
stw r11,PT_REGS(r12)
@@ -179,9 +184,13 @@ transfer_to_handler:
stw r12,4(r11)
 #endif
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
+#ifdef CONFIG_VMAP_STACK
+   ACCOUNT_CPU_USER_ENTRY(r2, r11, r12)
+#else
tophys(r9, r2)
ACCOUNT_CPU_USER_ENTRY(r9, r11, r12)
 #endif
+#endif
 
b   3f
 
@@ -193,8 +202,12 @@ transfer_to_handler:
ble-stack_ovf   /* then the kernel stack overflowed */
 5:
 #if defined(CONFIG_PPC_BOOK3S_32) || defined(CONFIG_E500)
+#ifdef CONFIG_VMAP_STACK
+   lwz r12, TI_LOCAL_FLAGS(r2)
+#else
tophys(r9,r2)   /* check local flags */
lwz r12,TI_LOCAL_FLAGS(r9)
+#endif
mtcrf   0x01,r12
bt- 31-TLF_NAPPING,4f
bt- 31-TLF_SLEEPING,7f
@@ -203,6 +216,9 @@ transfer_to_handler:
 transfer_to_handler_cont:
 3:
mflrr9
+#ifdef CONFIG_VMAP_STACK
+   tovirt(r9,r9)
+#endif
lwz r11,0(r9)   /* virtual address of handler */
lwz r9,4(r9)/* where to go when done */
 #if defined(CONFIG_PPC_8xx) && defined(CONFIG_PERF_EVENTS)
-- 
2.13.3



[RFC PATCH v1 0/3] Enable CONFIG_VMAP_STACK on the 8xx

2018-12-11 Thread Christophe Leroy
The purpose of this serie is to enable CONFIG_VMAP_STACK on the 8xx.

I'm sending it now to get early feedback if any.
For the time being, there is no proper handling of stack overflow.

This serie applies on the top of the serie "powerpc: Switch to 
CONFIG_THREAD_INFO_IN_TASK"

Christophe Leroy (3):
  powerpc/32: prepare for CONFIG_VMAP_STACK
  powerpc/8xx: Use alternative scratch registers in DTLB miss handler
  powerpc/8xx: Enable CONFIG_VMAP_STACK

 arch/powerpc/Kconfig   |   1 +
 arch/powerpc/include/asm/processor.h   |   5 ++
 arch/powerpc/include/asm/thread_info.h |   5 ++
 arch/powerpc/kernel/asm-offsets.c  |   5 ++
 arch/powerpc/kernel/entry_32.S |  16 +
 arch/powerpc/kernel/head_8xx.S | 114 ++---
 arch/powerpc/perf/8xx-pmu.c|  12 ++--
 7 files changed, 132 insertions(+), 26 deletions(-)

-- 
2.13.3



Re: [PATCH 10/10] perf/doc: update design.txt for exclude_{host|guest} flags

2018-12-11 Thread Michael Ellerman
[ Reviving old thread. ]

Andrew Murray  writes:
> On Tue, Nov 20, 2018 at 10:31:36PM +1100, Michael Ellerman wrote:
>> Andrew Murray  writes:
>> 
>> > Update design.txt to reflect the presence of the exclude_host
>> > and exclude_guest perf flags.
>> >
>> > Signed-off-by: Andrew Murray 
>> > ---
>> >  tools/perf/design.txt | 4 
>> >  1 file changed, 4 insertions(+)
>> >
>> > diff --git a/tools/perf/design.txt b/tools/perf/design.txt
>> > index a28dca2..7de7d83 100644
>> > --- a/tools/perf/design.txt
>> > +++ b/tools/perf/design.txt
>> > @@ -222,6 +222,10 @@ The 'exclude_user', 'exclude_kernel' and 'exclude_hv' 
>> > bits provide a
>> >  way to request that counting of events be restricted to times when the
>> >  CPU is in user, kernel and/or hypervisor mode.
>> >  
>> > +Furthermore the 'exclude_host' and 'exclude_guest' bits provide a way
>> > +to request counting of events restricted to guest and host contexts when
>> > +using virtualisation.
>> 
>> How does exclude_host differ from exclude_hv ?
>
> I believe exclude_host / exclude_guest are intented to distinguish
> between host and guest in the hosted hypervisor context (KVM).

OK yeah, from the perf-list man page:

   u - user-space counting
   k - kernel counting
   h - hypervisor counting
   I - non idle counting
   G - guest counting (in KVM guests)
   H - host counting (not in KVM guests)

> Whereas exclude_hv allows to distinguish between guest and
> hypervisor in the bare-metal type hypervisors.

Except that's exactly not how we use them on powerpc :)

We use exclude_hv to exclude "the hypervisor", regardless of whether
it's KVM or PowerVM (which is a bare-metal hypervisor).

We don't use exclude_host / exclude_guest at all, which I guess is a
bug, except I didn't know they existed until this thread.

eg, in a KVM guest:

  $ perf record -e cycles:G /bin/bash -c "for i in {0..10}; do :;done"
  $ perf report -D | grep -Fc "dso: [hypervisor]"
  16


> In the case of arm64 - if VHE extensions are present then the host
> kernel will run at a higher privilege to the guest kernel, in which
> case there is no distinction between hypervisor and host so we ignore
> exclude_hv. But where VHE extensions are not present then the host
> kernel runs at the same privilege level as the guest and we use a
> higher privilege level to switch between them - in this case we can
> use exclude_hv to discount that hypervisor role of switching between
> guests.

I couldn't find any arm64 perf code using exclude_host/guest at all?

And I don't see any x86 code using exclude_hv.

But maybe that's OK, I just worry this is confusing for users.

cheers


Re: [PATCH v2] cpufreq: powernv: add of_node_put()

2018-12-11 Thread Rafael J. Wysocki
On Wednesday, November 21, 2018 5:02:04 AM CET Viresh Kumar wrote:
> On 20-11-18, 11:05, Yangtao Li wrote:
> > The of_find_node_by_path() returns a node pointer with refcount
> > incremented,but there is the lack of use of the of_node_put() when
> > done.Add the missing of_node_put() to release the refcount.
> > 
> > Signed-off-by: Yangtao Li 
> > ---
> > Changes in v2
> > -update changelog
> > ---
> >  drivers/cpufreq/powernv-cpufreq.c | 17 +++--
> >  1 file changed, 11 insertions(+), 6 deletions(-)
> 
> Acked-by: Viresh Kumar 

Patch applied, thanks!



Re: [PATCH v2] cpufreq: pmac64: add of_node_put()

2018-12-11 Thread Rafael J. Wysocki
On Monday, November 26, 2018 7:02:26 AM CET Viresh Kumar wrote:
> On 23-11-18, 08:33, Yangtao Li wrote:
> > of_find_node_by_path() acquires a reference to the node
> > returned by it and that reference needs to be dropped by its caller.
> > g5_neo2_cpufreq_init() doesn't do that, so fix it.
> > 
> > Signed-off-by: Yangtao Li 
> > Acked-by: Viresh Kumar 
> > ---
> > Changes in v2:
> > -update changelog
> > ---
> >  drivers/cpufreq/pmac64-cpufreq.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/drivers/cpufreq/pmac64-cpufreq.c 
> > b/drivers/cpufreq/pmac64-cpufreq.c
> > index be623dd7b9f2..1d32a863332d 100644
> > --- a/drivers/cpufreq/pmac64-cpufreq.c
> > +++ b/drivers/cpufreq/pmac64-cpufreq.c
> > @@ -411,6 +411,7 @@ static int __init g5_neo2_cpufreq_init(struct 
> > device_node *cpunode)
> > pfunc_set_vdnap0 = pmf_find_function(root, "set-vdnap0");
> > pfunc_vdnap0_complete =
> > pmf_find_function(root, "slewing-done");
> > +   of_node_put(root);
> > if (pfunc_set_vdnap0 == NULL ||
> > pfunc_vdnap0_complete == NULL) {
> > pr_err("Can't find required platform function\n");
> 
> Acked-by: Viresh Kumar 

Patch applied, thanks!



Re: [PATCH] ocxl: Simplify free_spa()

2018-12-11 Thread Greg Kurz
On Tue, 11 Dec 2018 20:13:21 +1100
Andrew Donnellan  wrote:

> On 11/12/18 7:57 pm, Greg Kurz wrote:
> > I now realize that I should have mentioned the real motivation for this
> > change. I'm working on refactoring the code so that we can use ocxl in a
> > KVM guest. The concept of link can be shared by both powernv and pseries
> > variants but the SPA is definitely a powernv only thingy. The benefit
> > of this patch is hence to kick 'struct link' out of free_spa() so that
> > it can be utimately moved to powernv specific code.
> > 
> > The initial version of this change was just moving the link->spa check
> > and link->spa = NULL to the callers, where it was quite obvious they're
> > not needed... Should I re-post this as two patches for clarity ?  
> 
> Ah, that makes much more sense.
> 
> If that's the case, then why not go all the way and change the function 
> signature while you're at it?
> 

Sure, will do.


Re: use generic DMA mapping code in powerpc V4

2018-12-11 Thread Rui Salvaterra
On Mon, 10 Dec 2018 at 20:49, Benjamin Herrenschmidt
 wrote:
>

[snip]

>
> AGP is a gigantic nightmare :-) It's not just cache coherency issues
> (some implementations are coherent, some aren't, Apple's is ... weird).
>
> Apple has all sort of bugs, and Darwin source code only sheds light on
> some of them. Some implementation can only read, not write I think, for
> example. There are issues with transfers crossing some boundaries I
> beleive, but it's all unclear.
>
> Apple makes this work with a combination of hacks in the AGP "driver"
> and the closed source GPU driver, which we don't see.
>
> I have given up trying to make that stuff work reliably a decade ago :)
>
> Cheers,
> Ben.

That's what I was afraid of… what a mess. At least now I have a
definitive answer from one of the original authors of the code, thanks
a lot, Ben. :)
I have an unresearched belief that AGP support was hacked in the Mac
series as an afterthought (weren't they supposed to be PCI/PCI-X
only?), and your explanation surely seems to corroborate. :/


Re: [PATCH] crypto: talitos - fix ablkcipher for CONFIG_VMAP_STACK

2018-12-11 Thread Herbert Xu
On Tue, Dec 11, 2018 at 07:29:40AM +, Christophe Leroy wrote:
>
> diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c
> index 6988012deca4..385ec970b639 100644
> --- a/drivers/crypto/talitos.c
> +++ b/drivers/crypto/talitos.c
> @@ -1668,8 +1668,11 @@ static struct talitos_edesc 
> *ablkcipher_edesc_alloc(struct ablkcipher_request *
>   struct talitos_ctx *ctx = crypto_ablkcipher_ctx(cipher);
>   unsigned int ivsize = crypto_ablkcipher_ivsize(cipher);
>  
> + if (ivsize)
> + memcpy(ctx->iv, areq->info, ivsize);

The ctx is per-tfm, not per-request.  So you cannot write to it.
This needs to go into a pre-request area.

>   return talitos_edesc_alloc(ctx->dev, areq->src, areq->dst,
> -areq->info, 0, areq->nbytes, 0, ivsize, 0,
> +ctx->iv, 0, areq->nbytes, 0, ivsize, 0,
>  areq->base.flags, encrypt);
>  }

Thanks,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH] ocxl: Simplify free_spa()

2018-12-11 Thread Andrew Donnellan

On 11/12/18 7:57 pm, Greg Kurz wrote:

I now realize that I should have mentioned the real motivation for this
change. I'm working on refactoring the code so that we can use ocxl in a
KVM guest. The concept of link can be shared by both powernv and pseries
variants but the SPA is definitely a powernv only thingy. The benefit
of this patch is hence to kick 'struct link' out of free_spa() so that
it can be utimately moved to powernv specific code.

The initial version of this change was just moving the link->spa check
and link->spa = NULL to the callers, where it was quite obvious they're
not needed... Should I re-post this as two patches for clarity ?


Ah, that makes much more sense.

If that's the case, then why not go all the way and change the function 
signature while you're at it?


--
Andrew Donnellan  OzLabs, ADL Canberra
andrew.donnel...@au1.ibm.com  IBM Australia Limited



Re: [PATCH] ocxl: Simplify free_spa()

2018-12-11 Thread Greg Kurz
On Tue, 11 Dec 2018 12:05:49 +1100
Andrew Donnellan  wrote:

> On 11/12/18 2:15 am, Greg Kurz wrote:
> > The only users of free_spa() are alloc_link() and free_link(), and
> > in both cases:
> > 
> > - link->spa != NULL
> > 
> > - free_spa(link) is immediatly followed by kfree(link)
> > 
> > The check isn't needed, and it doesn't bring much to clear the link->spa
> > pointer. Drop both.
> > 
> > Signed-off-by: Greg Kurz   
> 
> I like defensive programming but for this case I don't really care too 
> much either way
> 

I now realize that I should have mentioned the real motivation for this
change. I'm working on refactoring the code so that we can use ocxl in a
KVM guest. The concept of link can be shared by both powernv and pseries
variants but the SPA is definitely a powernv only thingy. The benefit
of this patch is hence to kick 'struct link' out of free_spa() so that
it can be utimately moved to powernv specific code.

The initial version of this change was just moving the link->spa check
and link->spa = NULL to the callers, where it was quite obvious they're
not needed... Should I re-post this as two patches for clarity ?

> Acked-by: Andrew Donnellan 
> 
> > ---
> >   drivers/misc/ocxl/link.c |7 ++-
> >   1 file changed, 2 insertions(+), 5 deletions(-)
> > 
> > diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c
> > index 31695a078485..eed92055184d 100644
> > --- a/drivers/misc/ocxl/link.c
> > +++ b/drivers/misc/ocxl/link.c
> > @@ -352,11 +352,8 @@ static void free_spa(struct link *link)
> > pr_debug("Freeing SPA for %x:%x:%x\n", link->domain, link->bus,
> > link->dev);
> >   
> > -   if (spa && spa->spa_mem) {
> > -   free_pages((unsigned long) spa->spa_mem, spa->spa_order);
> > -   kfree(spa);
> > -   link->spa = NULL;
> > -   }
> > +   free_pages((unsigned long) spa->spa_mem, spa->spa_order);
> > +   kfree(spa);
> >   }
> >   
> >   static int alloc_link(struct pci_dev *dev, int PE_mask, struct link 
> > **out_link)
> >   
> 



Re: [PATCH] ocxl: Fix endiannes bug in read_afu_name()

2018-12-11 Thread Greg Kurz
On Tue, 11 Dec 2018 11:24:08 +1100
Andrew Donnellan  wrote:

> On 11/12/18 11:05 am, Andrew Donnellan wrote:
> > On 11/12/18 2:10 am, Greg Kurz wrote:  
> >> The double word returned by read_afu_info(OCXL_DVSEC_TEMPL_NAME) contains
> >> four characters of the AFU name, read from the PCI config space, hence
> >> with a little-endian ordering. When composing the string, a big-endian
> >> system must swap the bytes so that the characters appear in the right
> >> order.
> >>
> >> Do this with le32_to_cpu().
> >>
> >> Signed-off-by: Greg Kurz   
> > 
> > snowpatch reports the following sparse warning:
> > 
> > +drivers/misc/ocxl/config.c:321:24: warning: cast to restricted __le32
> > 
> > You probably need to change val from a u32 to a __le32.  
> 

You might be right, I'll look into this.

> Also does this need to go to stable?
> 

Oops... this bug has been there since the beginning, so yes
it does. I simply forgot to add the Cc: stable tag... :-\

Cheers,

--
Greg