Re: [PATCH 2/4] samples/bpf: Use llc in PATH, rather than a hardcoded value

2016-04-01 Thread Naveen N. Rao
On 2016/03/31 08:19PM, Daniel Borkmann wrote:
> On 03/31/2016 07:46 PM, Alexei Starovoitov wrote:
> >On 3/31/16 4:25 AM, Naveen N. Rao wrote:
> >>  clang $(NOSTDINC_FLAGS) $(LINUXINCLUDE) $(EXTRA_CFLAGS) \
> >>  -D__KERNEL__ -D__ASM_SYSREG_H -Wno-unused-value -Wno-pointer-sign \
> >>--O2 -emit-llvm -c $< -o -| $(LLC) -march=bpf -filetype=obj -o $@
> >>+-O2 -emit-llvm -c $< -o -| llc -march=bpf -filetype=obj -o $@
> >>  clang $(NOSTDINC_FLAGS) $(LINUXINCLUDE) $(EXTRA_CFLAGS) \
> >>  -D__KERNEL__ -D__ASM_SYSREG_H -Wno-unused-value -Wno-pointer-sign \
> >>--O2 -emit-llvm -c $< -o -| $(LLC) -march=bpf -filetype=asm -o $@.s
> >>+-O2 -emit-llvm -c $< -o -| llc -march=bpf -filetype=asm -o $@.s
> >
> >that was a workaround when clang/llvm didn't have bpf support.
> >Now clang 3.7 and 3.8 have bpf built-in, so it makes sense to remove
> >manual calls to llc completely.
> >Just use 'clang -target bpf -O2 -D... -c $< -o $@'
> 
> +1, the clang part in that Makefile should also more correctly be called
> with '-target bpf' as it turns out (despite llc with '-march=bpf' ...).
> Better to use clang directly as suggested by Alexei.

I'm likely missing something obvious, but I cannot get this to work.  
With this diff:

 $(obj)/%.o: $(src)/%.c
	clang $(NOSTDINC_FLAGS) $(LINUXINCLUDE) $(EXTRA_CFLAGS) \
		-D__KERNEL__ -D__ASM_SYSREG_H -Wno-unused-value -Wno-pointer-sign \
-		-O2 -emit-llvm -c $< -o -| $(LLC) -march=bpf -filetype=obj -o $@
-	clang $(NOSTDINC_FLAGS) $(LINUXINCLUDE) $(EXTRA_CFLAGS) \
-		-D__KERNEL__ -D__ASM_SYSREG_H -Wno-unused-value -Wno-pointer-sign \
-		-O2 -emit-llvm -c $< -o -| $(LLC) -march=bpf -filetype=asm -o $@.s
+		-O2 -target bpf -c $< -o $@

I see far too many errors thrown starting with:

clang  -nostdinc -isystem 
/usr/lib/gcc/x86_64-redhat-linux/4.8.2/include 
-I./arch/x86/include -Iarch/x86/include/generated/uapi 
-Iarch/x86/include/generated  -Iinclude 
-I./arch/x86/include/uapi -Iarch/x86/include/generated/uapi 
-I./include/uapi -Iinclude/generated/uapi -include 
./include/linux/kconfig.h  \
-D__KERNEL__ -D__ASM_SYSREG_H -Wno-unused-value 
-Wno-pointer-sign \
-O2 -target bpf -c samples/bpf/map_perf_test_kern.c -o 
samples/bpf/map_perf_test_kern.o
In file included from samples/bpf/map_perf_test_kern.c:7:
In file included from include/linux/skbuff.h:17:
In file included from include/linux/kernel.h:10:
In file included from include/linux/bitops.h:36:
In file included from ./arch/x86/include/asm/bitops.h:500:
./arch/x86/include/asm/arch_hweight.h:31:10: error: invalid output 
constraint '=a' in asm
 : "="REG_OUT (res)
   ^
./arch/x86/include/asm/arch_hweight.h:59:10: error: invalid output 
constraint '=a' in asm
 : "="REG_OUT (res)


What am I missing?


- Naveen

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 4/4] samples/bpf: Enable powerpc support

2016-04-01 Thread Naveen N. Rao
On 2016/03/31 10:52AM, Alexei Starovoitov wrote:
> On 3/31/16 4:25 AM, Naveen N. Rao wrote:
> ...
> >+
> >+#ifdef __powerpc__
> >+#define BPF_KPROBE_READ_RET_IP(ip, ctx) { (ip) = (ctx)->link; }
> >+#define BPF_KRETPROBE_READ_RET_IP(ip, ctx)  BPF_KPROBE_READ_RET_IP(ip, ctx)
> >+#else
> >+#define BPF_KPROBE_READ_RET_IP(ip, ctx)				\
> >+	bpf_probe_read(&(ip), sizeof(ip), (void *)PT_REGS_RET(ctx))
> >+#define BPF_KRETPROBE_READ_RET_IP(ip, ctx)				\
> >+	bpf_probe_read(&(ip), sizeof(ip),				\
> >+		(void *)(PT_REGS_FP(ctx) + sizeof(ip)))
> 
> makes sense, but please use the ({ }) gcc extension instead of {} and an
> open-coded call, to make sure the macro body is scoped.

To be sure I understand this right, do you mean something like this?

+
+#ifdef __powerpc__
+#define BPF_KPROBE_READ_RET_IP(ip, ctx)	({ (ip) = (ctx)->link; })
+#define BPF_KRETPROBE_READ_RET_IP	BPF_KPROBE_READ_RET_IP
+#else
+#define BPF_KPROBE_READ_RET_IP(ip, ctx)	({				\
+	bpf_probe_read(&(ip), sizeof(ip), (void *)PT_REGS_RET(ctx)); })
+#define BPF_KRETPROBE_READ_RET_IP(ip, ctx)	({			\
+	bpf_probe_read(&(ip), sizeof(ip),				\
+		(void *)(PT_REGS_FP(ctx) + sizeof(ip))); })
+#endif
+


Thanks,
Naveen


Re: [PATCH 2/4] samples/bpf: Use llc in PATH, rather than a hardcoded value

2016-04-01 Thread Alexei Starovoitov

On 4/1/16 7:37 AM, Naveen N. Rao wrote:

On 2016/03/31 08:19PM, Daniel Borkmann wrote:

On 03/31/2016 07:46 PM, Alexei Starovoitov wrote:

On 3/31/16 4:25 AM, Naveen N. Rao wrote:

  clang $(NOSTDINC_FLAGS) $(LINUXINCLUDE) $(EXTRA_CFLAGS) \
  -D__KERNEL__ -D__ASM_SYSREG_H -Wno-unused-value -Wno-pointer-sign \
--O2 -emit-llvm -c $< -o -| $(LLC) -march=bpf -filetype=obj -o $@
+-O2 -emit-llvm -c $< -o -| llc -march=bpf -filetype=obj -o $@
  clang $(NOSTDINC_FLAGS) $(LINUXINCLUDE) $(EXTRA_CFLAGS) \
  -D__KERNEL__ -D__ASM_SYSREG_H -Wno-unused-value -Wno-pointer-sign \
--O2 -emit-llvm -c $< -o -| $(LLC) -march=bpf -filetype=asm -o $@.s
+-O2 -emit-llvm -c $< -o -| llc -march=bpf -filetype=asm -o $@.s


that was a workaround when clang/llvm didn't have bpf support.
Now clang 3.7 and 3.8 have bpf built-in, so it makes sense to remove
manual calls to llc completely.
Just use 'clang -target bpf -O2 -D... -c $< -o $@'


+1, the clang part in that Makefile should also more correctly be called
with '-target bpf' as it turns out (despite llc with '-march=bpf' ...).
Better to use clang directly as suggested by Alexei.


I'm likely missing something obvious, but I cannot get this to work.
With this diff:

 $(obj)/%.o: $(src)/%.c
	clang $(NOSTDINC_FLAGS) $(LINUXINCLUDE) $(EXTRA_CFLAGS) \
		-D__KERNEL__ -D__ASM_SYSREG_H -Wno-unused-value -Wno-pointer-sign \
-		-O2 -emit-llvm -c $< -o -| $(LLC) -march=bpf -filetype=obj -o $@
-	clang $(NOSTDINC_FLAGS) $(LINUXINCLUDE) $(EXTRA_CFLAGS) \
-		-D__KERNEL__ -D__ASM_SYSREG_H -Wno-unused-value -Wno-pointer-sign \
-		-O2 -emit-llvm -c $< -o -| $(LLC) -march=bpf -filetype=asm -o $@.s
+		-O2 -target bpf -c $< -o $@

I see far too many errors thrown starting with:
./arch/x86/include/asm/arch_hweight.h:31:10: error: invalid output 
constraint '=a' in asm
 : "="REG_OUT (res)


ahh. yes. when processing kernel headers clang has to assume x86 style
inline asm, though all of these functions will be ignored.
I don't have a quick fix for this yet.
Let's go back to your original change $(LLC)->llc


Re: [RFC PATCH 6/6] ppc: ebpf/jit: Implement JIT compiler for extended BPF

2016-04-01 Thread Alexei Starovoitov

On 4/1/16 2:58 AM, Naveen N. Rao wrote:

PPC64 eBPF JIT compiler. Works for both ABIv1 and ABIv2.

Enable with:
echo 1 > /proc/sys/net/core/bpf_jit_enable
or
echo 2 > /proc/sys/net/core/bpf_jit_enable

... to see the generated JIT code. This can further be processed with
tools/net/bpf_jit_disasm.

With CONFIG_TEST_BPF=m and 'modprobe test_bpf':
test_bpf: Summary: 291 PASSED, 0 FAILED, [234/283 JIT'ed]

... on both ppc64 BE and LE.

The details of the approach are documented through various comments in
the code, as are the TODOs. Some of the prominent TODOs include
implementing BPF tail calls and skb loads.

Cc: Matt Evans 
Cc: Michael Ellerman 
Cc: Paul Mackerras 
Cc: Alexei Starovoitov 
Cc: "David S. Miller" 
Cc: Ananth N Mavinakayanahalli 
Signed-off-by: Naveen N. Rao 
---
  arch/powerpc/include/asm/ppc-opcode.h |  19 +-
  arch/powerpc/net/Makefile |   4 +
  arch/powerpc/net/bpf_jit.h|  66 ++-
  arch/powerpc/net/bpf_jit64.h  |  58 +++
  arch/powerpc/net/bpf_jit_comp64.c | 828 ++
  5 files changed, 973 insertions(+), 2 deletions(-)
  create mode 100644 arch/powerpc/net/bpf_jit64.h
  create mode 100644 arch/powerpc/net/bpf_jit_comp64.c

...

-#ifdef CONFIG_PPC64
+#if defined(CONFIG_PPC64) && (!defined(_CALL_ELF) || _CALL_ELF != 2)


impressive stuff!
Everything nicely documented. Could you add a few words for the above
condition as well? Or maybe a new macro, since it occurs many times?
What do these _CALL_ELF == 2 and != 2 conditions mean? ppc ABIs?
Will there ever be a v3?

So far most of the bpf jits were going via the net-next tree, but if
in this case no changes to the core are necessary then I guess it's fine
to do it via the powerpc tree. What's your plan?


Re: [PATCH 4/4] samples/bpf: Enable powerpc support

2016-04-01 Thread Alexei Starovoitov

On 4/1/16 7:41 AM, Naveen N. Rao wrote:

On 2016/03/31 10:52AM, Alexei Starovoitov wrote:

On 3/31/16 4:25 AM, Naveen N. Rao wrote:
...

+
+#ifdef __powerpc__
+#define BPF_KPROBE_READ_RET_IP(ip, ctx){ (ip) = (ctx)->link; }
+#define BPF_KRETPROBE_READ_RET_IP(ip, ctx) BPF_KPROBE_READ_RET_IP(ip, ctx)
+#else
+#define BPF_KPROBE_READ_RET_IP(ip, ctx)				\
+	bpf_probe_read(&(ip), sizeof(ip), (void *)PT_REGS_RET(ctx))
+#define BPF_KRETPROBE_READ_RET_IP(ip, ctx)				\
+	bpf_probe_read(&(ip), sizeof(ip),				\
+		(void *)(PT_REGS_FP(ctx) + sizeof(ip)))


makes sense, but please use the ({ }) gcc extension instead of {} and an
open-coded call, to make sure the macro body is scoped.


To be sure I understand this right, do you mean something like this?

+
+#ifdef __powerpc__
+#define BPF_KPROBE_READ_RET_IP(ip, ctx)	({ (ip) = (ctx)->link; })
+#define BPF_KRETPROBE_READ_RET_IP	BPF_KPROBE_READ_RET_IP
+#else
+#define BPF_KPROBE_READ_RET_IP(ip, ctx)	({				\
+	bpf_probe_read(&(ip), sizeof(ip), (void *)PT_REGS_RET(ctx)); })
+#define BPF_KRETPROBE_READ_RET_IP(ip, ctx)	({			\
+	bpf_probe_read(&(ip), sizeof(ip),				\
+		(void *)(PT_REGS_FP(ctx) + sizeof(ip))); })
+#endif


yes. Thanks!


Re: [RFC PATCH 6/6] ppc: ebpf/jit: Implement JIT compiler for extended BPF

2016-04-01 Thread Daniel Borkmann

On 04/01/2016 08:10 PM, Alexei Starovoitov wrote:

On 4/1/16 2:58 AM, Naveen N. Rao wrote:

PPC64 eBPF JIT compiler. Works for both ABIv1 and ABIv2.

Enable with:
echo 1 > /proc/sys/net/core/bpf_jit_enable
or
echo 2 > /proc/sys/net/core/bpf_jit_enable

... to see the generated JIT code. This can further be processed with
tools/net/bpf_jit_disasm.

With CONFIG_TEST_BPF=m and 'modprobe test_bpf':
test_bpf: Summary: 291 PASSED, 0 FAILED, [234/283 JIT'ed]

... on both ppc64 BE and LE.

The details of the approach are documented through various comments in
the code, as are the TODOs. Some of the prominent TODOs include
implementing BPF tail calls and skb loads.

Cc: Matt Evans 
Cc: Michael Ellerman 
Cc: Paul Mackerras 
Cc: Alexei Starovoitov 
Cc: "David S. Miller" 
Cc: Ananth N Mavinakayanahalli 
Signed-off-by: Naveen N. Rao 
---
  arch/powerpc/include/asm/ppc-opcode.h |  19 +-
  arch/powerpc/net/Makefile |   4 +
  arch/powerpc/net/bpf_jit.h|  66 ++-
  arch/powerpc/net/bpf_jit64.h  |  58 +++
  arch/powerpc/net/bpf_jit_comp64.c | 828 ++
  5 files changed, 973 insertions(+), 2 deletions(-)
  create mode 100644 arch/powerpc/net/bpf_jit64.h
  create mode 100644 arch/powerpc/net/bpf_jit_comp64.c

...

-#ifdef CONFIG_PPC64
+#if defined(CONFIG_PPC64) && (!defined(_CALL_ELF) || _CALL_ELF != 2)


impressive stuff!


+1, awesome to see another one!


Everything nicely documented. Could you add a few words for the above
condition as well? Or maybe a new macro, since it occurs many times?
What do these _CALL_ELF == 2 and != 2 conditions mean? ppc ABIs?
Will there ever be a v3?


Minor TODO would also be to convert to use bpf_jit_binary_alloc() and
bpf_jit_binary_free() API for the image, which is done by other eBPF
jits, too.


So far most of the bpf jits were going via the net-next tree, but if
in this case no changes to the core are necessary then I guess it's fine
to do it via the powerpc tree. What's your plan?




Re: [PATCH v3] ppc64/book3s: fix branching to out of line handlers in relocation kernel

2016-04-01 Thread Hari Bathini



On 04/01/2016 04:07 PM, Michael Ellerman wrote:

On Fri, 2016-04-01 at 12:23 +0530, Hari Bathini wrote:

On 04/01/2016 11:44 AM, Michael Ellerman wrote:

On Wed, 2016-03-30 at 23:49 +0530, Hari Bathini wrote:

Some of the interrupt vectors on 64-bit POWER server processors  are
only 32 bytes long (8 instructions), which is not enough for the full

...

Let us fix this undependable code path by moving these OOL handlers below
__end_interrupts marker to make sure we also copy these handlers to real
address 0x100 when running a relocatable kernel. Because the interrupt
vectors branching to these OOL handlers are not long enough to use
LOAD_HANDLER() for branching as discussed above.


...

changes from v2:
2. Move the OOL handlers before __end_interrupts marker instead of moving the 
__end_interrupts marker
3. Leave __end_handlers marker as is.

Hi Hari,

Thanks for trying this. In the end I've decided it's not a good option.

If you build an allmodconfig, and turn on CONFIG_RELOCATABLE, and then look at
the disassembly, you see this:

c0006ffc:   48 00 29 04 b   c0009900 
<.ret_from_except>

c0007000 <__end_handlers>:


At 0x7000 we have the FWNMI area, which is fixed and can't move. As you see
above we end up with only 4 bytes of space between the end of the handlers and
the FWNMI area.

So any tiny change that adds two more instructions prior to 0x7000 will then
fail to build.

Hi Michael,

I agree. But the OOL handlers that are moved up in v3 were below
0x7000 earlier as well and moving them below __end_interrupts marker
shouldn't make any difference in terms of space consumption at least in
comparison between v2 & v3. So, I guess picking either v2 or v3
doesn't change this for the better.

It does make a difference, due to alignment. Prior to your patch we have ~24
bytes free.


Hi Michael,

Hmmm.. I thought ~24 bytes was not such a difference but with the scenario
you mentioned it does sound critical. Actually, this patch came into being
for want of another 8~12 bytes. So, I should have known better about
space constraint.




Also, there is code between __end_interrupts and __end_handlers
that is not location dependent as long as it is within 64K (0x1)
that can be moved above 0x8000, if need be.

That's true, but that sort of change is unlikely to backport well. And we need
to backport this fix to everything.


That does sound like a maintainer's nightmare.


But if you can get that to work I'll consider it. I tried quickly but couldn't
get it working, due to problems with the feature else sections being too far
away.


Same case. May need some time to get that right.
Also, exploring holes between __start_interrupts & __end_interrupts.
Will try and get back on this soon.
If none of this works, we have v2 anyway.

Thanks
Hari


Re: [PATCH] ftrace: filter: Match dot symbols when searching functions on ppc64.

2016-04-01 Thread kbuild test robot
Hi Thiago,

[auto build test ERROR on v4.6-rc1]
[also build test ERROR on next-20160401]
[cannot apply to tip/perf/core]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improving the system]

url:
https://github.com/0day-ci/linux/commits/Thiago-Jung-Bauermann/ftrace-filter-Match-dot-symbols-when-searching-functions-on-ppc64/20160401-112617
config: powerpc-ppc6xx_defconfig (attached as .config)
reproduce:
wget 
https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross
 -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=powerpc 

All errors (new ones prefixed by >>):

   In file included from include/linux/ftrace.h:20:0,
from include/linux/perf_event.h:47,
from include/linux/trace_events.h:9,
from include/trace/syscall.h:6,
from include/linux/syscalls.h:81,
from arch/powerpc/kernel/pci-common.c:29:
>> arch/powerpc/include/asm/ftrace.h:62:5: error: "CONFIG_PPC64" is not defined 
>> [-Werror=undef]
#if CONFIG_PPC64 && (!defined(_CALL_ELF) || _CALL_ELF != 2)
^
   cc1: all warnings being treated as errors

vim +/CONFIG_PPC64 +62 arch/powerpc/include/asm/ftrace.h

56  
57  struct dyn_arch_ftrace {
58  struct module *mod;
59  };
60  #endif /*  CONFIG_DYNAMIC_FTRACE */
61  
  > 62  #if CONFIG_PPC64 && (!defined(_CALL_ELF) || _CALL_ELF != 2)
63  #define ARCH_HAS_FTRACE_MATCH_ADJUST
64  static inline void arch_ftrace_match_adjust(char **str, char *search)
65  {

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation



Re: [PATCH] ftrace: filter: Match dot symbols when searching functions on ppc64.

2016-04-01 Thread Thiago Jung Bauermann
Am Samstag, 02 April 2016, 03:51:21 schrieb kbuild test robot:
> >> arch/powerpc/include/asm/ftrace.h:62:5: error: "CONFIG_PPC64" is not
> >> defined [-Werror=undef]
> #if CONFIG_PPC64 && (!defined(_CALL_ELF) || _CALL_ELF != 2)
> ^
>cc1: all warnings being treated as errors

I forgot to use defined() in the #if expression. Here’s the fixed version.

-- 
[]'s
Thiago Jung Bauermann
IBM Linux Technology Center


 8<  8<  8<  8< 


From 27660a3b6c4147f9e1811b103cc47a34a53817c1 Mon Sep 17 00:00:00 2001
From: Thiago Jung Bauermann 
Date: Wed, 30 Mar 2016 21:26:32 -0300
Subject: [PATCH] ftrace: Match dot symbols when searching functions on ppc64

In the ppc64 big endian ABI, function symbols point to function
descriptors. The symbols which point to the function entry points
have a dot in front of the function name. Consequently, when the
ftrace filter mechanism searches for the symbol corresponding to
an entry point address, it gets the dot symbol.

As a result, ftrace filter users have to be aware of this ABI detail on
ppc64 and prepend a dot to the function name when setting the filter.

The perf probe command insulates the user from this by ignoring the dot
in front of the symbol name when matching function names to symbols,
but the sysfs interface does not. This patch makes the ftrace filter
mechanism do the same when searching symbols.

Fixes the following failure in ftracetest's kprobe_ftrace.tc:

  .../kprobe_ftrace.tc: line 9: echo: write error: Invalid argument

That failure is on this line of kprobe_ftrace.tc:

  echo _do_fork > set_ftrace_filter

This is because there's no _do_fork entry in the functions list:

  # cat available_filter_functions | grep _do_fork
  ._do_fork

This change introduces no regressions on the perf and ftracetest
testsuite results.

Cc: Steven Rostedt 
Cc: Ingo Molnar 
Cc: Michael Ellerman 
Signed-off-by: Thiago Jung Bauermann 
---
 arch/powerpc/include/asm/ftrace.h |  9 +
 kernel/trace/ftrace.c | 13 +
 2 files changed, 22 insertions(+)

diff --git a/arch/powerpc/include/asm/ftrace.h 
b/arch/powerpc/include/asm/ftrace.h
index 50ca7585abe2..f6ed1908f0f7 100644
--- a/arch/powerpc/include/asm/ftrace.h
+++ b/arch/powerpc/include/asm/ftrace.h
@@ -58,6 +58,15 @@ struct dyn_arch_ftrace {
struct module *mod;
 };
 #endif /*  CONFIG_DYNAMIC_FTRACE */
+
+#if defined(CONFIG_PPC64) && (!defined(_CALL_ELF) || _CALL_ELF != 2)
+#define ARCH_HAS_FTRACE_MATCH_ADJUST
+static inline void arch_ftrace_match_adjust(char **str, char *search)
+{
+   if ((*str)[0] == '.' && search[0] != '.')
+   (*str)++;
+}
+#endif /* defined(CONFIG_PPC64) && (!defined(_CALL_ELF) || _CALL_ELF != 2) */
 #endif /* __ASSEMBLY__ */
 
 #ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index b1870fbd2b67..e806c2a3b7a8 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -3444,11 +3444,24 @@ struct ftrace_glob {
int type;
 };
 
+#ifndef ARCH_HAS_FTRACE_MATCH_ADJUST
+/*
+ * If symbols in an architecture don't correspond exactly to the user-visible
+ * name of what they represent, it is possible to define this function to
+ * perform the necessary adjustments.
+*/
+static inline void arch_ftrace_match_adjust(char **str, char *search)
+{
+}
+#endif
+
 static int ftrace_match(char *str, struct ftrace_glob *g)
 {
int matched = 0;
int slen;
 
+   arch_ftrace_match_adjust(&str, g->search);
+
switch (g->type) {
case MATCH_FULL:
if (strcmp(str, g->search) == 0)
-- 
1.9.1



Re: [v7, 4/5] powerpc/fsl: move mpc85xx.h to include/linux/fsl

2016-04-01 Thread Stephen Boyd
On 03/31/2016 08:07 PM, Yangbo Lu wrote:
>  drivers/clk/clk-qoriq.c   | 3 +--
>

For clk part:

Acked-by: Stephen Boyd 

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


Re: [PATCH] powerpc: ppc4xx: drop unused variable

2016-04-01 Thread Linus Walleij
On Fri, Apr 1, 2016 at 4:31 AM, Michael Ellerman  wrote:
> On Thu, 2016-03-31 at 14:57 +0200, Linus Walleij wrote:
>> On Thu, Mar 31, 2016 at 12:09 PM, Michael Ellerman  
>> wrote:
>>
>> > If you feel like cross building powerpc in future it should be as simple 
>> > as:
>> >
>> >  $ dnf install gcc-powerpc64-linux-gnu || apt-get install 
>> > gcc-powerpc-linux-gnu
>> >  $ make ARCH=powerpc CROSS_COMPILE=powerpc-linux-gnu- ...
>>
>> Ah hm yeah I guess everyone "should", it's just that these days I
>> mainly rely on Fenguang's kautobuild to do this job for me and
>> get back with the result from a plethora of arches.
>
> Sure. That makes sense for all the silly little architectures.
>
> But for powerpc you should really cross compile.

I think kautobuild cross compiles?

The only thing that happens IIUC is I let somebody else do the
job at the Intel server farm.

Yours,
Linus Walleij

Re: [PATCH v3] ppc64/book3s: fix branching to out of line handlers in relocation kernel

2016-04-01 Thread Hari Bathini



On 04/01/2016 11:44 AM, Michael Ellerman wrote:

On Wed, 2016-03-30 at 23:49 +0530, Hari Bathini wrote:

Some of the interrupt vectors on 64-bit POWER server processors  are
only 32 bytes long (8 instructions), which is not enough for the full

...

Let us fix this undependable code path by moving these OOL handlers below
__end_interrupts marker to make sure we also copy these handlers to real
address 0x100 when running a relocatable kernel. Because the interrupt
vectors branching to these OOL handlers are not long enough to use
LOAD_HANDLER() for branching as discussed above.


...

changes from v2:
2. Move the OOL handlers before __end_interrupts marker instead of moving the 
__end_interrupts marker
3. Leave __end_handlers marker as is.

Hi Hari,

Thanks for trying this. In the end I've decided it's not a good option.

If you build an allmodconfig, and turn on CONFIG_RELOCATABLE, and then look at
the disassembly, you see this:

   c0006ffc:   48 00 29 04 b   c0009900 
<.ret_from_except>
   
   c0007000 <__end_handlers>:


At 0x7000 we have the FWNMI area, which is fixed and can't move. As you see
above we end up with only 4 bytes of space between the end of the handlers and
the FWNMI area.

So any tiny change that adds two more instructions prior to 0x7000 will then
fail to build.


Hi Michael,

I agree. But the OOL handlers that are moved up in v3 were below
0x7000 earlier as well and moving them below __end_interrupts marker
shouldn't make any difference in terms of space consumption at least in
comparison between v2 & v3. So, I guess picking either v2 or v3
doesn't change this for the better.

Also, there is code between __end_interrupts and __end_handlers
that is not location dependent as long as it is within 64K (0x10000)
that can be moved above 0x8000, if need be.

For these reasons, I feel v3 is better going forward as it keeps
__start_interrupts to __end_interrupts code compact and
leaves alone the code that doesn't need to be copied to real 0.

Am I missing something here?

Thanks
Hari


None of that's your fault, it's just the nature of the code in there, it's very
space constrained.

For now I'll take your v2, but I'll edit the comment and drop the removal of
__end_handlers.

cheers




Re: [PATCH v3] ppc64/book3s: fix branching to out of line handlers in relocation kernel

2016-04-01 Thread Gabriel Paubert
Hi Michael,

On Fri, Apr 01, 2016 at 05:14:35PM +1100, Michael Ellerman wrote:
> On Wed, 2016-03-30 at 23:49 +0530, Hari Bathini wrote:
> > Some of the interrupt vectors on 64-bit POWER server processors  are
> > only 32 bytes long (8 instructions), which is not enough for the full
> ...
> > Let us fix this undependable code path by moving these OOL handlers below
> > __end_interrupts marker to make sure we also copy these handlers to real
> > address 0x100 when running a relocatable kernel. Because the interrupt
> > vectors branching to these OOL handlers are not long enough to use
> > LOAD_HANDLER() for branching as discussed above.
> > 
> ...
> > changes from v2:
> > 2. Move the OOL handlers before __end_interrupts marker instead of moving 
> > the __end_interrupts marker
> > 3. Leave __end_handlers marker as is.
> 
> Hi Hari,
> 
> Thanks for trying this. In the end I've decided it's not a good option.
> 
> If you build an allmodconfig, and turn on CONFIG_RELOCATABLE, and then look at
> the disassembly, you see this:
> 
>   c0006ffc:   48 00 29 04 b   c0009900 
> <.ret_from_except>
>   
>   c0007000 <__end_handlers>:
> 
> At 0x7000 we have the FWNMI area, which is fixed and can't move. As you see
> above we end up with only 4 bytes of space between the end of the handlers and
> the FWNMI area.

Nitpicking a bit, but if I correctly read the above disassembly and there is
an instruction at 0x6ffc, the free space is exactly 0!

> 
> So any tiny change that adds two more instructions prior to 0x7000 will then
> fail to build.

Even one instruction provided I still know how to count.

> 
> None of that's your fault, it's just the nature of the code in there, it's 
> very
> space constrained.

Calling it very space constrained makes you win the understatement of the
month award, on April fool's day :-)

Regards,
Gabriel

[RFC PATCH 2/6] ppc: bpf/jit: Optimize 64-bit Immediate loads

2016-04-01 Thread Naveen N. Rao
Similar to the LI32() optimization, if the value can be represented
in 32-bits, use LI32(). Also handle loading a few specific forms of
immediate values in an optimum manner.

While at it, remove the semicolon at the end of the macro!

Cc: Matt Evans 
Cc: Michael Ellerman 
Cc: Paul Mackerras 
Cc: Alexei Starovoitov 
Cc: "David S. Miller" 
Cc: Ananth N Mavinakayanahalli 
Signed-off-by: Naveen N. Rao 
---
 arch/powerpc/net/bpf_jit.h | 17 +++--
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h
index a9882db..4c1e055 100644
--- a/arch/powerpc/net/bpf_jit.h
+++ b/arch/powerpc/net/bpf_jit.h
@@ -244,20 +244,25 @@ DECLARE_LOAD_FUNC(sk_load_byte_msh);
} } while(0)
 
 #define PPC_LI64(d, i) do {					      \
-	if (!((uintptr_t)(i) & 0xffffffff00000000ULL))		      \
+	if ((long)(i) >= -2147483648 &&				      \
+			(long)(i) < 2147483648)			      \
 		PPC_LI32(d, i);					      \
 	else {							      \
-		PPC_LIS(d, ((uintptr_t)(i) >> 48));		      \
-		if ((uintptr_t)(i) & 0x0000ffff00000000ULL)	      \
-			PPC_ORI(d, d,				      \
-				((uintptr_t)(i) >> 32) & 0xffff);     \
+		if (!((uintptr_t)(i) & 0x8000000000000000ULL))	      \
+			PPC_LI(d, ((uintptr_t)(i) >> 32) & 0xffff);   \
+		else {						      \
+			PPC_LIS(d, ((uintptr_t)(i) >> 48));	      \
+			if ((uintptr_t)(i) & 0x0000ffff00000000ULL)   \
+				PPC_ORI(d, d,			      \
+				  ((uintptr_t)(i) >> 32) & 0xffff);   \
+		}						      \
 		PPC_SLDI(d, d, 32);				      \
 		if ((uintptr_t)(i) & 0x00000000ffff0000ULL)	      \
 			PPC_ORIS(d, d,				      \
 				 ((uintptr_t)(i) >> 16) & 0xffff);    \
 		if ((uintptr_t)(i) & 0x000000000000ffffULL)	      \
 			PPC_ORI(d, d, (uintptr_t)(i) & 0xffff);	      \
-	} } while (0);
+	} } while (0)
 
 #ifdef CONFIG_PPC64
 #define PPC_FUNC_ADDR(d,i) do { PPC_LI64(d, i); } while(0)
-- 
2.7.4


[RFC PATCH 6/6] ppc: ebpf/jit: Implement JIT compiler for extended BPF

2016-04-01 Thread Naveen N. Rao
PPC64 eBPF JIT compiler. Works for both ABIv1 and ABIv2.

Enable with:
echo 1 > /proc/sys/net/core/bpf_jit_enable
or
echo 2 > /proc/sys/net/core/bpf_jit_enable

... to see the generated JIT code. This can further be processed with
tools/net/bpf_jit_disasm.

With CONFIG_TEST_BPF=m and 'modprobe test_bpf':
test_bpf: Summary: 291 PASSED, 0 FAILED, [234/283 JIT'ed]

... on both ppc64 BE and LE.

The details of the approach are documented through various comments in
the code, as are the TODOs. Some of the prominent TODOs include
implementing BPF tail calls and skb loads.

Cc: Matt Evans 
Cc: Michael Ellerman 
Cc: Paul Mackerras 
Cc: Alexei Starovoitov 
Cc: "David S. Miller" 
Cc: Ananth N Mavinakayanahalli 
Signed-off-by: Naveen N. Rao 
---
 arch/powerpc/include/asm/ppc-opcode.h |  19 +-
 arch/powerpc/net/Makefile |   4 +
 arch/powerpc/net/bpf_jit.h|  66 ++-
 arch/powerpc/net/bpf_jit64.h  |  58 +++
 arch/powerpc/net/bpf_jit_comp64.c | 828 ++
 5 files changed, 973 insertions(+), 2 deletions(-)
 create mode 100644 arch/powerpc/net/bpf_jit64.h
 create mode 100644 arch/powerpc/net/bpf_jit_comp64.c

diff --git a/arch/powerpc/include/asm/ppc-opcode.h 
b/arch/powerpc/include/asm/ppc-opcode.h
index 95fd811..bca92e8 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -141,9 +141,11 @@
 #define PPC_INST_ISEL			0x7c00001e
 #define PPC_INST_ISEL_MASK		0xfc00003e
 #define PPC_INST_LDARX			0x7c0000a8
+#define PPC_INST_STDCX			0x7c0001ad
 #define PPC_INST_LSWI			0x7c0004aa
 #define PPC_INST_LSWX			0x7c00042a
 #define PPC_INST_LWARX			0x7c000028
+#define PPC_INST_STWCX			0x7c00012d
 #define PPC_INST_LWSYNC			0x7c2004ac
 #define PPC_INST_SYNC			0x7c0004ac
 #define PPC_INST_SYNC_MASK		0xfc0007fe
@@ -210,8 +212,11 @@
 #define PPC_INST_LBZ			0x88000000
 #define PPC_INST_LD			0xe8000000
 #define PPC_INST_LHZ			0xa0000000
-#define PPC_INST_LHBRX			0x7c00062c
 #define PPC_INST_LWZ			0x80000000
+#define PPC_INST_LHBRX			0x7c00062c
+#define PPC_INST_LDBRX			0x7c000428
+#define PPC_INST_STB			0x98000000
+#define PPC_INST_STH			0xb0000000
 #define PPC_INST_STD			0xf8000000
 #define PPC_INST_STDU			0xf8000001
 #define PPC_INST_STW			0x90000000
@@ -220,22 +225,34 @@
 #define PPC_INST_MTLR			0x7c0803a6
 #define PPC_INST_CMPWI			0x2c000000
 #define PPC_INST_CMPDI			0x2c200000
+#define PPC_INST_CMPW			0x7c000000
+#define PPC_INST_CMPD			0x7c200000
 #define PPC_INST_CMPLW			0x7c000040
+#define PPC_INST_CMPLD			0x7c200040
 #define PPC_INST_CMPLWI			0x28000000
+#define PPC_INST_CMPLDI			0x28200000
 #define PPC_INST_ADDI			0x38000000
 #define PPC_INST_ADDIS			0x3c000000
 #define PPC_INST_ADD			0x7c000214
 #define PPC_INST_SUB			0x7c000050
 #define PPC_INST_BLR			0x4e800020
 #define PPC_INST_BLRL			0x4e800021
+#define PPC_INST_MULLD			0x7c0001d2
 #define PPC_INST_MULLW			0x7c0001d6
 #define PPC_INST_MULHWU			0x7c000016
 #define PPC_INST_MULLI			0x1c000000
 #define PPC_INST_DIVWU			0x7c000396
+#define PPC_INST_DIVD			0x7c0003d2
 #define PPC_INST_RLWINM			0x54000000
+#define PPC_INST_RLWIMI			0x50000000
+#define PPC_INST_RLDICL			0x78000000
 #define PPC_INST_RLDICR			0x78000004
 #define PPC_INST_SLW			0x7c000030
+#define PPC_INST_SLD			0x7c000036
 #define PPC_INST_SRW			0x7c000430
+#define PPC_INST_SRD			0x7c000436
+#define PPC_INST_SRAD			0x7c000634
+#define PPC_INST_SRADI			0x7c000674
 #define PPC_INST_AND			0x7c000038
 #define PPC_INST_ANDDOT			0x7c000039
 #define PPC_INST_OR			0x7c000378
diff --git a/arch/powerpc/net/Makefile b/arch/powerpc/net/Makefile
index 1306a58..968c1fc3 100644
--- a/arch/powerpc/net/Makefile
+++ b/arch/powerpc/net/Makefile
@@ -1,4 +1,8 @@
 #
 # Arch-specific network modules
 #
+ifeq ($(CONFIG_PPC64),y)
+obj-$(CONFIG_BPF_JIT) += bpf_jit_comp64.o
+else
 obj-$(CONFIG_BPF_JIT) += bpf_jit_asm.o bpf_jit_comp.o
+endif
diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h
index 

[GIT PULL] Please pull powerpc/linux.git powerpc-4.6-2 tag

2016-04-01 Thread Michael Ellerman
Hi Linus,

Please pull powerpc fixes for 4.6:

The following changes since commit f55532a0c0b8bb6148f4e07853b876ef73bc69ca:

  Linux 4.6-rc1 (2016-03-26 16:03:24 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
tags/powerpc-4.6-2

for you to fetch changes up to 71528d8bd7a8aa920cd69d4223c6c87d5849257d:

  powerpc: Correct used_vsr comment (2016-03-29 12:08:08 +1100)


powerpc fixes for 4.6

 - Fixup preempt underflow with huge pages from Sebastian Siewior
 - Fix altivec SPR not being saved from Oliver O'Halloran
 - Correct used_vsr comment from Simon Guo


Oliver O'Halloran (1):
  powerpc/process: Fix altivec SPR not being saved

Sebastian Siewior (1):
  powerpc/mm: Fixup preempt underflow with huge pages

Simon Guo (1):
  powerpc: Correct used_vsr comment

 arch/powerpc/include/asm/processor.h | 2 +-
 arch/powerpc/kernel/process.c| 2 +-
 arch/powerpc/mm/hugetlbpage.c| 4 ++--
 3 files changed, 4 insertions(+), 4 deletions(-)


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v3] ppc64/book3s: fix branching to out of line handlers in relocation kernel

2016-04-01 Thread Michael Ellerman
On Fri, 2016-04-01 at 12:23 +0530, Hari Bathini wrote:
> 
> On 04/01/2016 11:44 AM, Michael Ellerman wrote:
> > On Wed, 2016-03-30 at 23:49 +0530, Hari Bathini wrote:
> > > Some of the interrupt vectors on 64-bit POWER server processors  are
> > > only 32 bytes long (8 instructions), which is not enough for the full
> > ...
> > > Let us fix this undependable code path by moving these OOL handlers below
> > > __end_interrupts marker to make sure we also copy these handlers to real
> > > address 0x100 when running a relocatable kernel. Because the interrupt
> > > vectors branching to these OOL handlers are not long enough to use
> > > LOAD_HANDLER() for branching as discussed above.
> > > 
> > ...
> > > changes from v2:
> > > 2. Move the OOL handlers before __end_interrupts marker instead of moving 
> > > the __end_interrupts marker
> > > 3. Leave __end_handlers marker as is.
> > Hi Hari,
> > 
> > Thanks for trying this. In the end I've decided it's not a good option.
> > 
> > If you build an allmodconfig, and turn on CONFIG_RELOCATABLE, and then look 
> > at
> > the disassembly, you see this:
> > 
> >c0006ffc:   48 00 29 04 b   c0009900 
> > <.ret_from_except>
> >
> >c0007000 <__end_handlers>:
> > 
> > At 0x7000 we have the FWNMI area, which is fixed and can't move. As you see
> > above we end up with only 4 bytes of space between the end of the handlers 
> > and
> > the FWNMI area.
> > 
> > So any tiny change that adds two more instructions prior to 0x7000 will then
> > fail to build.
> 
> Hi Michael,
> 
> I agree. But the OOL handlers that are moved up in v3 were below
> 0x7000 earlier as well and moving them below __end_interrupts marker
> shouldn't make any difference in terms of space consumption at least in
> comparison between v2 & v3. So, I guess picking either v2 or v3
> doesn't change this for better.

It does make a difference, due to alignment. Prior to your patch we have ~24
bytes free.

> Also, there is code between __end_interrupts and __end_handlers
> that is not location dependent as long as it is within 64K (0x10000)
> that can be moved above 0x8000, if need be.
 
That's true, but that sort of change is unlikely to backport well. And we need
to backport this fix to everything.

But if you can get that to work I'll consider it. I tried quickly but couldn't
get it working, due to problems with the feature else sections being too far
away.

cheers


Re: [PATCH v3] ppc64/book3s: fix branching to out of line handlers in relocation kernel

2016-04-01 Thread Michael Ellerman
On Fri, 2016-04-01 at 08:37 +0200, Gabriel Paubert wrote:
> On Fri, Apr 01, 2016 at 05:14:35PM +1100, Michael Ellerman wrote:
> > If you build an allmodconfig, and turn on CONFIG_RELOCATABLE, and then look 
> > at
> > the disassembly, you see this:
> > 
> >   c0006ffc:   48 00 29 04 b   c0009900 
> > <.ret_from_except>
> >   
> >   c0007000 <__end_handlers>:
> > 
> > At 0x7000 we have the FWNMI area, which is fixed and can't move. As you see
> > above we end up with only 4 bytes of space between the end of the handlers 
> > and
> > the FWNMI area.
> 
> Nitpicking a bit, if I correctly read the above disassembly and there is
> an instruction at 0x6ffc, the free space is exactly 0!

Well spotted! It was of course an April fools .. joke? :)

> > None of that's your fault, it's just the nature of the code in there, it's 
> > very
> > space constrained.
> 
> Calling it very space constrained makes you win the understatement of the
> month award, on April fool's day :-)

Well there are some holes here and there, so we could write two instructions,
then branch to the next hole, five more instructions, branch to the next hole
etc. But that makes for hard to read code :)

cheers


[RFC PATCH 1/6] ppc: bpf/jit: Fix/enhance 32-bit Load Immediate implementation

2016-04-01 Thread Naveen N. Rao
The existing LI32() macro can sometimes result in a sign-extended 32-bit
load that does not clear the top 32-bits properly. As an example,
loading 0x7fffffff results in the register containing
0xffffffff7fffffff. While this does not impact the classic BPF JIT
implementation (since that only uses the lower word for all operations),
we would like to share this macro between classic BPF JIT and extended
BPF JIT, wherein the entire 64-bit value in the register matters. Fix
this by first doing a shifted LI followed by ORI.

An additional optimization is for values between -32768 and -1, where we
now need only a single LI.

The new implementation generates the same number of instructions or
fewer.
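To see why the old sequence pollutes the upper word, here is a quick Python
model of the two instruction sequences on a 64-bit register (an illustration
of the semantics only, not kernel code):

```python
MASK64 = (1 << 64) - 1

def sext(val, bits):
    """Sign-extend a `bits`-wide value to 64 bits."""
    sign = 1 << (bits - 1)
    return ((val & (sign - 1)) - (val & sign)) & MASK64

def imm_l(i):  return i & 0xffff
def imm_h(i):  return (i >> 16) & 0xffff
def imm_ha(i): return (imm_h(i) + ((i >> 15) & 1)) & 0xffff

def old_li32(i):
    """Old LI32(): LI (which sign-extends) then ADDIS for i >= 32768."""
    reg = sext(imm_l(i), 16)                        # addi rD, 0, IMM_L(i)
    if (i & 0xffffffff) >= 32768:
        # addis rD, rD, IMM_HA(i)
        reg = (reg + (sext(imm_ha(i), 16) << 16)) & MASK64
    return reg

def new_li32(i):
    """New LI32(): single LI for [-32768, 32768), else LIS + optional ORI."""
    if -32768 <= i < 32768:
        return sext(i & 0xffff, 16)                 # addi rD, 0, i
    reg = (sext(imm_h(i), 16) << 16) & MASK64       # lis rD, IMM_H(i)
    if imm_l(i):
        reg |= imm_l(i)                             # ori rD, rD, IMM_L(i)
    return reg

print(hex(old_li32(0x7fffffff)))   # 0xffffffff7fffffff -- upper word polluted
print(hex(new_li32(0x7fffffff)))   # 0x7fffffff
```

Note also that for values in -32768..-1 the old macro emitted two
instructions (LI + ADDIS) while the new one emits a single LI.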

Cc: Matt Evans 
Cc: Michael Ellerman 
Cc: Paul Mackerras 
Cc: Alexei Starovoitov 
Cc: "David S. Miller" 
Cc: Ananth N Mavinakayanahalli 
Signed-off-by: Naveen N. Rao 
---
 arch/powerpc/net/bpf_jit.h | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h
index 889fd19..a9882db 100644
--- a/arch/powerpc/net/bpf_jit.h
+++ b/arch/powerpc/net/bpf_jit.h
@@ -232,10 +232,17 @@ DECLARE_LOAD_FUNC(sk_load_byte_msh);
 (((cond) & 0x3ff) << 16) |   \
 (((dest) - (ctx->idx * 4)) & \
  0xfffc))
-#define PPC_LI32(d, i) do { PPC_LI(d, IMM_L(i)); \
-   if ((u32)(uintptr_t)(i) >= 32768) {   \
-   PPC_ADDIS(d, d, IMM_HA(i));   \
+/* Sign-extended 32-bit immediate load */
+#define PPC_LI32(d, i) do {  \
+   if ((int)(uintptr_t)(i) >= -32768 &&  \
+   (int)(uintptr_t)(i) < 32768)  \
+   PPC_LI(d, i); \
+   else {\
+   PPC_LIS(d, IMM_H(i)); \
+   if (IMM_L(i)) \
+   PPC_ORI(d, d, IMM_L(i));  \
} } while(0)
+
 #define PPC_LI64(d, i) do {  \
if (!((uintptr_t)(i) & 0xULL))\
PPC_LI32(d, i);   \
-- 
2.7.4


[RFC PATCH 3/6] ppc: bpf/jit: Introduce rotate immediate instructions

2016-04-01 Thread Naveen N. Rao
Since we will be using the rotate immediate instructions for extended
BPF JIT, let's introduce macros for the same. And since the shift
immediate operations use the rotate immediate instructions, let's redo
those macros to use the newly introduced instructions.
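As a sanity check on the redefinition, here is a small Python model
(illustrative helper names, not kernel code) of rlwinm's
rotate-then-mask semantics, confirming that the shift macros reduce to the
rotate form:

```python
def rotl32(x, n):
    """Rotate a 32-bit value left by n bits."""
    n &= 31
    return ((x << n) | (x >> ((32 - n) & 31))) & 0xffffffff

def mask32(mb, me):
    """Mask with 1-bits from bit mb through me, IBM numbering (bit 0 = MSB).
    Assumes mb <= me, which holds for the shift macros here."""
    return (((1 << (me - mb + 1)) - 1) << (31 - me)) & 0xffffffff

def rlwinm(x, sh, mb, me):
    """rlwinm: rotate left by sh, then AND with MASK(mb, me)."""
    return rotl32(x, sh) & mask32(mb, me)

x = 0x12345678
# slwi Rx, Ry, n == rlwinm Rx, Ry, n, 0, 31-n
assert rlwinm(x, 4, 0, 31 - 4) == ((x << 4) & 0xffffffff)
# srwi Rx, Ry, n == rlwinm Rx, Ry, 32-n, n, 31
assert rlwinm(x, 32 - 4, 4, 31) == (x >> 4)
```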

Cc: Matt Evans 
Cc: Michael Ellerman 
Cc: Paul Mackerras 
Cc: Alexei Starovoitov 
Cc: "David S. Miller" 
Cc: Ananth N Mavinakayanahalli 
Signed-off-by: Naveen N. Rao 
---
 arch/powerpc/include/asm/ppc-opcode.h |  2 ++
 arch/powerpc/net/bpf_jit.h| 20 +++-
 2 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h
index 7ab04fc..95fd811 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -271,6 +271,8 @@
 #define __PPC_SH(s)__PPC_WS(s)
 #define __PPC_MB(s)(((s) & 0x1f) << 6)
 #define __PPC_ME(s)(((s) & 0x1f) << 1)
+#define __PPC_MB64(s)  (__PPC_MB(s) | ((s) & 0x20))
+#define __PPC_ME64(s)  __PPC_MB64(s)
 #define __PPC_BI(s)(((s) & 0x1f) << 16)
 #define __PPC_CT(t)(((t) & 0x0f) << 21)
 
diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h
index 4c1e055..95d0e38 100644
--- a/arch/powerpc/net/bpf_jit.h
+++ b/arch/powerpc/net/bpf_jit.h
@@ -210,18 +210,20 @@ DECLARE_LOAD_FUNC(sk_load_byte_msh);
 ___PPC_RS(a) | ___PPC_RB(s))
 #define PPC_SRW(d, a, s)   EMIT(PPC_INST_SRW | ___PPC_RA(d) |\
 ___PPC_RS(a) | ___PPC_RB(s))
+#define PPC_RLWINM(d, a, i, mb, me)EMIT(PPC_INST_RLWINM | ___PPC_RA(d) | \
+   ___PPC_RS(a) | __PPC_SH(i) |  \
+   __PPC_MB(mb) | __PPC_ME(me))
+#define PPC_RLDICR(d, a, i, me)EMIT(PPC_INST_RLDICR | ___PPC_RA(d) | \
+   ___PPC_RS(a) | __PPC_SH(i) |  \
+   __PPC_ME64(me) | (((i) & 0x20) >> 4))
+
 /* slwi = rlwinm Rx, Ry, n, 0, 31-n */
-#define PPC_SLWI(d, a, i)  EMIT(PPC_INST_RLWINM | ___PPC_RA(d) | \
-___PPC_RS(a) | __PPC_SH(i) | \
-__PPC_MB(0) | __PPC_ME(31-(i)))
+#define PPC_SLWI(d, a, i)  PPC_RLWINM(d, a, i, 0, 31-(i))
 /* srwi = rlwinm Rx, Ry, 32-n, n, 31 */
-#define PPC_SRWI(d, a, i)  EMIT(PPC_INST_RLWINM | ___PPC_RA(d) | \
-___PPC_RS(a) | __PPC_SH(32-(i)) |\
-__PPC_MB(i) | __PPC_ME(31))
+#define PPC_SRWI(d, a, i)  PPC_RLWINM(d, a, 32-(i), i, 31)
 /* sldi = rldicr Rx, Ry, n, 63-n */
-#define PPC_SLDI(d, a, i)  EMIT(PPC_INST_RLDICR | ___PPC_RA(d) | \
-___PPC_RS(a) | __PPC_SH(i) | \
-__PPC_MB(63-(i)) | (((i) & 0x20) >> 4))
+#define PPC_SLDI(d, a, i)  PPC_RLDICR(d, a, i, 63-(i))
+
 #define PPC_NEG(d, a)  EMIT(PPC_INST_NEG | ___PPC_RT(d) | ___PPC_RA(a))
 
 /* Long jump; (unconditional 'branch') */
-- 
2.7.4


[RFC PATCH 5/6] ppc: bpf/jit: Isolate classic BPF JIT specifics into a separate header

2016-04-01 Thread Naveen N. Rao
Break out classic BPF JIT specifics into a separate header in
preparation for eBPF JIT implementation. Note that ppc32 will still need
the classic BPF JIT.

Cc: Matt Evans 
Cc: Michael Ellerman 
Cc: Paul Mackerras 
Cc: Alexei Starovoitov 
Cc: "David S. Miller" 
Cc: Ananth N Mavinakayanahalli 
Signed-off-by: Naveen N. Rao 
---
 arch/powerpc/net/bpf_jit.h  | 122 +-
 arch/powerpc/net/bpf_jit32.h| 140 
 arch/powerpc/net/bpf_jit_asm.S  |   2 +-
 arch/powerpc/net/bpf_jit_comp.c |   2 +-
 4 files changed, 145 insertions(+), 121 deletions(-)
 create mode 100644 arch/powerpc/net/bpf_jit32.h

diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h
index 9041d3f..f650767 100644
--- a/arch/powerpc/net/bpf_jit.h
+++ b/arch/powerpc/net/bpf_jit.h
@@ -1,6 +1,8 @@
-/* bpf_jit.h: BPF JIT compiler for PPC64
+/*
+ * bpf_jit.h: BPF JIT compiler for PPC
  *
  * Copyright 2011 Matt Evans , IBM Corporation
+ *  2016 Naveen N. Rao 
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License
@@ -10,66 +12,8 @@
 #ifndef _BPF_JIT_H
 #define _BPF_JIT_H
 
-#ifdef CONFIG_PPC64
-#define BPF_PPC_STACK_R3_OFF   48
-#define BPF_PPC_STACK_LOCALS   32
-#define BPF_PPC_STACK_BASIC(48+64)
-#define BPF_PPC_STACK_SAVE (18*8)
-#define BPF_PPC_STACKFRAME (BPF_PPC_STACK_BASIC+BPF_PPC_STACK_LOCALS+ \
-BPF_PPC_STACK_SAVE)
-#define BPF_PPC_SLOWPATH_FRAME (48+64)
-#else
-#define BPF_PPC_STACK_R3_OFF   24
-#define BPF_PPC_STACK_LOCALS   16
-#define BPF_PPC_STACK_BASIC(24+32)
-#define BPF_PPC_STACK_SAVE (18*4)
-#define BPF_PPC_STACKFRAME (BPF_PPC_STACK_BASIC+BPF_PPC_STACK_LOCALS+ \
-BPF_PPC_STACK_SAVE)
-#define BPF_PPC_SLOWPATH_FRAME (24+32)
-#endif
-
-#define REG_SZ (BITS_PER_LONG/8)
-
-/*
- * Generated code register usage:
- *
- * As normal PPC C ABI (e.g. r1=sp, r2=TOC), with:
- *
- * skb r3  (Entry parameter)
- * A register  r4
- * X register  r5
- * addr param  r6
- * r7-r10  scratch
- * skb->data   r14
- * skb headlen r15 (skb->len - skb->data_len)
- * m[0]r16
- * m[...]  ...
- * m[15]   r31
- */
-#define r_skb  3
-#define r_ret  3
-#define r_A4
-#define r_X5
-#define r_addr 6
-#define r_scratch1 7
-#define r_scratch2 8
-#define r_D14
-#define r_HL   15
-#define r_M16
-
 #ifndef __ASSEMBLY__
 
-/*
- * Assembly helpers from arch/powerpc/net/bpf_jit.S:
- */
-#define DECLARE_LOAD_FUNC(func)\
-   extern u8 func[], func##_negative_offset[], func##_positive_offset[]
-
-DECLARE_LOAD_FUNC(sk_load_word);
-DECLARE_LOAD_FUNC(sk_load_half);
-DECLARE_LOAD_FUNC(sk_load_byte);
-DECLARE_LOAD_FUNC(sk_load_byte_msh);
-
 #ifdef CONFIG_PPC64
 #define FUNCTION_DESCR_SIZE24
 #else
@@ -131,46 +75,6 @@ DECLARE_LOAD_FUNC(sk_load_byte_msh);
 #define PPC_BPF_STLU(r, base, i) do { PPC_STWU(r, base, i); } while(0)
 #endif
 
-/* Convenience helpers for the above with 'far' offsets: */
-#define PPC_LBZ_OFFS(r, base, i) do { if ((i) < 32768) PPC_LBZ(r, base, i);   \
-   else {  PPC_ADDIS(r, base, IMM_HA(i));\
-   PPC_LBZ(r, r, IMM_L(i)); } } while(0)
-
-#define PPC_LD_OFFS(r, base, i) do { if ((i) < 32768) PPC_LD(r, base, i); \
-   else {  PPC_ADDIS(r, base, IMM_HA(i));\
-   PPC_LD(r, r, IMM_L(i)); } } while(0)
-
-#define PPC_LWZ_OFFS(r, base, i) do { if ((i) < 32768) PPC_LWZ(r, base, i);   \
-   else {  PPC_ADDIS(r, base, IMM_HA(i));\
-   PPC_LWZ(r, r, IMM_L(i)); } } while(0)
-
-#define PPC_LHZ_OFFS(r, base, i) do { if ((i) < 32768) PPC_LHZ(r, base, i);   \
-   else {  PPC_ADDIS(r, base, IMM_HA(i));\
-   PPC_LHZ(r, r, IMM_L(i)); } } while(0)
-
-#ifdef CONFIG_PPC64
-#define PPC_LL_OFFS(r, base, i) do { PPC_LD_OFFS(r, base, i); } while(0)
-#else
-#define PPC_LL_OFFS(r, base, i) do { PPC_LWZ_OFFS(r, base, i); } while(0)
-#endif
-
-#ifdef CONFIG_SMP
-#ifdef CONFIG_PPC64
-#define PPC_BPF_LOAD_CPU(r)\
-   do { BUILD_BUG_ON(FIELD_SIZEOF(struct paca_struct, paca_index) != 2); \
-   PPC_LHZ_OFFS(r, 13, offsetof(struct paca_struct, paca_index)); \
-   } while (0)
-#else
-#define PPC_BPF_LOAD_CPU(r) \
-   do { BUILD_BUG_ON(FIELD_SIZEOF(struct thread_info, cpu) != 4); \
-   PPC_LHZ_OFFS(r, (1 & ~(THREAD_SIZE - 1)), \
-   

[PATCH] KVM: PPC:enable mmio_sign_extend in kvmppc_handle_load()

2016-04-01 Thread Bin Lu
The logic of 'kvmppc_handle_loads' is the same as 'kvmppc_handle_load',
but with sign extension.  It sets mmio_sign_extend to 1, then calls
'kvmppc_handle_load', but in 'kvmppc_handle_load', the
mmio_sign_extend flag is reset to 0, so the data does not actually get
sign-extended.

This patch fixes the bug by removing the 'kvmppc_handle_loads'
function, adding a new parameter 'mmio_sign_extend' to
'kvmppc_handle_load'.  Calls to kvmppc_handle_loads() are replaced by
calls to kvmppc_handle_load() with 1 for the sign-extend parameter,
and existing calls to kvmppc_handle_load() have 0 added for the
sign-extend parameter.
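A minimal model of the behaviour being fixed (a Python illustration, not the
kernel code): when the sign-extend flag is honoured, an N-byte MMIO result is
widened to 64 bits with its top bits filled from the sign bit; otherwise it
is zero-extended.

```python
def mmio_complete_load(val, nbytes, sign_extend):
    """Widen an nbytes-wide MMIO result to a 64-bit register value."""
    bits = nbytes * 8
    mask64 = (1 << 64) - 1
    if not sign_extend:
        return val & ((1 << bits) - 1)      # zero-extend (e.g. lhzx)
    sign = 1 << (bits - 1)
    return ((val & (sign - 1)) - (val & sign)) & mask64   # sign-extend (e.g. lhax)

# 2-byte load of 0x8001: lhax-style (sign-extended) vs lhzx-style (zero-extended)
assert mmio_complete_load(0x8001, 2, True)  == 0xffffffffffff8001
assert mmio_complete_load(0x8001, 2, False) == 0x8001
```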

Signed-off-by: Bin Lu 
---
 arch/powerpc/include/asm/kvm_ppc.h   |  5 +
 arch/powerpc/kvm/book3s_paired_singles.c |  6 +++---
 arch/powerpc/kvm/emulate_loadstore.c | 34 
 arch/powerpc/kvm/powerpc.c   | 17 ++--
 4 files changed, 23 insertions(+), 39 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index af353f6..76829fa 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -74,10 +74,7 @@ extern void kvmppc_handler_highmem(void);
 extern void kvmppc_dump_vcpu(struct kvm_vcpu *vcpu);
 extern int kvmppc_handle_load(struct kvm_run *run, struct kvm_vcpu *vcpu,
   unsigned int rt, unsigned int bytes,
- int is_default_endian);
-extern int kvmppc_handle_loads(struct kvm_run *run, struct kvm_vcpu *vcpu,
-   unsigned int rt, unsigned int bytes,
-  int is_default_endian);
+ int is_default_endian, u8 is_sign_extend);
 extern int kvmppc_handle_store(struct kvm_run *run, struct kvm_vcpu *vcpu,
   u64 val, unsigned int bytes,
   int is_default_endian);
diff --git a/arch/powerpc/kvm/book3s_paired_singles.c b/arch/powerpc/kvm/book3s_paired_singles.c
index a759d9a..c9e3008 100644
--- a/arch/powerpc/kvm/book3s_paired_singles.c
+++ b/arch/powerpc/kvm/book3s_paired_singles.c
@@ -200,7 +200,7 @@ static int kvmppc_emulate_fpr_load(struct kvm_run *run, 
struct kvm_vcpu *vcpu,
goto done_load;
} else if (r == EMULATE_DO_MMIO) {
emulated = kvmppc_handle_load(run, vcpu, KVM_MMIO_REG_FPR | rs,
- len, 1);
+ len, 1, 0);
goto done_load;
}
 
@@ -291,12 +291,12 @@ static int kvmppc_emulate_psq_load(struct kvm_run *run, 
struct kvm_vcpu *vcpu,
goto done_load;
} else if ((r == EMULATE_DO_MMIO) && w) {
emulated = kvmppc_handle_load(run, vcpu, KVM_MMIO_REG_FPR | rs,
- 4, 1);
+ 4, 1, 0);
vcpu->arch.qpr[rs] = tmp[1];
goto done_load;
} else if (r == EMULATE_DO_MMIO) {
emulated = kvmppc_handle_load(run, vcpu, KVM_MMIO_REG_FQPR | rs,
- 8, 1);
+ 8, 1, 0);
goto done_load;
}
 
diff --git a/arch/powerpc/kvm/emulate_loadstore.c b/arch/powerpc/kvm/emulate_loadstore.c
index 6d3c0ee..38b3b6b 100644
--- a/arch/powerpc/kvm/emulate_loadstore.c
+++ b/arch/powerpc/kvm/emulate_loadstore.c
@@ -70,15 +70,15 @@ int kvmppc_emulate_loadstore(struct kvm_vcpu *vcpu)
case 31:
switch (get_xop(inst)) {
case OP_31_XOP_LWZX:
-   emulated = kvmppc_handle_load(run, vcpu, rt, 4, 1);
+   emulated = kvmppc_handle_load(run, vcpu, rt, 4, 1, 0);
break;
 
case OP_31_XOP_LBZX:
-   emulated = kvmppc_handle_load(run, vcpu, rt, 1, 1);
+   emulated = kvmppc_handle_load(run, vcpu, rt, 1, 1, 0);
break;
 
case OP_31_XOP_LBZUX:
-   emulated = kvmppc_handle_load(run, vcpu, rt, 1, 1);
+   emulated = kvmppc_handle_load(run, vcpu, rt, 1, 1, 0);
kvmppc_set_gpr(vcpu, ra, vcpu->arch.vaddr_accessed);
break;
 
@@ -102,15 +102,15 @@ int kvmppc_emulate_loadstore(struct kvm_vcpu *vcpu)
break;
 
case OP_31_XOP_LHAX:
-   emulated = kvmppc_handle_loads(run, vcpu, rt, 2, 1);
+   emulated = kvmppc_handle_load(run, vcpu, rt, 2, 1, 1);
break;
 
case OP_31_XOP_LHZX:
-   emulated = kvmppc_handle_load(run, vcpu, rt, 2, 1);
+   emulated = kvmppc_handle_load(run, vcpu, rt, 2, 1, 0);
break;
 
case 

Re: [RFC PATCH 0/6] eBPF JIT for PPC64

2016-04-01 Thread Naveen N. Rao
On 2016/04/01 03:28PM, Naveen N Rao wrote:
> Implement extended BPF JIT for ppc64. We retain the classic BPF JIT for
> ppc32 and move ppc64 BE/LE to use the new JIT. Classic BPF filters will
> be converted to extended BPF (see convert_filter()) and JIT'ed with the
> new compiler.
> 
> Most of the existing macros are retained and fixed/enhanced where
> appropriate. Patches 1-4 are geared towards this.
> 
> Patch 5 breaks out the classic BPF JIT specifics into a separate
> bpf_jit32.h header file, while retaining all the generic instruction
> macros in bpf_jit.h. Most of these macros can potentially be generalized
> and moved to more common code (tagged with a TODO in patch 6).
> 
> Patch 6 implements eBPF JIT for ppc64.

As a comparison, here are the test results with the BPF test suite 
kernel module:

With the classic BPF JIT:
test_bpf: Summary: 291 PASSED, 0 FAILED, [85/283 JIT'ed]

and with the extended BPF JIT:
test_bpf: Summary: 291 PASSED, 0 FAILED, [234/283 JIT'ed]

As noted in patch 6, there are still a few more instructions to be 
JIT'ed.


- Naveen


Re: [PATCH 65/65] powerpc/mm/radix: Cputable update for radix

2016-04-01 Thread Aneesh Kumar K.V
Michael Ellerman  writes:

> [ text/plain ]
> On Sun, 2016-03-27 at 13:54 +0530, Aneesh Kumar K.V wrote:
>
>> This patch move the existing p9 hash to a different PVR and add
>> radix feature with p9 PVR. That implies we will not be able to
>> runtime select P9 hash. With P9 Radix we need to do
>> 
>> * set UPRT = 0 in cpu setup
>> * set different TLB set count
>> 
>> We ideally want to use ibm,pa-features to enable disable radix. But
>> we have already done setup cpu by the time we reach pa-features check.
>> 
>> So for now use this hack.
>> 
>> Not-Signed-off-by: Aneesh Kumar K.V 
> ...
>> diff --git a/arch/powerpc/kernel/cpu_setup_power.S b/arch/powerpc/kernel/cpu_setup_power.S
>> index 584e119fa8b0..e9b76c651bd1 100644
>> --- a/arch/powerpc/kernel/cpu_setup_power.S
>> +++ b/arch/powerpc/kernel/cpu_setup_power.S
>> @@ -117,6 +117,41 @@ _GLOBAL(__restore_cpu_power9)
>>  mtlrr11
>>  blr
>>  
>> +_GLOBAL(__setup_cpu_power9_uprt)
>> +mflrr11
>> +bl  __init_FSCR
>> +bl  __init_hvmode_206
>> +mtlrr11
>> +beqlr
>> +li  r0,0
>> +mtspr   SPRN_LPID,r0
>> +mfspr   r3,SPRN_LPCR
>> +ori r3, r3, LPCR_PECEDH
>> +orisr3,r3,(LPCR_UPRT >> 16)
>> +bl  __init_LPCR
>
> I don't see why we *have* to initialise this here.
>
> ie. could we do it later in early_init_mmu() or similar ?

That helped. This works for me.

commit 9c9d8b4f6a2c2210c90cbb3f5c6d33b2a642e8d2
Author: Aneesh Kumar K.V 
Date:   Mon Feb 15 13:44:01 2016 +0530

powerpc/mm/radix: Cputable update for radix

With P9 Radix we need to do

* set UPRT = 1
* set different TLB set count

In this patch we delay the UPRT=1 to early mmu init. We also update
other cpu_spec callback there. The restore cpu callback is used to
init secondary cpus and also during opal init. So we do a full
radix variant for that, even though the only difference is UPRT=1

Signed-off-by: Aneesh Kumar K.V 

diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index b546e6f28d44..3400ed884f10 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -347,6 +347,10 @@
 #define   LPCR_LPES_SH 2
 #define   LPCR_RMI 0x0002  /* real mode is cache inhibit */
 #define   LPCR_HDICE   0x0001  /* Hyp Decr enable (HV,PR,EE) */
+/*
+ * Used in asm code, hence we don't want to use PPC_BITCOUNT
+ */
+#define  LPCR_UPRT (ASM_CONST(0x1) << 22)
 #ifndef SPRN_LPID
 #define SPRN_LPID  0x13F   /* Logical Partition Identifier */
 #endif
diff --git a/arch/powerpc/kernel/cpu_setup_power.S b/arch/powerpc/kernel/cpu_setup_power.S
index 584e119fa8b0..8d717954d0ca 100644
--- a/arch/powerpc/kernel/cpu_setup_power.S
+++ b/arch/powerpc/kernel/cpu_setup_power.S
@@ -117,6 +117,24 @@ _GLOBAL(__restore_cpu_power9)
mtlrr11
blr
 
+_GLOBAL(__restore_cpu_power9_uprt)
+   mflrr11
+   bl  __init_FSCR
+   mfmsr   r3
+   rldicl. r0,r3,4,63
+   mtlrr11
+   beqlr
+   li  r0,0
+   mtspr   SPRN_LPID,r0
+   mfspr   r3,SPRN_LPCR
+   ori r3, r3, LPCR_PECEDH
+   orisr3,r3,LPCR_UPRT@h
+   bl  __init_LPCR
+   bl  __init_HFSCR
+   bl  __init_tlb_power7
+   mtlrr11
+   blr
+
 __init_hvmode_206:
/* Disable CPU_FTR_HVMODE and exit if MSR:HV is not set */
mfmsr   r3
diff --git a/arch/powerpc/kernel/cputable.c b/arch/powerpc/kernel/cputable.c
index 6c662b8de90d..e009722d5914 100644
--- a/arch/powerpc/kernel/cputable.c
+++ b/arch/powerpc/kernel/cputable.c
@@ -514,7 +514,7 @@ static struct cpu_spec __initdata cpu_specs[] = {
.cpu_features   = CPU_FTRS_POWER9,
.cpu_user_features  = COMMON_USER_POWER9,
.cpu_user_features2 = COMMON_USER2_POWER9,
-   .mmu_features   = MMU_FTRS_POWER9,
+   .mmu_features   = MMU_FTRS_POWER9 | MMU_FTR_RADIX,
.icache_bsize   = 128,
.dcache_bsize   = 128,
.num_pmcs   = 6,
diff --git a/arch/powerpc/kernel/mce_power.c b/arch/powerpc/kernel/mce_power.c
index 92a66a2a9b85..f902ede263ab 100644
--- a/arch/powerpc/kernel/mce_power.c
+++ b/arch/powerpc/kernel/mce_power.c
@@ -75,6 +75,10 @@ void __flush_tlb_power9(unsigned int action)
flush_tlb_206(POWER9_TLB_SETS_HASH, action);
 }
 
+void __flush_tlb_power9_radix(unsigned int action)
+{
+   flush_tlb_206(POWER9_TLB_SETS_RADIX, action);
+}
 
 /* flush SLBs and reload */
 #ifdef CONFIG_PPC_MMU_STD_64
diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c
index bb1eb7d0911c..6e56051bf825 100644
--- a/arch/powerpc/mm/pgtable-radix.c
+++ b/arch/powerpc/mm/pgtable-radix.c
@@ -294,8 +294,20 @@ found:
  

[RFC PATCH 0/6] eBPF JIT for PPC64

2016-04-01 Thread Naveen N. Rao
Implement extended BPF JIT for ppc64. We retain the classic BPF JIT for
ppc32 and move ppc64 BE/LE to use the new JIT. Classic BPF filters will
be converted to extended BPF (see convert_filter()) and JIT'ed with the
new compiler.

Most of the existing macros are retained and fixed/enhanced where
appropriate. Patches 1-4 are geared towards this.

Patch 5 breaks out the classic BPF JIT specifics into a separate
bpf_jit32.h header file, while retaining all the generic instruction
macros in bpf_jit.h. Most of these macros can potentially be generalized
and moved to more common code (tagged with a TODO in patch 6).

Patch 6 implements eBPF JIT for ppc64.

This is still *early* *RFC* and there are still a few instruction
classes to be JIT'ed. I am posting this in advance so as to get early
feedback. Kindly review the same and if possible, try it out and let me
know how it goes!


- Naveen

Naveen N. Rao (6):
  ppc: bpf/jit: Fix/enhance 32-bit Load Immediate implementation
  ppc: bpf/jit: Optimize 64-bit Immediate loads
  ppc: bpf/jit: Introduce rotate immediate instructions
  ppc: bpf/jit: A few cleanups
  ppc: bpf/jit: Isolate classic BPF JIT specifics into a separate header
  ppc: ebpf/jit: Implement JIT compiler for extended BPF

 arch/powerpc/include/asm/ppc-opcode.h |  21 +-
 arch/powerpc/net/Makefile |   4 +
 arch/powerpc/net/bpf_jit.h| 251 +--
 arch/powerpc/net/bpf_jit32.h  | 140 ++
 arch/powerpc/net/bpf_jit64.h  |  58 +++
 arch/powerpc/net/bpf_jit_asm.S|   2 +-
 arch/powerpc/net/bpf_jit_comp.c   |  10 +-
 arch/powerpc/net/bpf_jit_comp64.c | 828 ++
 8 files changed, 1163 insertions(+), 151 deletions(-)
 create mode 100644 arch/powerpc/net/bpf_jit32.h
 create mode 100644 arch/powerpc/net/bpf_jit64.h
 create mode 100644 arch/powerpc/net/bpf_jit_comp64.c

-- 
2.7.4


[RFC PATCH 4/6] ppc: bpf/jit: A few cleanups

2016-04-01 Thread Naveen N. Rao
1. Per the ISA, ADDIS actually uses RT, rather than RS. Though
the result is the same, make the usage clear.
2. The multiply instruction used is a 32-bit multiply. Rename PPC_MUL()
to PPC_MULW() to make the same clear.
3. PPC_STW[U] take the entire 16-bit immediate value and do not require
word-alignment, per the ISA. Change the macros to use IMM_L().
4. A few white-space cleanups to satisfy checkpatch.pl.
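On point 1: RT and RS occupy the same bit positions in the instruction word,
which is why the encoding is unchanged even though the old macro named the
wrong field. A quick Python check mirroring the field helpers in
ppc-opcode.h (register and immediate values here are just an example):

```python
# Field helpers, as in arch/powerpc/include/asm/ppc-opcode.h
def ppc_rt(t): return (t & 0x1f) << 21
def ppc_rs(s): return (s & 0x1f) << 21   # same shift as RT
def ppc_ra(a): return (a & 0x1f) << 16
def imm_l(i):  return i & 0xffff

PPC_INST_ADDIS = 0x3c000000

# addis r4, r5, 0x1234 encodes to the same word either way:
via_rt = PPC_INST_ADDIS | ppc_rt(4) | ppc_ra(5) | imm_l(0x1234)
via_rs = PPC_INST_ADDIS | ppc_rs(4) | ppc_ra(5) | imm_l(0x1234)
assert via_rt == via_rs == 0x3c851234
```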

Cc: Matt Evans 
Cc: Michael Ellerman 
Cc: Paul Mackerras 
Cc: Alexei Starovoitov 
Cc: "David S. Miller" 
Cc: Ananth N Mavinakayanahalli 
Signed-off-by: Naveen N. Rao 
---
 arch/powerpc/net/bpf_jit.h  | 13 +++--
 arch/powerpc/net/bpf_jit_comp.c |  8 
 2 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h
index 95d0e38..9041d3f 100644
--- a/arch/powerpc/net/bpf_jit.h
+++ b/arch/powerpc/net/bpf_jit.h
@@ -83,7 +83,7 @@ DECLARE_LOAD_FUNC(sk_load_byte_msh);
  */
 #define IMM_H(i)   ((uintptr_t)(i)>>16)
 #define IMM_HA(i)  (((uintptr_t)(i)>>16) +   \
-(((uintptr_t)(i) & 0x8000) >> 15))
+   (((uintptr_t)(i) & 0x8000) >> 15))
 #define IMM_L(i)   ((uintptr_t)(i) & 0x)
 
 #define PLANT_INSTR(d, idx, instr)   \
@@ -99,16 +99,16 @@ DECLARE_LOAD_FUNC(sk_load_byte_msh);
 #define PPC_MR(d, a)   PPC_OR(d, a, a)
 #define PPC_LI(r, i)   PPC_ADDI(r, 0, i)
 #define PPC_ADDIS(d, a, i) EMIT(PPC_INST_ADDIS | \
-___PPC_RS(d) | ___PPC_RA(a) | IMM_L(i))
+___PPC_RT(d) | ___PPC_RA(a) | IMM_L(i))
 #define PPC_LIS(r, i)  PPC_ADDIS(r, 0, i)
 #define PPC_STD(r, base, i)EMIT(PPC_INST_STD | ___PPC_RS(r) |\
 ___PPC_RA(base) | ((i) & 0xfffc))
 #define PPC_STDU(r, base, i)   EMIT(PPC_INST_STDU | ___PPC_RS(r) |   \
 ___PPC_RA(base) | ((i) & 0xfffc))
 #define PPC_STW(r, base, i)EMIT(PPC_INST_STW | ___PPC_RS(r) |\
-___PPC_RA(base) | ((i) & 0xfffc))
+___PPC_RA(base) | IMM_L(i))
 #define PPC_STWU(r, base, i)   EMIT(PPC_INST_STWU | ___PPC_RS(r) |   \
-___PPC_RA(base) | ((i) & 0xfffc))
+___PPC_RA(base) | IMM_L(i))
 
 #define PPC_LBZ(r, base, i)EMIT(PPC_INST_LBZ | ___PPC_RT(r) |\
 ___PPC_RA(base) | IMM_L(i))
@@ -174,13 +174,14 @@ DECLARE_LOAD_FUNC(sk_load_byte_msh);
 #define PPC_CMPWI(a, i)EMIT(PPC_INST_CMPWI | ___PPC_RA(a) | IMM_L(i))
 #define PPC_CMPDI(a, i)EMIT(PPC_INST_CMPDI | ___PPC_RA(a) | IMM_L(i))
 #define PPC_CMPLWI(a, i)   EMIT(PPC_INST_CMPLWI | ___PPC_RA(a) | IMM_L(i))
-#define PPC_CMPLW(a, b)EMIT(PPC_INST_CMPLW | ___PPC_RA(a) | ___PPC_RB(b))
+#define PPC_CMPLW(a, b)EMIT(PPC_INST_CMPLW | ___PPC_RA(a) |  \
+   ___PPC_RB(b))
 
 #define PPC_SUB(d, a, b)   EMIT(PPC_INST_SUB | ___PPC_RT(d) |\
 ___PPC_RB(a) | ___PPC_RA(b))
 #define PPC_ADD(d, a, b)   EMIT(PPC_INST_ADD | ___PPC_RT(d) |\
 ___PPC_RA(a) | ___PPC_RB(b))
-#define PPC_MUL(d, a, b)   EMIT(PPC_INST_MULLW | ___PPC_RT(d) |  \
+#define PPC_MULW(d, a, b)  EMIT(PPC_INST_MULLW | ___PPC_RT(d) |  \
 ___PPC_RA(a) | ___PPC_RB(b))
 #define PPC_MULHWU(d, a, b)EMIT(PPC_INST_MULHWU | ___PPC_RT(d) | \
 ___PPC_RA(a) | ___PPC_RB(b))
diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index 2d66a84..6012aac 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -161,14 +161,14 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 
*image,
break;
case BPF_ALU | BPF_MUL | BPF_X: /* A *= X; */
ctx->seen |= SEEN_XREG;
-   PPC_MUL(r_A, r_A, r_X);
+   PPC_MULW(r_A, r_A, r_X);
break;
case BPF_ALU | BPF_MUL | BPF_K: /* A *= K */
if (K < 32768)
PPC_MULI(r_A, r_A, K);
else {
PPC_LI32(r_scratch1, K);
-   PPC_MUL(r_A, r_A, r_scratch1);
+   PPC_MULW(r_A, r_A, r_scratch1);
}
 

Re: [v7, 0/5] Fix eSDHC host version register bug

2016-04-01 Thread Scott Wood
On Fri, 2016-04-01 at 11:07 +0800, Yangbo Lu wrote:
> This patchset is used to fix a host version register bug in the T4240-R1.0
> -R2.0
> eSDHC controller. To get the SoC version and revision, it's needed to add
> the
> GUTS driver to access the global utilities registers.
> 
> So, the first three patches are to add the GUTS driver.
> The following two patches are to enable GUTS driver support to get SVR in
> eSDHC
> driver and fix host version for T4240.
> 
> Yangbo Lu (5):
>   ARM64: dts: ls2080a: add device configuration node
>   soc: fsl: add GUTS driver for QorIQ platforms
>   dt: move guts devicetree doc out of powerpc directory
>   powerpc/fsl: move mpc85xx.h to include/linux/fsl
>   mmc: sdhci-of-esdhc: fix host version for T4240-R1.0-R2.0

Acked-by: Scott Wood 

-Scott


[net-next PATCH 1/2 v4] ibmvnic: map L2/L3/L4 header descriptors to firmware

2016-04-01 Thread Thomas Falcon
Allow the VNIC driver to provide descriptors containing
L2/L3/L4 headers to firmware.  This feature is needed
for greater hardware compatibility and enablement of checksum
and TCP offloading features.

A new function is included for the hypervisor call,
H_SEND_SUBCRQ_INDIRECT, allowing a DMA-mapped array of SCRQ
descriptor elements to be sent to the VNIC server.

These additions will help fully enable checksum offloading as
well as other features as they are included later.

Signed-off-by: Thomas Falcon 
Cc: John Allen 
---
v2: Fixed typo error caught by kbuild test bot
v3: Fixed erroneous patch sender
v4: sorry for the delay in resending, Thanks to David Miller for comments,
removed all extra memory allocations,
merged some helper functions,
calculate all header lengths to meet firmware requirements,
fixed endian bugs in the send_subcrq_indirect
---
 drivers/net/ethernet/ibm/ibmvnic.c | 195 -
 drivers/net/ethernet/ibm/ibmvnic.h |   3 +
 2 files changed, 194 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index 6e9e16ee..4e97e76 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -61,6 +61,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -94,6 +95,7 @@ static int ibmvnic_reenable_crq_queue(struct ibmvnic_adapter *);
 static int ibmvnic_send_crq(struct ibmvnic_adapter *, union ibmvnic_crq *);
 static int send_subcrq(struct ibmvnic_adapter *adapter, u64 remote_handle,
   union sub_crq *sub_crq);
+static int send_subcrq_indirect(struct ibmvnic_adapter *, u64, u64, u64);
 static irqreturn_t ibmvnic_interrupt_rx(int irq, void *instance);
 static int enable_scrq_irq(struct ibmvnic_adapter *,
   struct ibmvnic_sub_crq_queue *);
@@ -561,10 +563,141 @@ static int ibmvnic_close(struct net_device *netdev)
return 0;
 }
 
+/**
+ * build_hdr_data - creates L2/L3/L4 header data buffer
+ * @hdr_field: bitfield determining needed headers
+ * @skb: socket buffer
+ * @hdr_len: array of header lengths to fill in
+ * @hdr_data: buffer to copy the headers into
+ *
+ * Reads hdr_field to determine which headers are needed by firmware.
+ * Builds a buffer containing these headers.  Saves individual header
+ * lengths and returns the total buffer length, to be used to build descriptors.
+ */
+static int build_hdr_data(u8 hdr_field, struct sk_buff *skb,
+ int *hdr_len, u8 *hdr_data)
+{
+   int len = 0;
+   u8 *hdr;
+
+   hdr_len[0] = sizeof(struct ethhdr);
+
+   if (skb->protocol == htons(ETH_P_IP)) {
+   hdr_len[1] = ip_hdr(skb)->ihl * 4;
+   if (ip_hdr(skb)->protocol == IPPROTO_TCP)
+   hdr_len[2] = tcp_hdrlen(skb);
+   else if (ip_hdr(skb)->protocol == IPPROTO_UDP)
+   hdr_len[2] = sizeof(struct udphdr);
+   } else if (skb->protocol == htons(ETH_P_IPV6)) {
+   hdr_len[1] = sizeof(struct ipv6hdr);
+   if (ipv6_hdr(skb)->nexthdr == IPPROTO_TCP)
+   hdr_len[2] = tcp_hdrlen(skb);
+   else if (ipv6_hdr(skb)->nexthdr == IPPROTO_UDP)
+   hdr_len[2] = sizeof(struct udphdr);
+   }
+
+   memset(hdr_data, 0, 120);
+   if ((hdr_field >> 6) & 1) {
+   hdr = skb_mac_header(skb);
+   memcpy(hdr_data, hdr, hdr_len[0]);
+   len += hdr_len[0];
+   }
+
+   if ((hdr_field >> 5) & 1) {
+   hdr = skb_network_header(skb);
+   memcpy(hdr_data + len, hdr, hdr_len[1]);
+   len += hdr_len[1];
+   }
+
+   if ((hdr_field >> 4) & 1) {
+   hdr = skb_transport_header(skb);
+   memcpy(hdr_data + len, hdr, hdr_len[2]);
+   len += hdr_len[2];
+   }
+   return len;
+}
+
+/**
+ * create_hdr_descs - create header and header extension descriptors
+ * @hdr_field: bitfield determining needed headers
+ * @hdr_data: buffer containing header data
+ * @len: length of data buffer
+ * @hdr_len: array of individual header lengths
+ * @scrq_arr: descriptor array
+ *
+ * Creates header and, if needed, header extension descriptors and
+ * places them in a descriptor array, scrq_arr
+ */
+
+static void create_hdr_descs(u8 hdr_field, u8 *hdr_data, int len, int *hdr_len,
+union sub_crq *scrq_arr)
+{
+   union sub_crq hdr_desc;
+   int tmp_len = len;
+   u8 *data, *cur;
+   int tmp;
+
+   while (tmp_len > 0) {
+   cur = hdr_data + len - tmp_len;
+
+   memset(&hdr_desc, 0, sizeof(hdr_desc));
+   if (cur != hdr_data) {
+   data = hdr_desc.hdr_ext.data;
+   tmp = tmp_len > 29 ? 29 : tmp_len;
+   hdr_desc.hdr_ext.first = IBMVNIC_CRQ_CMD;
+   

[net-next PATCH 2/2 v4] ibmvnic: enable RX checksum offload

2016-04-01 Thread Thomas Falcon
Enable RX Checksum offload feature in the ibmvnic driver.

Signed-off-by: Thomas Falcon 
Cc: John Allen 
---
v4: this patch included since it is enabled by the previous patch
---
 drivers/net/ethernet/ibm/ibmvnic.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index 4e97e76..21bccf6 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -2105,6 +2105,10 @@ static void handle_query_ip_offload_rsp(struct ibmvnic_adapter *adapter)
if (buf->tcp_ipv6_chksum || buf->udp_ipv6_chksum)
adapter->netdev->features |= NETIF_F_IPV6_CSUM;
 
+   if ((adapter->netdev->features &
+   (NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM)))
+   adapter->netdev->features |= NETIF_F_RXCSUM;
+
	memset(&crq, 0, sizeof(crq));
crq.control_ip_offload.first = IBMVNIC_CRQ_CMD;
crq.control_ip_offload.cmd = CONTROL_IP_OFFLOAD;
-- 
2.4.11


Re: [PATCH v3] ppc64/book3s: fix branching to out of line handlers in relocation kernel

2016-04-01 Thread Michael Ellerman
On Wed, 2016-03-30 at 23:49 +0530, Hari Bathini wrote:
> Some of the interrupt vectors on 64-bit POWER server processors  are
> only 32 bytes long (8 instructions), which is not enough for the full
...
> Let us fix this unreliable code path by moving these OOL handlers below the
> __end_interrupts marker, to make sure we also copy these handlers to real
> address 0x100 when running a relocatable kernel. This is needed because the
> interrupt vectors branching to these OOL handlers are not long enough to use
> LOAD_HANDLER() for branching, as discussed above.
> 
...
> changes from v2:
> 2. Move the OOL handlers before __end_interrupts marker instead of moving the 
> __end_interrupts marker
> 3. Leave __end_handlers marker as is.

Hi Hari,

Thanks for trying this. In the end I've decided it's not a good option.

If you build an allmodconfig, and turn on CONFIG_RELOCATABLE, and then look at
the disassembly, you see this:

  c0006ffc:   48 00 29 04     b       c0009900 <.ret_from_except>

  c0007000 <__end_handlers>:

At 0x7000 we have the FWNMI area, which is fixed and can't move. As you see
above we end up with only 4 bytes of space between the end of the handlers and
the FWNMI area.

So any tiny change that adds two more instructions prior to 0x7000 will then
fail to build.

None of that's your fault; it's just the nature of the code in there, which is
very space constrained.

For now I'll take your v2, but I'll edit the comment and drop the removal of
__end_handlers.

cheers


Re: [PATCH 65/65] powerpc/mm/radix: Cputable update for radix

2016-04-01 Thread Michael Ellerman
On Sun, 2016-03-27 at 13:54 +0530, Aneesh Kumar K.V wrote:

> This patch moves the existing P9 hash to a different PVR and adds the
> radix feature with the P9 PVR. That implies we will not be able to
> select P9 hash at runtime. With P9 Radix we need to do
> 
> * set UPRT = 0 in cpu setup
> * set different TLB set count
> 
> We ideally want to use ibm,pa-features to enable/disable radix. But we
> have already done the CPU setup by the time we reach the pa-features check.
> 
> So for now use this hack.
> 
> Not-Signed-off-by: Aneesh Kumar K.V 
...
> diff --git a/arch/powerpc/kernel/cpu_setup_power.S b/arch/powerpc/kernel/cpu_setup_power.S
> index 584e119fa8b0..e9b76c651bd1 100644
> --- a/arch/powerpc/kernel/cpu_setup_power.S
> +++ b/arch/powerpc/kernel/cpu_setup_power.S
> @@ -117,6 +117,41 @@ _GLOBAL(__restore_cpu_power9)
>   mtlrr11
>   blr
>  
> +_GLOBAL(__setup_cpu_power9_uprt)
> + mflrr11
> + bl  __init_FSCR
> + bl  __init_hvmode_206
> + mtlrr11
> + beqlr
> + li  r0,0
> + mtspr   SPRN_LPID,r0
> + mfspr   r3,SPRN_LPCR
> + ori r3, r3, LPCR_PECEDH
> + orisr3,r3,(LPCR_UPRT >> 16)
> + bl  __init_LPCR

I don't see why we *have* to initialise this here.

ie. could we do it later in early_init_mmu() or similar ?

cheers
