Re: [PATCH] PowerPC: Replace kretprobe with rethook
Hi Abhishek, kernel test robot noticed the following build errors: [auto build test ERROR on powerpc/next] [also build test ERROR on powerpc/fixes linus/master v6.9 next-20240516] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Abhishek-Dubey/PowerPC-Replace-kretprobe-with-rethook/20240516-214818 base: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next patch link: https://lore.kernel.org/r/20240516134646.1059114-1-adubey%40linux.ibm.com patch subject: [PATCH] PowerPC: Replace kretprobe with rethook config: powerpc-asp8347_defconfig (https://download.01.org/0day-ci/archive/20240517/202405171203.casoixjg-...@intel.com/config) compiler: clang version 17.0.6 (https://github.com/llvm/llvm-project 6009708b4367171ccdbf4b5905cb6a803753fe18) reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240517/202405171203.casoixjg-...@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot | Closes: https://lore.kernel.org/oe-kbuild-all/202405171203.casoixjg-...@intel.com/ All errors (new ones prefixed by >>): >> ld.lld: error: undefined symbol: arch_rethook_trampoline >>> referenced by stacktrace.c >>> arch/powerpc/kernel/stacktrace.o:(arch_stack_walk_reliable) in archive vmlinux.a >>> referenced by stacktrace.c >>> arch/powerpc/kernel/stacktrace.o:(arch_stack_walk_reliable) in archive vmlinux.a -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki
Re: [PATCH] PowerPC: Replace kretprobe with rethook
Hi Abhishek, kernel test robot noticed the following build errors: [auto build test ERROR on powerpc/next] [also build test ERROR on powerpc/fixes linus/master v6.9 next-20240516] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Abhishek-Dubey/PowerPC-Replace-kretprobe-with-rethook/20240516-214818 base: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next patch link: https://lore.kernel.org/r/20240516134646.1059114-1-adubey%40linux.ibm.com patch subject: [PATCH] PowerPC: Replace kretprobe with rethook config: powerpc-allnoconfig (https://download.01.org/0day-ci/archive/20240517/202405171247.spwntdjg-...@intel.com/config) compiler: powerpc-linux-gcc (GCC) 13.2.0 reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240517/202405171247.spwntdjg-...@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot | Closes: https://lore.kernel.org/oe-kbuild-all/202405171247.spwntdjg-...@intel.com/ All errors (new ones prefixed by >>): powerpc-linux-ld: arch/powerpc/kernel/stacktrace.o: in function `arch_stack_walk_reliable': >> stacktrace.c:(.text+0x172): undefined reference to `arch_rethook_trampoline' >> powerpc-linux-ld: stacktrace.c:(.text+0x17e): undefined reference to >> `arch_rethook_trampoline' -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki
[powerpc:next] BUILD SUCCESS 61700f816e6f58f6b1aaa881a69a784d146e30f0
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next branch HEAD: 61700f816e6f58f6b1aaa881a69a784d146e30f0 powerpc/fadump: Fix section mismatch warning elapsed time: 734m configs tested: 190 configs skipped: 3 The following configs have been built successfully. More configs may be tested in the coming days. tested configs: alpha allnoconfig gcc alphaallyesconfig gcc alpha defconfig gcc arc allmodconfig gcc arc allnoconfig gcc arc allyesconfig gcc arc defconfig gcc arc randconfig-001-20240517 gcc arc randconfig-002-20240517 gcc arm allmodconfig gcc arm allnoconfig clang arm allyesconfig gcc arm collie_defconfig gcc arm davinci_all_defconfig clang arm defconfig clang armdove_defconfig gcc armhisi_defconfig gcc arm lpc32xx_defconfig clang arm nhk8815_defconfig clang arm pxa_defconfig gcc arm randconfig-001-20240517 clang arm randconfig-002-20240517 clang arm randconfig-003-20240517 clang arm randconfig-004-20240517 clang arm sama7_defconfig clang arm spear13xx_defconfig gcc arm tegra_defconfig gcc arm vf610m4_defconfig gcc arm wpcm450_defconfig gcc arm64allmodconfig clang arm64 allnoconfig gcc arm64 defconfig gcc arm64 randconfig-001-20240517 clang arm64 randconfig-002-20240517 gcc arm64 randconfig-003-20240517 clang arm64 randconfig-004-20240517 clang csky allmodconfig gcc csky allnoconfig gcc csky allyesconfig gcc cskydefconfig gcc csky randconfig-001-20240517 gcc csky randconfig-002-20240517 gcc hexagon allmodconfig clang hexagon allnoconfig clang hexagon allyesconfig clang hexagon defconfig clang hexagon randconfig-001-20240517 clang hexagon randconfig-002-20240517 clang i386 allmodconfig gcc i386 allnoconfig gcc i386 allyesconfig gcc i386 buildonly-randconfig-001-20240516 clang i386 buildonly-randconfig-001-20240517 clang i386 buildonly-randconfig-002-20240516 clang i386 buildonly-randconfig-002-20240517 clang i386 buildonly-randconfig-003-20240516 clang i386 buildonly-randconfig-003-20240517 gcc i386 
buildonly-randconfig-004-20240516 gcc i386 buildonly-randconfig-004-20240517 clang i386 buildonly-randconfig-005-20240516 gcc i386 buildonly-randconfig-005-20240517 clang i386 buildonly-randconfig-006-20240516 gcc i386 buildonly-randconfig-006-20240517 gcc i386defconfig clang i386 randconfig-001-20240516 gcc i386 randconfig-001-20240517 gcc i386 randconfig-002-20240516 gcc i386 randconfig-002-20240517 gcc i386 randconfig-003-20240516 clang i386 randconfig-003-20240517 gcc i386 randconfig-004-20240516 clang i386 randconfig-004-20240517 gcc i386 randconfig-005-20240516 clang i386 randconfig-005-20240517 gcc i386 randconfig-006-20240516 clang i386 randconfig-006-20240517 gcc i386 randconfig-011-20240516 gcc i386 randconfig-011-20240517 gcc i386 randconfig-012-20240516 gcc i386 randconfig-012-20240517 clang i386 randconfig-013-20240516 clang i386 randconfig-013-20240517 gcc i386 randconfig-014-20240516 gcc i386 randconfig-014-20240517 gcc i386 randconfig-015-20240516 gcc i386 randconfig
Re: [PATCH v2 2/2] powerpc: hotplug driver bridge support
On Tue, May 14, 2024 at 11:54 PM Krishna Kumar wrote: > > There is an issue with the hotplug operation when it's done on the > bridge/switch slot. The bridge-port and devices behind the bridge, which > become offline by hot-unplug operation, don't get hot-plugged/enabled by > doing hot-plug operation on that slot. Only the first port of the bridge > gets enabled and the remaining port/devices remain unplugged. The hot > plug/unplug operation is done by the hotplug driver > (drivers/pci/hotplug/pnv_php.c). > > Root Cause Analysis: This behavior is due to missing code for the DPC > switch/bridge. I don't see anything touching DPC in this series? > *snip* > > Command for reproducing the issue : > > For hot unplug/disable - echo 0 > /sys/bus/pci/slots/C5/power > For hot plug/enable -echo 1 > /sys/bus/pci/slots/C5/power > > where C5 is slot associated with bridge. > > Scenario/Tests: > Output of lspci -nn before test is given below. This snippet contains > devices used for testing on Powernv machine. > > 0004:02:00.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052] > 0004:02:01.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052] > 0004:02:02.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052] > 0004:02:03.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052] > 0004:08:00.0 Serial Attached SCSI controller [0107]: > Broadcom / LSI SAS3216 PCI-Express Fusion-MPT SAS-3 [1000:00c9] (rev 01) > 0004:09:00.0 Serial Attached SCSI controller [0107]: > Broadcom / LSI SAS3216 PCI-Express Fusion-MPT SAS-3 [1000:00c9] (rev 01) > > Output of lspci -tv before test is as follows: > > # lspci -tv > +-[0004:00]---00.0-[01-0e]--+-00.0-[02-0e]--+-00.0-[03-07]-- > | | +-01.0-[08]00.0 Broadcom / > LSI SAS3216 PCI-Express Fusion-MPT SAS-3 > | | +-02.0-[09]00.0 Broadcom / > LSI SAS3216 PCI-Express Fusion-MPT SAS-3 > | | \-03.0-[0a-0e]-- > | \-00.1 PMC-Sierra Inc. 
Device 4052 > > C5(bridge) and C6(End Point) slot address are as below: > # cat /sys/bus/pci/slots/C5/address > 0004:02:00 > # cat /sys/bus/pci/slots/C6/address > 0004:09:00 Uh, if I'm reading this right it looks like your "slot" C5 is actually the PCIe switch's internal bus which is definitely not hot pluggable. I find it helps to look at the PCI topology in terms of where the physical PCIe links are. Here we've got: - A link between the PHB (0004:00:00.0) and the switch upstream port (0004:01:00.0) - A link from switch downstream port 0 (0004:02:00.0) to nothing - A link from switch downstream port 1 (0004:02:01.0) to a SAS card - A link from switch downstream port 2 (0004:02:02.0) to a SAS card - A link from switch downstream port 3 (0004:02:03.0) to nothing Note that there's no PCIe link between the switch upstream port (0004:01:00.0) and the downstream ports on bus 0004:02. The connection between those is invisible to us because it's custom bus logic internal to the PCIe switch ASIC. What I think has happened here is that system firmware has supplied bad PCIe slot information to OPAL which has resulted in pnv_php advertising a slot in the wrong place. Assuming this follows the usual IBM convention I'd expect the bridge device for C5 to be the PHB's root port and the bus should be 0004:01. It might be worth adding some logic to pnv_php to verify the PCI bridge upstream of the slot actually has the PCIe slot capability to guard against this problem. > Hot-unplug operation on slot associated with bridge: > # echo 0 > /sys/bus/pci/slots/C5/power > # lspci -tv > +-[0004:00]---00.0-[01-0e]--+-00.0-[02-0e]-- > | \-00.1 PMC-Sierra Inc. Device 4052 Yep, "powering off" C5 doesn't remove the upstream port device. This would create problems if you physically removed the card from C5 since the kernel would assume the switch device is still present. 
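As a rough illustration of the guard suggested above (checking that the bridge upstream of a slot really advertises a PCIe slot), here is a minimal standalone sketch. It only decodes a PCI Express Capabilities register value. The helper name bridge_can_host_slot() is hypothetical and is not part of pnv_php. The constants mirror include/uapi/linux/pci_regs.h and are redefined so the snippet builds outside the kernel. In-kernel code would presumably obtain the value with pcie_capability_read_word(pdev, PCI_EXP_FLAGS, &flags) instead of taking it as a parameter.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same values as include/uapi/linux/pci_regs.h, repeated for a standalone build. */
#define PCI_EXP_FLAGS_TYPE	0x00f0	/* Device/Port type field */
#define PCI_EXP_FLAGS_SLOT	0x0100	/* Slot Implemented bit */
#define PCI_EXP_TYPE_ROOT_PORT	0x4
#define PCI_EXP_TYPE_DOWNSTREAM	0x6

/*
 * Hypothetical helper (an assumption for illustration, not pnv_php code):
 * returns true only if the PCIe Capabilities register describes a root port
 * or switch downstream port that has the Slot Implemented bit set.
 */
static bool bridge_can_host_slot(uint16_t pcie_flags)
{
	unsigned int type = (pcie_flags & PCI_EXP_FLAGS_TYPE) >> 4;

	/* Only root ports and switch downstream ports sit above a real link. */
	if (type != PCI_EXP_TYPE_ROOT_PORT && type != PCI_EXP_TYPE_DOWNSTREAM)
		return false;

	return (pcie_flags & PCI_EXP_FLAGS_SLOT) != 0;
}
```

With a check along these lines, a switch upstream port such as 0004:01:00.0 would be rejected as the parent of a slot even if firmware described one there. Whether pnv_php should refuse to register such a slot or only warn is a separate design question.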
> *snip* > diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c > index 38561d6a2079..bea612759832 100644 > --- a/arch/powerpc/kernel/pci_dn.c > +++ b/arch/powerpc/kernel/pci_dn.c > @@ -493,4 +493,36 @@ static void pci_dev_pdn_setup(struct pci_dev *pdev) > pdn = pci_get_pdn(pdev); > pdev->dev.archdata.pci_data = pdn; > } > + > +void pci_traverse_sibling_nodes_and_scan_slot(struct device_node *start, > struct pci_bus *bus) > +{ > + struct device_node *dn; > + int slotno; > + > + u32 class = 0; > + > + if (!of_property_read_u32(start->child, "class-code", )) { > + /* Call of pci_scan_slot for non-bridge/EP case */ > + if (!((class >> 8) == PCI_CLASS_BRIDGE_PCI)) { > + slotno = PCI_SLOT(PCI_DN(start->child)->devfn); > + pci_scan_slot(bus, PCI_DEVFN(slotno, 0)); > + return; > + } > + } > + > + /* Iterate all siblings */ > +
Re: [PATCH v8] arch/powerpc/kvm: Add support for reading VPA counters for pseries guests
Hi Gautam, kernel test robot noticed the following build errors: [auto build test ERROR on powerpc/topic/ppc-kvm] [also build test ERROR on powerpc/next powerpc/fixes kvm/queue mst-vhost/linux-next linus/master v6.9 next-20240516] [cannot apply to kvm/linux-next] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Gautam-Menghani/arch-powerpc-kvm-Add-support-for-reading-VPA-counters-for-pseries-guests/20240510-185213 base: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git topic/ppc-kvm patch link: https://lore.kernel.org/r/20240510104941.78410-1-gautam%40linux.ibm.com patch subject: [PATCH v8] arch/powerpc/kvm: Add support for reading VPA counters for pseries guests config: powerpc-allmodconfig (https://download.01.org/0day-ci/archive/20240517/202405170932.tl7g99ij-...@intel.com/config) compiler: powerpc64-linux-gcc (GCC) 13.2.0 reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240517/202405170932.tl7g99ij-...@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. 
not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot | Closes: https://lore.kernel.org/oe-kbuild-all/202405170932.tl7g99ij-...@intel.com/ All errors (new ones prefixed by >>): powerpc64-linux-ld: warning: discarding dynamic section .glink powerpc64-linux-ld: warning: discarding dynamic section .plt powerpc64-linux-ld: linkage table error against `__traceiter_kvmppc_vcpu_stats' powerpc64-linux-ld: stubs don't match calculated size powerpc64-linux-ld: can not build stubs: bad value powerpc64-linux-ld: arch/powerpc/kvm/book3s_hv_nestedv2.o: in function `do_trace_nested_cs_time': >> book3s_hv_nestedv2.c:(.text.do_trace_nested_cs_time+0x264): undefined >> reference to `__traceiter_kvmppc_vcpu_stats' >> powerpc64-linux-ld: >> arch/powerpc/kvm/book3s_hv_nestedv2.o:(__jump_table+0x8): undefined >> reference to `__tracepoint_kvmppc_vcpu_stats' -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki
Re: [PATCH RESEND v8 16/16] bpf: remove CONFIG_BPF_JIT dependency on CONFIG_MODULES of
3: 800081223dd0 x22: 3a198a40 x21: [ 44.430908] x20: d4202000 x19: 8000814dd880 x18: 0006 [ 44.439122] x17: x16: 0020 x15: 0002 [ 44.447338] x14: 8000811a6370 x13: 2000 x12: [ 44.43] x11: 8000811a6370 x10: 0166 x9 : 8000811fe370 [ 44.463771] x8 : 00017fe8 x7 : f000 x6 : 8000811fe370 [ 44.471989] x5 : x4 : x3 : [ 44.480208] x2 : x1 : x0 : 02203240 [ 44.488420] Call trace: [ 44.491847] vfree (mm/vmalloc.c:3324 (discriminator 1)) [ 44.495900] execmem_free (mm/execmem.c:70) [ 44.500394] bpf_jit_free_exec+0x10/0x1c [ 44.505329] bpf_prog_pack_free (kernel/bpf/core.c:1006) [ 44.510507] bpf_jit_binary_pack_free (kernel/bpf/core.c:1195) [ 44.516017] bpf_jit_free (include/linux/filter.h:1083 arch/arm64/net/bpf_jit_comp.c:2474) [ 44.520424] bpf_prog_free_deferred (kernel/bpf/core.c:2785) [ 44.525864] process_one_work (kernel/workqueue.c:3273) [ 44.530754] worker_thread (kernel/workqueue.c:3342 (discriminator 2) kernel/workqueue.c:3429 (discriminator 2)) [ 44.535364] kthread (kernel/kthread.c:388) [ 44.539417] ret_from_fork (arch/arm64/kernel/entry.S:861) [ 44.543791] ---[ end trace ]--- # bad: [dbd9e2e056d8577375ae4b31ada94f8aa3769e8a] Add linux-next specific files for 20240516 git bisect start 'next/master' # status: waiting for good commit(s), bad commit known # good: [8c06da67d0bd3139a97f301b4aa9c482b9d4f29e] Merge tag 'livepatching-for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching git bisect good 8c06da67d0bd3139a97f301b4aa9c482b9d4f29e # good: [147d3734724040bb0aff1252299e48947a6c8858] Merge branch 'master' of git://linuxtv.org/mchehab/media-next.git git bisect good 147d3734724040bb0aff1252299e48947a6c8858 # bad: [729cf96da8de5e7ae70fef40a1b864bc00c2dca1] Merge branch 'next' of git://git.kernel.org/pub/scm/virt/kvm/kvm.git git bisect bad 729cf96da8de5e7ae70fef40a1b864bc00c2dca1 # good: [4364438497c638785b1394aab764a15b6baefaf3] Merge branch 'drm-xe-next' of https://gitlab.freedesktop.org/drm/xe/kernel git bisect good 
4364438497c638785b1394aab764a15b6baefaf3 # bad: [b3ead6c10eccbfa446ce30927f94472c278cd3d7] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux.git git bisect bad b3ead6c10eccbfa446ce30927f94472c278cd3d7 # bad: [d83384f475a4cfa0e9bda1cab538d99360fa2c48] Merge branch 'for-mfd-next' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd.git git bisect bad d83384f475a4cfa0e9bda1cab538d99360fa2c48 # bad: [9564f97e8e3ec6bdbf0c105b45fa2516d64c4685] Merge branch 'for-next' of git://git.kernel.dk/linux-block.git git bisect bad 9564f97e8e3ec6bdbf0c105b45fa2516d64c4685 # bad: [0e6c77dedcb11f510c0dbdaf6455b918b28f1b62] Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input.git git bisect bad 0e6c77dedcb11f510c0dbdaf6455b918b28f1b62 # good: [5852f2afcdd9b7c9dedec4fdf14b8b079349828f] Input: drop explicit initialization of struct i2c_device_id::driver_data to 0 git bisect good 5852f2afcdd9b7c9dedec4fdf14b8b079349828f # good: [223b5e57d0d50b0c07b933350dbcde92018d3080] mm/execmem, arch: convert remaining overrides of module_alloc to execmem git bisect good 223b5e57d0d50b0c07b933350dbcde92018d3080 # good: [14e56fb2ed1dbc3c3171d12ab435b0f691f6f215] x86/ftrace: enable dynamic ftrace without CONFIG_MODULES git bisect good 14e56fb2ed1dbc3c3171d12ab435b0f691f6f215 # good: [7582b7be16d0ba90e3dbd9575a730cabd9eb852a] kprobes: remove dependency on CONFIG_MODULES git bisect good 7582b7be16d0ba90e3dbd9575a730cabd9eb852a # bad: [86d899efdd58c98a0d196e31945009fc47a56264] Merge branch 'modules-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git git bisect bad 86d899efdd58c98a0d196e31945009fc47a56264 # bad: [2c9e5d4a008293407836d29d35dfd4353615bd2f] bpf: remove CONFIG_BPF_JIT dependency on CONFIG_MODULES of git bisect bad 2c9e5d4a008293407836d29d35dfd4353615bd2f # first bad commit: [2c9e5d4a008293407836d29d35dfd4353615bd2f] bpf: remove CONFIG_BPF_JIT dependency on CONFIG_MODULES of config.gz Description: application/gzip
Re: [PATCH v2 1/1] arch/fault: don't print logs for pte marker poison errors
On Wed, May 15, 2024 at 1:19 PM Borislav Petkov wrote: > > On Wed, May 15, 2024 at 12:19:16PM -0700, Axel Rasmussen wrote: > > An unprivileged process can allocate a VMA, use the userfaultfd API to > > install one of these PTE markers, and then register a no-op SIGBUS > > handler. Now it can access that address in a tight loop, > > Maybe the userfaultfd should not allow this, I dunno. You made me look > at this thing and to me it all sounds weird. One thread does page fault > handling for the other and that helps with live migration somehow. OMG, > whaaat? > > Maybe I don't understand it and probably never will... > > But, for example, membarrier used to do a stupid thing of allowing one > thread to hammer another with an IPI storm. Bad bad idea. So it got > fixed. > > All I'm saying is, if unprivileged processes can do crap, they should be > prevented from doing crap. Like ratelimiting the pagefaults or whatnot. > > One of the recovery action strategies from memory poison is, well, you > kill the process. If you can detect the hammering process which > installed that page marker, you kill it. Problem solved. > > But again, this userfaultfd thing sounds really weird so I could very > well be way wrong. > > > Even in a non-contrived / non-malicious case, use of this API could > > have similar effects. If nothing else, the log messages can be > > confusing to administrators: they state that an MCE occurred, whereas > > with the simulated poison API, this is not the case; it isn't a "real" > > MCE / hardware error. > > Yeah, I read that part in > > Documentation/admin-guide/mm/userfaultfd.rst > > Simulated poison huh? Another WTF. > > > In the KVM use case, the host can't just allocate a new page, because > > it doesn't know what the guest might have had stored there. Best we > > Ok, let's think of real hw poison. 
> > When doing the recovery, you don't care what's stored there because as > far as the hardware is concerned, if you consume that poison the *whole* > machine might go down. > > So you lose the page. Plain and simple. And the guest can go visit the > bureau of complaints and grievances. > > Still better than killing the guest or even the whole host with other > guests running on it. > > > can do is propagate the poison into the guest, and let the guest OS > > deal with it as it sees fit, and mark the page poisoned on the host. > > You mark the page as poison on the host and you yank it from under the > guest. That physical frame is gone and the faster all the actors > involved understand that, the better. > > > I don't disagree the guest *shouldn't* reaccess it in this case. :) > > But if it did, it should get another poison event just as you say. > > Yes, it shouldn't. Look at memory_failure(). This will kill whole > processes if it has to, depending on what the page is used for. > > > And, live migration between physical hosts should be transparent to > > the guest. So if the guest gets a poison, and then we live migrate it, > > So if I were to design this, I'd do it this way: > > 0. guest gets hw poison injected > > 1. it runs memory_failure() and it kills the processes using the page. > > 2. page is marked poisoned on the host so no other guest gets it. > > That's it. No second accesses whatsoever. At least this is how it works > on baremetal. I agree with almost all of the above. But one point is, I don't think we can trust the guest to be reasonable. :) Public cloud provider customers might run some OS other than Linux, or an old / buggy kernel, or one with out-of-tree patches which make it do who knows what. There can also be users who are actively malicious. Some customers may try to do fancy "poison recovery" where they can avoid killing the in-guest process when a poison event occurs. These implementations can be buggy :) and unintentionally reaccess. 
> > This hw poisoning emulation is just silly and unnecessary. > > But again, I probably am missing some aspects. It all just sounded > really weird to me that's why I thought I should ask what's behind all > that. > > Thx. > > -- > Regards/Gruss, > Boris. > > https://people.kernel.org/tglx/notes-about-netiquette
Re: [PATCH 1/3] crypto: X25519 low-level primitives for ppc64le.
Hi!

On Thu, May 16, 2024 at 10:06:58PM +1000, Michael Ellerman wrote:
> Andy Polyakov writes:
> >>> +.abiversion 2
> >>
> >> I'd prefer that was left to the compiler flags.
> >
> > Problem is that it's the compiler that is responsible for providing this
> > directive in the intermediate .s prior invoking the assembler. And there
> > is no assembler flag to pass through -Wa.
>
> Hmm, right. But none of our existing .S files include .abiversion
> directives.
>
> We build .S files with gcc, passing -mabi=elfv2, but it seems to have no
> effect.

Yup. You could include some header file, maybe? Since you run the
assembler code through the C preprocessor anyway, for some weird reason :-)

> But the actual code follows ELFv2, because we wrote it that way, and I
> guess the linker doesn't look at the actual ABI version of the .o ?

It isn't a version. It is an actual different ABI. GNU LD allows
linking together whatever, yes.

> Is .abiversion documented anywhere? I can't see it in the manual.

Yeah me neither. https://sourceware.org/bugzilla/enter_bug.cgi ?
A commandline flag (to GAS) would seem best?

Segher
Re: [PATCH 2/3] crypto: X25519 core functions for ppc64le
On Wed, May 15, 2024 at 10:29:56AM +0200, Andy Polyakov wrote:
> >+static void cswap(fe51 p, fe51 q, unsigned int bit)
>
> The "c" in cswap stands for "constant-time," and the problem is that
> contemporary compilers have exhibited the ability to produce
> non-constant-time machine code as result of compilation of the above
> kind of technique.

This can happen with *any* compiler, on *any* platform. In general,
you have to write machine code if you want to be sure what machine code
will eventually be executed.

> The outcome is platform-specific and ironically some
> of PPC code generators were observed to generate "most"
> non-constant-time code. "Most" in sense that execution time variations
> would be most easy to catch. One way to work around the problem, at
> least for the time being, is to add 'asm volatile("" : "+r"(c))' after
> you calculate 'c'. But there is no guarantee that the next compiler
> version won't see through it, hence the permanent solution is to do it
> in assembly. I can put together something...

Such tricks can help ameliorate the problem, sure. But it is never a
solution.

Segher
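For reference, the workaround discussed in this thread can be sketched in plain C as below. This is only an illustration of the masking technique plus the empty asm barrier Andy mentions. The function name cswap_sketch() is made up here, the body is an assumption about what a C version would look like, and it is not the code from the patch (v2 moves x25519_cswap into assembly). As noted above, nothing in the C language guarantees the compiler emits branch-free code for this.

```c
#include <stdint.h>

typedef uint64_t fe51[5];

/*
 * Hypothetical C version of a conditional swap: exchanges p and q when
 * (bit & 1) is set, using a mask instead of a data-dependent branch.
 */
static void cswap_sketch(fe51 p, fe51 q, unsigned int bit)
{
	/* All ones when the low bit is set, zero otherwise. */
	uint64_t c = 0 - (uint64_t)(bit & 1);
	int i;

#if defined(__GNUC__)
	/*
	 * Empty asm with a read/write register operand: the optimizer can no
	 * longer assume anything about the value of c. This is a mitigation
	 * only, not a guarantee of constant-time code.
	 */
	__asm__ volatile("" : "+r"(c));
#endif

	for (i = 0; i < 5; i++) {
		uint64_t t = c & (p[i] ^ q[i]);

		p[i] ^= t;
		q[i] ^= t;
	}
}
```

Checking the generated machine code for each compiler and optimization level is still necessary, which is the argument for the assembly implementation.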
[PATCH v2 3/3] crypto: Update Kconfig and Makefile for ppc64le x25519.
Defined CRYPTO_CURVE25519_PPC64 to support X25519 for ppc64le. Added new module curve25519-ppc64le for X25519. Signed-off-by: Danny Tsen --- arch/powerpc/crypto/Kconfig | 11 +++ arch/powerpc/crypto/Makefile | 2 ++ 2 files changed, 13 insertions(+) diff --git a/arch/powerpc/crypto/Kconfig b/arch/powerpc/crypto/Kconfig index 1e201b7ae2fc..09ebcbdfb34f 100644 --- a/arch/powerpc/crypto/Kconfig +++ b/arch/powerpc/crypto/Kconfig @@ -2,6 +2,17 @@ menu "Accelerated Cryptographic Algorithms for CPU (powerpc)" +config CRYPTO_CURVE25519_PPC64 + tristate "Public key crypto: Curve25519 (PowerPC64)" + depends on PPC64 && CPU_LITTLE_ENDIAN + select CRYPTO_LIB_CURVE25519_GENERIC + select CRYPTO_ARCH_HAVE_LIB_CURVE25519 + help + Curve25519 algorithm + + Architecture: PowerPC64 + - Little-endian + config CRYPTO_CRC32C_VPMSUM tristate "CRC32c" depends on PPC64 && ALTIVEC diff --git a/arch/powerpc/crypto/Makefile b/arch/powerpc/crypto/Makefile index fca0e9739869..59808592f0a1 100644 --- a/arch/powerpc/crypto/Makefile +++ b/arch/powerpc/crypto/Makefile @@ -17,6 +17,7 @@ obj-$(CONFIG_CRYPTO_AES_GCM_P10) += aes-gcm-p10-crypto.o obj-$(CONFIG_CRYPTO_CHACHA20_P10) += chacha-p10-crypto.o obj-$(CONFIG_CRYPTO_POLY1305_P10) += poly1305-p10-crypto.o obj-$(CONFIG_CRYPTO_DEV_VMX_ENCRYPT) += vmx-crypto.o +obj-$(CONFIG_CRYPTO_CURVE25519_PPC64) += curve25519-ppc64le.o aes-ppc-spe-y := aes-spe-core.o aes-spe-keys.o aes-tab-4k.o aes-spe-modes.o aes-spe-glue.o md5-ppc-y := md5-asm.o md5-glue.o @@ -29,6 +30,7 @@ aes-gcm-p10-crypto-y := aes-gcm-p10-glue.o aes-gcm-p10.o ghashp10-ppc.o aesp10-p chacha-p10-crypto-y := chacha-p10-glue.o chacha-p10le-8x.o poly1305-p10-crypto-y := poly1305-p10-glue.o poly1305-p10le_64.o vmx-crypto-objs := vmx.o aesp8-ppc.o ghashp8-ppc.o aes.o aes_cbc.o aes_ctr.o aes_xts.o ghash.o +curve25519-ppc64le-y := curve25519-ppc64le-core.o curve25519-ppc64le_asm.o ifeq ($(CONFIG_CPU_LITTLE_ENDIAN),y) override flavour := linux-ppc64le -- 2.31.1
[PATCH v2 2/3] crypto: X25519 core functions for ppc64le
X25519 core functions to handle scalar multiplication for ppc64le. Signed-off-by: Danny Tsen --- arch/powerpc/crypto/curve25519-ppc64le-core.c | 299 ++ 1 file changed, 299 insertions(+) create mode 100644 arch/powerpc/crypto/curve25519-ppc64le-core.c diff --git a/arch/powerpc/crypto/curve25519-ppc64le-core.c b/arch/powerpc/crypto/curve25519-ppc64le-core.c new file mode 100644 index ..4e3e44ea4484 --- /dev/null +++ b/arch/powerpc/crypto/curve25519-ppc64le-core.c @@ -0,0 +1,299 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright 2024- IBM Corp. + * + * X25519 scalar multiplication with 51 bits limbs for PPC64le. + * Based on RFC7748 and AArch64 optimized implementation for X25519 + * - Algorithm 1 Scalar multiplication of a variable point + */ + +#include +#include + +#include +#include +#include +#include +#include + +#include +#include + +typedef uint64_t fe51[5]; + +asmlinkage void x25519_fe51_mul(fe51 h, const fe51 f, const fe51 g); +asmlinkage void x25519_fe51_sqr(fe51 h, const fe51 f); +asmlinkage void x25519_fe51_mul121666(fe51 h, fe51 f); +asmlinkage void x25519_fe51_sqr_times(fe51 h, const fe51 f, int n); +asmlinkage void x25519_fe51_frombytes(fe51 h, const uint8_t *s); +asmlinkage void x25519_fe51_tobytes(uint8_t *s, const fe51 h); +asmlinkage void x25519_cswap(fe51 p, fe51 q, unsigned int bit); + +#define fmul x25519_fe51_mul +#define fsqr x25519_fe51_sqr +#define fmul121666 x25519_fe51_mul121666 +#define fe51_tobytes x25519_fe51_tobytes + +static void fadd(fe51 h, const fe51 f, const fe51 g) +{ + h[0] = f[0] + g[0]; + h[1] = f[1] + g[1]; + h[2] = f[2] + g[2]; + h[3] = f[3] + g[3]; + h[4] = f[4] + g[4]; +} + +/* + * Prime = 2 ** 255 - 19, 255 bits + *(0x7fffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffed) + * + * Prime in 5 51-bit limbs + */ +static fe51 prime51 = { 0x7ffffffffffed, 0x7ffffffffffff, 0x7ffffffffffff, 0x7ffffffffffff, 0x7ffffffffffff}; + +static void fsub(fe51 h, const fe51 f, const fe51 g) +{ + h[0] = (f[0] + ((prime51[0] * 2))) - g[0]; + h[1] = (f[1] + ((prime51[1] * 2))) - g[1]; + h[2] = (f[2] + ((prime51[2] * 2))) - 
g[2]; + h[3] = (f[3] + ((prime51[3] * 2))) - g[3]; + h[4] = (f[4] + ((prime51[4] * 2))) - g[4]; +} + +static void fe51_frombytes(fe51 h, const uint8_t *s) +{ + /* +* Make sure 64-bit aligned. +*/ + unsigned char sbuf[32+8]; + unsigned char *sb = PTR_ALIGN((void *)sbuf, 8); + + memcpy(sb, s, 32); + x25519_fe51_frombytes(h, sb); +} + +static void finv(fe51 o, const fe51 i) +{ + fe51 a0, b, c, t00; + + fsqr(a0, i); + x25519_fe51_sqr_times(t00, a0, 2); + + fmul(b, t00, i); + fmul(a0, b, a0); + + fsqr(t00, a0); + + fmul(b, t00, b); + x25519_fe51_sqr_times(t00, b, 5); + + fmul(b, t00, b); + x25519_fe51_sqr_times(t00, b, 10); + + fmul(c, t00, b); + x25519_fe51_sqr_times(t00, c, 20); + + fmul(t00, t00, c); + x25519_fe51_sqr_times(t00, t00, 10); + + fmul(b, t00, b); + x25519_fe51_sqr_times(t00, b, 50); + + fmul(c, t00, b); + x25519_fe51_sqr_times(t00, c, 100); + + fmul(t00, t00, c); + x25519_fe51_sqr_times(t00, t00, 50); + + fmul(t00, t00, b); + x25519_fe51_sqr_times(t00, t00, 5); + + fmul(o, t00, a0); +} + +static void curve25519_fe51(uint8_t out[32], const uint8_t scalar[32], + const uint8_t point[32]) +{ + fe51 x1, x2, z2, x3, z3; + uint8_t s[32]; + unsigned int swap = 0; + int i; + + memcpy(s, scalar, 32); + s[0] &= 0xf8; + s[31] &= 0x7f; + s[31] |= 0x40; + fe51_frombytes(x1, point); + + z2[0] = z2[1] = z2[2] = z2[3] = z2[4] = 0; + x3[0] = x1[0]; + x3[1] = x1[1]; + x3[2] = x1[2]; + x3[3] = x1[3]; + x3[4] = x1[4]; + + x2[0] = z3[0] = 1; + x2[1] = z3[1] = 0; + x2[2] = z3[2] = 0; + x2[3] = z3[3] = 0; + x2[4] = z3[4] = 0; + + for (i = 254; i >= 0; --i) { + unsigned int k_t = 1 & (s[i / 8] >> (i & 7)); + fe51 a, b, c, d, e; + fe51 da, cb, aa, bb; + fe51 dacb_p, dacb_m; + + swap ^= k_t; + x25519_cswap(x2, x3, swap); + x25519_cswap(z2, z3, swap); + swap = k_t; + + fsub(b, x2, z2);// B = x_2 - z_2 + fadd(a, x2, z2);// A = x_2 + z_2 + fsub(d, x3, z3);// D = x_3 - z_3 + fadd(c, x3, z3);// C = x_3 + z_3 + + fsqr(bb, b);// BB = B^2 + fsqr(aa, a);// AA = A^2 + fmul(da, d, a); // DA 
= D * A + fmul(cb, c, b); // CB = C * B + + fsub(e, aa,
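A note for readers following the limb arithmetic in fadd()/fsub() above: the sketch below restates, as standalone C, how p = 2^255 - 19 splits into five 51-bit limbs and why fsub() adds 2p before subtracting. The names MASK51, p51 and fsub_sketch() are made up for this illustration. The no-wraparound argument assumes each limb of g is no larger than twice the matching limb of p, which is an assumption about the inputs rather than something the patch states.

```c
#include <stdint.h>

typedef uint64_t fe51[5];

/* Largest value that fits in one 51-bit limb. */
#define MASK51 ((UINT64_C(1) << 51) - 1)

/*
 * p = 2^255 - 19 as five 51-bit limbs, least significant limb first.
 * Only the lowest limb differs from all ones: 2^51 - 19.
 */
static const fe51 p51 = { MASK51 - 18, MASK51, MASK51, MASK51, MASK51 };

/*
 * h = f - g (mod p), computed limb-wise as f + 2p - g. Adding a multiple
 * of p does not change the value mod p, and it keeps every unsigned limb
 * from wrapping below zero as long as g[i] <= 2 * p51[i].
 */
static void fsub_sketch(fe51 h, const fe51 f, const fe51 g)
{
	int i;

	for (i = 0; i < 5; i++)
		h[i] = (f[i] + 2 * p51[i]) - g[i];
}
```

The result limbs can exceed 51 bits. This sketch leaves them unreduced, on the assumption that a later multiply or square step carries them back into range.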
[PATCH v2 0/3] crypto: X25519 supports for ppc64le
This patch series provides X25519 support for ppc64le with a new module curve25519-ppc64le. The implementation is based on CRYPTOGAMs perl output from x25519-ppc64.pl. (see https://github.com/dot-asm/cryptogams/) The generated code was modified and four supporting functions were added. This series has passed the selftest by running modprobe curve25519-ppc64le. Danny Tsen (3): X25519 low-level primitives for ppc64le. X25519 core functions for ppc64le Update Kconfig and Makefile for ppc64le x25519. arch/powerpc/crypto/Kconfig | 11 + arch/powerpc/crypto/Makefile | 2 + arch/powerpc/crypto/curve25519-ppc64le-core.c | 299 arch/powerpc/crypto/curve25519-ppc64le_asm.S | 671 ++ 4 files changed, 983 insertions(+) create mode 100644 arch/powerpc/crypto/curve25519-ppc64le-core.c create mode 100644 arch/powerpc/crypto/curve25519-ppc64le_asm.S -- 2.31.1
[PATCH v2 1/3] crypto: X25519 low-level primitives for ppc64le.
Use the perl output of x25519-ppc64.pl from CRYPTOGAMs (see https://github.com/dot-asm/cryptogams/) and added four supporting functions, x25519_fe51_sqr_times, x25519_fe51_frombytes, x25519_fe51_tobytes and x25519_cswap. Signed-off-by: Danny Tsen --- arch/powerpc/crypto/curve25519-ppc64le_asm.S | 671 +++ 1 file changed, 671 insertions(+) create mode 100644 arch/powerpc/crypto/curve25519-ppc64le_asm.S diff --git a/arch/powerpc/crypto/curve25519-ppc64le_asm.S b/arch/powerpc/crypto/curve25519-ppc64le_asm.S new file mode 100644 index ..06c1febe24b9 --- /dev/null +++ b/arch/powerpc/crypto/curve25519-ppc64le_asm.S @@ -0,0 +1,671 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +# +# This code is taken from CRYPTOGAMs[1] and is included here using the option +# in the license to distribute the code under the GPL. Therefore this program +# is free software; you can redistribute it and/or modify it under the terms of +# the GNU General Public License version 2 as published by the Free Software +# Foundation. +# +# [1] https://github.com/dot-asm/cryptogams/ + +# Copyright (c) 2006-2017, CRYPTOGAMS by +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions +# are met: +# +# * Redistributions of source code must retain copyright notices, +# this list of conditions and the following disclaimer. +# +# * Redistributions in binary form must reproduce the above +# copyright notice, this list of conditions and the following +# disclaimer in the documentation and/or other materials +# provided with the distribution. +# +# * Neither the name of the CRYPTOGAMS nor the names of its +# copyright holder and contributors may be used to endorse or +# promote products derived from this software without specific +# prior written permission. 
+# +# ALTERNATIVELY, provided that this notice is retained in full, this +# product may be distributed under the terms of the GNU General Public +# License (GPL), in which case the provisions of the GPL apply INSTEAD OF +# those given above. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDER AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +# +# Written by Andy Polyakov for the OpenSSL +# project. The module is, however, dual licensed under OpenSSL and +# CRYPTOGAMS licenses depending on where you obtain it. For further +# details see https://www.openssl.org/~appro/cryptogams/. +# + +# +# +# Written and Modified by Danny Tsen +# - Added x25519_fe51_sqr_times, x25519_fe51_frombytes, x25519_fe51_tobytes +# and x25519_cswap +# +# Copyright 2024- IBM Corp. +# +# X25519 lower-level primitives for PPC64. 
+#
+
+#include
+
+.text
+
+.align 5
+SYM_FUNC_START(x25519_fe51_mul)
+
+	stdu	1,-144(1)
+	std	21,56(1)
+	std	22,64(1)
+	std	23,72(1)
+	std	24,80(1)
+	std	25,88(1)
+	std	26,96(1)
+	std	27,104(1)
+	std	28,112(1)
+	std	29,120(1)
+	std	30,128(1)
+	std	31,136(1)
+
+	ld	6,0(5)
+	ld	7,0(4)
+	ld	8,8(4)
+	ld	9,16(4)
+	ld	10,24(4)
+	ld	11,32(4)
+
+	mulld	22,7,6
+	mulhdu	23,7,6
+
+	mulld	24,8,6
+	mulhdu	25,8,6
+
+	mulld	30,11,6
+	mulhdu	31,11,6
+	ld	4,8(5)
+	mulli	11,11,19
+
+	mulld	26,9,6
+	mulhdu	27,9,6
+
+	mulld	28,10,6
+	mulhdu	29,10,6
+	mulld	12,11,4
+	mulhdu	21,11,4
+	addc	22,22,12
+	adde	23,23,21
+
+	mulld	12,7,4
+	mulhdu	21,7,4
+	addc	24,24,12
+	adde	25,25,21
+
+	mulld	12,10,4
+	mulhdu	21,10,4
+	ld	6,16(5)
+	mulli	10,10,19
+	addc	30,30,12
+	adde	31,31,21
+
+	mulld	12,8,4
+	mulhdu	21,8,4
+	addc	26,26,12
+	adde	27,27,21
+
+	mulld	12,9,4
+	mulhdu	21,9,4
+	addc
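For readers following the assembly, the limb arithmetic behind x25519_fe51_mul can be sketched in portable C. This is only an illustration, not the code from the patch. It assumes the usual five-limb radix-2^51 representation suggested by the fe51 name, assumes input limbs below 2^52, and relies on the GCC/Clang unsigned __int128 extension. The name fe51_mul_sketch is hypothetical. The multiplications by 19 play the same role as the mulli ...,19 instructions above and come from 2^255 = 19 (mod 2^255 - 19).

```c
#include <assert.h>
#include <stdint.h>

typedef unsigned __int128 u128;	/* GCC/Clang extension, assumed available */

#define MASK51 ((UINT64_C(1) << 51) - 1)

/*
 * h = f * g mod 2^255 - 19, each value held as five limbs where limb i
 * has weight 2^(51*i). Input limbs are assumed to be below 2^52.
 * The result is only weakly reduced: limbs 0, 2, 3 and 4 fit in 51 bits,
 * limb 1 may keep one extra carry bit, and the value itself may still be
 * >= 2^255 - 19.
 */
static void fe51_mul_sketch(uint64_t h[5], const uint64_t f[5],
			    const uint64_t g[5])
{
	u128 t[5] = { 0, 0, 0, 0, 0 };
	uint64_t carry = 0;
	int i, j;

	/* Schoolbook product; terms of weight >= 2^255 wrap around times 19. */
	for (i = 0; i < 5; i++) {
		for (j = 0; j < 5; j++) {
			u128 p = (u128)f[i] * g[j];

			if (i + j >= 5)
				t[i + j - 5] += p * 19;
			else
				t[i + j] += p;
		}
	}

	/* Propagate carries so that each limb drops back to 51 bits. */
	for (i = 0; i < 5; i++) {
		t[i] += carry;
		h[i] = (uint64_t)t[i] & MASK51;
		carry = (uint64_t)(t[i] >> 51);
	}

	/* The carry out of the top limb has weight 2^255, i.e. 19. */
	h[0] += carry * 19;
	h[1] += h[0] >> 51;
	h[0] &= MASK51;
}
```

As a quick sanity check, multiplying 2^254 (top limb 1 << 50) by 2 gives 2^255, which this sketch folds back to 19 in the lowest limb.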
Re: [PATCH v15 00/16] Add audio support in v4l2 framework
On 15. 05. 24 15:34, Shengjiu Wang wrote: On Wed, May 15, 2024 at 6:46 PM Jaroslav Kysela wrote: On 15. 05. 24 12:19, Takashi Iwai wrote: On Wed, 15 May 2024 11:50:52 +0200, Jaroslav Kysela wrote: On 15. 05. 24 11:17, Hans Verkuil wrote: Hi Jaroslav, On 5/13/24 13:56, Jaroslav Kysela wrote: On 09. 05. 24 13:13, Jaroslav Kysela wrote: On 09. 05. 24 12:44, Shengjiu Wang wrote: mem2mem is just like the decoder in the compress pipeline, which is one of the components in the pipeline. I was thinking of loopback with endpoints using compress streams, without physical endpoint, something like: compress playback (to feed data from userspace) -> DSP (processing) -> compress capture (send data back to userspace) Unless I'm missing something, you should be able to process data as fast as you can feed it and consume it in such case. Actually in the beginning I tried this, but it did not work well. ALSA needs time control for playback and capture, playback and capture needs to synchronize. Usually the playback and capture pipeline is independent in ALSA design, but in this case, the playback and capture should synchronize, they are not independent. The core compress API has no strict timing constraints. You can eventually have two half-duplex compress devices, if you like to have really independent mechanism. If something is missing in API, you can extend this API (like to inform the user space that it's a producer/consumer processing without any relation to the real time). I like this idea. I was thinking more about this. If I am right, the mentioned use in gstreamer is supposed to run the conversion (DSP) job in "one shot" (can be handled using one system call like blocking ioctl). The goal is just to offload the CPU work to the DSP (co-processor). If there are no requirements for the queuing, we can implement this ioctl in the compress ALSA API easily using the data management through the dma-buf API. 
We can eventually define a new direction (enum snd_compr_direction) like SND_COMPRESS_CONVERT or so to allow handle this new data scheme. The API may be extended later on real demand, of course. Otherwise all pieces are already in the current ALSA compress API (capabilities, params, enumeration). The realtime controls may be created using ALSA control API. So does this mean that Shengjiu should attempt to use this ALSA approach first? I've not seen any argument to use v4l2 mem2mem buffer scheme for this data conversion forcefully. It looks like a simple job and ALSA APIs may be extended for this simple purpose. Shengjiu, what are your requirements for gstreamer support? Would be a new blocking ioctl enough for the initial support in the compress ALSA API? If it works with compress API, it'd be great, yeah. So, your idea is to open compress-offload devices for read and write, then and let them convert a la batch jobs without timing control? For full-duplex usages, we might need some more extensions, so that both read and write parameters can be synchronized. (So far the compress stream is a unidirectional, and the runtime buffer for a single stream.) And the buffer management is based on the fixed size fragments. I hope this doesn't matter much for the intended operation? It's a question, if the standard I/O is really required for this case. My quick idea was to just implement a new "direction" for this job supporting only one ioctl for the data processing which will execute the job in "one shot" at the moment. The I/O may be handled through dma-buf API (which seems to be standard nowadays for this purpose and allows future chaining). So something like: struct dsp_job { int source_fd; /* dma-buf FD with source data - for dma_buf_get() */ int target_fd; /* dma-buf FD for target data - for dma_buf_get() */ ... maybe some extra data size members here ... ... maybe some special parameters here ... 
}; #define SNDRV_COMPRESS_DSPJOB _IOWR('C', 0x60, struct dsp_job) This ioctl will be blocking (thus synced). My question is, if it's feasible for gstreamer or not. For this particular case, if the rate conversion is implemented in software, it will block the gstreamer data processing, too. Thanks. I have several questions: 1. Compress API alway binds to a sound card. Can we avoid that? For ASRC, it is just one component, Is this a real issue? Usually, I would expect a sound hardware (card) presence when ASRC is available, or not? Eventually, a separate sound card with one compress device may be created, too. For enumeration - the user space may just iterate through all sound cards / compress devices to find ASRC in the system. The devices/interfaces in the sound card are independent. Also, USB MIDI converters offer only one serial MIDI interface for example, too. 2. Compress API doesn't seem to support mmap(). Is this a problem for sending and getting data to/from the driver? I proposed to use dma-buf for I/O (separate
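To make the proposed request number concrete, here is a small user-space sketch of how _IOWR('C', 0x60, struct dsp_job) packs its fields. It is only an illustration under stated assumptions: the struct carries just the two file descriptors named in the proposal (the "extra data size" and "special parameters" members are left out because they were never specified), it builds on Linux with the kernel UAPI headers installed, and the dspjob_* helper names are hypothetical. No such ioctl exists in the ALSA compress API as of this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <linux/ioctl.h>

/* Hypothetical layout: only source_fd and target_fd appear in the proposal. */
struct dsp_job {
	int source_fd;	/* dma-buf FD with source data */
	int target_fd;	/* dma-buf FD for target data */
};

#define SNDRV_COMPRESS_DSPJOB _IOWR('C', 0x60, struct dsp_job)

/* Decode the fields that _IOWR() packed into the request number. */
static unsigned int dspjob_type(void)
{
	return _IOC_TYPE(SNDRV_COMPRESS_DSPJOB);
}

static unsigned int dspjob_nr(void)
{
	return _IOC_NR(SNDRV_COMPRESS_DSPJOB);
}

static size_t dspjob_size(void)
{
	return _IOC_SIZE(SNDRV_COMPRESS_DSPJOB);
}

/* _IOWR means the kernel both reads the argument and writes it back. */
static int dspjob_is_bidirectional(void)
{
	return _IOC_DIR(SNDRV_COMPRESS_DSPJOB) == (_IOC_READ | _IOC_WRITE);
}
```

Because the struct size is encoded in the request number, adding members later would change the number itself, which is one reason to settle the layout before exposing it.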
Re: [PATCH v15 00/16] Add audio support in v4l2 framework
On 15. 05. 24 22:33, Nicolas Dufresne wrote: Hi, GStreamer hat on ... On Wednesday, 15 May 2024 at 12:46 +0200, Jaroslav Kysela wrote: On 15. 05. 24 12:19, Takashi Iwai wrote: On Wed, 15 May 2024 11:50:52 +0200, Jaroslav Kysela wrote: On 15. 05. 24 11:17, Hans Verkuil wrote: Hi Jaroslav, On 5/13/24 13:56, Jaroslav Kysela wrote: On 09. 05. 24 13:13, Jaroslav Kysela wrote: On 09. 05. 24 12:44, Shengjiu Wang wrote: mem2mem is just like the decoder in the compress pipeline, which is one of the components in the pipeline. I was thinking of loopback with endpoints using compress streams, without physical endpoint, something like: compress playback (to feed data from userspace) -> DSP (processing) -> compress capture (send data back to userspace) Unless I'm missing something, you should be able to process data as fast as you can feed it and consume it in such case. Actually in the beginning I tried this, but it did not work well. ALSA needs time control for playback and capture, playback and capture needs to synchronize. Usually the playback and capture pipeline is independent in ALSA design, but in this case, the playback and capture should synchronize, they are not independent. The core compress API has no strict timing constraints. You can eventually have two half-duplex compress devices, if you like to have really independent mechanism. If something is missing in API, you can extend this API (like to inform the user space that it's a producer/consumer processing without any relation to the real time). I like this idea. I was thinking more about this. If I am right, the mentioned use in gstreamer is supposed to run the conversion (DSP) job in "one shot" (can be handled using one system call like blocking ioctl). The goal is just to offload the CPU work to the DSP (co-processor). If there are no requirements for the queuing, we can implement this ioctl in the compress ALSA API easily using the data management through the dma-buf API. 
We can eventually define a new direction (enum snd_compr_direction) like SND_COMPRESS_CONVERT or so to allow handle this new data scheme. The API may be extended later on real demand, of course. Otherwise all pieces are already in the current ALSA compress API (capabilities, params, enumeration). The realtime controls may be created using ALSA control API. So does this mean that Shengjiu should attempt to use this ALSA approach first? I've not seen any argument to use v4l2 mem2mem buffer scheme for this data conversion forcefully. It looks like a simple job and ALSA APIs may be extended for this simple purpose. Shengjiu, what are your requirements for gstreamer support? Would be a new blocking ioctl enough for the initial support in the compress ALSA API? If it works with compress API, it'd be great, yeah. So, your idea is to open compress-offload devices for read and write, then and let them convert a la batch jobs without timing control? For full-duplex usages, we might need some more extensions, so that both read and write parameters can be synchronized. (So far the compress stream is a unidirectional, and the runtime buffer for a single stream.) And the buffer management is based on the fixed size fragments. I hope this doesn't matter much for the intended operation? It's a question, if the standard I/O is really required for this case. My quick idea was to just implement a new "direction" for this job supporting only one ioctl for the data processing which will execute the job in "one shot" at the moment. The I/O may be handled through dma-buf API (which seems to be standard nowadays for this purpose and allows future chaining). So something like: struct dsp_job { int source_fd; /* dma-buf FD with source data - for dma_buf_get() */ int target_fd; /* dma-buf FD for target data - for dma_buf_get() */ ... maybe some extra data size members here ... ... maybe some special parameters here ... 
}; #define SNDRV_COMPRESS_DSPJOB _IOWR('C', 0x60, struct dsp_job) This ioctl will be blocking (thus synced). My question is, if it's feasible for gstreamer or not. For this particular case, if the rate conversion is implemented in software, it will block the gstreamer data processing, too. Yes, GStreamer threading is using a push-back model, so blocking for the time of the processing is fine. Note that the extra simplicity will suffer from ioctl() latency. In GFX, they solve this issue with fences. That allow setting up the next operation in the chain before the data has been produced. The fences look really nicely and seem more modern. It should be possible with dma-buf/sync_file.c interface to handle multiple jobs simultaneously and share the state between user space and kernel driver. In this case, I think that two non-blocking ioctls should be enough - add a new job with source/target dma buffers guarded by one fence and abort (flush) all active jobs. I'll try to propose an API extension for the
[PATCH] Perf: Calling available function for stats printing
For printing dump_trace, use existing stats_print() function.

Signed-off-by: Abhishek Dubey
---
 tools/perf/builtin-report.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index dcd93ee5fc24..3cabd5b0bfec 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -1088,10 +1088,7 @@ static int __cmd_report(struct report *rep)
 			perf_session__fprintf_dsos(session, stdout);
 
 		if (dump_trace) {
-			perf_session__fprintf_nr_events(session, stdout,
-							rep->skip_empty);
-			evlist__fprintf_nr_events(session->evlist, stdout,
-						  rep->skip_empty);
+			stats_print(rep);
 			return 0;
 		}
 	}
-- 
2.44.0
[PATCH] PowerPC: Replace kretprobe with rethook
This is an adaptation of commit f3a112c0c40d ("x86,rethook,kprobes: Replace kretprobe with rethook on x86") to Power. Replaces the kretprobe code with rethook on Power. With this patch, kretprobe on Power uses the rethook instead of kretprobe specific trampoline code. Reference to other archs: commit b57c2f124098 ("riscv: add riscv rethook implementation") commit 7b0a096436c2 ("LoongArch: Replace kretprobe with rethook") Signed-off-by: Abhishek Dubey --- arch/powerpc/Kconfig | 1 + arch/powerpc/kernel/Makefile | 1 + arch/powerpc/kernel/kprobes.c| 65 + arch/powerpc/kernel/optprobes.c | 2 +- arch/powerpc/kernel/rethook.c| 71 arch/powerpc/kernel/stacktrace.c | 6 +-- include/linux/rethook.h | 1 - 7 files changed, 78 insertions(+), 69 deletions(-) create mode 100644 arch/powerpc/kernel/rethook.c diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 1c4be3373686..108de491965a 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -268,6 +268,7 @@ config PPC select HAVE_PERF_EVENTS_NMI if PPC64 select HAVE_PERF_REGS select HAVE_PERF_USER_STACK_DUMP + select HAVE_RETHOOK select HAVE_REGS_AND_STACK_ACCESS_API select HAVE_RELIABLE_STACKTRACE select HAVE_RSEQ diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile index d3282fbea4f2..181d764be3a6 100644 --- a/arch/powerpc/kernel/Makefile +++ b/arch/powerpc/kernel/Makefile @@ -142,6 +142,7 @@ obj-$(CONFIG_KPROBES) += kprobes.o obj-$(CONFIG_OPTPROBES)+= optprobes.o optprobes_head.o obj-$(CONFIG_KPROBES_ON_FTRACE)+= kprobes-ftrace.o obj-$(CONFIG_UPROBES) += uprobes.o +obj-$(CONFIG_RETHOOK) += rethook.o obj-$(CONFIG_PPC_UDBG_16550) += legacy_serial.o udbg_16550.o obj-$(CONFIG_SWIOTLB) += dma-swiotlb.o obj-$(CONFIG_ARCH_HAS_DMA_SET_MASK) += dma-mask.o diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c index bbca90a5e2ec..614bb68ad0e6 100644 --- a/arch/powerpc/kernel/kprobes.c +++ b/arch/powerpc/kernel/kprobes.c @@ -248,16 +248,6 @@ static nokprobe_inline void 
set_current_kprobe(struct kprobe *p, struct pt_regs kcb->kprobe_saved_msr = regs->msr; } -void arch_prepare_kretprobe(struct kretprobe_instance *ri, struct pt_regs *regs) -{ - ri->ret_addr = (kprobe_opcode_t *)regs->link; - ri->fp = NULL; - - /* Replace the return addr with trampoline addr */ - regs->link = (unsigned long)__kretprobe_trampoline; -} -NOKPROBE_SYMBOL(arch_prepare_kretprobe); - static int try_to_emulate(struct kprobe *p, struct pt_regs *regs) { int ret; @@ -414,49 +404,6 @@ int kprobe_handler(struct pt_regs *regs) } NOKPROBE_SYMBOL(kprobe_handler); -/* - * Function return probe trampoline: - * - init_kprobes() establishes a probepoint here - * - When the probed function returns, this probe - * causes the handlers to fire - */ -asm(".global __kretprobe_trampoline\n" - ".type __kretprobe_trampoline, @function\n" - "__kretprobe_trampoline:\n" - "nop\n" - "blr\n" - ".size __kretprobe_trampoline, .-__kretprobe_trampoline\n"); - -/* - * Called when the probe at kretprobe trampoline is hit - */ -static int trampoline_probe_handler(struct kprobe *p, struct pt_regs *regs) -{ - unsigned long orig_ret_address; - - orig_ret_address = __kretprobe_trampoline_handler(regs, NULL); - /* -* We get here through one of two paths: -* 1. by taking a trap -> kprobe_handler() -> here -* 2. by optprobe branch -> optimized_callback() -> opt_pre_handler() -> here -* -* When going back through (1), we need regs->nip to be setup properly -* as it is used to determine the return address from the trap. -* For (2), since nip is not honoured with optprobes, we instead setup -* the link register properly so that the subsequent 'blr' in -* __kretprobe_trampoline jumps back to the right instruction. -* -* For nip, we should set the address to the previous instruction since -* we end up emulating it in kprobe_handler(), which increments the nip -* again. 
-	 */
-	regs_set_return_ip(regs, orig_ret_address - 4);
-	regs->link = orig_ret_address;
-
-	return 0;
-}
-NOKPROBE_SYMBOL(trampoline_probe_handler);
-
 /*
  * Called after single-stepping. p->addr is the address of the
  * instruction whose first byte has been replaced by the "breakpoint"
@@ -559,19 +506,9 @@ int kprobe_fault_handler(struct pt_regs *regs, int trapnr)
 }
 NOKPROBE_SYMBOL(kprobe_fault_handler);
 
-static struct kprobe trampoline_p = {
-	.addr = (kprobe_opcode_t *) &__kretprobe_trampoline,
-	.pre_handler = trampoline_probe_handler
-};
-
-int __init arch_init_kprobes(void)
-{
-	return register_kprobe(&trampoline_p);
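The return-address swap that the removed kretprobe code performs, and that a rethook implementation has to reproduce, can be modelled in plain user-space C. This is a sketch only: the *_model types, the trampoline address and the function names are hypothetical stand-ins, and it mirrors the removed arch_prepare_kretprobe() and trampoline_probe_handler() shown in the diff rather than the new rethook.c, whose body is not visible in this excerpt.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the two pt_regs fields the removed code touches. */
struct regs_model {
	uint64_t nip;	/* next instruction pointer */
	uint64_t link;	/* link register: where 'blr' returns to */
};

/* Stand-in for the per-call bookkeeping node. */
struct rethook_node_model {
	uint64_t ret_addr;	/* saved original return address */
	uint64_t frame;		/* unused here, as with ri->fp = NULL */
};

/* Made-up address standing in for the trampoline symbol. */
#define TRAMPOLINE_ADDR_MODEL UINT64_C(0xc000000000010000)

/* On function entry: remember the real return address, then hijack it. */
static void model_rethook_prepare(struct rethook_node_model *rh,
				  struct regs_model *regs)
{
	rh->ret_addr = regs->link;
	rh->frame = 0;
	regs->link = TRAMPOLINE_ADDR_MODEL;
}

/*
 * When the trampoline is hit: restore the real return address. nip is
 * set one instruction (4 bytes) early because, per the removed comment,
 * the trap path emulates that instruction and advances nip again.
 */
static void model_rethook_fixup_return(struct regs_model *regs,
				       uint64_t orig_ret_addr)
{
	regs->nip = orig_ret_addr - 4;
	regs->link = orig_ret_addr;
}
```

Setting both nip and link covers the two entry paths the removed comment describes: the trap path resumes from nip, while the optprobe path returns through the link register.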
Re: [PATCH 1/3] crypto: X25519 low-level primitives for ppc64le.
Hi,

>>>> +.abiversion 2
>>>
>>> I'd prefer that was left to the compiler flags.
>>
>> Problem is that it's the compiler that is responsible for providing this
>> directive in the intermediate .s prior invoking the assembler. And there
>> is no assembler flag to pass through -Wa.
>
> Hmm, right. But none of our existing .S files include .abiversion
> directives.
>
> We build .S files with gcc, passing -mabi=elfv2, but it seems to have no
> effect.
>
> So all the intermediate .o's generated from .S files are not ELFv2:
>
>   $ find .build/ -name '*.o' | xargs file | grep Unspecified
>   .build/arch/powerpc/kernel/vdso/note-64.o: ELF 64-bit LSB relocatable, 64-bit PowerPC or cisco 7500, Unspecified or Power ELF V1 ABI, version 1 (SYSV), not stripped

I would guess that contemporary linker is more forgiving than it was
back then when the .abiversion directive was added. If it works now,
then it of course can be omitted. I suppose my original remark should be
viewed rather as "you can't replace it with a command line option" than
"you can't make it work without it." :-)

> But the actual code follows ELFv2, because we wrote it that way, and I
> guess the linker doesn't look at the actual ABI version of the .o ?
>
> So it currently works. But it's kind of gross that those .o files are
> not ELFv2 for an ELFv2 build.

Well, as far as passing base types and pointers to/from assembly goes,
there are no differences between the versions. Then it's a question of
meaning assigned to r2 and r13, but as long as you don't touch them, you
can freely reuse the code with either ABI. With this in mind the
.abiversion directive is effectively reduced to just a marker in the .o
file. In other words the instruction sequences by themselves are
customarily ABI-neutral, at least in "general calculation" modules such
as the suggested one, so that if it works 100% without the .abiversion
directive, then it can be safely omitted.

Cheers.
[PATCH] powerpc/fadump: Fix section mismatch warning
With some compilers/configs fadump_setup_param_area() isn't inlined into
its caller (which is __init), leading to a section mismatch warning:

  WARNING: modpost: vmlinux: section mismatch in reference:
  fadump_setup_param_area+0x200 (section: .text.fadump_setup_param_area)
  -> memblock_phys_alloc_range (section: .init.text)

Fix it by adding an __init annotation.

Fixes: 683eab94da75 ("powerpc/fadump: setup additional parameters for dump capture kernel")
Reported-by: Stephen Rothwell
Closes: https://lore.kernel.org/all/20240515163708.3380c...@canb.auug.org.au/
Reported-by: kernel test robot
Closes: https://lore.kernel.org/all/202405140922.ouclox4y-...@intel.com/
Signed-off-by: Michael Ellerman
---
 arch/powerpc/kernel/fadump.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 2276bacc4170..60f974775fc8 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -1740,7 +1740,7 @@ static void __init fadump_process(void)
  * Reserve memory to store additional parameters to be passed
  * for fadump/capture kernel.
  */
-static void fadump_setup_param_area(void)
+static void __init fadump_setup_param_area(void)
 {
 	phys_addr_t range_start, range_end;
 
-- 
2.45.0
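The shape of the problem can be shown with a small user-space analogy. This is only a sketch, assuming GCC or Clang on an ELF target. The __init_model macro, the section name and all function names are hypothetical stand-ins, and nothing here runs modpost or frees memory after boot. It only shows that once the helper is not inlined, it is a separate symbol that needs its own section annotation to sit alongside its caller and callee.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for the kernel's __init: place the function in a
 * dedicated section and keep it out of line, so it stays a separate symbol
 * as in the non-inlined case from the warning.
 */
#define __init_model __attribute__((section(".text.init_model"), noinline))

/* Plays the role of memblock_phys_alloc_range(): boot-time only. */
static int __init_model early_alloc_model(void)
{
	return 42;
}

/*
 * Plays the role of fadump_setup_param_area(). Without the annotation this
 * function would land in ordinary text while still referencing the
 * boot-only function above, which is the mismatch modpost reports.
 */
static int __init_model setup_param_area_model(void)
{
	return early_alloc_model() + 1;
}

/* Plays the role of the __init caller, fadump_process(). */
static int __init_model process_model(void)
{
	return setup_param_area_model();
}

/* Ordinary entry point so the chain can be exercised. */
int run_boot_model(void)
{
	return process_model();
}
```

In the kernel the annotation matters because .init.text is discarded after boot, so a reference from regular text into it would be a latent bug if the caller could ever run later.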
Re: [PATCHv4 8/9] ASoC: fsl-asoc-card: add DT property "cpu-system-clock-direction-out"
On Wed, May 15, 2024 at 03:54:10PM +0200, Elinor Montmasson wrote:

> Add new optional DT property "cpu-system-clock-direction-out" to set
> sysclk direction as "out" for the CPU DAI when using the generic codec.
> It is set for both Tx and Rx.
> If not set, the direction is "in".
> The way the direction value is used is up to the CPU DAI driver
> implementation.

This feels like we should be using the clock bindings to specify the
clock input of whatever is using the output from the SoC, though that's
a lot more work.
[PATCH 1/1] powerpc/numa: Online a node if PHB is attached.
In the current design, a numa-node is made online only if that node is attached to cpu/memory. With this design, if any PCI/IO device is found to be attached to a numa-node which is not online then the numa-node id of the corresponding PCI/IO device is set to NUMA_NO_NODE(-1). This design may negatively impact the performance of PCIe device if the numa-node assigned to PCIe device is -1 because in such case we may not be able to accurately calculate the distance between two nodes. The multi-controller NVMe PCIe disk has an issue with calculating the node distance if the PCIe NVMe controller is attached to a PCI host bridge which has numa-node id value set to NUMA_NO_NODE. This patch helps fix this ensuring that a cpu/memory less numa node is made online if it's attached to PCI host bridge. Signed-off-by: Nilay Shroff --- arch/powerpc/mm/numa.c | 14 +- arch/powerpc/platforms/pseries/pci_dlpar.c | 14 ++ 2 files changed, 27 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c index a490724e84ad..9e5e366cee43 100644 --- a/arch/powerpc/mm/numa.c +++ b/arch/powerpc/mm/numa.c @@ -896,7 +896,7 @@ static int __init numa_setup_drmem_lmb(struct drmem_lmb *lmb, static int __init parse_numa_properties(void) { - struct device_node *memory; + struct device_node *memory, *pci; int default_nid = 0; unsigned long i; const __be32 *associativity; @@ -1010,6 +1010,18 @@ static int __init parse_numa_properties(void) goto new_range; } + for_each_node_by_name(pci, "pci") { + int nid; + + associativity = of_get_associativity(pci); + if (associativity) { + nid = associativity_to_nid(associativity); + initialize_form1_numa_distance(associativity); + } + if (likely(nid >= 0) && !node_online(nid)) + node_set_online(nid); + } + /* * Now do the same thing for each MEMBLOCK listed in the * ibm,dynamic-memory property in the diff --git a/arch/powerpc/platforms/pseries/pci_dlpar.c b/arch/powerpc/platforms/pseries/pci_dlpar.c index 4448386268d9..52e2623a741d 
100644 --- a/arch/powerpc/platforms/pseries/pci_dlpar.c +++ b/arch/powerpc/platforms/pseries/pci_dlpar.c @@ -11,6 +11,7 @@ #include #include +#include #include #include #include @@ -21,9 +22,22 @@ struct pci_controller *init_phb_dynamic(struct device_node *dn) { struct pci_controller *phb; + int nid; pr_debug("PCI: Initializing new hotplug PHB %pOF\n", dn); + nid = of_node_to_nid(dn); + if (likely((nid) >= 0)) { + if (!node_online(nid)) { + if (__register_one_node(nid)) { + pr_err("PCI: Failed to register node %d\n", nid); + } else { + update_numa_distance(dn); + node_set_online(nid); + } + } + } + phb = pcibios_alloc_controller(dn); if (!phb) return NULL; -- 2.44.0
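The decision both hunks implement (bring a node online when a PHB points at it and it is not online yet) can be modelled in a few lines of user-space C. This is a sketch under stated assumptions: the node map, MAX_NODES_MODEL and the function name are hypothetical, and the model takes the node id as a parameter, treating a missing associativity property as NUMA_NO_NODE so that such a PHB is skipped. In the boot-time hunk above, nid is only assigned when associativity is non-NULL, so that last point is an assumption about the intended behaviour.

```c
#include <assert.h>
#include <stdbool.h>

#define NUMA_NO_NODE	(-1)
#define MAX_NODES_MODEL	8	/* hypothetical small node count */

/* Stand-in for the kernel's online-node mask. */
static bool node_online_model[MAX_NODES_MODEL];

/*
 * Returns true when the node was newly brought online on behalf of a PHB,
 * false when the id is invalid or the node was already online.
 */
static bool online_node_for_phb_model(int nid)
{
	if (nid < 0 || nid >= MAX_NODES_MODEL)
		return false;	/* NUMA_NO_NODE or out of range: nothing to do */

	if (node_online_model[nid])
		return false;	/* already online via cpu/memory */

	node_online_model[nid] = true;
	return true;
}
```

With node 0 online from cpu/memory and a PHB attached to node 2, the first call for node 2 brings it online and later calls are no-ops.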
[PATCH 0/1] powerpc/numa: Make cpu/memory less numa-node online
Hi,

On a NUMA-aware system, we make a numa-node online only if that node is
attached to cpu/memory. However, it's possible that we have some PCI/IO
device affinitized to a numa-node which is not currently online. In such
a case we set the numa-node id of the corresponding PCI device to -1
(NUMA_NO_NODE). Not assigning the correct numa-node id to a PCI device
may impact the performance of such a device.

For instance, we have a multi-controller NVMe disk where each controller
of the disk is attached to a different PHB (PCI host bridge). Each of
these PHBs has a numa-node id assigned during PCI enumeration. During PCI
enumeration, if we find that the numa-node is not online then we set the
numa-node id of the PHB to -1. If we create a shared namespace and attach
it to a multi-controller NVMe disk then that namespace can be accessed
through each controller and, as each controller is connected to a
different PHB, it's possible to access the same namespace using multiple
PCI channels. While sending IO to a shared namespace, the NVMe driver
calculates the optimal IO path using numa-node distance. However, if the
numa-node id is not correctly assigned to the NVMe PCIe controller then
the driver may calculate an incorrect NUMA distance and hence select a
non-optimal path for sending IO. If this happens, we could observe
degraded IO performance.
Please find below the performance of a multi-controller NVMe disk w/ and
w/o the proposed patch applied:

# lspci
0524:28:00.0 Non-Volatile memory controller: KIOXIA Corporation NVMe SSD Controller CM7 2.5" (rev 01)
0584:28:00.0 Non-Volatile memory controller: KIOXIA Corporation NVMe SSD Controller CM7 2.5" (rev 01)

# nvme list -v
Subsystem     Subsystem-NQN                                     Controllers
------------  ------------------------------------------------  ------------
nvme-subsys1  nqn.2019-10.com.kioxia:KCM7DRUG1T92:3D60A04906N1  nvme0, nvme1

Device  SN            MN                          FR        TxPort  Asdress       Slot  Subsystem     Namespaces
------  ------------  --------------------------  --------  ------  ------------  ----  ------------  ----------
nvme0   3D60A04906N1  1.6TB NVMe Gen4 U.2 SSD IV  REV.CAS2  pcie    0524:28:00.0        nvme-subsys1  nvme1n3
nvme1   3D60A04906N1  1.6TB NVMe Gen4 U.2 SSD IV  REV.CAS2  pcie    0584:28:00.0        nvme-subsys1  nvme1n3

Device        Generic     NSID  Usage              Format       Controllers
------------  ----------  ----  -----------------  -----------  ------------
/dev/nvme1n3  /dev/ng1n3  0x3   5.75 GB / 5.75 GB  4 KiB + 0 B  nvme0, nvme1

We can see above the nvme disk has two controllers nvme0 and nvme1. Both
these controllers can be accessed from two different PCI channels
(0524:28 and 0584:28). I have also created a shared namespace
(/dev/nvme1n3) which is connected behind controllers nvme0 and nvme1.

Test-1: Measure IO performance w/o proposed patch:
--------------------------------------------------

# numactl -H
available: 1 nodes (0)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
node 0 size: 31565 MB
node 0 free: 28452 MB
node distances:
node   0
  0:  10

On this machine we only have node 0 online.

# cat /sys/class/nvme/nvme1/numa_node
-1
# cat /sys/class/nvme/nvme0/numa_node
0
# cat /sys/class/nvme-subsystem/nvme-subsys1/iopolicy
numa

We can find above the numa node id assigned to nvme1 is -1, however, the
numa node id assigned to nvme0 is 0. Also the iopolicy is set to numa.
Now we would run IO perf test and measure the performance: # fio --filename=/dev/nvme1n3 --direct=1 --rw=randwrite --bs=4k --ioengine=io_uring --iodepth=512 --runtime=60 --numjobs=4 --time_based --group_reporting --name=iops-test-job --eta-newline=1 --cpus_allowed=0-3 iops-test-job: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=io_uring, iodepth=512 ... fio-3.35 Starting 4 processes [...] [...] iops-test-job: (groupid=0, jobs=4): err= 0: pid=5665: Tue Apr 30 04:07:31 2024 write: IOPS=632k, BW=2469MiB/s (2589MB/s)(145GiB/60003msec); 0 zone resets slat (usec): min=2, max=10031, avg= 4.62, stdev= 5.40 clat (usec): min=12, max=15687, avg=3233.58, stdev=877.78 lat (usec): min=16, max=15693, avg=3238.19, stdev=879.06 clat percentiles (usec): | 1.00th=[ 2868], 5.00th=[ 2900], 10.00th=[ 2900], 20.00th=[ 2900], | 30.00th=[ 2933], 40.00th=[ 2933], 50.00th=[ 2933], 60.00th=[ 2933], | 70.00th=[ 2933], 80.00th=[ 2966], 90.00th=[ 5604], 95.00th=[ 5669], | 99.00th=[ 5735], 99.50th=[ 5735], 99.90th=[ 5866], 99.95th=[ 6456],
Re: [PATCHv4 7/9] ASoC: fsl-asoc-card: add DT clock "cpu_sysclk" with generic codec
On Wed, May 15, 2024 at 03:54:09PM +0200, Elinor Montmasson wrote:

> Add an optional DT clock "cpu_sysclk" to get the CPU DAI system-clock
> frequency when using the generic codec.
> It is set for both Tx and Rx.
> The way the frequency value is used is up to the CPU DAI driver
> implementation.

> +		struct clk *cpu_sysclk = clk_get(&pdev->dev, "cpu_sysclk");
> +		if (!IS_ERR(cpu_sysclk)) {
> +			priv->cpu_priv.sysclk_freq[TX] = clk_get_rate(cpu_sysclk);
> +			priv->cpu_priv.sysclk_freq[RX] = priv->cpu_priv.sysclk_freq[TX];
> +			clk_put(cpu_sysclk);
> +		}

I don't really understand the goal here - this is just reading whatever
frequency happens to be set in the hardware when the driver starts up
which if nothing else seems rather fragile?
Re: [PATCHv4 9/9] ASoC: dt-bindings: fsl-asoc-card: add compatible for generic codec
On Wed, May 15, 2024 at 03:54:11PM +0200, Elinor Montmasson wrote:

> Add documentation about new dts bindings following new support
> for compatible "fsl,imx-audio-generic".

>    audio-codec:
> -    $ref: /schemas/types.yaml#/definitions/phandle
> -    description: The phandle of an audio codec
> +    $ref: /schemas/types.yaml#/definitions/phandle-array
> +    description: |
> +      The phandle of an audio codec.
> +      If using the "fsl,imx-audio-generic" compatible, give instead a pair of
> +      phandles with the spdif_transmitter first (driver SPDIF DIT) and the
> +      spdif_receiver second (driver SPDIF DIR).
> +    items:
> +      maxItems: 1

This description (and the code) don't feel like they're actually generic
- they're clearly specific to the bidirectional S/PDIF case. I'd expect
something called -generic to cope with single CODECs as well as double,
and not to have any constraints on what those are.
Re: [PATCH 1/3] crypto: X25519 low-level primitives for ppc64le.
Andy Polyakov writes:
> Hi,
>
>>> +.abiversion 2
>>
>> I'd prefer that was left to the compiler flags.
>
> Problem is that it's the compiler that is responsible for providing this
> directive in the intermediate .s prior invoking the assembler. And there
> is no assembler flag to pass through -Wa.

Hmm, right. But none of our existing .S files include .abiversion
directives.

We build .S files with gcc, passing -mabi=elfv2, but it seems to have no
effect.

So all the intermediate .o's generated from .S files are not ELFv2:

  $ find .build/ -name '*.o' | xargs file | grep Unspecified
  .build/arch/powerpc/kernel/vdso/note-64.o: ELF 64-bit LSB relocatable, 64-bit PowerPC or cisco 7500, Unspecified or Power ELF V1 ABI, version 1 (SYSV), not stripped
  .build/arch/powerpc/kernel/vdso/sigtramp64-64.o: ELF 64-bit LSB relocatable, 64-bit PowerPC or cisco 7500, Unspecified or Power ELF V1 ABI, version 1 (SYSV), not stripped
  .build/arch/powerpc/kernel/vdso/getcpu-64.o: ELF 64-bit LSB relocatable, 64-bit PowerPC or cisco 7500, Unspecified or Power ELF V1 ABI, version 1 (SYSV), not stripped
  .build/arch/powerpc/kernel/vdso/gettimeofday-64.o: ELF 64-bit LSB relocatable, 64-bit PowerPC or cisco 7500, Unspecified or Power ELF V1 ABI, version 1 (SYSV), not stripped
  .build/arch/powerpc/kernel/vdso/datapage-64.o: ELF 64-bit LSB relocatable, 64-bit PowerPC or cisco 7500, Unspecified or Power ELF V1 ABI, version 1 (SYSV), not stripped
  ...

But the actual code follows ELFv2, because we wrote it that way, and I
guess the linker doesn't look at the actual ABI version of the .o ?

So it currently works. But it's kind of gross that those .o files are
not ELFv2 for an ELFv2 build.

> If concern is ABI neutrality,
> then solution would rather be #if (_CALL_ELF-0) == 2/#endif. One can
> also make a case for
>
> #ifdef _CALL_ELF
> .abiversion _CALL_ELF
> #endif

Is .abiversion documented anywhere? I can't see it in the manual.
We used to use _CALL_ELF, but the kernel config is supposed to be the
source of truth, so we'd use:

  #ifdef CONFIG_PPC64_ELF_ABI_V2
  .abiversion 2
  #endif

And probably put it in a macro like:

  #ifdef CONFIG_PPC64_ELF_ABI_V2
  #define ASM_ABI_VERSION .abiversion 2
  #else
  #define ASM_ABI_VERSION
  #endif

Or something like that. But it's annoying that we need to go and
sprinkle that in every .S file.

Anyway, my comment can be ignored as far as this series is concerned,
seems we have to clean this up everywhere.

cheers
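
The check on the .o files can also be done without depending on the
wording of file(1). What follows is only a minimal sketch, assuming the
standard ELF64 header layout and the PowerPC64 convention that the low
two bits of e_flags hold the ABI version (0 = unspecified, 1 = ELFv1,
2 = ELFv2). The function name ppc64_abi_version is made up for
illustration and is not something from the kernel tree.

```python
import struct

EM_PPC64 = 21        # e_machine value for 64-bit PowerPC
EF_PPC64_ABI = 0x3   # mask for the ABI version bits in e_flags


def ppc64_abi_version(header: bytes) -> int:
    """Return the ABI version stored in e_flags of an ELF64 header.

    `header` must hold at least the first 64 bytes of the object file.
    (Hypothetical helper, written for this illustration only.)
    """
    if len(header) < 64 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if header[4] != 2:  # EI_CLASS: 2 means ELFCLASS64
        raise ValueError("not a 64-bit ELF file")

    # EI_DATA: 1 means little-endian, 2 means big-endian.
    endian = "<" if header[5] == 1 else ">"

    # e_machine is a 16-bit field at offset 18.
    (machine,) = struct.unpack_from(endian + "H", header, 18)
    if machine != EM_PPC64:
        raise ValueError("not a PowerPC64 object")

    # In an ELF64 header, e_flags is a 32-bit field at offset 48.
    (flags,) = struct.unpack_from(endian + "I", header, 48)
    return flags & EF_PPC64_ABI
```

Feeding it the first 64 bytes of each .o under .build/ would be expected
to return 0 for the objects listed above, and 2 for objects that the
compiler marked as ELFv2.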
Re: [PATCH 1/3] crypto: X25519 low-level primitives for ppc64le.
Hi Andy,

I learned something here. Will fix this.

Thanks.

-Danny

On 5/16/24 3:38 AM, Andy Polyakov wrote:
> Hi,
>
>>> +.abiversion 2
>>
>> I'd prefer that was left to the compiler flags.
>
> Problem is that it's the compiler that is responsible for providing this
> directive in the intermediate .s prior invoking the assembler. And there
> is no assembler flag to pass through -Wa.
>
> If concern is ABI neutrality, then solution would rather be
> #if (_CALL_ELF-0) == 2/#endif. One can also make a case for
>
> #ifdef _CALL_ELF
> .abiversion _CALL_ELF
> #endif
>
> Cheers.
Re: [PATCH 1/3] crypto: X25519 low-level primitives for ppc64le.
On 5/15/24 11:53 PM, Michael Ellerman wrote:
> Hi Danny,
>
> Danny Tsen writes:
>> Use the perl output of x25519-ppc64.pl from CRYPTOGAMs
>> and added three supporting functions, x25519_fe51_sqr_times,
>> x25519_fe51_frombytes and x25519_fe51_tobytes.
>
> For other algorithms we have checked-in the perl script and generated
> the code at runtime. Is there a reason you've done it differently this
> time?

Hi Michael,

It's easier for me to read and use plain assembly that is not mixed with
perl, and it's easier for me to debug and test. I also copied some code
and made some modifications.

>> Signed-off-by: Danny Tsen
>> ---
>>  arch/powerpc/crypto/curve25519-ppc64le_asm.S | 648 +++
>>  1 file changed, 648 insertions(+)
>>  create mode 100644 arch/powerpc/crypto/curve25519-ppc64le_asm.S
>>
>> diff --git a/arch/powerpc/crypto/curve25519-ppc64le_asm.S b/arch/powerpc/crypto/curve25519-ppc64le_asm.S
>> new file mode 100644
>> index ..8a018104838a
>> --- /dev/null
>> +++ b/arch/powerpc/crypto/curve25519-ppc64le_asm.S
>> @@ -0,0 +1,648 @@
>> +/* SPDX-License-Identifier: GPL-2.0-or-later */
>> +#
>> +# Copyright 2024- IBM Corp. All Rights Reserved.
>
> I'm not a lawyer, but AFAIK "All Rights Reserved" is not required and
> can be confusing - because we are not reserving all rights, we are
> granting some rights under the GPL.
>
> I also think the IBM copyright should be down below where your
> modifications are described.

Will change that.

>> +# This code is taken from CRYPTOGAMs[1] and is included here using the option
>> +# in the license to distribute the code under the GPL. Therefore this program
>> +# is free software; you can redistribute it and/or modify it under the terms of
>> +# the GNU General Public License version 2 as published by the Free Software
>> +# Foundation.
>> +#
>> +# [1] https://www.openssl.org/~appro/cryptogams/
>> +
>> +# Copyright (c) 2006-2017, CRYPTOGAMS by
>> +# All rights reserved.
>> +#
>> +# Redistribution and use in source and binary forms, with or without
>> +# modification, are permitted provided that the following conditions
>> +# are met:
>> +#
>> +# * Redistributions of source code must retain copyright notices,
>> +# this list of conditions and the following disclaimer.
>> +#
>> +# * Redistributions in binary form must reproduce the above
>> +# copyright notice, this list of conditions and the following
>> +# disclaimer in the documentation and/or other materials
>> +# provided with the distribution.
>> +#
>> +# * Neither the name of the CRYPTOGAMS nor the names of its
>> +# copyright holder and contributors may be used to endorse or
>> +# promote products derived from this software without specific
>> +# prior written permission.
>> +#
>> +# ALTERNATIVELY, provided that this notice is retained in full, this
>> +# product may be distributed under the terms of the GNU General Public
>> +# License (GPL), in which case the provisions of the GPL apply INSTEAD OF
>> +# those given above.
>> +#
>> +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDER AND CONTRIBUTORS
>> +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
>> +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
>> +# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
>> +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
>> +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
>> +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
>> +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
>> +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
>> +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
>> +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
>> +
>> +#
>> +# Written by Andy Polyakov for the OpenSSL
>> +# project. The module is, however, dual licensed under OpenSSL and
>> +# CRYPTOGAMS licenses depending on where you obtain it. For further
>> +# details see https://www.openssl.org/~appro/cryptogams/.
>> +#
>> +
>> +#
>> +#
>> +# Written and Modified by Danny Tsen
>> +# - Added x25519_fe51_sqr_times, x25519_fe51_frombytes, x25519_fe51_tobytes
>
> ie. here.
>
>> +# X25519 lower-level primitives for PPC64.
>> +#
>> +
>> +#include
>> +
>> +.machine "any"
>
> Please don't add new .machine directives unless they are required.
>
>> +.abiversion 2
>
> I'd prefer that was left to the compiler flags.

Ok.

Thanks.

-Danny

> cheers
Re: [PATCH 1/3] crypto: X25519 low-level primitives for ppc64le.
Hi,

>> +.abiversion 2
>
> I'd prefer that was left to the compiler flags.

Problem is that it's the compiler that is responsible for providing this
directive in the intermediate .s prior invoking the assembler. And there
is no assembler flag to pass through -Wa.

If concern is ABI neutrality, then solution would rather be
#if (_CALL_ELF-0) == 2/#endif. One can also make a case for

#ifdef _CALL_ELF
.abiversion _CALL_ELF
#endif

Cheers.