Re: linux-next: build warning after merge of the powerpc tree
Hi Michael, On Thu, 16 May 2024 12:42:01 +1000 Michael Ellerman wrote: > > Stephen Rothwell writes: > > > > After merging the powerpc tree, today's (it may have been yesterday's) > > linux-next build (powerpc allyesconfig) produced this warning: > > > > WARNING: modpost: vmlinux: section mismatch in reference: > > fadump_setup_param_area+0x200 (section: .text.fadump_setup_param_area) -> > > memblock_phys_alloc_range (section: .init.text) > > I don't see the warning, but clearly it is possible if the compiler > decides not to inline fadump_setup_param_area(). > > What compiler version are you using? $ gcc --version gcc (Debian 13.2.0-7) 13.2.0 (on zz1 if you want to test) -- Cheers, Stephen Rothwell pgpxBGGLpHnkV.pgp Description: OpenPGP digital signature
Re: [PATCH 1/3] crypto: X25519 low-level primitives for ppc64le.
Hi Danny, Danny Tsen writes: > Use the perl output of x25519-ppc64.pl from CRYPTOGAMs and added three > supporting functions, x25519_fe51_sqr_times, x25519_fe51_frombytes > and x25519_fe51_tobytes. For other algorithms we have checked-in the perl script and generated the code at runtime. Is there a reason you've done it differently this time? > Signed-off-by: Danny Tsen > --- > arch/powerpc/crypto/curve25519-ppc64le_asm.S | 648 +++ > 1 file changed, 648 insertions(+) > create mode 100644 arch/powerpc/crypto/curve25519-ppc64le_asm.S > > diff --git a/arch/powerpc/crypto/curve25519-ppc64le_asm.S > b/arch/powerpc/crypto/curve25519-ppc64le_asm.S > new file mode 100644 > index ..8a018104838a > --- /dev/null > +++ b/arch/powerpc/crypto/curve25519-ppc64le_asm.S > @@ -0,0 +1,648 @@ > +/* SPDX-License-Identifier: GPL-2.0-or-later */ > +# > +# Copyright 2024- IBM Corp. All Rights Reserved. I'm not a lawyer, but AFAIK "All Rights Reserved" is not required and can be confusing - because we are not reserving all rights, we are granting some rights under the GPL. I also think the IBM copyright should be down below where your modifications are described. > +# This code is taken from CRYPTOGAMs[1] and is included here using the option > +# in the license to distribute the code under the GPL. Therefore this program > +# is free software; you can redistribute it and/or modify it under the terms > of > +# the GNU General Public License version 2 as published by the Free Software > +# Foundation. > +# > +# [1] https://www.openssl.org/~appro/cryptogams/ > + > +# Copyright (c) 2006-2017, CRYPTOGAMS by > +# All rights reserved. > +# > +# Redistribution and use in source and binary forms, with or without > +# modification, are permitted provided that the following conditions > +# are met: > +# > +# * Redistributions of source code must retain copyright notices, > +# this list of conditions and the following disclaimer. > +# > +# * Redistributions in binary form must reproduce the above > +# copyright notice, this list of conditions and the following > +# disclaimer in the documentation and/or other materials > +# provided with the distribution. > +# > +# * Neither the name of the CRYPTOGAMS nor the names of its > +# copyright holder and contributors may be used to endorse or > +# promote products derived from this software without specific > +# prior written permission. > +# > +# ALTERNATIVELY, provided that this notice is retained in full, this > +# product may be distributed under the terms of the GNU General Public > +# License (GPL), in which case the provisions of the GPL apply INSTEAD OF > +# those given above. > +# > +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDER AND CONTRIBUTORS > +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT > +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR > +# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT > +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, > +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT > +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, > +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY > +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT > +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE > +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. > + > +# > +# Written by Andy Polyakov for the OpenSSL > +# project. The module is, however, dual licensed under OpenSSL and > +# CRYPTOGAMS licenses depending on where you obtain it. For further > +# details see https://www.openssl.org/~appro/cryptogams/. > +# > + > +# > +# > +# Written and Modified by Danny Tsen > +# - Added x25519_fe51_sqr_times, x25519_fe51_frombytes, x25519_fe51_tobytes ie. here. > +# X25519 lower-level primitives for PPC64. > +# > + > +#include > + > +.machine "any" Please don't add new .machine directives unless they are required. > +.abiversion 2 I'd prefer that was left to the compiler flags. cheers
Re: linux-next: build warning after merge of the powerpc tree
Stephen Rothwell writes: > Hi all, > > After merging the powerpc tree, today's (it may have been yesterday's) > linux-next build (powerpc allyesconfig) produced this warning: > > WARNING: modpost: vmlinux: section mismatch in reference: > fadump_setup_param_area+0x200 (section: .text.fadump_setup_param_area) -> > memblock_phys_alloc_range (section: .init.text) I don't see the warning, but clearly it is possible if the compiler decides not to inline fadump_setup_param_area(). What compiler version are you using? cheers
Re: [PATCH v3] kprobe/ftrace: bail out if ftrace was killed
On Wed, 15 May 2024 15:18:08 -0700 Stephen Brennan wrote: > Masami Hiramatsu (Google) writes: > > On Thu, 2 May 2024 01:35:16 +0800 > > Guo Ren wrote: > > > >> On Thu, May 2, 2024 at 12:30 AM Stephen Brennan > >> wrote: > >> > > >> > If an error happens in ftrace, ftrace_kill() will prevent disarming > >> > kprobes. Eventually, the ftrace_ops associated with the kprobes will be > >> > freed, yet the kprobes will still be active, and when triggered, they > >> > will use the freed memory, likely resulting in a page fault and panic. > >> > > >> > This behavior can be reproduced quite easily, by creating a kprobe and > >> > then triggering a ftrace_kill(). For simplicity, we can simulate an > >> > ftrace error with a kernel module like [1]: > >> > > >> > [1]: https://github.com/brenns10/kernel_stuff/tree/master/ftrace_killer > >> > > >> > sudo perf probe --add commit_creds > >> > sudo perf trace -e probe:commit_creds > >> > # In another terminal > >> > make > >> > sudo insmod ftrace_killer.ko # calls ftrace_kill(), simulating bug > >> > # Back to perf terminal > >> > # ctrl-c > >> > sudo perf probe --del commit_creds > >> > > >> > After a short period, a page fault and panic would occur as the kprobe > >> > continues to execute and uses the freed ftrace_ops. While ftrace_kill() > >> > is supposed to be used only in extreme circumstances, it is invoked in > >> > FTRACE_WARN_ON() and so there are many places where an unexpected bug > >> > could be triggered, yet the system may continue operating, possibly > >> > without the administrator noticing. If ftrace_kill() does not panic the > >> > system, then we should do everything we can to continue operating, > >> > rather than leave a ticking time bomb. > >> > > >> > Signed-off-by: Stephen Brennan > >> > --- > >> > Changes in v3: > >> > Don't expose ftrace_is_dead(). Create a "kprobe_ftrace_disabled" > >> > variable and check it directly in the kprobe handlers. > >> > Link to v1/v2 discussion: > >> > > >> > https://lore.kernel.org/all/20240426225834.993353-1-stephen.s.bren...@oracle.com/ > >> > > >> > arch/csky/kernel/probes/ftrace.c | 3 +++ > >> > arch/loongarch/kernel/ftrace_dyn.c | 3 +++ > >> > arch/parisc/kernel/ftrace.c | 3 +++ > >> > arch/powerpc/kernel/kprobes-ftrace.c | 3 +++ > >> > arch/riscv/kernel/probes/ftrace.c| 3 +++ > >> > arch/s390/kernel/ftrace.c| 3 +++ > >> > arch/x86/kernel/kprobes/ftrace.c | 3 +++ > >> > include/linux/kprobes.h | 7 +++ > >> > kernel/kprobes.c | 6 ++ > >> > kernel/trace/ftrace.c| 1 + > >> > 10 files changed, 35 insertions(+) > >> > > >> > diff --git a/arch/csky/kernel/probes/ftrace.c > >> > b/arch/csky/kernel/probes/ftrace.c > >> > index 834cffcfbce3..7ba4b98076de 100644 > >> > --- a/arch/csky/kernel/probes/ftrace.c > >> > +++ b/arch/csky/kernel/probes/ftrace.c > >> > @@ -12,6 +12,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned > >> > long parent_ip, > >> > struct kprobe_ctlblk *kcb; > >> > struct pt_regs *regs; > >> > > >> > + if (unlikely(kprobe_ftrace_disabled)) > >> > + return; > >> > + > >> For csky part. > >> Acked-by: Guo Ren > > > > Thanks Stephen, Guo and Steve! > > > > Let me pick this to probes/for-next! > > Thank you Masami! > > I did want to check, is this the correct git tree to be watching? > > https://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git/log/?h=probes/for-next > > ( I'm not trying to pressure on timing, as I know the merge window is > hectic. Just making sure I'm watching the correct place! ) Sorry, I forgot to push it from my local tree. Now it should be there. Thanks, -- Masami Hiramatsu (Google)
Re: [PATCH v3] kprobe/ftrace: bail out if ftrace was killed
Masami Hiramatsu (Google) writes: > On Thu, 2 May 2024 01:35:16 +0800 > Guo Ren wrote: > >> On Thu, May 2, 2024 at 12:30 AM Stephen Brennan >> wrote: >> > >> > If an error happens in ftrace, ftrace_kill() will prevent disarming >> > kprobes. Eventually, the ftrace_ops associated with the kprobes will be >> > freed, yet the kprobes will still be active, and when triggered, they >> > will use the freed memory, likely resulting in a page fault and panic. >> > >> > This behavior can be reproduced quite easily, by creating a kprobe and >> > then triggering a ftrace_kill(). For simplicity, we can simulate an >> > ftrace error with a kernel module like [1]: >> > >> > [1]: https://github.com/brenns10/kernel_stuff/tree/master/ftrace_killer >> > >> > sudo perf probe --add commit_creds >> > sudo perf trace -e probe:commit_creds >> > # In another terminal >> > make >> > sudo insmod ftrace_killer.ko # calls ftrace_kill(), simulating bug >> > # Back to perf terminal >> > # ctrl-c >> > sudo perf probe --del commit_creds >> > >> > After a short period, a page fault and panic would occur as the kprobe >> > continues to execute and uses the freed ftrace_ops. While ftrace_kill() >> > is supposed to be used only in extreme circumstances, it is invoked in >> > FTRACE_WARN_ON() and so there are many places where an unexpected bug >> > could be triggered, yet the system may continue operating, possibly >> > without the administrator noticing. If ftrace_kill() does not panic the >> > system, then we should do everything we can to continue operating, >> > rather than leave a ticking time bomb. >> > >> > Signed-off-by: Stephen Brennan >> > --- >> > Changes in v3: >> > Don't expose ftrace_is_dead(). Create a "kprobe_ftrace_disabled" >> > variable and check it directly in the kprobe handlers. >> > Link to v1/v2 discussion: >> > >> > https://lore.kernel.org/all/20240426225834.993353-1-stephen.s.bren...@oracle.com/ >> > >> > arch/csky/kernel/probes/ftrace.c | 3 +++ >> > arch/loongarch/kernel/ftrace_dyn.c | 3 +++ >> > arch/parisc/kernel/ftrace.c | 3 +++ >> > arch/powerpc/kernel/kprobes-ftrace.c | 3 +++ >> > arch/riscv/kernel/probes/ftrace.c| 3 +++ >> > arch/s390/kernel/ftrace.c| 3 +++ >> > arch/x86/kernel/kprobes/ftrace.c | 3 +++ >> > include/linux/kprobes.h | 7 +++ >> > kernel/kprobes.c | 6 ++ >> > kernel/trace/ftrace.c| 1 + >> > 10 files changed, 35 insertions(+) >> > >> > diff --git a/arch/csky/kernel/probes/ftrace.c >> > b/arch/csky/kernel/probes/ftrace.c >> > index 834cffcfbce3..7ba4b98076de 100644 >> > --- a/arch/csky/kernel/probes/ftrace.c >> > +++ b/arch/csky/kernel/probes/ftrace.c >> > @@ -12,6 +12,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned >> > long parent_ip, >> > struct kprobe_ctlblk *kcb; >> > struct pt_regs *regs; >> > >> > + if (unlikely(kprobe_ftrace_disabled)) >> > + return; >> > + >> For csky part. >> Acked-by: Guo Ren > > Thanks Stephen, Guo and Steve! > > Let me pick this to probes/for-next! Thank you Masami! I did want to check, is this the correct git tree to be watching? https://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git/log/?h=probes/for-next ( I'm not trying to pressure on timing, as I know the merge window is hectic. Just making sure I'm watching the correct place! ) Thanks, Stephen
Re: kswapd0: page allocation failure: order:0, mode:0x820(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0 (Kernel v6.5.9, 32bit ppc)
On Wed, May 15, 2024 at 2:45 PM Erhard Furtner wrote: > > On Wed, 8 May 2024 20:21:11 +0200 > Erhard Furtner wrote: > > > Greetings! > > > > Got that on my dual CPU PowerMac G4 DP shortly after boot. This does not > > happen every time at bootup though: > > > > [...] > > kswapd0: page allocation failure: order:0, mode:0x820(GFP_ATOMIC), > > nodemask=(null),cpuset=/,mems_allowed=0 > > CPU: 1 PID: 40 Comm: kswapd0 Not tainted 6.8.9-gentoo-PMacG4 #1 > > Hardware name: PowerMac3,6 7455 0x80010303 PowerMac > > Very similar page allocation failure on the same machine on kernel 6.9.0 too. > Seems it can easily be provoked by running a memory stressor, e.g. "stress-ng > --vm 2 --vm-bytes 1930M --verify -v": > > [...] > kswapd0: page allocation failure: order:0, mode:0xcc0(GFP_KERNEL), > nodemask=(null),cpuset=/,mems_allowed=0 > CPU: 0 PID: 41 Comm: kswapd0 Not tainted 6.9.0-gentoo-PMacG4 #1 > Hardware name: PowerMac3,6 7455 0x80010303 PowerMac > Call Trace: > [c1c65940] [c07926d4] dump_stack_lvl+0x80/0xac (unreliable) > [c1c65960] [c01b6234] warn_alloc+0x100/0x178 > [c1c659c0] [c01b661c] __alloc_pages+0x370/0x8d0 > [c1c65a80] [c01c4854] __read_swap_cache_async+0xc0/0x1cc > [c1c65ad0] [c01cb580] zswap_writeback_entry+0x50/0x154 > [c1c65be0] [c01cb6f4] shrink_memcg_cb+0x70/0xec > [c1c65c10] [c019518c] __list_lru_walk_one+0xa0/0x154 > [c1c65c70] [c01952a4] list_lru_walk_one+0x64/0x7c > [c1c65ca0] [c01cad00] zswap_shrinker_scan+0xac/0xc4 > [c1c65cd0] [c018052c] do_shrink_slab+0x18c/0x304 > [c1c65d20] [c0180a40] shrink_slab+0x174/0x260 > [c1c65da0] [c017cb0c] shrink_one+0xbc/0x134 > [c1c65dd0] [c017e3e4] shrink_node+0x238/0x84c > [c1c65e50] [c017ed38] balance_pgdat+0x340/0x650 > [c1c65f50] [c017f270] kswapd+0x228/0x25c > [c1c65fc0] [c006bbac] kthread+0xe4/0xe8 > [c1c65ff0] [c0015304] start_kernel_thread+0x10/0x14 > SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC) > cache: skbuff_small_head, object size: 480, buffer size: 544, default > order: 1, min order: 0 > node 0: slabs: 15, objs: 225, free: 0 > SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC) > cache: skbuff_small_head, object size: 480, buffer size: 544, default > order: 1, min order: 0 > node 0: slabs: 15, objs: 225, free: 0 > SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC) > cache: skbuff_small_head, object size: 480, buffer size: 544, default > order: 1, min order: 0 > node 0: slabs: 15, objs: 225, free: 0 > SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC) > cache: skbuff_small_head, object size: 480, buffer size: 544, default > order: 1, min order: 0 > node 0: slabs: 15, objs: 225, free: 0 > SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC) > cache: skbuff_small_head, object size: 480, buffer size: 544, default > order: 1, min order: 0 > node 0: slabs: 15, objs: 225, free: 0 > SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC) > cache: kmalloc-rnd-15-2k, object size: 2048, buffer size: 6144, default > order: 3, min order: 1 > kmalloc-rnd-15-2k debugging increased min order, use slab_debug=O to > disable. > node 0: slabs: 33, objs: 165, free: 0 > SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC) > cache: skbuff_small_head, object size: 480, buffer size: 544, default > order: 1, min order: 0 > node 0: slabs: 15, objs: 225, free: 0 > SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC) > cache: kmalloc-rnd-15-2k, object size: 2048, buffer size: 6144, default > order: 3, min order: 1 > kmalloc-rnd-15-2k debugging increased min order, use slab_debug=O to > disable. > node 0: slabs: 33, objs: 165, free: 0 > SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC) > cache: skbuff_small_head, object size: 480, buffer size: 544, default > order: 1, min order: 0 > node 0: slabs: 15, objs: 225, free: 0 > SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC) > cache: kmalloc-rnd-15-2k, object size: 2048, buffer size: 6144, default > order: 3, min order: 1 > kmalloc-rnd-15-2k debugging increased min order, use slab_debug=O to > disable. > node 0: slabs: 33, objs: 165, free: 0 > Mem-Info: > active_anon:340071 inactive_anon:139179 isolated_anon:0 > active_file:8297 inactive_file:2506 isolated_file:0 > unevictable:4 dirty:1 writeback:18 > slab_reclaimable:1377 slab_unreclaimable:7426 > mapped:6804 shmem:112 pagetables:946 > sec_pagetables:0 bounce:0 > kernel_misc_reclaimable:0 > free:1141 free_pcp:7 free_cma:0 > Node 0 active_anon:1360284kB inactive_anon:556716kB active_file:33188kB > inactive_file:10024kB unevictable:16kB isolated(anon):0kB isolated(file):0kB > mapped:27216kB dirty:4kB writeback:72kB shmem:448kB writeback_tmp:0kB > kernel_stack:1560kB pagetables:3784kB sec_pagetables:0kB all_unreclaimable? no > DMA free:56kB boost:7756kB min:11208kB low:12068kB high:12928kB > reserved_highatomic:0KB active_anon:635128kB inactive_anon:58260kB >
Re: [PATCH v15 00/16] Add audio support in v4l2 framework
Hi, GStreamer hat on ... Le mercredi 15 mai 2024 à 12:46 +0200, Jaroslav Kysela a écrit : > On 15. 05. 24 12:19, Takashi Iwai wrote: > > On Wed, 15 May 2024 11:50:52 +0200, > > Jaroslav Kysela wrote: > > > > > > On 15. 05. 24 11:17, Hans Verkuil wrote: > > > > Hi Jaroslav, > > > > > > > > On 5/13/24 13:56, Jaroslav Kysela wrote: > > > > > On 09. 05. 24 13:13, Jaroslav Kysela wrote: > > > > > > On 09. 05. 24 12:44, Shengjiu Wang wrote: > > > > > > > > > mem2mem is just like the decoder in the compress pipeline. > > > > > > > > > which is > > > > > > > > > one of the components in the pipeline. > > > > > > > > > > > > > > > > I was thinking of loopback with endpoints using compress > > > > > > > > streams, > > > > > > > > without physical endpoint, something like: > > > > > > > > > > > > > > > > compress playback (to feed data from userspace) -> DSP > > > > > > > > (processing) -> > > > > > > > > compress capture (send data back to userspace) > > > > > > > > > > > > > > > > Unless I'm missing something, you should be able to process > > > > > > > > data as fast > > > > > > > > as you can feed it and consume it in such case. > > > > > > > > > > > > > > > > > > > > > > Actually in the beginning I tried this, but it did not work well. > > > > > > > ALSA needs time control for playback and capture, playback and > > > > > > > capture > > > > > > > needs to synchronize. Usually the playback and capture pipeline > > > > > > > is > > > > > > > independent in ALSA design, but in this case, the playback and > > > > > > > capture > > > > > > > should synchronize, they are not independent. > > > > > > > > > > > > The core compress API core no strict timing constraints. You can > > > > > > eventually0 > > > > > > have two half-duplex compress devices, if you like to have really > > > > > > independent > > > > > > mechanism. If something is missing in API, you can extend this API > > > > > > (like to > > > > > > inform the user space that it's a producer/consumer processing > > > > > > without any > > > > > > relation to the real time). I like this idea. > > > > > > > > > > I was thinking more about this. If I am right, the mentioned use in > > > > > gstreamer > > > > > is supposed to run the conversion (DSP) job in "one shot" (can be > > > > > handled > > > > > using one system call like blocking ioctl). The goal is just to > > > > > offload the > > > > > CPU work to the DSP (co-processor). If there are no requirements for > > > > > the > > > > > queuing, we can implement this ioctl in the compress ALSA API easily > > > > > using the > > > > > data management through the dma-buf API. We can eventually define a > > > > > new > > > > > direction (enum snd_compr_direction) like SND_COMPRESS_CONVERT or so > > > > > to allow > > > > > handle this new data scheme. The API may be extended later on real > > > > > demand, of > > > > > course. > > > > > > > > > > Otherwise all pieces are already in the current ALSA compress API > > > > > (capabilities, params, enumeration). The realtime controls may be > > > > > created > > > > > using ALSA control API. > > > > > > > > So does this mean that Shengjiu should attempt to use this ALSA > > > > approach first? > > > > > > I've not seen any argument to use v4l2 mem2mem buffer scheme for this > > > data conversion forcefully. It looks like a simple job and ALSA APIs > > > may be extended for this simple purpose. > > > > > > Shengjiu, what are your requirements for gstreamer support? Would be a > > > new blocking ioctl enough for the initial support in the compress ALSA > > > API? > > > > If it works with compress API, it'd be great, yeah. > > So, your idea is to open compress-offload devices for read and write, > > then and let them convert a la batch jobs without timing control? > > > > For full-duplex usages, we might need some more extensions, so that > > both read and write parameters can be synchronized. (So far the > > compress stream is a unidirectional, and the runtime buffer for a > > single stream.) > > > > And the buffer management is based on the fixed size fragments. I > > hope this doesn't matter much for the intended operation? > > It's a question, if the standard I/O is really required for this case. My > quick idea was to just implement a new "direction" for this job supporting > only one ioctl for the data processing which will execute the job in "one > shot" at the moment. The I/O may be handled through dma-buf API (which seems > to be standard nowadays for this purpose and allows future chaining). > > So something like: > > struct dsp_job { > int source_fd; /* dma-buf FD with source data - for dma_buf_get() */ > int target_fd; /* dma-buf FD for target data - for dma_buf_get() */ > ... maybe some extra data size members here ... > ... maybe some special parameters here ... > }; > > #define SNDRV_COMPRESS_DSPJOB _IOWR('C', 0x60, struct dsp_job) > > This ioctl will be blocking (thus synced).
Re: [PATCH v2 1/1] arch/fault: don't print logs for pte marker poison errors
On Wed, May 15, 2024 at 12:19:16PM -0700, Axel Rasmussen wrote: > An unprivileged process can allocate a VMA, use the userfaultfd API to > install one of these PTE markers, and then register a no-op SIGBUS > handler. Now it can access that address in a tight loop, Maybe the userfaultfd should not allow this, I dunno. You made me look at this thing and to me it all sounds weird. One thread does page fault handling for the other and that helps with live migration somehow. OMG, whaaat? Maybe I don't understand it and probably never will... But, for example, membarrier used do to a stupid thing of allowing one thread to hammer another with an IPI storm. Bad bad idea. So it got fixed. All I'm saying is, if unprivileged processes can do crap, they should be prevented from doing crap. Like ratelimiting the pagefaults or whatnot. One of the recovery action strategies from memory poison is, well, you kill the process. If you can detect the hammering process which installed that page marker, you kill it. Problem solved. But again, this userfaultfd thing sounds really weird so I could very well be way wrong. > Even in a non-contrived / non-malicious case, use of this API could > have similar effects. If nothing else, the log message can be > confusing to administrators: they state that an MCE occurred, whereas > with the simulated poison API, this is not the case; it isn't a "real" > MCE / hardware error. Yeah, I read that part in Documentation/admin-guide/mm/userfaultfd.rst Simulated poison huh? Another WTF. > In the KVM use case, the host can't just allocate a new page, because > it doesn't know what the guest might have had stored there. Best we Ok, let's think of real hw poison. When doing the recovery, you don't care what's stored there because as far as the hardware is concerned, if you consume that poison the *whole* machine might go down. So you lose the page. Plain and simple. And the guest can go visit the bureau of complaints and grievances. Still better than killing the guest or even the whole host with other guests running on it. > can do is propagate the poison into the guest, and let the guest OS > deal with it as it sees fit, and mark the page poisoned on the host. You mark the page as poison on the host and you yank it from under the guest. That physical frame is gone and the faster all the actors involved understand that, the better. > I don't disagree the guest *shouldn't* reaccess it in this case. :) > But if it did, it should get another poison event just as you say. Yes, it shouldn't. Look at memory_failure(). This will kill whole processes if it has to, depending on what the page is used for. > And, live migration between physical hosts should be transparent to > the guest. So if the guest gets a poison, and then we live migrate it, So if I were to design this, I'd do it this way: 0. guest gets hw poison injected 1. it runs memory_failure() and it kills the processes using the page. 2. page is marked poisoned on the host so no other guest gets it. That's it. No second accesses whatsoever. At least this is how it works on baremetal. This hw poisoning emulation is just silly and unnecessary. But again, I probably am missing some aspects. It all just sounded really weird to me that's why I thought I should ask what's behind all that. Thx. -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette
Re: [PATCH v2 1/1] arch/fault: don't print logs for pte marker poison errors
On Wed, May 15, 2024 at 11:33 AM Borislav Petkov wrote: > > On Wed, May 15, 2024 at 10:33:03AM -0700, Axel Rasmussen wrote: > > Right, the goal is to still have the process get a SIGBUS, but to > > avoid the "MCE error" log message. The basic issue is, unprivileged > > users can set these markers up, and thereby completely spam up the > > log. > > What is the real attack scenario you want to protect against? > > Or is this something hypothetical? An unprivileged process can allocate a VMA, use the userfaultfd API to install one of these PTE markers, and then register a no-op SIGBUS handler. Now it can access that address in a tight loop, generating a huge number of kernel log messages. This can e.g. bog down the system, or at least drown out other important log messages. For example the userfaultfd selftest does something similar to this to test that the API works properly. :) Even in a non-contrived / non-malicious case, use of this API could have similar effects. If nothing else, the log message can be confusing to administrators: they state that an MCE occurred, whereas with the simulated poison API, this is not the case; it isn't a "real" MCE / hardware error. > > > That said, one thing I'm not sure about is whether or not > > VM_FAULT_SIGBUS is a viable alternative (returned for a new PTE marker > > type specific to simulated poison). The goal of the simulated poison > > feature is to "closely simulate" a real hardware poison event. If you > > live migrate a VM from a host with real poisoned memory, to a new > > host: you'd want to keep the same behavior if the guest accessed those > > addresses again, so as not to confuse the guest about why it suddenly > > became "un-poisoned". > > Well, the recovery action is to poison the page and the process should > be resilient enough and allocate a new, clean page which doesn't trigger > hw poison hopefully, if possible. > > It doesn't make a whole lotta sense if poison "remains". Hardware poison > you don't want to touch a second time either - otherwise you might > consume that poison and die. In the KVM use case, the host can't just allocate a new page, because it doesn't know what the guest might have had stored there. Best we can do is propagate the poison into the guest, and let the guest OS deal with it as it sees fit, and mark the page poisoned on the host. I don't disagree the guest *shouldn't* reaccess it in this case. :) But if it did, it should get another poison event just as you say. And, live migration between physical hosts should be transparent to the guest. So if the guest gets a poison, and then we live migrate it, and then it accesses that address again, it should likewise get another poison event, just as before. Even though the underlying physical memory is *not* poisoned on the new host machine. So the use case is, after live migration, we install one of these PTE markers to simulate a poison event in case the address is accessed again. But since it isn't *really* a hardware error on the new host, no reason to spam the host kernel log when / if this occurs. > > -- > Regards/Gruss, > Boris. > > https://people.kernel.org/tglx/notes-about-netiquette
Re: [PATCH v2 1/1] arch/fault: don't print logs for pte marker poison errors
On Wed, May 15, 2024 at 10:33:03AM -0700, Axel Rasmussen wrote: > Right, the goal is to still have the process get a SIGBUS, but to > avoid the "MCE error" log message. The basic issue is, unprivileged > users can set these markers up, and thereby completely spam up the > log. What is the real attack scenario you want to protect against? Or is this something hypothetical? > That said, one thing I'm not sure about is whether or not > VM_FAULT_SIGBUS is a viable alternative (returned for a new PTE marker > type specific to simulated poison). The goal of the simulated poison > feature is to "closely simulate" a real hardware poison event. If you > live migrate a VM from a host with real poisoned memory, to a new > host: you'd want to keep the same behavior if the guest accessed those > addresses again, so as not to confuse the guest about why it suddenly > became "un-poisoned". Well, the recovery action is to poison the page and the process should be resilient enough and allocate a new, clean page which doesn't trigger hw poison hopefully, if possible. It doesn't make a whole lotta sense if poison "remains". Hardware poison you don't want to touch a second time either - otherwise you might consume that poison and die. -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette
Re: [PATCH v2 1/1] arch/fault: don't print logs for pte marker poison errors
On Wed, May 15, 2024 at 3:54 AM Oscar Salvador wrote: > > On Wed, May 15, 2024 at 12:41:42PM +0200, Borislav Petkov wrote: > > On Fri, May 10, 2024 at 11:29:26AM -0700, Axel Rasmussen wrote: > > > @@ -3938,7 +3938,7 @@ static vm_fault_t handle_pte_marker(struct vm_fault > > > *vmf) > > > > > > /* Higher priority than uffd-wp when data corrupted */ > > > if (marker & PTE_MARKER_POISONED) > > > - return VM_FAULT_HWPOISON; > > > + return VM_FAULT_HWPOISON | VM_FAULT_HWPOISON_SILENT; > > > > If you know here that this poisoning should be silent, why do you have > > to make it all complicated and propagate it into arch code, waste > > a separate VM_FAULT flag just for that instead of simply returning here > > a VM_FAULT_COMPLETED or some other innocuous value which would stop > > processing the fault? > > AFAIK, He only wants it to be silent wrt. the arch fault handler not > screaming, > but he still wants to be able to trigger force_sig_mceerr(). Right, the goal is to still have the process get a SIGBUS, but to avoid the "MCE error" log message. The basic issue is, unprivileged users can set these markers up, and thereby completely spam up the log. Also since this is a process-specific thing, and it's not a real hardware poison event, it's unclear system admins care at all at a global level (this is why we didn't want to switch to just printk_ratelimited for example). Better to let the process handle the SIGBUS however it likes for its use case (logging a message elsewhere, etc.). That said, one thing I'm not sure about is whether or not VM_FAULT_SIGBUS is a viable alternative (returned for a new PTE marker type specific to simulated poison). The goal of the simulated poison feature is to "closely simulate" a real hardware poison event. If you live migrate a VM from a host with real poisoned memory, to a new host: you'd want to keep the same behavior if the guest accessed those addresses again, so as not to confuse the guest about why it suddenly became "un-poisoned". At a basic level I think VM_FAULT_SIGBUS gives us what we want (send SIGBUS to the process, don't log about MCEs), but I'm not confident I know all the differences vs. VM_FAULT_HWPOISON on all the arches. > > > -- > Oscar Salvador > SUSE Labs
Re: [PATCH 2/3] crypto: X25519 core functions for ppc64le
Thanks for the info. I should be able to do it. I was hoping an assembly guru like you can show me some tricks here if there is :) No tricks in cswap, it's as straightforward as it gets, so go ahead :-)
Re: [PATCH v15 00/16] Add audio support in v4l2 framework
On 5/9/24 06:13, Jaroslav Kysela wrote: > On 09. 05. 24 12:44, Shengjiu Wang wrote: mem2mem is just like the decoder in the compress pipeline. which is one of the components in the pipeline. >>> >>> I was thinking of loopback with endpoints using compress streams, >>> without physical endpoint, something like: >>> >>> compress playback (to feed data from userspace) -> DSP (processing) -> >>> compress capture (send data back to userspace) >>> >>> Unless I'm missing something, you should be able to process data as fast >>> as you can feed it and consume it in such case. >>> >> >> Actually in the beginning I tried this, but it did not work well. >> ALSA needs time control for playback and capture, playback and capture >> needs to synchronize. Usually the playback and capture pipeline is >> independent in ALSA design, but in this case, the playback and capture >> should synchronize, they are not independent. > > The core compress API core no strict timing constraints. You can > eventually0 have two half-duplex compress devices, if you like to have > really independent mechanism. If something is missing in API, you can > extend this API (like to inform the user space that it's a > producer/consumer processing without any relation to the real time). I > like this idea. The compress API was never intended to be used this way. It was meant to send compressed data to a DSP for rendering, and keep the host processor in a low-power state while the DSP local buffer was drained. There was no intent to do a loop back to the host, because that keeps the host in a high-power state and probably negates the power savings due to a DSP. The other problem with the loopback is that the compress stuff is usually a "Front-End" in ASoC/DPCM parlance, and we don't have a good way to do a loopback between Front-Ends. The entire framework is based on FEs being connected to BEs. One problem that I can see for ASRC is that it's not clear when the data will be completely processed on the "capture" stream when you stop the "playback" stream. There's a non-zero risk of having a truncated output or waiting for data that will never be generated. In other words, it might be possible to reuse/extend the compress API for a 'coprocessor' approach without any rendering to traditional interfaces, but it's uncharted territory.
[PATCHv4 3/9] ASoC: fsl-asoc-card: add compatibility to use 2 codecs in dai-links
Adapt the driver to work with configurations using two codecs or more. Modify fsl_asoc_card_probe() to handle use cases where 2 codecs are given in the device tree. This will be needed for the generic codec case. Use cases using one codec will ignore any given codecs other than the first. Signed-off-by: Elinor Montmasson Co-authored-by: Philip-Dylan Gleonec --- sound/soc/fsl/fsl-asoc-card.c | 239 -- 1 file changed, 139 insertions(+), 100 deletions(-) diff --git a/sound/soc/fsl/fsl-asoc-card.c b/sound/soc/fsl/fsl-asoc-card.c index c83492e7cec2..620a25eb068a 100644 --- a/sound/soc/fsl/fsl-asoc-card.c +++ b/sound/soc/fsl/fsl-asoc-card.c @@ -99,7 +99,7 @@ struct fsl_asoc_card_priv { struct simple_util_jack hp_jack; struct simple_util_jack mic_jack; struct platform_device *pdev; - struct codec_priv codec_priv; + struct codec_priv codec_priv[2]; struct cpu_priv cpu_priv; struct snd_soc_card card; u8 streams; @@ -172,11 +172,13 @@ static int fsl_asoc_card_hw_params(struct snd_pcm_substream *substream, struct snd_soc_pcm_runtime *rtd = snd_soc_substream_to_rtd(substream); struct fsl_asoc_card_priv *priv = snd_soc_card_get_drvdata(rtd->card); bool tx = substream->stream == SNDRV_PCM_STREAM_PLAYBACK; - struct codec_priv *codec_priv = >codec_priv; + struct codec_priv *codec_priv; + struct snd_soc_dai *codec_dai; struct cpu_priv *cpu_priv = >cpu_priv; struct device *dev = rtd->card->dev; unsigned int pll_out; int ret; + int i; priv->sample_rate = params_rate(params); priv->sample_format = params_format(params); @@ -208,28 +210,32 @@ static int fsl_asoc_card_hw_params(struct snd_pcm_substream *substream, } /* Specific configuration for PLL */ - if (codec_priv->pll_id >= 0 && codec_priv->fll_id >= 0) { - if (priv->sample_format == SNDRV_PCM_FORMAT_S24_LE) - pll_out = priv->sample_rate * 384; - else - pll_out = priv->sample_rate * 256; + for_each_rtd_codec_dais(rtd, i, codec_dai) { + codec_priv = >codec_priv[i]; - ret = snd_soc_dai_set_pll(snd_soc_rtd_to_codec(rtd, 0), - codec_priv->pll_id, - codec_priv->mclk_id, - codec_priv->mclk_freq, pll_out); - if (ret) { - dev_err(dev, "failed to start FLL: %d\n", ret); - goto fail; - } + if (codec_priv->pll_id >= 0 && codec_priv->fll_id >= 0) { + if (priv->sample_format == SNDRV_PCM_FORMAT_S24_LE) + pll_out = priv->sample_rate * 384; + else + pll_out = priv->sample_rate * 256; - ret = snd_soc_dai_set_sysclk(snd_soc_rtd_to_codec(rtd, 0), -codec_priv->fll_id, -pll_out, SND_SOC_CLOCK_IN); + ret = snd_soc_dai_set_pll(codec_dai, + codec_priv->pll_id, + codec_priv->mclk_id, + codec_priv->mclk_freq, pll_out); + if (ret) { + dev_err(dev, "failed to start FLL: %d\n", ret); + goto fail; + } - if (ret && ret != -ENOTSUPP) { - dev_err(dev, "failed to set SYSCLK: %d\n", ret); - goto fail; + ret = snd_soc_dai_set_sysclk(codec_dai, + codec_priv->fll_id, + pll_out, SND_SOC_CLOCK_IN); + + if (ret && ret != -ENOTSUPP) { + dev_err(dev, "failed to set SYSCLK: %d\n", ret); + goto fail; + } } } @@ -244,28 +250,34 @@ static int fsl_asoc_card_hw_free(struct snd_pcm_substream *substream) { struct snd_soc_pcm_runtime *rtd = snd_soc_substream_to_rtd(substream); struct fsl_asoc_card_priv *priv = snd_soc_card_get_drvdata(rtd->card); - struct codec_priv *codec_priv = >codec_priv; + struct codec_priv *codec_priv; + struct snd_soc_dai *codec_dai; struct device *dev = rtd->card->dev; int ret; + int i; priv->streams &= ~BIT(substream->stream); - if (!priv->streams && codec_priv->pll_id >= 0 && codec_priv->fll_id >= 0) { - /* Force freq to be free_freq to avoid error message in codec */ - ret =
[PATCHv4 9/9] ASoC: dt-bindings: fsl-asoc-card: add compatible for generic codec
Add documentation about new dts bindings following new support for compatible "fsl,imx-audio-generic". Some CPU DAI don't require a real audio codec. The new compatible "fsl,imx-audio-generic" allows using the driver with codec drivers SPDIF DIT and SPDIF DIR as dummy codecs. It also allows using not pre-configured audio codecs which don't require specific control through a codec driver. The new dts properties give the possibility to set some parameters about the CPU DAI usually set through the codec configuration. Signed-off-by: Elinor Montmasson --- .../bindings/sound/fsl-asoc-card.yaml | 96 ++- 1 file changed, 92 insertions(+), 4 deletions(-) diff --git a/Documentation/devicetree/bindings/sound/fsl-asoc-card.yaml b/Documentation/devicetree/bindings/sound/fsl-asoc-card.yaml index 9922664d5ccc..332d8bf96e06 100644 --- a/Documentation/devicetree/bindings/sound/fsl-asoc-card.yaml +++ b/Documentation/devicetree/bindings/sound/fsl-asoc-card.yaml @@ -23,6 +23,16 @@ description: and PCM DAI formats. However, it'll be also possible to support those non AC'97/I2S/PCM type sound cards, such as S/PDIF audio and HDMI audio, as long as the driver has been properly upgraded. + To use CPU DAIs that do not require a codec such as an S/PDIF controller, + or to use a DAI to output or capture raw I2S/TDM data, you can + use the compatible "fsl,imx-audio-generic". + +definitions: + imx-audio-generic-dependency: +properties: + compatible: +contains: + const: fsl,imx-audio-generic maintainers: - Shengjiu Wang @@ -81,6 +91,7 @@ properties: - fsl,imx-audio-wm8960 - fsl,imx-audio-wm8962 - fsl,imx-audio-wm8958 + - fsl,imx-audio-generic model: $ref: /schemas/types.yaml#/definitions/string @@ -93,8 +104,14 @@ properties: need to add ASRC support via DPCM. audio-codec: -$ref: /schemas/types.yaml#/definitions/phandle -description: The phandle of an audio codec +$ref: /schemas/types.yaml#/definitions/phandle-array +description: | + The phandle of an audio codec. + If using the "fsl,imx-audio-generic" compatible, give instead a pair of + phandles with the spdif_transmitter first (driver SPDIF DIT) and the + spdif_receiver second (driver SPDIF DIR). +items: + maxItems: 1 audio-cpu: $ref: /schemas/types.yaml#/definitions/phandle @@ -150,8 +167,8 @@ properties: description: dai-link uses bit clock inversion. mclk-id: -$ref: /schemas/types.yaml#/definitions/uint32 -description: main clock id, specific for each card configuration. +$ref: /schemas/types.yaml#/definitions/uint32-array +description: Main clock id for each codec, specific for each card configuration. mux-int-port: $ref: /schemas/types.yaml#/definitions/uint32 @@ -167,10 +184,68 @@ properties: $ref: /schemas/types.yaml#/definitions/phandle description: The phandle of an CPU DAI controller + # Properties relevant only with "fsl,imx-audio-generic" compatible + dai-tdm-slot-width: +description: See tdm-slot.txt. +$ref: /schemas/types.yaml#/definitions/uint32 + + dai-tdm-slot-num: +description: See tdm-slot.txt. +$ref: /schemas/types.yaml#/definitions/uint32 + + clocks: +description: | + The CPU DAI system clock, used to retrieve + the CPU DAI system clock frequency with the generic codec. +maxItems: 1 + + clock-names: +items: + - const: cpu_sysclk + + cpu-system-clock-direction-out: +description: | + Specifies cpu system clock direction as 'out' on initialization. + If not set, direction is 'in'. +$ref: /schemas/types.yaml#/definitions/flag + +dependencies: + dai-tdm-slot-width: +$ref: "#/definitions/imx-audio-generic-dependency" + dai-tdm-slot-num: +$ref: "#/definitions/imx-audio-generic-dependency" + clocks: +$ref: "#/definitions/imx-audio-generic-dependency" + cpu-system-clock-direction-out: +$ref: "#/definitions/imx-audio-generic-dependency" + required: - compatible - model +allOf: + - if: + $ref: "#/definitions/imx-audio-generic-dependency" +then: + properties: +audio-codec: + items: +- description: SPDIF DIT phandle +- description: SPDIF DIR phandle +mclk-id: + maxItems: 1 + items: +minItems: 1 +maxItems: 2 +else: + properties: +audio-codec: + maxItems: 1 +mclk-id: + maxItems: 1 + items: +maxItems: 1 + unevaluatedProperties: false examples: @@ -195,3 +270,16 @@ examples: "AIN2L", "Line In Jack", "AIN2R", "Line In Jack"; }; + + - | +#include +sound-spdif-asrc { + compatible = "fsl,imx-audio-generic"; + model = "spdif-asrc-audio"; + audio-cpu = <>; + audio-asrc = <>; + audio-codec = <>,
[PATCHv4 0/9] ASoC: fsl-asoc-card: compatibility integration of a generic codec use case for use with S/PDIF controller
Hello, This is the v4 of the series of patch aiming to make the machine driver "fsl-asoc-card" compatible with use cases where there is no real codec driver. It proposes to use the "spdif_receiver" and "spdif_transmitter" drivers instead of the dummy codec. This is a first step in using the S/PDIF controller with the ASRC. The five first patches add compatibility with the pair of codecs "spdif_receiver" and "spdif_transmitter" with a new compatible, "fsl,imx-audio-generic". Codec parameters are set with default values. Consequently, the driver is modified to work with multi-codec use cases. It can get 2 codecs phandles from the device tree, and the "fsl_asoc_card_priv" struct now has 2 "codec_priv" to store properties of both codecs. It is fixed to 2 codecs as only "fsl,imx-audio-generic" uses multiple codecs at the moment. However, the driver now uses "for_each_codecs" macros when possible to ease future implementations of multi-codec configurations. The three following patches add configuration options for the devicetree. They configure the CPU DAI when using "fsl,imx-audio-generic". These options are usually hard-coded in "fsl-asoc-card.c" for each audio codec. Because the generic codec could be used with other CPU DAIs than the S/PDIF controller, setting these parameters could be required. These new options try to follow the style of the simple-card driver: * standard TDM properties are used, as defined in "tdm-slot.txt". * the CPU DAI system-clock can be specified, allowing the codec to retrieve its frequency. * the CPU DAI system-clock direction can be specified through a new binding, the same way it is done in simple-card. The last commit updates the DT bindings documentation and add a new example for the generic codec use case. This series of patch was successfully built for arm64 and x86 on top of the latest??"for-next" branch of the ASoC git tree on the 14th of May 2024. These modifications have also been tested on an i.MX8MN evaluation board, with a linux kernel RT v6.1.26-rt8. If you have any question or remark about these commits, don't hesitate to reply. Best regards, Elinor Montmasson Changelog: v3 -> v4: * Use the standard TDM bidings, as defined in "tdm-slot.txt", for the new optional DT bindings setting the TDM slots number and width. * Use the clock DT bindings to optionally specify the CPU DAI system clock frequency, instead of a dedicated new binding. * Rename the new DT binding "cpu-sysclk-dir-out" to "cpu-system-clock-direction-out" to better follow the style of the simple-card driver. * Merge TX an RX bindings for CPU DAI system-clock, to better follow the style of the simple-card driver, and also as there was no use case in fsl-asoc-card where TX and RX settings had to be different. * Add the documentation for the new bindings in the new DT schema bindings documentation. Also add an example with the generic codec. * v3 patch series at : https://lore.kernel.org/alsa-devel/20231218124058.2047167-1-elinor.montmas...@savoirfairelinux.com/ v2 -> v3: * When the bitmaster or framemaster are retrieved from the device tree, the driver will now compare them with the two codecs possibly given in device tree, and not just the first codec. * Improve driver modifications to use multiple codecs for better integration of future multi-codec use cases: * Use "for_each_codec" macros when possible. * "fsl_asoc_card_priv" struct now has 2 "codec_priv" as the driver can currently retrieve 2 codec phandles from the device tree. * Fix subject of patch 10/10 to follow the style of the subsystem * v2 patch series at: https://lore.kernel.org/alsa-devel/20231027144734.3654829-1-elinor.montmas...@savoirfairelinux.com/ v1 -> v2: * Replace use of the dummy codec by the pair of codecs spdif_receiver / spdif_transmitter. * Adapt how dai links codecs are used to take into account the possibility for multiple codecs per link. * Change compatible name. * Adapt driver to be able to register two codecs given in the device tree. * v1 patch series at: https://lore.kernel.org/alsa-devel/20230901144550.520072-1-elinor.montmas...@savoirfairelinux.com/ Elinor Montmasson (9): ASoC: fsl-asoc-card: add support for dai links with multiple codecs ASoC: fsl-asoc-card: add second dai link component for codecs ASoC: fsl-asoc-card: add compatibility to use 2 codecs in dai-links ASoC: fsl-asoc-card: add new compatible for a generic codec use case ASoC: fsl-asoc-card: set generic codec as clock provider ASoC: fsl-asoc-card: add use of devicetree TDM slot properties ASoC: fsl-asoc-card: add DT clock "cpu_sysclk" with generic codec ASoC: fsl-asoc-card: add DT property "cpu-system-clock-direction-out" ASoC: dt-bindings: fsl-asoc-card: add compatible for generic codec .../bindings/sound/fsl-asoc-card.yaml | 96 +- sound/soc/fsl/fsl-asoc-card.c | 306 +++--- 2 files changed, 287 insertions(+), 115 deletions(-) -- 2.34.1
[PATCHv4 4/9] ASoC: fsl-asoc-card: add new compatible for a generic codec use case
Add the new compatible "fsl,imx-audio-generic" for a generic codec use case. It allows using the fsl-asoc-card driver with the spdif_receiver and spdif_transmitter codec drivers used as dummy codecs. It can be used for cases where there is no real codec or codecs which do not require declaring controls. Signed-off-by: Elinor Montmasson Co-authored-by: Philip-Dylan Gleonec --- sound/soc/fsl/fsl-asoc-card.c | 29 ++--- 1 file changed, 26 insertions(+), 3 deletions(-) diff --git a/sound/soc/fsl/fsl-asoc-card.c b/sound/soc/fsl/fsl-asoc-card.c index 620a25eb068a..a4ecc9093558 100644 --- a/sound/soc/fsl/fsl-asoc-card.c +++ b/sound/soc/fsl/fsl-asoc-card.c @@ -567,6 +567,7 @@ static int fsl_asoc_card_probe(struct platform_device *pdev) struct platform_device *cpu_pdev; struct fsl_asoc_card_priv *priv; struct device *codec_dev[2] = { NULL, NULL }; + const char *generic_codec_dai_names[2]; const char *codec_dai_name; const char *codec_dev_name[2]; u32 asrc_fmt = 0; @@ -744,6 +745,11 @@ static int fsl_asoc_card_probe(struct platform_device *pdev) priv->codec_priv[0].fll_id = WM8904_CLK_FLL; priv->codec_priv[0].pll_id = WM8904_FLL_MCLK; priv->dai_fmt |= SND_SOC_DAIFMT_CBP_CFP; + } else if (of_device_is_compatible(np, "fsl,imx-audio-generic")) { + generic_codec_dai_names[0] = "dit-hifi"; + generic_codec_dai_names[1] = "dir-hifi"; + priv->dai_link[0].num_codecs = 2; + priv->dai_link[2].num_codecs = 2; } else { dev_err(>dev, "unknown Device Tree compatible\n"); ret = -EINVAL; @@ -798,6 +804,12 @@ static int fsl_asoc_card_probe(struct platform_device *pdev) ret = -EPROBE_DEFER; goto asrc_fail; } + if (of_device_is_compatible(np, "fsl,imx-audio-generic") + && !codec_dev[1]) { + dev_dbg(>dev, "failed to find second codec device\n"); + ret = -EPROBE_DEFER; + goto asrc_fail; + } /* Common settings for corresponding Freescale CPU DAI driver */ if (of_node_name_eq(cpu_np, "ssi")) { @@ -855,11 +867,21 @@ static int fsl_asoc_card_probe(struct platform_device *pdev) /* Normal DAI Link */ priv->dai_link[0].cpus->of_node = cpu_np; - priv->dai_link[0].codecs[0].dai_name = codec_dai_name; - if (!fsl_asoc_card_is_ac97(priv)) + if (of_device_is_compatible(np, "fsl,imx-audio-generic")) { + priv->dai_link[0].codecs[0].dai_name = + generic_codec_dai_names[0]; + priv->dai_link[0].codecs[1].dai_name = + generic_codec_dai_names[1]; + } else { + priv->dai_link[0].codecs[0].dai_name = codec_dai_name; + } + + if (!fsl_asoc_card_is_ac97(priv)) { priv->dai_link[0].codecs[0].of_node = codec_np[0]; - else { + if (of_device_is_compatible(np, "fsl,imx-audio-generic")) + priv->dai_link[0].codecs[1].of_node = codec_np[1]; + } else { u32 idx; ret = of_property_read_u32(cpu_np, "cell-index", ); @@ -990,6 +1012,7 @@ static const struct of_device_id fsl_asoc_card_dt_ids[] = { { .compatible = "fsl,imx-audio-wm8958", }, { .compatible = "fsl,imx-audio-nau8822", }, { .compatible = "fsl,imx-audio-wm8904", }, + { .compatible = "fsl,imx-audio-generic", }, {} }; MODULE_DEVICE_TABLE(of, fsl_asoc_card_dt_ids); -- 2.34.1
[PATCHv4 5/9] ASoC: fsl-asoc-card: set generic codec as clock provider
The default dai format defined by DAI_FMT_BASE doesn't set if the codec is consumer or provider of the bit and frame clocks. S/PDIF DIR usually converts audio signal to an asynchronous I2S/PCM stream, and doesn't consume a bit or frame clock. As S/PDIF DIR and DIT are used as codecs for the generic use case, set codecs as provider of both bit and frame clocks by default. Signed-off-by: Elinor Montmasson Co-authored-by: Philip-Dylan Gleonec --- sound/soc/fsl/fsl-asoc-card.c | 1 + 1 file changed, 1 insertion(+) diff --git a/sound/soc/fsl/fsl-asoc-card.c b/sound/soc/fsl/fsl-asoc-card.c index a4ecc9093558..82ed7f4e81a1 100644 --- a/sound/soc/fsl/fsl-asoc-card.c +++ b/sound/soc/fsl/fsl-asoc-card.c @@ -750,6 +750,7 @@ static int fsl_asoc_card_probe(struct platform_device *pdev) generic_codec_dai_names[1] = "dir-hifi"; priv->dai_link[0].num_codecs = 2; priv->dai_link[2].num_codecs = 2; + priv->dai_fmt |= SND_SOC_DAIFMT_CBP_CFP; } else { dev_err(>dev, "unknown Device Tree compatible\n"); ret = -EINVAL; -- 2.34.1
[PATCHv4 1/9] ASoC: fsl-asoc-card: add support for dai links with multiple codecs
Add support for dai links using multiple codecs for multi-codec use cases. Signed-off-by: Elinor Montmasson Co-authored-by: Philip-Dylan Gleonec --- sound/soc/fsl/fsl-asoc-card.c | 22 ++ 1 file changed, 14 insertions(+), 8 deletions(-) diff --git a/sound/soc/fsl/fsl-asoc-card.c b/sound/soc/fsl/fsl-asoc-card.c index 5ddc0c2fe53f..8a2a6e5461dc 100644 --- a/sound/soc/fsl/fsl-asoc-card.c +++ b/sound/soc/fsl/fsl-asoc-card.c @@ -815,10 +815,10 @@ static int fsl_asoc_card_probe(struct platform_device *pdev) /* Normal DAI Link */ priv->dai_link[0].cpus->of_node = cpu_np; - priv->dai_link[0].codecs->dai_name = codec_dai_name; + priv->dai_link[0].codecs[0].dai_name = codec_dai_name; if (!fsl_asoc_card_is_ac97(priv)) - priv->dai_link[0].codecs->of_node = codec_np; + priv->dai_link[0].codecs[0].of_node = codec_np; else { u32 idx; @@ -829,11 +829,11 @@ static int fsl_asoc_card_probe(struct platform_device *pdev) goto asrc_fail; } - priv->dai_link[0].codecs->name = + priv->dai_link[0].codecs[0].name = devm_kasprintf(>dev, GFP_KERNEL, "ac97-codec.%u", (unsigned int)idx); - if (!priv->dai_link[0].codecs->name) { + if (!priv->dai_link[0].codecs[0].name) { ret = -ENOMEM; goto asrc_fail; } @@ -844,13 +844,19 @@ static int fsl_asoc_card_probe(struct platform_device *pdev) priv->card.num_links = 1; if (asrc_pdev) { + int i; + struct snd_soc_dai_link_component *codec; + struct snd_soc_dai_link *link; + /* DPCM DAI Links only if ASRC exists */ priv->dai_link[1].cpus->of_node = asrc_np; priv->dai_link[1].platforms->of_node = asrc_np; - priv->dai_link[2].codecs->dai_name = codec_dai_name; - priv->dai_link[2].codecs->of_node = codec_np; - priv->dai_link[2].codecs->name = - priv->dai_link[0].codecs->name; + link = &(priv->dai_link[2]); + for_each_link_codecs(link, i, codec) { + codec->dai_name = priv->dai_link[0].codecs[i].dai_name; + codec->of_node = priv->dai_link[0].codecs[i].of_node; + codec->name = priv->dai_link[0].codecs[i].name; + } priv->dai_link[2].cpus->of_node = cpu_np; priv->dai_link[2].dai_fmt = priv->dai_fmt; priv->card.num_links = 3; -- 2.34.1
[PATCHv4 7/9] ASoC: fsl-asoc-card: add DT clock "cpu_sysclk" with generic codec
Add an optional DT clock "cpu_sysclk" to get the CPU DAI system-clock frequency when using the generic codec. It is set for both Tx and Rx. The way the frequency value is used is up to the CPU DAI driver implementation. Signed-off-by: Elinor Montmasson --- sound/soc/fsl/fsl-asoc-card.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/sound/soc/fsl/fsl-asoc-card.c b/sound/soc/fsl/fsl-asoc-card.c index 9aca8ad15372..c7fc9c16f761 100644 --- a/sound/soc/fsl/fsl-asoc-card.c +++ b/sound/soc/fsl/fsl-asoc-card.c @@ -754,6 +754,12 @@ static int fsl_asoc_card_probe(struct platform_device *pdev) snd_soc_of_parse_tdm_slot(np, NULL, NULL, >cpu_priv.slot_num, >cpu_priv.slot_width); + struct clk *cpu_sysclk = clk_get(>dev, "cpu_sysclk"); + if (!IS_ERR(cpu_sysclk)) { + priv->cpu_priv.sysclk_freq[TX] = clk_get_rate(cpu_sysclk); + priv->cpu_priv.sysclk_freq[RX] = priv->cpu_priv.sysclk_freq[TX]; + clk_put(cpu_sysclk); + } } else { dev_err(>dev, "unknown Device Tree compatible\n"); ret = -EINVAL; -- 2.34.1
[PATCHv4 6/9] ASoC: fsl-asoc-card: add use of devicetree TDM slot properties
Add use of optional TDM slot properties "dai-tdm-slot-num" and "dai-tdm-slot-width" through snd_soc_of_parse_tdm_slot(). They allow setting a custom TDM slot width in bits and number of slots for the CPU DAI when using the generic codec. Signed-off-by: Elinor Montmasson --- sound/soc/fsl/fsl-asoc-card.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/sound/soc/fsl/fsl-asoc-card.c b/sound/soc/fsl/fsl-asoc-card.c index 82ed7f4e81a1..9aca8ad15372 100644 --- a/sound/soc/fsl/fsl-asoc-card.c +++ b/sound/soc/fsl/fsl-asoc-card.c @@ -751,6 +751,9 @@ static int fsl_asoc_card_probe(struct platform_device *pdev) priv->dai_link[0].num_codecs = 2; priv->dai_link[2].num_codecs = 2; priv->dai_fmt |= SND_SOC_DAIFMT_CBP_CFP; + snd_soc_of_parse_tdm_slot(np, NULL, NULL, + >cpu_priv.slot_num, + >cpu_priv.slot_width); } else { dev_err(>dev, "unknown Device Tree compatible\n"); ret = -EINVAL; -- 2.34.1
[PATCHv4 2/9] ASoC: fsl-asoc-card: add second dai link component for codecs
Add a second dai link component for codecs that will be used for the generic codec use case. It will use spdif_receiver and spdif_transmitter drivers as dummy codec drivers, needing 2 codecs slots for the links. To prevent deferring in use cases using only one codec, also set by default the number of codecs to 1 for the relevant dai links. Signed-off-by: Elinor Montmasson Co-authored-by: Philip-Dylan Gleonec --- sound/soc/fsl/fsl-asoc-card.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/sound/soc/fsl/fsl-asoc-card.c b/sound/soc/fsl/fsl-asoc-card.c index 8a2a6e5461dc..c83492e7cec2 100644 --- a/sound/soc/fsl/fsl-asoc-card.c +++ b/sound/soc/fsl/fsl-asoc-card.c @@ -296,7 +296,7 @@ static int be_hw_params_fixup(struct snd_soc_pcm_runtime *rtd, SND_SOC_DAILINK_DEFS(hifi, DAILINK_COMP_ARRAY(COMP_EMPTY()), - DAILINK_COMP_ARRAY(COMP_EMPTY()), + DAILINK_COMP_ARRAY(COMP_EMPTY(), COMP_EMPTY()), DAILINK_COMP_ARRAY(COMP_EMPTY())); SND_SOC_DAILINK_DEFS(hifi_fe, @@ -306,7 +306,7 @@ SND_SOC_DAILINK_DEFS(hifi_fe, SND_SOC_DAILINK_DEFS(hifi_be, DAILINK_COMP_ARRAY(COMP_EMPTY()), - DAILINK_COMP_ARRAY(COMP_EMPTY())); + DAILINK_COMP_ARRAY(COMP_EMPTY(), COMP_EMPTY())); static const struct snd_soc_dai_link fsl_asoc_card_dai[] = { /* Default ASoC DAI Link*/ @@ -618,6 +618,8 @@ static int fsl_asoc_card_probe(struct platform_device *pdev) memcpy(priv->dai_link, fsl_asoc_card_dai, sizeof(struct snd_soc_dai_link) * ARRAY_SIZE(priv->dai_link)); + priv->dai_link[0].num_codecs = 1; + priv->dai_link[2].num_codecs = 1; priv->card.dapm_routes = audio_map; priv->card.num_dapm_routes = ARRAY_SIZE(audio_map); -- 2.34.1
[PATCHv4 8/9] ASoC: fsl-asoc-card: add DT property "cpu-system-clock-direction-out"
Add new optional DT property "cpu-system-clock-direction-out" to set sysclk direction as "out" for the CPU DAI when using the generic codec. It is set for both Tx and Rx. If not set, the direction is "in". The way the direction value is used is up to the CPU DAI driver implementation. Signed-off-by: Elinor Montmasson --- sound/soc/fsl/fsl-asoc-card.c | 4 1 file changed, 4 insertions(+) diff --git a/sound/soc/fsl/fsl-asoc-card.c b/sound/soc/fsl/fsl-asoc-card.c index c7fc9c16f761..f3fc2b29c92f 100644 --- a/sound/soc/fsl/fsl-asoc-card.c +++ b/sound/soc/fsl/fsl-asoc-card.c @@ -760,6 +760,10 @@ static int fsl_asoc_card_probe(struct platform_device *pdev) priv->cpu_priv.sysclk_freq[RX] = priv->cpu_priv.sysclk_freq[TX]; clk_put(cpu_sysclk); } + priv->cpu_priv.sysclk_dir[TX] = + of_property_read_bool(np, "cpu-system-clock-direction-out") ? + SND_SOC_CLOCK_OUT : SND_SOC_CLOCK_IN; + priv->cpu_priv.sysclk_dir[RX] = priv->cpu_priv.sysclk_dir[TX]; } else { dev_err(>dev, "unknown Device Tree compatible\n"); ret = -EINVAL; -- 2.34.1
Re: [PATCH 2/3] crypto: X25519 core functions for ppc64le
Hi Andy, Thanks for the info. I should be able to do it. I was hoping an assembly guru like you can show me some tricks here if there is :) Thanks. -Danny On 5/15/24 8:33 AM, Andy Polyakov wrote: +static void cswap(fe51 p, fe51 q, unsigned int bit) +{ + u64 t, i; + u64 c = 0 - (u64) bit; + + for (i = 0; i < 5; ++i) { + t = c & (p[i] ^ q[i]); + p[i] ^= t; + q[i] ^= t; + } +} The "c" in cswap stands for "constant-time," and the problem is that contemporary compilers have exhibited the ability to produce non-constant-time machine code as result of compilation of the above kind of technique. The outcome is platform-specific and ironically some of PPC code generators were observed to generate "most" non-constant-time code. "Most" in sense that execution time variations would be most easy to catch. Just to substantiate the point, consider https://godbolt.org/z/faYnEcPT7, and note the conditional branch in the middle of the loop, which flies in the face of constant-time-ness. In case you object 'bit &= 1' on line 7 in the C code. Indeed, if you comment it out, the generated code will be fine. But the point is that the compiler is capable of and was in fact observed to figure out that the caller passes either one or zero and generate the machine code in the assembly window. In other words 'bit &= 1' is just a reflection of what the caller does. ... the permanent solution is to do it in assembly. I can put together something... Though you should be able to do this just as well :-) So should I or would you? Cheers.
Re: [PATCH v15 00/16] Add audio support in v4l2 framework
On Wed, May 15, 2024 at 6:46 PM Jaroslav Kysela wrote: > > On 15. 05. 24 12:19, Takashi Iwai wrote: > > On Wed, 15 May 2024 11:50:52 +0200, > > Jaroslav Kysela wrote: > >> > >> On 15. 05. 24 11:17, Hans Verkuil wrote: > >>> Hi Jaroslav, > >>> > >>> On 5/13/24 13:56, Jaroslav Kysela wrote: > On 09. 05. 24 13:13, Jaroslav Kysela wrote: > > On 09. 05. 24 12:44, Shengjiu Wang wrote: > mem2mem is just like the decoder in the compress pipeline. which is > one of the components in the pipeline. > >>> > >>> I was thinking of loopback with endpoints using compress streams, > >>> without physical endpoint, something like: > >>> > >>> compress playback (to feed data from userspace) -> DSP (processing) -> > >>> compress capture (send data back to userspace) > >>> > >>> Unless I'm missing something, you should be able to process data as > >>> fast > >>> as you can feed it and consume it in such case. > >>> > >> > >> Actually in the beginning I tried this, but it did not work well. > >> ALSA needs time control for playback and capture, playback and capture > >> needs to synchronize. Usually the playback and capture pipeline is > >> independent in ALSA design, but in this case, the playback and capture > >> should synchronize, they are not independent. > > > > The core compress API core no strict timing constraints. You can > > eventually0 > > have two half-duplex compress devices, if you like to have really > > independent > > mechanism. If something is missing in API, you can extend this API > > (like to > > inform the user space that it's a producer/consumer processing without > > any > > relation to the real time). I like this idea. > > I was thinking more about this. If I am right, the mentioned use in > gstreamer > is supposed to run the conversion (DSP) job in "one shot" (can be handled > using one system call like blocking ioctl). The goal is just to offload > the > CPU work to the DSP (co-processor). If there are no requirements for the > queuing, we can implement this ioctl in the compress ALSA API easily > using the > data management through the dma-buf API. We can eventually define a new > direction (enum snd_compr_direction) like SND_COMPRESS_CONVERT or so to > allow > handle this new data scheme. The API may be extended later on real > demand, of > course. > > Otherwise all pieces are already in the current ALSA compress API > (capabilities, params, enumeration). The realtime controls may be created > using ALSA control API. > >>> > >>> So does this mean that Shengjiu should attempt to use this ALSA approach > >>> first? > >> > >> I've not seen any argument to use v4l2 mem2mem buffer scheme for this > >> data conversion forcefully. It looks like a simple job and ALSA APIs > >> may be extended for this simple purpose. > >> > >> Shengjiu, what are your requirements for gstreamer support? Would be a > >> new blocking ioctl enough for the initial support in the compress ALSA > >> API? > > > > If it works with compress API, it'd be great, yeah. > > So, your idea is to open compress-offload devices for read and write, > > then and let them convert a la batch jobs without timing control? > > > > For full-duplex usages, we might need some more extensions, so that > > both read and write parameters can be synchronized. (So far the > > compress stream is a unidirectional, and the runtime buffer for a > > single stream.) > > > > And the buffer management is based on the fixed size fragments. I > > hope this doesn't matter much for the intended operation? > > It's a question, if the standard I/O is really required for this case. My > quick idea was to just implement a new "direction" for this job supporting > only one ioctl for the data processing which will execute the job in "one > shot" at the moment. The I/O may be handled through dma-buf API (which seems > to be standard nowadays for this purpose and allows future chaining). > > So something like: > > struct dsp_job { > int source_fd; /* dma-buf FD with source data - for dma_buf_get() */ > int target_fd; /* dma-buf FD for target data - for dma_buf_get() */ > ... maybe some extra data size members here ... > ... maybe some special parameters here ... > }; > > #define SNDRV_COMPRESS_DSPJOB _IOWR('C', 0x60, struct dsp_job) > > This ioctl will be blocking (thus synced). My question is, if it's feasible > for gstreamer or not. For this particular case, if the rate conversion is > implemented in software, it will block the gstreamer data processing, too. > Thanks. I have several questions: 1. Compress API alway binds to a sound card. Can we avoid that? For ASRC, it is just one component, 2. Compress API doesn't seem to support mmap(). Is this a problem for sending and getting data
Re: [PATCH 2/3] crypto: X25519 core functions for ppc64le
+static void cswap(fe51 p, fe51 q, unsigned int bit) +{ + u64 t, i; + u64 c = 0 - (u64) bit; + + for (i = 0; i < 5; ++i) { + t = c & (p[i] ^ q[i]); + p[i] ^= t; + q[i] ^= t; + } +} The "c" in cswap stands for "constant-time," and the problem is that contemporary compilers have exhibited the ability to produce non-constant-time machine code as result of compilation of the above kind of technique. The outcome is platform-specific and ironically some of PPC code generators were observed to generate "most" non-constant-time code. "Most" in sense that execution time variations would be most easy to catch. Just to substantiate the point, consider https://godbolt.org/z/faYnEcPT7, and note the conditional branch in the middle of the loop, which flies in the face of constant-time-ness. In case you object 'bit &= 1' on line 7 in the C code. Indeed, if you comment it out, the generated code will be fine. But the point is that the compiler is capable of and was in fact observed to figure out that the caller passes either one or zero and generate the machine code in the assembly window. In other words 'bit &= 1' is just a reflection of what the caller does. ... the permanent solution is to do it in assembly. I can put together something... Though you should be able to do this just as well :-) So should I or would you? Cheers.
Re: [PATCH 2/3] crypto: X25519 core functions for ppc64le
Hi Andy, Points taken. And much appreciate for the help. Thanks. -Danny On 5/15/24 3:29 AM, Andy Polyakov wrote: Hi, +static void cswap(fe51 p, fe51 q, unsigned int bit) +{ + u64 t, i; + u64 c = 0 - (u64) bit; + + for (i = 0; i < 5; ++i) { + t = c & (p[i] ^ q[i]); + p[i] ^= t; + q[i] ^= t; + } +} The "c" in cswap stands for "constant-time," and the problem is that contemporary compilers have exhibited the ability to produce non-constant-time machine code as result of compilation of the above kind of technique. The outcome is platform-specific and ironically some of PPC code generators were observed to generate "most" non-constant-time code. "Most" in sense that execution time variations would be most easy to catch. One way to work around the problem, at least for the time being, is to add 'asm volatile("" : "+r"(c))' after you calculate 'c'. But there is no guarantee that the next compiler version won't see through it, hence the permanent solution is to do it in assembly. I can put together something... Cheers.
Re: [PATCH 1/3] crypto: X25519 low-level primitives for ppc64le.
See inline. On 5/15/24 4:06 AM, Andy Polyakov wrote: Hi, +SYM_FUNC_START(x25519_fe51_sqr_times) ... + +.Lsqr_times_loop: ... + + std 9,16(3) + std 10,24(3) + std 11,32(3) + std 7,0(3) + std 8,8(3) + bdnz .Lsqr_times_loop I see no reason for why the stores can't be moved outside the loop in question. Yeah. I'll fix it. +SYM_FUNC_START(x25519_fe51_frombytes) +.align 5 + + li 12, -1 + srdi 12, 12, 13 # 0x7 + + ld 5, 0(4) + ld 6, 8(4) + ld 7, 16(4) + ld 8, 24(4) Is there actual guarantee that the byte input is 64-bit aligned? While it is true that processor is obliged to handle misaligned loads and stores by the ISA specification, them being inefficient doesn't go against it. Most notably inefficiency is likely to be noted at the page boundaries. What I'm trying to say is that it would be more appropriate to avoid the unaligned loads (and stores). Good point. Maybe I can handle it with 64-bit aligned for the input. Thanks. Cheers.
Re: [PATCH 1/3] crypto: X25519 low-level primitives for ppc64le.
Thank you Andy. Will fix this. On 5/15/24 3:11 AM, Andy Polyakov wrote: Hi, Couple of remarks inline. +# [1] https://www.openssl.org/~appro/cryptogams/ https://github.com/dot-asm/cryptogams/ is arguably better reference. +SYM_FUNC_START(x25519_fe51_mul) +.align 5 The goal is to align the label, not the first instruction after the directive. It's not a problem in this spot, in the beginning of the module that is, but further below it's likely to inject redundant nops between the label and meaningful code. But since the directive in question is not position-sensitive one can resolve this by changing the order of the directive and the SYM_FUNC_START macro. Cheers.
Re: [PATCH 1/3] crypto: X25519 low-level primitives for ppc64le.
Hi, +SYM_FUNC_START(x25519_fe51_sqr_times) ... + +.Lsqr_times_loop: ... + + std 9,16(3) + std 10,24(3) + std 11,32(3) + std 7,0(3) + std 8,8(3) + bdnz.Lsqr_times_loop I see no reason for why the stores can't be moved outside the loop in question. +SYM_FUNC_START(x25519_fe51_frombytes) +.align 5 + + li 12, -1 + srdi12, 12, 13 # 0x7 + + ld 5, 0(4) + ld 6, 8(4) + ld 7, 16(4) + ld 8, 24(4) Is there actual guarantee that the byte input is 64-bit aligned? While it is true that processor is obliged to handle misaligned loads and stores by the ISA specification, them being inefficient doesn't go against it. Most notably inefficiency is likely to be noted at the page boundaries. What I'm trying to say is that it would be more appropriate to avoid the unaligned loads (and stores). Cheers.
Re: [PATCH 2/3] crypto: X25519 core functions for ppc64le
Hi, +static void cswap(fe51 p, fe51 q, unsigned int bit) +{ + u64 t, i; + u64 c = 0 - (u64) bit; + + for (i = 0; i < 5; ++i) { + t = c & (p[i] ^ q[i]); + p[i] ^= t; + q[i] ^= t; + } +} The "c" in cswap stands for "constant-time," and the problem is that contemporary compilers have exhibited the ability to produce non-constant-time machine code as result of compilation of the above kind of technique. The outcome is platform-specific and ironically some of PPC code generators were observed to generate "most" non-constant-time code. "Most" in sense that execution time variations would be most easy to catch. One way to work around the problem, at least for the time being, is to add 'asm volatile("" : "+r"(c))' after you calculate 'c'. But there is no guarantee that the next compiler version won't see through it, hence the permanent solution is to do it in assembly. I can put together something... Cheers.
Re: [PATCH 1/3] crypto: X25519 low-level primitives for ppc64le.
Hi, Couple of remarks inline. +# [1] https://www.openssl.org/~appro/cryptogams/ https://github.com/dot-asm/cryptogams/ is arguably better reference. +SYM_FUNC_START(x25519_fe51_mul) +.align 5 The goal is to align the label, not the first instruction after the directive. It's not a problem in this spot, in the beginning of the module that is, but further below it's likely to inject redundant nops between the label and meaningful code. But since the directive in question is not position-sensitive one can resolve this by changing the order of the directive and the SYM_FUNC_START macro. Cheers.
Re: [PATCH v2 1/1] arch/fault: don't print logs for pte marker poison errors
On Wed, May 15, 2024 at 12:41:42PM +0200, Borislav Petkov wrote: > On Fri, May 10, 2024 at 11:29:26AM -0700, Axel Rasmussen wrote: > > @@ -3938,7 +3938,7 @@ static vm_fault_t handle_pte_marker(struct vm_fault > > *vmf) > > > > /* Higher priority than uffd-wp when data corrupted */ > > if (marker & PTE_MARKER_POISONED) > > - return VM_FAULT_HWPOISON; > > + return VM_FAULT_HWPOISON | VM_FAULT_HWPOISON_SILENT; > > If you know here that this poisoning should be silent, why do you have > to make it all complicated and propagate it into arch code, waste > a separate VM_FAULT flag just for that instead of simply returning here > a VM_FAULT_COMPLETED or some other innocuous value which would stop > processing the fault? AFAIK, He only wants it to be silent wrt. the arch fault handler not screaming, but he still wants to be able to trigger force_sig_mceerr(). -- Oscar Salvador SUSE Labs
Re: [PATCH v2 1/1] arch/fault: don't print logs for pte marker poison errors
On Fri, May 10, 2024 at 11:29:26AM -0700, Axel Rasmussen wrote: > @@ -3938,7 +3938,7 @@ static vm_fault_t handle_pte_marker(struct vm_fault > *vmf) > > /* Higher priority than uffd-wp when data corrupted */ > if (marker & PTE_MARKER_POISONED) > - return VM_FAULT_HWPOISON; > + return VM_FAULT_HWPOISON | VM_FAULT_HWPOISON_SILENT; If you know here that this poisoning should be silent, why do you have to make it all complicated and propagate it into arch code, waste a separate VM_FAULT flag just for that instead of simply returning here a VM_FAULT_COMPLETED or some other innocuous value which would stop processing the fault? -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette
Re: [PATCH v15 00/16] Add audio support in v4l2 framework
On 15. 05. 24 12:19, Takashi Iwai wrote: On Wed, 15 May 2024 11:50:52 +0200, Jaroslav Kysela wrote: On 15. 05. 24 11:17, Hans Verkuil wrote: Hi Jaroslav, On 5/13/24 13:56, Jaroslav Kysela wrote: On 09. 05. 24 13:13, Jaroslav Kysela wrote: On 09. 05. 24 12:44, Shengjiu Wang wrote: mem2mem is just like the decoder in the compress pipeline. which is one of the components in the pipeline. I was thinking of loopback with endpoints using compress streams, without physical endpoint, something like: compress playback (to feed data from userspace) -> DSP (processing) -> compress capture (send data back to userspace) Unless I'm missing something, you should be able to process data as fast as you can feed it and consume it in such case. Actually in the beginning I tried this, but it did not work well. ALSA needs time control for playback and capture, playback and capture needs to synchronize. Usually the playback and capture pipeline is independent in ALSA design, but in this case, the playback and capture should synchronize, they are not independent. The core compress API core no strict timing constraints. You can eventually0 have two half-duplex compress devices, if you like to have really independent mechanism. If something is missing in API, you can extend this API (like to inform the user space that it's a producer/consumer processing without any relation to the real time). I like this idea. I was thinking more about this. If I am right, the mentioned use in gstreamer is supposed to run the conversion (DSP) job in "one shot" (can be handled using one system call like blocking ioctl). The goal is just to offload the CPU work to the DSP (co-processor). If there are no requirements for the queuing, we can implement this ioctl in the compress ALSA API easily using the data management through the dma-buf API. We can eventually define a new direction (enum snd_compr_direction) like SND_COMPRESS_CONVERT or so to allow handle this new data scheme. The API may be extended later on real demand, of course. Otherwise all pieces are already in the current ALSA compress API (capabilities, params, enumeration). The realtime controls may be created using ALSA control API. So does this mean that Shengjiu should attempt to use this ALSA approach first? I've not seen any argument to use v4l2 mem2mem buffer scheme for this data conversion forcefully. It looks like a simple job and ALSA APIs may be extended for this simple purpose. Shengjiu, what are your requirements for gstreamer support? Would be a new blocking ioctl enough for the initial support in the compress ALSA API? If it works with compress API, it'd be great, yeah. So, your idea is to open compress-offload devices for read and write, then and let them convert a la batch jobs without timing control? For full-duplex usages, we might need some more extensions, so that both read and write parameters can be synchronized. (So far the compress stream is a unidirectional, and the runtime buffer for a single stream.) And the buffer management is based on the fixed size fragments. I hope this doesn't matter much for the intended operation? It's a question, if the standard I/O is really required for this case. My quick idea was to just implement a new "direction" for this job supporting only one ioctl for the data processing which will execute the job in "one shot" at the moment. The I/O may be handled through dma-buf API (which seems to be standard nowadays for this purpose and allows future chaining). So something like: struct dsp_job { int source_fd; /* dma-buf FD with source data - for dma_buf_get() */ int target_fd; /* dma-buf FD for target data - for dma_buf_get() */ ... maybe some extra data size members here ... ... maybe some special parameters here ... }; #define SNDRV_COMPRESS_DSPJOB _IOWR('C', 0x60, struct dsp_job) This ioctl will be blocking (thus synced). My question is, if it's feasible for gstreamer or not. For this particular case, if the rate conversion is implemented in software, it will block the gstreamer data processing, too. Jaroslav -- Jaroslav Kysela Linux Sound Maintainer; ALSA Project; Red Hat, Inc.
Re: [PATCH v2 1/1] arch/fault: don't print logs for pte marker poison errors
On Tue, May 14, 2024 at 03:34:24PM -0600, Peter Xu wrote: > The question is whether we can't. > > Now we reserved a swp entry just for hwpoison and it makes sense only > because we cached the poisoned pfn inside. My long standing question is > why do we ever need that pfn after all. If we don't need the pfn, we > simply need a bit in the pgtable entry saying that it's poisoned, if > accessed we should kill the process using sigbus. > > I used to comment on this before, the only path that uses that pfn is > check_hwpoisoned_entry(), which was introduced in: > > commit a3f5d80ea401ac857f2910e28b15f35b2cf902f4 > Author: Naoya Horiguchi > Date: Mon Jun 28 19:43:14 2021 -0700 > > mm,hwpoison: send SIGBUS with error virutal address > > Now an action required MCE in already hwpoisoned address surely sends a > SIGBUS to current process, but the SIGBUS doesn't convey error virtual > address. That's not optimal for hwpoison-aware applications. > > To fix the issue, make memory_failure() call kill_accessing_process(), > that does pagetable walk to find the error virtual address. It could find > multiple virtual addresses for the same error page, and it seems hard to > tell which virtual address is correct one. But that's rare and sending > incorrect virtual address could be better than no address. So let's > report the first found virtual address for now. > > So this time I read more on this and Naoya explained why - it's only used > so far to dump the VA of the poisoned entry. Well, not really dumped, but we just pass the VA down the chain to the signal handler. But on the question whether we need the pfn encoded, I am not sure actually. check_hwpoisoned_entry() checks whether the pfn where the pte sits is the same as the hwpoisoned one, so we know if the process has to be killed. Now, could we get away with using pte_page() instead of pte_pfn() in there, and passing the hwpoisoned page instead ot the pfn? I am trying to think of the implications, then we should not need the encoded pfn? > However what confused me is, if an entry is poisoned already logically we > dump that message in the fault handler not memory_failure(), which is: > > MCE: Killing uffd-unit-tests:650 due to hardware memory corruption fault at > 7f3589d7e000 > > So perhaps we're trying to also dump that when the MCEs (points to the same > pfn) are only generated concurrently? I donno much on hwpoison so I cannot > tell, there's also implication where it's only triggered if > MF_ACTION_REQUIRED. But I think it means hwpoison may work without pfn > encoded, but I don't know the implication to lose that dmesg line. Not necesarily concurrently, but simply if for any reason the page could not have been unmapped. Let us say you have ProcessA and ProcessB mapping the same page. We get an MCE for that page but we could not unmapped it, at least not from all processes (maybe just ProcessA). memory-failure code will mark it as hwpoison, now ProcessA faults it in and gets killed because we see that the page is hwpoisoned in the fault path, so we sill send VM_FAULT_HWPOISON all the way down till you see the: "MCE: Killing uffd-unit-tests:650 due to hardware memory corruption fault at 7f3589d7e000" from e.g: arch/x86/mm/fault.c:do_sigbus() Now, ProcessB still has the page mapped, so upon re-accessing it, it will trigger a new MCE event. memory-failure code will see that this page has already been marked as hwpoisoned, but since we failed to unmap it (otherwise no one should be re-accessing it), it goes: "ok, let's just kill everybody who has this page mapped". > We used to not dump error for swapin error. Note that here what I am > saying is not that Axel is doing things wrong, but it's just that logically > swapin error (as pte marker) can also be with !QUIET, so my final point is > we may want to avoid having the assumption that "pte marker should always > be QUITE", because I want to make it clear that pte marker can used in any > form, so itself shouldn't imply anything.. I think it would make more sense if we have a separate marker for swapin errors? I mean, deep down, they do not mean the same as poison, right? Then you can choose which events get to be silent because you do not care, and which ones need to scream loud. I think swapin errors belong to the latter. At least I a heads-up why a process is getting killed is appreciated, IMHO. -- Oscar Salvador SUSE Labs
Re: [PATCH v15 00/16] Add audio support in v4l2 framework
On Wed, 15 May 2024 11:50:52 +0200, Jaroslav Kysela wrote: > > On 15. 05. 24 11:17, Hans Verkuil wrote: > > Hi Jaroslav, > > > > On 5/13/24 13:56, Jaroslav Kysela wrote: > >> On 09. 05. 24 13:13, Jaroslav Kysela wrote: > >>> On 09. 05. 24 12:44, Shengjiu Wang wrote: > >> mem2mem is just like the decoder in the compress pipeline. which is > >> one of the components in the pipeline. > > > > I was thinking of loopback with endpoints using compress streams, > > without physical endpoint, something like: > > > > compress playback (to feed data from userspace) -> DSP (processing) -> > > compress capture (send data back to userspace) > > > > Unless I'm missing something, you should be able to process data as fast > > as you can feed it and consume it in such case. > > > > Actually in the beginning I tried this, but it did not work well. > ALSA needs time control for playback and capture, playback and capture > needs to synchronize. Usually the playback and capture pipeline is > independent in ALSA design, but in this case, the playback and capture > should synchronize, they are not independent. > >>> > >>> The core compress API core no strict timing constraints. You can > >>> eventually0 > >>> have two half-duplex compress devices, if you like to have really > >>> independent > >>> mechanism. If something is missing in API, you can extend this API (like > >>> to > >>> inform the user space that it's a producer/consumer processing without any > >>> relation to the real time). I like this idea. > >> > >> I was thinking more about this. If I am right, the mentioned use in > >> gstreamer > >> is supposed to run the conversion (DSP) job in "one shot" (can be handled > >> using one system call like blocking ioctl). The goal is just to offload > >> the > >> CPU work to the DSP (co-processor). If there are no requirements for the > >> queuing, we can implement this ioctl in the compress ALSA API easily using > >> the > >> data management through the dma-buf API. We can eventually define a new > >> direction (enum snd_compr_direction) like SND_COMPRESS_CONVERT or so to > >> allow > >> handle this new data scheme. The API may be extended later on real demand, > >> of > >> course. > >> > >> Otherwise all pieces are already in the current ALSA compress API > >> (capabilities, params, enumeration). The realtime controls may be created > >> using ALSA control API. > > > > So does this mean that Shengjiu should attempt to use this ALSA approach > > first? > > I've not seen any argument to use v4l2 mem2mem buffer scheme for this > data conversion forcefully. It looks like a simple job and ALSA APIs > may be extended for this simple purpose. > > Shengjiu, what are your requirements for gstreamer support? Would be a > new blocking ioctl enough for the initial support in the compress ALSA > API? If it works with compress API, it'd be great, yeah. So, your idea is to open compress-offload devices for read and write, then and let them convert a la batch jobs without timing control? For full-duplex usages, we might need some more extensions, so that both read and write parameters can be synchronized. (So far the compress stream is a unidirectional, and the runtime buffer for a single stream.) And the buffer management is based on the fixed size fragments. I hope this doesn't matter much for the intended operation? thanks, Takashi
Re: [PATCH v15 00/16] Add audio support in v4l2 framework
On 15. 05. 24 11:17, Hans Verkuil wrote: Hi Jaroslav, On 5/13/24 13:56, Jaroslav Kysela wrote: On 09. 05. 24 13:13, Jaroslav Kysela wrote: On 09. 05. 24 12:44, Shengjiu Wang wrote: mem2mem is just like the decoder in the compress pipeline. which is one of the components in the pipeline. I was thinking of loopback with endpoints using compress streams, without physical endpoint, something like: compress playback (to feed data from userspace) -> DSP (processing) -> compress capture (send data back to userspace) Unless I'm missing something, you should be able to process data as fast as you can feed it and consume it in such case. Actually in the beginning I tried this, but it did not work well. ALSA needs time control for playback and capture, playback and capture needs to synchronize. Usually the playback and capture pipeline is independent in ALSA design, but in this case, the playback and capture should synchronize, they are not independent. The core compress API core no strict timing constraints. You can eventually0 have two half-duplex compress devices, if you like to have really independent mechanism. If something is missing in API, you can extend this API (like to inform the user space that it's a producer/consumer processing without any relation to the real time). I like this idea. I was thinking more about this. If I am right, the mentioned use in gstreamer is supposed to run the conversion (DSP) job in "one shot" (can be handled using one system call like blocking ioctl). The goal is just to offload the CPU work to the DSP (co-processor). If there are no requirements for the queuing, we can implement this ioctl in the compress ALSA API easily using the data management through the dma-buf API. We can eventually define a new direction (enum snd_compr_direction) like SND_COMPRESS_CONVERT or so to allow handle this new data scheme. The API may be extended later on real demand, of course. Otherwise all pieces are already in the current ALSA compress API (capabilities, params, enumeration). The realtime controls may be created using ALSA control API. So does this mean that Shengjiu should attempt to use this ALSA approach first? I've not seen any argument to use v4l2 mem2mem buffer scheme for this data conversion forcefully. It looks like a simple job and ALSA APIs may be extended for this simple purpose. Shengjiu, what are your requirements for gstreamer support? Would be a new blocking ioctl enough for the initial support in the compress ALSA API? Jaroslav -- Jaroslav Kysela Linux Sound Maintainer; ALSA Project; Red Hat, Inc.
Re: [PATCH v15 00/16] Add audio support in v4l2 framework
Hi Jaroslav, On 5/13/24 13:56, Jaroslav Kysela wrote: > On 09. 05. 24 13:13, Jaroslav Kysela wrote: >> On 09. 05. 24 12:44, Shengjiu Wang wrote: > mem2mem is just like the decoder in the compress pipeline. which is > one of the components in the pipeline. I was thinking of loopback with endpoints using compress streams, without physical endpoint, something like: compress playback (to feed data from userspace) -> DSP (processing) -> compress capture (send data back to userspace) Unless I'm missing something, you should be able to process data as fast as you can feed it and consume it in such case. >>> >>> Actually in the beginning I tried this, but it did not work well. >>> ALSA needs time control for playback and capture, playback and capture >>> needs to synchronize. Usually the playback and capture pipeline is >>> independent in ALSA design, but in this case, the playback and capture >>> should synchronize, they are not independent. >> >> The core compress API core no strict timing constraints. You can eventually0 >> have two half-duplex compress devices, if you like to have really independent >> mechanism. If something is missing in API, you can extend this API (like to >> inform the user space that it's a producer/consumer processing without any >> relation to the real time). I like this idea. > > I was thinking more about this. If I am right, the mentioned use in gstreamer > is supposed to run the conversion (DSP) job in "one shot" (can be handled > using one system call like blocking ioctl). The goal is just to offload the > CPU work to the DSP (co-processor). If there are no requirements for the > queuing, we can implement this ioctl in the compress ALSA API easily using > the > data management through the dma-buf API. We can eventually define a new > direction (enum snd_compr_direction) like SND_COMPRESS_CONVERT or so to allow > handle this new data scheme. The API may be extended later on real demand, of > course. > > Otherwise all pieces are already in the current ALSA compress API > (capabilities, params, enumeration). The realtime controls may be created > using ALSA control API. So does this mean that Shengjiu should attempt to use this ALSA approach first? If there is a way to do this reasonably cleanly in the ALSA API, then that obviously is much better from my perspective as a media maintainer. My understanding was always that it can't be done (or at least not without a major effort) in ALSA, and in that case V4L2 is a decent plan B, but based on this I gather that it is possible in ALSA after all. So can I shelf this patch series for now? Regards, Hans
linux-next: build warning after merge of the powerpc tree
Hi all, After merging the powerpc tree, today's (it may have been yesterday's) linux-next build (powerpc allyesconfig) produced this warning: WARNING: modpost: vmlinux: section mismatch in reference: fadump_setup_param_area+0x200 (section: .text.fadump_setup_param_area) -> memblock_phys_alloc_range (section: .init.text) Introduced by commit 683eab94da75 ("powerpc/fadump: setup additional parameters for dump capture kernel") -- Cheers, Stephen Rothwell pgpgHE8qJ2WgS.pgp Description: OpenPGP digital signature