Re: linux-next: build warning after merge of the powerpc tree

2024-05-15 Thread Stephen Rothwell
Hi Michael,

On Thu, 16 May 2024 12:42:01 +1000 Michael Ellerman  wrote:
>
> Stephen Rothwell  writes:
> >
> > After merging the powerpc tree, today's (it may have been yesterday's)
> > linux-next build (powerpc allyesconfig) produced this warning:
> >
> > WARNING: modpost: vmlinux: section mismatch in reference: 
> > fadump_setup_param_area+0x200 (section: .text.fadump_setup_param_area) -> 
> > memblock_phys_alloc_range (section: .init.text)  
> 
> I don't see the warning, but clearly it is possible if the compiler
> decides not to inline fadump_setup_param_area().
> 
> What compiler version are you using?

$ gcc --version
gcc (Debian 13.2.0-7) 13.2.0

(on zz1 if you want to test)
-- 
Cheers,
Stephen Rothwell


pgpxBGGLpHnkV.pgp
Description: OpenPGP digital signature


Re: [PATCH 1/3] crypto: X25519 low-level primitives for ppc64le.

2024-05-15 Thread Michael Ellerman
Hi Danny,

Danny Tsen  writes:
> Use the perl output of x25519-ppc64.pl from CRYPTOGAMs and added three
> supporting functions, x25519_fe51_sqr_times, x25519_fe51_frombytes
> and x25519_fe51_tobytes.

For other algorithms we have checked-in the perl script and generated
the code at runtime. Is there a reason you've done it differently this time?

> Signed-off-by: Danny Tsen 
> ---
>  arch/powerpc/crypto/curve25519-ppc64le_asm.S | 648 +++
>  1 file changed, 648 insertions(+)
>  create mode 100644 arch/powerpc/crypto/curve25519-ppc64le_asm.S
>
> diff --git a/arch/powerpc/crypto/curve25519-ppc64le_asm.S 
> b/arch/powerpc/crypto/curve25519-ppc64le_asm.S
> new file mode 100644
> index ..8a018104838a
> --- /dev/null
> +++ b/arch/powerpc/crypto/curve25519-ppc64le_asm.S
> @@ -0,0 +1,648 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> +#
> +# Copyright 2024- IBM Corp.  All Rights Reserved.
 
I'm not a lawyer, but AFAIK "All Rights Reserved" is not required and
can be confusing - because we are not reserving all rights, we are
granting some rights under the GPL.

I also think the IBM copyright should be down below where your
modifications are described.

> +# This code is taken from CRYPTOGAMs[1] and is included here using the option
> +# in the license to distribute the code under the GPL. Therefore this program
> +# is free software; you can redistribute it and/or modify it under the terms 
> of
> +# the GNU General Public License version 2 as published by the Free Software
> +# Foundation.
> +#
> +# [1] https://www.openssl.org/~appro/cryptogams/
> +
> +# Copyright (c) 2006-2017, CRYPTOGAMS by 
> +# All rights reserved.
> +#
> +# Redistribution and use in source and binary forms, with or without
> +# modification, are permitted provided that the following conditions
> +# are met:
> +#
> +#   * Redistributions of source code must retain copyright notices,
> +# this list of conditions and the following disclaimer.
> +#
> +#   * Redistributions in binary form must reproduce the above
> +# copyright notice, this list of conditions and the following
> +# disclaimer in the documentation and/or other materials
> +# provided with the distribution.
> +#
> +#   * Neither the name of the CRYPTOGAMS nor the names of its
> +# copyright holder and contributors may be used to endorse or
> +# promote products derived from this software without specific
> +# prior written permission.
> +#
> +# ALTERNATIVELY, provided that this notice is retained in full, this
> +# product may be distributed under the terms of the GNU General Public
> +# License (GPL), in which case the provisions of the GPL apply INSTEAD OF
> +# those given above.
> +#
> +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDER AND CONTRIBUTORS
> +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> +# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> +
> +# 
> +# Written by Andy Polyakov  for the OpenSSL
> +# project. The module is, however, dual licensed under OpenSSL and
> +# CRYPTOGAMS licenses depending on where you obtain it. For further
> +# details see https://www.openssl.org/~appro/cryptogams/.
> +# 
> +
> +#
> +# 
> +# Written and Modified by Danny Tsen 
> +# - Added x25519_fe51_sqr_times, x25519_fe51_frombytes, x25519_fe51_tobytes

ie. here.

> +# X25519 lower-level primitives for PPC64.
> +#
> +
> +#include 
> +
> +.machine "any"
 
Please don't add new .machine directives unless they are required.

> +.abiversion  2

I'd prefer that was left to the compiler flags.

cheers


Re: linux-next: build warning after merge of the powerpc tree

2024-05-15 Thread Michael Ellerman
Stephen Rothwell  writes:
> Hi all,
>
> After merging the powerpc tree, today's (it may have been yesterday's)
> linux-next build (powerpc allyesconfig) produced this warning:
>
> WARNING: modpost: vmlinux: section mismatch in reference: 
> fadump_setup_param_area+0x200 (section: .text.fadump_setup_param_area) -> 
> memblock_phys_alloc_range (section: .init.text)

I don't see the warning, but clearly it is possible if the compiler
decides not to inline fadump_setup_param_area().

What compiler version are you using?

cheers


Re: [PATCH v3] kprobe/ftrace: bail out if ftrace was killed

2024-05-15 Thread Google
On Wed, 15 May 2024 15:18:08 -0700
Stephen Brennan  wrote:

> Masami Hiramatsu (Google)  writes:
> > On Thu, 2 May 2024 01:35:16 +0800
> > Guo Ren  wrote:
> >
> >> On Thu, May 2, 2024 at 12:30 AM Stephen Brennan
> >>  wrote:
> >> >
> >> > If an error happens in ftrace, ftrace_kill() will prevent disarming
> >> > kprobes. Eventually, the ftrace_ops associated with the kprobes will be
> >> > freed, yet the kprobes will still be active, and when triggered, they
> >> > will use the freed memory, likely resulting in a page fault and panic.
> >> >
> >> > This behavior can be reproduced quite easily, by creating a kprobe and
> >> > then triggering a ftrace_kill(). For simplicity, we can simulate an
> >> > ftrace error with a kernel module like [1]:
> >> >
> >> > [1]: https://github.com/brenns10/kernel_stuff/tree/master/ftrace_killer
> >> >
> >> >   sudo perf probe --add commit_creds
> >> >   sudo perf trace -e probe:commit_creds
> >> >   # In another terminal
> >> >   make
> >> >   sudo insmod ftrace_killer.ko  # calls ftrace_kill(), simulating bug
> >> >   # Back to perf terminal
> >> >   # ctrl-c
> >> >   sudo perf probe --del commit_creds
> >> >
> >> > After a short period, a page fault and panic would occur as the kprobe
> >> > continues to execute and uses the freed ftrace_ops. While ftrace_kill()
> >> > is supposed to be used only in extreme circumstances, it is invoked in
> >> > FTRACE_WARN_ON() and so there are many places where an unexpected bug
> >> > could be triggered, yet the system may continue operating, possibly
> >> > without the administrator noticing. If ftrace_kill() does not panic the
> >> > system, then we should do everything we can to continue operating,
> >> > rather than leave a ticking time bomb.
> >> >
> >> > Signed-off-by: Stephen Brennan 
> >> > ---
> >> > Changes in v3:
> >> >   Don't expose ftrace_is_dead(). Create a "kprobe_ftrace_disabled"
> >> >   variable and check it directly in the kprobe handlers.
> >> > Link to v1/v2 discussion:
> >> >   
> >> > https://lore.kernel.org/all/20240426225834.993353-1-stephen.s.bren...@oracle.com/
> >> >
> >> >  arch/csky/kernel/probes/ftrace.c | 3 +++
> >> >  arch/loongarch/kernel/ftrace_dyn.c   | 3 +++
> >> >  arch/parisc/kernel/ftrace.c  | 3 +++
> >> >  arch/powerpc/kernel/kprobes-ftrace.c | 3 +++
> >> >  arch/riscv/kernel/probes/ftrace.c| 3 +++
> >> >  arch/s390/kernel/ftrace.c| 3 +++
> >> >  arch/x86/kernel/kprobes/ftrace.c | 3 +++
> >> >  include/linux/kprobes.h  | 7 +++
> >> >  kernel/kprobes.c | 6 ++
> >> >  kernel/trace/ftrace.c| 1 +
> >> >  10 files changed, 35 insertions(+)
> >> >
> >> > diff --git a/arch/csky/kernel/probes/ftrace.c 
> >> > b/arch/csky/kernel/probes/ftrace.c
> >> > index 834cffcfbce3..7ba4b98076de 100644
> >> > --- a/arch/csky/kernel/probes/ftrace.c
> >> > +++ b/arch/csky/kernel/probes/ftrace.c
> >> > @@ -12,6 +12,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned 
> >> > long parent_ip,
> >> > struct kprobe_ctlblk *kcb;
> >> > struct pt_regs *regs;
> >> >
> >> > +   if (unlikely(kprobe_ftrace_disabled))
> >> > +   return;
> >> > +
> >> For csky part.
> >> Acked-by: Guo Ren 
> >
> > Thanks Stephen, Guo and Steve!
> >
> > Let me pick this to probes/for-next!
> 
> Thank you Masami!
> 
> I did want to check, is this the correct git tree to be watching?
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git/log/?h=probes/for-next
> 
> ( I'm not trying to pressure on timing, as I know the merge window is
>   hectic. Just making sure I'm watching the correct place! )

Sorry, I forgot to push it from my local tree. Now it should be there.

Thanks,

-- 
Masami Hiramatsu (Google) 


Re: [PATCH v3] kprobe/ftrace: bail out if ftrace was killed

2024-05-15 Thread Stephen Brennan
Masami Hiramatsu (Google)  writes:
> On Thu, 2 May 2024 01:35:16 +0800
> Guo Ren  wrote:
>
>> On Thu, May 2, 2024 at 12:30 AM Stephen Brennan
>>  wrote:
>> >
>> > If an error happens in ftrace, ftrace_kill() will prevent disarming
>> > kprobes. Eventually, the ftrace_ops associated with the kprobes will be
>> > freed, yet the kprobes will still be active, and when triggered, they
>> > will use the freed memory, likely resulting in a page fault and panic.
>> >
>> > This behavior can be reproduced quite easily, by creating a kprobe and
>> > then triggering a ftrace_kill(). For simplicity, we can simulate an
>> > ftrace error with a kernel module like [1]:
>> >
>> > [1]: https://github.com/brenns10/kernel_stuff/tree/master/ftrace_killer
>> >
>> >   sudo perf probe --add commit_creds
>> >   sudo perf trace -e probe:commit_creds
>> >   # In another terminal
>> >   make
>> >   sudo insmod ftrace_killer.ko  # calls ftrace_kill(), simulating bug
>> >   # Back to perf terminal
>> >   # ctrl-c
>> >   sudo perf probe --del commit_creds
>> >
>> > After a short period, a page fault and panic would occur as the kprobe
>> > continues to execute and uses the freed ftrace_ops. While ftrace_kill()
>> > is supposed to be used only in extreme circumstances, it is invoked in
>> > FTRACE_WARN_ON() and so there are many places where an unexpected bug
>> > could be triggered, yet the system may continue operating, possibly
>> > without the administrator noticing. If ftrace_kill() does not panic the
>> > system, then we should do everything we can to continue operating,
>> > rather than leave a ticking time bomb.
>> >
>> > Signed-off-by: Stephen Brennan 
>> > ---
>> > Changes in v3:
>> >   Don't expose ftrace_is_dead(). Create a "kprobe_ftrace_disabled"
>> >   variable and check it directly in the kprobe handlers.
>> > Link to v1/v2 discussion:
>> >   
>> > https://lore.kernel.org/all/20240426225834.993353-1-stephen.s.bren...@oracle.com/
>> >
>> >  arch/csky/kernel/probes/ftrace.c | 3 +++
>> >  arch/loongarch/kernel/ftrace_dyn.c   | 3 +++
>> >  arch/parisc/kernel/ftrace.c  | 3 +++
>> >  arch/powerpc/kernel/kprobes-ftrace.c | 3 +++
>> >  arch/riscv/kernel/probes/ftrace.c| 3 +++
>> >  arch/s390/kernel/ftrace.c| 3 +++
>> >  arch/x86/kernel/kprobes/ftrace.c | 3 +++
>> >  include/linux/kprobes.h  | 7 +++
>> >  kernel/kprobes.c | 6 ++
>> >  kernel/trace/ftrace.c| 1 +
>> >  10 files changed, 35 insertions(+)
>> >
>> > diff --git a/arch/csky/kernel/probes/ftrace.c 
>> > b/arch/csky/kernel/probes/ftrace.c
>> > index 834cffcfbce3..7ba4b98076de 100644
>> > --- a/arch/csky/kernel/probes/ftrace.c
>> > +++ b/arch/csky/kernel/probes/ftrace.c
>> > @@ -12,6 +12,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned 
>> > long parent_ip,
>> > struct kprobe_ctlblk *kcb;
>> > struct pt_regs *regs;
>> >
>> > +   if (unlikely(kprobe_ftrace_disabled))
>> > +   return;
>> > +
>> For csky part.
>> Acked-by: Guo Ren 
>
> Thanks Stephen, Guo and Steve!
>
> Let me pick this to probes/for-next!

Thank you Masami!

I did want to check, is this the correct git tree to be watching?

https://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git/log/?h=probes/for-next

( I'm not trying to pressure on timing, as I know the merge window is
  hectic. Just making sure I'm watching the correct place! )

Thanks,
Stephen


Re: kswapd0: page allocation failure: order:0, mode:0x820(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0 (Kernel v6.5.9, 32bit ppc)

2024-05-15 Thread Yu Zhao
On Wed, May 15, 2024 at 2:45 PM Erhard Furtner  wrote:
>
> On Wed, 8 May 2024 20:21:11 +0200
> Erhard Furtner  wrote:
>
> > Greetings!
> >
> > Got that on my dual CPU PowerMac G4 DP shortly after boot. This does not 
> > happen every time at bootup though:
> >
> > [...]
> > kswapd0: page allocation failure: order:0, mode:0x820(GFP_ATOMIC), 
> > nodemask=(null),cpuset=/,mems_allowed=0
> > CPU: 1 PID: 40 Comm: kswapd0 Not tainted 6.8.9-gentoo-PMacG4 #1
> > Hardware name: PowerMac3,6 7455 0x80010303 PowerMac
>
> Very similar page allocation failure on the same machine on kernel 6.9.0 too. 
> Seems it can easily be provoked by running a memory stressor, e.g. "stress-ng 
> --vm 2 --vm-bytes 1930M --verify -v":
>
> [...]
> kswapd0: page allocation failure: order:0, mode:0xcc0(GFP_KERNEL), 
> nodemask=(null),cpuset=/,mems_allowed=0
> CPU: 0 PID: 41 Comm: kswapd0 Not tainted 6.9.0-gentoo-PMacG4 #1
> Hardware name: PowerMac3,6 7455 0x80010303 PowerMac
> Call Trace:
> [c1c65940] [c07926d4] dump_stack_lvl+0x80/0xac (unreliable)
> [c1c65960] [c01b6234] warn_alloc+0x100/0x178
> [c1c659c0] [c01b661c] __alloc_pages+0x370/0x8d0
> [c1c65a80] [c01c4854] __read_swap_cache_async+0xc0/0x1cc
> [c1c65ad0] [c01cb580] zswap_writeback_entry+0x50/0x154
> [c1c65be0] [c01cb6f4] shrink_memcg_cb+0x70/0xec
> [c1c65c10] [c019518c] __list_lru_walk_one+0xa0/0x154
> [c1c65c70] [c01952a4] list_lru_walk_one+0x64/0x7c
> [c1c65ca0] [c01cad00] zswap_shrinker_scan+0xac/0xc4
> [c1c65cd0] [c018052c] do_shrink_slab+0x18c/0x304
> [c1c65d20] [c0180a40] shrink_slab+0x174/0x260
> [c1c65da0] [c017cb0c] shrink_one+0xbc/0x134
> [c1c65dd0] [c017e3e4] shrink_node+0x238/0x84c
> [c1c65e50] [c017ed38] balance_pgdat+0x340/0x650
> [c1c65f50] [c017f270] kswapd+0x228/0x25c
> [c1c65fc0] [c006bbac] kthread+0xe4/0xe8
> [c1c65ff0] [c0015304] start_kernel_thread+0x10/0x14
> SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
>   cache: skbuff_small_head, object size: 480, buffer size: 544, default 
> order: 1, min order: 0
>   node 0: slabs: 15, objs: 225, free: 0
> SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
>   cache: skbuff_small_head, object size: 480, buffer size: 544, default 
> order: 1, min order: 0
>   node 0: slabs: 15, objs: 225, free: 0
> SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
>   cache: skbuff_small_head, object size: 480, buffer size: 544, default 
> order: 1, min order: 0
>   node 0: slabs: 15, objs: 225, free: 0
> SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
>   cache: skbuff_small_head, object size: 480, buffer size: 544, default 
> order: 1, min order: 0
>   node 0: slabs: 15, objs: 225, free: 0
> SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
>   cache: skbuff_small_head, object size: 480, buffer size: 544, default 
> order: 1, min order: 0
>   node 0: slabs: 15, objs: 225, free: 0
> SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
>   cache: kmalloc-rnd-15-2k, object size: 2048, buffer size: 6144, default 
> order: 3, min order: 1
>   kmalloc-rnd-15-2k debugging increased min order, use slab_debug=O to 
> disable.
>   node 0: slabs: 33, objs: 165, free: 0
> SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
>   cache: skbuff_small_head, object size: 480, buffer size: 544, default 
> order: 1, min order: 0
>   node 0: slabs: 15, objs: 225, free: 0
> SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
>   cache: kmalloc-rnd-15-2k, object size: 2048, buffer size: 6144, default 
> order: 3, min order: 1
>   kmalloc-rnd-15-2k debugging increased min order, use slab_debug=O to 
> disable.
>   node 0: slabs: 33, objs: 165, free: 0
> SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
>   cache: skbuff_small_head, object size: 480, buffer size: 544, default 
> order: 1, min order: 0
>   node 0: slabs: 15, objs: 225, free: 0
> SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
>   cache: kmalloc-rnd-15-2k, object size: 2048, buffer size: 6144, default 
> order: 3, min order: 1
>   kmalloc-rnd-15-2k debugging increased min order, use slab_debug=O to 
> disable.
>   node 0: slabs: 33, objs: 165, free: 0
> Mem-Info:
> active_anon:340071 inactive_anon:139179 isolated_anon:0
>  active_file:8297 inactive_file:2506 isolated_file:0
>  unevictable:4 dirty:1 writeback:18
>  slab_reclaimable:1377 slab_unreclaimable:7426
>  mapped:6804 shmem:112 pagetables:946
>  sec_pagetables:0 bounce:0
>  kernel_misc_reclaimable:0
>  free:1141 free_pcp:7 free_cma:0
> Node 0 active_anon:1360284kB inactive_anon:556716kB active_file:33188kB 
> inactive_file:10024kB unevictable:16kB isolated(anon):0kB isolated(file):0kB 
> mapped:27216kB dirty:4kB writeback:72kB shmem:448kB writeback_tmp:0kB 
> kernel_stack:1560kB pagetables:3784kB sec_pagetables:0kB all_unreclaimable? no
> DMA free:56kB boost:7756kB min:11208kB low:12068kB high:12928kB 
> reserved_highatomic:0KB active_anon:635128kB inactive_anon:58260kB 
> 

Re: [PATCH v15 00/16] Add audio support in v4l2 framework

2024-05-15 Thread Nicolas Dufresne
Hi,

GStreamer hat on ...

Le mercredi 15 mai 2024 à 12:46 +0200, Jaroslav Kysela a écrit :
> On 15. 05. 24 12:19, Takashi Iwai wrote:
> > On Wed, 15 May 2024 11:50:52 +0200,
> > Jaroslav Kysela wrote:
> > > 
> > > On 15. 05. 24 11:17, Hans Verkuil wrote:
> > > > Hi Jaroslav,
> > > > 
> > > > On 5/13/24 13:56, Jaroslav Kysela wrote:
> > > > > On 09. 05. 24 13:13, Jaroslav Kysela wrote:
> > > > > > On 09. 05. 24 12:44, Shengjiu Wang wrote:
> > > > > > > > > mem2mem is just like the decoder in the compress pipeline. 
> > > > > > > > > which is
> > > > > > > > > one of the components in the pipeline.
> > > > > > > > 
> > > > > > > > I was thinking of loopback with endpoints using compress 
> > > > > > > > streams,
> > > > > > > > without physical endpoint, something like:
> > > > > > > > 
> > > > > > > > compress playback (to feed data from userspace) -> DSP 
> > > > > > > > (processing) ->
> > > > > > > > compress capture (send data back to userspace)
> > > > > > > > 
> > > > > > > > Unless I'm missing something, you should be able to process 
> > > > > > > > data as fast
> > > > > > > > as you can feed it and consume it in such case.
> > > > > > > > 
> > > > > > > 
> > > > > > > Actually in the beginning I tried this,  but it did not work well.
> > > > > > > ALSA needs time control for playback and capture, playback and 
> > > > > > > capture
> > > > > > > needs to synchronize.  Usually the playback and capture pipeline 
> > > > > > > is
> > > > > > > independent in ALSA design,  but in this case, the playback and 
> > > > > > > capture
> > > > > > > should synchronize, they are not independent.
> > > > > > 
> > > > > > The core compress API core no strict timing constraints. You can 
> > > > > > eventually0
> > > > > > have two half-duplex compress devices, if you like to have really 
> > > > > > independent
> > > > > > mechanism. If something is missing in API, you can extend this API 
> > > > > > (like to
> > > > > > inform the user space that it's a producer/consumer processing 
> > > > > > without any
> > > > > > relation to the real time). I like this idea.
> > > > > 
> > > > > I was thinking more about this. If I am right, the mentioned use in 
> > > > > gstreamer
> > > > > is supposed to run the conversion (DSP) job in "one shot" (can be 
> > > > > handled
> > > > > using one system call like blocking ioctl).  The goal is just to 
> > > > > offload the
> > > > > CPU work to the DSP (co-processor). If there are no requirements for 
> > > > > the
> > > > > queuing, we can implement this ioctl in the compress ALSA API easily 
> > > > > using the
> > > > > data management through the dma-buf API. We can eventually define a 
> > > > > new
> > > > > direction (enum snd_compr_direction) like SND_COMPRESS_CONVERT or so 
> > > > > to allow
> > > > > handle this new data scheme. The API may be extended later on real 
> > > > > demand, of
> > > > > course.
> > > > > 
> > > > > Otherwise all pieces are already in the current ALSA compress API
> > > > > (capabilities, params, enumeration). The realtime controls may be 
> > > > > created
> > > > > using ALSA control API.
> > > > 
> > > > So does this mean that Shengjiu should attempt to use this ALSA 
> > > > approach first?
> > > 
> > > I've not seen any argument to use v4l2 mem2mem buffer scheme for this
> > > data conversion forcefully. It looks like a simple job and ALSA APIs
> > > may be extended for this simple purpose.
> > > 
> > > Shengjiu, what are your requirements for gstreamer support? Would be a
> > > new blocking ioctl enough for the initial support in the compress ALSA
> > > API?
> > 
> > If it works with compress API, it'd be great, yeah.
> > So, your idea is to open compress-offload devices for read and write,
> > then and let them convert a la batch jobs without timing control?
> > 
> > For full-duplex usages, we might need some more extensions, so that
> > both read and write parameters can be synchronized.  (So far the
> > compress stream is a unidirectional, and the runtime buffer for a
> > single stream.)
> > 
> > And the buffer management is based on the fixed size fragments.  I
> > hope this doesn't matter much for the intended operation?
> 
> It's a question, if the standard I/O is really required for this case. My 
> quick idea was to just implement a new "direction" for this job supporting 
> only one ioctl for the data processing which will execute the job in "one 
> shot" at the moment. The I/O may be handled through dma-buf API (which seems 
> to be standard nowadays for this purpose and allows future chaining).
> 
> So something like:
> 
> struct dsp_job {
> int source_fd; /* dma-buf FD with source data - for dma_buf_get() */
> int target_fd; /* dma-buf FD for target data - for dma_buf_get() */
> ... maybe some extra data size members here ...
> ... maybe some special parameters here ...
> };
> 
> #define SNDRV_COMPRESS_DSPJOB _IOWR('C', 0x60, struct dsp_job)
> 
> This ioctl will be blocking (thus synced). 

Re: [PATCH v2 1/1] arch/fault: don't print logs for pte marker poison errors

2024-05-15 Thread Borislav Petkov
On Wed, May 15, 2024 at 12:19:16PM -0700, Axel Rasmussen wrote:
> An unprivileged process can allocate a VMA, use the userfaultfd API to
> install one of these PTE markers, and then register a no-op SIGBUS
> handler. Now it can access that address in a tight loop,

Maybe the userfaultfd should not allow this, I dunno. You made me look
at this thing and to me it all sounds weird. One thread does page fault
handling for the other and that helps with live migration somehow. OMG,
whaaat?

Maybe I don't understand it and probably never will...

But, for example, membarrier used do to a stupid thing of allowing one
thread to hammer another with an IPI storm. Bad bad idea. So it got
fixed.

All I'm saying is, if unprivileged processes can do crap, they should be
prevented from doing crap. Like ratelimiting the pagefaults or whatnot.

One of the recovery action strategies from memory poison is, well, you
kill the process. If you can detect the hammering process which
installed that page marker, you kill it. Problem solved.

But again, this userfaultfd thing sounds really weird so I could very
well be way wrong.

> Even in a non-contrived / non-malicious case, use of this API could
> have similar effects. If nothing else, the log message can be
> confusing to administrators: they state that an MCE occurred, whereas
> with the simulated poison API, this is not the case; it isn't a "real"
> MCE / hardware error.

Yeah, I read that part in

Documentation/admin-guide/mm/userfaultfd.rst

Simulated poison huh? Another WTF.

> In the KVM use case, the host can't just allocate a new page, because
> it doesn't know what the guest might have had stored there. Best we

Ok, let's think of real hw poison.

When doing the recovery, you don't care what's stored there because as
far as the hardware is concerned, if you consume that poison the *whole*
machine might go down.

So you lose the page. Plain and simple. And the guest can go visit the
bureau of complaints and grievances.

Still better than killing the guest or even the whole host with other
guests running on it.

> can do is propagate the poison into the guest, and let the guest OS
> deal with it as it sees fit, and mark the page poisoned on the host.

You mark the page as poison on the host and you yank it from under the
guest. That physical frame is gone and the faster all the actors
involved understand that, the better.

> I don't disagree the guest *shouldn't* reaccess it in this case. :)
> But if it did, it should get another poison event just as you say.

Yes, it shouldn't. Look at memory_failure(). This will kill whole
processes if it has to, depending on what the page is used for.

> And, live migration between physical hosts should be transparent to
> the guest. So if the guest gets a poison, and then we live migrate it,

So if I were to design this, I'd do it this way:

0. guest gets hw poison injected

1. it runs memory_failure() and it kills the processes using the page.

2. page is marked poisoned on the host so no other guest gets it.

That's it. No second accesses whatsoever. At least this is how it works
on baremetal.

This hw poisoning emulation is just silly and unnecessary.

But again, I probably am missing some aspects. It all just sounded
really weird to me that's why I thought I should ask what's behind all
that.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: [PATCH v2 1/1] arch/fault: don't print logs for pte marker poison errors

2024-05-15 Thread Axel Rasmussen
On Wed, May 15, 2024 at 11:33 AM Borislav Petkov  wrote:
>
> On Wed, May 15, 2024 at 10:33:03AM -0700, Axel Rasmussen wrote:
> > Right, the goal is to still have the process get a SIGBUS, but to
> > avoid the "MCE error" log message. The basic issue is, unprivileged
> > users can set these markers up, and thereby completely spam up the
> > log.
>
> What is the real attack scenario you want to protect against?
>
> Or is this something hypothetical?

An unprivileged process can allocate a VMA, use the userfaultfd API to
install one of these PTE markers, and then register a no-op SIGBUS
handler. Now it can access that address in a tight loop, generating a
huge number of kernel log messages. This can e.g. bog down the system,
or at least drown out other important log messages.

For example the userfaultfd selftest does something similar to this to
test that the API works properly. :)

Even in a non-contrived / non-malicious case, use of this API could
have similar effects. If nothing else, the log message can be
confusing to administrators: they state that an MCE occurred, whereas
with the simulated poison API, this is not the case; it isn't a "real"
MCE / hardware error.

>
> > That said, one thing I'm not sure about is whether or not
> > VM_FAULT_SIGBUS is a viable alternative (returned for a new PTE marker
> > type specific to simulated poison). The goal of the simulated poison
> > feature is to "closely simulate" a real hardware poison event. If you
> > live migrate a VM from a host with real poisoned memory, to a new
> > host: you'd want to keep the same behavior if the guest accessed those
> > addresses again, so as not to confuse the guest about why it suddenly
> > became "un-poisoned".
>
> Well, the recovery action is to poison the page and the process should
> be resilient enough and allocate a new, clean page which doesn't trigger
> hw poison hopefully, if possible.
>
> It doesn't make a whole lotta sense if poison "remains". Hardware poison
> you don't want to touch a second time either - otherwise you might
> consume that poison and die.

In the KVM use case, the host can't just allocate a new page, because
it doesn't know what the guest might have had stored there. Best we
can do is propagate the poison into the guest, and let the guest OS
deal with it as it sees fit, and mark the page poisoned on the host. I
don't disagree the guest *shouldn't* reaccess it in this case. :) But
if it did, it should get another poison event just as you say.

And, live migration between physical hosts should be transparent to
the guest. So if the guest gets a poison, and then we live migrate it,
and then it accesses that address again, it should likewise get
another poison event, just as before. Even though the underlying
physical memory is *not* poisoned on the new host machine.

So the use case is, after live migration, we install one of these PTE
markers to simulate a poison event in case the address is accessed
again. But since it isn't *really* a hardware error on the new host,
no reason to spam the host kernel log when / if this occurs.

>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette


Re: [PATCH v2 1/1] arch/fault: don't print logs for pte marker poison errors

2024-05-15 Thread Borislav Petkov
On Wed, May 15, 2024 at 10:33:03AM -0700, Axel Rasmussen wrote:
> Right, the goal is to still have the process get a SIGBUS, but to
> avoid the "MCE error" log message. The basic issue is, unprivileged
> users can set these markers up, and thereby completely spam up the
> log.

What is the real attack scenario you want to protect against?

Or is this something hypothetical?

> That said, one thing I'm not sure about is whether or not
> VM_FAULT_SIGBUS is a viable alternative (returned for a new PTE marker
> type specific to simulated poison). The goal of the simulated poison
> feature is to "closely simulate" a real hardware poison event. If you
> live migrate a VM from a host with real poisoned memory, to a new
> host: you'd want to keep the same behavior if the guest accessed those
> addresses again, so as not to confuse the guest about why it suddenly
> became "un-poisoned".

Well, the recovery action is to poison the page and the process should
be resilient enough and allocate a new, clean page which doesn't trigger
hw poison hopefully, if possible.

It doesn't make a whole lotta sense if poison "remains". Hardware poison
you don't want to touch a second time either - otherwise you might
consume that poison and die.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: [PATCH v2 1/1] arch/fault: don't print logs for pte marker poison errors

2024-05-15 Thread Axel Rasmussen
On Wed, May 15, 2024 at 3:54 AM Oscar Salvador  wrote:
>
> On Wed, May 15, 2024 at 12:41:42PM +0200, Borislav Petkov wrote:
> > On Fri, May 10, 2024 at 11:29:26AM -0700, Axel Rasmussen wrote:
> > > @@ -3938,7 +3938,7 @@ static vm_fault_t handle_pte_marker(struct vm_fault 
> > > *vmf)
> > >
> > > /* Higher priority than uffd-wp when data corrupted */
> > > if (marker & PTE_MARKER_POISONED)
> > > -   return VM_FAULT_HWPOISON;
> > > +   return VM_FAULT_HWPOISON | VM_FAULT_HWPOISON_SILENT;
> >
> > If you know here that this poisoning should be silent, why do you have
> > to make it all complicated and propagate it into arch code, waste
> > a separate VM_FAULT flag just for that instead of simply returning here
> > a VM_FAULT_COMPLETED or some other innocuous value which would stop
> > processing the fault?
>
> AFAIK, He only wants it to be silent wrt. the arch fault handler not 
> screaming,
> but he still wants to be able to trigger force_sig_mceerr().

Right, the goal is to still have the process get a SIGBUS, but to
avoid the "MCE error" log message. The basic issue is, unprivileged
users can set these markers up, and thereby completely spam up the
log.

Also since this is a process-specific thing, and it's not a real
hardware poison event, it's unclear system admins care at all at a
global level (this is why we didn't want to switch to just
printk_ratelimited for example). Better to let the process handle the
SIGBUS however it likes for its use case (logging a message elsewhere,
etc.).

That said, one thing I'm not sure about is whether or not
VM_FAULT_SIGBUS is a viable alternative (returned for a new PTE marker
type specific to simulated poison). The goal of the simulated poison
feature is to "closely simulate" a real hardware poison event. If you
live migrate a VM from a host with real poisoned memory, to a new
host: you'd want to keep the same behavior if the guest accessed those
addresses again, so as not to confuse the guest about why it suddenly
became "un-poisoned". At a basic level I think VM_FAULT_SIGBUS gives
us what we want (send SIGBUS to the process, don't log about MCEs),
but I'm not confident I know all the differences vs. VM_FAULT_HWPOISON
on all the arches.

>
>
> --
> Oscar Salvador
> SUSE Labs


Re: [PATCH 2/3] crypto: X25519 core functions for ppc64le

2024-05-15 Thread Andy Polyakov
Thanks for the info.  I should be able to do it.  I was hoping an 
assembly guru like you can show me some tricks here if there is :)


No tricks in cswap, it's as straightforward as it gets, so go ahead :-)



Re: [PATCH v15 00/16] Add audio support in v4l2 framework

2024-05-15 Thread Pierre-Louis Bossart



On 5/9/24 06:13, Jaroslav Kysela wrote:
> On 09. 05. 24 12:44, Shengjiu Wang wrote:
 mem2mem is just like the decoder in the compress pipeline. which is
 one of the components in the pipeline.
>>>
>>> I was thinking of loopback with endpoints using compress streams,
>>> without physical endpoint, something like:
>>>
>>> compress playback (to feed data from userspace) -> DSP (processing) ->
>>> compress capture (send data back to userspace)
>>>
>>> Unless I'm missing something, you should be able to process data as fast
>>> as you can feed it and consume it in such case.
>>>
>>
>> Actually in the beginning I tried this,  but it did not work well.
>> ALSA needs time control for playback and capture, playback and capture
>> needs to synchronize.  Usually the playback and capture pipeline is
>> independent in ALSA design,  but in this case, the playback and capture
>> should synchronize, they are not independent.
> 
> The core compress API core no strict timing constraints. You can
> eventually0 have two half-duplex compress devices, if you like to have
> really independent mechanism. If something is missing in API, you can
> extend this API (like to inform the user space that it's a
> producer/consumer processing without any relation to the real time). I
> like this idea.

The compress API was never intended to be used this way. It was meant to
send compressed data to a DSP for rendering, and keep the host processor
in a low-power state while the DSP local buffer was drained. There was
no intent to do a loop back to the host, because that keeps the host in
a high-power state and probably negates the power savings due to a DSP.

The other problem with the loopback is that the compress stuff is
usually a "Front-End" in ASoC/DPCM parlance, and we don't have a good
way to do a loopback between Front-Ends. The entire framework is based
on FEs being connected to BEs.

One problem that I can see for ASRC is that it's not clear when the data
will be completely processed on the "capture" stream when you stop the
"playback" stream. There's a non-zero risk of having a truncated output
or waiting for data that will never be generated.

In other words, it might be possible to reuse/extend the compress API
for a 'coprocessor' approach without any rendering to traditional
interfaces, but it's uncharted territory.


[PATCHv4 3/9] ASoC: fsl-asoc-card: add compatibility to use 2 codecs in dai-links

2024-05-15 Thread Elinor Montmasson
Adapt the driver to work with configurations using two codecs or more.
Modify fsl_asoc_card_probe() to handle use cases where 2 codecs are
given in the device tree.
This will be needed for the generic codec case.

Use cases using one codec will ignore any given codecs other than the
first.

Signed-off-by: Elinor Montmasson 
Co-authored-by: Philip-Dylan Gleonec 
---
 sound/soc/fsl/fsl-asoc-card.c | 239 --
 1 file changed, 139 insertions(+), 100 deletions(-)

diff --git a/sound/soc/fsl/fsl-asoc-card.c b/sound/soc/fsl/fsl-asoc-card.c
index c83492e7cec2..620a25eb068a 100644
--- a/sound/soc/fsl/fsl-asoc-card.c
+++ b/sound/soc/fsl/fsl-asoc-card.c
@@ -99,7 +99,7 @@ struct fsl_asoc_card_priv {
struct simple_util_jack hp_jack;
struct simple_util_jack mic_jack;
struct platform_device *pdev;
-   struct codec_priv codec_priv;
+   struct codec_priv codec_priv[2];
struct cpu_priv cpu_priv;
struct snd_soc_card card;
u8 streams;
@@ -172,11 +172,13 @@ static int fsl_asoc_card_hw_params(struct 
snd_pcm_substream *substream,
struct snd_soc_pcm_runtime *rtd = snd_soc_substream_to_rtd(substream);
struct fsl_asoc_card_priv *priv = snd_soc_card_get_drvdata(rtd->card);
bool tx = substream->stream == SNDRV_PCM_STREAM_PLAYBACK;
-   struct codec_priv *codec_priv = >codec_priv;
+   struct codec_priv *codec_priv;
+   struct snd_soc_dai *codec_dai;
struct cpu_priv *cpu_priv = >cpu_priv;
struct device *dev = rtd->card->dev;
unsigned int pll_out;
int ret;
+   int i;
 
priv->sample_rate = params_rate(params);
priv->sample_format = params_format(params);
@@ -208,28 +210,32 @@ static int fsl_asoc_card_hw_params(struct 
snd_pcm_substream *substream,
}
 
/* Specific configuration for PLL */
-   if (codec_priv->pll_id >= 0 && codec_priv->fll_id >= 0) {
-   if (priv->sample_format == SNDRV_PCM_FORMAT_S24_LE)
-   pll_out = priv->sample_rate * 384;
-   else
-   pll_out = priv->sample_rate * 256;
+   for_each_rtd_codec_dais(rtd, i, codec_dai) {
+   codec_priv = >codec_priv[i];
 
-   ret = snd_soc_dai_set_pll(snd_soc_rtd_to_codec(rtd, 0),
- codec_priv->pll_id,
- codec_priv->mclk_id,
- codec_priv->mclk_freq, pll_out);
-   if (ret) {
-   dev_err(dev, "failed to start FLL: %d\n", ret);
-   goto fail;
-   }
+   if (codec_priv->pll_id >= 0 && codec_priv->fll_id >= 0) {
+   if (priv->sample_format == SNDRV_PCM_FORMAT_S24_LE)
+   pll_out = priv->sample_rate * 384;
+   else
+   pll_out = priv->sample_rate * 256;
 
-   ret = snd_soc_dai_set_sysclk(snd_soc_rtd_to_codec(rtd, 0),
-codec_priv->fll_id,
-pll_out, SND_SOC_CLOCK_IN);
+   ret = snd_soc_dai_set_pll(codec_dai,
+   codec_priv->pll_id,
+   codec_priv->mclk_id,
+   codec_priv->mclk_freq, pll_out);
+   if (ret) {
+   dev_err(dev, "failed to start FLL: %d\n", ret);
+   goto fail;
+   }
 
-   if (ret && ret != -ENOTSUPP) {
-   dev_err(dev, "failed to set SYSCLK: %d\n", ret);
-   goto fail;
+   ret = snd_soc_dai_set_sysclk(codec_dai,
+   codec_priv->fll_id,
+   pll_out, SND_SOC_CLOCK_IN);
+
+   if (ret && ret != -ENOTSUPP) {
+   dev_err(dev, "failed to set SYSCLK: %d\n", ret);
+   goto fail;
+   }
}
}
 
@@ -244,28 +250,34 @@ static int fsl_asoc_card_hw_free(struct snd_pcm_substream 
*substream)
 {
struct snd_soc_pcm_runtime *rtd = snd_soc_substream_to_rtd(substream);
struct fsl_asoc_card_priv *priv = snd_soc_card_get_drvdata(rtd->card);
-   struct codec_priv *codec_priv = >codec_priv;
+   struct codec_priv *codec_priv;
+   struct snd_soc_dai *codec_dai;
struct device *dev = rtd->card->dev;
int ret;
+   int i;
 
priv->streams &= ~BIT(substream->stream);
 
-   if (!priv->streams && codec_priv->pll_id >= 0 && codec_priv->fll_id >= 
0) {
-   /* Force freq to be free_freq to avoid error message in codec */
-   ret = 

[PATCHv4 9/9] ASoC: dt-bindings: fsl-asoc-card: add compatible for generic codec

2024-05-15 Thread Elinor Montmasson
Add documentation about new dts bindings following new support
for compatible "fsl,imx-audio-generic".

Some CPU DAI don't require a real audio codec. The new compatible
"fsl,imx-audio-generic" allows using the driver with codec drivers
SPDIF DIT and SPDIF DIR as dummy codecs.
It also allows using not pre-configured audio codecs which
don't require specific control through a codec driver.

The new dts properties give the possibility to set some parameters
about the CPU DAI usually set through the codec configuration.

Signed-off-by: Elinor Montmasson 
---
 .../bindings/sound/fsl-asoc-card.yaml | 96 ++-
 1 file changed, 92 insertions(+), 4 deletions(-)

diff --git a/Documentation/devicetree/bindings/sound/fsl-asoc-card.yaml 
b/Documentation/devicetree/bindings/sound/fsl-asoc-card.yaml
index 9922664d5ccc..332d8bf96e06 100644
--- a/Documentation/devicetree/bindings/sound/fsl-asoc-card.yaml
+++ b/Documentation/devicetree/bindings/sound/fsl-asoc-card.yaml
@@ -23,6 +23,16 @@ description:
   and PCM DAI formats. However, it'll be also possible to support those non
   AC'97/I2S/PCM type sound cards, such as S/PDIF audio and HDMI audio, as
   long as the driver has been properly upgraded.
+  To use CPU DAIs that do not require a codec such as an S/PDIF controller,
+  or to use a DAI to output or capture raw I2S/TDM data, you can
+  use the compatible "fsl,imx-audio-generic".
+
+definitions:
+  imx-audio-generic-dependency:
+properties:
+  compatible:
+contains:
+  const: fsl,imx-audio-generic
 
 maintainers:
   - Shengjiu Wang 
@@ -81,6 +91,7 @@ properties:
   - fsl,imx-audio-wm8960
   - fsl,imx-audio-wm8962
   - fsl,imx-audio-wm8958
+  - fsl,imx-audio-generic
 
   model:
 $ref: /schemas/types.yaml#/definitions/string
@@ -93,8 +104,14 @@ properties:
   need to add ASRC support via DPCM.
 
   audio-codec:
-$ref: /schemas/types.yaml#/definitions/phandle
-description: The phandle of an audio codec
+$ref: /schemas/types.yaml#/definitions/phandle-array
+description: |
+  The phandle of an audio codec.
+  If using the "fsl,imx-audio-generic" compatible, give instead a pair of
+  phandles with the spdif_transmitter first (driver SPDIF DIT) and the
+  spdif_receiver second (driver SPDIF DIR).
+items:
+  maxItems: 1
 
   audio-cpu:
 $ref: /schemas/types.yaml#/definitions/phandle
@@ -150,8 +167,8 @@ properties:
 description: dai-link uses bit clock inversion.
 
   mclk-id:
-$ref: /schemas/types.yaml#/definitions/uint32
-description: main clock id, specific for each card configuration.
+$ref: /schemas/types.yaml#/definitions/uint32-array
+description: Main clock id for each codec, specific for each card 
configuration.
 
   mux-int-port:
 $ref: /schemas/types.yaml#/definitions/uint32
@@ -167,10 +184,68 @@ properties:
 $ref: /schemas/types.yaml#/definitions/phandle
 description: The phandle of an CPU DAI controller
 
+  # Properties relevant only with "fsl,imx-audio-generic" compatible
+  dai-tdm-slot-width:
+description: See tdm-slot.txt.
+$ref: /schemas/types.yaml#/definitions/uint32
+
+  dai-tdm-slot-num:
+description: See tdm-slot.txt.
+$ref: /schemas/types.yaml#/definitions/uint32
+
+  clocks:
+description: |
+  The CPU DAI system clock, used to retrieve
+  the CPU DAI system clock frequency with the generic codec.
+maxItems: 1
+
+  clock-names:
+items:
+  - const: cpu_sysclk
+
+  cpu-system-clock-direction-out:
+description: |
+  Specifies cpu system clock direction as 'out' on initialization.
+  If not set, direction is 'in'.
+$ref: /schemas/types.yaml#/definitions/flag
+
+dependencies:
+  dai-tdm-slot-width:
+$ref: "#/definitions/imx-audio-generic-dependency"
+  dai-tdm-slot-num:
+$ref: "#/definitions/imx-audio-generic-dependency"
+  clocks:
+$ref: "#/definitions/imx-audio-generic-dependency"
+  cpu-system-clock-direction-out:
+$ref: "#/definitions/imx-audio-generic-dependency"
+
 required:
   - compatible
   - model
 
+allOf:
+  - if:
+  $ref: "#/definitions/imx-audio-generic-dependency"
+then:
+  properties:
+audio-codec:
+  items:
+- description: SPDIF DIT phandle
+- description: SPDIF DIR phandle
+mclk-id:
+  maxItems: 1
+  items:
+minItems: 1
+maxItems: 2
+else:
+  properties:
+audio-codec:
+  maxItems: 1
+mclk-id:
+  maxItems: 1
+  items:
+maxItems: 1
+
 unevaluatedProperties: false
 
 examples:
@@ -195,3 +270,16 @@ examples:
  "AIN2L", "Line In Jack",
  "AIN2R", "Line In Jack";
 };
+
+  - |
+#include 
+sound-spdif-asrc {
+  compatible = "fsl,imx-audio-generic";
+  model = "spdif-asrc-audio";
+  audio-cpu = <>;
+  audio-asrc = <>;
+  audio-codec = <>, 

[PATCHv4 0/9] ASoC: fsl-asoc-card: compatibility integration of a generic codec use case for use with S/PDIF controller

2024-05-15 Thread Elinor Montmasson
Hello,

This is the v4 of the series of patch aiming to make the machine driver
"fsl-asoc-card" compatible with use cases where there is no real codec
driver. It proposes to use the "spdif_receiver" and "spdif_transmitter"
drivers instead of the dummy codec.
This is a first step in using the S/PDIF controller with the ASRC.

The five first patches add compatibility with the pair of codecs
"spdif_receiver" and "spdif_transmitter" with a new compatible,
"fsl,imx-audio-generic". Codec parameters are set with default values.
Consequently, the driver is modified to work with multi-codec use cases.
It can get 2 codecs phandles from the device tree, and the
"fsl_asoc_card_priv" struct now has 2 "codec_priv" to store properties
of both codecs. It is fixed to 2 codecs as only "fsl,imx-audio-generic"
uses multiple codecs at the moment.
However, the driver now uses "for_each_codecs" macros when possible to
ease future implementations of multi-codec configurations.

The three following patches add configuration options for the
devicetree. They configure the CPU DAI when using
"fsl,imx-audio-generic". These options are usually hard-coded in
"fsl-asoc-card.c" for each audio codec. Because the generic codec could
be used with other CPU DAIs than the S/PDIF controller, setting these
parameters could be required.
These new options try to follow the style of the simple-card driver:
* standard TDM properties are used, as defined in "tdm-slot.txt".
* the CPU DAI system-clock can be specified, allowing the codec to
retrieve its frequency.
* the CPU DAI system-clock direction can be specified through a new
binding, the same way it is done in simple-card.

The last commit updates the DT bindings documentation and add a new
example for the generic codec use case.

This series of patch was successfully built for arm64 and x86 on top of
the latest??"for-next" branch of the ASoC git tree on the 14th of May
2024.
These modifications have also been tested on an i.MX8MN evaluation
board, with a linux kernel RT v6.1.26-rt8.

If you have any question or remark about these commits, don't hesitate
to reply.

Best regards,
Elinor Montmasson


Changelog:
v3 -> v4:
* Use the standard TDM bidings, as defined in "tdm-slot.txt", for the
new optional DT bindings setting the TDM slots number and width.
* Use the clock DT bindings to optionally specify the CPU DAI system
clock frequency, instead of a dedicated new binding.
* Rename the new DT binding "cpu-sysclk-dir-out" to
"cpu-system-clock-direction-out" to better follow the style of the
simple-card driver.
* Merge TX an RX bindings for CPU DAI system-clock, to better follow the
style of the simple-card driver, and also as there was no use case in
fsl-asoc-card where TX and RX settings had to be different.
* Add the documentation for the new bindings in the new DT schema
bindings documentation. Also add an example with the generic codec.
* v3 patch series at :
https://lore.kernel.org/alsa-devel/20231218124058.2047167-1-elinor.montmas...@savoirfairelinux.com/

v2 -> v3:
* When the bitmaster or framemaster are retrieved from the device tree,
the driver will now compare them with the two codecs possibly given in
device tree, and not just the first codec.
* Improve driver modifications to use multiple codecs for better
integration of future multi-codec use cases:
* Use "for_each_codec" macros when possible.
* "fsl_asoc_card_priv" struct now has 2 "codec_priv" as the driver
can currently retrieve 2 codec phandles from the device tree.
* Fix subject of patch 10/10 to follow the style of the subsystem
* v2 patch series at:
https://lore.kernel.org/alsa-devel/20231027144734.3654829-1-elinor.montmas...@savoirfairelinux.com/

v1 -> v2:
* Replace use of the dummy codec by the pair of codecs spdif_receiver /
spdif_transmitter.
* Adapt how dai links codecs are used to take into account the
possibility for multiple codecs per link.
* Change compatible name.
* Adapt driver to be able to register two codecs given in the device
tree.
* v1 patch series at:
https://lore.kernel.org/alsa-devel/20230901144550.520072-1-elinor.montmas...@savoirfairelinux.com/


Elinor Montmasson (9):
  ASoC: fsl-asoc-card: add support for dai links with multiple codecs
  ASoC: fsl-asoc-card: add second dai link component for codecs
  ASoC: fsl-asoc-card: add compatibility to use 2 codecs in dai-links
  ASoC: fsl-asoc-card: add new compatible for a generic codec use case
  ASoC: fsl-asoc-card: set generic codec as clock provider
  ASoC: fsl-asoc-card: add use of devicetree TDM slot properties
  ASoC: fsl-asoc-card: add DT clock "cpu_sysclk" with generic codec
  ASoC: fsl-asoc-card: add DT property "cpu-system-clock-direction-out"
  ASoC: dt-bindings: fsl-asoc-card: add compatible for generic codec

 .../bindings/sound/fsl-asoc-card.yaml |  96 +-
 sound/soc/fsl/fsl-asoc-card.c | 306 +++---
 2 files changed, 287 insertions(+), 115 deletions(-)

-- 
2.34.1



[PATCHv4 4/9] ASoC: fsl-asoc-card: add new compatible for a generic codec use case

2024-05-15 Thread Elinor Montmasson
Add the new compatible "fsl,imx-audio-generic" for a generic codec
use case. It allows using the fsl-asoc-card driver with the
spdif_receiver and spdif_transmitter codec drivers used as dummy codecs.
It can be used for cases where there is no real codec or codecs which do
not require declaring controls.

Signed-off-by: Elinor Montmasson 
Co-authored-by: Philip-Dylan Gleonec 
---
 sound/soc/fsl/fsl-asoc-card.c | 29 ++---
 1 file changed, 26 insertions(+), 3 deletions(-)

diff --git a/sound/soc/fsl/fsl-asoc-card.c b/sound/soc/fsl/fsl-asoc-card.c
index 620a25eb068a..a4ecc9093558 100644
--- a/sound/soc/fsl/fsl-asoc-card.c
+++ b/sound/soc/fsl/fsl-asoc-card.c
@@ -567,6 +567,7 @@ static int fsl_asoc_card_probe(struct platform_device *pdev)
struct platform_device *cpu_pdev;
struct fsl_asoc_card_priv *priv;
struct device *codec_dev[2] = { NULL, NULL };
+   const char *generic_codec_dai_names[2];
const char *codec_dai_name;
const char *codec_dev_name[2];
u32 asrc_fmt = 0;
@@ -744,6 +745,11 @@ static int fsl_asoc_card_probe(struct platform_device 
*pdev)
priv->codec_priv[0].fll_id = WM8904_CLK_FLL;
priv->codec_priv[0].pll_id = WM8904_FLL_MCLK;
priv->dai_fmt |= SND_SOC_DAIFMT_CBP_CFP;
+   } else if (of_device_is_compatible(np, "fsl,imx-audio-generic")) {
+   generic_codec_dai_names[0] = "dit-hifi";
+   generic_codec_dai_names[1] = "dir-hifi";
+   priv->dai_link[0].num_codecs = 2;
+   priv->dai_link[2].num_codecs = 2;
} else {
dev_err(>dev, "unknown Device Tree compatible\n");
ret = -EINVAL;
@@ -798,6 +804,12 @@ static int fsl_asoc_card_probe(struct platform_device 
*pdev)
ret = -EPROBE_DEFER;
goto asrc_fail;
}
+   if (of_device_is_compatible(np, "fsl,imx-audio-generic")
+ && !codec_dev[1]) {
+   dev_dbg(>dev, "failed to find second codec device\n");
+   ret = -EPROBE_DEFER;
+   goto asrc_fail;
+   }
 
/* Common settings for corresponding Freescale CPU DAI driver */
if (of_node_name_eq(cpu_np, "ssi")) {
@@ -855,11 +867,21 @@ static int fsl_asoc_card_probe(struct platform_device 
*pdev)
 
/* Normal DAI Link */
priv->dai_link[0].cpus->of_node = cpu_np;
-   priv->dai_link[0].codecs[0].dai_name = codec_dai_name;
 
-   if (!fsl_asoc_card_is_ac97(priv))
+   if (of_device_is_compatible(np, "fsl,imx-audio-generic")) {
+   priv->dai_link[0].codecs[0].dai_name =
+   generic_codec_dai_names[0];
+   priv->dai_link[0].codecs[1].dai_name =
+   generic_codec_dai_names[1];
+   } else {
+   priv->dai_link[0].codecs[0].dai_name = codec_dai_name;
+   }
+
+   if (!fsl_asoc_card_is_ac97(priv)) {
priv->dai_link[0].codecs[0].of_node = codec_np[0];
-   else {
+   if (of_device_is_compatible(np, "fsl,imx-audio-generic"))
+   priv->dai_link[0].codecs[1].of_node = codec_np[1];
+   } else {
u32 idx;
 
ret = of_property_read_u32(cpu_np, "cell-index", );
@@ -990,6 +1012,7 @@ static const struct of_device_id fsl_asoc_card_dt_ids[] = {
{ .compatible = "fsl,imx-audio-wm8958", },
{ .compatible = "fsl,imx-audio-nau8822", },
{ .compatible = "fsl,imx-audio-wm8904", },
+   { .compatible = "fsl,imx-audio-generic", },
{}
 };
 MODULE_DEVICE_TABLE(of, fsl_asoc_card_dt_ids);
-- 
2.34.1



[PATCHv4 5/9] ASoC: fsl-asoc-card: set generic codec as clock provider

2024-05-15 Thread Elinor Montmasson
The default dai format defined by DAI_FMT_BASE doesn't set if the codec
is consumer or provider of the bit and frame clocks.

S/PDIF DIR usually converts audio signal to an asynchronous I2S/PCM
stream, and doesn't consume a bit or frame clock.

As S/PDIF DIR and DIT are used as codecs for the generic use case,
set codecs as provider of both bit and frame clocks by default.

Signed-off-by: Elinor Montmasson 
Co-authored-by: Philip-Dylan Gleonec 
---
 sound/soc/fsl/fsl-asoc-card.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/sound/soc/fsl/fsl-asoc-card.c b/sound/soc/fsl/fsl-asoc-card.c
index a4ecc9093558..82ed7f4e81a1 100644
--- a/sound/soc/fsl/fsl-asoc-card.c
+++ b/sound/soc/fsl/fsl-asoc-card.c
@@ -750,6 +750,7 @@ static int fsl_asoc_card_probe(struct platform_device *pdev)
generic_codec_dai_names[1] = "dir-hifi";
priv->dai_link[0].num_codecs = 2;
priv->dai_link[2].num_codecs = 2;
+   priv->dai_fmt |= SND_SOC_DAIFMT_CBP_CFP;
} else {
dev_err(>dev, "unknown Device Tree compatible\n");
ret = -EINVAL;
-- 
2.34.1



[PATCHv4 1/9] ASoC: fsl-asoc-card: add support for dai links with multiple codecs

2024-05-15 Thread Elinor Montmasson
Add support for dai links using multiple codecs for multi-codec
use cases.

Signed-off-by: Elinor Montmasson 
Co-authored-by: Philip-Dylan Gleonec 
---
 sound/soc/fsl/fsl-asoc-card.c | 22 ++
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/sound/soc/fsl/fsl-asoc-card.c b/sound/soc/fsl/fsl-asoc-card.c
index 5ddc0c2fe53f..8a2a6e5461dc 100644
--- a/sound/soc/fsl/fsl-asoc-card.c
+++ b/sound/soc/fsl/fsl-asoc-card.c
@@ -815,10 +815,10 @@ static int fsl_asoc_card_probe(struct platform_device 
*pdev)
 
/* Normal DAI Link */
priv->dai_link[0].cpus->of_node = cpu_np;
-   priv->dai_link[0].codecs->dai_name = codec_dai_name;
+   priv->dai_link[0].codecs[0].dai_name = codec_dai_name;
 
if (!fsl_asoc_card_is_ac97(priv))
-   priv->dai_link[0].codecs->of_node = codec_np;
+   priv->dai_link[0].codecs[0].of_node = codec_np;
else {
u32 idx;
 
@@ -829,11 +829,11 @@ static int fsl_asoc_card_probe(struct platform_device 
*pdev)
goto asrc_fail;
}
 
-   priv->dai_link[0].codecs->name =
+   priv->dai_link[0].codecs[0].name =
devm_kasprintf(>dev, GFP_KERNEL,
   "ac97-codec.%u",
   (unsigned int)idx);
-   if (!priv->dai_link[0].codecs->name) {
+   if (!priv->dai_link[0].codecs[0].name) {
ret = -ENOMEM;
goto asrc_fail;
}
@@ -844,13 +844,19 @@ static int fsl_asoc_card_probe(struct platform_device 
*pdev)
priv->card.num_links = 1;
 
if (asrc_pdev) {
+   int i;
+   struct snd_soc_dai_link_component *codec;
+   struct snd_soc_dai_link *link;
+
/* DPCM DAI Links only if ASRC exists */
priv->dai_link[1].cpus->of_node = asrc_np;
priv->dai_link[1].platforms->of_node = asrc_np;
-   priv->dai_link[2].codecs->dai_name = codec_dai_name;
-   priv->dai_link[2].codecs->of_node = codec_np;
-   priv->dai_link[2].codecs->name =
-   priv->dai_link[0].codecs->name;
+   link = &(priv->dai_link[2]);
+   for_each_link_codecs(link, i, codec) {
+   codec->dai_name = priv->dai_link[0].codecs[i].dai_name;
+   codec->of_node = priv->dai_link[0].codecs[i].of_node;
+   codec->name = priv->dai_link[0].codecs[i].name;
+   }
priv->dai_link[2].cpus->of_node = cpu_np;
priv->dai_link[2].dai_fmt = priv->dai_fmt;
priv->card.num_links = 3;
-- 
2.34.1



[PATCHv4 7/9] ASoC: fsl-asoc-card: add DT clock "cpu_sysclk" with generic codec

2024-05-15 Thread Elinor Montmasson
Add an optional DT clock "cpu_sysclk" to get the CPU DAI system-clock
frequency when using the generic codec.
It is set for both Tx and Rx.
The way the frequency value is used is up to the CPU DAI driver
implementation.

Signed-off-by: Elinor Montmasson 
---
 sound/soc/fsl/fsl-asoc-card.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/sound/soc/fsl/fsl-asoc-card.c b/sound/soc/fsl/fsl-asoc-card.c
index 9aca8ad15372..c7fc9c16f761 100644
--- a/sound/soc/fsl/fsl-asoc-card.c
+++ b/sound/soc/fsl/fsl-asoc-card.c
@@ -754,6 +754,12 @@ static int fsl_asoc_card_probe(struct platform_device 
*pdev)
snd_soc_of_parse_tdm_slot(np, NULL, NULL,
>cpu_priv.slot_num,
>cpu_priv.slot_width);
+   struct clk *cpu_sysclk = clk_get(>dev, "cpu_sysclk");
+   if (!IS_ERR(cpu_sysclk)) {
+   priv->cpu_priv.sysclk_freq[TX] = 
clk_get_rate(cpu_sysclk);
+   priv->cpu_priv.sysclk_freq[RX] = 
priv->cpu_priv.sysclk_freq[TX];
+   clk_put(cpu_sysclk);
+   }
} else {
dev_err(>dev, "unknown Device Tree compatible\n");
ret = -EINVAL;
-- 
2.34.1



[PATCHv4 6/9] ASoC: fsl-asoc-card: add use of devicetree TDM slot properties

2024-05-15 Thread Elinor Montmasson
Add use of optional TDM slot properties "dai-tdm-slot-num" and
"dai-tdm-slot-width" through snd_soc_of_parse_tdm_slot().
They allow setting a custom TDM slot width in bits and number of slots
for the CPU DAI when using the generic codec.

Signed-off-by: Elinor Montmasson 
---
 sound/soc/fsl/fsl-asoc-card.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/sound/soc/fsl/fsl-asoc-card.c b/sound/soc/fsl/fsl-asoc-card.c
index 82ed7f4e81a1..9aca8ad15372 100644
--- a/sound/soc/fsl/fsl-asoc-card.c
+++ b/sound/soc/fsl/fsl-asoc-card.c
@@ -751,6 +751,9 @@ static int fsl_asoc_card_probe(struct platform_device *pdev)
priv->dai_link[0].num_codecs = 2;
priv->dai_link[2].num_codecs = 2;
priv->dai_fmt |= SND_SOC_DAIFMT_CBP_CFP;
+   snd_soc_of_parse_tdm_slot(np, NULL, NULL,
+   >cpu_priv.slot_num,
+   >cpu_priv.slot_width);
} else {
dev_err(>dev, "unknown Device Tree compatible\n");
ret = -EINVAL;
-- 
2.34.1



[PATCHv4 2/9] ASoC: fsl-asoc-card: add second dai link component for codecs

2024-05-15 Thread Elinor Montmasson
Add a second dai link component for codecs that will be used for the
generic codec use case.
It will use spdif_receiver and spdif_transmitter drivers as dummy codec
drivers, needing 2 codecs slots for the links.

To prevent deferring in use cases using only one codec, also set
by default the number of codecs to 1 for the relevant dai links.

Signed-off-by: Elinor Montmasson 
Co-authored-by: Philip-Dylan Gleonec 
---
 sound/soc/fsl/fsl-asoc-card.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/sound/soc/fsl/fsl-asoc-card.c b/sound/soc/fsl/fsl-asoc-card.c
index 8a2a6e5461dc..c83492e7cec2 100644
--- a/sound/soc/fsl/fsl-asoc-card.c
+++ b/sound/soc/fsl/fsl-asoc-card.c
@@ -296,7 +296,7 @@ static int be_hw_params_fixup(struct snd_soc_pcm_runtime 
*rtd,
 
 SND_SOC_DAILINK_DEFS(hifi,
DAILINK_COMP_ARRAY(COMP_EMPTY()),
-   DAILINK_COMP_ARRAY(COMP_EMPTY()),
+   DAILINK_COMP_ARRAY(COMP_EMPTY(), COMP_EMPTY()),
DAILINK_COMP_ARRAY(COMP_EMPTY()));
 
 SND_SOC_DAILINK_DEFS(hifi_fe,
@@ -306,7 +306,7 @@ SND_SOC_DAILINK_DEFS(hifi_fe,
 
 SND_SOC_DAILINK_DEFS(hifi_be,
DAILINK_COMP_ARRAY(COMP_EMPTY()),
-   DAILINK_COMP_ARRAY(COMP_EMPTY()));
+   DAILINK_COMP_ARRAY(COMP_EMPTY(), COMP_EMPTY()));
 
 static const struct snd_soc_dai_link fsl_asoc_card_dai[] = {
/* Default ASoC DAI Link*/
@@ -618,6 +618,8 @@ static int fsl_asoc_card_probe(struct platform_device *pdev)
 
memcpy(priv->dai_link, fsl_asoc_card_dai,
   sizeof(struct snd_soc_dai_link) * ARRAY_SIZE(priv->dai_link));
+   priv->dai_link[0].num_codecs = 1;
+   priv->dai_link[2].num_codecs = 1;
 
priv->card.dapm_routes = audio_map;
priv->card.num_dapm_routes = ARRAY_SIZE(audio_map);
-- 
2.34.1



[PATCHv4 8/9] ASoC: fsl-asoc-card: add DT property "cpu-system-clock-direction-out"

2024-05-15 Thread Elinor Montmasson
Add new optional DT property "cpu-system-clock-direction-out" to set
sysclk direction as "out" for the CPU DAI when using the generic codec.
It is set for both Tx and Rx.
If not set, the direction is "in".
The way the direction value is used is up to the CPU DAI driver
implementation.

Signed-off-by: Elinor Montmasson 
---
 sound/soc/fsl/fsl-asoc-card.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/sound/soc/fsl/fsl-asoc-card.c b/sound/soc/fsl/fsl-asoc-card.c
index c7fc9c16f761..f3fc2b29c92f 100644
--- a/sound/soc/fsl/fsl-asoc-card.c
+++ b/sound/soc/fsl/fsl-asoc-card.c
@@ -760,6 +760,10 @@ static int fsl_asoc_card_probe(struct platform_device 
*pdev)
priv->cpu_priv.sysclk_freq[RX] = 
priv->cpu_priv.sysclk_freq[TX];
clk_put(cpu_sysclk);
}
+   priv->cpu_priv.sysclk_dir[TX] =
+   of_property_read_bool(np, 
"cpu-system-clock-direction-out") ?
+   SND_SOC_CLOCK_OUT : SND_SOC_CLOCK_IN;
+   priv->cpu_priv.sysclk_dir[RX] = priv->cpu_priv.sysclk_dir[TX];
} else {
dev_err(>dev, "unknown Device Tree compatible\n");
ret = -EINVAL;
-- 
2.34.1



Re: [PATCH 2/3] crypto: X25519 core functions for ppc64le

2024-05-15 Thread Danny Tsen

Hi Andy,

Thanks for the info.  I should be able to do it.  I was hoping an 
assembly guru like you can show me some tricks here if there is :)


Thanks.

-Danny

On 5/15/24 8:33 AM, Andy Polyakov wrote:

+static void cswap(fe51 p, fe51 q, unsigned int bit)
+{
+    u64 t, i;
+    u64 c = 0 - (u64) bit;
+
+    for (i = 0; i < 5; ++i) {
+    t = c & (p[i] ^ q[i]);
+    p[i] ^= t;
+    q[i] ^= t;
+    }
+}


The "c" in cswap stands for "constant-time," and the problem is that 
contemporary compilers have exhibited the ability to produce 
non-constant-time machine code as result of compilation of the above 
kind of technique. The outcome is platform-specific and ironically 
some of PPC code generators were observed to generate "most" 
non-constant-time code. "Most" in sense that execution time 
variations would be most easy to catch.


Just to substantiate the point, consider 
https://godbolt.org/z/faYnEcPT7, and note the conditional branch in 
the middle of the loop, which flies in the face of constant-time-ness. 
In case you object 'bit &= 1' on line 7 in the C code. Indeed, if you 
comment it out, the generated code will be fine. But the point is that 
the compiler is capable of and was in fact observed to figure out that 
the caller passes either one or zero and generate the machine code in 
the assembly window. In other words 'bit &= 1' is just a reflection of 
what the caller does.


... the permanent solution is to do it in assembly. I can put 
together something...


Though you should be able to do this just as well :-) So should I or 
would you?


Cheers.



Re: [PATCH v15 00/16] Add audio support in v4l2 framework

2024-05-15 Thread Shengjiu Wang
On Wed, May 15, 2024 at 6:46 PM Jaroslav Kysela  wrote:
>
> On 15. 05. 24 12:19, Takashi Iwai wrote:
> > On Wed, 15 May 2024 11:50:52 +0200,
> > Jaroslav Kysela wrote:
> >>
> >> On 15. 05. 24 11:17, Hans Verkuil wrote:
> >>> Hi Jaroslav,
> >>>
> >>> On 5/13/24 13:56, Jaroslav Kysela wrote:
>  On 09. 05. 24 13:13, Jaroslav Kysela wrote:
> > On 09. 05. 24 12:44, Shengjiu Wang wrote:
>  mem2mem is just like the decoder in the compress pipeline. which is
>  one of the components in the pipeline.
> >>>
> >>> I was thinking of loopback with endpoints using compress streams,
> >>> without physical endpoint, something like:
> >>>
> >>> compress playback (to feed data from userspace) -> DSP (processing) ->
> >>> compress capture (send data back to userspace)
> >>>
> >>> Unless I'm missing something, you should be able to process data as 
> >>> fast
> >>> as you can feed it and consume it in such case.
> >>>
> >>
> >> Actually in the beginning I tried this,  but it did not work well.
> >> ALSA needs time control for playback and capture, playback and capture
> >> needs to synchronize.  Usually the playback and capture pipeline is
> >> independent in ALSA design,  but in this case, the playback and capture
> >> should synchronize, they are not independent.
> >
> > The core compress API core no strict timing constraints. You can 
> > eventually0
> > have two half-duplex compress devices, if you like to have really 
> > independent
> > mechanism. If something is missing in API, you can extend this API 
> > (like to
> > inform the user space that it's a producer/consumer processing without 
> > any
> > relation to the real time). I like this idea.
> 
>  I was thinking more about this. If I am right, the mentioned use in 
>  gstreamer
>  is supposed to run the conversion (DSP) job in "one shot" (can be handled
>  using one system call like blocking ioctl).  The goal is just to offload 
>  the
>  CPU work to the DSP (co-processor). If there are no requirements for the
>  queuing, we can implement this ioctl in the compress ALSA API easily 
>  using the
>  data management through the dma-buf API. We can eventually define a new
>  direction (enum snd_compr_direction) like SND_COMPRESS_CONVERT or so to 
>  allow
>  handle this new data scheme. The API may be extended later on real 
>  demand, of
>  course.
> 
>  Otherwise all pieces are already in the current ALSA compress API
>  (capabilities, params, enumeration). The realtime controls may be created
>  using ALSA control API.
> >>>
> >>> So does this mean that Shengjiu should attempt to use this ALSA approach 
> >>> first?
> >>
> >> I've not seen any argument to use v4l2 mem2mem buffer scheme for this
> >> data conversion forcefully. It looks like a simple job and ALSA APIs
> >> may be extended for this simple purpose.
> >>
> >> Shengjiu, what are your requirements for gstreamer support? Would be a
> >> new blocking ioctl enough for the initial support in the compress ALSA
> >> API?
> >
> > If it works with compress API, it'd be great, yeah.
> > So, your idea is to open compress-offload devices for read and write,
> > then and let them convert a la batch jobs without timing control?
> >
> > For full-duplex usages, we might need some more extensions, so that
> > both read and write parameters can be synchronized.  (So far the
> > compress stream is a unidirectional, and the runtime buffer for a
> > single stream.)
> >
> > And the buffer management is based on the fixed size fragments.  I
> > hope this doesn't matter much for the intended operation?
>
> It's a question, if the standard I/O is really required for this case. My
> quick idea was to just implement a new "direction" for this job supporting
> only one ioctl for the data processing which will execute the job in "one
> shot" at the moment. The I/O may be handled through dma-buf API (which seems
> to be standard nowadays for this purpose and allows future chaining).
>
> So something like:
>
> struct dsp_job {
> int source_fd; /* dma-buf FD with source data - for dma_buf_get() */
> int target_fd; /* dma-buf FD for target data - for dma_buf_get() */
> ... maybe some extra data size members here ...
> ... maybe some special parameters here ...
> };
>
> #define SNDRV_COMPRESS_DSPJOB _IOWR('C', 0x60, struct dsp_job)
>
> This ioctl will be blocking (thus synced). My question is, if it's feasible
> for gstreamer or not. For this particular case, if the rate conversion is
> implemented in software, it will block the gstreamer data processing, too.
>

Thanks.

I have several questions:
1.  Compress API alway binds to a sound card.  Can we avoid that?
 For ASRC, it is just one component,

2.  Compress API doesn't seem to support mmap().  Is this a problem
 for sending and getting data 

Re: [PATCH 2/3] crypto: X25519 core functions for ppc64le

2024-05-15 Thread Andy Polyakov

+static void cswap(fe51 p, fe51 q, unsigned int bit)
+{
+    u64 t, i;
+    u64 c = 0 - (u64) bit;
+
+    for (i = 0; i < 5; ++i) {
+    t = c & (p[i] ^ q[i]);
+    p[i] ^= t;
+    q[i] ^= t;
+    }
+}


The "c" in cswap stands for "constant-time," and the problem is that 
contemporary compilers have exhibited the ability to produce 
non-constant-time machine code as result of compilation of the above 
kind of technique. The outcome is platform-specific and ironically some 
of PPC code generators were observed to generate "most" 
non-constant-time code. "Most" in sense that execution time variations 
would be most easy to catch.


Just to substantiate the point, consider 
https://godbolt.org/z/faYnEcPT7, and note the conditional branch in the 
middle of the loop, which flies in the face of constant-time-ness. In 
case you object 'bit &= 1' on line 7 in the C code. Indeed, if you 
comment it out, the generated code will be fine. But the point is that 
the compiler is capable of and was in fact observed to figure out that 
the caller passes either one or zero and generate the machine code in 
the assembly window. In other words 'bit &= 1' is just a reflection of 
what the caller does.


... the permanent solution is to do it 
in assembly. I can put together something...


Though you should be able to do this just as well :-) So should I or 
would you?


Cheers.



Re: [PATCH 2/3] crypto: X25519 core functions for ppc64le

2024-05-15 Thread Danny Tsen

Hi Andy,

Points taken.  And much appreciate for the help.

Thanks.

-Danny

On 5/15/24 3:29 AM, Andy Polyakov wrote:

Hi,


+static void cswap(fe51 p, fe51 q, unsigned int bit)
+{
+    u64 t, i;
+    u64 c = 0 - (u64) bit;
+
+    for (i = 0; i < 5; ++i) {
+    t = c & (p[i] ^ q[i]);
+    p[i] ^= t;
+    q[i] ^= t;
+    }
+}


The "c" in cswap stands for "constant-time," and the problem is that 
contemporary compilers have exhibited the ability to produce 
non-constant-time machine code as result of compilation of the above 
kind of technique. The outcome is platform-specific and ironically 
some of PPC code generators were observed to generate "most" 
non-constant-time code. "Most" in sense that execution time variations 
would be most easy to catch. One way to work around the problem, at 
least for the time being, is to add 'asm volatile("" : "+r"(c))' after 
you calculate 'c'. But there is no guarantee that the next compiler 
version won't see through it, hence the permanent solution is to do it 
in assembly. I can put together something...


Cheers.



Re: [PATCH 1/3] crypto: X25519 low-level primitives for ppc64le.

2024-05-15 Thread Danny Tsen

See inline.

On 5/15/24 4:06 AM, Andy Polyakov wrote:

Hi,


+SYM_FUNC_START(x25519_fe51_sqr_times)
...
+
+.Lsqr_times_loop:
...
+
+    std    9,16(3)
+    std    10,24(3)
+    std    11,32(3)
+    std    7,0(3)
+    std    8,8(3)
+    bdnz    .Lsqr_times_loop


I see no reason for why the stores can't be moved outside the loop in 
question.



Yeah.  I'll fix it.



+SYM_FUNC_START(x25519_fe51_frombytes)
+.align    5
+
+    li    12, -1
+    srdi    12, 12, 13    # 0x7
+
+    ld    5, 0(4)
+    ld    6, 8(4)
+    ld    7, 16(4)
+    ld    8, 24(4)


Is there actual guarantee that the byte input is 64-bit aligned? While 
it is true that processor is obliged to handle misaligned loads and 
stores by the ISA specification, them being inefficient doesn't go 
against it. Most notably inefficiency is likely to be noted at the 
page boundaries. What I'm trying to say is that it would be more 
appropriate to avoid the unaligned loads (and stores).


Good point.  Maybe I can handle it with 64-bit aligned for the input.

Thanks.




Cheers.



Re: [PATCH 1/3] crypto: X25519 low-level primitives for ppc64le.

2024-05-15 Thread Danny Tsen

Thank you Andy.  Will fix this.

On 5/15/24 3:11 AM, Andy Polyakov wrote:

Hi,

Couple of remarks inline.


+# [1] https://www.openssl.org/~appro/cryptogams/


https://github.com/dot-asm/cryptogams/ is arguably better reference.


+SYM_FUNC_START(x25519_fe51_mul)
+.align    5


The goal is to align the label, not the first instruction after the 
directive. It's not a problem in this spot, in the beginning of the 
module that is, but further below it's likely to inject redundant nops 
between the label and meaningful code. But since the directive in 
question is not position-sensitive one can resolve this by changing 
the order of the directive and the SYM_FUNC_START macro.


Cheers.



Re: [PATCH 1/3] crypto: X25519 low-level primitives for ppc64le.

2024-05-15 Thread Andy Polyakov

Hi,


+SYM_FUNC_START(x25519_fe51_sqr_times)
...
+
+.Lsqr_times_loop:
...
+
+   std 9,16(3)
+   std 10,24(3)
+   std 11,32(3)
+   std 7,0(3)
+   std 8,8(3)
+   bdnz.Lsqr_times_loop


I see no reason for why the stores can't be moved outside the loop in 
question.



+SYM_FUNC_START(x25519_fe51_frombytes)
+.align 5
+
+   li  12, -1
+   srdi12, 12, 13  # 0x7
+
+   ld  5, 0(4)
+   ld  6, 8(4)
+   ld  7, 16(4)
+   ld  8, 24(4)


Is there actual guarantee that the byte input is 64-bit aligned? While 
it is true that processor is obliged to handle misaligned loads and 
stores by the ISA specification, them being inefficient doesn't go 
against it. Most notably inefficiency is likely to be noted at the page 
boundaries. What I'm trying to say is that it would be more appropriate 
to avoid the unaligned loads (and stores).


Cheers.



Re: [PATCH 2/3] crypto: X25519 core functions for ppc64le

2024-05-15 Thread Andy Polyakov

Hi,


+static void cswap(fe51 p, fe51 q, unsigned int bit)
+{
+   u64 t, i;
+   u64 c = 0 - (u64) bit;
+
+   for (i = 0; i < 5; ++i) {
+   t = c & (p[i] ^ q[i]);
+   p[i] ^= t;
+   q[i] ^= t;
+   }
+}


The "c" in cswap stands for "constant-time," and the problem is that 
contemporary compilers have exhibited the ability to produce 
non-constant-time machine code as result of compilation of the above 
kind of technique. The outcome is platform-specific and ironically some 
of PPC code generators were observed to generate "most" 
non-constant-time code. "Most" in sense that execution time variations 
would be most easy to catch. One way to work around the problem, at 
least for the time being, is to add 'asm volatile("" : "+r"(c))' after 
you calculate 'c'. But there is no guarantee that the next compiler 
version won't see through it, hence the permanent solution is to do it 
in assembly. I can put together something...


Cheers.



Re: [PATCH 1/3] crypto: X25519 low-level primitives for ppc64le.

2024-05-15 Thread Andy Polyakov

Hi,

Couple of remarks inline.


+# [1] https://www.openssl.org/~appro/cryptogams/


https://github.com/dot-asm/cryptogams/ is arguably better reference.


+SYM_FUNC_START(x25519_fe51_mul)
+.align 5


The goal is to align the label, not the first instruction after the 
directive. It's not a problem in this spot, in the beginning of the 
module that is, but further below it's likely to inject redundant nops 
between the label and meaningful code. But since the directive in 
question is not position-sensitive one can resolve this by changing the 
order of the directive and the SYM_FUNC_START macro.


Cheers.



Re: [PATCH v2 1/1] arch/fault: don't print logs for pte marker poison errors

2024-05-15 Thread Oscar Salvador
On Wed, May 15, 2024 at 12:41:42PM +0200, Borislav Petkov wrote:
> On Fri, May 10, 2024 at 11:29:26AM -0700, Axel Rasmussen wrote:
> > @@ -3938,7 +3938,7 @@ static vm_fault_t handle_pte_marker(struct vm_fault 
> > *vmf)
> >  
> > /* Higher priority than uffd-wp when data corrupted */
> > if (marker & PTE_MARKER_POISONED)
> > -   return VM_FAULT_HWPOISON;
> > +   return VM_FAULT_HWPOISON | VM_FAULT_HWPOISON_SILENT;
> 
> If you know here that this poisoning should be silent, why do you have
> to make it all complicated and propagate it into arch code, waste
> a separate VM_FAULT flag just for that instead of simply returning here
> a VM_FAULT_COMPLETED or some other innocuous value which would stop
> processing the fault?

AFAIK, He only wants it to be silent wrt. the arch fault handler not screaming,
but he still wants to be able to trigger force_sig_mceerr().


-- 
Oscar Salvador
SUSE Labs


Re: [PATCH v2 1/1] arch/fault: don't print logs for pte marker poison errors

2024-05-15 Thread Borislav Petkov
On Fri, May 10, 2024 at 11:29:26AM -0700, Axel Rasmussen wrote:
> @@ -3938,7 +3938,7 @@ static vm_fault_t handle_pte_marker(struct vm_fault 
> *vmf)
>  
>   /* Higher priority than uffd-wp when data corrupted */
>   if (marker & PTE_MARKER_POISONED)
> - return VM_FAULT_HWPOISON;
> + return VM_FAULT_HWPOISON | VM_FAULT_HWPOISON_SILENT;

If you know here that this poisoning should be silent, why do you have
to make it all complicated and propagate it into arch code, waste
a separate VM_FAULT flag just for that instead of simply returning here
a VM_FAULT_COMPLETED or some other innocuous value which would stop
processing the fault?

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: [PATCH v15 00/16] Add audio support in v4l2 framework

2024-05-15 Thread Jaroslav Kysela

On 15. 05. 24 12:19, Takashi Iwai wrote:

On Wed, 15 May 2024 11:50:52 +0200,
Jaroslav Kysela wrote:


On 15. 05. 24 11:17, Hans Verkuil wrote:

Hi Jaroslav,

On 5/13/24 13:56, Jaroslav Kysela wrote:

On 09. 05. 24 13:13, Jaroslav Kysela wrote:

On 09. 05. 24 12:44, Shengjiu Wang wrote:

mem2mem is just like the decoder in the compress pipeline. which is
one of the components in the pipeline.


I was thinking of loopback with endpoints using compress streams,
without physical endpoint, something like:

compress playback (to feed data from userspace) -> DSP (processing) ->
compress capture (send data back to userspace)

Unless I'm missing something, you should be able to process data as fast
as you can feed it and consume it in such case.



Actually in the beginning I tried this,  but it did not work well.
ALSA needs time control for playback and capture, playback and capture
needs to synchronize.  Usually the playback and capture pipeline is
independent in ALSA design,  but in this case, the playback and capture
should synchronize, they are not independent.


The core compress API core no strict timing constraints. You can eventually0
have two half-duplex compress devices, if you like to have really independent
mechanism. If something is missing in API, you can extend this API (like to
inform the user space that it's a producer/consumer processing without any
relation to the real time). I like this idea.


I was thinking more about this. If I am right, the mentioned use in gstreamer
is supposed to run the conversion (DSP) job in "one shot" (can be handled
using one system call like blocking ioctl).  The goal is just to offload the
CPU work to the DSP (co-processor). If there are no requirements for the
queuing, we can implement this ioctl in the compress ALSA API easily using the
data management through the dma-buf API. We can eventually define a new
direction (enum snd_compr_direction) like SND_COMPRESS_CONVERT or so to allow
handle this new data scheme. The API may be extended later on real demand, of
course.

Otherwise all pieces are already in the current ALSA compress API
(capabilities, params, enumeration). The realtime controls may be created
using ALSA control API.


So does this mean that Shengjiu should attempt to use this ALSA approach first?


I've not seen any argument to use v4l2 mem2mem buffer scheme for this
data conversion forcefully. It looks like a simple job and ALSA APIs
may be extended for this simple purpose.

Shengjiu, what are your requirements for gstreamer support? Would be a
new blocking ioctl enough for the initial support in the compress ALSA
API?


If it works with compress API, it'd be great, yeah.
So, your idea is to open compress-offload devices for read and write,
then and let them convert a la batch jobs without timing control?

For full-duplex usages, we might need some more extensions, so that
both read and write parameters can be synchronized.  (So far the
compress stream is a unidirectional, and the runtime buffer for a
single stream.)

And the buffer management is based on the fixed size fragments.  I
hope this doesn't matter much for the intended operation?


It's a question, if the standard I/O is really required for this case. My 
quick idea was to just implement a new "direction" for this job supporting 
only one ioctl for the data processing which will execute the job in "one 
shot" at the moment. The I/O may be handled through dma-buf API (which seems 
to be standard nowadays for this purpose and allows future chaining).


So something like:

struct dsp_job {
   int source_fd; /* dma-buf FD with source data - for dma_buf_get() */
   int target_fd; /* dma-buf FD for target data - for dma_buf_get() */
   ... maybe some extra data size members here ...
   ... maybe some special parameters here ...
};

#define SNDRV_COMPRESS_DSPJOB _IOWR('C', 0x60, struct dsp_job)

This ioctl will be blocking (thus synced). My question is, if it's feasible 
for gstreamer or not. For this particular case, if the rate conversion is 
implemented in software, it will block the gstreamer data processing, too.


Jaroslav

--
Jaroslav Kysela 
Linux Sound Maintainer; ALSA Project; Red Hat, Inc.



Re: [PATCH v2 1/1] arch/fault: don't print logs for pte marker poison errors

2024-05-15 Thread Oscar Salvador
On Tue, May 14, 2024 at 03:34:24PM -0600, Peter Xu wrote:
> The question is whether we can't.
> 
> Now we reserved a swp entry just for hwpoison and it makes sense only
> because we cached the poisoned pfn inside.  My long standing question is
> why do we ever need that pfn after all.  If we don't need the pfn, we
> simply need a bit in the pgtable entry saying that it's poisoned, if
> accessed we should kill the process using sigbus.
> 
> I used to comment on this before, the only path that uses that pfn is
> check_hwpoisoned_entry(), which was introduced in:
> 
> commit a3f5d80ea401ac857f2910e28b15f35b2cf902f4
> Author: Naoya Horiguchi 
> Date:   Mon Jun 28 19:43:14 2021 -0700
> 
> mm,hwpoison: send SIGBUS with error virutal address
> 
> Now an action required MCE in already hwpoisoned address surely sends a
> SIGBUS to current process, but the SIGBUS doesn't convey error virtual
> address.  That's not optimal for hwpoison-aware applications.
> 
> To fix the issue, make memory_failure() call kill_accessing_process(),
> that does pagetable walk to find the error virtual address.  It could find
> multiple virtual addresses for the same error page, and it seems hard to
> tell which virtual address is correct one.  But that's rare and sending
> incorrect virtual address could be better than no address.  So let's
> report the first found virtual address for now.
> 
> So this time I read more on this and Naoya explained why - it's only used
> so far to dump the VA of the poisoned entry.

Well, not really dumped, but we just pass the VA down the chain to the
signal handler.

But on the question whether we need the pfn encoded, I am not sure
actually.
check_hwpoisoned_entry() checks whether the pfn where the pte sits is
the same as the hwpoisoned one, so we know if the process has to be
killed.

Now, could we get away with using pte_page() instead of pte_pfn() in there, and
passing the hwpoisoned page instead ot the pfn?
I am trying to think of the implications, then we should not need the
encoded pfn?


> However what confused me is, if an entry is poisoned already logically we
> dump that message in the fault handler not memory_failure(), which is:
> 
> MCE: Killing uffd-unit-tests:650 due to hardware memory corruption fault at 
> 7f3589d7e000
> 
> So perhaps we're trying to also dump that when the MCEs (points to the same
> pfn) are only generated concurrently?  I donno much on hwpoison so I cannot
> tell, there's also implication where it's only triggered if
> MF_ACTION_REQUIRED.  But I think it means hwpoison may work without pfn
> encoded, but I don't know the implication to lose that dmesg line.

Not necesarily concurrently, but simply if for any reason the page could
not have been unmapped.
Let us say you have ProcessA and ProcessB mapping the same page.
We get an MCE for that page but we could not unmapped it, at least not
from all processes (maybe just ProcessA).

memory-failure code will mark it as hwpoison, now ProcessA faults it in
and gets killed because we see that the page is hwpoisoned in the fault
path, so we sill send VM_FAULT_HWPOISON all the way down till you see
the:

"MCE: Killing uffd-unit-tests:650 due to hardware memory corruption
fault at 7f3589d7e000" from e.g: arch/x86/mm/fault.c:do_sigbus()

Now, ProcessB still has the page mapped, so upon re-accessing it,
it will trigger a new MCE event. memory-failure code will see that this
page has already been marked as hwpoisoned, but since we failed to
unmap it (otherwise no one should be re-accessing it), it goes: "ok,
let's just kill everybody who has this page mapped".


> We used to not dump error for swapin error.  Note that here what I am
> saying is not that Axel is doing things wrong, but it's just that logically
> swapin error (as pte marker) can also be with !QUIET, so my final point is
> we may want to avoid having the assumption that "pte marker should always
> be QUITE", because I want to make it clear that pte marker can used in any
> form, so itself shouldn't imply anything..

I think it would make more sense if we have a separate marker for swapin
errors?
I mean, deep down, they do not mean the same as poison, right?

Then you can choose which events get to be silent because you do not
care, and which ones need to scream loud.
I think swapin errors belong to the latter. At least I a heads-up why a
process is getting killed is appreciated, IMHO.
 

-- 
Oscar Salvador
SUSE Labs


Re: [PATCH v15 00/16] Add audio support in v4l2 framework

2024-05-15 Thread Takashi Iwai
On Wed, 15 May 2024 11:50:52 +0200,
Jaroslav Kysela wrote:
> 
> On 15. 05. 24 11:17, Hans Verkuil wrote:
> > Hi Jaroslav,
> > 
> > On 5/13/24 13:56, Jaroslav Kysela wrote:
> >> On 09. 05. 24 13:13, Jaroslav Kysela wrote:
> >>> On 09. 05. 24 12:44, Shengjiu Wang wrote:
> >> mem2mem is just like the decoder in the compress pipeline. which is
> >> one of the components in the pipeline.
> > 
> > I was thinking of loopback with endpoints using compress streams,
> > without physical endpoint, something like:
> > 
> > compress playback (to feed data from userspace) -> DSP (processing) ->
> > compress capture (send data back to userspace)
> > 
> > Unless I'm missing something, you should be able to process data as fast
> > as you can feed it and consume it in such case.
> > 
>  
>  Actually in the beginning I tried this,  but it did not work well.
>  ALSA needs time control for playback and capture, playback and capture
>  needs to synchronize.  Usually the playback and capture pipeline is
>  independent in ALSA design,  but in this case, the playback and capture
>  should synchronize, they are not independent.
> >>> 
> >>> The core compress API core no strict timing constraints. You can 
> >>> eventually0
> >>> have two half-duplex compress devices, if you like to have really 
> >>> independent
> >>> mechanism. If something is missing in API, you can extend this API (like 
> >>> to
> >>> inform the user space that it's a producer/consumer processing without any
> >>> relation to the real time). I like this idea.
> >> 
> >> I was thinking more about this. If I am right, the mentioned use in 
> >> gstreamer
> >> is supposed to run the conversion (DSP) job in "one shot" (can be handled
> >> using one system call like blocking ioctl).  The goal is just to offload 
> >> the
> >> CPU work to the DSP (co-processor). If there are no requirements for the
> >> queuing, we can implement this ioctl in the compress ALSA API easily using 
> >> the
> >> data management through the dma-buf API. We can eventually define a new
> >> direction (enum snd_compr_direction) like SND_COMPRESS_CONVERT or so to 
> >> allow
> >> handle this new data scheme. The API may be extended later on real demand, 
> >> of
> >> course.
> >> 
> >> Otherwise all pieces are already in the current ALSA compress API
> >> (capabilities, params, enumeration). The realtime controls may be created
> >> using ALSA control API.
> > 
> > So does this mean that Shengjiu should attempt to use this ALSA approach 
> > first?
> 
> I've not seen any argument to use v4l2 mem2mem buffer scheme for this
> data conversion forcefully. It looks like a simple job and ALSA APIs
> may be extended for this simple purpose.
> 
> Shengjiu, what are your requirements for gstreamer support? Would be a
> new blocking ioctl enough for the initial support in the compress ALSA
> API?

If it works with compress API, it'd be great, yeah.
So, your idea is to open compress-offload devices for read and write,
then and let them convert a la batch jobs without timing control?

For full-duplex usages, we might need some more extensions, so that
both read and write parameters can be synchronized.  (So far the
compress stream is a unidirectional, and the runtime buffer for a
single stream.)

And the buffer management is based on the fixed size fragments.  I
hope this doesn't matter much for the intended operation?


thanks,

Takashi


Re: [PATCH v15 00/16] Add audio support in v4l2 framework

2024-05-15 Thread Jaroslav Kysela

On 15. 05. 24 11:17, Hans Verkuil wrote:

Hi Jaroslav,

On 5/13/24 13:56, Jaroslav Kysela wrote:

On 09. 05. 24 13:13, Jaroslav Kysela wrote:

On 09. 05. 24 12:44, Shengjiu Wang wrote:

mem2mem is just like the decoder in the compress pipeline. which is
one of the components in the pipeline.


I was thinking of loopback with endpoints using compress streams,
without physical endpoint, something like:

compress playback (to feed data from userspace) -> DSP (processing) ->
compress capture (send data back to userspace)

Unless I'm missing something, you should be able to process data as fast
as you can feed it and consume it in such case.



Actually in the beginning I tried this,  but it did not work well.
ALSA needs time control for playback and capture, playback and capture
needs to synchronize.  Usually the playback and capture pipeline is
independent in ALSA design,  but in this case, the playback and capture
should synchronize, they are not independent.


The core compress API core no strict timing constraints. You can eventually0
have two half-duplex compress devices, if you like to have really independent
mechanism. If something is missing in API, you can extend this API (like to
inform the user space that it's a producer/consumer processing without any
relation to the real time). I like this idea.


I was thinking more about this. If I am right, the mentioned use in gstreamer
is supposed to run the conversion (DSP) job in "one shot" (can be handled
using one system call like blocking ioctl).  The goal is just to offload the
CPU work to the DSP (co-processor). If there are no requirements for the
queuing, we can implement this ioctl in the compress ALSA API easily using the
data management through the dma-buf API. We can eventually define a new
direction (enum snd_compr_direction) like SND_COMPRESS_CONVERT or so to allow
handle this new data scheme. The API may be extended later on real demand, of
course.

Otherwise all pieces are already in the current ALSA compress API
(capabilities, params, enumeration). The realtime controls may be created
using ALSA control API.


So does this mean that Shengjiu should attempt to use this ALSA approach first?


I've not seen any argument to use v4l2 mem2mem buffer scheme for this data 
conversion forcefully. It looks like a simple job and ALSA APIs may be 
extended for this simple purpose.


Shengjiu, what are your requirements for gstreamer support? Would be a new 
blocking ioctl enough for the initial support in the compress ALSA API?


Jaroslav

--
Jaroslav Kysela 
Linux Sound Maintainer; ALSA Project; Red Hat, Inc.



Re: [PATCH v15 00/16] Add audio support in v4l2 framework

2024-05-15 Thread Hans Verkuil
Hi Jaroslav,

On 5/13/24 13:56, Jaroslav Kysela wrote:
> On 09. 05. 24 13:13, Jaroslav Kysela wrote:
>> On 09. 05. 24 12:44, Shengjiu Wang wrote:
> mem2mem is just like the decoder in the compress pipeline. which is
> one of the components in the pipeline.

 I was thinking of loopback with endpoints using compress streams,
 without physical endpoint, something like:

 compress playback (to feed data from userspace) -> DSP (processing) ->
 compress capture (send data back to userspace)

 Unless I'm missing something, you should be able to process data as fast
 as you can feed it and consume it in such case.

>>>
>>> Actually in the beginning I tried this,  but it did not work well.
>>> ALSA needs time control for playback and capture, playback and capture
>>> needs to synchronize.  Usually the playback and capture pipeline is
>>> independent in ALSA design,  but in this case, the playback and capture
>>> should synchronize, they are not independent.
>>
>> The core compress API core no strict timing constraints. You can eventually0
>> have two half-duplex compress devices, if you like to have really independent
>> mechanism. If something is missing in API, you can extend this API (like to
>> inform the user space that it's a producer/consumer processing without any
>> relation to the real time). I like this idea.
> 
> I was thinking more about this. If I am right, the mentioned use in gstreamer 
> is supposed to run the conversion (DSP) job in "one shot" (can be handled 
> using one system call like blocking ioctl).  The goal is just to offload the 
> CPU work to the DSP (co-processor). If there are no requirements for the 
> queuing, we can implement this ioctl in the compress ALSA API easily using 
> the 
> data management through the dma-buf API. We can eventually define a new 
> direction (enum snd_compr_direction) like SND_COMPRESS_CONVERT or so to allow 
> handle this new data scheme. The API may be extended later on real demand, of 
> course.
> 
> Otherwise all pieces are already in the current ALSA compress API 
> (capabilities, params, enumeration). The realtime controls may be created 
> using ALSA control API.

So does this mean that Shengjiu should attempt to use this ALSA approach first?

If there is a way to do this reasonably cleanly in the ALSA API, then that
obviously is much better from my perspective as a media maintainer.

My understanding was always that it can't be done (or at least not without
a major effort) in ALSA, and in that case V4L2 is a decent plan B, but based
on this I gather that it is possible in ALSA after all.

So can I shelf this patch series for now?

Regards,

Hans



linux-next: build warning after merge of the powerpc tree

2024-05-15 Thread Stephen Rothwell
Hi all,

After merging the powerpc tree, today's (it may have been yesterday's)
linux-next build (powerpc allyesconfig) produced this warning:

WARNING: modpost: vmlinux: section mismatch in reference: 
fadump_setup_param_area+0x200 (section: .text.fadump_setup_param_area) -> 
memblock_phys_alloc_range (section: .init.text)

Introduced by commit

  683eab94da75 ("powerpc/fadump: setup additional parameters for dump capture 
kernel")

-- 
Cheers,
Stephen Rothwell


pgpgHE8qJ2WgS.pgp
Description: OpenPGP digital signature