date:20240515

Thanks for the info.  I should be able to do it.  I was hoping an 
assembly guru like you could show me some tricks here, if there are any :)


No tricks in cswap; it's as straightforward as it gets, so go ahead :-)

Re: [PATCH v15 00/16] Add audio support in v4l2 framework

2024-05-15 Pierre-Louis Bossart

On 5/9/24 06:13, Jaroslav Kysela wrote:
> On 09. 05. 24 12:44, Shengjiu Wang wrote:
>>>> mem2mem is just like the decoder in the compress pipeline, which is
>>>> one of the components in the pipeline.
>>>
>>> I was thinking of loopback with endpoints using compress streams,
>>> without physical endpoint, something like:
>>>
>>> compress playback (to feed data from userspace) -> DSP (processing) ->
>>> compress capture (send data back to userspace)
>>>
>>> Unless I'm missing something, you should be able to process data as fast
>>> as you can feed it and consume it in such case.
>>>
>>
>> Actually, in the beginning I tried this, but it did not work well.
>> ALSA needs timing control for playback and capture; playback and capture
>> need to be synchronized. Usually the playback and capture pipelines are
>> independent in the ALSA design, but in this case the playback and capture
>> should be synchronized; they are not independent.
> 
> The core compress API has no strict timing constraints. You can
> eventually have two half-duplex compress devices, if you'd like a
> really independent mechanism. If something is missing in the API, you can
> extend it (for example, to inform user space that it's a
> producer/consumer processing without any relation to real time). I
> like this idea.

The compress API was never intended to be used this way. It was meant to
send compressed data to a DSP for rendering, and keep the host processor
in a low-power state while the DSP local buffer was drained. There was
no intent to do a loopback to the host, because that keeps the host in
a high-power state and probably negates the power savings due to a DSP.

The other problem with the loopback is that the compress stuff is
usually a "Front-End" in ASoC/DPCM parlance, and we don't have a good
way to do a loopback between Front-Ends. The entire framework is based
on FEs being connected to BEs.

One problem that I can see for ASRC is that it's not clear when the data
will be completely processed on the "capture" stream when you stop the
"playback" stream. There's a non-zero risk of having a truncated output
or waiting for data that will never be generated.

In other words, it might be possible to reuse/extend the compress API
for a 'coprocessor' approach without any rendering to traditional
interfaces, but it's uncharted territory.

[PATCHv4 3/9] ASoC: fsl-asoc-card: add compatibility to use 2 codecs in dai-links

Adapt the driver to work with configurations using two codecs or more.
Modify fsl_asoc_card_probe() to handle use cases where 2 codecs are
given in the device tree.
This will be needed for the generic codec case.

Use cases using one codec will ignore any given codecs other than the
first.

Signed-off-by: Elinor Montmasson 
Co-authored-by: Philip-Dylan Gleonec 
---
 sound/soc/fsl/fsl-asoc-card.c | 239 --
 1 file changed, 139 insertions(+), 100 deletions(-)

diff --git a/sound/soc/fsl/fsl-asoc-card.c b/sound/soc/fsl/fsl-asoc-card.c
index c83492e7cec2..620a25eb068a 100644
--- a/sound/soc/fsl/fsl-asoc-card.c
+++ b/sound/soc/fsl/fsl-asoc-card.c
@@ -99,7 +99,7 @@ struct fsl_asoc_card_priv {
struct simple_util_jack hp_jack;
struct simple_util_jack mic_jack;
struct platform_device *pdev;
-   struct codec_priv codec_priv;
+   struct codec_priv codec_priv[2];
struct cpu_priv cpu_priv;
struct snd_soc_card card;
u8 streams;
@@ -172,11 +172,13 @@ static int fsl_asoc_card_hw_params(struct snd_pcm_substream *substream,
struct snd_soc_pcm_runtime *rtd = snd_soc_substream_to_rtd(substream);
struct fsl_asoc_card_priv *priv = snd_soc_card_get_drvdata(rtd->card);
bool tx = substream->stream == SNDRV_PCM_STREAM_PLAYBACK;
-   struct codec_priv *codec_priv = &priv->codec_priv;
+   struct codec_priv *codec_priv;
+   struct snd_soc_dai *codec_dai;
struct cpu_priv *cpu_priv = &priv->cpu_priv;
struct device *dev = rtd->card->dev;
unsigned int pll_out;
int ret;
+   int i;
 
priv->sample_rate = params_rate(params);
priv->sample_format = params_format(params);
@@ -208,28 +210,32 @@ static int fsl_asoc_card_hw_params(struct snd_pcm_substream *substream,
}
 
/* Specific configuration for PLL */
-   if (codec_priv->pll_id >= 0 && codec_priv->fll_id >= 0) {
-   if (priv->sample_format == SNDRV_PCM_FORMAT_S24_LE)
-   pll_out = priv->sample_rate * 384;
-   else
-   pll_out = priv->sample_rate * 256;
+   for_each_rtd_codec_dais(rtd, i, codec_dai) {
+   codec_priv = &priv->codec_priv[i];
 
-   ret = snd_soc_dai_set_pll(snd_soc_rtd_to_codec(rtd, 0),
- codec_priv->pll_id,
- codec_priv->mclk_id,
- codec_priv->mclk_freq, pll_out);
-   if (ret) {
-   dev_err(dev, "failed to start FLL: %d\n", ret);
-   goto fail;
-   }
+   if (codec_priv->pll_id >= 0 && codec_priv->fll_id >= 0) {
+   if (priv->sample_format == SNDRV_PCM_FORMAT_S24_LE)
+   pll_out = priv->sample_rate * 384;
+   else
+   pll_out = priv->sample_rate * 256;
 
-   ret = snd_soc_dai_set_sysclk(snd_soc_rtd_to_codec(rtd, 0),
-codec_priv->fll_id,
-pll_out, SND_SOC_CLOCK_IN);
+   ret = snd_soc_dai_set_pll(codec_dai,
+   codec_priv->pll_id,
+   codec_priv->mclk_id,
+   codec_priv->mclk_freq, pll_out);
+   if (ret) {
+   dev_err(dev, "failed to start FLL: %d\n", ret);
+   goto fail;
+   }
 
-   if (ret && ret != -ENOTSUPP) {
-   dev_err(dev, "failed to set SYSCLK: %d\n", ret);
-   goto fail;
+   ret = snd_soc_dai_set_sysclk(codec_dai,
+   codec_priv->fll_id,
+   pll_out, SND_SOC_CLOCK_IN);
+
+   if (ret && ret != -ENOTSUPP) {
+   dev_err(dev, "failed to set SYSCLK: %d\n", ret);
+   goto fail;
+   }
}
}
 
@@ -244,28 +250,34 @@ static int fsl_asoc_card_hw_free(struct snd_pcm_substream *substream)
 {
struct snd_soc_pcm_runtime *rtd = snd_soc_substream_to_rtd(substream);
struct fsl_asoc_card_priv *priv = snd_soc_card_get_drvdata(rtd->card);
-   struct codec_priv *codec_priv = &priv->codec_priv;
+   struct codec_priv *codec_priv;
+   struct snd_soc_dai *codec_dai;
struct device *dev = rtd->card->dev;
int ret;
+   int i;
 
priv->streams &= ~BIT(substream->stream);
 
-   if (!priv->streams && codec_priv->pll_id >= 0 && codec_priv->fll_id >= 0) {
-   /* Force freq to be free_freq to avoid error message in codec */
-   ret =
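The per-codec loop in the hw_params hunk can be modelled in user space as follows. This is a simplified sketch, not driver code: `struct codec_cfg`, `configure_codecs()` and the `set_sysclk` hook are hypothetical stand-ins for `struct codec_priv` and `snd_soc_dai_set_sysclk()`, and the `snd_soc_dai_set_pll()` step (whose failure is fatal in the hunk) is left out. It keeps the rules visible in the diff: the PLL output is 384 * fs for S24_LE and 256 * fs otherwise, codecs with a negative `pll_id` or `fll_id` are skipped, and `-ENOTSUPP` from the sysclk call is tolerated.

```c
#include <errno.h>
#include <stddef.h>

/* ENOTSUPP is kernel-internal and not exported to userspace headers. */
#ifndef ENOTSUPP
#define ENOTSUPP 524
#endif

/* Hypothetical stand-ins for the sample formats the hunk distinguishes. */
enum sample_fmt { FMT_S16_LE, FMT_S24_LE };

/* Hypothetical stand-in for struct codec_priv, one entry per codec DAI. */
struct codec_cfg {
	int pll_id;                           /* < 0: no PLL to program */
	int fll_id;                           /* < 0: no FLL to program */
	int (*set_sysclk)(unsigned int freq); /* hypothetical sysclk hook */
};

/* Same ratio rule as the hunk: 384 * fs for S24_LE, 256 * fs otherwise. */
static unsigned int pll_out_for(unsigned int rate, enum sample_fmt fmt)
{
	return fmt == FMT_S24_LE ? rate * 384 : rate * 256;
}

/*
 * Visit every codec, skip those without PLL/FLL ids, and tolerate
 * -ENOTSUPP from the sysclk hook. Returns 0 or the first hard error;
 * *configured counts the codecs whose clocks were requested without
 * a hard error.
 */
static int configure_codecs(const struct codec_cfg *codecs, int n,
			    unsigned int rate, enum sample_fmt fmt,
			    int *configured)
{
	*configured = 0;
	for (int i = 0; i < n; i++) {
		const struct codec_cfg *c = &codecs[i];

		if (c->pll_id < 0 || c->fll_id < 0)
			continue;

		int ret = c->set_sysclk ? c->set_sysclk(pll_out_for(rate, fmt)) : 0;
		if (ret && ret != -ENOTSUPP)
			return ret;
		(*configured)++;
	}
	return 0;
}

/* Example hooks for exercising the loop. */
static int sysclk_unsupported(unsigned int freq) { (void)freq; return -ENOTSUPP; }
static int sysclk_fail(unsigned int freq) { (void)freq; return -EINVAL; }
```

For example, with one codec that has PLL/FLL ids and one S/PDIF dummy codec with both ids set to -1, only the first codec is visited, and at 48 kHz S24_LE its PLL target is 48000 * 384 = 18432000 Hz.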

[PATCHv4 9/9] ASoC: dt-bindings: fsl-asoc-card: add compatible for generic codec

Add documentation for the new DT bindings that come with the new
support for the compatible "fsl,imx-audio-generic".

Some CPU DAIs don't require a real audio codec. The new compatible
"fsl,imx-audio-generic" allows using the driver with the codec drivers
SPDIF DIT and SPDIF DIR as dummy codecs.
It also allows using audio codecs that are not pre-configured in the
driver and don't require specific control through a codec driver.

The new DT properties make it possible to set some CPU DAI parameters
that are usually set through the codec configuration.

Signed-off-by: Elinor Montmasson 
---
 .../bindings/sound/fsl-asoc-card.yaml | 96 ++-
 1 file changed, 92 insertions(+), 4 deletions(-)

diff --git a/Documentation/devicetree/bindings/sound/fsl-asoc-card.yaml b/Documentation/devicetree/bindings/sound/fsl-asoc-card.yaml
index 9922664d5ccc..332d8bf96e06 100644
--- a/Documentation/devicetree/bindings/sound/fsl-asoc-card.yaml
+++ b/Documentation/devicetree/bindings/sound/fsl-asoc-card.yaml
@@ -23,6 +23,16 @@ description:
   and PCM DAI formats. However, it'll be also possible to support those non
   AC'97/I2S/PCM type sound cards, such as S/PDIF audio and HDMI audio, as
   long as the driver has been properly upgraded.
+  To use CPU DAIs that do not require a codec such as an S/PDIF controller,
+  or to use a DAI to output or capture raw I2S/TDM data, you can
+  use the compatible "fsl,imx-audio-generic".
+
+definitions:
+  imx-audio-generic-dependency:
+    properties:
+      compatible:
+        contains:
+          const: fsl,imx-audio-generic
 
 maintainers:
   - Shengjiu Wang 
@@ -81,6 +91,7 @@ properties:
   - fsl,imx-audio-wm8960
   - fsl,imx-audio-wm8962
   - fsl,imx-audio-wm8958
+  - fsl,imx-audio-generic
 
   model:
     $ref: /schemas/types.yaml#/definitions/string
@@ -93,8 +104,14 @@ properties:
       need to add ASRC support via DPCM.
 
   audio-codec:
-    $ref: /schemas/types.yaml#/definitions/phandle
-    description: The phandle of an audio codec
+    $ref: /schemas/types.yaml#/definitions/phandle-array
+    description: |
+      The phandle of an audio codec.
+      If using the "fsl,imx-audio-generic" compatible, give instead a pair of
+      phandles with the spdif_transmitter first (driver SPDIF DIT) and the
+      spdif_receiver second (driver SPDIF DIR).
+    items:
+      maxItems: 1
 
   audio-cpu:
     $ref: /schemas/types.yaml#/definitions/phandle
@@ -150,8 +167,8 @@ properties:
     description: dai-link uses bit clock inversion.
 
   mclk-id:
-    $ref: /schemas/types.yaml#/definitions/uint32
-    description: main clock id, specific for each card configuration.
+    $ref: /schemas/types.yaml#/definitions/uint32-array
+    description: Main clock id for each codec, specific for each card configuration.
 
   mux-int-port:
     $ref: /schemas/types.yaml#/definitions/uint32
@@ -167,10 +184,68 @@ properties:
     $ref: /schemas/types.yaml#/definitions/phandle
     description: The phandle of an CPU DAI controller
 
+  # Properties relevant only with "fsl,imx-audio-generic" compatible
+  dai-tdm-slot-width:
+    description: See tdm-slot.txt.
+    $ref: /schemas/types.yaml#/definitions/uint32
+
+  dai-tdm-slot-num:
+    description: See tdm-slot.txt.
+    $ref: /schemas/types.yaml#/definitions/uint32
+
+  clocks:
+    description: |
+      The CPU DAI system clock, used to retrieve
+      the CPU DAI system clock frequency with the generic codec.
+    maxItems: 1
+
+  clock-names:
+    items:
+      - const: cpu_sysclk
+
+  cpu-system-clock-direction-out:
+    description: |
+      Specifies cpu system clock direction as 'out' on initialization.
+      If not set, direction is 'in'.
+    $ref: /schemas/types.yaml#/definitions/flag
+
+dependencies:
+  dai-tdm-slot-width:
+    $ref: "#/definitions/imx-audio-generic-dependency"
+  dai-tdm-slot-num:
+    $ref: "#/definitions/imx-audio-generic-dependency"
+  clocks:
+    $ref: "#/definitions/imx-audio-generic-dependency"
+  cpu-system-clock-direction-out:
+    $ref: "#/definitions/imx-audio-generic-dependency"
+
 required:
   - compatible
   - model
 
+allOf:
+  - if:
+      $ref: "#/definitions/imx-audio-generic-dependency"
+    then:
+      properties:
+        audio-codec:
+          items:
+            - description: SPDIF DIT phandle
+            - description: SPDIF DIR phandle
+        mclk-id:
+          maxItems: 1
+          items:
+            minItems: 1
+            maxItems: 2
+    else:
+      properties:
+        audio-codec:
+          maxItems: 1
+        mclk-id:
+          maxItems: 1
+          items:
+            maxItems: 1
+
 unevaluatedProperties: false
 
 examples:
@@ -195,3 +270,16 @@ examples:
  "AIN2L", "Line In Jack",
  "AIN2R", "Line In Jack";
 };
+
+  - |
+    #include 
+    sound-spdif-asrc {
+      compatible = "fsl,imx-audio-generic";
+      model = "spdif-asrc-audio";
+      audio-cpu = <>;
+      audio-asrc = <>;
+      audio-codec = <>,
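The `dependencies` entries in this binding make each listed property, when present, pull in the `imx-audio-generic-dependency` schema, so the node's compatible must contain "fsl,imx-audio-generic". As a rough model of that rule only (real validation is done by dt-schema, not by code like this), the hypothetical `node_satisfies_generic_deps()` below takes a node's property names and compatible strings as plain string arrays:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Properties the binding restricts to the generic compatible. */
static const char *const generic_only_props[] = {
	"dai-tdm-slot-width",
	"dai-tdm-slot-num",
	"clocks",
	"cpu-system-clock-direction-out",
};

static bool has_string(const char *const *list, size_t n, const char *s)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(list[i], s) == 0)
			return true;
	return false;
}

/*
 * Model of the schema's `dependencies` rule: if any generic-only property
 * is present, the compatible list must contain "fsl,imx-audio-generic".
 */
static bool node_satisfies_generic_deps(const char *const *props, size_t nprops,
					const char *const *compat, size_t ncompat)
{
	bool is_generic = has_string(compat, ncompat, "fsl,imx-audio-generic");
	size_t ngen = sizeof(generic_only_props) / sizeof(generic_only_props[0]);

	for (size_t i = 0; i < ngen; i++)
		if (has_string(props, nprops, generic_only_props[i]) && !is_generic)
			return false;
	return true;
}
```

Under this model, a "fsl,imx-audio-wm8960" node that sets `dai-tdm-slot-num` is rejected, while the same node without any generic-only property is accepted.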

[PATCHv4 0/9] ASoC: fsl-asoc-card: compatibility integration of a generic codec use case for use with S/PDIF controller

Hello,

This is v4 of the patch series aiming to make the machine driver
"fsl-asoc-card" compatible with use cases where there is no real codec
driver. It proposes to use the "spdif_receiver" and "spdif_transmitter"
drivers instead of the dummy codec.
This is a first step in using the S/PDIF controller with the ASRC.

The first five patches add compatibility with the pair of codecs
"spdif_receiver" and "spdif_transmitter" with a new compatible,
"fsl,imx-audio-generic". Codec parameters are set with default values.
Consequently, the driver is modified to work with multi-codec use cases.
It can get 2 codec phandles from the device tree, and the
"fsl_asoc_card_priv" struct now has 2 "codec_priv" to store properties
of both codecs. It is fixed to 2 codecs as only "fsl,imx-audio-generic"
uses multiple codecs at the moment.
However, the driver now uses "for_each_codecs" macros when possible to
ease future implementations of multi-codec configurations.

The next three patches add configuration options for the
devicetree. They configure the CPU DAI when using
"fsl,imx-audio-generic". These options are usually hard-coded in
"fsl-asoc-card.c" for each audio codec. Because the generic codec could
be used with CPU DAIs other than the S/PDIF controller, setting these
parameters could be required.
These new options try to follow the style of the simple-card driver:
* standard TDM properties are used, as defined in "tdm-slot.txt".
* the CPU DAI system-clock can be specified, allowing the codec to
retrieve its frequency.
* the CPU DAI system-clock direction can be specified through a new
binding, the same way it is done in simple-card.

The last commit updates the DT bindings documentation and adds a new
example for the generic codec use case.

This patch series was successfully built for arm64 and x86 on top of
the latest "for-next" branch of the ASoC git tree on the 14th of May
2024.
These modifications have also been tested on an i.MX8MN evaluation
board, with an RT Linux kernel, v6.1.26-rt8.

If you have any questions or remarks about these commits, don't hesitate
to reply.

Best regards,
Elinor Montmasson

Changelog:
v3 -> v4:
* Use the standard TDM bindings, as defined in "tdm-slot.txt", for the
new optional DT bindings setting the TDM slots number and width.
* Use the clock DT bindings to optionally specify the CPU DAI system
clock frequency, instead of a dedicated new binding.
* Rename the new DT binding "cpu-sysclk-dir-out" to
"cpu-system-clock-direction-out" to better follow the style of the
simple-card driver.
* Merge TX and RX bindings for CPU DAI system-clock, to better follow the
style of the simple-card driver, and also as there was no use case in
fsl-asoc-card where TX and RX settings had to be different.
* Add the documentation for the new bindings in the new DT schema
bindings documentation. Also add an example with the generic codec.
* v3 patch series at:
https://lore.kernel.org/alsa-devel/20231218124058.2047167-1-elinor.montmas...@savoirfairelinux.com/

v2 -> v3:
* When the bitmaster or framemaster are retrieved from the device tree,
the driver will now compare them with the two codecs possibly given in
device tree, and not just the first codec.
* Improve driver modifications to use multiple codecs for better
integration of future multi-codec use cases:
  * Use "for_each_codec" macros when possible.
  * "fsl_asoc_card_priv" struct now has 2 "codec_priv" as the driver
    can currently retrieve 2 codec phandles from the device tree.
* Fix subject of patch 10/10 to follow the style of the subsystem
* v2 patch series at:
https://lore.kernel.org/alsa-devel/20231027144734.3654829-1-elinor.montmas...@savoirfairelinux.com/

v1 -> v2:
* Replace use of the dummy codec by the pair of codecs spdif_receiver /
spdif_transmitter.
* Adapt how dai-link codecs are used to take into account the
possibility of multiple codecs per link.
* Change compatible name.
* Adapt driver to be able to register two codecs given in the device
tree.
* v1 patch series at:
https://lore.kernel.org/alsa-devel/20230901144550.520072-1-elinor.montmas...@savoirfairelinux.com/

Elinor Montmasson (9):
ASoC: fsl-asoc-card: add support for dai links with multiple codecs
ASoC: fsl-asoc-card: add second dai link component for codecs
ASoC: fsl-asoc-card: add compatibility to use 2 codecs in dai-links
ASoC: fsl-asoc-card: add new compatible for a generic codec use case
ASoC: fsl-asoc-card: set generic codec as clock provider
ASoC: fsl-asoc-card: add use of devicetree TDM slot properties
ASoC: fsl-asoc-card: add DT clock "cpu_sysclk" with generic codec
ASoC: fsl-asoc-card: add DT property "cpu-system-clock-direction-out"
ASoC: dt-bindings: fsl-asoc-card: add compatible for generic codec

.../bindings/sound/fsl-asoc-card.yaml | 96 +-
sound/soc/fsl/fsl-asoc-card.c | 306 +++---
2 files changed, 287 insertions(+), 115 deletions(-)

--
2.34.1

[PATCHv4 4/9] ASoC: fsl-asoc-card: add new compatible for a generic codec use case

Add the new compatible "fsl,imx-audio-generic" for a generic codec
use case. It allows using the fsl-asoc-card driver with the
spdif_receiver and spdif_transmitter codec drivers used as dummy codecs.
It can be used where there is no real codec, or with codecs that do
not require declaring controls.

Signed-off-by: Elinor Montmasson 
Co-authored-by: Philip-Dylan Gleonec 
---
 sound/soc/fsl/fsl-asoc-card.c | 29 ++---
 1 file changed, 26 insertions(+), 3 deletions(-)

diff --git a/sound/soc/fsl/fsl-asoc-card.c b/sound/soc/fsl/fsl-asoc-card.c
index 620a25eb068a..a4ecc9093558 100644
--- a/sound/soc/fsl/fsl-asoc-card.c
+++ b/sound/soc/fsl/fsl-asoc-card.c
@@ -567,6 +567,7 @@ static int fsl_asoc_card_probe(struct platform_device *pdev)
struct platform_device *cpu_pdev;
struct fsl_asoc_card_priv *priv;
struct device *codec_dev[2] = { NULL, NULL };
+   const char *generic_codec_dai_names[2];
const char *codec_dai_name;
const char *codec_dev_name[2];
u32 asrc_fmt = 0;
@@ -744,6 +745,11 @@ static int fsl_asoc_card_probe(struct platform_device *pdev)
priv->codec_priv[0].fll_id = WM8904_CLK_FLL;
priv->codec_priv[0].pll_id = WM8904_FLL_MCLK;
priv->dai_fmt |= SND_SOC_DAIFMT_CBP_CFP;
+   } else if (of_device_is_compatible(np, "fsl,imx-audio-generic")) {
+   generic_codec_dai_names[0] = "dit-hifi";
+   generic_codec_dai_names[1] = "dir-hifi";
+   priv->dai_link[0].num_codecs = 2;
+   priv->dai_link[2].num_codecs = 2;
} else {
dev_err(&pdev->dev, "unknown Device Tree compatible\n");
ret = -EINVAL;
@@ -798,6 +804,12 @@ static int fsl_asoc_card_probe(struct platform_device *pdev)
ret = -EPROBE_DEFER;
goto asrc_fail;
}
+   if (of_device_is_compatible(np, "fsl,imx-audio-generic")
+ && !codec_dev[1]) {
+   dev_dbg(&pdev->dev, "failed to find second codec device\n");
+   ret = -EPROBE_DEFER;
+   goto asrc_fail;
+   }
 
/* Common settings for corresponding Freescale CPU DAI driver */
if (of_node_name_eq(cpu_np, "ssi")) {
@@ -855,11 +867,21 @@ static int fsl_asoc_card_probe(struct platform_device *pdev)
 
/* Normal DAI Link */
priv->dai_link[0].cpus->of_node = cpu_np;
-   priv->dai_link[0].codecs[0].dai_name = codec_dai_name;
 
-   if (!fsl_asoc_card_is_ac97(priv))
+   if (of_device_is_compatible(np, "fsl,imx-audio-generic")) {
+   priv->dai_link[0].codecs[0].dai_name =
+   generic_codec_dai_names[0];
+   priv->dai_link[0].codecs[1].dai_name =
+   generic_codec_dai_names[1];
+   } else {
+   priv->dai_link[0].codecs[0].dai_name = codec_dai_name;
+   }
+
+   if (!fsl_asoc_card_is_ac97(priv)) {
priv->dai_link[0].codecs[0].of_node = codec_np[0];
-   else {
+   if (of_device_is_compatible(np, "fsl,imx-audio-generic"))
+   priv->dai_link[0].codecs[1].of_node = codec_np[1];
+   } else {
u32 idx;
 
ret = of_property_read_u32(cpu_np, "cell-index", &idx);
@@ -990,6 +1012,7 @@ static const struct of_device_id fsl_asoc_card_dt_ids[] = {
{ .compatible = "fsl,imx-audio-wm8958", },
{ .compatible = "fsl,imx-audio-nau8822", },
{ .compatible = "fsl,imx-audio-wm8904", },
+   { .compatible = "fsl,imx-audio-generic", },
{}
 };
 MODULE_DEVICE_TABLE(of, fsl_asoc_card_dt_ids);
-- 
2.34.1

[PATCHv4 5/9] ASoC: fsl-asoc-card: set generic codec as clock provider

The default dai format defined by DAI_FMT_BASE doesn't specify whether the
codec is consumer or provider of the bit and frame clocks.

S/PDIF DIR usually converts an audio signal to an asynchronous I2S/PCM
stream, and doesn't consume a bit or frame clock.

As S/PDIF DIR and DIT are used as codecs for the generic use case,
set the codecs as providers of both bit and frame clocks by default.

Signed-off-by: Elinor Montmasson 
Co-authored-by: Philip-Dylan Gleonec 
---
 sound/soc/fsl/fsl-asoc-card.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/sound/soc/fsl/fsl-asoc-card.c b/sound/soc/fsl/fsl-asoc-card.c
index a4ecc9093558..82ed7f4e81a1 100644
--- a/sound/soc/fsl/fsl-asoc-card.c
+++ b/sound/soc/fsl/fsl-asoc-card.c
@@ -750,6 +750,7 @@ static int fsl_asoc_card_probe(struct platform_device *pdev)
generic_codec_dai_names[1] = "dir-hifi";
priv->dai_link[0].num_codecs = 2;
priv->dai_link[2].num_codecs = 2;
+   priv->dai_fmt |= SND_SOC_DAIFMT_CBP_CFP;
} else {
dev_err(&pdev->dev, "unknown Device Tree compatible\n");
ret = -EINVAL;
-- 
2.34.1

[PATCHv4 1/9] ASoC: fsl-asoc-card: add support for dai links with multiple codecs

Add support for dai links using multiple codecs for multi-codec
use cases.

Signed-off-by: Elinor Montmasson 
Co-authored-by: Philip-Dylan Gleonec 
---
 sound/soc/fsl/fsl-asoc-card.c | 22 ++
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/sound/soc/fsl/fsl-asoc-card.c b/sound/soc/fsl/fsl-asoc-card.c
index 5ddc0c2fe53f..8a2a6e5461dc 100644
--- a/sound/soc/fsl/fsl-asoc-card.c
+++ b/sound/soc/fsl/fsl-asoc-card.c
@@ -815,10 +815,10 @@ static int fsl_asoc_card_probe(struct platform_device *pdev)
 
/* Normal DAI Link */
priv->dai_link[0].cpus->of_node = cpu_np;
-   priv->dai_link[0].codecs->dai_name = codec_dai_name;
+   priv->dai_link[0].codecs[0].dai_name = codec_dai_name;
 
if (!fsl_asoc_card_is_ac97(priv))
-   priv->dai_link[0].codecs->of_node = codec_np;
+   priv->dai_link[0].codecs[0].of_node = codec_np;
else {
u32 idx;
 
@@ -829,11 +829,11 @@ static int fsl_asoc_card_probe(struct platform_device *pdev)
goto asrc_fail;
}
 
-   priv->dai_link[0].codecs->name =
+   priv->dai_link[0].codecs[0].name =
devm_kasprintf(&pdev->dev, GFP_KERNEL,
   "ac97-codec.%u",
   (unsigned int)idx);
-   if (!priv->dai_link[0].codecs->name) {
+   if (!priv->dai_link[0].codecs[0].name) {
ret = -ENOMEM;
goto asrc_fail;
}
@@ -844,13 +844,19 @@ static int fsl_asoc_card_probe(struct platform_device *pdev)
priv->card.num_links = 1;
 
if (asrc_pdev) {
+   int i;
+   struct snd_soc_dai_link_component *codec;
+   struct snd_soc_dai_link *link;
+
/* DPCM DAI Links only if ASRC exists */
priv->dai_link[1].cpus->of_node = asrc_np;
priv->dai_link[1].platforms->of_node = asrc_np;
-   priv->dai_link[2].codecs->dai_name = codec_dai_name;
-   priv->dai_link[2].codecs->of_node = codec_np;
-   priv->dai_link[2].codecs->name =
-   priv->dai_link[0].codecs->name;
+   link = &(priv->dai_link[2]);
+   for_each_link_codecs(link, i, codec) {
+   codec->dai_name = priv->dai_link[0].codecs[i].dai_name;
+   codec->of_node = priv->dai_link[0].codecs[i].of_node;
+   codec->name = priv->dai_link[0].codecs[i].name;
+   }
priv->dai_link[2].cpus->of_node = cpu_np;
priv->dai_link[2].dai_fmt = priv->dai_fmt;
priv->card.num_links = 3;
-- 
2.34.1

[PATCHv4 7/9] ASoC: fsl-asoc-card: add DT clock "cpu_sysclk" with generic codec

Add an optional DT clock "cpu_sysclk" to get the CPU DAI system-clock
frequency when using the generic codec.
It is set for both Tx and Rx.
The way the frequency value is used is up to the CPU DAI driver
implementation.

Signed-off-by: Elinor Montmasson 
---
 sound/soc/fsl/fsl-asoc-card.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/sound/soc/fsl/fsl-asoc-card.c b/sound/soc/fsl/fsl-asoc-card.c
index 9aca8ad15372..c7fc9c16f761 100644
--- a/sound/soc/fsl/fsl-asoc-card.c
+++ b/sound/soc/fsl/fsl-asoc-card.c
@@ -754,6 +754,12 @@ static int fsl_asoc_card_probe(struct platform_device *pdev)
snd_soc_of_parse_tdm_slot(np, NULL, NULL,
&priv->cpu_priv.slot_num,
&priv->cpu_priv.slot_width);
+   struct clk *cpu_sysclk = clk_get(&pdev->dev, "cpu_sysclk");
+   if (!IS_ERR(cpu_sysclk)) {
+   priv->cpu_priv.sysclk_freq[TX] = clk_get_rate(cpu_sysclk);
+   priv->cpu_priv.sysclk_freq[RX] = priv->cpu_priv.sysclk_freq[TX];
+   clk_put(cpu_sysclk);
+   }
} else {
dev_err(&pdev->dev, "unknown Device Tree compatible\n");
ret = -EINVAL;
-- 
2.34.1

[PATCHv4 6/9] ASoC: fsl-asoc-card: add use of devicetree TDM slot properties

Add use of optional TDM slot properties "dai-tdm-slot-num" and
"dai-tdm-slot-width" through snd_soc_of_parse_tdm_slot().
They allow setting a custom TDM slot width in bits and number of slots
for the CPU DAI when using the generic codec.

Signed-off-by: Elinor Montmasson 
---
 sound/soc/fsl/fsl-asoc-card.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/sound/soc/fsl/fsl-asoc-card.c b/sound/soc/fsl/fsl-asoc-card.c
index 82ed7f4e81a1..9aca8ad15372 100644
--- a/sound/soc/fsl/fsl-asoc-card.c
+++ b/sound/soc/fsl/fsl-asoc-card.c
@@ -751,6 +751,9 @@ static int fsl_asoc_card_probe(struct platform_device *pdev)
priv->dai_link[0].num_codecs = 2;
priv->dai_link[2].num_codecs = 2;
priv->dai_fmt |= SND_SOC_DAIFMT_CBP_CFP;
+   snd_soc_of_parse_tdm_slot(np, NULL, NULL,
+   &priv->cpu_priv.slot_num,
+   &priv->cpu_priv.slot_width);
} else {
dev_err(&pdev->dev, "unknown Device Tree compatible\n");
ret = -EINVAL;
-- 
2.34.1

[PATCHv4 2/9] ASoC: fsl-asoc-card: add second dai link component for codecs

Add a second dai link component for codecs that will be used for the
generic codec use case.
It will use the spdif_receiver and spdif_transmitter drivers as dummy codec
drivers, which need 2 codec slots in the links.

To prevent probe deferral in use cases with only one codec, also set
the number of codecs to 1 by default for the relevant dai links.

Signed-off-by: Elinor Montmasson 
Co-authored-by: Philip-Dylan Gleonec 
---
 sound/soc/fsl/fsl-asoc-card.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/sound/soc/fsl/fsl-asoc-card.c b/sound/soc/fsl/fsl-asoc-card.c
index 8a2a6e5461dc..c83492e7cec2 100644
--- a/sound/soc/fsl/fsl-asoc-card.c
+++ b/sound/soc/fsl/fsl-asoc-card.c
@@ -296,7 +296,7 @@ static int be_hw_params_fixup(struct snd_soc_pcm_runtime *rtd,
 
 SND_SOC_DAILINK_DEFS(hifi,
DAILINK_COMP_ARRAY(COMP_EMPTY()),
-   DAILINK_COMP_ARRAY(COMP_EMPTY()),
+   DAILINK_COMP_ARRAY(COMP_EMPTY(), COMP_EMPTY()),
DAILINK_COMP_ARRAY(COMP_EMPTY()));
 
 SND_SOC_DAILINK_DEFS(hifi_fe,
@@ -306,7 +306,7 @@ SND_SOC_DAILINK_DEFS(hifi_fe,
 
 SND_SOC_DAILINK_DEFS(hifi_be,
DAILINK_COMP_ARRAY(COMP_EMPTY()),
-   DAILINK_COMP_ARRAY(COMP_EMPTY()));
+   DAILINK_COMP_ARRAY(COMP_EMPTY(), COMP_EMPTY()));
 
 static const struct snd_soc_dai_link fsl_asoc_card_dai[] = {
/* Default ASoC DAI Link*/
@@ -618,6 +618,8 @@ static int fsl_asoc_card_probe(struct platform_device *pdev)
 
memcpy(priv->dai_link, fsl_asoc_card_dai,
   sizeof(struct snd_soc_dai_link) * ARRAY_SIZE(priv->dai_link));
+   priv->dai_link[0].num_codecs = 1;
+   priv->dai_link[2].num_codecs = 1;
 
priv->card.dapm_routes = audio_map;
priv->card.num_dapm_routes = ARRAY_SIZE(audio_map);
-- 
2.34.1
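The reason patch 2/9 forces `num_codecs = 1` can be shown with a simplified model. It assumes, as I understand the ASoC helpers, that `SND_SOC_DAILINK_DEFS()`/`SND_SOC_DAILINK_REG()` size `num_codecs` from the component array, so once the codec array has two `COMP_EMPTY()` slots the default becomes 2. `struct link_comp`, `make_single_codec_link()` and `unresolved_codecs()` are hypothetical and stand in for the core's component lookup, where a slot that is never filled in would never resolve and would keep the card deferring.

```c
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical, heavily simplified dai-link component: NULL dai_name = empty. */
struct link_comp {
	const char *dai_name;
};

struct dai_link {
	struct link_comp *codecs;
	unsigned int num_codecs;
};

/* Two slots, like DAILINK_COMP_ARRAY(COMP_EMPTY(), COMP_EMPTY()). */
static struct link_comp hifi_codecs[2];

/* Build a link where probe filled in only the first codec slot. */
static struct dai_link make_single_codec_link(unsigned int num_codecs)
{
	struct dai_link link = { hifi_codecs, num_codecs };

	hifi_codecs[0].dai_name = "example-codec-dai"; /* hypothetical name */
	hifi_codecs[1].dai_name = NULL;               /* second slot never filled */
	return link;
}

/* Count the components the core would try, and fail, to look up. */
static unsigned int unresolved_codecs(const struct dai_link *link)
{
	unsigned int missing = 0;

	for (unsigned int i = 0; i < link->num_codecs; i++)
		if (!link->codecs[i].dai_name)
			missing++;
	return missing;
}
```

With `num_codecs` left at `ARRAY_SIZE(hifi_codecs)` one empty slot remains unresolved; with `num_codecs = 1`, as the patch sets for links 0 and 2, nothing is left to resolve.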

[PATCHv4 8/9] ASoC: fsl-asoc-card: add DT property "cpu-system-clock-direction-out"

Add new optional DT property "cpu-system-clock-direction-out" to set
sysclk direction as "out" for the CPU DAI when using the generic codec.
It is set for both Tx and Rx.
If not set, the direction is "in".
The way the direction value is used is up to the CPU DAI driver
implementation.

Signed-off-by: Elinor Montmasson 
---
 sound/soc/fsl/fsl-asoc-card.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/sound/soc/fsl/fsl-asoc-card.c b/sound/soc/fsl/fsl-asoc-card.c
index c7fc9c16f761..f3fc2b29c92f 100644
--- a/sound/soc/fsl/fsl-asoc-card.c
+++ b/sound/soc/fsl/fsl-asoc-card.c
@@ -760,6 +760,10 @@ static int fsl_asoc_card_probe(struct platform_device *pdev)
priv->cpu_priv.sysclk_freq[RX] = priv->cpu_priv.sysclk_freq[TX];
clk_put(cpu_sysclk);
}
+   priv->cpu_priv.sysclk_dir[TX] =
+   of_property_read_bool(np, "cpu-system-clock-direction-out") ?
+   SND_SOC_CLOCK_OUT : SND_SOC_CLOCK_IN;
+   priv->cpu_priv.sysclk_dir[RX] = priv->cpu_priv.sysclk_dir[TX];
} else {
dev_err(&pdev->dev, "unknown Device Tree compatible\n");
ret = -EINVAL;
-- 
2.34.1

Re: [PATCH 2/3] crypto: X25519 core functions for ppc64le


Hi Andy,

Thanks for the info.  I should be able to do it.  I was hoping an 
assembly guru like you could show me some tricks here, if there are any :)


Thanks.

-Danny

On 5/15/24 8:33 AM, Andy Polyakov wrote:

+static void cswap(fe51 p, fe51 q, unsigned int bit)
+{
+    u64 t, i;
+    u64 c = 0 - (u64) bit;
+
+    for (i = 0; i < 5; ++i) {
+    t = c & (p[i] ^ q[i]);
+    p[i] ^= t;
+    q[i] ^= t;
+    }
+}


The "c" in cswap stands for "constant-time," and the problem is that 
contemporary compilers have exhibited the ability to produce 
non-constant-time machine code when compiling the above kind of 
technique. The outcome is platform-specific, and ironically some PPC 
code generators were observed to produce the "most" non-constant-time 
code, "most" in the sense that the execution-time variations would be 
the easiest to catch.


Just to substantiate the point, consider 
https://godbolt.org/z/faYnEcPT7, and note the conditional branch in 
the middle of the loop, which flies in the face of constant-time-ness. 
You might object to 'bit &= 1' on line 7 of the C code. Indeed, if you 
comment it out, the generated code will be fine. But the point is that 
the compiler is capable of figuring out, and was in fact observed to 
figure out, that the caller passes either one or zero, and then of 
generating machine code like that in the assembly window. In other 
words, 'bit &= 1' is just a reflection of what the caller does.
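A mitigation sometimes used short of assembly, sketched here assuming a GCC- or Clang-compatible compiler (it relies on extended inline asm), is to pass the mask through an empty asm statement so the optimizer can no longer prove it is 0 or all-ones. `value_barrier_u64()` and `cswap_barrier()` are hypothetical names, and `fe51` is assumed to be five 64-bit limbs as in the quoted loop. This narrows, but does not remove, the risk described above: the generated code still has to be inspected per target, which is why assembly remains the permanent fix.

```c
#include <stdint.h>

/* Assumed layout: five 64-bit limbs, matching the quoted loop bound. */
typedef uint64_t fe51[5];

/*
 * Optimization barrier: the empty asm claims to read and modify `v`,
 * so the compiler can no longer assume anything about its value.
 * GCC/Clang extended-asm syntax; not a hard constant-time guarantee.
 */
static inline uint64_t value_barrier_u64(uint64_t v)
{
	__asm__ volatile("" : "+r"(v));
	return v;
}

/* Swap p and q when bit == 1, leave them unchanged when bit == 0. */
static void cswap_barrier(fe51 p, fe51 q, unsigned int bit)
{
	/* All-ones or all-zeros mask, hidden from the optimizer. */
	uint64_t mask = value_barrier_u64(0 - (uint64_t)(bit & 1));

	for (int i = 0; i < 5; i++) {
		uint64_t t = mask & (p[i] ^ q[i]);
		p[i] ^= t;
		q[i] ^= t;
	}
}
```

Even with the barrier, the disassembly for each target should be checked for branches, as the godbolt example shows how easily they appear.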


... the permanent solution is to do it in assembly. I can put 
together something...


Though you should be able to do this just as well :-) So should I or 
would you?


Cheers.

Re: [PATCH v15 00/16] Add audio support in v4l2 framework

2024-05-15 Shengjiu Wang

On Wed, May 15, 2024 at 6:46 PM Jaroslav Kysela  wrote:
>
> On 15. 05. 24 12:19, Takashi Iwai wrote:
> > On Wed, 15 May 2024 11:50:52 +0200,
> > Jaroslav Kysela wrote:
> >>
> >> On 15. 05. 24 11:17, Hans Verkuil wrote:
> >>> Hi Jaroslav,
> >>>
> >>> On 5/13/24 13:56, Jaroslav Kysela wrote:
> >>>> On 09. 05. 24 13:13, Jaroslav Kysela wrote:
> >>>>> On 09. 05. 24 12:44, Shengjiu Wang wrote:
> >>>>>>>> mem2mem is just like the decoder in the compress pipeline, which is
> >>>>>>>> one of the components in the pipeline.
> >>>>>>>
> >>>>>>> I was thinking of loopback with endpoints using compress streams,
> >>>>>>> without physical endpoint, something like:
> >>>>>>>
> >>>>>>> compress playback (to feed data from userspace) -> DSP (processing) ->
> >>>>>>> compress capture (send data back to userspace)
> >>>>>>>
> >>>>>>> Unless I'm missing something, you should be able to process data as fast
> >>>>>>> as you can feed it and consume it in such case.
> >>>>>>>
> >>>>>>
> >>>>>> Actually, in the beginning I tried this, but it did not work well.
> >>>>>> ALSA needs timing control for playback and capture; playback and capture
> >>>>>> need to be synchronized. Usually the playback and capture pipelines are
> >>>>>> independent in the ALSA design, but in this case the playback and capture
> >>>>>> should be synchronized; they are not independent.
> >>>>>
> >>>>> The core compress API has no strict timing constraints. You can
> >>>>> eventually have two half-duplex compress devices, if you'd like a
> >>>>> really independent mechanism. If something is missing in the API, you can
> >>>>> extend it (for example, to inform user space that it's a
> >>>>> producer/consumer processing without any relation to real time). I
> >>>>> like this idea.
> >>>>
> >>>> I was thinking more about this. If I am right, the mentioned use in gstreamer
> >>>> is supposed to run the conversion (DSP) job in "one shot" (it can be handled
> >>>> using one system call like a blocking ioctl). The goal is just to offload the
> >>>> CPU work to the DSP (co-processor). If there are no requirements for
> >>>> queuing, we can implement this ioctl in the compress ALSA API easily using
> >>>> data management through the dma-buf API. We can eventually define a new
> >>>> direction (enum snd_compr_direction) like SND_COMPRESS_CONVERT or so to
> >>>> handle this new data scheme. The API may be extended later on real demand, of
> >>>> course.
> >>>>
> >>>> Otherwise all pieces are already in the current ALSA compress API
> >>>> (capabilities, params, enumeration). The realtime controls may be created
> >>>> using the ALSA control API.
> >>>
> >>> So does this mean that Shengjiu should attempt to use this ALSA approach first?
> >>
> >> I've not seen any argument for forcing the v4l2 mem2mem buffer scheme
> >> onto this data conversion. It looks like a simple job, and the ALSA APIs
> >> may be extended for this simple purpose.
> >>
> >> Shengjiu, what are your requirements for gstreamer support? Would a
> >> new blocking ioctl be enough for the initial support in the compress
> >> ALSA API?
> >
> > If it works with the compress API, it'd be great, yeah.
> > So, your idea is to open compress-offload devices for read and write,
> > then let them convert à la batch jobs without timing control?
> >
> > For full-duplex usage, we might need some more extensions, so that
> > both read and write parameters can be synchronized. (So far a
> > compress stream is unidirectional, with the runtime buffer for a
> > single stream.)
> >
> > And the buffer management is based on fixed-size fragments. I
> > hope this doesn't matter much for the intended operation?
>
> It's an open question whether standard I/O is really required for this
> case. My quick idea was to just implement a new "direction" for this job,
> supporting only one ioctl for the data processing, which would execute the
> job in "one shot" for now. The I/O may be handled through the dma-buf API
> (which seems to be the standard nowadays for this purpose and allows future
> chaining).
>
> So something like:
>
> struct dsp_job {
>     int source_fd; /* dma-buf FD with source data - for dma_buf_get() */
>     int target_fd; /* dma-buf FD for target data - for dma_buf_get() */
>     /* ... maybe some extra data size members here ... */
>     /* ... maybe some special parameters here ... */
> };
>
> #define SNDRV_COMPRESS_DSPJOB _IOWR('C', 0x60, struct dsp_job)
>
> This ioctl will be blocking (thus synced). My question is whether it's
> feasible for gstreamer. In this particular case, if the rate conversion is
> implemented in software, it will block the gstreamer data processing, too.
>
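For concreteness, here is a rough user-space sketch of how a client such as a gstreamer element might drive the blocking one-shot ioctl proposed above. Everything in it is hypothetical: `struct dsp_job` and `SNDRV_COMPRESS_DSPJOB` are only the proposal from this thread (the "maybe" members are left out), `run_dsp_job()` is an illustrative helper name, and the compress device and dma-buf file descriptors are assumed to have been obtained elsewhere.

```c
#include <sys/ioctl.h>

/* Hypothetical: mirrors the structure proposed in this thread, with the
 * "maybe" members left out. Not part of any released ALSA UAPI. */
struct dsp_job {
    int source_fd; /* dma-buf FD with source data */
    int target_fd; /* dma-buf FD for target data */
};

#define SNDRV_COMPRESS_DSPJOB _IOWR('C', 0x60, struct dsp_job)

/* Hypothetical helper: submit one conversion job on an already-open
 * compress device and block until the DSP has finished it.
 * Assumed to return 0 on success, or -1 with errno set by ioctl(). */
static int run_dsp_job(int compr_fd, int src_dmabuf_fd, int dst_dmabuf_fd)
{
    struct dsp_job job = {
        .source_fd = src_dmabuf_fd,
        .target_fd = dst_dmabuf_fd,
    };

    return ioctl(compr_fd, SNDRV_COMPRESS_DSPJOB, &job);
}
```

Because the call does not return until the job completes, the calling thread stalls for the whole conversion, which is the gstreamer trade-off raised in the question.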

Thanks.

I have several questions:
1.  The compress API always binds to a sound card.  Can we avoid that?
 For ASRC, it is just one component.

2.  The compress API doesn't seem to support mmap().  Is this a problem
 for sending and getting data?

Re: [PATCH 2/3] crypto: X25519 core functions for ppc64le


+static void cswap(fe51 p, fe51 q, unsigned int bit)
+{
+    u64 t, i;
+    u64 c = 0 - (u64) bit;
+
+    for (i = 0; i < 5; ++i) {
+        t = c & (p[i] ^ q[i]);
+        p[i] ^= t;
+        q[i] ^= t;
+    }
+}


The "c" in cswap stands for "constant-time," and the problem is that
contemporary compilers have exhibited the ability to produce
non-constant-time machine code when compiling the above kind of
technique. The outcome is platform-specific, and ironically some
PPC code generators were observed to generate the "most"
non-constant-time code, "most" in the sense that the execution-time
variations would be easiest to catch.


Just to substantiate the point, consider
https://godbolt.org/z/faYnEcPT7, and note the conditional branch in the
middle of the loop, which flies in the face of constant-time-ness. You
might object to 'bit &= 1' on line 7 of the C code, and indeed, if you
comment it out, the generated code will be fine. But the point is that
the compiler is capable of figuring out, and was in fact observed to
figure out, that the caller passes either one or zero, and of then
generating the machine code shown in the assembly window. In other words,
'bit &= 1' is just a reflection of what the caller does.
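To make the point concrete, the two functions below are observably equivalent whenever `bit` is 0 or 1, which is why an optimizer that has proven that range is free to replace the masked form with the branchy one. This is a hand-written illustration of a permitted transformation, not output captured from any particular compiler; `cswap_masked` and `cswap_branchy` are hypothetical names, and `uint64_t` stands in for the kernel's `u64`.

```c
#include <stdint.h>

typedef uint64_t fe51[5];

/* Masked swap as in the patch: c is all-ones when bit == 1, zero when bit == 0,
 * so the same instructions run regardless of bit. */
static void cswap_masked(fe51 p, fe51 q, unsigned int bit)
{
    uint64_t c = 0 - (uint64_t)bit;

    for (int i = 0; i < 5; ++i) {
        uint64_t t = c & (p[i] ^ q[i]);
        p[i] ^= t;
        q[i] ^= t;
    }
}

/* A form the optimizer may legally produce once it knows bit is 0 or 1:
 * identical results, but the work done now depends on a secret bit. */
static void cswap_branchy(fe51 p, fe51 q, unsigned int bit)
{
    if (bit) {
        for (int i = 0; i < 5; ++i) {
            uint64_t t = p[i];
            p[i] = q[i];
            q[i] = t;
        }
    }
}
```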


... the permanent solution is to do it 
in assembly. I can put together something...


Though you should be able to do this just as well :-) So should I or 
would you?


Cheers.

Re: [PATCH 2/3] crypto: X25519 core functions for ppc64le


Hi Andy,

Points taken.  Much appreciate the help.

Thanks.

-Danny

On 5/15/24 3:29 AM, Andy Polyakov wrote:

Hi,


+static void cswap(fe51 p, fe51 q, unsigned int bit)
+{
+    u64 t, i;
+    u64 c = 0 - (u64) bit;
+
+    for (i = 0; i < 5; ++i) {
+        t = c & (p[i] ^ q[i]);
+        p[i] ^= t;
+        q[i] ^= t;
+    }
+}


The "c" in cswap stands for "constant-time," and the problem is that
contemporary compilers have exhibited the ability to produce
non-constant-time machine code when compiling the above kind of
technique. The outcome is platform-specific, and ironically
some PPC code generators were observed to generate the "most"
non-constant-time code, "most" in the sense that the execution-time
variations would be easiest to catch. One way to work around the problem,
at least for the time being, is to add 'asm volatile("" : "+r"(c))' after
you calculate 'c'. But there is no guarantee that the next compiler
version won't see through it; hence the permanent solution is to do it
in assembly. I can put together something...
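A minimal sketch of that workaround applied to the patch's cswap(), written as standalone C with `uint64_t` in place of `u64` and a five-limb `fe51` typedef assumed. The empty extended-asm statement makes `c` opaque to the optimizer; the kernel's `OPTIMIZER_HIDE_VAR()` macro relies on the same idea. As noted above, this holds only until some compiler learns to see through it.

```c
#include <stdint.h>

typedef uint64_t fe51[5];

/* Conditional swap: swaps p and q iff bit == 1 (bit must be 0 or 1). */
static void cswap(fe51 p, fe51 q, unsigned int bit)
{
    uint64_t c = 0 - (uint64_t)bit;     /* 0 or all-ones */

    /* Empty asm that claims to read and modify c: the compiler can no
     * longer assume c is 0 or ~0, so it has no basis for turning the
     * masked XORs below into a branch. GCC/Clang syntax; a workaround,
     * not a guarantee. */
    __asm__ volatile("" : "+r"(c));

    for (int i = 0; i < 5; ++i) {
        uint64_t t = c & (p[i] ^ q[i]);
        p[i] ^= t;
        q[i] ^= t;
    }
}
```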


Cheers.

Re: [PATCH 1/3] crypto: X25519 low-level primitives for ppc64le.


See inline.

On 5/15/24 4:06 AM, Andy Polyakov wrote:

Hi,


+SYM_FUNC_START(x25519_fe51_sqr_times)
...
+
+.Lsqr_times_loop:
...
+
+    std    9,16(3)
+    std    10,24(3)
+    std    11,32(3)
+    std    7,0(3)
+    std    8,8(3)
+    bdnz    .Lsqr_times_loop


I see no reason why the stores can't be moved outside the loop in
question.



Yeah.  I'll fix it.
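In C terms, the suggestion amounts to keeping the limbs in locals across all iterations and writing memory once after the loop. The sketch below is a portable radix-2^51 repeated squaring modulo 2^255 - 19 in that shape. It is not the patch's code, the names are hypothetical, and `unsigned __int128` (a GCC/Clang extension on 64-bit targets) is assumed for the 128-bit products.

```c
#include <stdint.h>

typedef uint64_t fe51[5];           /* five 51-bit limbs, radix 2^51 */
typedef unsigned __int128 u128;

#define MASK51 0x7ffffffffffffULL   /* 2^51 - 1 */

/* out = in^(2^count) mod 2^255 - 19. Limbs live in locals for the whole
 * loop; memory is written exactly once, after the loop. */
static void fe51_sqr_times(fe51 out, const fe51 in, unsigned count)
{
    uint64_t r0 = in[0], r1 = in[1], r2 = in[2], r3 = in[3], r4 = in[4];

    while (count--) {
        uint64_t d0 = r0 * 2, d1 = r1 * 2;
        uint64_t d2 = r2 * 2 * 19, d419 = r4 * 19, d4 = d419 * 2;
        uint64_t c;
        /* schoolbook squaring; terms above 2^255 are folded in with *19 */
        u128 t0 = (u128)r0 * r0 + (u128)d4 * r1 + (u128)d2 * r3;
        u128 t1 = (u128)d0 * r1 + (u128)d4 * r2 + (u128)r3 * (r3 * 19);
        u128 t2 = (u128)d0 * r2 + (u128)r1 * r1 + (u128)d4 * r3;
        u128 t3 = (u128)d0 * r3 + (u128)d1 * r2 + (u128)r4 * d419;
        u128 t4 = (u128)d0 * r4 + (u128)d1 * r3 + (u128)r2 * r2;

        /* carry propagation back down to 51-bit limbs */
        r0 = (uint64_t)t0 & MASK51; t1 += (uint64_t)(t0 >> 51);
        r1 = (uint64_t)t1 & MASK51; t2 += (uint64_t)(t1 >> 51);
        r2 = (uint64_t)t2 & MASK51; t3 += (uint64_t)(t2 >> 51);
        r3 = (uint64_t)t3 & MASK51; t4 += (uint64_t)(t3 >> 51);
        r4 = (uint64_t)t4 & MASK51; c = (uint64_t)(t4 >> 51);
        r0 += c * 19;               /* 2^255 == 19 (mod p) */
        c = r0 >> 51;
        r0 &= MASK51;
        r1 += c;
    }

    /* the stores that the review suggests hoisting out of the loop */
    out[0] = r0; out[1] = r1; out[2] = r2; out[3] = r3; out[4] = r4;
}
```

Since `out` is written only at the end, it may also alias `in`.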



+SYM_FUNC_START(x25519_fe51_frombytes)
+.align    5
+
+    li    12, -1
+    srdi    12, 12, 13    # 0x7
+
+    ld    5, 0(4)
+    ld    6, 8(4)
+    ld    7, 16(4)
+    ld    8, 24(4)


Is there an actual guarantee that the byte input is 64-bit aligned? While
it is true that the ISA specification obliges the processor to handle
misaligned loads and stores, nothing in it prevents them from being
inefficient. The inefficiency is most likely to be noticed at page
boundaries. What I'm trying to say is that it would be more
appropriate to avoid unaligned loads (and stores).


Good point.  Maybe I can handle the input with 64-bit aligned loads.

Thanks.
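One portable way to sidestep the alignment question is to assemble each 64-bit word from individual bytes, so no load can be misaligned or straddle a page boundary, and then split the four words into 51-bit limbs. This is a minimal sketch assuming the usual little-endian 32-byte encoding with the top bit ignored; `load_le64` and `fe51_frombytes` are hypothetical names, and an assembly version might instead use aligned `ld` plus shifts, which this sketch doesn't attempt.

```c
#include <stdint.h>

typedef uint64_t fe51[5];
#define MASK51 0x7ffffffffffffULL   /* 2^51 - 1 */

/* Little-endian 64-bit word from bytes; works at any alignment. */
static uint64_t load_le64(const uint8_t *p)
{
    uint64_t v = 0;

    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Unpack 32 little-endian bytes into five 51-bit limbs (bit 255 dropped). */
static void fe51_frombytes(fe51 h, const uint8_t s[32])
{
    uint64_t w0 = load_le64(s), w1 = load_le64(s + 8);
    uint64_t w2 = load_le64(s + 16), w3 = load_le64(s + 24);

    h[0] = w0 & MASK51;                          /* bits   0..50  */
    h[1] = ((w0 >> 51) | (w1 << 13)) & MASK51;   /* bits  51..101 */
    h[2] = ((w1 >> 38) | (w2 << 26)) & MASK51;   /* bits 102..152 */
    h[3] = ((w2 >> 25) | (w3 << 39)) & MASK51;   /* bits 153..203 */
    h[4] = (w3 >> 12) & MASK51;                  /* bits 204..254 */
}
```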




Cheers.

Re: [PATCH 1/3] crypto: X25519 low-level primitives for ppc64le.


Thank you Andy.  Will fix this.

On 5/15/24 3:11 AM, Andy Polyakov wrote:

Hi,

Couple of remarks inline.


+# [1] https://www.openssl.org/~appro/cryptogams/


https://github.com/dot-asm/cryptogams/ is arguably a better reference.


+SYM_FUNC_START(x25519_fe51_mul)
+.align    5


The goal is to align the label, not the first instruction after the
directive. It's not a problem in this spot, at the beginning of the
module that is, but further below it's likely to inject redundant nops
between the label and the meaningful code. Since the directive in
question is not position-sensitive, one can resolve this by swapping
the order of the directive and the SYM_FUNC_START macro.


Cheers.

Re: [PATCH 1/3] crypto: X25519 low-level primitives for ppc64le.


Hi,


+SYM_FUNC_START(x25519_fe51_sqr_times)
...
+
+.Lsqr_times_loop:
...
+
+   std 9,16(3)
+   std 10,24(3)
+   std 11,32(3)
+   std 7,0(3)
+   std 8,8(3)
+   bdnz .Lsqr_times_loop


I see no reason why the stores can't be moved outside the loop in
question.



+SYM_FUNC_START(x25519_fe51_frombytes)
+.align 5
+
+   li  12, -1
+   srdi 12, 12, 13  # 0x7
+
+   ld  5, 0(4)
+   ld  6, 8(4)
+   ld  7, 16(4)
+   ld  8, 24(4)


Is there an actual guarantee that the byte input is 64-bit aligned? While
it is true that the ISA specification obliges the processor to handle
misaligned loads and stores, nothing in it prevents them from being
inefficient. The inefficiency is most likely to be noticed at page
boundaries. What I'm trying to say is that it would be more appropriate
to avoid unaligned loads (and stores).


Cheers.

Re: [PATCH 2/3] crypto: X25519 core functions for ppc64le


Hi,


+static void cswap(fe51 p, fe51 q, unsigned int bit)
+{
+   u64 t, i;
+   u64 c = 0 - (u64) bit;
+
+   for (i = 0; i < 5; ++i) {
+       t = c & (p[i] ^ q[i]);
+       p[i] ^= t;
+       q[i] ^= t;
+   }
+}


The "c" in cswap stands for "constant-time," and the problem is that
contemporary compilers have exhibited the ability to produce
non-constant-time machine code when compiling the above kind of
technique. The outcome is platform-specific, and ironically some
PPC code generators were observed to generate the "most"
non-constant-time code, "most" in the sense that the execution-time
variations would be easiest to catch. One way to work around the problem,
at least for the time being, is to add 'asm volatile("" : "+r"(c))' after
you calculate 'c'. But there is no guarantee that the next compiler
version won't see through it; hence the permanent solution is to do it
in assembly. I can put together something...


Cheers.

Re: [PATCH 1/3] crypto: X25519 low-level primitives for ppc64le.