Re: [PATCH] ASoC: Intel: Atom: read timestamp moved to period_elapsed

2019-07-10 Thread Curtis Malainey
On Tue, Jul 9, 2019 at 4:45 AM Andy Shevchenko
 wrote:
>
> On Mon, Jul 08, 2019 at 09:01:47PM -0700, Alex Levin wrote:
> > sst_platform_pcm_pointer is called from both snd_pcm_period_elapsed and
> > from snd_pcm_ioctl. Calling read timestamp results in recalculating
> > pcm_delay and buffer_ptr (sst_calc_tstamp) which consumes buffers in a
> > faster rate than intended.
> > In a tested BSW system with chtrt5650, for a rate of 48000, the
> > measured rate was sometimes 10 times more than that.
> > After moving the timestamp read to period elapsed, buffer consumption is
> > as expected.
>
> From code prospective it looks good. You may take mine
> Reviewed-by: Andy Shevchenko 
>
> Though I'm not an expert in the area, Pierre and / or Liam should give
> their blessing.
>
Agreed, Liam or Pierre should also give their ok since this is one of
the closed source firmware drivers.

Reviewed-by: Curtis Malainey 
> >
> > Signed-off-by: Alex Levin 
> > ---
> >  sound/soc/intel/atom/sst-mfld-platform-pcm.c | 23 +---
> >  1 file changed, 15 insertions(+), 8 deletions(-)
> >
> > diff --git a/sound/soc/intel/atom/sst-mfld-platform-pcm.c 
> > b/sound/soc/intel/atom/sst-mfld-platform-pcm.c
> > index 0e8b1c5eec88..196af0b30b41 100644
> > --- a/sound/soc/intel/atom/sst-mfld-platform-pcm.c
> > +++ b/sound/soc/intel/atom/sst-mfld-platform-pcm.c
> > @@ -265,16 +265,28 @@ static void sst_period_elapsed(void *arg)
> >  {
> >   struct snd_pcm_substream *substream = arg;
> >   struct sst_runtime_stream *stream;
> > - int status;
> > + struct snd_soc_pcm_runtime *rtd;
> > + int status, ret_val;
> >
> >   if (!substream || !substream->runtime)
> >   return;
> >   stream = substream->runtime->private_data;
> >   if (!stream)
> >   return;
> > +
> > + rtd = substream->private_data;
> > + if (!rtd)
> > + return;
> > +
> >   status = sst_get_stream_status(stream);
> >   if (status != SST_PLATFORM_RUNNING)
> >   return;
> > +
> > + ret_val = stream->ops->stream_read_tstamp(sst->dev, 
> > >stream_info);
> > + if (ret_val) {
> > + dev_err(rtd->dev, "stream_read_tstamp err code = %d\n", 
> > ret_val);
> > + return;
> > + }
> >   snd_pcm_period_elapsed(substream);
> >  }
> >
> > @@ -658,20 +670,15 @@ static snd_pcm_uframes_t sst_platform_pcm_pointer
> >   (struct snd_pcm_substream *substream)
> >  {
> >   struct sst_runtime_stream *stream;
> > - int ret_val, status;
> > + int status;
> >   struct pcm_stream_info *str_info;
> > - struct snd_soc_pcm_runtime *rtd = substream->private_data;
> >
> >   stream = substream->runtime->private_data;
> >   status = sst_get_stream_status(stream);
> >   if (status == SST_PLATFORM_INIT)
> >   return 0;
> > +
> >   str_info = >stream_info;
> > - ret_val = stream->ops->stream_read_tstamp(sst->dev, str_info);
> > - if (ret_val) {
> > - dev_err(rtd->dev, "sst: error code = %d\n", ret_val);
> > - return ret_val;
> > - }
> >   substream->runtime->delay = str_info->pcm_delay;
> >   return str_info->buffer_ptr;
> >  }
> > --
> > 2.22.0.410.gd8fdbe21b5-goog
> >
>
> --
> With Best Regards,
> Andy Shevchenko
>
>


Re: [PATCH v7 3/4] ASoC: rt5677: clear interrupts by polarity flip

2019-06-18 Thread Curtis Malainey
On Tue, Jun 18, 2019 at 11:47 AM Mark Brown  wrote:
>
> On Tue, Jun 18, 2019 at 11:12:58AM -0700, Curtis Malainey wrote:
> > On Tue, Jun 18, 2019 at 11:01 AM Fletcher Woodruff
> > > On Sun, Jun 16, 2019 at 10:56 AM Cezary Rojewski
> > > > On 2019-06-14 21:48, Fletcher Woodruff wrote:
>
> > > > > + ret = regmap_read(rt5677->regmap, RT5677_IRQ_CTRL1, _irq);
> > > > > + if (ret) {
> > > > > + pr_err("rt5677: failed reading IRQ status: %d\n", ret);
>
> > > > The entire rt5677 makes use of dev_XXX with the exception of.. this very
> > > > function. Consider reusing "component" field which is already part of
> > > > struct rt5677_priv and removing pr_XXX.
>
> > > I was using dev_XXX, but I believe Curtis found that 'component' was
> > > sometimes uninitialized when this function was called, so I switched
> > > back to pr_XXX. I may be misremembering though, so I'll let Curtis
> > > comment as well.
>
> > The issue here is that the IRQ is setup in the i2c probe and the
> > component is setup in the codec probe. In theory if the hardware is in
>
> The component is not needed for a struct device, you must have a struct
> device if you have a regmap or have probed at all.
Ah yes, we could modify the struct and store the i2c device and get
the device out of that as well. That will likely be simpler. Ok lets
do that.


Re: [PATCH v7 3/4] ASoC: rt5677: clear interrupts by polarity flip

2019-06-18 Thread Curtis Malainey
On Tue, Jun 18, 2019 at 11:01 AM Fletcher Woodruff
 wrote:
>
> On Sun, Jun 16, 2019 at 10:56 AM Cezary Rojewski
>  wrote:
> > On 2019-06-14 21:48, Fletcher Woodruff wrote:
> > > +static irqreturn_t rt5677_irq(int unused, void *data)
> > > +{
> > > + struct rt5677_priv *rt5677 = data;
> > > + int ret = 0, i, reg_irq, virq;
> > > + bool irq_fired = false;
> > > +
> > > + mutex_lock(>irq_lock);
> > > + /* Read interrupt status */
> > > + ret = regmap_read(rt5677->regmap, RT5677_IRQ_CTRL1, _irq);
> > > + if (ret) {
> > > + pr_err("rt5677: failed reading IRQ status: %d\n", ret);
> >
> > The entire rt5677 makes use of dev_XXX with the exception of.. this very
> > function. Consider reusing "component" field which is already part of
> > struct rt5677_priv and removing pr_XXX.
>
> I was using dev_XXX, but I believe Curtis found that 'component' was
> sometimes uninitialized when this function was called, so I switched
> back to pr_XXX. I may be misremembering though, so I'll let Curtis
> comment as well.
>
The issue here is that the IRQ is setup in the i2c probe and the
component is setup in the codec probe. In theory if the hardware is in
a bad state you can get an interrupt between the probes and have the
interrupt called with the component being NULL. It might be worth
considering migrating the IRQ setup to the codec probe so this
condition cannot happen and then we can avoid using pr_XXX.
> > > + if (ret) {
> > > + dev_err(>dev, "Failed to request IRQ: %d\n", ret);
> > >   return ret;
> >
> > You may want to simplify this, similarly to how's it done in your
> > rt5677_i2c_probe - leave message alone and return "ret" explicitly at
> > the end.
>
> Good suggestion. I'll update that for the next patch.


Re: [PATCH] ASoC: Intel: sst: fix kmalloc call with wrong flags

2019-06-07 Thread Curtis Malainey
On Fri, Jun 7, 2019 at 3:19 PM Alex Levin  wrote:
>
> When calling kmalloc with GFP_KERNEL in case CONFIG_SLOB is unset,
> kmem_cache_alloc_trace is called.
>
> In case CONFIG_TRACING is set, kmem_cache_alloc_trace will ball
nit: *call
> slab_alloc, which will call slab_pre_alloc_hook which might_sleep_if.
>
> The context in which it is called in this case, the
> intel_sst_interrupt_mrfld, calling a sleeping kmalloc generates a BUG():
>
> Fixes: 972b0d456e64 ("ASoC: Intel: remove GFP_ATOMIC, use GFP_KERNEL")
>
> [   20.250671] BUG: sleeping function called from invalid context at 
> mm/slab.h:422
> [   20.250683] in_atomic(): 1, irqs_disabled(): 1, pid: 1791, name: 
> Chrome_IOThread
> [   20.250690] CPU: 0 PID: 1791 Comm: Chrome_IOThread Tainted: GW 
> 4.19.43 #61
> [   20.250693] Hardware name: GOOGLE Kefka, BIOS Google_Kefka.7287.337.0 
> 03/02/2017
> [   20.250893] R10: 562dd1064d30 R11: 3c8cc825b908 R12: 
> 3c8cc966d3c0
> [   20.250896] R13: 3c8cc9e280c0 R14: 00000000 R15: 
> 
>
> Signed-off-by: Alex Levin 
Reviewed-by: Curtis Malainey 
> ---
>


Re: [PATCH v5 1/3] ASoC: rt5677: allow multiple interrupt sources

2019-05-09 Thread Curtis Malainey
From: Mark Brown 
Date: Wed, May 8, 2019 at 7:33 PM
To: Curtis Malainey
Cc: Fletcher Woodruff, Linux Kernel Mailing List, Ben Zhang, Jaroslav
Kysela, Liam Girdwood, Oder Chiou, Takashi Iwai, Curtis Malainey,


> On Wed, May 08, 2019 at 02:39:32PM -0700, Curtis Malainey wrote:
>
> > Pixelbooks (Samus Chromebook) are the only devices that use this part.
> > Realtek has confirmed this. Therefore we only have to worry about
> > breaking ourselves. That being said I agree there is likely a better
>
> And there are no other parts that are software compatible enough to
> share the same driver?
>
the rt5676 can use this driver, but from my discussions with Realtek,
Samus is the only active consumer of this driver.

> > way to handle general abstraction here. We will need the explicit irq
> > handling since I will be following these patches up with patches that
> > enable hotwording on the codec (we will be sending the firmware to
> > linux-firmware as well that is needed for the process.)
>
> OK.  Like I said it might also be clearer split into multiple patches,
> it was just really difficult to tell what was going on with the diff
> there.


Re: [PATCH v5 1/3] ASoC: rt5677: allow multiple interrupt sources

2019-05-08 Thread Curtis Malainey
From: Mark Brown 
Date: Wed, May 8, 2019 at 12:36 AM
To: Fletcher Woodruff
Cc: , Ben Zhang, Jaroslav Kysela, Liam
Girdwood, Oder Chiou, Takashi Iwai, Curtis Malainey,


> On Tue, May 07, 2019 at 04:01:13PM -0600, Fletcher Woodruff wrote:
>
> > This patch does not add polarity flipping support within regmap-irq
> > because there is extra work that must be done within the irq handler
> > to support hotword detection. On the Chromebook Pixel, the firmware will
> > disconnect GPIO1 from the jack detection irq when a hotword is detected
> > and trigger the interrupt handler. Inside the handler, we will need to
> > detect this, report the hotword event, and re-connect GPIO1 to the jack
> > detection irq.
>
> Please have a conversation with your firmware team about the concept of
> abstraction - this is clearly a problematic thing to do as it's causing
> the state of the system to change for devices that are mostly managed
> from the operating system.  It's not clear to me that this shouldn't be
> split off somehow so that it doesn't impact other systems using this
> hardware.
>
Pixelbooks (Samus Chromebook) are the only devices that use this part.
Realtek has confirmed this. Therefore we only have to worry about
breaking ourselves. That being said I agree there is likely a better
way to handle general abstraction here. We will need the explicit irq
handling since I will be following these patches up with patches that
enable hotwording on the codec (we will be sending the firmware to
linux-firmware as well that is needed for the process.)


Re: [alsa-devel] [PATCH v4 1/3] ASoC: rt5677: allow multiple interrupt sources

2019-05-07 Thread Curtis Malainey
On Fri, May 3, 2019 at 4:10 PM Fletcher Woodruff  wrote:
>
> From: Ben Zhang 
>
> This patch allows headphone plug detect and mic present
> detect to be enabled at the same time. This patch implements
> an irq_chip with irq_domain directly instead of using
> regmap-irq, so that interrupt source line polarities can
> be flipped to support irq sharing.
>
> This patch does not add polarity flipping support within regmap-irq
> because there is extra work that must be done within the irq handler
> to support hotword detection. On the Chromebook Pixel, the firmware will
> disconnect GPIO1 from the jack detection irq when a hotword is detected
> and trigger the interrupt handler. Inside the handler, we will need to
> detect this, report the hotword event, and re-connect GPIO1 to the jack
> detection irq.
>
> Signed-off-by: Ben Zhang 
> Signed-off-by: Fletcher Woodruff 
> ---
>  sound/soc/codecs/rt5677.c | 256 --
>  sound/soc/codecs/rt5677.h |  14 ++-
>  2 files changed, 203 insertions(+), 67 deletions(-)
>
> diff --git a/sound/soc/codecs/rt5677.c b/sound/soc/codecs/rt5677.c
> index 9b7a1833d3316c..899e07e30228a1 100644
> --- a/sound/soc/codecs/rt5677.c
> +++ b/sound/soc/codecs/rt5677.c
> @@ -23,6 +23,10 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
> +#include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -4620,7 +4624,6 @@ static void rt5677_gpio_config(struct rt5677_priv 
> *rt5677, unsigned offset,
>  static int rt5677_to_irq(struct gpio_chip *chip, unsigned offset)
>  {
> struct rt5677_priv *rt5677 = gpiochip_get_data(chip);
> -   struct regmap_irq_chip_data *data = rt5677->irq_data;
> int irq;
>
> if ((rt5677->pdata.jd1_gpio == 1 && offset == RT5677_GPIO1) ||
> @@ -4646,7 +4649,7 @@ static int rt5677_to_irq(struct gpio_chip *chip, 
> unsigned offset)
> return -ENXIO;
> }
>
> -   return regmap_irq_get_virq(data, irq);
> +   return irq_create_mapping(rt5677->domain, irq);
>  }
>
>  static const struct gpio_chip rt5677_template_chip = {
> @@ -4716,37 +4719,13 @@ static int rt5677_probe(struct snd_soc_component 
> *component)
>
> snd_soc_component_force_bias_level(component, SND_SOC_BIAS_OFF);
>
> -   regmap_write(rt5677->regmap, RT5677_DIG_MISC, 0x0020);
> +   regmap_update_bits(rt5677->regmap, RT5677_DIG_MISC,
> +   ~RT5677_IRQ_DEBOUNCE_SEL_MASK, 0x0020);
> regmap_write(rt5677->regmap, RT5677_PWR_DSP2, 0x0c00);
>
> for (i = 0; i < RT5677_GPIO_NUM; i++)
> rt5677_gpio_config(rt5677, i, rt5677->pdata.gpio_config[i]);
>
> -   if (rt5677->irq_data) {
> -   regmap_update_bits(rt5677->regmap, RT5677_GPIO_CTRL1, 0x8000,
> -   0x8000);
> -   regmap_update_bits(rt5677->regmap, RT5677_DIG_MISC, 0x0018,
> -   0x0008);
> -
> -   if (rt5677->pdata.jd1_gpio)
> -   regmap_update_bits(rt5677->regmap, RT5677_JD_CTRL1,
> -   RT5677_SEL_GPIO_JD1_MASK,
> -   rt5677->pdata.jd1_gpio <<
> -   RT5677_SEL_GPIO_JD1_SFT);
> -
> -   if (rt5677->pdata.jd2_gpio)
> -   regmap_update_bits(rt5677->regmap, RT5677_JD_CTRL1,
> -   RT5677_SEL_GPIO_JD2_MASK,
> -   rt5677->pdata.jd2_gpio <<
> -   RT5677_SEL_GPIO_JD2_SFT);
> -
> -   if (rt5677->pdata.jd3_gpio)
> -   regmap_update_bits(rt5677->regmap, RT5677_JD_CTRL1,
> -   RT5677_SEL_GPIO_JD3_MASK,
> -   rt5677->pdata.jd3_gpio <<
> -   RT5677_SEL_GPIO_JD3_SFT);
> -   }
> -
> mutex_init(>dsp_cmd_lock);
> mutex_init(>dsp_pri_lock);
>
> @@ -5063,65 +5042,210 @@ static void rt5677_read_device_properties(struct 
> rt5677_priv *rt5677,
> >pdata.jd3_gpio);
>  }
>
> -static struct regmap_irq rt5677_irqs[] = {
> +struct rt5677_irq_desc {
> +   unsigned int enable_mask;
> +   unsigned int status_mask;
> +   unsigned int polarity_mask;
> +};
> +
> +static const struct rt5677_irq_desc rt5677_irq_descs[] = {
> [RT5677_IRQ_JD1] = {
> -   .reg_offset = 0,
> -   .mask = RT5677_EN_IRQ_GPIO_JD1,
> +   .enable_mask = RT5677_EN_IRQ_GPIO_JD1,
> +   .status_mask = RT5677_STA_GPIO_JD1,
> +   .polarity_mask = RT5677_INV_GPIO_JD1,
> },
> [RT5677_IRQ_JD2] = {
> -   .reg_offset = 0,
> -   .mask = RT5677_EN_IRQ_GPIO_JD2,
> +   .enable_mask = RT5677_EN_IRQ_GPIO_JD2,
> +   .status_mask = RT5677_STA_GPIO_JD2,
> +   .polarity_mask = RT5677_INV_GPIO_JD2,
> },
> [RT5677_IRQ_JD3] = {
> -   .reg_offset = 0,
> 

Re: [PATCH] ASoC: core: Fix use-after-free after deferred card registration

2019-03-26 Thread Curtis Malainey
This has already been patched. See
https://mailman.alsa-project.org/pipermail/alsa-devel/2019-March/146150.html

On Tue, Mar 26, 2019 at 10:23 AM Guenter Roeck  wrote:
>
> If snd_soc_register_card() fails because one of its links fails
> to instantiate with -EPROBE_DEFER, and the to-be-registered link
> is a legacy link, a subsequent retry will trigger a use-after-free
> and quite often a system crash.
>
> Example:
>
> byt-max98090 byt-max98090: ASoC: failed to init link Baytrail Audio
> byt-max98090 byt-max98090: snd_soc_register_card failed -517
> 
> BUG: KASAN: use-after-free in snd_soc_init_platform+0x233/0x312
> Read of size 8 at addr 888067c43070 by task kworker/1:1/23
>
> snd_soc_init_platform() allocates memory attached to the card device.
> This memory is released when the card device is released. However,
> the pointer to the memory (dai_link->platforms) is only cleared from
> soc_cleanup_platform(), which is called from soc_cleanup_card_resources(),
> but not if snd_soc_register_card() fails early.
>
> Add the missing call to soc_cleanup_platform() in the error handling
> code of snd_soc_register_card() to fix the problem.
>
> Fixes: 78a24e10cd94 ("ASoC: soc-core: clear platform pointers on error")
> Cc: Curtis Malainey 
> Signed-off-by: Guenter Roeck 
> ---
>  sound/soc/soc-core.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/sound/soc/soc-core.c b/sound/soc/soc-core.c
> index 93d316d5bf8e..6bf9884d0863 100644
> --- a/sound/soc/soc-core.c
> +++ b/sound/soc/soc-core.c
> @@ -2799,6 +2799,7 @@ int snd_soc_register_card(struct snd_soc_card *card)
> if (ret) {
> dev_err(card->dev, "ASoC: failed to init link %s\n",
> link->name);
> +   soc_cleanup_platform(card);
> mutex_unlock(_mutex);
> return ret;
> }
> --
> 2.7.4
>


Re: Regression: ASoC: soc-core: clear platform pointers on error

2019-03-01 Thread Curtis Malainey
On Fri, Mar 1, 2019 at 4:07 AM Jon Hunter  wrote:
>
> Hi Mark, Curtis,
>
> I am seeing a regression on -next where the soundcard on one of our
> Tegra boards fails to initialise following a probe deferral. The bisect
> points to the commit 78a24e10cd94420f1b4e2dc5923ae7109e2aaba1 ('ASoC:
> soc-core: clear platform pointers on error') and reverting this on top
> of -next fixes the problem.
>
> Looking at the bootlog from the failure I see ...
>
>  tegra-snd-wm8903 sound: ASoC: failed to init link WM8903
>
>  tegra-snd-wm8903 sound: snd_soc_register_card failed (-517)
>
>  tegra30-i2s 70080400.i2s: DMA channels sourced from device 7008.ahub
>
>  tegra-snd-wm8903 sound: ASoC: Both platform name/of_node are set for WM8903
>
>  tegra-snd-wm8903 sound: ASoC: failed to init link WM8903
>
>  tegra-snd-wm8903 sound: snd_soc_register_card failed (-22)
>
>  tegra-snd-wm8903: probe of sound failed with error -22
>
>
> With the above change I see soc_cleanup_platform() is ever being called
> when the probe is deferred and hence leads to the failure. Note that the
> initial failure, "ASoC: failed to init link WM8903" occurs very early
> in snd_soc_register_card() when initialising the prelinks.
>
> The following fixes it, but I have not scrutinised the code to see if
> there are other exit points that we need to handle.
>
You are indeed correct. That should be there. I think I got confused
when I was doing my initial debugging on our local 4.19 kernel as the
cleanup code had changed when I was testing on -next and forgot to add
this to my upstream patch and it wasn't caught in my testing. Thanks
for catching this. I went back and looked and it appears that is the
only call to soc_init_dai_link that is outside soc_instantiate_card
and therefore not caught by the original patch's cleanup routine. Do
you want to submit a patch and I'll add my signoff?
> diff --git a/sound/soc/soc-core.c b/sound/soc/soc-core.c
> index 93d316d5bf8e..6f66beb0c3ae 100644
> --- a/sound/soc/soc-core.c
> +++ b/sound/soc/soc-core.c
> @@ -2797,6 +2797,7 @@ int snd_soc_register_card(struct snd_soc_card *card)
>
> ret = soc_init_dai_link(card, link);
> if (ret) {
> +   soc_cleanup_platform(card);
> dev_err(card->dev, "ASoC: failed to init link %s\n",
> link->name);
> mutex_unlock(_mutex);
>
> Cheers
> Jon
>
> --
> nvpublic


Re: [alsa-devel] [PATCH] ASoC: soc-core: Fix null pointer dereference in soc_find_component

2019-01-24 Thread Curtis Malainey
I have a patch to fix the memory leak but I haven't been able to test
it yet because I am remote right now and I accidentally bootlooped the
AMD device I am working on. I will have this tested early next week.
Here is the patch for anyone interested.

Curtis Malainey | Software Engineer | cujomalai...@google.com | 650-898-3849


On Fri, Jan 25, 2019 at 3:26 AM Mark Brown  wrote:
>
> On Thu, Jan 24, 2019 at 01:07:17PM -0600, Pierre-Louis Bossart wrote:
>
> > Thanks for the feedback, will send a formal patch with the helper and
> > machine driver changes after I test more with the legacy drivers. Do you
> > have a preference for one patch that deals with multiple machines drivers in
> > one shot, or individual patches? The latter are nicer for backports (e.g.
> > for Chrome), the former nicer for maintainers...
>
> More patches is good, it doesn't make a huge difference if I get one big
> patch or a series of repetitive patches - big serieses are more of an
> issue if they're all different patches needing individual review.


0001-ASoC-soc-core-clear-platform-pointers-on-error.patch
Description: Binary data


Re: [alsa-devel] [PATCH] ASoC: soc-core: Fix null pointer dereference in soc_find_component

2019-01-22 Thread Curtis Malainey
Curtis Malainey | Software Engineer | cujomalai...@google.com | 650-898-3849


On Wed, Jan 23, 2019 at 4:11 AM Pierre-Louis Bossart
 wrote:
>
>
> > The issue was that we were seeing a memory corruption bug on an AMD
> > chromebooks with that function already (not observed on Intel). I was
> > testing some SOF integrations and was seeing this in the kernel logs.
> > I had Dylan verify my logic before I sent the patch because it took so
> > long to identify the bug and it was traced to the patch that introduce
> > soc_init_platform.
> >
> > [   10.922112] cz-da7219-max98357a AMD7219:00: ASoC: CPU DAI
> > designware-i2s.1.auto not registered
> > [   10.922122] cz-da7219-max98357a AMD7219:00:
> > devm_snd_soc_register_card(acpd7219m98357) failed: -517
> > [   11.001411] cz-da7219-max98357a AMD7219:00: ASoC: Both platform
> > name/of_node are set for amd-max98357-play
> > [   11.001423] cz-da7219-max98357a AMD7219:00: ASoC: failed to init
> > link amd-max98357-play
> > [   11.001431] cz-da7219-max98357a AMD7219:00:
> > devm_snd_soc_register_card(acpd7219m98357) failed: -22
> > [   11.001577] cz-da7219-max98357a: probe of AMD7219:00 failed with error 
> > -22
> >
> > of_node was never getting set but the pointer was becoming populated
> > (outside of the probe call) which traced to soc_init_platform function
> > which was not reallocating memory on a EPROBE_DEFER even though it was
> > getting freed by devm. I am not very familiar with devm but my local
> > maintainers say that it should be freeing the memory even on a
> > PROBE_DEFER.
> > The patch should mirror the memory behaviour in
> > snd_soc_init_multicodec which also reallocates its memory on every
> > probe. I'm not sure how the patch is causing you to defer, is your
> > component list corrupt?
> >
> > Sorry for the duplicate spam, forgot to send via plain text mode,
> > re-sending for the mailing list so it gets accepted.
>
> There is no defer issue with the intel stuff, but we call this routine
> multiple times
>
> snd_soc_register_card
>
> --soc_init_dai_link
>
> snd_soc_init_platform
>
> -- soc_soc_bind_card
>
> snd_soc_instantiate_card
>
> -- soc_check_tplg_fes
>
>  snd_soc_init_platform << ALLOC1
>
> soc_init_dai_link
>
> --snd_soc_init_platform << ALLOC2
>
Ah that explains it, in my testing I didn't have the patch that
brought in the call from within tplg_fes
>
> Initially dai_link->legacy_platform is 0, so gets set after the first
> first devm_kzalloc (ALLOC1) and after that we always allocate new memory
> (ALLOC2). The end result is that whatever we set in soc_check_tplg_fes
> is lost with the new/unnecessary alloc.
>
> I would guess your solution is also a work-around, if devm_ effectively
> freed the memory then the pointer would become NULL. Or may that's the
> issue is that no one actually resets it.
>
>
Yes, its a work around to fix the memory issue. If you set the
platform in the machine driver the code will ignore it and not reset
it. That being said that is not a full proof workaround and a better
solution is definitely needed. We could go and clean up the pointers
in soc_instantiate_card based on the flag being set. That way we only
relocate on a NULL pointer like we used to but still don't affect
statically allocated memory. I will draft a patch, test it on the AMD
device, reply to this thread later with it, Pierre can you test it as
well?

I am curious why soc_check_tplg_fes is calling snd_soc_init_platform.
It should have already been called earlier, in soc_init_dai_link at
the beginning of snd_soc_register_card so the memory should already be
initialized. Unless I am missing somewhere where links are getting
added between the calls.


Re: [alsa-devel] [PATCH] ASoC: soc-core: Fix null pointer dereference in soc_find_component

2019-01-18 Thread Curtis Malainey
On Fri, Jan 18, 2019 at 5:12 PM Curtis Malainey  wrote:
>
>
>
> On Fri, Jan 18, 2019 at 3:02 PM Pierre-Louis Bossart 
>  wrote:
>>
>>
>> On 1/15/19 3:16 PM, Pierre-Louis Bossart wrote:
>> >
>> >>> Beyond the fact that the platform_name seems to be totally useless,
>> >>> additional tests show that the patch ('ASoC: soc-core: defer card probe
>> >>> until all component is added to list') adds a new restriction which
>> >>> contradicts existing error checks.
>> >>>
>> >>> None of the Intel machine drivers set the dailink "cpu_name" field
>> >>> but use
>> >>> the "cpu_dai_name" field instead. This was perfectly legit as
>> >>> documented by
>> >>> the code at the end of soc_init_dai_link()
>> >> This should be fixed by the patch
>> >> "ASoC: core: Don't defer probe on optional, NULL components" which Mark
>> >> already applied to his tree. See
>> >> http://mailman.alsa-project.org/pipermail/alsa-devel/2019-January/144323.html
>> >>
>> >
>> > Ah yes, I missed this patch while I was debugging. Indeed this fixes
>> > the problem and my devices work again with Mark's for-next branch.
>> > Thanks Matthias!
>>
>> This PROBE_DEFER support actually breaks the topology override that
>> we've been relying on for SOF (and which has been in Mark's branch for
>> some time now). This override helps us reuse machine drivers between
>> legacy and SOF-based solutions.
>>
>> With the current code, the tests in soc_register_card() complain that
>> the platform_name can't be tied to a component and stop the card
>> registration, but that's mainly because the tests are done before the
>> topology overrides are done in soc_check_tplg_fes(). Moving
>> soc_check_tplg_fes() from soc_instantiate_card() to an earlier time in
>> soc_register_card() works-around the problem but looks quite invasive
>> (mutex lock, etc).
>>
>> There is also a second problem where we seem to have a memory management
>> issue root caused to the change in snd_soc_init_platform() added by
>> 09ac6a817bd6 ('ASoC: soc-core: fix init platform memory handling')
>>
>> The code does this
>>
>> static int snd_soc_init_platform(struct snd_soc_card *card,
>>   struct snd_soc_dai_link *dai_link)
>> {
>>  struct snd_soc_dai_link_component *platform = dai_link->platform;
>>
>>
>>  /* convert Legacy platform link */
>>  if (!platform || dai_link->legacy_platform) {
>>  platform = devm_kzalloc(card->dev,
>>  sizeof(struct snd_soc_dai_link_component),
>>  GFP_KERNEL);
>>  if (!platform)
>>  return -ENOMEM;
>>
>>  dai_link->platform  = platform;
>>  dai_link->legacy_platform = 1;
>>
>> This last assignment guarantees that memory will be allocated every time
>> this function is called, and whatever overrides are done later will
>> themselves be overridden by the new allocation. I am not sure what the
>> intent was here, Curtis can you please double-check?
>>
The issue was that we were seeing a memory corruption bug on an AMD
chromebooks with that function already (not observed on Intel). I was
testing some SOF integrations and was seeing this in the kernel logs.
I had Dylan verify my logic before I sent the patch because it took so
long to identify the bug and it was traced to the patch that introduce
soc_init_platform.

[   10.922112] cz-da7219-max98357a AMD7219:00: ASoC: CPU DAI
designware-i2s.1.auto not registered
[   10.922122] cz-da7219-max98357a AMD7219:00:
devm_snd_soc_register_card(acpd7219m98357) failed: -517
[   11.001411] cz-da7219-max98357a AMD7219:00: ASoC: Both platform
name/of_node are set for amd-max98357-play
[   11.001423] cz-da7219-max98357a AMD7219:00: ASoC: failed to init
link amd-max98357-play
[   11.001431] cz-da7219-max98357a AMD7219:00:
devm_snd_soc_register_card(acpd7219m98357) failed: -22
[   11.001577] cz-da7219-max98357a: probe of AMD7219:00 failed with error -22

of_node was never getting set but the pointer was becoming populated
(outside of the probe call) which traced to soc_init_platform function
which was not reallocating memory on a EPROBE_DEFER even though it was
getting freed by devm. I am not very familiar with devm but my local
maintainers say that it should be freeing the memory even on a
PROBE_DEFER.
The patch should mirror the memory behaviour in
snd_soc_init_multicodec which also reallocates its memory on every
probe. I'm not sure how the patch is causing you to defer, is your
component list corrupt?

Sorry for the duplicate spam, forgot to send via plain text mode,
re-sending for the mailing list so it gets accepted.
>
>> Details, test code and logs are available here:
>> https://github.com/thesofproject/linux/issues/565
>>
>> Have a nice week-end everyone, that's it for me until Tuesday.
>>
>> -Pierre
>>
>>
>>