On 4/23/26 3:11 AM, Luca Weiss wrote:
Hi Val,

On Thu Apr 23, 2026 at 6:41 AM CEST, Val Packett wrote:
The response sent by the firmware when requesting a clock vote (opcode
AFE_CMD_RSP_REMOTE_LPASS_CORE_HW_VOTE_REQUEST) does not actually have
the same opcode + status payload as APR_BASIC_RSP_RESULT. Rather, it
returns one single u32 which is the client_handle that must be used in
future unvote requests for the same clock.

As a result of this type confusion, the status returned by the callback
to q6afe_vote_lpass_core_hw was actually an out-of-bounds read. It was
only interpreted as success (0) most of the time due to luck, but there
are reports of random ("more likely on cold boot", "depending on hacks
made in other drivers") errors such as:

[   20.961100] qcom-q6afe aprsvc:service:4:4: AFE failed to vote (3)
[   20.961131] Failed to prepare clk 'core': -110

Fix by correctly interpreting the response as a single u32, and actually
store it as the client_handle to ensure unvote would work correctly.

Link: https://lore.kernel.org/all/5976946.DvuYhMxLoT@antlia/
Signed-off-by: Val Packett <[email protected]>
---

Found by reading and comparing with downstream:
https://github.com/YumeMichi/kernel_xiaomi_pipa/blob/27190355fb31284988ddf505cb7cfba5128104cf/techpack/audio/dsp/q6afe.c#L1261-L1263

What's really bizzare though is that some of the logs go:

[ 10.827469] qcom-q6afe aprsvc:service:4:4: cmd = 0x100f4 returned error
= 0x16
[ 10.827512] qcom-q6afe aprsvc:service:4:4: Unknown cmd 0x100f4
[ 14.052896] qcom-q6afe aprsvc:service:4:4: AFE failed to vote (3)

..where the "returned error =" message is something that can only be
printed by the callback if it goes into the **other** switch() branch,
i.e. when hdr->opcode == APR_BASIC_RSP_RESULT. How reading out-of-bounds
only in the AFE_CMD_RSP_REMOTE_LPASS_CORE_HW_VOTE_REQUEST branch would
cause that to happen (even on a subsequent vote after the first one to
perform the read) is beyond me.

Still, the person that reported this heisenbug told me that this has
actually completely fixed it.
(upd: not really it seems, as expected.. there's gotta be something else, could it be trying to use the vote command too early before the firmware is ready for it or something?)
This seems conceptually similar to what I needed to do for SM6350/SM7225
microphone (wip) - it's not necessary for just speaker btw:
https://github.com/z3ntu/linux/commit/107bf0c39e40a207036e963f878f39467f978393

There I'm storing this handle per 'block' and not just one "vote_res",
essentially copying how downstream Linux does it. Your solution
definitely has less lines of diff, but I can't judge whether it's enough
to store it like that :)

Thanks for looking into this though!

Oh wow, didn't know you looked into this already! I'm surprised it was already necessary for getting something to work. We actually shouldn't be getting to the unvote part at runtime yet?? o.0 As only the recent "ASoC: codecs: lpass-*-macro: Switch to PM clock framework for runtime PM" series change that (and tx/rx weren't posted yet).

The downstream multiple storage thing seems completely unnecessary as the entire request-response cycle (afe_apr_send_pkt) happens under the &afe->lock mutex. If another block starts a vote request while another is in progress, it will wait for the first one to unlock the mutex, which would only happen after the response is processed.

~val


Reply via email to