On 4/1/2026 10:23 AM, Shah, Tanmay wrote:
> 
> 
> On 3/31/2026 12:53 PM, Mathieu Poirier wrote:
>> On Mon, Mar 30, 2026 at 01:43:03PM -0500, Shah, Tanmay wrote:
>>> Hello,
>>>
>>> Thanks for the reviews. Please find my comments below:
>>>
>>> On 3/27/2026 2:58 PM, Mathieu Poirier wrote:
>>>> On Tue, Mar 17, 2026 at 01:12:51PM -0700, Tanmay Shah wrote:
>>>>> On AMD-Xilinx platforms cortex-A and cortex-R can be configured as
>>>>> separate subsystems. In this case, both cores can boot independent of
>>>>> each other. If Linux went through uncontrolled reboot during active
>>>>> rpmsg communication, then during next boot it can find rpmsg virtio
>>>>> status not in the reset state. In such case it is important to reset the
>>>>> virtio status during attach callback and wait for sometime for the
>>>>> remote to handle virtio driver reset.
>>>>
>>>> I understand the user case, but won't that situation happen regardless of
>>>> whether cores operate sync or split mode?
>>>>
>>>
>>> I want to make it clear that this is not same as Cortex-R cluster
>>> configured as lockstep vs split mode.
>>>
>>> This is different configuration between Cortex-A cores and Cortex-R
>>> cores. It is a firmware driver configuration of how it treats cortex-A
>>> and Cortex-R subsystems.
>>>
>>> In the firmware driver, we can configure Cortex-A cluster as owner of
>>> Cortex-R cluster, and in that case, if Cortex-A reboots, the firmware
>>> will also reboot cortex-R cores. This policy makes Cortex-A as owner of
>>> Cortex-R cores. In this configuraton this patch is not needed, because
>>> if Cortex-A reboots then platform management firmware will also reboot
>>> Cortex-R cores as well and vdev status flag will be always 0.
>>>
>>> In another configuration in the firmware driver, Cortex-R cores can be
>>> independent of Cortex-A cores. This means, Cortex-A is not the owner of
>>> the Cortex-R cores. Both are independent subsystem. Only in this
>>> configuration, this patch applies because it is possible that Cortex-A
>>> reboots while Cortex-R doesn't and Cortex-R still runs as it is.
>>>
>>> So only in the second type of configuration, this patch is needed when
>>> COrtex-A running linux reboots and when driver probes and tries to
>>> attach it can find that vdev flag is not reset. In the first
>>> configuartion if linux reboots, then It's guranteed that vdev status
>>> flag will always be in the reset state.
>>>
>>> If you prefer I can extend the commit message with all above details or
>>> put as comment in the attach() callback. Let me know which do you prefer.
>>
>> Ok, that clarifies a lot of things.  Please add the above as a comment in
>> attach().
>>
>>>
>>>>>
>>>>> Signed-off-by: Tanmay Shah <[email protected]>
>>>>> ---
>>>>>  drivers/remoteproc/xlnx_r5_remoteproc.c | 46 +++++++++++++++++++++++++
>>>>>  1 file changed, 46 insertions(+)
>>>>>
>>>>> diff --git a/drivers/remoteproc/xlnx_r5_remoteproc.c 
>>>>> b/drivers/remoteproc/xlnx_r5_remoteproc.c
>>>>> index 50a9974f3202..f08806f13800 100644
>>>>> --- a/drivers/remoteproc/xlnx_r5_remoteproc.c
>>>>> +++ b/drivers/remoteproc/xlnx_r5_remoteproc.c
>>>>> @@ -5,6 +5,7 @@
>>>>>   */
>>>>>  
>>>>>  #include <dt-bindings/power/xlnx-zynqmp-power.h>
>>>>> +#include <linux/delay.h>
>>>>>  #include <linux/dma-mapping.h>
>>>>>  #include <linux/firmware/xlnx-zynqmp.h>
>>>>>  #include <linux/kernel.h>
>>>>> @@ -29,6 +30,8 @@
>>>>>  #define RSC_TBL_XLNX_MAGIC       ((uint32_t)'x' << 24 | (uint32_t)'a' << 
>>>>> 16 | \
>>>>>                            (uint32_t)'m' << 8 | (uint32_t)'p')
>>>>>  
>>>>> +#define RPROC_ATTACH_TIMEOUT_US (100 * 1000)
>>>>> +
>>>>
>>>> There are some time constant already defined, please use them.
>>>
>>> Ack.
>>>
>>>>
>>>>>  /*
>>>>>   * settings for RPU cluster mode which
>>>>>   * reflects possible values of xlnx,cluster-mode dt-property
>>>>> @@ -865,6 +868,49 @@ static int zynqmp_r5_get_rsc_table_va(struct 
>>>>> zynqmp_r5_core *r5_core)
>>>>>  
>>>>>  static int zynqmp_r5_attach(struct rproc *rproc)
>>>>>  {
>>>>> + struct device *dev = &rproc->dev;
>>>>> + bool wait_for_remote = false;
>>>>> + struct fw_rsc_vdev *rsc;
>>>>> + struct fw_rsc_hdr *hdr;
>>>>> + int i, offset, avail;
>>>>> +
>>>>> + if (!rproc->table_ptr)
>>>>> +         goto attach_success;
>>>>> +
>>>>> + for (i = 0; i < rproc->table_ptr->num; i++) {
>>>>> +         offset = rproc->table_ptr->offset[i];
>>>>> +         hdr = (void *)rproc->table_ptr + offset;
>>>>> +         avail = rproc->table_sz - offset - sizeof(*hdr);
>>>>> +         rsc = (void *)hdr + sizeof(*hdr);
>>>>> +
>>>>> +         /* make sure table isn't truncated */
>>>>> +         if (avail < 0) {
>>>>> +                 dev_err(dev, "rsc table is truncated\n");
>>>>> +                 return -EINVAL;
>>>>> +         }
>>>>> +
>>>>> +         if (hdr->type != RSC_VDEV)
>>>>> +                 continue;
>>>>> +
>>>>> +         /*
>>>>> +          * reset vdev status, in case previous run didn't leave it in
>>>>> +          * a clean state.
>>>>> +          */
>>>>> +         if (rsc->status) {
>>>>> +                 rsc->status = 0;
>>>>> +                 wait_for_remote = true;
>>>>> +                 break;
>>>>> +         }
>>>>> + }
>>>>> +
>>>>> + /* kick remote to notify about attach */
>>>>> + rproc->ops->kick(rproc, 0);
>>>>> +
>>>>> + /* wait for sometime until remote is ready */
>>>>> + if (wait_for_remote)
>>>>> +         usleep_range(100, RPROC_ATTACH_TIMEOUT_US);
>>>>
>>>> Instead of waiting, would it be possible to return -EPROBE_DEFER and let 
>>>> the
>>>> driver core retry mechanic do it's work?
>>>>
>>>
>>> It is not possible to do -EPROBE_DEFER, because attach() callback is not
>>> called only during driver probe.
>>>
>>> It is also called during following command sequence:
>>>
>>> attach() -> detach() -> attach()
>>>
>>> During second attach() callback, we can't do -EPROBE_DEFER, as it's not
>>> driver probe anymore. So I think will have to keep the usleep_range().
>>
>> Right, but in this case the Cortex-A did not go through an uncontrolled 
>> reboot,
>> we know the state of the machine is sound.  Do you see a scenario where it 
>> would
>> not be the case?  
>>
> 
> Yes correct.  We will hit this issue only during probe, after that as
> long as detach() happens we are setting vdev status to 0.
> 
> Another problem with the -EPROBE_DEFER mechanism is that the time
> between return from attach() and next attach() isn't fixed. after
> deferring current probe, when next probe and attach() happens, we will
> always find vdev status to 0, even if remote hasn't handled vdev reset
> event. So we don't know if the remote has handled virtio reset flag
> notification or not. Where 100ms fixed delay, gives fixed time to remote
> to handle vdev reset event. If needed this delay can be increased later.
> 
> This brings up another question to make the solution more robust. Do we
> have any standard way of handling such a situation? Like in other virtio
> standards, can this situation happen where driver comes up and finds the
> virtio device status not in the reset state? How do they handle it?
> 
> Also, is firmware required to restore the resource table to default or
> initial resource table after getting the virtio reset notification? Any
> standard decided for remtoeproc virtio devices around this?
> 
> Thanks,
> Tanmay
> 

All,

This issue was discussed on openamp-rp call, and it was concluded that
the common solution that works for all vendors is needed for this problem.

The design of the solution was discussed like this:

1) Resource table will have standard resource that will be used to ask
device to reset itself.

2) Device will set the virtio status flag to 0 after successful reset.

3) Linux will poll non-zero virtio status until it is set to 0. If
virtio status is not found 0, then driver probe will defer.

I will implement above design and send new patch series. Please consider
this patch series rejected.

Thanks,
Tanmay
>>>
>>>> Thanks,
>>>> Mathieu
>>>>
>>>>> +
>>>>> +attach_success:
>>>>>   dev_dbg(&rproc->dev, "rproc %d attached\n", rproc->index);
>>>>>  
>>>>>   return 0;
>>>>>
>>>>> base-commit: d4ef36fbd57e610d4c334123ce706a2a71187cae
>>>>> -- 
>>>>> 2.34.1
>>>>>
>>>
> 


Reply via email to