> On 2020/2/24 15:03, Prakhya, Sai Praneeth wrote:
> >> On 2020/2/24 11:20, Prakhya, Sai Praneeth wrote:
> >>>>> +       list_for_each_entry_safe(grp_dev, temp, &group->devices, list) {
> >>>>> +               struct device *dev;
> >>>>> +
> >>>>> +               dev = grp_dev->dev;
> >>>>> +               iommu_release_device(dev);
> >>>>> +
> >>>>> +               ret = iommu_group_add_device(group, dev);
> >>>>> +               if (ret)
> >>>>> +                       dev_err(dev, "Failed to add to iommu group %d: 
> >>>>> %d\n",
> >>>>> +                               group->id, ret);
> >>>> Need to handle this error case.
> >>> I wasn't sure on how to handle the error ☹
> >>
> >> Just roll back to the state before calling this function and return
> >> an appropriate error value.
> >>
> >> The likely behavior is detaching the new domains from all devices (if
> >> it has already attached), attaching the old domains to all devices in
> >> the group,
> >
> > And while handling this error case, there is a possibility that attaching 
> > to old
> domain could fail.. so, we might need to handle that error case as well. If we
> plan to handle error case, since we have removed devices from group above,
> adding them back could fail too.. that would lead into handling error case 
> for an
> error case.
> 
> We can assume that the old domain should always be attached back.
> Otherwise, there must be some bugs in the vendor iommu driver.
> 
> It must be able to role back, otherwise users have to reboot the system in 
> order
> to use the devices in the group. This is not acceptable in the production 
> kernel.

I agree that we should be able to roll back but I am afraid that the error 
handling code could become complex than the usual code that gets to run. For 
example, assume there are 'n' devices in the group, 'k' of them are 
successfully processed (i.e. default domain type has been changed) and if k+1 
fails while changing default domain type, to roll back state of k devices, we 
need to maintain a list of processed devices so that we now roll back state for 
devices in this list. The present code does not have any list because it's 
processing sequentially, it takes a device from the group, changes it domain 
and moves to other device and hence does not require a list.

All said, I could give this a try and see how complex the code could turn out 
to be. I hope I am wrong (i.e. turns out implementing error handling is simple).

> > So, given the probability of these functions failing here are very low, I 
> > think,
> why not bite the bullet and say, add code to handle these error cases if we 
> see
> that these functions are failing frequently? Otherwise the error handling 
> code is
> just a dead code.
> >
> >> cleaning
> >> up all new resources allocated in this function, putting a error
> >> message to tell the user why it fails and returning an error code.
> >>
> >>> i.e. group's domain/default_domain are already updated to new domain
> >>> and
> >> assume there are 'n' devices in the group and this failed for 'k'th
> >> device, I wasn't sure how I could roll back the changes made for k-1 
> >> devices.
> >>
> >> A successful attach could be checked by (group->domain ==
> >> group->default_domain).
> >
> > No.. because I have manually set group->domain ==
> > group->default_domain = new_domain (did this because
> > iommu_group_add_device() and iommu_group_create_direct_mappings()
> > needs them)
> 
> You could set group->domain to the default domain only after it has been
> attached to the device successfully, right?

Will reorder things and see how this could be handled.

> > So, probably we might need some other way to check successful attach.
> >
> >>> So, I thought probably just alert the user that there was an error
> >>> while
> >> changing default domain type and try updating for other devices in
> >> the group (hopefully other devices might succeed). Also,*generally*
> >> we shouldn't see any errors here because all these devices were
> >> already in the same group earlier (we aren’t adding/removing new
> >> devices to the group). We are just changing default domain type and
> >> we already made sure that device could be in the requested default domain
> type.

[snipped]

> >> At least I didn't see the iommu_def_domain_type is used if both
> >> idt_devs and dma_devs are both equal to 0. :-)
> >
> > Sorry! I didn't get what you meant by "iommu_def_domain_type", so, could
> you please explain that?
> >
> > When idt_devs == 0 and dma_devs == 0, it means that all the devices in the
> group could be in either of the domain. Hence, I default to DMA domain which 
> is
> covered by this code.
> >             if (!idt_devs)
> >                     req_type = IOMMU_DOMAIN_DMA;
> 
> iommu_def_domain_type is the system level default domain type defined in
> iommu.c. If the vendor iommu driver has no special requirement, we should
> choose the default value.

Ok.. got it, thanks for explaining. I will change this as suggested.

Regards,
Sai
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Reply via email to