RE: [PATCH v2 0/5] iommu: Support identity mappings of reserved-memory regions
Hi Dmitry, > Thank you for the answer. Could you please give more information about: > 1) Are you on software or hardware team, or both? I am on the software team and have contributed to the initial Tegra SMMU driver in the downstream along with earlier team member Hiroshi Doyu. > 2) Is SMMU a third party IP or developed in-house? Tegra SMMU is developed in-house. > 3) Do you have a direct access to HDL sources? Are you 100% sure that > hardware does what you say? It was discussed with the hardware team before and again today as well. Enabling ASID for the display engine while it continues to access the buffer memory is a safe operation. As per the HW team, the only side-effect that can happen is additional latency on transactions as the SMMU caches get warmed up. > 4) What happens when CPU writes to ASID register? Does SMMU state machine > latch ASID status (making it visible) only at a single "safe" point? The MC makes a decision on routing a transaction either through the SMMU page tables or bypassing them based on the ASID register value. It checks the ASID register only once per transaction in the pipeline. -KR ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs
On Wed, Apr 28, 2021 at 11:56:22AM -0300, Jason Gunthorpe wrote: > On Wed, Apr 28, 2021 at 10:58:29AM +1000, David Gibson wrote: > > On Tue, Apr 27, 2021 at 02:12:12PM -0300, Jason Gunthorpe wrote: > > > On Tue, Apr 27, 2021 at 03:08:46PM +1000, David Gibson wrote: > > > > > Starting from a BDF the general pseudo code is: > > > > > device_name = first_directory_of("/sys/bus/pci/devices/BDF/vfio/") > > > > > device_fd = open("/dev/vfio/"+device_name) > > > > > ioasidfd = open("/dev/ioasid") > > > > > ioctl(device_fd, JOIN_IOASID_FD, ioasidfd) > > > > > > > > This line is the problem. > > > > > > > > [Historical aside: Alex's early drafts for the VFIO interface looked > > > > quite similar to this. Ben Herrenschmidt and myself persuaded him it > > > > was a bad idea, and groups were developed instead. I still think it's > > > > a bad idea, and not just for POWER] > > > > > > Spawning the VFIO device FD from the group FD is incredibly gross from > > > a kernel design perspective. Since that was done the struct > > > vfio_device missed out on a sysfs presence and doesn't have the > > > typical 'struct device' member or dedicated char device you'd expect a > > > FD based subsystem to have. > > > > > > This basically traded normal usage of the driver core for something > > > that doesn't serve a technical usage. Given we are now nearly 10 years > > > later and see that real widely deployed applications are not doing > > > anything special with the group FD it makes me question the wisdom of > > > this choice. > > > > I'm really not sure what "anything special" would constitute here. > > Well, really anything actually. All I see in, say, dpdk, is open the > group fd, get a device fd, do the container dance and never touch the > group fd again or care about groups in any way. It seems typical of > this class of application. 
Well, sure, the only operation you do on the group itself is attach it to the container (and then every container operation can be thought of as applying to all its attached groups). But that attach operation really is fundamentally about the group. It always, unavoidably, fundamentally affects every device in the group - including devices you may not typically think about, like bridges and switches. That is *not* true of the other device operations, like poking IO. > If dpdk is exposing other devices to a risk it certainly hasn't done > anything to make that obvious. And in practice I suspect it will just break if you give it a >1 device group. > > > Okay, that is fair, but let's solve that problem directly. For > > > instance netlink has been going in the direction of adding a "extack" > > > from the kernel which is a descriptive error string. If the failing > > > ioctl returned the string: > > > > > > "cannot join this device to the IOASID because device XXX in the > > >same group #10 is in use" > > > > Um.. is there a sane way to return strings from an ioctl()? > > Yes, it can be done, a string buffer pointer and length in the input > for instance. I suppose. Rare enough that I expect everyone will ignore it, alas :/. > > Getting the device fds from the group fd kind of follows, because it's > > unsafe to do basically anything on the device unless you already > > control the group (which in this case means attaching it to a > > container/ioasid). I'm entirely open to ways of doing that that are > > less inelegant from a sysfs integration point of view, but the point > > is you must manage the group before you can do anything at all with > > individual devices. > > I think most likely VFIO is going to be the only thing to manage a > multi-device group. You don't get to choose that. You could explicitly limit other things to only one-device groups, but that would be an explicit limitation. 
Essentially any device can end up in a multi-device group, if you put it behind a PCIe to PCI bridge, or a PCIe switch which doesn't support access controls. The groups are still there, whether or not other things want to deal with them. > I see things like VDPA being primarily about PASID, and an IOASID that > is matched to a PASID is inherently a single device IOMMU group. I don't know enough about PASID to make sense of that. > > I don't see why. I mean, sure, you don't want explicitly the *vfio* > > group as such. But IOMMU group is already a cross-subsystem concept > > and you can explicitly expose that in a different way. > > Yes, and no, the kernel drivers in something like VDPA have decided > what device and group they are in before we get to IOASID. It is > illogical to try to retro-actively bolt in a group concept to their > APIs. Again, I don't know enough about VDPA to make sense of that. Are we essentially talking non-PCI virtual devices here? In which case you could define the VDPA "bus" to always have one-device groups. > > Again, I really think this is necessary complexity. You're right that > > far too little of the userspace properly
Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs
On Wed, Apr 28, 2021 at 09:21:49PM -0300, Jason Gunthorpe wrote: > On Wed, Apr 28, 2021 at 11:23:39AM +1000, David Gibson wrote: > > > Yes. My proposed model for a unified interface would be that when you > > create a new container/IOASID, *no* IOVAs are valid. > > Hurm, it is quite tricky. All IOMMUs seem to have a dead zone around > the MSI window, so negotiating this all in a general way is not going > to be a very simple API. > > To be general it would be nicer to say something like 'I need XXGB of > IOVA space' 'I need 32 bit IOVA space' etc and have the kernel return > ranges that sum up to at least that big. Then the kernel can do > all its optimizations. Ah, yes, sorry. We do need an API that lets the kernel make more of the decisions too. For userspace drivers it would generally be sufficient to just ask for XXX size of IOVA space wherever you can get it. Handling guests requires more precision. So, maybe a request interface with a bunch of hint variables and a matching set of MAP_FIXED-like flags to assert which ones aren't negotiable. > I guess you are going to say that the qemu PPC vIOMMU driver needs > more exact control.. *Every* vIOMMU driver needs more exact control. The guest drivers will expect to program the guest devices with IOVAs matching the guest platform's IOMMU model. Therefore the backing host IOMMU has to be programmed to respond to those IOVAs. If it can't be, there's no way around it, and you want to fail out early. With this model that will happen when qemu (say) requests the host IOMMU window(s) to match the guest's expected IOVA ranges. Actually, come to that, even guests without a vIOMMU need more exact control: they'll expect IOVA to match GPA, so if your host IOMMU can't be set up to translate the full range of GPAs, again, you're out of luck. The only reason x86 has been able to ignore this is that the assumption has been that all IOMMUs can translate IOVAs from 0... 
Once you really start to look at what the limits are, you need the exact window control I'm describing. > > I expect we'd need some kind of query operation to expose limitations > > on the number of windows, addresses for them, available pagesizes etc. > > Is page size an assumption that hugetlbfs will always be used for backing > memory or something? So for TCEs (and maybe other IOMMUs out there), the IO page tables are independent of the CPU page tables. They don't have the same format, and they don't necessarily have the same page size. In the case of a bare metal kernel working in physical addresses they can use that TCE page size however they like. For userspace you get another layer of complexity. Essentially to implement things correctly the backing IOMMU needs to have a page size granularity that's the minimum of whatever granularity the userspace or guest driver expects and the host page size backing the memory. > > > As an ideal, only things like the HW specific qemu vIOMMU driver > > > should be reaching for all the special stuff. > > > > I'm hoping we can even avoid that, usually. With the explicitly > > created windows model I propose above, it should be able to: qemu will > > create the windows according to the IOVA windows the guest platform > > expects to see and they either will or won't work on the host platform > > IOMMU. If they do, generic maps/unmaps should be sufficient. If they > > don't well, the host IOMMU simply cannot emulate the vIOMMU so you're > > out of luck anyway. > > It is not just P9 that has special stuff, and this whole area of PASID > seems to be quite different on every platform > > If things fit very naturally and generally then maybe, but I've been > down this road before of trying to make a general description of a > group of very special HW. It ended in tears after 10 years when nobody > could understand the "general" API after it was Frankenstein'd up with > special cases for everything. 
Cautionary tale. > > There is a certain appeal to having some > 'PPC_TCE_CREATE_SPECIAL_IOASID' entry point that has a whack of extra > information like windows that can be optionally called by the viommu > driver and it remains well defined and described. Windows really aren't ppc specific. They're absolutely there on x86 and everything else as well - it's just that people are used to having a window at 0, which you can often get away with treating sloppily. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs
On Wed, Apr 28, 2021 at 11:23:39AM +1000, David Gibson wrote: > Yes. My proposed model for a unified interface would be that when you > create a new container/IOASID, *no* IOVAs are valid. Hurm, it is quite tricky. All IOMMUs seem to have a dead zone around the MSI window, so negotiating this all in a general way is not going to be a very simple API. To be general it would be nicer to say something like 'I need XXGB of IOVA space' 'I need 32 bit IOVA space' etc and have the kernel return ranges that sum up to at least that big. Then the kernel can do all its optimizations. I guess you are going to say that the qemu PPC vIOMMU driver needs more exact control.. > I expect we'd need some kind of query operation to expose limitations > on the number of windows, addresses for them, available pagesizes etc. Is page size an assumption that hugetlbfs will always be used for backing memory or something? > > As an ideal, only things like the HW specific qemu vIOMMU driver > > should be reaching for all the special stuff. > > I'm hoping we can even avoid that, usually. With the explicitly > created windows model I propose above, it should be able to: qemu will > create the windows according to the IOVA windows the guest platform > expects to see and they either will or won't work on the host platform > IOMMU. If they do, generic maps/unmaps should be sufficient. If they > don't well, the host IOMMU simply cannot emulate the vIOMMU so you're > out of luck anyway. It is not just P9 that has special stuff, and this whole area of PASID seems to be quite different on every platform. If things fit very naturally and generally then maybe, but I've been down this road before of trying to make a general description of a group of very special HW. It ended in tears after 10 years when nobody could understand the "general" API after it was Frankenstein'd up with special cases for everything. 
Cautionary tale. There is a certain appeal to having some 'PPC_TCE_CREATE_SPECIAL_IOASID' entry point that has a whack of extra information like windows that can be optionally called by the viommu driver and it remains well defined and described. Jason
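Jason's "I need XXGB of IOVA space" idea, combined with the MAP_FIXED-like flags David floats elsewhere in the thread, could be sketched as a uAPI fragment along these lines. This is purely illustrative: no such struct or flag exists, and every name below is invented.

```c
/* Hypothetical uAPI sketch for IOVA range negotiation as discussed
 * above. None of these names or layouts exist in the kernel; they
 * only illustrate the shape of the proposed interface. */
struct ioasid_iova_req {
	__u64 size;	/* "I need this much IOVA space" */
	__u64 base;	/* requested window base: a hint by default */
	__u32 flags;
#define IOASID_IOVA_FIXED_BASE	(1 << 0)	/* base is not negotiable,
						   MAP_FIXED-style */
#define IOASID_IOVA_32BIT	(1 << 1)	/* whole window below 4 GiB */
	__u32 __reserved;
};

/* The kernel would answer with the range(s) it actually reserved,
 * which must sum to at least 'size' while avoiding dead zones such
 * as the MSI window. A vIOMMU backend like qemu's PPC TCE model
 * would instead set IOASID_IOVA_FIXED_BASE so the windows match
 * exactly what the guest platform expects. */
struct ioasid_iova_range {
	__u64 start;
	__u64 last;	/* inclusive */
};
```

A generic userspace driver would leave flags at 0 and take whatever ranges come back; only HW-specific vIOMMU code would reach for the fixed-base variant and fail out early if the host cannot honour it.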
Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs
On Wed, Apr 28, 2021 at 06:34:11AM +0000, Tian, Kevin wrote: > > If /dev/ioasid is single HW page table only then I would focus on that > > implementation and leave it to userspace to span different > > /dev/ioasids if needed. > > > > > OK, now I see where the disconnection comes from. In my context ioasid > > > is the identifier that is actually used on the wire, but seems you treat > > > it as > > > a sw-defined namespace purely for representing page tables. We should > > > clear this concept first before further discussing other details. > > > > There is no general HW requirement that every IO page table be > > referred to by the same PASID and this API would have to support > > Yes, but what is the value of allowing multiple PASIDs referring to > the same I/O page table (except the nesting pgtable case)? Doesn't it > lead to a poor iotlb efficiency issue similar to multiple iommu domains > referring to the same page table? I think iotlb efficiency is up to the platform. The general use case is to make an IOASID for something like the GPA and use it concurrently with, say, three devices: - VFIO (not PASID) - VDPA (PASID capable HW) - 'Future VDPA storage' (PASID capable HW) The uAPI for this should be very general and the kernel should decide the optimal way to configure the HW. Maybe it is one page table and one PASID, or maybe it is something else. Allowing the kernel to choose the PASID once it knows the RID is the highest generality. > > non-PASID IO page tables as well. So I'd keep the two things > > separated in the uAPI - even though the kernel today has a global > > PASID pool. > > for non-PASID usages the allocated PASID will be wasted if we don't > separate ioasid from pasid. But it may be worthwhile given 1m available > pasids and the simplification in the uAPI which only needs to care about > one id space then. 
I'd prefer this be a platform choice and not forced in the uAPI, because we can never go back on it if we see that yes we need to optimize here. I understand many platforms have different available PASID spaces already. > > Simple things like DPDK can use #2 and potentially have better PASID > > limits. hypervisors will most likely have to use #1, but it depends on > > how their vIOMMU interface works. > > Can you elaborate why DPDK wants to use #2 i.e. not using a global > PASID? It gives the kernel an option to make the decision about the PASID when it has the full information, including the RID. > > I think the name IOASID is fine for the uAPI, the kernel version can > > be called ioasid_id or something. > > ioasid is already an id and then ioasid_id just adds confusion. Another > point is that ioasid is currently used to represent both PCI PASID and > ARM substream ID in the kernel. It implies that if we want to separate > ioasid and pasid in the uAPI the 'pasid' also needs to be replaced with > another general term usable for substream ID. Are we making the > terms too confusing here? This is why I also am not so sure about exposing the PASID in the API because it is ultimately a HW specific item. As I said to David, one avenue is to have some generic uAPI that is very general and keep all this deeply detailed stuff, that really only matters for qemu, as part of a more HW specific vIOMMU driver interface. Jason
Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs
On Wed, Apr 28, 2021 at 07:47:56AM +0000, Tian, Kevin wrote: > > From: Jason Gunthorpe > > Sent: Wednesday, April 28, 2021 1:12 AM > > > [...] > > One option is VFIO can keep its group FD but nothing else will have > > anything like it. However I don't much like the idea that VFIO will > > have a special and unique programming model to do the same things > > other subsystems will do. That will make it harder for userspace to > > implement. > > Hi, Jason, > > I have a question here. Based on discussions so far, it's clear that the > new ioasid uAPI will differ from existing VFIO uAPI a lot, e.g. ioasid- > centric operations, no group fd, no incompatible domains, etc. Then > I wonder how we plan to support legacy VFIO applications in this > transition phase. I suspect the VFIO group fd will have to be registered with /dev/ioasid in addition to each device if we are to retain the same model. > Earlier you mentioned the desire to directly replace > /dev/vfio/vfio with /dev/ioasid and having ioasid present both > VFIO and new uAPI. Doesn't it imply that we have to copy the VFIO > container/group semantics into /dev/ioasid although it's a special > programming model only for VFIO? I gave that as something to think about; if it doesn't work out then it is just a bad idea to discard. > Alternatively we could keep all the container/group legacy within VFIO > and have /dev/ioasid support only the new uAPI semantics. In this case > VFIO will include a shim iommu backend to connect its legacy uAPI into > drivers/ioasid backend functions for backward compatibility. Then VFIO > will also support a new model which only uses its device uAPI to bind > to new ioasid fd w/o using any legacy container/group/iommu uAPI. > Does this sound like a plan? It may be where we end up.. Though I fear it will make it overly complex inside VFIO to access the new stuff. 
It would be very nice if we could see a path where VFIO insides could only deal with the in-kernel ioasid handles, whatever they are. Jason
Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs
On Wed, 28 Apr 2021 06:34:11 +0000 "Tian, Kevin" wrote: > > From: Jason Gunthorpe > > Sent: Monday, April 26, 2021 8:38 PM > > > [...] > > > Want to hear your opinion for one open question here. There is no doubt that > > > an ioasid represents a HW page table when the table is constructed by > > > userspace and then linked to the IOMMU through the bind/unbind > > > API. But I'm not very sure about whether an ioasid should represent > > > the exact pgtable or the mapping metadata when the underlying > > > pgtable is indirectly constructed through map/unmap API. VFIO does > > > the latter way, which is why it allows multiple incompatible domains > > > in a single container which all share the same mapping metadata. > > > > I think VFIO's map/unmap is way too complex and we know it has bad > > performance problems. > > Can you or Alex elaborate where the complexity and performance problems > lie in VFIO map/unmap? We'd like to understand more detail and see how > to avoid it in the new interface. The map/unmap interface is really only good for long lived mappings; the overhead is too high for things like vIOMMU use cases or any case where the mapping is intended to be dynamic. Userspace drivers must make use of a long lived buffer mapping in order to achieve performance. The mapping and unmapping granularity has been a problem as well, type1v1 allowed arbitrary unmaps to bisect the original mapping, with the massive caveat that the caller relies on the return value of the unmap to determine what was actually unmapped because the IOMMU use of superpages is transparent to the caller. This led to type1v2 that simply restricts the user to avoid ever bisecting mappings. That still leaves us with problems for things like virtio-mem support where we need to create initial mappings with a granularity that allows us to later remove entries, which can prevent effective use of IOMMU superpages. Locked page accounting has been another constant issue. 
We perform locked page accounting at the container level, where each container accounts independently. A user may require multiple containers, the containers may pin the same physical memory, but be accounted against the user once per container. Those are the main ones I can think of. It is nice to have a simple map/unmap interface, I'd hope that a new /dev/ioasid interface wouldn't raise the barrier to entry too high, but the user needs to have the ability to have more control of their mappings and locked page accounting should probably be offloaded somewhere. Thanks, Alex
Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs
On Wed, Apr 28, 2021 at 10:58:29AM +1000, David Gibson wrote: > On Tue, Apr 27, 2021 at 02:12:12PM -0300, Jason Gunthorpe wrote: > > On Tue, Apr 27, 2021 at 03:08:46PM +1000, David Gibson wrote: > > > > Starting from a BDF the general pseudo code is: > > > > device_name = first_directory_of("/sys/bus/pci/devices/BDF/vfio/") > > > > device_fd = open("/dev/vfio/"+device_name) > > > > ioasidfd = open("/dev/ioasid") > > > > ioctl(device_fd, JOIN_IOASID_FD, ioasidfd) > > > > > > This line is the problem. > > > > > > [Historical aside: Alex's early drafts for the VFIO interface looked > > > quite similar to this. Ben Herrenschmidt and myself persuaded him it > > > was a bad idea, and groups were developed instead. I still think it's > > > a bad idea, and not just for POWER] > > > > Spawning the VFIO device FD from the group FD is incredibly gross from > > a kernel design perspective. Since that was done the struct > > vfio_device missed out on a sysfs presence and doesn't have the > > typical 'struct device' member or dedicated char device you'd expect a > > FD based subsystem to have. > > > > This basically traded normal usage of the driver core for something > > that doesn't serve a technical usage. Given we are now nearly 10 years > > later and see that real widely deployed applications are not doing > > anything special with the group FD it makes me question the wisdom of > > this choice. > > I'm really not sure what "anything special" would constitute here. Well, really anything actually. All I see in, say, dpdk, is open the group fd, get a device fd, do the container dance and never touch the group fd again or care about groups in any way. It seems typical of this class of application. If dpdk is exposing other devices to a risk it certainly hasn't done anything to make that obvious. > > Okay, that is fair, but let's solve that problem directly. 
For > > instance netlink has been going in the direction of adding a "extack" > > from the kernel which is a descriptive error string. If the failing > > ioctl returned the string: > > > > "cannot join this device to the IOASID because device XXX in the > >same group #10 is in use" > > Um.. is there a sane way to return strings from an ioctl()? Yes, it can be done, a string buffer pointer and length in the input for instance. > Getting the device fds from the group fd kind of follows, because it's > unsafe to do basically anything on the device unless you already > control the group (which in this case means attaching it to a > container/ioasid). I'm entirely open to ways of doing that that are > less inelegant from a sysfs integration point of view, but the point > is you must manage the group before you can do anything at all with > individual devices. I think most likely VFIO is going to be the only thing to manage a multi-device group. I see things like VDPA being primarily about PASID, and an IOASID that is matched to a PASID is inherently a single device IOMMU group. > I don't see why. I mean, sure, you don't want explicitly the *vfio* > group as such. But IOMMU group is already a cross-subsystem concept > and you can explicitly expose that in a different way. Yes, and no, the kernel drivers in something like VDPA have decided what device and group they are in before we get to IOASID. It is illogical to try to retro-actively bolt in a group concept to their APIs. > Again, I really think this is necessary complexity. You're right that > far too little of the userspace properly understands group > restrictions.. but these come from real hardware limitations, and I > don't feel like making it *less* obvious in the interface is going to > help that. The appeal of making it less obvious is we can have a single simplified API flow so that an application that doesn't understand or care about groups can have uniformity. 
Jason
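The flow Jason describes — the pseudocode at the top of the mail plus the extack-style error string — might look roughly like this from userspace. Everything here is hypothetical: JOIN_IOASID_FD, struct ioasid_join, and the ioctl number are invented for illustration and exist nowhere in the kernel.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>

/* Invented argument struct: on failure the kernel would copy a
 * descriptive message into the userspace buffer, netlink-extack style. */
struct ioasid_join {
	int	 ioasid_fd;	/* fd from open("/dev/ioasid") */
	char	*err_str;	/* buffer for a descriptive error */
	unsigned err_len;	/* size of that buffer */
};
#define JOIN_IOASID_FD	_IOW('x', 0x42, struct ioasid_join)	/* invented */

static int join_device(const char *device_name)
{
	char path[256], err[128] = "";
	int device_fd, ioasid_fd;

	snprintf(path, sizeof(path), "/dev/vfio/%s", device_name);
	device_fd = open(path, O_RDWR);
	ioasid_fd = open("/dev/ioasid", O_RDWR);
	if (device_fd < 0 || ioasid_fd < 0)
		return -1;

	struct ioasid_join join = {
		.ioasid_fd = ioasid_fd,
		.err_str = err,
		.err_len = sizeof(err),
	};
	if (ioctl(device_fd, JOIN_IOASID_FD, &join) < 0) {
		/* e.g. "cannot join this device to the IOASID because
		 * device XXX in the same group #10 is in use" */
		if (err[0])
			fprintf(stderr, "join failed: %s\n", err);
		return -1;
	}
	return 0;
}
```

The point of the sketch is that an application which doesn't care about groups sees a uniform open/join flow, while the group restriction surfaces only as a descriptive error at the join step.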
Re: DMA mapping fill dma_address to 0
On Wed, Apr 28, 2021 at 11:06:10AM +0100, Robin Murphy wrote: > On 2021-04-28 09:42, Corentin Labbe wrote: > > Hello > > > > I work on the crypto offloader driver of cortina/gemini SL3516 SoC. > > I test it by filling a LUKS2 partition. > > I got a reproducible problem when handling skcipher requests. > > I use dma_map_sg() and when iterating over the result, sg_dma_address(sg) > > return 0. > > But sg_dma_len(sg) is still correct (4096 in my case). > > > > Below is a simplified view of my code: > > nr_sgs = dma_map_sg(ce->dev, areq->src, sg_nents(areq->src), DMA_TO_DEVICE); > > (nr_sgs = 1 in my case) > > sg = areq->src; > > if (!sg_dma_address(sg)) > > FAIL > > What is this check supposed to be for in the first place? 0 is a valid > DMA address, because it's also a valid physical address, and I recall > RAM at PA 0 on Hikey 960 flushing out some bugs in the past when we > tried to use 0 for DMA_MAPPING_ERROR. All the Gemini DTs appear to show > RAM starting at PA 0 too, so I'd have to guess that it's simply the case > that your DMA buffer happened to end up using that particular page. > > Robin. > Yes, 0 is a valid DMA address. I just found it by going further and printing the mem_map value and testing it against the sg_page() return. So my original problem was not related to this. Sorry for the noise. Thanks
Re: DMA mapping fill dma_address to 0
On 2021-04-28 09:42, Corentin Labbe wrote: Hello I work on the crypto offloader driver of cortina/gemini SL3516 SoC. I test it by filling a LUKS2 partition. I got a reproducible problem when handling skcipher requests. I use dma_map_sg() and when iterating over the result, sg_dma_address(sg) return 0. But sg_dma_len(sg) is still correct (4096 in my case). Below is a simplified view of my code: nr_sgs = dma_map_sg(ce->dev, areq->src, sg_nents(areq->src), DMA_TO_DEVICE); (nr_sgs = 1 in my case) sg = areq->src; if (!sg_dma_address(sg)) FAIL What is this check supposed to be for in the first place? 0 is a valid DMA address, because it's also a valid physical address, and I recall RAM at PA 0 on Hikey 960 flushing out some bugs in the past when we tried to use 0 for DMA_MAPPING_ERROR. All the Gemini DTs appear to show RAM starting at PA 0 too, so I'd have to guess that it's simply the case that your DMA buffer happened to end up using that particular page. Robin. I have dug into what dma_map_sg() does and I have added some debug. sg_page(sg) returns c7efb000 for example so sg_page() works. But it seems the problem is that page_to_phys(sg_page(sg)) returns 0. This problem does not appear immediately, luksOpen and subsequent fsck always work. But it appears soon after, when mounting or rsyncing files in it. I have added CONFIG_DEBUG_SG, CONFIG_DMA_API_DEBUG, CONFIG_DMA_API_DEBUG_SG but they didn't bring any more hints. Only "DMA-API: cacheline tracking ENOMEM, dma-debug disabled" appears but always with some "time" between my problem and its display. So I am not sure it is related. Regards
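To make Robin's point concrete: the failure check belongs on dma_map_sg()'s return value — it returns 0 when the mapping fails — never on the address itself, since 0 is a legitimate DMA address. A sketch of the corrected pattern, reusing the variables from the report (program_descriptor() is a stand-in for the driver's real descriptor setup, not a real function):

```c
/* dma_map_sg() returns the number of mapped entries, or 0 on failure,
 * so test the returned count rather than comparing sg_dma_address()
 * against 0, which is a perfectly valid DMA address on systems with
 * RAM at physical address 0 (as on these Gemini boards). */
struct scatterlist *sg;
int nr_sgs, i;

nr_sgs = dma_map_sg(ce->dev, areq->src, sg_nents(areq->src),
		    DMA_TO_DEVICE);
if (!nr_sgs)
	return -EINVAL;		/* mapping failed */

for_each_sg(areq->src, sg, nr_sgs, i) {
	/* sg_dma_address(sg) may legitimately be 0 here */
	program_descriptor(sg_dma_address(sg), sg_dma_len(sg));
}
```

Note the iteration bound is the count dma_map_sg() returned, not the original sg_nents() value, since the DMA layer may coalesce entries.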
Re: [PATCH v5 05/16] swiotlb: Add restricted DMA pool initialization
On 26/04/2021 17:37, Claire Chang wrote: On Fri, Apr 23, 2021 at 7:34 PM Steven Price wrote: [...] But even then if it's not and we have the situation where debugfs==NULL then the debugfs_create_dir() here will cause a subsequent attempt in swiotlb_create_debugfs() to fail (directory already exists) leading to mem->debugfs being assigned an error value. I suspect the creation of the debugfs directory needs to be separated from io_tlb_default_mem being set. debugfs creation should move into the if (!mem) {...} above to avoid duplication. I think having a separate struct dentry pointer for the default debugfs should be enough? if (!debugfs) debugfs = debugfs_create_dir("swiotlb", NULL); swiotlb_create_debugfs(mem, rmem->name, debugfs); Yes that looks like a good solution to me. Although I'd name the variable something a bit more descriptive than just "debugfs" e.g. "debugfs_dir" or "debugfs_root". Thanks, Steve
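Putting Steven's naming suggestion together with Claire's snippet, the fix might look like the sketch below. The helper name and surrounding context are assumed from the patch under discussion, not from mainline:

```c
/* Create the default swiotlb debugfs directory exactly once, keep it
 * in its own pointer, and hang each restricted pool's entries under
 * it, so the io_tlb_default_mem == NULL case no longer triggers a
 * duplicate debugfs_create_dir("swiotlb", ...) later on. */
static struct dentry *debugfs_dir;

static void rmem_swiotlb_debugfs_init(struct io_tlb_mem *mem,
				      const char *name)
{
	if (!debugfs_dir)
		debugfs_dir = debugfs_create_dir("swiotlb", NULL);
	swiotlb_create_debugfs(mem, name, debugfs_dir);
}
```

This keeps directory creation decoupled from whether io_tlb_default_mem has been set, which is exactly the separation Steven asked for.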
DMA mapping fill dma_address to 0
Hello I work on the crypto offloader driver of cortina/gemini SL3516 SoC. I test it by filling a LUKS2 partition. I got a reproducible problem when handling skcipher requests. I use dma_map_sg() and when iterating over the result, sg_dma_address(sg) return 0. But sg_dma_len(sg) is still correct (4096 in my case). Below is a simplified view of my code: nr_sgs = dma_map_sg(ce->dev, areq->src, sg_nents(areq->src), DMA_TO_DEVICE); (nr_sgs = 1 in my case) sg = areq->src; if (!sg_dma_address(sg)) FAIL I have dug into what dma_map_sg() does and I have added some debug. sg_page(sg) returns c7efb000 for example so sg_page() works. But it seems the problem is that page_to_phys(sg_page(sg)) returns 0. This problem does not appear immediately, luksOpen and subsequent fsck always work. But it appears soon after, when mounting or rsyncing files in it. I have added CONFIG_DEBUG_SG, CONFIG_DMA_API_DEBUG, CONFIG_DMA_API_DEBUG_SG but they didn't bring any more hints. Only "DMA-API: cacheline tracking ENOMEM, dma-debug disabled" appears but always with some "time" between my problem and its display. So I am not sure it is related. Regards
Re: [PATCH v2 0/5] iommu: Support identity mappings of reserved-memory regions
28.04.2021 08:57, Mikko Perttunen wrote: > On 4/28/21 8:51 AM, Dmitry Osipenko wrote: >> 23.04.2021 19:32, Thierry Reding wrote: >>> Note that there will be no new releases of the bootloader for earlier >>> devices, so adding support for these new DT bindings will not be >>> practical. The bootloaders on those devices do pass information about >>> the active framebuffer via the kernel command-line, so we may want to >>> add code to create reserved regions in the IOMMU based on that. >> >> Since this change requires a bootloader update anyway, why isn't it >> possible to fix the bootloader properly, making it stop all DMA >> activity before jumping into the kernel? >> > > That is not desirable, as then we couldn't have seamless > bootloader-kernel boot splash transition. The seamless transition is more complicated since it requires reading out the hardware state in order to convert it into DRM state, plus the display panel needs to stay ON. It's a bit questionable whether this is really needed; so far this is not achievable in mainline. Nevertheless, it will be good to have an early simple-framebuffer, which I realized only after sending out the message.
RE: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs
> From: Jason Gunthorpe
> Sent: Wednesday, April 28, 2021 1:12 AM
> [...]
> One option is VFIO can keep its group FD but nothing else will have
> anything like it. However I don't much like the idea that VFIO will
> have a special and unique programming model to do the same things
> other subsystems will do. That will make it harder for userspace to
> implement.

Hi, Jason,

I have a question here. Based on the discussion so far, it's clear that the new ioasid uAPI will differ from the existing VFIO uAPI a lot, e.g. ioasid-centric operations, no group fd, no incompatible domains, etc. So I wonder how we plan to support legacy VFIO applications in this transition phase.

Earlier you mentioned the desire to directly replace /dev/vfio/vfio with /dev/ioasid and have ioasid present both the VFIO and the new uAPI. Doesn't that imply that we have to copy the VFIO container/group semantics into /dev/ioasid, even though it's a special programming model only for VFIO?

Alternatively, we could keep all the container/group legacy within VFIO and have /dev/ioasid support only the new uAPI semantics. In this case VFIO would include a shim iommu backend to connect its legacy uAPI to the drivers/ioasid backend functions for backward compatibility. VFIO would then also support a new model which only uses its device uAPI to bind to the new ioasid fd, without using any of the legacy container/group/iommu uAPI.

Does this sound like a plan?

Thanks
Kevin
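The compatibility layering described above might be pictured roughly like this (purely illustrative; the shim and its split are hypothetical at this point):

```
  legacy application                    new application
        |                                     |
  VFIO container/group/iommu uAPI       VFIO device uAPI + /dev/ioasid uAPI
        |                                     |
  VFIO shim iommu backend  ---------->  drivers/ioasid backend
        |                                     |
        +----------- IOMMU drivers ----------+
```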
RE: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs
> From: Jason Gunthorpe
> Sent: Wednesday, April 28, 2021 1:12 AM
> [...]
> > As Alex says, if this line fails because of the group restrictions,
> > that's not great because it's not very obvious what's gone wrong.
>
> Okay, that is fair, but let's solve that problem directly. For
> instance netlink has been going in the direction of adding an "extack"
> from the kernel which is a descriptive error string. If the failing
> ioctl returned the string:
>
>   "cannot join this device to the IOASID because device XXX in the
>    same group #10 is in use"
>
> Would you agree it is now obvious what has gone wrong? In fact would
> you agree this is a lot better user experience than what applications
> do today even though they have the group FD?

Currently all the discussion is about implicit vs. explicit uAPI semantics for the group restriction. However, if we look beyond groups, implicit semantics may be inevitable when dealing with incompatible iommu domains. An existing example of iommu incompatibility is IOMMU_CACHE; in the future there could be other incompatibilities, such as whether nested translation is supported.

In the end, userspace has to do some due diligence to understand the iommu topology and attributes in order to decide how many VFIO containers or ioasid fds should be created. That does push some burden to userspace, but it's difficult to define a group-like kernel object to enforce such a restriction for iommu compatibility. The best the kernel can do then is to return an informative error message when an incompatible device is attached to an existing domain. If this is the perceived way to move forward anyway, I feel that removing the explicit group FD from the uAPI doesn't make userspace any worse off...

Thanks
Kevin
RE: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs
> From: Jason Gunthorpe
> Sent: Monday, April 26, 2021 8:38 PM
> [...]
> > Want to hear your opinion on one open here. There is no doubt that
> > an ioasid represents a HW page table when the table is constructed by
> > userspace and then linked to the IOMMU through the bind/unbind
> > API. But I'm not very sure about whether an ioasid should represent
> > the exact pgtable or the mapping metadata when the underlying
> > pgtable is indirectly constructed through the map/unmap API. VFIO does
> > the latter, which is why it allows multiple incompatible domains
> > in a single container which all share the same mapping metadata.
>
> I think VFIO's map/unmap is way too complex and we know it has bad
> performance problems.

Can you or Alex elaborate on where the complexity and performance problems lie in VFIO map/unmap? We'd like to understand this in more detail and see how to avoid it in the new interface.

> If /dev/ioasid is single HW page table only then I would focus on that
> implementation and leave it to userspace to span different
> /dev/ioasids if needed.
>
> > OK, now I see where the disconnect comes from. In my context ioasid
> > is the identifier that is actually used on the wire, but it seems you
> > treat it as a sw-defined namespace purely for representing page
> > tables. We should clear up this concept first before further
> > discussing other details.
>
> There is no general HW requirement that every IO page table be
> referred to by the same PASID and this API would have to support

Yes, but what is the value of allowing multiple PASIDs to refer to the same I/O page table (except in the nesting pgtable case)? Doesn't it lead to a poor iotlb efficiency issue, similar to multiple iommu domains referring to the same page table?

> non-PASID IO page tables as well. So I'd keep the two things
> separated in the uAPI - even though the kernel today has a global
> PASID pool.

For non-PASID usages the allocated PASID will be wasted if we don't separate ioasid from pasid.
But it may be worthwhile, given the 1M available PASIDs and the simplification in the uAPI, which then only needs to care about one id space.

> > Then following your proposal, does it mean that we need another
> > interface for allocating PASIDs? And since ioasid means different
> > things in the uAPI and the in-kernel API, possibly a new name is
> > required to avoid confusion?
>
> I would suggest having two ways to control the PASID:
>
> 1) Over /dev/ioasid allocate a PASID for an IOASID. All future PASID
>    based usages of the IOASID will use that global PASID
>
> 2) Over the device FD, when the IOASID is bound return the PASID that
>    was selected. If the IOASID does not have a global PASID then the
>    kernel is free to make something up. In this mode a single IOASID
>    can have multiple PASIDs.
>
> Simple things like DPDK can use #2 and potentially have better PASID
> limits. hypervisors will most likely have to use #1, but it depends on
> how their vIOMMU interface works.

Can you elaborate on why DPDK wants to use #2, i.e. not a global PASID?

> I think the name IOASID is fine for the uAPI, the kernel version can
> be called ioasid_id or something.

ioasid is already an id, so ioasid_id just adds confusion. Another point is that ioasid is currently used to represent both the PCI PASID and the ARM substream ID in the kernel. It implies that if we want to separate ioasid and pasid in the uAPI, 'pasid' also needs to be replaced with another general term usable for substream IDs. Are we making the terms too confusing here?

> (also looking at ioasid.c, why do we need such a thin and odd wrapper
> around xarray?)

I'll leave that to Jean and Jacob.

Thanks
Kevin
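The two PASID-control modes quoted above could be sketched in pseudocode like this (all ioctl names are hypothetical — no such uAPI exists yet):

```
/* Mode 1: global PASID allocated over /dev/ioasid */
ioasid_fd = open("/dev/ioasid")
ioasid    = ioctl(ioasid_fd, IOASID_ALLOC)
pasid     = ioctl(ioasid_fd, IOASID_ALLOC_GLOBAL_PASID, ioasid)
/* every later bind of this IOASID uses the same PASID */

/* Mode 2: per-bind PASID returned by the device FD */
pasid_a = ioctl(device_fd_a, BIND_IOASID, ioasid)
pasid_b = ioctl(device_fd_b, BIND_IOASID, ioasid)
/* pasid_a and pasid_b may differ: one IOASID, multiple PASIDs */
```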
Re: [PATCH v2 0/5] iommu: Support identity mappings of reserved-memory regions
On 4/28/21 8:51 AM, Dmitry Osipenko wrote:
> 23.04.2021 19:32, Thierry Reding wrote:
>> Note that there will be no new releases of the bootloader for earlier
>> devices, so adding support for these new DT bindings will not be
>> practical. The bootloaders on those devices do pass information about
>> the active framebuffer via the kernel command-line, so we may want to
>> add code to create reserved regions in the IOMMU based on that.
>
> Since this change requires a bootloader update anyway, why isn't it
> possible to fix the bootloader properly, making it stop all DMA
> activity before jumping into the kernel?

That is not desirable, as then we couldn't have a seamless bootloader-to-kernel boot splash transition.

Mikko