RE: [PATCH v2 0/5] iommu: Support identity mappings of reserved-memory regions

2021-04-28 Thread Krishna Reddy
Hi Dmitry,

> Thank you for the answer. Could you please give more information about:
> 1) Are you on software or hardware team, or both?

I am on the software team and contributed to the initial Tegra SMMU driver 
downstream, along with former team member Hiroshi Doyu.

> 2) Is SMMU a third party IP or developed in-house?

Tegra SMMU is developed in-house. 

> 3) Do you have a direct access to HDL sources? Are you 100% sure that
> hardware does what you say?

It was discussed with the hardware team before, and again today as well.
Enabling the ASID for the display engine while it continues to access the buffer 
memory is a safe operation.
As per the HW team, the only side effect that can happen is additional latency on 
transactions as the SMMU caches warm up.

> 4) What happens when CPU writes to ASID register? Does SMMU state machine
> latch ASID status (making it visible) only at a single "safe" point?

The MC decides whether to route a transaction through the SMMU page tables or to 
bypass them based on the ASID register value. It checks the ASID register only 
once per transaction in the pipeline.

-KR


Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

2021-04-28 Thread David Gibson
On Wed, Apr 28, 2021 at 11:56:22AM -0300, Jason Gunthorpe wrote:
> On Wed, Apr 28, 2021 at 10:58:29AM +1000, David Gibson wrote:
> > On Tue, Apr 27, 2021 at 02:12:12PM -0300, Jason Gunthorpe wrote:
> > > On Tue, Apr 27, 2021 at 03:08:46PM +1000, David Gibson wrote:
> > > > > Starting from a BDF the general pseudo code is:
> > > > >  device_name = first_directory_of("/sys/bus/pci/devices/BDF/vfio/")
> > > > >  device_fd = open("/dev/vfio/"+device_name)
> > > > >  ioasidfd = open("/dev/ioasid")
> > > > >  ioctl(device_fd, JOIN_IOASID_FD, ioasidfd)
> > > > 
> > > > This line is the problem.
> > > > 
> > > > [Historical aside: Alex's early drafts for the VFIO interface looked
> > > > quite similar to this.  Ben Herrenschmidt and myself persuaded him it
> > > > was a bad idea, and groups were developed instead.  I still think it's
> > > > a bad idea, and not just for POWER]
> > > 
> > > Spawning the VFIO device FD from the group FD is incredibly gross from
> > > a kernel design perspective. Since that was done the struct
> > > vfio_device missed out on a sysfs presence and doesn't have the
> > > typical 'struct device' member or dedicated char device you'd expect a
> > > FD based subsystem to have.
> > > 
> > > This basically traded normal usage of the driver core for something
> > > that doesn't serve a technical usage. Given we are now nearly 10 years
> > > later and see that real widely deployed applications are not doing
> > > anything special with the group FD it makes me question the wisdom of
> > > this choice.
> > 
> > I'm really not sure what "anything special" would constitute here.
> 
> Well, really anything actually. All I see in, say, dpdk, is open the
> group fd, get a device fd, do the container dance and never touch the
> group fd again or care about groups in any way. It seems typical of
> this class of application.

Well, sure, the only operation you do on the group itself is attach it
to the container (and then every container operation can be thought of
as applying to all its attached groups).  But that attach operation
really is fundamentally about the group.  It always, unavoidably,
fundamentally affects every device in the group - including devices
you may not typically think about, like bridges and switches.

That is *not* true of the other device operations, like poking IO.

> If dpdk is exposing other devices to a risk it certainly hasn't done
> anything to make that obvious.

And in practice I suspect it will just break if you give it a >1
device group.

> > > Okay, that is fair, but let's solve that problem directly. For
> > > instance netlink has been going in the direction of adding an "extack"
> > > from the kernel which is a descriptive error string. If the failing
> > > ioctl returned the string:
> > > 
> > >   "cannot join this device to the IOASID because device XXX in the
> > >same group #10 is in use"
> > 
> > Um.. is there a sane way to return strings from an ioctl()?
> 
> Yes, it can be done, a string buffer pointer and length in the input
> for instance.

I suppose.  Rare enough that I expect everyone will ignore it, alas :/.
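
For what it's worth, a rough sketch of what such an ioctl could look like - a
variant of the JOIN_IOASID_FD pseudo-op from earlier in the thread, extended
with a user error buffer. Every name and the struct layout here are
hypothetical, purely for illustration:

#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>

#define JOIN_IOASID_FD 0        /* placeholder for a hypothetical ioctl number */

struct ioasid_join_device {
        uint32_t argsz;         /* sizeof(struct ioasid_join_device) */
        uint32_t flags;
        int32_t  ioasid_fd;     /* /dev/ioasid fd to join */
        uint64_t err_str;       /* user pointer the kernel fills with a NUL-terminated message */
        uint32_t err_str_len;   /* capacity of that buffer */
};

static int join_ioasid(int device_fd, int ioasid_fd)
{
        char errbuf[256] = "";
        struct ioasid_join_device join = {
                .argsz       = sizeof(join),
                .ioasid_fd   = ioasid_fd,
                .err_str     = (uintptr_t)errbuf,
                .err_str_len = sizeof(errbuf),
        };

        if (ioctl(device_fd, JOIN_IOASID_FD, &join) < 0) {
                /* e.g. "cannot join this device to the IOASID because
                 * device XXX in the same group #10 is in use" */
                fprintf(stderr, "join failed: %s: %s\n", strerror(errno), errbuf);
                return -1;
        }
        return 0;
}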

> > Getting the device fds from the group fd kind of follows, because it's
> > unsafe to do basically anything on the device unless you already
> > control the group (which in this case means attaching it to a
> > container/ioasid).  I'm entirely open to ways of doing that that are
> > less inelegant from a sysfs integration point of view, but the point
> > is you must manage the group before you can do anything at all with
> > individual devices.
> 
> I think most likely VFIO is going to be the only thing to manage a
> multi-device group.

You don't get to choose that.  You could explicitly limit other things
to only one-device groups, but that would be an explicit limitation.
Essentially any device can end up in a multi-device group, if you put
it behind a PCIe to PCI bridge, or a PCIe switch which doesn't support
access controls.

The groups are still there, whether or not other things want to deal
with them.

> I see things like VDPA being primarily about PASID, and an IOASID that
> is matched to a PASID is inherently a single device IOMMU group.

I don't know enough about PASID to make sense of that.

> > I don't see why.  I mean, sure, you don't want explicitly the *vfio*
> > group as such.  But IOMMU group is already a cross-subsystem concept
> > and you can explicitly expose that in a different way.
> 
> Yes, and no, the kernel drivers in something like VDPA have decided
> what device and group they are in before we get to IOASID. It is
> illogical to try to retro-actively bolt in a group concept to their
> APIs.

Again, I don't know enough about VDPA to make sense of that.  Are we
essentially talking non-PCI virtual devices here?  In which case you
could define the VDPA "bus" to always have one-device groups.

> > Again, I really think this is necessary complexity.  You're right that
> > far too little of the userspace properly 

Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

2021-04-28 Thread David Gibson
On Wed, Apr 28, 2021 at 09:21:49PM -0300, Jason Gunthorpe wrote:
> On Wed, Apr 28, 2021 at 11:23:39AM +1000, David Gibson wrote:
> 
> > Yes.  My proposed model for a unified interface would be that when you
> > create a new container/IOASID, *no* IOVAs are valid.
> 
> Hurm, it is quite tricky. All IOMMUs seem to have a dead zone around
> the MSI window, so negotiating this all in a general way is not going
> to be a very simple API.
> 
> To be general it would be nicer to say something like 'I need XXGB of
> IOVA space' 'I need 32 bit IOVA space' etc and have the kernel return
> ranges that sum up to at least that big. Then the kernel can do all
> its optimizations.

Ah, yes, sorry.  We do need an API that lets the kernel make more of
the decisions too.  For userspace drivers it would generally be
sufficient to just ask for XXX size of IOVA space wherever you can get
it.  Handling guests requires more precision.  So, maybe a request
interface with a bunch of hint variables and a matching set of
MAP_FIXED-like flags to assert which ones aren't negotiable.
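
Purely as a sketch of the shape such a request could take (nothing here is an
existing or proposed uAPI; the names and fields are made up):

#include <stdint.h>

#define IOVA_WINDOW_FIXED_BASE  (1u << 0)   /* base address is not negotiable */
#define IOVA_WINDOW_FIXED_SIZE  (1u << 1)   /* size is not negotiable */
#define IOVA_WINDOW_FIXED_PGSZ  (1u << 2)   /* page size is not negotiable */

struct iova_window_req {
        uint32_t argsz;
        uint32_t flags;       /* IOVA_WINDOW_FIXED_*, MAP_FIXED-like assertions */
        uint64_t base;        /* hint, or requirement with FIXED_BASE */
        uint64_t size;        /* e.g. "I need XXGB of IOVA space" */
        uint64_t page_size;   /* smallest IO page size the caller needs */
        uint64_t out_base;    /* filled by the kernel: window actually granted */
        uint64_t out_size;
};

A plain userspace driver would leave flags at 0 and take whatever range comes
back; qemu emulating a vIOMMU would set the FIXED_* bits so the request fails
early if the host IOMMU cannot provide the window the guest platform expects.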

> I guess you are going to say that the qemu PPC vIOMMU driver needs
> more exact control..

*Every* vIOMMU driver needs more exact control.  The guest drivers
will expect to program the guest devices with IOVAs matching the guest
platform's IOMMU model.  Therefore the backing host IOMMU has to be
programmed to respond to those IOVAs.  If it can't be, there's no way
around it, and you want to fail out early.  With this model that will
happen when qemu (say) requests the host IOMMU window(s) to match the
guest's expected IOVA ranges.

Actually, come to that even guests without a vIOMMU need more exact
control: they'll expect IOVA to match GPA, so if your host IOMMU can't
be set up to translate the full range of GPAs, again, you're out of luck.

The only reason x86 has been able to ignore this is that the
assumption has been that all IOMMUs can translate IOVAs from 0...  Once you really start to
look at what the limits are, you need the exact window control I'm
describing.

> > I expect we'd need some kind of query operation to expose limitations
> > on the number of windows, addresses for them, available pagesizes etc.
> 
> Is page size an assumption that hugetlbfs will always be used for backing
> memory or something?

So for TCEs (and maybe other IOMMUs out there), the IO page tables are
independent of the CPU page tables.  They don't have the same format,
and they don't necessarily have the same page size.  In the case of a
bare metal kernel working in physical addresses they can use that TCE
page size however they like.  For userspace you get another layer of
complexity.  Essentially to implement things correctly the backing
IOMMU needs to have a page size granularity that's the minimum of
whatever granularity the userspace or guest driver expects and the
host page size backing the memory.
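
In other words (a tiny illustrative helper, not taken from any real code):

#include <stdint.h>

/* The granularity the backing IOMMU must be able to map at: bounded both by
 * what the userspace/guest driver expects and by what actually backs the
 * memory on the host. E.g. 64K guest TCE pages backed by 4K host pages still
 * need 4K IO mappings; 16M guest pages on 2M hugepages need 2M. */
static uint64_t required_io_pgsize(uint64_t guest_pgsize, uint64_t host_backing_pgsize)
{
        return guest_pgsize < host_backing_pgsize ? guest_pgsize : host_backing_pgsize;
}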

> > > As an ideal, only things like the HW specific qemu vIOMMU driver
> > > should be reaching for all the special stuff.
> > 
> > I'm hoping we can even avoid that, usually.  With the explicitly
> > created windows model I propose above, it should be able to: qemu will
> > create the windows according to the IOVA windows the guest platform
> > expects to see and they either will or won't work on the host platform
> > IOMMU.  If they do, generic maps/unmaps should be sufficient.  If they
> > don't well, the host IOMMU simply cannot emulate the vIOMMU so you're
> > out of luck anyway.
> 
> It is not just P9 that has special stuff, and this whole area of PASID
> seems to be quite different on every platform
> 
> If things fit very naturally and generally then maybe, but I've been
> down this road before of trying to make a general description of a
> group of very special HW. It ended in tears after 10 years when nobody
> could understand the "general" API after it was Frankenstein'd up with
> special cases for everything. Cautionary tale
> 
> There is a certain appeal to having some
> 'PPC_TCE_CREATE_SPECIAL_IOASID' entry point that has a whack of extra
> information like windows that can be optionally called by the viommu
> driver and it remains well defined and described.

Windows really aren't ppc specific.  They're absolutely there on x86
and everything else as well - it's just that people are used to having
a window at 0.. that you can often get away with
treating it sloppily.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

2021-04-28 Thread Jason Gunthorpe
On Wed, Apr 28, 2021 at 11:23:39AM +1000, David Gibson wrote:

> Yes.  My proposed model for a unified interface would be that when you
> create a new container/IOASID, *no* IOVAs are valid.

Hurm, it is quite tricky. All IOMMUs seem to have a dead zone around
the MSI window, so negotiating this all in a general way is not going
to be a very simple API.

To be general it would be nicer to say something like 'I need XXGB of
IOVA space' 'I need 32 bit IOVA space' etc and have the kernel return
ranges that sum up to at least that big. Then the kernel can do all its
optimizations.

I guess you are going to say that the qemu PPC vIOMMU driver needs
more exact control..

> I expect we'd need some kind of query operation to expose limitations
> on the number of windows, addresses for them, available pagesizes etc.

Is page size an assumption that hugetlbfs will always be used for backing
memory or something?

> > As an ideal, only things like the HW specific qemu vIOMMU driver
> > should be reaching for all the special stuff.
> 
> I'm hoping we can even avoid that, usually.  With the explicitly
> created windows model I propose above, it should be able to: qemu will
> create the windows according to the IOVA windows the guest platform
> expects to see and they either will or won't work on the host platform
> IOMMU.  If they do, generic maps/unmaps should be sufficient.  If they
> don't well, the host IOMMU simply cannot emulate the vIOMMU so you're
> out of luck anyway.

It is not just P9 that has special stuff, and this whole area of PASID
seems to be quite different on every platform

If things fit very naturally and generally then maybe, but I've been
down this road before of trying to make a general description of a
group of very special HW. It ended in tears after 10 years when nobody
could understand the "general" API after it was Frankenstein'd up with
special cases for everything. Cautionary tale

There is a certain appeal to having some
'PPC_TCE_CREATE_SPECIAL_IOASID' entry point that has a whack of extra
information like windows that can be optionally called by the viommu
driver and it remains well defined and described.

Jason


Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

2021-04-28 Thread Jason Gunthorpe
On Wed, Apr 28, 2021 at 06:34:11AM +, Tian, Kevin wrote:

> > If /dev/ioasid is single HW page table only then I would focus on that
> > implementation and leave it to userspace to span different
> > /dev/ioasids if needed.
> > 
> > > OK, now I see where the disconnect comes from. In my context ioasid
> > > is the identifier that is actually used on the wire, but it seems you
> > > treat it as a sw-defined namespace purely for representing page tables.
> > > We should clear up this concept first before discussing other details.
> > 
> > There is no general HW requirement that every IO page table be
> > referred to by the same PASID and this API would have to support
> 
> Yes, but what is the value of allowing multiple PASIDs referring to
> the same I/O page table (except the nesting pgtable case)? Doesn't it
> lead to poor iotlb efficiency, similar to multiple iommu domains
> referring to the same page table?

I think iotlb efficiency is up to the platform.

The general use case is to make an IOASID for something like the GPA
and use it concurrently with, say, three devices:
 - VFIO (not PASID)
 - VDPA (PASID capable HW)
 - 'Future VDPA storage' (PASID capable HW)

The uAPI for this should be very general and the kernel should decide
the optimal way to configure the HW. Maybe it is one page table and
one PASID, or maybe it is something else.

Allowing the kernel to choose the PASID once it knows the RID is the
highest generality.
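
A very rough sketch of the two models referred to later in this thread as #1
and #2 (all names and structs here are hypothetical, not a proposed uAPI):

#include <stdint.h>

/* #1: pre-allocate one global PASID for an IOASID over /dev/ioasid; every
 * later PASID-based use of that IOASID uses this value. */
struct ioasid_alloc_pasid {
        uint32_t argsz;
        uint32_t ioasid;
        uint32_t out_pasid;     /* chosen by the kernel */
};

/* #2: let the kernel pick a PASID at device-bind time, once it knows the RID;
 * different devices bound to the same IOASID may get different PASIDs. */
struct device_bind_ioasid {
        uint32_t argsz;
        uint32_t ioasid;
        uint32_t out_pasid;     /* returned by the bind on the device FD */
};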

> > non-PASID IO page tables as well. So I'd keep the two things
> > separated in the uAPI - even though the kernel today has a global
> > PASID pool.
> 
> for non-PASID usages the allocated PASID will be wasted if we don't
> separate ioasid from pasid. But it may be worthwhile given 1m available 
> pasids and the simplification in the uAPI which only needs to care about 
> one id space then.

I'd prefer this be a platform choice and not forced in the uAPI,
because we can never go back on it if we see that yes we need to
optimize here. I understand many platforms have different available
PASID spaces already.

> > Simple things like DPDK can use #2 and potentially have better PASID
> > limits. hypervisors will most likely have to use #1, but it depends on
> > how their vIOMMU interface works.
> 
> Can you elaborate why DPDK wants to use #2 i.e. not using a global
> PASID?

It gives the kernel an option to make the decision about the PASID
when it has the full information, including the RID.

> > I think the name IOASID is fine for the uAPI, the kernel version can
> > be called ioasid_id or something.
> 
> ioasid is already an id and then ioasid_id just adds confusion. Another
> point is that ioasid is currently used to represent both PCI PASID and
> ARM substream ID in the kernel. It implies that if we want to separate
> ioasid and pasid in the uAPI the 'pasid' also needs to be replaced with
> another general term usable for substream ID. Are we making the
> terms too confusing here?

This is why I also am not so sure about exposing the PASID in the API
because it is ultimately a HW specific item.

As I said to David, one avenue is to have some generic uAPI that is
very general and keep all this deeply detailed stuff, that really only
matters for qemu, as part of a more HW specific vIOMMU driver
interface.

Jason

Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

2021-04-28 Thread Jason Gunthorpe
On Wed, Apr 28, 2021 at 07:47:56AM +, Tian, Kevin wrote:
> > From: Jason Gunthorpe 
> > Sent: Wednesday, April 28, 2021 1:12 AM
> > 
> [...]
> > One option is VFIO can keep its group FD but nothing else will have
> > anything like it. However I don't much like the idea that VFIO will
> > have a special and unique programming model to do the same things
> > other subsystems will do. That will make it harder for userspace to
> > implement.
> 
> Hi, Jason,
> 
> I have a question here. Based on discussions so far, it's clear that the
> new ioasid uAPI will differ from existing VFIO uAPI a lot, e.g. ioasid-
> centric operations, no group fd, no incompatible domains, etc. Then 
> I wonder how we plan to support legacy VFIO applications in this 
> transition phase. 

I suspect the VFIO group fd will have to be registered with
/dev/ioasid in addition to each device if we are to retain the same
model.

> Earlier you mentioned the desire of directly replacing
> /dev/vfio/vfio with /dev/ioasid and having ioasid to present both
> VFIO and new uAPI. Doesn't it imply that we have to copy the VFIO
> container/group semantics into /dev/ioasid although it's a special
> programming model only for VFIO?

I gave that as something to think about; if it doesn't work out then
it is just a bad idea to discard.

> Alternatively we could keep all the container/group legacy within VFIO
> and have /dev/ioasid support only the new uAPI semantics. In this case
> VFIO will include a shim iommu backend to connect its legacy uAPI into 
> drivers/ioasid backend functions for backward compatibility. Then VFIO
> will also support a new model which only uses its device uAPI to bind
> to new ioasid fd w/o using any legacy container/group/iommu uAPI.
> Does this sound like a plan?

It may be where we end up.. Though I fear it will make it overly
complex inside VFIO to access the new stuff. It would be very nice if
we could see a path where VFIO's internals deal only with the
in-kernel ioasid handles, whatever they are.

Jason


Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

2021-04-28 Thread Alex Williamson
On Wed, 28 Apr 2021 06:34:11 +
"Tian, Kevin"  wrote:

> > From: Jason Gunthorpe 
> > Sent: Monday, April 26, 2021 8:38 PM
> >   
> [...]
> > > Want to hear your opinion for one open here. There is no doubt that
> > > an ioasid represents a HW page table when the table is constructed by
> > > userspace and then linked to the IOMMU through the bind/unbind
> > > API. But I'm not very sure about whether an ioasid should represent
> > > the exact pgtable or the mapping metadata when the underlying
> > > pgtable is indirectly constructed through map/unmap API. VFIO does
> > > the latter way, which is why it allows multiple incompatible domains
> > > in a single container which all share the same mapping metadata.  
> > 
> > I think VFIO's map/unmap is way too complex and we know it has bad
> > performance problems.  
> 
> Can you or Alex elaborate on where the complexity and performance problems
> lie in VFIO map/unmap? We'd like to understand this in more detail and see how
> to avoid it in the new interface.


The map/unmap interface is really only good for long lived mappings,
the overhead is too high for things like vIOMMU use cases or any case
where the mapping is intended to be dynamic.  Userspace drivers must
make use of a long lived buffer mapping in order to achieve performance.

The mapping and unmapping granularity has been a problem as well,
type1v1 allowed arbitrary unmaps to bisect the original mapping, with
the massive caveat that the caller relies on the return value of the
unmap to determine what was actually unmapped because the IOMMU use of
superpages is transparent to the caller.  This led to type1v2 that
simply restricts the user to avoid ever bisecting mappings.  That still
leaves us with problems for things like virtio-mem support where we
need to create initial mappings with a granularity that allows us to
later remove entries, which can prevent effective use of IOMMU
superpages.
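
To make that concrete, a sketch of the workaround using the existing type1
uAPI: the region is mapped as many block-sized mappings up front so that
individual blocks can be unmapped later, at the cost of IOMMU superpage use
(block size and error handling simplified):

#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

static int map_in_blocks(int container_fd, uint64_t iova, uint64_t vaddr,
                         uint64_t size, uint64_t block)
{
        struct vfio_iommu_type1_dma_map map;
        uint64_t off;

        for (off = 0; off < size; off += block) {
                memset(&map, 0, sizeof(map));
                map.argsz = sizeof(map);
                map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
                map.vaddr = vaddr + off;
                map.iova  = iova + off;
                map.size  = block;  /* each block can later be unmapped on its own */
                if (ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map) < 0)
                        return -1;
        }
        return 0;
}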

Locked page accounting has been another constant issue.  We perform
locked page accounting at the container level, where each container
accounts independently.  A user may require multiple containers, the
containers may pin the same physical memory, but be accounted against
the user once per container.
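
(For illustration, with made-up numbers: a 4 GiB guest whose devices land in
two separate containers has that same 4 GiB pinned through each container, so
8 GiB is charged against the user's RLIMIT_MEMLOCK even though only 4 GiB of
physical memory is actually locked.)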

Those are the main ones I can think of.  It is nice to have a simple
map/unmap interface, I'd hope that a new /dev/ioasid interface wouldn't
raise the barrier to entry too high, but the user needs to have the
ability to have more control of their mappings and locked page
accounting should probably be offloaded somewhere.  Thanks,

Alex



Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

2021-04-28 Thread Jason Gunthorpe
On Wed, Apr 28, 2021 at 10:58:29AM +1000, David Gibson wrote:
> On Tue, Apr 27, 2021 at 02:12:12PM -0300, Jason Gunthorpe wrote:
> > On Tue, Apr 27, 2021 at 03:08:46PM +1000, David Gibson wrote:
> > > > Starting from a BDF the general pseudo code is:
> > > >  device_name = first_directory_of("/sys/bus/pci/devices/BDF/vfio/")
> > > >  device_fd = open("/dev/vfio/"+device_name)
> > > >  ioasidfd = open("/dev/ioasid")
> > > >  ioctl(device_fd, JOIN_IOASID_FD, ioasidfd)
> > > 
> > > This line is the problem.
> > > 
> > > [Historical aside: Alex's early drafts for the VFIO interface looked
> > > quite similar to this.  Ben Herrenschmidt and myself persuaded him it
> > > was a bad idea, and groups were developed instead.  I still think it's
> > > a bad idea, and not just for POWER]
> > 
> > Spawning the VFIO device FD from the group FD is incredibly gross from
> > a kernel design perspective. Since that was done the struct
> > vfio_device missed out on a sysfs presence and doesn't have the
> > typical 'struct device' member or dedicated char device you'd expect a
> > FD based subsystem to have.
> > 
> > This basically traded normal usage of the driver core for something
> > that doesn't serve a technical usage. Given we are now nearly 10 years
> > later and see that real widely deployed applications are not doing
> > anything special with the group FD it makes me question the wisdom of
> > this choice.
> 
> I'm really not sure what "anything special" would constitute here.

Well, really anything actually. All I see in, say, dpdk, is open the
group fd, get a device fd, do the container dance and never touch the
group fd again or care about groups in any way. It seems typical of
this class of application.

If dpdk is exposing other devices to a risk it certainly hasn't done
anything to make that obvious.

> > Okay, that is fair, but let's solve that problem directly. For
> > instance netlink has been going in the direction of adding an "extack"
> > from the kernel which is a descriptive error string. If the failing
> > ioctl returned the string:
> > 
> >   "cannot join this device to the IOASID because device XXX in the
> >same group #10 is in use"
> 
> Um.. is there a sane way to return strings from an ioctl()?

Yes, it can be done, a string buffer pointer and length in the input
for instance.

> Getting the device fds from the group fd kind of follows, because it's
> unsafe to do basically anything on the device unless you already
> control the group (which in this case means attaching it to a
> container/ioasid).  I'm entirely open to ways of doing that that are
> less inelegant from a sysfs integration point of view, but the point
> is you must manage the group before you can do anything at all with
> individual devices.

I think most likely VFIO is going to be the only thing to manage a
multi-device group.

I see things like VDPA being primarily about PASID, and an IOASID that
is matched to a PASID is inherently a single device IOMMU group.

> I don't see why.  I mean, sure, you don't want explicitly the *vfio*
> group as such.  But IOMMU group is already a cross-subsystem concept
> and you can explicitly expose that in a different way.

Yes, and no, the kernel drivers in something like VDPA have decided
what device and group they are in before we get to IOASID. It is
illogical to try to retro-actively bolt in a group concept to their
APIs.
 
> Again, I really think this is necessary complexity.  You're right that
> far too little of the userspace properly understands group
> restrictions.. but these come from real hardware limitations, and I
> don't feel like making it *less* obvious in the interface is going to
> help that.

The appeal of making it less obvious is we can have a single
simplified API flow so that an application that doesn't understand or
care about groups can have uniformity.

Jason


Re: DMA mapping fill dma_address to 0

2021-04-28 Thread Corentin Labbe
On Wed, Apr 28, 2021 at 11:06:10AM +0100, Robin Murphy wrote:
> On 2021-04-28 09:42, Corentin Labbe wrote:
> > Hello
> > 
> > I work on the crypto offloader driver for the cortina/gemini SL3516 SoC.
> > I test it by filling a LUKS2 partition.
> > I got a reproducible problem when handling skcipher requests.
> > I use dma_map_sg() and when iterating over the result, sg_dma_address(sg)
> > returns 0.
> > But sg_dma_len(sg) is still correct (4096 in my case).
> > 
> > Below is a simplified view of my code:
> > nr_sgs = dma_map_sg(ce->dev, areq->src, sg_nents(areq->src), DMA_TO_DEVICE);
> > (nr_sgs = 1 in my case)
> > sg = areq->src;
> > if (!sg_dma_address(sg))
> > FAIL
> 
> What is this check supposed to be for in the first place? 0 is a valid 
> DMA address, because it's also a valid physical address, and I recall 
> RAM at PA 0 on Hikey 960 flushing out some bugs in the past when we 
> tried to use 0 for DMA_MAPPING_ERROR. All the Gemini DTs appear to show 
> RAM starting at PA 0 too, so I'd have to guess that it's simply the case 
> that your DMA buffer happened to end up using that particular page.
> 
> Robin.
> 

Yes, 0 is a valid DMA address.
I just found this by digging further, printing the mem_map value and testing it
against the sg_page() return.

So my original problem was not related to this.
Sorry for the noise.
Thanks


Re: DMA mapping fill dma_address to 0

2021-04-28 Thread Robin Murphy

On 2021-04-28 09:42, Corentin Labbe wrote:

> Hello
> 
> I work on the crypto offloader driver for the cortina/gemini SL3516 SoC.
> I test it by filling a LUKS2 partition.
> I got a reproducible problem when handling skcipher requests.
> I use dma_map_sg() and when iterating over the result, sg_dma_address(sg)
> returns 0.
> But sg_dma_len(sg) is still correct (4096 in my case).
> 
> Below is a simplified view of my code:
> nr_sgs = dma_map_sg(ce->dev, areq->src, sg_nents(areq->src), DMA_TO_DEVICE);
> (nr_sgs = 1 in my case)
> sg = areq->src;
> if (!sg_dma_address(sg))
> FAIL


What is this check supposed to be for in the first place? 0 is a valid 
DMA address, because it's also a valid physical address, and I recall 
RAM at PA 0 on Hikey 960 flushing out some bugs in the past when we 
tried to use 0 for DMA_MAPPING_ERROR. All the Gemini DTs appear to show 
RAM starting at PA 0 too, so I'd have to guess that it's simply the case 
that your DMA buffer happened to end up using that particular page.
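
For illustration, a minimal sketch of the check the quoted code could use
instead - dma_map_sg() reports failure through its return value (0 mapped
entries), not through the addresses it produces (simplified from the driver
code quoted above):

#include <linux/dma-mapping.h>
#include <linux/scatterlist.h>

static int prepare_src(struct device *dev, struct scatterlist *src, int nents)
{
        int nr_sgs = dma_map_sg(dev, src, nents, DMA_TO_DEVICE);

        if (nr_sgs == 0)        /* mapping failed */
                return -EINVAL;

        /* ... program the DMA engine using sg_dma_address()/sg_dma_len(),
         * without treating address 0 as an error ... */

        return nr_sgs;
}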


Robin.


> I have dug into what dma_map_sg() does and added some debug.
> sg_page(sg) returns c7efb000 for example, so sg_page() works.
> But it seems the problem is that page_to_phys(sg_page(sg)) returns 0.
> 
> This problem does not appear immediately; luksOpen and subsequent fsck always
> work.
> But it appears soon after, when mounting or rsyncing files in it.
> 
> I have added CONFIG_DEBUG_SG, CONFIG_DMA_API_DEBUG and CONFIG_DMA_API_DEBUG_SG,
> but they didn't bring any more hints.
> Only "DMA-API: cacheline tracking ENOMEM, dma-debug disabled" appears, but
> always with some "time" between my problem and its display.
> So I am not sure it is related.
> 
> Regards




Re: [PATCH v5 05/16] swiotlb: Add restricted DMA pool initialization

2021-04-28 Thread Steven Price

On 26/04/2021 17:37, Claire Chang wrote:

> On Fri, Apr 23, 2021 at 7:34 PM Steven Price wrote:
> 
> [...]
> 
> > But even then if it's not and we have the situation where debugfs==NULL
> > then the debugfs_create_dir() here will cause a subsequent attempt in
> > swiotlb_create_debugfs() to fail (directory already exists) leading to
> > mem->debugfs being assigned an error value. I suspect the creation of
> > the debugfs directory needs to be separated from io_tlb_default_mem
> > being set.
> 
> debugfs creation should move into the if (!mem) {...} above to avoid
> duplication.
> I think having a separate struct dentry pointer for the default
> debugfs should be enough?
> 
> if (!debugfs)
>      debugfs = debugfs_create_dir("swiotlb", NULL);
> swiotlb_create_debugfs(mem, rmem->name, debugfs);


Yes that looks like a good solution to me. Although I'd name the 
variable something a bit more descriptive than just "debugfs" e.g. 
"debugfs_dir" or "debugfs_root".


Thanks,

Steve


DMA mapping fill dma_address to 0

2021-04-28 Thread Corentin Labbe
Hello

I work on the crypto offloader driver for the cortina/gemini SL3516 SoC.
I test it by filling a LUKS2 partition.
I got a reproducible problem when handling skcipher requests.
I use dma_map_sg() and when iterating over the result, sg_dma_address(sg)
returns 0.
But sg_dma_len(sg) is still correct (4096 in my case).

Below is a simplified view of my code:
nr_sgs = dma_map_sg(ce->dev, areq->src, sg_nents(areq->src), DMA_TO_DEVICE);
(nr_sgs = 1 in my case)
sg = areq->src;
if (!sg_dma_address(sg))
FAIL

I have dug into what dma_map_sg() does and added some debug.
sg_page(sg) returns c7efb000 for example, so sg_page() works.
But it seems the problem is that page_to_phys(sg_page(sg)) returns 0.

This problem does not appear immediately; luksOpen and subsequent fsck always
work.
But it appears soon after, when mounting or rsyncing files in it.

I have added CONFIG_DEBUG_SG, CONFIG_DMA_API_DEBUG and CONFIG_DMA_API_DEBUG_SG,
but they didn't bring any more hints.
Only "DMA-API: cacheline tracking ENOMEM, dma-debug disabled" appears, but
always with some "time" between my problem and its display.
So I am not sure it is related.

Regards


Re: [PATCH v2 0/5] iommu: Support identity mappings of reserved-memory regions

2021-04-28 Thread Dmitry Osipenko
On 28.04.2021 08:57, Mikko Perttunen wrote:
> On 4/28/21 8:51 AM, Dmitry Osipenko wrote:
>> On 23.04.2021 19:32, Thierry Reding wrote:
>>> Note that there will be no new releases of the bootloader for earlier
>>> devices, so adding support for these new DT bindings will not be
>>> practical. The bootloaders on those devices do pass information about
>>> the active framebuffer via the kernel command-line, so we may want to
>>> add code to create reserved regions in the IOMMU based on that.
>>
>> Since this change requires a bootloader update anyway, why isn't it
>> possible to fix the bootloader properly, making it stop all DMA
>> activity before jumping into the kernel?
>>
> 
> That is not desirable, as then we couldn't have seamless
> bootloader-kernel boot splash transition.

The seamless transition should be more complicated, since it requires
reading out the hardware state in order to convert it into DRM state,
and the display panel needs to stay ON. It's a bit questionable whether
this is really needed; so far it is not achievable in mainline.

Nevertheless, it will be good to have an early simple-framebuffer, which
I realized only after sending out the message.

RE: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

2021-04-28 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Wednesday, April 28, 2021 1:12 AM
> 
[...]
> One option is VFIO can keep its group FD but nothing else will have
> anything like it. However I don't much like the idea that VFIO will
> have a special and unique programming model to do the same things
> other subsystems will do. That will make it harder for userspace to
> implement.

Hi, Jason,

I have a question here. Based on discussions so far, it's clear that the
new ioasid uAPI will differ from existing VFIO uAPI a lot, e.g. ioasid-
centric operations, no group fd, no incompatible domains, etc. Then 
I wonder how we plan to support legacy VFIO applications in this 
transition phase. Earlier you mentioned the desire of directly
replacing /dev/vfio/vfio with /dev/ioasid and having ioasid to present
both VFIO and new uAPI. Doesn't it imply that we have to copy the 
VFIO container/group semantics into /dev/ioasid although it's a special 
programming model only for VFIO?

Alternatively we could keep all the container/group legacy within VFIO
and have /dev/ioasid support only the new uAPI semantics. In this case
VFIO will include a shim iommu backend to connect its legacy uAPI into 
drivers/ioasid backend functions for backward compatibility. Then VFIO
will also support a new model which only uses its device uAPI to bind
to new ioasid fd w/o using any legacy container/group/iommu uAPI.
Does this sound like a plan?

Thanks
Kevin


RE: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

2021-04-28 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Wednesday, April 28, 2021 1:12 AM
> 
[...] 
> > As Alex says, if this line fails because of the group restrictions,
> > that's not great because it's not very obvious what's gone wrong.
> 
> Okay, that is fair, but let's solve that problem directly. For
> instance netlink has been going in the direction of adding a "extack"
> from the kernel which is a descriptive error string. If the failing
> ioctl returned the string:
> 
>   "cannot join this device to the IOASID because device XXX in the
>same group #10 is in use"
> 
> Would you agree it is now obvious what has gone wrong? In fact would
> you agree this is a lot better user experience than what applications
> do today even though they have the group FD?
> 

Currently all the discussion is around implicit vs. explicit uAPI semantics
for the group restriction. However, if we look beyond groups, implicit
semantics might be inevitable when dealing with incompatible iommu
domains. An existing example of iommu incompatibility is IOMMU_CACHE.
In the future there could be other incompatibilities, such as whether
nested translation is supported. In the end userspace has to do some due
diligence on understanding iommu topology and attributes to decide how
many VFIO containers or ioasid fds should be created. It does push some
burden to userspace, but it's difficult to define a group-like kernel
object to enforce such a restriction for iommu compatibility. Then the
best the kernel can do is to return an informational error message in
case an incompatible device is attached to an existing domain. If this is
the perceived way to move forward anyway, I feel that removing the
explicit group FD from the uAPI doesn't make userspace worse...
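
To illustrate that due diligence, a sketch of the userspace fallback
(JOIN_IOASID_FD is the hypothetical ioctl from the pseudo code earlier in the
thread, not an existing interface):

#include <fcntl.h>
#include <sys/ioctl.h>

#define JOIN_IOASID_FD 0        /* placeholder for the hypothetical ioctl number */

static int attach_or_spawn(int device_fd, int *ioasid_fds, int *count)
{
        int i, fd;

        for (i = 0; i < *count; i++)
                if (ioctl(device_fd, JOIN_IOASID_FD, ioasid_fds[i]) == 0)
                        return ioasid_fds[i];   /* compatible with an existing IOASID fd */

        /* Incompatible with all existing ones (e.g. IOMMU_CACHE mismatch):
         * open another /dev/ioasid instance instead of failing outright. */
        fd = open("/dev/ioasid", O_RDWR);
        if (fd < 0 || ioctl(device_fd, JOIN_IOASID_FD, fd) < 0)
                return -1;

        ioasid_fds[(*count)++] = fd;
        return fd;
}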

Thanks
Kevin


RE: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

2021-04-28 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Monday, April 26, 2021 8:38 PM
> 
[...]
> > Want to hear your opinion for one open here. There is no doubt that
> > an ioasid represents a HW page table when the table is constructed by
> > userspace and then linked to the IOMMU through the bind/unbind
> > API. But I'm not very sure about whether an ioasid should represent
> > the exact pgtable or the mapping metadata when the underlying
> > pgtable is indirectly constructed through map/unmap API. VFIO does
> > the latter way, which is why it allows multiple incompatible domains
> > in a single container which all share the same mapping metadata.
> 
> I think VFIO's map/unmap is way too complex and we know it has bad
> performance problems.

Can you or Alex elaborate on where the complexity and performance problems
lie in VFIO map/unmap? We'd like to understand this in more detail and see how
to avoid it in the new interface.

> 
> If /dev/ioasid is single HW page table only then I would focus on that
> implementation and leave it to userspace to span different
> /dev/ioasids if needed.
> 
> > OK, now I see where the disconnect comes from. In my context ioasid
> > is the identifier that is actually used on the wire, but it seems you
> > treat it as a sw-defined namespace purely for representing page tables.
> > We should clear up this concept first before discussing other details.
> 
> There is no general HW requirement that every IO page table be
> referred to by the same PASID and this API would have to support

Yes, but what is the value of allowing multiple PASIDs referring to
the same I/O page table (except the nesting pgtable case)? Doesn't it
lead to poor iotlb efficiency, similar to multiple iommu domains
referring to the same page table?

> non-PASID IO page tables as well. So I'd keep the two things
> separated in the uAPI - even though the kernel today has a global
> PASID pool.

for non-PASID usages the allocated PASID will be wasted if we don't
separate ioasid from pasid. But it may be worthwhile given 1m available 
pasids and the simplification in the uAPI which only needs to care about 
one id space then.

> 
> > Then following your proposal, does it mean that we need another
> > interface for allocating PASID? and since ioasid means different
> > thing in uAPI and in-kernel API, possibly a new name is required to
> > avoid confusion?
> 
> I would suggest have two ways to control the PASID
> 
>  1) Over /dev/ioasid allocate a PASID for an IOASID. All future PASID
> based usages of the IOASID will use that global PASID
> 
>  2) Over the device FD, when the IOASID is bound return the PASID that
> was selected. If the IOASID does not have a global PASID then the
> kernel is free to make something up. In this mode a single IOASID
> can have multiple PASIDs.
> 
> Simple things like DPDK can use #2 and potentially have better PASID
> limits. hypervisors will most likely have to use #1, but it depends on
> how their vIOMMU interface works.

Can you elaborate why DPDK wants to use #2 i.e. not using a global
PASID?

> 
> I think the name IOASID is fine for the uAPI, the kernel version can
> be called ioasid_id or something.

ioasid is already an id and then ioasid_id just adds confusion. Another
point is that ioasid is currently used to represent both PCI PASID and
ARM substream ID in the kernel. It implies that if we want to separate
ioasid and pasid in the uAPI the 'pasid' also needs to be replaced with
another general term usable for substream ID. Are we making the
terms too confusing here? 

> 
> (also looking at ioasid.c, why do we need such a thin and odd wrapper
> around xarray?)
> 

I'll leave it to Jean and Jacob.

Thanks
Kevin

Re: [PATCH v2 0/5] iommu: Support identity mappings of reserved-memory regions

2021-04-28 Thread Mikko Perttunen

On 4/28/21 8:51 AM, Dmitry Osipenko wrote:

> On 23.04.2021 19:32, Thierry Reding wrote:
> 
> > Note that there will be no new releases of the bootloader for earlier
> > devices, so adding support for these new DT bindings will not be
> > practical. The bootloaders on those devices do pass information about
> > the active framebuffer via the kernel command-line, so we may want to
> > add code to create reserved regions in the IOMMU based on that.
> 
> Since this change requires a bootloader update anyway, why isn't it
> possible to fix the bootloader properly, making it stop all DMA
> activity before jumping into the kernel?



That is not desirable, as then we couldn't have seamless 
bootloader-kernel boot splash transition.


Mikko