Re: [PATCH v2 01/10] PCI/P2PDMA: Support peer to peer memory

2018-03-01 Thread Bjorn Helgaas
On Thu, Mar 01, 2018 at 11:14:46PM +0000, Stephen Bates wrote:
> > I'm pretty sure the spec disallows routing-to-self so doing a P2P 
> > transaction in that sense isn't going to work unless the device 
> > specifically supports it and intercepts the traffic before it gets to 
> > the port.
> 
> This is correct. Unless the device intercepts the TLP before it hits
> the root port, this would be considered a "route to self" violation
> and an error event would occur. The same holds for the downstream
> port on a PCIe switch (unless route-to-self violations are disabled,
> which violates the spec but which I have seen done in certain
> applications).

I agree that a function doing DMA to a sibling within the same
multi-function device would probably not generate a TLP for it (I
would be curious to read about this in the spec if you have a
pointer).

More fundamentally, is there some multi-function-specific restriction
on peer-to-peer DMA?  In conventional PCI, single-function devices on
the same bus can DMA to each other.  The transactions will appear on
the bus, but the upstream bridge will ignore them because the address
is inside the bridge's memory window.  As far as I know, the same
should happen on PCIe.

I don't know what happens with functions of a multi-function device,
either in conventional PCI or PCIe.  I don't remember a restriction on
whether they can DMA to each other, but maybe there is.

Bjorn


Re: [PATCH v2 01/10] PCI/P2PDMA: Support peer to peer memory

2018-03-01 Thread Stephen Bates
> I'm pretty sure the spec disallows routing-to-self so doing a P2P 
> transaction in that sense isn't going to work unless the device 
> specifically supports it and intercepts the traffic before it gets to 
> the port.

This is correct. Unless the device intercepts the TLP before it hits the
root port, this would be considered a "route to self" violation and an
error event would occur. The same holds for the downstream port on a PCIe
switch (unless route-to-self violations are disabled, which violates the
spec but which I have seen done in certain applications).

Stephen






Re: [PATCH v2 01/10] PCI/P2PDMA: Support peer to peer memory

2018-03-01 Thread Logan Gunthorpe




> I don't think this is correct.  A Root Port defines a hierarchy domain
> (I'm looking at PCIe r4.0, sec 1.3.1).  The capability to route
> peer-to-peer transactions *between* hierarchy domains is optional.  I
> think this means a Root Complex is not required to route transactions
> from one Root Port to another Root Port.
>
> This doesn't say anything about routing between two different devices
> below a Root Port.  Those would be in the same hierarchy domain and
> should follow the conventional PCI routing rules.  Of course, since a
> Root Port has one link that leads to one device, they would probably
> be different functions of a single multi-function device, so I don't
> know how practical it would be to test this.


Yes, given that there's only one device below a root port, it will either 
be a switch or a multi-function device. In the multi-function device 
case, I'm pretty sure the spec disallows routing-to-self so doing a P2P 
transaction in that sense isn't going to work unless the device 
specifically supports it and intercepts the traffic before it gets to 
the port.


But if we're talking about multi-function devices, the device should be 
able to do everything within its own driver, so it's not exactly 
peer-to-peer. Still, if someone has such hardware, I think it's up to 
them to add support for this odd situation.


Logan




Re: [PATCH v2 01/10] PCI/P2PDMA: Support peer to peer memory

2018-03-01 Thread Bjorn Helgaas
On Thu, Mar 01, 2018 at 11:55:51AM -0700, Logan Gunthorpe wrote:
> Hi Bjorn,
> 
> Thanks for the review. I'll correct all the nits for the next version.
> 
> On 01/03/18 10:37 AM, Bjorn Helgaas wrote:
> > On Wed, Feb 28, 2018 at 04:39:57PM -0700, Logan Gunthorpe wrote:
> > > Some PCI devices may have memory mapped in a BAR space that's
> > > intended for use in Peer-to-Peer transactions. In order to enable
> > > such transactions the memory must be registered with ZONE_DEVICE pages
> > > so it can be used by DMA interfaces in existing drivers.
> 
> > Is there anything about this memory that makes it specifically
> > intended for peer-to-peer transactions?  I assume the device can't
> > really tell whether a transaction is from a CPU or a peer.
> 
> No, there's nothing special about the memory, and it can still be accessed
> by the CPU. This is just the intended purpose. You could use this PCI
> memory as regular DMA buffers or regular memory, but I'm not sure why you
> would. It would probably be pretty bad performance-wise.
> 
> 
> > BTW, maybe there could be some kind of guide for device driver writers
> > in Documentation/PCI/?
> Makes sense; we can look at writing something for the next iteration.
> 
> > I think it would be clearer and sufficient to simply say that we have
> > no way to know whether peer-to-peer routing between PCIe Root Ports is
> > supported (PCIe r4.0, sec 1.3.1).
> 
> Fair enough.
> 
> > The fact that you use the PCIe term "switch" suggests that a PCIe
> > Switch is required, but isn't it sufficient for the peers to be below
> > the same "PCI bridge", which would include PCIe Root Ports, PCIe
> > Switch Downstream Ports, and conventional PCI bridges?
> > The comments at get_upstream_bridge_port() suggest that this isn't
> > enough, and the peers actually do have to be below the same PCIe
> > Switch, but I don't know why.
> 
> I do mean Switch, as we need to keep the traffic off the root complex.
> As stated above, we don't know if the root complex actually supports P2P
> (while we can be certain any PCIe switch does). So we specifically want
> to exclude PCIe Root Ports. I'm not sure about the support of
> conventional PCI bridges, but I can't imagine anyone wanting to do P2P
> through them, so I'd rather be safe than sorry and exclude them.

I don't think this is correct.  A Root Port defines a hierarchy domain
(I'm looking at PCIe r4.0, sec 1.3.1).  The capability to route
peer-to-peer transactions *between* hierarchy domains is optional.  I
think this means a Root Complex is not required to route transactions
from one Root Port to another Root Port.

This doesn't say anything about routing between two different devices
below a Root Port.  Those would be in the same hierarchy domain and
should follow the conventional PCI routing rules.  Of course, since a
Root Port has one link that leads to one device, they would probably
be different functions of a single multi-function device, so I don't
know how practical it would be to test this.

> > This whole thing is confusing to me.  Why do you want to reject peers
> > directly connected to the same root port?  Why do you require the same
> > Switch Upstream Port?  You don't exclude conventional PCI, but it
> > looks like you would require peers to share *two* upstream PCI-to-PCI
> > bridges?  I would think a single shared upstream bridge (conventional,
> > PCIe Switch Downstream Port, or PCIe Root Port) would be sufficient?
> 
> Hmm, yes, this may just be laziness on my part. Finding the shared
> upstream bridge is a bit trickier than just showing that they are on the
> same switch. So as coded, a fabric of switches with peers on different
> legs of the fabric is not supported. But yes, maybe they just need to be
> two devices with a single shared upstream bridge that is not the root
> port. Again, we need to reject the root port because we can't know if
> the root complex can support P2P traffic.

This sounds like the same issue as above, so we just need to resolve
that.
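
For illustration, a minimal sketch of the "single shared upstream bridge"
test being discussed here, built on the kernel's pci_upstream_bridge()
helper. The function name and structure are hypothetical, not code from
the patch: it walks each device's chain of upstream bridges and skips
PCIe Root Ports, since peer-to-peer routing through the Root Complex is
optional.

    static bool pci_p2pdma_share_bridge(struct pci_dev *a, struct pci_dev *b)
    {
            struct pci_dev *up_a, *up_b;

            for (up_a = pci_upstream_bridge(a); up_a;
                 up_a = pci_upstream_bridge(up_a)) {
                    /* Skip Root Ports: peer routing between them is optional. */
                    if (pci_is_pcie(up_a) &&
                        pci_pcie_type(up_a) == PCI_EXP_TYPE_ROOT_PORT)
                            continue;

                    for (up_b = pci_upstream_bridge(b); up_b;
                         up_b = pci_upstream_bridge(up_b)) {
                            if (up_a == up_b)
                                    return true; /* shared non-Root-Port bridge */
                    }
            }

            return false;
    }

A check like this would accept peers anywhere behind the same switch (or
fabric of switches) while rejecting pairs whose only common ancestor is a
Root Port.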

> > Since "pci_p2pdma_add_client()" includes "pci_" in its name, it seems
> > sort of weird that callers supply a non-PCI device and then we look up
> > a PCI device here.  I assume you have some reason for this; if you
> > added a writeup in Documentation/PCI, that would be a good place to
> > elaborate on that, maybe with a one-line clue here.
> 
> Well yes, but this is much more convenient for callers, which don't need
> to care whether the device they are attempting to add (which in the NVMe
> target case could be a random block device) is a PCI device or not,
> especially since find_parent_pci_dev() is non-trivial.

OK.  I accept that it might be convenient, but I still think it leads
to a weird API.  Maybe that's OK; I don't know enough about the
scenario in the caller to do anything more than say "hmm, strange".

> > > +void *pci_alloc_p2pmem(struct pci_dev *pdev, size_t size)
> > > +{
> > > + void *ret;
> > > +
> > > + if (unlikely(!pdev->p2pdma))
> > 
> > Is this a hot path?  I'm 

Re: [PATCH v2 01/10] PCI/P2PDMA: Support peer to peer memory

2018-03-01 Thread Logan Gunthorpe

Hi Bjorn,

Thanks for the review. I'll correct all the nits for the next version.

On 01/03/18 10:37 AM, Bjorn Helgaas wrote:

> On Wed, Feb 28, 2018 at 04:39:57PM -0700, Logan Gunthorpe wrote:
> > Some PCI devices may have memory mapped in a BAR space that's
> > intended for use in Peer-to-Peer transactions. In order to enable
> > such transactions the memory must be registered with ZONE_DEVICE pages
> > so it can be used by DMA interfaces in existing drivers.



> Is there anything about this memory that makes it specifically
> intended for peer-to-peer transactions?  I assume the device can't
> really tell whether a transaction is from a CPU or a peer.


No, there's nothing special about the memory, and it can still be accessed 
by the CPU. This is just the intended purpose. You could use this PCI 
memory as regular DMA buffers or regular memory, but I'm not sure why you 
would. It would probably be pretty bad performance-wise.




> BTW, maybe there could be some kind of guide for device driver writers
> in Documentation/PCI/?

Makes sense; we can look at writing something for the next iteration.


> I think it would be clearer and sufficient to simply say that we have
> no way to know whether peer-to-peer routing between PCIe Root Ports is
> supported (PCIe r4.0, sec 1.3.1).


Fair enough.


> The fact that you use the PCIe term "switch" suggests that a PCIe
> Switch is required, but isn't it sufficient for the peers to be below
> the same "PCI bridge", which would include PCIe Root Ports, PCIe
> Switch Downstream Ports, and conventional PCI bridges?
>
> The comments at get_upstream_bridge_port() suggest that this isn't
> enough, and the peers actually do have to be below the same PCIe
> Switch, but I don't know why.


I do mean Switch, as we need to keep the traffic off the root complex. 
As stated above, we don't know if the root complex actually supports P2P 
(while we can be certain any PCIe switch does). So we specifically want 
to exclude PCIe Root Ports. I'm not sure about the support of 
conventional PCI bridges, but I can't imagine anyone wanting to do P2P 
through them, so I'd rather be safe than sorry and exclude them.




> > +   addr = devm_memremap_pages(&pdev->dev, pgmap);
> > +   if (IS_ERR(addr))


> Free pgmap here?  And in the other error case below?  Or maybe this
> happens via the devm_* magic?  If so, when would that actually happen?
> Would pgmap be effectively leaked until the pdev is destroyed?


Yes, it happens via the devm magic as that's the way the 
devm_memremap_pages() interface was designed. If I remember correctly, 
in my testing, it would be de-allocated when the driver gets unbound.



> > +   return PTR_ERR(addr);
> > +
> > +   error = gen_pool_add_virt(pdev->p2pdma->pool, (uintptr_t)addr,
> > +   pci_bus_address(pdev, bar) + offset,
> > +   resource_size(&pgmap->res), dev_to_node(&pdev->dev));
> > +   if (error)
> > +   return error;
> > +
> > +   error = devm_add_action_or_reset(&pdev->dev, pci_p2pdma_percpu_kill,
> > + &pdev->p2pdma->devmap_ref);
> > +   if (error)
> > +   return error;
> > +
> > +   dev_info(&pdev->dev, "added peer-to-peer DMA memory %pR\n",
> > + &pgmap->res);


> s/dev_info/pci_info/ (also similar usages below, except for the one or
> two cases where you don't have a pci_dev).


Oh, nice, I didn't notice that was added.


> This whole thing is confusing to me.  Why do you want to reject peers
> directly connected to the same root port?  Why do you require the same
> Switch Upstream Port?  You don't exclude conventional PCI, but it
> looks like you would require peers to share *two* upstream PCI-to-PCI
> bridges?  I would think a single shared upstream bridge (conventional,
> PCIe Switch Downstream Port, or PCIe Root Port) would be sufficient?


Hmm, yes, this may just be laziness on my part. Finding the shared 
upstream bridge is a bit trickier than just showing that they are on 
the same switch. So as coded, a fabric of switches with peers on 
different legs of the fabric is not supported. But yes, maybe they just 
need to be two devices with a single shared upstream bridge that is not 
the root port. Again, we need to reject the root port because we can't 
know if the root complex can support P2P traffic.



Since "pci_p2pdma_add_client()" includes "pci_" in its name, it seems
sort of weird that callers supply a non-PCI device and then we look up
a PCI device here.  I assume you have some reason for this; if you
added a writeup in Documentation/PCI, that would be a good place to
elaborate on that, maybe with a one-line clue here.


Well yes, but this is much more convenient for callers, which don't need 
to care whether the device they are attempting to add (which in the NVMe 
target case could be a random block device) is a PCI device or not, 
especially since find_parent_pci_dev() is non-trivial.
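
To illustrate why hiding that lookup is convenient, here is a sketch of
the idea behind a helper like find_parent_pci_dev(): walk up the device
tree from an arbitrary struct device until a PCI device is found. This is
a sketch of the concept only; the actual helper in the series may differ.

    static struct pci_dev *find_parent_pci_dev(struct device *dev)
    {
            struct device *parent;

            dev = get_device(dev);
            while (dev) {
                    if (dev_is_pci(dev))
                            return to_pci_dev(dev); /* holds a reference */

                    parent = get_device(dev->parent);
                    put_device(dev);
                    dev = parent;
            }

            return NULL;
    }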



> > +void *pci_alloc_p2pmem(struct pci_dev *pdev, size_t size)
> > +{
> > +   void *ret;
> > +
> > +   if (unlikely(!pdev->p2pdma))


> Is this a hot path?  I'm not sure it's worth cluttering
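
The archive truncates the quoted hunk at this point. Based on the
genalloc pool and percpu_ref the series sets up above, a plausible shape
for the allocator is the following; treat it as an informed guess rather
than the verbatim patch:

    void *pci_alloc_p2pmem(struct pci_dev *pdev, size_t size)
    {
            void *ret;

            if (unlikely(!pdev->p2pdma))
                    return NULL;

            if (unlikely(!percpu_ref_tryget_live(&pdev->p2pdma->devmap_ref)))
                    return NULL;

            /* Carve the allocation out of the pool of BAR-backed memory. */
            ret = (void *)gen_pool_alloc(pdev->p2pdma->pool, size);
            if (unlikely(!ret))
                    percpu_ref_put(&pdev->p2pdma->devmap_ref);

            return ret;
    }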

Re: [PATCH v2 01/10] PCI/P2PDMA: Support peer to peer memory

2018-03-01 Thread Bjorn Helgaas
s/peer to peer/peer-to-peer/ to match text below and in spec.

On Wed, Feb 28, 2018 at 04:39:57PM -0700, Logan Gunthorpe wrote:
> Some PCI devices may have memory mapped in a BAR space that's
> intended for use in Peer-to-Peer transactions. In order to enable
> such transactions the memory must be registered with ZONE_DEVICE pages
> so it can be used by DMA interfaces in existing drivers.

s/Peer-to-Peer/peer-to-peer/ to match spec and typical usage.

Is there anything about this memory that makes it specifically
intended for peer-to-peer transactions?  I assume the device can't
really tell whether a transaction is from a CPU or a peer.

> A kernel interface is provided so that other subsystems can find and
> allocate chunks of P2P memory as necessary to facilitate transfers
> between two PCI peers. Depending on hardware, this may reduce the
> bandwidth of the transfer but would significantly reduce pressure
> on system memory. This may be desirable in many cases: for example a
> system could be designed with a small CPU connected to a PCI switch by a
> small number of lanes which would maximize the number of lanes available
> to connect to NVME devices.

"A kernel interface is provided" could mean "the kernel provides an
interface", independent of anything this patch does, but I think you
mean *this patch specifically* adds the interface.

Maybe something like:

  Add interfaces for other subsystems to find and allocate ...:

int pci_p2pdma_add_client();
struct pci_dev *pci_p2pmem_find();
void *pci_alloc_p2pmem();

  This may reduce bandwidth of the transfer but significantly reduce
  ...

BTW, maybe there could be some kind of guide for device driver writers
in Documentation/PCI/?

> The interface requires a user driver to collect a list of client devices
> involved in the transaction with the pci_p2pdma_add_client*() functions
> then call pci_p2pmem_find() to obtain any suitable P2P memory. Once
> this is done the list is bound to the memory and the calling driver is
> free to add and remove clients as necessary. The ACS bits on the
> downstream switch port will be managed for all the registered clients.
> 
> The code is designed to only utilize the p2pmem device if all the devices
> involved in a transfer are behind the same PCI switch. This is because
> using P2P transactions through the PCI root complex can have performance
> limitations or, worse, might not work at all. Finding out how well a
> particular RC supports P2P transfers is non-trivial. Additionally, the
> benefits of P2P transfers that go through the RC are limited to only
> reducing DRAM usage.

I think it would be clearer and sufficient to simply say that we have
no way to know whether peer-to-peer routing between PCIe Root Ports is
supported (PCIe r4.0, sec 1.3.1).

The fact that you use the PCIe term "switch" suggests that a PCIe
Switch is required, but isn't it sufficient for the peers to be below
the same "PCI bridge", which would include PCIe Root Ports, PCIe
Switch Downstream Ports, and conventional PCI bridges?

The comments at get_upstream_bridge_port() suggest that this isn't
enough, and the peers actually do have to be below the same PCIe
Switch, but I don't know why.

> diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
> index 34b56a8f8480..840831418cbd 100644
> --- a/drivers/pci/Kconfig
> +++ b/drivers/pci/Kconfig
> @@ -124,6 +124,22 @@ config PCI_PASID
>  
> If unsure, say N.
>  
> +config PCI_P2PDMA
> + bool "PCI Peer to Peer transfer support"
> + depends on ZONE_DEVICE
> + select GENERIC_ALLOCATOR
> + help
> +   Enables drivers to do PCI peer to peer transactions to and from

s/peer to peer/peer-to-peer/ (in bool and help text)

> +   BARs that are exposed in other devices that are part of
> +   the hierarchy where peer-to-peer DMA is guaranteed by the PCI
> +   specification to work (ie. anything below a single PCI bridge).
> +
> +   Many PCIe root complexes do not support P2P transactions and
> +   it's hard to tell which support it with good performance, so
> +   at this time you will need a PCIe switch.

Until we have a way to figure out which of them support P2P,
performance is a don't-care.

> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
> new file mode 100644
> index ..ec0a6cb9e500
> --- /dev/null
> +++ b/drivers/pci/p2pdma.c
> @@ -0,0 +1,568 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * PCI Peer 2 Peer DMA support.

s/Peer 2 Peer/peer-to-peer/

> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.


[PATCH v2 01/10] PCI/P2PDMA: Support peer to peer memory

2018-02-28 Thread Logan Gunthorpe
Some PCI devices may have memory mapped in a BAR space that's
intended for use in Peer-to-Peer transactions. In order to enable
such transactions the memory must be registered with ZONE_DEVICE pages
so it can be used by DMA interfaces in existing drivers.
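
In outline, the registration described above amounts to filling in a
struct dev_pagemap for the BAR region and handing it to
devm_memremap_pages(). The sketch below is condensed from the hunks
quoted in the review thread above; the function name and the
MEMORY_DEVICE_PCI_P2PDMA type are assumptions based on this series, and
pdev->p2pdma is assumed to have been set up already.

    int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size,
                                u64 offset)
    {
            struct dev_pagemap *pgmap;
            void *addr;

            pgmap = devm_kzalloc(&pdev->dev, sizeof(*pgmap), GFP_KERNEL);
            if (!pgmap)
                    return -ENOMEM;

            /* Describe the BAR region that will back the ZONE_DEVICE pages. */
            pgmap->res.start = pci_resource_start(pdev, bar) + offset;
            pgmap->res.end = pgmap->res.start + size - 1;
            pgmap->res.flags = pci_resource_flags(pdev, bar);
            pgmap->ref = &pdev->p2pdma->devmap_ref;
            pgmap->type = MEMORY_DEVICE_PCI_P2PDMA;

            addr = devm_memremap_pages(&pdev->dev, pgmap);
            if (IS_ERR(addr))
                    return PTR_ERR(addr);

            /* ... the remapped memory is then added to a genalloc pool ... */
            return 0;
    }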

A kernel interface is provided so that other subsystems can find and
allocate chunks of P2P memory as necessary to facilitate transfers
between two PCI peers. Depending on hardware, this may reduce the
bandwidth of the transfer but would significantly reduce pressure
on system memory. This may be desirable in many cases: for example a
system could be designed with a small CPU connected to a PCI switch by a
small number of lanes which would maximize the number of lanes available
to connect to NVME devices.

The interface requires a user driver to collect a list of client devices
involved in the transaction with the pci_p2pdma_add_client*() functions
then call pci_p2pmem_find() to obtain any suitable P2P memory. Once
this is done the list is bound to the memory and the calling driver is
free to add and remove clients as necessary. The ACS bits on the
downstream switch port will be managed for all the registered clients.
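
A hedged sketch of how a user driver might drive that flow. The client
devices (client_a, client_b) and the wrapper are hypothetical, and the
signatures are inferred from this posting:

    /* Register two clients, find P2P memory they can both reach, and
     * allocate a 4K buffer from it. */
    static void *setup_p2p_buffer(struct device *client_a,
                                  struct device *client_b,
                                  struct pci_dev **p2pmem_dev)
    {
            LIST_HEAD(clients);

            if (pci_p2pdma_add_client(&clients, client_a) ||
                pci_p2pdma_add_client(&clients, client_b))
                    return NULL;

            /* Only succeeds if all clients sit behind the same PCI switch. */
            *p2pmem_dev = pci_p2pmem_find(&clients);
            if (!*p2pmem_dev)
                    return NULL;

            return pci_alloc_p2pmem(*p2pmem_dev, SZ_4K);
    }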

The code is designed to only utilize the p2pmem device if all the devices
involved in a transfer are behind the same PCI switch. This is because
using P2P transactions through the PCI root complex can have performance
limitations or, worse, might not work at all. Finding out how well a
particular RC supports P2P transfers is non-trivial. Additionally, the
benefits of P2P transfers that go through the RC are limited to only
reducing DRAM usage.

This commit includes significant rework and feedback from Christoph
Hellwig.

Signed-off-by: Christoph Hellwig 
Signed-off-by: Logan Gunthorpe 
---
 drivers/pci/Kconfig|  16 ++
 drivers/pci/Makefile   |   1 +
 drivers/pci/p2pdma.c   | 568 +
 include/linux/memremap.h   |  18 ++
 include/linux/pci-p2pdma.h |  87 +++
 include/linux/pci.h|   4 +
 6 files changed, 694 insertions(+)
 create mode 100644 drivers/pci/p2pdma.c
 create mode 100644 include/linux/pci-p2pdma.h

diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
index 34b56a8f8480..840831418cbd 100644
--- a/drivers/pci/Kconfig
+++ b/drivers/pci/Kconfig
@@ -124,6 +124,22 @@ config PCI_PASID
 
  If unsure, say N.
 
+config PCI_P2PDMA
+   bool "PCI Peer to Peer transfer support"
+   depends on ZONE_DEVICE
+   select GENERIC_ALLOCATOR
+   help
+ Enables drivers to do PCI peer to peer transactions to and from
+ BARs that are exposed in other devices that are part of
+ the hierarchy where peer-to-peer DMA is guaranteed by the PCI
+ specification to work (ie. anything below a single PCI bridge).
+
+ Many PCIe root complexes do not support P2P transactions and
+ it's hard to tell which support it with good performance, so
+ at this time you will need a PCIe switch.
+
+ If unsure, say N.
+
 config PCI_LABEL
def_bool y if (DMI || ACPI)
depends on PCI
diff --git a/drivers/pci/Makefile b/drivers/pci/Makefile
index 941970936840..45e0ff6f3213 100644
--- a/drivers/pci/Makefile
+++ b/drivers/pci/Makefile
@@ -26,6 +26,7 @@ obj-$(CONFIG_PCI_MSI) += msi.o
 
 obj-$(CONFIG_PCI_ATS) += ats.o
 obj-$(CONFIG_PCI_IOV) += iov.o
+obj-$(CONFIG_PCI_P2PDMA) += p2pdma.o
 
 #
 # ACPI Related PCI FW Functions
diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
new file mode 100644
index ..ec0a6cb9e500
--- /dev/null
+++ b/drivers/pci/p2pdma.c
@@ -0,0 +1,568 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * PCI Peer 2 Peer DMA support.
+ *
+ * Copyright (c) 2016-2017, Microsemi Corporation
+ * Copyright (c) 2017, Christoph Hellwig.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct pci_p2pdma {
+   struct percpu_ref devmap_ref;
+   struct completion devmap_ref_done;
+   struct gen_pool *pool;
+   bool published;
+};
+
+static void pci_p2pdma_percpu_release(struct percpu_ref *ref)
+{
+   struct pci_p2pdma *p2p =
+   container_of(ref, struct pci_p2pdma, devmap_ref);
+
+   complete_all(&p2p->devmap_ref_done);
+}
+
+static void pci_p2pdma_percpu_kill(void *data)
+{
+   struct percpu_ref *ref = data;
+
+   if (percpu_ref_is_dying(ref))
+   return;
+
+   percpu_ref_kill(ref);
+}
+
+static void pci_p2pdma_release(void *data)
+{
+   struct pci_dev
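
The archive cuts the diff off here. Given the completion and pool set up
above, the release path plausibly continues along these lines; this is an
informed reconstruction, not the verbatim patch:

    static void pci_p2pdma_release(void *data)
    {
            struct pci_dev *pdev = data;

            if (!pdev->p2pdma)
                    return;

            /* Wait for the last ZONE_DEVICE page reference to be dropped. */
            wait_for_completion(&pdev->p2pdma->devmap_ref_done);
            percpu_ref_exit(&pdev->p2pdma->devmap_ref);

            gen_pool_destroy(pdev->p2pdma->pool);
            pdev->p2pdma = NULL;
    }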