Re: Enabling peer to peer device transactions for PCIe devices

2016-11-25 Thread Felix Kuehling

On 16-11-25 03:40 PM, Christian König wrote:
> Am 25.11.2016 um 20:32 schrieb Jason Gunthorpe:
>> This assumes the commands are fairly short lived of course, the
>> expectation of the mmu notifiers is that a flush is reasonably prompt
>
> Correct, this is another problem. GFX command submissions usually
> don't take longer than a few milliseconds, but compute command
> submission can easily take multiple hours.
>
> I can easily imagine what would happen when kswapd is blocked by a GPU
> command submission for an hour or so while the system is under memory
> pressure :)
>
> I've been thinking about this problem for about a year now and going in
> circles for quite a while. So if you have ideas on this, even if they
> sound totally crazy, feel free to bring them up.

Our GPUs (at least starting with VI) support compute-wave-save-restore
and can swap out compute queues with fairly low latency. Yes, there is
some overhead (both memory usage and time), but it's a fairly regular
thing for our hardware scheduler (firmware, actually) when we need to
preempt running compute queues to update runlists or when we overcommit
the hardware queue resources.
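
To sketch how that could tie into the MMU notifier requirement Jason
mentioned: the code below is a rough sketch only, all my_gpu_* names are
made up and stand in for however a driver would drive the firmware
scheduler's CWSR preemption.

#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/mmu_notifier.h>

struct my_gpu_device;                   /* hypothetical device struct */

struct my_gpu_mn {
    struct mmu_notifier mn;
    struct my_gpu_device *gpu;
};

/* Hypothetical: CWSR-preempt compute queues with mappings in [start, end). */
int my_gpu_preempt_queues_in_range(struct my_gpu_device *gpu,
                                   unsigned long start, unsigned long end);
/* Hypothetical: put the preempted queues back on the runlist. */
int my_gpu_resume_queues(struct my_gpu_device *gpu);

static void my_gpu_invalidate_range_start(struct mmu_notifier *mn,
                                          struct mm_struct *mm,
                                          unsigned long start,
                                          unsigned long end)
{
    struct my_gpu_mn *gmn = container_of(mn, struct my_gpu_mn, mn);

    /*
     * Save the waves and take the affected queues off the runlist, so
     * the notifier returns promptly even if the dispatch itself would
     * run for hours.
     */
    my_gpu_preempt_queues_in_range(gmn->gpu, start, end);
}

static void my_gpu_invalidate_range_end(struct mmu_notifier *mn,
                                        struct mm_struct *mm,
                                        unsigned long start,
                                        unsigned long end)
{
    struct my_gpu_mn *gmn = container_of(mn, struct my_gpu_mn, mn);

    /* Page tables are consistent again; restore the waves and resume. */
    my_gpu_resume_queues(gmn->gpu);
}

static const struct mmu_notifier_ops my_gpu_mn_ops = {
    .invalidate_range_start = my_gpu_invalidate_range_start,
    .invalidate_range_end   = my_gpu_invalidate_range_end,
};

static int my_gpu_register_mn(struct my_gpu_mn *gmn, struct mm_struct *mm)
{
    gmn->mn.ops = &my_gpu_mn_ops;
    return mmu_notifier_register(&gmn->mn, mm);
}

The point is just that the invalidate callback can return promptly,
because the waves are saved and the queues unmapped, instead of waiting
for the dispatch to complete.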

Regards,
  Felix



Re: Enabling peer to peer device transactions for PCIe devices

2016-11-25 Thread Felix Kuehling

On 16-11-25 12:20 PM, Serguei Sagalovitch wrote:
>
>> A white list may end up being rather complicated if it has to cover
>> different CPU generations and system architectures. I feel this is a
>> decision user space could easily make.
>>
>> Logan
> I agree that it is better to leave it up to user space to check what is
> working and what is not. I found that writes practically always work,
> but reads very often do not. Sometimes a system BIOS update can also fix
> the issue.
>
But is user mode always aware that P2P is going on, or even possible? For
example, a library may be reading a buffer from a file, but it doesn't
necessarily know where that buffer is located (system memory, VRAM, ...),
and it may not know what kind of device the file is on (SATA drive, NVMe
SSD, ...). If all the library gets is a pointer and a file descriptor, it
will never know.

The library ends up calling a read system call. It would then be up to the
kernel to figure out the most efficient way to read the buffer from the
file. If supported, it could use P2P between the GPU and the NVMe device,
with the NVMe device performing a DMA write directly to VRAM.
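
To make that concrete, here is roughly all such a library can do (the
function name and signature are made up for illustration); everything
interesting would have to happen below the pread() boundary, inside the
kernel:

#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical library routine: it is handed a file descriptor and a
 * destination pointer and nothing else.  It cannot tell whether 'buf' is
 * malloc()ed system memory or mmap()ed VRAM, nor whether 'fd' refers to a
 * SATA drive or an NVMe namespace -- it just issues ordinary reads.
 */
ssize_t load_blob(int fd, void *buf, size_t size, off_t offset)
{
    char *dst = buf;
    size_t done = 0;

    while (done < size) {
        ssize_t n = pread(fd, dst + done, size - done, offset + done);

        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            break;              /* short file */
        done += n;
    }
    return done;
}

Whether those reads land as NVMe DMA straight into VRAM or bounce through
system memory is invisible at this level.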

If you put the burden of figuring out the P2P details on user mode code, I
think it will severely limit the use cases that actually take advantage of
it. You also risk ending up with a bunch of different implementations that
get it wrong half the time on half the systems out there.
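
Even the "check whether it works" part from Serguei's mail is not
something a library can do generically. A rough sketch of what such a
probe might look like (the VRAM mapping helpers are hypothetical, and the
size is assumed to satisfy O_DIRECT alignment requirements):

#define _GNU_SOURCE             /* O_DIRECT */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical: CPU-visible VRAM mapping the storage device can DMA into. */
void *map_vram_for_p2p(size_t size);
void unmap_vram(void *p, size_t size);

/*
 * Read the same data once into system memory and once, via O_DIRECT, into
 * VRAM, then compare.  Returns 1 if P2P appears to work, 0 if the data
 * does not match, -1 on setup failure.
 */
int probe_p2p_read(const char *path, size_t size)
{
    void *vram = map_vram_for_p2p(size);
    void *sysmem = NULL;
    int fd = -1, ret = -1;

    if (!vram)
        return -1;
    if (posix_memalign(&sysmem, 4096, size)) {
        sysmem = NULL;
        goto out;
    }

    fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0)
        goto out;

    if (pread(fd, sysmem, size, 0) != (ssize_t)size)
        goto out;
    if (pread(fd, vram, size, 0) != (ssize_t)size)
        goto out;

    /* CPU readback of the BAR mapping to compare the two copies. */
    ret = (memcmp(vram, sysmem, size) == 0);

out:
    if (fd >= 0)
        close(fd);
    free(sysmem);
    unmap_vram(vram, size);
    return ret;
}

And even then the answer only holds for this particular pairing of
devices, on this particular root complex, with this particular BIOS,
which is exactly the kind of knowledge I'd rather keep in the kernel.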

Regards,
  Felix

