RE: [RFC PATCH Xilinx Alveo 0/6] Xilinx PCIe accelerator driver

2019-04-08 Thread Sonal Santan



Re: [RFC PATCH Xilinx Alveo 0/6] Xilinx PCIe accelerator driver

2019-04-03 Thread Jerome Glisse
On Fri, Mar 29, 2019 at 06:09:18PM -0700, Ronan KERYELL wrote:
> I am adding linux-f...@vger.kernel.org, since this is why I missed this
> thread in the first place...
> > On Fri, 29 Mar 2019 14:56:17 +1000, Dave Airlie  
> > said:
> Dave> On Thu, 28 Mar 2019 at 10:14, Sonal Santan  
> wrote:
> >>> From: Daniel Vetter [mailto:daniel.vet...@ffwll.ch]

[...]

> Long answer:
> 
> - processors, GPU and other digital circuits are designed from a lot of
>   elementary transistors, wires, capacitors, resistors... using some
>   very complex (and expensive) tools from some EDA companies but at the
>   end, after months of work, they come often with a "simple" public
>   interface, the... instruction set! So it is rather "easy" at the end
>   to generate some instructions with a compiler such as LLVM from a
>   description of this ISA or some reverse engineering. Note that even if
>   the ISA is public, it is very difficult to make another efficient
>   processor from scratch just from this ISA, so there is often no
>   concern about making this ISA public to develop the ecosystem ;
> 
> - FPGA are field-programmable gate arrays, made also from a lot of
>   elementary transistors, wires, capacitors, resistors... but organized
>   in billions of very low-level elementary gates, memory elements, DSP
>   blocks, I/O blocks, clock generators, specific
>   accelerators... directly exposed to the user and that can be
>   programmed according to a configuration memory (the bitstream) that
>   details how to connect each part, routing element, configuring each
>   elemental piece of hardware.  So instead of just writing instructions
>   like on a CPU or a GPU, you need to configure each bit of the
>   architecture in such a way it does something interesting for
>   you. Concretely, you write some programs in RTL languages (Verilog,
>   VHDL) or higher-level (C/C++, OpenCL, SYCL...)  and you use some very
>   complex (and expensive) tools from some EDA companies to generate the
>   bitstream implementing an equivalent circuit with the same
>   semantics. Since the architecture is so low level, there is a direct
>   mapping between the configuration memory (bitstream) and the hardware
>   architecture itself, so if it is public then it is easy to duplicate
>   the FPGA itself and to start a new FPGA company. That is unfortunately
>   something the existing FPGA companies do not want... ;-)

This is a completely bogus argument. All the FPGA documentation I have seen so far
_extensively_ describes _each_ basic block within the FPGA; this includes the
excellent documentation Xilinx provides on the inner workings and layout of
Xilinx FPGAs. The same applies to Altera, Atmel, Lattice, ...

The extensive public documentation is enough for anyone with the money and
with half decent engineers to produce an FPGA.

The real know-how of an FPGA vendor is how to produce big chips on a small process
capable of sustaining high clock rates with the best power consumption possible.
This is the part where each company's years of experience pay off. The cost for
anyone to come to the market is in the hundreds of millions just in setup costs
and in catching up with established vendors on the hardware side. This without
any guarantee of revenue at the end.

The bitstream only gives away which bits correspond to which wires and where
the LUT boolean tables are stored... Bitstreams that have been reverse engineered
have never revealed anything of value that was not already publicly documented.
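To make the "the bitstream is just LUT tables and routing bits" point concrete, here is a toy Python sketch. The bit layout is entirely invented (it is not any vendor's real bitstream format): a k-input LUT is configured by 2^k truth-table bits, so decoding those bits reveals only the boolean function the user programmed, nothing about the silicon itself.

```python
# Toy illustration (invented format, NOT a real vendor bitstream layout):
# a k-input LUT is configured by 2^k bits of truth table. Knowing which
# bitstream offsets hold those bits tells you the user's boolean function,
# not anything about how the FPGA silicon is built.

def lut_eval(truth_table_bits, inputs):
    """Evaluate a LUT: the input values form an index into the truth table."""
    index = 0
    for i, bit in enumerate(inputs):
        index |= bit << i
    return (truth_table_bits >> index) & 1

# A 2-input LUT programmed as XOR: truth table 0b0110 (bit 0 = entry for index 0).
xor_lut = 0b0110
table = [lut_eval(xor_lut, (a, b)) for a in (0, 1) for b in (0, 1)]
assert table == [0, 1, 1, 0]  # exactly the XOR truth table, and nothing more
```

Reverse engineering a real bitstream amounts to recovering mappings like this at scale, which is why it exposes the user's design rather than vendor secrets.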


So no, the bitstream has _no_ value; please prove me wrong with the Lattice
bitstream, for instance. If anything, the fact that Lattice has a reverse-engineered
bitstream has made that FPGA popular with the maker community, as it allows people
to do experiments for which the closed-source tools are an impediment. So I would
argue that an open bitstream is actually beneficial.


The only valid reason I have ever seen for hiding the bitstream is to protect
the IP of the customer, i.e. those customers that pour quite a lot of money into
designing something with an FPGA and then want to keep the VHDL/Verilog
protected and "safe" from reverse engineering.

But this is security by obscurity, and FPGA companies would be better off providing
strong bitstream encryption (most already do, but I have seen papers on how to
break them).


I'd rather not see any bogus argument used to try to justify something that is
not justifiable.


Daniel already stressed that we need to know what the bitstream can do, and it
is even more important with FPGAs, where on some devices AFAICT the bitstream can
have total control over the PCIe bus and thus can be used to attack either main
memory or other PCIe devices.

For instance, with ATS/PASID you can have the device send pre-translated requests
to the IOMMU and access any memory despite the IOMMU.
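A simplified conceptual model of this bypass (toy Python, not real PCIe/IOMMU code): with ATS, a device can mark a DMA request as already-translated, and the IOMMU then skips its own page-table check and trusts the device-supplied address. The address values and table below are invented for illustration.

```python
# Conceptual model (simplified, NOT real PCIe/IOMMU semantics): an ATS-capable
# device may flag a DMA request as pre-translated; the IOMMU then trusts the
# device-supplied address instead of walking its own page tables.

ALLOWED = {0x1000: 0x9000}  # device-visible IOVA -> physical page (toy table)

def iommu_dma(addr, pre_translated):
    if pre_translated:
        return addr                 # trusted blindly: arbitrary physical access
    if addr not in ALLOWED:
        raise PermissionError("IOMMU fault")
    return ALLOWED[addr]

assert iommu_dma(0x1000, pre_translated=False) == 0x9000         # normal, checked path
assert iommu_dma(0xDEAD0000, pre_translated=True) == 0xDEAD0000  # the bypass
```

This is why a bitstream with control over the device's PCIe logic is security-relevant: a malicious design can simply assert the pre-translated flag.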

So without total confidence in what the bitstream can and cannot do, and thus
without knowledge of the bitstream format and how it maps to LUT, switch, cross-
bar, clock, fixed block (PCIE, 

Re: [RFC PATCH Xilinx Alveo 0/6] Xilinx PCIe accelerator driver

2019-04-03 Thread Daniel Vetter
On Wed, Apr 3, 2019 at 4:17 PM Moritz Fischer  wrote:
>
> Hi Daniel,
>
> On Wed, Apr 03, 2019 at 03:14:49PM +0200, Daniel Vetter wrote:
> > On Fri, Mar 29, 2019 at 06:09:18PM -0700, Ronan KERYELL wrote:
> > > I am adding linux-f...@vger.kernel.org, since this is why I missed this
> > > thread in the first place...
> > >
> > > > On Fri, 29 Mar 2019 14:56:17 +1000, Dave Airlie  
> > > > said:
> > >
> > > Hi Dave!
> > >
> > > Dave> On Thu, 28 Mar 2019 at 10:14, Sonal Santan  
> > > wrote:
> > >
> > > >>> From: Daniel Vetter [mailto:daniel.vet...@ffwll.ch]
> > >
> > > [...]
> > >
> > > >>> Note: There's no expectation for the fully optimizing compiler,
> > > >>> and we're totally ok if there's an optimizing proprietary
> > > >>> compiler and a basic open one (amd, and bunch of other
> > > >>> companies all have such dual stacks running on top of drm
> > > >>> kernel drivers). But a basic compiler that can convert basic
> > > >>> kernels into machine code is expected.
> > >
> > > >> Although the compiler is not open source the compilation flow
> > > >> lets users examine output from various stages. For example if you
> > > >> write your kernel in OpenCL/C/C++ you can view the RTL
> > > >> (Verilog/VHDL) output produced by first stage of compilation.
> > > >> Note that the compiler is really generating a custom circuit
> > > >> given a high level input which in the last phase gets synthesized
> > > >> into bitstream. Expert hardware designers can handcraft a circuit
> > > >> in RTL and feed it to the compiler. Our FPGA tools let you view
> > > >> the generated hardware design, the register map, etc. You can get
> > > >> more information about a compiled design by running XRT tool like
> > > >> xclbinutil on the generated file.
> > >
> > > >> In essence compiling for FPGAs is quite different than compiling
> > > >> for GPU/CPU/DSP.  Interestingly FPGA compilers can run anywhere
> > > >> from 30 mins to a few hours to compile a testcase.
> > >
> > > Dave> So is there any open source userspace generator for what this
> > > Dave> interface provides? Is the bitstream format that gets fed into
> > > Dave> the FPGA proprietary and is it signed?
> > >
> > > Short answer:
> > >
> > > - a bitstream is an opaque content similar to various firmware handled
> > >   by Linux, EFI capsules, x86 microcode, WiFi modems, etc.
> > >
> > > - there is no open-source generator for what the interface consume;
> > >
> > > - I do not know if it is signed;
> > >
> > > - it is probably similar to what Intel FPGA (not GPU) drivers provide
> > >   already inside the Linux kernel and I guess there is no pure
> > >   open-source way to generate their bit-stream either.
> >
> > Yeah, drivers/gpu folks wouldn't ever have merged drivers/fpga, and I
> > think there's pretty strong consensus over here that merging fpga stuff
> > without having clear specs (in the form of an executable open source
> > compiler/synthesizer/whatever) was a mistake.
>
> I don't totally understand this statement. You don't go out and ask
> people to open source their EDA tools that are used to create the ASICs
> on any piece of HW (NIC, GPU, USB controller,...) out there.
>
> FPGAs are no different.
>
> I think you need to distinguish between the general FPGA as a means to
> implement a HW solution and *FPGA based devices* that implement flows such
> as OpenCL etc. For the latter I'm more inclined to buy the equivalence
> to GPUs argument.

Yeah, maybe there's a misunderstanding: my comments were in the
context of the submitted Xilinx driver, and similar drivers that mean
to expose FPGAs to userspace for doing stuff. If all you use your FPGA
for is to load a bitstream as a firmware blob, to then instantiate a
device which doesn't really change anymore, then the bitstream is
indeed just like firmware. But we're talking about a kernel/userspace
API, where (possibly multiple) unprivileged clients can do whatever
they feel like, and where we have pretty hard requirements about not
breaking userspace. To be able to fully review and more or less
indefinitely support such driver stacks, we need to understand what
they're doing and what's possible. It's the "unprivileged userspace
submits Turing-complete (ok, sometimes not quite Turing-complete, but
really powerful I/O is usually on the menu) blobs to be run on
questionable hardware by the kernel" part that distinguishes a
firmware blob from the compute kernels we're talking about here. It's
not GPU vs FPGA vs something else. From a kernel driver PoV those are
all the same: you take a shader/bitstream/whatever blob + a bit of
state/configuration from userspace, and need to make sure there's no
DoS or other exploit in there.
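The kind of checking Daniel describes can be sketched in a few lines (illustrative Python, not any real driver's code; the opcode names and aperture range are invented): before executing an untrusted userspace blob, the kernel driver whitelists operations and bounds every address to memory the client actually owns.

```python
# Illustrative sketch (NOT any real driver): validating an untrusted
# userspace command blob -- only whitelisted operations are allowed, and
# every address must fall inside the buffer range the client owns.

ALLOWED_OPS = {"copy", "compute"}          # invented opcode names
CLIENT_APERTURE = range(0x0, 0x10000)      # invented per-client address window

def validate(cmds):
    """Reject any command stream that could touch memory it doesn't own."""
    for op, addr in cmds:
        if op not in ALLOWED_OPS:
            raise ValueError(f"forbidden op {op!r}")
        if addr not in CLIENT_APERTURE:
            raise ValueError(f"address {addr:#x} outside client aperture")
    return True

assert validate([("copy", 0x100), ("compute", 0x8000)])
```

Doing this for a GPU command stream is already hard; doing it for an FPGA bitstream requires knowing the bitstream format, which is the crux of the argument above.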

And we're not going to take "there's no problem here" on blind faith,
because we know how well-designed hardware is. This isn't new with
Spectre/Meltdown, because GPUs have been very 

Re: [RFC PATCH Xilinx Alveo 0/6] Xilinx PCIe accelerator driver

2019-04-03 Thread Moritz Fischer
Hi Daniel,

On Wed, Apr 03, 2019 at 03:14:49PM +0200, Daniel Vetter wrote:
> On Fri, Mar 29, 2019 at 06:09:18PM -0700, Ronan KERYELL wrote:
> > I am adding linux-f...@vger.kernel.org, since this is why I missed this
> > thread in the first place...
> > 
> > > On Fri, 29 Mar 2019 14:56:17 +1000, Dave Airlie  
> > > said:
> > 
> > Hi Dave!
> > 
> > Dave> On Thu, 28 Mar 2019 at 10:14, Sonal Santan  
> > wrote:
> > 
> > >>> From: Daniel Vetter [mailto:daniel.vet...@ffwll.ch]
> > 
> > [...]
> > 
> > >>> Note: There's no expectation for the fully optimizing compiler,
> > >>> and we're totally ok if there's an optimizing proprietary
> > >>> compiler and a basic open one (amd, and bunch of other
> > >>> companies all have such dual stacks running on top of drm
> > >>> kernel drivers). But a basic compiler that can convert basic
> > >>> kernels into machine code is expected.
> > 
> > >> Although the compiler is not open source the compilation flow
> > >> lets users examine output from various stages. For example if you
> > >> write your kernel in OpenCL/C/C++ you can view the RTL
> > >> (Verilog/VHDL) output produced by first stage of compilation.
> > >> Note that the compiler is really generating a custom circuit
> > >> given a high level input which in the last phase gets synthesized
> > >> into bitstream. Expert hardware designers can handcraft a circuit
> > >> in RTL and feed it to the compiler. Our FPGA tools let you view
> > >> the generated hardware design, the register map, etc. You can get
> > >> more information about a compiled design by running XRT tool like
> > >> xclbinutil on the generated file.
> > 
> > >> In essence compiling for FPGAs is quite different than compiling
> > >> for GPU/CPU/DSP.  Interestingly FPGA compilers can run anywhere
> > >> from 30 mins to a few hours to compile a testcase.
> > 
> > Dave> So is there any open source userspace generator for what this
> > Dave> interface provides? Is the bitstream format that gets fed into
> > Dave> the FPGA proprietary and is it signed?
> > 
> > Short answer:
> > 
> > - a bitstream is an opaque content similar to various firmware handled
> >   by Linux, EFI capsules, x86 microcode, WiFi modems, etc.
> > 
> > - there is no open-source generator for what the interface consume;
> > 
> > - I do not know if it is signed;
> > 
> > - it is probably similar to what Intel FPGA (not GPU) drivers provide
> >   already inside the Linux kernel and I guess there is no pure
> >   open-source way to generate their bit-stream either.
> 
> Yeah, drivers/gpu folks wouldn't ever have merged drivers/fpga, and I
> think there's pretty strong consensus over here that merging fpga stuff
> without having clear specs (in the form of an executable open source
> compiler/synthesizer/whatever) was a mistake.

I don't totally understand this statement. You don't go out and ask
people to open source their EDA tools that are used to create the ASICs
on any piece of HW (NIC, GPU, USB controller,...) out there.

FPGAs are no different.

I think you need to distinguish between the general FPGA as a means to
implement a HW solution and *FPGA-based devices* that implement flows such
as OpenCL etc. For the latter I'm more inclined to buy the equivalence-to-GPUs
argument.

> We just had a similar huge discussions around the recently merged
> habanalabs driver in drivers/misc, for neural network accel. There was a
> proposed drivers/accel for these. gpu folks objected, Greg and Olof were
> happy with merging.
> 
> And the exact same arguments has come up tons of times for gpus too, with
> lots proposals to merge a kernel driver with just the kernel driver being
> open source, or just the state tracker/runtime, but most definitely not
> anything looking like the compiler. Because $reasons.
> 
> Conclusion was that drivers/gpu people will continue to reject these,
> everyone else will continue to take whatever, but just don't complain to
> us if it all comes crashing down :-)
> 
> > Long answer:
> > 
> > - processors, GPU and other digital circuits are designed from a lot of
> >   elementary transistors, wires, capacitors, resistors... using some
> >   very complex (and expensive) tools from some EDA companies but at the
> >   end, after months of work, they come often with a "simple" public
> >   interface, the... instruction set! So it is rather "easy" at the end
> >   to generate some instructions with a compiler such as LLVM from a
> >   description of this ISA or some reverse engineering. Note that even if
> >   the ISA is public, it is very difficult to make another efficient
> >   processor from scratch just from this ISA, so there is often no
> >   concern about making this ISA public to develop the ecosystem ;
> > 
> > - FPGA are field-programmable gate arrays, made also from a lot of
> >   elementary transistors, wires, capacitors, resistors... but organized
> >   in 

Re: [RFC PATCH Xilinx Alveo 0/6] Xilinx PCIe accelerator driver

2019-04-03 Thread Daniel Vetter
On Fri, Mar 29, 2019 at 06:09:18PM -0700, Ronan KERYELL wrote:
> I am adding linux-f...@vger.kernel.org, since this is why I missed this
> thread in the first place...
> 
> > On Fri, 29 Mar 2019 14:56:17 +1000, Dave Airlie  
> > said:
> 
> Hi Dave!
> 
> Dave> On Thu, 28 Mar 2019 at 10:14, Sonal Santan  
> wrote:
> 
> >>> From: Daniel Vetter [mailto:daniel.vet...@ffwll.ch]
> 
> [...]
> 
> >>> Note: There's no expectation for the fully optimizing compiler,
> >>> and we're totally ok if there's an optimizing proprietary
> >>> compiler and a basic open one (amd, and bunch of other
> >>> companies all have such dual stacks running on top of drm
> >>> kernel drivers). But a basic compiler that can convert basic
> >>> kernels into machine code is expected.
> 
> >> Although the compiler is not open source the compilation flow
> >> lets users examine output from various stages. For example if you
> >> write your kernel in OpenCL/C/C++ you can view the RTL
> >> (Verilog/VHDL) output produced by first stage of compilation.
> >> Note that the compiler is really generating a custom circuit
> >> given a high level input which in the last phase gets synthesized
> >> into bitstream. Expert hardware designers can handcraft a circuit
> >> in RTL and feed it to the compiler. Our FPGA tools let you view
> >> the generated hardware design, the register map, etc. You can get
> >> more information about a compiled design by running XRT tool like
> >> xclbinutil on the generated file.
> 
> >> In essence compiling for FPGAs is quite different than compiling
> >> for GPU/CPU/DSP.  Interestingly FPGA compilers can run anywhere
> >> from 30 mins to a few hours to compile a testcase.
> 
> Dave> So is there any open source userspace generator for what this
> Dave> interface provides? Is the bitstream format that gets fed into
> Dave> the FPGA proprietary and is it signed?
> 
> Short answer:
> 
> - a bitstream is an opaque content similar to various firmware handled
>   by Linux, EFI capsules, x86 microcode, WiFi modems, etc.
> 
> - there is no open-source generator for what the interface consume;
> 
> - I do not know if it is signed;
> 
> - it is probably similar to what Intel FPGA (not GPU) drivers provide
>   already inside the Linux kernel and I guess there is no pure
>   open-source way to generate their bit-stream either.

Yeah, drivers/gpu folks wouldn't ever have merged drivers/fpga, and I
think there's pretty strong consensus over here that merging fpga stuff
without clear specs (in the form of an executable open-source
compiler/synthesizer/whatever) was a mistake.

We just had a similar huge discussion around the recently merged
habanalabs driver in drivers/misc, for neural network acceleration. There was
a proposed drivers/accel for these; GPU folks objected, Greg and Olof were
happy with merging.

And the exact same arguments have come up tons of times for GPUs too, with
lots of proposals to merge a kernel driver with just the kernel driver being
open source, or just the state tracker/runtime, but most definitely not
anything looking like the compiler. Because $reasons.

The conclusion was that drivers/gpu people will continue to reject these,
everyone else will continue to take whatever, but just don't complain to
us if it all comes crashing down :-)

> Long answer:
> 
> - processors, GPU and other digital circuits are designed from a lot of
>   elementary transistors, wires, capacitors, resistors... using some
>   very complex (and expensive) tools from some EDA companies but at the
>   end, after months of work, they come often with a "simple" public
>   interface, the... instruction set! So it is rather "easy" at the end
>   to generate some instructions with a compiler such as LLVM from a
>   description of this ISA or some reverse engineering. Note that even if
>   the ISA is public, it is very difficult to make another efficient
>   processor from scratch just from this ISA, so there is often no
>   concern about making this ISA public to develop the ecosystem ;
> 
> - FPGA are field-programmable gate arrays, made also from a lot of
>   elementary transistors, wires, capacitors, resistors... but organized
>   in billions of very low-level elementary gates, memory elements, DSP
>   blocks, I/O blocks, clock generators, specific
>   accelerators... directly exposed to the user and that can be
>   programmed according to a configuration memory (the bitstream) that
>   details how to connect each part, routing element, configuring each
>   elemental piece of hardware.  So instead of just writing instructions
>   like on a CPU or a GPU, you need to configure each bit of the
>   architecture in such a way it does something interesting for
>   you. Concretely, you write some programs in RTL languages (Verilog,
>   VHDL) or higher-level (C/C++, OpenCL, SYCL...)  and you use some very
>   complex (and expensive) tools from 

Re: [RFC PATCH Xilinx Alveo 0/6] Xilinx PCIe accelerator driver

2019-03-31 Thread Ronan KERYELL
I am adding linux-f...@vger.kernel.org, since this is why I missed this
thread in the first place...

> On Fri, 29 Mar 2019 14:56:17 +1000, Dave Airlie  said:

Hi Dave!

Dave> On Thu, 28 Mar 2019 at 10:14, Sonal Santan  wrote:

>>> From: Daniel Vetter [mailto:daniel.vet...@ffwll.ch]

[...]

>>> Note: There's no expectation for the fully optimizing compiler,
>>> and we're totally ok if there's an optimizing proprietary
>>> compiler and a basic open one (amd, and bunch of other
>>> companies all have such dual stacks running on top of drm
>>> kernel drivers). But a basic compiler that can convert basic
>>> kernels into machine code is expected.

>> Although the compiler is not open source the compilation flow
>> lets users examine output from various stages. For example if you
>> write your kernel in OpenCL/C/C++ you can view the RTL
>> (Verilog/VHDL) output produced by first stage of compilation.
>> Note that the compiler is really generating a custom circuit
>> given a high level input which in the last phase gets synthesized
>> into bitstream. Expert hardware designers can handcraft a circuit
>> in RTL and feed it to the compiler. Our FPGA tools let you view
>> the generated hardware design, the register map, etc. You can get
>> more information about a compiled design by running XRT tool like
>> xclbinutil on the generated file.

>> In essence compiling for FPGAs is quite different than compiling
>> for GPU/CPU/DSP.  Interestingly FPGA compilers can run anywhere
>> from 30 mins to a few hours to compile a testcase.

Dave> So is there any open source userspace generator for what this
Dave> interface provides? Is the bitstream format that gets fed into
Dave> the FPGA proprietary and is it signed?

Short answer:

- a bitstream is opaque content, similar to the various firmware blobs
  handled by Linux: EFI capsules, x86 microcode, WiFi modems, etc.;

- there is no open-source generator for what the interface consumes;

- I do not know if it is signed;

- it is probably similar to what the Intel FPGA (not GPU) drivers already
  provide inside the Linux kernel, and I guess there is no pure
  open-source way to generate their bitstream either.


Long answer:

- processors, GPUs and other digital circuits are designed from a lot of
  elementary transistors, wires, capacitors, resistors... using some
  very complex (and expensive) tools from some EDA companies, but at the
  end, after months of work, they often come with a "simple" public
  interface, the... instruction set! So it is rather "easy" in the end
  to generate instructions with a compiler such as LLVM from a
  description of this ISA or from some reverse engineering. Note that
  even if the ISA is public, it is very difficult to make another
  efficient processor from scratch just from this ISA, so there is often
  no concern about making the ISA public to develop the ecosystem;

- FPGAs are field-programmable gate arrays, also made from a lot of
  elementary transistors, wires, capacitors, resistors... but organized
  into billions of very low-level elementary gates, memory elements, DSP
  blocks, I/O blocks, clock generators, specific
  accelerators... directly exposed to the user, and programmable through
  a configuration memory (the bitstream) that details how to connect
  each part and routing element and how to configure each elemental
  piece of hardware. So instead of just writing instructions as on a CPU
  or a GPU, you need to configure each bit of the architecture in such a
  way that it does something interesting for you. Concretely, you write
  some programs in RTL languages (Verilog, VHDL) or higher-level ones
  (C/C++, OpenCL, SYCL...) and you use some very complex (and expensive)
  tools from some EDA companies to generate the bitstream implementing
  an equivalent circuit with the same semantics. Since the architecture
  is so low level, there is a direct mapping between the configuration
  memory (bitstream) and the hardware architecture itself, so if it were
  public it would be easy to duplicate the FPGA itself and start a new
  FPGA company. That is unfortunately something the existing FPGA
  companies do not want... ;-)

To summarize:

- on a CPU & GPU, the vendor has already used the expensive EDA tools
  once for you and provides the simpler ISA interface;

- on an FPGA, you have access to a pile of low-level hardware and it is
  up to you to go through the lengthy process of building your own
  computing architecture, using the heavy, expensive, very subtle EDA
  tools that will run for hours or days to generate some good-enough
  placement for your pleasure.

There is some public documentation on-line:
https://www.xilinx.com/products/silicon-devices/fpga/virtex-ultrascale-plus.html#documentation

To have an idea of the elementary architecture:
https://www.xilinx.com/support/documentation/user_guides/ug574-ultrascale-clb.pdf
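The CPU-vs-FPGA contrast above can be made concrete with a toy Python sketch (invented encodings, purely illustrative): a CPU vendor ships fixed silicon that interprets instructions whose meaning was decided once, while an FPGA ships configurable elements whose function is the user-supplied configuration itself.

```python
# Toy contrast (invented encodings): a CPU exposes a fixed ISA that the
# silicon interprets; an FPGA exposes configurable logic whose function
# IS the configuration loaded into it.

def cpu_run(opcode, a, b):
    """Fixed ISA: the vendor defined what each opcode means, once, in silicon."""
    ops = {"add": lambda x, y: x + y, "xor": lambda x, y: x ^ y}
    return ops[opcode](a, b)

def fpga_run(truth_table, a, b):
    """Configurable fabric: the user's 'bitstream' (truth table) defines the function."""
    return (truth_table >> ((b << 1) | a)) & 1

assert cpu_run("xor", 1, 1) == 0
assert fpga_run(0b0110, 1, 1) == 0  # the same XOR, but as user-supplied configuration
```

Publishing `cpu_run`'s opcode table (the ISA) costs the vendor little; publishing the meaning of every configuration bit describes the fabric itself, which is the asymmetry the long answer argues about.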

Re: [RFC PATCH Xilinx Alveo 0/6] Xilinx PCIe accelerator driver

2019-03-28 Thread Dave Airlie
On Thu, 28 Mar 2019 at 10:14, Sonal Santan  wrote:
>
>
>
> > -Original Message-
> > From: Daniel Vetter [mailto:daniel.vet...@ffwll.ch] On Behalf Of Daniel 
> > Vetter
> > Sent: Wednesday, March 27, 2019 7:12 AM
> > To: Sonal Santan 

RE: [RFC PATCH Xilinx Alveo 0/6] Xilinx PCIe accelerator driver

2019-03-28 Thread Sonal Santan



RE: [RFC PATCH Xilinx Alveo 0/6] Xilinx PCIe accelerator driver

2019-03-27 Thread Sonal Santan


Re: [RFC PATCH Xilinx Alveo 0/6] Xilinx PCIe accelerator driver

2019-03-27 Thread Daniel Vetter

RE: [RFC PATCH Xilinx Alveo 0/6] Xilinx PCIe accelerator driver

2019-03-27 Thread Sonal Santan



Re: [RFC PATCH Xilinx Alveo 0/6] Xilinx PCIe accelerator driver

2019-03-27 Thread Daniel Vetter
Re: [RFC PATCH Xilinx Alveo 0/6] Xilinx PCIe accelerator driver

2019-03-25 Thread Daniel Vetter
On Tue, Mar 19, 2019 at 02:53:55PM -0700, sonal.san...@xilinx.com wrote:
> From: Sonal Santan 
> 
> Hello,
> 
> This patch series adds drivers for Xilinx Alveo PCIe accelerator cards.
> These drivers are part of Xilinx Runtime (XRT) open source stack and
> have been deployed by leading FaaS vendors and many enterprise customers.

Cool, first fpga driver submitted to drm! And from a high level I think
this makes a lot of sense.

> PLATFORM ARCHITECTURE
> 
> Alveo PCIe platforms have a static shell and a reconfigurable (dynamic)
> region. The shell is automatically loaded from PROM when the host is booted
> and PCIe is enumerated by the BIOS. The shell cannot be changed until the
> next cold reboot. The shell exposes two physical functions: a management
> physical function and a user physical function.
> 
> Users compile their high-level design in C/C++/OpenCL or RTL into an FPGA
> image using the SDx compiler. The FPGA image, packaged as an xclbin file,
> can be loaded onto the reconfigurable region. The image may contain one or
> more compute units. Users can dynamically swap the full image running on the
> reconfigurable region in order to switch between different workloads.
> 
> XRT DRIVERS
> 
> The XRT Linux kernel driver xmgmt binds to the mgmt pf. The driver is
> modular and organized into several platform drivers which primarily handle
> the following functionality:
> 1.  ICAP programming (FPGA bitstream download with FPGA Mgr integration)
> 2.  Clock scaling
> 3.  Loading firmware container also called dsabin (embedded Microblaze
> firmware for ERT and XMC, optional clearing bitstream)
> 4.  In-band sensors: temp, voltage, power, etc.
> 5.  AXI Firewall management
> 6.  Device reset and rescan
> 7.  Hardware mailbox for communication between two physical functions
> 
> The XRT Linux kernel driver xocl binds to the user pf. Like its peer, this
> driver is also modular and organized into several platform drivers which
> handle the following functionality:
> 1.  Device memory topology discovery and memory management
> 2.  Buffer object abstraction and management for client process
> 3.  XDMA MM PCIe DMA engine programming
> 4.  Multi-process aware context management
> 5.  Compute unit execution management (optionally with help of ERT) for
> client processes
> 6.  Hardware mailbox for communication between two physical functions
> 
> The drivers export ioctls and sysfs nodes for various services. The xocl
> driver makes heavy use of DRM GEM features for device memory management,
> reference counting, mmap support and export/import. xocl also includes a
> simple scheduler called KDS which schedules compute units and interacts
> with the hardware scheduler running ERT firmware. The scheduler understands
> custom opcodes packaged into command objects and provides asynchronous
> command-done notification via POSIX poll.
> 
> More details on the architecture, software APIs, ioctl definitions,
> execution model, etc. are available as Sphinx documentation:
> 
> https://xilinx.github.io/XRT/2018.3/html/index.html
> 
> The complete runtime software stack (XRT) which includes out of tree
> kernel drivers, user space libraries, board utilities and firmware for
> the hardware scheduler is open source and available at
> https://github.com/Xilinx/XRT

Before digging into the implementation side more I looked into the
userspace here. I admit I got lost a bit, since there's lots of
indirections and abstractions going on, but it seems like this is just a
fancy ioctl wrapper/driver backend abstractions. Not really something
applications would use.

From the pretty picture on github it looks like there's some
opencl/ml/other fancy stuff sitting on top that applications would use. Is
that also available?

Thanks, Daniel

> 
> Thanks,
> -Sonal
> 
> Sonal Santan (6):
>   Add skeleton code: ioctl definitions and build hooks
>   Global data structures shared between xocl and xmgmt drivers
>   Add platform drivers for various IPs and frameworks
>   Add core of XDMA driver
>   Add management driver
>   Add user physical function driver
> 
>  drivers/gpu/drm/Kconfig|2 +
>  drivers/gpu/drm/Makefile   |1 +
>  drivers/gpu/drm/xocl/Kconfig   |   22 +
>  drivers/gpu/drm/xocl/Makefile  |3 +
>  drivers/gpu/drm/xocl/devices.h |  954 +
>  drivers/gpu/drm/xocl/ert.h |  385 ++
>  drivers/gpu/drm/xocl/lib/Makefile.in   |   16 +
>  drivers/gpu/drm/xocl/lib/cdev_sgdma.h  |   63 +
>  drivers/gpu/drm/xocl/lib/libxdma.c | 4368 
>  drivers/gpu/drm/xocl/lib/libxdma.h |  596 +++
>  drivers/gpu/drm/xocl/lib/libxdma_api.h |  127 +
>  drivers/gpu/drm/xocl/mgmtpf/Makefile   |   29 +
>  drivers/gpu/drm/xocl/mgmtpf/mgmt-core.c|  960 +
>  drivers/gpu/drm/xocl/mgmtpf/mgmt-core.h|  147 +
>  drivers/gpu/drm/xocl/mgmtpf/mgmt-cw.c  |   30 +
>  drivers/gpu/drm/xocl/mgmtpf/mgmt-ioctl.c   |  148 +
>  drivers/gpu/drm/xocl/mgmtpf/mgmt-reg.h |