Hi, B.
In all honesty, I've looked at some of the materials supplied with the Vivado HLS software;
the free web version comes with a number of examples that work on my very cheap
Parallella board. It's the cheapest Zynq board on the market, which of course might not be
the best way to start an Artix-6 project with large DSP computation blocks, partial
reconfiguration, commercial Intellectual Property and commercial Vivado modules like DSP
designer. So I wasn't trying to sound like a blue-shirted, perfectly correct and intent
salesperson; rather, I was sharing enthusiasm about this technology and the relatively
powerful tool for getting your DSP code rolling from a C program.
Of course, "nothing is as simple as it seems at first sight" appears to be some sort of way
for engineers who want interesting projects to earn the respect of their fellow designers,
but I have tested some code (one small but not simple example I've shared here, IIRC) built
with my free Xilinx software on Linux, ftp-ed and device-loaded into the Zynq board's FPGA,
and it worked perfectly. That was using a 100 MHz clock. The Parallella board is able to run
at 333 MHz, and I seem to recall that part of the standard design that makes the Zynq work
with the additional chip on that little board runs even faster, but I didn't check.
As I see it, the result of the optimized C-to-Verilog effort is a netlist with ports and
(in this case) Xilinx IP like RAMs, DSP slices, etc., so it is indeed a matter of loading
the resulting IP "project" into normal Vivado to compile the DSP function into a block
that in some cases can be coupled with the AXI interface, so that the Zynq ARM cores can
talk to the function you've made (at about 3 million 32-bit word read/write accesses per
second, the way I did it). So it could well be that if vivado_hls says the clock timing
for the chip involved is 3 nanoseconds, all kinds of factors make the final Verilog
compile decide that the achievable clock speed is lower. Also, that figure might well be
true only for relatively simple computations, like multiplies or something.
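
To make the AXI coupling a bit more concrete, here is a minimal sketch of what the C side
of such a function might look like. The function name scale_offset, its arguments and the
bundle name ctrl are made up for illustration and are not the code I tested; the
INTERFACE s_axilite pragmas are the Vivado HLS way of asking for the arguments and the
return value to be exposed as registers on an AXI4-Lite slave port, which the ARM cores
can then read and write.

```c
#include <stdint.h>

/* Hypothetical example function: y = gain * x + offset on 32-bit integers.
 * The "#pragma HLS" lines are Vivado HLS interface directives; an ordinary
 * C compiler ignores them (possibly with an unknown-pragma warning), so the
 * same source can be tested as plain C first. */
int32_t scale_offset(int32_t x, int32_t gain, int32_t offset)
{
/* Map each argument and the return value onto one AXI4-Lite register set.
 * The bundle name "ctrl" is an arbitrary choice for this sketch. */
#pragma HLS INTERFACE s_axilite port=x      bundle=ctrl
#pragma HLS INTERFACE s_axilite port=gain   bundle=ctrl
#pragma HLS INTERFACE s_axilite port=offset bundle=ctrl
#pragma HLS INTERFACE s_axilite port=return bundle=ctrl
    return gain * x + offset;
}
```

Compiled as ordinary C, scale_offset(3, 4, 5) simply returns 17; the register layout and
the achievable clock only come into play once the function goes through HLS and Vivado.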
In the example that is supplied with Vivado (I used a 2015 version and now the latest,
2016.2), the C-to-Verilog compile produces a 1-clock-cycle optimized FPGA design, but only
after about 6 steps of optimization, which include a lot of "#pragma"s or equivalent Tcl
directives per C program, just to get a matrix multiplication to that point!! It's pretty
smart at optimizing, but to my knowledge it won't generate blocks in parallel to improve
pipeline start-up time, or do core-skewed pipelining. The example, and a course for
running it, is in application note
ug871-vivado-high-level-synthesis-tutorial.pdf
from Xilinx (should be easy to find on the web if you would like to have a look
at it).
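
For those who haven't seen HLS-style C, the following is a rough sketch of the kind of
pragma-annotated matrix multiply I mean. It is not the tutorial's actual source: the
names matmul, mat_t and N, the 3x3 size, and the particular choice of pragmas are
assumptions for illustration only. PIPELINE and ARRAY_PARTITION are real Vivado HLS
directives, but which combination gets you to a 1-cycle initiation interval on a given
part is exactly what the optimization steps in the tutorial are about.

```c
#define N 3            /* matrix size, chosen arbitrarily for this sketch */

typedef int mat_t;     /* element type; the real example may use narrower types */

/* res = a * b for N x N matrices. As plain C this is an ordinary triple loop;
 * the "#pragma HLS" lines are hints for Vivado HLS and are ignored by a
 * normal C compiler. */
void matmul(mat_t a[N][N], mat_t b[N][N], mat_t res[N][N])
{
/* Split the arrays into separate registers/memories so that a whole row of a
 * and a whole column of b can be read in the same clock cycle. */
#pragma HLS ARRAY_PARTITION variable=a complete dim=2
#pragma HLS ARRAY_PARTITION variable=b complete dim=1
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
/* Ask for a new (i, j) result to be started every clock cycle; HLS unrolls
 * the inner k loop to make that possible. */
#pragma HLS PIPELINE II=1
            mat_t acc = 0;
            for (int k = 0; k < N; k++) {
                acc += a[i][k] * b[k][j];
            }
            res[i][j] = acc;
        }
    }
}
```

The same directives can be kept out of the C source and put in a Tcl directives file
instead, which is handy when you want to compare several optimization attempts on one
unchanged C program.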
T. Verelst
_______________________________________________
dupswapdrop: music-dsp mailing list
music-dsp@music.columbia.edu
https://lists.columbia.edu/mailman/listinfo/music-dsp