Hi, B.

In all honesty, I've looked at some of the materials supplied with the vivado_hls software; the free web version has a number of examples that can work on my very cheap Parallella board. It's the cheapest Zynq board on the market, which of course might not be the best way to start an Artix-6 project with large DSP computation blocks, partial reconfiguration, commercial intellectual property and commercial Vivado modules like DSP designer. So I wasn't trying to sound like a blue-shirted, perfectly correct and intent salesperson; rather, I was sharing enthusiasm about this technology and a relatively powerful tool to get your DSP code rolling from a C program.

Of course, "nothing is as simple as it seems at first sight" appears to be a way for engineers who want interesting projects to get respect from their fellow designers. But I have tested some code (one small but not simple example I've shared here, IIRC) built with my free Xilinx software on Linux, ftp-ed to the board and loaded into the Zynq's FPGA, and it worked perfectly. That was using a 100 MHz clock. The Parallella board is able to run at 333 MHz, and I seem to recall that some of the standard design that makes the Zynq work with the additional chip on that little board runs even faster, but I didn't check.

As I see it, the result of the optimized C-to-Verilog effort is a netlist with ports and (in this case) Xilinx IP like RAMs, DSP slices, etc. So it is indeed a matter of loading the resulting IP "project" into the normal Vivado to compile the DSP function into a block that in some cases can be coupled to the AXI interface, so that the Zynq ARM cores can talk to the function you've made (at about 3 million 32-bit word read/write accesses per second, the way I did it). So it could well be that if vivado_hls says the clock timing for the chip involved is 3 nanoseconds, all kinds of factors make the final Verilog compile decide it should be less. Also, that figure might well hold only for relatively simple computations, like multiplies or something.
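To make the AXI coupling a bit more concrete, here is a minimal sketch of the kind of C function one might hand to vivado_hls so that its arguments and return value end up as registers the ARM cores can read and write. The function name `mac16`, its ports and the bundle name `ctrl` are hypothetical, chosen only for illustration, and this is not the design I ran on the Parallella. The `#pragma HLS INTERFACE s_axilite` directives are the part meant for the tool; an ordinary C compiler ignores them (possibly with an unknown-pragma warning), so the same file can be checked on the host first.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical top-level function for high-level synthesis.
 * Computes a * b + c on 16-bit inputs with a 32-bit result, so the
 * arithmetic cannot overflow. Name and ports are illustrative only. */
int32_t mac16(int16_t a, int16_t b, int16_t c)
{
/* Ask the tool to map each argument and the return value to
 * memory-mapped AXI4-Lite registers in one assumed bundle "ctrl". */
#pragma HLS INTERFACE s_axilite port=a bundle=ctrl
#pragma HLS INTERFACE s_axilite port=b bundle=ctrl
#pragma HLS INTERFACE s_axilite port=c bundle=ctrl
#pragma HLS INTERFACE s_axilite port=return bundle=ctrl
    return (int32_t)a * (int32_t)b + (int32_t)c;
}

/* Host-side sanity check, run with a normal C compiler before synthesis. */
void mac16_selfcheck(void)
{
    assert(mac16(3, 4, 5) == 17);
    assert(mac16(-2, 7, 1) == -13);
}
```

Calling `mac16_selfcheck()` from a small `main` on the PC is roughly what the C simulation step does: it only confirms the C behaviour, and says nothing about the timing the final Vivado run will report.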

Having used the example that is supplied with Vivado (I used a 2015 version and now the latest, 2016.2), I can say that a 1-clock-cycle optimized FPGA design does come out of the C-to-Verilog compile, but only after about 6 steps of optimization, which involve a lot of "#pragma"s (or the parallel Tcl directives for each C program) to get a matrix multiplication to that point! It's pretty smart at optimizing, but to my knowledge it won't do parallel generation of blocks to improve pipeline start-up time, or skewed pipelining of cores. The example, and a course to run it, are in the application note

  ug871-vivado-high-level-synthesis-tutorial.pdf

from Xilinx (should be easy to find on the web if you would like to have a look at it).
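For a rough idea of what those "#pragma"s look like, here is a sketch in the spirit of that matrix multiplication exercise. It is not the tutorial's own source: the size `N`, the types `in_t` and `out_t` and the function name `matmul` are assumptions made for illustration. The directives used (ARRAY_RESHAPE and PIPELINE with II=1) are regular Vivado HLS pragmas, but whether the tool actually reaches one result per clock depends on the part, the clock target and the tool version.

```c
#include <assert.h>
#include <stdint.h>

#define N 3                 /* assumed matrix size, for illustration */

typedef int8_t  in_t;       /* assumed input element type  */
typedef int32_t out_t;      /* wide enough for N products of two in_t */

/* res = a * b for N x N matrices. */
void matmul(in_t a[N][N], in_t b[N][N], out_t res[N][N])
{
/* Reshape the arrays so that one row of a and one column of b can be
 * read in a single access, instead of one element per clock. */
#pragma HLS ARRAY_RESHAPE variable=a complete dim=2
#pragma HLS ARRAY_RESHAPE variable=b complete dim=1
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
/* Request a new (i, j) result every clock; the tool then unrolls
 * the inner product loop below. */
#pragma HLS PIPELINE II=1
            out_t acc = 0;
            for (int k = 0; k < N; k++) {
                acc += (out_t)a[i][k] * (out_t)b[k][j];
            }
            res[i][j] = acc;
        }
    }
}

/* Host-side check against a hand-computed product. */
void matmul_selfcheck(void)
{
    in_t a[N][N] = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};
    in_t id[N][N] = {{1, 0, 0}, {0, 1, 0}, {0, 0, 1}};
    out_t r[N][N];

    matmul(a, id, r);       /* multiplying by identity returns a */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            assert(r[i][j] == a[i][j]);
}
```

Without the pragmas the same C compiles to a much slower, sequential design; the optimization steps mostly consist of adding directives like these one at a time and looking at the report after each run.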

T. Verelst
_______________________________________________
dupswapdrop: music-dsp mailing list
music-dsp@music.columbia.edu
https://lists.columbia.edu/mailman/listinfo/music-dsp
