On Sat, 2013-11-30 at 12:48 +0000, Brian Drummond wrote:
> The other thing you could try for the fsm is a 64-bit build of ghdl, on
> a machine with at least 8 GB of physical RAM. "Serious" gcc users regard
> 16GB as not too large for some purposes. I would start with the source
> and build process from https://gna.org/bugs/?21305. I ran GHDL up to a
> 4.8GB footprint here but I only have 4GB so it was swapping badly at
> that stage, but it did prove that 4GB is not an upper limit for ghdl.
Argh. However, I still feel that much memory is way beyond what should be
needed just to compile this fsm.vhd... After all, I generated it, along
with the rest of the circuit architecture, with 100 MB of RAM and a few
dozen minutes (a few minutes if I do part of the job manually). As far as
I know, I could dump x86 assembly the exact same way with an appropriate
dump function. I would really love it if reducing the gcc back-end
optimization level led to good results, if not outright solved the
problem.

> On the subject of high level synthesis : have you seen these projects?
>
> http://www.nkavvadias.com/hercules/
>
> What's interesting about this one, to me, is that it involves GIMPLE as
> an intermediate language, with the C front end based on gcc.
>
> Which opens up the hypothetical possibility of adding
> --enable-languages=ada to the configure stage, and offering high level
> synth from Ada (perhaps Fortran would appeal in some circles)

Actually, my tool inherits the code parser of another HLS tool, UGH. That
parser IS gcc's parser, modified to an unknown extent, from an old gcc
version. What the HLS part of my tool does is take the parsed GIMPLE and
convert it into another graph more appropriate for HLS.

> If you've never used Ada, you may be wondering, why? I could suggest
> many reasons, but here's one useful for HLS : fixed point types fully
> supported by the language, and you get to choose the width...
>
> Or the York Hardware Ada Compiler : for example
> ftp://ftp.cs.york.ac.uk/papers/rtspapers/R%3AWard%3A2001.ps
> or in more detail
> http://www.cs.york.ac.uk/ftpdir/reports/2005/YCST/09/YCST-2005-09.pdf
> A practical detail that undermines this paper a little is that the
> language subset he uses for his "sequential Ada" example (p.176 of the
> latter paper) is ... synthesisable VHDL.
>
> Seriously.
>
> Substitute " to " for " .. ", prepend "variable " to each variable
> declaration, and wrap the example in a process, and XST swallows it
> whole.
>
> And spits out a lump of hardware, using about 3x as many CLBs as his
> resource estimates (bigger if you factor in that I targetted a newer
> FPGA) to implement the task in a single (very slow!) cycle.
>
> Sound familiar?

Yes it does. There are many HLS tools in the wild... some do little more
than transcribe the code to VHDL, others do much more elaborate things.
However, generating appropriately pipelined circuits is like the Holy
Grail of HLS. Really, I think some tools have achieved that, like GAUT,
maybe SPARK and LegUp. My work with AUGH is not (yet) at that level;
however, the resource usage estimated by AUGH is guaranteed after place
and route (thanks to calibration against the back-end tools). Two quick
sketches below, to make these two points concrete.
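First, a rough idea of what the "sequential Ada is really synthesisable
VHDL" rewrite looks like once wrapped in a process. The entity, the port
names and the loop body here are purely invented for illustration, not
taken from the York paper:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Invented example: a small sequential algorithm written directly as
-- synthesisable VHDL, in the style described above ("0 .. 7" becomes
-- "0 to 7", "variable" is prepended to the declarations, and the whole
-- thing is wrapped in a clocked process).
entity seq_example is
  port (
    clk  : in  std_logic;
    din  : in  unsigned(7 downto 0);
    dout : out unsigned(11 downto 0)
  );
end entity;

architecture rtl of seq_example is
begin
  process (clk)
    variable acc : unsigned(11 downto 0);   -- "variable " prepended
  begin
    if rising_edge(clk) then
      acc := (others => '0');
      for i in 0 to 7 loop                  -- Ada "0 .. 7" -> VHDL "0 to 7"
        acc := acc + din;                   -- invented loop body
      end loop;
      dout <= acc;
    end if;
  end process;
end architecture;

A synthesiser simply unrolls the loop into a chain of adders, which is
exactly the "single (very slow!) cycle" behaviour described above.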
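Second, what I mean by "appropriately pipelined", on a trivial
multiply-accumulate (again, all names and widths are invented for the
sketch): the combinational form goes through the multiplier and the adder
in one long cycle, while the pipelined form puts a register in between,
so the clock can run faster at the cost of latency.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Invented multiply-accumulate, written two ways for comparison.
entity mac_example is
  port (
    clk      : in  std_logic;
    a, b     : in  unsigned(7 downto 0);
    c        : in  unsigned(15 downto 0);
    res_comb : out unsigned(15 downto 0);   -- combinational version
    res_pipe : out unsigned(15 downto 0)    -- pipelined version
  );
end entity;

architecture rtl of mac_example is
  signal prod_r : unsigned(15 downto 0);
  signal c_r    : unsigned(15 downto 0);
begin
  -- 1) Purely combinational: one long path through multiplier and adder,
  --    the result is valid within the same (slow) cycle.
  res_comb <= (a * b) + c;

  -- 2) Two-stage pipeline: the register after the multiplier cuts the
  --    critical path; the result appears two cycles later.
  process (clk)
  begin
    if rising_edge(clk) then
      -- stage 1: multiply, and delay c to keep it aligned with the product
      prod_r <= a * b;
      c_r    <= c;
      -- stage 2: add the registered product and the delayed c
      res_pipe <= prod_r + c_r;
    end if;
  end process;
end architecture;

Deciding automatically where to put such registers, everywhere in a large
design, is the hard part.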
> For me, the important step in the York Hardware Ada Compiler is ...
> it reveals techniques for extracting sequentiality from an inherently
> parallel problem!
>
> In other words, automatic resource sharing, to reduce the hardware size.
> (Ironically, the exact opposite of the GPU programmers' Grand Challenge
> turns out to be important!)
>
> At which point it *might* interest you. It *may* have cracked a
> different but important part of the puzzle.
>
> My opinion is that he takes it too far, extracting all the sequentiality
> he can find, hundreds of cycles, as if he was compiling for a
> single-stream CPU. And the result is - to me - disappointing; the
> hardware isn't orders of magnitude smaller.

At least when targeting an FPGA, it is known that the operating frequency
of the resulting circuit will be about 10x lower than what a
microprocessor (or GPU) achieves. So the only way to outperform these is
to extract every bit of achievable parallelism, use custom operators built
specifically for the application, take care not to exceed the clock
period, and pipeline everything as much as possible. All this while
making sure the hardware resource limits of the targeted FPGA are
respected. However, if the user specifically wants to obtain a
combinatorial circuit, then there is only one way.

> I will also read your papers with interest

They will be sent separately.

Best regards,
Adrien

_______________________________________________
Ghdl-discuss mailing list
[email protected]
https://mail.gna.org/listinfo/ghdl-discuss
